llvm-project

Commit Graph

Author	SHA1	Message	Date
Elia Geretto	2627f99613	[dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075 The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary. Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D101048	2021-04-22 21:12:20 +00:00
Arthur Eubanks	16ff1a7023	[GlobalOpt] Don't replace alias with aliasee if aliasee is interposable Both the alias and aliasee linkage are important. PR27866 provides some background. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99629	2021-04-22 13:12:34 -07:00
Philip Reames	15e19a2599	Revert "[instcombine] Exploit UB implied by nofree attributes" This change effectively reverts `86664638`, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert. Why revert this now? Two main reasons: 1) There is continuing discussion around what the semantics of nofree should be. I am getting increasingly uncomfortable with the seeming possibility that we might redefine nofree in a way incompatible with these changes. 2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443). At first, I was making good progress on tracking down the issues exposed, and those issues appeared to be unrelated latent bugs. Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree. In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.	2021-04-22 10:53:17 -07:00
Jianzhou Zhao	7fdf270965	[dfsan] Track origin at loads The first version of origin tracking tracks only memory stores. Although this is sufficient for understanding correct flows, it is hard to figure out where an undefined value is read from. To find reads of undefined values, we still have to do a reverse binary search from the last store in the chain, adding printing and logging along possible code paths. This is quite inefficient. Tracking memory load instructions can help in this case. The main issues with tracking loads are performance and code size overheads. With tracking only stores, the code size overhead is 38%, memory overhead is 1x, and CPU overhead is 3x. In practice, the number of loads is much larger than the number of stores, so both code size and CPU overhead increase. The first blocker is code size overhead: linking fails if we inline load tracking. The workaround is using external function calls to propagate metadata. This is also the workaround ASan uses. The CPU overhead is ~10x. This is a trade-off between debuggability and performance, and will be used only when debugging cases where tracking only stores is not enough. Reviewed By: gbalats Differential Revision: https://reviews.llvm.org/D100967	2021-04-22 16:25:24 +00:00
Alexey Bataev	18c61fc498	[SLP]Skip undefs trying to find perfect/shuffled tree entries matching. We can skip the check for undefs when trying to find perfect/shuffled tree entry matches. Undefs can be ignored completely, which improves the final cost/vectorization results. Differential Revision: https://reviews.llvm.org/D101061	2021-04-22 08:59:07 -07:00
Joe Ellis	2c551aedcf	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: `void foo(int *restrict data1, int *restrict data2) { int counter = 1024; while (counter--) if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; }` ... could previously be transformed in such a way that the predicated store implied by: `if (data1[counter] > data2[counter]) data1[counter] = data2[counter];` ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Alexey Bataev	d4f5f23bbb	[SLP]Replace more `TTI` with `TTIRef`, NFC. To pacify MSVC buildbots.	2021-04-22 07:53:20 -07:00
Alexey Bataev	da2cdfd421	[SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC.	2021-04-22 07:49:48 -07:00
Alexey Bataev	e99b98cb1b	[SLP]Improve cost model for the vectorized extractelements. 1. No need to call `areAllUsersVectorized` as later the cost is calculated only if the instruction has one use and gets vectorized. 2. Need to calculate the cost of the dead extractelement more precisely, taking the vector type of the vector operand, not the resulting vector type. Part of D57059. Differential Revision: https://reviews.llvm.org/D99980	2021-04-22 07:40:17 -07:00
Dawid Jurczak	57f443c348	[SimplifyLibCalls][NFC] Use StringRef::back instead of explicit indexing. Split off from D100724. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D101032	2021-04-22 15:02:47 +02:00
David Sherwood	5a229a6702	[LoopVectorize] Don't create unnecessary vscale intrinsic calls In quite a few cases in LoopVectorize.cpp we call createStepForVF with a step value of 0, which leads to unnecessary generation of llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale and createStepForVF to return 0 when attempting to multiply vscale by 0. Differential Revision: https://reviews.llvm.org/D100763	2021-04-22 09:01:52 +01:00
Max Kazantsev	8fe62b7af1	[GVN] Introduce loop load PRE This patch allows PRE of the following type of loads: ``` preheader: br label %loop loop: br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p br label %merge merge: ... br i1 ..., label %loop, label %exit ``` Into ``` preheader: %x0 = load %p br label %loop loop: %x.pre = phi(%x0, %x2) br i1 ..., label %merge, label %clobber clobber: call foo() // Clobbers %p %x1 = load %p br label %merge merge: %x2 = phi(%x.pre, %x1) ... br i1 ..., label %loop, label %exit ``` So instead of loading from %p on every iteration, we load only when the actual clobber happens. The typical pattern it is trying to address is a hot loop, with all code inlined and provably having no side effects, and some side-effecting calls on a cold path. The worst case is when we always take the clobber block: we then make 1 more load overall (in the preheader). This only matters if the loop has very few iterations. If there is at least one iteration on which the clobber block is not taken, the transform is neutral or profitable. This opens up several prospects for improvement: - We can sometimes be smarter in loop-exiting blocks by splitting critical edges; - If we have block frequency info, we can handle multiple clobbers. The only obstacle now is that we don't know if their sum is colder than the header. Differential Revision: https://reviews.llvm.org/D99926 Reviewed By: reames	2021-04-22 12:50:38 +07:00
Chuanqi Xu	77ca2a6893	[Coroutine] Collect CoroBegin if all terminators are dominated by one coro.destroy Summary: The original logic appears to collect a CoroBegin if any terminator is dominated by any coro.destroy, which doesn't make sense. This patch rewrites the logic to collect CoroBegin if all terminators are dominated by one coro.destroy. If there is no such coro.destroy, we call hasEscapePath to evaluate whether we should collect it. Test Plan: check-llvm Reviewed by: lxfind Differential Revision: https://reviews.llvm.org/D100614	2021-04-22 11:21:37 +08:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and makes corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target and host-side parallel regions), and data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Fangrui Song	775a9483e5	[IR][sanitizer] Set nounwind on module ctor/dtor, additionally set uwtable if -fasynchronous-unwind-tables On ELF targets, if a function has uwtable or personality, or does not have nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module. Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified. (I.e., with -g[123], every function gets `.eh_frame`. This behavior is strange, but that is the status quo on GCC and Clang.) Let's take asan as an example. Other sanitizers are similar. `asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true, so every function gets `.eh_frame` if `-g[123]` is specified. This is the root cause of why `-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame while `-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame. This patch * sets the nounwind attribute on sanitizer module ctor/dtor. * lets Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute. The "uwtable" mechanism is generic: synthesized functions not cloned/specialized from existing ones should consider `Function::createWithDefaultAttr` instead of `Function::create` if they want to get some default attributes which have more of module semantics. Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955 https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc. Differential Revision: https://reviews.llvm.org/D100251	2021-04-21 15:58:20 -07:00
Olle Fredriksson	f5446b769a	[MemCpyOpt] Allow variable lengths in memcpy optimizer This makes the memcpy-memcpy and memcpy-memset optimizations work for variable sizes as long as they are equal, relaxing the old restriction that they are constant integers. If they're not equal, the old requirement that they are constant integers with certain size restrictions is used. The implementation works by pushing the length tests further down in the code, which reveals some places where it's enough that the lengths are equal (but not necessarily constant). Differential Revision: https://reviews.llvm.org/D100870	2021-04-21 23:23:38 +02:00
Arthur Eubanks	b606e2df4d	[Evaluator] Bitcast result of pointer stripping Trying to evaluate a GEP would assert with "Ty == cast<PointerType>(C->getType()->getScalarType())->getElementType()" because the type of the pointer we would evaluate the GEP argument to would be a different type than the GEP was expecting. We should treat pointer stripping as a bitcast. The test adds a redundant GEP that would crash due to type mismatch. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D100970	2021-04-21 13:32:29 -07:00
Nikita Popov	24e9fbc1a3	Revert "[InstCombine] Fold multiuse shr eq zero" This reverts commit `9423f78240`. A performance regression with this patch has been reported at https://reviews.llvm.org/rG9423f78240a2#990953. Reverting for now.	2021-04-21 21:40:52 +02:00
sstefan1	62cdcd6c5a	[FuncAttrs] Don't infer willreturn for nonexact definitions Discovered during attributor testing comparing stats with and without the attributor. Willreturn should not be inferred for nonexact definitions. Differential Revision: https://reviews.llvm.org/D100988	2021-04-21 21:26:09 +02:00
sstefan1	656ebd519e	[SimplifyLibCalls] Don't change alignment when creating memset Fix for PR49984 This was discovered during Attributor testing. Memset was always created with an alignment of 1, and when the strncpy alignment had been changed, this triggered an assertion in the AttrBuilder. Memset will now be created with the appropriate alignment. Differential Revision: https://reviews.llvm.org/D100875	2021-04-21 20:34:13 +02:00
Nico Weber	ba7a92c01e	[Support] Don't include VirtualFileSystem.h in CommandLine.h CommandLine.h is indirectly included in ~50% of TUs when building clang, and VirtualFileSystem.h is large. (Already remarked by jhenderson on D70769.) No behavior change. Differential Revision: https://reviews.llvm.org/D100957	2021-04-21 10:19:01 -04:00
George Balatsouras	79b5280a6c	[dfsan] Enable origin tracking with fast8 mode All related instrumentation tests have been updated. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D100903	2021-04-20 18:10:32 -07:00
Arthur Eubanks	326da4adcb	[FuncAttrs] Always preserve FunctionAnalysisManagerCGSCCProxy FunctionAnalysisManagerCGSCCProxy should not be preserved if any of its keys may be invalid. Since we are not removing/adding functions in FuncAttrs, it's fine to preserve it. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D100893	2021-04-20 16:37:45 -07:00
Reid Kleckner	91f7a4fff7	Revert "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as a signed multiplication overflow check (PR48769)" This reverts commit `13ec913bdf`. This commit introduces new uses of the overflow checking intrinsics that depend on implementations in compiler-rt, which Windows users generally do not link against. I filed an issue (somewhere) to make clang auto-link the builtins library to resolve this situation, but until that happens, it isn't reasonable for the optimizer to introduce new link time dependencies.	2021-04-20 15:53:34 -07:00
Philip Reames	4824d876f0	Revert "Allow invokable sub-classes of IntrinsicInst" This reverts commit `d87b9b81cc`. Post commit review raised concerns, reverting while discussion happens.	2021-04-20 15:38:38 -07:00
Roman Lebedev	5a654bfeab	Revert "[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543)" I forgot about the case where we sign-extend to width smaller than the original. This reverts commit `1e6ca23ab8`.	2021-04-21 01:11:15 +03:00
Roman Lebedev	1e68d338c1	Revert "[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543)" I forgot about the case where we sign-extend to width smaller than the original. This reverts commit `41b71f718b`.	2021-04-21 01:11:14 +03:00
Philip Reames	d87b9b81cc	Allow invokable sub-classes of IntrinsicInst It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for an intrinsic has a false negative if the intrinsic is invoked. This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke. After this lands and has baked for a couple days, planned cleanups: Make GCStatepointInst an IntrinsicInst subclass. Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor. Do the same in SelectionDAG. Do the same in FastISel. Differential Revision: https://reviews.llvm.org/D99976	2021-04-20 15:03:49 -07:00
Roman Lebedev	41b71f718b	[InstCombine] "Bypass" NUW trunc of lshr if we are going to sext the result (PR49543) This is a more convoluted form of the same pattern "sext of NSW trunc", but in this case the operand of trunc was a right-shift, and the truncation chops off just the zero bits that were shifted-in.	2021-04-21 00:31:46 +03:00
Roman Lebedev	1e6ca23ab8	[InstCombine] `sext(trunc(x)) --> sext(x)` iff trunc is NSW (PR49543) If we can tell that trunc only chops off sign bits, and not all of them, then we can simply sign-extend the trunc's source.	2021-04-21 00:31:45 +03:00
Sanjay Patel	1e202e8f39	[InstCombine] fold shift-of-srem-by-2 to mask+shift There are several potential srem-by-2 folds because the result is known {-1,0,1}. https://alive2.llvm.org/ce/z/LuVyeK	2021-04-20 17:10:16 -04:00
Roman Lebedev	13ec913bdf	[InstCombine] Recognize `((x * y) s/ x) !=/== y` as a signed multiplication overflow check (PR48769) We already had support for its unsigned variant, so simply extend it to also handle the signed variant. Fixes https://bugs.llvm.org/show_bug.cgi?id=48769	2021-04-20 21:29:43 +03:00
Joseph Huber	b2ad63d3cf	[OpenMP] Add OpenMPOpt as a Module pass Summary: This patch registers OpenMPOpt as a Module pass in addition to a CGSCC pass. This is so certain optimizations that are sensitive to intact call-sites can happen before inlining. The old `openmpopt` pass name is changed to `openmp-opt-cgscc` and `openmp-opt` calls the Module pass. The current module pass only runs a single check but will be expanded in the future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D99202	2021-04-20 12:28:58 -04:00
Alexey Bataev	af870e11ae	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve the cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 09:08:46 -07:00
Philip Reames	3b1474cab2	free(nullptr) does not violate the nofree specification This fixes a subtle and nasty bug in my `86664638`. The problem is that free(nullptr) is well defined (and common). The specification for the nofree attributes talks about memory objects, and doesn't explicitly address null, but I think it's reasonable to assume that nofree doesn't disallow a call to free(nullptr). If it did, we'd have to prove nonnull on an argument to ever infer nofree which doesn't seem to be the intent. This was found by Nuno and Alive2 over in https://reviews.llvm.org/D100141#2697374. Differential Revision: https://reviews.llvm.org/D100779	2021-04-20 09:08:05 -07:00
Alexey Bataev	b82344a019	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `daf6e18c55` to fix the compiler crash.	2021-04-20 08:29:32 -07:00
Alexey Bataev	daf6e18c55	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve the cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 07:46:49 -07:00
Alexey Bataev	cf00cb8bed	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `b232771aca` to fix buildbots.	2021-04-20 07:16:11 -07:00
Alexey Bataev	b232771aca	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve the cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 06:55:55 -07:00
Sander de Smalen	86729538bd	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the same cost is more profitable than a fixed VF of that cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs was truncated from `float` to `unsigned` and the calculation was then performed on the truncated cost. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Luo, Yuanke	bcdaccfe34	[X86][AMX] Verify illegal types or instructions for x86_amx. This patch is related to https://reviews.llvm.org/D100032, which defines some illegal types or operations for x86_amx. There are no arguments, arrays, pointers, vectors or constants of x86_amx. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D100472	2021-04-20 16:14:22 +08:00
Arthur Eubanks	5e71b9fa93	Explicitly pass type to cast load constant folding result Previously we would use the type of the pointee to determine what to cast the result of constant folding a load. To aid with opaque pointer types, we should explicitly pass the type of the load rather than looking at pointee types. ConstantFoldLoadThroughBitcast() converts the const prop'd value to the proper load type (e.g. [1 x i32] -> i32). Instead of calling this in every intermediate step like bitcasts, we only call this when we actually see the global initializer value. In some existing uses of this API, we don't know the exact type we're loading from immediately (e.g. first we visit a bitcast, then we visit the load using the bitcast). In those cases we have to manually call ConstantFoldLoadThroughBitcast() when simplifying the load to make sure that we cast to the proper type. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100718	2021-04-20 00:53:21 -07:00
Dávid Bolvanský	324d641b75	[InstCombine] Enhance deduction of alignment for aligned_alloc This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements "TODO" item mentioned in the review of that patch. > The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment. Currently, we simply bail out if we see a non-constant size - change that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100785	2021-04-20 02:04:18 +02:00
Alexey Bataev	8030481065	Revert "[SLP]Add detection of shuffled/perfect matching of tree entries." This reverts commit `d6fde91379` to fix compiler crashes.	2021-04-19 14:10:04 -07:00
Zequan Wu	e28435caf6	[ThinLTO] Copy UnnamedAddr when splitting module. The unnamed_addr property of a function is lost when using `-fwhole-program-vtables` and ThinLTO, which causes a size increase under the linker's safe ICF mode. The size increase of Chrome on Linux when switching from all ICF to safe ICF drops from 5 MB to 3 MB after this change, and from 6 MB to 4 MB on Windows. Repro: ``` # a.h struct A { virtual int f(); virtual int g(); }; # a.cpp #include "a.h" int A::f() { return 10; } int A::g() { return 10; } # main.cpp #include "a.h" int g(A* a) { return a->f(); } int main(int argv, char** args) { A a; return g(&a); } $ clang++ -O2 -ffunction-sections -flto=thin -fwhole-program-vtables -fsplit-lto-unit -c main.cpp -o main.o && clang++ -Wl,--icf=safe -fuse-ld=lld -flto=thin main.o -o a.out && llvm-readobj -t a.out | grep -A 1 -e _ZN1A1fEv -e _ZN1A1gEv Name: _ZN1A1fEv (480) Value: 0x201830 -- Name: _ZN1A1gEv (490) Value: 0x201840 ``` Differential Revision: https://reviews.llvm.org/D100498	2021-04-19 14:04:58 -07:00
Alexey Bataev	d6fde91379	[SLP]Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve the cost of the vectorized tree. Differential Revision: https://reviews.llvm.org/D100495	2021-04-19 13:29:30 -07:00
Philip Reames	3c54762226	[funcattrs] Consistently check call site attributes This is mostly stylistic cleanup after D100226, but not entirely. When skimming the code, I found one case where we weren't accounting for attributes on the callsite at all. I'm also suspicious we had some latent bugs related to operand bundles (which are supposed to be able to override attributes on declarations), but I don't have concrete test cases for those, just suspicions. Aside: The only case left in the file which directly checks attributes on the declaration is the norecurse logic. I left that because I didn't understand it; it looks obviously wrong, so I suspect I'm misinterpreting the intended semantics of the attribute. Differential Revision: https://reviews.llvm.org/D100689	2021-04-19 13:20:50 -07:00
Philip Reames	01801d5274	[rs4gc] Fix a latent bug around attribute stripping for intrinsics This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032). The story on this is a bit involved. Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then SelectionDAG would fail to lower the intrinsic. Why? Because the lowering of the intrinsic relies on the presence of the readonly attribute. We don't have a matcher to select the case where there's a glue node needed. Now, on the surface, this still seems like a codegen bug. However, here it gets fun. I was unable to reproduce this with a standalone test at all, and was pretty much stuck until skatkov provided the critical detail. This reproduces only when RS4GC and codegen are run in the same process and context. Why? Because it turns out we can't roundtrip the stripped attribute through serialized IR! We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td. This makes it impossible to exercise SelectionDAG from a standalone test case. At this point, I decided to treat this as an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future. As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted. I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.	2021-04-19 13:14:07 -07:00
Nikita Popov	9423f78240	[InstCombine] Fold multiuse shr eq zero The single-use case is handled implicitly by converting the icmp into a mask check first. When comparing with zero in particular, we don't need the one-use restriction, as we only produce a single icmp. https://alive2.llvm.org/ce/z/MSixcm https://alive2.llvm.org/ce/z/GwpG0M	2021-04-19 22:13:11 +02:00
Nikita Popov	d440f9a326	[LICM] Make capture check more precise During store promotion, we check whether the pointer was captured to exclude potential reads from other threads. However, we're only interested in captures before or inside the loop. Check this using PointerMayBeCapturedBefore against the loop header. Differential Revision: https://reviews.llvm.org/D100706	2021-04-19 20:34:23 +02:00
Roman Lebedev	d746fefb6f	[SCEVExpander] ReuseOrCreateCast(): use IRBuilder to actually create the cast In particular, this allows creating constant expressions instead of IR Instructions if the argument is a constant.	2021-04-19 18:38:39 +03:00
Roman Lebedev	ecc9d7e913	[SCEVExpander] Expand explicit PtrToInt casts just like we would implicit ones I.e., use GetOptimalInsertionPointForCastOf() helper to get the insertion point, and try to reuse casts first.	2021-04-19 18:38:39 +03:00
Roman Lebedev	442c408e0e	[SCEVExpander] GetOptimalInsertionPointForCastOf(): gracefully handle Constants I guess this case hasn't come up thus far, and I'm not sure if it can really happen for the existing usages, thus no test in this commit. But the following commit adds test coverage; there we'd experience a crash without this fix.	2021-04-19 18:38:39 +03:00
Roman Lebedev	b8a3705896	[NFCI][SCEVExpander] Extract GetOptimalInsertionPointForCastOf() helper	2021-04-19 18:38:38 +03:00
Roman Lebedev	73f60e3988	[SCEVExpander] generateOverflowCheck(): explicitly PtrToInt the Start Currently, InsertNoopCastOfTo() would implicitly insert that cast, but now that we have SCEVPtrToIntExpr, I'm hoping we could stop InsertNoopCastOfTo() from doing that. But first all users must be fixed.	2021-04-19 18:38:38 +03:00
Cullen Rhodes	f0bc2782f2	[TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100377	2021-04-19 11:01:34 +00:00
OCHyams	0ebf9a8e34	[DebugInfo] Move the findDbg* functions into DebugInfo.cpp Move the findDbg* functions into lib/IR/DebugInfo.cpp from lib/Transforms/Utils/Local.cpp. D99169 adds a call to a function (findDbgUsers) that lives in lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp (LLVMCore). The Core lib doesn't include TransformUtils. The buildbots caught this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp which is part of LLVMCore. Reviewed By: dblaikie, rnk Differential Revision: https://reviews.llvm.org/D100632	2021-04-19 10:30:25 +01:00
Evgeniy Brevnov	35e95c6817	[CVP] processCallSite returns wrong status Recently processMinMaxIntrinsic has been added, and we started to observe a number of analyses being invalidated after CVP. The problem is that CVP conservatively returns 'true' even if there were no modifications to IR. I found one more place besides processMinMaxIntrinsic which has the same problem. I think processMinMaxIntrinsic and similar functions should return a boolean status to prevent similar issues from reappearing in the future. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100538	2021-04-19 12:13:22 +07:00
Xun Li	5faba87938	Revert "[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass" This reverts commit `fa6b54c44a`. The committed patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.	2021-04-18 17:22:28 -07:00
Xun Li	fa6b54c44a	[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass Presplit coroutines cannot be inlined. During AlwaysInliner, we check whether a function is a presplit coroutine and, if so, skip inlining. The presplit coroutine attributes are set in CoroEarly pass. However, in the O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and the coroutine still gets inlined. This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920 To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass. Reviewed By: rjmccall, ChuanqiXu Differential Revision: https://reviews.llvm.org/D100282	2021-04-18 15:41:09 -07:00
Xun Li	c0211e8d7d	Revert "[Coroutines] Move CoroEarly pass to before AlwaysInliner" This reverts commit `2b50f5a434`. Forgot to update the description of the commit to sync with phabricator. Going to redo the commit.	2021-04-18 15:38:19 -07:00
Xun Li	2b50f5a434	[Coroutines] Move CoroEarly pass to before AlwaysInliner Presplit coroutines cannot be inlined. During AlwaysInliner, we check whether a function is a presplit coroutine and, if so, skip inlining. The presplit coroutine attributes are set in CoroEarly pass. However, in the O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and the coroutine still gets inlined. This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920 Differential Revision: https://reviews.llvm.org/D100282	2021-04-18 14:54:04 -07:00
Juneyoung Lee	1c10201d96	Update InstCombine to use undef matcher instead This is a patch to use the m_Undef() matcher instead of isa<UndefValue>(). As suggested in D100122, this update is committed separately.	2021-04-18 11:05:36 +09:00
Florian Hahn	af523514c4	[SimplifyCFG] Skip dbg intrinsics when checking for branch-only BBs. Debug intrinsics are free to hoist and should be skipped when looking for terminator-only blocks. As a consequence, we have to delegate to the main hoisting loop to hoist any dbg intrinsics instead of jumping to the terminator case directly. This fixes PR49982. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100640	2021-04-17 15:17:50 +01:00
Nikita Popov	e68b12c99e	[Inline] Don't add noalias metadata to inaccessiblememonly calls It will not do anything useful for them, as we already know that they don't modref with any accessible memory. In particular, this prevents noalias metadata from being placed on noalias.scope.decl intrinsics. This reduces the amount of metadata needed, and makes it more likely that unnecessary decls can be eliminated.	2021-04-17 14:56:13 +02:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as a string). Throughout the codebase, this led to inelegant checks ranging from `if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")` to `if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")`. Introduce a getValueAsBool that normalizes the check, with the following behavior: no attribute, or attribute set to "false" => return false; attribute set to "true" => return true. Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Philip Reames	11707435cc	[inferattrs] Don't infer lib func attributes for nobuiltin functions If we have a nobuiltin function, we can't assume we know anything about the implementation. I noticed this when tracing through a log from an in-the-wild miscompile (https://github.com/emscripten-core/emscripten/issues/9443) triggered after `8666463`. We were incorrectly assuming that a custom allocator could not free. (It's not clear yet this is the only problem in said issue.) I also noticed something similar mentioned in the commit message of ab243e when scrolling back through history. Though, from what I can tell, that commit fixed a symptom, not the root cause. The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner. I may iterate on that, but it'll be invasive enough that I didn't want to hold an obvious functional fix on it.	2021-04-16 15:36:15 -07:00
Philip Reames	f549176ad9	[funcattrs] Add the maximal set of implied attributes to definitions Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions, not declarations, this time. Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked for the presence of attributes (there are some) will see a stronger result now. The old behavior can end up quite confusing for two reasons: * Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceedingly error-prone and confusing. (I'll note that I wasted several hours recently because of this.) * We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent. I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in its own review as a) it's purely stylistic, and b) I already know there's some disagreement. Once this lands, I'm also going to do a cleanup change which will delete some now-redundant duplicate predicates in the inference code, but again, that deserves to be a change of its own. Differential Revision: https://reviews.llvm.org/D100226	2021-04-16 14:22:19 -07:00
Philip Reames	ff55d01a8e	[nofree] Restrict semantics to memory visible to caller This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee. This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute". The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D100141	2021-04-16 11:38:55 -07:00
Marcythm	f8cf3b9931	[LICM][NFC] Fix typo fixed some typos which may lead to misunderstandings in LICM.cpp Reviewed By: nikic, asbirlea Differential Revision: https://reviews.llvm.org/D100470	2021-04-16 09:42:00 +08:00
Arthur Eubanks	9c776c2fa2	[NFC][NewPM] Remove some AnalysisManager invalidate methods These were misleading, they're more of a "clear" than an "invalidate". We shouldn't be individually clearing analysis results. Either we clear all analyses when some IR becomes invalid, or we properly go through invalidation. There was only one use of this, which can be simulated with AM.invalidate(F, PA). Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D100519	2021-04-15 16:51:26 -07:00
Florian Hahn	3e7ee5428d	[InferAttrs] Do not mark first argument of str(n)cat as writeonly. str(n)cat appends a copy of the second argument to the end of the first argument. To find the end of the first argument, str(n)cat has to read from it until it finds the terminating 0. So it should not be marked as writeonly. (This was causing a miscompile with legacy DSE, before it got removed.) Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D100601	2021-04-15 23:00:21 +01:00
Florian Hahn	49999d4364	[VPlan] Replace a few unnecessary includes with forward decls.	2021-04-15 20:08:31 +01:00
Danilo C. Grael	55487079a9	[LoopUnrollAndJam] Avoid repeated instructions for UAJ analysis Avoid visiting repeated instructions in processHeaderPhiOperands, as doing so can cause an endless loop. A test case is attached and can be run with `opt -basic-aa -tbaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D97407	2021-04-15 12:59:42 -04:00
Mark Johnston	f511dc75e4	[asan] Add an offset for the kernel address sanitizer on FreeBSD This is based on a port of the sanitizer runtime to the FreeBSD kernel that has been committed as https://cgit.freebsd.org/src/commit/?id=38da497a4dfcf1979c8c2b0e9f3fa0564035c147 and the following commits. Reviewed By: emaste, dim Differential Revision: https://reviews.llvm.org/D98285	2021-04-15 17:49:00 +01:00
Stelios Ioannou	bf147c4653	[LSR] Fix for pre-indexed generated constant offset This patch changes the isLegalUse check to ensure that LSRInstance::GenerateConstantOffsetsImpl generates an offset that results in a legal addressing mode and formula. The check is changed to look similar to the assert check used for illegal formulas. Differential Revision: https://reviews.llvm.org/D100383 Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a	2021-04-15 16:44:42 +01:00
Florian Hahn	6adebe3fd2	[VPlan] Add VPRecipeBase::mayHaveSideEffects. Add an initial version of a helper to determine whether a recipe may have side-effects. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100259	2021-04-15 11:49:40 +01:00
David Sherwood	ea14df695e	[SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction There were a few places in widenPHIInstruction where calculations of offsets were failing to take the runtime calculation of VF into account for scalable vectors. I've fixed those cases in this patch as well as adding an assert that we should not be scalarising for scalable vectors. Tests are added here: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D99254	2021-04-15 10:51:49 +01:00
David Sherwood	7120f89f7d	[NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts There are a few places in LoopVectorize.cpp where we have been too cautious in adding VF.isScalable() asserts and it can be confusing. It also makes it more difficult to see the genuine places where work needs doing to improve scalable vectorization support. This patch changes getMemInstScalarizationCost to return an invalid cost instead of firing an assert for scalable vectors. Also, vectorizeInterleaveGroup had multiple asserts all for the same thing. I have removed all but one assert near the start of the function, and added a new assert that we aren't dealing with masks for scalable vectors. Differential Revision: https://reviews.llvm.org/D99727	2021-04-15 09:41:03 +01:00
Florian Hahn	5a3ff24b12	[NewGVN] Add phi-of-ops operands if no real PHI is created. If the PHI-of-ops simplifies to an existing value, no real PHI is created, which means the dependencies between the PHI-of-ops and its operands are not materialized in IR. At the moment, we fail to create a real PHI node for the PHI-of-ops, because the PHI-of-ops root instruction is not re-visited if one of the PHI-of-ops operands changes. We need to add the operands as additional users in this case. Even with this patch, there are still some dependencies missing. I will continue tackling the outstanding reported crashes in this area. Fixes PR36501, PR42422, PR42557. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D66924	2021-04-15 08:25:10 +01:00
Philip Reames	dd985551c2	Reapply "[InferAttributes] Materialize all infered attributes for declaration"" and follow on patches. This reverts commit `ab98f2c712` and `98eea392cd`. It includes a fix for the clang test which triggered the revert. I failed to notice this one because there was another AMDGPU llvm test with a similar name and the exact same text in the error message. Odd. Since only one build bot reported the clang test, I didn't notice that one.	2021-04-14 16:38:07 -07:00
Nico Weber	ab98f2c712	Revert "[InferAttributes] Materialize all infered attributes for declaration" Breaks check-clang, see comments on D100400 Also revert follow-up "[NFC] Move a recently added utility into a location to enable reuse" This reverts commit `3ce61fb6d6`. This reverts commit `61a85da882`.	2021-04-14 18:41:20 -04:00
Philip Reames	3ce61fb6d6	[NFC] Move a recently added utility into a location to enable reuse About to refresh a patch that uses this in FunctionAttrs; doing the move separately to control build times.	2021-04-14 15:05:16 -07:00
Philip Reames	61a85da882	[InferAttributes] Materialize all infered attributes for declaration We have some cases today where attributes can be inferred from another attribute on access, but the result is not explicitly materialized in IR. This change is a step towards changing that. Why? Two main reasons: * Human clarity. It's really confusing trying to figure out why a transform is triggering when the IR doesn't appear to have the required attributes. * This avoids the need to special case declarations in e.g. functionattrs. Since we can assume the attribute is present, we can work directly from attributes (and only attributes) without also needing to query accessors on Function to avoid missing cases due to unannotated (but inferred on use) declarations. (This piece will appear much easier to follow once D100226 also lands.) Differential Revision: https://reviews.llvm.org/D100400	2021-04-14 14:45:24 -07:00
Mehrnoosh Heidarpour	29f189f90d	[InstCombine] Conditionally emit nowrap flags when combining two adds Currently, InstCombineCompare combines two add operations into a single add operation that always has an nsw flag, without checking whether this flag should be present according to the original two add operations. This patch changes InstCombineCompare to emit nsw or nuw only when these flags are allowed according to the original add operations, removing the possibility of wrong optimizations by passes that run on the IR later in the pipeline. To confirm that the current results are buggy and the results after the proposed patch are correct, the following Alive2 examples are attached; the same results can be seen for the nuw flag, and nsw is just used as an example. The following link shows that the IR generated by current LLVM is buggy when none of the original add operations have the nsw flag. https://alive2.llvm.org/ce/z/WGaDrm The following link proves that the IR generated after the patch in the former case is correct. https://alive2.llvm.org/ce/z/wQ7G_e Differential Revision: https://reviews.llvm.org/D100095	2021-04-14 20:53:06 +02:00
Sjoerd Meijer	39d29817f3	[SCCP] Follow up of rGbbab9f986c6d. NFC. This addresses the linter messages, mainly the inconsistent capitalisation of member functions.	2021-04-14 17:14:46 +01:00
Benjamin Kramer	cf4161673c	[Instcombine] Disable memcpy of alloca bypass for instruction sources This transformation is fundamentally broken when it comes to dominance, it just happened to work when the source of the memcpy can be moved into the place of the alloca. The bug shows up a lot more often since `077bff39d4` allows the source to be a switch. It would be possible to check dominance of the source and all its operands, but that seems very heavy for instcombine.	2021-04-14 16:52:09 +02:00
Simon Pilgrim	b49c41afba	[SLP] createOp - fix null dereference warning. NFCI. Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.	2021-04-14 15:24:41 +01:00
Sjoerd Meijer	bbab9f986c	[SCCP] Create SCCP Solver This refactors SCCP and creates a SCCPSolver interface and class so that it can be used by other passes and transformations. We will use this in D93838, which adds a function specialisation pass. This is based on an early version by Vinay Madhusudan. Differential Revision: https://reviews.llvm.org/D93762	2021-04-14 14:58:03 +01:00
Roman Lebedev	2fea5d5d4a	[InstCombine] tmp alloca bypass: ensure that the replacement dominates all alloca uses After `077bff39d4`, isDereferenceableForAllocaSize() can recurse into selects, which is causing a problem for the new test case, reduced from https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20210412/904154.html because the replacement (the select) is defined after the first use of an alloca, so we'd end up with a verifier error. Now, this new check is too restrictive. We likely can handle some cases, by trying to sink all uses of an alloca to after the def.	2021-04-14 13:04:12 +03:00
Sterling Augustine	32e264921b	Revert "[GlobalOpt] Revert valgrind hacks" This reverts commit `dbc16ed199`.	2021-04-13 17:47:07 -07:00
Evgeny Leviant	dbc16ed199	[GlobalOpt] Revert valgrind hacks Differential revision: https://reviews.llvm.org/D69428	2021-04-13 19:11:10 +03:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
Sander de Smalen	92d8421f49	[TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100199	2021-04-13 14:20:58 +01:00
Florian Hahn	467b1f1cd2	[SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false. As a side-effect of the change to default HoistCommonInsts to false early in the pipeline, we fail to convert conditional branch & phis to selects early on, which prevents vectorization for loops that contain conditional branches that effectively are selects (or if the loop gets vectorized, it will get vectorized very inefficiently). This patch updates SimplifyCFG to perform hoisting if the only instruction in both BBs is an equal branch. In this case, the only additional instructions are selects for phis, which should be cheap. Even though we perform hoisting, the benefits of this kind of hoisting should by far outweigh the negatives. For example, the loop in the code below will not get vectorized on AArch64 with the current default, but will with the patch. This is a fundamental pattern we should definitely vectorize. Besides that, I think the select variants should be easier to use for reasoning across other passes as well. https://clang.godbolt.org/z/sbjd8Wshx ``` double clamp(double v) { if (v < 0.0) return 0.0; if (v > 6.0) return 6.0; return v; } void loop(double* X, double *Y) { for (unsigned i = 0; i < 20000; i++) { X[i] = clamp(Y[i]); } } ``` Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D100329	2021-04-13 10:33:35 +01:00
Amy Huang	dad5caa59e	Revert "Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"" This change causes an assert / segmentation fault in LTO builds. This reverts commit `f2e4f3eff3`.	2021-04-12 20:10:17 -07:00
Evgeniy Brevnov	e50aa1af2d	[NARY][NFC] Use hasNUsesOrMore instead of getNumUses since it's more efficient.	2021-04-13 09:29:49 +07:00
Gulfem Savrun Yeniceri	e96df3e531	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-04-13 01:29:41 +00:00
Nick Desaulniers	237d4ee835	[JumpThreading] merge debug info when merging select+br Jump threading can replace select then unconditional branch with conditional branch, but when doing so loses debug info. This destructive transform is eventually leading to a failed Verifier run during full LTO builds of the Linux kernel with CFI and KCOV enabled, as reported in PR39531. ModuleSanitizerCoveragePass will insert calls to __sanitizer_cov_trace_pc, and sometimes split critical edges, using whatever debug info may or may not exist for the branch for the added libcall. Since we can inline calls to __sanitizer_cov_trace_pc due to LTO, this can lead to the error observed in PR39531 when the debug info isn't propagated to the libcall, because of prior destructive transforms that failed to retain debug info. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100137	2021-04-12 17:51:21 -07:00
Arthur Eubanks	a8ab1f98d2	[Evaluator] Look through invariant.group intrinsics Turning on -fstrict-vtable-pointers in Chrome caused an extra global initializer. Turns out that a llvm.strip.invariant.group intrinsic was causing GlobalOpt to fail to step through some simple code. We can treat .invariant.group uses as simply their operand. Value::stripPointerCastsForAliasAnalysis() does exactly this. This should be safe because the Evaluator does not skip memory accesses due to invariants or alias analysis. However, we don't want to leak that we've stripped arbitrary pointer casts to users of Evaluator, so we bail out if we evaluate a function to any constant, since we may have looked through .invariant.group calls and aliasing pointers cannot be arbitrarily substituted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D98843	2021-04-12 16:12:15 -07:00
Nick Desaulniers	4914c98367	[SantizerCoverage] handle missing DBG MD when inserting libcalls Instruction::getDebugLoc can return an invalid DebugLoc. For such cases where metadata was accidentally removed from the libcall insertion point, simply insert a DILocation with line 0 scoped to the caller. When we can inline the libcall, such as during LTO, then we won't fail a Verifier check that all calls to functions with debug metadata themselves must have debug metadata. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D100158	2021-04-12 15:55:58 -07:00
Yuanfang Chen	c5fda0e662	Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"" This reverts commit `a3fabc79ae` (relands `f4d682d6ce` with fix for the compile-time regression issue).	2021-04-12 14:50:54 -07:00
Nikita Popov	a3fabc79ae	Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom" This reverts commit `f4d682d6ce`. This caused a significant compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions Possibly this is due to overeager parsing of target triples.	2021-04-12 22:55:59 +02:00
Sanjay Patel	5354a213a0	[InstCombine] fold shift+trunc signbit check https://alive2.llvm.org/ce/z/6vQvrP This solves: https://llvm.org/PR49866	2021-04-12 16:19:43 -04:00
Sanjay Patel	661cc71a1c	[PassManager][PhaseOrdering] lower expects before running simplifyCFG Retry of `330619a3a6` that includes a clang test update. Original commit message: If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898 <https://reviews.llvm.org/D98898>. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 15:07:53 -04:00
Sanjay Patel	23ac9d1e6e	Revert "[PassManager][PhaseOrdering] lower expects before running simplifyCFG" This reverts commit `330619a3a6`. There are clang tests that also need to be updated.	2021-04-12 13:58:54 -04:00
Yuanfang Chen	f4d682d6ce	[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom D24453 enabled libcall simplification for ARM PCS. This may cause caller/callee calling convention mismatches in some situations such as LTO. This patch makes instcombine aware that compatible calling convention differences are benign (not emitting the undef idiom). Differential Revision: https://reviews.llvm.org/D99773	2021-04-12 09:32:23 -07:00
Sanjay Patel	330619a3a6	[PassManager][PhaseOrdering] lower expects before running simplifyCFG If we run passes before lowering llvm.expect intrinsics to metadata, then those passes have no way to act on the hints provided by llvm.expect. SimplifyCFG is the known offender, and we made it smarter about profile metadata in D98898. In the motivating example from https://llvm.org/PR49336 , this means we were ignoring the recommended method for a programmer to tell the compiler that a compare+branch is expensive. This change appears to solve that case - the metadata survives to the backend, the compare order is as expected in IR, and the backend does not do anything to reverse it. We make the same change to the old pass manager to keep things synchronized. Differential Revision: https://reviews.llvm.org/D100213	2021-04-12 12:23:31 -04:00
Stephen Tozer	f2e4f3eff3	Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" The causes of the previous build errors have been fixed in revisions `aa3e78a59f`, and `140757bfaa` This reverts commit `f40976bd01`.	2021-04-12 16:57:29 +01:00
Evgeniy Brevnov	36b932d6a3	[NARY] Don't optimize min/max if there are side uses Say we have %1=min(%a,%b) %2=min(%b,%c) %3=min(%2,%a). The optimization will try to reassociate the last one so that we can rewrite it to %3=min(%1, %c) and remove %2. But if %2 has other uses outside of %3, then we can't remove %2 and end up with: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1, %c). This isn't harmful by itself, but it is not profitable and changes the IR for no good reason. What is bad is that it triggers another iteration, which finds that the optimization is applicable to %2 and %3 and generates: %1=min(%a,%b) %2=min(%b,%c) %3=min(%1,%c) %4=min(%2,%a) and so on... The solution is to prevent the optimization in the first place if the intermediate result (%2) has side uses and is known not to be removed. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D100170	2021-04-12 12:43:54 +07:00
Roman Lebedev	8fc8c745cf	[NFCI][SimplifyCFG] PerformValueComparisonIntoPredecessorFolding(): improve Dominator Tree updating Same as with previous patches.	2021-04-11 23:56:23 +03:00
Roman Lebedev	13fca9d816	[NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating Same as with previous patches.	2021-04-11 23:56:23 +03:00
Roman Lebedev	0699da1569	[NFCI][Local] MergeBasicBlockIntoOnlyPred(): improve Dominator Tree updating Same as with TryToSimplifyUncondBranchFromEmptyBlock()/MergeBlockIntoPredecessor() patch.	2021-04-11 23:56:23 +03:00
Roman Lebedev	e5692a564a	[NFCI][BasicBlockUtils] MergeBlockIntoPredecessor(): improve Dominator Tree updating Same as with TryToSimplifyUncondBranchFromEmptyBlock() patch.	2021-04-11 23:56:23 +03:00
Roman Lebedev	2def9c3d8e	[NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): improve Dominator Tree updating First, we don't need vector-ness for the predecessor lists. Secondly, like elsewhere, do insertions before deletions. Lastly, the check that we actually need to insert an edge, that it doesn't exist already, is backwards. Instead of looking at successors of every single 'PredOfBB', just always look at predecessors of the 'Succ'. The result is always the same, but we avoid really inefficient code.	2021-04-11 23:56:22 +03:00
Roman Lebedev	91248e2db9	[InstCombine] Improve "get low bit mask upto and including bit X" pattern https://alive2.llvm.org/ce/z/3u-48R	2021-04-11 18:08:08 +03:00
Roman Lebedev	a36bb7fd76	[InstCombine] (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add https://alive2.llvm.org/ce/z/Coc5yf	2021-04-11 18:08:08 +03:00
Roman Lebedev	005881e96e	[LoopIdiom] left-shift-until-bittest: set all allowed no-wrap flags on add/sub I've checked each one of these with alive2, and this is both correct and precise.	2021-04-11 18:08:07 +03:00
Roman Lebedev	9829f5e6b1	[CVP] @llvm.[us]{min,max}() intrinsics handling If we can tell that either one of the arguments is taken, bypass the intrinsic. Notably, we are indeed fine with non-strict predicate: * UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i * UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx * SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W * SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr Much like with all other comparison handling in CVP, while we could sort-of handle two Value's, at least for plain ICmpInst it does not appear to be worthwhile. This only fires 78 times on test-suite + dt + rs, but we don't canonicalize to these yet. (only SCEV produces them)	2021-04-11 00:33:47 +03:00
Roman Lebedev	f041757e9c	[NFC][JumpThreading] Increment 'NumFolds' statistic all places terminator becomes uncond	2021-04-10 21:24:29 +03:00
Roman Lebedev	a407738def	[NFC][CVP] Add statistic for function pointer argument non-null-ness deduction	2021-04-10 21:23:20 +03:00
Roman Lebedev	fe7b3ad8d5	[CVP] LVI: Use in-block values when checking value signedness domain This has a huge positive impact on all the folds that use these helpers, as it can be seen on vanilla test-suite + rawspeed + darktable: correlated-value-propagation.NumSRems +75.68% (+ 28) correlated-value-propagation.NumAShrs +63.87% (+198) correlated-value-propagation.NumSDivs +49.42% (+127) correlated-value-propagation.NumSExt + 8.85% (+593) correlated-value-propagation.NumUDivURemsNarrowed + 8.65% (+34) ... while having pretty minimal compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=e8c7f43e2c2c6f3581ec1c6489ec21ad9f98958a&to=4cd197711e58ee1b2faeee0c35eea54540185569&stat=instructions	2021-04-10 21:10:59 +03:00
Roman Lebedev	257eda0794	[NFC][LVI] getPredicateAt(): drop default value for UseBlockValue The default is likely wrong. Out of all the callers, only a single one needs to pass in false (JumpThread); everything else either already passes true, or should pass true. Until the default is flipped, at least make it harder to unintentionally add new callers with UseBlockValue=false.	2021-04-10 20:46:01 +03:00
Roman Lebedev	e8c7f43e2c	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check without explicitly naming the algorithm; clean them all up.	2021-04-10 19:38:55 +03:00
Roman Lebedev	7b12c8c59d	Revert "[NFC][ConstantRange] Add 'icmp' helper method" This reverts commit `17cf2c9423`.	2021-04-10 19:37:53 +03:00
Roman Lebedev	17cf2c9423	[NFC][ConstantRange] Add 'icmp' helper method "Does the predicate hold between two ranges?" Not very surprisingly, some places were already doing this check without explicitly naming the algorithm; clean them all up.	2021-04-10 19:09:52 +03:00
Roman Lebedev	c329a47d9e	[CVP] @llvm.abs() handling Iff we know the signedness domain of the argument, we can either skip @llvm.abs, or do negation directly. Notably, INT_MIN can belong to either domain: * X u<= INT_MIN --> X is always fine https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS * X s<= 0 --> -X is always fine https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84 If all else fails, try to infer the NSW flag: https://alive2.llvm.org/ce/z/qCJfYm	2021-04-10 16:47:31 +03:00
Adrian Prantl	6ce76ff7eb	Update the linkage name of coro-split functions in the debug info. This patch updates the linkage name in the DISubprogram of coro-split functions, which is particularly important for Swift, where the funclets have a special name mangling. This patch does not affect C++ coroutines, since the DW_AT_specification is expected to hold the (original) linkage name. I believe this is mostly due to limitations in AsmPrinter, so we might be able to relax this restriction in the future. Differential Revision: https://reviews.llvm.org/D99693	2021-04-09 09:50:56 -07:00
Sanjay Patel	84cdccc9dc	[InstCombine] try to eliminate an instruction in min/max -> abs fold As suggested in the review thread for `5094e12` and seen in the motivating example from https://llvm.org/PR49885, it's not clear if we have a way to create the optimal code without this heuristic.	2021-04-09 10:34:03 -04:00
dfukalov	c1a88e007b	[AA][NFC] Convert AliasResult to class containing offset for PartialAlias case. Add the ability to store the `Offset` between partially aliased locations. Use this storage within the returned `AliasResult` instead of caching it in `AAQueryInfo`. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98718	2021-04-09 13:26:09 +03:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. The main reason is to prepare for converting AliasResult into a class that contains an offset for the PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Max Kazantsev	baf17e2cc9	[NFC] Move statistic increment out of helper	2021-04-09 16:32:35 +07:00
Max Kazantsev	275f3a2540	[GVN][NFC] Factor out load elimination logic via PRE for reuse	2021-04-09 16:12:25 +07:00
Arthur Eubanks	4c89bcadf6	[LICM] Hoist loads with invariant.group metadata Previously, the vtable load used to call a virtual method inside a loop was not hoisted out of the loop. This fixes that. canSinkOrHoistInst() itself doesn't check that the load operands are loop invariant; callers check that separately. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99784	2021-04-08 21:57:37 -07:00
Serguei Katkov	d2e15a83a6	[RS4GC] Cleanup meetBDVState. NFC. meetBDVState is difficult to read and follow. This is purely NFC but does several things: 1) Combine meet and meetBDVState 2) Move the function to be a member of BDVState 3) Make BDVState a mutable object 4) Convert the switch to a sequence of ifs 5) Add comments. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99064	2021-04-09 10:20:25 +07:00
Alexey Bataev	ab124bbe2a	[SLP]Fix PR49898: Infinite loop in SLP vectorizer. We should not retry finding a consecutive store chain if that was already attempted. Differential Revision: https://reviews.llvm.org/D100131	2021-04-08 14:18:06 -07:00
Philip Reames	35393c865c	[funcattrs] Infer nosync from instruction walk Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync. A couple points of interest: * I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is more in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere. * The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately. * I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising. Differential Revision: https://reviews.llvm.org/D99769	2021-04-08 14:05:00 -07:00
Arthur Eubanks	c5d1ccbcdf	[GVN] Properly invalidate ICF cache when we simplify a value This fixes a "Cached first special instruction is wrong!" assert. The assert fires because replacing a value with another can cause an instruction to no longer be "special" to ICF. In this case, devirtualization happened, turning an indirect call into a call to a willreturn function, which is no longer special. Reviewed By: nikic, rnk Differential Revision: https://reviews.llvm.org/D99977	2021-04-08 14:01:57 -07:00
Nikita Popov	59a2f67011	[LoopRotate] Don't split loop pass manager After D99249 we use three different loop pass managers for LICM, LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI and LazyBPI are not preserved by LoopRotate (note that D74640 is no longer needed). Avoid this by marking them as preserved. My understanding of D86156 is that it is okay to simply preserve them (which LoopUnswitch already does for the same reason) and rely on callbacks to deal with deleted blocks. Differential Revision: https://reviews.llvm.org/D99843	2021-04-08 22:05:18 +02:00
Congzhe Cao	ce2db9005d	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to the `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of the (new) outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch, which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-08 14:58:13 -04:00
Sanjay Patel	5094e1279e	[InstCombine] fold min/max intrinsic with negated operand to abs The smax case shows up in https://llvm.org/PR49885 . The others seem unlikely, but we might as well try for uniformity (although that could mean an extra instruction to create "nabs"). smax -- https://alive2.llvm.org/ce/z/8yYaGy smin -- https://alive2.llvm.org/ce/z/0_7zc_ umax -- https://alive2.llvm.org/ce/z/EcsZWs umin -- https://alive2.llvm.org/ce/z/Xw6WvB	2021-04-08 14:37:39 -04:00
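A small exhaustive i8 check of the two signed folds above (smax and smin), under wrapping two's-complement semantics. The helper names are hypothetical, the unsigned umax/umin cases are not covered, and this only illustrates the identities, not the InstCombine code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using i8 = std::int8_t;

// Hypothetical helper: two's-complement negation in i8, so -(-128) wraps to -128.
i8 negI8(i8 x) {
  return static_cast<i8>(static_cast<std::uint8_t>(0u - static_cast<std::uint8_t>(x)));
}

// Hypothetical helper: wrapping abs, like @llvm.abs(X, i1 false).
i8 absI8(i8 x) { return x < 0 ? negI8(x) : x; }

bool checkSignedMinMaxOfNegation() {
  for (int v = -128; v <= 127; ++v) {
    i8 x = static_cast<i8>(v);
    // smax(X, -X) --> abs(X)
    if (std::max(x, negI8(x)) != absI8(x))
      return false;
    // smin(X, -X) --> -abs(X), i.e. "nabs"
    if (std::min(x, negI8(x)) != negI8(absI8(x)))
      return false;
  }
  return true;
}
```

The smin form needs the extra negation, which is the "extra instruction to create nabs" mentioned in the message.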
Florian Hahn	e4de3cdf3d	[LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC). Instead of passing the start value and the defined value to widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be used to get both (and more in future patches).	2021-04-08 14:25:10 +01:00
Stephen Tozer	140757bfaa	[DebugInfo] Prevent invalid debug info being produced during LoopStrengthReduce During LoopStrengthReduce, some of the SSA values that are used by debug values may be lost and/or salvaged. After LSR we attempt to recover any undef debug values, including any that were salvaged but then lost their values afterwards, by replacing the lost values with any live equal values (plus a possible constant offset) that have been gathered prior to running LSR. When we do this we restore the debug value's original DIExpression, to undo any salvaging (as we have gone back to using the original debug value). This process can currently produce invalid debug info if the number of operands has changed by salvaging during LSR. Replacing old values during the applyEqualValues step does not change the number of location operands, which means that when we restore the old DIExpression we may have a mismatch between the number of operands used by the debug value and the number of operands referenced by the DIExpression. This patch fixes this by restoring the full original location metadata at the start of the applyEqualValues step, so that there is no mismatch in operand count between the debug value and its DIExpression. Differential Revision: https://reviews.llvm.org/D98644	2021-04-08 13:04:48 +01:00
David Green	8675ef100f	[LV] Logical and/or select costs D99674 stopped the folding of certain select operations into and/or, due to incorrect folding in the presence of poison. D97360 added some costs to attempt to account for the change, but only worked at the getUserCost level, not the getCmpSelInstrCost that the vectorizer will use directly. This adds similar logic into the vectorizer to handle these logical and/or selects, treating them like and/or directly. This fixes 60% performance regressions from code like the attached test case. Differential Revision: https://reviews.llvm.org/D99884	2021-04-08 10:39:47 +01:00
Congzhe Cao	593cb46550	Revert "[LoopInterchange] Fix transformation bugs in loop interchange" This reverts commit 6ec68bd815d00c1eec2a6b9766452554f0e6cb61.	2021-04-07 21:17:30 -04:00
CongzheUalberta	f5645ea65f	[LoopInterchange] Fix transformation bugs in loop interchange After loop interchange, the (old) outer loop header should not jump to `LoopExit`. Note that the old outer loop becomes the new inner loop after interchange. If we branched to `LoopExit` then after interchange we would jump directly from the (new) inner loop header to `LoopExit` without executing the rest of the (new) outer loop. This patch modifies adjustLoopBranches() such that the old outer loop header (which becomes the new inner loop header) jumps to the old inner loop latch, which becomes the new outer loop latch after interchange. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D98475	2021-04-07 20:55:44 -04:00
Sanjay Patel	c0bbd0cc35	[InstCombine] fold not ops around min/max intrinsics This is another step towards parity with the existing cmp+select folds (see D98152).	2021-04-07 17:31:36 -04:00
Craig Topper	5fc0e98d9a	[LoopIdiomRecognize] Minor cleanups to the FFS idiom matching. NFC -Make use of the CreateShl/LShr/AShr methods that take a uint64_t instead of creating a ConstantInt for 1 ourselves. -Use Builder.getInt1 or ConstantInt::getBool instead of a conditional. -Pull out repeated calls to getType.	2021-04-07 10:03:14 -07:00
Roman Lebedev	24f67473dd	[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants All of the code that handles a general constant here (other than the more restrictive APInt-dealing code) expects that it is an immediate, because otherwise we won't actually fold the constants and will increase the instruction count. And it isn't obvious why we'd be okay with increasing the number of constant expressions; those will still have to be computed. But after `2829094a8e` this could also cause endless combine loops. So properly restrict this code to immediates.	2021-04-07 19:50:19 +03:00
Sanjay Patel	1894c6c59e	[InstCombine] avoid infinite loop from partial undef vectors This fixes the examples from D99674 and https://llvm.org/PR49878 The matchers succeed on partial undef/poison vector constants, but the transform creates a full 'not' (-1) constant, so it would undo a demanded vector elements change triggered by the extractelement. Differential Revision: https://reviews.llvm.org/D100044	2021-04-07 12:18:12 -04:00
wlei	6d5132b426	[CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner We saw a regression related to a low probe factor (0.01), which prevented some callsites from being promoted in ICPPass and later caused missed inlining in the CGSCC inliner. The root cause is a redundant (second) multiplication by the probe factor, which this change fixes. `Sum` is already multiplied by the factor right after findCallSamples, but it was multiplied by the factor again when passed as the parameter to setProbeDistributionFactor. This change recovers ~2% performance on the mcf benchmark. In mcf, the corresponding factor used to be 1; a recent feature introduced factors below 1, which then triggered this bug. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D99787	2021-04-07 08:48:59 -07:00
Alexey Bataev	a78e86e6be	[SLP]Avoid multiple attempts to vectorize CmpInsts. There is no need to look through and/or try to vectorize the operands of CmpInst instructions during attempts to find/vectorize min/max reductions. The compiler implements post-analysis of the CmpInsts, so we can skip the extra attempts in tryToVectorizeHorReductionOrInstOperands and save compile time. Differential Revision: https://reviews.llvm.org/D99950	2021-04-07 06:15:42 -07:00
Sanjay Patel	0333ed8e0c	[InstCombine] move abs transform to helper function; NFC The swap of the operands can affect later transforms that are expecting a constant as operand 1. I don't think we can trigger a bug with the current code, but I hit that problem while drafting a new transform for min/max intrinsics.	2021-04-07 08:35:07 -04:00
Roman Lebedev	2829094a8e	Reland [InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) This reverts commit `a547b4e26b`, relanding commit `31d219d299`, which was reverted because there was a conflicting inverse transform, which was causing an endless combine loop, which has now been adjusted. Original commit message: https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-07 12:06:25 +03:00
Roman Lebedev	93d1d94b74	[InstCombine] Restrict "C-(X+C2) --> (C-C2)-X" fold to immediate constants I.e., if any/all of the constants is an expression, don't do it. Since those constants won't reduce into an immediate, but would be left as a constant expression, they could cause endless combine loops after `31d219d299` added an inverse transformation.	2021-04-07 12:06:24 +03:00
Petr Hosek	a547b4e26b	Revert "[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858)" This reverts commit `31d219d299` which causes an infinite loop when compiling the XRay runtime.	2021-04-06 22:30:28 -07:00
Sidharth Baveja	d81d9e8b86	[SplitEdge] Update SplitCriticalEdge to return a nullptr only when the edge is not critical Summary: The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in cases where the edge is critical. SplitEdge uses SplitCriticalEdge assuming it can always split all critical edges, which is an incorrect assumption. The three cases where SplitCriticalEdge will return a nullptr are: 1. DestBB is an exception block 2. Options.IgnoreUnreachableDests is set to true and isa<UnreachableInst>(DestBB->getFirstNonPHIOrDbgOrLifetime()) is true 3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true) and it cannot be maintained for a loop due to indirect branches. Each of these situations is handled as follows: 1. The function ehAwareSplitEdge, originally from llvm/lib/Transforms/Coroutines/CoroFrame.cpp, is modified to handle the case when DestBB is an exception block. This function is called directly in SplitEdge; SplitEdge does not call SplitCriticalEdge in this case. 2. Options.IgnoreUnreachableDests is set to false by default, so this situation does not apply. 3. Return a nullptr in this situation, since SplitCriticalEdge also returned nullptr. Nothing can be done in this case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D94619	2021-04-06 21:24:40 +00:00
Philip Reames	4bf8985f4f	Replace calls to IntrinsicInst::Create with CallInst::Create [nfc] There is no IntrinsicInst::Create. These are binding to the method in the super type. Be explicit about which method is being called.	2021-04-06 13:23:58 -07:00
Philip Reames	908215b346	Use AssumeInst in a few more places [nfc] Follow up to `a6d2a8d6f5`. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.	2021-04-06 13:18:53 -07:00
Philip Reames	9ef6aa020b	Plumb AssumeInst through operand bundle apis [nfc] Follow up to `a6d2a8d6f5`. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.	2021-04-06 12:53:53 -07:00
Luís Marques	0c3bc1f3a4	[ASan][RISCV] Fix RISC-V memory mapping Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and D87581). This should be an improvement both in terms of first principles soundness and observed test failures --- test failures would occur non-deterministically depending on the ASLR random offset. On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as `PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor makes the result be the not very round number 0x1555556000. That address had to be further rounded to ensure page alignment after the shadow scale shifting is applied. Still, that value explains why the mapping table may look less regular than expected. Further cleanups: - Moved the mapping table comment, to ensure that the two Linux/AArch64 tables stayed together; - Removed mention of Sv48. Neither the original mapping nor this one are compatible with an actual Linux Sv48 address space (mainline Linux still operates Sv48 in Sv39 mode). A future patch can improve this; - Removed the additional comments, for consistency. Differential Revision: https://reviews.llvm.org/D97646	2021-04-06 20:46:17 +01:00
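The quoted `TASK_UNMAPPED_BASE` value can be reproduced with a little arithmetic. This sketch assumes Sv39 with `TASK_SIZE` = 1 << 38 (256 GiB) and 4 KiB pages; neither constant is stated in the commit message, and the constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumptions (not from the commit): Sv39 Linux/riscv64 user address space
// of 1 << 38 bytes and a 4 KiB page size.
constexpr std::uint64_t kTaskSize = std::uint64_t{1} << 38;
constexpr std::uint64_t kPageSize = 4096;

// Round up to the next page boundary, like the kernel's PAGE_ALIGN().
constexpr std::uint64_t pageAlign(std::uint64_t addr) {
  return (addr + kPageSize - 1) & ~(kPageSize - 1);
}

// PAGE_ALIGN(TASK_SIZE / 3): the divisor 3 is not a power of two, so the
// quotient 0x1555555555 has to be rounded up to a page boundary.
constexpr std::uint64_t kTaskUnmappedBase = pageAlign(kTaskSize / 3);

static_assert(kTaskUnmappedBase == 0x1555556000ULL,
              "matches the value quoted in the commit message");
```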
Philip Reames	a6d2a8d6f5	Add a subclass of IntrinsicInst for llvm.assume [nfc] Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.	2021-04-06 11:16:22 -07:00
Arthur Eubanks	4e83e59eb8	[GVN] Add missing ICF update performScalarPREInsertion() inserts instructions into blocks that we need to tell ImplicitControlFlowTracking about, otherwise the ICF cache may be invalid. Fixes PR49193. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99909	2021-04-06 10:13:42 -07:00
Philip Reames	21d4839948	Move GCRelocateInst and GCResultInst to IntrinsicInst.h [nfc] These two are part of the IntrinsicInst class hierarchy and it helps to cut down on some redundant includes.	2021-04-06 08:33:15 -07:00
Philip Reames	52ecd94cfb	Remove last remnants of PR49607 migration [NFC] The key change (`4f5e92c`) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.	2021-04-06 07:56:55 -07:00
Jan Svoboda	fb6a5237aa	Revert "[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction" This reverts commit `167ea67d` This causes a bunch of build failures: * http://lab.llvm.org:8011/#/builders/121/builds/6287 * http://green.lab.llvm.org/green/job/clang-stage1-RA/19915	2021-04-06 16:33:28 +02:00
Benjamin Kramer	ce4acb01b3	Avoid unused variable warning in Release builds	2021-04-06 16:25:19 +02:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value is an input to the horizontal reduction, e.g.: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Roman Lebedev	31d219d299	[InstCombine] Fold `((X - Y) - Z)` to `X - (Y + Z)` (PR49858) https://alive2.llvm.org/ce/z/67w-wQ We prefer `add`s over `sub`, and this particular xform allows further folds to happen: Fixes https://bugs.llvm.org/show_bug.cgi?id=49858	2021-04-06 15:58:14 +03:00
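Because plain `sub`/`add` without `nsw`/`nuw` wrap modulo 2^N, the reassociation is an identity of modular arithmetic. A brute-force i8 sketch of that fact follows (the helper name is hypothetical, and it says nothing about how the actual fold treats poison-generating flags):

```cpp
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;

// Checks (X - Y) - Z == X - (Y + Z) for every i8 triple, using wrapping
// (modulo 2^8) arithmetic as LLVM IR add/sub without nsw/nuw do.
bool checkSubSubToSubAdd() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned z = 0; z < 256; ++z) {
        u8 before = static_cast<u8>(x - y - z);
        u8 after = static_cast<u8>(x - static_cast<u8>(y + z));
        if (before != after)
          return false;
      }
  return true;
}
```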
Simon Pilgrim	b8aba76a4e	LoopFlatten - CanWidenIV - Fix uninitialized variable warnings and use for-range loop. NFCI. Fix static analysis uninitialized variable warnings, and use for-range loop iteration across WideIVs array.	2021-04-06 12:24:20 +01:00
Abhina Sreeskantharajan	82b3e28e83	[SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text Problem: On SystemZ we need to open text files in text mode. On Windows, files opened in text mode add a CRLF '\r\n', which may not be desirable. Solution: This patch adds two new flags - OF_CRLF, which indicates that CRLF translation is used. - OF_TextWithCRLF = OF_Text | OF_CRLF, which indicates that the file is text and uses CRLF translation. Developers should now use either OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text; if they do want carriage returns on Windows, they should use OF_TextWithCRLF. So this is the behaviour per platform with my patch: z/OS: OF_None: open in binary mode OF_Text: open in text mode OF_TextWithCRLF: open in text mode Windows: OF_None: open file with no carriage return OF_Text: open file with no carriage return OF_TextWithCRLF: open file with carriage return The major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if OF_CRLF is set. ``` if (Flags & OF_CRLF) CrtOpenFlags |= _O_TEXT; ``` The following files still use OF_Text and were left unchanged. I modified all of these except raw_ostream.cpp in recent patches, so I know these were previously in binary mode on Windows. ./llvm/lib/Support/raw_ostream.cpp ./llvm/lib/TableGen/Main.cpp ./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp ./llvm/unittests/Support/Path.cpp ./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp ./clang/lib/Frontend/CompilerInstance.cpp ./clang/lib/Driver/Driver.cpp ./clang/lib/Driver/ToolChains/Clang.cpp Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99426	2021-04-06 07:23:31 -04:00
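A sketch of the per-platform decision described above. The enum is a hypothetical mirror whose bit values are illustrative only (they are not copied from `llvm/Support/FileSystem.h`); the Windows rule follows the `if (Flags & OF_CRLF)` snippet in the message, and the two predicate names are made up for this example.

```cpp
#include <cassert>

// Hypothetical mirror of the flags; bit values are illustrative.
enum OpenFlags : unsigned {
  OF_None = 0,
  OF_Text = 1u << 0,
  OF_CRLF = 1u << 1,
  OF_TextWithCRLF = OF_Text | OF_CRLF,
};

// Windows: request CRT text mode (which writes "\r\n") only when OF_CRLF is set.
bool windowsOpensInTextMode(unsigned flags) { return (flags & OF_CRLF) != 0; }

// z/OS: both OF_Text and OF_TextWithCRLF carry the OF_Text bit, so both open
// in text mode; OF_None opens in binary mode.
bool zosOpensInTextMode(unsigned flags) { return (flags & OF_Text) != 0; }
```

Making OF_TextWithCRLF a superset of OF_Text is what lets z/OS treat both the same while Windows distinguishes them.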
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Florian Hahn	a6b06b785c	[VPlan] Print VPValue operands for VPWidenPHI if possible. For VPWidenPHIRecipes that model all incoming values as VPValue operands, print those operands instead of printing the original PHI. D99294 updates recipes of reduction PHIs to use the VPValue for the incoming value from the loop backedge, making use of this new printing.	2021-04-06 12:11:21 +01:00
madhur13490	167ea67d76	[IR] Ignore bitcasts of function pointers which are only used as callees in callbase instruction This patch enhances hasAddressTaken() to ignore bitcasts as a callee in callbase instruction. Such bitcast usage doesn't really take the address in a meaningful way. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D98884	2021-04-06 09:23:46 +00:00
Arthur Eubanks	ea0e2ca1ac	[SROA] Allow SROA on pointers with invariant group intrinsic uses When we are able to SROA an alloca, we know all uses of it, meaning we don't have to preserve the invariant group intrinsics and metadata. It's possible that we could lose information regarding redundant loads/stores, but that's unlikely to have any real impact since right now the only user is Clang and vtables. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D99760	2021-04-05 19:53:40 -07:00
Ta-Wei Tu	6a82ace5f2	[LoopFusion] Bails out if only the second candidate is guarded (PR48060) If only the second candidate loop is guarded while the first one is not, fusing the two loops might not be valid, but this check was missing. Fixes https://bugs.llvm.org/show_bug.cgi?id=48060 Reviewed By: sidbav Differential Revision: https://reviews.llvm.org/D99716	2021-04-06 01:08:56 +08:00
Sanjay Patel	c590a9880d	[InstCombine] fix potential miscompile in select value equivalence As shown in the example based on: https://llvm.org/PR49832 ...and the existing test, we can't substitute a vector value because the equality compare replacement that we are attempting requires that the comparison is true for the entire value. Vector select can be partly true/false.	2021-04-05 12:25:40 -04:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization, it is better to postpone vectorizing CmpInst instructions until the end of the basic block. Otherwise we may vectorize them too early and miss some vectorization patterns, such as reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Roman Lebedev	2760a808b9	[InstCombine] dropRedundantMaskingOfLeftShiftInput(): check that adding shift amounts doesn't overflow (PR49778) This is identical to `781d077afb`, but for the other function. For certain shift amount bit widths, we must first ensure that adding shift amounts is safe, that the sum won't have an unsigned overflow. Fixes https://bugs.llvm.org/show_bug.cgi?id=49778	2021-04-04 23:26:41 +03:00
Roman Lebedev	dceb3e5996	[NFC][InstCombine] Extract canTryToConstantAddTwoShiftAmounts() as helper	2021-04-04 23:26:41 +03:00
Sanjay Patel	c0645f1324	[InstCombine] fold popcount of exactly one bit to shift This is discussed in https://llvm.org/PR48999 , but it does not solve that request. The difference in the vector test shows that some other logic transform is limited to scalar types.	2021-04-04 11:43:49 -04:00
Nikita Popov	9bad7de9a3	[SimplifyCFG] Handle two equal cases in switch to select When converting a switch with two cases and a default into a select, also handle the degenerate case where two cases have the same value. Generate this case directly as %or = or i1 %cmp1, %cmp2 %res = select i1 %or, i32 %val, i32 %default rather than %sel1 = select i1 %cmp1, i32 %val, i32 %default %res = select i1 %cmp2, i32 %val, i32 %sel1 as InstCombine is going to canonicalize to the former anyway.	2021-04-04 17:27:28 +02:00
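To illustrate that the two forms agree, here is a C++ sketch of a switch whose two cases return the same value, next to the `or` + `select` form. The function names, the case values 3 and 7, and the results 10 and -1 are arbitrary examples, not taken from the patch's tests.

```cpp
#include <cassert>

// Hypothetical source: two cases plus a default, both cases yielding 10.
int viaSwitch(int x) {
  switch (x) {
  case 3:
    return 10;
  case 7:
    return 10;
  default:
    return -1;
  }
}

// Shape emitted directly by SimplifyCFG after this change:
//   %or  = or i1 %cmp1, %cmp2
//   %res = select i1 %or, i32 %val, i32 %default
int viaOrSelect(int x) {
  bool cmp1 = x == 3;
  bool cmp2 = x == 7;
  return (cmp1 | cmp2) ? 10 : -1;
}
```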
Juneyoung Lee	5207cde5cb	[InstCombine] Conditionally fold select i1 into and/or This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or: ``` select cond, cond2, false -> and cond, cond2 ``` This is not safe if cond2 is poison whereas cond isn’t. Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s. To minimize its impact, this patch conservatively checks whether cond2 is an instruction that creates a poison or its operand creates a poison. This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99674	2021-04-04 14:11:28 +09:00
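Why `and` is not a drop-in replacement: `select` only propagates poison from the operand it picks, while `and` propagates it from either operand. The three-valued `I1` type below is a deliberately simplified, hypothetical model of that rule, not LLVM's representation of poison.

```cpp
#include <cassert>

// Simplified model of an i1 that may be poison (hypothetical, illustration only).
enum class I1 { False, True, Poison };

// select i1 %cond, i1 %a, i1 %b: poison comes only from a poison condition
// or from the arm that is actually chosen.
I1 selectI1(I1 cond, I1 a, I1 b) {
  if (cond == I1::Poison)
    return I1::Poison;
  return cond == I1::True ? a : b;
}

// and i1 %a, %b: poison in either operand makes the result poison,
// even when the other operand is false.
I1 andI1(I1 a, I1 b) {
  if (a == I1::Poison || b == I1::Poison)
    return I1::Poison;
  return (a == I1::True && b == I1::True) ? I1::True : I1::False;
}
```

With cond = false and cond2 = poison, `select cond, cond2, false` is false but `and cond, cond2` is poison, which is exactly the case the patch guards against.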
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Sanjay Patel	412fc74140	[InstCombine] fold not+or+neg ~((-X) | Y) --> (X - 1) & (~Y) We generally prefer 'add' over 'sub'; this also reduces the dependency chain and looks better for codegen on x86, ARM, and AArch64 targets. https://llvm.org/PR45755 https://alive2.llvm.org/ce/z/cxZDSp	2021-04-02 13:16:36 -04:00
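The fold follows from two's complement (`-X == ~X + 1`, so `~(-X) == X - 1`) plus De Morgan (`~(A | B) == ~A & ~B`). A brute-force i8 check of the identity, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

using u8 = std::uint8_t;

// Original form: ~((-X) | Y), truncated to i8.
u8 notOrNeg(u8 x, u8 y) { return static_cast<u8>(~((0u - x) | y)); }

// Folded form: (X - 1) & (~Y), truncated to i8.
u8 decAndNot(u8 x, u8 y) {
  return static_cast<u8>((x - 1u) & ~static_cast<unsigned>(y));
}

bool checkNotOrNegFold() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      if (notOrNeg(static_cast<u8>(x), static_cast<u8>(y)) !=
          decAndNot(static_cast<u8>(x), static_cast<u8>(y)))
        return false;
  return true;
}
```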
Dimitry Andric	6abb92f210	[SCCP] Avoid modifying AdditionalUsers while iterating over it When run under valgrind, or with a malloc that poisons freed memory, this can lead to segfaults or other problems. To avoid modifying the AdditionalUsers DenseMap while still iterating, save the instructions to be notified in a separate SmallPtrSet, and use this to later call OperandChangedState on each instruction. Fixes PR49582. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98602	2021-04-02 19:05:59 +02:00
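A sketch of the collect-then-notify pattern described above, using standard containers instead of LLVM's `DenseMap`/`SmallPtrSet`. `notifyUser` and `notifyAdditionalUsers` are hypothetical stand-ins (the former plays the role of `OperandChangedState`); `notifyUser` deliberately inserts into the map to mimic a callback that grows the container being walked.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Value -> extra users to revisit when the value's state changes
// (a stand-in for SCCP's AdditionalUsers).
using UserMap = std::unordered_map<std::string, std::set<std::string>>;

// Hypothetical callback: records the user and inserts a new map entry.
// With llvm::DenseMap, such an insertion may rehash and move the stored
// sets, so it must not happen while one of them is being iterated.
void notifyUser(UserMap &users, const std::string &user,
                std::vector<std::string> &notified) {
  notified.push_back(user);
  users[user + ".derived"].insert(user);
}

void notifyAdditionalUsers(UserMap &users, const std::string &value,
                           std::vector<std::string> &notified) {
  auto it = users.find(value);
  if (it == users.end())
    return;
  // Snapshot the users first; `it` is not used after notification starts.
  std::set<std::string> toNotify = it->second;
  for (const std::string &user : toNotify)
    notifyUser(users, user, notified);
}
```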
Florian Hahn	8867fc69f0	[LV] Hoist mapping of IR operands to VPValues (NFC). This patch moves mapping of IR operands to VPValues out of tryToCreateWidenRecipe. This allows using existing VPValue operands when widening recipes directly, which will be introduced in future patches.	2021-04-02 17:57:20 +01:00
Philip Reames	2c4548e18e	[rs4gc] Use loops instead of straightline code for attribute stripping [nfc] Mostly because I'm about to add more attributes and the straightline copies get much uglier. What's currently there isn't too bad.	2021-04-02 09:25:15 -07:00
Philip Reames	a505801e2b	[rs4gc] Strip nofree and nosync attributes when lowering from abstract model The safepoints being inserted exist to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering. I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.	2021-04-02 09:12:24 -07:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is a min/max reduction based on a SelectInst, the condition of that select instruction must have only a single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Jeroen Dobbelaere	b82b305cf9	[InstCombine] Fix out-of-bounds ashr(shl) optimization This fixes a crash found by the oss fuzzer and reported by @fhahn. The suggestion of @RKSimon seems to be the correct fix here. (See D91343). The oss fuzz report can be found here: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32759 Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D99792	2021-04-02 13:45:11 +02:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extractelement instructions in cases where they are going to be free, because the source vector can be used directly. A simple example is %v1.lane.0 = extractelement <2 x double> %v.1, i32 0 %v1.lane.1 = extractelement <2 x double> %v.1, i32 1 %a.lane.0 = fmul double %v1.lane.0, %x %a.lane.1 = fmul double %v1.lane.1, %y Currently we only consider the extracts free if there are no other users. In this particular case, on AArch64, which can fit <2 x double> in a vector register, the extracts should be free independently of other users, because the source vector of the extracts will be in a vector register, so it should be free to use the vector directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Evgeniy Brevnov	2388aae401	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev, lebedev.ri Differential Revision: https://reviews.llvm.org/D88287	2021-04-02 15:30:13 +07:00
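The regrouping relies on min/max being associative and commutative. A small brute-force C++ check over an arbitrary, hypothetical range of ints, for both min and max:

```cpp
#include <algorithm>
#include <cassert>

// Checks min(min(a, b), c) == min(min(a, c), b), and the same for max,
// for all a, b, c in [lo, hi].
bool checkMinMaxRegroup(int lo, int hi) {
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      for (int c = lo; c <= hi; ++c) {
        if (std::min(std::min(a, b), c) != std::min(std::min(a, c), b))
          return false;
        if (std::max(std::max(a, b), c) != std::max(std::max(a, c), b))
          return false;
      }
  return true;
}
```

If `min(a, c)` already exists, the regrouped form reuses it and needs only one new min.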
Roman Lebedev	a26f1bf67e	[PassManager] Run additional LICM before LoopRotate Loop rotation often has to perform code duplication from header into preheader, which introduces PHI nodes. >>! In D99204, @thopre wrote: > > With loop peeling, it is important that unnecessary PHIs be avoided or > it will leads to spurious peeling. One source of such PHIs is loop > rotation which creates PHIs for invariant loads. Those PHIs are > particularly problematic since loop peeling is now run as part of simple > loop unrolling before GVN is run, and are thus a source of spurious > peeling. > > Note that while some of the load can be hoisted and eventually > eliminated by instruction combine, this is not always possible due to > alignment issue. In particular, the motivating example [1] was a load > inside a class instance which cannot be hoisted because the `this' > pointer has an alignment of 1. > > [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp Now, we could enhance LoopRotate to avoid duplicating code when not needed, but instead hoist loop-invariant code, but isn't that a code duplication? (sic) We have LICM, and in fact we already run it right after LoopRotation. 
We could try to move it to before LoopRotation; that is basically free from a compile-time perspective: https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions But, looking at the stats, I think it isn't great that we would no longer do LICM after LoopRotation; in particular:

| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| --- | --- | --- | --- | --- | --- |
| asm-printer.EmittedInsts | 9015930 | 9015799 | -131 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36725 | 36580 | -145 | -0.39% | 0.39% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29890 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2227 | -66 | -2.88% | 2.88% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26329 | -109 | -0.41% | 0.41% |
| instcount.TotalBlocks | 1178338 | 1173840 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9896139 | -9303 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425871 | 423961 | -1910 | -0.45% | 0.45% |
| licm.NumHoisted | 378357 | 378753 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2193 | 2208 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35899 | 31821 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 13359 | 13587 | 228 | 1.71% | 1.71% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12876 | 11890 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42003 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282934 | -174 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106718 | 6 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |

... but that actually regresses LICM (-12% `licm.NumMovedLoads`), loop-simplifycfg (`NumLoopExitsDeleted`, `NumTerminatorsFolded`), and simple-loop-unswitch (`NumTrivial`). What if we instead have LICM both before and after LoopRotate?

| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| --- | --- | --- | --- | --- | --- |
| asm-printer.EmittedInsts | 9015930 | 9014474 | -1456 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36725 | 36681 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29842 | 29899 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26438 | 26404 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178338 | 1173652 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111825 | 111829 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905442 | 9895452 | -9990 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425871 | 425373 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378357 | 383352 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2193 | 2204 | 11 | 0.50% | 0.50% |
| licm.NumMovedLoads | 35899 | 35755 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 13359 | 14321 | 962 | 7.20% | 7.20% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12876 | 12041 | -835 | -6.48% | 6.48% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42015 | 42005 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283108 | 282998 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106712 | 106691 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |

I.e. we end up with fewer instructions, less peeling, and more LICM activity; also note that none of those 4 regressions appear here.
Namely:

| statistic name | LICM-LoopRotate | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| --- | --- | --- | --- | --- | --- |
| asm-printer.EmittedInsts | 9015799 | 9014474 | -1325 | -0.01% | 0.01% |
| indvars.NumElimCmp | 3544 | 3546 | 2 | 0.06% | 0.06% |
| indvars.NumElimExt | 36580 | 36681 | 101 | 0.28% | 0.28% |
| indvars.NumElimIV | 1187 | 1185 | -2 | -0.17% | 0.17% |
| indvars.NumElimIdentity | 136 | 146 | 10 | 7.35% | 7.35% |
| indvars.NumLFTR | 29890 | 29899 | 9 | 0.03% | 0.03% |
| indvars.NumReplaced | 2227 | 2299 | 72 | 3.23% | 3.23% |
| indvars.NumWidened | 26329 | 26404 | 75 | 0.28% | 0.28% |
| instcount.TotalBlocks | 1173840 | 1173652 | -188 | -0.02% | 0.02% |
| instcount.TotalInsts | 9896139 | 9895452 | -687 | -0.01% | 0.01% |
| lcssa.NumLCSSA | 423961 | 425373 | 1412 | 0.33% | 0.33% |
| licm.NumHoisted | 378753 | 383352 | 4599 | 1.21% | 1.21% |
| licm.NumMovedCalls | 2208 | 2204 | -4 | -0.18% | 0.18% |
| licm.NumMovedLoads | 31821 | 35755 | 3934 | 12.36% | 12.36% |
| licm.NumPromoted | 11154 | 11163 | 9 | 0.08% | 0.08% |
| licm.NumSunk | 13587 | 14321 | 734 | 5.40% | 5.40% |
| loop-delete.NumDeleted | 8402 | 8538 | 136 | 1.62% | 1.62% |
| loop-instsimplify.NumSimplified | 11890 | 12041 | 151 | 1.27% | 1.27% |
| loop-peel.NumPeeled | 925 | 924 | -1 | -0.11% | 0.11% |
| loop-rotate.NumRotated | 42003 | 42005 | 2 | 0.00% | 0.00% |
| loop-simplifycfg.NumLoopBlocksDeleted | 242 | 241 | -1 | -0.41% | 0.41% |
| loop-simplifycfg.NumLoopExitsDeleted | 20 | 497 | 477 | 2385.00% | 2385.00% |
| loop-simplifycfg.NumTerminatorsFolded | 336 | 619 | 283 | 84.23% | 84.23% |
| loop-unroll.NumCompletelyUnrolled | 11032 | 11029 | -3 | -0.03% | 0.03% |
| loop-unroll.NumUnrolled | 12529 | 12525 | -4 | -0.03% | 0.03% |
| mem2reg.NumDeadAlloca | 10221 | 10222 | 1 | 0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192106 | 192073 | -33 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637643 | 637652 | 9 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 812 | 814 | 2 | 0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 282934 | 282998 | 64 | 0.02% | 0.02% |
| scalar-evolution.NumTripCountsNotComputed | 106718 | 106691 | -27 | -0.03% | 0.03% |
| simple-loop-unswitch.NumBranches | 4752 | 5185 | 433 | 9.11% | 9.11% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 503 | 925 | 422 | 83.90% | 83.90% |
| simple-loop-unswitch.NumSwitches | 18 | 20 | 2 | 11.11% | 11.11% |
| simple-loop-unswitch.NumTrivial | 95 | 179 | 84 | 88.42% | 88.42% |

{F15983613} {F15983615} {F15983616} (this is vanilla llvm testsuite + rawspeed + darktable) As an example of code where running LICM only early is bad, see: https://godbolt.org/z/GzEbacs4K This does have an observable compile-time regression of +~0.5% geomean https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions but I think that's basically nothing, and there's potential that it might be avoidable in the future by fixing clang to produce alignment information on function arguments, thus making the second run unneeded. Differential Revision: https://reviews.llvm.org/D99249	2021-04-02 11:11:42 +03:00
Juneyoung Lee	c664769330	[AssumeBundles] offset should be added to correctly calculate align This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492). Consider this code: ``` call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)] %arrayidx = getelementptr inbounds i32, i32* %a, i64 -1 ; alignment of %arrayidx? ``` The llvm.assume guarantees that `%a - 28` is 32-byte aligned, meaning that `%a` is 32k + 28 for some k. Therefore `%a - 4` (that is, `%arrayidx`) is 32k + 24 and cannot be 32-byte aligned, but the existing code was computing the pointer as 32-byte aligned. The reason why this happened is as follows. `DiffSCEV` stores `%arrayidx - %a`, which is -4. `OffSCEV` stores the offset value of "align", which is 28. `DiffSCEV` + `OffSCEV` = 24 should be used as the offset of `%a - 4` from 32k, but `DiffSCEV` - `OffSCEV` = -32, a multiple of 32, was being used instead. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D98759	2021-04-02 12:32:05 +09:00
Philip Reames	91790c6785	[indvars] Fix pr49802 by checking for SCEVCouldNotCompute The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits that are not trivially dead have exit counts. We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler. (Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...) p.s. Sorry for the lack of a test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly imprecise state, which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.	2021-04-01 17:53:44 -07:00
Philip Reames	b23a314146	[funcattrs] Respect nofree attribute on callsites (not just callee)	2021-04-01 14:45:49 -07:00
Philip Reames	1e69a5af92	[Attributor] Cleanup detection of non-relaxed atomics in nosync inference The code was checking for cases which are disallowed by the verifier. Delete dead code and adjust style.	2021-04-01 12:01:29 -07:00
Philip Reames	8e596f7e27	[Attributor] Cleanup intrinsic handling in nosync inference [mostly NFC] Mostly stylistic adjustment, but the old code didn't handle the memcpy.inline intrinsic. By using the matcher class, we now do.	2021-04-01 11:49:59 -07:00
Philip Reames	6ef4505298	[funcattrs] Infer nosync from readnone and non-convergent This implements the most basic possible nosync inference. The choice of inference rule is taken from the comments in attributor and the discussion on the review of the change which introduced the nosync attribute (`0626367202`). This is deliberately minimal. As noted in code comments, I do plan to add a more robust inference which actually scans the function IR directly, but a) I need to do some refactoring of the attributor code to use common interfaces, and b) I wanted to get something in. I also wanted to minimize the "interesting" analysis discussion since that's time intensive. Context: This combines with existing nofree attribute inference to help prove dereferenceability in the ongoing deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D99749	2021-04-01 11:37:34 -07:00
Philip Reames	ffa15e9463	Extract isVolatile helper on Instruction [NFCI] We have this logic duplicated in several cases, none of which were exhaustive. Consolidate it in one place. I don't believe this actually impacts behavior of the callers. I think they all filter their inputs such that their partial implementations were correct. If not, this might be fixing a corner-case bug.	2021-04-01 11:24:02 -07:00
Philip Reames	6b05d753e0	Mark unordered memset/memmove/memcpy as nosync Mostly a means to remove a bit of code from attributor in advance of implementing a FuncAttr inference for nosync.	2021-04-01 10:38:54 -07:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to clean up the InstrElementSize map for each new tree; otherwise we might use sizes from the previous vectorization attempt. 2. No need to include instructions from different basic blocks in the analysis; skipping them saves compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Yevgeny Rouban	1ed53d44d8	[LoopFlatten] Do not report CFG analyses as up-to-date Removes CFGAnalyses from the preserved analyses set returned by LoopFlattenPass::run(). Reviewed By: Dave Green, Ta-Wei Tu Differential Revision: https://reviews.llvm.org/D99700	2021-04-01 15:52:36 +07:00
Max Kazantsev	a1d83776bf	[NFC] Undo some erroneous renamings Some vars renamed by mistake during auto-replacements. Undoing them.	2021-04-01 13:10:10 +07:00
Max Kazantsev	630818a850	[NFC] Disambiguate LI in GVN GVN uses the name 'LI' for two different, unrelated things: LoadInst and LoopInfo. This patch renames the variables with the former meaning to 'Load' to disambiguate the code.	2021-04-01 12:40:35 +07:00
KAWASHIMA Takahiro	5fac7c6046	[GVN] Propagate llvm.access.group metadata of loads Before this change, the `llvm.access.group` metadata was dropped when moving a load instruction in GVN. This prevents vectorizing a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`. This change propagates the metadata as well as other metadata if it is safe (the move-destination basic block and source basic block belong to the same loop). Differential Revision: https://reviews.llvm.org/D93503	2021-04-01 10:00:48 +09:00
qixingxue	62b74f7564	[GVN][NFC] Refactor analyzeLoadFromClobberingWrite This commit adjusts the order of two swappable if statements to make code cleaner. Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D99648	2021-04-01 08:35:35 +08:00
Roman Lebedev	43ded90094	[NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader	2021-03-31 23:27:36 +03:00
Huihui Zhang	fe5c4a06a4	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test: consecutive-ptr-uniforms.ll. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
Sanjay Patel	1462bdf1b9	[InstCombine] fold abs(srem X, 2) This is a missing optimization based on an example in: https://llvm.org/PR49763 As noted there and the test here, we could add a more general fold if that is shown useful. https://alive2.llvm.org/ce/z/xEHdTv https://alive2.llvm.org/ce/z/97dcY5	2021-03-31 11:29:20 -04:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations as EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Chuanqi Xu	eb51dd719f	[Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2 Summary: Try to insert dbg.declare into the entry.resume basic block of the resume function. This way we can print allocas such as __promise in gdb/lldb under O2, which is helpful when debugging coroutine programs. Test Plan: check-llvm Reviewed by: aprantl Differential Revision: https://reviews.llvm.org/D96938	2021-03-31 10:37:06 +08:00
Fangrui Song	3e5ee194c0	[SimpleLoopUnswitch] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `431a40e1e2`	2021-03-30 19:27:10 -07:00
Juneyoung Lee	431a40e1e2	[LoopUnswitch] Assert that branch condition is either and/or but not both as suggested at https://reviews.llvm.org/rG5bb38e84d3d0#986321	2021-03-31 10:35:22 +09:00
Sanjay Patel	c2ebad8d55	[InstCombine] add fold for demand of low bit of abs() This is one problem shown in https://llvm.org/PR49763 https://alive2.llvm.org/ce/z/cV6-4K https://alive2.llvm.org/ce/z/9_3g-L	2021-03-30 15:14:37 -04:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test: VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
spupyrev	22998738e8	[SamplePGO] Keeping prof metadata for IndirectBrInst Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot. This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99550	2021-03-30 10:44:48 -07:00
Hongtao Yu	3e3fc431df	[CSSPGO] Top-down processing order based on full profile. Use profiled call edges to augment the top-down order. There are cases where the top-down order computed from the static call graph doesn't reflect the real execution order. For example: 1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller to be processed before them. 2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, and thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining. 3. Transitive indirect call edges due to inlining. When a callee function is inlined into a caller function in LTO prelink, every call edge originating from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph. 4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function, which may be called from external modules. 
While this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the surviving copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions. Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case #4. I'm seeing an average 0.4% perf win on SPEC2017. For certain benchmarks such as Xalanbmk and GCC, the win is bigger, above 2%. The change is an enhancement to https://reviews.llvm.org/D95988. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D99351	2021-03-30 10:42:22 -07:00
Krasimir Georgiev	c51e91e046	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5178ffc7cf`. Compiling `llvm-profdata` with a compiler built from this produces a crashing binary.	2021-03-30 14:13:37 +02:00
Juneyoung Lee	6b4b1dc6ec	[LoopUnswitch] Simplify branch condition if it is select with constant operands This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 . `select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later transformations confused. Simplify the branch condition to not have the form.	2021-03-30 20:09:42 +09:00
Sander de Smalen	f71ed5dfe2	NFC: Migrate PartialInlining to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D97382	2021-03-30 11:59:45 +01:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Krasimir Georgiev	8e7df996e3	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `92ddd3c1b6`. Causes multistage clang crashes, e.g.: https://lab.llvm.org/buildbot/#/builders/36/builds/6678	2021-03-30 11:47:12 +02:00
Han Zhu	92ddd3c1b6	[loop-idiom] Hoist loop memcpys to loop preheader For a simple loop like: ``` struct S { int x; int y; char b; }; unsigned foo(S* __restrict__ a, S* b, int n) { for (int i = 0; i < n; i++) a[i] = b[i]; return sizeof(a[0]); } ``` We could eliminate the loop and convert it to a large memcpy of 12n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll` ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %0 = bitcast %struct.S* %arrayidx2 to i8* %1 = bitcast %struct.S* %arrayidx to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false) %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic. 
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is hoisted to the loop preheader. For this trivial case, the loop is dead and will be removed by another pass. ``` %struct.S = type { i32, i32, i8 } define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr { entry: %a1 = bitcast %struct.S* %a to i8* %b2 = bitcast %struct.S* %b to i8* %cmp7 = icmp sgt i32 %n, 0 br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup for.body.preheader: ; preds = %entry %0 = zext i32 %n to i64 %1 = mul nuw nsw i64 %0, 12 call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false) br label %for.body for.cond.cleanup.loopexit: ; preds = %for.body br label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry ret i32 12 for.body: ; preds = %for.body, %for.body.preheader %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i32 %i.08 to i64 %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom %2 = bitcast %struct.S* %arrayidx2 to i8* %3 = bitcast %struct.S* %arrayidx to i8* %inc = add nuw nsw i32 %i.08, 1 %cmp = icmp slt i32 %inc, %n br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit } ; Function Attrs: argmemonly nofree nosync nounwind willreturn declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0 attributes #0 = { argmemonly nofree nosync nounwind willreturn } ``` Reviewed By: zino Differential Revision: https://reviews.llvm.org/D97667	2021-03-29 23:36:26 -07:00
Han Zhu	2bd4049ceb	Revert "[loop-idiom] Hoist loop memcpys to loop preheader" This reverts commit `deb5095833`. Bad commit message.	2021-03-29 23:35:35 -07:00
Han Zhu	deb5095833	[loop-idiom] Hoist loop memcpys to loop preheader Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: Blame Revision: Differential Revision: https://phabricator.intern.facebook.com/D26380397	2021-03-29 23:14:42 -07:00
Huihui Zhang	ca721042f1	[IPO][SampleContextTracker] Use SmallVector to track context profiles to prevent non-determinism. Use SmallVector instead of SmallSet to track the context profiles mapped. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turned on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test: profile-context-tracker-debug.ll. Reviewed By: MaskRay, wenlei Differential Revision: https://reviews.llvm.org/D99547	2021-03-29 16:37:10 -07:00
Gulfem Savrun Yeniceri	5178ffc7cf	[Passes] Add relative lookup table converter pass Lookup tables generate non-PIC-friendly code, which requires dynamic relocations, as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-29 21:53:32 +00:00
Wenlei He	30b0232336	[CSSPGO][llvm-profgen] Context-sensitive global pre-inliner This change sets up a framework in llvm-profgen to estimate inline decisions and adjust the context-sensitive profile based on them. We call it a global pre-inliner in llvm-profgen. It will serve two purposes: 1) Since the context profile for a context that is not inlined will be merged into the base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size. 2) For thinLTO, when a context involving functions from different modules is not inlined, we can't merge function profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation. The compiler's inline heuristic uses inline cost, which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler. This change only has the framework, with a few TODOs left for follow-up patches for a complete implementation: 1) We need to retrieve the size of each function/inlinee from debug info for inlining estimation. Currently we use the number of samples in a profile as a placeholder for size estimation. 2) Currently the thresholds use the values used by the sample loader inliner. But they need to be tuned, since the size here is fully optimized machine code size, instead of inline cost based on not-yet-fully-optimized IR. Differential Revision: https://reviews.llvm.org/D99146	2021-03-29 09:46:14 -07:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Matt Arsenault	9a0c9402fa	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `07e46367ba`.	2021-03-29 08:55:30 -04:00
Jingu Kang	e4abb64100	[LoopUnswitch] Use reference variables instead of pointer one Differential Revision: https://reviews.llvm.org/D99496	2021-03-29 13:08:46 +01:00
Hans Wennborg	c6e5c4654b	Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places Using $ breaks demangling of the symbols. For example, $ c++filt _Z3foov\$123 _Z3foov$123 This causes problems for developers who would like to see nice stack traces etc., but also for automatic crash tracking systems which try to organize crashes based on the stack traces. Instead, use the period as suffix separator, since Itanium demanglers normally ignore such suffixes: $ c++filt _Z3foov.123 foo() [clone .123] This is already done in some places; try to do it everywhere. Differential revision: https://reviews.llvm.org/D97484	2021-03-29 13:03:52 +02:00
Oliver Stannard	07e46367ba	Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute"" Reverting because test 'Bindings/Go/go.test' is failing on most buildbots. This reverts commit `fc9df30991`.	2021-03-29 11:32:22 +01:00
Jingu Kang	cfe87d4edd	[NFC][LoopUnswitch] Move hasPartialIVCondition to LoopUtils Differential revision: https://reviews.llvm.org/D99490	2021-03-29 10:29:45 +01:00
Matt Arsenault	fc9df30991	Reapply "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `20d5c42e0e`.	2021-03-28 13:35:21 -04:00
Sanjay Patel	01ae6e5ead	[InstCombine] sink min/max intrinsics with common op after select This is another step towards parity with cmp+select min/max idioms. See D98152.	2021-03-28 13:13:04 -04:00
Nico Weber	20d5c42e0e	Revert "OpaquePtr: Turn inalloca into a type attribute" This reverts commit `4fefed6563`. Broke check-clang everywhere.	2021-03-28 13:02:52 -04:00
Matt Arsenault	4fefed6563	OpaquePtr: Turn inalloca into a type attribute I think byval/sret and the others are close to being able to rip out the code to support the missing type case. A lot of this code is shared with inalloca, so catch this up to the others so that can happen.	2021-03-28 11:12:23 -04:00
Florian Hahn	8c6c357897	[LV] Mark a few more cost-model members as const (NFC).	2021-03-28 14:59:48 +01:00
Florian Hahn	d2855eba81	[LV] Fix formatting from `2f9d68c3f1`.	2021-03-27 21:29:56 +00:00
Florian Hahn	2f9d68c3f1	[LV] Mark some methods as const (NFC). Mark a few methods as const, as they do not modify any state.	2021-03-27 21:27:53 +00:00
Juneyoung Lee	05884d3b52	Make FoldBranchToCommonDest poison-safe by default This is a small patch to make FoldBranchToCommonDest poison-safe by default. After `fc3f0c9c`, only two syntactic changes are needed to fix unit tests. This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99452	2021-03-27 19:05:12 +09:00
Juneyoung Lee	fc3f0c9cc0	[IRCE] Use m_LogicalAnd This is a minor fix to use m_LogicalAnd. This allows IRCE to recognize select form of and conditions as well.	2021-03-27 15:23:18 +09:00
Hongtao Yu	12ac0403b1	[CSSPGO][NFC] Fix a debug dump issue. During context promotion, intermediate nodes that are on a call path but do not come with a profile can be promoted together with their parent nodes. Do not print sample context string for such nodes since they do not have profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D99441	2021-03-26 16:06:56 -07:00
Chris Lattner	62c41cfba1	Add a missing file header comment, NFC.	2021-03-26 15:34:04 -07:00
Nikita Popov	4622648a06	Revert "[ArgPromotion] Copy additional metadata for loads." This reverts commit `166620a4f0`. A miscompile has been reported in https://reviews.llvm.org/D93927#2653480 and following.	2021-03-26 21:34:54 +01:00
Florian Hahn	4858e081d7	[ConstraintElimination] Only strip casts preserving the representation. Things like addrspacecast may not be no-ops, so we should not look through them.	2021-03-26 20:07:41 +00:00
Sanjay Patel	b0797e0c12	[SLP] use dyn_cast instead of isa + cast; NFC	2021-03-26 13:52:31 -04:00
Sanjay Patel	a26312f9d4	Revert "[SLP] allow matching integer min/max intrinsics as reduction ops" This reverts commit `3c8473ba53` and includes test diffs to maintain testing status. There's at least 1 place that was not updated with `7202f47508` , so we can crash mismatching select and intrinsics as shown in PR49730.	2021-03-26 09:59:14 -04:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Wenlei He	5f59f407f5	[CSSPGO] Minor tweak for inline candidate priority tie breaker When prioritizing call sites to consider for inlining in sample loader, use number of samples as a first tie breaker before using name/guid comparison. This would favor smaller functions when hotness is the same (from the same block). We could try to retrieve accurate function size if this turns out to be more important. Differential Revision: https://reviews.llvm.org/D99370	2021-03-25 21:15:36 -07:00
Leonard Chan	36eaeaf728	[llvm][hwasan] Add Fuchsia shadow mapping configuration Ensure that Fuchsia shadow memory starts at zero. Differential Revision: https://reviews.llvm.org/D99380	2021-03-25 15:28:59 -07:00
Guozhi Wei	3240910f00	[DAE] Adjust param/arg attributes when changing parameter to undef In DeadArgumentElimination pass, if a function's argument is never used, corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code. To avoid this undefined behavior when changing the caller's parameter to undef, this patch removes noundef attribute and other attributes that imply noundef on param/arg. Differential Revision: https://reviews.llvm.org/D98899	2021-03-25 14:53:22 -07:00
Roman Lebedev	1c55dcbca7	[NFCI][SimplifyCFG] Don't pay for a Small{Map,Set}Vector when plain SmallSet will suffice This only changes the cases where we really don't care about the iteration order of the underlying container, namely when we will use the values from it to form DTU updates.	2021-03-25 23:25:40 +03:00
Yevgeny Rouban	f7ef26ef0b	[SLP] Fix crash in reduction for integer min/max The SCEV commit `b46c085d2b` [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions seems to reveal a new crash in SLPVectorizer. SLP crashes expecting a SelectInst as an externally used value but umin() call is found. The patch relaxes the assumption to make the IR flag propagation safe. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D99328	2021-03-25 21:44:21 +07:00
Matt Morehouse	96a4167b4c	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-25 07:04:14 -07:00
Alexey Bataev	568c874117	[SLP]Improve and simplify extendSchedulingRegion. We do not need to scan further if the upper end or lower end of the basic block is reached already and the instruction is not found. It means that the instruction is definitely in the lower or upper part of the basic block, respectively. This should improve compile time for the very big basic blocks. Differential Revision: https://reviews.llvm.org/D99266	2021-03-25 05:31:58 -07:00
Sameer Sahasrabuddhe	b92c8c22b9	[NewPM] Disable non-trivial loop-unswitch on targets with divergence Unswitching a loop on a non-trivial divergent branch is expensive since it serializes the execution of both versions of the loop. But identifying a divergent branch needs divergence analysis, which is a function level analysis. The legacy pass manager handles this dependency by isolating such a loop transform and rerunning the required function analyses. This functionality is currently missing in the new pass manager, and there is no safe way for the SimpleLoopUnswitch pass to depend on DivergenceAnalysis. So we conservatively assume that all non-trivial branches are divergent if the target has divergence. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D98958	2021-03-25 11:27:10 +00:00
Philip Reames	9a82f42d12	Plumb TLI through isSafeToExecuteUnconditionally [NFC] Split from D95815 to reduce patch size. Isn't (yet) used for anything, only the client side is wired up.	2021-03-24 17:52:04 -07:00
Matt Morehouse	c8ef98e5de	Revert "[HWASan] Use page aliasing on x86_64." This reverts commit `63f73c3eb9` due to breakage on aarch64 without TBI.	2021-03-24 16:18:29 -07:00
Roman Lebedev	2070fe7144	[NFCI][SimplifyCFG] Don't form DTU updates if we aren't going to apply them I think we may want to have a thin wrapper over a vector to deduplicate those `if(DTU)` predicates, and instead do them in the `insert()` itself.	2021-03-25 00:02:37 +03:00
Congzhe Cao	829c1b6443	[LoopInterchange] fix tightlyNested() in LoopInterchange legality This is yet another attempt to fix tightlyNested(). Add checks in tightlyNested() for the inner loop exit block, such that 1) if there is control-flow divergence in between the inner loop exit block and the outer loop latch, or 2) if the inner loop exit block contains unsafe instructions, tightlyNested() returns false. The reasoning behind this is that after interchange, the original inner loop exit block, which was part of the outer loop, would be put into the new inner loop, and will be executed a different number of times before and after interchange. Thus it should be dealt with appropriately. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D98263	2021-03-24 15:49:25 -04:00
Florian Hahn	9d45579279	[LV] Factor out phi type access to variable (NFC). A slight simplification of the code to reduce future diffs.	2021-03-24 19:25:22 +00:00
Florian Hahn	8d1342f79d	[LV] Remove redundant access to Legal::getReductionVars() (NFC). The reduction descriptor is retrieved earlier and stored in a variable RdxDesc already.	2021-03-24 19:15:14 +00:00
Gulfem Savrun Yeniceri	5fbe1fdf17	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `5fd001a5ff` because it broke clang-with-thin-lto-ubuntu bot.	2021-03-24 18:59:33 +00:00
Matt Morehouse	63f73c3eb9	[HWASan] Use page aliasing on x86_64. Userspace page aliasing allows us to use middle pointer bits for tags without untagging them before syscalls or accesses. This should enable easier experimentation with HWASan on x86_64 platforms. Currently stack, global, and secondary heap tagging are unsupported. Only primary heap allocations get tagged. Note that aliasing mode will not work properly in the presence of fork(), since heap memory will be shared between the parent and child processes. This mode is non-ideal; we expect Intel LAM to enable full HWASan support on x86_64 in the future. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98875	2021-03-24 11:43:41 -07:00
Gulfem Savrun Yeniceri	5fd001a5ff	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-24 17:31:18 +00:00
Nikita Popov	8a168d2d70	[LICM] Fix NumSunk statistic (NFC) LICM can sink instructions that have uses inside the loop, as long as these uses are considered "free". However, if there were only free uses inside the loop, and no uses outside the loop at all, the instruction would still count towards the NumSunk statistic. This resulted in a wild inflation of the NumSunk metric. After this patch it drops down from 1141787 to 5852 on test-suite O3.	2021-03-24 18:28:19 +01:00
Thomas Preud'homme	3b52c04e82	Make FindAvailableLoadedValue TBAA aware FindAvailableLoadedValue() relies on FindAvailablePtrLoadStore() to run the alias analysis when searching for an equivalent value. However, FindAvailablePtrLoadStore() calls the alias analysis framework with a memory location for the load constructed from an address and a size, which thus lacks TBAA metadata info. This commit modifies FindAvailablePtrLoadStore() to accept an optional memory location as parameter to allow FindAvailableLoadedValue() to create it based on the load instruction, which would then have TBAA metadata info attached. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99206	2021-03-24 17:20:26 +00:00
Roman Lebedev	fe36b834db	[NFCI][SimplifyCFG] Fold branch to common dest: don't check cost if no qualified preds	2021-03-24 19:01:47 +03:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Florian Hahn	cd0c00c9fe	[LV] Move exact FP math check out of Requirements. We know if the loop contains FP instructions preventing vectorization after we are done with legality checks. This patch updates the code to check for un-vectorizable FP operations earlier, to avoid unnecessarily running the cost model and picking a vectorization factor. It also makes the code more direct and moves the check to a position where similar checks are done. I might be missing something, but I don't see any reason to handle this check differently to other, similar checks. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98633	2021-03-24 11:01:44 +00:00
Ta-Wei Tu	4d9d736875	[NFC] Improve debug message and test description in `4c1f74a`	2021-03-24 18:21:13 +08:00
Ta-Wei Tu	4c1f74a76c	[LoopFlatten] Fix invalid assertion (PR49571) The `InductionPHI` is not necessarily the increment instruction, as demonstrated in pr49571.ll. This patch removes the assertion and instead bails out from the `LoopFlatten` pass if that happens. This fixes https://bugs.llvm.org/show_bug.cgi?id=49571 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D99252	2021-03-24 18:08:27 +08:00
Ta-Wei Tu	8fde25b3c3	[NFC] Remove redundant `struct` prefix Reviewed By: SjoerdMeijer, fhahn Differential Revision: https://reviews.llvm.org/D99251	2021-03-24 17:58:33 +08:00
Alexey Bataev	99203f2004	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 14:25:36 -07:00
Alexey Bataev	f1b47ad278	Revert "[Analysis]Add getPointersDiff function to improve compile time." This reverts commit `065a14a12d` to investigate and fix crash in SLP vectorizer.	2021-03-23 13:17:54 -07:00
Alexey Bataev	065a14a12d	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 12:58:42 -07:00
Roman Lebedev	b5822026dd	[SimplifyCFG] 'Fold branch to common dest': don't overestimate the cost `FoldBranchToCommonDest()` has a certain budget (`-bonus-inst-threshold=`) for bonus instruction duplication. And currently it calculates the cost as-if it will actually duplicate into each predecessor. But ignoring the budget, it won't always duplicate into each predecessor, there are some correctness and profitability checks. So when calculating the cost, we should first check into which blocks will we actually duplicate, and only then use that block count to do budgeting.	2021-03-23 18:30:26 +03:00
Roman Lebedev	514bc01ca3	[SimplifyCFG] FoldBranchToCommonDest(): properly handle same-block external uses (PR49510/PR49689) We clone bonus instructions to the end of the predecessor block, and then use `SSAUpdater::RewriteUseAfterInsertions()`. But that only deals with the cases where the use-to-be-rewritten are either in different block from the def, or come after the def. But in some loop cases, the external use may be in the beginning of predecessor block, before the newly cloned bonus instruction. `SSAUpdater::RewriteUseAfterInsertions()` does not deal with that. Notably, the external use can't happen to be both in the same block and after the newly-cloned instruction, because of the fold preconditions. To properly handle these cases, when the use is in the same block, we should instead use `SSAUpdater::RewriteUse()`. TBN, they do the same thing for PHI users. Fixes https://bugs.llvm.org/show_bug.cgi?id=49510 Likely Fixes https://bugs.llvm.org/show_bug.cgi?id=49689	2021-03-23 17:37:28 +03:00
Sanjay Patel	1bf8f9e228	[SimplifyCFG] use profile metadata to refine merging branch conditions 2nd try (original: `27ae17a6b0`) with fix/test for crash. We must make sure that TTI is available before trying to use it because it is not required (might be another bug). Original commit message: This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-23 10:19:37 -04:00
Sanjay Patel	3c8473ba53	[SLP] allow matching integer min/max intrinsics as reduction ops As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-23 08:56:44 -04:00
Luke Drummond	520f70e94d	[NFC] clang-format llvm/lib/Transforms/Utils/CloneFunction.cpp Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	ab44ec1b22	[NFC] Minor refactor - Give unwieldy repeated expression a name - Use a ranged `for` basic block iterator Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:28 +00:00
Luke Drummond	0448ddd169	[NFCI] cleanup CloneFunctionInto Hoist early return for decl-only clones to before DIFinder calculation. Also fix an out of date assert message after invariants changed in `22a52dfddc`. Reviewed by: nikic, dexonsmith Differential Revision: https://reviews.llvm.org/D98957	2021-03-23 12:53:27 +00:00
Nashe Mncube	5d929794a8	[llvm-opt] Bug fix within combining FP vectors A bug was found within InstCombineCasts where a function call is only implemented to work with FixedVectors. This caused a crash when a ScalableVector was passed to this function. This commit introduces a regression test which recreates the failure and a bug fix. Differential Revision: https://reviews.llvm.org/D98351	2021-03-23 12:13:41 +00:00
Florian Hahn	e43e8e9138	[AnnotationRemarks] Use subprogram location for summary remarks. The summary remarks are generated on a per-function basis. Using the first instruction's location is sub-optimal for 2 reasons: 1. Sometimes the first instruction is missing !dbg 2. The location of the first instruction may be mis-leading. Instead, just use the location of the function directly.	2021-03-23 12:05:41 +00:00
David Sherwood	d70251163f	[LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector In places where we create a ConstantVector whose elements are a linear sequence of the form <start, start + 1, start + 2, ...> I've changed the code to make use of CreateStepVector, which creates a vector with the sequence <0, 1, 2, ...>, and a vector addition operation. This patch is a non-functional change, since the output from the vectoriser remains unchanged for fixed length vectors and there are existing asserts that still fire when attempting to use scalable vectors for vectorising induction variables. In a later patch we will enable support for scalable vectors in InnerLoopVectorizer::getStepVector(), which relies upon the new stepvector intrinsic in IRBuilder::CreateStepVector. Differential Revision: https://reviews.llvm.org/D97861	2021-03-23 11:29:05 +00:00
Florian Hahn	f759d512c8	[VPlan] Include name when printing after `93a9d2de8f`. The name is included when printing in DOT mode. Also print it in non-DOT mode after `93a9d2de8f`. This will become more important to distinguish different plans once VPlans are gradually refined.	2021-03-23 09:50:14 +00:00
Juneyoung Lee	960a767368	Reland "[InstCombine] Add simplification of two logical and/ors" This relands `07c3b97e18` (D96945) which was reverted by commit `f49354838e`. The two-stage compilation successfully passes tests on my machine.	2021-03-23 16:24:50 +09:00
Fangrui Song	3c81822ec5	[SanitizerCoverage] Use External on Windows This should fix https://reviews.llvm.org/D98903#2643589 though it is not clear to me why ExternalWeak does not work.	2021-03-22 23:05:36 -07:00
Serguei Katkov	9fec382601	[RS4GC] Fix hang on infinite loop meetBDVState utility may set the base pointer for the conflict state. At this moment the base for conflict state does not have any meaning but is used in comparison of BDV states. This comparison is used as an indicator of progress done on iteration and RS4GC pass uses infinite loop to reach fixed point. As a result for added test on each iteration state for some phi nodes is updated with other base value for conflict state and it is counted as progress, while for conflict state no further progress is possible. In reality the base value is transferred from one state to another and pass detects the progress on these states. The test is very fragile. The traversal order of states and operands of phi nodes plays an important role. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D99058	2021-03-23 12:54:51 +07:00
Gulfem Savrun Yeniceri	e3a6d70c68	Revert "[Passes] Add relative lookup table converter pass" This reverts commit `78a65cd945` which caused buildbot failures.	2021-03-23 00:43:16 +00:00
Juneyoung Lee	5c2e50b5d2	Reland "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This relands commit `99108c791d` (D95026) which was reverted by `8d5a981a13` because the underlying problem (https://llvm.org/pr49495) is fixed.	2021-03-23 09:19:53 +09:00
Gulfem Savrun Yeniceri	78a65cd945	[Passes] Add relative lookup table converter pass Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in: https://bugs.llvm.org/show_bug.cgi?id=45244 This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly. Differential Revision: https://reviews.llvm.org/D94355	2021-03-22 22:09:02 +00:00
Roman Lebedev	d37fe26a2b	[NFC][IR] Type: add getWithNewType() method Sometimes you want to get a type with same vector element count as the current type, but different element type, but there's no QOL wrapper to do that. Add one.	2021-03-23 00:50:58 +03:00
Sanjay Patel	95f7f7c21b	Revert "[SimplifyCFG] use profile metadata to refine merging branch conditions" This reverts commit `27ae17a6b0`. There are bot failures that end with: #4 0x00007fff7ae3c9b8 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0 #5 0x00007fff84e504d8 (linux-vdso64.so.1+0x4d8) #6 0x00007fff7c419a5c llvm::TargetTransformInfo::getPredictableBranchThreshold() const (/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1.install/bin/../lib/libLLVMAnalysis.so.13git+0x479a5c) ...but not sure how to trigger that yet.	2021-03-22 17:48:06 -04:00
Sanjay Patel	27ae17a6b0	[SimplifyCFG] use profile metadata to refine merging branch conditions This is one step towards solving: https://llvm.org/PR49336 In that example, we disregard the recommended usage of builtin_expect, so an expensive (unpredictable) branch is folded into another branch that is guarding it. Here, we read the profile metadata to see if the 1st (predecessor) condition is likely to cause execution to bypass the 2nd (successor) condition before merging conditions by using logic ops. Differential Revision: https://reviews.llvm.org/D98898	2021-03-22 16:49:21 -04:00
Sanjay Patel	664d0c052c	[TargetTransformInfo] move branch probability query from TargetLoweringInfo This is no-functional-change intended (NFC), but needed to allow optimizer passes to use the API. See D98898 for a proposed usage by SimplifyCFG. I'm simplifying the code by removing the cl::opt. That was added back with the original commit in D19488, but I don't see any evidence in regression tests that it was used. Target-specific overrides can use the usual patterns to adjust as necessary. We could also restore that cl::opt, but it was not clear to me exactly how to do it in the convoluted TTI class structure.	2021-03-22 15:55:34 -04:00
Bjorn Pettersson	688cdddafb	[SLP] Honor min/max regsize and min/max VF in vectorizeStores Make sure we use PowerOf2Floor instead of PowerOf2Ceil when calculating max number of elements that fits inside a vector register (otherwise we could end up creating vectors larger than the maximum vector register size). Also make sure we honor the min/max VF (as given by TTI or cmd line parameters) when doing vectorizeStores. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D97691	2021-03-22 17:29:35 +01:00
Matt Morehouse	772851ca4e	[HWASan] Disable stack, globals and force callbacks for x86_64. Subsequent patches will implement page-aliasing mode for x86_64, which will initially only work for the primary heap allocator. We force callback instrumentation to simplify the initial aliasing implementation. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D98069	2021-03-22 08:02:27 -07:00
Bradley Smith	48f5a392cb	[IR] Add vscale_range IR function attribute This attribute represents the minimum and maximum values vscale can take. For now this attribute is not hooked up to anything during codegen, this will be added in the future when such codegen is considered stable. Additionally hook up the -msve-vector-bits=<x> clang option to emit this attribute. Differential Revision: https://reviews.llvm.org/D98030	2021-03-22 12:05:06 +00:00
Max Kazantsev	8fab9f824f	[IndVars] Sharpen context in eliminateIVComparison When eliminating comparisons, we can use common dominator of all its users as context. This gives better results when ICMP is not computed right before the branch that uses it. Differential Revision: https://reviews.llvm.org/D98924 Reviewed By: lebedev.ri	2021-03-22 11:55:57 +07:00
Roman Lebedev	e3a4701627	[clang][CodeGen] Lower Likelihood attributes to @llvm.expect intrin instead of branch weights `08196e0b2e` exposed LowerExpectIntrinsic's internal implementation detail in the form of LikelyBranchWeight/UnlikelyBranchWeight options to the outside. While this isn't incorrect from the results viewpoint, this is suboptimal from the layering viewpoint, and causes confusion - should transforms also use those weights, or should they use something else, D98898? So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight internal again, and fixing all the code that used it directly, which currently is only clang codegen, thankfully, to emit proper @llvm.expect intrinsics instead.	2021-03-21 22:50:21 +03:00
Roman Lebedev	37d6be9052	Revert "[BranchProbability] move options for 'likely' and 'unlikely'" Upon reviewing D98898 i've come to realization that these are implementation detail of LowerExpectIntrinsicPass, and they should not be exposed to outside of it. This reverts commit `ee8b53815d`.	2021-03-21 22:50:21 +03:00
Sanjay Patel	ee8b53815d	[BranchProbability] move options for 'likely' and 'unlikely' This makes the settings available for use in other passes by housing them within the Support lib, but NFC otherwise. See D98898 for the proposed usage in SimplifyCFG (where this change was originally included). Differential Revision: https://reviews.llvm.org/D98945	2021-03-20 14:46:46 -04:00
Jeroen Dobbelaere	77080a1eb6	Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types. Now that intrinsic name mangling can cope with unnamed types, the custom name mangling in PredicateInfo (introduced by D49126) can be removed. (See D91250, D48541) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91661	2021-03-20 11:37:09 +01:00
Arthur Eubanks	a17394dc88	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those when the checks are enabled (which is by default with expensive checks on). Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. This is a reland of https://reviews.llvm.org/D98820 which was reverted due to unacceptably large compile time regressions in normal debug builds.	2021-03-19 14:56:37 -07:00
Arthur Eubanks	a1ab5627f0	Revert "[NewPM] Verify LoopAnalysisResults after a loop pass" This reverts commit `94c269baf5`. Still causes too large of compile time regression in normal debug builds. Will put under expensive checks instead.	2021-03-19 14:31:08 -07:00
Arthur Eubanks	94c269baf5	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Only verify MSSA when VerifyMemorySSA, normally it's very expensive. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98820	2021-03-19 13:26:45 -07:00
Philip Reames	5698537f81	Update basic deref API to account for possibility of free [NFC] This patch is plumbing to support work towards the goal outlined in the recent llvm-dev post "[llvm-dev] RFC: Decomposing deref(N) into deref(N) + nofree". The point of this change is purely to simplify iteration on other pieces on the way to making the switch. Rebuilding with a change to Value.h is slow and painful, so I want to get the API change landed. Once that's done, I plan to more closely audit each caller, add the inference rules in their own patch, then post a patch with the langref changes and test diffs. The value of the command line flag is that we can exercise the inference logic in standalone patches without needing the whole switch ready to go just yet. Differential Revision: https://reviews.llvm.org/D98908	2021-03-19 11:17:19 -07:00
Andrei Elovikov	92205cb27f	[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)" Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98897	2021-03-19 10:50:12 -07:00
Andrei Elovikov	93a9d2de8f	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-19 10:50:12 -07:00
Max Kazantsev	8eefa07fcf	[NFC] Move function up in code	2021-03-19 14:03:31 +07:00
Max Kazantsev	8bb952b57f	[NFC] Factor out utility function for finding common dom of user set	2021-03-19 13:49:29 +07:00
Max Kazantsev	16370e02a7	[IndVars] Provide eliminateIVComparison with context We can prove more predicates when we have a context when eliminating ICmp. As first (and very obvious) approximation we can use the ICmp instruction itself, though in the future we are going to use a common dominator of all its users. Need some refactoring before that. Observed ~0.5% negative compile time impact. Differential Revision: https://reviews.llvm.org/D98697 Reviewed By: lebedev.ri	2021-03-19 12:28:22 +07:00
Fangrui Song	9558456b53	[SanitizerCoverage] Make __start_/__stop_ symbols extern_weak On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or `comdat noduplicates`). With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage collect such sections. If all `__sancov_bools` are discarded, LLD will error `error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar). ``` % cat a.c void discarded() {} % clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections ... ld.lld: error: undefined hidden symbol: __start___sancov_guards >>> referenced by a.c >>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard) ``` Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the undefined error. Differential Revision: https://reviews.llvm.org/D98903	2021-03-18 16:46:04 -07:00
George Balatsouras	d10f173f34	[dfsan] Add -dfsan-fast-8-labels flag This is only adding support to the dfsan instrumentation pass but not to the runtime. Added more RUN lines for testing: for each instrumentation test that had a -dfsan-fast-16-labels invocation, a new invocation was added using fast8. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D98734	2021-03-18 16:28:42 -07:00
Mehdi Amini	3614df3537	Revert "[VPlan] Add plain text (not DOT's digraph) dumps" This reverts commit `6b053c9867`. The build is broken: ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const >>> referenced by LoopVectorize.cpp >>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a	2021-03-18 19:20:39 +00:00
Andrei Elovikov	6b053c9867	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-18 11:33:39 -07:00
Wei Mi	14756b70ee	[SampleFDO] Don't mix up the existing indirect call value profile with the new value profile annotated after inlining. In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we use the magic number -1 in the value profile to avoid repeated indirect call promotion to the same target for an indirect call. Function updateIDTMetaData is used to mark a target as being promoted in the value profile with the magic number. updateIDTMetaData is also used to update the value profile when an indirect call is inlined and new inline instance profile should be applied. For the second case, currently updateIDTMetaData mixes up the existing value profile of the indirect call with the new profile, leading to the problematic scenario that a target count is larger than the total count in the value profile. The patch fixes the problem. When updateIDTMetaData is used to update the value profile after inlining, all the values in the existing value profile will be dropped except the values with the magic number counts. Differential Revision: https://reviews.llvm.org/D98835	2021-03-18 09:54:34 -07:00
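The update rule described above can be modeled as a small C sketch. It keeps only the "already promoted" markers from the old profile and then adds the fresh counts. Several pieces are assumptions rather than LLVM's code:

- `ValueEntry`, `NOMORE_ICP_MAGIC`, and `update_value_profile` are hypothetical names.
- The `-1` marker is assumed to be stored as `(uint64_t)-1`.
- Skipping fresh entries whose target was already promoted is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of the "-1" marker for an already-promoted target. */
#define NOMORE_ICP_MAGIC ((uint64_t)-1)

/* Hypothetical (target, count) pair of an indirect-call value profile. */
typedef struct {
    uint64_t target;
    uint64_t count;
} ValueEntry;

/* Rebuild the profile after inlining: keep only the promoted markers from
   the old profile, then add the fresh counts. `out` must have room for
   n_old + n_fresh entries; *total receives the sum of the real counts. */
size_t update_value_profile(const ValueEntry *old, size_t n_old,
                            const ValueEntry *fresh, size_t n_fresh,
                            ValueEntry *out, uint64_t *total) {
    assert(out != NULL && total != NULL);
    size_t n = 0;
    *total = 0;
    for (size_t i = 0; i < n_old; ++i)
        if (old[i].count == NOMORE_ICP_MAGIC)
            out[n++] = old[i];           /* stale real counts are dropped */
    const size_t n_markers = n;
    for (size_t i = 0; i < n_fresh; ++i) {
        int already_promoted = 0;
        for (size_t j = 0; j < n_markers; ++j)
            if (out[j].target == fresh[i].target)
                already_promoted = 1;
        if (already_promoted)
            continue;                    /* assumption: promoted stays promoted */
        out[n++] = fresh[i];
        *total += fresh[i].count;
    }
    return n;
}
```

Because the old real counts are never carried over, no surviving target count can exceed the recomputed total. That is the invariant the message says the old mixing behavior violated.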
Mircea Trofin	4b1c8070bb	[NFC][ArgumentPromotion] Clear FAM cached results of erased function. Not doing it here can lead to subtle bugs - the analysis results are associated by the Function object's address. Nothing stops the memory allocator from allocating new functions at the same address.	2021-03-18 09:17:32 -07:00
Alexey Bataev	b3ced9852c	[SLP]Fix crash on extending scheduling region. If SLP vectorizer tries to extend the scheduling region and runs out of the budget too early, but still extends the region to the new ending instructions (i.e., it was able to extend the region for the first instruction in the bundle, but not for the second), the compiler needs to recalculate dependencies in full, just as if the extension had been successful. Without it, the schedule data chunks may end up with the wrong number of (unscheduled) dependencies and it may end up with the incorrect function, where the vectorized instruction does not dominate the extractelement instruction. Differential Revision: https://reviews.llvm.org/D98531	2021-03-18 06:11:08 -07:00
Max Kazantsev	26ec76add5	[NFC] One more use case for evaluatePredicate	2021-03-18 19:21:29 +07:00
Max Kazantsev	1067a13cc1	[NFC] Use evaluatePredicate in eliminateComparison Just makes code simpler.	2021-03-18 19:21:29 +07:00
Arthur Eubanks	792bed6a4c	Revert "[NewPM] Verify LoopAnalysisResults after a loop pass" This reverts commit `6db3ab2903`. Causing too large of compile time regression.	2021-03-17 15:22:52 -07:00
Arthur Eubanks	6db3ab2903	[NewPM] Verify LoopAnalysisResults after a loop pass All loop passes should preserve all analyses in LoopAnalysisResults. Add checks for those. Note that due to PR44815, we don't check LAR's ScalarEvolution. Apparently calling SE.verify() can change its results. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98805	2021-03-17 13:37:22 -07:00
Philip Reames	31764ea295	[LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC] (Triggered by a review comment on D98728, but otherwise unrelated.)	2021-03-17 12:14:01 -07:00
Philip Reames	7c7f4676cd	[LICM] Fix a crash when sinking instructions w/token operands It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not. This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well. Differential Revision: https://reviews.llvm.org/D98728	2021-03-17 11:18:46 -07:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
Stephen Tozer	3bfddc2593	Reapply "[DebugInfo] Handle multiple variable location operands in IR" Fixed section of code that iterated through a SmallDenseMap and added instructions in each iteration, causing non-deterministic code; replaced SmallDenseMap with MapVector to prevent non-determinism. This reverts commit `01ac6d1587`.	2021-03-17 16:45:25 +00:00
LemonBoy	4f024938e4	[LoopVectorize] Refine hasIrregularType predicate The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector. The previous check returned invalid results in some cases where there's some padding between the array elements: eg. a 4-element array of u7 values is considered as compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32. The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D97465	2021-03-17 17:03:47 +01:00
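The padding problem can be made concrete with simplified DataLayout arithmetic. This C sketch assumes the refined predicate amounts to comparing an element's allocation size with its bit width, as `DataLayout::getTypeAllocSizeInBits` and `getTypeSizeInBits` would. The ABI alignment is passed in by the caller, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of m (m > 0). */
static uint64_t round_up(uint64_t x, uint64_t m) {
    assert(m > 0);
    return (x + m - 1) / m * m;
}

/* Simplified allocation size: whole bytes needed to store the value,
   padded to the (assumed) ABI alignment, expressed in bits. */
uint64_t alloc_size_bits(uint64_t size_bits, uint64_t align_bits) {
    return round_up(round_up(size_bits, 8), align_bits);
}

/* Irregular when array elements carry padding that <N x Ty> would not. */
bool has_irregular_type(uint64_t size_bits, uint64_t align_bits) {
    return alloc_size_bits(size_bits, align_bits) != size_bits;
}
```

For `i7` with byte alignment, a 4-element array spans 4 * 8 = 32 bits while `<4 x i7>` spans 28 bits, which is the mismatch the message describes. A type such as `i32` has no padding, so it stays regular.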
Hans Wennborg	01ac6d1587	Revert "[DebugInfo] Handle multiple variable location operands in IR" This caused non-deterministic compiler output; see comment on the code review. > This patch updates the various IR passes to correctly handle dbg.values with a > DIArgList location. This patch does not actually allow DIArgLists to be produced > by salvageDebugInfo, and it does not affect any pass after codegen-prepare. > Other than that, it should cover every IR pass. > > Most of the changes simply extend code that operated on a single debug value to > operate on the list of debug values in the style of any_of, all_of, for_each, > etc. Instances of setOperand(0, ...) have been replaced with with > replaceVariableLocationOp, which takes the value that is being replaced as an > additional argument. In places where this value isn't readily available, we have > to track the old value through to the point where it gets replaced. > > Differential Revision: https://reviews.llvm.org/D88232 This reverts commit `df69c69427`.	2021-03-17 13:36:48 +01:00
David Green	3c25c40d51	[LV] Account for the cost of predication of scalarized load/store This adds the cost of an i1 extract and a branch to the cost in getMemInstScalarizationCost when the instruction is predicated. These predicated loads/stores would generate blocks of something like: %c1 = extractelement <4 x i1> %C, i32 1 br i1 %c1, label %if, label %else if: %sa = extractelement <4 x i32> %a, i32 1 %sb = getelementptr inbounds float, float* %pg, i32 %sa %sv = extractelement <4 x float> %x, i32 1 store float %sv, float* %sb, align 4 else: So this increases the cost by the extract and branch. This is probably still too low in many cases due to the cost of all that branching, but there is already an existing hack increasing the cost using useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is a load or there is more than one store. This patch improves the cost for when there is only a single store, and hopefully at some point in the future the hack can be removed. Differential Revision: https://reviews.llvm.org/D98243	2021-03-17 10:57:50 +00:00
Bu Le	9abe500473	[SLP] Fix the trunc instruction insertion problem The current SLP pass has this piece of code that inserts a trunc instruction after the vectorized instruction. In the case that the vectorized instruction is a phi node and not the last phi node in the BB, the trunc instruction will be inserted between two phi nodes, which will trigger a verifier failure in debug builds or unpredictable errors in another pass. This patch changes the algorithm to 'if the last vectorized instruction is a phi, insert it after the last phi node in current BB' to fix this problem.	2021-03-17 13:51:08 +03:00
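The insertion rule above can be sketched on a toy basic block modeled as an array of instruction kinds. This is a hypothetical model (`InstKind`, `trunc_insert_pos`), not the SLP code; in LLVM the "after the last phi" position corresponds to something like `BasicBlock::getFirstNonPHI()`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy basic block: each slot is a PHI or some other instruction.
   PHIs are grouped at the top of the block, as LLVM requires. */
typedef enum { INST_PHI, INST_OTHER } InstKind;

/* Index at which to insert a trunc that uses the instruction at vec_idx. */
size_t trunc_insert_pos(const InstKind *bb, size_t n, size_t vec_idx) {
    assert(vec_idx < n);
    if (bb[vec_idx] != INST_PHI)
        return vec_idx + 1;               /* right after a normal instruction */
    size_t i = vec_idx;
    while (i < n && bb[i] == INST_PHI)    /* skip the remaining PHIs */
        ++i;
    return i;                             /* first non-PHI slot */
}
```

Whichever PHI in the group is vectorized, the trunc lands after the whole PHI group, so the block never ends up with a non-PHI instruction between two PHIs.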
Arthur Eubanks	70af2924a7	[Unswitch] Guard dbgs logging with LLVM_DEBUG	2021-03-16 22:31:57 -07:00
Sanjay Patel	7202f47508	[SLP] separate min/max matching from its instruction-level implementation; NFC The motivation is to handle integer min/max reductions independently of whether they are in the current cmp+sel form or the planned intrinsic form. We assumed that min/max included a select instruction, but we can decouple that implementation detail by checking the instructions themselves rather than relying on the recurrence (reduction) type.	2021-03-16 17:16:11 -04:00
Mohammad Hadi Jooybar	302b80abf0	[InstCombine] Avoid Bitcast-GEP fusion for pointers directly from allocation functions Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages. Here is the original motivation for this patch: ``` #include<stdio.h> #include<malloc.h> typedef struct __Node{ double f; struct __Node *next; } Node; void foo () { Node *a = (Node *) malloc (sizeof(Node)); a->next = NULL; a->f = 11.5f; Node *ptr = a; double sum = 0.0f; while (ptr) { sum += ptr->f; ptr = ptr->next; } printf("%f\n", sum); } ``` By explicit assignment `a->next = NULL`, we can infer that the length of the linked list is `1`. In this case we can eliminate the while-loop traversal entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation. The final IR before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end, label %while.body while.end: ; preds = %while.body, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` Final IR after this patch: ``` ; Function Attrs: nofree nounwind define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { while.end: %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01) ret void } ``` IR before GVN before this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2 %f = bitcast i8* %call to double* store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.lr.ph while.body.lr.ph: ; preds = %entry %1 = bitcast i8* %call to %struct.__Node* br label %while.body while.body: ; preds = %while.body.lr.ph, %while.body %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ] %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %2 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %2 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %3, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = %while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` IR before GVN after this patch: ``` define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 { entry: %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2 %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2 %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0 store double 1.150000e+01, double* %f, align 8, !tbaa !8 %tobool12 = icmp eq i8* %call, null br i1 %tobool12, label %while.end, label %while.body.preheader while.body.preheader: ; preds = %entry br label %while.body while.body: ; preds = %while.body.preheader, %while.body %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ] %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ] %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0 %1 = load double, double* %f1, align 8, !tbaa !8 %add = fadd contract double %sum.014, %1 %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1 %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2 %tobool = icmp eq %struct.__Node* %2, null br i1 %tobool, label %while.end.loopexit, label %while.body while.end.loopexit: ; preds = %while.body %add.lcssa = phi double [ %add, %while.body ] br label %while.end while.end: ; preds = %while.end.loopexit, %entry %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ] %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa) ret void } ``` The phi translation fails before this patch, which prevents GVN from removing the loop. The reason for this failure is in InstCombine, when the instruction combining pass decides to convert: ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %0 = bitcast i8* %call to %struct.__Node* %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1 store %struct.__Node* null, %struct.__Node** %next ``` to ``` %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) %next = getelementptr inbounds i8, i8* %call, i64 8 %0 = bitcast i8* %next to %struct.__Node** store %struct.__Node* null, %struct.__Node** %0 ``` GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. Address translation looks for structural similarity between GEPs, and these GEPs usually do not match since they have a different structure. This change will cause a couple of failures in LLVM tests. However, in all cases we need to change the expected result of the test. I will update those tests as soon as I get a green light on this patch. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96881	2021-03-16 17:05:44 -04:00
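The two IR shapes in the message compute the same address and differ only in how the offset is spelled. This C sketch reuses the message's `Node` type and assumes a typical layout in which `offsetof(Node, next)` is 8, matching the `i64 8` byte index. The helpers `next_via_struct` and `next_via_bytes` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct __Node {
    double f;
    struct __Node *next;
} Node;

/* Field access through the struct type, analogous to
   `getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1`. */
Node **next_via_struct(Node *n) {
    return &n->next;
}

/* Field access through a raw byte offset, analogous to
   `getelementptr inbounds i8, i8* %call, i64 8`. */
Node **next_via_bytes(void *raw) {
    assert(raw != NULL);
    return (Node **)((char *)raw + offsetof(Node, next));
}
```

Both functions return the same pointer. The byte form simply no longer says "field 1 of `%struct.__Node`", and that structure is what phi translation matches on.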
Philip Reames	cec9e7352b	[rs4gc] Simplify code by cloning existing instructions when inserting base chain [NFC] Previously we created a new node, then filled in the pieces. Now, we clone the existing node, then change the respective fields. The only change in handling is with phis since we have to handle multiple incoming edges from the same block a bit differently. Differential Revision: https://reviews.llvm.org/D98316	2021-03-16 13:10:32 -07:00
Philip Reames	ef884e155d	[rs4gc] don't force a conflict for a canonical broadcast A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added. The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier. Differential Revision: https://reviews.llvm.org/D98315	2021-03-16 12:59:06 -07:00
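The broadcast problem can be sketched with a three-state lattice. The state names follow the Unknown/Base/Conflict idea used by RS4GC, but `BDVState`, `meet`, and `shuffle_state` are hypothetical. Undef is modeled as a constant that is its own base (id 0), as the message describes, and a broadcast simply ignores that operand.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BDV_UNKNOWN, BDV_BASE, BDV_CONFLICT } BDVStatus;

typedef struct {
    BDVStatus status;
    int base;   /* id of the base value when status == BDV_BASE, else -1 */
} BDVState;

/* Lattice meet: Unknown is the identity, Conflict absorbs everything,
   two Base states agree only if they name the same base. */
BDVState meet(BDVState a, BDVState b) {
    if (a.status == BDV_UNKNOWN) return b;
    if (b.status == BDV_UNKNOWN) return a;
    if (a.status == BDV_BASE && b.status == BDV_BASE && a.base == b.base)
        return a;
    BDVState conflict = {BDV_CONFLICT, -1};
    return conflict;
}

/* A broadcast (second input undef) reflects only its first input
   instead of meeting it with the undef constant's state. */
BDVState shuffle_state(BDVState first, BDVState second, bool second_is_undef) {
    return second_is_undef ? first : meet(first, second);
}
```

Meeting a real base with the undef constant yields Conflict, which would force a new base computation. Skipping the undef input keeps the broadcast's base unchanged.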
Philip Reames	5cabf472cb	[rs4gc] don't duplicate existing values which are provably base pointers RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer. This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly, that's probably the riskiest part of the change. Differential Revision: D98122	2021-03-16 12:51:21 -07:00
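A "simple forward dataflow fixed point" for proving that nodes are already base pointers might look like the optimistic iteration below. This is a minimal sketch in the spirit of the message, not the RS4GC algorithm. The node kinds and names (`NodeKind`, `compute_is_base`) are hypothetical, and a phi/select is treated as a base when all of its inputs are bases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds: KNOWN_BASE (e.g. an argument or allocation),
   DERIVED (e.g. a GEP, never a base), MERGE (a phi/select over inputs). */
typedef enum { KNOWN_BASE, DERIVED, MERGE } NodeKind;

typedef struct {
    NodeKind kind;
    int in[2];   /* input node ids for MERGE nodes; -1 when unused */
} Node;

/* Optimistic forward fixed point: assume every MERGE is a base, then
   demote a MERGE as soon as one of its inputs is known not to be a base.
   Demotions only flow one way, so cycles of MERGE nodes converge. */
void compute_is_base(const Node *nodes, size_t n, bool *is_base) {
    for (size_t i = 0; i < n; ++i)
        is_base[i] = nodes[i].kind != DERIVED;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; ++i) {
            if (nodes[i].kind != MERGE || !is_base[i])
                continue;
            for (int k = 0; k < 2; ++k) {
                int src = nodes[i].in[k];
                assert(src < (int)n);
                if (src >= 0 && !is_base[src]) {
                    is_base[i] = false;
                    changed = true;
                    break;
                }
            }
        }
    }
}
```

Nodes still marked as bases at the end can be pruned from the duplication work. Only the rest need a separate base-pointer computation.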
Liam Keegan	edf9565a86	[MemCpyOpt] Add missing MemorySSAWrapperPass dependency macro Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass, since MemCpyOpt now uses MemorySSA by default. Differential Revision: https://reviews.llvm.org/D98484	2021-03-16 20:30:00 +01:00
Philip Reames	6972e39d47	[gvn] CSE gc.relocates based on meaning, not spelling (try 2) This was (partially) reverted in `cfe8f8e0` because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems. This change has been reworked to not need that change (via some explicit checks in client code). This is being done to address the original optimization issue and simplify the testing of the readonly changes. I'm working on that piece under 49607. Original commit message follows: The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974	2021-03-16 10:59:31 -07:00
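"Equivalent by meaning" can be shown with a toy model. Two relocates are compared by the bundle values their indices select, not by the index numbers themselves. The structs and `relocates_equivalent` below are hypothetical, with value ids standing in for IR values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a statepoint's gc bundle is an array of value ids. */
typedef struct {
    const int *bundle;
    int n;
} Statepoint;

/* A relocate names its statepoint plus (base, derived) bundle indices. */
typedef struct {
    const Statepoint *sp;
    int base_idx;
    int derived_idx;
} Relocate;

/* Same statepoint, and the indices select the same underlying values. */
bool relocates_equivalent(Relocate a, Relocate b) {
    if (a.sp != b.sp)
        return false;
    assert(a.base_idx < a.sp->n && a.derived_idx < a.sp->n);
    assert(b.base_idx < b.sp->n && b.derived_idx < b.sp->n);
    return a.sp->bundle[a.base_idx] == b.sp->bundle[b.base_idx] &&
           a.sp->bundle[a.derived_idx] == b.sp->bundle[b.derived_idx];
}
```

With a bundle `{10, 20, 10}`, relocates using base index 0 and base index 2 select the same value, so they can be CSEd even though their operands are spelled differently.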
Florian Hahn	f586de8459	[VPlan] Remove PredInst2Recipe, use VP operands instead. (NFC) Instead of maintaining a separate map from predicated instructions to recipes, we can instead directly look at the VP operands. If the operand comes from a predicated instruction, the operand will be a VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.	2021-03-16 17:40:35 +00:00
Sanjay Patel	40fdb43d30	[SLP] improve readability in reduction logic; NFC We had 2 different and ambiguously-named 'I' variables.	2021-03-16 07:35:13 -04:00
Caroline Concatto	3c03635d53	[SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse This patch adds support for reverse loop vectorization. It is possible to vectorize the following loop: ``` for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0; ``` with fixed or scalable vectors. The loop-vectorizer will use 'reverse' on the loads/stores to make sure the lanes themselves are also handled in the right order. This patch adds support for scalable vectors to the IRBuilder interface for creating a reverse vector. The IRBuilder function CreateVectorReverse lowers to experimental.vector.reverse for scalable vectors and keeps the original behavior for fixed vectors, using a reverse shuffle. Differential Revision: https://reviews.llvm.org/D95363	2021-03-16 07:51:59 +00:00
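The reverse-order loads and stores can be simulated in scalar C with a fixed VF of 4. In this sketch, `reverse4` stands in for the reverse shuffle or `llvm.experimental.vector.reverse`, `add_one_reversed` is a hypothetical name, and leftover iterations fall back to a scalar epilogue.

```c
#include <assert.h>

#define VF 4

/* Stand-in for a reverse shuffle / llvm.experimental.vector.reverse. */
static void reverse4(double v[VF]) {
    for (int i = 0; i < VF / 2; ++i) {
        double t = v[i];
        v[i] = v[VF - 1 - i];
        v[VF - 1 - i] = t;
    }
}

/* for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0;  VF lanes at a time. */
void add_one_reversed(double *a, const double *b, int n) {
    assert(n >= 0);
    int i = n - 1;
    for (; i - (VF - 1) >= 0; i -= VF) {
        double v[VF];
        /* Contiguous load of b[i-VF+1 .. i], reversed so lane 0 is b[i]. */
        for (int l = 0; l < VF; ++l)
            v[l] = b[i - (VF - 1) + l];
        reverse4(v);
        for (int l = 0; l < VF; ++l)
            v[l] += 1.0;                 /* lane l handles iteration i - l */
        reverse4(v);                     /* back to memory order */
        for (int l = 0; l < VF; ++l)
            a[i - (VF - 1) + l] = v[l];
    }
    for (; i >= 0; --i)                  /* scalar remainder */
        a[i] = b[i] + 1.0;
}
```

For a purely element-wise body the two reversals cancel out. They are shown because they keep lane l tied to scalar iteration `i - l`, which is the ordering property the message refers to.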
Wenlei He	a5d30421a6	[CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining. For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flattened, a few things need to happen for it to work: - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee. - In order to make the above happen, we need to know whether input profile is CSSPGO profile before we start reading function profile, hence a flag for profile summary section is added. - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile). - Now that we have more accurate counts with CSSPGO, we switched to use entry count instead of total count to decide if an external callee is potentially beneficial to inline. This makes it consistent with how we determine whether call target is potential inline candidate. Differential Revision: https://reviews.llvm.org/D98590	2021-03-15 12:22:15 -07:00
Juneyoung Lee	edf634ebc2	[AssumeBundles] Add nonnull/align to op bundle if noundef exists This is a patch to add nonnull and align to assume's operand bundle only if noundef exists. Since nonnull and align in fn attr have poison semantics, they should be paired with noundef or noundef-implying attributes to be immediate UB. Reviewed By: jdoerfert, Tyker Differential Revision: https://reviews.llvm.org/D98228	2021-03-16 10:23:42 +09:00
Hongtao Yu	beea06c106	[NFC][Inliner] Debugging support to print function size after each inlining. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D98439	2021-03-14 22:11:53 -07:00
Chenguang Wang	166620a4f0	[ArgPromotion] Copy additional metadata for loads. Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93927	2021-03-14 21:28:14 +00:00
Simonas Kazlauskas	7d7001b2cb	[InstCombine] Restrict a GEP transform to avoid changing provenance This is an alternative to D98120. Herein, instead of deleting the transformation entirely, we check that the underlying objects are both the same and therefore this transformation wouldn't incur a provenance change, if applied. https://alive2.llvm.org/ce/z/SYF_yv Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98588	2021-03-14 16:32:04 +02:00
Luo, Yuanke	66fbf5fafb	[X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast makes that pass happy. Differential Revision: https://reviews.llvm.org/D98247	2021-03-14 09:24:56 +08:00
Nikita Popov	5556660971	[MemCpyOpt] Handle read from lifetime.start with offset This fixes a regression from the MemDep-based implementation: MemDep completely ignores lifetime.start intrinsics that aren't MustAlias -- this is probably unsound, but it does mean that the MemDep based implementation successfully eliminated memcpy's from lifetime.start if the memcpy happens at an offset, rather than the base address of the alloca. Add a special case for the case where the lifetime.start spans the whole alloca (which is pretty much the only kind of lifetime.start that frontends ever emit), as we don't need to figure out our exact aliasing relationship in that case, the whole alloca is dead prior to the call. If this doesn't cover all practically relevant cases, then it would be possible to make use of the recently added PartialAlias clobber offsets to make this more precise.	2021-03-13 20:38:09 +01:00
Sanjay Patel	4224a36957	[InstCombine] avoid creating an extra instruction in zext fold and possible inf-loop The structure of this fold is suspect vs. most of instcombine because it creates instructions and tries to delete them immediately after. If we don't have the operand types for the icmps, then we are not behaving as assumed. And as shown in PR49475, we can inf-loop.	2021-03-13 08:30:51 -05:00
Roman Lebedev	6e9b9978cf	[LSR] Don't try to fixup uses in 'EH pad' instructions The added test case crashes before this fix: ``` opt: /repositories/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:5172: BasicBlock::iterator (anonymous namespace)::LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator, const (anonymous namespace)::LSRFixup &, const (anonymous namespace)::LSRUse &, llvm::SCEVExpander &) const: Assertion `!isa<PHINode>(LowestIP) && !LowestIP->isEHPad() && !isa<DbgInfoIntrinsic>(LowestIP) && "Insertion point must be a normal instruction"' failed. ``` This is fully analogous to the previous commit, with the pointer constant replaced to be something non-null. The comparison here can be strength-reduced, but the second operand of the comparison happens to be identical to the constant pointer in the `catch` case of `landingpad`. While LSRInstance::CollectLoopInvariantFixupsAndFormulae() already gave up on uses in blocks ending up with EH pads, it didn't consider this case. Eventually, `LSRInstance::AdjustInsertPositionForExpand()` will be called, but the original insertion point it will get is the user instruction itself, and it doesn't want to deal with EH pads, and asserts as much. It would seem that this basically never happens in-the-wild, otherwise it would have been reported already, so it seems safe to take the cautious approach, and just not deal with such users.	2021-03-13 16:05:34 +03:00
Nikita Popov	2902bdeea1	[MemCpyOpt] Use AA to check for MustAlias between memset and memcpy Rather than checking for simple equality, check for MustAlias, as we do in other transforms. This catches equivalent GEPs.	2021-03-13 11:41:15 +01:00
Nikita Popov	9080444f33	[MemCpyOpt] Don't generate zero-size memset If a memset destination is overwritten by a memcpy and the sizes are exactly the same, then the memset is simply dead. We can directly drop it, instead of replacing it with a memset of zero size, which is particularly ugly for the case of a dynamic size.	2021-03-13 11:41:15 +01:00
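A minimal Python sketch of the memset-then-memcpy decision described above. It assumes concrete integer sizes, whereas the real pass works on LLVM IR values and handles the case where both sizes are the same dynamic value. `shorten_memset` is a hypothetical helper, not an LLVM API.

```python
from typing import Optional, Tuple


def shorten_memset(dst_size: int, src_size: int) -> Optional[Tuple[int, int]]:
    """Toy model: memset(dst, c, dst_size) followed by memcpy(dst, src, src_size).

    Returns None when the memset is dead, otherwise the (offset, length) of the
    memset that must remain to cover the bytes the memcpy does not overwrite.
    """
    if dst_size <= src_size:
        # The memcpy overwrites every byte the memset wrote (including the
        # equal-size case), so drop the memset instead of emitting a 0-byte one.
        return None
    # Only the tail past the copied region still needs the memset.
    return (src_size, dst_size - src_size)
```

For example, `shorten_memset(32, 16)` keeps a 16-byte memset at offset 16, while equal sizes drop the memset entirely.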
Wei Mi	ef9d7db723	[IndirectCallPromotion] Recommit "Don't strip ".__uniq." suffix when it strips ".llvm." suffix". The recommit fixes a bug in the previous commit where symbols beginning with "." were not handled properly. Original commit message: Currently IndirectCallPromotion simply strips everything after the first "." in LTO mode, in order to match the symbol name against the name with the ".llvm." suffix in the value profile. However, if -funique-internal-linkage-names and ThinLTO are both enabled, the name may have both a ".__uniq." suffix and a ".llvm." suffix, and the current mechanism strips both, which is unexpected. The patch fixes the problem. Differential Revision: https://reviews.llvm.org/D98389	2021-03-12 13:48:14 -08:00
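The suffix handling described above can be sketched in Python as follows. `strip_llvm_suffix` is a hypothetical helper, not the function LLVM uses. Starting the search at index 1 is an assumption made here to illustrate safe handling of names that begin with ".".

```python
LLVM_SUFFIX = ".llvm."


def strip_llvm_suffix(name: str) -> str:
    """Drop a ThinLTO-style ".llvm.<hash>" suffix, keeping everything before it,
    including a ".__uniq.<hash>" suffix.

    The search starts at index 1 (an assumption of this sketch), so a name that
    itself begins with "." is never reduced to an empty string.
    """
    pos = name.find(LLVM_SUFFIX, 1)
    return name if pos == -1 else name[:pos]
```

For example, `"foo.__uniq.123.llvm.456"` becomes `"foo.__uniq.123"`, whereas cutting at the first "." would have left only `"foo"`.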
Nikita Popov	42eb658f65	[OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC) This removes some (but not all) uses of the type-less CreateGEP() and CreateInBoundsGEP() APIs, which are incompatible with opaque pointers. There are still a number of tricky uses left, as well as many more variants of the CreateGEP APIs.	2021-03-12 21:01:16 +01:00
Florian Hahn	fb3ca70761	[LV] Account IV recipes being uniform in VPTransformState::get(). This patch fixes a crash when trying to get a scalar value using VPTransformState::get() for uniform induction values or truncated induction values. IVs and truncated IVs can be uniform and the updated code accounts for that, fixing the crash. This should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981	2021-03-12 13:29:06 +00:00
Sanjay Patel	bd197ed0a5	[SimplifyCFG] avoid sinking insts within an infinite-loop The test is reduced from a C source example in: https://llvm.org/PR49541 It's possible that the test could be reduced further or the predicate generalized further, but it seems to require a few ingredients (including the "late" SimplifyCFG options on the RUN line) to fall into the infinite-loop trap.	2021-03-12 08:04:57 -05:00
Hans Wennborg	f50aef745c	Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user" This broke the check-profile tests on Mac, see comment on the code review. > This is no longer needed, we can add __llvm_profile_runtime directly > to llvm.compiler.used or llvm.used to achieve the same effect. > > Differential Revision: https://reviews.llvm.org/D98325 This reverts commit `c7712087cb`. Also reverting the dependent follow-up commit: Revert "[InstrProfiling] Generate runtime hook for ELF platforms" > When using -fprofile-list to selectively apply instrumentation only > to certain files or functions, we may end up with a binary that doesn't > have any counters in the case where no files were selected. However, > because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the > runtime would still be pulled in and incur some non-trivial overhead, > especially in the case when the continuous or runtime counter relocation > mode is being used. A better way would be to pull in the profile runtime > only when needed by declaring the __llvm_profile_runtime symbol in the > translation unit only when needed. > > This approach was already used prior to `9a041a7522`, but we changed it > to always generate the __llvm_profile_runtime due to a TAPI limitation. > Since TAPI is only used on Mach-O platforms, we could use the early > emission of __llvm_profile_runtime there, and on other platforms we > could change back to the earlier approach where the symbol is generated > later only when needed. We can stop passing -u__llvm_profile_runtime to > the linker on Linux and Fuchsia since the generated undefined symbol in > each translation unit that needed it serves the same purpose. > > Differential Revision: https://reviews.llvm.org/D98061 This reverts commit `87fd09b25f`.	2021-03-12 13:53:46 +01:00
Serguei Katkov	cfe8f8e0f0	Revert "Mark gc.relocate and gc.result as readnone" As readnone functions they become movable, and LICM can hoist them out of a loop. As a result, in LCSSA form a phi node of token type is created. No code is prepared for the first operand of a GCRelocate to be a phi node; it is expected to be a token. The GVN tests were also updated; it seems GVN does not do what is expected. A test for LICM is also added. This reverts commit `f352463ade`.	2021-03-12 16:59:17 +07:00
Johannes Doerfert	ff256c1376	[Attributor] Derive `willreturn` based on `mustprogress` Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94125	2021-03-11 23:31:44 -06:00
Nikita Popov	2fe85dd289	[Attributor] Don't access pointer elem type in constructPointer (NFC) Splitting this out as the change is non-trivial: The way this code handled pointer types doesn't really make sense, as GEPs can only apply an offset to the outermost pointer, but can't drill down into interior pointer types (which would require dereferencing memory). Instead give special treatment to the first (pointer) index. I've hardcoded it to zero as that's the only way the function is used right now, but handling non-zero indexes would be straightforward. The original goal here was to have an element type for CreateGEP.	2021-03-11 21:36:40 +01:00
Petr Hosek	87fd09b25f	[InstrProfiling] Generate runtime hook for ELF platforms When using -fprofile-list to selectively apply instrumentation only to certain files or functions, we may end up with a binary that doesn't have any counters in the case where no files were selected. However, because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the runtime would still be pulled in and incur some non-trivial overhead, especially in the case when the continuous or runtime counter relocation mode is being used. A better way would be to pull in the profile runtime only when needed by declaring the __llvm_profile_runtime symbol in the translation unit only when needed. This approach was already used prior to `9a041a7522`, but we changed it to always generate the __llvm_profile_runtime due to a TAPI limitation. Since TAPI is only used on Mach-O platforms, we could use the early emission of __llvm_profile_runtime there, and on other platforms we could change back to the earlier approach where the symbol is generated later only when needed. We can stop passing -u__llvm_profile_runtime to the linker on Linux and Fuchsia since the generated undefined symbol in each translation unit that needed it serves the same purpose. Differential Revision: https://reviews.llvm.org/D98061	2021-03-11 12:29:01 -08:00
Valery N Dmitriev	73f94969b2	[SLP] Fix crash when matching associative reduction for integer min/max. The associative reduction matcher in SLP begins with a select instruction, but when it reached a call to llvm.umax (or similar) via the def-use chain, the latter was also matched as the UMax kind. The routine's later code assumes the matched instruction is a select, and thus it crashed on the first cast that did not fit. Differential Revision: https://reviews.llvm.org/D98432	2021-03-11 11:52:57 -08:00
Wenlei He	051f2c144e	[SamplePGO] Skip inlinee profile scaling for sample loader inlining For CGSCC inline, we need to scale down a function's branch weights and entry counts when it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weights for the inlined clone based on the call site count in updateCallerBFI. Neither is needed for inlining during sample profile loading, as it uses a context profile that is separate from the inlinee's own profile. This change skips the inlinee profile scaling for sample loader inlining. Differential Revision: https://reviews.llvm.org/D98187	2021-03-11 10:18:26 -08:00
Hiroshi Yamauchi	365b225d46	[PGO] Fix two issues in PGOMemOPSizeOpt. 1. PGOMemOPSizeOpt grabs only the first entries (up to five by default) from the value profile metadata and preserves the remaining entries for the fallback memop call site. If there are more than five entries, the rest of the entries would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes up to 3 (by default) values, but potentially not for other downstream passes that may use the value profile metadata. 2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept track of. When the range buckets were introduced, it was changed to skip the range buckets, but since it does not grab all entries (only five), if some range buckets exist in the first five entries, it could potentially cause fewer promotion opportunities (e.g. if 4 out of 5 were range buckets, it may be able to promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it means that wrong entries may be preserved, as it didn't correctly keep track of which entries were skipped. To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number of value profile buckets), keeps track of which entries were skipped, and preserves all the remaining entries. Differential Revision: https://reviews.llvm.org/D97592	2021-03-11 09:53:05 -08:00
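A toy Python model of the fixed entry selection described above. Range buckets are identified by a caller-supplied predicate `is_range_bucket`, which is a hypothetical stand-in, and the pass's count thresholds are omitted. It shows two things: range buckets no longer crowd out promotable sizes, and every non-promoted entry is preserved.

```python
from typing import Callable, List, Tuple

Entry = Tuple[int, int]  # (value, count) from value-profile metadata


def select_memop_sizes(entries: List[Entry],
                       is_range_bucket: Callable[[int], bool],
                       max_promote: int = 3) -> Tuple[List[Entry], List[Entry]]:
    """Return (promoted, remaining).

    All entries are scanned, not just the first five. Every entry that is not
    promoted, including skipped range buckets, is kept for the fallback call.
    """
    promoted: List[Entry] = []
    remaining: List[Entry] = []
    for value, count in entries:
        if len(promoted) < max_promote and not is_range_bucket(value):
            promoted.append((value, count))
        else:
            remaining.append((value, count))
    return promoted, remaining
```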
Stephen Tozer	f40976bd01	Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reverts commit `c0f3dfb9f1`. Reverted due to an error on the clang-x64-windows-msvc buildbot.	2021-03-11 14:48:01 +00:00
Nikita Popov	46354bac76	[OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC) Explicitly pass loaded type when creating loads, in preparation for the deprecation of these APIs. There are still a couple of uses left.	2021-03-11 14:40:57 +01:00
gbtozers	c0f3dfb9f1	[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic operations with two or more non-const operands; this includes the GetElementPtr instruction, and most Binary Operator instructions. These salvages produce DIArgList locations and are only valid for dbg.values, as currently variadic DIExpressions must use DW_OP_stack_value. This functionality is also only added for salvageDebugInfoForDbgValues; other functions that directly call salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be updated in a later patch. Differential Revision: https://reviews.llvm.org/D91722	2021-03-11 13:33:49 +00:00
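A toy Python evaluator showing how a salvaged location with several non-constant operands can be computed from a list of operands, which is the role DIArgList plays. The op names `arg`, `const`, `plus` and `mul` are hypothetical stand-ins for DW_OP_LLVM_arg, DW_OP_constu, DW_OP_plus and DW_OP_mul. This is not LLVM's DIExpression API.

```python
from typing import List, Sequence, Tuple

# Each op is a tuple: ("arg", n), ("const", c), ("plus",) or ("mul",).
Op = Tuple


def eval_salvaged(expr: Sequence[Op], args: Sequence[int]) -> int:
    """Evaluate a toy stack expression over a list of location operands."""
    stack: List[int] = []
    for op, *operand in expr:
        if op == "arg":              # stands in for DW_OP_LLVM_arg n
            stack.append(args[operand[0]])
        elif op == "const":          # stands in for DW_OP_constu c
            stack.append(operand[0])
        elif op in ("plus", "mul"):  # DW_OP_plus / DW_OP_mul
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs if op == "plus" else lhs * rhs)
        else:
            raise ValueError(f"unsupported op: {op!r}")
    # Like DW_OP_stack_value: the variable's value is the top of the stack.
    return stack[-1]


# %c = add i64 %a, %b  ->  location operands (%a, %b)
ADD_EXPR = [("arg", 0), ("arg", 1), ("plus",)]
# %q = getelementptr i32, i32* %p, i64 %i  ->  %p + %i * 4 (4-byte i32 assumed)
GEP_EXPR = [("arg", 0), ("arg", 1), ("const", 4), ("mul",), ("plus",)]
```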
Nikita Popov	403da6a69a	Reapply [LICM] Make promotion faster Relative to the previous implementation, this always uses aliasesUnknownInst() instead of aliasesPointer() to correctly handle atomics. The added test case was previously miscompiled. ----- Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-11 10:50:28 +01:00
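A toy Python sketch of the promotion filter described above, under stated assumptions. Equal pointer names model MustAlias, and `may_alias` is a caller-supplied predicate standing in for AA. The sketch omits the real pass's extra legality checks and its re-run after a successful promotion. Only loop-invariant accesses are grouped, so the alias queries are candidate-vs-candidate plus candidate-vs-other, never other-vs-other.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple, Set


class Access(NamedTuple):
    ptr: str         # toy pointer name; equal names model MustAlias
    invariant: bool  # is the pointer loop invariant?


def promotable_pointers(accesses: List[Access],
                        may_alias: Callable[[str, str], bool]) -> Set[str]:
    # Only loop-invariant accesses can be promoted, so only they are grouped.
    groups: Dict[str, List[Access]] = defaultdict(list)
    others: List[Access] = []
    for a in accesses:
        (groups[a.ptr] if a.invariant else others).append(a)
    result: Set[str] = set()
    for ptr in groups:
        # A set that may (but not must) alias another candidate set is not a
        # clean must-alias set.
        if any(q != ptr and may_alias(ptr, q) for q in groups):
            continue
        # Discard sets that may alias any definitely non-promotable access.
        if any(may_alias(ptr, o.ptr) for o in others):
            continue
        result.add(ptr)
    return result
```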
Djordje Todorovic	9f41c03f82	[Debugify][OriginalDIMode] Export the report into JSON file By using the original-di check with debugify in combination with llvm/utils/llvm-original-di-preservation.py, it becomes a very user-friendly tool. An example of the HTML page with the issues related to debug info can be found at [0]. [0] https://djolertrk.github.io/di-checker-html-report-example/ Differential Revision: https://reviews.llvm.org/D82546	2021-03-11 01:11:13 -08:00
Petr Hosek	c7712087cb	[InstrProfiling] Don't generate __llvm_profile_runtime_user This is no longer needed, we can add __llvm_profile_runtime directly to llvm.compiler.used or llvm.used to achieve the same effect. Differential Revision: https://reviews.llvm.org/D98325	2021-03-10 22:33:51 -08:00
Ruiling Song	8b7d3bed0f	[ValueMapper] Add debug output for metadata remapping This is useful for debugging which pointers are updated during remapping process. Differential Revision: https://reviews.llvm.org/D95775	2021-03-11 09:54:55 +08:00
kuterd	d75c9e61a5	[Attributor] Attributor call site specific AAValueConstantRange This patch makes uses of the context bridges introduced in D83299 to make AAValueConstantRange call site specific. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83744	2021-03-11 01:19:44 +03:00
Mauri Mustonen	0de8aeae72	[VPlan] Support to widen select instructions in VPlan native path Add support to widen select instructions in the VPlan native path by using the correct recipe when such instructions are encountered. This is already used by the inner loop vectorizer. Previously, select instructions were handled by the wrong recipe, which resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97136	2021-03-10 20:59:53 +00:00
Matteo Favaro	989051d5f8	[DSE] Extending isOverwrite to support offsetted fully overlapping stores The isOverwrite function tries to identify whether two stores fully overlap, and ideally we would like to identify all the instances of OW_Complete, as they yield potentially killable stores. The current implementation is incapable of spotting instances where the earlier store is offset relative to the later store but still fully overlapped. The limitation seems to lie in the computation of the base pointers with the GetPointerBaseWithConstantOffset API, which often yields different base pointers even if the stores are guaranteed to partially overlap (e.g. the alias analysis returns AliasResult::PartialAlias). The patch relies on the offsets computed and cached by BatchAAResults (available after D93529) to determine if the offsetted overlap is OW_Complete. Differential Revision: https://reviews.llvm.org/D97676	2021-03-10 21:09:33 +01:00
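Once both stores have byte offsets relative to a common base, the OW_Complete case above reduces to an interval-containment check. A minimal Python sketch, assuming the offsets were already obtained (the commit takes them from BatchAAResults). `is_complete_overwrite` is a hypothetical helper.

```python
def is_complete_overwrite(earlier_off: int, earlier_size: int,
                          later_off: int, later_size: int) -> bool:
    """The later store completely overwrites the earlier one if the earlier
    byte range [earlier_off, earlier_off + earlier_size) lies inside the later
    range. Offsets are relative to the same base pointer."""
    return (later_off <= earlier_off and
            earlier_off + earlier_size <= later_off + later_size)
```

For example, a 4-byte store at offset 4 is fully covered by a later 16-byte store at offset 0, even though their start addresses differ.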
Sriraman Tallam	0ba1ebcbb7	Remove original implementation of UniqueInternalLinkageNames pass. D96109 was recently submitted, which contains the refactored implementation of -funique-internal-linkage-names that adds the unique suffixes in clang rather than in an LLVM pass. This change deletes the former implementation. Differential Revision: https://reviews.llvm.org/D98234	2021-03-10 11:57:40 -08:00
gbtozers	81b8357e70	[DebugInfo][NFC] Refactor BinOp+GEP salvaging in salvageDebugInfoImpl This patch refactors out the salvaging of GEP and BinOp instructions into separate functions, in preparation for further changes to the salvaging of these instructions coming in another patch; there should be no functional change as a result of this refactor. Differential Revision: https://reviews.llvm.org/D92851	2021-03-10 18:03:12 +00:00
Daniil Seredkin	7c49f3c75b	[InstCombine][SimplifyLibCalls] An extra sqrtf was produced because of transformations in optimizePow function See: https://bugs.llvm.org/show_bug.cgi?id=47613 There was an extra sqrt call because shrinking emitted a new powf while optimizePow also replaced the original pow with sqrt. As a result, two instructions end up in InstCombine's worklist, even though %powf has no users (it is kept alive because of errno): %powf = call fast float @powf(float %x, float 5.000000e-01) %sqrt = call fast double @sqrt(double %dx) %powf is then converted to %sqrtf on a later iteration. As a quick fix, I moved shrinking to the end of optimizePow so that pow is replaced with sqrt first, which avoids emitting a new shrunk powf. Differential Revision: https://reviews.llvm.org/D98235	2021-03-10 12:33:05 -05:00
Ta-Wei Tu	7ff2768be1	Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`" This reverts commit `df9158c9a4`.	2021-03-11 01:24:43 +08:00
Jianzhou Zhao	6a9a686ce7	[dfsan] Tracking origins at phi nodes This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D98268	2021-03-10 17:02:58 +00:00
Dávid Bolvanský	c68b560be3	[DSE] Handle memmove with equal non-const sizes Follow up for fhahn's D98284. Also fixes a case from PR47644. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98346	2021-03-10 17:52:00 +01:00
Florian Hahn	8d9b9c0edc	[DSE] Handle memcpy/memset with equal non-const sizes. Currently DSE misses cases where the size is a non-const IR value, even if they match. For example, this means that llvm.memcpy/llvm.memset calls are not eliminated, even if they write the same number of bytes. This patch extends isOverwrite to try to get IR values for the number of bytes written from the analyzed instructions. If the values match, alias checks are performed and the result is returned. At the moment this only covers llvm.memcpy/llvm.memset. In the future, we may enable MemoryLocation to also track variable sizes, but this simple approach should allow us to cover the important cases in DSE. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98284	2021-03-10 10:13:58 +00:00
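A toy Python version of the size check described above, for two writes whose destinations must-alias. Sizes are either integer constants or strings naming a (hypothetical) IR value. A non-constant size is only treated as matching when both writes use the very same value. This is an illustration, not DSE's actual code.

```python
from typing import Union

Size = Union[int, str]  # int: constant size; str: name of an IR value


def later_overwrites_earlier(earlier_size: Size, later_size: Size,
                             must_alias: bool) -> bool:
    """Toy check for two memset/memcpy writes starting at the same address."""
    if not must_alias:
        return False
    if isinstance(earlier_size, int) and isinstance(later_size, int):
        # Constant sizes: the later write covers the earlier one if it is
        # at least as large.
        return later_size >= earlier_size
    # Non-constant sizes: nothing is known about their magnitude, so only
    # the identical IR value is safe.
    return earlier_size == later_size
```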
Wei Mi	ee35784a90	[SampleFDO] Support enabling -funique-internal-linkage-name. Now that the -funique-internal-linkage-name flag is available, we want to flip it on by default, since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regressions caused by the flip. When we flip -funique-internal-linkage-name on, the profile is collected from a binary built without -funique-internal-linkage-name, so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce a transient regression. To avoid such a mismatch, we introduce a NameTable section flag indicating whether any name in the profile contains a uniq suffix. The compiler will decide whether to keep the uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for the extbinary format. For other formats, by default the compiler will keep the uniq suffix, so they will only experience a transient regression when -funique-internal-linkage-name is first flipped. Another type of regression is caused by places where we failed to call getCanonicalFnName. Those places are fixed. Differential Revision: https://reviews.llvm.org/D96932	2021-03-09 21:41:40 -08:00
Philip Reames	cf1899e0a9	[rs4gc] common bdv operand visitation [nfc]	2021-03-09 20:28:47 -08:00
Arnold Schwaighofer	590ac0a26a	[coro async] Transfer the original function's attributes to the clone rdar://75052917 Differential Revision: https://reviews.llvm.org/D98051	2021-03-09 17:01:41 -08:00
Leonard Chan	cf371573b0	[llvm] Change DSOLocalEquivalent type if the underlying global value type changes We encountered an issue where LTO running on IR that used the DSOLocalEquivalent constant would result in bad codegen. The underlying issue was ValueMapper wasn't properly handling DSOLocalEquivalent, so this just adds the machinery for handling it. This code path is triggered by a fix to DSOLocalEquivalent::handleOperandChangeImpl where DSOLocalEquivalent could potentially not have the same type as its underlying GV. This updates DSOLocalEquivalent::handleOperandChangeImpl to change the type if the GV type changes and handles this constant in ValueMapper. Differential Revision: https://reviews.llvm.org/D97978	2021-03-09 15:09:48 -08:00
Sanjay Patel	23fd647cc6	[SLP] remove dead null check; NFC We cast<> to Instruction (not dyn_cast<>), so we already required/assumed that Cmp is not null.	2021-03-09 17:43:07 -05:00
Jianzhou Zhao	8506fe5b41	[dfsan] Tracking origins at memory transfer This is a part of https://reviews.llvm.org/D95835. Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D98192	2021-03-09 22:15:07 +00:00
Juneyoung Lee	f49354838e	Revert "[InstCombine] Add simplification of two logical and/ors" This reverts commit `07c3b97e18` due to a reported failure in two-stage build.	2021-03-10 05:48:31 +09:00
gbtozers	df69c69427	[DebugInfo] Handle multiple variable location operands in IR This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass. Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced. Differential Revision: https://reviews.llvm.org/D88232	2021-03-09 16:44:38 +00:00
Sanjay Patel	2986a9c7e2	[InstCombine] canonicalize 'not' op after min/max intrinsic This is another step towards parity between existing select transforms and min/max intrinsics (D98152). The existing 'not' folds around select are complicated, so it's likely that we will need to enhance this, but this should be a safe step.	2021-03-09 11:33:28 -05:00
Sanjay Patel	41b9209a12	[InstCombine] fold min/max intrinsics with not ops This is a partial translation of the existing select-based folds. We need to recreate several different transforms to avoid regressions as noted in D98152. https://alive2.llvm.org/ce/z/teuZ_J	2021-03-09 08:55:48 -05:00
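One identity behind folds of this kind is that bitwise 'not' reverses both signed and unsigned order. In two's complement, ~x equals -x-1 when read as signed and 2^n-1-x when read as unsigned. Consequently max(~a, ~b) == ~min(a, b) and min(~a, ~b) == ~max(a, b). The Python sketch below checks this exhaustively for 8-bit values. It illustrates the identity only and does not claim to list exactly which folds the commit adds.

```python
BITS = 8
MASK = (1 << BITS) - 1


def bnot(x: int) -> int:
    """Bitwise 'not' of an unsigned BITS-wide value."""
    return ~x & MASK


def as_signed(x: int) -> int:
    """Reinterpret an unsigned BITS-wide value as two's-complement signed."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def smax(a: int, b: int) -> int:
    return a if as_signed(a) >= as_signed(b) else b


def smin(a: int, b: int) -> int:
    return a if as_signed(a) <= as_signed(b) else b


def not_folds_hold() -> bool:
    """Check max(~a, ~b) == ~min(a, b) and min(~a, ~b) == ~max(a, b) for the
    unsigned (umax/umin) and signed (smax/smin) variants on all 8-bit pairs."""
    for a in range(1 << BITS):
        for b in range(1 << BITS):
            na, nb = bnot(a), bnot(b)
            # Unsigned: plain Python max/min on the 0..255 representation.
            if max(na, nb) != bnot(min(a, b)) or min(na, nb) != bnot(max(a, b)):
                return False
            # Signed: compare as two's-complement values.
            if smax(na, nb) != bnot(smin(a, b)) or smin(na, nb) != bnot(smax(a, b)):
                return False
    return True
```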
Florian Hahn	92da5b7119	[InstCombine] Simplify phis with incoming pointer-casts. If the incoming values of a phi are pointer casts of the same original value, replace the phi with a single cast. Such redundant phis are somewhat common after loop-rotate and removing them can avoid some unnecessary code bloat, e.g. because an iteration of a loop is peeled off to make the phi invariant. It should also simplify further analysis on its own. InstCombine already uses stripPointerCasts in a couple of places and also simplifies phis based on the incoming values, so the patch should fit in the existing scope. The patch causes binary changes in 47 out of 237 benchmarks in MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98058	2021-03-09 11:40:18 +00:00
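A toy Python model of the phi simplification described above. Values are plain names such as `"%p"`, and a tuple `("cast", operand)` stands for a pointer cast. This representation is an assumption of the sketch, not LLVM's IR classes. If every incoming value strips down to the same original value, the phi can be replaced by a single cast of it.

```python
from typing import List, Optional, Union

# Toy IR: a plain name such as "%p", or a nested ("cast", operand) tuple.
Value = Union[str, tuple]


def strip_casts(v: Value) -> Value:
    """Peel off every cast layer, in the spirit of stripPointerCasts()."""
    while isinstance(v, tuple) and v[0] == "cast":
        v = v[1]
    return v


def common_cast_source(incoming: List[Value]) -> Optional[Value]:
    """Return the original value if every incoming phi value is a cast chain of
    it (the phi can then become one cast of that value), else None."""
    roots = {strip_casts(v) for v in incoming}
    return next(iter(roots)) if len(roots) == 1 else None
```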
Hongtao Yu	4c3d759d00	[CSSPGO] Always use callsite samples as callsite probe counts. For CS profile, the callsite count of previously inlined callees is populated with the entry count of the callees. Therefore, when trying to get a weight for a callsite probe after inlining, the callsite count should always be used. The same fix has already been made for the non-probe case. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D98094	2021-03-08 22:52:36 -08:00
Akira Hatanaka	dca5737945	Move ObjCARCUtil.h back to llvm/Analysis Instead of adding the header to llvm/IR, just duplicate the marker string in the auto upgrader.	2021-03-08 16:35:24 -08:00
Alina Sbirlea	29482426b5	Revert "[LICM] Make promotion faster" Revert `3d8f842712` Revision triggers a miscompile sinking a store incorrectly outside a threading loop. Detected by tsan. Reverting while investigating. Differential Revision: https://reviews.llvm.org/D89264	2021-03-08 12:53:03 -08:00
Philip Reames	ebc61f9d3c	[instcombine] Collapse trivial or recurrences If we have a recurrence of the form <Start, Or, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (or part)	2021-03-08 09:21:38 -08:00
Philip Reames	239a618180	[instcombine] Collapse trivial and recurrences If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (and part)	2021-03-08 09:21:38 -08:00
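Why the `or` and `and` recurrences above stabilize can be checked with a short Python simulation. Both operations are idempotent, so x | s | s == x | s and x & s & s == x & s. The in-loop binop therefore yields `start op step` on every iteration, and the loop-carried dependence can be removed. The helpers below are hypothetical illustrations.

```python
import operator
from typing import Callable, List


def recurrence_values(start: int, step: int,
                      op: Callable[[int, int], int], n: int) -> List[int]:
    """Values of the in-loop binop `x = op(phi, step)` over n iterations, where
    phi starts at `start` and is then fed back from x."""
    phi, out = start, []
    for _ in range(n):
        x = op(phi, step)
        out.append(x)
        phi = x
    return out


def stabilizes_immediately(op: Callable[[int, int], int]) -> bool:
    """True if every iteration yields op(start, step) for all small inputs."""
    for start in range(16):
        for step in range(16):
            if any(v != op(start, step)
                   for v in recurrence_values(start, step, op, 5)):
                return False
    return True


if __name__ == "__main__":
    for op in (operator.or_, operator.and_):
        print(op.__name__, stabilizes_immediately(op))
```

As expected, the same check fails for `operator.xor`, which is not idempotent.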
Philip Reames	97a7bc5831	[gvn] Precisely propagate equalities to phi operands The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use). Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify. Differential Revision: https://reviews.llvm.org/D98082	2021-03-08 08:59:00 -08:00
Sanne Wouda	05a6e2eb9a	[InstCombine] Add a combine for a shuffle of similar bitcasts Some intrinsics wrapper code has the habit of ignoring the type of the elements in vectors, thinking of vector registers as a "bag of bits". As a consequence, some operations are shared between vectors of different types. For example, functions that rearrange elements in a vector can be shared between vectors of int32 and float. This can result in bitcasts in awkward places that prevent the backend from recognizing some instructions. For AArch64 in particular, it inhibits the selection of dup from a general purpose register (GPR), and mov from GPR to a vector lane. This patch adds a pattern in InstCombine to move the bitcasts past the shufflevector if this is possible. Sometimes this even allows InstCombine to remove the bitcast entirely, as in the included tests. Alternatively this could be done with a few extra patterns in the AArch64 backend, but InstCombine seems like a better place for this. Differential Revision: https://reviews.llvm.org/D97397	2021-03-08 16:32:30 +00:00
Sanne Wouda	5e963a2441	Rehome an orphaned comment [NFC] As seen in `35827164c4`, the "shuffle x, x, mask" comment has drifted away from the implementation of the pattern. Put it back.	2021-03-08 16:32:30 +00:00
Stephen Tozer	4343c68fa3	Fix: [DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch removed the only use of a lambda capture, triggering an error on `-Werror -Wunused-lambda-capture` builds.	2021-03-08 14:57:11 +00:00
gbtozers	e5d958c456	[DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.	2021-03-08 14:36:13 +00:00
Ta-Wei Tu	df9158c9a4	[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest` The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`. In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour. Replacing it with the much more robust version provided by `LoopNest` reduces code duplication and fixes https://bugs.llvm.org/show_bug.cgi?id=48113. `LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`. Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D97290	2021-03-08 11:36:08 +08:00
Mehdi Amini	8d5a981a13	Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This reverts commit `99108c791d`. Clang is miscompiling LLVM with this change, a stage-2 build hits multiple failures. As a repro, I built clang in a stage1 directory and used it this way: cmake -G Ninja ../llvm \ -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \ -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \ -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \ -DLLVM_ENABLE_PROJECTS=mlir \ -DLLVM_BUILD_EXAMPLES=ON \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_ENABLE_ASSERTIONS=On ninja check-mlir	2021-03-08 00:15:47 +00:00
Juneyoung Lee	07c3b97e18	[InstCombine] Add simplification of two logical and/ors This is a patch that adds folding of two logical and/ors that share one variable: a && (a && b) -> a && b a && (a & b) -> a && b ... This is towards removing the poison-unsafe select optimization (D93065 has more context). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96945	2021-03-08 02:38:43 +09:00
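A small three-valued Python model (false, true, poison) makes the folds above checkable. A logical and (`select i1 a, i1 b, i1 false`) only lets poison in `b` through when `a` is true, whereas a bitwise `and` is poison if either operand is. A replacement is sound if, for every input, the target agrees with the source or the source was already poison. The model is a simplification of LLVM's poison semantics for i1 values.

```python
POISON = "poison"  # toy stand-in for an i1 poison value
VALUES = [False, True, POISON]


def logical_and(a, b):
    """`select i1 a, i1 b, i1 false`: poison in b only matters when a is true."""
    if a == POISON:
        return POISON
    return b if a else False


def bit_and(a, b):
    """`and i1 a, b`: poison if either operand is poison."""
    if a == POISON or b == POISON:
        return POISON
    return a and b


def refines(src, tgt) -> bool:
    """tgt may replace src if they agree, or if src was poison to begin with."""
    return src == POISON or src == tgt


def fold_is_sound(src_fn, tgt_fn) -> bool:
    """Exhaustively check that tgt_fn refines src_fn on all input pairs."""
    return all(refines(src_fn(a, b), tgt_fn(a, b)) for a in VALUES for b in VALUES)
```

Both listed folds pass this check. Replacing `a && b` with `a & b` does not: with `a` false and `b` poison, the select yields false but the `and` yields poison. That is the poison-unsafety these select-form changes aim to avoid.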
Juneyoung Lee	d672c81126	[InstCombine] use safe transformation by default .. since it will be folded into and/or anyway	2021-03-08 02:25:29 +09:00
Nikita Popov	2b494f85f1	[CVP] Remove -cvp-dont-add-nowrap-flags option This option was originally added to work around a bug in LFTR. The bug has long since been fixed.	2021-03-07 18:19:31 +01:00
Nikita Popov	176bbcae11	[DSE] Remove MemDep-based implementation The MemorySSA-based implementation has been enabled without issue for a while now, so keeping the old implementation around doesn't seem useful anymore. This drops the MemDep-based implementation. Differential Revision: https://reviews.llvm.org/D97877	2021-03-07 18:17:31 +01:00
Juneyoung Lee	33590ed4f2	[InstCombine] fix another poison-unsafe select transformation This fixes another unsafe select folding by disabling it if EnableUnsafeSelectTransform is set to false. EnableUnsafeSelectTransform's default value is true, hence it won't affect generated code (unless the flag is explicitly set to false).	2021-03-08 02:11:04 +09:00
Juneyoung Lee	99108c791d	[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG. It is known that merging conditions into and/or is poison-unsafe, and this is towards making things more correct by removing possible miscompilations. Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future. There have been efforts to update optimizations to support the select form as well, and they are linked to D93065. The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true. Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact on performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95026	2021-03-08 01:38:03 +09:00
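Why `and i1` is poison-unsafe while `select i1` is not can be shown with a simplified model of the merged branch: originally `%b` is only inspected when `%a` is true, so a poison `%b` is harmless when `%a` is false. This is a sketch under stated assumptions, not the SimplifyCFG code; `and_i1`, `select_i1`, `branch_on`, `original_cfg` and `mismatches` are hypothetical, and "branching on poison is UB" is the only LangRef rule modelled.

```python
# Toy model of i1 values with poison; all names here are hypothetical.
POISON = "poison"
VALUES = [False, True, POISON]


def and_i1(a, b):
    """`and i1 a, b`: poison propagates from either operand."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def select_i1(c, t, f):
    """`select i1 c, t, f`: poison if c is poison; an arm only matters if chosen."""
    if c is POISON:
        return POISON
    return t if c else f


def branch_on(cond):
    """Branching on poison is undefined behaviour."""
    if cond is POISON:
        return "UB"
    return "taken" if cond else "exit"


def original_cfg(a, b):
    """br a -> (br b -> taken / exit) / exit: b is only inspected when a is true."""
    if a is POISON:
        return "UB"
    if not a:
        return "exit"
    return branch_on(b)


def mismatches(merge):
    """Inputs for which branching on merge(a, b) differs from the original CFG."""
    return [(a, b) for a in VALUES for b in VALUES
            if branch_on(merge(a, b)) != original_cfg(a, b)]


print(mismatches(and_i1))                               # [(False, 'poison')]
print(mismatches(lambda a, b: select_i1(a, b, False)))  # []
```

The single mismatch for `and_i1` is exactly the case the select form avoids: `%a` false, `%b` poison.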
Juneyoung Lee	5bb38e84d3	[LoopUnswitch] unswitch if cond is in select form of and/or as well Hello all, I'm trying to fix unsafe propagation of poison values in and/or conditions by using equivalent select forms (`select i1 A, i1 B, i1 false` and `select i1 A, i1 true, i1 B`) instead. D93065 has links to patches for this. This patch allows unswitch to happen if the condition is in this form as well. `collectHomogenousInstGraphLoopInvariants` is updated to keep traversing if Root and the visited I both match m_LogicalOr()/m_LogicalAnd(). Other than this, the remaining changes are almost straightforward and simply replace Instruction::And/Or checks with match(m_LogicalOr()/m_LogicalAnd()). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D97756	2021-03-08 01:19:43 +09:00
Nikita Popov	3fedaf2a52	[GVN] Don't explicitly materialize undefs from dead blocks When materializing an available load value, do not explicitly materialize the undef values from dead blocks. Doing so will force creation of a phi with an undef operand, even if there is a dominating definition. The phi will be folded away on subsequent GVN iterations, but by then we may have already poisoned MDA cache slots. Simply don't register these values in the first place, and let SSAUpdater do its thing.	2021-03-06 23:46:24 +01:00
Fangrui Song	fb2cf0dd60	[FunctionImport] Delete unneeded setLive. NFC ValueInfo's in Worklist are guaranteed to be live.	2021-03-06 14:09:54 -08:00
Mauri Mustonen	494b5ba364	[VPlan] Support to widen call instructions in VPlan native path Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer. Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Patch by Mauri Mustonen <mauri.mustonen@tuni.fi> Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97278	2021-03-06 21:59:52 +00:00
Roman Lebedev	2ad1f5eb1a	[InstCombine] Don't canonicalize (gep i8* X, -(ptrtoint Y)) as (inttoptr (sub (ptrtoint X), (ptrtoint Y))) It's just the wrong thing to do. We introduce inttoptr where there were none, which results in losing all provenance information because we no longer have a GEP{i,}, and pessimize all future optimizations, because we are basically not allowed to look past `inttoptr`. (gep i8* X, -(ptrtoint Y)) is the canonical form. So just drop this fold. Noticed while reviewing D98120.	2021-03-06 23:00:25 +03:00
Fangrui Song	9fb6782c69	[rs4gc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds	2021-03-06 11:42:27 -08:00
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select, are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00
Ta-Wei Tu	8a003861a3	[NPM] Add -enable-loopinterchange option to NPM We have the `enable-loopinterchange` option in legacy pass manager but not in NPM. Add `LoopInterchange` pass to the optimization pipeline (at the same position as before) when `enable-loopinterchange` is turned on. Reviewed By: aeubanks, fhahn Differential Revision: https://reviews.llvm.org/D98116	2021-03-07 02:39:28 +08:00
William S. Moses	d163e75c81	[Attributor] Enable heap-to-stack of any size Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1 Differential Revision: https://reviews.llvm.org/D97873	2021-03-06 12:57:32 -05:00
Philip Reames	5db2735af9	[gvn] Handle simple phi equivalence cases GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet. However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg end up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress. This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one of the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.) Compile-time-wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect. Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi. Differential Revision: https://reviews.llvm.org/D98080	2021-03-06 09:31:12 -08:00
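The preprocessing step described above can be sketched as a hash-based pass over one block's phis, keyed on their incoming (block, value) pairs in order. This is a simplified model, not the GVN implementation: the tuple encoding of a phi and the name `cse_identical_phis` are hypothetical. Keying on the ordered pairs is what makes the check purely syntactic, and why it depends on the incoming-block order having been canonicalized beforehand.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a phi is (name, ((pred_block, incoming_value), ...)).
Incoming = Tuple[Tuple[str, str], ...]
Phi = Tuple[str, Incoming]


def cse_identical_phis(phis: List[Phi]) -> Dict[str, str]:
    """Map each redundant phi name to the earlier phi it duplicates."""
    seen: Dict[Incoming, str] = {}
    replacements: Dict[str, str] = {}
    for name, incoming in phis:
        # Syntactic identity: same (block, value) pairs in the same order.
        leader = seen.setdefault(incoming, name)
        if leader != name:
            replacements[name] = leader
    return replacements


block_phis = [
    ("%p1", (("%bb1", "%x"), ("%bb2", "%y"))),
    ("%p2", (("%bb1", "%x"), ("%bb2", "%y"))),  # duplicate of %p1
    ("%p3", (("%bb2", "%y"), ("%bb1", "%x"))),  # same values, different order
]
print(cse_identical_phis(block_phis))  # {'%p2': '%p1'}
```

`%p3` is left alone because its incoming order differs; in the commit's setting, instcombine's canonicalization is what makes such phis line up.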
Philip Reames	8fe59ba51e	[rs4gc] track the original value in the state used for base pointer rewriting I'd originally intended to build on this for another purpose and have decided not to, but at a minimum, the stronger asserts are useful.	2021-03-06 08:46:15 -08:00
Philip Reames	6334952ff0	[rs4gc] minor code style improvement	2021-03-06 08:46:15 -08:00
Philip Reames	51b13a7ea0	[gvn] CSE gc.relocates based on meaning, not spelling The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoint's multiple return values.) We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particularly useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass. Differential Revision: https://reviews.llvm.org/D97974 (Note: Part of the reviewed change was split and landed as `f352463a`)	2021-03-05 10:16:12 -08:00
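The idea of keying on meaning rather than spelling can be sketched by looking the two indices up in the statepoint's gc bundle before comparing. This is a hypothetical model, not the GVN code: the tuple layout `(name, statepoint, base_idx, derived_idx)` and the helpers `relocate_key` and `cse_relocates` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a gc.relocate is (name, statepoint, base_idx, derived_idx),
# and each statepoint has a gc bundle (list of SSA value names).
Relocate = Tuple[str, str, int, int]


def relocate_key(bundles: Dict[str, List[str]], r: Relocate) -> Tuple[str, str, str]:
    """Key a relocate by the values its indices refer to, not the index values."""
    _, statepoint, base_idx, derived_idx = r
    bundle = bundles[statepoint]
    return (statepoint, bundle[base_idx], bundle[derived_idx])


def cse_relocates(bundles: Dict[str, List[str]], relocates: List[Relocate]) -> Dict[str, str]:
    """Map every relocate name to the first equivalent relocate."""
    leaders: Dict[Tuple[str, str, str], str] = {}
    result: Dict[str, str] = {}
    for r in relocates:
        result[r[0]] = leaders.setdefault(relocate_key(bundles, r), r[0])
    return result


bundles = {"%sp": ["%obj", "%obj", "%derived"]}  # %obj appears twice
relocates = [
    ("%r1", "%sp", 0, 0),
    ("%r2", "%sp", 1, 1),  # different indices, same (base, derived) values
    ("%r3", "%sp", 0, 2),
]
print(cse_relocates(bundles, relocates))
# {'%r1': '%r1', '%r2': '%r1', '%r3': '%r3'}
```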
Philip Reames	f352463ade	Mark gc.relocate and gc.result as readnone For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism. gc.relocate and gc.result are simply projections of the return values from the associated statepoint. Note that the LangRef has always declared them readnone. The EarlyCSE change is simply moving the special casing from readonly to readnone handling. As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice. This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing. The motivation is to enable the GVN changes in that patch.	2021-03-05 10:07:17 -08:00
Philip Reames	99f93dd3a5	[rs4gc] avoid inserting base computation instructions for deopt uses If we have a value live over a call which is used for deopt at the call, we know that the value must be a base pointer. We can avoid potentially inserting IR to materialize a base for this value. In its current form, this is mostly a compile time optimization. Building the base pointer graph (and then optimizing it away again) is a relatively expensive operation. We also sometimes end up with better codegen in practice - due to failures in optimizing away the inserted base pointer propagation - but those are optimization bugs we're fixing concurrently. The alternative to this would be to extend the base pointer inference with the ability to generally reuse multiple-base input instructions (phis and selects). That's somewhat invasive and complicated, so we're deferring it a bit longer. Differential Revision: https://reviews.llvm.org/D97885	2021-03-05 09:55:36 -08:00
gbtozers	65600cb2a7	[DebugInfo] Add DIArgList MD to store multiple values in DbgVariableIntrinsics This patch adds a new metadata node, DIArgList, which contains a list of SSA values. This node is in many ways similar in function to the existing ValueAsMetadata node, with the difference being that it tracks a list instead of a single value. Internally, it uses ValueAsMetadata to track the individual values, but there is also a reasonable amount of DIArgList-specific value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special case in parsing and printing due to the fact that it requires a function state (as it may reference function-local values). This patch should not result in any immediate functional change; it allows for DIArgLists to be parsed and printed, but debug variable intrinsics do not yet recognize them as a valid argument (outside of parsing). Differential Revision: https://reviews.llvm.org/D88175	2021-03-05 17:02:24 +00:00
David Sherwood	fec0a0adac	[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector There are certain loops like this below: for (int i = 0; i < n; i++) { a[i] = b[i] + 1; *inv = a[i]; } that can only be vectorised if we are able to extract the last lane of the vectorised form of 'a[i]'. For fixed width vectors this already works since we know at compile time what the final lane is, however for scalable vectors this is a different story. This patch adds support for extracting the last lane from a scalable vector using a runtime determined lane value. I have added support to VPIteration for runtime-determined lanes that still permit the caching of values. I did this by introducing a new class called VPLane, which describes the lane we're dealing with and provides interfaces to get both the compile-time known lane and the runtime determined value. Whilst doing this work I couldn't find any explicit tests for extracting the last lane values of fixed width vectors so I added tests for both scalable and fixed width vectors. Differential Revision: https://reviews.llvm.org/D95139	2021-03-05 09:57:56 +00:00
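The effect of extracting the last lane at a runtime-determined index can be simulated by running the example loop with a vector width of `vscale * vf_min` and comparing it against the scalar loop. This is a behavioural sketch only, not the VPlan/VPLane code: `scalar_loop`, `vector_loop` and `vf_min` are hypothetical, and it assumes the trip count is a multiple of the width so that no scalar epilogue is needed.

```python
def scalar_loop(b):
    """Reference: for (i = 0; i < n; i++) { a[i] = b[i] + 1; *inv = a[i]; }"""
    a = [0] * len(b)
    inv = None
    for i in range(len(b)):
        a[i] = b[i] + 1
        inv = a[i]
    return a, inv


def vector_loop(b, vscale, vf_min=4):
    """Same loop, vectorized with a runtime width of vscale * vf_min lanes.

    Assumes len(b) is a multiple of the width (no scalar epilogue).
    """
    width = vscale * vf_min  # not a compile-time constant for scalable VFs
    assert len(b) % width == 0, "sketch assumes no remainder iterations"
    a = [0] * len(b)
    last_vec = None
    for i in range(0, len(b), width):
        last_vec = [x + 1 for x in b[i:i + width]]
        a[i:i + width] = last_vec
    # *inv receives the last lane of the final vector; its index is known only at run time.
    inv = last_vec[width - 1] if last_vec is not None else None
    return a, inv


b = list(range(16))
for vscale in (1, 2, 4):
    assert vector_loop(b, vscale) == scalar_loop(b)
```

For fixed-width vectors `width - 1` is a constant; the point of the commit is that for scalable vectors the same index has to be computed from `vscale` at run time.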
Michael Kruse	b119120673	[clang][OpenMP] Use OpenMPIRBuilder for workshare loops. Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to: * Only the worksharing-loop directive. * Recognizes only the nowait clause. * No loop nests with more than one loop. * Untested with templates, exceptions. * Semantic checking left to the existing infrastructure. This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adhere to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics: * The distance function: How many loop iterations there will be before entering the loop nest. * The loop variable function: Conversion from a logical iteration number to the loop variable. These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logic should be done by the OpenMPIRBuilder so that it can be reused by the MLIR OpenMP dialect and thus by flang. The distance and loop variable functions are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more intertwined with the parser). It is up to the OpenMPIRBuilder how they are called, which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does. For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt.
Although OMPCanonicalLoops are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting. Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset. Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D94973	2021-03-04 22:52:59 -06:00
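For a simple `for (i = start; i < end; i += step)` loop with a positive step, the distance and loop variable functions described above reduce to a ceiling division and an affine map. This is a minimal sketch, assuming only that loop shape; `distance`, `loop_variable` and the even-split `static_chunk` schedule are hypothetical illustrations, not clang's CapturedStmt lowering or the OpenMPIRBuilder's actual schedule.

```python
def distance(start, end, step):
    """Trip count of `for (i = start; i < end; i += step)`, assuming step > 0."""
    if start >= end:
        return 0
    return (end - start + step - 1) // step


def loop_variable(start, step, logical_iv):
    """Map a logical iteration number in [0, distance) back to the value of i."""
    return start + logical_iv * step


def static_chunk(trip_count, num_threads, tid):
    """Hypothetical static schedule: a contiguous block of logical iterations for tid."""
    base, extra = divmod(trip_count, num_threads)
    lo = tid * base + min(tid, extra)
    hi = lo + base + (1 if tid < extra else 0)
    return range(lo, hi)


start, end, step = 3, 20, 4
n = distance(start, end, step)
for tid in range(2):
    print(tid, [loop_variable(start, step, k) for k in static_chunk(n, 2, tid)])
# 0 [3, 7, 11]
# 1 [15, 19]
```

The scheduling code only ever sees logical iteration numbers in `[0, n)`; the base-language semantics live entirely in the two callbacks.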
Wei Mi	2357d29335	[SampleFDO] Another fix to prevent repeated indirect call promotion in sample loader pass. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, to prevent repeated indirect call promotion for the same indirect call and the same target, we used zero-count value profile to indicate an indirect call has been promoted for a certain target. We removed PromotedInsns cache in the same patch. However, there was a problem in that patch described below, and that problem led me to add PromotedInsns back as a mitigation in https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf. When we get value profile from metadata by calling getValueProfDataFromInst, we need to specify the maximum possible number of values we expect to read. We used MaxNumPromotions in the last patch, so the maximum number of values extracted from metadata is MaxNumPromotions. If we have many values including zero-count values when we write the metadata, some of them will be dropped when we read them because we only read MaxNumPromotions values. It will allow repeated indirect call promotion again. We need to make sure if there are values indicating promoted targets, those values need to be saved in metadata with higher priority than other values. This patch fixes that problem. We now use -1 to represent the count of a promoted target instead of 0 so it is easier to sort the values. When we prepare to update the metadata in updateIDTMetaData, we will sort the values in the descending count order and extract only MaxNumPromotions values to write into metadata. Since -1 is the max uint64_t number, if we have MaxNumPromotions or fewer -1 count values, they will all be kept in metadata. If we have more than MaxNumPromotions -1 count values, we will save at most MaxNumPromotions such values.
In such case, we have logic in place in doesHistoryAllowICP to guarantee no more promotion in sample loader pass will happen for the indirect call, because it has been promoted enough. With this change, now we can remove PromotedInsns without problem. Differential Revision: https://reviews.llvm.org/D97350	2021-03-04 18:44:12 -08:00
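The sort-then-truncate step can be sketched in a few lines, with -1 represented as the largest unsigned 64-bit value so that promoted targets always sort first. This is an illustrative model only; `values_to_write` is a hypothetical stand-in for the logic in `updateIDTMetaData`, and the zero-count comparison simply applies the same truncation to the old marker for contrast.

```python
UINT64_MAX = 2**64 - 1  # the count -1 reinterpreted as an unsigned 64-bit value


def values_to_write(value_profile, max_num_promotions):
    """Sort (target, count) pairs by descending count and keep the first N.

    Python's sort is stable, so ties keep their original order.
    """
    ordered = sorted(value_profile, key=lambda tv: tv[1], reverse=True)
    return ordered[:max_num_promotions]


# New encoding: promoted targets carry count UINT64_MAX and always sort first.
new = [("foo", 500), ("bar", UINT64_MAX), ("baz", 20), ("qux", 900)]
print([t for t, _ in values_to_write(new, 2)])  # ['bar', 'qux']

# A zero-count marker sorts last and can be truncated away.
old = [("foo", 500), ("bar", 0), ("baz", 20), ("qux", 900)]
print([t for t, _ in values_to_write(old, 2)])  # ['qux', 'foo']
```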
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Jianzhou Zhao	db7fe6cd4b	[dfsan] Propagate origin tracking at store This is a part of https://reviews.llvm.org/D95835. Reviewed By: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D97789	2021-03-04 23:34:44 +00:00
William S. Moses	2b896e39bf	Revert "[Attributor] Enable heap-to-stack of any size" This reverts commit `51bd42ef9b`.	2021-03-04 17:24:56 -05:00
Sanjay Patel	1bee549737	[LoopVectorize] propagate fast-math-flags from induction instructions This code assumed that FP math was only permissible if it was fully "fast", so it hard-coded "fast" when creating new instructions. The underlying code already allows matching recurrences/reductions that are only "reassoc", so this change should prevent the potential miscompile seen in the test diffs (we created "fast" ops even though none existed in the original code). I don't know if we need to create the temporary IRBuilder objects used here, so that could be follow-up clean-up. There's an open question about whether we should require "nsz" in addition to "reassoc" here. InstCombine uses that combo for its reassociative folds, but I think codegen is not as strict.	2021-03-04 17:21:32 -05:00
William S. Moses	51bd42ef9b	[Attributor] Enable heap-to-stack of any size Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1 Differential Revision: https://reviews.llvm.org/D97873	2021-03-04 17:17:23 -05:00
Francis Visoiu Mistrih	365b78396a	[Remarks] Emit variable info in auto-init remarks This enhances the auto-init remark with information about the variable that is auto-initialized. This is based on debug info if available, or alloca names (mostly for development purposes). ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` This makes it possible to see things like partial initialization of a variable that the optimizer won't be able to completely remove. Differential Revision: https://reviews.llvm.org/D97734	2021-03-04 12:51:22 -08:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Adrian Prantl	d268febc56	Improve the debug info for coro-split .resume functions This patch updates the scope line to point to the suspend point. This makes the first address in the function point to the first source line in the resume function rather than the function declaration. Without this the line table "jumps" from the beginning of the function to the suspend point at the beginning. rdar://73386346 Differential Revision: https://reviews.llvm.org/D97345	2021-03-04 11:05:35 -08:00
Jianzhou Zhao	72abc9bf07	[dfsan] add a missing zero origin at atomic commands	2021-03-04 16:50:05 +00:00
Alexey Bataev	04ba80ca4d	[Instcombiner]Improve emission of logical or/and reductions. For logical or/and reductions we emit regular @llvm.vector.reduce.or/and.vxi1 intrinsic calls. These intrinsics are not effective for the logical or/and reductions, especially if the optimizer is able to emit short circuit versions of the scalar or/and instructions and vector code gets less effective than the scalar version. Instead, an or-reduction of i1 can be represented as: ``` %val = bitcast <ReduxWidth x i1> to iReduxWidth %res = icmp ne iReduxWidth %val, 0 ``` and an and-reduction of i1 can be represented as: ``` %val = bitcast <ReduxWidth x i1> to iReduxWidth %res = icmp eq iReduxWidth %val, -1 ``` This significantly improves the performance of the vector code and makes it outperform short circuit scalar code. Part of D57059. Differential Revision: https://reviews.llvm.org/D97406	2021-03-04 08:01:02 -08:00
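The bitcast trick above can be modelled by packing the `i1` lanes into an integer: the or-reduction is "any bit set" and the and-reduction is "all bits set", i.e. equal to the all-ones constant (`-1` in `iN`). This is a minimal sketch; `pack`, `reduce_or` and `reduce_and` are hypothetical helpers, and the lane-to-bit order chosen in `pack` does not affect either reduction.

```python
def pack(lanes):
    """Model `bitcast <N x i1> %v to iN`: lane k becomes bit k."""
    value = 0
    for k, lane in enumerate(lanes):
        if lane:
            value |= 1 << k
    return value


def reduce_or(lanes):
    return pack(lanes) != 0           # icmp ne iN %val, 0


def reduce_and(lanes):
    all_ones = (1 << len(lanes)) - 1  # the iN constant -1
    return pack(lanes) == all_ones    # icmp eq iN %val, -1


# Exhaustive check for 4 lanes against Python's any()/all().
for mask in range(16):
    lanes = [bool((mask >> k) & 1) for k in range(4)]
    assert reduce_or(lanes) == any(lanes)
    assert reduce_and(lanes) == all(lanes)
```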
Sanjay Patel	36a489d194	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC Similar to `b3a33553ae`, but this shows a TODO and a potential miscompile is already present. We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification may be possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers. The new test shows that there is an existing bug somewhere in the callers. We assumed that the original code was fully 'fast' and so we produced IR with 'fast' even though it was just 'reassoc'.	2021-03-04 10:40:26 -05:00
Sanjay Patel	b3a33553ae	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification seems possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers.	2021-03-04 08:53:04 -05:00
Hongtao Yu	c75da238b4	[CSSPGO] Deduplicating dangling pseudo probes. Identical dangling probes are redundant since they all have the same semantics, which is to rely on the counts inference tool to get a reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading create tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far. This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added. An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed. Differential Revision: https://reviews.llvm.org/D97482	2021-03-03 22:44:42 -08:00
Hongtao Yu	8985515822	[CSSPGO] Unblocking optimizations by dangling pseudo probes. This change fixes a couple of places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments. The optimizations being unblocked are: 1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted. 2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only have one unconditional jump instruction. 3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks. Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D97481	2021-03-03 22:44:42 -08:00
Johannes Doerfert	5b70c12f3e	[Attributor] Make DepClass a required argument We often used a sub-optimal dependence class in the past because we didn't see the argument. Let's make it explicit so we remember to think about it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	e592dad82e	[Attributor] Fold "TrackDependence" into the DepClassTy enum We don't need a bool and an enum to express the three options we currently have. This makes the interface nicer and much easier to use optional dependencies. Also avoids mistakes where the bool is false and enum ignored.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c8c93fdf0a	[Attributor] Avoid work for GEPs and wait till the users are visited	2021-03-04 00:35:52 -06:00
Johannes Doerfert	f3f88287c5	[Attributor] Use known alignment as lower bound to avoid work If we know already more than available from a use, we don't need to invest time on it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	c14213e030	[Attributor][NFC] Move some trivial checks up	2021-03-04 00:35:52 -06:00
Johannes Doerfert	09c3eebf5f	[Attributor] Use sensible initialization in AANoCaptureCallSiteReturned	2021-03-04 00:35:51 -06:00
Evgeniy Brevnov	e94125f054	[DSE] Add support for not aligned begin/end This is an attempt to improve handling of partial overlaps in case of unaligned begin/end. The existing implementation just bails out if it encounters such cases. Even when it doesn't, I believe the existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset while it should be preserving relative alignment between earlier and later start/end. The idea behind the change is simple. When start/end is not aligned as we wish, instead of bailing out, let's adjust it as necessary to get the desired alignment. I'll update with performance results as measured by the test-suite...it's still running... Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D93530	2021-03-04 12:24:23 +07:00
Serguei Katkov	a0ff0f30df	[InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase statepoint intrinsic can be used in invoke context, so it should be handled in visitCallBase to cover both call and invoke. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D97833	2021-03-04 11:00:22 +07:00
Xun Li	03f668613c	[LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks the semantics of the coro.suspend intrinsic, which returns to the caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in a multithreaded environment. This patch disables promotion by checking whether the loop contains coro.suspend intrinsics. This is a resubmit of D86190. Disabling LICM for loops with coroutine suspension is a better option not only for correctness purposes but also for performance purposes. In most cases LICM sinks memory operations. In the case of coroutines, sinking memory operations out of the loop does not improve performance since the coroutine needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame. Differential Revision: https://reviews.llvm.org/D96928	2021-03-03 15:21:57 -08:00
Whitney Tsang	58d531fd6f	[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable. Reviewed By: Meinersbur, bmahjour Differential Revision: https://reviews.llvm.org/D97747	2021-03-03 20:43:31 +00:00
Philip Reames	89d331a31e	Address review comment from D97219 (follow up to `8051156`) Probably should have done this before landing, but I forgot. Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything. Also happens to set us up for handling non-add recurrences in the future if desired.	2021-03-03 12:20:27 -08:00
Philip Reames	99f5417346	Sink routine for replacing an operand bundle to CallBase [NFC] We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.	2021-03-03 12:07:55 -08:00
Philip Reames	805115655e	[LSR] Unify scheduling of existing and inserted addrecs LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSR's cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold. The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but its cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment. As you can see from the relative lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow-up patch series which improves post-increment use and scheduling easier to follow. Differential Revision: https://reviews.llvm.org/D97219	2021-03-03 12:07:55 -08:00
Fangrui Song	a84f4fc0df	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker does not let `__start_/__stop_` references retain their sections. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on COFF/Mach-O are not affected. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D97649	2021-03-03 11:32:24 -08:00
Arnold Schwaighofer	a42bea211a	[coro async] Allow a coro.suspend.async to specify which argument is the context argument Previously, we used the same argument as the entry point. The resume partial function might want to use a different ABI for its context argument. Differential Revision: https://reviews.llvm.org/D97333	2021-03-03 08:27:37 -08:00
Nico Weber	64f5d7e972	Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF" This reverts commit `04c3040f41`. Breaks instrprof-value-merge.c in bootstrap builds.	2021-03-03 10:21:17 -05:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due to clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). > > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
Jianzhou Zhao	ac4c1760b2	Fix the build error caused by D97570	2021-03-03 04:47:00 +00:00
Jianzhou Zhao	d866b9c99d	[dfsan] Propagate origin tracking at load This is a part of https://reviews.llvm.org/D95835. One issue is about origin load optimization: see the comments of useCallbackLoadLabelAndOrigin @gbalats This change may have some conflicts with your 8bit change. PTAL the change at visitLoad. Reviewed By: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D97570	2021-03-03 04:32:30 +00:00
George Balatsouras	6ff18b08e6	[dfsan] Fix clang-tidy warnings This addresses ~50 clang-tidy warnings on dfsan instrumentation pass. It also contains some refactoring (all non-functional changes) to eliminate some variables and simplify code. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D97714	2021-03-02 17:37:45 -08:00
Andrei Elovikov	b24afec8ae	[NFCI][VPlan] Modify Recipes' print methods to honor Indent parameter Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97787	2021-03-02 15:32:10 -08:00
Nikita Popov	3d8f842712	[LICM] Make promotion faster Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-02 22:10:48 +01:00
Simon Pilgrim	232f32f0da	[DSE] eliminateDeadStoresMemorySSA - fix "initialization is never read" clang-tidy warning. NFCI.	2021-03-02 15:01:33 +00:00
Alexey Bataev	a054e94e9e	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D94992	2021-03-02 06:39:47 -08:00
Juneyoung Lee	365f5e2475	[JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore the select form of and/ors, because the function does not look into binops either.	2021-03-02 18:35:18 +09:00
Ta-Wei Tu	ea1a1ebbc6	[NFC] Use std::swap in LoopInterchange	2021-03-02 11:42:48 +08:00
Fangrui Song	04c3040f41	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker no longer lets `__start_/__stop_` references retain them. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on COFF/Mach-O are not affected. Differential Revision: https://reviews.llvm.org/D97649	2021-03-01 13:43:23 -08:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Florian Hahn	a6c81d3366	[VPlan] Remove recipes from back to front. Update the deletion order when destroying VPBasicBlocks. This ensures recipes that depend on earlier ones in the block are removed first. Otherwise this may cause issues when recipes have remaining users later in the block.	2021-03-01 16:06:30 +00:00
Florian Hahn	53dacb7b67	[LV] Generate RT checks up-front and remove them if required. This patch updates LV to generate the runtime checks just after cost modeling, to allow a more precise estimate of the actual cost of the checks. This information will be used in future patches to generate larger runtime checks in cases where the checks only make up a small fraction of the expected scalar loop execution time. The runtime checks are created up-front in a temporary block, un-linked from the existing IR, to allow a better estimate of their cost. After deciding to vectorize, the checks are moved back. If deciding not to vectorize, the temporary block is completely removed. This patch is similar in spirit to D71053, but explores a different direction: instead of delaying the decision on whether to vectorize in the presence of runtime checks, it optimistically creates the runtime checks early and discards them later if it decides not to vectorize. This has the advantage that the cost-modeling decisions can be kept together and done up-front, thus preserving the general code structure. I think delaying (part) of the decision to vectorize would also make the VPlan migration a bit harder. One potential drawback of this patch is that we speculatively generate IR which we might have to clean up later. However it seems like the code required to do so is quite manageable. Reviewed By: lebedev.ri, ebrevnov Differential Revision: https://reviews.llvm.org/D75980	2021-03-01 10:48:04 +00:00
Juneyoung Lee	5419b67137	[SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally This is a minor change that fixes FoldTwoEntryPHINode to handle phis with and/ors of select form and binop form equally.	2021-03-01 13:34:51 +09:00
Kazu Hirata	d639120983	[llvm] Use set_is_subset (NFC)	2021-02-28 10:59:20 -08:00
Sanjay Patel	9502061bcc	[InstCombine] avoid infinite loop in demanded bits for select https://llvm.org/PR49205	2021-02-28 10:17:53 -05:00
William S. Moses	b077d82b00	[Attributor] Conditionally delete fns Allow the attributor to delete functions only if requested. Differential Revision: https://reviews.llvm.org/D97238	2021-02-27 20:37:42 -05:00
Sanjay Patel	356cdabd3a	[SimplifyCFG] avoid illegal phi with both poison and undef In the example based on: https://llvm.org/PR49218 ...we are crashing because poison is a subclass of undef, so we merge blocks and create: PHI node has multiple entries for the same basic block with different incoming values! %k3 = phi i64 [ poison, %entry ], [ %k3, %g ], [ undef, %entry ] If both poison and undef values are incoming, we soften the poison values to undef. Differential Revision: https://reviews.llvm.org/D97495	2021-02-27 09:10:32 -05:00
Kazu Hirata	1d4a2f3778	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-26 22:36:40 -08:00
Fangrui Song	bf176c49e8	[InstrProfiling] Use llvm.compiler.used instead of llvm.used for ELF Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics for comdat and may not discard the sections as a unit. The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF) are similar to D97432: `__profd_` is not directly referenced, so `__profd_` may be discarded while `__profc_` is retained, breaking the interconnection. We currently conservatively add all such sections to `llvm.used` and let the linker do GC for ELF. In D97448, we will change GlobalObject's in the llvm.used list to use SHF_GNU_RETAIN, causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC). Use `llvm.compiler.used` to retain the current GC behavior. Differential Revision: https://reviews.llvm.org/D97585	2021-02-26 16:14:03 -08:00
George Balatsouras	c9075a1c8e	[dfsan] Record dfsan metadata in globals This will allow identifying exactly how many shadow bytes were used during compilation, for when fast8 mode is introduced. Also, it will provide a consistent matching point for instrumentation tests so that the exact llvm type used (i8 or i16) for the shadow can be replaced by a pattern substitution. This is handy for tests with multiple prefixes. Reviewed by: stephan.yichao.zhao, morehouse Differential Revision: https://reviews.llvm.org/D97409	2021-02-26 14:42:46 -08:00
Jianzhou Zhao	a47d435bc4	[dfsan] Propagate origins for callsites This is a part of https://reviews.llvm.org/D95835. Each customized function has two wrappers. The first one dfsw is for the normal shadow propagation. The second one dfso is used when origin tracking is on. It calls the first one, and does additional origin propagation. Which one to use can be decided at instrumentation time. This is to ensure minimal additional overhead when origin tracking is off. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97483	2021-02-26 19:12:03 +00:00
Fangrui Song	b55f29c194	[SanitizerCoverage] Clarify llvm.used/llvm.compiler.used and partially fix unmatched metadata sections on Windows `__sancov_pcs` parallels the other metadata section(s). While some optimizers (e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the sections as a unit, some (e.g. GlobalOpt/ConstantMerge) do not. So we have to conservatively retain all unconditionally in the compiler. When a comdat is used, the COFF/ELF linkers' GC semantics ensure the associated parallel array elements are retained or discarded together, so `llvm.compiler.used` is sufficient. Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not used), we have to use `llvm.used` to conservatively make all sections retained by the linker. This will fix the Windows problem once internal linkage GlobalObjects in `llvm.used` are retained via `/INCLUDE:`. Reviewed By: morehouse, vitalybuka Differential Revision: https://reviews.llvm.org/D97432	2021-02-26 11:10:03 -08:00
Simon Pilgrim	455d43b951	[Utils] collectBitParts - bail for integers > 128-bits collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit. We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers. Thanks to @manojgupta for the repro.	2021-02-26 14:58:01 +00:00
Stephen Tozer	ec7b9b0c18	[InstCombine] Avoid redundant or out-of-order debug value sinking This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent redundant debug intrinsics from being produced, and also prevent the intrinsics from being emitted in an incorrect order. It does this by ensuring that when this pass sinks an instruction and creates clones of the debug intrinsics that use that instruction, it inserts those debug intrinsics in their original order, and only inserts the last debug intrinsic for each variable in the Instruction's block. Differential revision: https://reviews.llvm.org/D95463	2021-02-26 13:04:33 +00:00
Evgeniy Brevnov	13a5cac2ba	Revert "[NARY-REASSOCIATE] Support reassociation of min/max" This reverts commit `83d134c3c4`.	2021-02-26 19:47:54 +07:00
Kazu Hirata	5fc9e30985	[Scalar] Use range-based for loops (NFC)	2021-02-25 19:54:38 -08:00
Jianzhou Zhao	c88fedef2a	[dfsan] Conservative solution to atomic load/store DFSan at store does: store shadow data; store app data. At load it does: load shadow data; load app data. When application data is atomic, one overtainting case is: thread A: load shadow; thread B: store shadow; thread B: store app; thread A: load app. If the application address had been used by other flows, thread A reads previous shadow, causing overtainting. The change is similar to MSan's solution: 1) enforce ordering of app load/store; 2) load shadow after load app, and store shadow before store app; 3) do not track atomic stores, by resetting their shadow to 0. The last one is to address a case like this: Thread A: load app; Thread B: store shadow; Thread A: load shadow; Thread B: store app. This approach eliminates overtainting at the cost of undertainting flows via shadow data races. Note that this change addresses only native atomic instructions and does not yet support builtin libcalls. https://llvm.org/docs/Atomics.html#libcalls-atomic Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97310	2021-02-25 23:34:58 +00:00
James Y Knight	24539f1ef2	Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg. And then push those changes throughout LLVM. Keep the old signature in Clang's CGBuilder for now -- that will be updated in a follow-on patch (D97224). The MLIR LLVM-IR dialect is not updated to support the new alignment attribute, but preserves its existing behavior. Differential Revision: https://reviews.llvm.org/D97223	2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih	fee9abe69c	[Remarks] Provide more information about auto-init calls This now analyzes calls to both intrinsics and functions. For intrinsics, grab the ones we know and care about (mem* family) and analyze the arguments. For calls, use TLI to get more information about the libcalls, then analyze the arguments if known. ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` Differential Revision: https://reviews.llvm.org/D97489	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	4753a69a31	[Remarks] Provide more information about auto-init stores This adds support for analyzing the instruction with the !annotation "auto-init" in order to generate a more user-friendly remark. For now, support the store size, and whether it's atomic/volatile. Example: ``` auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks] int var; ^ ``` Differential Revision: https://reviews.llvm.org/D97412	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	c49b600b2f	[Remarks] Emit remarks for "auto-init" !annotations Using the !annotation metadata, emit remarks pointing to code added by `-ftrivial-auto-var-init` that survived the optimizer. Example: ``` auto-init.c:4:7: remark: Initialization inserted by -ftrivial-auto-var-init. [-Rpass-missed=annotation-remarks] int buf[1024]; ^ ``` The tests are testing various situations like calls/stores/other instructions, with debug locations, and extra debug information on purpose: more patches will come to improve the reporting to make it more user-friendly, and these tests will show how the reporting evolves. Differential Revision: https://reviews.llvm.org/D97405	2021-02-25 15:14:09 -08:00
Adrian Prantl	1693180884	Add a nullptr check. This doesn't actually reproduce with a dbg.declare(i8* null, ...) which produces a non-null null Value, but I have seen this show up in crash logs. I suspect that another pass may be forcibly setting the operand to a nullptr.	2021-02-25 12:01:11 -08:00
Fangrui Song	4d63892acb	[SanitizerCoverage] Drop !associated on metadata sections In SanitizerCoverage, the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After inlining, such a `__sancov_*` section can be referenced by more than one function, but its sh_link still refers to the original function's section. (Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to section violates the invariant.) If the original function's section is discarded (e.g. LTO internalization + `ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error. The above reasoning means that `!associated` is not appropriate for a section referenced by an inlinable function. Non-interposable functions are inline candidates, so we have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections but is expected to parallel a metadata section, so we have to make sure the two sections are retained or discarded at the same time. A section group does the trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return non-null.) For interposable functions, we could keep using `!associated`, but LTO can change the linkage to `internal` and allow such functions to be inlinable, so we have to drop `!associated`, too. To not interfere with section group resolution, we need to use the `noduplicates` variant (section group flag 0). (This allows us to get rid of the ModuleID parameter.) In -fno-pie and -fpie code (mostly dso_local), instrumented interposable functions have WeakAny/LinkOnceAny linkages, which are rare. So the section group header overhead should be low. This patch does not change the object file output for COFF (where `!associated` is ignored). Reviewed By: morehouse, rnk, vitalybuka Differential Revision: https://reviews.llvm.org/D97430	2021-02-25 11:59:23 -08:00
Jon Roelofs	7f6e331645	Support `#pragma clang section` directives on MachO targets rdar://59560986 Differential Revision: https://reviews.llvm.org/D97233	2021-02-25 09:30:10 -08:00
Rong Xu	6103b6ad69	[SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class This patch makes SampleProfileLoaderBaseImpl a template class so it can be used in CodeGen transformation. Noticeable changes: * use one template parameter and use IRTraits to get the other used types and type-specific functions. * remove the temporary "inline" keywords from the previous refactor patch. * change the template function findEquivalencesFor to a regular function. This function has a single caller with type of PostDominatorTree. It's simpler to use the type directly because MachinePostDominatorTree is not a derived type of template DominatorTreeBase. Differential Revision: https://reviews.llvm.org/D96981	2021-02-25 08:26:17 -08:00
Evgeniy Brevnov	d0a6f8bb65	[NFC] Fix build failure after `83d134c3c4`	2021-02-25 18:43:00 +07:00
Evgeniy Brevnov	83d134c3c4	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88287	2021-02-25 18:22:39 +07:00
Xun Li	c38000a9fb	[Coroutine] Check indirect uses of alloca when checking lifetime info In the existing logic, we look at the lifetime.start marker of each alloca, and check all uses of the alloca, to see if any pair of the lifetime marker and a use of the alloca crosses a suspension point. This approach is unfortunately incorrect. A use of an alloca does not need to be a direct use, but can be an indirect use through an alias. Only checking direct uses can miss cases where indirect uses cross a suspension point. This can be demonstrated in the newly added test case 007. In the test case, both x and y are only directly used prior to suspend, but they are captured into an alias, merged through a PHINode (so they couldn't be materialized), and used after CoroSuspend. If we only check whether the lifetime starts cross suspension points with direct uses, we will put the allocas on the stack, and then capture their addresses in the frame. Instead of fixing it in D96441 and D96566, this patch takes a different approach which I think is better. We still check the lifetime info in the same way as before, but with two differences: 1. The collection of lifetime.start is moved into AllocaUseVisitor to make the logic more concentrated. 2. When looking at lifetime.start and use pairs, we not only check the direct uses as before, but also check all uses collected by AllocaUseVisitor, which include all indirect uses through aliases. This will make the analysis more accurate without throwing away the lifetime optimization. Differential Revision: https://reviews.llvm.org/D96922	2021-02-24 18:29:23 -08:00
Sanjay Patel	a7cee55762	[InstCombine] fold fdiv with powi divisor (PR49147) This extends `b40fde062c` for the especially non-standard powi pattern. We want to avoid being completely wrong on the negation-of-int-min corner case, so I'm adding an extra FMF check for 'ninf' assuming that gives us the flexibility to handle that possibility. https://llvm.org/PR49147	2021-02-24 16:44:36 -05:00
Sanjay Patel	868d43fbd6	[InstCombine] add helper for x/pow(); NFC We at least want to add powi to this list, so split it off into a switch to reduce code duplication.	2021-02-24 16:44:36 -05:00
Duncan P. N. Exon Smith	01701646d5	Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs This is a follow up to `22a52dfddc` and a revert of `df763188c9`. With this change, we only skip cloning distinct nodes in MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the no-longer-needed local helper `cloneOrBuildODR()`. Skipping cloning in other cases is unsound and breaks CloneModule, which is why the textual IR for PR48841 didn't pass previously. This commit adds the test as: Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll Cloning less often exposed a hole in subprogram cloning in CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function has a subprogram attachment whose scope is a DICompositeType that shouldn't be cloned, but it has no internal debug info pointing at that type, that composite type was being cloned. This commit plugs that hole, calling DebugInfoFinder::processSubprogram from CloneFunctionInto. As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit message, I think we need to formalize ownership of metadata a bit more so that ValueMapper/CloneFunctionInto (and similar functions) can deal with cloning (or not) metadata in a more generic, less fragile way. This fixes PR48841. Differential Revision: https://reviews.llvm.org/D96734	2021-02-24 12:57:52 -08:00
Sander de Smalen	5e19208d96	[InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep. This fixes the types of a few more cost variables to be of type InstructionCost.	2021-02-24 14:30:03 +00:00
Pierre Gousseau	27830bc2b1	[asan] Avoid putting globals in a comdat section when targeting elf. Putting globals in a comdat for dead-stripping changes the semantics and can potentially cause false negative odr violations at link time. If odr indicators are used, we keep the comdat sections, as link time odr violations will be detected for the odr indicator symbols. This fixes PR 47925.	2021-02-24 12:01:56 +00:00
Simon Pilgrim	b94c215592	[Utils] collectBitParts - add truncate() handling	2021-02-24 11:48:34 +00:00
Florian Hahn	6240f436dd	Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts the revert commit `437f0bbcd5`. It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be constructed with a VPRecipeBase *. This should address ambiguous constructor issues for recipe sub-types that also inherit from VPValue.	2021-02-24 10:36:02 +00:00
Dan Liew	7d3ef103b5	[ASan] Introduce a way to set different ways of emitting module destructors. Previously there was no way to control how module destructors were emitted by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang) to be able to decide how to emit these destructors (if at all). This patch introduces the `AsanDtorKind` enum that represents the different ways destructors can be emitted. There are currently only two valid ways to emit destructors: * `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default. * `None` - Do not emit module destructors. The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated to take the `AsanDtorKind` as an argument. The `-asan-destructor-kind=` command line argument has been introduced to make this easy to test from `opt`. If this argument is specified it overrides the value passed to the `ModuleAddressSanitizerPass` constructor. Note that `AsanDtorKind` is not `bool` because we will introduce a new way to emit destructors in a subsequent patch. Note that `AsanDtorKind` is given its own header file because if it is declared in `Transforms/Instrumentation/AddressSanitizer.h` it leads to a compile error (Module is ambiguous) when trying to use it in `clang/Basic/CodeGenOptions.def`. rdar://71609176 Differential Revision: https://reviews.llvm.org/D96571	2021-02-23 20:01:21 -08:00
Juneyoung Lee	56d228a14e	[SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes. A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97244	2021-02-24 10:40:50 +09:00
Fangrui Song	ef312951fd	collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128 And delete the SmallPtrSetImpl overload. While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97257	2021-02-23 16:09:06 -08:00
Fangrui Song	ed02f52d28	Fix unstable SmallPtrSet iteration issues due to collectUsedGlobalVariables While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Depends on D97128 (which added a new SmallVecImpl overload for collectUsedGlobalVariables). Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97139	2021-02-23 16:09:05 -08:00
Fangrui Song	3adb89bb9f	[ThinLTO] Make cloneUsedGlobalVariables deterministic Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements. While here, decrease inline element counts from 8 to 4. The number of `llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full LTO/hybrid LTO, the number may be large, so we need to be careful. According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is good for a large project with WholeProgramDevirt, when available_externally vtables are placed in the llvm.compiler.used set. Differential Revision: https://reviews.llvm.org/D97128	2021-02-23 16:09:05 -08:00
Teresa Johnson	0a5949dcfa	[WPD] Fix handling of pure virtual base class The fix in `3c4c205060` caused an assert in the case of a pure virtual base class. In that case, the vTableFuncs list on the summary will be empty, so we were hitting the new assert that the linkage type was not available_externally. In the case of pure virtual, we do not want to assert, and additionally need to set VS so that we don't treat it conservatively and quit the analysis of the type id early. This exposed a pre-existing issue where we were not updating the vcall visibility on pure virtual functions when whole program visibility was specified. We were skipping updating the visibility on any global vars that didn't have any vTableFuncs, which meant all pure virtual were not updated, and the later analysis would block any devirtualization of calls that had a type id used on those pure virtual vtables (see the handling in the other code modified in this patch). Simply remove that check. It will mean that we may update the vcall visibility on global vars that aren't vtables, but that setting is ignored for any global vars that didn't have type metadata anyway. Added a new test case that asserted without removing the assert, and that requires the other fixes in this patch (updateVCallVisibilityInIndex and not skipping all vtables without virtual funcs) to get a successful devirtualization with index-only WPD. I added cases to test hybrid and regular LTO for completeness, although those already worked without the fixes here. With this final fix, a clang multistage bootstrap with WPD builds and runs all tests successfully. Differential Revision: https://reviews.llvm.org/D97126	2021-02-23 16:07:09 -08:00
Jianzhou Zhao	a05aa0dd5e	[dfsan] Update memset and dfsan_(set|add)_label with origin tracking This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97302	2021-02-23 23:16:33 +00:00
Matthew Voss	6da7d31416	[llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata Under certain (currently unknown) conditions, llvm-profdata is outputting profiles that have two consecutive entries in the MemOPSize section for the value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch instruction with two cases for 0. As mentioned, we’re not quite sure what’s causing this to happen, but this patch prevents llvm-profdata from outputting a profile that has this problem and gives an error with a request for a reproducer. Differential Revision: https://reviews.llvm.org/D92074	2021-02-23 12:51:54 -08:00
Andrei Elovikov	3605b873f6	[NFC][VPlan] Use VPUser to store block's predicate Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96529	2021-02-23 11:08:27 -08:00
Florian Hahn	de40423c85	[LV] Ensure fixNonInductionPHIs uses a valid insertion point. In some cases, Builder's insertion point may be invalidated before using it in VPTransformState::get. Make sure the insertion point is up-to-date. This should fix various sanitizer errors, like https://lab.llvm.org/buildbot/#/builders/5/builds/4933/steps/9/logs/stdio	2021-02-23 18:51:05 +00:00
Florian Hahn	437f0bbcd5	Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts commit `4efa097eb4`, because some of the compilers used for some bots do not support automatic conversions to PointerUnion.	2021-02-23 16:57:21 +00:00
Florian Hahn	4efa097eb4	[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends. Generalize the return value of tryToCreateWidenRecipe to return either a newly created recipe or an existing VPValue. Use this to avoid creating unnecessary VPBlendRecipes. Fixes PR44800.	2021-02-23 16:52:03 +00:00
Juneyoung Lee	19c2e12947	[JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns This allows JumpThreading's computeValueKnownInPredecessors to recognize select form of and/or patterns as well.	2021-02-24 00:06:10 +09:00
Nate Chandler	01b4890e47	Add @llvm.coro.async.size.replace intrinsic. The new intrinsic replaces the size in one specified AsyncFunctionPointer with the size in another. This ability is necessary for functions which merely forward to async functions such as those defined for partial applications. Reviewed By: aschwaighofer Differential Revision: https://reviews.llvm.org/D97229	2021-02-23 06:43:52 -08:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. For most backends this looks like it will be an improvement (or they were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Matteo Favaro	633e090528	[DSE] Allow ptrs defined in the entry block in IsGuaranteedLoopInvariant. The IsGuaranteedLoopInvariant function is making sure to check if the incoming pointer is guaranteed to be loop invariant, therefore I think the case where the pointer is defined in the entry block of a function automatically guarantees the pointer to be loop invariant, as the entry block of a function cannot have predecessors or be part of a loop. I implemented this small patch and tested it using ninja check-llvm-unit and ninja check-llvm. I added a contained test file that shows the problem and used opt -O3 -debug on it to make sure the case is not currently handled (in fact the debug log is showing that the DSE pass is bailing out when testing if the killer store is able to clobber the dead store). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96979	2021-02-23 12:00:44 +00:00
Juneyoung Lee	481c62277d	[BuildLibCalls] Add noundef to allocator fns' size This is a patch to explicitly mark the size parameter of allocator functions like malloc/realloc/... as noundef. For C/C++: undef can be created from reading an uninitialized variable or padding. Calling a function with uninitialized variable is already UB. Calling malloc with padding value is.. something that's not expected. Padding bits may appear in a coerced aggregate, which doesn't apply to malloc's size. Therefore, malloc's size can be marked as noundef. For transformations that introduce malloc/realloc/..: I ran LLVM unit tests with an updated Alive2 semantics, and found no regression, so it seems okay. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97045	2021-02-23 13:58:03 +09:00
Kazu Hirata	4ed47858ab	[llvm] Use llvm::drop_begin (NFC)	2021-02-22 20:17:16 -08:00
ksyx	4125cabce1	[GVN] Fix a typo in comment NFC. Differential Revision: https://reviews.llvm.org/D97200 Reviewed By: fhahn	2021-02-23 10:39:34 +08:00
Jianzhou Zhao	7424efd5ad	[dfsan] Propagate origins at non-memory/phi/call instructions This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97200	2021-02-23 02:12:45 +00:00
Petr Hosek	c24b7a16b1	[InstrProfiling] Use ELF section groups for counters, data and values __start_/__stop_ references retain C identifier name sections such as __llvm_prf_*. Putting these into a section group disables this logic. The ELF section group semantics ensures that group members are retained or discarded as a unit. When a function symbol is discarded, this allows the linker to discard counters, data and values associated with that function symbol as well. Note that `noduplicates` COMDAT is lowered to zero-flag section group in ELF. We only set this for functions that aren't already in a COMDAT and for those that don't have available_externally linkage since we already use regular COMDAT groups for those. Differential Revision: https://reviews.llvm.org/D96757	2021-02-22 14:00:02 -08:00
Alexey Bataev	9a4dd4de9d	[SLP]No need to mark scatter load pointer as scalar as it gets vectorized. Pointer operand of scatter loads does not remain scalar in the tree (it gets vectorized) and thus must not be marked as the scalar that remains scalar in vectorized form. Differential Revision: https://reviews.llvm.org/D96818	2021-02-22 11:58:28 -08:00
Petr Hosek	4827492d9f	Revert "[InstrProfiling] Use ELF section groups for counters, data and values" This reverts commits: `5ca21175e0` `97184ab99c` The instrprof-gc-sections.c is failing on AArch64 LLD bot.	2021-02-22 11:13:55 -08:00
Florian Hahn	95daec6a84	[ConstraintElimination] Use unsigned > 0 instead of != 0. ICMP_NE predicates cannot be directly represented as constraint. But we can use ICMP_UGT instead of ICMP_NE for %x != 0. See https://alive2.llvm.org/ce/z/XlLCsW	2021-02-22 17:54:36 +00:00
Nikita Popov	4125afc357	[MemCpyOpt] Fix handling of readnone byval arguments If the call is readnone, then there may not be any MemoryAccess associated with the call. Bail out in that case. This fixes the issue reported at https://reviews.llvm.org/D94376#2578312.	2021-02-22 18:48:31 +01:00
Nikita Popov	5e7e499b91	[JumpThreading] Clone noalias.scope.decl when threading blocks When cloning instructions during jump threading, also clone and adapt any declared scopes. This is primarily important when threading loop exits, because we'll end up with two dominating scope declarations in that case (at least after additional loop rotation). This addresses a loose thread from https://reviews.llvm.org/rG2556b413a7b8#975012. Differential Revision: https://reviews.llvm.org/D97154	2021-02-22 18:35:30 +01:00
Florian Hahn	c7ee57f1dc	[LV] Directly use incoming value for single VPBlendRecipes. VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use the incoming value directly.	2021-02-22 16:10:08 +00:00
Florian Hahn	c11fd0df64	[VPlan] Skip VPWidenPHIRecipe in VPInterleavedAccessInfo. Update unit tests that did not expect VPWidenPHIRecipes after `15a74b64df`.	2021-02-22 10:35:09 +00:00
Florian Hahn	15a74b64df	[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe. This patch extends VPWidenPHIRecipe to manage pairs of incoming (VPValue, VPBasicBlock) in the VPlan native path. This is made possible because we now directly manage defined VPValues for recipes. By keeping both the incoming value and block in the recipe directly, code-generation in the VPlan native path becomes independent of the predecessor ordering when fixing up non-induction phis, which currently can cause crashes in the VPlan native path. This fixes PR45958. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D96773	2021-02-22 09:44:25 +00:00
Petr Hosek	5ca21175e0	[InstrProfiling] Use ELF section groups for counters, data and values __start_/__stop_ references retain C identifier name sections such as __llvm_prf_*. Putting these into a section group disables this logic. The ELF section group semantics ensures that group members are retained or discarded as a unit. When a function symbol is discarded, this allows the linker to discard counters, data and values associated with that function symbol as well. Note that `noduplicates` COMDAT is lowered to zero-flag section group in ELF. We only set this for functions that aren't already in a COMDAT and for those that don't have available_externally linkage since we already use regular COMDAT groups for those. Differential Revision: https://reviews.llvm.org/D96757	2021-02-21 16:13:06 -08:00
Nikita Popov	e0615bcd39	[Loads] Add optimized FindAvailableLoadedValue() overload (NFCI) FindAvailableLoadedValue() accepts an iterator by reference. If no available value is found, then the iterator will either be left at a clobbering instruction or the beginning of the basic block. This allows using FindAvailableLoadedValue() across multiple blocks. If this functionality is not needed, as is the case in InstCombine, then we can use a much more efficient implementation: First try to find an available value, and only perform clobber checks if we actually found one. As this function only looks at a very small number of instructions (6 by default) and usually doesn't find an available value, this saves many expensive alias analysis queries.	2021-02-21 18:42:56 +01:00
Kristina Bessonova	e97aab8d15	[ThinLTO] Fix import of multiply defined global variables Currently, if there is a module that contains a strong definition of a global variable and a module that has both a weak definition for the same global and a reference to it, it may result in an undefined symbol error while linking with ThinLTO. It happens because: * the strong definition becomes internal because it is read-only and can be imported; * the weak definition gets replaced by a declaration because it's non-prevailing; * the strong definition fails to be imported because the destination module already contains another definition of the global, even though this def is non-prevailing. The patch adds a check to computeImportForReferencedGlobals() that allows considering a global variable for being imported even if the module contains a definition of it in the case this def has an interposable linkage type. Note that currently the check is based only on the linkage type (and this seems to be enough at the moment), but it might be worth also accounting for whether the def is prevailing or not. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95943	2021-02-21 18:34:12 +02:00
Jianzhou Zhao	9524632fa2	[dfsan] Comment out unused methods by D97087 temporarily	2021-02-21 03:31:19 +00:00
Sanjay Patel	e772618f1e	[InstCombine] fold fdiv with exp/exp2 divisor (PR49147) Follow-up to: D96648 / `b40fde062` ...for the special-case base calls. From the earlier commit: This is unusual in the general (non-reciprocal) case because we need an extra instruction, but that should be better for general FP reassociation and codegen. We conservatively check for "arcp" FMF here as we do with existing fdiv folds, but it is not strictly necessary to have that.	2021-02-20 16:02:58 -05:00
Teresa Johnson	fde55a9c9b	[LTO] Fix cloning of llvm.used when splitting module Refines the fix in `3c4c205060` to only put globals whose defs were cloned into the split regular LTO module on the cloned llvm.used globals. This avoids an issue where one of the attached values was a local that was promoted in the original module after the module was cloned. We only need to have the values defined in the new module on those globals. Fixes PR49251. Differential Revision: https://reviews.llvm.org/D97013	2021-02-20 09:46:43 -08:00
Simon Pilgrim	609d0c9772	[InstCombine] matchBSwapOrBitReverse - remove pattern matching early-out. NFCI. recognizeBSwapOrBitReverseIdiom + collectBitParts have pattern matching to bail out early if a bswap/bitreverse pattern isn't possible - we should be able to rely on this instead without any notable change in compile time. This is part of a cleanup towards letting matchBSwapOrBitReverse /recognizeBSwapOrBitReverseIdiom use 'root' instructions that aren't ORs (FSHL/FSHRs in particular which can be prematurely created). Differential Revision: https://reviews.llvm.org/D97056	2021-02-20 13:15:34 +00:00
Dávid Bolvanský	cd54c57919	Reland "[Libcalls, Attrs] Annotate libcalls with noundef" Fixed Clang tests.	2021-02-20 06:18:48 +01:00
Dávid Bolvanský	94d034fb86	Revert "[Libcalls, Attrs] Annotate libcalls with noundef" This reverts commit `33b0c63775`. Bots are failing. Some Clang tests need to be updated too.	2021-02-20 04:18:42 +01:00
Dávid Bolvanský	33b0c63775	[Libcalls, Attrs] Annotate libcalls with noundef I think we can use here same logic as for nonnull. strlen(X) - X must be noundef => valid pointer. for libcalls with size arg, we add noundef only if size is known and greater than 0 - so pointers must be noundef (valid ones) Reviewed By: jdoerfert, aqjune Differential Revision: https://reviews.llvm.org/D95122	2021-02-20 04:10:07 +01:00
Dávid Bolvanský	68e6025cf7	Revert "[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly" This reverts commit `05d891a19e`.	2021-02-20 03:58:53 +01:00
Dávid Bolvanský	05d891a19e	[BuildLibcalls] Mark some libcalls with inaccessiblememonly and inaccessiblemem_or_argmemonly Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94850	2021-02-20 03:56:01 +01:00
Jianzhou Zhao	dab953c8e4	[dfsan] Add utils that get/set origins This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97087	2021-02-20 00:52:33 +00:00
Jianzhou Zhao	cb1f1aab90	[dfsan] Add origin address calculation This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D97065	2021-02-19 21:30:07 +00:00
Jianzhou Zhao	efc8f3311b	[msan] Set cmpxchg shadow precisely In terms of https://llvm.org/docs/LangRef.html#cmpxchg-instruction, the return type of cmpxchg is a pair {ty, i1}, while I think we only wanted to set the shadow for the address 0th op, and it has type ty. Reviewed-by: eugenis Differential Revision: https://reviews.llvm.org/D97029	2021-02-19 20:23:23 +00:00
Wei Mi	4ffad1fb48	[SampleFDO] Add PromotedInsns to prevent repeated ICP. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, We use 0 count value profile to memorize which target has been promoted and prevent repeated ICP for the same target, so we delete PromotedInsns. However, I found the implementation in the patch has some shortcomings to be fixed otherwise there will still be repeated ICP. So I add PromotedInsns back temporarily. Will remove it after I get a thorough fix.	2021-02-19 10:01:49 -08:00
Benjamin Kramer	59f442e6bb	[LV] Fold single-use variable into assert. NFC.	2021-02-19 18:11:39 +01:00
Nikita Popov	71a8e4e7d6	[MemCopyOpt] Enable MemorySSA by default This enables use of MemorySSA instead of MemDep in MemCpyOpt. To allow this without significant compile-time impact, the MemCpyOpt pass is moved directly before DSE (in the cases where this was not already the case), which allows us to reuse the existing MemorySSA analysis. Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt can also perform simple optimizations across basic blocks. Differential Revision: https://reviews.llvm.org/D94376	2021-02-19 18:06:25 +01:00
Florian Hahn	edc92a1c42	[LV] Remove VPCallback. Now that all state for generated instructions is managed directly in VPTransformState, VPCallBack is no longer needed. This patch updates the last use of `getOrCreateScalarValue` to instead manage the value directly in VPTransformState and removes VPCallback. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95383	2021-02-19 12:50:41 +00:00
Nikita Popov	2f17ed294f	[DCE] Don't remove non-willreturn calls In both ADCE and BDCE (via DemandedBits) we should not remove instructions that are not guaranteed to return. This issue was pointed out by fhahn in the recent llvm-dev thread. Differential Revision: https://reviews.llvm.org/D96993	2021-02-19 12:35:40 +01:00
Nikita Popov	370addb996	[IR] Move willReturn() to Instruction This moves the willReturn() helper from CallBase to Instruction, so that it can be used in a more generic manner. This will make it easier to fix additional passes (ADCE and BDCE), and will give us one place to change if additional instructions should become non-willreturn (e.g. there has been talk about handling volatile operations this way). I have also included the IntrinsicInst workaround directly in here, so that it gets applied consistently. (As such this change is not entirely NFC -- FuncAttrs will now use this as well.) Differential Revision: https://reviews.llvm.org/D96992	2021-02-19 11:56:01 +01:00
Djordje Todorovic	1a2b3536ef	Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info" As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from the [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is a very early stage of the implementation, there is room for improvement, such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545 The test that was failing is now forced to use the old PM.	2021-02-18 23:29:22 -08:00
Xun Li	3bf8f162a0	[Coroutine] Relax CoroElide musttail check As discussed in D94834, we don't really need to do complicated analysis. It's safe to just drop the tail call attribute. Differential Revision: https://reviews.llvm.org/D96926	2021-02-18 19:36:11 -08:00
Wei Mi	5fb65c02ca	[SampleFDO] Stop repeated indirect call promotion for the same target. Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundant, which can harm performance and can cause excessive compile time in some extreme cases. The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call. The fix brings a 0.1~0.2% performance improvement on our search benchmark. Differential Revision: https://reviews.llvm.org/D96806	2021-02-18 17:01:32 -08:00
Jianzhou Zhao	7e658b2fdc	[dfsan] Instrument origin variable and function definitions This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse, gbalats Differential Revision: https://reviews.llvm.org/D96977	2021-02-18 23:50:05 +00:00
Hongtao Yu	e87b1b1d4e	[CSSPGO] Use callsite sample counts to annotate indirect call sites. With CSSPGO all indirect call targets are counted towards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling. I'm also cleaning up the original code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was discarded. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D96990	2021-02-18 14:52:34 -08:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results: aa.NumMayAlias | 5010069 | 5009861 aa.NumMustAlias | 347518 | 347674 aa.NumNoAlias | 27201336 | 27201528 ... licm.NumPromoted | 1293 | 1296 I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Ta-Wei Tu	f70cdc5b5c	[NPM] Properly reset parent loop after loop passes This fixes https://bugs.llvm.org/show_bug.cgi?id=49185 When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`. If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes. However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes. This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D96727	2021-02-19 02:50:53 +08:00
Jianzhou Zhao	406dc54903	[dfsan] Refactor defining TLS variables This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96941	2021-02-18 18:04:21 +00:00
Jianzhou Zhao	2e6cd338c6	[dfsan] Refactor runtime functions checking This is a part of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96940	2021-02-18 18:01:46 +00:00
Philip Reames	8666463889	[instcombine] Exploit UB implied by nofree attributes This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred. Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug. Differential Revision: https://reviews.llvm.org/D96349	2021-02-18 08:34:22 -08:00
Djordje Todorovic	c1e23894fc	Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info" This reverts rG8ee7c7e02953. One test is failing, I'll reland this as soon as possible.	2021-02-18 02:04:27 -08:00
Djordje Todorovic	8ee7c7e029	[Debugify] Make the debugify aware of the original (-g) Debug Info As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from the [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is a very early stage of the implementation, there is room for improvement, such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545	2021-02-18 01:52:16 -08:00
Joseph Huber	c3a3d20093	[LV] Add analysis remark for mixed precision conversions Floating point conversions inside vectorized loops have performance implications but are very subtle. The user could specify a floating point constant, or call a function without realizing that it will force a change in the vector width. An example of this behaviour is seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate when this happens because it is most likely unintended behaviour. This patch adds a simple check for this behaviour by following floating point stores in the original loop and checking if a floating point conversion operation occurs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D95539	2021-02-17 21:37:08 -05:00
Teresa Johnson	d55d46f43b	[WPD] Add an optional checking mode for debugging devirtualization This adds an internal option -wholeprogramdevirt-check which if enabled will guard each devirtualization with a runtime check against the expected target, and an invocation of a debug trap if the check fails. This is useful for debugging WPD failures involving undefined behavior (e.g. casting to another class type not in the inheritance chain). Differential Revision: https://reviews.llvm.org/D95969	2021-02-17 16:46:15 -08:00
Rong Xu	7397905ab0	[SampleFDO] Third Try: Refactor SampleProfile.cpp Apply the patch for the third time after fixing buildbot failures. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common codes (common options and callsiteIsHot()) to the common cpp file. (4) Add inline keyword to avoid duplicated symbols -- they will be removed later when the class is changed to a template. Differential Revision: https://reviews.llvm.org/D96455	2021-02-17 15:31:50 -08:00
Teresa Johnson	3c4c205060	[WPD][lld] Test handling of vtable definition from shared libraries Adds a lld test for a case that the handling added for dynamically exported symbols in `1487747e99` already fixes. Because isExportDynamic returns true when the symbol is SharedKind with default visibility, it will be treated as dynamically exported and block devirtualization when the definition of a vtable comes from a shared library. This is desirable as it is dangerous to devirtualize in that case, since there could be hidden overrides in the shared library. Typically that happens when the shared library header contains available externally definitions, which applications can override. An example is std::error_category, which is overridden in LLVM and causing failures after a self build with WPD enabled, because libstdc++ contains hidden overrides of the virtual base class methods. The regular LTO case in the new test already worked, but there are 2 fixes in this patch needed for the index-only case and the hybrid LTO case. For the index-only case, WPD should not simply ignore available externally vtables. A follow on fix will be made to clang to emit type metadata for those vtables, which the new test is modeling. For the hybrid case, we need to ensure when the module is split that any llvm.*used globals are cloned to the regular LTO split module so available externally vtable definitions are not prematurely deleted. Another follow on fix will add the equivalent gold test, which requires a small fix to the plugin to treat symbols in dynamic libraries the same way lld already is. Differential Revision: https://reviews.llvm.org/D96721	2021-02-17 12:49:24 -08:00
Vedant Kumar	c28622fbf3	Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455" This reverts commit `c73cbf218a`. Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)" This reverts commit `a23e6b321c`. Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" This reverts commit `6fd5ccff72`. Still seeing link failures when building llc (or other tools), due to the new SampleProfileLoaderBaseImpl.h containing definitions that get duplicated across multiple TUs. ``` duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const*)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const*) const' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const*)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const*, llvm::BasicBlock const*>)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) ```	2021-02-17 10:22:24 -08:00
William S. Moses	40862b1a74	[SROA] Propagate correct TBAA/TBAA Struct offsets SROA does not correctly account for offsets in TBAA/TBAA struct metadata. This patch creates functionality for generating new MD with the corresponding offset and updates SROA to use this functionality. Differential Revision: https://reviews.llvm.org/D95826	2021-02-17 11:59:00 -05:00
Ta-Wei Tu	0eeaec2a6d	[NFC] Refactor LoopInterchange into a loop-nest pass This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change. Changes that are not loop-nest related are split to D96650. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D96644	2021-02-18 00:55:38 +08:00
Sjoerd Meijer	f78aa8b2c2	[LSR] Add a flag that overrides the target's preferred addressing mode This adds a new flag -lsr-preferred-addressing-mode to override the target's preferred addressing mode. It replaces flag -lsr-backedge-indexing, which is equivalent to preindexed addressing that is one of the options that -lsr-preferred-addressing-mode accepts. Differential Revision: https://reviews.llvm.org/D96855	2021-02-17 16:50:21 +00:00
Sanjay Patel	85294703a7	[InstCombine] fold fcmp-of-copysign idiom As discussed in: https://llvm.org/PR49179 ...this pattern shows up in library code. There are several potential generalizations as noted, but we need to be careful that we get FP special-values right, and it's not clear how much variation we should expect to see from this exact idiom.	2021-02-17 10:32:33 -05:00
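The message does not spell out the idiom. Assume it compares `copysign(1.0, x)` against zero; PR49179 has the exact form. Under that assumption, the FP special values the message mentions are exactly where care is needed. The comparison is equivalent to a sign-bit test of `x`, including for `-0.0` and NaNs with the sign bit set. A naive `x < 0.0` rewrite gets those cases wrong. A small C++ check of that equivalence:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The assumed library idiom: does x carry a negative sign?
bool copysignIdiom(double X) { return std::copysign(1.0, X) < 0.0; }

// A plain sign-bit test, the natural folded form.
bool signBitTest(double X) { return std::signbit(X); }

// A tempting but wrong rewrite: it misses -0.0 and negative NaNs.
bool naiveLessThanZero(double X) { return X < 0.0; }
```

For `-0.0`, the idiom and the sign-bit test both return true, while the naive comparison returns false.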
Sjoerd Meijer	5a641cf194	Follow up of rGdea4a63e6359, which committed a slightly different version than intended.	2021-02-17 10:07:26 +00:00
Sjoerd Meijer	dea4a63e63	[LSR] Cleanup of getPreferredAddressingMode. NFC. This is a follow-up to D96600 and cleans up most calls to getPreferredAddressingMode. That is, instead of querying the same thing again and again, we get the preferred addressing mode once per loop. So this should be a lot friendlier for compile times, especially if we start implementing getPreferredAddressingMode. Differential Revision: https://reviews.llvm.org/D96772	2021-02-17 09:45:29 +00:00
Rong Xu	6fd5ccff72	[SampleFDO] Reapply: Refactor SampleProfile.cpp Reapply patch after fixing buildbot failure. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker into a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 16:43:21 -08:00
Mehdi Amini	c761fe77bd	Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp" This reverts commit `310b35304c`. The build is broken with -DBUILD_SHARED_LIBS=ON : lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const*, llvm::ProfileSummaryInfo*, bool)': SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const' SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const' ...	2021-02-16 22:11:42 +00:00
Rong Xu	310b35304c	[SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker into a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 11:18:21 -08:00
Arnold Schwaighofer	627cfd4394	[coro async] Don't promote allocas to the frame or rewrite swifterror if there are no suspend points Also don't call function to update the call graph if there are no clones. The function will fail. rdar://74277860 Differential Revision: https://reviews.llvm.org/D96620	2021-02-16 09:05:38 -08:00
Ta-Wei Tu	6b612a7baf	[NFC][LoopInterchange] Explicitly pass both `InnerLoop` and `OuterLoop` to `processLoop` This is a split patch of D96644. Explicitly pass both `InnerLoop` and `OuterLoop` to function `processLoop` to remove the need to swap elements in loop list and allow making loop list an `ArrayRef`. Also, fix inconsistent spellings of `OuterLoopId` and `Inner Loop Id` in debug log. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96650	2021-02-16 22:17:44 +08:00
Florian Hahn	f64c626069	[VPlan] Remove unused Phi member from VPWidenPHIRecipe (NFC). The member is not needed any longer after recent changes.	2021-02-16 13:53:06 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
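The example loop above can be vectorized roughly as follows. This C++ emulation uses a fixed lane count; it is not the SVE code LoopVectorize emits, and with scalable vectors the lane count would be a runtime multiple of `vscale`. Each lane accumulates a partial sum, and the lanes are combined once after the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Scalar reference: the loop from the commit message.
unsigned scalarSum(const std::vector<unsigned> &A) {
  unsigned Sum = 0;
  for (std::size_t I = 0; I < A.size(); ++I)
    Sum += A[I];
  return Sum;
}

// Emulated vector reduction with VF lanes.
template <std::size_t VF>
unsigned vectorizedSum(const std::vector<unsigned> &A) {
  unsigned Lanes[VF] = {}; // vector accumulator, one partial sum per lane
  std::size_t I = 0;
  for (; I + VF <= A.size(); I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      Lanes[L] += A[I + L];
  // Horizontal reduction of the accumulator after the loop.
  unsigned Sum = std::accumulate(Lanes, Lanes + VF, 0u);
  // Scalar epilogue for the elements that did not fill a whole vector.
  for (; I < A.size(); ++I)
    Sum += A[I];
  return Sum;
}
```

Unsigned addition is associative and commutative modulo 2^N, so the reordered sum always matches the scalar loop. FP sums may be reordered this way only when fast-math flags permit reassociation, which is why the title restricts FP support to fast reductions.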
Sander de Smalen	00fe10c6a6	[SCEVExpander] Migrate costAndCollectOperands to use InstructionCost. This patch changes costAndCollectOperands to use InstructionCost for accumulated cost values. isHighCostExpansion will return true if the cost has exceeded the budget. Reviewed By: CarolineConcatto, ctetreau Differential Revision: https://reviews.llvm.org/D92238	2021-02-16 09:27:34 +00:00
Florian Hahn	54a14c264a	[VPlan] Manage scalarized values using VPValues. This patch updates codegen to use VPValues to manage the generated scalarized instructions. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92285	2021-02-16 09:04:10 +00:00
Akira Hatanaka	32dc79c5ef	[ObjC][ARC] Do not perform code motion on precise release calls This fixes a bug where an object can get deallocated before reaching the end of its full formal lifetime. rdar://72110887 rdar://74123176	2021-02-15 17:39:37 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by various other changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to do with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programmatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created?
- Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. 
That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangesOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
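The case analysis in this commit's bullet list can be summarized in a short sketch. The enum below is a standalone copy that mirrors the four enumerator names in the message. The two predicates are hypothetical helpers that only restate which invariant-maintaining code runs in which case; they are not LLVM functions.

```cpp
#include <cassert>

// Standalone mirror of the change types named in the message.
enum class CloneFunctionChangeType {
  LocalChangesOnly,
  GlobalChanges,
  DifferentModule,
  ClonedModule,
};

// Hypothetical: the "functions uniquely own subprograms" code runs only
// when the function is cloned within a single module.
bool maintainsUniqueSubprograms(CloneFunctionChangeType T) {
  return T == CloneFunctionChangeType::LocalChangesOnly ||
         T == CloneFunctionChangeType::GlobalChanges;
}

// Hypothetical: the "every compile unit is listed in !llvm.dbg.cu" code
// runs only when a function is cloned into a new module in isolation.
bool registersCompileUnits(CloneFunctionChangeType T) {
  return T == CloneFunctionChangeType::DifferentModule;
}
```

In the ClonedModule case neither predicate holds. CloneModule handles the whole module itself, so neither per-function fixup applies.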
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Max Kazantsev	e3c759bd58	[LoopLoadElim] Pass ScalarEvolution in old pass manager. PR49141 Loop canonicalization may end up deleting blocks from the CFG, and ScalarEvolution may still keep cached references to those blocks unless it is updated properly.	2021-02-15 18:08:23 +07:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
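This interface change replaces two independent booleans with one three-valued answer. The enum and adapter below are stand-ins written for illustration, not the TTI declarations. Mapping backedge indexing to pre-indexed follows the -lsr-backedge-indexing description in the LSR flag commit. The precedence when both old hooks are true is an arbitrary, hypothetical choice.

```cpp
#include <cassert>

// Illustrative stand-in for a single addressing-mode preference.
enum class AddressingModeKind { PreIndexed, PostIndexed, None };

// Hypothetical adapter from the two old boolean hooks to one answer.
AddressingModeKind fromLegacyHooks(bool FavorPostInc, bool FavorBackedgeIndex) {
  if (FavorPostInc)
    return AddressingModeKind::PostIndexed;
  if (FavorBackedgeIndex)
    return AddressingModeKind::PreIndexed; // backedge indexing ~ pre-indexed
  return AddressingModeKind::None;
}
```

With a single query, LSR needs only one value per loop instead of combining two separate target hooks.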
Florian Hahn	3df5d5aace	[ConstraintElimination] Fix variables used for pattern matching. Re-using the matched variable in the pattern does not work as expected. This patch fixes that by introducing a new variable for the 2nd level match.	2021-02-14 18:42:37 +00:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Sanjay Patel	b40fde062c	[InstCombine] fold fdiv with pow divisor (PR49147) This is unusual in the general (non-reciprocal) case because we need an extra instruction, but that should be better for general FP reassociation and codegen. We conservatively check for "arcp" FMF here as we do with existing fdiv folds, but it is not strictly necessary to have that. This is part of solving: https://llvm.org/PR49147 (The powi variant potentially has a different constraint.) Differential Revision: https://reviews.llvm.org/D96648	2021-02-14 08:07:36 -05:00
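The message does not state the exact rewrite. Assume it turns `X / pow(Y, Z)` into `X * pow(Y, -Z)`, with the extra instruction being the negation of `Z`; D96648 has the exact pattern and flags. The identity can then be sanity-checked numerically. It holds only up to rounding, which is why folds like this are gated on fast-math flags rather than applied unconditionally.

```cpp
#include <cassert>
#include <cmath>

// Original form: divide by a pow() call.
double divByPow(double X, double Y, double Z) { return X / std::pow(Y, Z); }

// Assumed folded form: negate the exponent and multiply instead.
double mulByPowNeg(double X, double Y, double Z) { return X * std::pow(Y, -Z); }

// True when A and B agree to within a relative tolerance; the two forms are
// mathematically equal but may round differently.
bool nearlyEqual(double A, double B, double RelTol = 1e-12) {
  return std::fabs(A - B) <= RelTol * std::fmax(std::fabs(A), std::fabs(B));
}
```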
Juneyoung Lee	ed253ef772	[LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask This patch fixes pr48832 by correctly generating the mask when a poison value is involved. Consider this CFG (which is a part of the input): ``` for.body: ; preds = %for.cond br i1 true, label %cond.false, label %land.rhs land.rhs: ; preds = %for.body br i1 poison, label %cond.end, label %cond.false cond.false: ; preds = %for.body, %land.rhs br label %cond.end cond.end: ; preds = %land.rhs, %cond.false %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ] ``` The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead. The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation. SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026). Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D95217	2021-02-14 21:12:34 +09:00
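The two mask forms differ in how poison propagates. Below is a toy C++ model in which `std::nullopt` stands for poison. It is an illustration of the LangRef rules for `and` and `select`, not LLVM code.

```cpp
#include <cassert>
#include <optional>

// Toy model of an LLVM i1 value: std::nullopt stands for poison.
using I1 = std::optional<bool>;
const I1 Poison = std::nullopt;

// `and i1 A, B`: a poison operand makes the result poison,
// even when the other operand is false.
I1 andI1(I1 A, I1 B) {
  if (!A || !B)
    return Poison;
  return *A && *B;
}

// `select i1 C, i1 T, i1 F`: only a poison condition or a poison chosen arm
// yields poison; the unselected arm is ignored.
I1 selectI1(I1 C, I1 T, I1 F) {
  if (!C)
    return Poison;
  return *C ? T : F;
}
```

Take the case where the for.body -> land.rhs condition is false. Then `and i1 false, poison` is still poison, whereas `select i1 false, i1 poison, i1 false` is simply false. This is why the select form blocks the miscompilation.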
Teresa Johnson	a80232bd5f	[LTT] Address post-review comments (NFC) Implement some post-review cleanup suggestions for D96083.	2021-02-13 15:52:59 -08:00
Tyker	642e9225c6	reland [InstCombine] convert assumes to operand bundles InstCombine will convert the nonnull and alignment assumptions that use the boolean condition into assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-13 13:03:11 +01:00
Wei Wang	80dc0661bd	[LTO] Perform DSOLocal propagation in combined index Perform DSOLocal propagation within summary list of every GV. This avoids the repeated query of this information during function importing. Differential Revision: https://reviews.llvm.org/D96398	2021-02-12 22:58:26 -08:00
Arnold Schwaighofer	e760ec2a01	[coro] Add support for polymorphic return typed coro.suspend.async This allows for suspend-point-specific resume function types. Return values from a suspend point can therefore be modelled as arguments to the resume function, allowing for directly passed return types. Differential Revision: https://reviews.llvm.org/D96136	2021-02-12 10:08:00 -08:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Kerry McLaughlin	fea06efe7c	[SVE][LoopVectorize] Support for vectorization of loops with function calls Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs and adds some simple tests for loops containing a call to a function for which a vectorized variant is available. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96356	2021-02-12 13:47:43 +00:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
Florian Hahn	85fe5c9345	[VPlan] Make VPRecipeBase inherit from VPUser directly (NFC). The individual recipes have been updated to manage their operands using VPUser a while back. Now that the transition is done, we can instead make VPRecipeBase a VPUser and get rid of the toVPUser helper.	2021-02-12 13:06:58 +00:00
David Sherwood	01b87444cb	[NFC][Analysis] Change struct VecDesc to use ElementCount This patch changes the VecDesc struct to use ElementCount instead of an unsigned VF value, in preparation for future work that adds support for vectorized versions of math functions using scalable vectors. Since all I'm doing in this patch is switching the type I believe it's a non-functional change. I changed getWidestVF to now return both the widest fixed-width and scalable VF values, but currently the widest scalable value will be zero. Differential Revision: https://reviews.llvm.org/D96011	2021-02-12 11:07:58 +00:00
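An ElementCount pairs a known minimum lane count with a flag saying whether that count is multiplied by the runtime `vscale`. The struct below is a simplified, self-contained stand-in written for illustration. LLVM's own `ElementCount` lives in its support library and has a richer interface.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in: MinLanes lanes for a fixed-width vector, or
// MinLanes * vscale lanes for a scalable one.
struct ToyElementCount {
  uint64_t MinLanes;
  bool Scalable;

  static ToyElementCount fixed(uint64_t N) { return {N, false}; }
  static ToyElementCount scalable(uint64_t N) { return {N, true}; }

  // Actual lane count once the hardware's vscale is known at runtime.
  uint64_t lanesAt(uint64_t VScale) const {
    return Scalable ? MinLanes * VScale : MinLanes;
  }
};
```

A plain `unsigned` VF can express only the fixed case, which motivates switching these interfaces to ElementCount.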
David Sherwood	9700228abc	[Analysis] Change VFABI::mangleTLIVectorName to use ElementCount Adds support for mangling TLI vector names for scalable vectors. Differential Revision: https://reviews.llvm.org/D96338	2021-02-12 09:38:12 +00:00
Kazu Hirata	9dc62d1dc1	[PGO] Drop unnecessary const from return types (NFC)	2021-02-11 23:31:29 -08:00
Hongtao Yu	de40f6d623	[CSSPGO] Process functions in a top-down order on a dynamic call graph. Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining. The profile top-down order for SCC is also extended to support non-CS profiles. Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95988	2021-02-11 12:36:59 -08:00
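A reverse post-order DFS is one common way to get a top-down order, with callers before callees, over a call graph that also contains profiled indirect-call edges. The sketch below illustrates that general idea over a hypothetical adjacency map. It is not the algorithm from D95988, and it breaks cycles (SCCs) arbitrarily at the first back edge the DFS meets.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Caller -> callees, including edges observed only in the profile.
using CallGraph = std::map<std::string, std::vector<std::string>>;

static void postOrder(const CallGraph &G, const std::string &F,
                      std::set<std::string> &Visited,
                      std::vector<std::string> &Out) {
  if (!Visited.insert(F).second)
    return; // already visited, or on the current path (cycle)
  auto It = G.find(F);
  if (It != G.end())
    for (const std::string &Callee : It->second)
      postOrder(G, Callee, Visited, Out);
  Out.push_back(F); // emitted after all of its callees
}

// Outside of cycles, every caller precedes its callees in the result.
std::vector<std::string> topDownOrder(const CallGraph &G) {
  std::set<std::string> Visited;
  std::vector<std::string> Out;
  for (const auto &Entry : G)
    postOrder(G, Entry.first, Visited, Out);
  std::reverse(Out.begin(), Out.end());
  return Out;
}
```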
Michael Kruse	606aa622b2	Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache" This reverts commit `b7d870eae7` and the subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)" (commit `e6810cab09`). It caused nondeterminism in the output, such that e.g. the polly-x86_64-linux buildbot failed occasionally.	2021-02-11 12:17:38 -06:00
Sander de Smalen	703130fb01	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
Markus Lavin	9498315c9b	Expand masked mem intrinsics correctly wrt big-endian Need to take endianness into account when doing vector to scalar casts such as %bc = bitcast <8 x i1> %v to i8 Companion commit for https://reviews.llvm.org/D94867 Upload in response to https://lists.llvm.org/pipermail/llvm-dev/2021-January/147862.html Attempting to document the actual memory layout rules for vectors in https://reviews.llvm.org/D94964 Differential Revision: https://reviews.llvm.org/D94765	2021-02-11 08:59:52 +00:00
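In `bitcast <8 x i1> %v to i8`, which bit holds which lane depends on endianness. The sketch below assumes that lane 0 maps to the least significant bit on little-endian targets and to the most significant bit on big-endian targets; D94964 documents the actual layout rules. The function names are illustrative, not the ones used in the expansion pass.

```cpp
#include <cassert>
#include <cstdint>

// Bit position holding lane Idx after bitcasting <VectorWidth x i1> to an
// integer of VectorWidth bits (assumed layout, see lead-in).
unsigned laneToBit(unsigned VectorWidth, unsigned Idx, bool BigEndian) {
  return BigEndian ? VectorWidth - 1 - Idx : Idx;
}

// Pack an <8 x i1> mask into an i8 the way the bitcast would.
uint8_t packMask(const bool (&Lanes)[8], bool BigEndian) {
  uint8_t Bits = 0;
  for (unsigned Idx = 0; Idx < 8; ++Idx)
    if (Lanes[Idx])
      Bits |= uint8_t(1u << laneToBit(8, Idx, BigEndian));
  return Bits;
}

// Test one lane of a packed mask, as a per-lane scalarized expansion would.
bool laneIsSet(uint8_t Bits, unsigned Idx, bool BigEndian) {
  return (Bits >> laneToBit(8, Idx, BigEndian)) & 1u;
}
```

Testing bit `Idx` unconditionally is correct only on little-endian targets. On big-endian targets, the same lane lives at the mirrored bit position.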
Sander de Smalen	be9bbb57f4	[LoopVectorize] NFC: Change selectVectorizationFactor to work on ElementCount. This patch is NFC and changes occurrences of `unsigned Width` and `unsigned i` to work on type ElementCount instead. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96019	2021-02-11 08:47:59 +00:00
Kazu Hirata	d12a0f4fc0	[GCOV] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-02-10 20:01:18 -08:00
Duncan P. N. Exon Smith	fa35c1f80f	ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and `MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more precisely describe its effect and clarify the header documentation. Found this while helping to investigate PR48841, which pointed out an unsound use of the flag in `CloneModule()`. For now I've just added a FIXME there, but I'm hopeful that the new (more precise) name will prevent other similar errors.	2021-02-10 16:53:21 -08:00
Benjamin Kramer	8fb4a4f7bb	[SampleFDO] Silence -Wnon-virtual-dtor warning There's no polymorphic deletion happening here.	2021-02-10 23:37:15 +01:00
Rong Xu	db0d7d0ba9	[SampleFDO][NFC] Refactor SampleProfileLoader to reuse in CodeGen Break SampleProfileLoader into a base and a derived class. The base class (SampleProfileLoaderBaseImpl) includes the common code for the IR and MachineIR (CodeGen) sample loaders. It will be templatized in a later patch. Inline- and probe-related code will remain in the derived class SampleProfileLoader and stay in SampleProfile.cpp. We need to refactor some functions: (1) getInstWeight() to enable the code sharing -- put the core into getInstWeightImpl(). (2) emitAnnotation() and propagateWeights() to carve out the code specific to SampleProfileLoader. (3) make getInstWeight() and findFunctionSamples() virtual and override in SampleProfileLoader as they need to access the fields in the derived class. Differential Revision: https://reviews.llvm.org/D95832	2021-02-10 13:29:15 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking are intentional (such as blocking code merge) for better counts quality while the others are accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e, opeq transform 4. MIR function argument copy elision 5. IR stack protection. (though not perf-critical but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Adrian Prantl	19fc8eede4	Add missing nullptr check. salvageDebugInfoImpl() may fail and return a nullptr.	2021-02-10 12:15:24 -08:00
Sanjay Patel	6e2053983e	[InstCombine] fold lshr(mul X, SplatC), C2 This is a special-case multiply that replicates bits of the source operand. We need this fold to avoid regression if we make canonicalization to `mul` more aggressive for shl+or patterns. I did not see a way to make Alive generalize the bit width condition for even-number-of-bits only, but an example of the proof is: Name: i32 Pre: isPowerOf2(C1 - 1) && log2(C1) == C2 && (C2 * 2 == width(C2)) %m = mul nuw i32 %x, C1 %t = lshr i32 %m, C2 => %t = and i32 %x, C1 - 2 Name: i14 %m = mul nuw i14 %x, 129 %t = lshr i14 %m, 7 => %t = and i14 %x, 127 https://rise4fun.com/Alive/e52	2021-02-10 15:02:31 -05:00
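The precondition in the proof fixes C1 = 2^C2 + 1, with C2 equal to half the bit width, and the replacement mask is C1 - 2 = 2^C2 - 1. The same claim can be brute-forced for small even widths. This is an independent check written for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks, for an even Width (small enough to enumerate), that
//   lshr (mul nuw X, C1), C2  ==  and X, C1 - 2
// with C2 = Width / 2 and C1 = 2^C2 + 1.
bool checkLshrMulFold(unsigned Width) {
  const unsigned C2 = Width / 2;
  const uint64_t Limit = uint64_t(1) << Width; // 2^Width
  const uint64_t C1 = (uint64_t(1) << C2) + 1;
  for (uint64_t X = 0; X < Limit; ++X) {
    const uint64_t M = X * C1; // exact product, computed in 64 bits
    if (M >= Limit)
      continue; // `mul nuw` would wrap: the result is poison, so skip
    if ((M >> C2) != (X & (C1 - 2)))
      return false;
  }
  return true;
}
```

The fold holds because `nuw` forces X < 2^C2. The product is then just X placed twice, once shifted left by C2 and once unshifted, so shifting right by C2 recovers X, which already fits in the mask.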
Sander de Smalen	9db6e97a86	[LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount. This patch is NFC and changes occurrences of `unsigned MaxVectorSize` to work on type ElementCount. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D96018	2021-02-10 08:52:10 +00:00
Tyker	5652e192fc	Revert "[InstCombine] convert assumes to operand bundles" This reverts commit `5eb2e994f9`.	2021-02-10 01:32:00 +01:00
Florian Hahn	fd8afa41eb	[VPlan] Use VPUser to manage CondBit VP blocks keep track of a condition, which is a VPValue. This patch updates VPBlockBase to manage the value using VPUser, so replaceAllUsesWith properly updates the condition bit as well. This is required to enable VP2VP transformations and it helps with simplifying some of the code required to manage condition bits. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95382	2021-02-09 21:53:50 +00:00
Johannes Doerfert	81429abd83	[Attributor][FIX] Do not create UB by introducing a `noundef undef` This was reported as PR49104. The reproducer uses varargs, but the issue is the same: we know an argument is dead but can't change the signature for some reason. The PR49104 situation was: we are in a CG-SCC traversal, and we remove all the uses of an argument and thereby prove it dead. However, if we do not remove the argument via a signature rewrite, we need to ensure that the `undef` we introduce at the call site doesn't clash with a `noundef` attribute.	2021-02-09 13:02:38 -06:00
Tyker	5eb2e994f9	[InstCombine] convert assumes to operand bundles InstCombine will convert the nonnull and alignment assumptions that use the boolean condition into assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-09 19:33:53 +01:00
Jianzhou Zhao	9887fdebd6	[dfsan] Refactor loadShadow To simplify the review of https://reviews.llvm.org/D95835. Reviewed-by: gbalats, morehouse Differential Revision: https://reviews.llvm.org/D96180	2021-02-09 17:21:41 +00:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Chuanqi Xu	88d7876e1e	[NFC] [Coroutine] Remove Unused Variables	2021-02-09 15:55:06 +08:00
Kazu Hirata	302313a264	[Transforms] Use range-based for loops (NFC)	2021-02-08 22:33:53 -08:00
Kazu Hirata	de6c49ae31	[Transforms/Utils] Drop unnecessary const from a return type (NFC) Identified with const-return-type.	2021-02-08 22:33:49 -08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposed a failure in the test-suite build on PowerPC; reverting to unblock the buildbot first. Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Arthur Eubanks	0eda454796	[SimpleLoopUnswitch] Don't non-trivially unswitch loops that are unsafe to clone Non-trivial unswitching can clone loops. The legacy -loop-unswitch pass also checks for this. Fixes PR49085. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D96288	2021-02-08 13:19:24 -08:00
Jianzhou Zhao	64b448b983	[dfsan] Refactor visitCallBase To simplify the review of https://reviews.llvm.org/D95835. Reviewed-by: morehouse Differential Revision: https://reviews.llvm.org/D96177	2021-02-08 19:55:18 +00:00
Florian Hahn	68dc90b347	[ConstraintElimination] Decompose a few more GEP indices. This patch adds handling for zero-extended GEP indices.	2021-02-08 18:06:38 +00:00
Florian Hahn	1f1f037ed3	[ConstraintElimination] Improve index handling during constraint building. This patch improves the index management during constraint building. Previously, the code rejected constraints which used values that were not part of Value2Index, even if, after combining, the coefficients of the new indices were 0 (if ShouldAdd was 0). In those cases, no new indices need to be added. Instead of adding to Value2Index directly, add new indices to the NewIndices map. The caller can then check if it needs to add any new indices. This enables reducing constraints like `a + x <= a + n` to `x <= n`, even if there is no constraint for `a` directly.	2021-02-08 13:05:13 +00:00
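The coefficient combining described above can be sketched as follows. This is a minimal illustration assuming a map-based linear expression; `LinearExpr` and `subtract` are hypothetical names, not ConstraintElimination's actual data structures. Moving everything to one side of `a + x <= a + n` cancels `a`, so a variable whose combined coefficient is 0 never needs a new index.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical linear expression: variable name -> coefficient, plus a constant.
struct LinearExpr {
  std::map<std::string, int64_t> Coeffs;
  int64_t Constant = 0;
};

// Builds LHS - RHS, so that "LHS <= RHS" becomes "Result <= 0".
// Variables whose combined coefficient is 0 are dropped, so they never
// need an index in the constraint system.
LinearExpr subtract(const LinearExpr &LHS, const LinearExpr &RHS) {
  LinearExpr Result = LHS;
  Result.Constant -= RHS.Constant;
  for (const auto &Entry : RHS.Coeffs)
    Result.Coeffs[Entry.first] -= Entry.second;
  for (auto It = Result.Coeffs.begin(); It != Result.Coeffs.end();) {
    if (It->second == 0)
      It = Result.Coeffs.erase(It); // cancelled variable: no index needed
    else
      ++It;
  }
  return Result;
}
```

For `a + x <= a + n`, the result holds only `x` (coefficient 1) and `n` (coefficient -1), i.e. `x - n <= 0`.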
Florian Hahn	ca268ed285	[ConstraintElimination] Decompose zext for unsigned compares. For unsigned compares, zext should be a no-op and we can add the extended value to the constraint system.	2021-02-07 20:53:06 +00:00
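A small exhaustive check, assuming 8-bit operands widened to 32 bits, illustrates why zext can be treated as a no-op for unsigned compares: the zero-extended value has the same numeric value, so every unsigned less-than comparison gives the same answer. The helper names are hypothetical.

```cpp
#include <cstdint>

// Zero-extends an 8-bit value to 32 bits (converting an unsigned type to a
// wider unsigned type is a zero extension in C++).
uint32_t zext8to32(uint8_t V) { return static_cast<uint32_t>(V); }

// Returns true if "a <u b" gives the same answer before and after zero
// extension for every pair of 8-bit values.
bool zextPreservesUnsignedLess() {
  for (unsigned A = 0; A <= 0xFF; ++A)
    for (unsigned B = 0; B <= 0xFF; ++B) {
      uint8_t NA = static_cast<uint8_t>(A);
      uint8_t NB = static_cast<uint8_t>(B);
      if ((NA < NB) != (zext8to32(NA) < zext8to32(NB)))
        return false;
    }
  return true;
}
```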
Florian Hahn	3bb6dc0b26	[LV] Replace some uses of VectorLoopValueMap with VPTransformState (NFC) This patch updates some places where VectorLoopValueMap is accessed directly to instead go through VPTransformState. As we move towards managing created values exclusively in VPTransformState, this ensures that users can always fetch the correct value. This is in preparation for D92285, which switches to managing scalarized values through VPValues. In the future, the various fix* functions should be moved directly into the VPlan codegen stage. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95757	2021-02-07 18:28:21 +00:00
Kazu Hirata	be23012d5a	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-07 09:49:36 -08:00
Sanjay Patel	6fd91be354	[Reassociate] allow or->add with shl operands As discussed in: https://llvm.org/PR49055 We invert instcombine's add->or transform here because it makes it easier to identify factorization transforms like the mul in the motivating test. This extends the logic added with: https://reviews.llvm.org/rG70472f3 https://reviews.llvm.org/rG93f3d7f (I intentionally kept the formatting fix in this patch to provide more context about the calling logic.)	2021-02-07 09:45:19 -05:00
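The reason an `or` with a shifted operand can be treated as an `add` can be checked directly: when `Y < 2^K`, the low `K` bits of `X << K` are zero, so the operands share no set bits and the addition produces no carries. This sketch only illustrates that arithmetic fact; `orMatchesAdd` is a hypothetical helper, and the Reassociate pass's actual legality check is not shown.

```cpp
#include <cstdint>

// Returns true if `(X << K) | Y` and `(X << K) + Y` agree.
// Intended for K < 32 and Y < 2^K, where the two operands have no set bits
// in common; outside that range the results may differ.
bool orMatchesAdd(uint32_t X, uint32_t Y, unsigned K) {
  uint32_t Shifted = X << K; // the low K bits are zero
  return (Shifted | Y) == (Shifted + Y);
}
```

With overlapping bits the equivalence fails: `(1 << 1) | 3` is 3, while `(1 << 1) + 3` is 5.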
Florian Hahn	853c52c988	[ConstraintElimination] Require GEPs to be inbounds for decomposition. During decomposition, we assume the no-wrap properties of inbounds GEPs. Make sure the decomposed GEP is actually inbounds.	2021-02-07 11:08:53 +00:00
Johannes Doerfert	b7d870eae7	[AssumptionCache] Avoid dangling llvm.assume calls in the cache PR49043 exposed a problem when it comes to RAUW of llvm.assumes. While D96106 would fix it for GVNSink, it seems to be a more general concern. To avoid future problems, this patch moves away from the vector-of-weak-references model used in the assumption cache. Instead, we track the llvm.assume calls with a callback handle which will remove itself from the cache if the call is deleted. Fixes PR49043. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96168	2021-02-06 12:18:39 -06:00
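The callback-handle model can be sketched in plain C++, loosely in the spirit of LLVM's `CallbackVH`. `TrackedAssume` and `AssumeCacheSketch` are hypothetical stand-ins: each registered object carries a callback that erases it from the cache when it is destroyed, so the cache cannot keep a dangling entry. In this simplified version the cache must outlive the objects it tracks.

```cpp
#include <cstddef>
#include <functional>
#include <set>

// Hypothetical stand-in for an llvm.assume call: it invokes a callback when
// it is destroyed.
class TrackedAssume {
public:
  TrackedAssume() = default;
  TrackedAssume(const TrackedAssume &) = delete; // identity (address) matters
  TrackedAssume &operator=(const TrackedAssume &) = delete;
  ~TrackedAssume() {
    if (OnDelete)
      OnDelete(this);
  }
  std::function<void(TrackedAssume *)> OnDelete;
};

// Hypothetical cache: each registered call erases itself when deleted.
// The cache must outlive every TrackedAssume registered with it.
class AssumeCacheSketch {
public:
  void registerAssume(TrackedAssume &A) {
    Assumes.insert(&A);
    A.OnDelete = [this](TrackedAssume *Dead) { Assumes.erase(Dead); };
  }
  std::size_t size() const { return Assumes.size(); }

private:
  std::set<TrackedAssume *> Assumes;
};
```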
Teresa Johnson	3a27933ec2	[LTT] Don't attempt to lower type tests used only by assumes Type tests used only by assumes were originally for devirtualization, but are meant to be kept through the first invocation of LTT so that they can be used for additional optimization. In the regular LTO case, where the IR is analyzed, we may find a resolution for the type test and end up rewriting the associated vtable global, which can have implications on section splitting. Simply ignore these type tests. Fixes PR48245. Differential Revision: https://reviews.llvm.org/D96083	2021-02-06 09:02:10 -08:00
Sander de Smalen	79a6cfc29e	NFC: Migrate LoopIdiomRecognize to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html	2021-02-06 14:39:19 +00:00
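Why an invalid cost state can "propagate naturally" is easy to show with a hypothetical, simplified cost type. `Cost` below is not `InstructionCost`'s real interface; it only models a value plus a validity flag, where any sum involving an invalid operand is itself invalid.

```cpp
#include <cstdint>

// Hypothetical cost type in the spirit of InstructionCost (not its actual
// interface): a value plus a validity flag.
class Cost {
public:
  Cost(int64_t V = 0) : Value(V) {}
  static Cost invalid() {
    Cost C;
    C.Valid = false;
    return C;
  }
  bool isValid() const { return Valid; }
  int64_t value() const { return Value; } // meaningful only if isValid()
  Cost &operator+=(const Cost &RHS) {
    Valid = Valid && RHS.Valid; // one invalid operand poisons the sum
    Value += RHS.Value;         // overflow handling omitted in this sketch
    return *this;
  }

private:
  int64_t Value;
  bool Valid = true;
};

inline Cost operator+(Cost LHS, const Cost &RHS) { return LHS += RHS; }
```

A loop that accumulates per-instruction costs into a `Cost` total therefore reports "cannot be costed" without any extra checks at each step.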
Sander de Smalen	ae27274b2f	NFC: Migrate LoopFlatten to work on InstructionCost. This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96029	2021-02-06 11:47:04 +00:00
Kazu Hirata	ea3175c15b	[Transforms/Instrumentation] Use range-based for loops (NFC)	2021-02-05 21:02:08 -08:00
Sidharth Baveja	22ebbc4765	[LoopUnrollAndJam] Only allow loops with single exit(ing) blocks Summary: This resolves an issue posted on Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=48764 In this issue, the loop had multiple exit blocks, which caused getExitBlock to return a nullptr and triggered the assert. This patch ensures that only loops with a single exit block are allowed to be unrolled and jammed. Reviewed By: Whitney, Meinersbur, dmgreen Differential Revision: https://reviews.llvm.org/D95806	2021-02-05 16:10:53 +00:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Arnold Schwaighofer	8a7f5ad0fd	We can only move static allocas into the resume entry points Dynamic allocas that still exist have been verified to be used only 'locally', not across a suspend point. rdar://73903220 Differential Revision: https://reviews.llvm.org/D96071	2021-02-05 06:06:10 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. 
It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Adrian Kuegel	7fe41ac3df	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `3e5ce49e53`. Tests started failing on PPC, for example: http://lab.llvm.org:8011/#/builders/105/builds/5569	2021-02-05 12:51:03 +01:00
Simon Pilgrim	f7d07dbb29	IROutliner.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:38:09 +00:00
Simon Pilgrim	476b912e7c	SampleProfile.cpp - fix Wdocumentation warning. NFCI. Remove duplicate param	2021-02-05 11:31:17 +00:00
Simon Pilgrim	89edda7084	IROutliner.cpp - fix Wdocumentation warnings. NFCI.	2021-02-05 11:21:00 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing, many backends end up getting it wrong. Instead of trying to work with those two values separately, this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. For most backends this looks like it will be an improvement (or they were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue, and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Kazu Hirata	fb74e1e78a	[Transforms/Scalar] Use range-based for loops (NFC)	2021-02-04 21:18:05 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Philip Reames	3e5ce49e53	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way. Differential Revision: https://reviews.llvm.org/D94892	2021-02-04 17:28:30 -08:00
Richard Smith	ab243efb26	Don't infer attributes on '::operator new'. These attributes were all incorrect or inappropriate for LLVM to infer: - inaccessiblememonly is generally wrong; user replacement operator new can access memory that's visible to the caller, as can a new_handler function. - willreturn is generally wrong; a custom new_handler is not guaranteed to terminate. - noalias is inappropriate: Clang has a flag to determine whether this attribute should be present and adds it itself when appropriate. - noundef and nonnull on the return value should be specified by the frontend on all 'operator new' functions if we want them, not here. In any case, inferring attributes on functions declared 'nobuiltin' (as these are when Clang emits them) seems questionable.	2021-02-04 13:59:49 -08:00
Richard Smith	1484ad4137	Revert "[BuildLibcalls, Attrs] Support more variants of C++'s new, add attributes for C++'s delete" Several of the new attributes here were incorrect, and even the ones that are generally correct were being added even to nobuiltin calls. This reverts commit `bb3f169b59`.	2021-02-04 13:59:49 -08:00
Sander de Smalen	75b2555d6e	NFC: Migrate LoopUnrollPass to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D95817	2021-02-04 14:05:40 +00:00
Florian Hahn	703f6a6828	[ConstraintElimination] Support conditions from loop preheaders This patch extends the condition collection logic to allow adding conditions from pre-headers to loop headers, by allowing cases where the target block dominates some of its predecessors.	2021-02-04 13:58:32 +00:00
Chuanqi Xu	9511fa2dda	[NFC][Coroutine] Remove redundant comment The functionality in the TODO was added before: https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b	2021-02-04 12:54:30 +08:00
Kazu Hirata	be37475897	[Transforms/IPO] Use range-based for loops (NFC)	2021-02-03 20:41:20 -08:00
Nico Weber	b995314143	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `97ba5cde52`. Still breaks tests: https://reviews.llvm.org/D76802#2540647	2021-02-03 19:14:34 -05:00
Arthur Eubanks	f020544601	[NewPM][HelloWorld] Move HelloWorld to Utils To prevent creating a new component, which creates a new library. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D95907	2021-02-03 12:59:40 -08:00
Rong Xu	b8f13db5b7	[SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker This patch detaches SampleProfileLoader from class SampleCoverageTracker. We plan to move SampleProfileLoader to a template class; SampleCoverageTracker will remain a class. Also make callsiteIsHot() a file-static function. Differential Revision: https://reviews.llvm.org/D95823	2021-02-03 11:38:04 -08:00
Florian Hahn	daaa0e3501	[VPlan] Manage induction value creation using VPValues. This patch updates the induction value creation to use VPValues of recipes to map the created values. This should bring us one step closer to being able to optimize induction recipes directly in VPlan. Currently widenIntOrFpInduction also generates vector values for a cast of the induction, if it exists. Make this explicit by adding the cast instruction to the values defined by the recipe. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92284	2021-02-03 17:45:03 +00:00
David Sherwood	d4626eb0bd	[VPlan][NFC] Introduce constructors for VPIteration This patch adds constructors to VPIteration as a cleaner way of initialising the struct and replaces existing constructions of the form: {Part, Lane} with VPIteration(Part, Lane) I have also added a default constructor, which is used by VPlan.cpp when deciding whether to replicate a block or not. This refactoring will be required in a later patch that adds more members and functions to VPIteration. Differential Revision: https://reviews.llvm.org/D95676	2021-02-03 08:52:27 +00:00
Petr Hosek	97ba5cde52	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to the SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows the linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF; it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-02 23:19:51 -08:00
Kazu Hirata	dc3d5453bc	[Transforms/Utils] Use range-based for loops (NFC)	2021-02-02 22:52:47 -08:00
Florian Hahn	d8e90716df	[ConstraintElimination] Skip pointer casts. We should be able to look through pointer casts that do not impact the value.	2021-02-02 21:25:29 +00:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code maintain the branch frequency information (BFI) well, since probe distribution factors are calculated from it. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pseudo probe will carry a factor with a value ranging from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callee's probes prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, indirect call promotion results in multiple call sites. The original samples should be distributed across them.
This is fixed by adjusting the call sites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
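Prorating a duplicated call site's count by a 7-bit distribution factor can be sketched as below. The encoding is an assumption for illustration only (the factor stored as a percentage, with 100 standing for 1.0, which fits in 7 bits); `encodeFactor7Bit` and `prorate` are hypothetical helpers, not the actual pseudo-probe or discriminator layout.

```cpp
#include <cmath>
#include <cstdint>

// Assumed encoding for this sketch: 100 represents a factor of 1.0, which
// fits in a 7-bit field (maximum 127).
constexpr uint32_t FullFactor7Bit = 100;

// Clamps a factor to [0.0, 1.0] and quantizes it to the 7-bit scale.
uint32_t encodeFactor7Bit(double Factor) {
  if (Factor < 0.0)
    Factor = 0.0;
  if (Factor > 1.0)
    Factor = 1.0;
  return static_cast<uint32_t>(std::lround(Factor * FullFactor7Bit));
}

// Scales a raw sample count by an encoded distribution factor.
uint64_t prorate(uint64_t Count, uint32_t EncodedFactor) {
  return Count * EncodedFactor / FullFactor7Bit;
}
```

If a call site with 1000 samples is duplicated into copies with factors 0.3 and 0.7, the prorated counts are 300 and 700, so the total is not double-counted.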
Fangrui Song	51da12680f	[ConstraintElimination] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off build	2021-02-02 10:23:14 -08:00
Jeroen Dobbelaere	50c523a9d4	[InlineFunction] Only update noalias scopes once for an instruction. Inlining sometimes maps different instructions to be inlined onto the same instruction. We must ensure that the noalias scopes are only remapped once. Otherwise the scope might disappear (at best). This patch ensures that we only replace scopes for which the mapping is known. This approach is preferred over tracking which instructions we already handled in a SmallPtrSet, as that one will need more memory. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95862	2021-02-02 17:57:10 +01:00
Florian Hahn	3e09bc2500	[ConstraintElimination] Add nicer way to dump constraints (NFC). Use ConstraintSystem::dump(Names) to display the result of decomposing a condition.	2021-02-02 16:36:45 +00:00
Wenlei He	1645f465be	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. This is a resubmit of D95024, with the build break and an over-tightened assertion fixed. Test Plan:	2021-02-02 07:55:08 -08:00
Roman Lebedev	485c4b552b	[InstCombine] Hoist inversion out of ashr's value operand (PR48995) This is yet another hint that we will eventually need InstCombineInverter, which would consistently sink inversions, but for that we'll need to consistently hoist inversions where possible, so let's do that here. Example of a proof: https://alive2.llvm.org/ce/z/78SbDq See https://bugs.llvm.org/show_bug.cgi?id=48995	2021-02-02 17:56:43 +03:00
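The identity being exploited, `ashr(~x, y) == ~ashr(x, y)`, can be checked exhaustively for 8-bit values. This brute-force check is only an illustration, not the Alive2 proof referenced in the commit; it relies on `>>` of a negative `int` being an arithmetic shift, which mainstream compilers implement and C++20 guarantees. `ashr8` and `inversionHoistsOutOfAshr` are hypothetical helpers.

```cpp
#include <cstdint>

// Arithmetic shift right on 8-bit values: promotion to int sign-extends,
// and >> on a negative int shifts in copies of the sign bit.
int8_t ashr8(int8_t X, unsigned Sh) {
  return static_cast<int8_t>(static_cast<int>(X) >> Sh);
}

// Checks ashr(~X, Sh) == ~ashr(X, Sh) for every 8-bit X and shift amount.
bool inversionHoistsOutOfAshr() {
  for (int V = -128; V <= 127; ++V)
    for (unsigned Sh = 0; Sh < 8; ++Sh) {
      int8_t X = static_cast<int8_t>(V);
      int8_t NotX = static_cast<int8_t>(~X);
      if (ashr8(NotX, Sh) != static_cast<int8_t>(~ashr8(X, Sh)))
        return false;
    }
  return true;
}
```

For example, `~5` is -6 and `-6 >> 1` is -3, while `5 >> 1` is 2 and `~2` is also -3.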
Tom Weaver	4f1320b77d	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `df3e39f60b`. introduced failing test instrprof-gc-sections.c causing build bot to fail: http://lab.llvm.org:8011/#/builders/53/builds/1184	2021-02-02 14:19:31 +00:00
Sander de Smalen	3d3ca8f8eb	NFC: Migrate SpeculateAroundPHIs to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: ctetreau Differential Revision: https://reviews.llvm.org/D95353	2021-02-02 13:32:45 +00:00
Sander de Smalen	00da322788	NFC: Migrate SimpleLoopUnswitch to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95352	2021-02-02 13:32:44 +00:00
Adrian Kuegel	48ca6da9d2	Revert "[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline" This reverts commit `9a03058d63`.	2021-02-02 11:51:04 +01:00
Adrian Kuegel	3a65ec4bf9	Revert "Fix build break from D95024" This reverts commit `09cd849fde`.	2021-02-02 11:51:04 +01:00
David Sherwood	d4d4ceeb8f	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
Wenlei He	09cd849fde	Fix build break from D95024	2021-02-02 01:01:12 -08:00
Wenlei He	9a03058d63	[CSSPGO] Factor out common part for CSSPGO inline and AFDO inline Refactoring SampleProfileLoader::inlineHotFunctions to use helpers from CSSPGO inlining and reduce similar code in the inlining loop, plus minor cleanup for AFDO path. Test Plan: Differential Revision: https://reviews.llvm.org/D95024	2021-02-02 00:34:06 -08:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitivity within a block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO.
New FDO Inliner Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with a large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of the same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner shows ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too.
The measurement was done using a train-train setup, MonoLTO w/ the new pass manager, and pseudo-probes. Note that this is just a starting point: we hope that the new inliner will open up more opportunities with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
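The prioritized BFS strategy described above can be sketched as a priority queue keyed on call site count, with inline-tree depth as a tie-breaker and a running size budget. This is a minimal C++ model under stated assumptions, not the SampleProfileLoader code: `CallSite`, `prioritizedInline`, the `childrenOf` callback, and the budget/threshold parameters are hypothetical stand-ins for the knobs the commit mentions.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical call site record; not LLVM's CallBase or any real API.
struct CallSite {
  std::string callee;   // callee name
  uint64_t count;       // call site count, used as the hotness signal
  unsigned calleeSize;  // estimated size added if this site is inlined
  unsigned depth;       // depth in the inline tree, used for BFS tie-breaking
};

// std::priority_queue pops the "largest" element, so "less" here means
// colder, or equally hot but deeper in the tree.
struct ColderOrDeeper {
  bool operator()(const CallSite &a, const CallSite &b) const {
    if (a.count != b.count) return a.count < b.count;
    return a.depth > b.depth;
  }
};

// Inline the hottest candidates first until the size budget is used up.
// childrenOf(cs) returns the call sites exposed by inlining cs.
template <typename ChildrenFn>
std::vector<std::string> prioritizedInline(std::vector<CallSite> roots,
                                           unsigned sizeBudget,
                                           uint64_t hotThreshold,
                                           ChildrenFn childrenOf) {
  std::priority_queue<CallSite, std::vector<CallSite>, ColderOrDeeper> queue(
      ColderOrDeeper{}, std::move(roots));
  std::vector<std::string> inlined;
  unsigned growth = 0;
  while (!queue.empty()) {
    CallSite cs = queue.top();
    queue.pop();
    if (cs.count < hotThreshold)
      break;  // every remaining site is at most this hot
    if (growth + cs.calleeSize > sizeBudget)
      continue;  // over budget; a smaller, colder site may still fit
    growth += cs.calleeSize;
    inlined.push_back(cs.callee);
    // Call sites exposed by inlining join the same queue one level deeper.
    for (CallSite child : childrenOf(cs)) {
      child.depth = cs.depth + 1;
      queue.push(child);
    }
  }
  return inlined;
}
```

Usage: with roots `A` (count 100, size 10) and `B` (count 50, size 30), where inlining `A` exposes `C` (count 80, size 10), a budget of 25 and a hot threshold of 10 inline `A` and then `C`, and skip `B`. Skipping an over-budget candidate instead of stopping outright is a choice made for this sketch; the commit message does not say which the real inliner does.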
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
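The identity behind the rewrite, and why it removes remainder loops, can be checked with plain integers. In this C++ sketch `rewriteDivisible` and `needsRemainderLoop` are hypothetical helpers, not SCEV or LoopUnroll APIs; the sketch assumes the only fact available is a guard proving that the trip count is a multiple of some constant.

```cpp
#include <cassert>
#include <cstdint>

// Under a guard proving v % d == 0, v can be replaced by (v / d) * d
// without changing its value; the multiplied form exposes the factor d.
uint64_t rewriteDivisible(uint64_t v, uint64_t d) {
  assert(d != 0 && v % d == 0 && "guard must prove divisibility");
  return (v / d) * d;
}

// Hypothetical helper: if the trip count is known to be k * knownDivisor,
// a loop processed `factor` iterations at a time provably needs no
// remainder loop when knownDivisor is a multiple of factor. Otherwise we
// conservatively keep the remainder loop.
bool needsRemainderLoop(uint64_t knownDivisor, uint64_t factor) {
  return knownDivisor % factor != 0;
}
```

For example, a guard `n % 8 == 0` lets an unroll or vectorization factor of 4 or 8 run without a remainder loop, while a factor of 16 still needs one, since n = 8 is possible.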
Petr Hosek	df3e39f60b	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to the SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows the linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF; it has no effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Hongtao Yu	224fee8219	[CSSPGO] Tweaking inlining with pseudo probes. Fixing up a couple of places where `getCallSiteIdentifier` is needed to support pseudo-probe-based callsites. Also fixing an issue in the extbinary profile reader where the metadata section is not fully scanned based on the number of profiles loaded only for the current module. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95791	2021-02-01 13:56:40 -08:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Florian Hahn	0b28d756af	[ConstraintElimination] Add support for EQ predicates. A == B maps to A >= B && A <= B (https://alive2.llvm.org/ce/z/_dwxKn). This extends the constraint construction to return a list of constraints, which can be used to properly decompose nested AND & OR.	2021-02-01 20:48:31 +00:00
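To illustrate why EQ forces the construction to return a list: each other signed predicate becomes one linear constraint of the form `ca*A + cb*B <= c`, but `A == B` needs two. The C++ sketch below is a simplified, hypothetical model of that decomposition (two variables, integer semantics); `Pred`, `Constraint`, `decompose`, and `satisfies` are illustrative names, not ConstraintElimination's actual data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Pred { EQ, SLE, SLT, SGE, SGT };

// One linear constraint: ca*A + cb*B <= c.
struct Constraint {
  int64_t ca, cb, c;
};

// Decompose "A pred B" into a conjunction of <= constraints.
std::vector<Constraint> decompose(Pred p) {
  switch (p) {
  case Pred::SLE: return {{1, -1, 0}};   // A - B <= 0
  case Pred::SLT: return {{1, -1, -1}};  // A - B <= -1 (integers)
  case Pred::SGE: return {{-1, 1, 0}};   // B - A <= 0
  case Pred::SGT: return {{-1, 1, -1}};  // B - A <= -1 (integers)
  case Pred::EQ:  return {{1, -1, 0}, {-1, 1, 0}};  // both of the above
  }
  return {};
}

// True if (a, b) satisfies every constraint in the conjunction.
bool satisfies(const std::vector<Constraint> &cs, int64_t a, int64_t b) {
  for (const Constraint &k : cs)
    if (k.ca * a + k.cb * b > k.c) return false;
  return true;
}
```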
Michael Holman	8bfef78722	[ConstantHoisting] Fix bug where constant materialization could insert into EH pad If the incoming block to a phi node is an EH pad, then we will materialize into an EH pad, which is not supposed to happen. To fix this, I added a check to see if the incoming block of a phi node is an EH pad before using it as the insertion point. Differential Revision: https://reviews.llvm.org/D95019	2021-02-01 11:23:56 -08:00
Sanjay Patel	0ce2920f17	[InstCombine] try to narrow min/max intrinsics with constant operand The constant trunc/ext may not be the optimal pre-condition, but I think that handles the common cases. Example of Alive2 proof: https://alive2.llvm.org/ce/z/sREeLC This is another step towards canonicalizing to the intrinsics. Narrowing was identified as a source of potential regression for abs(), so we need to handle this for min/max - see: https://llvm.org/PR48816 If this is not enough, we could process intrinsics in the trunc-driven matching in canEvaluateTruncated().	2021-02-01 13:44:13 -05:00
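A quick way to see why the transform is gated on the constant surviving trunc/ext: the C++ sketch below exhaustively compares `smax(sext x, C)` with `sext(smax(x, trunc C))` for every i8 `x`, modeling sext with signed integer conversions. It covers only the smax/sext case, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Precondition in the spirit of the commit: the wide constant must survive
// a trunc to i8 followed by a sext back to i32.
bool constantFitsI8(int32_t c) {
  return static_cast<int32_t>(static_cast<int8_t>(c)) == c;
}

// Compare smax(sext x, C) with sext(smax(x, trunc C)) over all i8 values x.
bool smaxNarrowingHolds(int32_t c) {
  for (int i = -128; i <= 127; ++i) {
    int8_t x = static_cast<int8_t>(i);
    int32_t wide = std::max(static_cast<int32_t>(x), c);  // i32 smax
    int8_t narrow = std::max(x, static_cast<int8_t>(c));  // i8 smax
    if (wide != static_cast<int32_t>(narrow)) return false;
  }
  return true;
}
```

For `C = 200`, which does not fit in i8, `x = 0` already breaks the equivalence: the wide form yields 200 while the narrowed form yields 0.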
Florian Hahn	ce190e4144	[ConstraintElimination] Negate IR condition directly. Instead of using ConstraintSystem::negate when adding new constraints, flip the condition in IR. The main advantage is that EQ predicates can be represented by 2 constraints, which makes negating based on the constraint tricky. The IR condition can easily be negated.	2021-02-01 17:21:40 +00:00
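Why flipping the IR predicate is simpler: negating a single integer constraint `A - B <= c` gives `A - B >= c + 1`, which is again one constraint, but negating the EQ pair `A - B <= 0 && B - A <= 0` gives a disjunction, which a conjunctive constraint system cannot hold directly. Inverting the predicate first (EQ becomes NE, SLT becomes SGE, and so on) sidesteps this. The C++ sketch below uses a hypothetical `Pred` enum; in LLVM itself the corresponding query is `CmpInst::getInversePredicate`. How NE is then handled is outside the scope of this sketch.

```cpp
#include <cassert>

enum class Pred { EQ, NE, SLT, SLE, SGT, SGE };

// Inverse predicate: !(A p B) <=> (A inverse(p) B).
Pred inverse(Pred p) {
  switch (p) {
  case Pred::EQ:  return Pred::NE;
  case Pred::NE:  return Pred::EQ;
  case Pred::SLT: return Pred::SGE;
  case Pred::SGE: return Pred::SLT;
  case Pred::SLE: return Pred::SGT;
  case Pred::SGT: return Pred::SLE;
  }
  return p;
}

// Evaluate "a p b" on signed integers.
bool eval(Pred p, int a, int b) {
  switch (p) {
  case Pred::EQ:  return a == b;
  case Pred::NE:  return a != b;
  case Pred::SLT: return a < b;
  case Pred::SLE: return a <= b;
  case Pred::SGT: return a > b;
  case Pred::SGE: return a >= b;
  }
  return false;
}
```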
Sander de Smalen	bf294953e7	NFC: Migrate SimplifyCFG to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D95351	2021-02-01 16:14:05 +00:00
Sander de Smalen	880b64aa22	[SimplifyCFG] NFC: Rename static methods to clang-tidy standards. This patch is a precursor to D95351, which changes the signature of these methods.	2021-02-01 16:14:05 +00:00
Cullen Rhodes	8cda227432	[LV] Fix crash when computing max VF too early D90687 introduced a crash: llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed. when compiling the following C code: `typedef struct { char a; } b; b *c; int d, e; int f() { int g = 0; for (; d; d++) { e = 0; for (; e < c[d].a; e++) g++; } return g; }` with: `clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -` This occurred because, prior to D90687, computeFeasibleMaxVF was only called in computeMaxVF when a scalar epilogue was allowed, but now it's always called. This triggers the assert above, since computeFeasibleMaxVF collects all viable VFs larger than the default MaxVF and calculates the register usage for each one, which performs exactly the analysis the assert guards against. This can occur in computeFeasibleMaxVF if TTI.shouldMaximizeVectorBandwidth returns true, and the Hexagon backend implements this target hook to always return true. Reported by @iajbar. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94869	2021-02-01 12:14:59 +00:00
Sander de Smalen	3b8a1d581e	NFC: Migrate SpeculativeExecution to work on InstructionCost This patch migrates cost values and arithmetic to work on InstructionCost. When the interfaces to TargetTransformInfo are changed, any InstructionCost state will propagate naturally. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D95356	2021-02-01 12:13:23 +00:00
Florian Hahn	a9583a1923	[LoopUnswitch] Pacify compiler warnings. Attempt to fix some compiler warnings on some bots after `b8c81fa5c7`.	2021-02-01 09:13:06 +00:00
Florian Hahn	b8c81fa5c7	[LoopUnswitch] Add shortcut if unswitched path is a no-op. If we determine that the invariant path through the loop has no effects, we can directly branch to the exit block, instead of unswitching first. Besides avoiding some extra work (unswitching first, then deleting the loop again), this allows us to be more aggressive than regular unswitching with respect to cost-modeling. This approach should always be desirable. This is similar in spirit to D93734, except that it uses the previously added checks for loop-unswitching. I tried to add the required no-op checks from scratch, as we only check a subset of the loop. There is potential to unify the checks with LoopDeletion, at the cost of adding a predicate whether a block should be considered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95468	2021-02-01 09:03:30 +00:00
Jeroen Dobbelaere	80cdd30eb9	[LoopPeel] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed. The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95544	2021-02-01 10:01:17 +01:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Florian Hahn	39486753d5	[ConstraintElimination] Verify CS and DFSInStack are in sync.(NFC) After the main loop is done, we should have one constraint per item in DFSInStack. Otherwise we added a constraint without a proper DFSInStack item.	2021-01-30 18:27:04 +00:00
Florian Hahn	10c57268c0	[LoopUnswitch] Properly update MSSA if header has non-clobbering stores. This patch fixes updating MemorySSA if the header contains memory defs that do not clobber a duplicated instruction. We need to find the first defining access outside the loop body and use that as defining access of the duplicated instruction. This fixes a crash caused by `bee486851c`.	2021-01-30 13:51:05 +00:00
Kazu Hirata	8ed1636184	[llvm] Use isa instead of dyn_cast (NFC)	2021-01-29 23:23:37 -08:00
Roman Lebedev	c2534a7097	[ShadowStackGCLowering] Preserve Dominator Tree, if available This doesn't help avoid any Dominator Tree recalculations just yet; there's one more pass to go.	2021-01-30 01:14:51 +03:00
Roman Lebedev	a78d8feb48	[LowerConstantIntrinsics] Preserve Dominator Tree, if available	2021-01-30 01:14:50 +03:00
Sriraman Tallam	9a81a4ef79	Emit metadata when instr. profiles hash mismatch occurs. This patch emits "instr_prof_hash_mismatch" function annotation metadata if there is a hash mismatch while applying instrumented profiles. During the PGO optimized build using instrumented profiles, if the CFG of the function has changed since generating the profile, a hash mismatch is encountered. This patch emits this information as annotation metadata. We plan to use this with Propeller which is done at the machine IR level. Propeller is usually applied on top of PGO and a hash mismatch during PGO could be used to detect source drift. Differential Revision: https://reviews.llvm.org/D95495	2021-01-29 12:56:01 -08:00
Florian Hahn	f3a710cade	[LTO] Update splitCodeGen to take a reference to the module. (NFC) splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is a ~4-year-old FIXME to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222	2021-01-29 11:53:11 +00:00
Yang Fan	59bd2068e9	[NFC][ScalarizeMaskedMemIntrin] Fix unused variable warning GCC warning:
```
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedStore(llvm::CallInst*, llvm::DomTreeUpdater*, bool&)’:
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:295:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable]
  295 |   BasicBlock *IfBlock = CI->getParent();
      |               ^~~~~~~
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp: In function ‘void scalarizeMaskedScatter(llvm::CallInst*, llvm::DomTreeUpdater*, bool&)’:
/llvm-project/llvm/lib/Transforms/Scalar/ScalarizeMaskedMemIntrin.cpp:555:15: warning: variable ‘IfBlock’ set but not used [-Wunused-but-set-variable]
  555 |   BasicBlock *IfBlock = CI->getParent();
      |               ^~~~~~~
```	2021-01-29 15:15:58 +08:00
Roman Lebedev	056385921d	[ScalarizeMaskedMemIntrin] Preserve Dominator Tree, if available This de-pessimizes the arguably more usual case of no masked mem intrinsics, and gets rid of one more Dominator Tree recalculation. As per llvm/test/CodeGen/X86/opt-pipeline.ll, there's one more Dominator Tree recalculation left that we could get rid of.	2021-01-29 01:11:36 +03:00
Roman Lebedev	577fdcaa93	[PartiallyInlineLibCalls] Preserve Dominator Tree, if available This doesn't get rid of any Dominator Tree recalculations just yet; there is one more pass to update.	2021-01-29 01:11:36 +03:00
Roman Lebedev	573f74117b	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedCompressStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	2e4bb3f119	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedExpandLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	e8efc03a1e	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedScatter(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:35 +03:00
Roman Lebedev	1356399a11	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedGather(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	22b8421156	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedStore(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	0ea45a412a	[NFC][ScalarizeMaskedMemIntrin] scalarizeMaskedLoad(): port to SplitBlockAndInsertIfThen() Makes Dominator Tree preservation in a followup patch somewhat easier.	2021-01-29 01:11:34 +03:00
Roman Lebedev	394685481c	[NFC][PartiallyInlineLibCalls] Port to SplitBlockAndInsertIfThen() This makes the follow-up patch for Dominator Tree preservation somewhat more straightforward.	2021-01-29 01:11:33 +03:00
Roman Lebedev	2de2d84ed0	[NFC][EntryExitInstrumenter] Mark Dominator Tree as preserved in legacy-PM too This is correctly handled in new-PM wrappers, but not in old-PM.	2021-01-29 01:11:33 +03:00
Adrian Prantl	62140d943c	Better document the limitations of coro::salvageDebugInfo() and fix a few edge cases that show up in the Swift compiler but weren't caught by the existing tests. Most notably the old code wasn't salvaging load operations correctly. The patch also gets rid of the LoadFromFramePtr argument and replaces it with a more generalized mechanism.	2021-01-28 09:53:19 -08:00
Roman Lebedev	8cfa963463	[SimplifyCFG] If provided, preserve Dominator Tree SimplifyCFG is a utility pass, and the fact that it does not preserve DomTrees forces its users to somehow work around that, likely by not preserving DomTrees themselves. Indeed, the simplifycfg pass didn't know how to preserve the dominator tree; it took me just under a month (starting with `e113317958`) to rectify that. Now it fully knows how to. There are likely still some problems with that, but I've dealt with everything I can spot so far. I think we can now flip the switch. Note that this is functionally an NFC change, since this doesn't change the users to pass in the DomTree; that is a separate question. Reviewed By: kuhar, nikic Differential Revision: https://reviews.llvm.org/D94827	2021-01-28 14:11:34 +03:00
Yang Fan	8644eb024b	[NFC][Transforms][Coroutines] Remove unused variable	2021-01-28 16:42:30 +08:00
Kazu Hirata	0da15ea581	[llvm] Use append_range (NFC)	2021-01-27 23:25:41 -08:00
Hongtao Yu	7e99bddfea	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the extended binary format. Existing sample profile reader/writer/merger code is being tweaked to handle bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have another good way to tell calling contexts apart from regular function names, since the context delimiter `@` can also appear as part of C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
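A minimal C++ sketch of why the brackets help: only a bracketed string is parsed as a context, so a plain function name is never split at an `@`. The helpers are hypothetical, and the ` @ ` frame delimiter (with surrounding spaces) and the `name:callsite` frame syntax are assumptions based on the CSSPGO text format, not something this commit specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A bracketed string is treated as a calling context; anything else is a
// plain function name, even if it happens to contain '@'.
bool isContext(const std::string &s) {
  return s.size() >= 2 && s.front() == '[' && s.back() == ']';
}

// Split "[a:1 @ b:2 @ c]" into {"a:1", "b:2", "c"} (assumed delimiter " @ ").
std::vector<std::string> splitContext(const std::string &s) {
  assert(isContext(s));
  const std::string body = s.substr(1, s.size() - 2);
  const std::string delim = " @ ";
  std::vector<std::string> frames;
  std::size_t start = 0;
  std::size_t pos;
  while ((pos = body.find(delim, start)) != std::string::npos) {
    frames.push_back(body.substr(start, pos - start));
    start = pos + delim.size();
  }
  frames.push_back(body.substr(start));
  return frames;
}
```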
Teresa Johnson	1487747e99	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Fangrui Song	54fb3ca96e	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variables get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules, or (Mach-O) the visibility of the prevailing definition. This patch adds visibility bits to GlobalValueSummary::GVFlags, and computes the result visibility and propagates it to all definitions. Protected/hidden can imply dso_local, which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared, while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF, if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
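The ELF merge rule (take the most constraining visibility across all modules) fits in a few lines of C++. This is a hypothetical model: the `Vis` enum and its ordering are chosen for the sketch (LLVM's own visibility enum need not be ordered this way), and the Mach-O rule (use the prevailing definition's visibility) is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Ordered from least to most constraining, per ELF semantics.
enum class Vis { Default = 0, Protected = 1, Hidden = 2 };

// ELF: the result is the most constraining visibility among all
// definitions; with no definitions, fall back to default visibility.
Vis mergeELF(const std::vector<Vis> &defs) {
  Vis result = Vis::Default;
  for (Vis v : defs) result = std::max(result, v);
  return result;
}
```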
Florian Hahn	28410d17f5	[LoopUtils] Pass SCEVExpander instead of SE to addRuntimeChecks. This gives the user control over which expander to use, which in turn allows the user to decide what to do with the expanded instructions. Used in D75980. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94295	2021-01-27 17:36:19 +00:00
Petr Hosek	bb9eb19829	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 17:13:34 -08:00
Adrian Prantl	0554541b44	Salvage debug info for function arguments in coro-split funclets. This patch improves the availability of variables stored in the coroutine frame by emitting an alloca to hold the pointer to the frame object and rewriting dbg.declare intrinsics to point inside the frame object using salvaged DIExpressions. Finally, a new alloca is created in the funclet to hold the FramePtr pointer to ensure that it is available throughout the entire function at -O0. This patch also effectively reverts D90772. The testcase updates highlight nicely how every removed CHECK for a dbg.value is preceded by a new CHECK for a dbg.declare. Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews! Differential Revision: https://reviews.llvm.org/D93497 rdar://71866936	2021-01-26 15:01:26 -08:00
Bjorn Pettersson	a9bd3d37bd	[NewPM] Add ExtraVectorizerPasses support As it looks like the NewPM generally uses SimpleLoopUnswitch instead of LoopUnswitch, this patch also uses SimpleLoopUnswitch in the ExtraVectorizerPasses sequence (whereas the LegacyPM uses the LoopUnswitch pass). Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D95457	2021-01-26 22:59:10 +01:00
Valery N Dmitriev	716b9dd0d8	[InstCombine] Preserve FMF for powi simplifications. Differential Revision: https://reviews.llvm.org/D95455	2021-01-26 13:26:06 -08:00
Petr Hosek	1e634f3952	Revert "Support for instrumenting only selected files or functions" This reverts commit `4edf35f11a` because the test fails on Windows bots.	2021-01-26 12:25:28 -08:00
Petr Hosek	4edf35f11a	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 11:11:39 -08:00
Sanjay Patel	09b1c56366	[LoopUtils] do not initialize Cmp predicate unnecessarily; NFC The switch must set the predicate correctly; anything else should lead to unreachable/assert. I'm trying to fix FMF propagation here and the callers, so this is a preliminary cleanup.	2021-01-26 11:22:51 -05:00
Florian Hahn	1272f16d14	[LoopUnswitch] Avoid partially unswitching too aggressively. This patch adds additional checks to avoid partial unswitching in cases where it won't be profitable, e.g. because the path directly exits the loop anyway.	2021-01-26 15:18:41 +00:00
Florian Hahn	35b3989a30	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example:
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsVectorized
Program                                         base    patch     diff
test-suite...ve-susan/automotive-susan.test     8.00     9.00    12.5%
test-suite...nal/skidmarks10/skidmarks.test    35.00    31.00   -11.4%
test-suite...lications/sqlite3/sqlite3.test    41.00    43.00     4.9%
test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test    25.00    26.00     4.0%
test-suite...006/450.soplex/450.soplex.test    88.00    89.00     1.1%
test-suite...TimberWolfMC/timberwolfmc.test   120.00   119.00    -0.8%
test-suite.../CINT2006/403.gcc/403.gcc.test   215.00   216.00     0.5%
test-suite...006/447.dealII/447.dealII.test   957.00   958.00     0.1%
test-suite...ternal/HMMER/hmmcalibrate.test    75.00    75.00     0.0%
Same hash: 186 (filtered out)
Remaining: 51
Metric: loop-vectorize.LoopsAnalyzed
Program                                          base     patch     diff
test-suite...ks/Prolangs-C/agrep/agrep.test    440.00    434.00    -1.4%
test-suite...nal/skidmarks10/skidmarks.test    312.00    308.00    -1.3%
test-suite...marks/7zip/7zip-benchmark.test   6399.00   6323.00    -1.2%
test-suite...lications/minisat/minisat.test    134.00    135.00     0.7%
test-suite...rks/FreeBench/pifft/pifft.test    295.00    297.00     0.7%
test-suite...TimberWolfMC/timberwolfmc.test   1879.00   1869.00    -0.5%
test-suite...pplications/treecc/treecc.test    689.00    691.00     0.3%
test-suite...T2000/300.twolf/300.twolf.test   1593.00   1597.00     0.3%
test-suite.../Benchmarks/Bullet/bullet.test   1394.00   1392.00    -0.1%
test-suite...ications/JM/ldecod/ldecod.test   1431.00   1429.00    -0.1%
test-suite...6/464.h264ref/464.h264ref.test   2229.00   2230.00     0.0%
test-suite...lications/sqlite3/sqlite3.test   2590.00   2589.00    -0.0%
test-suite...ications/JM/lencod/lencod.test   2732.00   2733.00     0.0%
test-suite...006/453.povray/453.povray.test   3395.00   3394.00    -0.0%
Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
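To make the peeling idea concrete, here is a hand-written C++ illustration in the spirit of the commit's description (not LoopPeel output, and not the PR47671 test case): a loop whose `i == 0` test only matters on the first iteration, shown before and after peeling that iteration. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before peeling: `prev` behaves like a PHI that is 0 on the first
// iteration and a[i - 1] afterwards, so every iteration tests i == 0.
long sumOfDeltas(const std::vector<long> &a) {
  long s = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    long prev = (i == 0) ? 0 : a[i - 1];
    s += a[i] - prev;
  }
  return s;
}

// After peeling the first iteration: the i == 0 test is gone from the
// loop body, and the remaining loop is a plain difference reduction.
long sumOfDeltasPeeled(const std::vector<long> &a) {
  if (a.empty()) return 0;
  long s = a[0];  // peeled iteration: prev == 0
  for (std::size_t i = 1; i < a.size(); ++i)
    s += a[i] - a[i - 1];
  return s;
}
```

Both versions compute the same value (the sum telescopes to the last element); the peeled loop body simply no longer carries the condition that was only true once.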
Sergey Dmitriev	13cedcaf45	[llvm-link] Fix crash when materializing appending global This patch fixes an llvm-link crash when materializing a global variable with appending linkage and an initializer that depends on another global with appending linkage. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D95329	2021-01-25 18:08:07 -08:00
modimo	ce7f9cdb50	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 for replay in the SampleProfile inliner so that it can also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file, where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would otherwise perturb inlining heuristics, by separating their effects from inlining behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline, rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice, you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341, for ~94% replay accuracy. I believe this gap can be narrowed further, though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm; the newly added test correctly replays CGSCC inlining decisions. Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Nikita Popov	835104a114	[LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943) When LSR converts a branch on the pre-inc IV into a branch on the post-inc IV, the nowrap flags on the addition may no longer be valid. Previously, a poison result of the addition might have been ignored, in which case the program was well defined. After branching on the post-inc IV, we might be branching on poison, which is undefined behavior. Fix this by discarding nowrap flags which are not present on the SCEV expression. Nowrap flags on the SCEV expression are proven by SCEV to always hold, independently of how the expression will be used. This is essentially the same fix we applied to IndVars LFTR, which also performs this kind of pre-inc to post-inc conversion. I believe a similar problem can also exist for getelementptr inbounds, but I was not able to come up with a problematic test case. The inbounds case would have to be addressed in a different way anyway (as SCEV does not track this property). Fixes https://bugs.llvm.org/show_bug.cgi?id=46943. Differential Revision: https://reviews.llvm.org/D95286	2021-01-25 23:13:48 +01:00
Richard Smith	925ae8c790	Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV" This reverts commit `53176c1680`, which introduced a layering violation. LLVM's IR library can't include headers from Analysis.	2021-01-25 13:53:38 -08:00
Akira Hatanaka	53176c1680	[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end annotates calls with attribute "clang.arc.rv"="retain" or "clang.arc.rv"="claim", which indicates the call is implicitly followed by a marker instruction and a retainRV/claimRV call that consumes the call result. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the annotated calls in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the annotated calls. It doesn't remove the attribute on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes the autoreleaseRV call in the callee that returns the result if nothing in the callee prevents it from being paired up with the calls annotated with "clang.arc.rv"="retain/claim" in the caller. If the call is annotated with "claim", a release call is inserted since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the attributes to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV call returning the callee result, which makes it impossible to pair it up with the retainRV or claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the call is annotated with "retain" and does nothing if it's annotated with "claim". - This patch teaches the dead argument elimination pass not to change the return type of a function if any of the calls to the function are annotated with attribute "clang.arc.rv". This is necessary since the pass could otherwise incorrectly determine that nothing in the IR uses the function's return value (which can happen because the front-end no longer explicitly emits retainRV/claimRV calls in the IR) and change its return type to 'void'. Future work: - Use the attribute on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the attributes. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-01-25 11:57:08 -08:00
Florian Hahn	76afbf60ed	[VPlan] Replace uses with new value in VPInstructionsToVPRecipe (NFC). Now that VPRecipeBase inherits from VPDef, we can always use the new VPValue for replacement, if the recipe defines one. Given the recipes that are supported at the moment, all new recipes must have either 0 or 1 defined values.	2021-01-25 19:38:08 +00:00
Nick Desaulniers	d36812892c	[GVN] do not repeat PRE on failure to split critical edge Fixes an infinite loop encountered in GVN. GVN will delay PRE if it encounters critical edges, attempt to split them later via calls to SplitCriticalEdge(), then restart. The caller of GVN::splitCriticalEdges() assumed a return value of true meant that critical edges were split, that the IR had changed, and that PRE should be re-attempted, upon which we loop infinitely. This was exposed after D88438, by compiling the Linux kernel for s390, but the test case is reproducible on x86. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1261 Reviewed By: void Differential Revision: https://reviews.llvm.org/D94996	2021-01-25 11:23:44 -08:00
Wei Mi	c9cd9a0066	[SampleFDO] Report error when reading a bad/incompatible profile instead of turning off SampleFDO silently. Currently the sample loader pass turns off SampleFDO optimization silently when it sees an error in reading the profile. This behavior defeats the tests that could have caught those bad/incompatible profile problems. This patch changes the behavior to report an error. Differential Revision: https://reviews.llvm.org/D95269	2021-01-25 10:28:23 -08:00
Xun Li	17c3538aef	Revert "Fix unused variable in CoroFrame.cpp when building Release with GCC 10" This reverts commit `ff5e896425`.	2021-01-25 08:37:45 -08:00
Florian Hahn	3201274dea	[VPlan] Handle scalarized values in VPTransformState. This patch adds plumbing to handle scalarized values directly in VPTransformState. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92282	2021-01-25 14:21:56 +00:00
Sanjay Patel	09a136bcc6	[InstCombine] narrow min/max intrinsics with extended inputs We can sink extends after min/max if they match and would not change the sign-interpreted compare. The only combo that doesn't work is zext+smin/smax because the zexts could change a negative number into positive: https://alive2.llvm.org/ce/z/D6sz6J Sext+umax/umin works: define i32 @src(i8 %x, i8 %y) { %0: %sx = sext i8 %x to i32 %sy = sext i8 %y to i32 %m = umax i32 %sx, %sy ret i32 %m } => define i32 @tgt(i8 %x, i8 %y) { %0: %m = umax i8 %x, %y %r = sext i8 %m to i32 ret i32 %r } Transformation seems to be correct!	2021-01-25 07:52:50 -05:00
Sander de Smalen	171d12489f	[SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost. This change also changes getReductionCost to return InstructionCost, and it simplifies two expressions by removing a redundant 'isValid' check.	2021-01-25 12:27:01 +00:00
Nikita Popov	8b9df70bf7	[Utils] Use NoAliasScopeDeclInst in a few more places (NFC) In the cloning infrastructure, only track an MDNode mapping, without explicitly storing the Metadata mapping, same as is done during inlining. This makes things slightly simpler.	2021-01-24 16:24:11 +01:00
Sanjay Patel	77adbe6a8c	[SLP] fix fast-math requirements for fmin/fmax reductions `a6f0221276` enabled intersection of FMF on reduction instructions, so it is safe to ease the check here. There is still some room to improve here - it looks like we have nearly duplicate flags propagation logic inside of the LoopUtils helper but it is limited to targets that do not form reduction intrinsics (they form the shuffle expansion).	2021-01-24 08:55:56 -05:00
Jeroen Dobbelaere	dcc7706fcf	[InstCombine] Remove unused llvm.experimental.noalias.scope.decl A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope. When that is not the case for at least one of the two, the intrinsic call can be removed as well. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95141	2021-01-24 13:55:50 +01:00
Jeroen Dobbelaere	659c7bcde6	[LoopRotate] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed Similar to D92887, LoopRotation also needs to duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary. This is based on the version from the Full Restrict patches (D68511). The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94306	2021-01-24 13:53:13 +01:00
Jeroen Dobbelaere	774629641b	[LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patches (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed. Notes: - it also includes changes and tests from D90104 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92887	2021-01-24 13:48:20 +01:00
Roman Lebedev	6f2753273e	[NFC][SimplifyCFG] Extract CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses() out of PerformBranchToCommonDestFolding() To be used in PerformValueComparisonIntoPredecessorFolding()	2021-01-24 00:54:55 +03:00
Roman Lebedev	67f9c87a65	[NFC][SimplifyCFG] Perform early-continue in FoldValueComparisonIntoPredecessors() per-pred loop	2021-01-24 00:54:54 +03:00
Roman Lebedev	a4e6c2e647	[NFC][SimplifyCFG] Extract PerformValueComparisonIntoPredecessorFolding() out of FoldValueComparisonIntoPredecessors() Less nested code is much easier to follow and modify.	2021-01-24 00:54:54 +03:00
Nikita Popov	c83cff45c7	[IR] Add NoAliasScopeDeclInst (NFC) Add an intrinsic type class to represent the llvm.experimental.noalias.scope.decl intrinsic, to make code working with it a bit nicer by hiding the metadata extraction from view.	2021-01-23 22:40:32 +01:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Florian Hahn	d60b74c28a	[InstCombine] Set MadeIRChange in replaceInstUsesWith. Some utilities used by InstCombine, like SimplifyLibCalls, may add new instructions and replace the uses of a call, but return nullptr because the inserted call produces multiple results. Previously, the replaced library calls would get removed by InstCombine's deleter, but after `292077072e` this may not happen, if the willreturn attribute is missing. As a work-around, update replaceInstUsesWith to set MadeIRChange, if it replaces any uses. This catches the cases where it is used as replacer by utilities used by InstCombine and seems useful in general; updating uses will modify the IR. This fixes an expensive-check failure when replacing @__sinpif/@__cospifi with @__sincospif_sret.	2021-01-23 17:52:59 +00:00
Sanjay Patel	a6f0221276	[SLP] fix fast-math-flag propagation on FP reductions As shown in the test diffs, we could miscompile by propagating flags that did not exist in the original code. The flags required for fmin/fmax reductions will be fixed in a follow-up patch.	2021-01-23 11:17:20 -05:00
Florian Hahn	292077072e	[Local] Treat calls that may not return as being alive. With the addition of the `willreturn` attribute, functions that may not return (e.g. due to an infinite loop) are well defined, if they are not marked as `willreturn`. This patch updates `wouldInstructionBeTriviallyDead` to not consider calls that may not return as dead. This patch still provides an escape hatch for intrinsics, which are still assumed to be willreturn unconditionally. It will be removed once all intrinsic definitions have been reviewed and updated. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94106	2021-01-23 16:05:14 +00:00
Roman Lebedev	022da61f6b	[SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers If I change it to AssertingVH instead, a number of existing tests fail, which means we don't consistently remove from the set when deleting blocks, which means newly-created blocks may happen to appear in that set if they happen to occupy the same memory chunk as did some block that was in the set originally. There are many places where we delete blocks, and while we could probably consistently delete from LoopHeaders when deleting a block in transforms located in SimplifyCFG.cpp itself, transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may delete blocks, and it doesn't seem good to teach them to deal with it. Since we at most only ever delete from LoopHeaders, let's just delegate to WeakVH to do that automatically. But to be honest, personally, I'm not sure that the idea behind LoopHeaders is sound.	2021-01-23 16:48:35 +03:00
Jeroen Dobbelaere	2b9a834c43	[InlineFunction] Use llvm.experimental.noalias.scope.decl for noalias arguments. Insert a llvm.experimental.noalias.scope.decl intrinsic that identifies where a noalias argument was inlined. This patch includes some refactorings from D90104. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93040	2021-01-23 12:10:57 +01:00
Zequan Wu	867bdfeff1	[InstCombine] remove incompatible attribute when simplifying some lib calls Like D95088, remove incompatible attribute in more lib calls. Differential Revision: https://reviews.llvm.org/D95278	2021-01-22 17:27:36 -08:00
Philip Reames	ef51eed37b	[LoopDeletion] Handle inner loops w/untaken backedges This builds on the form of D93906 that was restricted after the initial revert, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA. When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes. This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop. (I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.) The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue. Differential Revision: https://reviews.llvm.org/D94378	2021-01-22 16:31:29 -08:00
Francis Visoiu Mistrih	0cc38acfc4	[Matrix] Propagate shape information through fneg Similar to binary operators like fadd/fmul/fsub, propagate shape info through unary operators (fneg is the only one?). Differential Revision: https://reviews.llvm.org/D95252	2021-01-22 14:34:28 -08:00
Roman Lebedev	1742203844	[SimplifyCFG] FoldBranchToCommonDest(): re-lift restrictions on liveout uses of bonus instructions I have previously tried doing that in `b33fbbaa34` / `d38205144f`, but eventually it was pointed out that the approach taken there was just broken wrt how the uses of bonus instructions are updated to account for the fact that they should now use either bonus instruction or the cloned bonus instruction. In particular, all that manual handling of PHI nodes in successors was just wrong. But, the fix is actually much much simpler than my initial approach: just tell SSAUpdate about both instances of bonus instruction, and let it deal with all the PHI handling. Alive2 confirms that the reproducers from the original bugs (@pr48450*) are now handled correctly. This effectively reverts commit `59560e8589`, effectively relanding `b33fbbaa34`.	2021-01-23 01:29:05 +03:00
Roman Lebedev	eae1cc0de5	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): move instruction cloning to after CFG update This simplifies follow-up patch, and is NFC otherwise.	2021-01-23 01:29:04 +03:00
Roman Lebedev	9bd8bcf993	[NFC][SimplifyCFG] PerformBranchToCommonDestFolding(): fix instruction name preservation NewBonusInst just took name from BonusInst, so BonusInst has no name, so BonusInst.getName() makes no sense. So we need to ask NewBonusInst for the name.	2021-01-23 01:29:03 +03:00
Shimin Cui	99a0aa07e9	[Analysis] Support AIX vec_malloc routines This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX. Differential Revision: https://reviews.llvm.org/D94710	2021-01-22 16:03:01 -05:00
Nikita Popov	45b259f995	[SimplifyLibCalls] Skip unused calls in sincos transform If the call result is unused, we should let it get DCEd rather than replacing it. Also, don't try to replace an existing sincos with another one (unless it's as part of combining sin and cos). This avoids an infinite combine loop if the calls are not DCEd as expected, which can happen with D94106 and lack of willreturn annotation in hand-crafted IR.	2021-01-22 20:57:13 +01:00
Sanjay Patel	411c144e4c	[InstCombine] narrow abs with sign-extended input In the motivating cases from https://llvm.org/PR48816 , we have a trailing trunc. But that is not required to reduce the abs width: https://alive2.llvm.org/ce/z/ECaz-p ...as long as we clear the int-min-is-poison bit (nsw). We have some existing tests that are affected, and I'm not sure what the overall implications are, but in general we favor narrowing operations over preserving nsw/nuw. If that causes problems, we could restrict this transform based on type (shouldChangeType() and/or vector vs. scalar). Differential Revision: https://reviews.llvm.org/D95235	2021-01-22 13:36:04 -05:00
Florian Hahn	86991d3231	[LoopUnswitch] Fix logic to avoid unswitching with atomic loads. The existing code did not deal with atomic loads correctly. Such loads are represented as MemoryDefs. Bail out on any MemoryAccess that is not a MemoryUse.	2021-01-22 15:10:12 +00:00
Arnold Schwaighofer	87b628dadd	[coro.async] Make sure we process async coroutines Because we were not looking for the llvm.coro.id.async intrinsic in the early coro pass, which triggers follow-up passes, we relied on the llvm.coro.end intrinsic being present. This might not be the case in functions that end in unreachable code. Differential Revision: https://reviews.llvm.org/D95144	2021-01-22 07:04:01 -08:00
Roman Lebedev	85e7578c6d	Revert "[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches" Does not build in XCode: http://green.lab.llvm.org/green/job/clang-stage1-RA/17963/consoleFull#-1704658317a1ca8a51-895e-46c6-af87-ce24fa4cd561 This reverts commit `aabed3718a`.	2021-01-22 17:37:11 +03:00
Roman Lebedev	d1a6f92fd5	[InstCombine] Fold `(~x) | y` --> `~(x & (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thereby decreasing instruction count. Note that we could position this transformation as just hoisting of the `not` (still, iff y is freely negatible), but the test changes show a number of regressions, so let's not do that.	2021-01-22 17:23:54 +03:00
Roman Lebedev	79b0d21ce9	[InstCombine] Fold `(~x) & y` --> `~(x | (~y))` iff it is free to do so Iff we know we can get rid of the inversions in the new pattern, we can thus get rid of the inversion in the old pattern, thereby decreasing instruction count.	2021-01-22 17:23:54 +03:00
Roman Lebedev	4ed0d8f2f0	[NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate() I'd like to use it in an upcoming fold.	2021-01-22 17:23:53 +03:00
Roman Lebedev	efeb8caf8b	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract the actual transform into helper function I'm intentionally structuring it this way, so that the actual fold only does the fold, and no legality/correctness checks, all of which must be done by the caller. This allows for the fold code to be more compact and more easily grokable.	2021-01-22 17:23:53 +03:00
Roman Lebedev	b482560a59	[NFC][SimplifyCFG] FoldBranchToCommonDest(): extract check for destination sharing into a helper function As a follow-up, I'll extract the actual transform into a function, and this helper will be called from both places, so this avoids code duplication.	2021-01-22 17:23:53 +03:00
Roman Lebedev	7b89efb55e	[NFC][SimplifyCFG] FoldBranchToCommonDest(): somewhat better structure weight updating code Hoist the successor updating out of the code that deals with branch weight updating, and hoist the 'has weights' check from the latter, making code more consistent and easier to follow.	2021-01-22 17:23:41 +03:00
Roman Lebedev	256a035752	[NFC][SimplifyCFG] FoldBranchToCommonDest(): unclutter Cond/CondInPred handling We don't need those variables, we can just get the final value directly.	2021-01-22 17:23:11 +03:00
Roman Lebedev	aabed3718a	[NFCI-ish][SimplifyCFG] FoldBranchToCommonDest(): really don't deal with uncond branches While we already ignore uncond branches, we could still potentially end up with a conditional branch with identical destinations due to the visitation order, or because we were called as a utility. But if we have such a disguised uncond branch, we still probably shouldn't deal with it here.	2021-01-22 17:23:10 +03:00
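The narrowing rules in `09a136bcc6` (min/max with extended inputs) and `411c144e4c` (abs with a sign-extended input) can be spot-checked by brute force for i8 values widened to i32. The sketch below models LLVM bit patterns with Python ints; every helper name is hypothetical, and it assumes the narrowed abs result is widened back with zext (the last check shows that widening with sext fails for x = -128, which is why the int-min-is-poison bit must be cleared and the result treated as unsigned).

```python
# Brute-force i8 -> i32 check of two InstCombine narrowing claims (a sketch).
# Values are unsigned bit patterns held in Python ints; all names are hypothetical.
N, W = 8, 32  # narrow (i8) and wide (i32) widths


def mask(bits):
    return (1 << bits) - 1


def signed(v, bits):
    """Two's-complement reading of the bit pattern v (0 <= v < 2**bits)."""
    return v - (1 << bits) if v >> (bits - 1) else v


def sext(v, src, dst):
    return signed(v, src) & mask(dst)


def zext(v, src, dst):
    return v  # a pattern in [0, 2**src) is unchanged by zero extension


def umax(a, b, bits):
    return max(a, b)  # unsigned order is plain integer order on patterns


def umin(a, b, bits):
    return min(a, b)


def smax(a, b, bits):
    return a if signed(a, bits) >= signed(b, bits) else b


def smin(a, b, bits):
    return a if signed(a, bits) <= signed(b, bits) else b


def minmax_narrowing_holds(ext, op):
    """Is op(ext x, ext y) == ext(op(x, y)) for every pair of i8 inputs?"""
    return all(
        op(ext(x, N, W), ext(y, N, W), W) == ext(op(x, y, N), N, W)
        for x in range(1 << N)
        for y in range(1 << N)
    )


def abs_(v, bits):
    """abs with int-min-is-poison cleared: abs(INT_MIN) wraps to INT_MIN."""
    s = signed(v, bits)
    return (-s) & mask(bits) if s < 0 else v


def abs_narrowing_holds(ext_result):
    """Is abs(sext x) == ext_result(abs x) for every i8 input?"""
    return all(
        abs_(sext(x, N, W), W) == ext_result(abs_(x, N), N, W)
        for x in range(1 << N)
    )


if __name__ == "__main__":
    for ext in (sext, zext):
        for op in (umax, umin, smax, smin):
            print(ext.__name__, op.__name__, minmax_narrowing_holds(ext, op))
    print("abs via zext:", abs_narrowing_holds(zext))
    print("abs via sext:", abs_narrowing_holds(sext))
```

Running it should report only the zext+smax and zext+smin combinations as failing, matching the commit message. Enumeration only scales to small widths; Alive2, linked from both messages, checks such rewrites symbolically instead.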
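The folds in `d1a6f92fd5` and `79b0d21ce9` are instances of De Morgan's laws; they only reduce instruction count when the inner `~y` is free, for example when y is itself a `not`. A minimal brute-force check over i8 bit patterns, with hypothetical helper names, might look like this:

```python
MASK = 0xFF  # model i8 values as bit patterns in [0, 256)


def not8(v):
    """Bitwise not of an i8 bit pattern (LLVM's `xor %v, -1`)."""
    return ~v & MASK


def or_fold_holds():
    """(~x) | y == ~(x & ~y) for every pair of i8 values."""
    return all((not8(x) | y) == not8(x & not8(y))
               for x in range(256) for y in range(256))


def and_fold_holds():
    """(~x) & y == ~(x | ~y) for every pair of i8 values."""
    return all((not8(x) & y) == not8(x | not8(y))
               for x in range(256) for y in range(256))


def free_inversion_case_holds():
    """When y = ~z, (~x) | y (not, not, or) becomes ~(x & z) (and, not)."""
    return all((not8(x) | not8(z)) == not8(x & z)
               for x in range(256) for z in range(256))
```

The last function captures the "iff it is free" condition: once `~y` cancels against an existing `not`, the rewritten pattern needs one instruction fewer.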


27882 Commits