llvm-project

Commit Graph

Author	SHA1	Message	Date
Hongtao Yu	b9db70369b	[CSSPGO] Split context string to deduplicate function name used in the context. Currently context strings contain a lot of duplicated function names, and that significantly increases the profile size. This change splits the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table, which significantly reduces the size consumed by context. A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings, which is time- and memory-consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to use an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to using `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For the compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings are generated only for debugging and printing. When it comes to profile format, nothing has changed in the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, with each element being an offset that points into `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`. Testing: This is a no-diff change in terms of code quality and profile content (for text profile). 
For our internal large service (aka ads), the profile generation is cut in half, with a 20x smaller string-based extbinary format generated. The compile time of ads drops by 25%. Differential Revision: https://reviews.llvm.org/D107299	2021-08-30 20:09:29 -07:00
Artem Belevich	30dfd3449e	[MemCpyOpt] Allow specifying --enable-memcpyopt-without-libcalls more than once so we can override it via clang's CLI if necessary.	2021-08-30 13:55:55 -07:00
Andrew Litteken	c58d4c4bd3	[IROutliner] Changing outliner to prioritize reductions on assembly rather than IR instruction Currently, the IROutliner uses a simple metric to outline the largest amount of IR possible to outline first if it fits the cost model. This model loses out on smaller blocks of code that have higher reductions in cost that are contained within larger blocks of IR. This reverses the order, where we calculate all of the costs first, and then reorder and extract items based on the calculated results. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106440	2021-08-30 13:43:08 -07:00
Mikhail Goncharov	5097b6e352	Revert "[SLP]Improve graph reordering." This reverts commit `84cbd71c95`. This commit breaks one of the internal tests. As agreed with Alexey I will provide the reproducer later.	2021-08-30 19:16:44 +02:00
Andrew Litteken	f564299fe9	[IROutliner] Ensure instructions at end of candidate are excluded Occasionally instructions are between the last instruction in a region, and the following instruction as identified by the Candidate. This adds an extra check right before splitting a candidate that excludes the region from being split/checked for outlining to remove errors. Tests Added: Transforms/IROutliner/outlining-extra-bitcasts.ll Reviewer: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D104142	2021-08-30 09:30:26 -07:00
Roman Lebedev	795d142d23	[NFCI][IndVars] rewriteLoopExitValues(): don't expand SCEV's until needed Previously, we'd expand ALL the SCEV's eagerly, because we needed to check with `isValidRewrite()`, and discard bad rewrite candidates, but now that we do not do that, we also don't need to always expand. In particular, this avoids expanding potentially-huge SCEV's that we would discard anyways because they are high-cost and we aren't rewriting aggressively.	2021-08-30 12:28:24 +03:00
Roman Lebedev	7b0d59da9a	[IndVars] Drop check for the validity of rewrite `isValidRewrite()` checks that both the original SCEV and the rewrite SCEV have the same base pointer. I //believe//, after all the recent SCEV improvements, this invariant is already enforced by SCEV itself. I originally tried changing it into an assert in D108043, but that showed that it triggers on e.g. https://reviews.llvm.org/D108043#2946621, where SCEV manages to forward the store to load, test added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D108655	2021-08-30 12:06:58 +03:00
Florian Hahn	abd36fe512	[VPlan] Introduce code to limit querying VPValues using IR references. After applying VPlan-to-VPlan transformations, using IR references to query VPlan values may be incorrect, as the IR is not in sync with the VPlan any longer. To better detect such mismatches, this patch introduces a new flag to VPlans to indicate whether it is safe to query VPValues using IR values. getVPValue is updated to assert if it is called when the flag indicates it is not safe any longer. There is an escape hatch via an extra argument, because there are 3 places that need to be fixed first. Those are: 1. truncateToMinimalBitwidths, 2. clearReductionWrapFlags, 3. fixLCSSAPHIs. As a first step, this flag will help prevent new code from violating this property. Any suggestions with respect to naming are very welcome! Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D108573	2021-08-30 09:12:09 +02:00
Arthur Eubanks	099e4bcd5d	[InstCombine] Remove invariant group intrinsics when comparing against null We cannot leak any equivalency information by comparing against null since null never has virtual metadata associated with it (when null is not a valid dereferenceable pointer). Instcombine seems to make sure that a null will be on the RHS, so we don't have to check both operands. This fixes a missed optimization in llvm-test-suite's MultiSource lambda benchmark under -fstrict-vtable-pointers. Reviewed By: Prazek Differential Revision: https://reviews.llvm.org/D108734	2021-08-29 15:45:25 -07:00
Nikita Popov	9f7873784d	[SCEVExpander] Reuse removePointerBase() for canonical addrecs ExposePointerBase() in SCEVExpander implements basically the same functionality as removePointerBase() in SCEV, so reuse it. The SCEVExpander code assumes that the pointer operand on adds is the last one -- I'm not sure that always holds. As such this might not be strictly NFC.	2021-08-29 21:12:35 +02:00
Nikita Popov	0886fd5b3a	[SCEVExpander] Remove unnecessary mul/udiv check (NFC) Pointer-typed SCEV expressions can no longer be mul or udiv, so we do not need to specially handle them here.	2021-08-29 20:47:00 +02:00
Nikita Popov	3f162e8e6d	[SCEVExpander] Assert single pointer op in add (NFC) There can only be one pointer operand in an add expression, and we have sorted operands to guarantee that it is the first. As such, the pointer check for other operands is dead code.	2021-08-29 20:30:56 +02:00
Andrew Litteken	063af63b96	[IRSim][IROutliner] Canonicalizing commutative value numbering between similarity sections. When the initial relationship between two pairs of values between similar sections is ambiguous to commutativity, arguments to the outlined functions can be passed in such that the order is incorrect, causing miscompilations. This adds a canonical mapping to each similarity section, so that we can maintain the relationship of global value numbering from one section to another. Added Tests: Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll unittests/Analysis/IRSimilarityIdentifierTest.cpp - IRSimilarityCandidate:CanonicalNumbering Reviewers: jroelofs, jpaquette, yroux Differential Revision: https://reviews.llvm.org/D104143	2021-08-27 15:02:56 -07:00
Johannes Doerfert	56e372b56e	[Attributor][NFC] Silence unused variable warning	2021-08-27 16:38:13 -05:00
Nikita Popov	757409da7a	[MergeICmps] Ignore clobbering instructions before the loads This is another followup to D106591. Even if there is an instruction that clobbers one of the loads, this doesn't matter if it happens before the loads. Those instructions aren't affected by the transform at all. The gep-references-bb.ll is modified to preserve the spirit of the test, as the store to @g no longer impacts the transform. Differential Revision: https://reviews.llvm.org/D108782	2021-08-27 23:31:35 +02:00
Philip Reames	c7b25e4359	[LoopDeletion] Use max trip count to break backedge in addition to exact one We'd added support a while back for breaking the backedge if SCEV can prove the trip count is zero. However, we used the exact trip count which requires all exits be analyzable. I noticed while writing test cases for another patch that this disallows cases where one exit is provably taken paired with another which is unknown. This patch adds the upper bound case. We could use a symbolic max trip count here instead, but we use an isKnownNonZero filter (presumably for compile time?) for the first-iteration reasoning. I decided this was a more obvious incremental step, and we could go back and untangle the schemes separately. Differential Revision: https://reviews.llvm.org/D108833	2021-08-27 14:19:44 -07:00
Valentin Churavy	4cacb5cad0	[MergeICmps] Don't merge icmps derived from pointers with addressspaces IIUC we can't emit `memcmp` between pointers in addressspaces, doing so will trigger an assertion since the signature of the memcmp will not match its arguments (https://bugs.llvm.org/show_bug.cgi?id=48661). This PR disables the attempt to merge icmps, when the pointer is in an addressspace. Reviewed By: #julialang, vtjnash Differential Revision: https://reviews.llvm.org/D94813	2021-08-27 22:15:02 +02:00
Johannes Doerfert	e05940de2a	[Attributor][FIX] Recursion via memory needs to be tracked explicitly Recursion can happen when we see a PHI use the second time or when we look at a store value operand use again. We already visited the potential copies and doing so again will just cause endless looping. Reviewed By: kuter Differential Revision: https://reviews.llvm.org/D108190	2021-08-27 13:12:13 -05:00
Johannes Doerfert	caa3b28260	[Attributor][FIX] Do not treat byval args as local memory (for now) For now we should not treat byval arguments as local copies performed on the call edge, though, in general we should. To make that happen we need to teach various passes, e.g., DSE, about the copy effect of a byval. That would also allow us to mark functions only accessing byval arguments as readnone again, arguably their accesses have no effect outside of the function, like accesses to allocas. Reviewed By: kuter Differential Revision: https://reviews.llvm.org/D108140	2021-08-27 13:12:11 -05:00
Philip Reames	6a82376012	Special case common branch patterns in breakLoopBackedge (try 2) Changes since aec08e: * Adjust placement of a closing brace so that the general case actually runs. Turns out we had no coverage of the switch case. I added one in `eae90fd`. * Drop .llvm.loop.* metadata from the new branch as there is no longer a loop to annotate. Original commit message: This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.	2021-08-27 10:27:16 -07:00
Sanjay Patel	416a119f9e	[GlobalOpt] don't hoist constant expressions that can trap We try to forward a stored-once-constant-value from one global access to another, but that's not safe if the constant value is an expression that can trap. The tests are reduced from the miscompile examples in: https://llvm.org/PR47578 Differential Revision: https://reviews.llvm.org/D108771	2021-08-27 08:10:20 -04:00
Kirill Stoimenov	a3f4139626	[asan] Implemented flag to emit intrinsics to optimize ASan callbacks. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D108377	2021-08-26 20:33:57 +00:00
Alexey Bataev	84cbd71c95	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilt the graph in the best order. This was not efficient, since it required extra memory and time for building/rebuilding the tree and doubled the use of the scheduling budget, which could lead to missed vectorization due to exhausted scheduling resources. The patch provides a 2-way approach to the graph reordering problem. At first, all reordering is done in-place; it does not require tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to-bottom) rotates the whole graph, similarly to the previous implementation. The compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then it repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle the smaller subgraph when building operands for the graph nodes with the larger vectorization factor; we can rotate just the subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reordered in the best fashion, they are reordered and their users too. It allows removing double shuffles to the same ordering of the operands in many cases and just reordering the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows removing an extra shuffle, because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. 
Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-08-26 12:31:18 -07:00
Andrew Litteken	9d2c859ebb	[CodeExtractor] Making the arguments outlined easier to access from the outside The Code Extractor does not provide an easy mechanism for determining the inputs and outputs after extraction has occurred, this patch gives the ability to pass in empty SetVectors to be filled with the inputs and outputs if they need to be analyzed. Added Tests: - InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106991	2021-08-26 09:47:53 -07:00
Alexey Bataev	b00f73d8bf	Revert "[SLP]Improve graph reordering." This reverts commit `a28234e37a` to investigate a compiler crash caused by the commit.	2021-08-26 09:19:40 -07:00
Anna Thomas	55bdb14026	[LoopPredication] Preserve MemorySSA Since LICM has now unconditionally moved to MemorySSA based form, all passes that run in same LPM as LICM need to preserve MemorySSA (i.e. our downstream pipeline). Added loop-mssa to all tests and perform -verify-memoryssa within LoopPredication itself. Differential Revision: https://reviews.llvm.org/D108724	2021-08-26 11:36:25 -04:00
Alexey Bataev	a28234e37a	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilt the graph in the best order. This was not efficient, since it required extra memory and time for building/rebuilding the tree and doubled the use of the scheduling budget, which could lead to missed vectorization due to exhausted scheduling resources. The patch provides a 2-way approach to the graph reordering problem. At first, all reordering is done in-place; it does not require tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to-bottom) rotates the whole graph, similarly to the previous implementation. The compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then it repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle the smaller subgraph when building operands for the graph nodes with the larger vectorization factor; we can rotate just the subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reordered in the best fashion, they are reordered and their users too. It allows removing double shuffles to the same ordering of the operands in many cases and just reordering the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows removing an extra shuffle, because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. 
Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-08-26 07:19:07 -07:00
Andrew Wei	99c4336374	[LoopDataPrefetch] Add missed LoopSimplify dependence for prefetch pass SCEVExpander::expandCodeFor may expand add recurrences for loop with a preheader, so we should make LoopDataPrefetch dependent on LoopSimplify. This patch will try to fix : https://bugs.llvm.org/show_bug.cgi?id=43784 Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D108448	2021-08-26 21:01:59 +08:00
Florian Hahn	aa5b6c9779	[ConstraintElimination] Initial support for using info from assumes. This patch adds initial support to use facts from @llvm.assume calls. It intentionally does not handle all possible cases to keep things simple initially. For now, the condition from an assume is made available on entry to the containing block, if the assume is guaranteed to execute. Otherwise it is only made available in the successor blocks.	2021-08-26 10:08:00 +01:00
Wenlei He	a45d72e024	[CSSPGO] Add switch for sample loader to honor global pre-inliner decision from llvm-profgen The change adds a switch to allow sample loader to use global pre-inliner's decision instead. The pre-inliner in llvm-profgen makes inline decisions globally based on whole program profile and function byte size as cost proxy. Since pre-inliner also adjusts/merges context profile based on its inline decision, honoring its inline decision in sample loader would lead to better post-inline profile quality especially for thinlto where cross module profile merging isn't possible without pre-inliner. Minor fix in profile reader is also included. When pre-inliner is used, we now also turn off the default merging and trimming logic unless it's explicitly asked. Differential Revision: https://reviews.llvm.org/D108677	2021-08-25 17:20:15 -07:00
Alexey Bataev	a36bc873a2	[SLP]No need to schedule/check parent for extract{element/value} instruction. The instructions extractelement/extractvalue are not required to be scheduled since they only depend on the source vector/aggregate (with constant indices); the same applies to the parent basic block checks. Improves compile time and saves scheduling budget. Differential Revision: https://reviews.llvm.org/D108703	2021-08-25 09:27:55 -07:00
Wenlei He	a6f15e9a49	[CSSPGO] Use probe inline tree to track zero size fully optimized context for pre-inliner This is a follow up diff for BinarySizeContextTracker to track zero size for fully optimized inlinee. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However by traversing the inlined probe forest, we know what the original inlinees are regardless of optimization. If a context shows up in inlined probes, but not during symbolization, we know that it's fully optimized away hence its size is zero instead of unknown. It should provide more accurate size cost estimation for pre-inliner to make better inline decisions in llvm-profgen. Differential Revision: https://reviews.llvm.org/D108350	2021-08-25 09:01:11 -07:00
Kirill Stoimenov	832aae738b	[asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86. The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D107850	2021-08-25 15:31:46 +00:00
Vyacheslav Zakharin	2e192ab1f4	[CodeExtractor] Preserve topological order for the return blocks. Differential Revision: https://reviews.llvm.org/D108673	2021-08-25 08:09:01 -07:00
Florian Hahn	90d09eb300	[LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks. Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6. So far it has only been enabled for loops where all non-latch exits are 'de-optimizing' exits (D63923). But peeling of multi-exit loops can be highly beneficial in other cases too, like if all non-latch exiting blocks are unreachable. The motivating case is loops with runtime checks, like the C++ example below. The main issue preventing vectorization is that the invariant accesses to load the bounds of B are conditionally executed in the loop and cannot be hoisted out. If we peel off the first iteration, they become dereferenceable in the loop, because they must execute before the loop is executed, as all non-latch exits are terminated with unreachable. This subsequently allows hoisting the loads and runtime checks out of the loop, allowing vectorization of the loop. int sum(std::vector<int> *A, std::vector<int> *B, int N) { int cost = 0; for (int i = 0; i < N; ++i) cost += A->at(i) + B->at(i); return cost; } This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64. Note that this requires a follow-up improvement to the peeling cost model to actually peel iterations off loops as above. I will share that shortly. Also, peeling of multi-exits might be beneficial for exit blocks with other terminators, but I would like to keep the scope limited to known high-reward cases for now. I removed the option to disable peeling for multi-deopt exits because the code is more general now. Alternatively, the option could also be generalized, but I am not sure if there's much value in the option? Reviewed By: reames Differential Revision: https://reviews.llvm.org/D108108	2021-08-25 13:26:40 +01:00
Dawid Jurczak	bdcf04246c	[LoopIdiom] Don't transform loop into memmove when load from body has more than one use This change fixes issue found by Markus: https://reviews.llvm.org/rG11338e998df1 Before this patch following code was transformed to memmove: for (int i = 15; i >= 1; i--) { p[i] = p[i-1]; sum += p[i-1]; } However load from p[i-1] is used not only by store to p[i] but also by sum computation. Therefore we cannot emit memmove in loop header. Differential Revision: https://reviews.llvm.org/D107964	2021-08-25 14:22:40 +02:00
Rosie Sumpter	e221724714	[LoopFlatten] Add statistic for number of loops flattened. NFC Differential Revision: https://reviews.llvm.org/D108644	2021-08-25 10:10:10 +01:00
Fangrui Song	9ab9a9595b	[InstrProfiling] Keep profd non-private for non-renamable comdat functions The NS==0 condition used by D103717 missed a corner case: if the current copy does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a different CFG) may exist. This is super rare, but is possible with pre-inlining PGO instrumentation (which can make a weak_odr function inline its callees differently, sometimes with value profiling and sometimes without). If the current copy with private profd is prevailing, the non-prevailing copy may get an undefined symbol if a caller inlining the non-prevailing function references its profd. If the other copy with non-private profd is prevailing, the current copy may cause a "relocation to discarded section" linker error. The fix is straightforward: just keep non-private profd in such a `DataReferencedByCode` case. With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) clang is 0.08% larger (172431496/172286720-1). `stat -c %s */.o | awk '{s+=$1}END{print s}'` is 0.026% larger. The majority of D103717's benefits remain. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D108432	2021-08-24 20:14:03 -07:00
Fangrui Song	32e2326cda	Revert D108432 "[InstrProfiling] Keep profd non-private for non-renamable comdat functions" This reverts commit `f653beea88`. It broke Windows coverage-inline.cpp because link.exe has a limitation that external symbols in IMAGE_COMDAT_SELECT_ASSOCIATIVE don't work. It essentially dropped the previous size optimization for coverage because coverage doesn't rename comdat by default. Needs more investigation what we should do.	2021-08-24 19:16:07 -07:00
Shimin Cui	cea5ab090b	[GlobalOpt] Fix the assert for null check of global value This is to fix the reported assert - https://bugs.llvm.org/show_bug.cgi?id=51608. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D108674	2021-08-24 20:47:33 -04:00
Fangrui Song	f653beea88	[InstrProfiling] Keep profd non-private for non-renamable comdat functions The NS==0 condition used by D103717 missed a corner case: if the current copy does not have a hash suffix (e.g. weak_odr), a copy with value profiling (with a different CFG) may exist. This is super rare, but is possible with pre-inlining PGO instrumentation (which can make a weak_odr function inline its callees differently, sometimes with value profiling and sometimes without). If the current copy with private profd is prevailing, the non-prevailing copy may get an undefined symbol if a caller inlining the non-prevailing function references its profd. If the other copy with non-private profd is prevailing, the current copy may cause a "relocation to discarded section" linker error. The fix is straightforward: just keep non-private profd in this case. With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) clang is 0.08% larger (172431496/172286720-1). `stat -c %s */.o | awk '{s+=$1}END{print s}'` is 0.026% larger. The majority of D103717's benefits remain. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D108432	2021-08-24 15:59:35 -07:00
Kirill Stoimenov	b97ca3aca1	Revert "[asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86." This reverts commit `9588b685c6`. Breaks a bunch of builds. Reviewed By: GMNGeoffrey Differential Revision: https://reviews.llvm.org/D108658	2021-08-24 13:21:20 -07:00
Kirill Stoimenov	9588b685c6	[asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86. The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D107850	2021-08-24 19:34:34 +00:00
Rong Xu	de620f5b13	[CSPGO] Fix lost IRPGOFlag in CSPGO instrumentation The IRPGOFlag symbol (__llvm_profile_raw_version) is dropped when identified as non-prevailing for either regular or thin LTO during the mixed-LTO mode compilation. This happens in the module where IRPGOFlag is marked as non-prevailing. This variable is emitted in the final object from the prevailing module. This is still problematic because we currently query this symbol to coordinate some actions between PGOInstrumentation pass and InstrProfiling lowering pass, like whether to do value profiling, whether to do comdat renaming. This problem was brought up by YolandaCY in https://reviews.llvm.org/D107034. YolandaCY reported unresolved symbol linker errors in CSPGO instrumentation build for chromium. This patch lets LTO retain IRPGOFlag decl by adding it to CompilerUsed list and relaxes the check in isIRPGOFlagSet() when doing the InstrProfiling lowering. The test case in the patch is from D107034 <https://reviews.llvm.org/D107034>. Differential Revision: https://reviews.llvm.org/D108581	2021-08-24 09:41:29 -07:00
Philip Reames	1e07f19bfc	Revert "Special case common branch patterns in breakLoopBackedge" This reverts commit `aec08e8600`. Several problems have been reported with malformed loopinfo after this change, see discussion on https://reviews.llvm.org/rGaec08e86004b.	2021-08-24 08:53:42 -07:00
Jingu Kang	b52171629f	[GVN] Execute performLoopLoadPRE ahead of PerformLoadPRE Differential Revision: https://reviews.llvm.org/D108204	2021-08-24 09:50:27 +01:00
Anton Afanasyev	bed587631f	[AggressiveInstCombine] Add arithmetic shift right instr to `TruncInstCombine` DAG Add `ashr` instruction to the DAG post-dominated by `trunc`, allowing `TruncInstCombine` to reduce bitwidth of expressions containing these instructions. We should be shifting by less than the target bitwidth. Also it is sufficient to require that all truncated bits of the value-to-be-shifted are sign bits (all zeros or ones) and one sign bit is left untruncated: https://alive2.llvm.org/ce/z/Ajo2__ Part of https://reviews.llvm.org/D107766 Differential Revision: https://reviews.llvm.org/D108355	2021-08-24 10:41:16 +03:00
Sanjay Patel	cc9c545fb4	[InstCombine] generalize subtract with 'not' operands; 2nd try This is a re-try of `3aa009cc87` which was reverted at `9577fac0fd` because it caused an infinite loop. For the extra test case, either re-ordering the transforms or adding the extra clause to avoid sub-of-sub is enough to prevent the infinite compile, but I'm doing both to be safer. Original commit message: The motivation was to get min/max intrinsics to parity with cmp+select idioms, but this unlocks a few more folds because isFreeToInvert recognizes add/sub with constants too. In the min/max example, we have too many extra uses for smaller folds to improve things, but this fold is able to eliminate uses even though we can't reduce the number of instructions.	2021-08-23 17:06:51 -04:00
Simon Pilgrim	10c982e0b3	Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069) Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead. I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst. https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!) Differential Revision: https://reviews.llvm.org/D106450	2021-08-23 21:09:26 +01:00
Nikita Popov	19dc02e99f	[MergeICmps] Allow sinking past non-load/store This is a followup to D106591. MergeICmps currently only allows sinking the loads past either instructions that don't write to memory at all, or simple loads/stores that don't modify the memory the loads access. The "simple loads/stores" part of this check doesn't seem necessary to me -- AA isModRef() already accurately models any operation that may clobber the memory. For example, in the adjusted test case the transform is still fine if the call to @foo() isn't readonly, but inaccessiblememonly -- in both cases, the call cannot modify the loaded memory. Differential Revision: https://reviews.llvm.org/D108517	2021-08-23 22:03:49 +02:00
Alina Sbirlea	e8723abf43	[DSE] Check post-dominance for malloc+memset->calloc transform. Aiming to address the regression discussed in https://reviews.llvm.org/D103009. Differential Revision: https://reviews.llvm.org/D108485	2021-08-23 12:39:51 -07:00
Florian Hahn	9577fac0fd	Revert "[InstCombine] generalize subtract with 'not' operands" This reverts commit `3aa009cc87`. The reverted commit causes an infinite loop in instcombine. See PR51584.	2021-08-23 15:47:21 +01:00
Chuanqi Xu	2556f58148	[FuncSpec] Don't specialize functions which are easy to inline It would waste time to specialize a function which would eventually be inlined. This patch did two things: - Don't specialize functions which are always-inline. - Don't specialize functions whose lines of code are less than threshold (100 by default). For spec2017int, this patch could reduce the number of specialized functions by 33%. Then the compile time didn't increase for every benchmark. Reviewed By: SjoerdMeijer, xbolva00, snehasish Differential Revision: https://reviews.llvm.org/D107897	2021-08-23 19:20:21 +08:00
Alexander Potapenko	8300d52e8c	[tsan] Add support for disable_sanitizer_instrumentation attribute Unlike __attribute__((no_sanitize("thread"))), this one will cause TSan to skip the entire function during instrumentation. Depends on https://reviews.llvm.org/D108029 Differential Revision: https://reviews.llvm.org/D108202	2021-08-23 12:38:33 +02:00
Florian Hahn	d024a01511	Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64" This reverts the revert `ab9296f13b`. The issue causing the revert should be fixed in `9baed023b4`.	2021-08-23 11:25:27 +01:00
Nikita Popov	2b70b68efb	[GVN] Don't short-circuit load PRE `4ad41902e8` changed this code to propagate Changed if scalar GEP PRE is performed. However, as implemented this would skip the load PRE entirely if GEP indices were PREd. Make sure load PRE runs even if Changed is already true. This likely has no functional effect as load PRE would then occur on a later GVN iteration.	2021-08-22 21:12:58 +02:00
Philip Reames	d8d84c9df8	[runtimeunroll] Use early return to reduce nesting [nfc]	2021-08-22 11:34:50 -07:00
Philip Reames	aec08e8600	Special case common branch patterns in breakLoopBackedge This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.	2021-08-22 10:42:23 -07:00
Nikita Popov	fafe5a6f44	[InstCombine] Perform "eq of parts" fold with logical ops The pattern matched here is too complex for the general logical and/or to bitwise and/or conversion to trigger. However, the fold is poison-safe, so match it with a select root as well: https://alive2.llvm.org/ce/z/vNzzSg https://alive2.llvm.org/ce/z/Beyumt	2021-08-22 16:55:53 +02:00
Sanjay Patel	3aa009cc87	[InstCombine] generalize subtract with 'not' operands The motivation was to get min/max intrinsics to parity with cmp+select idioms, but this unlocks a few more folds because isFreeToInvert recognizes add/sub with constants too. In the min/max example, we have too many extra uses for smaller folds to improve things, but this fold is able to eliminate uses even though we can't reduce the number of instructions.	2021-08-22 07:18:31 -04:00
Florian Hahn	9baed023b4	[LV] Adjust reduction recipes before recurrence handling. Adjusting the reduction recipes still relies on references to the original IR, which can become outdated by the first-order recurrence handling. Until reduction recipe construction does not require IR references, move it before first-order recurrence handling, to prevent a crash as exposed by D106653.	2021-08-22 11:02:33 +01:00
Sanjay Patel	41af8f0ad5	[InstCombine] combine constants by reassociating add/sub/add This may overlap partially with the reassociate pass, but it seems simple enough that we should try it here in InstCombine to enable other folds. This shows up as an opportunity and potential regression if we improve a subtract fold with 'not' ops to be more general.	2021-08-21 11:45:43 -04:00
eopXD	4fc98ca617	[NFC][LoopIdiom] Let processLoopStoreOfLoopLoad take StoreSize as SCEV instead of unsigned Letting it take SCEV allows further modification on the function to optimize if the StoreSize / Stride is runtime determined. The plan is to let memcpy / memmove deal with runtime-determined sizes, just like what D107353 did to memset. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D108289	2021-08-21 00:03:28 -07:00
Nikita Popov	0afd10b403	[LoopPassManager] Assert that MemorySSA is preserved if used Currently it's possible to silently use a loop pass that does not preserve MemorySSA in a loop-mssa pass manager, as we don't statically know which loop passes preserve MemorySSA (as was the case with the legacy pass manager). However, we can at least add a check after the fact that if MemorySSA is used, then it should also have been preserved. Hopefully this will reduce confusion as seen in https://bugs.llvm.org/show_bug.cgi?id=51020. Differential Revision: https://reviews.llvm.org/D108399	2021-08-20 22:48:04 +02:00
Florian Hahn	ab9296f13b	Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64" This reverts commit `f4122398e7` to investigate a crash exposed by it. The patch breaks building the code below with `clang -O2 --target=aarch64-linux` int a; double b, c; void d() { for (; a; a++) { b += c; c = a; } }	2021-08-20 21:24:28 +01:00
Aditya Kumar	b8e345b266	PR46874: Reset stack after visiting a node When the stack is not reset it keeps previously visited Basic Block which results in bugs where an instruction is hoisted to a predecessor where the instruction was not fully anticipable. Differential Revision: https://reviews.llvm.org/D108425	2021-08-20 11:25:05 -07:00
Sanjay Patel	dd19f342fa	[AggressiveInstCombine] guard against applying instruction flags with constant folding This is a minimized version of a crash reported in: D108201	2021-08-20 12:22:18 -04:00
Kirill Stoimenov	05a8c0b5f8	[asan] Implemented getAddressSanitizerParams used by the ASan callback optimization code. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D108397	2021-08-20 14:17:07 +00:00
Simon Pilgrim	c1f3bab23b	MainSwitch::isValidSelectInst - don't dereference dyn_cast<> results. We've already checked that the pointer isa<PHINode>, so we can use cast<Instruction> safely. Fixes static analyser warning.	2021-08-20 14:31:11 +01:00
Alexander Potapenko	8dc7dcdca1	[msan] Add support for disable_sanitizer_instrumentation attribute Unlike __attribute__((no_sanitize("memory"))), this one will cause MSan to skip the entire function during instrumentation. Depends on https://reviews.llvm.org/D108029 Differential Revision: https://reviews.llvm.org/D108199	2021-08-20 15:11:26 +02:00
Alexander Potapenko	b0391dfc73	[clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute The purpose of __attribute__((disable_sanitizer_instrumentation)) is to prevent all kinds of sanitizer instrumentation applied to a certain function, Objective-C method, or global variable. The no_sanitize(...) attribute drops instrumentation checks, but may still insert code preventing false positive reports. In some cases though (e.g. when building Linux kernel with -fsanitize=kernel-memory or -fsanitize=thread) the users may want to avoid any kind of instrumentation. Differential Revision: https://reviews.llvm.org/D108029	2021-08-20 14:01:06 +02:00
Simon Pilgrim	5d21ee4224	MemProfilerPass::run - remove (dead) duplicate return. NFC.	2021-08-20 12:36:28 +01:00
Roman Lebedev	5d4f37e895	[NFCI][SimplifyCFG] Rewrite `createUnreachableSwitchDefault()` The only thing that function should do, as per its semantics, is to ensure that the switch's default is a block consisting only of an `unreachable` terminator. So let's just create such a block and update switch's default to point to it. There should be no need for all this weird dance around predecessors/successors.	2021-08-20 13:28:08 +03:00
Anton Afanasyev	3890ce708d	[NFC][AggressiveInstCombine] Simplify code for shift truncation	2021-08-20 06:37:02 +03:00
Fangrui Song	77b435aaa1	Revert "[InstrProfiling] Make COFF use the ELF comdat scheme (drop link.exe compatibility)" This reverts commit `fbb8e772ec`. Accidentally pushed.	2021-08-19 16:42:57 -07:00
Fangrui Song	fbb8e772ec	[InstrProfiling] Make COFF use the ELF comdat scheme (drop link.exe compatibility) The COFF specific `DataReferencedByCode` complexity (D103372 D103717) is due to a link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE is not really dropped, so it can cause duplicate definition error.	2021-08-19 16:38:32 -07:00
Akira Hatanaka	898dc4590c	Refactor inlineRetainOrClaimRVCalls. NFC This is in preparation for committing https://reviews.llvm.org/D103000.	2021-08-19 14:55:45 -07:00
Arthur Eubanks	44a3241f10	[NFC] Replace some attribute methods that use confusing indexes	2021-08-19 14:10:26 -07:00
Florian Mayer	73323c6eaa	[hwasan] re-enable stack safety by default. The failed assertion was fixed in D108337. Reviewed By: vitalybuka, eugenis Differential Revision: https://reviews.llvm.org/D108381	2021-08-19 21:11:24 +01:00
Philip Reames	17b9cb1817	[runtimeunroll] Support multiple exits to latch exit w/prolog loop This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used. This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch. I rolled in the analogous fix here as it seemed obvious, and not worth re-review. As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic. (Alternatively, maybe we should be peeling these cases more aggressively?) Differential Revision: https://reviews.llvm.org/D108262	2021-08-19 11:43:52 -07:00
Nikita Popov	8cf5b69f69	[GuardWidening] Preserve MemorySSA As reported on https://bugs.llvm.org/show_bug.cgi?id=51020, the guard widening pass doesn't preserve MemorySSA, so it can no longer be scheduled in the same loop pass manager as LICM. However, the loop-schedule.ll test indicates that this is supposed to work. Fix this by preserving MemorySSA if available, as this seems to be trivial in this case (we only need to drop the memory access for the removed guards). Differential Revision: https://reviews.llvm.org/D108386	2021-08-19 20:23:17 +02:00
Philip Reames	447256f22b	[runtimeunroll] Fix reported DT verification error after `94d0914` In `94d0914`, I added support for unrolling of multiple exit loops which have multiple exits reaching the latch. Per reports on the review post commit, I'd missed updating the domtree for one case. This fix addresses that omission. There's no new test as this is covered by existing tests with expensive verification turned on.	2021-08-19 11:06:17 -07:00
Chang-Sun Lin, Jr	9cae598f8b	[InstCombine] Avoid folding GEPs across loop boundaries Folding a GEP from outside to inside a loop will materialize an add where there wasn't an equivalent operation before. Check the containing loops before making this fold. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107935	2021-08-19 20:03:44 +03:00
Arthur Eubanks	33d44b762e	[OpaquePtr][Inline] Use byval type instead of pointee type Reviewed By: #opaque-pointers, dblaikie Differential Revision: https://reviews.llvm.org/D105711	2021-08-19 09:56:08 -07:00
Sanjay Patel	ec54e275f5	Revert "[CVP] processSwitch: Remove default case when switch cover all possible values." This reverts commit `9934a5b2ed`. This patch may cause miscompiles because it missed a constraint as shown in the examples from: https://llvm.org/PR51531	2021-08-19 08:43:51 -04:00
Sanjay Patel	eee0ded337	[InstCombine] add min/max intrinsics as freely invertible candidates In the optimized test, we are able to peek through the min/max that has 2 min/max operands and invert them all: https://alive2.llvm.org/ce/z/7gYMN5	2021-08-19 08:41:38 -04:00
Sanjay Patel	e10c3beca5	[InstCombine] add one-use check for min/max fold with not operands; NFC This makes the intrinsic logic match the cmp+select idiom folds just below. It's not clearly a win either way unless we think that a 'not' op costs more than min/max. The cmp+select folds on these patterns are more extensive than the intrinsics currently and may have some complicated interactions, so I'm trying to make those line up and bring the optimizations for intrinsics up to parity.	2021-08-19 08:41:38 -04:00
Rosie Sumpter	d1aa075129	[LoopFlatten] Fix assertion failure There is an assertion failure in computeOverflowForUnsignedMul (used in checkOverflow) due to the inner and outer trip counts having different types. This occurs when the IV has been widened, but the loop components are not successfully rediscovered. This is fixed by some refactoring of the code in findLoopComponents which identifies the trip count of the loop. Differential Revision: https://reviews.llvm.org/D108107	2021-08-19 13:18:57 +01:00
Bjorn Pettersson	36d5138619	[NewPM] Make some sanitizer passes parameterized in the PassRegistry Refactored implementation of AddressSanitizerPass and HWAddressSanitizerPass to use pass options similar to passes like MemorySanitizerPass. This makes sure that there is a single mapping from class name to pass name (needed by D108298), and options like -debug-only and -print-after makes a bit more sense when (despite that it is the unparameterized pass name that should be used in those options). A result of the above is that some pass names are removed in favor of the parameterized versions: - "khwasan" is now "hwasan<kernel;recover>" - "kasan" is now "asan<kernel>" - "kmsan" is now "msan<kernel>" Differential Revision: https://reviews.llvm.org/D105007	2021-08-19 12:43:37 +02:00
David Sherwood	f4122398e7	[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64 I have added a new TTI interface called enableOrderedReductions() that controls whether or not ordered reductions should be enabled for a given target. By default this returns false, whereas for AArch64 it returns true and we rely upon the cost model to make sensible vectorisation choices. It is still possible to override the new TTI interface by setting the command line flag: -force-ordered-reductions=true\|false I have added a new RUN line to show that we use ordered reductions by default for SVE and Neon: Transforms/LoopVectorize/AArch64/strict-fadd.ll Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D106653	2021-08-19 09:29:40 +01:00
Wenlei He	eca03d2768	[CSSPGO] Track and use context-sensitive post-optimization function size to drive global pre-inliner in llvm-profgen This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions. To do this, BinarySizeContextTracker is introduced to track function byte size under different inline context during disassembling. In preinliner, we can not query context byte size under switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep size of functions under different context (callee as parent, caller as child), and it can give best/longest possible matching context size for given input context. The new size cost is off by default. There are a few TODOs that need to be addressed: 1) avoid dangling string from `Offset2LocStackMap`, which will be addressed in split context work; 2) using inlinee's entry probe to make sure we have correct zero size for inlinee that's completely optimized away after inlining. Some tuning is also needed. Differential Revision: https://reviews.llvm.org/D108180	2021-08-18 22:50:57 -07:00
Rong Xu	5fdaaf7fd8	[SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loader This patch implements Flow Sensitive Sample FDO (FSAFDO) profile loader. We have two profile loaders for FS profile, one before RegAlloc and one before BlockPlacement. To enable it, when -fprofile-sample-use=<profile> is specified, add "-enable-fs-discriminator=true \ -disable-ra-fsprofile-loader=false \ -disable-layout-fsprofile-loader=false" to turn on the FS profile loaders. Differential Revision: https://reviews.llvm.org/D107878	2021-08-18 18:37:35 -07:00
Anton Afanasyev	cfb6dfcbd1	[AggressiveInstCombine] Add logical shift right instr to `TruncInstCombine` DAG Add `lshr` instruction to the DAG post-dominated by `trunc`, allowing TruncInstCombine to reduce bitwidth of expressions containing these instructions. We should be shifting by less than the target bitwidth. Also it is sufficient to require that all truncated bits of the value-to-be-shifted are zeros: https://alive2.llvm.org/ce/z/_LytbB Alive2 variable-length proof: https://godbolt.org/z/1srE1aqzf => s/32/8/ => https://alive2.llvm.org/ce/z/StwPia Part of https://reviews.llvm.org/D107766 Differential Revision: https://reviews.llvm.org/D108201	2021-08-18 22:20:58 +03:00
Ali Sedaghati	cc7bcef3e3	Reapply: [NFC] factor out unrolling decision logic reverting `ffd8a268bd` (reapplying `4d559837e8`) - removed spurious inclusion of <optional> Differential Revision: https://reviews.llvm.org/D106001	2021-08-18 12:04:33 -07:00
Geoffrey Martin-Noble	ffd8a268bd	Revert "[NFC] factor out unrolling decision logic" This patch added a requirement for C++17, while LLVM is supposed to build with C++14 (https://llvm.org/docs/CodingStandards.html#c-standard-versions). Posted a note to the original review thread (https://reviews.llvm.org/D106001). This reverts commit `4d559837e8`. Differential Revision: https://reviews.llvm.org/D108314	2021-08-18 11:38:48 -07:00
Nikita Popov	3dd8c9176b	[LICM] Remove AST-based implementation MSSA-based LICM has been enabled by default for a few years now. This drops the old AST-based implementation. Using loop(licm) will result in a fatal error, the use of loop-mssa(licm) is required (or just licm, which defaults to loop-mssa). Note that the core canSinkOrHoistInst() logic has to retain AST support for now, because it is shared with LoopSink. Differential Revision: https://reviews.llvm.org/D108244	2021-08-18 20:21:53 +02:00
Ali Sedaghati	4d559837e8	[NFC] factor out unrolling decision logic Decoupling the unrolling logic into three different functions. The shouldPragmaUnroll() covers the 1st and 2nd priorities of the previous code, the shouldFullUnroll() covers the 3rd, and the shouldPartialUnroll() covers the 5th. The output of each function, Optional<unsigned>, could be a value for UP.Count, which means unrolling factor has been set, or None, which means decision hasn't been made yet and should try the next priority. Reviewed By: mtrofin, jdoerfert Differential Revision: https://reviews.llvm.org/D106001	2021-08-18 11:21:40 -07:00
Arthur Eubanks	fde0eb1f9a	[NFC] A couple more removeAttribute() cleanups	2021-08-18 11:15:20 -07:00
Han Zhu	687f046c97	[NFC][loop-idiom] Rename Stores to IgnoredInsts; Fix a typo When dealing with memmove, we also add the load instruction to the ignored instructions list passed to `mayLoopAccessLocation`. Renaming "Stores" to "IgnoredInsts" to be more precise. Differential Revision: https://reviews.llvm.org/D108275	2021-08-18 10:52:16 -07:00
Arthur Eubanks	7557d6c896	[NFC] Cleanup calls to CallBase::getAttribute()	2021-08-18 09:39:33 -07:00
Florian Mayer	164e09de2e	[hwasan] Default -hwasan-use-stack-safety to off. This very occasionally causes an assertion failure in the compiler. Turning off until we can get to the bottom of this. Reviewed By: hctim Differential Revision: https://reviews.llvm.org/D108282	2021-08-18 17:21:32 +01:00
Joseph Huber	13d8f000d7	[OpenMP][NFC] Improve debug message for shared memory Summary: Make the debug message for HeapToShared more helpful by showing the actual call.	2021-08-18 11:56:09 -04:00
Joseph Huber	58f9326487	[OpenMP] Change AAKernelInfo to ignore non-kernels Currently, AAKernelInfo will fail on an assertion if we attempt to run it on a kernel without the init / deinit runtime calls. However, this occurs for global constructors on the device. This will cause OpenMPOpt to crash whenever global constructors are present. This patch removes this assertion and just gives up instead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108258	2021-08-18 11:24:29 -04:00
Petr Hosek	1c84167149	[InstrProfiling][NFC] Initialize MadeChange variable This addresses an issue introduced in `389dc94d4b` which triggers a crash on Windows.	2021-08-17 23:33:38 -07:00
Anton Afanasyev	803270c0c6	[AggressiveInstCombine] Fix unsigned overflow Fix issue reported here: https://reviews.llvm.org/D108091#2950930	2021-08-18 08:42:46 +03:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Arthur Eubanks	de0ae9e89e	[NFC] Cleanup more AttributeList::addAttribute()	2021-08-17 21:05:41 -07:00
Arthur Eubanks	ad727ab7d9	[NFC] Migrate some callers away from Function/AttributeLists methods that take an index These methods can be confusing.	2021-08-17 21:05:40 -07:00
Arthur Eubanks	46cf82532c	[NFC] Replace Function handling of attributes with less confusing calls To avoid magic constants and confusing indexes.	2021-08-17 21:05:40 -07:00
Jun Ma	9934a5b2ed	[CVP] processSwitch: Remove default case when switch cover all possible values. Differential Revision: https://reviews.llvm.org/D106056	2021-08-18 10:23:13 +08:00
Philip Reames	94d0914292	[runtimeunroll] Support multiple exits to latch exit w/epilogue loop This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used. I decided to restrict this to the epilogue case. Given the changes ended up being pretty generic, we may be able to unblock the prolog case too, but I want to do that in a separate change to reduce the amount of code we all have to understand at one time. Differential Revision: https://reviews.llvm.org/D107381	2021-08-17 17:52:04 -07:00
Florian Mayer	8f750e8814	[hwasan] [NFC] pull out helper function. Reviewed By: hctim Differential Revision: https://reviews.llvm.org/D107334	2021-08-17 23:31:47 +01:00
Arthur Eubanks	16890e0040	[GlobalOpt] Check stored once value's type before setting global initializer In the provided test case, we were trying to set the global's initializer to `i32* null` when the global's value type was `@0`. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D108232	2021-08-17 14:34:29 -07:00
Adrian Prantl	8ae5e0b154	Add missing nullptr check Unfortunately the IR Verifier doesn't reject debug intrinsics that have nullptr as arguments, so coro::salvageDebugInfo for now also needs to deal with them. rdar://81979541	2021-08-17 13:59:52 -07:00
Nikita Popov	e918ba6958	[LICM] Drop -licm-n2-threshold option This was a diagnostic option used to demonstrate a weakness in the AST-based LICM implementation. This problem does not exist in the MSSA-based LICM implementation, which has been enabled for a long time now. As such, this option is no longer relevant.	2021-08-17 22:41:31 +02:00
Sanjay Patel	50c1138796	[InstCombine] add TODO about another min/max fold; NFC Suggested in post-commit for `d0975b7cb0`	2021-08-17 14:14:25 -04:00
Joseph Huber	339aa76526	[OpenMP][NFC] Add option to print module after OpenMPOpt for debugging This patch adds an extra option to print the module after running one of the OpenMPOpt passes if debugging is enabled. This makes it much easier to inspect the effects of this pass when doing debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108146	2021-08-17 12:46:10 -04:00
Philip Reames	982da7a20c	[SCEVExpander] Stop hoisting IR when reusing phis This is a fix for PR43678, and is an alternate patch to D105723. The basic issue we're running into is that LSR + SCEVExpander are moving the very instruction whose operand we're in the process of expanding. This breaks the subtle and ill-documented invariant which let LSR work. (Full story can be found here: https://reviews.llvm.org/D105723#2878473) Rather than attempting a fix, this change just removes the optimization entirely. The code is entirely untested, and removing it appears to have no impact I can find. This code was added back in 2014 by `1e12f8563d` with a single test which does not seem to actually test the hoisting logic. From a philosophical standpoint, it also seems very strange to have the expander implementing optimizations which should live in a dedicated transform pass. Differential Revision: https://reviews.llvm.org/D106178	2021-08-17 09:38:32 -07:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
Sanjay Patel	e73f4e1123	[InstCombine] remove unused function argument; NFC	2021-08-17 08:10:42 -04:00
Sanjay Patel	d0975b7cb0	[InstCombine] fold signed min/max intrinsics with negated operands If both operands are negated, we can invert the min/max and do the negation after: smax (neg nsw X), (neg nsw Y) --> neg nsw (smin X, Y) smin (neg nsw X), (neg nsw Y) --> neg nsw (smax X, Y) This is visible as a remaining regression in D98152. I don't see a way to generalize this for 'unsigned' or adapt Negator to handle it. This only appears to be safe with 'nsw': https://alive2.llvm.org/ce/z/GUy1zJ Differential Revision: https://reviews.llvm.org/D108165	2021-08-17 08:10:42 -04:00
Anton Afanasyev	1f3e35b6d1	[AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG Add `shl` instruction to the DAG post-dominated by `trunc`, allowing TruncInstCombine to reduce bitwidth of expressions containing left shifts. The only thing we need to check is that the target bitwidth must be wider than the maximal shift amount: https://alive2.llvm.org/ce/z/AwArqu Part of https://reviews.llvm.org/D107766 Differential Revision: https://reviews.llvm.org/D108091	2021-08-17 12:44:37 +03:00
Whitney Tsang	a41c95c0e3	[LNICM] Fix infinite loop There is a bug introduced by https://reviews.llvm.org/D107219 which causes an infinite loop, when there are more than 2 levels PHINode chain. Reviewed By: uint256_t Differential Revision: https://reviews.llvm.org/D108166	2021-08-17 12:55:22 +09:00
Arthur Eubanks	0d822da2bd	[NFC] Remove/replace some confusing attribute getters on Function	2021-08-16 16:12:37 -07:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Nikita Popov	570c9beb8e	[MemorySSA] Remove unnecessary MSSA dependencies LoopLoadElimination, LoopVersioning and LoopVectorize currently fetch MemorySSA when constructing LoopAccessAnalysis. However, LoopAccessAnalysis does not actually use MemorySSA and we can pass nullptr instead. This saves one MemorySSA calculation in the default pipeline, and thus improves compile-time. Differential Revision: https://reviews.llvm.org/D108074	2021-08-16 20:40:55 +02:00
Sanjay Patel	de285eacb0	[InstCombine] allow for constant-folding in GEP transform This would crash the reduced test or as described in https://llvm.org/PR51485 ...because we can't mark a constant (-expression) with 'inbounds'.	2021-08-16 10:36:56 -04:00
Roman Lebedev	febcedf18c	Revert "[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer" https://bugs.llvm.org/show_bug.cgi?id=51490 was filed. This reverts commit `35a8bdc775`.	2021-08-16 14:30:29 +03:00
David Sherwood	9b19b77883	[NFC] Remove unused code in llvm::createSimpleTargetReduction	2021-08-16 09:50:45 +01:00
Roman Lebedev	2eb554a9fe	Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" This is still wrong, as failing bots suggest. This reverts commit `3d9beefc7d`.	2021-08-16 11:07:42 +03:00
David Green	c6b7db015f	[InstCombine] Add call to matchSAddSubSat from min/max This adds a call to matchSAddSubSat from smin/smax intrinsics, allowing the same patterns to match if the canonical form of a min/max is an intrinsic, not an icmp/select. Differential Revision: https://reviews.llvm.org/D108077	2021-08-15 17:25:16 +01:00
Roman Lebedev	3d9beefc7d	Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) ... with test change this time. LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR, and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What i believe we need to do, is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses. ``` $ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll ---------------------------------------- @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: br label %L %L: %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 br i1 %iszero, label %exit, label %L2 %L2: store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp eq i32 %ld, 4294967295 br i1 %cmp, label %L, label %exit %exit: %r = phi i32 [ %ld, %L2 ], [ %ld, %L ] ret i32 %r } => @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: %ld.old = load i32, * @global_pr51125, align 4 %iszero.old = icmp eq i32 %ld.old, 0 br i1 %iszero.old, label %exit, label %L2 %L2: %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ] store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp ne i32 %ld2, 4294967295 %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 %or.cond = select i1 %cmp, i1 1, i1 %iszero br i1 %or.cond, label %exit, label %L2 %exit: %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ] %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ] ret i32 %r } Transformation seems to be correct! 
``` Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:16:04 +03:00
Roman Lebedev	60dd0121c9	Revert "[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" Forgot to stage the test change. This reverts commit `78af5cb213`.	2021-08-15 19:15:09 +03:00
Roman Lebedev	78af5cb213	[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) LLVM IR SSA form is "implicit" in `@pr51125`. While it is valid LLVM IR and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What I believe we need to do is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses.
```
$ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll

----------------------------------------
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  br label %L

%L:
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  br i1 %iszero, label %exit, label %L2

%L2:
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp eq i32 %ld, 4294967295
  br i1 %cmp, label %L, label %exit

%exit:
  %r = phi i32 [ %ld, %L2 ], [ %ld, %L ]
  ret i32 %r
}
=>
@global_pr51125 = global 4 bytes, align 4

define i32 @pr51125() {
%entry:
  %ld.old = load i32, * @global_pr51125, align 4
  %iszero.old = icmp eq i32 %ld.old, 0
  br i1 %iszero.old, label %exit, label %L2

%L2:
  %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ]
  store i32 4294967295, * @global_pr51125, align 4
  %cmp = icmp ne i32 %ld2, 4294967295
  %ld = load i32, * @global_pr51125, align 4
  %iszero = icmp eq i32 %ld, 0
  %or.cond = select i1 %cmp, i1 1, i1 %iszero
  br i1 %or.cond, label %exit, label %L2

%exit:
  %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ]
  %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ]
  ret i32 %r
}
Transformation seems to be correct!
```
Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:02:34 +03:00
Roman Lebedev	35a8bdc775	[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer Currently/previously, while SCEV guaranteed that it produces the same value, the way it was produced may be illegal IR, so we have an ugly check that the replacement is valid. But now that the SCEV strictness wrt the pointer/integer types has been improved, I believe this invariant is already upheld by the SCEV itself, natively. I think we should add an assertion, wait for a week, and then, if all is good, rip out all this checking. Or we could just do the latter directly, I guess. This reverts commit rL127839. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D108043	2021-08-15 18:59:32 +03:00
Nikita Popov	944dfa4975	[IndVars] Don't check for pointer exit count (NFC) After recent changes, exit counts and BE taken counts are always integers, so convert these to assertions. While here, also convert the loop invariance checks to asserts. Exit counts are always loop invariant.	2021-08-15 16:49:30 +02:00
Nikita Popov	3c503ba06a	[FunctionImport] Fix build with old mingw (NFC) std::errc::operation_not_supported is not universally supported. Make use of LLVM's errc interoperability header, which lists known-good errc values.	2021-08-15 15:47:59 +02:00
Paul Walker	f7a831daa6	[LoopVectorize] Don't emit remarks about lack of scalable vectors unless they're specifically requested. Previously we emitted a "does not support scalable vectors" remark for all targets whenever vectorisation is attempted. This pollutes the output for architectures that don't support scalable vectors and is likely confusing to the user. Instead this patch introduces a debug message that reports when scalable vectorisation is allowed by the target and only issues the previous remark when scalable vectorisation is specifically requested, for example: #pragma clang loop vectorize_width(2, scalable) Differential Revision: https://reviews.llvm.org/D108028	2021-08-15 12:15:52 +01:00
eopXD	012173680f	[LoopIdiom] let the pass deal with runtime memset size The current LIR does not deal with runtime-determined memset sizes. This patch uses SCEV to check whether the PointerStrideSCEV and the MemsetSizeSCEV are equal. Before the comparison, the pass tries to fold the expression that is already protected by the loop guard. Test files `memset-runtime.ll` and `memset-runtime-debug.ll` added. This patch deals with the proper loop idiom. A follow-up patch is intended to deal with SCEVs that are unequal after folding with the loop guards. Reviewed By: lebedev.ri, Whitney Differential Revision: https://reviews.llvm.org/D107353	2021-08-14 19:22:06 +08:00
Dawid Jurczak	107401002e	[NFC][DSE] Clean up KnownNoReads and MemorySSAScanLimit in DSE Another set of simple cleanups in DSE. CheckCache has been gone since `1f1145006b`, and as a consequence KnownNoReads is useless. Also update the description of MemorySSAScanLimit, whose default value is 150 rather than 100. Differential Revision: https://reviews.llvm.org/D107812	2021-08-14 11:26:57 +02:00
Arthur Eubanks	c19d7f8af0	[CallPromotion] Check for inalloca/byval mismatch Previously we would allow promotion even if the byval/inalloca attributes on the call and the callee didn't match. It's ok if the byval/inalloca types aren't the same. For example, LTO importing may rename types. Fixes PR51397. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107998	2021-08-13 16:52:04 -07:00
Arthur Eubanks	dc41c558dd	[NFC] Make AttributeList::hasAttribute(AttributeList::ReturnIndex) its own method AttributeList::hasAttribute() is confusing. In an attempt to change the name to something that suggests using other methods, fix up some existing uses.	2021-08-13 16:27:11 -07:00
Arthur Eubanks	f80ae58068	[NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex) getAttribute() is confusing, use a clearer method.	2021-08-13 16:27:11 -07:00
Arthur Eubanks	d7593ebaee	[NFC] Clean up users of AttributeList::hasAttribute() AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.	2021-08-13 11:59:18 -07:00
Arthur Eubanks	a9831cce1e	[NFC] Remove public uses of AttributeList::getAttributes() Use methods that better convey the intent.	2021-08-13 11:38:12 -07:00
Arthur Eubanks	80ea2bb574	[NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes() This is more consistent with similar methods.	2021-08-13 11:16:52 -07:00
Arthur Eubanks	a0c42ca56c	[NFC] Remove AttributeList::hasParamAttribute() It's the same as AttributeList::hasParamAttr().	2021-08-13 10:58:21 -07:00
Roman Lebedev	c46546bd52	Reland "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological" The commit originally unearthed a problem, reported as https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff#1019890 Now that the problem has been fixed, and the assertion no longer fires, let's see if there are other cases it fires on. This reverts commit `5c8c24d2de`, relanding commit `f30a7dff8a`.	2021-08-13 15:45:03 +03:00
Roman Lebedev	2702fb1148	[SimplifyCFG] Restart if `removeUndefIntroducingPredecessor()` made changes It might have changed the condition of a branch into a constant, so we should restart and constant-fold the terminator, instead of continuing with the tautological "conditional" branch. This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff	2021-08-13 15:45:03 +03:00
Roman Lebedev	5c8c24d2de	Revert "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological" The assertion does not hold on a provided reproducer. Reverting until after fixing the problem. This reverts commit `f30a7dff8a`.	2021-08-13 13:16:22 +03:00
Dylan Fleming	4be7fb9762	[SVE] Add folds for truncation of vscale Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107453	2021-08-13 10:18:00 +01:00
Rosie Sumpter	46abd1fbe8	[LoopFlatten] Fix assertion failure in checkOverflow There is an assertion failure in computeOverflowForUnsignedMul (used in checkOverflow) due to the inner and outer trip counts having different types. This occurs when the IV has been widened, but the loop components are not successfully rediscovered. This is fixed by some refactoring of the code in findLoopComponents which identifies the trip count of the loop.	2021-08-13 10:07:49 +01:00
Giorgis Georgakoudis	60e643fe05	[OpenMP][Fix] Fix disable spmdization option Besides SPMDization, other analyses and optimizations for original, frontend-generated SPMD regions use information from the AAKernelInfoFunction attribute. This fix makes sure disabling SPMDization through the corresponding option applies only to generic mode regions, which should not be SPMDized, while it leaves the attribute state of original SPMD regions unaffected. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108001	2021-08-12 17:59:14 -07:00
Sanjay Patel	14eefa57f2	[InstCombine] factorize min/max intrinsic ops with common operand (2nd try) This is a re-try of `6de1dbbd09` which was reverted because it missed a null check. Extra test for that failure added. Original commit message: This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 16:32:07 -04:00
Amy Huang	427520a8fa	Revert "[InstCombine] factorize min/max intrinsic ops with common operand" This reverts commit `6de1dbbd09` because it causes a compiler crash.	2021-08-12 12:36:25 -07:00
Florian Hahn	f999312872	Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts the revert `28c04794df`. The failing MLIR test that caused the revert should be fixed in this version. Also includes a PPC test fix previously in `1f87c7c478`.	2021-08-12 18:31:57 +01:00
Roman Lebedev	f30a7dff8a	[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological We really shouldn't deal with a conditional branch that can be trivially constant-folded into an unconditional branch. Indeed, barring failure to trigger BB reprocessing, that should be true, so let's assert as much, and hope the assertion never fires. If it does, we have a bug to fix.	2021-08-12 20:03:09 +03:00
Roman Lebedev	628f63d3d5	[SimplifyCFG] If FoldTwoEntryPHINode() changed things, restart Mainly, I want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()` doesn't get asked to deal with non-unconditional branches, and if I do that, then said assertion fires on existing tests, and this is what prevents it from firing.	2021-08-12 20:03:09 +03:00
Sanjay Patel	790c29ab86	[InstCombine] fold umax/umin intrinsics based on demanded bits This is a direct translation of the select folds added with D53033 / D53036 and another step towards canonicalization using the intrinsics (see D98152).	2021-08-12 12:37:45 -04:00
maekawatoshiki	dd3eea6566	[LICM] Support sinking in LNICM Currently, the LNICM pass does not support sinking instructions out of a loop nest. This patch enables LNICM to sink as many instructions as possible down to the exit block of the outermost loop. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D107219	2021-08-13 00:56:26 +09:00
Sanjay Patel	cd44cc86e3	[InstCombine] remove unused function argument; NFC This was just added with `6de1dbbd09`, and I missed pulling the extra arg from the final revision.	2021-08-12 11:47:25 -04:00
Johannes Doerfert	4e7d7cae67	[Attributor][FIX] Do not try to rewrite functions with casted call sites If we cast a function at the call site it is hard(er) to get the rewrite correct, let's not attempt it for now. Fixes PR51448.	2021-08-12 10:39:53 -05:00
Johannes Doerfert	5f543919b2	[Attributor][FIX] Guard constant casts with type size checks	2021-08-12 10:39:53 -05:00
Johannes Doerfert	a420f80bf1	[Attributor] Do not delete volatile stores to null/undef See D106309. Differential Revision: https://reviews.llvm.org/D107906	2021-08-12 10:39:52 -05:00
Sanjay Patel	be0698559b	[InstCombine] remove shl(neg x), y transform This diff was accidentally committed with: `1b5a195845`	2021-08-12 11:27:22 -04:00
Sanjay Patel	6de1dbbd09	[InstCombine] factorize min/max intrinsic ops with common operand This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 11:19:09 -04:00
Sanjay Patel	1b5a195845	[InstCombine] add tests for factorization of min/max intrinsics; NFC	2021-08-12 11:19:09 -04:00
Liqiang Tao	422fc5603a	[llvm][Inline] Refactor out InlineOrder Move InlineOrder to separated file. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D107831	2021-08-12 22:19:53 +08:00
Mehdi Amini	28c04794df	Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts commit `a1ef81de35`. Broke the MLIR buildbot.	2021-08-12 11:57:19 +00:00
Florian Hahn	a1ef81de35	[Matrix] Overload stride arg in matrix.columnwise.load/store. This patch adjusts the intrinsics definition of llvm.matrix.column.major.load and llvm.matrix.column.major.store to allow overloading the type of the stride. The bitwidth of the stride is used to perform the offset computation. This fixes a crash when using __builtin_matrix_column_major_load or __builtin_matrix_column_major_store on 32 bit platforms. The stride argument of the builtins is defined as `size_t`, which is 32 bits wide on 32 bit platforms. Note that we still perform offset computations with 64 bit width on 32 bit platforms for accesses that do not take a user-specified stride. This can be fixed separately. Fixes PR51304. Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D107349	2021-08-12 10:45:25 +01:00
Christudasan Devadasan	5d940b71ae	Reapply "SROA: Enhance speculateSelectInstLoads" Originally committed as `ffc3fb665d` Reverted in `fcf2d5f402` due to an assertion failure. Original commit message: Allow the folding even if there is an intervening bitcast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106667	2021-08-11 22:58:54 -04:00
Akira Hatanaka	643ce61fb3	[ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the release call findSafeStoreForStoreStrongContraction checks whether it's safe to move the release call to the store by inspecting all instructions between the two, but was ignoring retain instructions. This was causing objects to be released and deallocated before they were retained. rdar://81668577	2021-08-11 13:50:19 -07:00
Sanjay Patel	a0a9c9e188	[InstCombine] avoid breaking up min/max (cmp+sel) idioms This is a quick fix for a motivating case that looks like this: https://godbolt.org/z/GeMqzMc38 As noted, we might be able to restore the min/max patterns with select folds, or we just wait for this to become easier with canonicalization to min/max intrinsics.	2021-08-11 12:48:11 -04:00
Yolanda Chen	8fa16cc628	[LTO][lld] Add lto-pgo-warn-mismatch option When CSPGO is enabled for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX) due to source changes (e.g. `#if` code runs for profile generation but not for profile use). To disable them we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option. In contrast, clang uses the option "-Wno-backend-plugin" to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option. Add the "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO. Differential Revision: https://reviews.llvm.org/D104431	2021-08-11 09:45:55 -07:00
Wang, Pengfei	6c4809825d	Revert "[lld] Add lto-pgo-warn-mismatch option" This reverts commit `0cfb00a1c9`.	2021-08-11 16:25:42 +08:00
Yolanda Chen	0cfb00a1c9	[lld] Add lto-pgo-warn-mismatch option When CSPGO is enabled for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX). To disable them we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option. In contrast, clang uses the option "-Wno-backend-plugin" to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option. Add this "lto-pgo-warn-mismatch" option to lld to help turn the profile mismatch warnings on/off explicitly when building with ThinLTO and CSPGO. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104431	2021-08-11 14:43:26 +08:00
Petr Hosek	389dc94d4b	[InstrProfiling] Generate runtime hook for Fuchsia When none of the translation units in the binary have been instrumented we shouldn't need to link the profile runtime. However, because we pass -u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still be pulled in and incur some overhead. On Fuchsia which uses runtime counter relocation, it also means that we cannot reference the bias variable unconditionally. This change modifies the InstrProfiling pass to pull in the profile runtime only when needed by declaring the __llvm_profile_runtime symbol in the translation unit only when needed. For now we restrict this only for Fuchsia, but this can be later expanded to other platforms. This approach was already used prior to `9a041a7522`, but we changed it to always generate the __llvm_profile_runtime due to a TAPI limitation, but that limitation may no longer apply, and it certainly doesn't apply on platforms like Fuchsia. Differential Revision: https://reviews.llvm.org/D98061	2021-08-10 23:21:15 -07:00
Petr Hosek	c0c1c3cf93	Revert "[InstrProfiling] Emit bias variable eagerly" This reverts commit `6660cec568` since it was superseded by https://reviews.llvm.org/D98061.	2021-08-10 23:21:15 -07:00
Johannes Doerfert	fc32a5c87d	[Attributor][NFC] Try to make the windows build bots happy Failed for some reason, potentially because of the inner type declaration in combination with the `using`. This might help. Failure: https://lab.llvm.org/buildbot/#/builders/127/builds/15432	2021-08-11 01:11:37 -05:00
Johannes Doerfert	e7e3585cde	[Attributor][FIX] Handle recurrences (PHIs) in AAPointerInfo explicitly PHI nodes are not pass through but change their value, we have to account for that to avoid missing stores. Follow up for D107798 to fix PR51249 for good. Differential Revision: https://reviews.llvm.org/D107808	2021-08-11 00:49:54 -05:00
Johannes Doerfert	96da6dd6ba	[Attributor][FIX] Only avoid visiting PHI uses multiple times (PR51249) AAPointerInfoFloating needs to visit all uses and some multiple times if we go through PHI nodes. Attributor::checkForAllUses keeps a visited set so we don't recurse endlessly. We now allow recursion for non-phi uses so we track all pointer offsets via PHI nodes properly without endless recursion. This replaces the first attempt D107579. Differential Revision: https://reviews.llvm.org/D107798	2021-08-11 00:49:54 -05:00
Johannes Doerfert	e0c5d83a92	[OpenMP][FIX] Disabled optimizations have to be made known To avoid simplification with wrong constants we need to make sure we know that we won't perform specific optimizations based on the user's request. The non-SPMDization and non-CustomStateMachine flags only prevented the final transformation but allowed value simplification to go ahead. Differential Revision: https://reviews.llvm.org/D107862	2021-08-11 00:49:53 -05:00
Christopher Di Bella	c874dd5362	[llvm][clang][NFC] updates inline licence info Some files still contained the old University of Illinois Open Source Licence header. This patch replaces that with the Apache 2 with LLVM Exception licence. Differential Revision: https://reviews.llvm.org/D107528	2021-08-11 02:48:53 +00:00
Adrian Prantl	a353edb8d6	Simplify coro::salvageDebugInfo() (NFC-ish) This patch removes the hand-rolled implementation of salvageDebugInfo for cast and GEPs and replaces it with a call into llvm::salvageDebugInfoImpl(). A side-effect of this is that additional redundant convert operations are introduced, but those don't have any negative effect on the resulting DWARF expression. rdar://80227769 Differential Revision: https://reviews.llvm.org/D107384	2021-08-10 15:21:18 -07:00
Adrian Prantl	d6b6880172	Streamline the API of salvageDebugInfoImpl (NFC) This patch refactors / simplifies salvageDebugInfoImpl(). The goal here is to simplify the implementation of coro::salvageDebugInfo() in a followup patch. 1. Change the return value to I.getOperand(0). Currently users of salvageDebugInfoImpl() assume that the first operand is I.getOperand(0). This patch makes this information explicit. A nice side-effect of this change is that it allows us to salvage expressions such as add i8 1, %a in the future. 2. Factor out the creation of a DIExpression and return an array of DIExpression operations instead. This change allows users that call salvageDebugInfoImpl() in a loop to avoid the costly creation of temporary DIExpressions and to defer the creation of a DIExpression until the end. This patch does not change any functionality. rdar://80227769 Differential Revision: https://reviews.llvm.org/D107383	2021-08-10 15:21:18 -07:00
Nikita Popov	17db125b48	[MemCpyOpt] Optimize MemoryDef insertion When converting a store into a memset, we currently insert the new MemoryDef after the store MemoryDef, which requires all uses to be renamed to the new def using a whole block scan. Instead, we can insert the new MemoryDef before the store and not rename uses, because we know that the location is immediately overwritten, so all uses should still refer to the old MemoryDef. Those uses will get renamed when the old MemoryDef is actually dropped, which is efficient. I expect something similar can be done for some of the other MSSA updates in MemCpyOpt. This is an alternative to D107513, at least for this particular case. Differential Revision: https://reviews.llvm.org/D107702	2021-08-10 21:28:29 +02:00
Sanjay Patel	b267d3ce8d	[InstCombine] avoid infinite loops from min/max canonicalization The intrinsics have an extra chunk of known bits logic compared to the normal cmp+select idiom. That allows folding the icmp in each case to something better, but that then opposes the canonical form of min/max that we try to form for a select. I'm carving out a narrow exception to preserve all existing regression tests while avoiding the inf-loop. It seems unlikely that this is the only bug like this left, but this should fix: https://llvm.org/PR51419	2021-08-10 14:42:37 -04:00
Carl Ritson	a1783b54e8	[SimplifyCFG] Remove recursion from FoldCondBranchOnPHI. NFCI. Avoid stack overflow errors on systems with small stack sizes by removing recursion in FoldCondBranchOnPHI. This is a simple change as the recursion was only iteratively calling the function again on the same arguments. Ideally this would be compiled to a tail call, but there is no guarantee. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107803	2021-08-10 19:14:31 +09:00
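The recursion removal described in `a1783b54e8` can be sketched in general terms. This is only an illustration of the technique, not the LLVM code: `fold_once`, `fold_recursive` and `fold_iterative` are hypothetical names, and a plain list stands in for whatever state the real function re-examines on each call.

```python
def fold_once(work):
    """Hypothetical single folding step: consume one item, report whether anything changed."""
    if not work:
        return False
    work.pop()
    return True


def fold_recursive(work):
    """Calls itself again on the same argument after every successful step."""
    if not fold_once(work):
        return False
    fold_recursive(work)  # stack depth grows with the number of successful steps
    return True


def fold_iterative(work):
    """Same result as fold_recursive, but with constant stack depth."""
    changed = False
    while fold_once(work):
        changed = True
    return changed
```

Both versions report whether any step made a change; only the loop is safe when the number of steps is large relative to the available stack.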
David Sherwood	ce394161cb	[InstCombine] Add more complex folds for extractelement + stepvector I have updated cheapToScalarize to also consider the case when extracting lanes from a stepvector intrinsic. This required removing the existing 'bool IsConstantExtractIndex' and passing in the actual index as a Value instead. We do this because we need to know if the index is <= known minimum number of elements returned by the stepvector intrinsic. Effectively, when extracting lane X from a stepvector we know the value returned is also X. New tests added here: Transforms/InstCombine/vscale_extractelement.ll Differential Revision: https://reviews.llvm.org/D106358	2021-08-10 09:17:21 +01:00
Arnold Schwaighofer	b987c283ae	[coro] Correct CurrentBlock tracking bug recently introduced We use the CurrentBlock to determine whether we have already processed a block. Don't reuse this variable for setting where we should insert the rematerialization. The rematerialization block is different to the current block when we rematerialize for coro suspend block users. Differential Revision: https://reviews.llvm.org/D107573	2021-08-09 10:41:41 -07:00
Christudasan Devadasan	fcf2d5f402	Revert "SROA: Enhance speculateSelectInstLoads" This reverts commit `ffc3fb665d`.	2021-08-09 01:13:39 -04:00
Michael Liao	b5e470aa2e	[LowerMemIntrinsics] Typo fix.	2021-08-08 22:38:58 -04:00
Dorit Nuzman	67278b8a90	[LV] Support Interleaved Store Group With Gaps Teach LV to use masked-store to support interleave-store-group with gaps (instead of scatters/scalarization). The symmetric case of using masked-load to support interleaved-load-group with gaps was introduced a while ago, by https://reviews.llvm.org/D53668; this patch completes the store-scenario leftover from D53668, and solves PR50566. Reviewed by: Ayal Zaks Differential Revision: https://reviews.llvm.org/D104750	2021-08-08 10:32:02 +03:00
Nikita Popov	88003cea1c	[MemCpyOpt] Remove MemDepAnalysis-based implementation The MemorySSA-based implementation has been enabled for a few months (since D94376). This patch drops the old MDA-based implementation entirely. I've kept this to only the basic cleanup of dropping various conditions -- the code could be further cleaned up now that there is only one implementation. Differential Revision: https://reviews.llvm.org/D102113	2021-08-07 22:35:44 +02:00
Krishna	a9a176ca3b	[InstCombine] Remove nnan requirement for transformation to fabs from select In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs. (i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub). (ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D106872	2021-08-07 22:38:45 +05:30
Roman Lebedev	0a241e90d4	[NFC][InstCombine] `vector_reduce_xor(?ext(<n x i1>))` --> `?ext(vector_reduce_add(<n x i1>))` Instead of expanding it ourselves, we can just forward to `?ext(vector_reduce_add(<n x i1>))`, as per alive2: https://alive2.llvm.org/ce/z/ymz7zE (self) https://alive2.llvm.org/ce/z/eKu2v2 (skipped zext) https://alive2.llvm.org/ce/z/c3BXgc (skipped sext)	2021-08-07 17:31:33 +03:00
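The fold named in `0a241e90d4` can be checked by brute force for a small vector. The sketch below is an assumption-laden model, not the InstCombine code: bits are Python ints, the wide type is taken to be 8 bits, and `zext`, `sext`, `reduce_xor_wide` and `reduce_add_i1` are illustrative helper names. It relies on the fact that addition in `i1` wraps modulo 2, which makes it a parity computation.

```python
from functools import reduce
from itertools import product

WIDTH = 8                 # assumed width of the extended element type
MASK = (1 << WIDTH) - 1


def zext(bit):
    """Zero-extend an i1 to WIDTH bits."""
    return bit


def sext(bit):
    """Sign-extend an i1 to WIDTH bits (1 becomes all-ones)."""
    return MASK if bit else 0


def reduce_xor_wide(bits, ext):
    """Left-hand side: extend every lane first, then xor-reduce."""
    return reduce(lambda a, b: a ^ b, (ext(b) for b in bits), 0)


def reduce_add_i1(bits):
    """Add-reduce in i1: the sum wraps modulo 2."""
    return sum(bits) & 1


def fold_holds(n=4):
    """Compare both sides for every n-lane i1 vector and both extensions."""
    return all(
        reduce_xor_wide(bits, ext) == ext(reduce_add_i1(bits))
        for bits in product((0, 1), repeat=n)
        for ext in (zext, sext)
    )
```

Calling `fold_holds()` enumerates all 16 four-lane inputs; the alive2 links in the commit message remain the authoritative proofs.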
Roman Lebedev	c6ff867f92	[NFC][InstCombine] Simplify emitted IR for `vector_reduce_xor(?ext(<n x i1>))` Now that we canonicalize low bit splatting to the form we were emitting here ourselves, emit simpler IR that will be canonicalized later. See `1e801439be` for proofs: https://alive2.llvm.org/ce/z/MjCm5W (self) https://alive2.llvm.org/ce/z/kgqF4M (skipped zext) https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)	2021-08-07 17:31:24 +03:00
Roman Lebedev	e71870512f	[InstCombine] Prefer `-(x & 1)` as the low bit splatting pattern (PR51305) Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF), so we should have a preference. It seems like mask+negation is better than two shifts.	2021-08-07 17:25:28 +03:00
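The equivalence behind `e71870512f` can be illustrated with an exhaustive check over 8-bit values. This sketch assumes that the "two shifts" pattern is a left shift by `width - 1` followed by an arithmetic right shift by `width - 1`; the helper names are hypothetical and the 8-bit width is chosen only to keep the enumeration small.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def to_signed(v):
    """Reinterpret a WIDTH-bit pattern as a two's-complement integer."""
    v &= MASK
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v


def splat_by_shifts(x):
    """Assumed two-shift form: shl by WIDTH-1, then arithmetic shift right by WIDTH-1."""
    shifted = (x << (WIDTH - 1)) & MASK
    return (to_signed(shifted) >> (WIDTH - 1)) & MASK


def splat_by_mask_neg(x):
    """Preferred form from the commit title: -(x & 1)."""
    return (-(x & 1)) & MASK


def patterns_agree():
    """Compare both forms on every WIDTH-bit input."""
    return all(splat_by_shifts(x) == splat_by_mask_neg(x) for x in range(1 << WIDTH))
```

Both functions yield all-ones when the low bit of `x` is set and zero otherwise, which is what "low bit splatting" means here.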
Christudasan Devadasan	ffc3fb665d	SROA: Enhance speculateSelectInstLoads Allow the folding even if there is an intervening bitcast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106667	2021-08-07 09:09:14 -04:00
Florian Hahn	a00aafc30d	[VPlan] Iterate over phi recipes to detect reductions to fix. After refactoring the phi recipes, we can now iterate over all header phis in a VPlan to detect reductions when it comes to fixing them up when tail folding. This reduces the coupling with the cost model & legal by using the information directly available in VPlan. It also removes a call to getOrAddVPValue, which references the original IR value which may become outdated after VPlan transformations. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100102	2021-08-07 14:06:50 +01:00
Sanjay Patel	0369714b31	[InstCombine] reduce vector casting before icmp There may be some generalizations (see test comments) of these patterns, but this should handle the cases motivated by: https://llvm.org/PR51315 https://llvm.org/PR51259 The backend may want to transform differently, but at least for the x86 examples that I looked at, there does not appear to be any significant perf diff either way.	2021-08-06 17:09:38 -04:00
Artem Belevich	6a9cf21f5a	[CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. An attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that there are users that do not expect LLVM to materialize the `memset` intrinsic. While other passes can do that, too, MemCpyOpt triggers it more frequently and breaks sanitizers and some downstream users. For now, introduce a flag to force-enable the pass and opt in only CUDA compilation with the NVPTX back-end. Differential Revision: https://reviews.llvm.org/D106401	2021-08-06 11:13:52 -07:00
Michael Liao	d1cacd5928	[MemCpyOpt] Teach memcpyopt to handle loads from the constant memory. - Loads from the constant memory (either explicit one or as the source of memory transfer intrinsics) won't alias any stores. Reviewed By: asbirlea, efriedma Differential Revision: https://reviews.llvm.org/D107605	2021-08-06 12:43:52 -04:00
David Sherwood	3fd96e1b2e	[LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intrinsics: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284	2021-08-06 10:13:15 +01:00
Chuanqi Xu	0fd03feb4b	[FuncSpec] Return changed if function is changed by tryToReplaceWithConstant The function may get changed before specialization by RunSCCPSolver. In other words, the pass may change the function without any specialization happening. Add a test and a comment to reveal this. The pass could also report no change even though the function was changed by RunSCCPSolver before the specialization, which looks like a potential bug. Test Plan: check-all Reviewed By: https://reviews.llvm.org/D107622 Differential Revision: https://reviews.llvm.org/D107622	2021-08-06 17:00:17 +08:00
David Sherwood	43a5c750d1	Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors" This reverts commit `95800da914`.	2021-08-06 09:48:16 +01:00
Chuanqi Xu	62fc3e0ad6	[NFC] [FuncSpec] Remove unused variables in isArgumentInteresting	2021-08-06 16:38:20 +08:00
Chuanqi Xu	cc3f40bb41	[FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish) Noticed that the computation of the function specialization cost of a function doesn't change during the traversal of the arguments of the function, so we can hoist the computation out of the traversal. I observed roughly a 1% improvement in compile time for spec2017, though that measurement may not be precise. This should be NFC and fine. Reviewed By: Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D107621	2021-08-06 15:43:05 +08:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is a recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in the call of getFastMathFlags (the base type should be FPMathOperator, not Instruction). The original commit message is duplicated below: Clang has a builtin function '__builtin_isnan', which implements the C library function 'isnan'. This function is now implemented entirely in clang codegen, which expands the function into a set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by the instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It is, however, not suitable if floating point exceptions are tracked. The corresponding IEEE 754 operation and C function must never raise an FP exception, even if the argument is a signaling NaN. Compare instructions usually do not have this property; they raise the 'invalid' exception in such a case. So this mechanism is unsuitable when exception behavior is strict. In particular, it could result in unexpected trapping if the argument is an SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, although some can do the check in a more efficient way. * The solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target-specific code into IR. Currently only SystemZ implements this hook, and it generates a call to a target-specific intrinsic function. Although these mechanisms allow 'isnan' to be implemented with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics.
It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang; in this case the treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of the options '-ffast-math' or '-fno-honor-nans'. If such an option is specified, 'fcmp uno' may be optimized to 'false'. It is a valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption, however, should not be applied to the functions that check FP number properties, including 'isnan'. If such a function returns the expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance-critical code, as it can speed up execution at the expense of manual treatment of corner cases. If 'isnan' returns an assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by the IEEE-754 and C standards in a target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where it is lowered in a target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies the requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
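The "check using integer operations" that the message above mentions as a workaround can be sketched roughly as follows. This is only an illustration, assuming `float` is IEEE-754 binary32; `isNaNViaBits` is a hypothetical name, and this is not the code that clang or LLVM actually emits.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper (not an LLVM or clang API): classifies a binary32 value
// as NaN by looking only at its bit pattern, so no floating-point compare
// instruction is involved.
bool isNaNViaBits(float value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  static_assert(std::numeric_limits<float>::is_iec559,
                "sketch assumes IEEE-754 binary32");

  std::uint32_t bits = 0;
  std::memcpy(&bits, &value, sizeof(bits));  // well-defined type pun

  const std::uint32_t magnitude = bits & 0x7FFFFFFFu;  // drop the sign bit
  // Infinity has an all-ones exponent and a zero mantissa (0x7F800000);
  // any larger magnitude has a non-zero mantissa and is therefore a NaN.
  return magnitude > 0x7F800000u;
}
```

Because the test is done on the integer bit pattern, it does not rely on a floating-point comparison, which is the property the message is after for signaling NaNs and for code built under '-ffast-math'.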
Florian Hahn	3e58dd19df	[LV] Move reduction PHI node fixup to VPlan::execute (NFC). All information to fix-up the reduction phi nodes in the vectorized loop is available in VPlan now. This patch moves the code to do so, to make this clearer. Fixing up the loop exit value still relies on other information and remains outside of VPlan for now. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100113	2021-08-06 08:29:20 +01:00
Chuanqi Xu	82ca845b47	[NFC] [FuncSpec] Update the Todo list for recursive functions Now the recursive functions may get specialized many times when `func-specialization-max-iters` increases. See discussion in https://reviews.llvm.org/D106426 for details.	2021-08-06 14:43:17 +08:00
Arthur Eubanks	a1b21ed3fb	[GCov] Emit memset instead of stores in __llvm_gcov_reset For a very large module, __llvm_gcov_reset can become very large. __llvm_gcov_reset previously emitted stores to a bunch of globals in one huge basic block. MemCpyOpt would turn many of these stores into memsets, and updating MemorySSA would be extremely slow. Verified that this makes the compile time of certain files go down drastically (20min -> 5min). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107538	2021-08-05 22:40:15 -07:00
Chris Jackson	113a06f7a5	[DebugInfo][LSR] Don't cache dbg.value that are already undef The SCEV-based salvaging method caches dbg.value information pre-LSR so that salvaging may be attempted post-LSR. If the dbg.value are already undef pre-LSR then a salvage attempt would be fruitless, so avoid caching them. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D107448	2021-08-05 19:16:43 +01:00
Kazu Hirata	72661f337a	[Transforms] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-08-05 08:53:17 -07:00
Alexey Bataev	e7c3eaa8ae	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form identity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107494	2021-08-05 08:41:24 -07:00
David Sherwood	e9177b0958	Fix build issues caused by `95800da914`	2021-08-05 16:26:34 +01:00
Sander de Smalen	3e47f009ff	[LV] Consider ExtractValue as uniform. Since all operands to ExtractValue must be loop-invariant when we deem the loop vectorizable, we can consider ExtractValue to be uniform. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107286	2021-08-05 16:20:50 +01:00
eopXD	fd7f6a3c81	[NFC][LoopIdiom] rename boolean variable NegStride to IsNegStride Rename variable for better code readability. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107570	2021-08-05 23:11:42 +08:00
Momchil Velikov	f171149e0d	[SimplifyCFG] Speculate a store preceded by a local non-escaping load In SimplifyCFG we may simplify the CFG by speculatively executing certain stores, when they are preceded by a store to the same location. This patch allows such speculation also when the stores are similarly preceded by a load. In order for this transformation to be correct we need to ensure that the memory location is writable and the store in the new location does not introduce a data race. Local objects (created by an `alloca` instruction) are always writable, so once we are past a read from a location it is valid to also write to that same location. Seeing just a load does not guarantee absence of a data race (unlike if we see a store) - the load may still be part of a race, just not causing undefined behaviour (cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic). In the original program, a data race might have been prevented by the condition, but once we move the store outside the condition, we must be sure a data race wasn't possible anyway, no matter what the condition evaluates to. One way to be sure that a local object is never concurrently read/written is to check that its address never escapes the function. Hence this transformation is restricted to local, non-escaping objects. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D107281	2021-08-05 15:54:42 +01:00
Florian Hahn	38b098be66	[VectorCombine] Limit scalarization to known non-poison indices. We can only trust the range of the index if it is guaranteed non-poison. Fixes PR50949. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107364	2021-08-05 15:36:31 +01:00
Dawid Jurczak	06206a8cd1	[BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc Additionally, this patch aligns DSE, which is the only user of emitCalloc. Differential Revision: https://reviews.llvm.org/D103523	2021-08-05 16:18:38 +02:00
David Sherwood	95800da914	[LoopVectorize] Add support for replication of more intrinsics with scalable vectors This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intrinsics: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284	2021-08-05 15:17:27 +01:00
Dawid Jurczak	f8cdde7195	[SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into calloc' transformation FoldMallocMemset can be safely removed because since https://reviews.llvm.org/D103009 such transformation is already performed in DSE. Differential Revision: https://reviews.llvm.org/D103451	2021-08-05 16:08:32 +02:00
Sander de Smalen	8d08a84745	[LV] Remove a change that was added in D106164. This change wasn't strictly necessary for D106164 and could be removed. This patch addresses the post-commit comments from @fhahn on D106164, and also changes sve-widen-gep.ll to use the same IR test as shown in pointer-induction.ll. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106878	2021-08-05 14:44:53 +01:00
eopXD	26aa1bbe97	[NFCI] [LoopIdiom] Let processLoopStridedStore take StoreSize as SCEV instead of unsigned Letting it take SCEV allows further modification on the function to optimize if the StoreSize / Stride is runtime determined. This is a precursor to D107353. The big picture is to let LoopIdiom deal with runtime-determined sizes. Reviewed By: Whitney, lebedev.ri Differential Revision: https://reviews.llvm.org/D104595	2021-08-05 13:21:48 +08:00
Nikita Popov	bb15861e14	[MemCpyOpt] Relax libcall checks Rather than blocking the whole MemCpyOpt pass if the libcalls are not available, only disable creation of new memset/memcpy intrinsics where only load/stores were used previously. This only affects the store merging and load-store conversion optimization. Other optimizations are derived from existing intrinsics, which are well-defined in the absence of libcalls -- not having the libcalls just means that call simplification won't convert them to intrinsics. This is a weaker variation of D104801, which dropped these checks entirely. Ideally we would not couple emission of intrinsics to libcall availability at all, but as the intrinsics may be legalized to libcalls we need to be a bit careful right now. Differential Revision: https://reviews.llvm.org/D106769	2021-08-04 21:17:51 +02:00
Giorgis Georgakoudis	29a3e3dd7b	[OpenMPOpt] Expand SPMDization with guarding for target parallel regions This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region and thus must not be guarded. This check is implemented using the ParallelLevels AA. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D106892	2021-08-04 11:49:24 -07:00
Alexey Bataev	214f99b27c	Revert "[SLP]Do not emit extra shuffle for insertelements vectorization." This reverts commit `871ea69803` to fix the problem if the first vector is not just undef.	2021-08-04 11:28:59 -07:00
Dawid Jurczak	238139be09	[DSE][NFC] Clean up DeadStoreElimination from unused variables Differential Revision: https://reviews.llvm.org/D106446	2021-08-04 19:44:40 +02:00
Petr Hosek	6660cec568	[InstrProfiling] Emit bias variable eagerly Rather than emitting the bias variable lazily as needed, emit it eagerly. This allows profile runtime to refer to this variable unconditionally without having to use the weak reference. The bias variable is in a COMDAT so there'll never be more than one instance, and if it's not needed, linker should be able to GC it, so the overhead should be minimal. Differential Revision: https://reviews.llvm.org/D107377	2021-08-04 10:17:08 -07:00
Sander de Smalen	fe6ae81ef3	[InstCombine] Fix vscale zext/sext optimization when vscale_range is unbounded. According to the LangRef, a (vscale_range) value of 0 means unbounded. This patch additionally cleans up the test file vscale_sext_and_zext.ll.	2021-08-04 17:17:37 +01:00
Chris Jackson	21ee38e24f	[DebugInfo][LSR] Avoid crashes on large integer inputs SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may contain very large integers but the translation does not support integers greater than 64 bits. This patch adds checks to ensure conversion of these large integers is not attempted. A regression test is added to ensure no such translation is attempted. Reviewed by: StephenTozer PR: https://bugs.llvm.org/show_bug.cgi?id=51329 Differential Revision: https://reviews.llvm.org/D107438	2021-08-04 15:51:22 +01:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has a builtin function '__builtin_isnan', which implements the C library function 'isnan'. This function is now implemented entirely in clang codegen, which expands the function into a set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by the instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It is, however, not suitable if floating point exceptions are tracked. The corresponding IEEE 754 operation and C function must never raise an FP exception, even if the argument is a signaling NaN. Compare instructions usually do not have this property; they raise the 'invalid' exception in such a case. So this mechanism is unsuitable when exception behavior is strict. In particular, it could result in unexpected trapping if the argument is an SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, although some can do the check in a more efficient way. * The solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target-specific code into IR. Currently only SystemZ implements this hook, and it generates a call to a target-specific intrinsic function. Although these mechanisms allow 'isnan' to be implemented with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang; in this case the treatment of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the use of the options '-ffast-math' or '-fno-honor-nans'. If such an option is specified, 'fcmp uno' may be optimized to 'false'. It is a valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption, however, should not be applied to the functions that check FP number properties, including 'isnan'. If such a function returns the expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance-critical code, as it can speed up execution at the expense of manual treatment of corner cases. If 'isnan' returns an assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by the IEEE-754 and C standards in a target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where it is lowered in a target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies the requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Sjoerd Meijer	30fbb06979	[FuncSpec] Support specialising recursive functions This adds support for specialising recursive functions. For example: int Global = 1; void recursiveFunc(int arg) { if (arg < 4) { print(arg); recursiveFunc(arg + 1); } } void main() { recursiveFunc(&Global); } After 3 iterations of function specialisation, followed by inlining of the specialised versions of recursiveFunc, the main function looks like this: void main() { print(1); print(2); print(3); } To support this, the following has been added: - Update the solver and state of the new specialised functions, - An optimisation to propagate constant stack values after each iteration of function specialisation, which is necessary for the next iteration to recognise the constant values and trigger. Specialising recursive functions is (at the moment) controlled by option -func-specialization-max-iters and is opt-in for compile-time reasons. I.e., the default is -func-specialization-max-iters=1, but for the example above we would need to use -func-specialization-max-iters=3. Future work is to see if we can increase the default, or improve the cost-model/heuristics to control compile-times. Differential Revision: https://reviews.llvm.org/D106426	2021-08-04 08:07:04 +01:00
Shimin Cui	2d9759c790	[GlobalOpt] Fix the load types when OptimizeGlobalAddressOfMalloc Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may not be true any more (as seen in the test case), and the code to handle this is missing. This patch fixes that. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107397	2021-08-03 19:22:53 -04:00
Craig Topper	b818da27ab	[SimplifyCFG] Enable switch to lookup table for more types. This transform has been restricted to legal types since https://reviews.llvm.org/rG65df808f6254617b9eee931d00e95d900610b660 in 2012. This is particularly restrictive on RISCV64 which only has i64 as a legal integer type. i32 is a very common type in code generated from C, but we won't form a lookup table with it. This also affects other common types like i8/i16 types on ARM, AArch64, RISCV, etc. This patch proposes to allow power of 2 types larger than 8 bit, if they will fit in the largest legal integer type in DataLayout. These types are common in C code so generally well handled in the backends. We could probably do this for other types like i24 and rely on alignment and padding to allow the backend to use a single wider load. This isn't my main concern right now and it will need more tests. We could also allow larger types up to some limit and let the backend split into multiple loads, but we need to define that limit. It's also not my main concern right now. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107233	2021-08-03 15:35:16 -07:00
Alexey Bataev	871ea69803	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form identity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107344	2021-08-03 13:18:41 -07:00
Alexey Bataev	7d9d926a18	Revert "[SLP]Improve graph reordering." This reverts commit `e408d1dfab` and 2 other related commits (`4b25c11321` and `c2deb2afaf`) to fix the problem with the reordering shuffles.	2021-08-03 12:13:43 -07:00
Sami Tolvanen	7ce1c4da77	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Relands `700d07f8ce` with -msvc targets fixed. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D104058	2021-08-03 11:35:30 -07:00
Dylan Fleming	3943a74666	[InstCombine] Fixed select + masked load fold failure Fixed type assertion failure caused by trying to fold a masked load with a select where the select condition is a scalar value Reviewed By: sdesmalen, lebedev.ri Differential Revision: https://reviews.llvm.org/D107372	2021-08-03 19:06:12 +01:00
Philip Reames	223835f08b	[runtimeunroll] A bit of style cleanup to simplify a following change [NFC] Use for-range, use the idiomatic pattern for non-loop values, etc..	2021-08-03 10:28:46 -07:00
Krishna	946fd4ea65	Revert "[InstCombine] Remove nnan requirement for transformation to fabs from select" This reverts commit `6180ce2e2a`.	2021-08-03 18:08:11 +05:30
Krishna	d99260641b	[InstCombine] Fold phi ( inttoptr/ptrtoint x ) to phi (x) The inttoptr/ptrtoint roundtrip optimization is not always correct. We are working towards removing this optimization and adding support for specific cases where this optimization works. In this patch, we focus on phi-node operands with inttoptr casts. We know that ptrtoint( inttoptr( ptrtoint x) ) is the same as ptrtoint (x). So, we want to remove this roundtrip cast which goes through the phi-node. Reviewed By: aqjune Differential Revision: https://reviews.llvm.org/D106289	2021-08-03 17:52:59 +05:30
Krishna	6180ce2e2a	[InstCombine] Remove nnan requirement for transformation to fabs from select In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs. (i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub). (ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg). Differential Revision: https://reviews.llvm.org/D106872	2021-08-03 17:52:58 +05:30
David Sherwood	0156f91f3b	[NFC] Rename enable-strict-reductions to force-ordered-reductions I'm renaming the flag because a future patch will add a new enableOrderedReductions() TTI interface and so the meaning of this flag will change to be one of forcing the target to enable/disable them. Also, since other places in LoopVectorize.cpp use the word 'Ordered' instead of 'strict' I changed the flag to match. Differential Revision: https://reviews.llvm.org/D107264	2021-08-03 09:33:01 +01:00
Shimin Cui	7ce98cf56e	[GlobalOpt] Fix the assert for stored once non-pointer to global address This is to fix the assert @bjope reported due to the code change of https://reviews.llvm.org/D106589. The test case from @bjope is also included. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107302	2021-08-02 19:23:29 -04:00
Roman Lebedev	6f6e9a867f	[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop I'm not sure this is the best way to approach this, but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D107271	2021-08-03 00:57:26 +03:00
Roman Lebedev	4ba3326f17	[InstCombine] `vector_reduce_{or,and}(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259) This allows the expansion logic to actually trigger if the argument was extended from i1 element type, like the rest of the reductions expect. Alive2 agrees: https://alive2.llvm.org/ce/z/wcfews (or zext) https://alive2.llvm.org/ce/z/FCXNFx (or sext) https://alive2.llvm.org/ce/z/f26zUY (and zext) https://alive2.llvm.org/ce/z/jprViN (and sext)	2021-08-03 00:54:35 +03:00
Roman Lebedev	554fc9ad0a	[InstCombine] `vector_reduce_smax(?ext(<n x i1>))` --> `?ext(vector_reduce_{and,or}(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/3oqir9 (self) https://alive2.llvm.org/ce/z/6cuI5m (zext) https://alive2.llvm.org/ce/z/4FL8rD (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-03 00:29:06 +03:00
Roman Lebedev	f47b7b6d10	[InstCombine] `vector_reduce_smin(?ext(<n x i1>))` --> `?ext(vector_reduce_{or,and}(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/noXtZ8 (self) https://alive2.llvm.org/ce/z/JNrN6C (zext) https://alive2.llvm.org/ce/z/58snuN (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-03 00:29:06 +03:00
Nikita Popov	c7770574f9	Revert "[unroll] Move multiple exit costing into consumer pass [NFC]" This reverts commit `76940577e4`. This causes Transforms/LoopUnroll/ARM/multi-blocks.ll to fail.	2021-08-02 22:23:34 +02:00
Roman Lebedev	b9b7162b8b	[InstCombine] `vector_reduce_umax(?ext(<n x i1>))` --> `?ext(vector_reduce_or(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/NbBaeT (self) https://alive2.llvm.org/ce/z/iEaig4 (zext) https://alive2.llvm.org/ce/z/meGb3y (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-02 23:02:23 +03:00
Roman Lebedev	0c13798056	[InstCombine] `vector_reduce_umin(?ext(<n x i1>))` --> `?ext(vector_reduce_and(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/XxUScW (self) https://alive2.llvm.org/ce/z/3usTF- (zext) https://alive2.llvm.org/ce/z/GVxwQz (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-02 23:02:22 +03:00
Philip Reames	76940577e4	[unroll] Move multiple exit costing into consumer pass [NFC] This aligns the multiple exit costing with all the other cost decisions. Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.	2021-08-02 12:46:23 -07:00
Nikita Popov	380b8a603c	[DFAJumpThreading] Use SmallPtrSet for Visited (NFC) This set is only used for contains checks, so there is no need to use std::set.	2021-08-02 21:30:25 +02:00
Nikita Popov	3f7aea1a37	[DFAJumpThreading] Use insert return value (NFC) Rather than find + insert. Also use range based for loop.	2021-08-02 21:21:21 +02:00
Nikita Popov	84602f98c6	[DFAJumpThreading] Remove unnecessary includes (NFC) This file uses neither unordered_map nor unordered_set.	2021-08-02 21:13:30 +02:00
Nikita Popov	e97524cba2	[DFAJumpThreading] Mark DT as preserved in LegacyPM It is marked as preserved in NewPM, but not LegacyPM.	2021-08-02 21:13:30 +02:00
Roman Lebedev	469793efa7	[InstCombine] `vector_reduce_mul(?ext(<n x i1>))` --> `zext(vector_reduce_and(<n x i1>))` (PR51259) Alive2 agrees: https://alive2.llvm.org/ce/z/PDansB (self) https://alive2.llvm.org/ce/z/55D-Xc (zext) https://alive2.llvm.org/ce/z/LxG3-r (sext) We already handle `vector_reduce_and(<n x i1>)`, so let's just combine into the already-handled pattern and let the existing fold do the rest.	2021-08-02 21:57:51 +03:00
Philip Reames	9016beaa24	[unrollruntime] Pull out a helper function for readability and eventual reuse [nfc]	2021-08-02 11:47:27 -07:00
Philip Reames	ebc4c4e3b0	[unroll] Add clarifying comment The option to not preserve LCSSA is in fact not tested at all in upstream. I was tempted to just remove the code entirely, but realized I didn't need to for my actual goal.	2021-08-02 10:44:56 -07:00
Florian Hahn	bb725c9803	[VPlan] Use defined and ops VPValues to print VPInterleaveRecipe. This patch updates VPInterleaveRecipe::print to print the actual defined VPValues for load groups and the store VPValue operands for store groups. The IR references may become outdated while transforming the VPlan and the defined and stored VPValues always are up-to-date. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D107223	2021-08-02 18:36:36 +01:00
Roman Lebedev	1e801439be	[InstCombine] `xor` reduction w/ i1 elt type is a parity check For i1 element type, `xor` and `add` are interchangeable (https://alive2.llvm.org/ce/z/e77hhQ), so we should treat it just like an `add` reduction and consistently transform them both: https://alive2.llvm.org/ce/z/MjCm5W (self) https://alive2.llvm.org/ce/z/kgqF4M (skipped zext) https://alive2.llvm.org/ce/z/pgy3HP (skipped sext) Though, let's emit the IR that is similar to the one we produce for `vector_reduce_add(<n x i1>)`. See https://bugs.llvm.org/show_bug.cgi?id=51259	2021-08-02 20:21:37 +03:00
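The equivalence claimed in the message above (for 1-bit lanes, an `xor` reduction, an `add` reduction, and a parity check all agree) can be illustrated with a small model. This is a sketch only: the helper names are hypothetical, `std::vector<bool>` stands in for `<n x i1>`, and none of this reflects the IR that InstCombine actually emits.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers modelling reductions over a vector of 1-bit lanes.

// xor reduction: the accumulator flips once per set lane.
bool reduceXor(const std::vector<bool>& lanes) {
  bool acc = false;
  for (bool lane : lanes) acc = (acc != lane);
  return acc;
}

// add reduction in 1-bit arithmetic: the running sum wraps modulo 2.
bool reduceAddI1(const std::vector<bool>& lanes) {
  unsigned acc = 0;
  for (bool lane : lanes) acc = (acc + (lane ? 1u : 0u)) & 1u;
  return acc != 0;
}

// Parity check: count the set lanes and keep the lowest bit of the count.
bool parityOfPopcount(const std::vector<bool>& lanes) {
  std::size_t count = 0;
  for (bool lane : lanes) count += lane ? 1 : 0;
  return (count & 1u) != 0;
}
```

All three functions return the same value for any input, which is why it is reasonable to canonicalize the `xor` form in the same way as the `add` form.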
Florian Mayer	66b4aafa2e	[hwasan] Detect use after scope within function. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D105201	2021-08-02 11:34:12 +01:00
Rosie Sumpter	f117ed542f	[LoopFlatten] Fix missed LoopFlatten opportunity When the limit of the inner loop is a known integer, the InstCombine pass now performs a transformation such as icmp ult i32 %inc, tripcount -> icmp ult %j, tripcount-step (where %j is the inner loop induction variable and %inc is add %j, step), which is now accounted for when identifying the trip count of the loop. This is also an acceptable use of %j (provided the step is 1), so it is ignored as long as the compare that it's used in is also the condition of the inner branch. Differential Revision: https://reviews.llvm.org/D105802	2021-08-02 11:09:54 +01:00
Shimin Cui	732b05555c	[GlobalOpt] support ConstantExpr use of global address for OptimizeGlobalAddressOfMalloc I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D106589	2021-07-31 18:42:02 -04:00
Sanjay Patel	f2a322bfcf	[SROA] prevent crash on large memset length (PR50910) I don't know much about this pass, but we need a stronger check on the memset length arg to avoid an assert. The current code was added with D59000. The test is reduced from: https://llvm.org/PR50910 Differential Revision: https://reviews.llvm.org/D106462	2021-07-31 14:07:30 -04:00
Sanjay Patel	a22c99c3c1	[InstCombine] canonicalize cmp-of-bitcast-of-vector-cmp to use zero constant We can invert a compare constant and preserve the logic as shown in this sampling: https://alive2.llvm.org/ce/z/YAXbfs (In theory, we could deal with non-all-ones/zero as well, but it doesn't seem worthwhile.) I noticed this as a part of the x86 codegen difference in https://llvm.org/PR51259 - it ends up using "test" instead of "not + cmp" in that example. This pattern also shows up in https://llvm.org/PR41312 and https://llvm.org/PR50798 . Differential Revision: https://reviews.llvm.org/D107170	2021-07-31 13:31:12 -04:00
Florian Mayer	b5b023638a	Revert "[hwasan] Detect use after scope within function." This reverts commit `84705ed913`.	2021-07-30 22:32:04 +01:00
Brendon Cahoon	c4c379d633	[LoopStrengthReduction] Fix pointer extend asserts Additional asserts were added to ScalarEvolution to enforce pointer/int type rules. An assert is triggered when the LSR pass attempts to extend a pointer SCEV in GenerateTruncates. This patch changes GenerateTruncates to exit early if the Formula contains a ScaledReg or BaseReg with a pointer type. Differential Revision: https://reviews.llvm.org/D107185	2021-07-30 17:24:08 -04:00
Fangrui Song	a1532ed275	[InstrProfiling] Make CountersPtr in __profd_ relative Change `CountersPtr` in `__profd_` to a label difference, which is a link-time constant. On ELF, when linking a shared object, this requires that `__profc_` is either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that `.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation. ``` # ELF: R_X86_64_PC64 (PC-relative) .quad .L__profc_foo-.L__profd_foo # Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR .quad l___profc_foo-l___profd_foo # COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so # the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo # As compensation, we truncate CountersDelta in the header so that # __llvm_profile_merge_from_buffer and llvm-profdata reader keep working. .quad .L__profc_foo-.L__profd_foo ``` (Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer has `.lprfd` before `.lprfc`, so we cannot work around by reordering `.lprfc` and `.lprfd`.) With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) `ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations. ``` % readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}' R_X86_64_JUMP_SLO 331 R_X86_64_TPOFF64 2 R_X86_64_RELATIVE 476059 # was: 607712 R_X86_64_64 2616 R_X86_64_GLOB_DAT 31 ``` The absolute function address (used by llvm-profdata to collect indirect call targets) can be converted to relative as well, but is not done in this patch. Differential Revision: https://reviews.llvm.org/D104556	2021-07-30 11:52:18 -07:00
Simon Pilgrim	afc6b09dee	[InstCombine] getMaskedTypeForICmpPair - remove dead code. NFCI. Ok should be true at this point, so the early-out is dead - replace with an assert.	2021-07-30 19:23:05 +01:00
Alexey Bataev	95e5d401ae	[SLP]Improve splats vectorization. Replace insertelement instructions for splats with just single insertelement + broadcast shuffle. Also, try to merge these instructions if they come from the same/shuffled gather node. Differential Revision: https://reviews.llvm.org/D107104	2021-07-30 10:17:45 -07:00
Kazu Hirata	e76ddfa9ef	[Transforms] Remove HasValueForBlock (NFC) The function seems to be unused for at least one year.	2021-07-30 08:56:49 -07:00
Dylan Fleming	a7a39ec886	[SVE] Add folds for sign and zero extends of vscale Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D105994	2021-07-30 16:02:50 +01:00
Florian Mayer	84705ed913	[hwasan] Detect use after scope within function. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D105201	2021-07-30 13:59:36 +01:00
Alexey Bataev	4b25c11321	[SLP]Fix an assertion for the size of user nodes. For the nodes with reused scalars, the user may be not only of the size of the final shuffle but also of the size of the scalars themselves, so we need to check for this. It is safe to just modify the check here, since the order of the scalars themselves is preserved; only the indices of the reused scalars are changed. So, the users with the same size as the number of scalars in the node will not be affected; they will still get the operands in the required order. Reported by @mstorsjo in D105020. Differential Revision: https://reviews.llvm.org/D107080	2021-07-30 05:46:44 -07:00
Alexey Bataev	f4fb854811	[SLP]Do not consider deleted instruction as external users. If the instruction was previously deleted, it should not be treated as an external user. This fixes cost estimation and removes dead extractelement instructions. Differential Revision: https://reviews.llvm.org/D107106	2021-07-30 05:37:43 -07:00
Alexey Bataev	c2deb2afaf	[SLP]Fix a crash in gathered loads analysis. Need to check that the minimum acceptable vector factor is at least 2, not 0, to avoid compiler crash during gathered loads analysis. Differential Revision: https://reviews.llvm.org/D107058	2021-07-30 05:19:17 -07:00
Joseph Huber	cd0dd8ece8	[OpenMP] Adding flags for disabling the following optimizations: Deglobalization, SPMDization, State machine rewrites, Folding. This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following: - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared. - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode. - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall. - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D106802	2021-07-29 19:28:31 -04:00
Andy Kaylor	b4d945bacd	Fixing an infinite loop problem in InstCombine Patch by Mohammad Fawaz This issue started happening after `b373b5990d`. Basically, if the memcpy is volatile, the collectUsers() function should return false, just like we do for volatile loads. Differential Revision: https://reviews.llvm.org/D106950	2021-07-29 12:57:17 -07:00
Dawid Jurczak	5c315bee8c	[DSE] Transform memset + malloc --> calloc (PR25892) After this change DSE can eliminate malloc + memset and emit calloc. It's https://reviews.llvm.org/D101440 follow-up. Differential Revision: https://reviews.llvm.org/D103009	2021-07-29 18:34:10 +02:00
Rosie Sumpter	fab5659c79	Revert "[LoopFlatten] Fix missed LoopFlatten opportunity" This reverts commit `2df8bf9339`. Reverting because it causes an assertion failure.	2021-07-29 15:52:45 +01:00
Jeremy Morse	2537120c87	Follow-up to D105207, only salvage affine SCEVs to avoid a crash SCEVToIterCountExpr only expects to be fed affine expressions, but DbgRewriteSalvageableDVIs is feeding it non-affine induction variables. Following this up with an obvious fix, will add test coverage too if this avoids D105207 being reverted.	2021-07-29 11:48:08 +01:00
Rosie Sumpter	2df8bf9339	[LoopFlatten] Fix missed LoopFlatten opportunity When the trip count of the inner loop is a constant, the InstCombine pass now performs a transformation such as icmp ult i32 %inc, tripcount -> icmp ult %j, tripcount-step (where %j is the inner loop induction variable and %inc is add %j, step), which is now accounted for when identifying the trip count of the loop. This is also an acceptable use of %j (provided the step is 1), so it is ignored as long as the compare that it's used in is also the condition of the inner branch. Differential Revision: https://reviews.llvm.org/D105802	2021-07-29 09:47:41 +01:00
Joseph Huber	adbaa39dfc	[Attributor] Change function internalization to not replace uses in internalized callers The current implementation of function internalization creates a copy of each function and replaces every use. This has the downside that the external versions of the functions will call into the internalized versions of the functions. This prevents them from being fully independent of each other. This patch replaces the current internalization scheme with a method that creates all the copies of the functions intended to be internalized first and then replaces the uses as long as their caller is not already internalized. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106931	2021-07-28 18:57:28 -04:00
Chris Jackson	0ba8595287	[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR Reapply commit `d675b594f4` that was reverted due to buildbot failures. A simple fix has been applied to remove an assertion. Differential Revision: https://reviews.llvm.org/D105207	2021-07-28 23:04:59 +01:00
Sjoerd Meijer	bc43078fe8	[LoopFlatten] Fix bug where SCEVCouldNotCompute object is used The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute object if the backedge-taken count is unpredictable. This fix ensures there is no longer an attempt to use such an object to find the trip count. Patch by: Rosie Sumpter. Differential Revision: https://reviews.llvm.org/D106970	2021-07-28 18:35:08 +01:00
Jeroen Dobbelaere	03b8c69d06	[PredicateInfo] Use Intrinsic::getDeclaration now that it handles unnamed types. This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned in D91661#2875179 by @jroelofs. (The first attempt was in D105983.) D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls. This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead of looking at the available users of a prototype. Reviewed By: jroelofs Differential Revision: https://reviews.llvm.org/D106147	2021-07-28 19:30:29 +02:00
Jeroen Dobbelaere	dc5570d149	Revert "Revert of D49126 [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types." This reverts commit `77080a1eb6`. This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the needed function cleanup. A next patch will then just improve on the name mangling.	2021-07-28 19:30:29 +02:00
Fangrui Song	6da3d8b19c	[llvm] Replace LLVM_ATTRIBUTE_NORETURN with C++11 [[noreturn]] [[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015. Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.	2021-07-28 09:31:14 -07:00
Chris Jackson	3992896043	Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR" Reverted due to buildbot failures. This reverts commit `d675b594f4`.	2021-07-28 16:44:54 +01:00
Chris Jackson	d675b594f4	[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR Reapply commit `796b84d26f` that was reverted due to reports of crashes. A minor change now guards against getVariableLocationOperand() returning a nullptr. Differential Revision: https://reviews.llvm.org/D106659	2021-07-28 16:28:46 +01:00
Sanjay Patel	5b83261c15	[DivRemPairs] make sure we have a valid CFG for hoisting division This transform was added with `e38b7e8948` and as shown in: https://llvm.org/PR51241 ...it could crash without an extra check of the blocks. There might be a more compact way to write this constraint, but we can't just count the successors/predecessors without affecting a test that includes a switch instruction.	2021-07-28 11:09:12 -04:00
Alexey Bataev	3ad6437fcc	[SLP]Fix build on MacOS, NFC.	2021-07-28 06:33:13 -07:00
Alexey Bataev	e408d1dfab	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilt the graph in the best order. This was not efficient, since it required extra memory and time for building/rebuilding the tree and doubled the use of the scheduling budget, which could lead to missed vectorization due to exhausted scheduling resources. The patch provides a 2-way approach to the graph reordering problem. At first, all reordering is done in-place; it does not require tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to-bottom) rotates the whole graph, similarly to the previous implementation. The compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then it repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle the smaller subgraph when building operands for the graph nodes with the larger vectorization factor, so we can rotate just the subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reordered in the best fashion, they are reordered and their users too. This allows removing double shuffles to the same ordering of the operands in many cases and just reordering the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows removing an extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. 
Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-07-28 05:49:06 -07:00
Florian Hahn	c07dd2b885	[LV] Move recurrence backedge fixup code to VPlan::execute (NFC). As suggested in D105008, move the code that fixes up the backedge value for first order recurrences to VPlan::execute. Now all that remains in fixFirstOrderRecurrences is the code responsible for creating the exit values in the middle block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D106244	2021-07-28 13:32:40 +01:00
David Green	41cedb1c9a	[LV][ARM] Tighten up MLA reduction costing This makes a couple of changes to the costing of MLA reduction patterns, to more accurately cost various patterns that can come up from vectorization. - The Arm implementation of getExtendedAddReductionCost is altered to only provide costs for legal or smaller types. Larger than legal types need to be split, which currently does not work very well, especially for predicated reductions where the predicate may be legal but needs to be split. Currently we limit it to legal or smaller input types. - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext))) is a pattern that can come up, and can be treated the same as reduce(mul(ext, ext)) providing the extension types match. - And it has been adjusted to not count the ext in reduce(mul(ext, ext)) as part of a reduce(mul) pattern. Together these changes help to more accurately cost the mla reductions in cases such as where the extend types don't match or the extend opcodes are different, picking better vector factors that don't result in expanded reductions. Differential Revision: https://reviews.llvm.org/D106166	2021-07-28 12:50:58 +01:00
Chris Jackson	04b94c7cae	Revert "[DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR" Crashes were reported on the upstream revision: https://reviews.llvm.org/D105207 This reverts commit `796b84d26f`.	2021-07-28 10:05:54 +01:00


28640 Commits