llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	40bc309911	Revert "[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration" This reverts commit `d40b4911bd`. This causes a large compile-time regression: https://llvm-compile-time-tracker.com/compare.php?from=0aa637b2037d882ddf7861284169abf63f524677&to=d40b4911bd9aca0573752e065f29ddd9aff280e1&stat=instructions	2021-03-16 20:41:26 +01:00
Mircea Trofin	d40b4911bd	[regalloc] Ensure Query::collectInterferringVregs is called before interval iteration The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs() needs to be called before iterating the interfering live ranges. The rest of the patch offers support that is the case: instead of clearing the query's InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix is invalidated (existing triggering mechanism). Without the change in RegAllocGreedy.cpp, the compiler ices. This patch should make it more easily discoverable by developers that collectInterferringVregs needs to be called before iterating. I will follow up with a subsequent patch to improve the usability and maintainability of Query. Differential Revision: https://reviews.llvm.org/D98232	2021-03-16 12:10:10 -07:00
Nick Lewycky	b743bbc505	Add ConstantDataVector::getRaw() to create a constant data vector from raw data. This parallels ConstantDataArray::getRaw() and can be used with ConstantDataSequential::getRawDataValues() in the base class for both types. Update BuildConstantData{Array,Vector} tests to test the getRaw API. Also removes its unused Module. In passing, update some comments to include the support for half and bfloat. Update tests to include testing for bfloat. Differential Revision: https://reviews.llvm.org/D98302	2021-03-16 11:57:53 -07:00
Aaron Puchert	1cb15b10ea	Correct Doxygen syntax for inline code There is no syntax like {@code ...} in Doxygen, @code is a block command that ends with @endcode, and generally these are not enclosed in braces. The correct syntax for inline code snippets is @c <code>. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D98665	2021-03-16 15:17:45 +01:00
serge-sans-paille	b2e78a061c	[NFC] Use SmallString instead of std::string for the AttrBuilder This avoids a few unnecessary conversion from StringRef to std::string, and a bunch of extra allocation thanks to the SmallString. Differential Revision: https://reviews.llvm.org/D98190	2021-03-16 13:34:14 +01:00
Caroline Concatto	3c03635d53	[SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse This patch adds support for reverse loop vectorization. It is possible to vectorize the following loop: ``` for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0; ``` with fixed or scalable vector. The loop-vectorizer will use 'reverse' on the loads/stores to make sure the lanes themselves are also handled in the right order. This patch adds support for scalable vector on IRBuilder interface to create a reverse vector. The IR function CreateVectorReverse lowers to experimental.vector.reverse for scalable vector and keedp the original behavior for fixed vector using shuffle reverse. Differential Revision: https://reviews.llvm.org/D95363	2021-03-16 07:51:59 +00:00
Bing1 Yu	4f198b0c27	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in any situations but we skip it when it is not O0 and the function doesn't have optnone attribute. With -O0, the def of shape to amx intrinsics is near the amx intrinsics code. We are not able to find a point which post-dominate all the shape and dominate all amx intrinsics. To decouple the dependency of the shape, we transform amx intrinsics to scalar operation, so that compiling doesn't fail. In long term, we should improve fast register allocation to allocate amx register. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-16 10:40:22 +08:00
Lang Hames	ecf6466f01	[JITLink][MachO][x86-64] Introduce generic x86-64 support. This patch introduces generic x86-64 edge kinds, and refactors the MachO/x86-64 backend to use these edge kinds. This simplifies the implementation of the MachO/x86-64 backend and makes it possible to write generic x86-64 passes and utilities. The new edge kinds are different from the original set used in the MachO/x86-64 backend. Several edge kinds that were not meaningfully distinguished in that backend (e.g. the PCRelMinusN edges) have been merged into single edge kinds in the new scheme (these edge kinds can be reintroduced later if we find a use for them). At the same time, new edge kinds have been introduced to convey extra information about the state of the graph. E.g. The RequestAndTransformTo* edges represent GOT/TLVP relocations prior to synthesis of the GOT/TLVP entries, and the 'Relaxable' suffix distinguishes edges that are candidates for optimization from edges which should be left as-is (e.g. to enable runtime redirection). ELF/x86-64 will be refactored to use these generic edges at some point in the future, and I anticipate a similar refactor to create a generic arm64 support header too. Differential Revision: https://reviews.llvm.org/D98305	2021-03-15 15:43:07 -07:00
Nick Lewycky	483a253ae9	NFC: Formatting changes. Run clang-format over these files. Capitalize some variable names per clang-tidy's request. Pulled out to simplify review of D98302.	2021-03-15 14:26:39 -07:00
Florian Hahn	bb244ea2a8	[AnnotationRemarks] Remove unneeded Function.h include (NFC).	2021-03-15 21:09:35 +00:00
Wenlei He	a5d30421a6	[CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining. For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work: - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee. - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added. - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile). - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate. Differential Revision: https://reviews.llvm.org/D98590	2021-03-15 12:22:15 -07:00
Fangrui Song	5d44c92bf8	Change void getNoop(MCInst &NopInst) to MCInst getNop() Prefer (self-documenting) return values to output parameters (which are liable to be used). While here, rename Noop to Nop which is more widely used and improves consistency with hasEmitNops/setEmitNops/emitNop/etc.	2021-03-15 12:05:34 -07:00
Stelios Ioannou	ab86edbc88	[AArch64] Implement __rndr, __rndrrs intrinsics This patch implements the __rndr and __rndrrs intrinsics to provide access to the random number instructions introduced in Armv8.5-A. They are only defined for the AArch64 execution state and are available when __ARM_FEATURE_RNG is defined. These intrinsics store the random number in their pointer argument and return a status code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter intrinsic reseeds the random number generator. The instructions write the NZCV flags indicating the success of the operation that we can then read with a CSET. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics [2] https://bugs.llvm.org/show_bug.cgi?id=47838 Differential Revision: https://reviews.llvm.org/D98264 Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc	2021-03-15 17:51:48 +00:00
Zahira Ammarguellat	80ca4fd154	[NFC] Fix "unused parameter" error revealed in the Linux self-build.	2021-03-15 12:17:11 -04:00
Jan Kratochvil	a28facba1c	[llvm] [dwarf] Fix DWARFListTableHeader::getOffsetEntry off-by-one D98289 was erroneously reporting `invalid range list offset 0x20110` instead of `invalid range list table index 0`. Differential Revision: https://reviews.llvm.org/D98589	2021-03-14 21:42:44 +01:00
Matt Arsenault	d57d8f364f	CodeGen: Reorder MachinePointerInfo fields This saves a little bit of padding.	2021-03-14 10:06:39 -04:00
kuterd	44c1425c17	[Attributor][fix] Remove problematic EXPENSIVE_CHECK Remove the check that is causing compilation issues in some build configurations.	2021-03-13 18:03:24 +03:00
Lang Hames	4e30b20bdb	[JITLink][ORC] Make the LinkGraph available to modifyPassConfig. This makes the target triple, graph name, and full graph content available when making decisions about how to populate the linker pass pipeline. Also updates the LLJITWithObjectLinkingLayerPlugin example to show more API use, including use of the API changes in this patch.	2021-03-12 18:42:51 -08:00
Hubert Tong	4f9cc1512d	Revert "[AsmParser][SystemZ][z/OS] Introducing HLASM Comment Syntax" This reverts commit `bcdd40f802`. See https://reviews.llvm.org/D98543.	2021-03-12 14:48:00 -05:00
Zahira Ammarguellat	c2006f857d	[NFC] Fix "unused parameter" error revealed in the Linux self-build.	2021-03-12 10:26:40 -08:00
Anirudh Prasad	bcdd40f802	[AsmParser][SystemZ][z/OS] Introducing HLASM Comment Syntax - This patch adds in support for the ordinary HLASM comment syntax asm statements (Reference - Chapter 7, Comment Statements, Ordinary Comment Statements) - In brief, the ordinary comment syntax if used, must begin with the "" character - To achieve this, this patch makes use of the CommentString attribute provided in the base MCAsmInfo class - In the SystemZMCAsmInfo class, the CommentString attribute was set to "" based on the assembler dialect - Furthermore, a new attribute RestrictCommentString, is provided to only treat a string as a comment if it appears at the start of the asm statement. Example: "jo -4" is valid in HLASM (jump back 4 bytes from current point - similar to jo -4 in gnu asm) and we don't want "-4" to be treated as a comment. - RFC for HLASM Parser support implementation: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html Reviewed By: scott.linder, Kai Differential Revision: https://reviews.llvm.org/D97703	2021-03-12 11:56:11 -05:00
Matt Arsenault	6b76d82853	GlobalISel: Fix marking byval arguments as immutable byval arguments need to be assumed writable. Only implicitly stack passed arguments which aren't addressable in the IR can be assumed immutable. Mips is still broken since for some reason its doing its own thing with the ValueHandlers (and x86 doesn't actually handle byval arguments now, although some of the code is there).	2021-03-12 09:01:53 -05:00
serge-sans-paille	bc4a5bdce4	[NFC] Use StringRef instead of const char* for AsmPrinter This avoids calling strlen to repeatedly compute some string size.	2021-03-12 14:39:25 +01:00
Hans Wennborg	f50aef745c	Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user" This broke the check-profile tests on Mac, see comment on the code review. > This is no longer needed, we can add __llvm_profile_runtime directly > to llvm.compiler.used or llvm.used to achieve the same effect. > > Differential Revision: https://reviews.llvm.org/D98325 This reverts commit `c7712087cb`. Also reverting the dependent follow-up commit: Revert "[InstrProfiling] Generate runtime hook for ELF platforms" > When using -fprofile-list to selectively apply instrumentation only > to certain files or functions, we may end up with a binary that doesn't > have any counters in the case where no files were selected. However, > because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the > runtime would still be pulled in and incur some non-trivial overhead, > especially in the case when the continuous or runtime counter relocation > mode is being used. A better way would be to pull in the profile runtime > only when needed by declaring the __llvm_profile_runtime symbol in the > translation unit only when needed. > > This approach was already used prior to `9a041a7522`, but we changed it > to always generate the __llvm_profile_runtime due to a TAPI limitation. > Since TAPI is only used on Mach-O platforms, we could use the early > emission of __llvm_profile_runtime there, and on other platforms we > could change back to the earlier approach where the symbol is generated > later only when needed. We can stop passing -u__llvm_profile_runtime to > the linker on Linux and Fuchsia since the generated undefined symbol in > each translation unit that needed it serves the same purpose. > > Differential Revision: https://reviews.llvm.org/D98061 This reverts commit `87fd09b25f`.	2021-03-12 13:53:46 +01:00
Serguei Katkov	cfe8f8e0f0	Revert "Mark gc.relocate and gc.result as readnone" As readnone function they become movable and LICM can hoist them out of a loop. As a result in LCSSA form phi node of type token is created. No one is ready that GCRelocate first operand is phi node but expects to be token. GVN test were also updated, it seems it does not do what is expected. Test for LICM is also added. This reverts commit `f352463ade`.	2021-03-12 16:59:17 +07:00
Johannes Doerfert	0fe0d114e4	Revert "[OpenMP] Introduce the `disable_selector_propagation` variant selector trait" Need to revert `ad9e98b8ef` which this commit depends on. This reverts commit f771ef7b5f0ed260d00931cd50e6fe462edbacaf.	2021-03-11 23:48:35 -06:00
Johannes Doerfert	b2642456ab	[OpenMP] Introduce the `disable_selector_propagation` variant selector trait Nested `omp [begin\|end] declare variant` inherit the selectors from surrounding `omp (begin\|end) declare variant` constructs. To stop such propagation the user can add the `disable_selector_propagation` to the `extension` set in the `implementation` selector. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95765	2021-03-11 23:31:25 -06:00
Carl Ritson	c07f2025e4	[AMDGPU] Restrict image_msaa_load to MSAA dimension types This instruction is only valid on 2D MSAA and 2D MSAA Array surfaces. Remove intrinsic support for other dimension types, and block assembly for unsupported dimensions. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D98397	2021-03-12 09:47:24 +09:00
Florian Hahn	c92ec0dd92	[Matrix] Add support for matrix-by-scalar division. This patch extends the matrix spec to allow matrix-by-scalar division. Originally support for `/` was left out to avoid ambiguity for the matrix-matrix version of `/`, which could either be elementwise or specified as matrix multiplication M1 * (1/M2). For the matrix-scalar version, no ambiguity exists; `*` is also an elementwise operation in that case. Matrix-by-scalar division is commonly supported by systems including Matlab, Mathematica or NumPy. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D97857	2021-03-11 22:21:23 +00:00
Wenlei He	051f2c144e	[SamplePGO] Skip inlinee profile scaling for sample loader inlining For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining. Differential Revision: https://reviews.llvm.org/D98187	2021-03-11 10:18:26 -08:00
David Green	fad70c3068	[ARM] Improve WLS lowering Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little as the instructions are terminators that produce a value - something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together: %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div) %wls0 = extractvalue { i32, i1 } %wls, 0 %wls1 = extractvalue { i32, i1 } %wls, 1 br i1 %wls1, label %loop.ph, label %loop.exit ... loop: %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ] .. %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1) %cmp = icmp ne i32 %iv.next, 0 br i1 %cmp, label %loop, label %loop.exit The llvm.test.start.loop.iterations need to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction. %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3 t2B %bb.1 ... bb.2.loop: %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2 ... %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2 t2B %bb.3 The t2WhileLoopStartLR can then be treated similar to the other low overhead loop pseudos, eventually being lowered to a WLS providing the branches are within range. Differential Revision: https://reviews.llvm.org/D97729	2021-03-11 17:56:19 +00:00
Hiroshi Yamauchi	365b225d46	[PGO] Fix two issues in PGOMemOPSizeOpt. 1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from the value profile metadata and preserves the remaining entries for the fallback memop call site. If there are more than five entries, the rest of the entries would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes up to 3 (by default) values, but potentially not for other downstream passes that may use the value profile metadata. 2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept track of. When the range buckets were introduced, it was changed to skip the range buckets, but since it does not grab all entries (only five), if some range buckets exist in the first five entries, it could potentially cause fewer promotion opportunities (eg. if 4 out of 5 were range buckets, it may be able to promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it means that wrong entries may be preserved, as it didn't correctly keep track of which were entries were skipped. To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number of value profile buckets), keeps track of which entries were skipped, and preserves all the remaining entries. Differential Revision: https://reviews.llvm.org/D97592	2021-03-11 09:53:05 -08:00
Nikita Popov	6312c53870	[IRBuilder] Deprecate CreateLoad APIs with implicit type These APIs are not compatible with opaque pointers. Deprecate them to avoid the introduction of further uses.	2021-03-11 18:46:45 +01:00
Craig Topper	e9426dfbae	[ValueTypes][RISCV] Add MVT for v1f16. RISCV makes all fixed vector MVTs with size less than or equal to a command line option legal. This didn't include v1f16 because it was missing but did include v1f32 and v1f64. One test is affected where we did test this type, but it is a horizontal reduction so it is non-sensical. Perhaps we should canonicalize that away somewhere. I'm not sure if we should be making v1 types legal, but this will at least make RISCV consistent across all types. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98365	2021-03-11 09:23:18 -08:00
Joseph Huber	807466ef28	[OpenMP] Restore backwards compatibility for libomptarget Summary: The changes introduced in D87946 changed the API for libomptarget functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x but was not given a backward-compatible interface. This change will require people using Clang 13.x or 12.x to recompile their offloading programs. Reviewed By: jdoerfert cchen Differential Revision: https://reviews.llvm.org/D98358	2021-03-11 09:52:11 -05:00
Stephen Tozer	f40976bd01	Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This reverts commit `c0f3dfb9f1`. Reverted due to an error on the clang-x64-windows-msvc buildbot.	2021-03-11 14:48:01 +00:00
Simon Pilgrim	cc48b45d24	[llvm-mca] Fix uninitialized variable in InOrderIssueStage constructor warning. NFCI.	2021-03-11 14:41:20 +00:00
Simon Pilgrim	9a259f4386	[Transforms] SampleProfileLoaderBaseImpl<BT>::getFunctionLoc - fix Wdocumentation warnings. NFCI.	2021-03-11 14:04:08 +00:00
gbtozers	c0f3dfb9f1	[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands This patch improves salvageDebugInfoImpl by allowing it to salvage arithmetic operations with two or more non-const operands; this includes the GetElementPtr instruction, and most Binary Operator instructions. These salvages produce DIArgList locations and are only valid for dbg.values, as currently variadic DIExpressions must use DW_OP_stack_value. This functionality is also only added for salvageDebugInfoForDbgValues; other functions that directly call salvageDebugInfoImpl (such as in ISel or Coroutine frame building) can be updated in a later patch. Differential Revision: https://reviews.llvm.org/D91722	2021-03-11 13:33:49 +00:00
Serguei Katkov	0480927712	[Statepoint Lowering] Handle the case with several gc.result Recently gc.result has been marked with readnone instead of readonly and this opens a door for different optimization to duplicate gc.result. Statepoint lowering is not ready to see several gc.results. The problem appears when there are gc.results with one located in the same basic block and another located in other basic block. In this case we need both export VR and fill local setValue. Note that this case is not sufficient optimization done before CodeGen. It is evident that local gc.result dominates all other gc.results and it is handled by GVN and EarlyCSE. But anyway, even if IR is not optimal Backend should not crash on a valid IR. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D98393	2021-03-11 18:44:44 +07:00
Simon Pilgrim	e74d626925	[IPO] Fix EXPENSIVE_CHECKS assert added at D83744. NFCI. It wasn't taking into account that QueryingAA was a pointer.	2021-03-11 10:29:15 +00:00
Nikita Popov	403da6a69a	Reapply [LICM] Make promotion faster Relative to the previous implementation, this always uses aliasesUnknownInst() instead of aliasesPointer() to correctly handle atomics. The added test case was previously miscompiled. ----- Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a signicant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-11 10:50:28 +01:00
Djordje Todorovic	9f41c03f82	[Debugify][OriginalDIMode] Export the report into JSON file By using the original-di check with debugify in the combination with the llvm/utils/llvm-original-di-preservation.py it becomes very user friendly tool. An example of the HTML page with the issues related to debug info can be found at [0]. [0] https://djolertrk.github.io/di-checker-html-report-example/ Differential Revision: https://reviews.llvm.org/D82546	2021-03-11 01:11:13 -08:00
Petr Hosek	c7712087cb	[InstrProfiling] Don't generate __llvm_profile_runtime_user This is no longer needed, we can add __llvm_profile_runtime directly to llvm.compiler.used or llvm.used to achieve the same effect. Differential Revision: https://reviews.llvm.org/D98325	2021-03-10 22:33:51 -08:00
Reid Kleckner	b69db4a7ab	Re-land "[PDB] Defer relocating .debug$S until commit time and parallelize it" This reverts commit `bacf9cf2c5` and reinstates commit `1a9bd5b813`. Reverting this commit did not appear to make the problem go away, so we can go ahead and reland it.	2021-03-10 15:14:09 -08:00
kuterd	d75c9e61a5	[Attributor] Attributor call site specific AAValueConstantRange This patch makes uses of the context bridges introduced in D83299 to make AAValueConstantRange call site specific. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D83744	2021-03-11 01:19:44 +03:00
Quentin Colombet	49942c6d4a	[NFC] Fix a compiler warning Fix a warning caused by -Wrange-loop-analysis Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D98297	2021-03-10 13:28:53 -08:00
Alexey Lapshin	4f16e177e1	[llvm-objcopy][NFC] replace class Buffer/MemBuffer/FileBuffer with streams. During D88827 it was requested to remove the local implementation of Memory/File Buffers: // TODO: refactor the buffer classes in LLVM to enable us to use them here // directly. This patch uses raw_ostream instead of Buffers. Generally, using streams could allow us to reduce memory usages. No need to load all data into the memory - the data could be streamed through a smaller buffer. Thus, this patch uses raw_ostream as an interface for output data: Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In, raw_ostream &Out); Note 1. This patch does not change the implementation of Writers so that data would be directly stored into raw_ostream. This is assumed to be done later. Note 2. It would be better if Writers would be implemented in a such way that data could be streamed without seeking/updating. If that would be inconvenient then raw_ostream could be replaced with raw_pwrite_stream to have a possibility to seek back and update file headers. This is assumed to be done later if necessary. Note 3. Current FileOutputBuffer allows using a memory-mapped file. The raw_fd_ostream (which could be used if data should be stored in the file) does not allow us to use a memory-mapped file. Memory map functionality could be implemented for raw_fd_ostream: It is possible to add resize() method into raw_ostream. class raw_ostream { void resize(uint64_t size); } That method, implemented for raw_fd_ostream, could create a memory-mapped file. The streamed data would be written into that memory file then. Thus we would be able to use memory-mapped files with raw_fd_ostream. This is assumed to be done later if necessary. Differential Revision: https://reviews.llvm.org/D91028	2021-03-10 23:50:04 +03:00
Sriraman Tallam	0ba1ebcbb7	Remove original implementation of UniqueInternalLinkageNames pass. D96109 was recently submitted which contains the refactored implementation of -funique-internal-linakge-names by adding the unique suffixes in clang rather than as an LLVM pass. Deleting the former implementation in this change. Differential Revision: https://reviews.llvm.org/D98234	2021-03-10 11:57:40 -08:00
Quentin Colombet	66dab2fa84	[NFC] Fix compiler warnings Fix warnings caused by -Wrange-loop-analysis. Patch by Xiaoqing Wu <xiaoqing_wu@apple.com> Differential Revision: https://reviews.llvm.org/D98298	2021-03-10 11:03:50 -08:00
Craig Topper	9106d04554	[RISCV][SelectionDAG] Introduce an ISD::SPLAT_VECTOR_PARTS node that can represent a splat of 2 i32 values into a nxvXi64 vector for riscv32. On riscv32, i64 isn't a legal scalar type but we would like to support scalable vectors of i64. This patch introduces a new node that can represent a splat made of multiple scalar values. I've used this new node to solve the current crashes we experience when getConstant is used after type legalization. For RISCV, we are now default expanding SPLAT_VECTOR to SPLAT_VECTOR_PARTS when needed and then handling the SPLAT_VECTOR_PARTS later during LegalizeOps. I've remove the special case I previously put in for ABS for D97991 as the default expansion is now able to succesfully use getConstant. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98004	2021-03-10 09:46:18 -08:00
Craig Topper	0c73a506e8	[RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32. Currently we crash in type legalization any time an intrinsic uses a scalar i64 on RV32. This patch adds support for type legalizing this to prevent crashing. I don't promise that it uses the best possible codegen just that it is functional. This first version handles 3 cases. vmv.v.x intrinsic, vmv.s.x intrinsic and intrinsics that take a scalar input, splat it and then do some operation. For vmv.v.x we'll either rely on hardware sign extension for constants or we'll convert it to multiple splats and bit manipulation. For vmv.s.x we use a really unoptimal sequence inspired by what we do for an INSERT_VECTOR_ELT. For the third case we'll either try to use the .vi form for constants or convert to a complicated splat and bitmanip and use the .vv form of the operation. I've renamed the ExtendOperand field to SplatOperand now use it specifically for the third case. The first two cases are handled by custom lowering specifically for those intrinsics. I haven't updated all tests yet, but I tried to cover a subset that includes single-width, widening, and narrowing. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97895	2021-03-10 09:45:38 -08:00
Ta-Wei Tu	7ff2768be1	Revert "[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest`" This reverts commit `df9158c9a4`.	2021-03-11 01:24:43 +08:00
Stephen Tozer	1db137b185	[DebugInfo] Handle DBG_VALUES with multiple variable location operands in MIR This patch adds handling for DBG_VALUE_LIST in the MIR-passes (after finalize-isel), excluding the debug liveness passes and DWARF emission. This most significantly affects MachineSink, which now needs to consider all used registers of a debug value when sinking, but for most passes this change is simply replacing getDebugOperand(0) with an iteration over all debug operands. Differential Revision: https://reviews.llvm.org/D92578	2021-03-10 17:15:24 +00:00
Jingu Kang	25951c5ab8	[AArch64] Add missing intrinsics for scalar FP rounding Differential Revision: https://reviews.llvm.org/D98269	2021-03-10 13:22:29 +00:00
Christudasan Devadasan	4c6ab48fb1	GlobalISel: Try to combine G_[SU]DIV and G_[SU]REM It is good to have a combined `divrem` instruction when the `div` and `rem` are computed from identical input operands. Some targets can lower them through a single expansion that computes both division and remainder. It effectively reduces the number of instructions than individually expanding them. Reviewed By: arsenm, paquette Differential Revision: https://reviews.llvm.org/D96013	2021-03-10 18:46:07 +05:30
Alex Richardson	35bf23e965	Avoid shuffle self-assignment in EXPENSIVE_CHECKS builds Some versions of libstdc++ perform self-assignment in std::shuffle. This breaks the EXPENSIVE_CHECKS builds of TableGen due to an incorrect assertion in libstdc++. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85828. Fixes https://llvm.org/PR37652 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98167	2021-03-10 11:17:34 +00:00
Vladislav Vinogradov	dc8446c2a0	[ADT][NFC] Use `size_t` type for index in `indexed_accessor_range` It makes it consistent with `size()` method return type and with STL-like containers API. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D97921	2021-03-10 11:40:59 +03:00
Wei Mi	ee35784a90	[SampleFDO] Support enabling -funique-internal-linkage-name. now -funique-internal-linkage-name flag is available, and we want to flip it on by default since it is beneficial to have separate sample profiles for different internal symbols with the same name. As a preparation, we want to avoid regression caused by the flip. When we flip -funique-internal-linkage-name on, the profile is collected from binary built without -funique-internal-linkage-name so it has no uniq suffix, but the IR in the optimized build contains the suffix. This kind of mismatch may introduce transient regression. To avoid such mismatch, we introduce a NameTable section flag indicating whether there is any name in the profile containing uniq suffix. Compiler will decide whether to keep uniq suffix during name canonicalization depending on the NameTable section flag. The flag is only available for extbinary format. For other formats, by default compiler will keep uniq suffix so they will only experience transient regression when -funique-internal-linkage-name is just flipped. Another type of regression is caused by places where we miss to call getCanonicalFnName. Those places are fixed. Differential Revision: https://reviews.llvm.org/D96932	2021-03-09 21:41:40 -08:00
Amara Emerson	55e760769b	[GlobalISel] Fold away G_BUILD_VECTOR with all elements extracted. If every element is extracted from a G_BUILD_VECTOR, pass through the source registers. This is different to the extract(build_vector) combine because this one tolerates multiple users as long as they're exhaustive. Differential Revision: https://reviews.llvm.org/D97890	2021-03-09 11:34:26 -08:00
Amara Emerson	e60ab72137	[AArch64][GlobalISel] Add combine for extract_vector_elt(build_vector, cst) Differential Revision: https://reviews.llvm.org/D97835	2021-03-09 11:08:02 -08:00
Fangrui Song	42e3f97a9d	[MC] Change ELFOSABI_NONE to ELFOSABI_GNU for SHF_GNU_RETAIN GNU ld does not give SHF_GNU_RETAIN GC root semantics for ELFOSABI_NONE. (https://sourceware.org/pipermail/binutils/2021-March/115581.html) This allows GNU ld to interpret SHF_GNU_RETAIN and avoids a gold quirk https://sourceware.org/bugzilla/show_bug.cgi?id=27490 Because ELFObjectWriter is in an anonymous namespace, I have to place `markGnuAbi` in the parent MCObjectWriter. Differential Revision: https://reviews.llvm.org/D97976	2021-03-09 09:59:47 -08:00
Nikita Popov	55ae279ba7	[FastISel] Don't trivially kill extractvalues (PR49467) All extractvalues of the same value at the same index will map to the same register, so even if one specific extractvalue only has one use, we should not mark it as a trivial kill, as there may be more extractvalues later. Fixes https://bugs.llvm.org/show_bug.cgi?id=49467. Differential Revision: https://reviews.llvm.org/D98145	2021-03-09 18:46:38 +01:00
gbtozers	f0513413c7	[DebugInfo] Add replaceArg function to simplify DBG_VALUE_LIST expressions The LiveDebugValues and LiveDebugVariables implementations for handling DBG_VALUE_LIST instructions can be simplified significantly if they do not have to deal with any duplicated operands, such as a DBG_VALUE_LIST that uses the same register multiple times in its expression. This patch adds a function, replaceArg, that can be used to simplify a DIExpression in the case of duplicated operands. Differential Revision: https://reviews.llvm.org/D83896	2021-03-09 17:41:04 +00:00
Dave Lee	736afe465f	Revert "[build][modules] Fix ObjCARCUtil.h modularization" This reverts commit `f1b690598e`.	2021-03-09 09:36:47 -08:00
gbtozers	df69c69427	[DebugInfo] Handle multiple variable location operands in IR This patch updates the various IR passes to correctly handle dbg.values with a DIArgList location. This patch does not actually allow DIArgLists to be produced by salvageDebugInfo, and it does not affect any pass after codegen-prepare. Other than that, it should cover every IR pass. Most of the changes simply extend code that operated on a single debug value to operate on the list of debug values in the style of any_of, all_of, for_each, etc. Instances of setOperand(0, ...) have been replaced with with replaceVariableLocationOp, which takes the value that is being replaced as an additional argument. In places where this value isn't readily available, we have to track the old value through to the point where it gets replaced. Differential Revision: https://reviews.llvm.org/D88232	2021-03-09 16:44:38 +00:00
Stefan Gränitz	265bc5af7b	[Orc] Always check mapped sections for ELFDebugObject are in bounds of working memory buffer As stated in the JITLink user guide: Do not assume that the input object is well formed. https://llvm.org/docs/JITLink.html#tips-for-jitlink-backend-developers	2021-03-09 14:01:50 +01:00
Cullen Rhodes	2750f3ed31	[IR] Introduce llvm.experimental.vector.splice intrinsic This patch introduces a new intrinsic @llvm.experimental.vector.splice that constructs a vector of the same type as the two input vectors, based on a immediate where the sign of the immediate distinguishes two variants. A positive immediate specifies an index into the first vector and a negative immediate specifies the number of trailing elements to extract from the first vector. For example: @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count These intrinsics support both fixed and scalable vectors, where the former is lowered to a shufflevector to maintain existing behaviour, although while marked as experimental the recommended way to express this operation for fixed-width vectors is to use shufflevector. For scalable vectors where it is not possible to express a shufflevector mask for this operation, a new ISD node has been implemented. This is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker and Cullen Rhodes. [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94708	2021-03-09 10:44:22 +00:00
gbtozers	93b170ea24	[DebugInfo] Handle dbg.values with multiple variable location operands in ISel This patch adds partial support in Instruction Selection for dbg.values that use a DIArgList. This patch does not add support for producing DBG_VALUE_LIST, but adds the logic for processing DIArgLists within the ISel pass. This change is largely focused on handleDebugValue and some of the functions that it calls. Outside of this, salvageDebugInfo and transferDbgValues have been modified to replace individual operands instead of the entire value; dangling debug info for variadic debug values is not currently supported (but may be added later). Differential Revision: https://reviews.llvm.org/D88589	2021-03-09 09:48:03 +00:00
Jan Kratochvil	4289a7f1d7	llvm-dwarfdump: Fix DWARF-5 DW_FORM_implicit_const (used by GCC) Differential Revision: https://reviews.llvm.org/D98195	2021-03-09 09:26:58 +01:00
Akira Hatanaka	dca5737945	Move ObjCARCUtil.h back to llvm/Analysis Instead of adding the header to llvm/IR, just duplicate the marker string in the auto upgrader.	2021-03-08 16:35:24 -08:00
Rahman Lavaee	c245c21c43	[llvm-readelf] Support dumping the BB address map section with --bb-addr-map. This patch lets llvm-readelf dump the content of the BB address map section in the following format: ``` Function { At: <address> BB entries [ { Offset: <offset> Size: <size> Metadata: <metadata> }, ... ] } ... ``` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95511	2021-03-08 16:20:11 -08:00
Dave Lee	f1b690598e	[build][modules] Fix ObjCARCUtil.h modularization This change was introduced by this fix: `a2a55def35`. This makes a submodule for ObjCARCUtil.h, because leaving it in LLVM_intrinsic_gen causes a dependency cycle between LLVM_IR and LLVM_intrinsic_gen.	2021-03-08 15:51:27 -08:00
Benjamin Kramer	0d96ea0792	[ValueTracking] Move matchSimpleRecurrence out of line The header only has a forward declaration of PHINode available, and this function doesn't seem to get much out of inlining.	2021-03-09 00:04:47 +01:00
Sanjay Patel	34d0d644ff	[ValueTracking] move/add helper to get inverse min/max; NFC We will need to this functionality to improve min/max folds in instcombine when we canonicalize to intrinsics.	2021-03-08 17:38:22 -05:00
wlei	c460ef61d6	[CSSPGO][llvm-profgen] Change sample count of dangling probe in llvm-profgen Differential Revision: https://reviews.llvm.org/D96811	2021-03-08 14:36:02 -08:00
Masoud Ataei	820f508b08	[PowerPC] Removing _massv place holder Since P8 is the oldest machine supported by MASSV pass, _massv place holder is removed and the oldest version of MASSV functions is assumed. If the P9 vector specific is detected in the compilation process, the P8 prefix will be updated to P9. Differential Revision: https://reviews.llvm.org/D98064	2021-03-08 21:43:24 +00:00
Jessica Paquette	5c26be214d	[AArch64][GlobalISel] Lower G_BUILD_VECTOR -> G_DUP If we have ``` %vec = G_BUILD_VECTOR %reg, %reg, ..., %reg ``` Then lower it to ``` %vec = G_DUP %reg ``` Also update the selector to handle constant splats on G_DUP. This will not combine when the splat is all zeros or ones. Tablegen-imported patterns rely on these being G_BUILD_VECTOR. Minor code size improvements on CTMark at -Os. Also adds some utility functions to make it a bit easier to recognize splats, and an AArch64-specific splat helper. Differential Revision: https://reviews.llvm.org/D97731	2021-03-08 13:01:10 -08:00
Alina Sbirlea	29482426b5	Revert "[LICM] Make promotion faster" Revert `3d8f842712` Revision triggers a miscompile sinking a store incorrectly outside a threading loop. Detected by tsan. Reverting while investigating. Differential Revision: https://reviews.llvm.org/D89264	2021-03-08 12:53:03 -08:00
Min-Yih Hsu	8dddc15297	[M68k](4/8) MC layer and object file support - Add the M68k-specific MC layer implementation - Add ELF support for M68k - Add M68k-specifc CC and reloc TODO: Currently AsmParser and disassembler are not implemented yet. Please use this bug to track the status: https://bugs.llvm.org/show_bug.cgi?id=48976 Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88390	2021-03-08 12:30:57 -08:00
Min-Yih Hsu	bec7b16692	[M68k](3/8) Skeleton and target description files - Infrastructure for the target (i.e. build files, target triple etc.) - All of the target description TableGen file Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88389	2021-03-08 12:30:57 -08:00
Min-Yih Hsu	6dcc325ce0	[M68k][MIR](2/8) Changes in the target-independent MIR part - Add new callback in `TargetInstrInfo` -- `isPCRelRegisterOperandLegal` -- to query whether pc-rel register MachineOperand is legal. - Add new function to search DebugLoc in a reverse ordering Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88386	2021-03-08 12:30:57 -08:00
Min-Yih Hsu	503343191e	[M68k][TableGen](1/8) TableGen related changes - Add a new TableGen backend: CodeBeads - Add support to generate logical operand information For the first item, it is currently a workaround of M68k's (complex) instruction encoding. A typical architecture, especially CISC one like X86, normally uses `MCInstrDesc::TSFlags` to carry instruction encoding info. However, at the early days of M68k backend development, we found it difficult to fit every possible encoding into the 64-bit `MCInstrDesc::TSFlags`. Therefore CodeBeads was invented to provide an alternative, arbitrary length container for instruciton encoding info. However, in the long term we incline not to use a new TG backend for less common pattern like what we encountered in M68k. A bug has been created to host to discussion on migrating from CodeBeads to more concise solution: https://bugs.llvm.org/show_bug.cgi?id=48792 The second item was also served for similar purpose. It created utility functions that tell you the index of a `MachineOperand` in a `MachineInst` given a logical operand index. In normal cases a logical operand is the same as `MachineOperand`, but for operands using complex addressing mode a logical operand might be consisting of multiple `MachineOperand`. The TableGen-ed `getLogicalOperandIdx`, for instance, can give you the mapping between these two concepts. Nevertheless, we hope to remove this feature in the future if possible. Since it's not really useful for the targets supported by LLVM now either. Authors: myhsu, m4yers, glaubitz Differential Revision: https://reviews.llvm.org/D88385	2021-03-08 12:30:56 -08:00
Victor Huang	621023b218	[AIX][TLS] Add assert check of valid csect type for the storage mapping class XCOFF::XMC_UL This patch adds the assert check inside the constructor for the csect (MCSectionXCOFF) to ensure valid csect type used for the storage mappping class XCOFF:XMC_UL.	2021-03-08 14:18:57 -06:00
Yuta Saito	aa0c571a5f	[WebAssembly] Add new relocation for location relative data This `R_WASM_MEMORY_ADDR_SELFREL_I32` relocation represents an offset between its relocating address and the symbol address. It's very similar to `R_X86_64_PC32` but restricted to be used for only data segments. ``` S + A - P ``` A: Represents the addend used to compute the value of the relocatable field. P: Represents the place of the storage unit being relocated. S: Represents the value of the symbol whose index resides in the relocation entry. Proposal: https://github.com/WebAssembly/tool-conventions/issues/162 Differential Revision: https://reviews.llvm.org/D96659	2021-03-08 11:34:10 -08:00
Philip Reames	d9a29a6752	constify getUnderlyingObject implementation [nfc]	2021-03-08 11:32:54 -08:00
Philip Reames	239a618180	[instcombine] Collapse trivial and recurrences If we have a recurrence of the form <Start, And, Step> we know that the value taken by the recurrence stabilizes on the first iteration (provided step is loop invariant). We can exploit that fact to remove the loop carried dependence in the recurrence. Differential Revision: https://reviews.llvm.org/D97578 (and part)	2021-03-08 09:21:38 -08:00
Philip Reames	97a7bc5831	[gvn] Precisely propagate equalities to phi operands The code used for propagating equalities (e.g. assume facts) was conservative in two ways - one of which this patch fixes. Specifically, it shifts the code reasoning about whether a use is dominated by the end of the assume block to consider phi uses to exist on the predecessor edge. This matches the dominator tree handling for dominates(Edge, Use), and simply extends it to dominates(BB, Use). Note that the decision to use the end of the block is itself a conservative choice. The more precise option would be to use the later of the assume and the value, and replace all uses after that. GVN handles that case separately (with the replace operand mechanism) because it used to be expensive to ask dominator questions within blocks. With the new instruction ordering support, we should probably rewrite this code at some point to simplify. Differential Revision: https://reviews.llvm.org/D98082	2021-03-08 08:59:00 -08:00
gbtozers	e5d958c456	[DebugInfo] Support DIArgList in DbgVariableIntrinsic This patch updates DbgVariableIntrinsics to support use of a DIArgList for the location operand, resulting in a significant change to its interface. This patch does not update all IR passes to support multiple location operands in a dbg.value; the only change is to update the DbgVariableIntrinsic interface and its uses. All code outside of the intrinsic classes assumes that an intrinsic will always have exactly one location operand; they will still support DIArgLists, but only if they contain exactly one Value. Among other changes, the setOperand and setArgOperand functions in DbgVariableIntrinsic have been made private. This is to prevent code from setting the operands of these intrinsics directly, which could easily result in incorrect/invalid operands being set. This does not prevent these functions from being called on a debug intrinsic at all, as they can still be called on any CallInst pointer; it is assumed that any code directly setting the operands on a generic call instruction is doing so safely. The intention for making these functions private is to prevent DIArgLists from being overwritten by code that's naively trying to replace one of the Values it points to, and also to fail fast if a DbgVariableIntrinsic is updated to use a DIArgList without a valid corresponding DIExpression.	2021-03-08 14:36:13 +00:00
Ta-Wei Tu	df9158c9a4	[LoopInterchange] Replace tightly-nesting-ness check with the one from `LoopNest` The check `tightlyNested()` in `LoopInterchange` is similar to the one in `LoopNest`. In fact, the former misses some cases where loop-interchange is not feasible and results in incorrect behaviour. Replacing it with the much robust version provided by `LoopNest` reduces code duplications and fixes https://bugs.llvm.org/show_bug.cgi?id=48113. `LoopInterchange` has a weaker definition of tightly or perfectly nesting-ness than the one implemented in `LoopNest::arePerfectlyNested()`. Therefore, `tightlyNested()` is instead implemented with `LoopNest::checkLoopsStructure` and additional checks for unsafe instructions. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D97290	2021-03-08 11:36:08 +08:00
Mehdi Amini	fe9a4b55da	Fix build post-revert in `8d5a981a13` One commit introduced after the reverted change was using an API introduced there, this is reintroducing the API, but not the original broken change.	2021-03-08 00:59:05 +00:00
Mehdi Amini	8d5a981a13	Revert "[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe" This reverts commit `99108c791d`. Clang is miscompiling LLVM with this change, a stage-2 build hits multiple failures. As a repro, I built clang in a stage1 directory and used it this way: cmake -G Ninja ../llvm \ -DCMAKE_CXX_COMPILER=`pwd`/../build-stage1/bin/clang++ \ -DCMAKE_C_COMPILER=`pwd`/../build-stage1/bin/clang \ -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;AMDGPU" \ -DLLVM_ENABLE_PROJECTS=mlir \ -DLLVM_BUILD_EXAMPLES=ON \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_ENABLE_ASSERTIONS=On ninja check-mlir	2021-03-08 00:15:47 +00:00
Juneyoung Lee	99108c791d	[SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe This patch makes FoldBranchToCommonDest merge branch conditions into `select i1` rather than `and/or i1` when it is called by SimplifyCFG. It is known that merging conditions into and/or is poison-unsafe, and this is towards making things more correct by removing possible miscompilations. Currently, InstCombine simply consumes these selects into and/or of i1 (which is also unsafe), so the visible effect would be very small. The unsafe select -> and/or transformation will be removed in the future. There has been efforts for updating optimizations to support the select form as well, and they are linked to D93065. The safe transformation is fired when it is called by SimplifyCFG only. This is done by setting the new `PoisonSafe` argument as true. Another place that calls FoldBranchToCommonDest is LoopSimplify. `PoisonSafe` flag is set to false in this case because enabling it has a nontrivial impact in performance because SCEV is more conservative with select form and InductiveRangeCheckElimination isn't aware of select form of and/or i1. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D95026	2021-03-08 01:38:03 +09:00
Fangrui Song	bb6732cf62	[MC] Add parseEOL() overload and migrate some parseToken(AsmToken::EndOfStatement) to parseEOL() For many directives, the following diagnostics * `error: unexpected token` * `error: unexpected token in '.abort' directive"` are replaced with `error: expected newline`. `unexpected token` may make the user think a different token is needed. `expected newline` is clearer about the expected token. For `in '...' directive`, the directive name is not useful because the next line replicates the error line which includes the directive.	2021-03-06 17:45:23 -08:00
Fangrui Song	d96af2ed2d	[MC] Support .symver , , remove As a resolution to https://sourceware.org/bugzilla/show_bug.cgi?id=25295 , GNU as from binutils 2.35 supports the optional third argument for the .symver directive. 'remove' for a non-default version is useful: `.symver def_v1, def@v1, remove` => def_v1 is not retained in the symbol table. Previously the user has to strip the original symbol or specify a `local:` version node in a version script to localize the symbol. `.symver def, def@@v1, remove` and `.symver def, def@@@v1, remove` are supported as well, though they are identical to `.symver def, def@@@v1`. local/hidden are not useful so this patch does not implement them.	2021-03-06 15:23:02 -08:00
Vy Nguyen	f8b01d54c3	Reland `293e8fa13d` [llvm-exegesis] Disable the LBR check on AMD https://bugs.llvm.org/show_bug.cgi?id=48918 The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW. Theory we've got is that the lbr-checking code could be confused on AMD. Differential Revision: https://reviews.llvm.org/D97504 New change: - Surround usages of x86 helper in llvm-exegesis/X86/Target.cpp with ifdef - Fix bug which caused the caller of getVendorSignature to not have a copy of EAX that it expected.	2021-03-05 13:23:42 -05:00
Philip Reames	f352463ade	Mark gc.relocate and gc.result as readnone For some reason, we had been marking gc.relocates as reading memory. There's no known reason for this, and I suspect it to be a legacy of very early implementation conservatism. gc.relocate and gc.result are simply projections of the return values from the associated statepoint. Note that the LangRef has always declared them readnone. The EarlyCSE change is simply moving the special casing from readonly to readnone handling. As noted by the test diffs, this does allow some additional CSE when relocates are separated by stores, but since we generate gc.relocates in batches, this is unlikely to help anything in practice. This was reviewed as part of https://reviews.llvm.org/D97974, but split at reviewer request before landing. The motivation is to enable the GVN changes in that patch.	2021-03-05 10:07:17 -08:00
gbtozers	65600cb2a7	[DebugInfo] Add DIArgList MD to store multple values in DbgVariableIntrinsics This patch adds a new metadata node, DIArgList, which contains a list of SSA values. This node is in many ways similar in function to the existing ValueAsMetadata node, with the difference being that it tracks a list instead of a single value. Internally, it uses ValueAsMetadata to track the individual values, but there is also a reasonable amount of DIArgList-specific value-tracking logic on top of that. Similar to ValueAsMetadata, it is a special case in parsing and printing due to the fact that it requires a function state (as it may reference function-local values). This patch should not result in any immediate functional change; it allows for DIArgLists to be parsed and printed, but debug variable intrinsics do not yet recognize them as a valid argument (outside of parsing). Differential Revision: https://reviews.llvm.org/D88175	2021-03-05 17:02:24 +00:00
Stephen Tozer	f677413071	Reapply "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values" Rewrites test to use correct architecture triple; fixes incorrect reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour so as to not be dependent on changes elsewhere in the patch stack. This reverts commit `d2000b45d0`.	2021-03-05 12:32:05 +00:00
Jingu Kang	9b302513f6	[AArch64] Add missing intrinsics for vrnd	2021-03-05 11:26:12 +00:00
Simon Pilgrim	6955524c2f	Fix Wdocumentation unknown parameter warning. NFCI.	2021-03-05 11:24:44 +00:00
Simon Pilgrim	3fd2fa1220	Revert rG8198d83965ba4b9db6922b44ef3041030b2bac39: "[X86] Pass to transform amx intrinsics to scalar operation." This reverts commit 8198d83965ba4b9db6922b44ef3041030b2bac39.due to buildbot breakages	2021-03-05 11:09:14 +00:00
Andy Wingo	a5a3659de7	[WebAssembly][yaml2obj][obj2yaml] Elem sections for nonzero tables With reference types, tables can have non-zero table numbers. This commit adds support for element sections against these tables. Differential Revision: https://reviews.llvm.org/D97923	2021-03-05 11:45:15 +01:00
Petar Avramovic	d44f61f81c	Reland [GlobalISel] Combine zext(trunc x) to x Recommit `4112299ee7`. Depends on `4c8fb7ddd6` which was reverted. Combine zext(trunc x) to x when truncated bits are known to be zero. Differential Revision: https://reviews.llvm.org/D96031	2021-03-05 11:05:37 +01:00
Luo, Yuanke	8198d83965	[X86] Pass to transform amx intrinsics to scalar operation. This pass runs in any situations but we skip it when it is not O0 and the function doesn't have optnone attribute. With -O0, the def of shape to amx intrinsics is near the amx intrinsics code. We are not able to find a point which post-dominate all the shape and dominate all amx intrinsics. To decouple the dependency of the shape, we transform amx intrinsics to scalar operation, so that compiling doesn't fail. In long term, we should improve fast register allocation to allocate amx register. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93594	2021-03-05 16:02:02 +08:00
Fangrui Song	063b19dea6	[DebugInfo] Delete unused DIVariable::getSource	2021-03-04 21:53:13 -08:00
Fangrui Song	9e28b89827	[DebugInfo] Delete deleted getLine/getColumn r250405 deleted the functions from DILexicalBlockBase.	2021-03-04 21:49:21 -08:00
Michael Kruse	b119120673	[clang][OpenMP] Use OpenMPIRBuilder for workshare loops. Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to: * Only the worksharing-loop directive. * Recognizes only the nowait clause. * No loop nests with more than one loop. * Untested with templates, exceptions. * Semantic checking left to the existing infrastructure. This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics: * The distance function: How many loop iterations there will be before entering the loop nest. * The loop variable function: Conversion from a logical iteration number to the loop variable. These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang. The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does. For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting. Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset. Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D94973	2021-03-04 22:52:59 -06:00
Wei Mi	2357d29335	[SampleFDO] Another fix to prevent repeated indirect call promotion in sample loader pass. In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9, to prevent repeated indirect call promotion for the same indirect call and the same target, we used zero-count value profile to indicate an indirect call has been promoted for a certain target. We removed PromotedInsns cache in the same patch. However, there was a problem in that patch described below, and that problem led me to add PromotedInsns back as a mitigation in https://reviews.llvm.org/rG4ffad1fb489f691825d6c7d78e1626de142f26cf. When we get value profile from metadata by calling getValueProfDataFromInst, we need to specify the maximum possible number of values we expect to read. We uses MaxNumPromotions in the last patch so the maximum number of value information extracted from metadata is MaxNumPromotions. If we have many values including zero-count values when we write the metadata, some of them will be dropped when we read them because we only read MaxNumPromotions values. It will allow repeated indirect call promotion again. We need to make sure if there are values indicating promoted targets, those values need to be saved in metadata with higher priority than other values. The patch fixed that problem. We change to use -1 to represent the count of a promoted target instead of 0 so it is easier to sort the values. When we prepare to update the metadata in updateIDTMetaData, we will sort the values in the descending count order and extract only MaxNumPromotions values to write into metadata. Since -1 is the max uint64_t number, if we have equal to or less than MaxNumPromotions of -1 count values, they will all be kept in metadata. If we have more than MaxNumPromotions of -1 count values, we will only save MaxNumPromotions such values maximally. In such case, we have logic in place in doesHistoryAllowICP to guarantee no more promotion in sample loader pass will happen for the indirect call, because it has been promoted enough. With this change, now we can remove PromotedInsns without problem. Differential Revision: https://reviews.llvm.org/D97350	2021-03-04 18:44:12 -08:00
Chen Zheng	87bbf3d1f8	[XCOFF][DebugInfo] support DWARF for XCOFF for assembly output. Reviewed By: jasonliu Differential Revision: https://reviews.llvm.org/D95518	2021-03-04 21:07:52 -05:00
David Blaikie	a2a55def35	Move llvm/Analysis/ObjCARCUtil.h to IR to fix layering. This is included from IR files, and IR doesn't/can't depend on Analysis (because Analysis depends on IR). Also fix the implementation - don't use non-member static in headers, as it leads to ODR violations, inaccurate "unused function" warnings, etc. And fix the header protection macro name (we don't generally include "LIB" in the names, so far as I can tell).	2021-03-04 16:14:53 -08:00
Francis Visoiu Mistrih	365b78396a	[Remarks] Emit variable info in auto-init remarks This enhances the auto-init remark with information about the variable that is auto-initialized. This is based of debug info if available, or alloca names (mostly for development purposes). ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes.Variables: var (4096 bytes). [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` This allows to see things like partial initialization of a variable that the optimizer won't be able to completely remove. Differential Revision: https://reviews.llvm.org/D97734	2021-03-04 12:51:22 -08:00
Nicolas Guillemot	6b8cf7356c	Revert "[Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream" This reverts commit `7479a2e00b`. This commit causes compile errors on clang-x64-windows-msvc, so I'm reverting the patch for now. For reference, the error in question is: ``` error C2280: 'llvm::raw_ostream_iterator<char,char> &llvm::raw_ostream_iterator<char,char>::operator =(const llvm::raw_ostream_iterator<char,char> &)': attempting to reference a deleted function note: compiler has generated 'llvm::raw_ostream_iterator<char,char>::operator =' here note: 'llvm::raw_ostream_iterator<char,char> &llvm::raw_ostream_iterator<char,char>::operator =(const llvm::raw_ostream_iterator<char,char> &)': function was implicitly deleted because 'llvm::raw_ostream_iterator<char,char>' has a data member 'llvm::raw_ostream_iterator<char,char>::OutStream' of reference type ```	2021-03-04 12:46:58 -08:00
Akira Hatanaka	1900503595	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `ed4718eccb`, which was reverted because it was causing a miscompile. The bug that was causing the miscompile has been fixed in `75805dce5f`. Original commit message: Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-03-04 11:22:30 -08:00
Nicolas Guillemot	7479a2e00b	[Support] Add raw_ostream_iterator: ostream_iterator for raw_ostream Adds a class `raw_ostream_iterator` that behaves like std::ostream_iterator, but can be used with raw_ostream. This is useful for using raw_ostream with std algorithms. For example, it can be used to output std containers as follows: ``` std::vector<int> V = { 1, 2, 3 }; std::copy(V.begin(), V.end(), raw_ostream_iterator<int>(outs(), ", ")); // Output: "1, 2, 3, " ``` The API tries to follow std::ostream_iterator as closely as is practically possible. Reviewed By: dblaikie, mkitzan Differential Revision: https://reviews.llvm.org/D78795	2021-03-04 11:12:48 -08:00
Adrian Prantl	d268febc56	Improve the debug info for coro-split .resume functions This patch updates the scope line to point to the suspend point. This makes the first address in the function point to the first source line in the resume function rather than the function declaration. Without this the line table "jumps" from the beginning of the function to the suspend point at the beginning. rdar://73386346 Differential Revision: https://reviews.llvm.org/D97345	2021-03-04 11:05:35 -08:00
Albion Fung	36192790d8	[PowerPC][PC Rel] Implement option to omit Power10 instructions from stubs Implemented the option to omit Power10 instructions from save stubs via the option --no-power10-stubs or --power10-stubs=no on lld. --power10-stubs= will override the other option. --power10-stubs=auto also exists to use the default behaviour (ie allow Power10 instructions in stubs). Differential Revision: https://reviews.llvm.org/D94627	2021-03-04 13:27:46 -05:00
Nico Weber	76148caa50	Revert "[llvm-exegesis] Disable the LBR check on AMD" This reverts commit `293e8fa13d`. Breaks build on non-intel hosts, see e.g. http://45.33.8.238/macm1/4600/step_3.txt	2021-03-04 11:48:33 -05:00
Xiangling Liao	e9f9ec837d	[CMake][AIX] Adjust plugin library extension used on AIX As stated in the CMake manual, we are supposed to use MODULE rules to generate plugin libraries: "MODULE libraries are plugins that are not linked into other targets but may be loaded dynamically at runtime using dlopen-like functionality" Besides, LLVM's plugin infrastructure fits with the AIX treatment of .so shared objects more than it fits with the AIX treatment of .a library archives (which may contain shared objects). Differential revision: https://reviews.llvm.org/D96282	2021-03-04 11:23:06 -05:00
Vy Nguyen	293e8fa13d	[llvm-exegesis] Disable the LBR check on AMD https://bugs.llvm.org/show_bug.cgi?id=48918 The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW. Theory we've got is that the lbr-checking code could be confused on AMD. Differential Revision: https://reviews.llvm.org/D97504	2021-03-04 11:16:38 -05:00
Sanjay Patel	36a489d194	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC Similar to `b3a33553ae`, but this shows a TODO and a potential miscompile is already present. We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification may be possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers. The new test shows that there is an existing bug somewhere in the callers. We assumed that the original code was fully 'fast' and so we produced IR with 'fast' even though it was just 'reassoc'.	2021-03-04 10:40:26 -05:00
Nico Weber	59beb1ef6d	Revert "[GlobalISel] Combine zext(trunc x) to x" This reverts commit `4112299ee7`. Seems to depend on `4c8fb7ddd6` which is being reverted.	2021-03-04 10:13:40 -05:00
Petar Avramovic	4112299ee7	[GlobalISel] Combine zext(trunc x) to x Combine zext(trunc x) to x when truncated bits are known to be zero. Differential Revision: https://reviews.llvm.org/D96031	2021-03-04 15:05:23 +01:00
Sanjay Patel	b3a33553ae	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification seems possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers.	2021-03-04 08:53:04 -05:00
Stephen Tozer	d2000b45d0	Revert "[DebugInfo] Add new instruction and DIExpression operator for variadic debug values" This reverts commit `d07f106f4a`.	2021-03-04 11:59:21 +00:00
gbtozers	d07f106f4a	[DebugInfo] Add new instruction and DIExpression operator for variadic debug values This patch adds a new instruction that can represent variadic debug values, DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set of basic code changes in MachineInstr and a few adjacent areas, but does not correctly handle variadic debug values outside of these areas, nor does it generate them at any point. The new instruction is similar to the existing DBG_VALUE instruction, with the following differences: the operands are in a different order, any number of values may be used in the instruction following the Variable and Expression operands (these are referred to in code as “debug operands”) and are indexed from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the expression. The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the index given by the argument onto the Expression stack. For example the sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at index 0 onto the expression stack.” Differential Revision: https://reviews.llvm.org/D82363	2021-03-04 11:45:35 +00:00
Andrew Savonichev	d791695cb5	[MCA] Add support for in-order CPUs This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. In-order pipeline implements a simplified version of Dispatch, Scheduler and Execute stages as a single stage. Entry and Retire stages are common for both in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928	2021-03-04 14:08:19 +03:00
Fraser Cormack	c907681b07	[NFC] Fix typos in CallingConvLower.h	2021-03-04 10:29:14 +00:00
Hongtao Yu	c75da238b4	[CSSPGO] Deduplicating dangling pseudo probes. Same dangling probes are redundant since they all have the same semantic that is to rely on the counts inference tool to get reasonable count for the same original block. Therefore, there's no need to keep multiple copies of them. I've seen jump threading created tons of redundant dangling probes that slowed down the compiler dramatically. Other optimization passes can also result in redundant probes though without an observed impact so far. This change removes block-wise redundant dangling probes specifically introduced by jump threading. To support removing redundant dangling probes caused by all other passes, a final function-wise deduplication is also added. An 18% size win of the .pseudo_probe section was seen for SPEC2017. No performance difference was observed. Differential Revision: https://reviews.llvm.org/D97482	2021-03-03 22:44:42 -08:00
Hongtao Yu	8985515822	[CSSPGO] Unblocking optimizations by dangling pseudo probes. This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments. The optimizations being unblocked are: 1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted. 2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions. 3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks. Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D97481	2021-03-03 22:44:42 -08:00
Hongtao Yu	ad2a59f584	[CSSPGO] Introducing dangling pseudo probes. Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block. To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission. If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95962	2021-03-03 22:44:41 -08:00
Johannes Doerfert	5b70c12f3e	[Attributor] Make DepClass a required argument We often used a sub-optimal dependence class in the past because we didn't see the argument. Let's make it explicit so we remember to think about it.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	e592dad82e	[Attributor] Fold "TrackDependence" into the DepClassTy enum We don't need a bool and an enum to express the three options we currently have. This makes the interface nicer and much easier to use optional dependencies. Also avoids mistakes where the bool is false and enum ignored.	2021-03-04 00:35:52 -06:00
Johannes Doerfert	f3f88287c5	[Attributor] Use known alignment as lower bound to avoid work If we know already more than available from a use, we don't need to invest time on it.	2021-03-04 00:35:52 -06:00
Philip Reames	99f5417346	Sink routine for replacing a operand bundle to CallBase [NFC] We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.	2021-03-03 12:07:55 -08:00
Fangrui Song	a84f4fc0df	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker does not let `__start_/__stop_` references retain their sections. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on COFF/Mach-O are not affected. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D97649	2021-03-03 11:32:24 -08:00
Philip Reames	e6e5ef40cb	[basicaa] Fix a latent bug in isGEPBaseAtNegativeOffset This was pointed out in review of D97520 by Nikita, but existed in the original code as well. The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr. The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" geps. Otherwise, as can be seen in the recently added and changes in this patch test, we can end up with a large commulative offset with only a small sub-offset actually being "inbounds". If that small sub-offset lies within the object, the result was unsound. We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.	2021-03-03 08:43:32 -08:00
Philip Reames	dd9922c487	[basicaa] Minor indentation fix	2021-03-03 08:33:19 -08:00
Arnold Schwaighofer	a42bea211a	[coro async] Allow a coro.suspend.async to specify which argument is the context argument Before we used the same argument as the entry point. The resume partial function might want to use a different ABI for its context argument Differential Revision: https://reviews.llvm.org/D97333	2021-03-03 08:27:37 -08:00
Nico Weber	64f5d7e972	Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF" This reverts commit `04c3040f41`. Breaks instrprof-value-merge.c in bootstrap builds.	2021-03-03 10:21:17 -05:00
Hans Wennborg	0a5dd06718	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR" This caused miscompiles of Chromium tests for iOS due clobbering of live registers. See discussion on the code review for details. > Background: > > This fixes a longstanding problem where llvm breaks ARC's autorelease > optimization (see the link below) by separating calls from the marker > instructions or retainRV/claimRV calls. The backend changes are in > https://reviews.llvm.org/D92569. > > https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue > > What this patch does to fix the problem: > > - The front-end adds operand bundle "clang.arc.attachedcall" to calls, > which indicates the call is implicitly followed by a marker > instruction and an implicit retainRV/claimRV call that consumes the > call result. In addition, it emits a call to > @llvm.objc.clang.arc.noop.use, which consumes the call result, to > prevent the middle-end passes from changing the return type of the > called function. This is currently done only when the target is arm64 > and the optimization level is higher than -O0. > > - ARC optimizer temporarily emits retainRV/claimRV calls after the calls > with the operand bundle in the IR and removes the inserted calls after > processing the function. > > - ARC contract pass emits retainRV/claimRV calls after the call with the > operand bundle. It doesn't remove the operand bundle on the call since > the backend needs it to emit the marker instruction. The retainRV and > claimRV calls are emitted late in the pipeline to prevent optimization > passes from transforming the IR in a way that makes it harder for the > ARC middle-end passes to figure out the def-use relationship between > the call and the retainRV/claimRV calls (which is the cause of > PR31925). > > - The function inliner removes an autoreleaseRV call in the callee if > nothing in the callee prevents it from being paired up with the > retainRV/claimRV call in the caller. It then inserts a release call if > claimRV is attached to the call since autoreleaseRV+claimRV is > equivalent to a release. If it cannot find an autoreleaseRV call, it > tries to transfer the operand bundle to a function call in the callee. > This is important since the ARC optimizer can remove the autoreleaseRV > returning the callee result, which makes it impossible to pair it up > with the retainRV/claimRV call in the caller. If that fails, it simply > emits a retain call in the IR if retainRV is attached to the call and > does nothing if claimRV is attached to it. > > - SCCP refrains from replacing the return value of a call with a > constant value if the call has the operand bundle. This ensures the > call always has at least one user (the call to > @llvm.objc.clang.arc.noop.use). > > - This patch also fixes a bug in replaceUsesOfNonProtoConstant where > multiple operand bundles of the same kind were being added to a call. > > Future work: > > - Use the operand bundle on x86-64. > > - Fix the auto upgrader to convert call+retainRV/claimRV pairs into > calls with the operand bundles. > > rdar://71443534 > > Differential Revision: https://reviews.llvm.org/D92808 This reverts commit `ed4718eccb`.	2021-03-03 15:51:40 +01:00
Hans Wennborg	ddf43e5130	revert llvm/include/llvm/Analysis/ObjCARCUtil.h part of `1cc558bd4f`	2021-03-03 15:50:02 +01:00
Matt Arsenault	78dcff4841	GlobalISel: Add default implementation of assignValueToReg Refactor insertion of the asserting ops. This enables using them for AMDGPU. This code should essentially be the same for every target. Mips, X86 and ARM all have different code there now, but this seems to be an accident. The assignment functions are called with different types than they would be in the DAG, so this is all likely an assortment of hacks to get around that.	2021-03-03 09:29:53 -05:00
Piotr Sobczak	4672bac177	[AMDGPU] Introduce Strict WQM mode * Add amdgcn_strict_wqm intrinsic. * Add a corresponding STRICT_WQM machine instruction. * The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active. * The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96258	2021-03-03 14:19:16 +01:00
Piotr Sobczak	c3ce7bae80	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm * Introduce the new intrinsic amdgcn_strict_wwm * Deprecate the old intrinsic amdgcn_wwm The change is done for consistency as the "strict" prefix will become an important, distinguishing factor between amdgcn_wqm and amdgcn_strictwqm in the future. The "strict" prefix indicates that inactive lanes do not take part in control flow, specifically an inactive lane enabled by a strict mode will always be enabled irrespective of control flow decisions. The amdgcn_wwm will be removed, but doing so in two steps gives users time to switch to the new name at their own pace. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D96257	2021-03-03 09:33:57 +01:00
Carl Ritson	2ddac69f98	[AMDGPU] Rename llvm.amdgcn.msaa.load to llvm.amdgcn.msaa.load.x While the underlying instruction is called image_msaa_load, the resource must be x component only. Rename the intrinsic for clarity. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97829	2021-03-03 17:30:39 +09:00
Victor Huang	1756b2adc9	[AIX][TLS] Generate TLS variables in assembly files This patch allows generating TLS variables in assembly files on AIX. Initialized and external uninitialized variables are generated with the .csect pseudo-op and local uninitialized variables are generated with the .comm/.lcomm pseudo-ops. The patch also adds a check to explicitly say that TLS is not yet supported on AIX. Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile Originally patched by: bsaleil Commandeered by: NeHuang Differential Revision: https://reviews.llvm.org/D96184	2021-03-02 18:22:48 -06:00
Arthur Eubanks	99f1e86cbb	[opt] Error if -debug-pass is specified alongside the new PM Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D97810	2021-03-02 15:59:28 -08:00
Nikita Popov	29034f3876	[AST] Remove unused Loop member (NFC) To fix some build bots after D89264.	2021-03-02 23:11:51 +01:00
Nikita Popov	3d8f842712	[LICM] Make promotion faster Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a signicant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-02 22:10:48 +01:00
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
Sanjay Patel	415c67ba4c	[SDAG] allow partial undef vector constants with select->logic folds This is an enhancement suggested in the original review/commit: D97730 / `7fce3322a2`	2021-03-02 14:29:15 -05:00
Vy Nguyen	9a2e2de15f	[lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD. Currently, it was delibrately impleneted to not handle this case, but as it has turnt out, we need this feature. The concrete use case is `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then rexports /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation The current implemention uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree. This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it. The right thing should be: - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD. - When loading from an actual dylib, no additional TBD documents need to be examined. - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file Differential Revision: https://reviews.llvm.org/D97438	2021-03-02 12:14:31 -05:00
Krzysztof Parzyszek	d96b5e606a	[TableGen] Add IntrNoMerge as intrinsic property There is a function attribute 'nomerge' in addition to 'noduplicate' and 'convergent'. Both 'noduplicate' and 'convergent' have corresponding intrinsic properties. This patch adds an intrinsic property for the 'nomerge' attribute. Differential Revision: https://reviews.llvm.org/D96364	2021-03-02 09:04:50 -08:00
dfukalov	6e967834b9	[AA] Cache (optionally) estimated PartialAlias offsets. For the cases of two clobbering loads and one loaded object is fully contained in the second `BasicAAResult::aliasGEP` returns just `PartialAlias` that is actually more common case of partial overlap, it doesn't say anything about actual overlapping sizes. AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs with non-constant offsets. The change stores estimated relative offsets so they can be used further. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93529	2021-03-02 19:04:15 +03:00
Tim Northover	888c5c24ca	AArch64: report fp16 arithmetic is present for apple-a11 CPU. AArch64.td got it right, but the target-parser dropped it, leading to missing feature flags in Clang.	2021-03-02 15:07:18 +00:00
Stefan Gränitz	ef2389235c	[Orc] Add JITLink debug support plugin for ELF x86-64 Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that. The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a `DebugObject` form the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`. This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them. Once JITLink emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending`DebugObject` as registered for the corresponding MaterializationResponsibility. The `DebugObjectRegistrar` registers the `DebugObject` with the target process. `llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future. [1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97335	2021-03-02 15:07:35 +01:00
Stefan Gränitz	48c2acff0c	[JITLink] LinkGraph::getName() can be const	2021-03-02 15:07:34 +01:00
Jan Svoboda	4545813b17	[clang][cli] NFC: Rename marshalling multiclass The new name drops `String` from `MarshallingInfoStringInt`, which follows the naming convention of other marshalling multiclasses.	2021-03-02 11:53:40 +01:00
Yuanfang Chen	5de2d189e6	[Diagnose] Unify MCContext and LLVMContext diagnosing The situation with inline asm/MC error reporting is kind of messy at the moment. The errors from MC layout are not reliably propagated and users have to specify an inlineasm handler separately to get inlineasm diagnose. The latter issue is not a correctness issue but could be improved. * Kill LLVMContext inlineasm diagnose handler and migrate it to use DiagnoseInfo/DiagnoseHandler. * Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This covers use cases like inlineasm, MC, and any clients using SourceMgr. * Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all use cases, only one of them is used. * If LLVMContext is available, let MCContext uses LLVMContext's diagnose handler; if LLVMContext is not available, MCContext uses its own default diagnose handler which just prints SMDiagnostic. * Change a few clients(Clang, llc, lldb) to use the new way of reporting. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D97449	2021-03-01 15:58:37 -08:00
Matt Arsenault	0131498402	GlobalISel: Remove dead code Generic code should probably not introduce G_INSERT/G_EXTRACT. The mirror unpackRegs should also be removed, but AMDGPU still has a use remaining which needs to be fixed.	2021-03-01 17:06:43 -05:00
Fangrui Song	04c3040f41	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker no longer lets `__start_/__stop_` references retain them. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on other COFF/Mach-O are not affected. Differential Revision: https://reviews.llvm.org/D97649	2021-03-01 13:43:23 -08:00
Nicolas Guillemot	6fb6bdff37	Fix the value_type of defusechain_iterator to match its operator() defusechain_iterator has an operator() and operator->() that return references to a MachineOperand, but its "reference" and "pointer" typedefs are set as if the iterator returns a MachineInstr reference. This causes compilation errors when defusechain_iterator is used in generic code that uses the "reference" and "pointer" typedefs. This patch fixes this by updating the typedefs to use MachineOperand instead of MachineInstr. Reviewed By: mkitzan Differential Revision: https://reviews.llvm.org/D97522	2021-03-01 10:41:10 -08:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Jay Foad	216dee9170	[AMDGPU] Add IntrWillReturn to recently added intrinsics This adds IntrWillReturn to the gfx90a mfma intrinsics, to match all the other mfma intrinsics, and llvm.amdgcn.live.mask, to match llvm.amdgcn.ps.live. Differential Revision: https://reviews.llvm.org/D97675	2021-03-01 17:35:26 +00:00
Juneyoung Lee	c89d9d8a48	[TTI] Consider select form of and/or i1 as having arithmetic cost This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b` as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`. Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe. This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked. These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97360	2021-03-02 02:18:19 +09:00
Andy Wingo	2632ba6a35	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via TABLE_NUMBER relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. Differential Revision: https://reviews.llvm.org/D90948	2021-03-01 16:49:00 +01:00
Masoud Ataei	5fe0cab79e	[PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable function with MASSV Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call to be inlined. Inline sqrt is faster than MASSV call on leppc. Differential Revision: https://reviews.llvm.org/D97487	2021-03-01 15:42:19 +00:00
Jay Foad	796a60d2ea	[AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32) The expected use case is for frontends to insert this into shaders that are to be run under a debugger. The shader can then be resumed or single stepped from the point of the call under debugger control. Differential Revision: https://reviews.llvm.org/D97670	2021-03-01 14:30:23 +00:00
Matt Arsenault	6c260d3bc0	GlobalISel: Move splitToValueTypes to generic code I copied the nearly identical function from AArch64 into AMDGPU, so fix this duplication. Mips and X86 have their own more exotic versions which should be removed. However replacing those is better left for a separate patch since it requires other changes to avoid regressions.	2021-03-01 08:58:18 -05:00
Florian Hahn	53dacb7b67	[LV] Generate RT checks up-front and remove them if required. This patch updates LV to generate the runtime checks just after cost modeling, to allow a more precise estimate of the actual cost of the checks. This information will be used in future patches to generate larger runtime checks in cases where the checks only make up a small fraction of the expected scalar loop execution time. The runtime checks are created up-front in a temporary block to allow better estimating the cost and un-linked from the existing IR. After deciding to vectorize, the checks are moved backed. If deciding not to vectorize, the temporary block is completely removed. This patch is similar in spirit to D71053, but explores a different direction: instead of delaying the decision on whether to vectorize in the presence of runtime checks it instead optimistically creates the runtime checks early and discards them later if decided to not vectorize. This has the advantage that the cost-modeling decisions can be kept together and can be done up-front and thus preserving the general code structure. I think delaying (part) of the decision to vectorize would also make the VPlan migration a bit harder. One potential drawback of this patch is that we speculatively generate IR which we might have to clean up later. However it seems like the code required to do so is quite manageable. Reviewed By: lebedev.ri, ebrevnov Differential Revision: https://reviews.llvm.org/D75980	2021-03-01 10:48:04 +00:00
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
Juneyoung Lee	5419b67137	[SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally This is a minor change that fixes FoldTwoEntryPHINode to handle phis with and/ors of select form and binop form equally.	2021-03-01 13:34:51 +09:00
Kazu Hirata	b4bed1cb24	[IR] Use range-based for loops (NFC)	2021-02-28 10:59:23 -08:00
Kazu Hirata	d639120983	[llvm] Use set_is_subset (NFC)	2021-02-28 10:59:20 -08:00
William S. Moses	b077d82b00	[Attributor] Conditinoally delete fns Allow the attributor to delete functions only if requested Differential Revision: https://reviews.llvm.org/D97238	2021-02-27 20:37:42 -05:00
Heejin Ahn	aa097ef8d4	[WebAssembly] Fix reverse mapping in WasmEHFuncInfo D97247 added the reverse mapping from unwind destination to their source, but it had a critical bug; sources can be multiple, because multiple BBs can have a single BB as their unwind destination. This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes it return a vector rather than a single BB. It does not return the const reference to the existing vector but creates a new vector because `WasmEHFuncInfo` stores not `BasicBlock` or `MachineBasicBlock` but `PointerUnion` of them. Also I hoped to unify those methods for `BasicBlock` and `MachineBasicBlock` into one using templates to reduce duplication, but failed because various usages require `BasicBlock*` to be `const` but it's hard to make it `const` for `MachineBasicBlock` usages. Fixes https://github.com/emscripten-core/emscripten/issues/13514. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97583	2021-02-26 17:12:10 -08:00
Fangrui Song	47c5576d7d	ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects If a global object is listed in `@llvm.used`, place it in a unique section with the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections` with LLD>=13 or GNU ld>=2.36. For front ends which do not expect to see multiple sections of the same name, consider emitting `@llvm.compiler.used` instead of `@llvm.used`. SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in binutils. We don't do the restriction - see the rationale in D95749. The integrated assembler has supported SHF_GNU_RETAIN since D95730. GNU as>=2.36 supports section flag 'R'. We don't need to worry about GNU ld support because older GNU ld just ignores the unknown SHF_GNU_RETAIN. With this change, `__attribute__((retain))` functions/variables emitted by clang will get the SHF_GNU_RETAIN flag. Differential Revision: https://reviews.llvm.org/D97448	2021-02-26 16:38:44 -08:00
Philip Reames	f2cfef3596	Be more mathematicly precise about definition of recurrence [NFC] This clarifies the interface of the matchSimpleRecurrence helper introduced in `8020be0b8` for non-commutative operators. After `ebd3aeba`, I realized the original way I framed the routine was inconsistent. For shifts, we only matched the the LHS form, but for sub we matched both and the caller wanted that information. So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed. I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks. (It was for me.)	2021-02-26 11:22:01 -08:00
Philip Reames	ebd3aeba27	Use helper introduced in `8020be0b8` to simplify ValueTracking [NFC] Direct rewrite of the code the helper was extracted from.	2021-02-26 10:47:26 -08:00
Philip Reames	8020be0b8b	Add a helper for matching simple recurrence cycles This helper came up in another review, and I've got about 4 different patches with copies of this copied into it. Time to precommit the routine. :)	2021-02-26 10:21:23 -08:00
Mircea Trofin	a2bfc43ae1	[NFC] Const-ed 2 APIs in VirtRegMap	2021-02-26 09:32:42 -08:00
Mircea Trofin	a00f7dc2d5	[NFC] MCRegister fixes in RegisterClassInfo, and const-ed APIs	2021-02-26 08:53:57 -08:00
Vladislav Vinogradov	9909237d99	[ADT][NFC] Add extra typedefs to `ArrayRef` and `MutableArrayRef` * `value_type` * `pointer` * `const_pointer` * `reference` * `const_reference` * `const_reverse_iterator` * `size_type` * `difference_type` It makes `ArrayRef` and `MutableArrayRef` types fully compliant with STL Container concept. Reviewed By: lattner, courbet Differential Revision: https://reviews.llvm.org/D95611	2021-02-26 18:37:08 +03:00
Evgeniy Brevnov	13a5cac2ba	Revert "[NARY-REASSOCIATE] Support reassociation of min/max" This reverts commit `83d134c3c4`.	2021-02-26 19:47:54 +07:00
Stefan Gränitz	406ef36b03	[Orc] Use extensible RTTI for the orc::ObjectLayer class hierarchy So far we had no way to distinguish between JITLink and RuntimeDyld in lli. Instead, we used implicit knowledge that RuntimeDyld would be used for linking ELF. In order to get D97337 to work with lli though, we have to move on and allow JITLink for ELF. This patch uses extensible RTTI to allow external clients to add their own layers without touching the LLVM sources. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97338	2021-02-26 13:13:05 +01:00
Chen Zheng	d39bc36b1b	[debug-info] refactor emitDwarfUnitLength remove `Hi` `Lo` argument from `emitDwarfUnitLength`, so we can make caller of emitDwarfUnitLength easier. Reviewed By: MaskRay, dblaikie, ikudrin Differential Revision: https://reviews.llvm.org/D96409	2021-02-25 21:00:25 -05:00
James Y Knight	24539f1ef2	Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg. And then push those change throughout LLVM. Keep the old signature in Clang's CGBuilder for now -- that will be updated in a follow-on patch (D97224). The MLIR LLVM-IR dialect is not updated to support the new alignment attribute, but preserves its existing behavior. Differential Revision: https://reviews.llvm.org/D97223	2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih	fee9abe69c	[Remarks] Provide more information about auto-init calls This now analyzes calls to both intrinsics and functions. For intrinsics, grab the ones we know and care about (mem* family) and analyze the arguments. For calls, use TLI to get more information about the libcalls, then analyze the arguments if known. ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` Differential Revision: https://reviews.llvm.org/D97489	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	4753a69a31	[Remarks] Provide more information about auto-init stores This adds support for analyzing the instruction with the !annotation "auto-init" in order to generate a more user-friendly remark. For now, support the store size, and whether it's atomic/volatile. Example: ``` auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks] int var; ^ ``` Differential Revision: https://reviews.llvm.org/D97412	2021-02-25 15:14:09 -08:00
Adrian Prantl	00b3f2f310	Add more historic DWARF vendor extensions The maintainer of libdwarf kindly provided this patch with a bunch of historic DWARF extensions that are missing from Dwarf.def. This list is helpful to avoid potential conflicts in the user-defined vendor extension space in the future. Patch by David Anderson! [Relanded with an updated test.] Differential Revision: https://reviews.llvm.org/D97242	2021-02-25 15:09:42 -08:00
Richard Smith	95d0d8e9e9	Fix constructor declarations that are invalid in C++20 onwards. Under C++ CWG DR 2237, the constructor for a class template C must be written as 'C(...)' not as 'C<T>(...)'. This fixes a build failure with GCC in C++20 mode. In passing, remove some other redundant '<T>' qualification from the affected classes.	2021-02-25 14:25:01 -08:00
Fangrui Song	4d63892acb	[SanitizerCoverage] Drop !associated on metadata sections In SanitizerCoverage, the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After inlining, such a `__sancov_*` section can be referenced by more than one functions, but its sh_link still refers to the original function's section. (Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to section violates the invariant.) If the original function's section is discarded (e.g. LTO internalization + `ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error. This above reasoning means that `!associated` is not appropriate to be called by an inlinable function. Non-interposable functions are inline candidates, so we have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections but is expected to parallel a metadata section, so we have to make sure the two sections are retained or discarded at the same time. A section group does the trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return non-null.) For interposable functions, we could keep using `!associated`, but LTO can change the linkage to `internal` and allow such functions to be inlinable, so we have to drop `!associated`, too. To not interfere with section group resolution, we need to use the `noduplicates` variant (section group flag 0). (This allows us to get rid of the ModuleID parameter.) In -fno-pie and -fpie code (mostly dso_local), instrumented interposable functions have WeakAny/LinkOnceAny linkages, which are rare. So the section group header overload should be low. This patch does not change the object file output for COFF (where `!associated` is ignored). Reviewed By: morehouse, rnk, vitalybuka Differential Revision: https://reviews.llvm.org/D97430	2021-02-25 11:59:23 -08:00
Stanislav Mekhanoshin	d9c99043bd	Option to ignore llvm[.compiler].used uses in hasAddressTaken() Differential Revision: https://reviews.llvm.org/D96087	2021-02-25 10:06:24 -08:00
Stanislav Mekhanoshin	29e2d9461a	Option to ignore assume like intrinsic uses in hasAddressTaken() Differential Revision: https://reviews.llvm.org/D96081	2021-02-25 09:48:29 -08:00
Jon Roelofs	7f6e331645	Support `#pragma clang section` directives on MachO targets rdar://59560986 Differential Revision: https://reviews.llvm.org/D97233	2021-02-25 09:30:10 -08:00
Rong Xu	6103b6ad69	[SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class This patch makes SampleProfileLoaderBaseImpl a template class so it can be used in CodeGen transformation. Noticeable changes: * use one template parameter and use IRTraits to get other used types an type specific functions. * remove the temporary "inline" keywords in previous refactor patch. * change the template function findEquivalencesFor to a regular function. This function has a single caller with type of PostDominatorTree. It's simpler to use the type directly because MachinePostDominatorTree is not a derived type of template DominatorTreeBase. Differential Revision: https://reviews.llvm.org/D96981	2021-02-25 08:26:17 -08:00
Fraser Cormack	b368fc735d	[CodeGen] Format code comment to 80 columns. NFC.	2021-02-25 15:55:21 +00:00
Jan Svoboda	43cac1d27d	[clang][cli] NFC: Remove ArgList infrastructure for recording queries This patch removes the infrastructure for recording queries in `ArgList`, partially reverting D94472. The infrastructure was used during command line round-trip to determine which arguments should a certain subset of `CompilerInvocation` generate. Since D96280, the command line arguments are being generated all at once, making this code no longer necessary. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D96325	2021-02-25 13:53:24 +01:00
Evgeniy Brevnov	83d134c3c4	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88287	2021-02-25 18:22:39 +07:00

... 2 3 4 5 6 ...

44488 Commits