llvm-project

Commit Graph

Author	SHA1	Message	Date
Samuel Eubanks	47dbee6790	Make NPM OptBisectInstrumentation use global singleton OptBisect Currently there is an issue where the legacy pass manager uses a different OptBisect counter than the new pass manager. This fix makes the npm OptBisectInstrumentation use the global OptBisect. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D92897	2020-12-20 13:47:56 -08:00
Kazu Hirata	3285ee143b	[Analysis, IR, CodeGen] Use llvm::erase_if (NFC)	2020-12-20 09:19:35 -08:00
Kazu Hirata	805d59593f	[Analysis, CodeGen, IR] Use contains (NFC)	2020-12-18 19:08:17 -08:00
Chih-Ping Chen	5f75dcf571	[DebugInfo] Support Fortran 'use <external module>' statement. The main change is to add a 'IsDecl' field to DIModule so that when IsDecl is set to true, the debug info entry generated for the module would be marked as a declaration. That way, the debugger would look up the definition of the module in the gloabl scope. Please see the comments in llvm/test/DebugInfo/X86/dimodule.ll for what the debug info entries would look like. Differential Revision: https://reviews.llvm.org/D93462	2020-12-18 13:10:57 -05:00
Whitney Tsang	2a814cd9e1	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-18 17:37:17 +00:00
Rong Xu	3733463dbb	[IR][PGO] Add hot func attribute and use hot/cold attribute in func section Clang FE currently has hot/cold function attribute. But we only have cold function attribute in LLVM IR. This patch adds support of hot function attribute to LLVM IR. This attribute will be used in setting function section prefix/suffix. Currently .hot and .unlikely suffix only are added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) user annotated function attribute will used in setting function section prefix/suffix. (2) hot attribute overwrites profile count based hotness. (3) profile count based hotness overwrite user annotated cold attribute. The intention for these changes is to provide the user a way to mark certain function as hot in cases where training input is hard to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493	2020-12-17 18:41:12 -08:00
Bangtian Liu	511cfe9441	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `d20e0c3444`.	2020-12-17 21:00:37 +00:00
Bangtian Liu	d20e0c3444	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
Jun Ma	0138399903	[InstCombine] Remove scalable vector restriction in InstCombineCasts Differential Revision: https://reviews.llvm.org/D93389	2020-12-17 22:02:33 +08:00
Barry Revzin	92310454bf	Make LLVM build in C++20 mode Part of the <=> changes in C++20 make certain patterns of writing equality operators ambiguous with themselves (sorry!). This patch goes through and adjusts all the comparison operators such that they should work in both C++17 and C++20 modes. It also makes two other small C++20-specific changes (adding a constructor to a type that cases to be an aggregate, and adding casts from u8 literals which no longer have type const char*). There were four categories of errors that this review fixes. Here are canonical examples of them, ordered from most to least common: // 1) Missing const namespace missing_const { struct A { #ifndef FIXED bool operator==(A const&); #else bool operator==(A const&) const; #endif }; bool a = A{} == A{}; // error } // 2) Type mismatch on CRTP namespace crtp_mismatch { template <typename Derived> struct Base { #ifndef FIXED bool operator==(Derived const&) const; #else // in one case changed to taking Base const& friend bool operator==(Derived const&, Derived const&); #endif }; struct D : Base<D> { }; bool b = D{} == D{}; // error } // 3) iterator/const_iterator with only mixed comparison namespace iter_const_iter { template <bool Const> struct iterator { using const_iterator = iterator<true>; iterator(); template <bool B, std::enable_if_t<(Const && !B), int> = 0> iterator(iterator<B> const&); #ifndef FIXED bool operator==(const_iterator const&) const; #else friend bool operator==(iterator const&, iterator const&); #endif }; bool c = iterator<false>{} == iterator<false>{} // error \|\| iterator<false>{} == iterator<true>{} \|\| iterator<true>{} == iterator<false>{} \|\| iterator<true>{} == iterator<true>{}; } // 4) Same-type comparison but only have mixed-type operator namespace ambiguous_choice { enum Color { Red }; struct C { C(); C(Color); operator Color() const; bool operator==(Color) const; friend bool operator==(C, C); }; bool c = C{} == C{}; // error bool d = C{} == Red; } Differential revision: https://reviews.llvm.org/D78938	2020-12-17 10:44:10 +00:00
Hongtao Yu	ac068e014b	[CSSPGO] Consume pseudo-probe-based AutoFDO profile This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`. Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites). Adjustment to profile format A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like ``` !CFGChecksum: 12345 ``` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums. Differential Revision: https://reviews.llvm.org/D92347	2020-12-16 15:57:18 -08:00
Bangtian Liu	c10757200d	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `cf638d793c`.	2020-12-16 11:52:30 +00:00
Bangtian Liu	cf638d793c	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-15 23:32:29 +00:00
Fangrui Song	41c3b27139	[IR] Delete deprecated DebugLoc::get	2020-12-15 14:53:12 -08:00
Johannes Doerfert	b9c77542e2	[Clang][Attr] Introduce the `assume` function attribute The `assume` attribute is a way to provide additional, arbitrary information to the optimizer. For now, assumptions are restricted to strings which will be accumulated for a function and emitted as comma separated string function attribute. The key of the LLVM-IR function attribute is `llvm.assume`. Similar to `llvm.assume` and `__builtin_assume`, the `assume` attribute provides a user defined assumption to the compiler. A follow up patch will introduce an LLVM-core API to query the assumptions attached to a function. We also expect to add more options, e.g., expression arguments, to the `assume` attribute later on. The `omp [begin] asssumes` pragma will leverage this attribute and expose the functionality in the absence of OpenMP. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D91979	2020-12-15 16:51:34 -06:00
Kazu Hirata	504e4be2c1	[IR] Remove isPowerOf2ByteWidth The predicate used to be used with the C backend, which was removed on Mar 23, 2012 in commit `64a232343a`. It seems to be unused since then.	2020-12-14 23:00:17 -08:00
Gulfem Savrun Yeniceri	7c0e3a77bc	[clang][IR] Add support for leaf attribute This patch adds support for leaf attribute as an optimization hint in Clang/LLVM. Differential Revision: https://reviews.llvm.org/D90275	2020-12-14 14:48:17 -08:00
Matt Arsenault	2e0e03c6a0	OpaquePtr: Require byval on x86_intrcc parameter 0 Currently the backend special cases x86_intrcc and treats the first parameter as byval. Make the IR require byval for this parameter to remove this special case, and avoid the dependence on the pointee element type. Fixes bug 46672. I'm not sure the IR is enforcing all the calling convention constraints. clang seems to ignore the attribute for empty parameter lists, but the IR tolerates it.	2020-12-14 16:34:37 -05:00
Fangrui Song	b5ad32ef5c	Migrate deprecated DebugLoc::get to DILocation::get This migrates all LLVM (except Kaleidoscope and CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get. The CodeGen/StackProtector.cpp usage may have a nullptr Scope and can trigger an assertion failure, so I don't migrate it. Reviewed By: #debug-info, dblaikie Differential Revision: https://reviews.llvm.org/D93087	2020-12-11 12:45:22 -08:00
David Sherwood	9b76160e53	[Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. Differential Revision: https://reviews.llvm.org/D91174	2020-12-11 08:12:54 +00:00
Hongtao Yu	705a4c149d	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 17:29:28 -08:00
Mitch Phillips	7ead5f5aa3	Revert "[CSSPGO] Pseudo probe encoding and emission." This reverts commit `b035513c06`. Reason: Broke the ASan buildbots: http://lab.llvm.org:8011/#/builders/5/builds/2269	2020-12-10 15:53:39 -08:00
Hongtao Yu	b035513c06	[CSSPGO] Pseudo probe encoding and emission. This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. ELF object emission The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission. Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index. Assembling Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis. A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file. A example assembly looks like: ``` foo2: # @foo2 # %bb.0: # %bb0 pushq %rax testl %edi, %edi .pseudoprobe 837061429793323041 1 0 0 je .LBB1_1 # %bb.2: # %bb2 .pseudoprobe 837061429793323041 6 2 0 callq foo .pseudoprobe 837061429793323041 3 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq .LBB1_1: # %bb1 .pseudoprobe 837061429793323041 5 1 0 callq %rsi .pseudoprobe 837061429793323041 2 0 0 .pseudoprobe 837061429793323041 4 0 0 popq %rax retq # -- End function .section .pseudo_probe_desc,"",@progbits .quad 6699318081062747564 .quad 72617220756 .byte 3 .ascii "foo" .quad 837061429793323041 .quad 281547593931412 .byte 4 .ascii "foo2" ``` With inlining turned on, the assembly may look different around %bb2 with an inlined probe: ``` # %bb.2: # %bb2 .pseudoprobe 837061429793323041 3 0 .pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6 .pseudoprobe 837061429793323041 4 0 popq %rax retq ``` Disassembling* We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file. An example disassembly looks like: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D91878	2020-12-10 09:50:08 -08:00
Florian Hahn	bb9cef7628	[CallBase] Add hasRetAttr version that takes StringRef. This makes it slightly easier to deal with custom attributes and CallBase already provides hasFnAttr versions that support both AttrKind and StringRef arguments in a similar fashion. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D92567	2020-12-10 17:00:16 +00:00
Sander de Smalen	d568cff696	[LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF. * Steps are scaled by `vscale`, a runtime value. * Changes to circumvent the cost-model for now (temporary) so that the cost-model can be implemented separately. This can vectorize the following loop [1]: void loop(int N, double a, double b) { #pragma clang loop vectorize_width(4, scalable) for (int i = 0; i < N; i++) { a[i] = b[i] + 1.0; } } [1] This source-level example is based on the pragma proposed separately in D89031. This patch only implements the LLVM part. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91077	2020-12-09 11:25:21 +00:00
Joe Ellis	80c33de2d3	[SelectionDAG] Add llvm.vector.{extract,insert} intrinsics This commit adds two new intrinsics. - llvm.experimental.vector.insert: used to insert a vector into another vector starting at a given index. - llvm.experimental.vector.extract: used to extract a subvector from a larger vector starting from a given index. The codegen work for these intrinsics has already been completed; this commit is simply exposing the existing ISD nodes to LLVM IR. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D91362	2020-12-09 11:08:41 +00:00
Cullen Rhodes	4167a0259e	[IR] Support scalable vectors in CastInst::CreatePointerCast Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92482	2020-12-09 10:39:36 +00:00
Simon Moll	3ffbc79357	[VP] Build VP SDNodes Translate VP intrinsics to VP_* SDNodes. The tests check whether a matching vp_* SDNode is emitted. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D91441	2020-12-09 11:36:51 +01:00
Roman Lebedev	a2876ec74f	[NFC][Instructions] Refactor CmpInst::getFlippedStrictnessPredicate() in terms of is{,Non}StrictPredicate()/get{Non,}StrictPredicate() In particular, this creates getStrictPredicate() method, to be symmetrical with already-existing getNonStrictPredicate().	2020-12-09 12:43:08 +03:00
Kazu Hirata	70de324046	[IR] Use llvm::is_contained (NFC)	2020-12-08 19:06:37 -08:00
Yuanfang Chen	1821265db6	[Time-report] Add a flag -ftime-report={per-pass,per-pass-run} to control the pass timing aggregation Currently, -ftime-report + new pass manager emits one line of report for each pass run. This potentially causes huge output text especially with regular LTO or large single file (Obeserved in private tests and was reported in D51276). The behaviour of -ftime-report + legacy pass manager is emitting one line of report for each pass object which has relatively reasonable text output size. This patch adds a flag `-ftime-report=` to control time report aggregation for new pass manager. The flag is for new pass manager only. Using it with legacy pass manager gives an error. It is a driver and cc1 flag. `per-pass` is the new default so `-ftime-report` is aliased to `-ftime-report=per-pass`. Before this patch, functionality-wise `-ftime-report` is aliased to `-ftime-report=per-pass-run`. * Adds an boolean variable TimePassesHandler::PerRun to control per-pass vs per-pass-run. * Adds a new clang CodeGen flag CodeGenOptions::TimePassesPerRun to work with the existing CodeGenOptions::TimePasses. * Remove FrontendOptions::ShowTimers, its uses are replaced by the existing CodeGenOptions::TimePasses. * Remove FrontendTimesIsEnabled (It was introduced in D45619 which was largely reverted.) Differential Revision: https://reviews.llvm.org/D92436	2020-12-08 10:13:19 -08:00
Cullen Rhodes	2cfbdaf601	[IR] Remove CastInst::isCastable since it is not used It was removed back in 2013 (`f63dfbb`) by Matt Arsenault but then reverted since DragonEgg used it, but that project is no longer maintained. Reviewed By: ldionne, dexonsmith Differential Revision: https://reviews.llvm.org/D92571	2020-12-08 10:31:53 +00:00
Cullen Rhodes	7b1cb47150	[IR] Bail out for scalable vectors in ShuffleVectorInst::isConcat Shuffle mask for concat can't be expressed for scalable vectors, so we should bail out. A test has been added that previously crashed, also tested isIdentityWithPadding and isIdentityWithExtract where we already bail out. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D92475	2020-12-07 10:48:35 +00:00
Arthur Eubanks	7f6f9f4cf9	[NewPM] Make pass adaptors less templatey Currently PassBuilder.cpp is by far the file that takes longest to compile. This is due to tons of templates being instantiated per pass. Follow PassManager by using wrappers around passes to avoid making the adaptors templated on the pass type. This allows us to move various adaptors' run methods into .cpp files. This reduces the compile time of PassBuilder.cpp on my machine from 66 to 39 seconds. It also reduces the size of opt from 685M to 676M. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D92616	2020-12-04 08:30:50 -08:00
Arthur Eubanks	2f0de58294	[NewPM] Support --print-before/after in NPM This changes --print-before/after to be a list of strings rather than legacy passes. (this also has the effect of not showing the entire list of passes in --help-hidden after --print-before/after, which IMO is great for making it less verbose). Currently PrintIRInstrumentation passes the class name rather than pass name to llvm::shouldPrintBeforePass(), meaning llvm::shouldPrintBeforePass() never functions as intended in the NPM. There is no easy way of converting class names to pass names outside of within an instance of PassBuilder. This adds a map of pass class names to their short names in PassRegistry.def within PassInstrumentationCallbacks. It is populated inside the constructor of PassBuilder, which takes a PassInstrumentationCallbacks. Add a pointer to PassInstrumentationCallbacks inside PrintIRInstrumentation and use the newly created map. This is a bit hacky, but I can't think of a better way since the short id to class name only exists within PassRegistry.def. This also doesn't handle passes not in PassRegistry.def but rather added via PassBuilder::registerPipelineParsingCallback(). llvm/test/CodeGen/Generic/print-after.ll doesn't seem very useful now with this change. Reviewed By: ychen, jamieschmeiser Differential Revision: https://reviews.llvm.org/D87216	2020-12-03 16:52:14 -08:00
Fangrui Song	756fa8b9be	[Metadata] Fix layer violation in D91576 There is a library layering issue. LLVMAnalysis provides llvm/Analysis/ScopedNoAliasAA.h and depends on LLVMCore. LLVMCore provides llvm/IR/Metadata.cpp and it should not include a header file in LLVMAnalysis	2020-12-03 10:58:46 -08:00
modimo	1860331932	[MemCpyOpt] Correctly merge alias scopes during call slot optimization When MemCpyOpt performs call slot optimization it will concatenate the `alias.scope` metadata between the function call and the memcpy. However, scoped AA relies on the domains in metadata to be maintained in a caller-callee relationship. Naive concatenation breaks this assumption leading to bad AA results. The fix is to take the intersection of domains then union the scopes within those domains. The original bug came from a case of rust bad codegen which uses this bad aliasing to perform additional memcpy optimizations. As show in the added test case `%src` got forwarded past its lifetime leading to a dereference of garbage data. Testing ninja check-llvm Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D91576	2020-12-03 09:23:37 -08:00
Xun Li	80b0f74c8c	Small improvements to Intrinsic::getName While I was adding a new intrinsic instruction (not overloaded), I accidentally used CreateUnaryIntrinsic to create the intrinsics, which turns out to be passing the type list to getName, and ended up naming the intrinsics function with type suffix, which leads to wierd bugs latter on. It took me a long time to debug. It seems a good idea to add an assertion in getName so that it fails if types are passed but it's not a overloaded function. Also, the overloade version of getName is less efficient because it creates an std::string. We should avoid calling it if we know that there are no types provided. Differential Revision: https://reviews.llvm.org/D92523	2020-12-02 16:49:12 -08:00
Nick Desaulniers	bc044a88ee	[Inline] prevent inlining on stack protector mismatch It's common for code that manipulates the stack via inline assembly or that has to set up its own stack canary (such as the Linux kernel) would like to avoid stack protectors in certain functions. In this case, we've been bitten by numerous bugs where a callee with a stack protector is inlined into an attribute((no_stack_protector)) caller, which generally breaks the caller's assumptions about not having a stack protector. LTO exacerbates the issue. While developers can avoid this by putting all no_stack_protector functions in one translation unit together and compiling those with -fno-stack-protector, it's generally not very ergonomic or as ergonomic as a function attribute, and still doesn't work for LTO. See also: https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/ https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u SSP attributes can be ordered by strength. Weakest to strongest, they are: ssp, sspstrong, sspreq. Callees with differing SSP attributes may be inlined into each other, and the strongest attribute will be applied to the caller. (No change) After this change: * A callee with no SSP attributes will no longer be inlined into a caller with SSP attributes. * The reverse is also true: a callee with an SSP attribute will not be inlined into a caller with no SSP attributes. * The alwaysinline attribute overrides these rules. Functions that get synthesized by the compiler may not get inlined as a result if they are not created with the same stack protector function attribute as their callers. Alternative approach to https://reviews.llvm.org/D87956. Fixes pr/47479. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: rnk, MaskRay Differential Revision: https://reviews.llvm.org/D91816	2020-12-02 11:00:16 -08:00
Leonard Chan	19bdc8e5a3	[llvm] Fix for failing test from `fdbd84c6c8` When handling a DSOLocalEquivalent operand change: - Remove assertion checking that the `To` type and current type are the same type. This is not always a requirement. - Add a missing bitcast from an old DSOLocalEquivalent to the type of the new one.	2020-12-01 15:47:55 -08:00
Wei Wang	93dc1b5b8c	[Remarks][2/2] Expand remarks hotness threshold option support in more tools This is the #2 of 2 changes that make remarks hotness threshold option available in more tools. The changes also allow the threshold to sync with hotness threshold from profile summary with special value 'auto'. This change expands remarks hotness threshold option -fdiagnostics-hotness-threshold in clang and *-remarks-hotness-threshold in other tools to utilize hotness threshold from profile summary. Remarks hotness filtering relies on several driver options. Table below lists how different options are correlated and affect final remarks outputs: \| profile \| hotness \| threshold \| remarks printed \| \|---------\|---------\|-----------\|-----------------\| \| No \| No \| No \| All \| \| No \| No \| Yes \| None \| \| No \| Yes \| No \| All \| \| No \| Yes \| Yes \| None \| \| Yes \| No \| No \| All \| \| Yes \| No \| Yes \| None \| \| Yes \| Yes \| No \| All \| \| Yes \| Yes \| Yes \| >=threshold \| In the presence of profile summary, it is often more desirable to directly use the hotness threshold from profile summary. The new argument value 'auto' indicates threshold will be synced with hotness threshold from profile summary during compilation. The "auto" threshold relies on the availability of profile summary. In case of missing such information, no remarks will be generated. Differential Revision: https://reviews.llvm.org/D85808	2020-11-30 21:55:50 -08:00
Wei Wang	3acda91742	[Remarks][1/2] Expand remarks hotness threshold option support in more tools This is the #1 of 2 changes that make remarks hotness threshold option available in more tools. The changes also allow the threshold to sync with hotness threshold from profile summary with special value 'auto'. This change modifies the interface of lto::setupLLVMOptimizationRemarks() to accept remarks hotness threshold. Update all the tools that use it with remarks hotness threshold options: * lld: '--opt-remarks-hotness-threshold=' * llvm-lto2: '--pass-remarks-hotness-threshold=' * llvm-lto: '--lto-pass-remarks-hotness-threshold=' * gold plugin: '-plugin-opt=opt-remarks-hotness-threshold=' Differential Revision: https://reviews.llvm.org/D85809	2020-11-30 21:55:49 -08:00
Leonard Chan	4d7f3d68f1	[llvm] Fix for failing test from `cf8ff75bad` Handle null values when handling operand changes for DSOLocalEquivalent.	2020-11-30 17:22:28 -08:00
Nikita Popov	b5f23189fb	[DL] Inline getAlignmentInfo() implementation (NFC) Apart from getting the entry in the table (which is already a separate function), the remaining logic is different for all alignment types and is better combined with getAlignment(). This is a minor efficiency improvement, and should make further improvements like using separate storage for different alignment types simpler.	2020-11-30 20:56:15 +01:00
Nick Lewycky	fe43168348	Creating a named struct requires only a Context and a name, but looking up a struct by name requires a Module. The method on Module merely accesses the LLVMContextImpl and no data from the module itself, so this patch moves getTypeByName to a static method on StructType that takes a Context and a name. There's a small number of users of this function, they are all updated. This updates the C API adding a new method LLVMGetTypeByName2 that takes a context and a name. Differential Revision: https://reviews.llvm.org/D78793	2020-11-30 11:34:12 -08:00
Sanjay Patel	9eb2c0113d	[IR][LoopRotate] remove assertion that phi must have at least one operand This was suggested in D92247 - I initially committed an alternate fix ( `bfd2c216ea` ) to avoid the crash/assert shown in https://llvm.org/PR48296 , but that was reverted because it caused msan failures on other tests. We can try to revive that patch using the test included here, but I do not have an immediate plan to isolate that problem.	2020-11-30 11:32:42 -05:00
Sanjay Patel	1dc38f8cfb	[IR] improve code comment/logic in removePredecessor(); NFC This was suggested in the post-commit review of `ce134da4b1`.	2020-11-30 10:51:30 -05:00
Sanjay Patel	355aee3dcd	Revert "[IR][LoopRotate] avoid leaving phi with no operands (PR48296)" This reverts commit `bfd2c216ea`. This appears to be causing stage2 msan failures on buildbots: FAIL: LLVM :: Transforms/SimplifyCFG/X86/bug-25299.ll (65872 of 71835) ****************** TEST 'LLVM :: Transforms/SimplifyCFG/X86/bug-25299.ll' FAILED ****************** Script: -- : 'RUN: at line 1'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SimplifyCFG/X86/bug-25299.ll -simplifycfg -S \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SimplifyCFG/X86/bug-25299.ll -- Exit Code: 2 Command Output (stderr): -- ==87374==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x9de47b6 in getBasicBlockIndex /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/Instructions.h:2749:5 #1 0x9de47b6 in simplifyCommonResume /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:4112:23 #2 0x9de47b6 in simplifyResume /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:4039:12 #3 0x9de47b6 in (anonymous namespace)::SimplifyCFGOpt::simplifyOnce(llvm::BasicBlock) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6330:16 #4 0x9dcca13 in run /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6358:16 #5 0x9dcca13 in llvm::simplifyCFG(llvm::BasicBlock, llvm::TargetTransformInfo const&, llvm::SimplifyCFGOptions const&, llvm::SmallPtrSetImpl<llvm::BasicBlock>) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6369:8 #6 0x974643d in iterativelySimplifyCFG(	2020-11-30 10:15:42 -05:00
Sanjay Patel	bfd2c216ea	[IR][LoopRotate] avoid leaving phi with no operands (PR48296) https://llvm.org/PR48296 shows an example where we delete all of the operands of a phi without actually deleting the phi, and that is currently considered invalid IR. The reduced test included here would crash for that reason. A suggested follow-up is to loosen the assert to allow 0-operand phis in unreachable blocks. Differential Revision: https://reviews.llvm.org/D92247	2020-11-30 09:28:45 -05:00
Juneyoung Lee	9c49dcc356	[ConstantFold] Don't fold and/or i1 poison to poison (NFC) .. because it causes miscompilation when combined with select i1 -> and/or. It is the select fold which is incorrect; but it is costly to disable the fold, so hack this one. D92270	2020-11-30 22:58:31 +09:00

1 2 3 4 5 ...

4578 Commits