llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	8a316045ed	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
Sanjay Patel	415c67ba4c	[SDAG] allow partial undef vector constants with select->logic folds This is an enhancement suggested in the original review/commit: D97730 / `7fce3322a2`	2021-03-02 14:29:15 -05:00
Vy Nguyen	9a2e2de15f	[lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD. Currently, it was deliberately implemented to not handle this case, but as it has turned out, we need this feature. The concrete use case is `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` re-exports /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit, which then re-exports /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation. The current implementation uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree. This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa, and will not find it. The right thing should be: - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD. - When loading from an actual dylib, no additional TBD documents need to be examined. - In no case does a re-export mentioned in one TBD file need to be looked up in an auxiliary document from a different TBD file. Differential Revision: https://reviews.llvm.org/D97438	2021-03-02 12:14:31 -05:00
Krzysztof Parzyszek	d96b5e606a	[TableGen] Add IntrNoMerge as intrinsic property There is a function attribute 'nomerge' in addition to 'noduplicate' and 'convergent'. Both 'noduplicate' and 'convergent' have corresponding intrinsic properties. This patch adds an intrinsic property for the 'nomerge' attribute. Differential Revision: https://reviews.llvm.org/D96364	2021-03-02 09:04:50 -08:00
dfukalov	6e967834b9	[AA] Cache (optionally) estimated PartialAlias offsets. When one of two clobbering loads accesses an object that is fully contained in the other, `BasicAAResult::aliasGEP` returns just `PartialAlias`, which actually denotes the more common case of partial overlap and says nothing about the actual overlapping sizes. AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs with non-constant offsets. The change stores estimated relative offsets so they can be used further. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93529	2021-03-02 19:04:15 +03:00
Tim Northover	888c5c24ca	AArch64: report fp16 arithmetic is present for apple-a11 CPU. AArch64.td got it right, but the target-parser dropped it, leading to missing feature flags in Clang.	2021-03-02 15:07:18 +00:00
Stefan Gränitz	ef2389235c	[Orc] Add JITLink debug support plugin for ELF x86-64 Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that. The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a `DebugObject` from the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`. This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them. Once JITLink has emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending `DebugObject` as registered for the corresponding MaterializationResponsibility. The `DebugObjectRegistrar` registers the `DebugObject` with the target process. `llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future. [1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97335	2021-03-02 15:07:35 +01:00
Stefan Gränitz	48c2acff0c	[JITLink] LinkGraph::getName() can be const	2021-03-02 15:07:34 +01:00
Jan Svoboda	4545813b17	[clang][cli] NFC: Rename marshalling multiclass The new name drops `String` from `MarshallingInfoStringInt`, which follows the naming convention of other marshalling multiclasses.	2021-03-02 11:53:40 +01:00
Yuanfang Chen	5de2d189e6	[Diagnose] Unify MCContext and LLVMContext diagnosing The situation with inline asm/MC error reporting is kind of messy at the moment. The errors from MC layout are not reliably propagated and users have to specify an inlineasm handler separately to get inlineasm diagnose. The latter issue is not a correctness issue but could be improved. * Kill LLVMContext inlineasm diagnose handler and migrate it to use DiagnoseInfo/DiagnoseHandler. * Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This covers use cases like inlineasm, MC, and any clients using SourceMgr. * Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all use cases, only one of them is used. * If LLVMContext is available, let MCContext use LLVMContext's diagnose handler; if LLVMContext is not available, MCContext uses its own default diagnose handler which just prints SMDiagnostic. * Change a few clients (Clang, llc, lldb) to use the new way of reporting. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D97449	2021-03-01 15:58:37 -08:00
Matt Arsenault	0131498402	GlobalISel: Remove dead code Generic code should probably not introduce G_INSERT/G_EXTRACT. The mirror unpackRegs should also be removed, but AMDGPU still has a use remaining which needs to be fixed.	2021-03-01 17:06:43 -05:00
Fangrui Song	04c3040f41	[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF `__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not referenced via relocation in the translation unit. With `-z start-stop-gc` (D96914 https://sourceware.org/bugzilla/show_bug.cgi?id=27451), the linker no longer lets `__start_/__stop_` references retain them. Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make them retained by the linker. This patch changes most existing `UsedVars` cases to `CompilerUsedVars` to reflect the ideal state - if the binary format properly supports section based GC (dead stripping), `llvm.compiler.used` should be sufficient. `__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars` since we want them to be unconditionally retained by both compiler and linker. Behaviors on other COFF/Mach-O are not affected. Differential Revision: https://reviews.llvm.org/D97649	2021-03-01 13:43:23 -08:00
Nicolas Guillemot	6fb6bdff37	Fix the value_type of defusechain_iterator to match its operator*() defusechain_iterator has an operator*() and operator->() that return references to a MachineOperand, but its "reference" and "pointer" typedefs are set as if the iterator returns a MachineInstr reference. This causes compilation errors when defusechain_iterator is used in generic code that uses the "reference" and "pointer" typedefs. This patch fixes this by updating the typedefs to use MachineOperand instead of MachineInstr. Reviewed By: mkitzan Differential Revision: https://reviews.llvm.org/D97522	2021-03-01 10:41:10 -08:00
Arthur Eubanks	040c1b49d7	Move EntryExitInstrumentation pass location This seems to be more of a Clang thing rather than a generic LLVM thing, so this moves it out of LLVM pipelines and as Clang extension hooks into LLVM pipelines. Move the post-inline EEInstrumentation out of the backend pipeline and into a late pass, similar to other sanitizer passes. It doesn't fit into the codegen pipeline. Also fix up EntryExitInstrumentation not running at -O0 under the new PM. PR49143 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D97608	2021-03-01 10:08:10 -08:00
Jay Foad	216dee9170	[AMDGPU] Add IntrWillReturn to recently added intrinsics This adds IntrWillReturn to the gfx90a mfma intrinsics, to match all the other mfma intrinsics, and llvm.amdgcn.live.mask, to match llvm.amdgcn.ps.live. Differential Revision: https://reviews.llvm.org/D97675	2021-03-01 17:35:26 +00:00
Juneyoung Lee	c89d9d8a48	[TTI] Consider select form of and/or i1 as having arithmetic cost This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b` as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`. Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe. This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked. These selects should be translated into assembly in the same manner as and/or i1, so the cost should be equivalent. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97360	2021-03-02 02:18:19 +09:00
Andy Wingo	2632ba6a35	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via TABLE_NUMBER relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. Differential Revision: https://reviews.llvm.org/D90948	2021-03-01 16:49:00 +01:00
Masoud Ataei	5fe0cab79e	[PowerPC] Removing sqrtd2 and sqrtf4 from list of vectorizable functions with MASSV Under -O3 and -Ofast, the MASSV conversion prevents the sqrt call from being inlined. Inline sqrt is faster than the MASSV call on leppc. Differential Revision: https://reviews.llvm.org/D97487	2021-03-01 15:42:19 +00:00
Jay Foad	796a60d2ea	[AMDGPU] New intrinsic void llvm.amdgcn.s.sethalt(i32) The expected use case is for frontends to insert this into shaders that are to be run under a debugger. The shader can then be resumed or single stepped from the point of the call under debugger control. Differential Revision: https://reviews.llvm.org/D97670	2021-03-01 14:30:23 +00:00
Matt Arsenault	6c260d3bc0	GlobalISel: Move splitToValueTypes to generic code I copied the nearly identical function from AArch64 into AMDGPU, so fix this duplication. Mips and X86 have their own more exotic versions which should be removed. However replacing those is better left for a separate patch since it requires other changes to avoid regressions.	2021-03-01 08:58:18 -05:00
Florian Hahn	53dacb7b67	[LV] Generate RT checks up-front and remove them if required. This patch updates LV to generate the runtime checks just after cost modeling, to allow a more precise estimate of the actual cost of the checks. This information will be used in future patches to generate larger runtime checks in cases where the checks only make up a small fraction of the expected scalar loop execution time. The runtime checks are created up-front in a temporary block to allow better estimating the cost and un-linked from the existing IR. After deciding to vectorize, the checks are moved back. If deciding not to vectorize, the temporary block is completely removed. This patch is similar in spirit to D71053, but explores a different direction: instead of delaying the decision on whether to vectorize in the presence of runtime checks, it optimistically creates the runtime checks early and discards them later if it is decided not to vectorize. This has the advantage that the cost-modeling decisions can be kept together and can be done up-front, thus preserving the general code structure. I think delaying (part of) the decision to vectorize would also make the VPlan migration a bit harder. One potential drawback of this patch is that we speculatively generate IR which we might have to clean up later. However it seems like the code required to do so is quite manageable. Reviewed By: lebedev.ri, ebrevnov Differential Revision: https://reviews.llvm.org/D75980	2021-03-01 10:48:04 +00:00
Fraser Cormack	6718fda6ad	[CodeGen] Fix issues with subvector intrinsic index types This patch addresses issues arising from the fact that the index type used for subvector insertion/extraction is inconsistent between the intrinsics and SDNodes. The intrinsic forms require i64 whereas the SDNodes use the type returned by SelectionDAG::getVectorIdxTy. Rather than update the intrinsic definitions to use an overloaded index type, this patch fixes the issue by transforming the index to the correct type as required. Any loss of index bits going from i64 to a smaller type is unexpected, and will be caught by an assertion in SelectionDAG::getVectorIdxConstant. The patch also updates the documentation for INSERT_SUBVECTOR and adds an assertion to its creation to bring it in line with EXTRACT_SUBVECTOR. This necessitated changes to AArch64 which was using i64 for EXTRACT_SUBVECTOR but i32 for INSERT_SUBVECTOR. Only one test changed its codegen after updating the backend accordingly. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D97459	2021-03-01 10:28:21 +00:00
Juneyoung Lee	5419b67137	[SimplifyCFG] Update FoldTwoEntryPHINode to handle and/or of select and binop equally This is a minor change that fixes FoldTwoEntryPHINode to handle phis with and/ors of select form and binop form equally.	2021-03-01 13:34:51 +09:00
Kazu Hirata	b4bed1cb24	[IR] Use range-based for loops (NFC)	2021-02-28 10:59:23 -08:00
Kazu Hirata	d639120983	[llvm] Use set_is_subset (NFC)	2021-02-28 10:59:20 -08:00
William S. Moses	b077d82b00	[Attributor] Conditionally delete fns Allow the attributor to delete functions only if requested Differential Revision: https://reviews.llvm.org/D97238	2021-02-27 20:37:42 -05:00
Heejin Ahn	aa097ef8d4	[WebAssembly] Fix reverse mapping in WasmEHFuncInfo D97247 added the reverse mapping from unwind destination to their source, but it had a critical bug; sources can be multiple, because multiple BBs can have a single BB as their unwind destination. This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes it return a vector rather than a single BB. It does not return the const reference to the existing vector but creates a new vector because `WasmEHFuncInfo` stores not `BasicBlock` or `MachineBasicBlock` but `PointerUnion` of them. Also I hoped to unify those methods for `BasicBlock` and `MachineBasicBlock` into one using templates to reduce duplication, but failed because various usages require `BasicBlock*` to be `const` but it's hard to make it `const` for `MachineBasicBlock` usages. Fixes https://github.com/emscripten-core/emscripten/issues/13514. (More precisely, fixes https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744) Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97583	2021-02-26 17:12:10 -08:00
Fangrui Song	47c5576d7d	ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects If a global object is listed in `@llvm.used`, place it in a unique section with the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections` with LLD>=13 or GNU ld>=2.36. For front ends which do not expect to see multiple sections of the same name, consider emitting `@llvm.compiler.used` instead of `@llvm.used`. SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in binutils. We don't do the restriction - see the rationale in D95749. The integrated assembler has supported SHF_GNU_RETAIN since D95730. GNU as>=2.36 supports section flag 'R'. We don't need to worry about GNU ld support because older GNU ld just ignores the unknown SHF_GNU_RETAIN. With this change, `__attribute__((retain))` functions/variables emitted by clang will get the SHF_GNU_RETAIN flag. Differential Revision: https://reviews.llvm.org/D97448	2021-02-26 16:38:44 -08:00
Philip Reames	f2cfef3596	Be more mathematically precise about definition of recurrence [NFC] This clarifies the interface of the matchSimpleRecurrence helper introduced in `8020be0b8` for non-commutative operators. After `ebd3aeba`, I realized the original way I framed the routine was inconsistent. For shifts, we only matched the LHS form, but for sub we matched both and the caller wanted that information. So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed. I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks. (It was for me.)	2021-02-26 11:22:01 -08:00
Philip Reames	ebd3aeba27	Use helper introduced in `8020be0b8` to simplify ValueTracking [NFC] Direct rewrite of the code the helper was extracted from.	2021-02-26 10:47:26 -08:00
Philip Reames	8020be0b8b	Add a helper for matching simple recurrence cycles This helper came up in another review, and I've got about 4 different patches with copies of this copied into it. Time to precommit the routine. :)	2021-02-26 10:21:23 -08:00
Mircea Trofin	a2bfc43ae1	[NFC] Const-ed 2 APIs in VirtRegMap	2021-02-26 09:32:42 -08:00
Mircea Trofin	a00f7dc2d5	[NFC] MCRegister fixes in RegisterClassInfo, and const-ed APIs	2021-02-26 08:53:57 -08:00
Vladislav Vinogradov	9909237d99	[ADT][NFC] Add extra typedefs to `ArrayRef` and `MutableArrayRef` * `value_type` * `pointer` * `const_pointer` * `reference` * `const_reference` * `const_reverse_iterator` * `size_type` * `difference_type` It makes `ArrayRef` and `MutableArrayRef` types fully compliant with STL Container concept. Reviewed By: lattner, courbet Differential Revision: https://reviews.llvm.org/D95611	2021-02-26 18:37:08 +03:00
Evgeniy Brevnov	13a5cac2ba	Revert "[NARY-REASSOCIATE] Support reassociation of min/max" This reverts commit `83d134c3c4`.	2021-02-26 19:47:54 +07:00
Stefan Gränitz	406ef36b03	[Orc] Use extensible RTTI for the orc::ObjectLayer class hierarchy So far we had no way to distinguish between JITLink and RuntimeDyld in lli. Instead, we used implicit knowledge that RuntimeDyld would be used for linking ELF. In order to get D97337 to work with lli though, we have to move on and allow JITLink for ELF. This patch uses extensible RTTI to allow external clients to add their own layers without touching the LLVM sources. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97338	2021-02-26 13:13:05 +01:00
Chen Zheng	d39bc36b1b	[debug-info] refactor emitDwarfUnitLength remove `Hi` `Lo` argument from `emitDwarfUnitLength`, so we can make caller of emitDwarfUnitLength easier. Reviewed By: MaskRay, dblaikie, ikudrin Differential Revision: https://reviews.llvm.org/D96409	2021-02-25 21:00:25 -05:00
James Y Knight	24539f1ef2	Add Alignment argument to IRBuilder CreateAtomicRMW and CreateAtomicCmpXchg. And then push those changes throughout LLVM. Keep the old signature in Clang's CGBuilder for now -- that will be updated in a follow-on patch (D97224). The MLIR LLVM-IR dialect is not updated to support the new alignment attribute, but preserves its existing behavior. Differential Revision: https://reviews.llvm.org/D97223	2021-02-25 18:29:42 -05:00
Francis Visoiu Mistrih	fee9abe69c	[Remarks] Provide more information about auto-init calls This now analyzes calls to both intrinsics and functions. For intrinsics, grab the ones we know and care about (mem* family) and analyze the arguments. For calls, use TLI to get more information about the libcalls, then analyze the arguments if known. ``` auto-init.c:4:7: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 4096 bytes. [-Rpass-missed=annotation-remarks] int var[1024]; ^ ``` Differential Revision: https://reviews.llvm.org/D97489	2021-02-25 15:14:09 -08:00
Francis Visoiu Mistrih	4753a69a31	[Remarks] Provide more information about auto-init stores This adds support for analyzing the instruction with the !annotation "auto-init" in order to generate a more user-friendly remark. For now, support the store size, and whether it's atomic/volatile. Example: ``` auto-init.c:4:7: remark: Store inserted by -ftrivial-auto-var-init.Store size: 4 bytes. [-Rpass-missed=annotation-remarks] int var; ^ ``` Differential Revision: https://reviews.llvm.org/D97412	2021-02-25 15:14:09 -08:00
Adrian Prantl	00b3f2f310	Add more historic DWARF vendor extensions The maintainer of libdwarf kindly provided this patch with a bunch of historic DWARF extensions that are missing from Dwarf.def. This list is helpful to avoid potential conflicts in the user-defined vendor extension space in the future. Patch by David Anderson! [Relanded with an updated test.] Differential Revision: https://reviews.llvm.org/D97242	2021-02-25 15:09:42 -08:00
Richard Smith	95d0d8e9e9	Fix constructor declarations that are invalid in C++20 onwards. Under C++ CWG DR 2237, the constructor for a class template C must be written as 'C(...)' not as 'C<T>(...)'. This fixes a build failure with GCC in C++20 mode. In passing, remove some other redundant '<T>' qualification from the affected classes.	2021-02-25 14:25:01 -08:00
Fangrui Song	4d63892acb	[SanitizerCoverage] Drop !associated on metadata sections In SanitizerCoverage, the metadata sections (`__sancov_guards`, `__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After inlining, such a `__sancov_*` section can be referenced by more than one function, but its sh_link still refers to the original function's section. (Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to section violates the invariant.) If the original function's section is discarded (e.g. LTO internalization + `ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error. The above reasoning means that `!associated` is not appropriate to be called by an inlinable function. Non-interposable functions are inline candidates, so we have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections but is expected to parallel a metadata section, so we have to make sure the two sections are retained or discarded at the same time. A section group does the trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return non-null.) For interposable functions, we could keep using `!associated`, but LTO can change the linkage to `internal` and allow such functions to be inlinable, so we have to drop `!associated`, too. To not interfere with section group resolution, we need to use the `noduplicates` variant (section group flag 0). (This allows us to get rid of the ModuleID parameter.) In -fno-pie and -fpie code (mostly dso_local), instrumented interposable functions have WeakAny/LinkOnceAny linkages, which are rare. So the section group header overhead should be low. This patch does not change the object file output for COFF (where `!associated` is ignored). Reviewed By: morehouse, rnk, vitalybuka Differential Revision: https://reviews.llvm.org/D97430	2021-02-25 11:59:23 -08:00
Stanislav Mekhanoshin	d9c99043bd	Option to ignore llvm[.compiler].used uses in hasAddressTaken() Differential Revision: https://reviews.llvm.org/D96087	2021-02-25 10:06:24 -08:00
Stanislav Mekhanoshin	29e2d9461a	Option to ignore assume like intrinsic uses in hasAddressTaken() Differential Revision: https://reviews.llvm.org/D96081	2021-02-25 09:48:29 -08:00
Jon Roelofs	7f6e331645	Support `#pragma clang section` directives on MachO targets rdar://59560986 Differential Revision: https://reviews.llvm.org/D97233	2021-02-25 09:30:10 -08:00
Rong Xu	6103b6ad69	[SampleFDO][NFC] Refactor: make SampleProfileLoaderBaseImpl a template class This patch makes SampleProfileLoaderBaseImpl a template class so it can be used in CodeGen transformation. Noticeable changes: * use one template parameter and use IRTraits to get other used types and type-specific functions. * remove the temporary "inline" keywords in previous refactor patch. * change the template function findEquivalencesFor to a regular function. This function has a single caller with type of PostDominatorTree. It's simpler to use the type directly because MachinePostDominatorTree is not a derived type of template DominatorTreeBase. Differential Revision: https://reviews.llvm.org/D96981	2021-02-25 08:26:17 -08:00
Fraser Cormack	b368fc735d	[CodeGen] Format code comment to 80 columns. NFC.	2021-02-25 15:55:21 +00:00
Jan Svoboda	43cac1d27d	[clang][cli] NFC: Remove ArgList infrastructure for recording queries This patch removes the infrastructure for recording queries in `ArgList`, partially reverting D94472. The infrastructure was used during command line round-trip to determine which arguments a certain subset of `CompilerInvocation` should generate. Since D96280, the command line arguments are being generated all at once, making this code no longer necessary. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D96325	2021-02-25 13:53:24 +01:00
Evgeniy Brevnov	83d134c3c4	[NARY-REASSOCIATE] Support reassociation of min/max Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D88287	2021-02-25 18:22:39 +07:00
Jan Svoboda	88e45f00c1	[clang][cli] Add MarshallingInfoEnum multiclass This patch introduces a tablegen multiclass called `MarshallingInfoEnum`. It has the same semantics as `MarshallingInfoString` had in combination with `AutoNormalizeEnum`, but it's easier to use and follows the convention used for other `MarshallingInfoXxx` multiclasses. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D97375	2021-02-25 08:47:18 +01:00
Liu, Chen3	4bc7c8631a	[X86] Support amx-bf16 intrinsic. Adding support for intrinsics of AMX-BF16. This patch also fixes a bug where AMX-INT8 instructions would be selected with the wrong predicate. Differential Revision: https://reviews.llvm.org/D97358	2021-02-25 09:06:48 +08:00
Greg McGary	151990dd94	[lld-macho] add code signature for native arm64 macOS Differential Revision: https://reviews.llvm.org/D96164	2021-02-24 17:05:23 -08:00
Duncan P. N. Exon Smith	01701646d5	Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs This is a follow up to `22a52dfddc` and a revert of `df763188c9`. With this change, we only skip cloning distinct nodes in MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the no-longer-needed local helper `cloneOrBuildODR()`. Skipping cloning in other cases is unsound and breaks CloneModule, which is why the textual IR for PR48841 didn't pass previously. This commit adds the test as: Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll Cloning less often exposed a hole in subprogram cloning in CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function has a subprogram attachment whose scope is a DICompositeType that shouldn't be cloned, but it has no internal debug info pointing at that type, that composite type was being cloned. This commit plugs that hole, calling DebugInfoFinder::processSubprogram from CloneFunctionInto. As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit message, I think we need to formalize ownership of metadata a bit more so that ValueMapper/CloneFunctionInto (and similar functions) can deal with cloning (or not) metadata in a more generic, less fragile way. This fixes PR48841. Differential Revision: https://reviews.llvm.org/D96734	2021-02-24 12:57:52 -08:00
Duncan P. N. Exon Smith	1e1b92f76d	IR: Rename Metadata::ImplicitCode to SubclassData1, NFC Metadata::ImplicitCode is a bit shaved off of Metadata::Storage, currently only in use by the subclass DILocation. However, the bit isn't reserved for that purpose. Rename it `SubclassData1` to make it clear that it has nothing to do with Metadata itself (and other subclasses are free to use it). As a drive-by, remove an old TODO about exposing bits to subclasses (looks like that has mostly been done). No functionality change here. Differential Revision: https://reviews.llvm.org/D96740	2021-02-24 12:56:26 -08:00
James Y Knight	c2487bf7df	Remove a workaround for MSVC 2013, now that MSVC 2017 is the minimum. In MSVC 2013, 'alignas(integer-template-arg)' didn't compile; verified on godbolt that this now works properly.	2021-02-24 13:56:49 -05:00
Lang Hames	8380d07e39	[JITLink] Add assertions, fix a comment. The new assertions check that Addressables removed when removing external or absolute symbols are not referenced by another symbol. A comment on post-fixup passes is updated: vmaddrs have all been set up by the time the pre-fixup passes are run, post-fixup passes run after fixups have been applied to content.	2021-02-24 21:02:37 +11:00
Dan Liew	7d3ef103b5	[ASan] Introduce a way to set different ways of emitting module destructors. Previously there was no way to control how module destructors were emitted by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang) to be able to decide how to emit these destructors (if at all). This patch introduces the `AsanDtorKind` enum that represents the different ways destructors can be emitted. There are currently only two valid ways to emit destructors. * `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default. * `None` - Do not emit module destructors. The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated to take the `AsanDtorKind` as an argument. The `-asan-destructor-kind=` command line argument has been introduced to make this easy to test from `opt`. If this argument is specified it overrides the value passed to the `ModuleAddressSanitizerPass` constructor. Note that `AsanDtorKind` is not `bool` because we will introduce a new way to emit destructors in a subsequent patch. Note that `AsanDtorKind` is given its own header file because if it is declared in `Transforms/Instrumentation/AddressSanitizer.h` it leads to a compile error (Module is ambiguous) when trying to use it in `clang/Basic/CodeGenOptions.def`. rdar://71609176 Differential Revision: https://reviews.llvm.org/D96571	2021-02-23 20:01:21 -08:00
Nico Weber	f14a14dd25	Revert "Add more historic DWARF vendor extensions" This reverts commit `c4a9144468`. Breaks check-llvm everywhere, see https://reviews.llvm.org/D97242#2583716	2021-02-23 22:10:02 -05:00
Chen Zheng	be5d92e37e	[Debug-Info][NFC] move emitDwarfUnitLength to MCStreamer class We may need to do some customization for DWARF unit length in DWARF section headers for some targets for some code generation path. For example, for XCOFF in assembly path, AIX assembler does not require the debug section containing its debug unit length in the header. Move emitDwarfUnitLength to MCStreamer class so that we can do customization in different Streamers Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D95932	2021-02-23 21:29:05 -05:00
Adrian Prantl	c4a9144468	Add more historic DWARF vendor extensions The maintainer of libdwarf kindly provided this patch with a bunch of historic DWARF extensions that are missing from Dwarf.def. This list is helpful to avoid potential conflicts in the user-defined vendor extension space in the future. Patch by David Anderson! Differential Revision: https://reviews.llvm.org/D97242	2021-02-23 17:54:04 -08:00
Juneyoung Lee	56d228a14e	[SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes. A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97244	2021-02-24 10:40:50 +09:00
Erich Keane	af4451eb4f	[NFC] Make TrailingObjects non-copyable/non-movable This got me pretty recently... TrailingObjects cannot be copied or moved, since they need to be pre-allocated. This patch deletes the copy and move operations (plus re-adds the default ctor). Differential Revision: https://reviews.llvm.org/D97324	2021-02-23 16:30:13 -08:00
Fangrui Song	ef312951fd	collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128 And delete the SmallPtrSetImpl overload. While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97257	2021-02-23 16:09:06 -08:00
Fangrui Song	3adb89bb9f	[ThinLTO] Make cloneUsedGlobalVariables deterministic Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements. While here, decrease inline element counts from 8 to 4. The number of `llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full LTO/hybrid LTO, the number may be large, so we need to be careful. According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is good for a large project with WholeProgramDevirt, when available_externally vtables are placed in the llvm.compiler.used set. Differential Revision: https://reviews.llvm.org/D97128	2021-02-23 16:09:05 -08:00
Heejin Ahn	ea8c6375e3	[WebAssembly] Fix incorrect grouping and sorting of exceptions This CL is not big but contains changes that span multiple analyses and passes. This description is very long because it tries to explain basics on what each pass/analysis does and why we need this change on top of that. Please feel free to skip parts that are not necessary for your understanding. --- `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next unwind destination>. The value (unwind dest) here is where an exception should end up when it is not caught by the key (EH pad). We record this info in WasmEHPrepare to fix catch mismatches, because the CFG itself does not have this info. A CFG only contains BBs and predecessor-successor relationship between them, but in `WasmEHFuncInfo` the unwind destination BB is not necessarily a successor of the key EH pad BB. Their relationship can be intuitively explained by this C++ code snippet: ``` try { try { foo(); } catch (int) { // EH pad ... } } catch (...) { // unwind destination } ``` So when `foo()` throws, it goes to `catch (int)` first. But if it is not caught by it, it ends up in the next unwind destination `catch (...)`. This unwind destination is what you see in `catchswitch`'s `unwind label %bb` part. --- `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted continuously together in CFGSort, as we do for loops. What this analysis does is very simple: it creates a single `WebAssemblyException` per EH pad, and all BBs that are dominated by that EH pad are included in this exception. We also identify subexception relationship in this way: if EHPad A dominates EHPad B, EHPad B's exception is a subexception of EHPad A's exception. This simple rule turns out to be incorrect in some cases. In `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means semantically EHPad B should not be included in EHPad A's exception, because it does not make sense to rethrow/delegate to an inner scope. 
This is what happened in CFGStackify as a result of this: ``` try try catch ... <- %dest_bb is among here! end delegate %dest_bb ``` So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to make sure exceptions' unwind destinations are not subexceptions of their unwind sources in `WasmEHFuncInfo`. But this alone does not prevent `dest_bb` in the example above from being sorted within the inner `catch`'s exception, even if its exception is not a subexception of that `catch`'s exception anymore, because of how CFGSort works, which will be explained below. --- CFGSort places BBs within the same `SortRegion` (loop or exception) continuously together so they can be demarcated with `loop`-`end_loop` or `catch`-`end_try` in CFGStackify. `SortRegion` is a wrapper for one of `MachineLoop` or `WebAssemblyException`. `SortRegionInfo` already does some complicated things because there are discrepancies between those two data structures. `WebAssemblyException` is what we control, and it is defined as an EH pad as its header and BBs dominated by the header as its BBs (with a newly added exception of unwind destinations explained in the previous paragraph). But `MachineLoop` is an LLVM data structure and uses the standard loop detection algorithm. So by the algorithm, a loop consists of BBs that 1. are dominated by the loop header and 2. have a path back to its header. Because of the second condition, many BBs that are dominated by the loop header are not included in the loop. So BBs that contain `return` or branches to outside of the loop are not technically included in `MachineLoop`, but they can be sorted together with the loop with no problem. Maybe to relax the condition, in CFGSort, when we are in a `SortRegion` we allow sorting of not only BBs that belong to the current innermost region but also BBs that are dominated by the current region header. (This was written this way from the first version written by Dan, when only loops existed.) 
But now, we have cases in exceptions when EHPad B is the unwind destination for EHPad A, even if EHPad B is dominated by EHPad A it should not be included in EHPad A's exception, and should not be sorted within EHPad A. One way to make things work, at least correctly, is to change the `dominates` condition to a `contains` condition for `SortRegion` when sorting BBs, but this will change compilation results for existing non-EH code and I can't be sure it will not degrade performance or code size. I think it will degrade performance because it will force many BBs dominated by a loop, which don't have the path back to the header, to be placed after the loop and it will likely create more branches and blocks. So this does a little hacky check when adding BBs to `Preferred` list: (`Preferred` list is a ready list. CFGSort maintains the ready list in two priority queues: `Preferred` and `Ready`. I'm not very sure why, but it was written that way from the beginning. BBs are first added to `Preferred` list and then some of them are pushed to `Ready` list, so here we only need to guard condition for `Preferred` list.) When adding a BB to `Preferred` list, we check if that BB is an unwind destination of another BB. To do this, this adds the reverse mapping, `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB is an unwind destination, it checks if the current stack of regions (`Entries`) contains its source BB by traversing the stack backwards. If we find its unwind source in there, we add the BB to its `Deferred` list, to make sure that unwind destination BB is added to `Preferred` list only after that region with the unwind source BB is sorted and popped from the stack. --- This does not contain a new test that crashes because of this bug, but this fix changes the result for one of the existing test cases. This test case didn't crash because it fortunately didn't contain `delegate` to the incorrectly placed unwind destination BB. 
Fixes https://github.com/emscripten-core/emscripten/issues/13514. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97247	2021-02-23 14:54:55 -08:00
Matthew Voss	6da7d31416	[llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata Under certain (currently unknown) conditions, llvm-profdata is outputting profiles that have two consecutive entries in the MemOPSize section for the value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch instruction with two cases for 0. As mentioned, we’re not quite sure what’s causing this to happen, but this patch prevents llvm-profdata from outputting a profile that has this problem and gives an error with a request for a reproducible. Differential Revision: https://reviews.llvm.org/D92074	2021-02-23 12:51:54 -08:00
Jay Foad	a6be26710b	[GlobalISel] Make more use of replaceSingleDefInstWithReg. NFC.	2021-02-23 17:08:34 +00:00
Juneyoung Lee	19c2e12947	[JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns This allows JumpThreading's computeValueKnownInPredecessors to recognize select form of and/or patterns as well.	2021-02-24 00:06:10 +09:00
Nate Chandler	01b4890e47	Add @llvm.coro.async.size.replace intrinsic. The new intrinsic replaces the size in one specified AsyncFunctionPointer with the size in another. This ability is necessary for functions which merely forward to async functions such as those defined for partial applications. Reviewed By: aschwaighofer Differential Revision: https://reviews.llvm.org/D97229	2021-02-23 06:43:52 -08:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately, this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Alexey Lapshin	875b3b2cdd	[Support] Add reserve() method to the raw_ostream. If the resulting size of the output stream is already known, the space for stream data can be preallocated in some cases. For example, raw_string_ostream could preallocate the space for the target string, which avoids reallocations while writing into the stream. Differential Revision: https://reviews.llvm.org/D91693	2021-02-23 14:06:38 +03:00
Andy Wingo	7dc98adbb0	Revert "[WebAssembly] call_indirect issues table number relocs" This reverts commit `861dbe1a02`. It broke emscripten -- see https://reviews.llvm.org/D90948#2578843.	2021-02-23 11:48:08 +01:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Lang Hames	430817d0d5	[JITLink] Add a getFixupAddress convenience method to Block.	2021-02-23 11:08:54 +11:00
Lang Hames	adf2098bd8	[JITLink] Don't allow creation of sections with duplicate names.	2021-02-23 11:08:54 +11:00
Heejin Ahn	a08e609d2e	[WebAssembly] Rename methods in WasmEHFuncInfo (NFC) This renames variable and method names in `WasmEHFuncInfo` class to be simpler and clearer. For example, unwind destinations are EH pads by definition so it doesn't necessarily need to be included in every method name. Also I am planning to add the reverse mapping in a later CL, something like `UnwindDestToSrc`, so this renaming will make meanings clearer. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97173	2021-02-22 12:16:11 -08:00
Leonard Chan	1c932baeaa	[llvm][Bitcode] Add bitcode reader/writer for DSOLocalEquivalent This is necessary for compilation with [thin]lto. Differential Revision: https://reviews.llvm.org/D96170	2021-02-22 10:37:57 -08:00
Nikita Popov	5e7e499b91	[JumpThreading] Clone noalias.scope.decl when threading blocks When cloning instructions during jump threading, also clone and adapt any declared scopes. This is primarily important when threading loop exits, because we'll end up with two dominating scope declarations in that case (at least after additional loop rotation). This addresses a loose thread from https://reviews.llvm.org/rG2556b413a7b8#975012. Differential Revision: https://reviews.llvm.org/D97154	2021-02-22 18:35:30 +01:00
Ryan Santhiraraja	2c25efcbd3	[AArch64] Adding SHA3 Intrinsics support This patch adds the following SHA3 Intrinsics: vsha512hq_u64, vsha512h2q_u64, vsha512su0q_u64, vsha512su1q_u64 veor3q_u8 veor3q_u16 veor3q_u32 veor3q_u64 veor3q_s8 veor3q_s16 veor3q_s32 veor3q_s64 vrax1q_u64 vxarq_u64 vbcaxq_u8 vbcaxq_u16 vbcaxq_u32 vbcaxq_u64 vbcaxq_s8 vbcaxq_s16 vbcaxq_s32 vbcaxq_s64 Note need to include +sha3 and +crypto when building from the front-end Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96381	2021-02-22 12:09:20 +00:00
Andy Wingo	861dbe1a02	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via `TABLE_NUMBER` relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. We abuse the used-in-reloc flag on symbols to indicate which tables should end up in the symbol table. We do this because unfortunately older wasm-ld will carp if it sees a table symbol. Differential Revision: https://reviews.llvm.org/D90948	2021-02-22 10:13:36 +01:00
Kazu Hirata	5032b5890b	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-21 19:58:05 -08:00
madhur13490	5fe23de5db	[NFC] Remove redundant word in comment Differential Revision: https://reviews.llvm.org/D97157	2021-02-21 18:04:20 +00:00
Nikita Popov	e0615bcd39	[Loads] Add optimized FindAvailableLoadedValue() overload (NFCI) FindAvailableLoadedValue() accepts an iterator by reference. If no available value is found, then the iterator will either be left at a clobbering instruction or the beginning of the basic block. This allows using FindAvailableLoadedValue() across multiple blocks. If this functionality is not needed, as is the case in InstCombine, then we can use a much more efficient implementation: First try to find an available value, and only perform clobber checks if we actually found one. As this function only looks at a very small number of instructions (6 by default) and usually doesn't find an available value, this saves many expensive alias analysis queries.	2021-02-21 18:42:56 +01:00
Juneyoung Lee	aacf7878bc	[ValueTracking] Improve impliesPoison This patch improves ValueTracking's impliesPoison(V1, V2) to do this reasoning: ``` %res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b) %overflow = extractvalue { i64, i1 } %res, 1 %mul = extractvalue { i64, i1 } %res, 0 ; If %mul is poison, %overflow is also poison, and vice versa. ``` This improvement leads to supporting this optimization under `-instcombine-unsafe-select-transform=0`: ``` define i1 @test2_logical(i64 %a, i64 %b, i64* %ptr) { ; CHECK-LABEL: @test2_logical( ; CHECK-NEXT: [[MUL:%.]] = mul i64 [[A:%.]], [[B:%.]] ; CHECK-NEXT: [[TMP1:%.]] = icmp ne i64 [[A]], 0 ; CHECK-NEXT: [[TMP2:%.]] = icmp ne i64 [[B]], 0 ; CHECK-NEXT: [[OVERFLOW_1:%.]] = and i1 [[TMP1]], [[TMP2]] ; CHECK-NEXT: [[NEG:%.]] = sub i64 0, [[MUL]] ; CHECK-NEXT: store i64 [[NEG]], i64 [[PTR:%.]], align 8 ; CHECK-NEXT: ret i1 [[OVERFLOW_1]] ; %res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b) %overflow = extractvalue { i64, i1 } %res, 1 %mul = extractvalue { i64, i1 } %res, 0 %cmp = icmp ne i64 %mul, 0 %overflow.1 = select i1 %overflow, i1 true, i1 %cmp %neg = sub i64 0, %mul store i64 %neg, i64 %ptr, align 8 ret i1 %overflow.1 } ``` Previously, this didn't happen because the flag prevented `select i1 %overflow, i1 true, i1 %cmp` from being `or i1 %overflow, %cmp`. Note that the select -> or conversion happens only when `impliesPoison(%cmp, %overflow)` returns true. This improvement allows `impliesPoison` to do the reasoning. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96929	2021-02-20 13:22:34 +09:00
Craig Topper	baab797878	[ValueTypes] Assert if changeVectorElementType is called on a simple type with an extended element type. Previously we would use the extended implementation, but the extended implementation requires the vector type to be extended so that we can access the LLVMContext. In theory we could detect this case and use the context from the element type instead, but since I know of no cases hitting this in practice today I've done the simplest thing. Also add asserts to several extended EVT functions that assume LLVMTy is non-null. Follow from discussion in D97036 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D97070	2021-02-19 17:30:46 -08:00
Mircea Trofin	82492f24ff	[NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit VirtRegAuxInfo is an extensibility point, so the register allocator's decision on which implementation to use should be communicated to the other users - namely, LiveRangeEdit. Differential Revision: https://reviews.llvm.org/D96898	2021-02-19 07:44:28 -08:00
Simon Pilgrim	aa44815f84	Remove unnecessary "using namespace llvm" inside "namespace llvm". NFCI.	2021-02-19 11:15:16 +00:00
Nikita Popov	370addb996	[IR] Move willReturn() to Instruction This moves the willReturn() helper from CallBase to Instruction, so that it can be used in a more generic manner. This will make it easier to fix additional passes (ADCE and BDCE), and will give us one place to change if additional instructions should become non-willreturn (e.g. there has been talk about handling volatile operations this way). I have also included the IntrinsicInst workaround directly in here, so that it gets applied consistently. (As such this change is not entirely NFC -- FuncAttrs will now use this as well.) Differential Revision: https://reviews.llvm.org/D96992	2021-02-19 11:56:01 +01:00
Djordje Todorovic	1a2b3536ef	Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info" As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from the [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is very initial stage of the implementation, there is a space for improvements such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545 The test that was failing is now forced to use the old PM.	2021-02-18 23:29:22 -08:00
Serge Pavlov	2c4f60e45b	[FPEnv][AArch64] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96836	2021-02-19 13:16:51 +07:00
Lang Hames	0469256d35	[ORC] Print CPU feature string in JITTargetMachineBuilder debugging output.	2021-02-19 15:18:19 +11:00
Konstantin Zhuravlyov	71d1f785a5	AMDGPU/ELF: Sort MACHs by value and add missing reserved MACHs - Sort MACHs by its value - Add missing reserved MACHs - EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3D - EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3E Differential Revision: https://reviews.llvm.org/D97010	2021-02-18 20:46:27 -05:00
Wei Mi	5fb65c02ca	[SampleFDO] Stop repeated indirect call promotion for the same target. Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundant, which can harm performance and cause excessive compile time in some extreme cases. The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call. The fix brings 0.1~0.2% performance on our search benchmark. Differential Revision: https://reviews.llvm.org/D96806	2021-02-18 17:01:32 -08:00
Leonard Chan	c77659e549	[llvm][IR] Do not place constants with static relocations in a mergeable section This patch provides two major changes: 1. Add getRelocationInfo to check if a constant will have static, dynamic, or no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.) 2. Only allow a constant with no relocations (static or dynamic) to be placed in a mergeable section. This will allow unused symbols that contain static relocations and happen to fit in mergeable constant sections (.rodata.cstN) to instead be placed in unique-named sections if -fdata-sections is used and subsequently garbage collected by --gc-sections. See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html. Differential Revision: https://reviews.llvm.org/D95960	2021-02-18 15:39:00 -08:00
Petr Hosek	5fbd1a333a	[Coverage] Store compilation dir separately in coverage mapping We currently always store absolute filenames in coverage mapping. This is problematic for several reasons. It poses a problem for distributed compilation as source location might vary across machines. We are also duplicating the path prefix, potentially wasting space. This change modifies how we store filenames in coverage mapping. Rather than absolute paths, it stores the compilation directory and file paths as given to the compiler, either relative or absolute. Later when reading the coverage mapping information, we recombine relative paths with the working directory. This approach is similar to handling of DW_AT_comp_dir in DWARF. Finally, we also provide a new option, -fprofile-compilation-dir akin to -fdebug-compilation-dir which can be used to manually override the compilation directory which is useful in distributed compilation cases. Differential Revision: https://reviews.llvm.org/D95753	2021-02-18 14:34:39 -08:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results: aa.NumMayAlias \| 5010069 \| 5009861 aa.NumMustAlias \| 347518 \| 347674 aa.NumNoAlias \| 27201336 \| 27201528 ... licm.NumPromoted \| 1293 \| 1296 I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Petr Hosek	fbf8b957fd	Revert "[Coverage] Store compilation dir separately in coverage mapping" This reverts commit `97ec8fa5bb` since the test is failing on some bots.	2021-02-18 12:50:24 -08:00
Petr Hosek	97ec8fa5bb	[Coverage] Store compilation dir separately in coverage mapping We currently always store absolute filenames in coverage mapping. This is problematic for several reasons. It poses a problem for distributed compilation as source location might vary across machines. We are also duplicating the path prefix, potentially wasting space. This change modifies how we store filenames in coverage mapping. Rather than absolute paths, it stores the compilation directory and file paths as given to the compiler, either relative or absolute. Later when reading the coverage mapping information, we recombine relative paths with the working directory. This approach is similar to handling of DW_AT_comp_dir in DWARF. Finally, we also provide a new option, -fprofile-compilation-dir akin to -fdebug-compilation-dir which can be used to manually override the compilation directory which is useful in distributed compilation cases. Differential Revision: https://reviews.llvm.org/D95753	2021-02-18 12:27:42 -08:00
Sam Powell	eb2eeeb76f	[llvm][TextAPI] add equality operator for InterfaceFile This patch adds functionality to compare for the equality between `InterfaceFile`s based on attributes specific to linking. Reviewed By: cishida, steven_wu Differential Revision: https://reviews.llvm.org/D96629	2021-02-18 11:53:08 -08:00
Ta-Wei Tu	f70cdc5b5c	[NPM] Properly reset parent loop after loop passes This fixes https://bugs.llvm.org/show_bug.cgi?id=49185 When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`. If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes. However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes. This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D96727	2021-02-19 02:50:53 +08:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Paul C. Anagnostopoulos	49d663d546	Revert "[TableGen] Improve algorithms for processing template arguments" This reverts commit e589207d5aaee6cbf1d7c7de8867a17727d14aca.	2021-02-18 09:26:26 -05:00
Paul C. Anagnostopoulos	d248cce44e	[TableGen] Improve algorithms for processing template arguments Rework template argument checking so that all arguments are type-checked and cast if necessary. Add a test. Differential Revision: https://reviews.llvm.org/D96416	2021-02-18 09:15:26 -05:00
Djordje Todorovic	c1e23894fc	Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info" This reverts rG8ee7c7e02953. One test is failing, I'll reland this as soon as possible.	2021-02-18 02:04:27 -08:00
Djordje Todorovic	8ee7c7e029	[Debugify] Make the debugify aware of the original (-g) Debug Info As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from the [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is very initial stage of the implementation, there is a space for improvements such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545	2021-02-18 01:52:16 -08:00
Chen Zheng	4c23707a41	[XCOFF][NFC] make StorageMappingClass/SymbolType member optional This patch makes StorageMappingClass/SymbolType member optional in class MCSectionXCOFF. Non-csect sections like debug sections have no such properties. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D96641	2021-02-18 04:46:05 -05:00
Fangrui Song	018a484cd2	[llvm-objdump] Map STT_TLS to ST_Other (previously ST_Data) ST_Data is used to model BFD `BFD_OBJECT`. A STT_TLS symbol does not have the `BFD_OBJECT` flag in BFD. This makes sense because a STT_TLS symbol effectively lives in a different address space, so normal data/object properties do not apply to it. With this change, a STT_TLS symbol will not be displayed as 'O'. This new behavior matches objdump. Differential Revision: https://reviews.llvm.org/D96735	2021-02-17 23:17:20 -08:00
Chen Zheng	5517923b1c	[XCOFF][NFC] make csect properties optional for getXCOFFSection We are going to support debug sections for XCOFF. So the csect properties are not necessary. This patch makes these properties optional. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D95931	2021-02-17 20:51:42 -05:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Rahman Lavaee	0252e6ead1	[obj2yaml,yaml2obj] Add NumBlocks to the BBAddrMapEntry yaml field. As discussed in D95511, this allows us to encode invalid BBAddrMap sections to be used in more rigorous testing. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D96831	2021-02-17 15:45:13 -08:00
Rong Xu	7397905ab0	[SampleFDO] Third Try: Refactor SampleProfile.cpp Apply the patch for the third time after fixing buildbot failures. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. (4) Add inline keyword to avoid duplicated symbols -- they will be removed later when the class is changed to a template. Differential Revision: https://reviews.llvm.org/D96455	2021-02-17 15:31:50 -08:00
Jessica Paquette	60aa646441	[GlobalISel] Add G_ASSERT_SEXT This adds a G_ASSERT_SEXT opcode, similar to G_ASSERT_ZEXT. This instruction signifies that an operation was already sign extended from a smaller type. This is useful for functions with sign-extended parameters. E.g. ``` define void @foo(i16 signext %x) { ... } ``` This adds verifier, regbankselect, and instruction selection support for G_ASSERT_SEXT equivalent to G_ASSERT_ZEXT. Differential Revision: https://reviews.llvm.org/D96890	2021-02-17 13:10:34 -08:00
Vedant Kumar	c28622fbf3	Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455" This reverts commit `c73cbf218a`. Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)" This reverts commit `a23e6b321c`. Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" This reverts commit `6fd5ccff72`. Still seeing link failures when building llc (or other tools), due to the new SampleProfileLoaderBaseImpl.h containing definitions that get duplicated across multiple TU's. ``` duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const) const' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 
'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const, llvm::BasicBlock const*>)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) ```	2021-02-17 10:22:24 -08:00
Vedant Kumar	c73cbf218a	[SampleFDO] Add missing #includes to unbreak modules build after D96455 Bot: http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/28999 ``` /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:124:19: error: missing '#include "llvm/Analysis/PostDominators.h"'; 'PostDominatorTree' must be declared before it is used std::unique_ptr<PostDominatorTree> PDT; ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Analysis/PostDominators.h:28:7: note: declaration here is not visible class PostDominatorTree : public PostDomTreeBase<BasicBlock> { ^ While building module 'LLVM_Transforms' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Transforms/CFGuard/CFGuard.cpp:15: In file included from <module-includes>:191: /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:125:19: error: missing '#include "llvm/Analysis/LoopInfo.h"'; 'LoopInfo' must be declared before it is used std::unique_ptr<LoopInfo> LI; ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Analysis/LoopInfo.h:1079:7: note: declaration here is not visible class LoopInfo : public LoopInfoBase<BasicBlock, Loop> { ^ While building module 'LLVM_Transforms' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Transforms/CFGuard/CFGuard.cpp:15: In file included from <module-includes>:191: /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:149:3: error: missing '#include "llvm/Analysis/OptimizationRemarkEmitter.h"'; 'OptimizationRemarkEmitter' must be declared before it is used OptimizationRemarkEmitter *ORE = nullptr; ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h:33:7: note: declaration here 
is not visible class OptimizationRemarkEmitter { ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Transforms/CFGuard/CFGuard.cpp:15:10: fatal error: could not build module 'LLVM_Transforms' ```	2021-02-17 10:02:22 -08:00
Marianne Mailhot-Sarrasin	f0ec9f1bb3	[Pipeliner] Fixed optimization remarks and debug dumps Initiation Interval value The II value was incremented before exiting the loop, and therefore, when used in the optimization remarks and debug dumps, it did not reflect the initiation interval actually used in Schedule. Differential Revision: https://reviews.llvm.org/D95692	2021-02-17 12:28:37 -05:00
William S. Moses	40862b1a74	[SROA] Propagate correct TBAA/TBAA Struct offsets SROA does not correctly account for offsets in TBAA/TBAA struct metadata. This patch creates functionality for generating new MD with the corresponding offset and updates SROA to use this functionality. Differential Revision: https://reviews.llvm.org/D95826	2021-02-17 11:59:00 -05:00
Ta-Wei Tu	0eeaec2a6d	[NFC] Refactor LoopInterchange into a loop-nest pass This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change. Changes that are not loop-nest related are split to D96650. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D96644	2021-02-18 00:55:38 +08:00
Andrew Savonichev	4bee0dc918	[NFC] Use the same type for bit fields in MCSchedClassDesc Otherwise they are not allocated as a single bit field and take 4 bytes instead of 2. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D95954	2021-02-17 15:54:22 +03:00
Igor Kudrin	aa84289629	[DebugInfo] Keep the DWARF64 flag in the module metadata This allows the option to affect the LTO output. Module::Max helps to generate debug info for all modules in the same format. Differential Revision: https://reviews.llvm.org/D96597	2021-02-17 17:03:34 +07:00
Sam McCall	9ebc837f55	[ADT] Add SFINAE guards to unique_function constructor. We can't construct a working unique_function from an object that's not callable with the right types, so don't allow deduction to succeed. This avoids some ambiguous conversion cases, e.g. allowing to overload on different unique_function types, and to conversion operators to unique_function. std::function and the any_invocable proposal have these. This was added to llvm::function_ref in D88901 and followups Differential Revision: https://reviews.llvm.org/D96794	2021-02-17 10:36:07 +01:00
Yang Fan	a23e6b321c	[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC) MSVC warning: ``` SampleProfileLoaderBaseImpl.h(41): warning C4515: 'llvm': namespace uses itself ```	2021-02-17 15:27:30 +08:00
Kazu Hirata	2620459baa	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-16 23:23:07 -08:00
Rong Xu	6fd5ccff72	[SampleFDO] Reapply: Refactor SampleProfile.cpp Reapply patch after fixing buildbot failure. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 16:43:21 -08:00
Sriraman Tallam	d1a838babc	Basic block sections should enable function sections implicitly. Basic block sections enables function sections implicitly, this is not needed and is inefficient with "=list" option. We had basic block sections enable function sections implicitly in clang. This is particularly inefficient with "=list" option as it places functions that do not have any basic block sections in separate sections. This causes unnecessary object file overhead for large applications. This patch disables this implicit behavior. It only creates function sections for those functions that require basic block sections. Further, there was inconsistent behavior with llc, as llc was not turning on function sections by default. This patch makes llc and clang consistent and tests are added to check the new behavior. This is the first of two patches and this adds functionality in LLVM to create a new section for the entry block if function sections is not enabled. Differential Revision: https://reviews.llvm.org/D93876	2021-02-16 16:27:16 -08:00
Petr Hosek	16af973933	[MC][ELF] Support for zero flag section groups This change introduces support for zero flag ELF section groups to LLVM. LLVM already supports COMDAT sections, which in ELF are a special type of ELF section groups. These are generally useful to enable linker GC where you want a group of sections to always travel together, that is to be either retained or discarded as a whole, but without the COMDAT semantics. Other ELF assemblers already support zero flag ELF section groups and this change helps us reach feature parity. Differential Revision: https://reviews.llvm.org/D95851	2021-02-16 14:23:40 -08:00
Mehdi Amini	c761fe77bd	Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp" This reverts commit `310b35304c`. The build is broken with -DBUILD_SHARED_LIBS=ON : lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const, llvm::ProfileSummaryInfo, bool)': SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const' SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const' ...	2021-02-16 22:11:42 +00:00
David Blaikie	c3120291f4	Effectively revert `ba2aa5f49e` since the object isn't destroyed polymorphically	2021-02-16 13:45:25 -08:00
David Blaikie	f8af06d60d	Fix -Wnon-virtual-dtor by making the ctor protected	2021-02-16 13:38:28 -08:00
Kazu Hirata	ba2aa5f49e	[SampleFDO] Provide a virtual destructor for SampleProfileLoaderBaseImpl This patch fixes a warning: llvm-project/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h:69:7: error: 'llvm::SampleProfileLoaderBaseImpl' has virtual functions but non-virtual destructor [-Werror,-Wnon-virtual-dtor] Differential Revision: https://reviews.llvm.org/D96810	2021-02-16 13:17:33 -08:00
Rong Xu	310b35304c	[SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 11:18:21 -08:00
Michael Kruse	6c05005238	[OpenMP] Implement '#pragma omp tile', by Michael Kruse (@Meinersbur). The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard. This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult. A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the number of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loops are not defined as syntactic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead of all at once. I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest). Differential Revision: https://reviews.llvm.org/D76342	2021-02-16 09:45:07 -08:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Sander de Smalen	00fe10c6a6	[SCEVExpander] Migrate costAndCollectOperands to use InstructionCost. This patch changes costAndCollectOperands to use InstructionCost for accumulated cost values. isHighCostExpansion will return true if the cost has exceeded the budget. Reviewed By: CarolineConcatto, ctetreau Differential Revision: https://reviews.llvm.org/D92238	2021-02-16 09:27:34 +00:00
Sameer Sahasrabuddhe	11bf7da64a	[NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis" since there is no conflict with LegacyDivergenceAnalysis. In the legacy PM, this analysis can only be used through the legacy DA serving as a wrapper. It is now made available as a pass in the new PM, and has no relation with the legacy DA. The new DA currently cannot handle irreducible control flow; its presence can cause the analysis to run indefinitely. The analysis is now modified to detect this and report all instructions in the function as divergent. This is super conservative, but allows the analysis to be used without hanging the compiler. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D96615	2021-02-16 10:26:45 +05:30
Kazu Hirata	f0d5898f93	[Support] Use ListSeparator (NFC)	2021-02-15 14:46:09 -08:00
Kazu Hirata	c82cd5e54e	[LazyCallGraph] Remove forward declarations of nonexistent classes (NFC)	2021-02-15 14:46:07 -08:00
Matt Arsenault	392e0fcfd1	GlobalISel: Handle arguments partially passed on the stack The API is a bit awkward since you need to index into an array in the passed struct. I guess an alternative would be to pass all of the individual fields.	2021-02-15 17:06:14 -05:00
Matt Arsenault	1b3d8ddeb9	CodeGen: Move function to get subregister indexes to cover a LaneMask Return the best covering index, and additional needed to complete the mask. This logically belongs in TargetRegisterInfo, although I ended up not needing it for why I originally split this out.	2021-02-15 17:05:37 -05:00
Craig Topper	eb75f250fe	[RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. Something similar was done for BSWAP previously. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96681	2021-02-15 12:33:16 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by other various changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to do with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programmatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created? 
- Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. 
That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangesOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
Caroline Concatto	b52e6c5891	[CostModel]Add cost model for experimental.vector.reverse This patch uses the function getShuffleCost with SK_Reverse to compute the cost for experimental.vector.reverse. For scalable vector types, it adds a table with the legal types to AArch64TTIImpl::getShuffleCost so as not to assert in BasicTTIImpl::getShuffleCost, and for fixed vectors, it relies on the existing cost model in BasicTTIImpl. Depends on D94883 Differential Revision: https://reviews.llvm.org/D95603	2021-02-15 14:23:57 +00:00
Kerry McLaughlin	5fe1593438	[LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax Currently, setting the `no-nans-fp-math` attribute to true will allow loops with fmin/fmax to vectorize, though we should be requiring that `no-signed-zeros-fp-math` is also set. This patch adds the check for no-signed-zeros at the function level and includes tests to make sure we don't vectorize functions with only one of the attributes associated. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D96604	2021-02-15 13:47:05 +00:00
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reverse that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vector types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Marco Antognini	e54811ff7e	Restore diagnostic handler after CodeGenAction::ExecuteAction Fix dangling pointer to local variable and address some typos. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D96487	2021-02-15 10:33:00 +00:00
Florian Hahn	c70737ba1d	Recommit "[LTO] Use lto::backend for code generation." This version of the patch includes a fix for the cfi failures. (undoes the revert commit `7db390cc77`) It also undoes reverts of follow-up patches that also needed reverting originally: * [LTO] Add option enable NewPM with LTOCodeGenerator. (undoes revert commit `0a17664b47`) * [LTOCodeGenerator] Use lto::Config for options (NFC). (undoes revert commit `b0a8e41cff`)	2021-02-15 10:05:42 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Include an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduce a new machine function pass to insert and collect symbols at the start of each block, and 3) Combine these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Carl Ritson	aef781b47a	[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic Add intrinsic which demotes all active lanes to helper lanes. This is used to implement demote to helper Vulkan extension. In practice demoting a lane to helper simply means removing it from the mask of live lanes used for WQM/WWM/Exact mode. Where the shader does not use WQM, demotes just become kills. Additionally add llvm.amdgcn.live.mask intrinsic to complement demote operations. In theory llvm.amdgcn.ps.live can be used to detect helper lanes; however, ps.live can be moved by LICM. The movement of ps.live cannot be remedied without changing its type signature and such a change would require ps.live users to update as well. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94747	2021-02-15 08:45:46 +09:00
Cassie Jones	36246388ba	[GlobalISel] Extract a narrowScalarAddSub method. NFC Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D95426	2021-02-14 18:06:32 -05:00
cynecx	656ead1fb7	[llvm/Support] Add SHA256 implementation Adds an unaudited SHA-256 implementation to `llvm/Support`. The ongoing lld-macho effort needs this to emit an adhoc code signature for macho files on macOS Big Sur. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D96540	2021-02-14 19:01:01 +00:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Kazu Hirata	1cc558bd4f	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-14 08:36:18 -08:00
Nikita Popov	728803ed74	[BasicAA] Use index difference to detect GEPs with identical indexes We currently detect GEPs that have exactly the same indexes by comparing the Offsets and VarIndices. However, the latter implicitly performs equality comparisons between two values, which is not generally legal inside BasicAA, due to the possibility of comparisons across phi cycles. I believe that in this particular instance this actually ends up being unproblematic, at least I wasn't able to come up with any cases that could result in an incorrect root query result. In the interest of being defensive, compute GetIndexDifference earlier (which knows how to handle phi cycles properly) and use the result of that to determine whether the offsets are identical.	2021-02-14 17:11:03 +01:00
aqjune	5f3c99085d	[ValueTracking] Dereferenced pointers are noundef This is a follow-up of D95238's LangRef update. This patch updates `programUndefinedIfUndefOrPoison(V)` to return true if `V` is used by any memory-accessing instruction. Interestingly, this affected many tests in Attributors, mainly about adding noundefs. The tests are updated using llvm/utils/update_test_checks.py. I checked that the diffs are about updating noundefs. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96642	2021-02-14 22:50:48 +09:00
Kazu Hirata	dfa3ead01e	[Analysis] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-02-13 20:41:38 -08:00
Nikita Popov	f515ca8995	[IRBuilder] Remove Align-related deprecated APIs This removes IRBuilder methods accepting unsigned alignments in favor of their Align/MaybeAlign variants. These methods have been deprecated for more than a year at this point, so they should be safe to remove.	2021-02-13 16:42:37 +01:00
Tyker	642e9225c6	reland [InstCombine] convert assumes to operand bundles Instcombine will convert the nonnull and alignment assumptions that use the boolean condition to assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-13 13:03:11 +01:00
Wei Wang	80dc0661bd	[LTO] Perform DSOLocal propagation in combined index Perform DSOLocal propagation within summary list of every GV. This avoids the repeated query of this information during function importing. Differential Revision: https://reviews.llvm.org/D96398	2021-02-12 22:58:26 -08:00
Jian Cai	c2a84771bb	[llvm-objcopy] preserve file ownership when overwritten by root As of binutils 2.36, GNU strip calls chown(2) for "sudo strip foo" and "sudo strip foo -o foo", but not "sudo strip foo -o bar" or "sudo strip foo -o ./foo". In other words, while "sudo strip foo -o bar" creates a new file bar with root access, "sudo strip foo" will keep the owner and group of foo unchanged. Currently llvm-objcopy and llvm-strip behave differently, always changing the owner and group to root. The discrepancy prevents Chrome OS from migrating to llvm-objcopy and llvm-strip as they change file ownership and cause intended users/groups to lose access when invoked by sudo with the following sequence (recommended in man page of GNU strip). 1.<Link the executable as normal.> 2.<Copy "foo" to "foo.full"> 3.<Run "strip --strip-debug foo"> 4.<Run "objcopy --add-gnu-debuglink=foo.full foo"> This patch makes llvm-objcopy and llvm-strip follow GNU's behavior. Link: crbug.com/1108880	2021-02-12 18:01:43 -08:00
James Y Knight	8bd8534aa3	LLVM-C: Allow LLVM{Get/Set}Alignment on an atomicrmw/cmpxchg instruction. (Now that these can have alignment specified.)	2021-02-12 18:31:18 -05:00
Nikita Popov	191e469ede	[AA] Move Depth member from AAResults to AAQI (NFC) Rather than storing the query depth in AAResults, store it in AAQI. This makes more sense, as it is a property of the query. This sidesteps the issue of D94363, fixing slightly inaccurate AA statistics. Additionally, I plan to use the Depth from BasicAA in the future, where fetching it from AAResults would be unreliable. This change is not quite as straightforward as it seems, because we need to preserve the depth when creating a new AAQI for recursive queries across phis. I'm adding a new method for this, as we may need to preserve additional information here in the future.	2021-02-12 21:42:36 +01:00
Jessica Paquette	145549ff89	[GlobalISel] Combine (x + 0) -> x, G_PTR_ADD edition Add it to right_identity_zero. Differential Revision: https://reviews.llvm.org/D96621	2021-02-12 12:09:48 -08:00
Amara Emerson	5d6d9b63a3	[GlobalISel] Propagate extends through G_PHIs into the incoming value blocks. This combine tries to do inter-block hoisting of extends of G_PHIs, into the originating blocks of the phi's incoming value. The idea is to expose further optimization opportunities that are normally obscured by the PHI. Some basic heuristics, and a target hook for AArch64 is added, to allow tuning. E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine since it may be folded into the addressing mode during selection. There are very minor code size improvements on AArch64 -Os, but the real benefit is that it unlocks optimizations like AArch64 conditional compares on some benchmarks. Differential Revision: https://reviews.llvm.org/D95703	2021-02-12 11:52:52 -08:00
Scott Linder	12999d749d	[Symbolize] Teach symbolizer to work directly on object file. This patch is intended to provide an additional interface to LLVMSymbolizer so that it can work directly on object files. There is an existing method, symbolizeCode, which takes an object file; this patch provides similar overloads for symbolizeInlinedCode, symbolizeData, and symbolizeFrame. This can be useful for clients who already have in-memory object files to symbolize. Patch By: pvellien (praveen velliengiri) Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95232	2021-02-12 18:26:24 +00:00
Arnold Schwaighofer	e760ec2a01	[coro] Add support for polymorphic return typed coro.suspend.async This allows for suspend point specific resume function types. Return values from a suspend point can therefore be modelled as arguments to the resume function. Allowing for directly passed return types. Differential Revision: https://reviews.llvm.org/D96136	2021-02-12 10:08:00 -08:00
Lukas Sommer	6577cef9b0	[CodeGen] New pass: Replace vector intrinsics with call to vector library This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM intrinsics operating on vector operands) with calls to a vector library. Currently, calls to LLVM intrinsics are only replaced with calls to vector libraries when scalar calls to intrinsics are vectorized by the Loop- or SLP-Vectorizer. With this pass, it is now possible to replace calls to LLVM intrinsics already operating on vector operands, e.g., if such code was generated by MLIR. For the replacement, information from the TargetLibraryInfo, e.g., as specified via -vector-library is used. This is a re-try of the original commit `2303e93e66` that was reverted due to pass manager problems. Other minor changes have also been made. Differential Revision: https://reviews.llvm.org/D95373	2021-02-12 12:53:27 -05:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
David Sherwood	01b87444cb	[NFC][Analysis] Change struct VecDesc to use ElementCount This patch changes the VecDesc struct to use ElementCount instead of an unsigned VF value, in preparation for future work that adds support for vectorized versions of math functions using scalable vectors. Since all I'm doing in this patch is switching the type I believe it's a non-functional change. I changed getWidestVF to now return both the widest fixed-width and scalable VF values, but currently the widest scalable value will be zero. Differential Revision: https://reviews.llvm.org/D96011	2021-02-12 11:07:58 +00:00
Vitaly Buka	fc05b2d9e5	[NFC][ProfileData] Improve language	2021-02-12 02:55:58 -08:00
David Sherwood	9700228abc	[Analysis] Change VFABI::mangleTLIVectorName to use ElementCount Adds support for mangling TLI vector names for scalable vectors. Differential Revision: https://reviews.llvm.org/D96338	2021-02-12 09:38:12 +00:00
Sander de Smalen	1d42ba254f	[BasicTTIImpl] Fix getCastInstrCost for scalable vectors by querying for ElementCount. This fixes an overly restrictive assumption that the vector is a FixedVectorType, in code that tries to calculate the cost of a cast operation when splitting a too-wide vector. The algorithm works the same for scalable vectors, so this patch removes the cast<FixedVectorType>. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96253	2021-02-12 08:28:52 +00:00
Sander de Smalen	63d787e5d4	[CostModel] An extending load to illegal type is not free. COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when <4 x i64> is an illegal result type that requires splitting of the operation. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96250	2021-02-12 07:59:21 +00:00
Pengxuan Zheng	61cca0f2e5	[AArch64] Adding Neon Sm3 & Sm4 Intrinsics This adds SM3 and SM4 Intrinsics support for AArch64, specifically: vsm3ss1q_u32 vsm3tt1aq_u32 vsm3tt1bq_u32 vsm3tt2aq_u32 vsm3tt2bq_u32 vsm3partw1q_u32 vsm3partw2q_u32 vsm4eq_u32 vsm4ekeyq_u32 Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D95655	2021-02-11 14:20:20 -08:00
Hongtao Yu	de40f6d623	[CSSPGO] Process functions in a top-down order on a dynamic call graph. Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining. The profile top-down order for SCC is also extended to support non-CS profiles. Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95988	2021-02-11 12:36:59 -08:00
Stanislav Mekhanoshin	8151c1b442	Move implementation of isAssumeLikeIntrinsic into IntrinsicInst This is to remove the dependency on ValueTracking in a future patch. Differential Revision: https://reviews.llvm.org/D96079	2021-02-11 11:41:34 -08:00
Michael Kruse	606aa622b2	Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache" This reverts commit `b7d870eae7` and the subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)" (commit `e6810cab09`). It caused indeterminism in the output, such that e.g. the polly-x86_64-linux buildbot failed occasionally.	2021-02-11 12:17:38 -06:00
Alex Hoppen	7e3b9aba60	[Timer] On macOS count number of executed instructions In addition to wall time etc. this should allow us to get less noisy values for time measurements. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D96049	2021-02-11 17:26:37 +01:00
Sander de Smalen	703130fb01	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
Duncan P. N. Exon Smith	fa35c1f80f	ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and `MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more precisely describe its effect and clarify the header documentation. Found this while helping to investigate PR48841, which pointed out an unsound use of the flag in `CloneModule()`. For now I've just added a FIXME there, but I'm hopeful that the new (more precise) name will prevent other similar errors.	2021-02-10 16:53:21 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of the blocking is intentional (such as blocking code merge) for better counts quality while the rest is accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e., opeq transform 4. MIR function argument copy elision 5. IR stack protection (not perf-critical, but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Arthur Eubanks	5d960cba34	[opt][NewPM] Add a --print-passes flag to print all available passes It seems nicer to list passes given a flag rather than displaying all passes in opt --help. This is awkwardly structured because a PassBuilder is required, but reusing the PassBuilder in runPassPipeline() doesn't work because we read the input IR before getting to runPassPipeline(). So printing the list of passes needs to happen before reading the input IR. If we remove the legacy PM code in main() and move everything from NewPMDriver.cpp into opt.cpp, we can create the PassBuilder before reading IR and check if we should print the list of passes and exit. But until then this hack seems fine. Compared to the legacy PM, the new PM passes are lacking descriptions. We'll need to figure out a way to add descriptions if we think this is important. Also, this only works for passes specified in PassRegistry.def. If we want to print other custom registered passes, we'll need a different mechanism. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D96101	2021-02-10 11:22:12 -08:00
Jeremy Morse	1d68e0a075	Reland [DWARF] Location-less inlined variables should not have DW_TAG_variable Originally landed in `ddc2f1e3fb` and reverted in `d32deaab4d` because of a Generic test objecting. That was fixed up in `013613964f`. Original landing commit message follows: [DWARF] Location-less inlined variables should not have DW_TAG_variable Discussed in this thread: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html DwarfDebug::collectEntityInfo accidentally distinguishes between variable locations that never have a location specified, and variable locations that have an empty location specified. The latter leads to the creation of an empty variable referring to the abstract origin. Fix this by seeking a non-empty location before producing a concrete entity, to guarantee a DW_AT_location will be produced. Other loops in collectEntityInfo and endFunctionImpl take care of examining the retainedNodes collection and ensuring optimised-out variables are created. Differential Revision: https://reviews.llvm.org/D95617	2021-02-10 15:40:47 +00:00
Sander de Smalen	750a78cd5d	[ValueTypes] Add MVT for nxv1bf16. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96249	2021-02-10 08:50:41 +00:00
Kazu Hirata	781d0fea72	[TableGen] Drop unnecessary const from return types (NFC)	2021-02-09 22:14:28 -08:00
Ta-Wei Tu	e89fcbfad6	Fix deprecated usage of `mallinfo` glibc deprecates `mallinfo` in the latest version of 2.33. This patch replaces the usage of `mallinfo` with the new `mallinfo2` when it's available. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D96359	2021-02-10 13:53:57 +08:00
Tyker	5652e192fc	Revert "[InstCombine] convert assumes to operand bundles" This reverts commit `5eb2e994f9`.	2021-02-10 01:32:00 +01:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Tyker	5eb2e994f9	[InstCombine] convert assumes to operand bundles InstCombine will convert nonnull and alignment assumptions that use the boolean condition into assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-09 19:33:53 +01:00
Alex Richardson	7dc3136033	[llvm-readobj] Add support for decoding FreeBSD ELF notes The current support only printed coredump notes, but most binaries also contain notes. This change adds names for four FreeBSD-specific notes and pretty-prints three of them: NT_FREEBSD_ABI_TAG: This note holds a 32-bit (decimal) integer containing the value of the __FreeBSD_version macro, which is defined in crt1.o and will hold a value such as 1300076 for a binary built on a FreeBSD 13 system. NT_FREEBSD_ARCH_TAG: A string containing the value of the build-time MACHINE_ARCH NT_FREEBSD_FEATURE_CTL: A 32-bit flag that indicates to the kernel that the binary wants certain behaviour. Examples include setting NT_FREEBSD_FCTL_ASLR_DISABLE which tells the kernel to disable ASLR. After this change llvm-readobj also no longer decodes coredump-only FreeBSD notes in non-coredump files. I've also converted the note-freebsd.s test to use yaml2obj instead of llvm-mc. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D74393	2021-02-09 16:59:22 +00:00
Alex Richardson	d613d8eb0e	[yaml2obj] Handle NT_* string values for ELF note types This is required for D74393. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95953	2021-02-09 16:59:22 +00:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Nemanja Ivanovic	a5222aa085	[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets As of commit `284f2bffc9`, the DAG Combiner gets rid of the masking of the input to this node if the mask only keeps the bottom 16 bits. This is because the underlying library function does not use the high order bits. However, on PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits from the register. Therefore, the library implementation of __gnu_h2f_ieee will return an incorrect result if the bits aren't cleared. This combine is desired for ARM (and possibly other targets) so this patch adds a query to Target Lowering to check if this zeroing needs to be kept. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092 Differential revision: https://reviews.llvm.org/D96283	2021-02-09 06:33:48 -06:00
Dylan McKay	2ccb941740	[AVR] Fix global references to function symbols References to functions are in program memory and need a `pm()` fixup. This should fix trait objects for Rust on AVR. Differential Revision: https://reviews.llvm.org/D87631 Patch by Alex Mikhalev.	2021-02-10 00:40:49 +13:00
Jan Svoboda	e721bc9eff	[clang][cli] Generate and round-trip CodeGen options This patch implements generation of remaining codegen options and tests it by performing parse-generate-parse round trip. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D96056	2021-02-09 11:43:38 +01:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in the test-suite build on PowerPC; revert to unblock the buildbot first. Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Amara Emerson	ec41ed5b1b	[AArch64][GlobalISel] Support the 'returned' parameter attribute. On AArch64 (which seems to be the only target that supports it), this attribute allows codegen to avoid saving/restoring the value in x0 across a call. Gives a 0.1% geomean -Os code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D96099	2021-02-08 12:47:39 -08:00
Jamie Schmeiser	4b661b4059	Introduce -print-changed=[diff | diff-quiet] which shows changes in patch-like format Summary: Introduce base classes that hold a textual representation of the IR based on basic blocks and a base class for comparing this representation. A new change printer is introduced that uses these classes to save and compare representations of the IR before and after each pass. It only reports when changes are made by a pass (similar to -print-changed) except that the changes are shown in a patch-like format with those lines that are removed shown in red prefixed with '-' and those added shown in green with '+'. This functionality was introduced in my tutorial at the 2020 virtual developer's meeting. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D91890	2021-02-08 10:11:22 -05:00
Nicholas Guy	cd880442ae	[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget. Differential Revision: https://reviews.llvm.org/D95631	2021-02-08 13:28:00 +00:00
Thomas Symalla	f89f6d1e5d	[AMDGPU]: Fixes an invalid clamp selection pattern. When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to invalid logic (which matches even if it shouldn't) and likely missing variable initialisation. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95878	2021-02-08 13:06:30 +01:00
Sander de Smalen	ba8637ca84	[ValueTypes] Fix size of nxv1f16 (32 -> 16). Clearly seems like this was a typo.	2021-02-08 11:00:47 +00:00
David Sherwood	3bbaece5a0	[Analysis] Remove unused functions from TargetLibraryInfo A simple clean-up to remove dead code. Differential Revision: https://reviews.llvm.org/D95934	2021-02-08 09:50:36 +00:00
Raphael Isemann	0ebf904baf	[modules] Put Frontend/OpenMP headers into a Clang module to fix the module build These headers can be in a Clang module like the rest. This also fixes the modules build that is currently struggling with these headers being textually included in several other modules.	2021-02-08 09:54:45 +01:00
Fangrui Song	d3e13b58cd	ELFObjectWriter: Don't de-duplicate STT_FILE symbols	2021-02-07 18:21:36 -08:00
Fangrui Song	09294642be	ELFObjectWriter: Make STT_FILE precede associated local symbols	2021-02-07 17:51:40 -08:00
Kazu Hirata	7b9f6c2d42	[SelectionDAG] Drop unnecessary const from a return type (NFC) Identified with const-return-type.	2021-02-07 09:49:33 -08:00
Kazu Hirata	b3ec6a602d	[IR] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-06 11:17:06 -08:00
Johannes Doerfert	b7d870eae7	[AssumptionCache] Avoid dangling llvm.assume calls in the cache PR49043 exposed a problem when it comes to RAUW llvm.assumes. While D96106 would fix it for GVNSink, it seems a more general concern. To avoid future problems this patch moves away from the vector of weak reference model used in the assumption cache. Instead, we track the llvm.assume calls with a callback handle which will remove itself from the cache if the call is deleted. Fixes PR49043. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96168	2021-02-06 12:18:39 -06:00
Johannes Doerfert	378f4e5ec2	[AssumptionCache] Do not track llvm.assume calls (PR49043) This fixes PR49043 by invalidating the handle on RAUW. This will work fine assuming all existing RAUW users add the new assumption to the cache. That means, if a new llvm.assume call replaces an old one, you need to add the new one now as a RAUW is not enough anymore. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96208	2021-02-06 12:18:30 -06:00
Fangrui Song	e44a100942	.gcc_except_table: Set SHF_LINK_ORDER if binutils>=2.36, and drop unneeded unique ID for -fno-unique-section-names GNU ld>=2.36 supports mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an output section, so we can set SHF_LINK_ORDER if -fbinutils-version=2.36 or above. If -fno-function-sections or older binutils, drop unique ID for -fno-unique-section-names. The users can just specify -fbinutils-version=2.36 or above to allow GC with both GNU ld and LLD. (LLD does not support garbage collection of non-group non-SHF_LINK_ORDER .gcc_except_table sections.)	2021-02-05 21:45:21 -08:00
Kazu Hirata	aa5c09bead	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-05 21:02:06 -08:00
Wenlei He	801d9cc7b9	[CSSPGO] Use merged base profile for hot threshold calculation A context-sensitive profile effectively splits a function profile into many copies, each representing the CFG profile of a particular calling context. That makes the count distribution look flatter, as we now have more function profiles each with lower counts, which in turn leads to lower hot thresholds. Now we tell threshold computation to merge context profiles first before calculating percentile-based cutoffs, to compensate for the seemingly flat context profile. This can be controlled by the switch `sample-profile-contextless-threshold`. Earlier measurement showed ~0.4% perf boost with this tuning on spec2k6 for CSSPGO (with pseudo-probe and new inliner). Differential Revision: https://reviews.llvm.org/D95980	2021-02-05 17:51:00 -08:00
Wouter van Oortmerssen	5e5b2cb131	[WebAssembly] Prevent data inside text sections in assembly This is not supported in Wasm, unless the data was encoded instructions, but that wouldn't work with the assembler's other functionality (enforcing nesting etc.). Fixes: https://bugs.llvm.org/show_bug.cgi?id=48971 Differential Revision: https://reviews.llvm.org/D95838	2021-02-05 13:48:25 -08:00
Aaron Ballman	ec04e2850a	Allow SmallPtrSet to be used with a std::insert_iterator Currently, the SmallPtrSet type allows inserting elements but it does not support inserting elements with a positional hint. The lack of this signature means that you cannot use SmallPtrSet with std::insert_iterator or std::inserter(), which makes some code constructs more awkward. This adds an overload of insert() that can be used in these scenarios. The positional hint is unused by SmallPtrSet and the call is equivalent to calling insert() without a hint.	2021-02-05 16:12:47 -05:00
Sanjay Patel	c981f6f8e1	Revert "[Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library" This reverts commit `2303e93e66`. Investigating bot failures.	2021-02-05 15:10:11 -05:00
Lukas Sommer	2303e93e66	[Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM intrinsics operating on vector operands) with calls to a vector library. Currently, calls to LLVM intrinsics are only replaced with calls to vector libraries when scalar calls to intrinsics are vectorized by the Loop- or SLP-Vectorizer. With this pass, it is now possible to replace calls to LLVM intrinsics already operating on vector operands, e.g., if such code was generated by MLIR. For the replacement, information from the TargetLibraryInfo, e.g., as specified via -vector-library is used. Differential Revision: https://reviews.llvm.org/D95373	2021-02-05 14:25:19 -05:00
Thomas Preud'homme	00a62547da	Stop trapping on sNaN in __builtin_isnan __builtin_isnan currently generates a floating-point compare operation which triggers a trap when faced with a signaling NaN in StrictFP mode. This commit uses integer operations instead to not generate any trap in such a case. Reviewed By: kpn Differential Revision: https://reviews.llvm.org/D95948	2021-02-05 18:28:48 +00:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. 
It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Simon Pilgrim	0712c2a2b8	CodeGenPassBuilder.h - fix Wdocumentation warning. NFCI. void functions shouldn't have a \returns	2021-02-05 11:11:37 +00:00
Simon Pilgrim	2a957e3e87	DWARFDebugFrame.h - fix Wdocumentation warning. NFCI.	2021-02-05 10:57:38 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing, many backends end up getting it wrong. Instead of trying to work with those two values separately, this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. For most backends this looks like it will be an improvement (or they were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, WebAssembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Kazu Hirata	d29562b29c	[IR] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-04 21:18:02 -08:00
Fangrui Song	8d4cd2da1f	[MC] Add isFPImm after D96091	2021-02-04 20:51:02 -08:00
Fangrui Song	68d6918e7a	[MC] Add createFPImm/isFPImm/setFPImm to smooth migration from FPImm to DFPImm after D96091	2021-02-04 20:42:35 -08:00
Craig Topper	6b280ce34c	[RISCV] Use LLVMScalarOrSameVectorWidth to avoid needing to mention the index type for vrgatherei16 intrinsics. Add .vv to the intrinsic name to be consistent with D95979. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D95981	2021-02-04 20:26:45 -08:00
Craig Topper	25ff302a79	[RISCV] Split vrgather intrinsics into separate vrgather.vv and vrgather.vx intrinsics. The vrgather.vv instruction uses a vector of indices with the same SEW as operand 0. The vrgather.vx instructions use a scalar index operand of XLen bits. By splitting this into 2 intrinsics we are able to use LLVMMatchType in the definition to avoid specifying the type for the index operand when creating the IR for the intrinsic. For .vv it will match the operand 0 type. And for .vx it will match the type of the vl operand we already needed to specify a type for. I'm considering splitting more intrinsics. This was a somewhat odd one because the .vx doesn't use the element type, it always uses XLen. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D95979	2021-02-04 19:50:12 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Dan Gohman	698c6b0a09	[WebAssembly] Support single-floating-point immediate value As mentioned in TODO comment, casting double to float causes NaNs to change bits. To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode. Patch by Yuta Saito. Differential Revision: https://reviews.llvm.org/D77384	2021-02-04 18:05:06 -08:00
Christopher Tetreault	b8b054aa8a	Reland "Ensure that InstructionCost actually implements a total ordering" The operator< in the previous attempt was incorrect. It is unfortunate that this was only caught by the expensive checks. This reverts commit `ff1147c363`.	2021-02-04 10:04:10 -08:00
Sander de Smalen	8f69da9f97	[ElementCount] NFC: Set 'const' qualifier for getWithIncrement/Decrement. These class methods simply return a new UnivariateLinearPolyBase (e.g. ElementCount), and do not modify the object in any way or form, so qualify for being 'const'.	2021-02-04 11:27:45 +00:00
Jan Svoboda	225ccf0c50	[clang][cli] Command line round-trip for HeaderSearch options This patch implements generation of remaining header search arguments. It's done manually in C++ as opposed to TableGen, because we need the flexibility and don't anticipate reuse. This patch also tests the generation of header search options via a round-trip. This way, the code gets exercised whenever Clang is built and tested in asserts mode. All `check-clang` tests pass. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D94472	2021-02-04 10:18:34 +01:00
Joachim Meyer	e3f02302e3	[Support] Indent multi-line descr of enum cli options. As noted in https://reviews.llvm.org/D93459, the formatting of multi-line descriptions of clEnumValN and the likes is unfavorable. Thus this patch adds support for correctly indenting these. Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D93494	2021-02-04 10:14:44 +01:00
Petr Hosek	b42ccdf38f	[NFC] Fix the noprofile attribute comment	2021-02-03 21:54:09 -08:00
Kazu Hirata	b4de30f6af	[Support] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-03 20:41:16 -08:00
Michael Kruse	26b5be66f9	[OpenMPIRBuilder] Implement collapseLoops. The collapseLoops method implements a transformation facilitating the implementation of the collapse-clause. It takes a list of loops from a loop nest and reduces it to a single loop that can be used by other methods that are implemented on just a single loop, such as createStaticWorkshareLoop. This patch shares some changes with D92974 (such as adding some getters to CanonicalLoopNest), used by both patches. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93268	2021-02-03 19:12:02 -06:00
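The idea behind collapsing a loop nest can be sketched in plain C++. This is only an illustration of the index arithmetic, not the OpenMPIRBuilder implementation; the function names `visitNested` and `visitCollapsed` are hypothetical, and a two-deep nest is assumed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using IndexPair = std::pair<std::size_t, std::size_t>;

// Reference: visit the iteration space with the original two-deep nest.
std::vector<IndexPair> visitNested(std::size_t N, std::size_t M) {
  std::vector<IndexPair> Out;
  for (std::size_t I = 0; I < N; ++I)
    for (std::size_t J = 0; J < M; ++J)
      Out.emplace_back(I, J);
  return Out;
}

// Collapsed: a single loop over N * M iterations. The original indices are
// recovered from the flat counter by division and remainder.
std::vector<IndexPair> visitCollapsed(std::size_t N, std::size_t M) {
  std::vector<IndexPair> Out;
  for (std::size_t K = 0; K < N * M; ++K)
    Out.emplace_back(K / M, K % M); // never reached when M == 0
  return Out;
}
```

Both functions produce the same sequence of index pairs, so a transformation that only understands a single loop (for example work sharing) can operate on the collapsed form.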
Nico Weber	b995314143	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `97ba5cde52`. Still breaks tests: https://reviews.llvm.org/D76802#2540647	2021-02-03 19:14:34 -05:00
Florian Hahn	7db390cc77	Revert "[LTO] Use lto::backend for code generation." This reverts commit `6a59f05606`, because it is causing failures on green dragon.	2021-02-03 22:49:30 +00:00
Florian Hahn	0a17664b47	Revert "[LTO] Add option enable NewPM with LTOCodeGenerator." This reverts commit `7a6a2cc81a` because it is causing failures on green dragon.	2021-02-03 22:49:20 +00:00
Florian Hahn	b0a8e41cff	Revert "[LTOCodeGenerator] Use lto::Config for options (NFC)." This reverts commit `0d487cf87a` because it is causing failures on green dragon.	2021-02-03 22:48:54 +00:00
Amara Emerson	1a13ee1efb	[GlobalISel] Add sext(constant) -> constant artifact combine. This is the G_SEXT counterpart to the existing G_ZEXT/G_ANYEXT combines. Differential Revision: https://reviews.llvm.org/D95729	2021-02-03 14:10:08 -08:00
Arthur Eubanks	f020544601	[NewPM][HelloWorld] Move HelloWorld to Utils To prevent creating a new component, which creates a new library. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D95907	2021-02-03 12:59:40 -08:00
Richard Smith	6b14c12688	Fix overflowing signed left shift, found by ubsan buildbot.	2021-02-03 12:51:39 -08:00
Krzysztof Parzyszek	0bb1985102	[Hexagon] Add LLVM instruction definitions for Hexagon V68	2021-02-03 13:59:34 -06:00
Jeremy Morse	d32deaab4d	Revert "[DWARF] Location-less inlined variables should not have DW_TAG_variable" This reverts commit `ddc2f1e3fb`. A build-bot objected: http://lab.llvm.org:8011/#builders/105/builds/5486	2021-02-03 17:54:33 +00:00
Jeremy Morse	ddc2f1e3fb	[DWARF] Location-less inlined variables should not have DW_TAG_variable Discussed in this thread: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html DwarfDebug::collectEntityInfo accidentally distinguishes between variable locations that never have a location specified, and variable locations that have an empty location specified. The latter leads to the creation of an empty variable referring to the abstract origin. Fix this by seeking a non-empty location before producing a concrete entity, to guarantee a DW_AT_location will be produced. Other loops in collectEntityInfo and endFunctionImpl take care of examining the retainedNodes collection and ensuring optimised-out variables are created. Differential Revision: https://reviews.llvm.org/D95617	2021-02-03 17:32:31 +00:00
Krzysztof Parzyszek	3562d253da	[Hexagon] Add ELF flags for Hexagon V68	2021-02-03 11:02:59 -06:00
Sebastian Neubauer	d49efdc969	Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path." This reverts commits 62af0305b7cc..677a3529d3e6 from D93708. They cause failures in the sanitizer builds because of uninitialized values. A fix is in D95878, but it might take some time until this is pushed, so reverting the changes for now.	2021-02-03 11:03:34 +01:00
Wang, Pengfei	fae6d129da	[X86] Correct types in tablegen multiclasses found by D95874. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D95926	2021-02-03 16:05:05 +08:00
Petr Hosek	97ba5cde52	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-02 23:19:51 -08:00
Kazu Hirata	c18231e3dd	[CodeGen] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-02 22:52:45 -08:00
Hsiangkai Wang	c7189ba785	[RISCV] Add new vector instructions in v0.10. * Add new vector instructions in v0.10. - load/store for mask value vle1.v vse1.v - vsetivli for 0-31 immediate vector length. * Rename vector instructions in v0.10. - vfrsqrte7 -> vfrsqrt7 - vfrece7 -> vfrec7 * Reserve memory width encodings for EEW>128b. Differential Revision: https://reviews.llvm.org/D95781	2021-02-03 13:28:58 +08:00
Yang Fan	c90c261e44	[CSSPGO] Fix MSVC initializing truncation warning (NFC) MSVC warning: ``` \llvm-project\llvm\include\llvm\Transforms\IPO\SampleProfileProbe.h(65): warning C4305: 'initializing': truncation from 'double' to 'const float' ```	2021-02-03 11:04:58 +08:00
Yang Fan	8178a55b25	[VFS] Fix Wreturn-type gcc warning (NFC) GCC warning: ``` In file included from /llvm-project/llvm/lib/Support/VirtualFileSystem.cpp:13: /llvm-project/llvm/include/llvm/Support/VirtualFileSystem.h: In static member function ‘static bool llvm::vfs::RedirectingFileSystem::RemapEntry::classof(const llvm::vfs::RedirectingFileSystem::Entry*)’: /llvm-project/llvm/include/llvm/Support/VirtualFileSystem.h:681:5: warning: control reaches end of non-void function [-Wreturn-type] 681 \| } \| ^ ```	2021-02-03 10:22:30 +08:00
Richard Smith	32e98f05fe	Diagnose if a SLEB128 is too large to fit in an int64_t. Previously we'd hit UB due to an invalid left shift operand. Also fix the WASM emitter to properly use SLEB128 encoding instead of ULEB128 encoding for signed fields so that negative numbers don't result in overly-large values that we can't read back any more. In passing, don't diagnose a non-canonical ULEB128 that fits in a uint64_t but has redundant trailing zero bytes. Reviewed By: dblaikie, aardappel Differential Revision: https://reviews.llvm.org/D95510	2021-02-02 14:33:34 -08:00
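A minimal sketch of an SLEB128 decoder that rejects values too large for an int64_t, in the spirit of the commit above. The helper name `decodeSLEB128Checked` and its `std::optional` interface are assumptions made for illustration; this is not LLVM's `decodeSLEB128`. It accumulates into an unsigned integer so that no signed left shift can overflow.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns std::nullopt for truncated input or for an
// encoding that does not fit in 64 bits.
std::optional<int64_t> decodeSLEB128Checked(const std::vector<uint8_t> &Bytes) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  for (uint8_t Byte : Bytes) {
    uint64_t Slice = Byte & 0x7f;
    if (Shift >= 64) {
      // All 64 bits are already filled: further bytes may only repeat the sign.
      bool Negative = (Value >> 63) != 0;
      if (Slice != (Negative ? 0x7fu : 0x00u))
        return std::nullopt;
    } else {
      // At shift 63 only one payload bit fits; the rest must be sign copies.
      if (Shift == 63 && Slice != 0 && Slice != 0x7f)
        return std::nullopt;
      Value |= Slice << Shift; // unsigned shift, Shift < 64, so well defined
    }
    Shift += 7;
    if ((Byte & 0x80) == 0) {
      // Last byte: sign-extend if the sign bit of the final group is set.
      if (Shift < 64 && (Byte & 0x40))
        Value |= ~uint64_t{0} << Shift;
      // Two's-complement conversion (guaranteed from C++20, universal in practice).
      return static_cast<int64_t>(Value);
    }
  }
  return std::nullopt; // ran out of bytes before the terminating byte
}
```

For example, the single byte 0x7f decodes to -1, while nine 0x80 bytes followed by 0x01 would need a 65th bit and is rejected.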
Christopher Tetreault	ff1147c363	Revert "Ensure that InstructionCost actually implements a total ordering" This reverts commit `b481cd519e`.	2021-02-02 12:10:02 -08:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations that duplicate code maintain the branch frequency information (BFI) well, since probe distribution factors are calculated from it. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pseudo probe will carry a factor with a value ranging from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees' probes prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, indirect call promotion results in multiple call sites. The original samples should be distributed across them. 
This is fixed by adjusting the call sites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Christopher Tetreault	b481cd519e	Ensure that InstructionCost actually implements a total ordering Previously, operator== would consider the actual equality of the pairs (lhs.Value, lhs.State) == (rhs.Value, rhs.State). However, if an invalid cost was involved in a call to operator<, only the state would be compared. Thus, it was not the case that ({2, Invalid} < {3, Invalid} \|\| {2, Invalid} > {3, Invalid} \|\| {2, Invalid} == {3, Invalid}). This patch implements a true total ordering, where cost state is considered first, then value. While it's not really important that {2, Invalid} be considered to be less than {3, Invalid}, it's not a problem either. This patch also implements operator== in terms of operator<, so the two definitions will be kept in sync. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D95803	2021-02-02 11:49:14 -08:00
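The ordering described above can be sketched with a lexicographic comparison. The `Cost` struct below is a hypothetical stand-in, not the real `InstructionCost` class, and placing `Valid` before `Invalid` is an assumption made for the example.

```cpp
#include <tuple>

// Hypothetical stand-in for a cost with a validity state.
struct Cost {
  enum CostState { Valid, Invalid }; // assumed order: Valid sorts first
  int Value;
  CostState State;
};

// Total ordering: compare the state first, then the value.
inline bool operator<(const Cost &L, const Cost &R) {
  return std::tie(L.State, L.Value) < std::tie(R.State, R.Value);
}

inline bool operator>(const Cost &L, const Cost &R) { return R < L; }

// Equality derived from operator<, so the two definitions cannot drift apart.
inline bool operator==(const Cost &L, const Cost &R) {
  return !(L < R) && !(R < L);
}
```

With these definitions exactly one of `<`, `>` and `==` holds for any pair, including `{2, Invalid}` and `{3, Invalid}`.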
Greg McGary	3a9d2f1488	[lld-macho][NFC] refactor relocation handling Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes. Many cleanups Differential Revision: https://reviews.llvm.org/D95121	2021-02-02 10:54:53 -07:00
Fangrui Song	1560a00032	[yaml2obj/obj2yaml/llvm-readobj] Support SHF_GNU_RETAIN In binutils, the flag is defined for ELFOSABI_GNU and ELFOSABI_FREEBSD. It can be used to mark a section as a GC root. In practice, the flag has generic semantics and can be applied to many EI_OSABI values, so we consider it generic. Differential Revision: https://reviews.llvm.org/D95728	2021-02-02 09:19:53 -08:00
Florian Hahn	3e09bc2500	[ConstraintElimination] Add nicer way to dump constraints (NFC). Use ConstraintSystem::dump(Names) to display the result of decomposing a condition.	2021-02-02 16:36:45 +00:00
Tom Weaver	4f1320b77d	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `df3e39f60b`. It introduced the failing test instrprof-gc-sections.c, causing a build bot to fail: http://lab.llvm.org:8011/#/builders/53/builds/1184	2021-02-02 14:19:31 +00:00
Thomas Symalla	09508d2849	Reverted whitespace changes. Differential Revision: https://reviews.llvm.org/D90968	2021-02-02 09:14:54 +01:00
Thomas Symalla	ecbed4e0ab	Resolve formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	cdfd9b3bf5	Move Combiner to PreLegalize step	2021-02-02 09:14:53 +01:00
Thomas Symalla	88a832aef1	Refactored the pattern matching.	2021-02-02 09:14:52 +01:00
Thomas Symalla	d41b7fa9bf	Renames	2021-02-02 09:14:52 +01:00
Thomas Symalla	d722924f20	Added comments.	2021-02-02 09:14:52 +01:00
Thomas Symalla	ec043967ec	clang-format	2021-02-02 09:14:52 +01:00
Thomas Symalla	62af0305b7	Added clamp i64 to i16 global isel pattern.	2021-02-02 09:14:52 +01:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximizes the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fallback because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inaccurate hotness heuristic. With pseudo-probe and the change above, this is no longer an issue for CSSPGO. 
New FDO Inliner Putting all the requirements for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here are the reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating the same set of call sites, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork a couple of months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner shows ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. 
The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
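The prioritized, size-capped strategy described in this commit can be sketched with a priority queue. Everything below is hypothetical: the `CallSite` struct, the `inlinePrioritized` function and the threshold parameters are illustration only and do not reflect the sample profile loader's data structures. The sketch also simplifies by stopping outright once a candidate exceeds the budget.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical call site description.
struct CallSite {
  std::string Callee;
  uint64_t Count;               // call site sample count (hotness)
  unsigned Size;                // estimated size growth if inlined
  std::vector<CallSite> Nested; // call sites exposed after inlining
};

// Orders the queue so that the hottest call site is on top.
struct ColderFirst {
  bool operator()(const CallSite *A, const CallSite *B) const {
    return A->Count < B->Count;
  }
};

// Returns callee names in the order they would be inlined.
std::vector<std::string> inlinePrioritized(const std::vector<CallSite> &TopLevel,
                                           uint64_t HotThreshold,
                                           unsigned SizeBudget) {
  std::priority_queue<const CallSite *, std::vector<const CallSite *>,
                      ColderFirst> Queue;
  for (const CallSite &CS : TopLevel)
    Queue.push(&CS);

  std::vector<std::string> Inlined;
  unsigned Growth = 0;
  while (!Queue.empty()) {
    const CallSite *CS = Queue.top();
    Queue.pop();
    if (CS->Count < HotThreshold)
      break; // everything still queued is at most this hot
    if (Growth + CS->Size > SizeBudget)
      break; // size cap reached (simplification: stop here)
    Growth += CS->Size;
    Inlined.push_back(CS->Callee);
    // Newly exposed call sites compete with the ones already queued.
    for (const CallSite &N : CS->Nested)
      Queue.push(&N);
  }
  return Inlined;
}
```

Because each call site enters the queue once and is evaluated once, the sketch also shows why such a design avoids re-evaluating the same call site repeatedly.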
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Nathan Hawes	ecb00a7762	[VFS] Add support to RedirectingFileSystem for mapping a virtual directory to one in the external FS. Previously file entries in the -ivfsoverlay yaml could map to a file in the external file system, but directories had to list their contents in the form of other file entries or directories. Allowing directory entries to map to a directory in the external file system makes it possible to present an external directory's contents in a different location and (in combination with the 'fallthrough' option) overlay one directory's contents on top of another. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94844	2021-02-02 14:56:17 +10:00
Kazu Hirata	7a37d981d9	[llvm] Use pop_back_val (NFC)	2021-02-01 20:55:05 -08:00
Rahman Lavaee	f1ff6d210a	[obj2yaml, yaml2obj] Use Hex64 for BBAddressMap fields. This patch lets the yaml encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata. Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95767	2021-02-01 15:37:30 -08:00
Petr Hosek	df3e39f60b	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Philip Reames	2a53d9a6e7	[Loads] Plumb through TLI argument [NFC] This is a (rather delayed) follow up to commit `0129cd5`. This commit is entirely NFC, the semantic change to leverage the new information will be submitted separately with a test case.	2021-02-01 11:45:30 -08:00
Simon Pilgrim	657e769688	Revert rGce587529ad8b5 - "[APFloat] multiplySignificand - pass IEEEFloat as const reference. NFCI." Breaks on some buildbots	2021-02-01 16:15:23 +00:00
J-Y You	267b573b55	[TableGen] Fix anonymous record self-reference in foreach and multiclass If we instantiate self-referenced anonymous records in foreach and multiclass, the NAME value will point to incorrect record. It's because anonymous name is resolved too early. This patch adds AnonymousNameInit to represent an anonymous record name. When instantiating an anonymous record, it will update the referred name. Differential Revision: https://reviews.llvm.org/D95309	2021-02-01 10:59:07 -05:00
Simon Pilgrim	ce587529ad	[APFloat] multiplySignificand - pass IEEEFloat as const reference. NFCI. Avoids unnecessary IEEEFloat copies.	2021-02-01 15:41:50 +00:00
Kerry McLaughlin	9b4fcfaa9e	[SVE][CodeGen] Remove performMaskedGatherScatterCombine The AArch64 DAG combine added by D90945 & D91433 extends the index of a scalable masked gather or scatter to i32 if necessary. This patch removes the combine and instead adds shouldExtendGSIndex, which is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether the index should be extended before calling getMaskedGather/Scatter. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94525	2021-02-01 14:10:00 +00:00
Serge Pavlov	bf416d166b	[FPEnv] Intrinsic for setting rounding mode To set a non-default rounding mode, the user usually calls the function 'fesetround' from the standard C library. This way has some disadvantages. * It creates an unnecessary dependency on libc. On the other hand, setting the rounding mode requires few instructions and could be done by the compiler. Sometimes the standard C library is not even available, as in the case of GPU or AI cores that execute small kernels. * The compiler could generate more efficient code if it knows that a particular call just sets the rounding mode. This change introduces a new IR intrinsic, namely 'llvm.set.rounding', which sets the current rounding mode, similar to 'fesetround'. It however differs from the latter, because it is a lower level facility: * 'llvm.set.rounding' does not return any value, whereas 'fesetround' returns a non-zero value in the case of failure. In glibc 'fesetround' reports failure if its argument is invalid or unsupported or if floating point operations are unavailable on the hardware. The compiler usually knows what core it generates code for and it can validate arguments in many cases. * Rounding mode is specified in 'fesetround' using constants like 'FE_TONEAREST', which are target dependent. It is inconvenient to work with such constants at IR level. The C standard provides a target-independent way to specify rounding mode, which is used in FLT_ROUNDS; however, it does not define a standard way to set the rounding mode using this encoding. This change implements only the IR intrinsic. Lowering it to machine code is target-specific and will be implemented later. Mapping of 'fesetround' to 'llvm.set.rounding' is also not implemented here. Differential Revision: https://reviews.llvm.org/D74729	2021-02-01 11:28:14 +07:00
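For comparison, the libc route that the commit contrasts with the intrinsic can be sketched as follows. The wrapper `setRoundingFromFltRounds` is a hypothetical helper: it takes the target-independent FLT_ROUNDS encoding (0 toward zero, 1 to nearest, 2 toward positive infinity, 3 toward negative infinity) and translates it to the target-dependent FE_* constants. It assumes a platform where all four FE_* macros are defined.

```cpp
#include <cfenv>

// Hypothetical wrapper: returns true on success, false for an unknown mode
// or if the C library reports failure.
bool setRoundingFromFltRounds(int Mode) {
  int FEMode;
  switch (Mode) {
  case 0: FEMode = FE_TOWARDZERO; break;
  case 1: FEMode = FE_TONEAREST;  break;
  case 2: FEMode = FE_UPWARD;     break;
  case 3: FEMode = FE_DOWNWARD;   break;
  default: return false; // not a directed or to-nearest mode handled here
  }
  // std::fesetround returns zero on success, non-zero on failure.
  return std::fesetround(FEMode) == 0;
}
```

The translation step and the return-value check are exactly the parts that a lower-level facility working directly on the FLT_ROUNDS encoding would not need.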
Craig Topper	70289ea6f5	[RISCV][LegalizeTypes] Try to expand BSWAP before promoting if the promoted BSWAP would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. This is helpful on RISCV where we might have to promote i16 all the way to i64. Differential Revision: https://reviews.llvm.org/D95756	2021-01-31 14:33:29 -08:00
Florian Hahn	0d487cf87a	[LTOCodeGenerator] Use lto::Config for options (NFC). This patch removes some options that have been duplicated in LTOCodeGenerator and instead use lto::Config directly to manage the options. This is a cleanup after `6a59f05606`. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95738	2021-01-31 19:08:07 +00:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Alexey Lapshin	fb244ffb9f	[dsymutil][DWARFLinker][NFC] make AddressManager not depending on the order of checks for relocations. Current dsymutil implementation of hasLiveMemoryLocation()/hasLiveAddressRange() and applyValidRelocs() assume that calls should be done in certain order (from first Dies to last). Multi-thread implementation might call these methods in another order (it might process compilation units in an order other than the one in which they are physically located), so we remove the restriction that searching for relocations should be done in ascending order. This change does not introduce noticeable performance degradation. The testing results for clang binary: golden-dsymutil/dsymutil 23787992 clang MD5: 5efa8fd9355ebf81b65f24db5375caa2 elapsed time=91sec build-Release/bin/dsymutil 23855616 clang MD5: 5efa8fd9355ebf81b65f24db5375caa2 elapsed time=91sec Differential Revision: https://reviews.llvm.org/D93106	2021-01-31 16:34:10 +03:00
Kazu Hirata	627b5bda11	[llvm] Add missing header guards (NFC) Identified with llvm-header-guard.	2021-01-30 09:53:42 -08:00
Florian Hahn	7a6a2cc81a	[LTO] Add option enable NewPM with LTOCodeGenerator. This patch adds an option to enable the new pass manager in LTOCodeGenerator. It also updates a few tests with legacy PM specific tests, which started failing after `6a59f05606` when LLVM_ENABLE_NEW_PASS_MANAGER=true.	2021-01-30 11:54:20 +00:00
Florian Hahn	6a59f05606	[LTO] Use lto::backend for code generation. This patch updates LTOCodeGenerator to use the utilities provided by LTOBackend to run middle-end optimizations and backend code generation. This is a first step towards unifying the code used by libLTO's C API and the newer, C++ interface (see PR41541). The immediate motivation is to allow using the new pass manager when doing LTO using libLTO's C API, which is used on Darwin, among others. With the changes, there are no codegen/stats differences when building MultiSource/SPEC2000/SPEC2006 on Darwin X86 with LTO, compared to without the patch. Reviewed By: steven_wu Differential Revision: https://reviews.llvm.org/D94487	2021-01-30 10:09:55 +00:00
Kazu Hirata	7728cc003a	[llvm] Use append_range (NFC)	2021-01-29 23:23:34 -08:00
Nathan Hawes	719f778441	[VFS] Combine VFSFromYamlDirIterImpl and OverlayFSDirIterImpl into a single implementation (NFC) As a fixme notes, both of these directory iterator implementations are conceptually similar and duplicate the functionality of returning and uniquing entries across two or more directories. This patch combines them into a single class 'CombiningDirIterImpl'. This also drops the 'Redirecting' prefix from RedirectingDirEntry and RedirectingFileEntry to save horizontal space. There's no loss of clarity as they already have to be prefixed with 'RedirectingFileSystem::' whenever they're referenced anyway. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94857	2021-01-30 11:10:10 +10:00
Roman Lebedev	c2534a7097	[ShadowStackGCLowering] Preserve Dominator Tree, if available This doesn't help avoid any Dominator Tree recalculations just yet, there's one more pass to go..	2021-01-30 01:14:51 +03:00
Christopher Tetreault	49a6502cd5	[SVE] delete VectorType::getNumElements() The previously agreed-upon deprecation period for VectorType::getNumElements() has passed. This patch removes this method and completes the refactor proposed in the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-March/139811.html Reviewed By: david-arm, rjmccall Differential Revision: https://reviews.llvm.org/D95570	2021-01-29 13:46:54 -08:00
Jay Foad	5cf6412a27	[GlobalISel] Fix modifying a G_OR without notifying the observer Remove the call to setFlags in favour of creating the instruction with the correct flags in the first place, so we don't have to explicitly notify the observer. Differential Revision: https://reviews.llvm.org/D95681	2021-01-29 16:32:24 +00:00
Florian Hahn	f3a710cade	[LTO] Update splitCodeGen to take a reference to the module. (NFC) splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is an ~4 year old fixme to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222	2021-01-29 11:53:11 +00:00
Kazu Hirata	046cfb8565	[llvm] Forward-declare formatted_raw_ostream (NFC) Various TargetStreamer.h need formatted_raw_ostream but rely on a forward declaration of formatted_raw_ostream in MCStreamer.h. This patch adds forward declarations right in TargetStreamer.h. While we are at it, this patch removes the one in MCStreamer.h, where it is unnecessary.	2021-01-28 22:21:13 -08:00
Christudasan Devadasan	892e4567e1	Support a list of CostPerUse values This patch allows targets to define multiple cost values for each register so that the cost model can be more flexible and better used during the register allocation as per the target requirements. For AMDGPU the VGPR allocation will be more efficient if the register cost can be associated dynamically based on the calling convention. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86836	2021-01-29 10:14:52 +05:30
Duncan P. N. Exon Smith	17c584551d	ADT: Add SFINAE to the generic IntrusiveRefCntPtr constructors Add an `enable_if` to the generic `IntrusiveRefCntPtr` constructors so that std::is_convertible gives an honest answer when the underlying pointers cannot be converted. Added `static_assert`s to the test suite to verify. Also combine generic constructors from `IntrusiveRefCntPtr<X>&&` and `const IntrusiveRefCntPtr<X>&`. At first glance this appears to be an infinite loop, but the real copy/move constructors are spelled out separately above. Added a unit test to verify. Differential Revision: https://reviews.llvm.org/D95498	2021-01-28 15:07:27 -08:00
Jessica Paquette	daffab1985	Recommit "[GlobalISel] Walk through hints in getDefIgnoringCopies et al" Recommit of `4580acf675` `Opc = DefMI->getOpcode()` was in the wrong place.	2021-01-28 14:43:00 -08:00
Jessica Paquette	dcb5b5f1f2	Revert "[GlobalISel] Walk through hints in getDefIgnoringCopies et al" This reverts commit `4580acf675`. Reverting while looking into some test failures.	2021-01-28 14:37:57 -08:00
Jessica Paquette	4580acf675	[GlobalISel] Walk through hints in getDefIgnoringCopies et al Treat hint instructions like G_ASSERT_ZEXT like COPY instructions in helpers which walk through copies. This ensures that instructions like G_ASSERT_ZEXT won't impact any optimizations that rely on these helpers. Differential Revision: https://reviews.llvm.org/D95577	2021-01-28 14:27:00 -08:00
Cassie Jones	f22f4557a7	[GlobalISel] Implement widenScalar for carry-in add/sub These are widened to a wider UADDE/USUBE, with the overflow value unused, and with the same synthesis of a new overflow value as for the O operations. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D95326	2021-01-28 17:06:24 -05:00
Jessica Paquette	24261729a4	[GlobalISel] Add G_ASSERT_ZEXT This adds a generic opcode which communicates that a type has already been zero-extended from a narrower type. This is intended to be similar to AssertZext in SelectionDAG. For example, ``` %x_was_extended:_(s64) = G_ASSERT_ZEXT %x, 16 ``` Signifies that the top 48 bits of %x are known to be 0. This is useful in cases like this: ``` define i1 @zeroext_param(i8 zeroext %x) { %cmp = icmp ult i8 %x, -20 ret i1 %cmp } ``` In AArch64, `%x` must use a 32-bit register, which is then truncated to a 8-bit value. If we know that `%x` is already zero-ed out in the relevant high bits, we can avoid the truncate. Currently, in GISel, this looks like this: ``` _zeroext_param: and w8, w0, #0xff ; We don't actually need this! cmp w8, #236 cset w0, lo ret ``` While SDAG does not produce the truncation, since it knows that it's unnecessary: ``` _zeroext_param: cmp w0, #236 cset w0, lo ret ``` This patch - Adds G_ASSERT_ZEXT - Adds MIRBuilder support for it - Adds MachineVerifier support for it - Documents it It also puts G_ASSERT_ZEXT into its own class of "hint instruction." (There should be a G_ASSERT_SEXT in the future, maybe a G_ASSERT_ALIGN as well.) This allows us to skip over hints in the legalizer etc. These can then later be selected like COPY instructions or removed. Differential Revision: https://reviews.llvm.org/D95564	2021-01-28 13:58:37 -08:00
Greg Clayton	f8122d3532	Add the ability to extract the unwind rows from DWARF Call Frame Information. This patch adds the ability to evaluate the state machine for CIE and FDE unwind objects and produce a UnwindTable with all UnwindRow objects needed to unwind registers. It will also dump the UnwindTable for each CIE and FDE when dumping DWARF .debug_frame or .eh_frame sections in llvm-dwarfdump or llvm-objdump. This allows users to see what the unwind rows actually look like for a given CIE or FDE instead of just seeing a list of opcodes. This patch adds new classes: UnwindLocation, RegisterLocations, UnwindRow, and UnwindTable. UnwindLocation is a class that describes how to unwind a register or Call Frame Address (CFA). RegisterLocations is a class that tracks registers and their UnwindLocations. It gets populated when parsing the DWARF call frame instruction opcodes for an unwind row. The registers are mapped from their register numbers to the UnwindLocation in a map. UnwindRow contains the result of evaluating a row of DWARF call frame instructions for the CIE, or a row from a FDE. The CIE can produce a set of initial instructions that each FDE that points to that CIE will use as the seed for the state machine when parsing FDE opcodes. A UnwindRow for a CIE will not have a valid address, while a UnwindRow for a FDE will have a valid address. The UnwindTable is a class that contains a sorted (by address) vector of UnwindRow objects and is the result of parsing all opcodes in a CIE, or FDE. Parsing a CIE should produce a UnwindTable with a single row. Parsing a FDE will produce a UnwindTable with one or more UnwindRow objects where all UnwindRow objects have valid addresses. The rows in the UnwindTable will be sorted from lowest Address to highest after parsing the state machine, or an error will be returned if the table isn't sorted. 
To parse a UnwindTable clients can use the following methods: static Expected<UnwindTable> UnwindTable::create(const CIE Cie); static Expected<UnwindTable> UnwindTable::create(const FDE Fde); A valid table will be returned if the DWARF call frame instruction opcodes have no encoding errors. There are a few things that can go wrong during the evaluation of the state machine and these create functions will catch and return them. Differential Revision: https://reviews.llvm.org/D89845	2021-01-28 13:39:17 -08:00
Reid Kleckner	bacf9cf2c5	Revert "[PDB] Defer relocating .debug$S until commit time and parallelize it" This reverts commit `1a9bd5b813`. I suspect that this patch may have caused https://crbug.com/1171438.	2021-01-28 13:17:27 -08:00
Thomas Lively	4b68b64dcc	[WebAssembly] Prototype i8x16 to i32x4 widening instructions As proposed in https://github.com/WebAssembly/simd/pull/395 and matching the opcodes used in V8: https://chromium-review.googlesource.com/c/v8/v8/+/2617385/4/src/wasm/wasm-opcodes.h Differential Revision: https://reviews.llvm.org/D95557	2021-01-28 10:59:32 -08:00
David Blaikie	4318028cd2	DebugInfo: Add a DWARF FORM extension for addrx+offset references to reduce relocations This is an alternative to the use of complex DWARF expressions for addresses - shaving off a few extra bytes of expression overhead.	2021-01-28 10:20:02 -08:00
Simon Pilgrim	b06ccc7446	[APFloat] Remove orphan ilogb(DoubleAPFloat) declaration. NFCI.	2021-01-28 15:18:25 +00:00
Simon Pilgrim	5169627c14	[APFloat] scalbn - pass DoubleAPFloat arg as const-ref. NFCI. Avoid unnecessary copy and fix clang-tidy warning.	2021-01-28 15:18:24 +00:00
Stefan Gränitz	b9ff5da0c8	[Orc] Remove unused header from TPC server The header would include OrcJIT headers in OrcTargetProcess, which is not desired. All common declarations should be in OrcShared. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D95606	2021-01-28 14:16:49 +01:00
Simon Pilgrim	7396f720f9	[DebugInfo] Remove some unused includes. NFCI. Mainly removing a lot of <vector> includes from files that don't explicitly use std::vector	2021-01-28 11:21:35 +00:00
Georgii Rymar	68195b15a3	[yaml2obj] - Allow empty SectionHeaderTable definitions. Currently we don't allow the following definition: ``` Sections: - Type: SectionHeaderTable - Name: .foo Type: SHT_PROGBITS ``` We report an error: "SectionHeaderTable can't be empty. Use 'NoHeaders' key to drop the section header table". It was implemented in this way earlier, when `SectionHeaderTable` was a dedicated key outside of the `Sections` list. And we did not allow to select where the table is written. Currently it makes sense to allow it, because a user might want to place the default section header table at an arbitrary position, e.g. before other sections. In this case it is not convenient and error prone to require specifying all sections: ``` Sections: - Type: SectionHeaderTable Sections: - Name: .foo - Name: .strtab - Name: .shstrtab - Name: .foo Type: SHT_PROGBITS ``` This patch allows empty SectionHeaderTable definitions. Differential revision: https://reviews.llvm.org/D95341	2021-01-28 10:51:52 +03:00
Kazu Hirata	f82b5a647e	[DebugInfo] Forward-declare PDBFile (NFC) NativeEnumInjectedSources.h needs PDBFile but relies on a forward declaration of PDBFile in InjectedSourceStream.h. This patch adds a forward declaration right in NativeEnumInjectedSources.h. While we are at it, this patch removes the one in InjectedSourceStream.h, where it is unnecessary.	2021-01-27 23:25:38 -08:00
Hongtao Yu	7e99bddfea	[CSSPGO] Support of CS profiles in extended binary format. This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names. Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D95547	2021-01-27 21:29:46 -08:00
Fangrui Song	6612c2bb68	[llvm-c] Move LLVMX86_AMXTypeKind & LLVMPoisonValueValueKind to the bottom to avoid value changes compared with LLVM<=11 Fixes PR48905	2021-01-27 16:28:04 -08:00
Teresa Johnson	1487747e99	[LTO] Prevent devirtualization for symbols dynamically exported Identify dynamically exported symbols (--export-dynamic[-symbol=], --dynamic-list=, or definitions needed to preempt shared objects) and prevent their LTO visibility from being upgraded. This helps avoid use of whole program devirtualization when there may be overrides in dynamic libraries. Differential Revision: https://reviews.llvm.org/D91583	2021-01-27 15:54:13 -08:00
James Y Knight	9c7aeaebb3	Itanium Mangling: Mangle `__alignof__` differently than `alignof`. The two operations have acted differently since Clang 8, but were unfortunately mangled the same. The new mangling uses new "vendor extended expression" syntax proposed in https://github.com/itanium-cxx-abi/cxx-abi/issues/112 GCC had the same mangling problem, https://gcc.gnu.org/PR88115, and will hopefully be switching to the same mangling as implemented here. Additionally, fix the mangling of `__uuidof` to use the new extension syntax, instead of its previous nonstandard special-case. Adjusts the demangler accordingly. Differential Revision: https://reviews.llvm.org/D93922	2021-01-27 16:46:51 -05:00
Varun Gandhi	44f792966e	[Demangle] Support demangling Swift calling convention in MS demangler. Previously, Clang was able to mangle the Swift calling convention but 'MicrosoftDemangle.cpp' was not able to demangle it. Reviewed By: compnerd, rnk Differential Revision: https://reviews.llvm.org/D95053	2021-01-27 13:24:54 -08:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Fangrui Song	54fb3ca96e	[ThinLTO] Add Visibility bits to GlobalValueSummary::GVFlags Imported functions and variables get the visibility from the module supplying the definition. However, non-imported definitions do not get the visibility from (ELF) the most constraining visibility among all modules or (Mach-O) the visibility of the prevailing definition. This patch * adds visibility bits to GlobalValueSummary::GVFlags * computes the result visibility and propagates it to all definitions Protected/hidden can imply dso_local which can enable some optimizations (this is stronger than GVFlags::DSOLocal because the implied dso_local can be leveraged for ELF -shared while default visibility dso_local has to be cleared for ELF -shared). Note: we don't have summaries for declarations, so for ELF if a declaration has the most constraining visibility, the result visibility may not be that one. Differential Revision: https://reviews.llvm.org/D92900	2021-01-27 10:43:51 -08:00
Craig Topper	0b50fa9945	[FaultsMaps][llvm-objdump] Move FaultMapParser to Object/. Remove CodeGen dependency from llvm-objdump FaultsMapParser lived in CodeGen and was forcing llvm-objdump to link CodeGen and everything CodeGen depends on. This was previously attempted in r240364 to fix a link failure. The CodeGen dependency was independently added to fix the same link failure, and that ended up being kept. Removing the dependency seems like the correct layering for llvm-objdump. Reviewed By: MaskRay, jhenderson Differential Revision: https://reviews.llvm.org/D95414	2021-01-27 10:39:59 -08:00
Valentin Clement	f30c523660	[flang][openacc] Allow multiple wait clauses kernels loop and enter data had too restrictive a constraint for the wait clause. The wait clause is allowed multiple times and not only once. This patch fixes this problem. Reviewed By: SouraVX Differential Revision: https://reviews.llvm.org/D95469	2021-01-27 13:18:46 -05:00
Florian Hahn	28410d17f5	[LoopUtils] Pass SCEVExpander instead of SE to addRuntimeChecks. This gives the user control over which expander to use, which in turn allows the user to decide what to do with the expanded instructions. Used in D75980. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94295	2021-01-27 17:36:19 +00:00
Valentin Clement	b65896ef8b	[flang][openacc] Fix clause restriction for exit data directive Restrictions on clauses for the EXIT DATA directive were not fully correct. This patch fixes the situation. The async, if and finalize clauses are allowed only once. Reviewed By: SouraVX Differential Revision: https://reviews.llvm.org/D95470	2021-01-27 10:07:19 -05:00
Valentin Clement	5e09a02527	[flang][openacc] Fix clause restriction for host_data directive Restrictions on clauses for the HOST_DATA directive were not fully correct. This patch fixes the situation. The if and if_present clauses are allowed only once. Reviewed By: SouraVX Differential Revision: https://reviews.llvm.org/D95473	2021-01-27 10:06:33 -05:00
Kazu Hirata	6bde085366	[AMDGPU] Forward-declare TargetRegisterClass (NFC) AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a forward declaration of TargetRegisterClass in InstructionSelector.h. This patch adds a forward declaration right in AMDGPUInstructionSelector.h. While we are at it, this patch removes the one in InstructionSelector.h, where it is unnecessary.	2021-01-26 20:00:16 -08:00
Petr Hosek	bb9eb19829	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 17:13:34 -08:00
Duncan P. N. Exon Smith	2f721476d1	Frontend: Simplify handling of non-seeking streams in CompilerInstance, NFC Add a new `raw_pwrite_ostream` variant, `buffer_unique_ostream`, which is like `buffer_ostream` but with unique ownership of the stream it's wrapping. Use this in CompilerInstance to simplify the ownership of non-seeking output streams, avoiding logic sprawled around to deal with them specially. This also simplifies future work to encapsulate output files in a different class. Differential Revision: https://reviews.llvm.org/D93260	2021-01-26 15:20:43 -08:00
Haowei Wu	15313f64be	[llvm-elfabi] Support ELF file that lacks .gnu.hash section Before this change, when reading an ELF file, elfabi determined the number of entries in .dynsym by reading the .gnu.hash section. This change makes elfabi read section headers directly first. This allows elfabi to work on ELF files which do not have .gnu.hash sections. Differential Revision: https://reviews.llvm.org/D93362	2021-01-26 12:31:52 -08:00
Fangrui Song	34b60d8a56	Add -fbinutils-version= to gate ELF features on the specified binutils version There are two use cases. Assembler We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some features are supported by latest GNU as, but we have to use MCAsmInfo::useIntegratedAs() because the newer versions have not been widely adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26). Linker We want to use features supported only by LLD or very new GNU ld, or don't want to work around older GNU ld. We currently can't represent that "we don't care about old GNU ld". You can find such workarounds in a few other places, e.g. Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276), R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727 https://sourceware.org/bugzilla/show_bug.cgi?id=22969) Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001; GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available). This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table). This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc. It changes one codegen place in SHF_MERGE to demonstrate its usage. `-fbinutils-version=2.35` means the produced object file does not care about GNU ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced assembly can be consumed by GNU as>=2.35, but older versions may not work. `-fbinutils-version=none` means that we can use all ELF features, regardless of GNU as/ld support. Both clang and llc need `parseBinutilsVersion`. Such command line parsing is usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen), however, ClangCodeGen does not depend on LLVMCodeGen. 
So I add `parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget). Differential Revision: https://reviews.llvm.org/D85474	2021-01-26 12:28:23 -08:00
Petr Hosek	1e634f3952	Revert "Support for instrumenting only selected files or functions" This reverts commit `4edf35f11a` because the test fails on Windows bots.	2021-01-26 12:25:28 -08:00
Petr Hosek	4edf35f11a	Support for instrumenting only selected files or functions This change implements support for applying profile instrumentation only to selected files or functions. The implementation uses the sanitizer special case list format to select which files and functions to instrument, and relies on the new noprofile IR attribute to exclude functions from instrumentation. Differential Revision: https://reviews.llvm.org/D94820	2021-01-26 11:11:39 -08:00
Simon Pilgrim	f82cff31d3	[AMDGPU] HSAMD::fromString - replace std::string arg with StringRef. NFCI. Removes an unnecessary chain of StringRef -> std::string -> StringRef conversions	2021-01-26 16:09:39 +00:00
Sander de Smalen	b9417c3616	[CostModel] Handle CTLZ and CTTZ in getTypeBasedIntrinsicInstrCost Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D95355	2021-01-26 14:37:51 +00:00
Sebastian Neubauer	b36370d153	[AMDGPU] Add IntrWillReturn to three intrinsics None of these can terminate a wave or lane. With these, all intrinsics are IntrWillReturn except those that change exec or can terminate the wave. Not marking intrinsics as WillReturn may prevent optimizations in the future: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148047.html Differential Revision: https://reviews.llvm.org/D95436	2021-01-26 15:33:15 +01:00
Florian Hahn	35b3989a30	[Passes] Run peeling as part of simple/full loop unrolling. Loop peeling removes conditions from loop bodies that become invariant after a small number of iterations. When triggered, this leads to fewer compares and possibly PHIs in loop bodies, enabling further optimizations. The current cost-model of loop peeling should be quite conservative/safe, i.e. only peel if a condition in the loop becomes known after peeling. For example, see PR47671, where loop peeling enables vectorization by removing a PHI the vectorizer does not understand. Granted, the loop-vectorizer could also be taught about constant PHIs, but loop peeling is likely to enable other optimizations as well. This has an impact on quite a few benchmarks from MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto, for example Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsVectorized Program base patch diff test-suite...ve-susan/automotive-susan.test 8.00 9.00 12.5% test-suite...nal/skidmarks10/skidmarks.test 35.00 31.00 -11.4% test-suite...lications/sqlite3/sqlite3.test 41.00 43.00 4.9% test-suite...s/ASC_Sequoia/AMGmk/AMGmk.test 25.00 26.00 4.0% test-suite...006/450.soplex/450.soplex.test 88.00 89.00 1.1% test-suite...TimberWolfMC/timberwolfmc.test 120.00 119.00 -0.8% test-suite.../CINT2006/403.gcc/403.gcc.test 215.00 216.00 0.5% test-suite...006/447.dealII/447.dealII.test 957.00 958.00 0.1% test-suite...ternal/HMMER/hmmcalibrate.test 75.00 75.00 0.0% Same hash: 186 (filtered out) Remaining: 51 Metric: loop-vectorize.LoopsAnalyzed Program base patch diff test-suite...ks/Prolangs-C/agrep/agrep.test 440.00 434.00 -1.4% test-suite...nal/skidmarks10/skidmarks.test 312.00 308.00 -1.3% test-suite...marks/7zip/7zip-benchmark.test 6399.00 6323.00 -1.2% test-suite...lications/minisat/minisat.test 134.00 135.00 0.7% test-suite...rks/FreeBench/pifft/pifft.test 295.00 297.00 0.7% test-suite...TimberWolfMC/timberwolfmc.test 1879.00 1869.00 -0.5% 
test-suite...pplications/treecc/treecc.test 689.00 691.00 0.3% test-suite...T2000/300.twolf/300.twolf.test 1593.00 1597.00 0.3% test-suite.../Benchmarks/Bullet/bullet.test 1394.00 1392.00 -0.1% test-suite...ications/JM/ldecod/ldecod.test 1431.00 1429.00 -0.1% test-suite...6/464.h264ref/464.h264ref.test 2229.00 2230.00 0.0% test-suite...lications/sqlite3/sqlite3.test 2590.00 2589.00 -0.0% test-suite...ications/JM/lencod/lencod.test 2732.00 2733.00 0.0% test-suite...006/453.povray/453.povray.test 3395.00 3394.00 -0.0% Note the -11% regression in number of loops vectorized for skidmarks. I suspect this corresponds to the fact that those loops are gone now (see the reduction in number of loops analyzed by LV). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88471	2021-01-26 13:52:30 +00:00
Georgii Rymar	d5e48f1347	[yaml2obj][obj2yaml] - Improve how we set/dump the sh_entsize field. We already set the `sh_entsize` field in a single place for all non-implicit sections. This patch reorders the logic slightly and with it we finally have only one place where the `sh_entsize` is set. obj2yaml will not dump the `EntSize` key for `SHT_DYNSYM/SHT_SYMTAB` sections anymore, when the value of `sh_entsize` is equal to `sizeof(Elf_Sym)`. Note that this also seems to have revealed an issue in llvm-objcopy: Previously yaml2obj set the `sh_entsize` for the `.symtab` section to 0x18, now it sets it for `SHT_SYMTAB` sections, i.e. by type. But the `llvm-objcopy/ELF/only-keep-debug.test` has a `.symtab` section of type `SHT_STRTAB`, and now yaml2obj sets the `sh_entsize` to 0 for it. I had to update the corresponding check lines for `ES`, but the behavior of `llvm-objcopy` should be fixed instead I think. I've added a TODO and a comment. Differential revision: https://reviews.llvm.org/D95364	2021-01-26 13:33:02 +03:00
Georgii Rymar	e98d5c3192	[libObject,llvm-readelf/obj] - Don't use @@ when printing versions of undefined symbols. A default version (@@) is only available for defined symbols. Currently we use "@@" for undefined symbols too. This patch fixes the issue and improves our test case. Differential revision: https://reviews.llvm.org/D95219	2021-01-26 12:05:59 +03:00
Jan Svoboda	9338f3a586	[clang][cli] Accept strings instead of options in ImpliedByAnyOf To be able to refer to constant keypaths (e.g. `defvar cplusplus = LangOpts<"CPlusPlus">`) inside `ImpliedByAnyOf`, let's accept strings instead of `Option` instances. This somewhat weakens the guarantees that we're referring to an existing (option) record, but we can still use the option.KeyPath syntax to simulate this. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D95344	2021-01-26 09:30:36 +01:00
Hsiangkai Wang	b69932b550	[RISCV] Implement vlsegff intrinsics. Differential Revision: https://reviews.llvm.org/D95303	2021-01-26 12:02:43 +08:00
Kazu Hirata	c85b6bf33c	[AMDGPU] Forward-declare MachineIRBuilder (NFC) AMDGPULegalizerInfo.h needs MachineIRBuilder but relies on a forward declaration of MachineIRBuilder in LegalizerInfo.h. This patch adds a forward declaration right in AMDGPULegalizerInfo.h. While we are at it, this patch removes the one in LegalizerInfo.h, where it is unnecessary.	2021-01-25 19:24:01 -08:00
Kazu Hirata	772134e3ec	[StackSafety] Use ListSeparator (NFC)	2021-01-25 19:23:59 -08:00
Amara Emerson	03bce0bf4e	[GlobalISel][Localizer] Don't localize phi operands which are used more than once in the phi. The current algorithm just tries to localize defs as far as they can go, and in the case of G_PHI operands, it clones the def into the predecessor block for each incoming edge. When multiple edges have the same register value, this can cause unnecessary code bloat, and inhibit later optimizations. This change checks whether a given phi operand is unique in the phi; if it is not, the def of that register is not localized to the predecessor. Differential Revision: https://reviews.llvm.org/D95406	2021-01-25 17:48:04 -08:00
Mitch Phillips	c9466ede7e	Revert "Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method"" This reverts commit `554b3211fe`. Differential Revision: https://reviews.llvm.org/D95035	2021-01-25 16:22:22 -08:00
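
The uniqueness check described in `03bce0bf4e` can be illustrated with a small sketch. This is not the Localizer's actual code: the function name `localizable_phi_operands` and the `(register, predecessor)` pair representation are hypothetical, chosen only to show the idea that a def feeding several incoming edges of the same phi is left where it is, while a def feeding exactly one edge may be cloned into that predecessor.

```python
from collections import Counter


def localizable_phi_operands(phi_incoming):
    """Return the (register, predecessor) pairs that are safe to localize.

    phi_incoming is a hypothetical stand-in for a G_PHI's operand list:
    one (register, predecessor_block) pair per incoming edge.
    A register is kept only if it appears exactly once in the phi, so a
    def shared by several edges is never cloned once per predecessor.
    """
    uses = Counter(reg for reg, _ in phi_incoming)
    return [(reg, pred) for reg, pred in phi_incoming if uses[reg] == 1]


# %c1 arrives over two edges, so only %c2 remains a localization candidate.
phi = [("%c1", "bb.1"), ("%c1", "bb.2"), ("%c2", "bb.3")]
print(localizable_phi_operands(phi))
```

Counting uses up front keeps the check linear in the number of phi operands, which is one plausible way to express it; the real pass may structure the check differently.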

44488 Commits