llvm-project

Commit Graph

Author	SHA1	Message	Date
Duncan P. N. Exon Smith	01701646d5	Transforms: Clone distinct nodes in metadata mapper unless RF_ReuseAndMutateDistinctMDs This is a follow up to `22a52dfddc` and a revert of `df763188c9`. With this change, we only skip cloning distinct nodes in MDNodeMapper::mapDistinct if RF_ReuseAndMutateDistinctMDs, dropping the no-longer-needed local helper `cloneOrBuildODR()`. Skipping cloning in other cases is unsound and breaks CloneModule, which is why the textual IR for PR48841 didn't pass previously. This commit adds the test as: Transforms/ThinLTOBitcodeWriter/cfi-debug-info-cloned-type-references-global-value.ll Cloning less often exposed a hole in subprogram cloning in CloneFunctionInto thanks to df763188c9a1ecb1e7e5c4d4ea53a99fbb755903's test ThinLTO/X86/Inputs/dicompositetype-unique-alias.ll. If a function has a subprogram attachment whose scope is a DICompositeType that shouldn't be cloned, but it has no internal debug info pointing at that type, that composite type was being cloned. This commit plugs that hole, calling DebugInfoFinder::processSubprogram from CloneFunctionInto. As hinted at in 22a52dfddcefad4f275eb8ad1cc0e200074c2d8a's commit message, I think we need to formalize ownership of metadata a bit more so that ValueMapper/CloneFunctionInto (and similar functions) can deal with cloning (or not) metadata in a more generic, less fragile way. This fixes PR48841. Differential Revision: https://reviews.llvm.org/D96734	2021-02-24 12:57:52 -08:00
Duncan P. N. Exon Smith	1e1b92f76d	IR: Rename Metadata::ImplicitCode to SubclassData1, NFC Metadata::ImplicitCode is a bit shaved off of Metadata::Storage, currently only in use by the subclass DILocation. However, the bit isn't reserved for that purpose. Rename it `SubclassData1` to make it clear that it has nothing to do with Metadata itself (and other subclasses are free to use it). As a drive-by, remove an old TODO about exposing bits to subclasses (looks like that has mostly been done). No functionality change here. Differential Revision: https://reviews.llvm.org/D96740	2021-02-24 12:56:26 -08:00
James Y Knight	c2487bf7df	Remove a workaround for MSVC 2013, now that MSVC 2017 is the minimum. In MSVC 2013, 'alignas(integer-template-arg)' didn't compile; verified on godbolt that this now works properly.	2021-02-24 13:56:49 -05:00
Lang Hames	8380d07e39	[JITLink] Add assertions, fix a comment. The new assertions check that Addressables removed when removing external or absolute symbols are not referenced by another symbol. A comment on post-fixup passes is updated: vmaddrs have all been set up by the time the pre-fixup passes are run, post-fixup passes run after fixups have been applied to content.	2021-02-24 21:02:37 +11:00
Dan Liew	7d3ef103b5	[ASan] Introduce a way to set different ways of emitting module destructors. Previously there was no way to control how module destructors were emitted by `ModuleAddressSanitizerPass`. However, we want language frontends (e.g. Clang) to be able to decide how to emit these destructors (if at all). This patch introduces the `AsanDtorKind` enum that represents the different ways destructors can be emitted. There are currently only two valid ways to emit destructors. * `Global` - Use `llvm.global_dtors`. This was the previous behavior and is the default. * `None` - Do not emit module destructors. The `ModuleAddressSanitizerPass` and the various wrappers around it have been updated to take the `AsanDtorKind` as an argument. The `-asan-destructor-kind=` command line argument has been introduced to make this easy to test from `opt`. If this argument is specified it overrides the value passed to the `ModuleAddressSanitizerPass` constructor. Note that `AsanDtorKind` is not `bool` because we will introduce a new way to emit destructors in a subsequent patch. Note that `AsanDtorKind` is given its own header file because if it is declared in `Transforms/Instrumentation/AddressSanitizer.h` it leads to a compile error (Module is ambiguous) when trying to use it in `clang/Basic/CodeGenOptions.def`. rdar://71609176 Differential Revision: https://reviews.llvm.org/D96571	2021-02-23 20:01:21 -08:00
Nico Weber	f14a14dd25	Revert "Add more historic DWARF vendor extensions" This reverts commit `c4a9144468`. Breaks check-llvm everywhere, see https://reviews.llvm.org/D97242#2583716	2021-02-23 22:10:02 -05:00
Chen Zheng	be5d92e37e	[Debug-Info][NFC] move emitDwarfUnitLength to MCStreamer class We may need to customize the DWARF unit length in DWARF section headers for some targets on some code generation paths. For example, for XCOFF in the assembly path, the AIX assembler does not require the debug section to contain its debug unit length in the header. Move emitDwarfUnitLength to the MCStreamer class so that we can do customization in different Streamers. Reviewed By: ikudrin Differential Revision: https://reviews.llvm.org/D95932	2021-02-23 21:29:05 -05:00
Adrian Prantl	c4a9144468	Add more historic DWARF vendor extensions The maintainer of libdwarf kindly provided this patch with a bunch of historic DWARF extensions that are missing from Dwarf.def. This list is helpful to avoid potential conflicts in the user-defined vendor extension space in the future. Patch by David Anderson! Differential Revision: https://reviews.llvm.org/D97242	2021-02-23 17:54:04 -08:00
Juneyoung Lee	56d228a14e	[SimplifyCFG] Update passingValueIsAlwaysUndefined to check more attributes This is a simple patch to update SimplifyCFG's passingValueIsAlwaysUndefined to inspect more attributes. A new function `CallBase::isPassingUndefUB` checks attributes that imply noundef. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97244	2021-02-24 10:40:50 +09:00
Erich Keane	af4451eb4f	[NFC] Make TrailingObjects non-copyable/non-movable This got me pretty recently... TrailingObjects cannot be copied or moved, since they need to be pre-allocated. This patch deletes the copy and move operations (plus re-adds the default ctor). Differential Revision: https://reviews.llvm.org/D97324	2021-02-23 16:30:13 -08:00
Fangrui Song	ef312951fd	collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128 And delete the SmallPtrSetImpl overload. While here, decrease inline element counts from 8 to 4. See D97128 for the choice. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D97257	2021-02-23 16:09:06 -08:00
Fangrui Song	3adb89bb9f	[ThinLTO] Make cloneUsedGlobalVariables deterministic Iterating on `SmallPtrSet<GlobalValue *, 8>` with more than 8 elements is not deterministic. Use a SmallVector instead because `Used` is guaranteed to contain unique elements. While here, decrease inline element counts from 8 to 4. The number of `llvm.used`/`llvm.compiler.used` elements is usually 0 or 1. For full LTO/hybrid LTO, the number may be large, so we need to be careful. According to tejohnson's analysis https://reviews.llvm.org/D97128#2582399 , 4 is good for a large project with WholeProgramDevirt, when available_externally vtables are placed in the llvm.compiler.used set. Differential Revision: https://reviews.llvm.org/D97128	2021-02-23 16:09:05 -08:00
Heejin Ahn	ea8c6375e3	[WebAssembly] Fix incorrect grouping and sorting of exceptions This CL is not big but contains changes that span multiple analyses and passes. This description is very long because it tries to explain the basics of what each pass/analysis does and why we need this change on top of that. Please feel free to skip parts that are not necessary for your understanding. --- `WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next unwind destination>. The value (unwind dest) here is where an exception should end up when it is not caught by the key (EH pad). We record this info in WasmEHPrepare to fix catch mismatches, because the CFG itself does not have this info. A CFG only contains BBs and predecessor-successor relationships between them, but in `WasmEHFuncInfo` the unwind destination BB is not necessarily a successor of the key EH pad BB. Their relationship can be intuitively explained by this C++ code snippet: ``` try { try { foo(); } catch (int) { // EH pad ... } } catch (...) { // unwind destination } ``` So when `foo()` throws, it goes to `catch (int)` first. But if it is not caught by it, it ends up in the next unwind destination `catch (...)`. This unwind destination is what you see in `catchswitch`'s `unwind label %bb` part. --- `WebAssemblyExceptionInfo` groups exceptions so that they can be sorted continuously together in CFGSort, as we do for loops. What this analysis does is very simple: it creates a single `WebAssemblyException` per EH pad, and all BBs that are dominated by that EH pad are included in this exception. We also identify subexception relationships in this way: if EHPad A dominates EHPad B, EHPad B's exception is a subexception of EHPad A's exception. This simple rule turns out to be incorrect in some cases. In `WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means semantically EHPad B should not be included in EHPad A's exception, because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this: ``` try try catch ... <- %dest_bb is among here! end delegate %dest_bb ``` So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to make sure exceptions' unwind destinations are not subexceptions of their unwind sources in `WasmEHFuncInfo`. But this alone does not prevent `dest_bb` in the example above from being sorted within the inner `catch`'s exception, even if its exception is not a subexception of that `catch`'s exception anymore, because of how CFGSort works, which will be explained below. --- CFGSort places BBs within the same `SortRegion` (loop or exception) continuously together so they can be demarcated with `loop`-`end_loop` or `catch`-`end_try` in CFGStackify. `SortRegion` is a wrapper for one of `MachineLoop` or `WebAssemblyException`. `SortRegionInfo` already does some complicated things because there are discrepancies between those two data structures. `WebAssemblyException` is what we control, and it is defined as an EH pad as its header and BBs dominated by the header as its BBs (with a newly added exception of unwind destinations explained in the previous paragraph). But `MachineLoop` is an LLVM data structure and uses the standard loop detection algorithm. So by the algorithm, a loop contains the BBs that 1. are dominated by the loop header and 2. have a path back to its header. Because of the second condition, many BBs that are dominated by the loop header are not included in the loop. So BBs that contain `return` or branches to outside of the loop are not technically included in `MachineLoop`, but they can be sorted together with the loop with no problem. Maybe to relax the condition, in CFGSort, when we are in a `SortRegion` we allow sorting of not only BBs that belong to the current innermost region but also BBs that are dominated by the current region header. (This was written this way from the first version written by Dan, when only loops existed.)
But now, we have cases in exceptions when EHPad B is the unwind destination for EHPad A, even if EHPad B is dominated by EHPad A it should not be included in EHPad A's exception, and should not be sorted within EHPad A. One way to make things work, at least correctly, is to change the `dominates` condition to a `contains` condition for `SortRegion` when sorting BBs, but this will change compilation results for existing non-EH code and I can't be sure it will not degrade performance or code size. I think it will degrade performance because it will force many BBs dominated by a loop, which don't have the path back to the header, to be placed after the loop and it will likely create more branches and blocks. So this does a little hacky check when adding BBs to `Preferred` list: (`Preferred` list is a ready list. CFGSort maintains ready list in two priority queues: `Preferred` and `Ready`. I'm not very sure why, but it was written that way from the beginning. BBs are first added to `Preferred` list and then some of them are pushed to `Ready` list, so here we only need to guard the condition for `Preferred` list.) When adding a BB to `Preferred` list, we check if that BB is an unwind destination of another BB. To do this, this adds the reverse mapping, `UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB is an unwind destination, it checks if the current stack of regions (`Entries`) contains its source BB by traversing the stack backwards. If we find its unwind source in there, we add the BB to its `Deferred` list, to make sure that unwind destination BB is added to `Preferred` list only after that region with the unwind source BB is sorted and popped from the stack. --- This does not contain a new test that crashes because of this bug, but this fix changes the result for one of the existing test cases. This test case didn't crash because it fortunately didn't contain `delegate` to the incorrectly placed unwind destination BB.
Fixes https://github.com/emscripten-core/emscripten/issues/13514. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97247	2021-02-23 14:54:55 -08:00
Matthew Voss	6da7d31416	[llvm-profdata] Emit Error when Invalid MemOpSize Section is Created by llvm-profdata Under certain (currently unknown) conditions, llvm-profdata is outputting profiles that have two consecutive entries in the MemOPSize section for the value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch instruction with two cases for 0. As mentioned, we’re not quite sure what’s causing this to happen, but this patch prevents llvm-profdata from outputting a profile that has this problem and gives an error with a request for a reproducible. Differential Revision: https://reviews.llvm.org/D92074	2021-02-23 12:51:54 -08:00
Jay Foad	a6be26710b	[GlobalISel] Make more use of replaceSingleDefInstWithReg. NFC.	2021-02-23 17:08:34 +00:00
Juneyoung Lee	19c2e12947	[JumpThreading] Update computeValueKnownInPredecessors to recognize logical and/or patterns This allows JumpThreading's computeValueKnownInPredecessors to recognize select form of and/or patterns as well.	2021-02-24 00:06:10 +09:00
Nate Chandler	01b4890e47	Add @llvm.coro.async.size.replace intrinsic. The new intrinsic replaces the size in one specified AsyncFunctionPointer with the size in another. This ability is necessary for functions which merely forward to async functions such as those defined for partial applications. Reviewed By: aschwaighofer Differential Revision: https://reviews.llvm.org/D97229	2021-02-23 06:43:52 -08:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Alexey Lapshin	875b3b2cdd	[Support] Add reserve() method to the raw_ostream. If the resulting size of the output stream is already known, then the space for stream data can be preallocated in some cases. For example, raw_string_ostream could preallocate the space for the target string (this avoids reallocations while writing into the stream). Differential Revision: https://reviews.llvm.org/D91693	2021-02-23 14:06:38 +03:00
Andy Wingo	7dc98adbb0	Revert "[WebAssembly] call_indirect issues table number relocs" This reverts commit `861dbe1a02`. It broke emscripten -- see https://reviews.llvm.org/D90948#2578843.	2021-02-23 11:48:08 +01:00
Liu, Chen3	f8b9035aae	[X86] Support amx-int8 intrinsic. Adding support for intrinsics of TDPBSUD/TDPBUSD/TDPBUUD. Differential Revision: https://reviews.llvm.org/D97259	2021-02-23 17:08:05 +08:00
Lang Hames	430817d0d5	[JITLink] Add a getFixupAddress convenience method to Block.	2021-02-23 11:08:54 +11:00
Lang Hames	adf2098bd8	[JITLink] Don't allow creation of sections with duplicate names.	2021-02-23 11:08:54 +11:00
Heejin Ahn	a08e609d2e	[WebAssembly] Rename methods in WasmEHFuncInfo (NFC) This renames variable and method names in `WasmEHFuncInfo` class to be simpler and clearer. For example, unwind destinations are EH pads by definition so it doesn't necessarily need to be included in every method name. Also I am planning to add the reverse mapping in a later CL, something like `UnwindDestToSrc`, so this renaming will make meanings clearer. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D97173	2021-02-22 12:16:11 -08:00
Leonard Chan	1c932baeaa	[llvm][Bitcode] Add bitcode reader/writer for DSOLocalEquivalent This is necessary for compilation with [thin]lto. Differential Revision: https://reviews.llvm.org/D96170	2021-02-22 10:37:57 -08:00
Nikita Popov	5e7e499b91	[JumpThreading] Clone noalias.scope.decl when threading blocks When cloning instructions during jump threading, also clone and adapt any declared scopes. This is primarily important when threading loop exits, because we'll end up with two dominating scope declarations in that case (at least after additional loop rotation). This addresses a loose thread from https://reviews.llvm.org/rG2556b413a7b8#975012. Differential Revision: https://reviews.llvm.org/D97154	2021-02-22 18:35:30 +01:00
Ryan Santhiraraja	2c25efcbd3	[AArch64] Adding SHA3 Intrinsics support This patch adds the following SHA3 Intrinsics: vsha512hq_u64, vsha512h2q_u64, vsha512su0q_u64, vsha512su1q_u64 veor3q_u8 veor3q_u16 veor3q_u32 veor3q_u64 veor3q_s8 veor3q_s16 veor3q_s32 veor3q_s64 vrax1q_u64 vxarq_u64 vbcaxq_u8 vbcaxq_u16 vbcaxq_u32 vbcaxq_u64 vbcaxq_s8 vbcaxq_s16 vbcaxq_s32 vbcaxq_s64 Note need to include +sha3 and +crypto when building from the front-end Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D96381	2021-02-22 12:09:20 +00:00
Andy Wingo	861dbe1a02	[WebAssembly] call_indirect issues table number relocs If the reference-types feature is enabled, call_indirect will explicitly reference its corresponding function table via `TABLE_NUMBER` relocations against a table symbol. Also, as before, address-taken functions can also cause the function table to be created, only with reference-types they additionally cause a symbol table entry to be emitted. We abuse the used-in-reloc flag on symbols to indicate which tables should end up in the symbol table. We do this because unfortunately older wasm-ld will carp if it sees a table symbol. Differential Revision: https://reviews.llvm.org/D90948	2021-02-22 10:13:36 +01:00
Kazu Hirata	5032b5890b	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-21 19:58:05 -08:00
madhur13490	5fe23de5db	[NFC] Remove redundant word in comment Differential Revision: https://reviews.llvm.org/D97157	2021-02-21 18:04:20 +00:00
Nikita Popov	e0615bcd39	[Loads] Add optimized FindAvailableLoadedValue() overload (NFCI) FindAvailableLoadedValue() accepts an iterator by reference. If no available value is found, then the iterator will either be left at a clobbering instruction or the beginning of the basic block. This allows using FindAvailableLoadedValue() across multiple blocks. If this functionality is not needed, as is the case in InstCombine, then we can use a much more efficient implementation: First try to find an available value, and only perform clobber checks if we actually found one. As this function only looks at a very small number of instructions (6 by default) and usually doesn't find an available value, this saves many expensive alias analysis queries.	2021-02-21 18:42:56 +01:00
Juneyoung Lee	aacf7878bc	[ValueTracking] Improve impliesPoison This patch improves ValueTracking's impliesPoison(V1, V2) to do this reasoning: ``` %res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b) %overflow = extractvalue { i64, i1 } %res, 1 %mul = extractvalue { i64, i1 } %res, 0 ; If %mul is poison, %overflow is also poison, and vice versa. ``` This improvement leads to supporting this optimization under `-instcombine-unsafe-select-transform=0`: ``` define i1 @test2_logical(i64 %a, i64 %b, i64* %ptr) { ; CHECK-LABEL: @test2_logical( ; CHECK-NEXT: [[MUL:%.]] = mul i64 [[A:%.]], [[B:%.]] ; CHECK-NEXT: [[TMP1:%.]] = icmp ne i64 [[A]], 0 ; CHECK-NEXT: [[TMP2:%.]] = icmp ne i64 [[B]], 0 ; CHECK-NEXT: [[OVERFLOW_1:%.]] = and i1 [[TMP1]], [[TMP2]] ; CHECK-NEXT: [[NEG:%.]] = sub i64 0, [[MUL]] ; CHECK-NEXT: store i64 [[NEG]], i64 [[PTR:%.]], align 8 ; CHECK-NEXT: ret i1 [[OVERFLOW_1]] ; %res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b) %overflow = extractvalue { i64, i1 } %res, 1 %mul = extractvalue { i64, i1 } %res, 0 %cmp = icmp ne i64 %mul, 0 %overflow.1 = select i1 %overflow, i1 true, i1 %cmp %neg = sub i64 0, %mul store i64 %neg, i64 %ptr, align 8 ret i1 %overflow.1 } ``` Previously, this didn't happen because the flag prevented `select i1 %overflow, i1 true, i1 %cmp` from being `or i1 %overflow, %cmp`. Note that the select -> or conversion happens only when `impliesPoison(%cmp, %overflow)` returns true. This improvement allows `impliesPoison` to do the reasoning. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96929	2021-02-20 13:22:34 +09:00
Craig Topper	baab797878	[ValueTypes] Assert if changeVectorElementType is called on a simple type with an extended element type. Previously we would use the extended implementation, but the extended implementation requires the vector type to be extended so that we can access the LLVMContext. In theory we could detect this case and use the context from the element type instead, but since I know of no cases hitting this in practice today I've done the simplest thing. Also add asserts to several extended EVT functions that assume LLVMTy is non-null. Follow from discussion in D97036 Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D97070	2021-02-19 17:30:46 -08:00
Mircea Trofin	82492f24ff	[NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit VirtRegAuxInfo is an extensibility point, so the register allocator's decision on which implementation to use should be communicated to the other users - namely, LiveRangeEdit. Differential Revision: https://reviews.llvm.org/D96898	2021-02-19 07:44:28 -08:00
Simon Pilgrim	aa44815f84	Remove unnecessary "using namespace llvm" inside "namespace llvm". NFCI.	2021-02-19 11:15:16 +00:00
Nikita Popov	370addb996	[IR] Move willReturn() to Instruction This moves the willReturn() helper from CallBase to Instruction, so that it can be used in a more generic manner. This will make it easier to fix additional passes (ADCE and BDCE), and will give us one place to change if additional instructions should become non-willreturn (e.g. there has been talk about handling volatile operations this way). I have also included the IntrinsicInst workaround directly in here, so that it gets applied consistently. (As such this change is not entirely NFC -- FuncAttrs will now use this as well.) Differential Revision: https://reviews.llvm.org/D96992	2021-02-19 11:56:01 +01:00
Djordje Todorovic	1a2b3536ef	Reland "[Debugify] Make the debugify aware of the original (-g) Debug Info" As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is a very early stage of the implementation, there is room for improvements such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545 The test that was failing is now forced to use the old PM.	2021-02-18 23:29:22 -08:00
Serge Pavlov	2c4f60e45b	[FPEnv][AArch64] Implement lowering of llvm.set.rounding Differential Revision: https://reviews.llvm.org/D96836	2021-02-19 13:16:51 +07:00
Lang Hames	0469256d35	[ORC] Print CPU feature string in JITTargetMachineBuilder debugging output.	2021-02-19 15:18:19 +11:00
Konstantin Zhuravlyov	71d1f785a5	AMDGPU/ELF: Sort MACHs by value and add missing reserved MACHs - Sort MACHs by its value - Add missing reserved MACHs - EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3D - EF_AMDGPU_MACH_AMDGCN_RESERVED_0X3E Differential Revision: https://reviews.llvm.org/D97010	2021-02-18 20:46:27 -05:00
Wei Mi	5fb65c02ca	[SampleFDO] Stop repeated indirect call promotion for the same target. Found a problem in indirect call promotion in sample loader pass. Currently if an indirect call is promoted for a target, and if the parent function is inlined into some other function, the indirect call can be promoted for the same target again. That is redundant, which can harm performance and can cause excessive compile time in some extreme cases. The patch fixes the issue. If a target is promoted for an indirect call, the patch will write ICP metadata with the target call count being set to 0. In the later ICP in sample profile loader, if it sees a target has 0 count for an indirect call, it knows the target has been promoted and won't do indirect call promotion for the indirect call. The fix brings 0.1~0.2% performance on our search benchmark. Differential Revision: https://reviews.llvm.org/D96806	2021-02-18 17:01:32 -08:00
Leonard Chan	c77659e549	[llvm][IR] Do not place constants with static relocations in a mergeable section This patch provides two major changes: 1. Add getRelocationInfo to check if a constant will have static, dynamic, or no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.) 2. Only allow a constant with no relocations (static or dynamic) to be placed in a mergeable section. This will allow unused symbols that contain static relocations and happen to fit in mergeable constant sections (.rodata.cstN) to instead be placed in unique-named sections if -fdata-sections is used and subsequently garbage collected by --gc-sections. See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html. Differential Revision: https://reviews.llvm.org/D95960	2021-02-18 15:39:00 -08:00
Petr Hosek	5fbd1a333a	[Coverage] Store compilation dir separately in coverage mapping We currently always store absolute filenames in coverage mapping. This is problematic for several reasons. It poses a problem for distributed compilation as source location might vary across machines. We are also duplicating the path prefix, potentially wasting space. This change modifies how we store filenames in coverage mapping. Rather than absolute paths, it stores the compilation directory and file paths as given to the compiler, either relative or absolute. Later when reading the coverage mapping information, we recombine relative paths with the working directory. This approach is similar to the handling of DW_AT_comp_dir in DWARF. Finally, we also provide a new option, -fprofile-compilation-dir akin to -fdebug-compilation-dir which can be used to manually override the compilation directory which is useful in distributed compilation cases. Differential Revision: https://reviews.llvm.org/D95753	2021-02-18 14:34:39 -08:00
Nikita Popov	70e3c9a8b6	[BasicAA] Always strip single-argument phi nodes We can always look through single-argument (LCSSA) phi nodes when performing alias analysis. getUnderlyingObject() already does this, but stripPointerCastsAndInvariantGroups() does not. We still look through these phi nodes with the usual aliasPhi() logic, but sometimes get sub-optimal results due to the restrictions on value equivalence when looking through arbitrary phi nodes. I think it's generally beneficial to keep the underlying object logic and the pointer cast stripping logic in sync, insofar as it is possible. With this patch we get marginally better results: aa.NumMayAlias \| 5010069 \| 5009861 aa.NumMustAlias \| 347518 \| 347674 aa.NumNoAlias \| 27201336 \| 27201528 ... licm.NumPromoted \| 1293 \| 1296 I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(), as we're past the point where we can explicitly spell out everything that's getting stripped. Differential Revision: https://reviews.llvm.org/D96668	2021-02-18 23:07:50 +01:00
Petr Hosek	fbf8b957fd	Revert "[Coverage] Store compilation dir separately in coverage mapping" This reverts commit `97ec8fa5bb` since the test is failing on some bots.	2021-02-18 12:50:24 -08:00
Petr Hosek	97ec8fa5bb	[Coverage] Store compilation dir separately in coverage mapping We currently always store absolute filenames in coverage mapping. This is problematic for several reasons. It poses a problem for distributed compilation as source location might vary across machines. We are also duplicating the path prefix, potentially wasting space. This change modifies how we store filenames in coverage mapping. Rather than absolute paths, it stores the compilation directory and file paths as given to the compiler, either relative or absolute. Later when reading the coverage mapping information, we recombine relative paths with the working directory. This approach is similar to the handling of DW_AT_comp_dir in DWARF. Finally, we also provide a new option, -fprofile-compilation-dir akin to -fdebug-compilation-dir which can be used to manually override the compilation directory which is useful in distributed compilation cases. Differential Revision: https://reviews.llvm.org/D95753	2021-02-18 12:27:42 -08:00
Sam Powell	eb2eeeb76f	[llvm][TextAPI] add equality operator for InterfaceFile This patch adds functionality to compare for the equality between `InterfaceFile`s based on attributes specific to linking. Reviewed By: cishida, steven_wu Differential Revision: https://reviews.llvm.org/D96629	2021-02-18 11:53:08 -08:00
Ta-Wei Tu	f70cdc5b5c	[NPM] Properly reset parent loop after loop passes This fixes https://bugs.llvm.org/show_bug.cgi?id=49185 When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`. If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes. However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes. This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`. Reviewed By: asbirlea, ychen Differential Revision: https://reviews.llvm.org/D96727	2021-02-19 02:50:53 +08:00
Bradley Smith	8bad8a43c3	[AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD Adjust generateFMAsInMachineCombiner to return false if SVE is present in order to combine fmul+fadd into fma. Also add new pseudo instructions so as to select the most appropriate of FMLA/FMAD depending on register allocation. Depends on D96599 Differential Revision: https://reviews.llvm.org/D96424	2021-02-18 16:55:16 +00:00
Paul C. Anagnostopoulos	49d663d546	Revert "[TableGen] Improve algorithms for processing template arguments" This reverts commit e589207d5aaee6cbf1d7c7de8867a17727d14aca.	2021-02-18 09:26:26 -05:00
Paul C. Anagnostopoulos	d248cce44e	[TableGen] Improve algorithms for processing template arguments Rework template argument checking so that all arguments are type-checked and cast if necessary. Add a test. Differential Revision: https://reviews.llvm.org/D96416	2021-02-18 09:15:26 -05:00
Djordje Todorovic	c1e23894fc	Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info" This reverts rG8ee7c7e02953. One test is failing, I'll reland this as soon as possible.	2021-02-18 02:04:27 -08:00
Djordje Todorovic	8ee7c7e029	[Debugify] Make the debugify aware of the original (-g) Debug Info As discussed on the RFC [0], I am sharing the set of patches that enables checking of original Debug Info metadata preservation in optimizations. The proof-of-concept/proposal can be found at [1]. The implementation from [1] was full of duplicated code, so this set of patches tries to merge this approach into the existing debugify utility. For example, the utility pass in the original-debuginfo-check mode could be invoked as follows: $ opt -verify-debuginfo-preserve -pass-to-test sample.ll Since this is a very early stage of the implementation, there is room for improvement, such as: - Add support for the new pass manager - Add support for metadata other than DILocations and DISubprograms [0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ [1] https://github.com/djolertrk/llvm-di-checker Differential Revision: https://reviews.llvm.org/D82545	2021-02-18 01:52:16 -08:00
Chen Zheng	4c23707a41	[XCOFF][NFC] make StorageMappingClass/SymbolType member optional This patch makes StorageMappingClass/SymbolType member optional in class MCSectionXCOFF. Non-csect sections like debug sections have no such properties. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D96641	2021-02-18 04:46:05 -05:00
Fangrui Song	018a484cd2	[llvm-objdump] Map STT_TLS to ST_Other (previously ST_Data) ST_Data is used to model BFD `BFD_OBJECT`. A STT_TLS symbol does not have the `BFD_OBJECT` flag in BFD. This makes sense because a STT_TLS symbol effectively lives in a different address space, so normal data/object properties do not apply to it. With this change, a STT_TLS symbol will not be displayed as 'O'. This new behavior matches objdump. Differential Revision: https://reviews.llvm.org/D96735	2021-02-17 23:17:20 -08:00
Chen Zheng	5517923b1c	[XCOFF][NFC] make csect properties optional for getXCOFFSection We are going to support debug sections for XCOFF. So the csect properties are not necessary. This patch makes these properties optional. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D95931	2021-02-17 20:51:42 -05:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Rahman Lavaee	0252e6ead1	[obj2yaml,yaml2obj] Add NumBlocks to the BBAddrMapEntry yaml field. As discussed in D95511, this allows us to encode invalid BBAddrMap sections to be used in more rigorous testing. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D96831	2021-02-17 15:45:13 -08:00
Rong Xu	7397905ab0	[SampleFDO] Third Try: Refactor SampleProfile.cpp Apply the patch for the third time after fixing buildbot failures. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. (4) Add inline keyword to avoid duplicated symbols -- they will be removed later when the class is changed to a template. Differential Revision: https://reviews.llvm.org/D96455	2021-02-17 15:31:50 -08:00
Jessica Paquette	60aa646441	[GlobalISel] Add G_ASSERT_SEXT This adds a G_ASSERT_SEXT opcode, similar to G_ASSERT_ZEXT. This instruction signifies that an operation was already sign extended from a smaller type. This is useful for functions with sign-extended parameters. E.g. ``` define void @foo(i16 signext %x) { ... } ``` This adds verifier, regbankselect, and instruction selection support for G_ASSERT_SEXT equivalent to G_ASSERT_ZEXT. Differential Revision: https://reviews.llvm.org/D96890	2021-02-17 13:10:34 -08:00
Vedant Kumar	c28622fbf3	Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" Revert "[SampleFDO] Add missing #includes to unbreak modules build after D96455" This reverts commit `c73cbf218a`. Revert "[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC)" This reverts commit `a23e6b321c`. Revert "[SampleFDO] Reapply: Refactor SampleProfile.cpp" This reverts commit `6fd5ccff72`. Still seeing link failures when building llc (or other tools), due to the new SampleProfileLoaderBaseImpl.h containing definitions that get duplicated across multiple TU's. ``` duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::findEquivalenceClasses(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::buildEdges(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::computeDominanceAndLoopInfo(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getFunctionLoc(llvm::Function&)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::getBlockWeight(llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockWeight(llvm::raw_ostream&, llvm::BasicBlock const) const' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 'llvm::SampleProfileLoaderBaseImpl::printBlockEquivalence(llvm::raw_ostream&, llvm::BasicBlock const)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) duplicate symbol 
'llvm::SampleProfileLoaderBaseImpl::printEdgeWeight(llvm::raw_ostream&, std::__1::pair<llvm::BasicBlock const, llvm::BasicBlock const*>)' in: tools/llc/CMakeFiles/llc.dir/llc.cpp.o lib/libLLVMInstCombine.a(InstCombineVectorOps.cpp.o) ```	2021-02-17 10:22:24 -08:00
Vedant Kumar	c73cbf218a	[SampleFDO] Add missing #includes to unbreak modules build after D96455 Bot: http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/28999 ``` /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:124:19: error: missing '#include "llvm/Analysis/PostDominators.h"'; 'PostDominatorTree' must be declared before it is used std::unique_ptr<PostDominatorTree> PDT; ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Analysis/PostDominators.h:28:7: note: declaration here is not visible class PostDominatorTree : public PostDomTreeBase<BasicBlock> { ^ While building module 'LLVM_Transforms' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Transforms/CFGuard/CFGuard.cpp:15: In file included from <module-includes>:191: /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:125:19: error: missing '#include "llvm/Analysis/LoopInfo.h"'; 'LoopInfo' must be declared before it is used std::unique_ptr<LoopInfo> LI; ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Analysis/LoopInfo.h:1079:7: note: declaration here is not visible class LoopInfo : public LoopInfoBase<BasicBlock, Loop> { ^ While building module 'LLVM_Transforms' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Transforms/CFGuard/CFGuard.cpp:15: In file included from <module-includes>:191: /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h:149:3: error: missing '#include "llvm/Analysis/OptimizationRemarkEmitter.h"'; 'OptimizationRemarkEmitter' must be declared before it is used OptimizationRemarkEmitter *ORE = nullptr; ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h:33:7: note: declaration here 
is not visible class OptimizationRemarkEmitter { ^ /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/Transforms/CFGuard/CFGuard.cpp:15:10: fatal error: could not build module 'LLVM_Transforms' ```	2021-02-17 10:02:22 -08:00
Marianne Mailhot-Sarrasin	f0ec9f1bb3	[Pipeliner] Fixed optimization remarks and debug dumps Initiation Interval value The II value was incremented before exiting the loop, and therefore when used in the optimization remarks and debug dumps it did not reflect the initiation interval actually used in Schedule. Differential Revision: https://reviews.llvm.org/D95692	2021-02-17 12:28:37 -05:00
William S. Moses	40862b1a74	[SROA] Propagate correct TBAA/TBAA Struct offsets SROA does not correctly account for offsets in TBAA/TBAA struct metadata. This patch creates functionality for generating new MD with the corresponding offset and updates SROA to use this functionality. Differential Revision: https://reviews.llvm.org/D95826	2021-02-17 11:59:00 -05:00
Ta-Wei Tu	0eeaec2a6d	[NFC] Refactor LoopInterchange into a loop-nest pass This is the preliminary patch of converting `LoopInterchange` pass to a loop-nest pass and has no intended functional change. Changes that are not loop-nest related are split to D96650. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D96644	2021-02-18 00:55:38 +08:00
Andrew Savonichev	4bee0dc918	[NFC] Use the same type for bit fields in MCSchedClassDesc Otherwise they are not allocated as a single bit field and take 4 bytes instead of 2. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D95954	2021-02-17 15:54:22 +03:00
Igor Kudrin	aa84289629	[DebugInfo] Keep the DWARF64 flag in the module metadata This allows the option to affect the LTO output. Module::Max helps to generate debug info for all modules in the same format. Differential Revision: https://reviews.llvm.org/D96597	2021-02-17 17:03:34 +07:00
Sam McCall	9ebc837f55	[ADT] Add SFINAE guards to unique_function constructor. We can't construct a working unique_function from an object that's not callable with the right types, so don't allow deduction to succeed. This avoids some ambiguous conversion cases, e.g. allowing to overload on different unique_function types, and to conversion operators to unique_function. std::function and the any_invocable proposal have these. This was added to llvm::function_ref in D88901 and followups Differential Revision: https://reviews.llvm.org/D96794	2021-02-17 10:36:07 +01:00
Yang Fan	a23e6b321c	[SampleFDO] Fix MSVC "namespace uses itself" warning (NFC) MSVC warning: ``` SampleProfileLoaderBaseImpl.h(41): warning C4515: 'llvm': namespace uses itself ```	2021-02-17 15:27:30 +08:00
Kazu Hirata	2620459baa	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-16 23:23:07 -08:00
Rong Xu	6fd5ccff72	[SampleFDO] Reapply: Refactor SampleProfile.cpp Reapply patch after fixing buildbot failure. Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 16:43:21 -08:00
Sriraman Tallam	d1a838babc	Basic block sections should enable function sections implicitly. Basic block sections enable function sections implicitly; this is not needed and is inefficient with the "=list" option. We had basic block sections enable function sections implicitly in clang. This is particularly inefficient with the "=list" option as it places functions that do not have any basic block sections in separate sections. This causes unnecessary object file overhead for large applications. This patch disables this implicit behavior. It only creates function sections for those functions that require basic block sections. Further, there was an inconsistent behavior with llc as llc was not turning on function sections by default. This patch makes llc and clang consistent and tests are added to check the new behavior. This is the first of two patches and this adds functionality in LLVM to create a new section for the entry block if function sections is not enabled. Differential Revision: https://reviews.llvm.org/D93876	2021-02-16 16:27:16 -08:00
Petr Hosek	16af973933	[MC][ELF] Support for zero flag section groups This change introduces support for zero flag ELF section groups to LLVM. LLVM already supports COMDAT sections, which in ELF are a special type of ELF section groups. These are generally useful to enable linker GC where you want a group of sections to always travel together, that is to be either retained or discarded as a whole, but without the COMDAT semantics. Other ELF assemblers already support zero flag ELF section groups and this change helps us reach feature parity. Differential Revision: https://reviews.llvm.org/D95851	2021-02-16 14:23:40 -08:00
Mehdi Amini	c761fe77bd	Revert "[SampleFDO][NFC] Refactor SampleProfile.cpp" This reverts commit `310b35304c`. The build is broken with -DBUILD_SHARED_LIBS=ON : lib/ProfileData/CMakeFiles/LLVMProfileData.dir/SampleProfileLoaderBaseUtil.cpp.o: In function `llvm::sampleprofutil::callsiteIsHot(llvm::sampleprof::FunctionSamples const, llvm::ProfileSummaryInfo, bool)': SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x1a): undefined reference to `llvm::ProfileSummaryInfo::isColdCount(unsigned long) const' SampleProfileLoaderBaseUtil.cpp:(.text._ZN4llvm14sampleprofutil13callsiteIsHotEPKNS_10sampleprof15FunctionSamplesEPNS_18ProfileSummaryInfoEb+0x28): undefined reference to `llvm::ProfileSummaryInfo::isHotCount(unsigned long) const' ...	2021-02-16 22:11:42 +00:00
David Blaikie	c3120291f4	Effectively revert `ba2aa5f49e` since the object isn't destroyed polymorphically	2021-02-16 13:45:25 -08:00
David Blaikie	f8af06d60d	Fix -Wnon-virtual-dtor by making the ctor protected	2021-02-16 13:38:28 -08:00
Kazu Hirata	ba2aa5f49e	[SampleFDO] Provide a virtual destructor for SampleProfileLoaderBaseImpl This patch fixes a warning: llvm-project/llvm/include/llvm/ProfileData/SampleProfileLoaderBaseImpl.h:69:7: error: 'llvm::SampleProfileLoaderBaseImpl' has virtual functions but non-virtual destructor [-Werror,-Wnon-virtual-dtor] Differential Revision: https://reviews.llvm.org/D96810	2021-02-16 13:17:33 -08:00
Rong Xu	310b35304c	[SampleFDO][NFC] Refactor SampleProfile.cpp Refactor SampleProfile.cpp to use the core code in CodeGen. The main changes are: (1) Move SampleProfileLoaderBaseImpl class to a header file. (2) Split SampleCoverageTracker to a header file and a cpp file. (3) Move the common code (common options and callsiteIsHot()) to the common cpp file. Differential Revision: https://reviews.llvm.org/D96455	2021-02-16 11:18:21 -08:00
Michael Kruse	6c05005238	[OpenMP] Implement '#pragma omp tile', by Michael Kruse (@Meinersbur). The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard. This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult. A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the number of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loops are not defined as syntactic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead of all at once. I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest). Differential Revision: https://reviews.llvm.org/D76342	2021-02-16 09:45:07 -08:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Sander de Smalen	00fe10c6a6	[SCEVExpander] Migrate costAndCollectOperands to use InstructionCost. This patch changes costAndCollectOperands to use InstructionCost for accumulated cost values. isHighCostExpansion will return true if the cost has exceeded the budget. Reviewed By: CarolineConcatto, ctetreau Differential Revision: https://reviews.llvm.org/D92238	2021-02-16 09:27:34 +00:00
Sameer Sahasrabuddhe	11bf7da64a	[NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis" since there is no conflict with LegacyDivergenceAnalysis. In the legacy PM, this analysis can only be used through the legacy DA serving as a wrapper. It is now made available as a pass in the new PM, and has no relation with the legacy DA. The new DA currently cannot handle irreducible control flow; its presence can cause the analysis to run indefinitely. The analysis is now modified to detect this and report all instructions in the function as divergent. This is super conservative, but allows the analysis to be used without hanging the compiler. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D96615	2021-02-16 10:26:45 +05:30
Kazu Hirata	f0d5898f93	[Support] Use ListSeparator (NFC)	2021-02-15 14:46:09 -08:00
Kazu Hirata	c82cd5e54e	[LazyCallGraph] Remove forward declarations of nonexistent classes (NFC)	2021-02-15 14:46:07 -08:00
Matt Arsenault	392e0fcfd1	GlobalISel: Handle arguments partially passed on the stack The API is a bit awkward since you need to index into an array in the passed struct. I guess an alternative would be to pass all of the individual fields.	2021-02-15 17:06:14 -05:00
Matt Arsenault	1b3d8ddeb9	CodeGen: Move function to get subregister indexes to cover a LaneMask Return the best covering index, and additional needed to complete the mask. This logically belongs in TargetRegisterInfo, although I ended up not needing it for why I originally split this out.	2021-02-15 17:05:37 -05:00
Craig Topper	eb75f250fe	[RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. A similar change was made for BSWAP previously. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96681	2021-02-15 12:33:16 -08:00
Duncan P. N. Exon Smith	22a52dfddc	TransformUtils: Fix metadata handling in CloneModule (and improve CloneFunctionInto) This commit fixes how metadata is handled in CloneModule to be sound, and improves how it's handled in CloneFunctionInto (although the latter is still awkward when called within a module). Ruiling Song pointed out in PR48841 that CloneModule was changed to unsoundly use the RF_ReuseAndMutateDistinctMDs flag (renamed in `fa35c1f80f` for clarity). This flag papered over a crash caused by other various changes made to CloneFunctionInto over the past few years that made it unsound to use cloning between different modules. (This commit partially addresses PR48841, fixing the repro from preprocessed source but not textual IR. MDNodeMapper::mapDistinctNode became unsound in `df763188c9` and this commit does not address that regression.) RF_ReuseAndMutateDistinctMDs is designed for the IRMover to use, avoiding unnecessary clones of all referenced metadata when linking between modules (with IRMover, the source module is discarded after linking). It never makes sense to use when you're not discarding the source. This commit drops its incorrect use in CloneModule. Sadly, the right thing to do with metadata when cloning a function is complicated, and this patch doesn't totally fix it. The first problem is that there are two different types of referenceable metadata and it's not obvious what to do with one of them when remapping. - `!0 = !{!1}` is metadata's version of a constant. Programmatically it's called "uniqued" (probably a better term would be "constant") because, like `ConstantArray`, it's stored in uniquing tables. Once it's constructed, it's illegal to change its arguments. - `!0 = distinct !{!1}` is a bit closer to a global variable. It's legal to change the operands after construction. What should be done with distinct metadata when cloning functions within the same module? - Should new, cloned nodes be created? 
- Should all references point to the same, old nodes? The answer depends on whether that metadata is effectively owned by a function. And that's the second problem. Referenceable metadata's ownership model is not clear or explicit. Technically, it's all stored on an LLVMContext. However, any metadata that is `distinct`, that transitively references a `distinct` node, or that transitively references a GlobalValue is specific to a Module and is effectively owned by it. More specifically, some metadata is effectively owned by a specific Function within a module. Effectively function-local metadata was introduced somewhere around `c10d0e5ccd`, which made it illegal for two functions to share a DISubprogram attachment. When cloning a function within a module, you need to clone the function-local debug info and suppress cloning of global debug info (the status quo suppresses cloning some global debug info but not all). When cloning a function to a new/different module, you need to clone all of the debug info. Here's what I think we should do (eventually? soon? not this patch though): - Distinguish explicitly (somehow) between pure constant metadata owned by the LLVMContext, global metadata owned by the Module, and local metadata owned by a GlobalValue (such as a function). - Update CloneFunctionInto to trigger cloning of all "local" metadata (only), perhaps by adding a bit to RemapFlag. Alternatively, split out a separate function CloneFunctionMetadataInto to prime the metadata map that callers are updated to call ahead of time as appropriate. Here's the somewhat more isolated fix in this patch: - Converted the `ModuleLevelChanges` parameter to `CloneFunctionInto` to an enum called `CloneFunctionChangeType` that is one of LocalChangesOnly, GlobalChanges, DifferentModule, and ClonedModule. - The code maintaining the "functions uniquely own subprograms" invariant is now only active in the first two cases, where a function is being cloned within a single module. 
That's necessary because this code inhibits cloning of (some) "global" metadata that's effectively owned by the module. - The code maintaining the "all compile units must be explicitly referenced by !llvm.dbg.cu" invariant is now only active in the DifferentModule case, where a function is being cloned into a new module in isolation. - CoroSplit.cpp's call to CloneFunctionInto in CoroCloner::create uses LocalChangesOnly, since `fa635d730f` only set `ModuleLevelChanges` to trigger cloning of local metadata. - CloneModule drops its unsound use of RF_ReuseAndMutateDistinctMDs and special handling of !llvm.dbg.cu. - Fixed some outdated header docs and left a couple of FIXMEs. Differential Revision: https://reviews.llvm.org/D96531	2021-02-15 11:56:00 -08:00
Caroline Concatto	b52e6c5891	[CostModel]Add cost model for experimental.vector.reverse This patch uses the function getShuffleCost with SK_Reverse to compute the cost for experimental.vector.reverse. For scalable vector type, it adds a table with the legal types on AArch64TTIImpl::getShuffleCost to not assert in BasicTTIImpl::getShuffleCost, and for fixed vector, it relies on the existing cost model in BasicTTIImpl. Depends on D94883 Differential Revision: https://reviews.llvm.org/D95603	2021-02-15 14:23:57 +00:00
Kerry McLaughlin	5fe1593438	[LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax Currently, setting the `no-nans-fp-math` attribute to true will allow loops with fmin/fmax to vectorize, though we should be requiring that `no-signed-zeros-fp-math` is also set. This patch adds the check for no-signed-zeros at the function level and includes tests to make sure we don't vectorize functions with only one of the attributes associated. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D96604	2021-02-15 13:47:05 +00:00
Caroline Concatto	2d728bbff5	[CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse This patch adds a new intrinsic experimental.vector.reverse that takes a single vector and returns a vector of matching type but with the original lane order reversed. For example: ``` vector.reverse(<A,B,C,D>) ==> <D,C,B,A> ``` The new intrinsic supports fixed and scalable vector types. The fixed-width vector relies on shufflevector to maintain existing behaviour. Scalable vector uses the new ISD node - VECTOR_REVERSE. This new intrinsic is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker (@paulwalker-arm). [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Differential Revision: https://reviews.llvm.org/D94883	2021-02-15 13:39:43 +00:00
Sjoerd Meijer	357237e93e	Recommit "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `effc3b0799`, with the build problem fixed.	2021-02-15 11:33:00 +00:00
Sjoerd Meijer	effc3b0799	Revert "[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode" This reverts commit `cd6de0e8de`.	2021-02-15 11:01:23 +00:00
Sjoerd Meijer	cd6de0e8de	[TTI] Unify FavorPostInc and FavorBackedgeIndex into getPreferredAddressingMode This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into getPreferredAddressingMode() so that we have one interface to steer LSR in generating the preferred addressing mode. Differential Revision: https://reviews.llvm.org/D96600	2021-02-15 10:44:15 +00:00
Marco Antognini	e54811ff7e	Restore diagnostic handler after CodeGenAction::ExecuteAction Fix dangling pointer to local variable and address some typos. Reviewed By: xur Differential Revision: https://reviews.llvm.org/D96487	2021-02-15 10:33:00 +00:00
Florian Hahn	c70737ba1d	Recommit "[LTO] Use lto::backend for code generation." This version of the patch includes a fix for the cfi failures. (undoes the revert commit `7db390cc77`) It also undoes reverts of follow-up patches that also needed reverting originally: * [LTO] Add option enable NewPM with LTOCodeGenerator. (undoes revert commit `0a17664b47`) * [LTOCodeGenerator] Use lto::Config for options (NFC)." (undoes revert commit `b0a8e41cff`)	2021-02-15 10:05:42 +00:00
Arlo Siemsen	080866470d	Add ehcont section support In the future Windows will enable Control-flow Enforcement Technology (CET aka shadow stacks). To protect the path where the context is updated during exception handling, the binary is required to enumerate valid unwind entrypoints in a dedicated section which is validated when the context is being set during exception handling. This change allows llvm to generate the section that contains the appropriate symbol references in the form expected by the msvc linker. This feature is enabled through a new module flag, ehcontguard, which was modelled on the cfguard flag. The change includes a test that when the module flag is enabled the section is correctly generated. The set of exception continuation information includes returns from exceptional control flow (catchret in llvm). In order to collect catchret we: 1) Includes an additional flag on machine basic blocks to indicate that the given block is the target of a catchret operation, 2) Introduces a new machine function pass to insert and collect symbols at the start of each block, and 3) Combines these targets with the other EHCont targets that were already being collected. Change originally authored by Daniel Frampton <dframpto@microsoft.com> For more details, see MSVC documentation for `/guard:ehcont` https://docs.microsoft.com/en-us/cpp/build/reference/guard-enable-eh-continuation-metadata Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D94835	2021-02-15 14:27:12 +08:00
Carl Ritson	aef781b47a	[AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic Add intrinsic which demotes all active lanes to helper lanes. This is used to implement demote to helper Vulkan extension. In practice demoting a lane to helper simply means removing it from the mask of live lanes used for WQM/WWM/Exact mode. Where the shader does not use WQM, demotes just become kills. Additionally add llvm.amdgcn.live.mask intrinsic to complement demote operations. In theory llvm.amdgcn.ps.live can be used to detect helper lanes; however, ps.live can be moved by LICM. The movement of ps.live cannot be remedied without changing its type signature and such a change would require ps.live users to update as well. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D94747	2021-02-15 08:45:46 +09:00
Cassie Jones	36246388ba	[GlobalISel] Extract a narrowScalarAddSub method. NFC Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D95426	2021-02-14 18:06:32 -05:00
cynecx	656ead1fb7	[llvm/Support] Add SHA256 implementation Adds an unaudited SHA-256 implementation to `llvm/Support`. The ongoing lld-macho effort needs this to emit an adhoc code signature for macho files on macOS Big Sur. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D96540	2021-02-14 19:01:01 +00:00
Kazu Hirata	910e2d1e57	[llvm] Use llvm::is_contained (NFC)	2021-02-14 08:36:20 -08:00
Kazu Hirata	1cc558bd4f	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-14 08:36:18 -08:00
Nikita Popov	728803ed74	[BasicAA] Use index difference to detect GEPs with identical indexes We currently detect GEPs that have exactly the same indexes by comparing the Offsets and VarIndices. However, the latter implicitly performs equality comparisons between two values, which is not generally legal inside BasicAA, due to the possibility of comparisons across phi cycles. I believe that in this particular instance this actually ends up being unproblematic, at least I wasn't able to come up with any cases that could result in an incorrect root query result. In the interest of being defensive, compute GetIndexDifference earlier (which knows how to handle phi cycles properly) and use the result of that to determine whether the offsets are identical.	2021-02-14 17:11:03 +01:00
aqjune	5f3c99085d	[ValueTracking] Dereferenced pointers are noundef This is a follow-up of D95238's LangRef update. This patch updates `programUndefinedIfUndefOrPoison(V)` to return true if `V` is used by any memory-accessing instruction. Interestingly, this affected many tests in Attributors, mainly about adding noundefs. The tests are updated using llvm/utils/update_test_checks.py. I checked that the diffs are about updating noundefs. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96642	2021-02-14 22:50:48 +09:00
Kazu Hirata	dfa3ead01e	[Analysis] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-02-13 20:41:38 -08:00
Nikita Popov	f515ca8995	[IRBuilder] Remove Align-related deprecated APIs This removes IRBuilder methods accepting unsigned alignments in favor of their Align/MaybeAlign variants. These methods have been deprecated for more than a year at this point, so they should be safe to remove.	2021-02-13 16:42:37 +01:00
Tyker	642e9225c6	reland [InstCombine] convert assumes to operand bundles InstCombine will convert nonnull and alignment assumptions that use a boolean condition into assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-13 13:03:11 +01:00
Wei Wang	80dc0661bd	[LTO] Perform DSOLocal propagation in combined index Perform DSOLocal propagation within summary list of every GV. This avoids the repeated query of this information during function importing. Differential Revision: https://reviews.llvm.org/D96398	2021-02-12 22:58:26 -08:00
Jian Cai	c2a84771bb	[llvm-objcopy] preserve file ownership when overwritten by root As of binutils 2.36, GNU strip calls chown(2) for "sudo strip foo" and "sudo strip foo -o foo", but not for "sudo strip foo -o bar" or "sudo strip foo -o ./foo". In other words, while "sudo strip foo -o bar" creates a new file bar with root access, "sudo strip foo" will keep the owner and group of foo unchanged. Currently llvm-objcopy and llvm-strip behave differently, always changing the owner and group to root. The discrepancy prevents Chrome OS from migrating to llvm-objcopy and llvm-strip as they change file ownership and cause intended users/groups to lose access when invoked by sudo with the following sequence (recommended in man page of GNU strip). 1.<Link the executable as normal.> 1.<Copy "foo" to "foo.full"> 1.<Run "strip --strip-debug foo"> 1.<Run "objcopy --add-gnu-debuglink=foo.full foo"> This patch makes llvm-objcopy and llvm-strip follow GNU's behavior. Link: crbug.com/1108880	2021-02-12 18:01:43 -08:00
James Y Knight	8bd8534aa3	LLVM-C: Allow LLVM{Get/Set}Alignment on an atomicrmw/cmpxchg instruction. (Now that these can have alignment specified.)	2021-02-12 18:31:18 -05:00
Nikita Popov	191e469ede	[AA] Move Depth member from AAResults to AAQI (NFC) Rather than storing the query depth in AAResults, store it in AAQI. This makes more sense, as it is a property of the query. This sidesteps the issue of D94363, fixing slightly inaccurate AA statistics. Additionally, I plan to use the Depth from BasicAA in the future, where fetching it from AAResults would be unreliable. This change is not quite as straightforward as it seems, because we need to preserve the depth when creating a new AAQI for recursive queries across phis. I'm adding a new method for this, as we may need to preserve additional information here in the future.	2021-02-12 21:42:36 +01:00
Jessica Paquette	145549ff89	[GlobalISel] Combine (x + 0) -> x, G_PTR_ADD edition Add it to right_identity_zero. Differential Revision: https://reviews.llvm.org/D96621	2021-02-12 12:09:48 -08:00
Amara Emerson	5d6d9b63a3	[GlobalISel] Propagate extends through G_PHIs into the incoming value blocks. This combine tries to do inter-block hoisting of extends of G_PHIs, into the originating blocks of the phi's incoming value. The idea is to expose further optimization opportunities that are normally obscured by the PHI. Some basic heuristics, and a target hook for AArch64 is added, to allow tuning. E.g. if the extend is used by a G_PTR_ADD, it doesn't perform this combine since it may be folded into the addressing mode during selection. There are very minor code size improvements on AArch64 -Os, but the real benefit is that it unlocks optimizations like AArch64 conditional compares on some benchmarks. Differential Revision: https://reviews.llvm.org/D95703	2021-02-12 11:52:52 -08:00
Scott Linder	12999d749d	[Symbolize] Teach symbolizer to work directly on object file. This patch provides additional interfaces to LLVMSymbolizer that work directly on object files. There is an existing method, symbolizeCode, which takes an object file; this patch provides similar overloads for symbolizeInlinedCode, symbolizeData, and symbolizeFrame. This can be useful for clients who already have in-memory object files to symbolize. Patch By: pvellien (praveen velliengiri) Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95232	2021-02-12 18:26:24 +00:00
Arnold Schwaighofer	e760ec2a01	[coro] Add support for polymorphic return typed coro.suspend.async This allows for suspend point specific resume function types. Return values from a suspend point can therefore be modelled as arguments to the resume function. Allowing for directly passed return types. Differential Revision: https://reviews.llvm.org/D96136	2021-02-12 10:08:00 -08:00
Lukas Sommer	6577cef9b0	[CodeGen] New pass: Replace vector intrinsics with call to vector library This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM intrinsics operating on vector operands) with calls to a vector library. Currently, calls to LLVM intrinsics are only replaced with calls to vector libraries when scalar calls to intrinsics are vectorized by the Loop- or SLP-Vectorizer. With this pass, it is now possible to replace calls to LLVM intrinsics already operating on vector operands, e.g., if such code was generated by MLIR. For the replacement, information from the TargetLibraryInfo, e.g., as specified via -vector-library is used. This is a re-try of the original commit `2303e93e66` that was reverted due to pass manager problems. Other minor changes have also been made. Differential Revision: https://reviews.llvm.org/D95373	2021-02-12 12:53:27 -05:00
Akira Hatanaka	ed4718eccb	[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.attachedcall" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. 
If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it. - SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use). - This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-12 09:51:57 -08:00
Sanjay Patel	79b1b4a581	[Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics The vector reduction intrinsics started life as experimental ops, so backend support was lacking. As part of promoting them to 1st-class intrinsics, however, codegen support was added/improved: D58015 D90247 So I think it is safe to now remove this complication from IR. Note that we still have an IR-level codegen expansion pass for these as discussed in D95690. Removing that is another step in simplifying the logic. Also note that x86 was already unconditionally forming reductions in IR, so there should be no difference for x86. I spot checked a couple of the tests here by running them through opt+llc and did not see any asm diffs. If we do find functional differences for other targets, it should be possible to (at least temporarily) restore the shuffle IR with the ExpandReductions IR pass. Differential Revision: https://reviews.llvm.org/D96552	2021-02-12 08:13:50 -05:00
David Sherwood	01b87444cb	[NFC][Analysis] Change struct VecDesc to use ElementCount This patch changes the VecDesc struct to use ElementCount instead of an unsigned VF value, in preparation for future work that adds support for vectorized versions of math functions using scalable vectors. Since all I'm doing in this patch is switching the type I believe it's a non-functional change. I changed getWidestVF to now return both the widest fixed-width and scalable VF values, but currently the widest scalable value will be zero. Differential Revision: https://reviews.llvm.org/D96011	2021-02-12 11:07:58 +00:00
Vitaly Buka	fc05b2d9e5	[NFC][ProfileData] Improve language	2021-02-12 02:55:58 -08:00
David Sherwood	9700228abc	[Analysis] Change VFABI::mangleTLIVectorName to use ElementCount Adds support for mangling TLI vector names for scalable vectors. Differential Revision: https://reviews.llvm.org/D96338	2021-02-12 09:38:12 +00:00
Sander de Smalen	1d42ba254f	[BasicTTIImpl] Fix getCastInstrCost for scalable vectors by querying for ElementCount. This fixes an overly restrictive assumption that the vector is a FixedVectorType, in code that tries to calculate the cost of a cast operation when splitting a too-wide vector. The algorithm works the same for scalable vectors, so this patch removes the cast<FixedVectorType>. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96253	2021-02-12 08:28:52 +00:00
Sander de Smalen	63d787e5d4	[CostModel] An extending load to illegal type is not free. COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when <4 x i64> is an illegal result type that requires splitting of the operation. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96250	2021-02-12 07:59:21 +00:00
Pengxuan Zheng	61cca0f2e5	[AArch64] Adding Neon Sm3 & Sm4 Intrinsics This adds SM3 and SM4 Intrinsics support for AArch64, specifically: vsm3ss1q_u32 vsm3tt1aq_u32 vsm3tt1bq_u32 vsm3tt2aq_u32 vsm3tt2bq_u32 vsm3partw1q_u32 vsm3partw2q_u32 vsm4eq_u32 vsm4ekeyq_u32 Reviewed By: labrinea Differential Revision: https://reviews.llvm.org/D95655	2021-02-11 14:20:20 -08:00
Hongtao Yu	de40f6d623	[CSSPGO] Process functions in a top-down order on a dynamic call graph. Functions are currently processed by the sample profiler loader in a top-down order defined by the static call graph. The order is being adjusted to be a top-down order based on the input context-sensitive profile. One benefit is that the processing order of caller and callee in one SCC would follow the context order in the profile to favor more inlining. Another benefit is that the processing order of caller and callee through an indirect call (which is not on the static call graph) can be honored which in turn allows for more inlining. The profile top-down order for SCC is also extended to support non-CS profiles. Two switches `-mllvm -use-profile-indirect-call-edges` and `-mllvm -use-profile-top-down-order` are being introduced. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95988	2021-02-11 12:36:59 -08:00
Stanislav Mekhanoshin	8151c1b442	Move implementation of isAssumeLikeIntrinsic into IntrinsicInst This is to remove the dependency on ValueTracking in a future patch. Differential Revision: https://reviews.llvm.org/D96079	2021-02-11 11:41:34 -08:00
Michael Kruse	606aa622b2	Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache" This reverts commit `b7d870eae7` and the subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)" (commit `e6810cab09`). It caused nondeterminism in the output, such that e.g. the polly-x86_64-linux buildbot failed occasionally.	2021-02-11 12:17:38 -06:00
Alex Hoppen	7e3b9aba60	[Timer] On macOS count number of executed instructions In addition to wall time etc. this should allow us to get less noisy values for time measurements. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D96049	2021-02-11 17:26:37 +01:00
Sander de Smalen	703130fb01	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
Duncan P. N. Exon Smith	fa35c1f80f	ValueMapper: Rename RF_MoveDistinctMDs => RF_ReuseAndMutateDistinctMDs, NFC Rename the `RF_MoveDistinctMDs` flag passed into `MapValue` and `MapMetadata` to `RF_ReuseAndMutateDistinctMDs` in order to more precisely describe its effect and clarify the header documentation. Found this while helping to investigate PR48841, which pointed out an unsound use of the flag in `CloneModule()`. For now I've just added a FIXME there, but I'm hopeful that the new (more precise) name will prevent other similar errors.	2021-02-10 16:53:21 -08:00
Hongtao Yu	1cb47a063e	[CSSPGO] Unblock optimizations with pseudo probe instrumentation. The IR/MIR pseudo probe intrinsics don't get materialized into real machine instructions and therefore they don't incur runtime cost directly. However, they come with indirect cost by blocking certain optimizations. Some of this blocking is intentional (such as blocking code merge) for better counts quality, while the rest is accidental. This change unblocks perf-critical optimizations that do not affect counts quality. They include: 1. IR InstCombine, sinking load operation to shorten lifetimes. 2. MIR LiveRangeShrink, similar to #1 3. MIR TwoAddressInstructionPass, i.e., opeq transform 4. MIR function argument copy elision 5. IR stack protection (not perf-critical, but nice to have). Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D95982	2021-02-10 12:43:17 -08:00
Arthur Eubanks	5d960cba34	[opt][NewPM] Add a --print-passes flag to print all available passes It seems nicer to list passes given a flag rather than displaying all passes in opt --help. This is awkwardly structured because a PassBuilder is required, but reusing the PassBuilder in runPassPipeline() doesn't work because we read the input IR before getting to runPassPipeline(). So printing the list of passes needs to happen before reading the input IR. If we remove the legacy PM code in main() and move everything from NewPMDriver.cpp into opt.cpp, we can create the PassBuilder before reading IR and check if we should print the list of passes and exit. But until then this hack seems fine. Compared to the legacy PM, the new PM passes are lacking descriptions. We'll need to figure out a way to add descriptions if we think this is important. Also, this only works for passes specified in PassRegistry.def. If we want to print other custom registered passes, we'll need a different mechanism. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D96101	2021-02-10 11:22:12 -08:00
Jeremy Morse	1d68e0a075	Reland [DWARF] Location-less inlined variables should not have DW_TAG_variable Originally landed in `ddc2f1e3fb` and reverted in `d32deaab4d` because of a Generic test objecting. That was fixed up in `013613964f`. Original landing commit message follows: [DWARF] Location-less inlined variables should not have DW_TAG_variable Discussed in this thread: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html DwarfDebug::collectEntityInfo accidentally distinguishes between variable locations that never have a location specified, and variable locations that have an empty location specified. The latter leads to the creation of an empty variable referring to the abstract origin. Fix this by seeking a non-empty location before producing a concrete entity, to guarantee a DW_AT_location will be produced. Other loops in collectEntityInfo and endFunctionImpl take care of examining the retainedNodes collection and ensuring optimised-out variables are created. Differential Revision: https://reviews.llvm.org/D95617	2021-02-10 15:40:47 +00:00
Sander de Smalen	750a78cd5d	[ValueTypes] Add MVT for nxv1bf16. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D96249	2021-02-10 08:50:41 +00:00
Kazu Hirata	781d0fea72	[TableGen] Drop unnecessary const from return types (NFC)	2021-02-09 22:14:28 -08:00
Ta-Wei Tu	e89fcbfad6	Fix deprecated usage of `mallinfo` glibc deprecates `mallinfo` in the latest version of 2.33. This patch replaces the usage of `mallinfo` with the new `mallinfo2` when it's available. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D96359	2021-02-10 13:53:57 +08:00
Tyker	5652e192fc	Revert "[InstCombine] convert assumes to operand bundles" This reverts commit `5eb2e994f9`.	2021-02-10 01:32:00 +01:00
Matt Arsenault	b72a23650f	GlobalISel: Fix using wrong calling convention for callees This was taking the calling convention from the parent function, instead of the callee. Avoids regressions in a future patch when the caller and callee have different type breakdowns. For some reason AArch64's lowerFormalArguments seems to intentionally ignore the parent isVarArg.	2021-02-09 13:48:56 -05:00
Tyker	5eb2e994f9	[InstCombine] convert assumes to operand bundles InstCombine will convert nonnull and alignment assumptions that use a boolean condition into assumptions that use operand bundles when knowledge retention is enabled. Differential Revision: https://reviews.llvm.org/D82703	2021-02-09 19:33:53 +01:00
Alex Richardson	7dc3136033	[llvm-readobj] Add support for decoding FreeBSD ELF notes The current support only printed coredump notes, but most binaries also contain notes. This change adds names for four FreeBSD-specific notes and pretty-prints three of them: NT_FREEBSD_ABI_TAG: This note holds a 32-bit (decimal) integer containing the value of the __FreeBSD_version macro, which is defined in crt1.o and will hold a value such as 1300076 for a binary built on a FreeBSD 13 system. NT_FREEBSD_ARCH_TAG: A string containing the value of the build-time MACHINE_ARCH NT_FREEBSD_FEATURE_CTL: A 32-bit flag that indicates to the kernel that the binary wants certain behaviour. Examples include setting NT_FREEBSD_FCTL_ASLR_DISABLE which tells the kernel to disable ASLR. After this change llvm-readobj also no longer decodes coredump-only FreeBSD notes in non-coredump files. I've also converted the note-freebsd.s test to use yaml2obj instead of llvm-mc. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D74393	2021-02-09 16:59:22 +00:00
Alex Richardson	d613d8eb0e	[yaml2obj] Handle NT_* string values for ELF note types This is required for D74393. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95953	2021-02-09 16:59:22 +00:00
Nico Weber	de1966e542	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `4a64d8fe39`. Makes clang crash when building trivial iOS programs, see comment after https://reviews.llvm.org/D92808#2551401	2021-02-09 11:06:32 -05:00
Nemanja Ivanovic	a5222aa085	[DAGCombine] Do not remove masking argument to FP16_TO_FP for some targets As of commit `284f2bffc9`, the DAG Combiner gets rid of the masking of the input to this node if the mask only keeps the bottom 16 bits. This is because the underlying library function does not use the high order bits. However, on PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits from the register. Therefore, the library implementation of __gnu_h2f_ieee will return an incorrect result if the bits aren't cleared. This combine is desired for ARM (and possibly other targets) so this patch adds a query to Target Lowering to check if this zeroing needs to be kept. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092 Differential revision: https://reviews.llvm.org/D96283	2021-02-09 06:33:48 -06:00
Dylan McKay	2ccb941740	[AVR] Fix global references to function symbols References to functions are in program memory and need a `pm()` fixup. This should fix trait objects for Rust on AVR. Differential Revision: https://reviews.llvm.org/D87631 Patch by Alex Mikhalev.	2021-02-10 00:40:49 +13:00
Jan Svoboda	e721bc9eff	[clang][cli] Generate and round-trip CodeGen options This patch implements generation of remaining codegen options and tests it by performing parse-generate-parse round trip. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D96056	2021-02-09 11:43:38 +01:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in the test-suite build on PowerPC; revert to unblock the buildbot first. Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Amara Emerson	ec41ed5b1b	[AArch64][GlobalISel] Support the 'returned' parameter attribute. On AArch64 (which seems to be the only target that supports it), this attribute allows codegen to avoid saving/restoring the value in x0 across a call. Gives a 0.1% geomean -Os code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D96099	2021-02-08 12:47:39 -08:00
Jamie Schmeiser	4b661b4059	Introduce -print-changed=[diff \| diff-quiet] which show changes in patch-like format Summary: Introduce base classes that hold a textual representation of the IR based on basic blocks and a base class for comparing this representation. A new change printer is introduced that uses these classes to save and compare representations of the IR before and after each pass. It only reports when changes are made by a pass (similar to -print-changed) except that the changes are shown in a patch-like format, with removed lines shown in red prefixed with '-' and added lines shown in green prefixed with '+'. This functionality was introduced in my tutorial at the 2020 virtual developer's meeting. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D91890	2021-02-08 10:11:22 -05:00
Nicholas Guy	cd880442ae	[CodeGen][AArch64] Add TargetInstrInfo hook to modify the TailDuplicateSize default threshold Different targets might handle branch performance differently, so this patch allows for targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch can be and still be duplicated to generate straight-line code instead. This patch also specifies said override values for the AArch64 subtarget. Differential Revision: https://reviews.llvm.org/D95631	2021-02-08 13:28:00 +00:00
Thomas Symalla	f89f6d1e5d	[AMDGPU]: Fixes an invalid clamp selection pattern. When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitizer step. This seems to be due to invalid logic (which matches even if it shouldn't) and a likely missing variable initialisation. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D95878	2021-02-08 13:06:30 +01:00
Sander de Smalen	ba8637ca84	[ValueTypes] Fix size of nxv1f16 (32 -> 16). Clearly seems like this was a typo.	2021-02-08 11:00:47 +00:00
David Sherwood	3bbaece5a0	[Analysis] Remove unused functions from TargetLibraryInfo A simple clean-up to remove dead code. Differential Revision: https://reviews.llvm.org/D95934	2021-02-08 09:50:36 +00:00
Raphael Isemann	0ebf904baf	[modules] Put Frontend/OpenMP headers into a Clang module to fix the module build These headers can be in a Clang module like the rest. This also fixes the modules build that is currently struggling with these headers being textually included in several other modules.	2021-02-08 09:54:45 +01:00
Fangrui Song	d3e13b58cd	ELFObjectWriter: Don't de-duplicate STT_FILE symbols	2021-02-07 18:21:36 -08:00
Fangrui Song	09294642be	ELFObjectWriter: Make STT_FILE precede associated local symbols	2021-02-07 17:51:40 -08:00
Kazu Hirata	7b9f6c2d42	[SelectionDAG] Drop unnecessary const from a return type (NFC) Identified with const-return-type.	2021-02-07 09:49:33 -08:00
Kazu Hirata	b3ec6a602d	[IR] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-06 11:17:06 -08:00
Johannes Doerfert	b7d870eae7	[AssumptionCache] Avoid dangling llvm.assume calls in the cache PR49043 exposed a problem when it comes to RAUW llvm.assumes. While D96106 would fix it for GVNSink, it seems a more general concern. To avoid future problems this patch moves away from the vector of weak reference model used in the assumption cache. Instead, we track the llvm.assume calls with a callback handle which will remove itself from the cache if the call is deleted. Fixes PR49043. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96168	2021-02-06 12:18:39 -06:00
Johannes Doerfert	378f4e5ec2	[AssumptionCache] Do not track llvm.assume calls (PR49043) This fixes PR49043 by invalidating the handle on RAUW. This will work fine assuming all existing RAUW users add the new assumption to the cache. That means, if a new llvm.assume call replaces an old one, you need to add the new one now as a RAUW is not enough anymore. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D96208	2021-02-06 12:18:30 -06:00
Fangrui Song	e44a100942	.gcc_except_table: Set SHF_LINK_ORDER if binutils>=2.36, and drop unneeded unique ID for -fno-unique-section-names GNU ld>=2.36 supports mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an output section, so we can set SHF_LINK_ORDER if -fbinutils-version=2.36 or above. If -fno-function-sections or older binutils, drop unique ID for -fno-unique-section-names. The users can just specify -fbinutils-version=2.36 or above to allow GC with both GNU ld and LLD. (LLD does not support garbage collection of non-group non-SHF_LINK_ORDER .gcc_except_table sections.)	2021-02-05 21:45:21 -08:00
Kazu Hirata	aa5c09bead	[llvm] Fix header guards (NFC) Identified with llvm-header-guard.	2021-02-05 21:02:06 -08:00
Wenlei He	801d9cc7b9	[CSSPGO] Use merged base profile for hot threshold calculation A context-sensitive profile effectively splits a function profile into many copies, each representing the CFG profile of a particular calling context. That makes the count distribution look flatter, as we now have more function profiles each with lower counts, which in turn leads to lower hot thresholds. Now we tell threshold computation to merge context profiles first before calculating percentile-based cutoffs, to compensate for the seemingly flat context profile. This can be controlled by the switch `sample-profile-contextless-threshold`. Earlier measurement showed ~0.4% perf boost with this tuning on spec2k6 for CSSPGO (with pseudo-probe and new inliner). Differential Revision: https://reviews.llvm.org/D95980	2021-02-05 17:51:00 -08:00
Wouter van Oortmerssen	5e5b2cb131	[WebAssembly] Prevent data inside text sections in assembly This is not supported in Wasm, unless the data was encoded instructions, but that wouldn't work with the assembler's other functionality (enforcing nesting etc.). Fixes: https://bugs.llvm.org/show_bug.cgi?id=48971 Differential Revision: https://reviews.llvm.org/D95838	2021-02-05 13:48:25 -08:00
Aaron Ballman	ec04e2850a	Allow SmallPtrSet to be used with a std::insert_iterator Currently, the SmallPtrSet type allows inserting elements but it does not support inserting elements with a positional hint. The lack of this signature means that you cannot use SmallPtrSet with std::insert_iterator or std::inserter(), which makes some code constructs more awkward. This adds an overload of insert() that can be used in these scenarios. The positional hint is unused by SmallPtrSet and the call is equivalent to calling insert() without a hint.	2021-02-05 16:12:47 -05:00
Sanjay Patel	c981f6f8e1	Revert "[Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library" This reverts commit `2303e93e66`. Investigating bot failures.	2021-02-05 15:10:11 -05:00
Lukas Sommer	2303e93e66	[Codegen][ReplaceWithVecLib] add pass to replace vector intrinsics with calls to vector library This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM intrinsics operating on vector operands) with calls to a vector library. Currently, calls to LLVM intrinsics are only replaced with calls to vector libraries when scalar calls to intrinsics are vectorized by the Loop- or SLP-Vectorizer. With this pass, it is now possible to replace calls to LLVM intrinsics already operating on vector operands, e.g., if such code was generated by MLIR. For the replacement, information from the TargetLibraryInfo, e.g., as specified via -vector-library is used. Differential Revision: https://reviews.llvm.org/D95373	2021-02-05 14:25:19 -05:00
Thomas Preud'homme	00a62547da	Stop trapping on sNaN in __builtin_isnan __builtin_isnan currently generates a floating-point compare operation which triggers a trap when faced with a signaling NaN in StrictFP mode. This commit uses integer operations instead to not generate any trap in such a case. Reviewed By: kpn Differential Revision: https://reviews.llvm.org/D95948	2021-02-05 18:28:48 +00:00
Akira Hatanaka	4a64d8fe39	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR This reapplies `3fe3946d9a` without the changes made to lib/IR/AutoUpgrade.cpp, which was violating layering. Original commit message: Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). 
- The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 06:09:42 -08:00
Akira Hatanaka	2fbbb18c1d	Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly" This reverts commit `3fe3946d9a`. The commit violates layering by including a header from Analysis in lib/IR/AutoUpgrade.cpp.	2021-02-05 06:00:05 -08:00
Akira Hatanaka	3fe3946d9a	[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR Background: This patch makes changes to the front-end and middle-end that are needed to fix a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569. https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue What this patch does to fix the problem: - The front-end adds operand bundle "clang.arc.rv" to calls, which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0. - ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function. - ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV and claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925). - The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. 
It then inserts a release call if the call is annotated with claimRV since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since ARC optimizer can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if the implicit call is a call to retainRV and does nothing if it's a call to claimRV. Future work: - Use the operand bundle on x86-64. - Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls annotated with the operand bundles. rdar://71443534 Differential Revision: https://reviews.llvm.org/D92808	2021-02-05 05:55:18 -08:00
Simon Pilgrim	0712c2a2b8	CodeGenPassBuilder.h - fix Wdocumentation warning. NFCI. void functions shouldn't have a \returns	2021-02-05 11:11:37 +00:00
Simon Pilgrim	2a957e3e87	DWARFDebugFrame.h - fix Wdocumentation warning. NFCI.	2021-02-05 10:57:38 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, WebAssembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Kazu Hirata	d29562b29c	[IR] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-04 21:18:02 -08:00
Fangrui Song	8d4cd2da1f	[MC] Add isFPImm after D96091	2021-02-04 20:51:02 -08:00
Fangrui Song	68d6918e7a	[MC] Add createFPImm/isFPImm/setFPImm to smooth migration from FPImm to DFPImm after D96091	2021-02-04 20:42:35 -08:00
Craig Topper	6b280ce34c	[RISCV] Use LLVMScalarOrSameVectorWidth to avoid needing to mention the index type for vrgatherei16 intrinsics. Add .vv to the intrinsic name to be consistent with D95979. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D95981	2021-02-04 20:26:45 -08:00
Craig Topper	25ff302a79	[RISCV] Split vrgather intrinsics into separate vrgather.vv and vrgather.vx intrinsics. The vrgather.vv instruction uses a vector of indices with the same SEW as operand 0. The vrgather.vx instructions use a scalar index operand of XLen bits. By splitting this into 2 intrinsics we are able to use LLVMMatchType in the definition to avoid specifying the type for the index operand when creating the IR for the intrinsic. For .vv it will match the operand 0 type. And for .vx it will match the type of the vl operand we already needed to specify a type for. I'm considering splitting more intrinsics. This was a somewhat odd one because the .vx doesn't use the element type, it always uses XLen. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D95979	2021-02-04 19:50:12 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Dan Gohman	698c6b0a09	[WebAssembly] Support single-floating-point immediate value As mentioned in TODO comment, casting double to float causes NaNs to change bits. To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode. Patch by Yuta Saito. Differential Revision: https://reviews.llvm.org/D77384	2021-02-04 18:05:06 -08:00
Christopher Tetreault	b8b054aa8a	Reland "Ensure that InstructionCost actually implements a total ordering" The operator< in the previous attempt was incorrect. It is unfortunate that this was only caught by the expensive checks. This reverts commit `ff1147c363`.	2021-02-04 10:04:10 -08:00
Sander de Smalen	8f69da9f97	[ElementCount] NFC: Set 'const' qualifier for getWithIncrement/Decrement. These class methods simply return a new UnivariateLinearPolyBase (e.g. ElementCount), and do not modify the object in any way or form, so qualify for being 'const'.	2021-02-04 11:27:45 +00:00
Jan Svoboda	225ccf0c50	[clang][cli] Command line round-trip for HeaderSearch options This patch implements generation of remaining header search arguments. It's done manually in C++ as opposed to TableGen, because we need the flexibility and don't anticipate reuse. This patch also tests the generation of header search options via a round-trip. This way, the code gets exercised whenever Clang is built and tested in asserts mode. All `check-clang` tests pass. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D94472	2021-02-04 10:18:34 +01:00
Joachim Meyer	e3f02302e3	[Support] Indent multi-line descr of enum cli options. As noted in https://reviews.llvm.org/D93459, the formatting of multi-line descriptions of clEnumValN and the likes is unfavorable. Thus this patch adds support for correctly indenting these. Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D93494	2021-02-04 10:14:44 +01:00
Petr Hosek	b42ccdf38f	[NFC] Fix the noprofile attribute comment	2021-02-03 21:54:09 -08:00
Kazu Hirata	b4de30f6af	[Support] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-03 20:41:16 -08:00
Michael Kruse	26b5be66f9	[OpenMPIRBuilder] Implement collapseLoops. The collapseLoops method implements a transformation facilitating the implementation of the collapse-clause. It takes a list of loops from a loop nest and reduces it to a single loop that can be used by other methods that are implemented on just a single loop, such as createStaticWorkshareLoop. This patch shares some changes with D92974 (such as adding some getters to CanonicalLoopNest), used by both patches. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93268	2021-02-03 19:12:02 -06:00
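The general technique behind collapsing a loop nest into one loop, as `26b5be66f9` describes for the collapse-clause, can be sketched in plain C++. This is not the OpenMPIRBuilder code; `collapsedOrder` is a hypothetical helper that shows how a single induction variable can recover the original two indices by division and remainder.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Visits an Outer x Inner loop nest using one loop over Outer * Inner
// iterations and returns the (outer, inner) index pairs in visiting order.
inline std::vector<std::pair<std::size_t, std::size_t>>
collapsedOrder(std::size_t Outer, std::size_t Inner) {
  std::vector<std::pair<std::size_t, std::size_t>> Visited;
  const std::size_t Total = Outer * Inner; // trip count of the collapsed loop
  for (std::size_t K = 0; K < Total; ++K) {
    // Recover the original indices from the collapsed induction variable.
    Visited.emplace_back(K / Inner, K % Inner);
  }
  return Visited;
}
```

The collapsed loop visits the index pairs in the same order as the original nest, which is what allows a single-loop transformation to be applied to it afterwards.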
Nico Weber	b995314143	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `97ba5cde52`. Still breaks tests: https://reviews.llvm.org/D76802#2540647	2021-02-03 19:14:34 -05:00
Florian Hahn	7db390cc77	Revert "[LTO] Use lto::backend for code generation." This reverts commit `6a59f05606`, because it is causing failures on green dragon.	2021-02-03 22:49:30 +00:00
Florian Hahn	0a17664b47	Revert "[LTO] Add option enable NewPM with LTOCodeGenerator." This reverts commit `7a6a2cc81a` because it is causing failures on green dragon.	2021-02-03 22:49:20 +00:00
Florian Hahn	b0a8e41cff	Revert "[LTOCodeGenerator] Use lto::Config for options (NFC)." This reverts commit `0d487cf87a` because it is causing failures on green dragon.	2021-02-03 22:48:54 +00:00
Amara Emerson	1a13ee1efb	[GlobalISel] Add sext(constant) -> constant artifact combine. This is the G_SEXT counterpart to the existing G_ZEXT/G_ANYEXT combines. Differential Revision: https://reviews.llvm.org/D95729	2021-02-03 14:10:08 -08:00
Arthur Eubanks	f020544601	[NewPM][HelloWorld] Move HelloWorld to Utils To prevent creating a new component, which creates a new library. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D95907	2021-02-03 12:59:40 -08:00
Richard Smith	6b14c12688	Fix overflowing signed left shift, found by ubsan buildbot.	2021-02-03 12:51:39 -08:00
Krzysztof Parzyszek	0bb1985102	[Hexagon] Add LLVM instruction definitions for Hexagon V68	2021-02-03 13:59:34 -06:00
Jeremy Morse	d32deaab4d	Revert "[DWARF] Location-less inlined variables should not have DW_TAG_variable" This reverts commit `ddc2f1e3fb`. A build-bot objected: http://lab.llvm.org:8011/#builders/105/builds/5486	2021-02-03 17:54:33 +00:00
Jeremy Morse	ddc2f1e3fb	[DWARF] Location-less inlined variables should not have DW_TAG_variable Discussed in this thread: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html DwarfDebug::collectEntityInfo accidentally distinguishes between variable locations that never have a location specified, and variable locations that have an empty location specified. The latter leads to the creation of an empty variable referring to the abstract origin. Fix this by seeking a non-empty location before producing a concrete entity, to guarantee a DW_AT_location will be produced. Other loops in collectEntityInfo and endFunctionImpl take care of examining the retainedNodes collection and ensuring optimised-out variables are created. Differential Revision: https://reviews.llvm.org/D95617	2021-02-03 17:32:31 +00:00
Krzysztof Parzyszek	3562d253da	[Hexagon] Add ELF flags for Hexagon V68	2021-02-03 11:02:59 -06:00
Sebastian Neubauer	d49efdc969	Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path." This reverts commits 62af0305b7cc..677a3529d3e6 from D93708. They cause failures in the sanitizer builds because of uninitialized values. A fix is in D95878, but it might take some time until this is pushed, so reverting the changes for now.	2021-02-03 11:03:34 +01:00
Wang, Pengfei	fae6d129da	[X86] Correct types in tablegen multiclasses found by D95874. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D95926	2021-02-03 16:05:05 +08:00
Petr Hosek	97ba5cde52	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-02 23:19:51 -08:00
Kazu Hirata	c18231e3dd	[CodeGen] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-02-02 22:52:45 -08:00
Hsiangkai Wang	c7189ba785	[RISCV] Add new vector instructions in v0.10. * Add new vector instructions in v0.10. - load/store for mask value vle1.v vse1.v - vsetivli for 0-31 immediate vector length. * Rename vector instructions in v0.10. - vfrsqrte7 -> vfrsqrt7 - vfrece7 -> vfrec7 * Reserve memory width encodings for EEW>128b. Differential Revision: https://reviews.llvm.org/D95781	2021-02-03 13:28:58 +08:00
Yang Fan	c90c261e44	[CSSPGO] Fix MSVC initializing truncation warning (NFC) MSVC warning: ``` \llvm-project\llvm\include\llvm\Transforms\IPO\SampleProfileProbe.h(65): warning C4305: 'initializing': truncation from 'double' to 'const float' ```	2021-02-03 11:04:58 +08:00
Yang Fan	8178a55b25	[VFS] Fix Wreturn-type gcc warning (NFC) GCC warning: ``` In file included from /llvm-project/llvm/lib/Support/VirtualFileSystem.cpp:13: /llvm-project/llvm/include/llvm/Support/VirtualFileSystem.h: In static member function ‘static bool llvm::vfs::RedirectingFileSystem::RemapEntry::classof(const llvm::vfs::RedirectingFileSystem::Entry*)’: /llvm-project/llvm/include/llvm/Support/VirtualFileSystem.h:681:5: warning: control reaches end of non-void function [-Wreturn-type] 681 \| } \| ^ ```	2021-02-03 10:22:30 +08:00
Richard Smith	32e98f05fe	Diagnose if a SLEB128 is too large to fit in an int64_t. Previously we'd hit UB due to an invalid left shift operand. Also fix the WASM emitter to properly use SLEB128 encoding instead of ULEB128 encoding for signed fields so that negative numbers don't result in overly-large values that we can't read back any more. In passing, don't diagnose a non-canonical ULEB128 that fits in a uint64_t but has redundant trailing zero bytes. Reviewed By: dblaikie, aardappel Differential Revision: https://reviews.llvm.org/D95510	2021-02-02 14:33:34 -08:00
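The check described in `32e98f05fe` can be sketched with a small standalone SLEB128 decoder. This is an illustrative sketch, not LLVM's `decodeSLEB128`; the function name and the `std::optional` return convention are assumptions made for this example. It rejects encodings that do not fit in an `int64_t` instead of shifting by an out-of-range amount.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Decodes one SLEB128 value from the start of Bytes. Returns std::nullopt if
// the input is truncated or the value does not fit in an int64_t.
inline std::optional<std::int64_t>
decodeSLEB128Checked(const std::vector<std::uint8_t> &Bytes) {
  std::uint64_t Value = 0;
  unsigned Shift = 0;
  std::uint8_t Byte = 0;
  std::size_t I = 0;
  do {
    if (I == Bytes.size())
      return std::nullopt; // ran out of input before the final byte
    Byte = Bytes[I++];
    const std::uint64_t Slice = Byte & 0x7f;
    if (Shift >= 64) {
      // All 64 bits are already filled: only sign-extension padding is valid.
      const bool Negative = (Value >> 63) != 0;
      if (Slice != (Negative ? 0x7fu : 0x00u))
        return std::nullopt;
    } else {
      // At shift 63 only one payload bit remains, so the slice must be all
      // zeros or all ones to be a valid sign extension.
      if (Shift == 63 && Slice != 0 && Slice != 0x7f)
        return std::nullopt;
      Value |= Slice << Shift;
    }
    Shift += 7;
  } while (Byte & 0x80);

  // Sign-extend when the final byte has its sign bit (0x40) set.
  if (Shift < 64 && (Byte & 0x40))
    Value |= ~std::uint64_t{0} << Shift;
  return static_cast<std::int64_t>(Value);
}
```

Guarding every shift with `Shift < 64` is the point of the sketch: an unchecked `Slice << Shift` with a shift of 64 or more is undefined behavior in C++.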
Christopher Tetreault	ff1147c363	Revert "Ensure that InstructionCost actually implements a total ordering" This reverts commit `b481cd519e`.	2021-02-02 12:10:02 -08:00
Hongtao Yu	3d89b3cbec	[CSSPGO] Introducing distribution factor for pseudo probe. Sample re-annotation is required in LTO time to achieve a reasonable post-inline profile quality. However, we have seen that such LTO-time re-annotation degrades profile quality. This is mainly caused by preLTO code duplication that is done by passes such as loop unrolling, jump threading, indirect call promotion etc, where samples corresponding to a source location are aggregated multiple times due to the duplicates. In this change we are introducing a concept of distribution factor for pseudo probes so that samples can be distributed for duplicated probes scaled by a factor. We hope that optimizations duplicating code well-maintain the branch frequency information (BFI) based on which probe distribution factors are calculated. Distribution factors are updated at the end of preLTO pipeline to reflect an estimated portion of the real execution count. This change also introduces a pseudo probe verifier that can be run after each IR pass to detect duplicated pseudo probes. A saturated distribution factor stands for 1.0. A pseudo probe will carry a factor with the value ranged from 0.0 to 1.0. A 64-bit integral distribution factor field that represents [0.0, 1.0] is associated with each block probe. Unfortunately this cannot be done for callsite probes due to the size limitation of a 32-bit Dwarf discriminator. A 7-bit distribution factor is used instead. Changes are also needed to the sample profile inliner to deal with prorated callsite counts. Call sites duplicated by PreLTO passes, when later on inlined in LTO time, should have the callees' probe prorated based on the Prelink-computed distribution factors. The distribution factors should also be taken into account when computing hotness for inline candidates. Also, Indirect call promotion results in multiple call sites. The original samples should be distributed across them. 
This is fixed by adjusting the call sites' distribution factors. Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D93264	2021-02-02 11:55:01 -08:00
Christopher Tetreault	b481cd519e	Ensure that InstructionCost actually implements a total ordering Previously, operator== would consider the actual equality of the pairs (lhs.Value, lhs.State) == (rhs.Value, rhs.State). However, if an invalid cost was involved in a call to operator<, only the state would be compared. Thus, it was not the case that ({2, Invalid} < {3, Invalid} \|\| {2, Invalid} > {3, Invalid} \|\| {2, Invalid} == {3, Invalid}). This patch implements a true total ordering, where cost state is considered first, then value. While it's not really important that {2, Invalid} be considered to be less than {3, Invalid}, it's not a problem either. This patch also implements operator== in terms of operator<, so the two definitions will be kept in sync. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D95803	2021-02-02 11:49:14 -08:00
Greg McGary	3a9d2f1488	[lld-macho][NFC] refactor relocation handling Add per-reloc-type attribute bits and migrate code from per-target file into target independent code, driven by reloc attributes. Many cleanups Differential Revision: https://reviews.llvm.org/D95121	2021-02-02 10:54:53 -07:00
Fangrui Song	1560a00032	[yaml2obj/obj2yaml/llvm-readobj] Support SHF_GNU_RETAIN In binutils, the flag is defined for ELFOSABI_GNU and ELFOSABI_FREEBSD. It can be used to mark a section as a GC root. In practice, the flag has generic semantics and can be applied to many EI_OSABI values, so we consider it generic. Differential Revision: https://reviews.llvm.org/D95728	2021-02-02 09:19:53 -08:00
Florian Hahn	3e09bc2500	[ConstraintElimination] Add nicer way to dump constraints (NFC). Use ConstraintSystem::dump(Names) to display the result of decomposing a condition.	2021-02-02 16:36:45 +00:00
Tom Weaver	4f1320b77d	Revert "[InstrProfiling] Use !associated metadata for counters, data and values" This reverts commit `df3e39f60b`. introduced failing test instrprof-gc-sections.c causing build bot to fail: http://lab.llvm.org:8011/#/builders/53/builds/1184	2021-02-02 14:19:31 +00:00
Thomas Symalla	09508d2849	Reverted whitespace changes. Differential Revision: https://reviews.llvm.org/D90968	2021-02-02 09:14:54 +01:00
Thomas Symalla	ecbed4e0ab	Resolve formatting changes.	2021-02-02 09:14:53 +01:00
Thomas Symalla	cdfd9b3bf5	Move Combiner to PreLegalize step	2021-02-02 09:14:53 +01:00
Thomas Symalla	88a832aef1	Refactored the pattern matching.	2021-02-02 09:14:52 +01:00
Thomas Symalla	d41b7fa9bf	Renames	2021-02-02 09:14:52 +01:00
Thomas Symalla	d722924f20	Added comments.	2021-02-02 09:14:52 +01:00
Thomas Symalla	ec043967ec	clang-format	2021-02-02 09:14:52 +01:00
Thomas Symalla	62af0305b7	Added clamp i64 to i16 global isel pattern.	2021-02-02 09:14:52 +01:00
Wenlei He	6bae5973c4	[CSSPGO] Call site prioritized inlining for sample PGO This change implemented call site prioritized BFS profile guided inlining for sample profile loader. The new inlining strategy maximize the benefit of context-sensitive profile as mentioned in the follow up discussion of CSSPGO RFC. The change will not affect today's AutoFDO as it's opt-in. CSSPGO now defaults to the new FDO inliner, but can fall back to today's replay inliner using a switch (`-sample-profile-prioritized-inline=0`). Motivation With baseline AutoFDO, the inliner in sample profile loader only replays previous inlining, and the use of profile is only for pruning previous inlining that turned out to be cold. Due to the nature of replay, the FDO inliner is simple with hotness being the only decision factor. It has the following limitations that we're improving now for CSSPGO. - It doesn't take inline candidate size into account. Since it's doing replay, the size growth is bounded by previous CGSCC inlining. With context-sensitive profile, FDO inliner is no longer limited by previous inlining, so we need to take size into account to avoid significant size bloat. - The way it looks at hotness is not accurate. It uses total samples in an inlinee as proxy for hotness, while what really matters for an inline decision is the call site count. This is an unfortunate fall back because call site count and callee entry count are not reliable due to dwarf based correlation, especially for inlinees. Now paired with pseudo-probe, we have accurate call site count and callee's entry count, so we can use that to gauge hotness more accurately. - It treats all call sites from a block as hot as long as there's one call site considered hot. This is normally true, but since total samples is used as hotness proxy, this transitiveness within block magnifies the inacurate hotness heuristic. With pseduo-probe and the change above, this is no longer an issue for CSSPGO. 
New FDO Inliner Putting all the requirement for CSSPGO together, we need a top-down call site prioritized BFS inliner. Here're reasons why each component is needed. - Top-down: We need a top-down inliner to better leverage context-sensitive profile, so inlining is driven by accurate context profile, and post-inline is also accurate. This is already implemented in https://reviews.llvm.org/D70655. - Size Cap: For top-down inliner, taking function size into account for inline decision alone isn't sufficient to control size growth. We also need to explicitly cap size growth because with top-down inlining, we can grow inliner size significantly with large number of smaller inlinees even if each individually passes the cost/size check. - Prioritize call sites: With size cap, inlining order also becomes important, because if we stop inlining due to size budget limit, we'd want to use budget towards the most beneficial call sites. - BFS inline: Same as call site prioritization, if we stop inlining due to size budget limit, we want a balanced inline tree, rather than going deep on one call path. Note that the new inliner avoids repeatedly evaluating same set of call site, so it should help with compile time too. For this reason, we could transition today's FDO inliner to use a queue with equal priority to avoid wasted reevaluation of same call site (TODO). Speculative indirect call promotion and inlining is also supported now with CSSPGO just like baseline AutoFDO. Tunings and knobs I created tuning knobs for size growth/cap control, and for hot threshold separate from CGSCC inliner. The default values are selected based on initial tuning with CSSPGO. Results Evaluated with an internal LLVM fork couple months ago, plus another change to adjust hot-threshold cutoff for context profile (will send up after this one), the new inliner show ~1% geomean perf win on spec2006 with CSSPGO, while reducing code size too. 
The measurement was done using train-train setup, MonoLTO w/ new pass manager and pseudo-probe. Note that this is just a starting point - we hope that the new inliner will open up more opportunity with CSSPGO, but it will certainly take more time and effort to make it fully calibrated and ready for bigger workloads (we're working on it). Differential Revision: https://reviews.llvm.org/D94001	2021-02-01 23:46:34 -08:00
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Nathan Hawes	ecb00a7762	[VFS] Add support to RedirectingFileSystem for mapping a virtual directory to one in the external FS. Previously file entries in the -ivfsoverlay yaml could map to a file in the external file system, but directories had to list their contents in the form of other file entries or directories. Allowing directory entries to map to a directory in the external file system makes it possible to present an external directory's contents in a different location and (in combination with the 'fallthrough' option) overlay one directory's contents on top of another. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94844	2021-02-02 14:56:17 +10:00
Kazu Hirata	7a37d981d9	[llvm] Use pop_back_val (NFC)	2021-02-01 20:55:05 -08:00
Rahman Lavaee	f1ff6d210a	[obj2yaml, yaml2obj] Use Hex64 for BBAddressMap fields. This patch lets the yaml encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata. Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D95767	2021-02-01 15:37:30 -08:00
Petr Hosek	df3e39f60b	[InstrProfiling] Use !associated metadata for counters, data and values C identifier name input sections such as __llvm_prf_* are GC roots so they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the C identifier name semantics. The !associated metadata may be attached to a global object declaration with a single argument that references another global object, and it gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded by the linker, setting up !associated metadata allows linker to discard counters, data and values associated with that function symbol. Note that !associated metadata is only supported by ELF, it does not have any effect on non-ELF targets. Differential Revision: https://reviews.llvm.org/D76802	2021-02-01 15:01:43 -08:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Philip Reames	2a53d9a6e7	[Loads] Plumb through TLI argument [NFC] This is a (rather delayed) follow up to commit `0129cd5`. This commit is entirely NFC, the semantic change to leverage the new information will be submitted separate with a test case.	2021-02-01 11:45:30 -08:00
Simon Pilgrim	657e769688	Revert rGce587529ad8b5 - "[APFloat] multiplySignificand - pass IEEEFloat as const reference. NFCI." Breaks on some buildbots	2021-02-01 16:15:23 +00:00
J-Y You	267b573b55	[TableGen] Fix anonymous record self-reference in foreach and multiclass If we instantiate self-referenced anonymous records in foreach and multiclass, the NAME value will point to incorrect record. It's because anonymous name is resolved too early. This patch adds AnonymousNameInit to represent an anonymous record name. When instantiating an anonymous record, it will update the referred name. Differential Revision: https://reviews.llvm.org/D95309	2021-02-01 10:59:07 -05:00
Simon Pilgrim	ce587529ad	[APFloat] multiplySignificand - pass IEEEFloat as const reference. NFCI. Avoids unnecessary IEEEFloat copies.	2021-02-01 15:41:50 +00:00
Kerry McLaughlin	9b4fcfaa9e	[SVE][CodeGen] Remove performMaskedGatherScatterCombine The AArch64 DAG combine added by D90945 & D91433 extends the index of a scalable masked gather or scatter to i32 if necessary. This patch removes the combine and instead adds shouldExtendGSIndex, which is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether the index should be extended before calling getMaskedGather/Scatter. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94525	2021-02-01 14:10:00 +00:00
Serge Pavlov	bf416d166b	[FPEnv] Intrinsic for setting rounding mode To set non-default rounding mode user usually calls function 'fesetround' from standard C library. This way has some disadvantages. * It creates unnecessary dependency on libc. On the other hand, setting rounding mode requires few instructions and could be made by compiler. Sometimes standard C library even is not available, like in the case of GPU or AI cores that execute small kernels. * Compiler could generate more effective code if it knows that a particular call just sets rounding mode. This change introduces new IR intrinsic, namely 'llvm.set.rounding', which sets current rounding mode, similar to 'fesetround'. It however differs from the latter, because it is a lower level facility: * 'llvm.set.rounding' does not return any value, whereas 'fesetround' returns non-zero value in the case of failure. In glibc 'fesetround' reports failure if its argument is invalid or unsupported or if floating point operations are unavailable on the hardware. Compiler usually knows what core it generates code for and it can validate arguments in many cases. * Rounding mode is specified in 'fesetround' using constants like 'FE_TONEAREST', which are target dependent. It is inconvenient to work with such constants at IR level. C standard provides a target-independent way to specify rounding mode, it is used in FLT_ROUNDS, however it does not define standard way to set rounding mode using this encoding. This change implements only IR intrinsic. Lowering it to machine code is target-specific and will be implemented later. Mapping of 'fesetround' to 'llvm.set.rounding' is also not implemented here. Differential Revision: https://reviews.llvm.org/D74729	2021-02-01 11:28:14 +07:00
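The contrast drawn in `bf416d166b` between the target-independent FLT_ROUNDS encoding and the target-dependent `FE_*` constants can be sketched with the standard `<cfenv>` interface. The helper names `toFenvMode` and `setRoundingPortable` are hypothetical; the sketch uses only the libc route and does not call the intrinsic, and it assumes the target defines all four `FE_*` rounding macros.

```cpp
#include <cfenv>

// Maps a FLT_ROUNDS-style value (0 = toward zero, 1 = to nearest,
// 2 = toward +infinity, 3 = toward -infinity) to the target's <cfenv>
// constant. Returns -1 for values outside this range; the FE_* macros are
// non-negative, so -1 cannot collide with a real mode.
inline int toFenvMode(int FltRounds) {
  switch (FltRounds) {
  case 0:
    return FE_TOWARDZERO;
  case 1:
    return FE_TONEAREST;
  case 2:
    return FE_UPWARD;
  case 3:
    return FE_DOWNWARD;
  default:
    return -1;
  }
}

// Sets the rounding mode from the target-independent encoding. Unlike the
// intrinsic described in the commit message, fesetround reports failure
// through its return value, which is forwarded here as a bool.
inline bool setRoundingPortable(int FltRounds) {
  const int Mode = toFenvMode(FltRounds);
  return Mode != -1 && std::fesetround(Mode) == 0;
}
```

The translation table is the part that varies by target, which is why the commit message calls the `FE_*` constants inconvenient to work with at the IR level.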
Craig Topper	70289ea6f5	[RISCV][LegalizeTypes] Try to expand BSWAP before promoting if the promoted BSWAP would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. This is helfpul on RISCV where we might have to promote i16 all the way to i64. Differential Revision: https://reviews.llvm.org/D95756	2021-01-31 14:33:29 -08:00
Florian Hahn	0d487cf87a	[LTOCodeGenerator] Use lto::Config for options (NFC). This patch removes some options that have been duplicated in LTOCodeGenerator and instead use lto::Config directly to manage the options. This is a cleanup after `6a59f05606`. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95738	2021-01-31 19:08:07 +00:00
Kazu Hirata	3d1200b9f6	[llvm] Drop unnecessary const from return types (NFC) Identified with const-return-type.	2021-01-31 10:23:43 -08:00
Alexey Lapshin	fb244ffb9f	[dsymutil][DWARFLinker][NFC] make AddressManager not depending on the order of checks for relocations. Current dsymutil implementation of hasLiveMemoryLocation()/hasLiveAddressRange() and applyValidRelocs() assume that calls should be done in certain order (from first Dies to last). Multi-thread implementation might call these methods in other order (it might process compilation units in order other than they are physically located), so we remove restriction that searching for relocations should be done in ascending order. This change does not introduce noticeable performance degradation. The testing results for clang binary: golden-dsymutil/dsymutil 23787992 clang MD5: 5efa8fd9355ebf81b65f24db5375caa2 elapsed time=91sec build-Release/bin/dsymutil 23855616 clang MD5: 5efa8fd9355ebf81b65f24db5375caa2 elapsed time=91sec Differential Revision: https://reviews.llvm.org/D93106	2021-01-31 16:34:10 +03:00
Kazu Hirata	627b5bda11	[llvm] Add missing header guards (NFC) Identified with llvm-header-guard.	2021-01-30 09:53:42 -08:00
Florian Hahn	7a6a2cc81a	[LTO] Add option enable NewPM with LTOCodeGenerator. This patch adds an option to enable the new pass manager in LTOCodeGenerator. It also updates a few tests with legacy PM specific tests, which started failing after `6a59f05606` when LLVM_ENABLE_NEW_PASS_MANAGER=true.	2021-01-30 11:54:20 +00:00
Florian Hahn	6a59f05606	[LTO] Use lto::backend for code generation. This patch updates LTOCodeGenerator to use the utilities provided by LTOBackend to run middle-end optimizations and backend code generation. This is a first step towards unifying the code used by libLTO's C API and the newer, C++ interface (see PR41541). The immediate motivation is to allow using the new pass manager when doing LTO using libLTO's C API, which is used on Darwin, among others. With the changes, there are no codegen/stats differences when building MultiSource/SPEC2000/SPEC2006 on Darwin X86 with LTO, compared to without the patch. Reviewed By: steven_wu Differential Revision: https://reviews.llvm.org/D94487	2021-01-30 10:09:55 +00:00
Kazu Hirata	7728cc003a	[llvm] Use append_range (NFC)	2021-01-29 23:23:34 -08:00
Nathan Hawes	719f778441	[VFS] Combine VFSFromYamlDirIterImpl and OverlayFSDirIterImpl into a single implementation (NFC) As a fixme notes, both of these directory iterator implementations are conceptually similar and duplicate the functionality of returning and uniquing entries across two or more directories. This patch combines them into a single class 'CombiningDirIterImpl'. This also drops the 'Redirecting' prefix from RedirectingDirEntry and RedirectingFileEntry to save horizontal space. There's no loss of clarity as they already have to be prefixed with 'RedirectingFileSystem::' whenever they're referenced anyway. rdar://problem/72485443 Differential Revision: https://reviews.llvm.org/D94857	2021-01-30 11:10:10 +10:00
Roman Lebedev	c2534a7097	[ShadowStackGCLowering] Preserve Dominator Tree, if available This doesn't help avoid any Dominator Tree recalculations just yet, there's one more pass to go..	2021-01-30 01:14:51 +03:00
Christopher Tetreault	49a6502cd5	[SVE] delete VectorType::getNumElements() The previously agreed-upon deprecation period for VectorType::getNumElements() has passed. This patch removes this method and completes the refactor proposed in the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-March/139811.html Reviewed By: david-arm, rjmccall Differential Revision: https://reviews.llvm.org/D95570	2021-01-29 13:46:54 -08:00
Jay Foad	5cf6412a27	[GlobalISel] Fix modifying a G_OR without notifying the observer Remove the call to setFlags in favour of creating the instruction with the correct flags in the first place, so we don't have to explicitly notify the observer. Differential Revision: https://reviews.llvm.org/D95681	2021-01-29 16:32:24 +00:00
Florian Hahn	f3a710cade	[LTO] Update splitCodeGen to take a reference to the module. (NFC) splitCodeGen does not need to take ownership of the module, as it currently clones the original module for each split operation. There is an ~4 year old fixme to change that, but until this is addressed, the function can just take a reference to the module. This makes the transition of LTOCodeGenerator to use LTOBackend a bit easier, because under some circumstances, LTOCodeGenerator needs to write the original module back after codegen. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D95222	2021-01-29 11:53:11 +00:00
Kazu Hirata	046cfb8565	[llvm] Forward-declare formatted_raw_ostream (NFC) Various TargetStreamer.h need formatted_raw_ostream but rely on a forward declaration of formatted_raw_ostream in MCStreamer.h. This patch adds forward declarations right in TargetStreamer.h. While we are at it, this patch removes the one in MCStreamer.h, where it is unnecessary.	2021-01-28 22:21:13 -08:00
Christudasan Devadasan	892e4567e1	Support a list of CostPerUse values This patch allows targets to define multiple cost values for each register so that the cost model can be more flexible and better used during the register allocation as per the target requirements. For AMDGPU the VGPR allocation will be more efficient if the register cost can be associated dynamically based on the calling convention. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86836	2021-01-29 10:14:52 +05:30

44335 Commits