llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	29a2b20ab3	[SDAG] simplify FP binops to undef As discussed in the commit thread for rGa253a2a and D73978, we can do more undef folding for FP ops. The nnan and ninf fast-math-flags specify that if an operand is the disallowed value, the result is poison, so we can produce an undef result. But this doesn't work as expected (the undef operand cases remain) because of a Flags propagation problem in SelectionDAGBuilder. I've added DAGCombiner calls to enable these for the other cases because we've shown in other patches that (because of the limited way that SDAG iterates), it is possible to miss simplifications like this if they are done only at node creation time. Several potential follow-ups to expand on this patch are possible. Differential Revision: https://reviews.llvm.org/D75576	2020-03-04 10:42:16 -05:00
Amara Emerson	e91e1df6ab	[GlobalISel][Localizer] Enable intra-block localization of already-local uses. This changes the localizer to attempt intra-block localizer of instructions that have local uses. This is useful because sometimes the entry block itself has many uses of constant-like instructions, which would benefit from shortening live ranges. Previously if an inst had no non-local uses, we wouldn't add it to the list of instructions to attempt further intra-block localization. This gives a 0.7% geomean code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D75555	2020-03-03 18:14:57 -08:00
Fangrui Song	90acc505ed	[MCDwarf] Change emitListsTableHeaderStart to use a reference and fold Start/End symbols generation into it Apply @dblaikie's suggestions in a post-commit review for D75375 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D75568	2020-03-03 16:20:40 -08:00
Amy Huang	5b3b21f025	[DebugInfo] Fix for adding "returns cxx udt" option to functions in CodeView. Summary: This change checks for the return type in the frontend and adds a flag to the DISubroutineType to indicate that the option should be added in CodeViewDebug. Previously function types sometimes appeared twice in the PDB: once with "returns cxx udt" and once without. See https://bugs.llvm.org/show_bug.cgi?id=44785. Reviewers: rnk, asmith Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75215	2020-03-03 14:00:08 -08:00
Vedant Kumar	f002ee55c7	[MachineVerifier] Remove placement rule exception for debug entry values There should not be an exception allowing debug entry values to be placed after a terminator. Differential Revision: https://reviews.llvm.org/D75559	2020-03-03 13:02:18 -08:00
Vedant Kumar	2bf496620c	[LiveDebugValues] Do not insert DBG_VALUEs after a MBB terminator This fixes a miscompile that happened because a DBG_VALUE interfered with the MachineOutliner's liveness analysis. Inserting a DBG_VALUE after a terminator breaks predicates on MBB such as isReturnBlock(). And the resulting DBG_VALUE cannot be "live". I plan to introduce a MachineVerifier check for this situation in a follow up. rdar://59859175 Testing: check-llvm, LNT build with a stage2 compiler & entry values enabled Differential Revision: https://reviews.llvm.org/D75548	2020-03-03 13:00:52 -08:00
Fangrui Song	55a56041d1	[MCDwarf] Generate DWARF v5 .debug_rnglists for assembly files ``` // clang -c -gdwarf-5 a.s -o a.o .section .init; ret .text; ret ``` .debug_info contains DW_AT_ranges and llvm-dwarfdump will report a verification error because .debug_rnglists does not exist (not implemented). This patch generates .debug_rnglists for assembly files. emitListsTableHeaderStart() in DwarfDebug.cpp can be shared with MCDwarf.cpp. Because CodeGen depends on MC, I move the function to MCDwarf.cpp Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D75375	2020-03-03 09:03:34 -08:00
Craig Topper	d8ad7cc088	[DAGCombiner][X86] Improve narrowExtractedVectorLoad to handle cases where the element size isn't byte sized by the subvector is. Summary: Follow up from D75377. If the subvector is byte sized and the index is aligned to the subvector size, we can shrink the load. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: dbabokin, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75434	2020-03-03 08:41:31 -08:00
Sam Parker	5618e9be37	[RDA][ARM] collectKilledOperands across multiple blocks Use MIOperand in collectLocalKilledOperands to make the search global, as we already have to search for global uses too. This allows us to delete more dead code when tail predicating. Differential Revision: https://reviews.llvm.org/D75167	2020-03-03 15:23:05 +00:00
Sam Parker	dfe8f5da4c	[ARM][RDA] Allow multiple killed users In RDA, check against the already decided dead instructions when looking at users. This allows an instruction to be removed if it has multiple users, but they're all dead. This means that IT instructions can be considered killed once all the itstate using instructions are dead. Differential Revision: https://reviews.llvm.org/D75245	2020-03-03 15:12:29 +00:00
Clement Courbet	b0ae20d92e	[ExpandMemCmp][NFC] Fix typo in comment.	2020-03-03 11:07:13 +01:00
Awanish Pandey	1cb0e01e42	[DebugInfo][DWARF5]: Added support for debuginfo generation for defaulted parameters This patch adds support for dwarf emission/dumping part of debuginfo generation for defaulted parameters. Reviewers: probinson, aprantl, dblaikie Reviewed By: aprantl, dblaikie Differential Revision: https://reviews.llvm.org/D73462	2020-03-03 13:09:53 +05:30
Vedant Kumar	d64a22a2ad	[LiveDebugValues] Prevent some misuse of LocIndex::fromRawInteger, NFC Make it a compile-time error to pass an int/unsigned/etc to fromRawInteger. Hopefully this prevents errors of the form: ``` for (unsigned ID : getVarLocs()) { auto VL = LocMap[LocIndex::fromRawInteger(ID)]; ... ```	2020-03-02 16:59:09 -08:00
Jordan Rupprecht	d7803c3832	Add default case to fix -Wswitch errors	2020-03-02 14:23:46 -08:00
Craig Topper	adc69729ec	[TargetLowering] Fix what look like copy/paste mistakes in compare with infinity handling SimplifySetCC. I expect that the isCondCodeLegal checks should match that CC of the node that we're going to create. Rewriting to a switch to minimize repeated mentions of the same constants.	2020-03-02 14:12:16 -08:00
Stanislav Mekhanoshin	1bacdcf48d	Extend LaneBitmask to 64 bit This is needed for D74873, AMDGPU going to have 16 bit subregs and the largest tuple is 32 VGPRs, which results in 64 lanes. Differential Revision: https://reviews.llvm.org/D75378	2020-03-02 12:10:52 -08:00
Volkan Keles	4167645d1e	GlobalISel: Move Localizer::shouldLocalize(..) to TargetLowering Add a new target hook for shouldLocalize so that targets can customize the logic. https://reviews.llvm.org/D75207	2020-03-02 09:15:40 -08:00
Simon Pilgrim	d20fb7ea13	Fix shadow variable warning. NFC.	2020-03-02 11:41:20 +00:00
Simon Pilgrim	e4380b07cc	Fix operator precedence warning. NFCI.	2020-03-02 10:56:58 +00:00
Serguei Katkov	496e0a99c7	[InlineSpiller] Relax re-materialization restriction for statepoint We should be careful to allow count of re-materialization of operands to be less then number of physical registers. STATEPOINT instruction has a variable number of operands and potentially very big. So re-materialization for all operands is disabled at the moment if restrict-statepoint-remat is true. The patch relaxes the re-materialization restriction for STATEPOINT instruction allowing it for fixed operands. Specifically it is about call target. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits, qcolombet, hiraditya Differential Revision: https://reviews.llvm.org/D75335	2020-03-02 11:25:44 +07:00
Craig Topper	0cd6712a7a	[DAGCombiner][X86] Disable narrowExtractedVectorLoad if the element type size isn't byte sized The address calculation for the offset assumes that you can calculate the offset by multiplying the index by the store size of the element. But that only works if the element's store size is exactly its real size since we store vectors tightly packed in memory. There are improvements we could make to this like special casing extracting element 0. I think we could also handle cases where the extracted VT is byte sized and the index is aligned with the extract element count. Differential Revision: https://reviews.llvm.org/D75377	2020-03-01 18:13:25 -08:00
Craig Topper	b6e2796114	[X86][TwoAddressInstructionPass] Teach tryInstructionCommute to continue checking for commutable FMA operands in more cases. Previously we would only check for another commutable operand if the first commute was an aggressive commute. But if we have two kill operands and neither is tied to the def at the start, we should consider both operands as the one to use as the new def. This improves the loop in the fma-commute-loop.ll test. This test is derived from a post from discourse here https://llvm.discourse.group/t/unnecessary-vmovapd-instructions-generated-can-you-hint-in-favor-of-vfmadd231pd/582 Differential Revision: https://reviews.llvm.org/D75016	2020-03-01 16:38:08 -08:00
Craig Topper	211fb91f10	[DAGCombiner] Don't emit select_cc from visitSINT_TO_FP/visitUINT_TO_FP. Use plain select instead. Select_cc isn't used by all targets. X86 doesn't have optimizations for it. Since we already know the input to the sint_to_fp/uint_to_fp is a setcc we can just emit a plain select using that setcc as the condition. Other DAG combines can turn that into a select_cc on targets that support it. Differential Revision: https://reviews.llvm.org/D75415	2020-03-01 10:52:17 -08:00
Sanjay Patel	619d7dc39a	[DAGCombiner] recognize shuffle (shuffle X, Mask0), Mask --> splat X We get the simple cases of this via demanded elements and other folds, but that doesn't work if the values have >1 use, so add a dedicated match for the pattern. We already have this transform in IR, but it doesn't help the motivating x86 tests (based on PR42024) because the shuffles don't exist until after legalization and other combines have happened. The AArch64 test shows a minimal IR example of the problem. Differential Revision: https://reviews.llvm.org/D75348	2020-03-01 09:10:25 -05:00
Simon Pilgrim	d955b221cb	[MachineInst] Remove dead code. NFCI. The MachineFunction MF value is not used any more and is always null.	2020-02-29 19:25:02 +00:00
Simon Pilgrim	6e7a768354	Make argument const to silence cppcheck warning. NFCI.	2020-02-29 19:25:01 +00:00
Fangrui Song	692e0c9648	[MC] Add MCStreamer::emitInt{8,16,32,64} Similar to AsmPrinter::emitInt{8,16,32,64}.	2020-02-29 09:40:21 -08:00
Vedant Kumar	dd1ea9de2e	Reland: [Coverage] Revise format to reduce binary size Try again with an up-to-date version of D69471 (`99317124` was a stale revision). --- Revise the coverage mapping format to reduce binary size by: 1. Naming function records and marking them `linkonce_odr`, and 2. Compressing filenames. This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB) and speeds up end-to-end single-threaded report generation by 10%. For reference the compressed name data in llc is 81MB (__llvm_prf_names). Rationale for changes to the format: - With the current format, most coverage function records are discarded. E.g., more than 97% of the records in llc are duplicate placeholders for functions visible-but-not-used in TUs. Placeholders are used to show under-covered functions, but duplicate placeholders waste space. - We reached general consensus about giving (1) a try at the 2017 code coverage BoF [1]. The thinking was that using `linkonce_odr` to merge duplicates is simpler than alternatives like teaching build systems about a coverage-aware database/module/etc on the side. - Revising the format is expensive due to the backwards compatibility requirement, so we might as well compress filenames while we're at it. This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB). See CoverageMappingFormat.rst for the details on what exactly has changed. Fixes PR34533 [2], hopefully. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html [2] https://bugs.llvm.org/show_bug.cgi?id=34533 Differential Revision: https://reviews.llvm.org/D69471	2020-02-28 18:12:04 -08:00
Vedant Kumar	3388871714	Revert "[Coverage] Revise format to reduce binary size" This reverts commit `99317124e1`. This is still busted on Windows: http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/40873 The llvm-cov tests report 'error: Could not load coverage information'.	2020-02-28 18:03:15 -08:00
Vedant Kumar	99317124e1	[Coverage] Revise format to reduce binary size Revise the coverage mapping format to reduce binary size by: 1. Naming function records and marking them `linkonce_odr`, and 2. Compressing filenames. This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB) and speeds up end-to-end single-threaded report generation by 10%. For reference the compressed name data in llc is 81MB (__llvm_prf_names). Rationale for changes to the format: - With the current format, most coverage function records are discarded. E.g., more than 97% of the records in llc are duplicate placeholders for functions visible-but-not-used in TUs. Placeholders are used to show under-covered functions, but duplicate placeholders waste space. - We reached general consensus about giving (1) a try at the 2017 code coverage BoF [1]. The thinking was that using `linkonce_odr` to merge duplicates is simpler than alternatives like teaching build systems about a coverage-aware database/module/etc on the side. - Revising the format is expensive due to the backwards compatibility requirement, so we might as well compress filenames while we're at it. This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB). See CoverageMappingFormat.rst for the details on what exactly has changed. Fixes PR34533 [2], hopefully. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html [2] https://bugs.llvm.org/show_bug.cgi?id=34533 Differential Revision: https://reviews.llvm.org/D69471	2020-02-28 17:33:25 -08:00
Vedant Kumar	0368b42295	[entry values] ARM: Add a describeLoadedValue override (PR45025) As a narrow stopgap for the assertion failure described in PR45025, add a describeLoadedValue override to ARMBaseInstrInfo and use it to detect copies in which the forwarding reg is a super/sub reg of the copy destination. For the moment this is unsupported. Several follow ups are possible: 1) Handle VORRq. At the moment, we do not, because isCopyInstrImpl returns early when !MI.isMoveReg(). 2) In the case where forwarding reg is a super-reg of the copy destination, we should be able to describe the forwarding reg as a subreg within the copy destination. I'm not 100% sure about this, but it looks like that's what's done in AArch64InstrInfo. 3) In the case where the forwarding reg is a sub-reg of the copy destination, maybe we could describe the forwarding reg using the copy destinaion and a DW_OP_LLVM_fragment (I guess this should be possible after D75036). https://bugs.llvm.org/show_bug.cgi?id=45025 rdar://59772698 Differential Revision: https://reviews.llvm.org/D75273	2020-02-28 14:30:40 -08:00
David Green	1de1070559	[DAGCombine] Fix alias analysis for unaligned accesses The alias analysis in DAG Combine looks at the BaseAlign, the Offset and the Size of two accesses, and determines if they are known to access different parts of memory by the fact that they are different offsets from inside that "alignment window". It does not seem to account for accesses that are not a multiple of the size, and may overflow from one alignment window into another. For example in the test case we have a 19byte memset that is splits into a 16 byte neon store and an unaligned 4 byte store with a 15 byte offset. This 15byte offset (with a base align of 8) wraps around to the next alignment windows. When compared to an access that is a 16byte offset (of the same 4byte size and 8byte basealign), the two accesses are said not to alias. I've fixed this here by just ensuring that the offsets are a multiple of the size, ensuring that they don't overlap by wrapping. Fixes PR45035, which was exposed by the UseAA changes in the arm backend. Differential Revision: https://reviews.llvm.org/D75238	2020-02-28 18:44:36 +00:00
Simon Pilgrim	4bc6f63320	[TargetLowering] SimplifyDemandedBits - fix SCALAR_TO_VECTOR knownbits bug We can only report the knownbits for a SCALAR_TO_VECTOR node if we only demand the 0'th element - the upper elements are undefined and shouldn't be trusted. This is causing a number of regressions that need addressing but we need to get the bugfix in first.	2020-02-28 15:23:37 +00:00
Jeremy Morse	6af859dcca	[DebugInfo] Re-implement LexicalScopes dominance method, add unit tests Way back in D24994, the combination of LexicalScopes::dominates and LiveDebugValues was identified as having worst-case quadratic complexity, but it wasn't triggered by any code path at the time. I've since run into a scenario where this occurs, in a very large basic block where large numbers of inlined DBG_VALUEs are present. The quadratic-ness comes from LiveDebugValues::join calling "dominates" on every variable location, and LexicalScopes::dominates potentially touching every instruction in a block to test for the presence of a scope. We have, however, already computed the presence of scopes in blocks, in the "InstrRanges" of each scope. This patch switches the dominates method to examine whether a block is present in a scope's InsnRanges, avoiding walking through the whole block. At the same time, fix getMachineBasicBlocks to account for the fact that InsnRanges can cover multiple blocks, and add some unit tests, as Lexical Scopes didn't have any. Differential revision: https://reviews.llvm.org/D73725	2020-02-28 11:41:28 +00:00
Sam Parker	bf61421a02	[RDA] Track implicit-defs Ensure that we're recording implicit defs, as well as visiting implicit uses and implicit defs when we're walking through operands. Differential Revision: https://reviews.llvm.org/D75185	2020-02-28 11:14:42 +00:00
serge-sans-paille	6d15c4deab	No longer generate calls to *_finite According to Joseph Myers, a libm maintainer > They were only ever an ABI (selected by use of -ffinite-math-only or > options implying it, which resulted in the headers using "asm" to redirect > calls to some libm functions), not an API. The change means that ABI has > turned into compat symbols (only available for existing binaries, not for > anything newly linked, not included in static libm at all, not included in > shared libm for future glibc ports such as RV32), so, yes, in any case > where tools generate direct calls to those functions (rather than just > following the "asm" annotations on function declarations in the headers), > they need to stop doing so. As a consequence, we should no longer assume these symbols are available on the target system. Still keep the TargetLibraryInfo for constant folding. Differential Revision: https://reviews.llvm.org/D74712	2020-02-28 10:07:37 +01:00
Vedant Kumar	a993720397	[LiveDebugValues] Encode register location within VarLoc IDs [3/3] This is part 3 of a 3-part series to address a compile-time explosion issue in LiveDebugValues. --- Start encoding register locations within VarLoc IDs, and take advantage of this encoding to speed up transferRegisterDef. There is no fundamental algorithmic change: this patch simply swaps out SparseBitVector in favor of CoalescingBitVector. That changes iteration order (hence the test updates), but otherwise this patch is NFCI. The only interesting change is in transferRegisterDef. Instead of doing: ``` KillSet = {} for (ID : OpenRanges.getVarLocs()) if (DeadRegs.count(ID)) KillSet.add(ID) ``` We now do: ``` KillSet = {} for (Reg : DeadRegs) for (ID : intervalsReservedForReg(Reg, OpenRanges.getVarLocs())) KillSet.add(ID) ``` By not visiting each open location every time we visit an instruction, this eliminates some potentially quadratic behavior. The new implementation basically does a constant amount of work per instruction because the interval map lookups are very fast. For a file in WebKit, this brings the time spent in LiveDebugValues down from ~2.5 minutes to 4 seconds, reducing compile time spent in that pass from 28% of the total to just over 1%. Before: ``` 2.49 min 27.8% 0 s LiveDebugValues::process 2.41 min 27.0% 5.40 s LiveDebugValues::transferRegisterDef 1.51 min 16.9% 1.51 min LiveDebugValues::VarLoc::isDescribedByReg() const 32.73 s 6.1% 8.70 s llvm::SparseBitVector<128u>::SparseBitVectorIterator::operator++() ``` After: ``` 4.53 s 1.1% 0 s LiveDebugValues::process 3.00 s 0.7% 107.00 ms LiveDebugValues::transferRegisterCopy 892.00 ms 0.2% 406.00 ms LiveDebugValues::transferSpillOrRestoreInst 404.00 ms 0.1% 32.00 ms LiveDebugValues::transferRegisterDef 110.00 ms 0.0% 2.00 ms LiveDebugValues::getUsedRegs 57.00 ms 0.0% 1.00 ms std::__1::vector<>::push_back 40.00 ms 0.0% 1.00 ms llvm::CoalescingBitVector<>::find(unsigned long long) ``` FWIW, I tried the same approach using SparseBitVector, but got bad results. To do that, I had to extend SparseBitVector to support 64-bit indices and expose its lower bound operation. The problem with this is that the performance is very hard to predict: SparseBitVector's lower bound operation falls back to O(n) linear scans in a std::list if you're not /very/ careful about managing iteration order. When I profiled this the performance looked worse than the baseline. You can see the full CoalescingBitVector-based implementation here: https://github.com/vedantk/llvm-project/commits/try-coalescing You can see the full SparseBitVector-based implementation here: https://github.com/vedantk/llvm-project/commits/try-sparsebitvec-find Depends on D74984 and D74985. Differential Revision: https://reviews.llvm.org/D74986	2020-02-27 12:39:47 -08:00
Vedant Kumar	210c4853de	[LiveDebugValues] Encode a location in VarLoc IDs, NFC [2/3] This is part 2 of a 3-part series to address a compile-time explosion issue in LiveDebugValues. --- Each VarLoc has a unique ID: this ID is used to look up a VarLoc in the VarLocMap, and to virtually insert a VarLoc into a VarLocSet. Instead of inserting the VarLoc /itself/ into the VarLocSet, we insert just the ID, because this can be represented efficiently with a SparseBitVector. This change introduces LocIndex, a layer of abstraction on top of VarLoc IDs. Prior to this change, an ID was just an index into a vector. With this change, an ID encodes both an index /and/ a register location. The type-checker ensures that conversions to and from LocIndex are correct. For the moment the register location is always 0 (undef). We have plenty of bits left over to encode physregs, stack slots, and other locations in the future. Differential Revision: https://reviews.llvm.org/D74985	2020-02-27 12:39:47 -08:00
Sanjay Patel	90fd859f51	[x86] use instruction-level fast-math-flags to drive MachineCombiner The code changes here are hopefully straightforward: 1. Use MachineInstruction flags to decide if FP ops can be reassociated (use both "reassoc" and "nsz" to be consistent with IR transforms; we probably don't need "nsz", but that's a safer interpretation of the FMF). 2. Check that both nodes allow reassociation to change instructions. This is a stronger requirement than we've usually implemented in IR/DAG, but this is needed to solve the motivating bug (see below), and it seems unlikely to impede optimization at this late stage. 3. Intersect/propagate MachineIR flags to enable further reassociation in MachineCombiner. We managed to make MachineCombiner flexible enough that no changes are needed to that pass itself. So this patch should only affect x86 (assuming no other targets have implemented the hooks using MachineIR flags yet). The motivating example in PR43609 is another case of fast-math transforms interacting badly with special FP ops created during lowering: https://bugs.llvm.org/show_bug.cgi?id=43609 The special fadd ops used for converting int to FP assume that they will not be altered, so those are created without FMF. However, the MachineCombiner pass was being enabled for FP ops using the global/function-level TargetOption for "UnsafeFPMath". We managed to run instruction/node-level FMF all the way down to MachineIR sometime in the last 1-2 years though, so we can do better now. The test diffs require some explanation: 1. llvm/test/CodeGen/X86/fmf-flags.ll - no target option for unsafe math was specified here, so MachineCombiner kicks in where it did not previously; to make it behave consistently, we need to specify a CPU schedule model, so use the default model, and there are no code diffs. 2. llvm/test/CodeGen/X86/machine-combiner.ll - replace the target option for unsafe math with the equivalent IR-level flags, and there are no code diffs; we can't remove the NaN/nsz options because those are still used to drive x86 fmin/fmax codegen (special SDAG opcodes). 3. llvm/test/CodeGen/X86/pow.ll - similar to #1 4. llvm/test/CodeGen/X86/sqrt-fastmath.ll - similar to #1, but MachineCombiner does some reassociation of the estimate sequence ops; presumably these are perf wins based on latency/throughput (and we get some reduction of move instructions too); I'm not sure how it affects numerical accuracy, but the test reflects reality better now because we would expect MachineCombiner to be enabled if the IR was generated via something like "-ffast-math" with clang. 5. llvm/test/CodeGen/X86/vec_int_to_fp.ll - this is the test added to model PR43609; the fadds are not reassociated now, so we should get the expected results. 6. llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll - similar to #1 7. llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll - similar to #1 Differential Revision: https://reviews.llvm.org/D74851	2020-02-27 15:19:37 -05:00
Djordje Todorovic	016d91ccbd	[CallSiteInfo] Handle bundles when updating call site info This will address the issue: P8198 and P8199 (from D73534). The methods was not handle bundles properly. Differential Revision: https://reviews.llvm.org/D74904	2020-02-27 13:57:06 +01:00
David Stenberg	6d857166d2	[DebugInfo] Describe call site values for chains of expression producing instrs Summary: If the describeLoadedValue() hook produced a DIExpression when describing a instruction, and it was not possible to emit a call site entry directly (the value operand was not an immediate nor a preserved register), then that described value could not be inserted into the worklist, and would instead be dropped, meaning that the parameter's call site value couldn't be described. This patch extends the worklist so that each entry has an DIExpression that is built up when iterating through the instructions. This allows us to describe instruction chains like this: $reg0 = mv $fp $reg0 = add $reg0, offset call @call_with_offseted_fp Since DW_OP_LLVM_entry_value operations can't be combined with any other expression, such call site entries will not be emitted. I have added a test, dbgcall-site-expr-entry-value.mir, which verifies that we don't assert or emit broken DWARF in such cases. Reviewers: djtodoro, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D75036	2020-02-27 11:18:51 +01:00
David Stenberg	ff574ff291	[DebugInfo][NFC] Move out lambdas from collectCallSiteParameters() Summary: This is a preparatory patch for D75036, in which a debug expression is associated with each parameter register in the worklist. In that patch the two lambda functions addToWorklist() and finishCallSiteParams() grow a bit, so move those out to separate functions. This patch also prepares for each parameter register having their own expression moving the creation of the DbgValueLoc into finishCallSiteParams(). Reviewers: djtodoro, vsk Reviewed By: djtodoro, vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D75050	2020-02-27 11:18:51 +01:00
Matt Arsenault	6fc0d00823	GlobalISel: Fix lowering for G_UADDE/G_USUBE The type parameter passed into lower is invalid and should be removed from the function.	2020-02-26 19:10:52 -08:00
Matt Arsenault	c7e8d8b13e	GlobalISel: Cleanup code with MachineIRBuilder features	2020-02-26 19:10:34 -08:00
Krzysztof Parzyszek	fd7c2e24c1	[SDAG] Add SDNode::values() = make_range(values_begin(), values_end()) Also use it in a few places to simplify code a little bit. NFC	2020-02-26 12:07:38 -06:00
Sanjay Patel	b3d0c79836	[DAGCombiner] avoid narrowing fake fneg vector op This may inhibit vector narrowing in general, but there's already an inconsistency in the way that we deal with this pattern as shown by the test diff. We may want to add a dedicated function for narrowing fneg. It's often folded into some other op, so moving it away from other math ops may cause regressions that we would not see for normal binops. See D73978 for more details.	2020-02-26 11:25:56 -05:00
Simon Pilgrim	bbb0933e3d	[DAG] visitRotate - modulo non-uniform constant rotation amounts	2020-02-26 15:43:12 +00:00
Sam Parker	1d06e75df2	[ARM][RDA] add getUniqueReachingMIDef Add getUniqueReachingMIDef to RDA which performs a global search for a machine instruction that produces a unique definition of a given register at a given point. Also add two helper functions (getMIOperand) that wrap around this functionality to get the incoming definition uses of a given instruction. These now replace the uses of getReachingMIDef in ARMLowOverheadLoops. getReachingMIDef has been renamed to getReachingLocalMIDef and has been made private along with getInstFromId. Differential Revision: https://reviews.llvm.org/D74605	2020-02-26 11:15:26 +00:00
Fangrui Song	b61a4aaca5	[MC] Default MCContext::UseNamesOnTempLabels to false and only set it to true for MCAsmStreamer Only MCAsmStreamer (assembly output) needs to keep names of temporary labels created by MCContext::createTempSymbol(). This change made the rL236642 optimization available for cc2as and probably some other users. This eliminates a behavior difference between llvm-mc -filetype=obj and cc1as, which caused https://reviews.llvm.org/D74006#1890487 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D75097	2020-02-25 18:23:10 -08:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output the ISD::FLT_ROUNDS_ This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Quentin Colombet	5bf0023b0d	[GISel][KnownBits] Update a comment regarding the effect of cache on PHIs Unlike what I claimed in my previous commit. The caching is actually not NFC on PHIs. When we put a big enough max depth, we end up simulating loops. The cache is effectively cutting the simulation short and we get less information as a result. E.g., ``` v0 = G_CONSTANT i8 0xC0 jump v1 = G_PHI i8 v0, v2 v2 = G_LSHR i8 v1, 1 ``` Let say we want the known bits of v1. - With cache: Set v1 cache to we know nothing v1 is v0 & v2 v0 gives us 0xC0 v2 gives us known bits of v1 >> 1 v1 is in the cache => v1 is 0, thus v2 is 0x80 Finally v1 is v0 & v2 => 0x80 - Without cache and enough depth to do two iteration of the loop: v1 is v0 & v2 v0 gives us 0xC0 v2 gives us known bits of v1 >> 1 v1 is v0 & v2 v0 is 0xC0 v2 is v1 >> 1 Reach the max depth for v1... unwinding v1 is know nothing v2 is 0x80 v0 is 0xC0 v1 is 0x80 v2 is 0xC0 v0 is 0xC0 v1 is 0xC0 Thus now v1 is 0xC0 instead of 0x80. I've added a unittest demonstrating that. NFC	2020-02-25 15:56:15 -08:00
Scott Linder	915b4aa139	Support emitting .cfi_undefined in CodeGen This will be used by AMDGPU. Differential Revision: https://reviews.llvm.org/D74914	2020-02-25 14:00:01 -05:00
Quentin Colombet	a12f1d6a52	[MachineInstr] Add a dumpr method Add a dump method that recursively prints an instruction and all the instructions defining its operands and so on. This is helpful when looking at combiner issue. NFC Differential Revision: https://reviews.llvm.org/D75094	2020-02-25 10:46:29 -08:00
Roman Lebedev	d20907d1de	[Codegen] Revert rL354676/rL354677 and followups - introduced PR43446 miscompile This reverts https://reviews.llvm.org/D58468 (rL354676, `44037d7a63`), and all and any follow-ups to that code block. https://bugs.llvm.org/show_bug.cgi?id=43446	2020-02-25 20:30:12 +03:00
Jay Foad	ccee390767	GlobalISel: NFC minor cleanup to avoid a couple of fixed size local arrays	2020-02-25 09:49:19 +00:00
Roman Tereshin	b3bce6a3dd	[MachineVerifier] Doing ::calcRegsPassed over faster sets: ~15-20% faster MV, NFC MachineVerifier still takes 45-50% of total compile time with -verify-machineinstrs, with calcRegsPassed dataflow taking ~50-60% of MachineVerifier. The majority of that time is spent in BBInfo::addPassed, mostly within DenseSet implementing the sets the dataflow is operating over. In particular, 1/4 of that DenseSet time is spent just iterating over it (operator++), 40-50% on insertions, and most of the rest in ::count. Given that, we're implementing custom sets just for this analysis here, focusing on cheap insertions and O(n) iteration time (as opposed to O(U), where U is the universe). As it's based _mostly_ on BitVector for sparse and SmallVector for dense, it may remotely resemble SparseSet. The difference is, our solution is a lot less clever, doesn't have constant time `clear` that we won't use anyway as reusing these sets across analyses is cumbersome, and thus more space efficient and safer (got a resizable Universe and a fallback to DenseSet for sparse if it gets too big). With this patch MachineVerifier gets ~15-20% faster, its contribution to total compile time drops from 45-50% to ~35%, while contribution of calcRegsPassed to MachineVerifier drops from 50-60% to ~35% as well. calcRegsPassed itself gets another 2x faster here. All measured on a large suite of shaders targeting a number of GPUs. Reviewers: bogner, stoklund, rudkx, qcolombet Reviewed By: rudkx Tags: #llvm Differential Revision: https://reviews.llvm.org/D75033	2020-02-24 19:01:21 -08:00
Bill Wendling	23c2a5ce33	Allow "callbr" to return non-void values Summary: Terminators in LLVM aren't prohibited from returning values. This means that the "callbr" instruction, which is used for "asm goto", can support "asm goto with outputs." This patch removes all restrictions against "callbr" returning values. The heavy lifting is done by the code generator. The "INLINEASM_BR" instruction's a terminator, and the code generator doesn't allow non-terminator instructions after a terminator. In order to correctly model the feature, we need to copy outputs from "INLINEASM_BR" into virtual registers. Of course, those copies aren't terminators. To get around this issue, we split the block containing the "INLINEASM_BR" right before the "COPY" instructions. This results in two cheats: - Any physical registers defined by "INLINEASM_BR" need to be marked as live-in into the block with the "COPY" instructions. This violates an assumption that physical registers aren't marked as "live-in" until after register allocation. But it seems as if the live-in information only needs to be correct after register allocation. So we're able to get away with this. - The indirect branches from the "INLINEASM_BR" are moved to the "COPY" block. This is to satisfy PHI nodes. I've been told that MLIR can support this handily, but until we're able to use it, we'll have to stick with the above. Reviewers: jyknight, nickdesaulniers, hfinkel, MaskRay, lattner Reviewed By: nickdesaulniers, MaskRay, lattner Subscribers: rriddle, qcolombet, jdoerfert, MatzeB, echristo, MaskRay, xbolva00, aaron.ballman, cfe-commits, JonChesterfield, hiraditya, llvm-commits, rnk, craig.topper Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D69868	2020-02-24 18:29:06 -08:00
Matt Arsenault	11e3dde625	GlobalISel: Reimplement fewerElementsVectorBasic Changes the handling of odd breakdowns, and avoids using G_EXTRACT/G_INSERT. Pad with undef to a wider size, and unmerge. Also avoid introducing instructions for the fully undef components.	2020-02-24 21:19:47 -05:00
Craig Topper	a5fa778882	[LegalizeTypes] Scalarize non-byte sized loads in WidenRecRes_Load and SplitVecResLoad Should fix PR42803 and PR44902 Differential Revision: https://reviews.llvm.org/D74590	2020-02-24 15:14:33 -08:00
Roman Tereshin	6f87b162e6	[MachineVerifier] Doing ::calcRegsPassed in RPO: ~35% faster MV, NFC Depending on the target, test suite, pipeline config and perhaps other factors machine verifier when forced on with -verify-machineinstrs can increase compile time 2-2.5 times over (Release, Asserts On), taking up ~60% of the time. An invaluable tool, it significantly slows down machine verifier-enabled testing. Nearly 75% of its time MachineVerifier spends in the calcRegsPassed method. It's a classic forward dataflow analysis executed over sets, but visiting MBBs in arbitrary order. We switch that to RPO here. This speeds up MachineVerifier by about 35%, decreasing the overall compile time with -verify-machineinstrs by 20-25% or so. calcRegsPassed itself gets 2x faster here. All measured on a large suite of shaders targeting a number of GPUs. Reviewers: bogner, stoklund, rudkx, qcolombet Reviewed By: bogner Tags: #llvm Differential Revision: https://reviews.llvm.org/D75032	2020-02-24 13:30:01 -08:00
Simon Pilgrim	53b597cfa2	[SelectionDAG] Merge constant SDNode arithmetic into foldConstantArithmetic This is the second patch as part of https://bugs.llvm.org/show_bug.cgi?id=36544 Merging in the ConstantSDNode variant of FoldConstantArithmetic. After this, I will begin merging in FoldConstantVectorArithmetic I've ensured this patch can build & pass all lit tests in Windows and Linux environments. Patch by @justice_adams (Justice Adams) Differential Revision: https://reviews.llvm.org/D74881	2020-02-24 18:54:22 +00:00
Sjoerd Meijer	7efabe5c7d	[MIR][ARM] MachineOperand comments This adds infrastructure to print and parse MIR MachineOperand comments. The motivation for the ARM backend is to print condition code names instead of magic constants that are difficult to read (for human beings). For example, instead of this: dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14, $noreg t2Bcc %bb.4, 0, killed $cpsr we now print this: dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always /, $noreg t2Bcc %bb.4, 0 / CC:eq /, killed $cpsr This shows that MachineOperand comments are enclosed between / and /. In this example, the EOR instruction is not conditionally executed (i.e. it is "always executed"), which is encoded by the 14 immediate machine operand. Thus, now this machine operand has / CC::always / as a comment. The 0 on the next conditional branch instruction represents the equal condition code, thus now this operand has / CC:eq */ as a comment. As it is a comment, the MI lexer/parser completely ignores it. The benefit is that this keeps the change in the lexer extremely minimal and no target specific parsing needs to be done. The changes on the MIPrinter side are also minimal, as there is only one target hooks that is used to create the machine operand comments. Differential Revision: https://reviews.llvm.org/D74306	2020-02-24 14:19:21 +00:00
Sam Parker	a67eb221e2	[RDA][ARM][LowOverheadLoops] Iteration count IT blocks Change the way that we remove the redundant iteration count code in the presence of IT blocks. collectLocalKilledOperands has been introduced to scan an instructions operands, collecting the killed instructions and then visiting them too. This is used to delete the code in the preheader which calculates the iteration count. We also track any IT blocks within the preheader and, if we remove all the instructions from the IT block, we also remove the IT instruction. isSafeToRemove is used to remove any redundant uses of the iteration count within the loop body. Differential Revision: https://reviews.llvm.org/D74975	2020-02-24 13:51:03 +00:00
Bevin Hansson	6e561d1c94	[Intrinsic] Add fixed point saturating division intrinsics. Summary: This patch adds intrinsics and ISelDAG nodes for signed and unsigned fixed-point division: ``` llvm.sdiv.fix.sat.* llvm.udiv.fix.sat.* ``` These intrinsics perform scaled, saturating division on two integers or vectors of integers. They are required for the implementation of the Embedded-C fixed-point arithmetic in Clang. Reviewers: bjope, leonardchan, craig.topper Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71550	2020-02-24 10:50:52 +01:00
Bevin Hansson	c3f36acc92	[MC] Widen the functional unit type from 32 to 64 bits. Summary: The type used to represent functional units in MC is 'unsigned', which is 32 bits wide. This is currently not a problem in any upstream target as no one seems to have hit the limit on this yet, but in our downstream one, we need to define more than 32 functional units. Increasing the size does not seem to cause a huge size increase in the binary (an llc debug build went from 1366497672 to 1366523984, a difference of 26k), so perhaps it would be acceptable to have this patch applied upstream as well. Subscribers: hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71210	2020-02-24 09:37:00 +01:00
Craig Topper	3a6bb32bd2	[SelectionDAG] Remove ISD::LIFETIME_START/LIFETIME_END from assert in getMemIntrinsicNode. These appear to have their own SDNode type and shouldn't use MemIntrinsicSDNode.	2020-02-23 22:32:36 -08:00
Florian Hahn	7769030b93	Recommit "[PatternMatch] Match XOR variant of unsigned-add overflow check." This version fixes a buildbot failure cause by picking the wrong insert point for XORs. We cannot pick the XOR binary operator as insert point, as it is not guaranteed that both input operands for the overflow intrinsic are defined before it. This reverts the revert commit `c7fc0e5da6`.	2020-02-23 18:33:18 +00:00
Sanjay Patel	a253a2a793	[SDAG] fold fsub -0.0, undef to undef rather than NaN A question about this behavior came up on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-February/139003.html ...and as part of backend improvements in D73978. We decided not to implement a more general change that would have folded any FP binop with nearly arbitrary constant + undef operand to undef because that is not theoretically correct (even if it is practically correct). This is the SDAG-equivalent to the IR change in D74713.	2020-02-23 11:36:53 -05:00
Quentin Colombet	b6d63c92ec	[GISel][KnownBits] Suppress unused warning on the dump method NFC	2020-02-21 21:07:04 -08:00
Quentin Colombet	618dec2aef	[GISel][KnownBits] Add a cache mechanism to speed compile time This patch adds a cache that is valid only for the duration of a call to getKnownBits. With such short lived cache we avoid all the problems of cache invalidation while still getting the benefits of reusing the information we already computed. This cache is useful whenever an instruction occurs more than once in a chain of computation. E.g., v0 = G_ADD v1, v2 v3 = G_ADD v0, v1 Previously we would compute the known bits for: v1, v2, v0, then v1 again and finally v3. With the patch, now we won't have to recompute v1 again. NFC	2020-02-21 14:31:42 -08:00
Francesco Petrogalli	31ec721516	[llvm][CodeGen] DAG Combiner folds for vscale. Summary: This patch simplifies the DAGs generated when using the intrinsic `@llvm.vscale.` as follows: Fold (add (vscale * C0), (vscale * C1)) to (vscale * (C0 + C1)). * Canonicalize (sub X, (vscale * C)) to (add X, (vscale * -C)). * Fold (mul (vscale * C0), C1) to (vscale * (C0 * C1)). * Fold (shl (vscale * C0), C1) to (vscale * (C0 << C1)). The test `sve-gep-ll` have been updated to reflect the folding introduced by this patch. Reviewers: efriedma, sdesmalen, andwar, rengolin Reviewed By: sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74782	2020-02-21 18:03:12 +00:00
Hiroshi Yamauchi	0e3e242209	[BFI] Fix missed BFI updates in MachineSink. Summary: This prevents BFI queries on new blocks (from MachineSinking::GetAllSortedSuccessors) and fixes a bunch of assert failures under -check-bfi-unknown-block-queries=true. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74511	2020-02-21 09:50:54 -08:00
Nikita Popov	a8db806d52	[SimplifyLibCalls][IRBuilder] Accept any IRBuilder in SimplifyLibCalls This changes the SimplifyLibCalls utility to accept an IRBuilderBase, which allows us to pass through the IRBuilder used by InstCombine. This will ensure that new instructions get added to the worklist. The annotated test-case drops from 4 to 2 InstCombine iterations thanks to this. To achieve this, I'm adding an IRBuilderBase::OperandBundlesGuard, which is basically the same as the existing InsertPointGuard and FastMathFlagsGuard, but for operand bundles. Also add a setDefaultOperandBundles() method so these can be set outside the constructor. Differential Revision: https://reviews.llvm.org/D74792	2020-02-21 18:26:05 +01:00
Jay Foad	cab39e4b8c	GlobalISel: Fix narrowing of (G_ASHR i64:x, 32) Reviewers: arsenm Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74950	2020-02-21 16:51:03 +00:00
Simon Pilgrim	42ec6fdce9	[TargetLowering] Apply basic shift combines before recursive SimplifyDemandedBits calls. Minor refactor/cleanup before we begin adding non-uniform support.	2020-02-21 16:31:20 +00:00
Simon Pilgrim	86c52af05a	[TargetLowering] SimplifyDemandedBits - use getValidShiftAmountConstant helper. Use the SelectionDAG::getValidShiftAmountConstant helper to get const/constsplat shift amounts, which allows us to drop the out of range shift amount early-out. First step towards better non-uniform shift amount support in SimplifyDemandedBits.	2020-02-21 14:23:53 +00:00
Sam Clegg	df74033ec9	[WebAssembly] Remove unneeded getWasmKindForNamedSection function I believe this was carried over from getELFKindForNamedSection since the wasm backend originally used ELF object writing as a template. Differential Revision: https://reviews.llvm.org/D74565	2020-02-20 22:49:08 -08:00
Eli Friedman	c767cf24e4	[SVE] Add support for lowering GEPs involving scalable vectors. This includes both GEPs where the indexed type is a scalable vector, and GEPs where the result type is a scalable vector. Differential Revision: https://reviews.llvm.org/D73602	2020-02-20 13:45:41 -08:00
Quentin Colombet	e4a9225f5d	[GISel][KnownBits] Give up on PHI analysis as soon as we don't know anything When analyzing PHIs, we gather the known bits for every operand and merge them together to get the known bits of the result of the PHI. It is not unusual that merging the information leads to know nothing on the result (e.g., phi a: i8 3, b: i8 unknown, ..., after looking at the second argument we know we will know nothing on the result), thus, as soon as we reach that state, stop analyzing the following operand (i.e., on the previous example, we won't process anything after looking at `b`). This improves compile time in particular with PHIs with a large number of operands. NFC.	2020-02-20 11:34:01 -08:00
Simon Pilgrim	f9c326364e	[DAGCombiner] Use SDValue::getConstantOperandAPInt helper where possible. NFC.	2020-02-20 18:23:05 +00:00
Simon Pilgrim	fc2b4a02b1	[DAGCombine] visitEXTRACT_VECTOR_ELT - add SimplifyDemandedBits multi use support Similar to what we already do with SimplifyDemandedVectorElts, call SimplifyDemandedBits across all the extracted elements of the source vector, treating it as single use. There's a minor regression in store-weird-sizes.ll which will be addressed in an upcoming SimplifyDemandedBits patch.	2020-02-20 15:49:38 +00:00
Sam Parker	659500c0c9	[NFC][RDA] Break-up initialization code Separate out the initialization code from the loop traversal so that the analysis can be reset and re-run by a user.	2020-02-20 14:59:42 +00:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure found on an ARM 2-stage buildbot. The investigation is needed.	2020-02-20 14:41:39 +01:00
Bill Wendling	129c911efa	Include static prof data when collecting loop BBs Summary: If the programmer adds static profile data to a branch---i.e. uses "__builtin_expect()" or similar---then we should honor it. Otherwise, "__builtin_expect()" is ignored in crucial situations. So we trust that the programmer knows what they're doing until proven wrong. Subscribers: hiraditya, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74809	2020-02-19 11:33:48 -08:00
Florian Hahn	c7fc0e5da6	Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check." This reverts commit `e01a3d49c2`. and commit `a6a585b803`. This causes a failure on GreenDragon: http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597	2020-02-19 19:37:08 +01:00
Florian Hahn	e01a3d49c2	[PatternMatch] Match XOR variant of unsigned-add overflow check. Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match the expected pattern in CodeGenPerpare via UAddWithOverflow. This causes a regression over Clang 7 on both X86 and AArch64: https://gcc.godbolt.org/z/juhXYV This patch extends UAddWithOverflow to also catch the XOR case, if the XOR is only used in the ICMP. This covers just a single case, but I'd like to make sure I am not missing anything before tackling the other cases. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D74228	2020-02-19 15:25:18 +01:00
Florian Hahn	216afd3301	[TargetLower] Update shouldFormOverflowOp check if math is used. On some targets, like SPARC, forming overflow ops is only profitable if the math result is used: https://godbolt.org/z/DxSmdB This patch adds a new MathUsed parameter to allow the targets to make the decision and defaults to only allowing it if the math result is used. That is the conservative choice. This patch also updates AArch64ISelLowering, X86ISelLowering, ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow ops if the math result is not used. On those targets using the overflow intrinsic for the overflow check only generates better code. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D74722	2020-02-19 11:28:33 +01:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
Aditya Nandakumar	b91d9ec0bb	[GlobalISel]: Fix some non determinism exposed in CSE due to not notifying observers about mutations + add verification for CSE https://reviews.llvm.org/D67133 While investigating some non determinism (CSE doesn't produce wrong code, it just doesn't CSE some times) in GISel CSE on an out of tree target, I realized that the core issue was that there were lots of code that mutates (setReg, setRegClass etc), but doesn't notify observers (CSE in this case but this could be any other observer). In order to make the Observer be available in various parts of code and to avoid having to thread it through various API, the MachineFunction now has the observer as field. This allows it to be easily used in helper functions such as constrainOperandRegClass. Also added some invariant verification method in CSEInfo which can catch these issues (when CSE is enabled).	2020-02-18 14:54:57 -08:00
Thomas Lively	9d37f5afac	[WebAssembly] Implement multivalue call_indirects Summary: Unlike normal calls, call_indirects have immediate arguments that caused a MachineVerifier failure without a small tweak to loosen the verifier's requirements for variadicOpsAreDefs instructions. One nice thing about the new call_indirects is that they do not need to participate in the PCALL_INDIRECT mechanism because their post-isel hook handles moving the function pointer argument and adding the flags and typeindex arguments itself. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74191	2020-02-18 13:49:46 -08:00
Thomas Lively	7b64a59060	Reland "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering" This reverts commit `649aba93a2`, now that the approach started there has been shown to be workable in the patch series culminating in https://reviews.llvm.org/D74192.	2020-02-18 13:49:46 -08:00
Simon Pilgrim	d6eef0614f	[TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC.	2020-02-18 19:53:50 +00:00
Huihui Zhang	8ee0e1dc02	[NFC] Silence compiler warning [-Wmissing-braces].	2020-02-18 10:37:12 -08:00
Sander de Smalen	8fbc925807	Add OffsetIsScalable to getMemOperandWithOffset Summary: Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo, has the effect that all places where this information is used (notably, TargetInstrInfo::getMemOperandWithOffset) will need to consider Scale - and derived, Offset - possibly being scalable. This patch adds a new operand `bool &OffsetIsScalable` to TargetInstrInfo::getMemOperandWithOffset and fixes up all the places where this function is used, to consider the offset possibly being scalable. In most cases, this means bailing out because the algorithm does not (or cannot) support scalable offsets in places where it does some form of alias checking for example. Reviewers: rovka, efriedma, kristof.beyls Reviewed By: efriedma Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72758	2020-02-18 15:53:29 +00:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
James Clarke	b3cd44f80b	Use SETNE directly rather than SUB/SETNE 0 for stack guard check Summary: Backends should fold the subtraction into the comparison, but not all seem to. Moreover, on targets where pointers are not integers, such as CHERI, an integer subtraction is not appropriate. Instead we should just compare the two pointers directly, as this should work everywhere and potentially generate more efficient code. Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish Reviewed By: lebedev.ri Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74454	2020-02-18 13:21:26 +00:00
Djordje Todorovic	a5ac8ca3e0	[CSInfo][TailDuplicator] Delete the call site info when removing dead MBBs This is needed for the debug entry values feature. Differential Revision: https://reviews.llvm.org/D74702	2020-02-18 12:29:51 +01:00
Jim Lin	466f8843f5	[NFC] Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h,td}	2020-02-18 10:49:13 +08:00
Vedant Kumar	3f148eabe0	[LiveDebugValues] Visit open var locs just once in transferRegisterDef, NFC For a file in WebKit, this brings the time spent in LiveDebugValues down from 16 minutes to 2 minutes. The reduction comes from iterating the set of open variable locations just once in transferRegisterDef. Post-patch, the most expensive item inside of transferRegisterDef is a call to VarLoc::isDescribedByReg, which we have to do. Testing: I built LNT using the Os-g cmake cache with & without this patch, then diffed the object files to verify there was no binary diff. rdar://59446577 Differential Revision: https://reviews.llvm.org/D74633	2020-02-17 14:04:22 -08:00
Matt Arsenault	0e2eb357e0	GlobalISel: Extend narrowing to G_ASHR	2020-02-17 10:42:59 -08:00
Matt Arsenault	8550859535	GlobalISel: Extend shift narrowing to G_SHL	2020-02-17 09:13:37 -08:00
Benjamin Kramer	564a9de28e	Hide implementation details. NFC>	2020-02-17 17:55:23 +01:00
Simon Pilgrim	a1585aec6f	[SelectionDAG] Expose the "getValidShiftAmount" helpers available. NFCI. These are going to be useful in TargetLowering::SimplifyDemandedBits, so expose these helpers outside of SelectionDAG.cpp Also add an getValidShiftAmountConstant early-out to getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant so we can use them for scalar cases as well.	2020-02-17 16:28:46 +00:00
Matt Arsenault	78d455adf0	GlobalISel: Add combine to narrow G_LSHR Produce an unmerge to a narrower type and introduce a narrower shift if needed. I wasn't sure if there was a better way to parameterize the target's preferred shift type for the GICombineRule, so manually call the combine helper.	2020-02-17 08:04:52 -08:00
Sander de Smalen	a7a96c726e	[AArch64] Implement passing SVE vectors by ref for AAPCS. Summary: This patch implements the part of the calling convention where SVE Vectors are passed by reference. This means the caller must allocate stack space for these objects and pass the address to the callee. Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71216	2020-02-17 15:20:28 +00:00
Sjoerd Meijer	dad5f00e3b	[DAGCombine] Combine pattern for REV16 This adds another pattern to the combiner for a case that we were not handling to generate the REV16 instruction for ARM/Thumb2 and a bswap+ror on X86. Differential Revision: https://reviews.llvm.org/D74032	2020-02-17 14:54:17 +00:00
Benjamin Kramer	5fc5c7db38	Strength reduce vectors into arrays. NFCI.	2020-02-17 15:37:35 +01:00
Fangrui Song	549b436beb	[MC] De-capitalize MCStreamer::Emit{Bundle,Addrsig}* etc So far, all non-COFF-related Emit* functions have been de-capitalized.	2020-02-15 09:11:48 -08:00
Simon Pilgrim	ce2b5f1569	Fix gcc9.2 -Winit-list-lifetime warning. NFCI. Reported by @lbenes (Luke Benes)	2020-02-15 16:48:51 +00:00
Fangrui Song	774971030d	[MCStreamer] De-capitalize EmitValue EmitIntValue{,InHex}	2020-02-14 23:08:40 -08:00
Fangrui Song	895cad1a13	[AsmPrinter][XRay] Omit unique ID for xray_instr_map and xray_fn_idx Follow-up for D74006.	2020-02-14 21:10:46 -08:00
Diogo Sampaio	8bc790f9e6	[AArch64][FPenv] Update chain of int to fp conversion Summary: When using strict fp, it is required to update the chain when performing integer type promotion of a operand to a integer to floating point conversion. Reviewers: craig.topper, john.brawn Reviewed By: craig.topper Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74597	2020-02-15 05:07:34 +00:00
Fangrui Song	f554e27224	[AsmPrinter] Omit unique ID for __patchable_function_entries sections Follow-up for D74006. When the integrated assembler is used, we use SHF_LINK_ORDER. The linked-to symbol is part of ELFSectionKey, thus we can omit the unique ID.	2020-02-14 20:54:54 -08:00
Fangrui Song	1dc16c752d	[MC] Add MCSection::NonUniqueID and delete one MCContext::getELFSection overload	2020-02-14 20:25:52 -08:00
Fangrui Song	6d2d589b06	[MC] De-capitalize another set of MCStreamer::Emit* functions Emit{ValueTo,Code}Alignment Emit{DTP,TP,GP}* EmitSymbolValue etc	2020-02-14 19:26:52 -08:00
Fangrui Song	a55daa1461	[MC] De-capitalize some MCStreamer::Emit* functions	2020-02-14 19:11:53 -08:00
Matt Arsenault	3bb0ff8341	GlobalISel: Remove unused function argument	2020-02-14 15:57:39 -08:00
Sean Fertile	b75692c30e	[AsmPrinter] Use the McASMInfo to determine if we need descriptors. In https://reviews.llvm.org/rG8b737688c21a9755cae14cb9343930e0882164ab I switched the condition gating the creation of the descriptor symbol from checking the MCAsmInfo if we need to support descriptors, to if the OS was AIX. Technically the 2 should be interchangeable: if we are targeting AIX then we need to emit XCOFF object files, and the MCAsmInfo must return true for needing function descriptors. This doesn't account for lit test with runsteps that only set the arch. Eg: test/CodeGen/XCore/section-name.ll which when run natively on AIX we end up with a target xcore-ibm-aix and needFunctionDescriptors is false. This patch reverts to using the MCAsmInfo and adds an assert that the target OS must be AIX since that is the only target using the descriptor hook. Differential Revision: https://reviews.llvm.org/D74622	2020-02-14 15:20:39 -05:00
Matt Arsenault	bfbfa18591	GlobalISel: Lower s64->s16 G_FPTRUNC This is more or less directly ported from the AMDGPU custom lowering for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES instead of creating shift/trunc to extract the two halves, and zexting an inverted compare instead of select_cc). This also does not include the fast math expansion the DAG which converts to f32 and then to f16. I think that belongs in a pre-legalize combine instead.	2020-02-14 10:46:58 -08:00
Volkan Keles	187686a22f	[GlobalISel] LegalizationArtifactCombiner: Fix a bug in tryCombineMerges Like COPY instructions explained in D70616, we don't check the constraints when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check if registers can be replaced, or a COPY instruction needs to be built. https://reviews.llvm.org/D70564	2020-02-14 10:45:58 -08:00
Alexandre Ganea	8404aeb56a	[Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket. == Background == Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads. By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to. This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market. == The problem == The heavyweight_hardware_concurrency() API was introduced so that only one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group". == The changes in this patch == To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO). When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead. The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware core will be used. When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once. Differential Revision: https://reviews.llvm.org/D71775	2020-02-14 10:24:22 -05:00
Fangrui Song	bcd24b2d43	[AsmPrinter][MCStreamer] De-capitalize EmitInstruction and EmitCFI*	2020-02-13 22:08:55 -08:00
Fangrui Song	1d49eb00d9	[AsmPrinter] De-capitalize all AsmPrinter::Emit* but EmitInstruction Similar to rL328848.	2020-02-13 17:06:24 -08:00
Vedant Kumar	3091049446	Add dbgs() output to help track down missing DW_AT_location bugs, NFC	2020-02-13 14:38:44 -08:00
Vedant Kumar	8e77b33b3c	[Local] Do not move around dbg.declares during replaceDbgDeclare replaceDbgDeclare is used to update the descriptions of stack variables when they are moved (e.g. by ASan or SafeStack). A side effect of replaceDbgDeclare is that it moves dbg.declares around in the instruction stream (typically by hoisting them into the entry block). This behavior was introduced in llvm/r227544 to fix an assertion failure (llvm.org/PR22386), but no longer appears to be necessary. Hoisting a dbg.declare generally does not create problems. Usually, dbg.declare either describes an argument or an alloca in the entry block, and backends have special handling to emit locations for these. In optimized builds, LowerDbgDeclare places dbg.values in the right spots regardless of where the dbg.declare is. And no one uses replaceDbgDeclare to handle things like VLAs. However, there doesn't seem to be a positive case for moving dbg.declares around anymore, and this reordering can get in the way of understanding other bugs. I propose getting rid of it. Testing: stage2 RelWithDebInfo sanitized build, check-llvm rdar://59397340 Differential Revision: https://reviews.llvm.org/D74517	2020-02-13 14:35:02 -08:00
Fangrui Song	0bc77a0f0d	[AsmPrinter] De-capitalize some AsmPrinter::Emit* functions Similar to rL328848.	2020-02-13 13:38:33 -08:00
Fangrui Song	0dce409cee	[AsmPrinter] De-capitalize Emit{Function,BasicBlock]* and Emit{Start,End}OfAsmFile	2020-02-13 13:22:49 -08:00
Matt Arsenault	de256478e6	GlobalISel: Don't use LLT references These should always be passed by value	2020-02-13 15:25:30 -05:00
Simon Pilgrim	32176133fa	Move FIXME to start of comment so visual studio actually tags it. NFC.	2020-02-13 14:28:50 +00:00
Serguei Katkov	a6f38b4697	[Statepoint] Remove redundant clear of call target on register Patchable statepoint is lowered into sequence of nops, so zeroed call target should not be on register. It is better to use getTargetConstant instead of getConstant to select zero constant for call target. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D74465	2020-02-13 10:25:50 +07:00
Fangrui Song	c662795b07	[AsmPrinter][ELF] Emit local alias for ExternalLinkage dso_local GlobalAlias	2020-02-12 17:08:22 -08:00
Guozhi Wei	369d086d78	[MBP] Partial tail duplication into hot predecessors Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors. This patch improves tail duplication in 3 aspects: A successor can be duplicated into part of its predecessors. A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only. If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors. Differential Revision: https://reviews.llvm.org/D73387	2020-02-12 15:22:33 -08:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect whats going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Clement Courbet	15488ff24b	[CodeGen] Fix the computation of the alignment of split stores. Summary: Right now the alignment of the lower half of a store is computed as align/2, which fails for unaligned stores (align = 1), and is overly pessimitic for, e.g. a 8 byte store aligned to 4 bytes. Fixes PR44851 Fixes PR44877 Reviewers: gchatelet, spatel, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74311	2020-02-12 10:37:30 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Nicolai Hähnle	07a5b849f7	SelectionDAG: Fix bug in ClusterNeighboringLoads Summary: The method attempts to find loads that can be legally clustered by looking for loads consuming the same chain glue token. However, the old code looks at _all_ users of values produced by the chain node -- including uses of the loaded/returned value of volatile loads or atomics. This could lead to circular dependencies which then failed during scheduling. With this change, we filter out users by getResNo, i.e. by which SDValue value they use, to ensure that we only look at users of the chain glue token. This appears to be a rather old bug, which is perhaps surprising. However, the test case is actually quite fragile (i.e., it is hidden by fairly small changes), and the test _must_ use volatile loads for the bug to manifest. Reviewers: arsenm, bogner, craig.topper, foad Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74253	2020-02-12 09:12:55 +01:00
Craig Topper	0daf9b8e41	[X86][LegalizeTypes] Add SoftPromoteHalf support STRICT_FP_EXTEND and STRICT_FP_ROUND This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses them to implement soft promotion for the half type. This is enough to provide basic support for __fp16 with strictfp. Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C is enabled.	2020-02-11 22:30:04 -08:00
lewis-revill	a6bd1256ce	[DebugInfo] Call site entries cannot be generated for FrameSetup calls Instructions marked as FrameSetup do not cause requestLabelAfterInsn to be called and so no such label is generated. Call instructions which require call site entries to be generated require this label to be present in order to calculate the return PC offset/address, but the check for whether the call instruction is marked as FrameSetup was not present. Therefore in the case where a call instruction is marked as FrameSetup, an assertion failure occurs if a call site entry is to be generated. This is the case with RISC-V's implementation of save/restore via library calls. Differential Revision: https://reviews.llvm.org/D71593	2020-02-11 21:23:18 +00:00
OCHyams	35e0ab647b	[DebugInfo][NFC] Fixup the UserValue methods to use FragmentInfo Fixup the UserValue methods to use FragmentInfo instead of DIExpression because the DIExpression is only ever used to get the to get the FragmentInfo. The DIExpression is meaningless in the UserValue class because each definition point added to a UserValue may have a unique DIExpression. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D74057	2020-02-11 10:20:24 +00:00
OCHyams	3aa33fde03	[DebugInfo][NFC] Rename the class DbgValueLocation to DbgVariableValue Rename the class DbgValueLocation to DbgVariableValue and instances from Loc to DbgValue. These names better express the new semantics introduced in D74053. The class previously represented a { Location } only. It now represents a { Location, DIExpression } pair which together describe a value. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D74055	2020-02-11 10:20:24 +00:00
OCHyams	1e40799324	[DebugInfo] Teach LDV how to handle identical variable fragments LiveDebugVariables uses interval maps to explicitly represent DBG_VALUE intervals. DBG_VALUEs are filtered into an interval map based on their { Variable, DIExpression }. The interval map will coalesce adjacent entries that use the same { Location }. Under this model, DBG_VALUEs which refer to the same bits of the same variable will be filtered into different interval maps if they have different DIExpressions which means the original intervals will not be properly preserved. This patch fixes the problem by using { Variable, Fragment } to filter the DBG_VALUEs into maps, and coalesces adjacent entries iff they have the same { Location, DIExpression } pair. The solution is not perfect because we see the similar issues appear when partially overlapping fragments are encountered, but is far simpler than a complete solution (i.e. D70121). Fixes: pr41992, pr43957 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D74053	2020-02-11 10:20:24 +00:00
Amara Emerson	067dd9c6b1	[GlobalISel][CallLowering] Use stripPointerCasts(). A downstream test exposed a simple logic bug with the manual pointer stripping code, fix that by just using stripPointerCasts() on the value. I don't think there's a way to expose this issue upstream.	2020-02-10 15:43:57 -08:00
Matt Arsenault	f270da6bfc	RegisterCoalescer: Add LaneMask to debug printing	2020-02-10 12:34:33 -08:00
diggerlin	aa86311e62	[AIX][XCOFF] Support Mergeable2ByteCString and Mergeable4ByteCString SUMMARY: The patch is enable to support Mergeable2ByteCString and Mergeable4ByteCString Reviewers: daltenty Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D74164	2020-02-10 14:45:54 -05:00
Sebastian Neubauer	7cddd15e56	[SelectionDAG] Optimize build_vector of truncates and shifts Add a simplification to fuse a manual vector extract with shifts and truncate into a bitcast. Unpacking and packing values into vectors is only optimized with extractelement instructions, not when manually unpacked using shifts and truncates. This patch simplifies shifts and truncates into a bitcast if possible. Simplify (build_vec (trunc $1) (trunc (srl $1 width)) (trunc (srl $1 (2 * width))) ...) to (bitcast $1) Differential Revision: https://reviews.llvm.org/D73892	2020-02-10 15:04:07 +01:00
Djordje Todorovic	3a4dc577c9	[CSInfo] Fix the assertions regarding updating the CSInfo The call site info was not updated correctly when deleting corresponding call instructions. Differential Revision: https://reviews.llvm.org/D73700	2020-02-10 10:55:06 +01:00
Djordje Todorovic	68908993eb	[CSInfo] Use isCandidateForCallSiteEntry() when updating the CSInfo Use the isCandidateForCallSiteEntry(). This should mostly be an NFC, but there are some parts ensuring the moveCallSiteInfo() and copyCallSiteInfo() operate with call site entry candidates (both Src and Dest should be the call site entry candidates). Differential Revision: https://reviews.llvm.org/D74122	2020-02-10 10:03:14 +01:00
Amara Emerson	21c9d9ad43	[GlobalISel][CallLowering] Tighten constantexpr check for callee. I'm not sure there's a test case for this, but it's better to be safe.	2020-02-09 22:59:48 -08:00
Matt Arsenault	312a9d1b83	GlobalISel: Fix narrowScalar for G_{CTLZ\|CTTZ}_ZERO_UNDEF Narrow these for 64-bit VALU for AMDGPU.	2020-02-09 19:02:38 -05:00
Matt Arsenault	6135f5eda4	GlobalISel: Fix narrowing of G_CTLZ/G_CTTZ The result type is separate from the source type.	2020-02-09 18:11:43 -05:00
Craig Topper	eeb63944e4	[LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results. Remove code from LegalizeTypes that allowed this to work. We were already using BUILD_PAIR for this in some places so this standardizes on a single way to do this.	2020-02-08 09:52:31 -08:00
Craig Topper	2af1640f9a	[LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF. Summary: For CTTZ we place a set bit just past where the non-promoted type stopped so the extended bits won't be used for the count. For CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in the original type and we end up counting into the extended bits. So we can just use ANY_EXTEND for both cases. This matches what is done in type legalization for these operations. We make no effort to force the upper bits to zero. Differential Revision: https://reviews.llvm.org/D74111	2020-02-07 22:25:56 -08:00
Amara Emerson	35c63d66aa	[GlobalISel][CallLowering] Look through bitcasts from constant function pointers. Calls to ObjC's objc_msgSend function are done by bitcasting the function global to the required function type signature. This patch looks through this bitcast so that we can do a direct call with bl on arm64 instead of using an indirect blr. Differential Revision: https://reviews.llvm.org/D74241	2020-02-07 15:32:54 -08:00
Vedant Kumar	0d0ef315cb	[MachineInstr] Add isCandidateForCallSiteEntry predicate Add the isCandidateForCallSiteEntry predicate to MachineInstr to determine whether a DWARF call site entry should be created for an instruction. For now, it's enough to have any call instruction that doesn't belong to a blacklisted set of opcodes. For these opcodes, a call site entry isn't meaningful. Differential Revision: https://reviews.llvm.org/D74159	2020-02-07 10:10:41 -08:00
Petar Avramovic	7df5fc9e03	[GlobalISel] Add buildMerge with SrcOp initializer list Allows more flexible use of buildMerge in places where use operands are available as SrcOp since it does not require explicit conversion to Register. Simplify code with new buildMerge. Differential Revision: https://reviews.llvm.org/D74223	2020-02-07 18:43:45 +01:00
Amara Emerson	28d22c2c9c	[GlobalISel][IRTranslator] Add special case support for ~memory inline asm clobber. This is a one off special case, since actually implementing full inline asm support will be much more involved. This lets us compile a lot more code as a common simple case. Differential Revision: https://reviews.llvm.org/D74201	2020-02-07 08:55:23 -08:00
Jinsong Ji	01edae1271	[AsmPrinter] Print FP constant in hexadecimal form instead Printing floating point number in decimal is inconvenient for humans. Verbose asm output will print out floating point values in comments, it helps. But in lots of cases, users still need additional work to covert the decimal back to hex or binary to check the bit patterns, especially when there are small precision difference. Hexadecimal form is one of the supported form in LLVM IR, and easier for debugging. This patch try to print all FP constant in hex form instead. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D73566	2020-02-07 16:00:55 +00:00
Matt Arsenault	3b198518ad	GlobalISel: Fix narrowing of G_CTPOP The result type is separate from the source type. Tests will be included in a future AMDGPU patch which uses this from RegBankSelect/applyMappingImpl.	2020-02-07 06:58:00 -08:00
Matt Arsenault	8de2dad9e0	GlobalISel: Fix lowering of G_CTLZ/G_CTTZ The type passed to lower was invalid, so I'm not sure how this was even working before. The source and destination type also do not have to match, so make sure to use the right ones.	2020-02-07 06:54:12 -08:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Sourabh Singh Tomar	84e5760a16	[DebugInfo]: Reorderd the emission of debug_str section. Summary: This patch reorders the emission of debug_str section, so that string can come after macros. This is necessary for macro forms like DW_MACRO_define_strp, which emits macro as a string in debug_str section.	2020-02-07 11:15:55 +05:30
Amara Emerson	ac8a12c874	[GlobalISel] Use G_ZEXTLOAD instead of an anyextending load for non-pow-2 legalization. Fixes PR43288	2020-02-06 14:36:36 -08:00
Konstantin Schwarz	76986bdc46	[GlobalISel] Legalize more G_FP(EXT\|TRUNC) libcalls. This adds a new helper function for retrieving the floating point type corresponding to the specified bit-width.	2020-02-06 11:41:34 -08:00
Fangrui Song	727362e87b	[MC][ELF] Rename MC related "Associated" to "LinkedToSym" "linked-to section" is used by the ELF spec. By analogy, "linked-to symbol" is a good name for the signature symbol. The word "linked-to" implies a directed edge and makes it clear its relation with "sh_link", while one can argue that "associated" means an undirected edge. Also, combine tests and add precise SMLoc to improve diagnostics. Reviewed By: eugenis, grimar, jhenderson Differential Revision: https://reviews.llvm.org/D74082	2020-02-06 11:31:04 -08:00
Jeremy Morse	6531a78ac4	Revert "[DebugInfo] Remove some users of DBG_VALUEs IsIndirect field" This reverts commit `ed29dbaafa`. I'm backing out D68945, which as the discussion for D73526 shows, doesn't seem to handle the -O0 path through the codegen backend correctly. I'll reland the patch when a fix is worked out, apologies for all the churn. The two parent commits are part of this revert too. Conflicts: llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp llvm/test/DebugInfo/X86/dbg-addr-dse.ll SelectionDAGBuilder conflict is due to a nearby change in `e39e2b4a79` that's technically unrelated. dbg-addr-dse.ll conflicted because `41206b61e3` (legitimately) changes the order of two lines. There are further modifications to dbg-value-func-arg.ll: it landed after the patch being reverted, and I've converted indirection to be represented by the isIndirect field rather than DW_OP_deref.	2020-02-06 14:41:40 +00:00
Jeremy Morse	ece761427f	Revert "[DebugInfo][DAG] Distinguish different kinds of location indirection" This reverts commit `3137fe4d23`. I'm backing out D68945, which this patch is a follow up for. It'll be re-landed when D68945 is fixed. The changes to dbg-value-func-arg.ll occur because our handling of certain kinds of location now mixes up indirection that happens at different points in a DIExpression. While this is a regression, it's a return to the prior behaviour while a better patch is sought.	2020-02-06 14:41:40 +00:00
Jeremy Morse	ed5998d21e	Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location" This reverts commit `2d3174c4df`. The overall solution for this problem is reverting D68945, which wasn't handling the -O0 path through the codegen backend correctly. See: discussion in D73526.	2020-02-06 14:41:39 +00:00
Sjoerd Meijer	93b0536fd2	[RDA] getInstFromId: find instructions. NFC. To find the instruction in the block for a given ID, first a count and then a lookup was performed in the map, which is almost the same thing, thus doing double the work. Differential Revision: https://reviews.llvm.org/D73866	2020-02-06 14:13:31 +00:00
Sam Parker	0a8cae10fe	[ReachingDefs] Make isSafeToMove more strict. Test that we're not moving the instruction through instructions with side-effects. Differential Revision: https://reviews.llvm.org/D74058	2020-02-06 14:06:08 +00:00
Diogo Sampaio	8ba2b62810	[ARM] Fix non-determenistic behaviour Summary: ARM Type Promotion pass does not clear the container that defines if one variable was visited or not, missing optimization opportunities by luck when two llvm:Values from different functions are allocated at the same memory address. Also fixes a comment and uses existing method to pop and obtain last element of the worklist. Reviewers: samparker Reviewed By: samparker Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73970	2020-02-06 09:21:13 +00:00
Matt Arsenault	7464e8d6ad	GlobalISel: Remove check for illegal MIR The verifier will catch this.	2020-02-05 18:37:17 -05:00
Jonas Paulsson	96ea377ea4	[PHIElimination] Compile time optimization for huge functions. This is a compile-time optimization for PHIElimination (splitting of critical edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As discussed there, the way to remedy the slowdowns with huge functions is to pre-compute the live-in registers for each MBB in an efficient way in PHIElimination.cpp and then pass that information along to LiveVariabless::addNewBlock(). In all the huge test programs where this slowdown has been noticable, it has dissapeared entirely with this patch. Review: Björn Pettersson, Quentin Colombet. Differential Revision: https://reviews.llvm.org/D73152	2020-02-05 18:10:03 -05:00
Matt Arsenault	9087ef0765	GlobalISel: Allow CSE of G_IMPLICIT_DEF The legalizer produces a lot of these, and they make reading legalized MIR annoying. For some reason, this does seem to sometimes introduce copies of implicit def, which is dumb.	2020-02-05 17:47:21 -05:00
Shu-Chun Weng	ce9633633c	[GlobalISel][AArch64] Fix contract cross-bank copies with SIMD instructions contractCrossBankCopyIntoStore() finds the instruction defines the source register and uses its output to replace the register. There are, however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES. Current implementation hardcodes to operand 0 and has no way of knowing which output should be used. This change adds another function to directly return the register that is the source of the register and use that for folding. This fixes https://bugs.llvm.org/show_bug.cgi?id=44783 Differential Revision: https://reviews.llvm.org/D74005	2020-02-05 10:38:35 -08:00
Simon Pilgrim	4592bb7195	visitINSERT_VECTOR_ELT - pull out repeated dyn_cast. NFCI. This always gets called at least once.	2020-02-05 13:30:54 +00:00
Djordje Todorovic	de90d73e03	[DebugInfo] Avoid the call site param for mem instrs with multiple defs We currently only handle mem instructions with a single define. Avoid the call site parameter debug info when we find the case with multiple defs, rather than throwing an assert. Differential Revision: https://reviews.llvm.org/D73954	2020-02-05 10:03:14 +01:00
Thomas Lively	649aba93a2	Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering" Summary: This reverts commit `3ef169e586`. The purpose of this commit was to allow stack machines to perform instruction selection for instructions with variadic defs. However, MachineInstrs fundamentally cannot support variadic defs right now, so this change does not turn out to be useful. Depends on D73927. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73928	2020-02-04 20:04:59 -08:00
David Blaikie	ec50e10db4	DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF Originally committed in: `1ced28cbe7` Reverted in: `f75301d16d` (reverted due to tests failing on non-linux/x86 targets, tests have since been generalized and specialized... since Split DWARF isn't supported on non-elf targets anyway and we have no way to run on "whatever elf target is available" so they fail on MacOS without an explicit target triple) This code was incorrectly emitting extra bytes into arbitrary parts of the object file when it was meant to be hashing them to compute the DWO ID. Follow-up patch(es) will refactor this API somewhat to make such bugs harder to introduce, hopefully.	2020-02-04 19:25:47 -08:00
Francis Visoiu Mistrih	7531a5039f	[Remarks] Extend the RemarkStreamer to support other emitters This extends the RemarkStreamer to allow for other emitters (e.g. frontends, SIL, etc.) to emit remarks through a common interface. See changes in llvm/docs/Remarks.rst for motivation and design choices. Differential Revision: https://reviews.llvm.org/D73676	2020-02-04 17:16:02 -08:00
Reid Kleckner	2d89e0a098	[SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr The CATCHPAD node mostly existed to be selected into the EH_RESTORE instruction, which sets the frame back up when 32-bit Windows exceptions return to the parent function. However, creating this MachineInstr early increases the risk that other passes will come along and insert instructions that use the stack before ESP and EBP are restored. That happened in PR44697. Instead of representing these in the instruction stream early, delay it until PEI. Mark the blocks where this needs to happen as EHPads, but not funclet entry blocks. Passes after PEI have to be careful not to hoist instructions that can use stack across frame setup instructions, so this should be relatively reliable. Fixes PR44697 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D73752	2020-02-04 15:13:12 -08:00
Matt Arsenault	23b76096b7	CodeGenPrepare: Reorder check for cold and shouldOptimizeForSize shouldOptimizeForSize is showing up in a profile, spending around 10% of the pass time in one function. This should probably not be so slow, but the much cheaper attribute check should be done first anyway.	2020-02-04 11:23:13 -08:00
Matt Arsenault	de8451fe4d	GlobalISel: Fold SmallVector resizes into constructors	2020-02-04 10:28:08 -08:00
Matt Arsenault	a3c814d234	Separately track input and output denormal mode AMDGPU and x86 at least both have separate controls for whether denormal results are flushed on output, and for whether denormals are implicitly treated as 0 as an input. The current DAGCombiner use only really cares about the input treatment of denormals.	2020-02-04 12:59:21 -05:00
Nico Weber	f75301d16d	Revert "DebugInfo: Check DW_OP_convert in loclists with Split DWARF" and follow-ups. This reverts commit `1ced28cbe7`. This reverts commit `4f281f0474`. This reverts commit `552a8fe12b`. The test fails on non-Linux.	2020-02-04 10:06:46 -05:00
Jeremy Morse	41206b61e3	[DebugInfo] Re-instate LiveDebugVariables scope trimming This patch reverts part of r362750 / D62650, which stopped LiveDebugVariables from trimming leading variable location ranges down to only covering those instructions that are in scope. I've observed some circumstances where the number of DBG_VALUEs in a function can be amplified in an un-necessary way, to cover more instructions that are out of scope, leading to very slow compile times. Trimming the range of instructions that the variables cover solves the slow compile times. The specific problem that r362750 tries to fix is addressed by the assignment to RStart that I've added. Any variable location that begins at the first instruction of a block will now be considered to begin at the start of the block. While these sound the same, the have different SlotIndexes, and the register allocator may shoehorn additional instructions in between the two. The test added in the past (wrong_debug_loc_after_regalloc.ll) still works with this modification. live-debug-variables.ll has a range trimmed to not cover the prologue of the function, while dbg-addr-dse.ll has a DBG_VALUE sink past one instruction with no DebugLoc, which is expected behaviour. Differential Revision: https://reviews.llvm.org/D73691	2020-02-04 14:51:06 +00:00
Filipe Cabecinhas	abada5036e	[NFC] Fix some spelling mistakes to test pushing to GH.	2020-02-04 11:07:31 +00:00
Simon Pilgrim	3dd688a9ee	[DAG] OptLevelChanger - fix uninitialized variable analyzer warning (PR44471) Ensure that OptLevelChanger::SavedFastISel is initialized in the constructor. This should be NFC - as the equivalent 'same opt level' early-out is used in the destructor as well, so SavedFastISel is only actually referenced in the general case. Differential Revision: https://reviews.llvm.org/D73875	2020-02-04 10:54:33 +00:00
David Green	362d00e051	[ARM][VecReduce] Force expand vector_reduce_fmin Under MVE, we do not have any lowering for fminimum, which a vector_reduce_fmin without NoNan will be expanded into. As with the other recent patches, force this to expand in the pre-isel pass. Note that Neon lowering would be OK because the scalar fminimum uses the vector VMIN instruction, but is probably better to just rely on the scalar operations, which is what is done here. Also fixes what appears to be the reversal of INF vs -INF in the vector_reduce_fmin widening code.	2020-02-04 09:36:59 +00:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
David Blaikie	1ced28cbe7	DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF This code was incorrectly emitting extra bytes into arbitrary parts of the object file when it was meant to be hashing them to compute the DWO ID. Follow-up patch(es) will refactor this API somewhat to make such bugs harder to introduce, hopefully.	2020-02-03 19:16:42 -08:00
David Blaikie	031f83fb82	DebugInfo: Simplify emitDebugLocEntry by never passing a null CU	2020-02-03 18:47:14 -08:00
Matt Arsenault	cd7650c186	GlobalISel: Implement fewerElementsVector for G_SEXT_INREG Start using a new strategy with a combination of merge and unmerges. This allows scalarizing before lowering, which in cases like <2 x s128> avoids producing giant illegal shifts.	2020-02-03 11:47:33 -08:00
Quentin Colombet	f26ff8c9df	[TargetRegisterInfo] Make the heuristic to skip region split overridable by the target RegAllocGreedy uses a fairly compile time intensive splitting heuristic called region splitting. This heuristic was disabled via another heuristic when it is likely that it won't be worth the compile time. The only way to control this other heuristic was via a command line option (huge-size-for-split). This commit gives more control on this heuristic by making it overridable by the target using a target hook in TargetRegisterInfo called shouldRegionSplitForVirtReg. The default implementation of this hook keeps the heuristic as it was before this patch.	2020-02-03 11:30:35 -08:00
Simon Pilgrim	61621f826a	[TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.	2020-02-03 16:50:04 +00:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
Guillaume Chatelet	fc19465965	[Alignment][NFC] Use Align for code creating MemOp Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73874	2020-02-03 14:10:30 +01:00
Guillaume Chatelet	75d9994a51	Fix broken invariant Summary: A Copy with a source that is zeros is the same as a Set of zeros. This fixes the invariant that SrcAlign should always be non-null. Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73791	2020-02-03 11:01:05 +01:00
Fangrui Song	44cdae68c3	[CodeGenPrepare] Delete dead !DL check Follow-up for D73754 DL is assigned in CodeGenPrepare::runOnFunction and is guaranteed to be non-null.	2020-02-02 09:49:06 -08:00
Fangrui Song	5a56a25b0b	[CodeGenPrepare] Make TargetPassConfig required The code paths in the absence of TargetMachine, TargetLowering or TargetRegisterInfo are poorly tested. As rL285987 said, requiring TargetPassConfig allows us to delete many (untested) checks littered everywhere. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D73754	2020-02-02 09:28:45 -08:00
Fangrui Song	5932f7b8f2	[PatchableFunction] Use an empty DebugLoc The current FirstMI.getDebugLoc() is actually null in almost all cases. If it isn't, the generated .loc will be considered initial. The .loc will have the prologue_end flag and terminate the prologue prematurely. Also use an overload of BuildMI that will not prepend PATCHABLE_FUNCTION_ENTRY to a MachineInstr bundle.	2020-02-01 14:12:06 -08:00
Craig Topper	943b5561d6	[LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops. This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes. It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from. This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle. Differential Revision: https://reviews.llvm.org/D73749	2020-02-01 11:21:04 -08:00
Matt Arsenault	bc101ffd77	GlobalISel: Support widening unmerge results with pointer source	2020-02-01 10:47:03 -05:00
David Blaikie	338beff4dc	DwarfDebug.cpp: Fix some indentation	2020-01-31 16:01:57 -08:00
David Blaikie	b33e5f3c3e	DebugInfo: Split DWARF: Hash non-member function child DIEs Significant missing hashing - as per the comment this was only meant to skip member functions (unspecified, but I think it's legible as member function declarations, not definitions) but was skipping all named subprograms (so only hashed child DIEs for member function definitions - because they didn't have a direct name, but only a name given indirectly in the DW_AT_specification-referenced DIE)	2020-01-31 15:32:03 -08:00
Matt Arsenault	792d9b5719	DAG: Check if a value is divergent before requiresUniformRegister This avoids a potentially expensive scan if we already know it doesn't matter.	2020-01-31 15:27:18 -08:00
Jay Foad	f465b1aff4	[GlobalISel] Tweak lowering of G_SMULO/G_UMULO Summary: Applying this cleanup: - MIRBuilder.buildInstr(TargetOpcode::G_ASHR) - .addDef(Shifted) - .addUse(Res) - .addUse(ShiftAmt); + MIRBuilder.buildAShr(Shifted, Res, ShiftAmt); caused an assertion failure here: llc: /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:404: llvm::MachineInstr *llvm::MachineRegisterInfo::getVRegDef(unsigned int) const: Assertion `(I.atEnd() \|\| std::next(I) == def_instr_end()) && "getVRegDef assumes a single definition or no definition"' failed. #4 0x00000000050a6d96 in llvm::MachineRegisterInfo::getVRegDef (this=0x74606a0, Reg=2147483650) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:403 #5 0x00000000066148f6 in llvm::getConstantVRegValWithLookThrough (VReg=2147483650, MRI=..., LookThroughInstrs=false, HandleFConstant=true) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:244 #6 0x00000000066147da in llvm::getConstantVRegVal (VReg=2147483650, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:210 #7 0x0000000006615367 in llvm::ConstantFoldBinOp (Opcode=101, Op1=2147483650, Op2=2147483656, MRI=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/Utils.cpp:341 #8 0x000000000657eee0 in llvm::CSEMIRBuilder::buildInstr (this=0x7465010, Opc=101, DstOps=..., SrcOps=..., Flag=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/CSEMIRBuilder.cpp:160 #9 0x0000000003645958 in llvm::MachineIRBuilder::buildAShr (this=0x7465010, Dst=..., Src0=..., Src1=..., Flags=...) at /home/jayfoad2/git/llvm-project/llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h:1298 #10 0x00000000065c35b1 in llvm::LegalizerHelper::lower (this=0x7fffffffb5f8, MI=..., TypeIdx=0, Ty=...) at /home/jayfoad2/git/llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2020 because at this point there are two instructions defining Res: the original G_SMULO/G_UMULO and the new G_MUL that we built. The fix is to modify the original mul in place, so that there is only ever one definition of Res. Reviewers: arsenm, aditya_nandakumar Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72842	2020-01-31 19:21:01 +00:00
Simon Pilgrim	8fbc7fd567	[DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors If we don't demand any elements of the inserted subvector then just skip it.	2020-01-31 18:57:22 +00:00
Simon Pilgrim	5702dadf6f	[DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.	2020-01-31 18:02:34 +00:00
Hiroshi Yamauchi	ac8da31a0f	[PGO][PGSO] Handle MBFIWrapper Some code gen passes use MBFIWrapper to keep track of the frequency of new blocks. This was not taken into account and could lead to incorrect frequencies as MBFI silently returns zero frequency for unknown/new blocks. Add a variant for MBFIWrapper in the PGSO query interface. Depends on D73494.	2020-01-31 09:36:55 -08:00
Jay Foad	2a1b5af299	[GlobalISel] Tidy up unnecessary calls to createGenericVirtualRegister Summary: As a side effect some redundant copies of constant values are removed by CSEMIRBuilder. Reviewers: aemerson, arsenm, dsanders, aditya_nandakumar Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, hiraditya, jrtc27, atanasyan, volkan, Petar.Avramovic, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73789	2020-01-31 17:07:16 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Quentin Colombet	cfebd77742	[GISel][KnownBits] Fix a bug where we could run out of stack space One of the exit criteria of computeKnownBits is whether we reach the max recursive call depth. Before this patch we would check that the depth is exactly equal to max depth to exit. Depth may get bigger than max depth if it gets passed to a different GISelKnownBits object. This may happen when say a generic part uses a GISelKnownBits object with some max depth, but then we hit TL.computeKnownBitsForTargetInstr which creates a new GISelKnownBits object with a different and smaller depth. In that situation, when we hit the max depth check for the first time in the target specific GISelKnownBits object, depth may already be bigger than the current max depth. Hence we would continue to compute the known bits, until we ran through the full depth of the chain of computation or ran out of stack space. For instance, let say we have GISelKnownBits Info(/MaxDepth/ = 10); Info.getKnownBits(Foo) // 9 recursive calls to computeKnownBitsImpl. // Then we hit a target specific instruction. // The target specific GISelKnownBits does this: GISelKnownBits TargetSpecificInfo(/MaxDepth/ = 6) TargetSpecificInfo.computeKnownBitsImpl() // <-- next max depth checks would // always return false. This commit does not have any test case, none of the in-tree targets use computeKnownBitsForTargetInstr.	2020-01-30 19:30:39 -08:00
Leonard Chan	2d3174c4df	[SafeStack][DebugInfo] Insert DW_OP_deref in correct location This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585 where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt symbols when debugging. This is an attempt to reland with a few fixes for buildbot since I haven't merged from master in a bit. Differential Revision: https://reviews.llvm.org/D73526	2020-01-30 17:09:42 -08:00
Amara Emerson	84bd851108	[GlobalISel][IRTranslator] When translating vector geps, splat the base pointer if required. We can have geps that have a scalar base pointer, and a vector index value, which means that the base pointer must be splatted into a vector of pointers. This fixes crashes on arm64 GlobalISel with optimizations enabled.	2020-01-30 16:27:27 -08:00
Leonard Chan	3b23453b6c	Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location" This reverts commit `fff6a1b0f1`. This was breaking a bunch of buildbots.	2020-01-30 16:18:41 -08:00
Leonard Chan	fff6a1b0f1	[SafeStack][DebugInfo] Insert DW_OP_deref in correct location This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585 where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt symbols when debugging. Differential Revision: https://reviews.llvm.org/D73526	2020-01-30 15:58:37 -08:00
Matt Arsenault	eb7f74e300	CodeGen: Use Register	2020-01-30 15:01:56 -08:00
Sean Fertile	8b737688c2	[AIX] Minor cleanup in AsmPrinter. [NFC] - Extends the comments related to function descriptors, noting how they are only used on AIX. - Changes the condition used to gate the creation of the current function symbol in AsmPrinter::SetupMachineFunction to reflect being AIX specific. The creation of the symbol is different because of AIXs linkage conventions, not because AIX uses function descriptors. Differential Revision: https://reviews.llvm.org/D73115	2020-01-30 14:15:02 -05:00
Fangrui Song	06b8e32d4f	[AArch64] -fpatchable-function-entry=N,0: place patch label after BTI Summary: For -fpatchable-function-entry=N,0 -mbranch-protection=bti, after `9a24488cb6`, we place the NOP sled after the initial BTI. ``` .Lfunc_begin0: bti c nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lfunc_begin0 ``` This patch adds a label after the initial BTI and changes the __patchable_function_entries entry to reference the label: ``` .Lfunc_begin0: bti c .Lpatch0: nop nop .section __patchable_function_entries,"awo",@progbits,f,unique,0 .p2align 3 .xword .Lpatch0 ``` This placement is compatible with the resolution in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424 . A local linkage function whose address is not taken does not need a BTI. Placing the patch label after BTI has the advantage that code does not need to differentiate whether the function has an initial BTI. Reviewers: mrutland, nickdesaulniers, nsz, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73680	2020-01-30 11:11:52 -08:00
Matt Arsenault	ea956685a1	GlobalISel: Implement s32->s64 G_FPTOSI lowering Port directly from DAG version. The lowering for G_FPTOUI used to fail on AMDGPU because it uses G_FPTOSI.	2020-01-30 08:47:07 -05:00
Dominik Montada	dc141af755	[GlobalISel] (fix) Use pointer type size for offset constant when lowering stores Commit `9965b12fd1` was supposed to change the offset constant when lowering load/stores, but only introduced this change for loads. This patch adds the same fix for stores.	2020-01-30 08:32:35 -05:00
Simon Pilgrim	57b0d33224	[DAGCombiner] ISD::AND/OR/XOR - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-30 12:02:53 +00:00
Simon Pilgrim	a967aa2706	[DAGCombiner] ISD::SDIV/UDIV/SREM/UREM - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-30 12:02:52 +00:00
Matt Arsenault	c5fffa4da3	GlobalISel: Add observer argument to legalizeIntrinsic This is passed to legalizeCustom, but not intrinsic. Also remove the MRI argument, since you can get that from the MachineIRBuilder. I'm not sure why MachineIRBuilder has a private observer member, and this is passed separately.	2020-01-29 18:33:45 -05:00
Amara Emerson	c12f046eb9	[GlobalISel] Add new combine to convert scalar G_MUL to G_SHL. For pow2 constants we should use G_SHL for pattern matching (and perf) purposes later. Vector support not yet implemented. Differential Revision: https://reviews.llvm.org/D73659	2020-01-29 13:39:00 -08:00
Amara Emerson	0da937bb5c	[GlobalISel][IRTranslator] Follow convention and put constant offset of getelementptr arithmetic on RHS. We were needlessly putting known constant values on the LHS of a G_MUL, which is suboptimal. Differential Revision: https://reviews.llvm.org/D73650	2020-01-29 11:37:19 -08:00
Fangrui Song	8903e61b66	[AsmPrinter][ELF] Define local aliases (.Lfoo$local) for GlobalObjects For `MC_GlobalAddress` operands referencing certain GlobalObjects, we can lower them to STB_LOCAL aliases to avoid costs brought by assembler/linker's conservative decisions about symbol interposition: * An assembler conservatively assumes a global default visibility symbol interposable (ELF semantics). So relocations in object files are needed even if the code generator assumed the definition exact and non-interposable. * The relocations can cause the creation of PLT entries on some targets for -shared links. A linker conservatively assumes a global default visibility symbol interposable (if not otherwise constrained by -Bsymbolic/--dynamic-list/VER_NDX_LOCAL/etc). "certain" refers to GlobalObjects in the intersection of `hasExactDefinition() and !isInterposable()`: `external`, `appending`, `internal`, `private`. Local linkages (`internal` and `private`) cannot be interposed. `appending` is for very few objects LLVM interpret specially. So the set just includes `external`. This patch emits STB_LOCAL aliases (.Lfoo$local) for such GlobalObjects, so that targets can lower MC_GlobalAddress operands to STB_LOCAL aliases if applicable. We may extend the scope and include GlobalAlias in the future. LLVM's existing -fno-semantic-interposition behaviors give us license to do such optimizations: * Various optimizations (ipconstprop, inliner, sccp, sroa, etc) treat normal ExternalLinkage GlobalObjects as non-interposable. * Before D72197, MC resolved a PC-relative VK_None fixup to a non-local symbol at assembly time (no outstanding relocation), if the target is defined in the same section. Put it simply, even if IR optimizations failed to optimize and allowed interposition for the function call in `void foo() {} void bar() { foo(); }`, the assembler would disallow it. This patch sets up AsmPrinter infrastructure to make -fno-semantic-interposition more so. With and without the patch, the object file output should be identical: `.Lfoo$local` does not take a symbol table entry. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D73228	2020-01-29 10:58:43 -08:00
Simon Pilgrim	f7245ef897	[DAGCombiner] ISD::SHL/SRA/SRL - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-29 18:49:42 +00:00
Adrian Prantl	18dbe1b279	Run clang-format on DwarfExpression (NFC)	2020-01-29 10:23:12 -08:00
Adrian Prantl	816ee8a423	DwarfExpression: Factor out getOrCreateBaseType() (NFC)	2020-01-29 10:23:12 -08:00
Simon Pilgrim	25b8e96388	[DAGCombiner] ISD::MUL - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-29 17:26:22 +00:00
Simon Pilgrim	4b04e11735	[DAGCombiner] Sub/SUBSAT - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us.	2020-01-29 16:57:13 +00:00
Simon Pilgrim	48bd6a0986	[DAGCombiner] visitIMINMAX - use general SelectionDAG::FoldConstantArithmetic This handles all the constant splat / opaque testing for us instead of the ConstantSDNode variant where we have to do it ourselves.	2020-01-29 16:57:13 +00:00
Matt Arsenault	b63629a58d	GlobalISel: Fix mask computation in lowerInsert This is supposed to be the high bit index, not the width. Use the wrapping form of getBitsSet and avoid the bitflip.	2020-01-29 08:25:36 -08:00
Jay Foad	0d7bd34312	[MachineScheduler] Ignore artificial edges when forming store chains Summary: BaseMemOpClusterMutation::apply forms store chains by looking for control (i.e. non-data) dependencies from one mem op to another. In the test case, clusterNeighboringMemOps successfully clusters the loads, and then adds artificial edges to the loads' successors as described in the comment: // Copy successor edges from SUa to SUb. Interleaving computation // dependent on SUa can prevent load combining due to register reuse. The effect of this is that data dependencies from one load to a store are copied as artificial dependencies from a different load to the same store. Then when BaseMemOpClusterMutation::apply looks at the stores, it finds that some of them have a control dependency on a previous load, which breaks the chains and means that the stores are not all considered part of the same chain and won't all be clustered. The fix is to only consider non-artificial control dependencies when forming chains. Subscribers: MatzeB, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71717	2020-01-29 16:23:01 +00:00
Matt Arsenault	f717483acd	GlobalISel: Assert on invalid bitcast in MIRBuilder The other casts validate, so this should too.	2020-01-29 07:49:39 -08:00
Matt Arsenault	c5c1bb3374	GlobalISel: Lower G_WRITE_REGISTER	2020-01-29 06:48:24 -08:00
David Stenberg	6a2413c435	[ARM64] Debug info for structure argument missing DW_AT_location Summary: Prevent eliminating dbg_val due to COPY. Fixes this https://bugs.llvm.org/show_bug.cgi?id=40709 Patch by: Kamlesh Kumar (kamleshbhalui) Reviewers: aprantl, dblaikie, vsk, dsanders Reviewed By: dsanders Subscribers: dstenb, kristof.beyls, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D73159	2020-01-29 10:56:23 +01:00
Sam Parker	ac30ea2f87	[RDA][ARM] Move functionality into RDA Add several new helpers to RDA: - hasLocalDefBefore - isRegDefinedAfter - isSafeToDefRegAt And move two bits of logic from ARMLowOverheadLoops into RDA: - isSafeToMove - isSafeToRemove Both of these have some wrappers too to make them more convienent to use. Differential Revision: https://reviews.llvm.org/D73460	2020-01-29 03:27:47 -05:00
Benjamin Kramer	adcd026838	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Michael Spang	a2fb2c0ddc	[GlobalMerge] Preserve symbol visibility when merging globals Symbols created for merged external global variables have default visibility. This can break programs when compiling with -Oz -fvisibility=hidden as symbols that should be hidden will be exported at link time. Differential Revision: https://reviews.llvm.org/D73235	2020-01-28 13:26:18 -08:00
Hiroshi Yamauchi	2c03c899d5	[MBFI] Move BranchFolding::MBFIWrapper to its own files. NFC. Summary: To avoid header file circular dependency issues in passing updated MBFI (in MBFIWrapper) to the interface of profile guided size optimizations. A prep step for (and split off of) D73381. Reviewers: davidxl Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73494	2020-01-28 10:58:46 -08:00
Sam Parker	7ad879caa0	[NFC][RDA] typedef SmallPtrSetImpl<MachineInstr*>	2020-01-28 13:15:44 +00:00
Wang, Pengfei	3239b5034e	[FPEnv] Add pragma FP_CONTRACT support under strict FP. Summary: Support pragma FP_CONTRACT under strict FP. Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3 Subscribers: hiraditya, jdoerfert, cfe-commits, llvm-commits, LuoYuanke Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72820	2020-01-28 20:43:43 +08:00
Guillaume Chatelet	879c825cb8	[instrinsics] Add @llvm.memcpy.inline instrinsics Summary: This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++. A follow up CL will add the intrinsics in Clang. Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71710	2020-01-28 09:42:01 +01:00
Fangrui Song	c7c5da6df3	Reland "[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()"" Reland `7a8b0b1595`, with a fix that checks `!E.value().empty()` to avoid inserting a zero to SlotRemap. Debugged by rnk@ in https://bugs.chromium.org/p/chromium/issues/detail?id=1045650#c33 Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D73510	2020-01-27 15:58:49 -08:00
Jay Foad	cbbbd5b5f6	[GlobalISel] Make use of KnownBits::computeForAddSub Summary: This is mostly NFC. computeForAddSub may give more precise results in some cases, but that doesn't seem to affect any existing GlobalISel tests. Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73431	2020-01-27 22:22:56 +00:00
Simon Pilgrim	e7e043724e	[DAG] Enable ISD::EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow. Differential Revision: This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.	2020-01-27 21:17:47 +00:00
Adrian Prantl	a095d149c2	Fix an assertion failure in DwarfExpression's subregister composition This patch fixes an assertion failure in DwarfExpression that is triggered when a complex fragment has exactly the size of a subregister of the register the DBG_VALUE points to and there is no DWARF encoding for the super-register. I took the opportunity to replace/document some magic values with static constructor functions to make this code less confusing to read. rdar://problem/58489125 Differential Revision: https://reviews.llvm.org/D72938	2020-01-27 12:44:37 -08:00
Vedant Kumar	e08f205f5c	Reland (again): [DWARF] Allow cross-CU references of subprogram definitions This is a revert-of-revert (i.e. this reverts commit `802bec89`, which itself reverted `fa4701e1` and `79daafc9`) with a fix folded in. The problem was that call site tags weren't emitted properly when LTO was enabled along with split-dwarf. This required a minor fix. I've added a reduced test case in test/DebugInfo/X86/fission-call-site.ll. Original commit message: This allows a call site tag in CU A to reference a callee DIE in CU B without resorting to creating an incomplete duplicate DIE for the callee inside of CU A. We already allow cross-CU references of subprogram declarations, so it doesn't seem like definitions ought to be special. This improves entry value evaluation and tail call frame synthesis in the LTO setting. During LTO, it's common for cross-module inlining to produce a call in some CU A where the callee resides in a different CU, and there is no declaration subprogram for the callee anywhere. In this case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram in order to fill in the call site tag. That empty 'definition' defeats entry value evaluation etc., because the debugger can't figure out what it means. As a follow-up, maybe we could add a DWARF verifier check that a DW_TAG_subprogram at least has a DW_AT_name attribute. Update #1: Reland with a fix to create a declaration DIE when the declaration is missing from the CU's retainedTypes list. The declaration is left out of the retainedTypes list in two cases: 1) Re-compiling pre-r266445 bitcode (in which declarations weren't added to the retainedTypes list), and 2) Doing LTO function importing (which doesn't update the retainedTypes list). It's possible to handle (1) and (2) by modifying the retainedTypes list (in AutoUpgrade, or in the LTO importing logic resp.), but I don't see an advantage to doing it this way, as it would cause more DWARF to be emitted compared to creating the declaration DIEs lazily. Update #2: Fold in a fix for call site tag emission in the split-dwarf + LTO case. Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a ReleaseLTO-g build of the test suite. rdar://46577651, rdar://57855316, rdar://57840415, rdar://58888440 Differential Revision: https://reviews.llvm.org/D70350	2020-01-27 10:52:34 -08:00
Nico Weber	68051c1224	Revert "[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager()" This reverts commit `7a8b0b1595`. It seems to break exception handling on 32-bit Windows, see https://crbug.com/1045650	2020-01-27 11:22:33 -05:00
Dominik Montada	9965b12fd1	Use pointer type size for offset constant when lowering load/stores	2020-01-27 06:55:32 -08:00
Matt Arsenault	2a160ba5b0	GlobalISel: Reimplement widenScalar for G_UNMERGE_VALUES results Only use shifts if the requested type exactly matches the source type, and create sub-unmerges otherwise.	2020-01-27 06:18:26 -08:00
Matt Arsenault	06d9230fef	GlobalISel: Translate vector GEPs	2020-01-27 05:35:05 -08:00
Igor Kudrin	8f3d47c54a	[DWARF] Do not pass Version to DWARFExpression. NFCI. The Version was used only to determine the size of an operand of DW_OP_call_ref. The size was 4 for all versions apart from 2, but the DW_OP_call_ref operation was introduced only in DWARF3. Thus, the code may be simplified and using of Version may be eliminated. Differential Revision: https://reviews.llvm.org/D73264	2020-01-27 19:08:46 +07:00
David Stenberg	13d4ef9ac0	Improvements to call site register worklist Summary: This fixes PR44118. For cases where we have a chain like this: R8 = R1 (entry value) R0 = R8 call @foo R0 the code that emits call site entries using entry values would not follow that chain, instead emitting a call site entry with R8 as location rather than R0. Such a case was discovered when originally adding dbgcall-site-orr-moves.mir. This patch fixes that issue. This is done by changing the ForwardedRegWorklist set to a map in which the worklist registers always map to the parameter registers that they describe. Another thing this patch fixes is that worklist registers now can describe more than one parameter register at a time. Such a case occurred in dbgcall-site-interpretation.mir, resulting in a call site entry not being emitted for one of the parameters. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D73168	2020-01-27 12:41:42 +01:00
David Stenberg	b46baa82fc	Don't separate imp/expl def handling for call site params Summary: Since D70431 the describeLoadedValue() hook takes a parameter register, meaning that it can now be asked to describe any register. This means that we can drop the difference between explicit and implicit defines that we previously had in collectCallSiteParameters(). I have not found any case for any upstream targets where a parameter register is only implicitly defined, and does not overlap with any explicit defines. I don't know if such a case would even make sense. So as far as I have tested, this patch should be a non-functional change. However, this reduces the complexity of the code a bit, and it will simplify the implementation of an upcoming patch which solves PR44118. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D73167	2020-01-27 11:31:09 +01:00
Petar Avramovic	cbf03aee6d	[MIPS GlobalISel] Select population count (popcount) G_CTPOP is generated from llvm.ctpop.<type> intrinsics, clang generates these intrinsics from __builtin_popcount and __builtin_popcountll. Add lower and narrow scalar for G_CTPOP. Lower G_CTPOP for MIPS32. Differential Revision: https://reviews.llvm.org/D73216	2020-01-27 09:59:50 +01:00
Petar Avramovic	8bc7ba5b9e	[MIPS GlobalISel] Select count trailing zeros llvm.cttz.<type> intrinsic has additional i1 argument is_zero_undef, it tells whether zero as the first argument produces a defined result. G_CTTZ is generated from llvm.cttz.<type> (<type> <src>, i1 false) intrinsics, clang generates these intrinsics from __builtin_ctz and __builtin_ctzll. G_CTTZ_ZERO_UNDEF comes from llvm.cttz.<type> (<type> <src>, i1 true). Clang generates such intrinsics as parts of expansion of builtin_ffs and builtin_ffsll. It is also traditionally part of and many algorithms that are now predicated on avoiding zero-value inputs. Add narrow scalar (algorithm uses G_CTTZ_ZERO_UNDEF) for G_CTTZ. Lower G_CTTZ and G_CTTZ_ZERO_UNDEF for MIPS32. Differential Revision: https://reviews.llvm.org/D73215	2020-01-27 09:51:06 +01:00
Petar Avramovic	2b66d32f3f	[MIPS GlobalISel] Select count leading zeros llvm.ctlz.<type> intrinsic has additional i1 argument is_zero_undef, it tells whether zero as the first argument produces a defined result. MIPS clz instruction returns 32 for zero input. G_CTLZ is generated from llvm.ctlz.<type> (<type> <src>, i1 false) intrinsics, clang generates these intrinsics from __builtin_clz and __builtin_clzll. G_CTLZ_ZERO_UNDEF can also be generated from llvm.ctlz with true as second argument. It is also traditionally part of and many algorithms that are now predicated on avoiding zero-value inputs. Add narrow scalar for G_CTLZ (algorithm uses G_CTLZ_ZERO_UNDEF). Lower G_CTLZ_ZERO_UNDEF and select G_CTLZ for MIPS32. Differential Revision: https://reviews.llvm.org/D73214	2020-01-27 09:43:38 +01:00
Fangrui Song	941f20c3bd	[MachineVerifier] Simplify and delete LLVM_VERIFY_MACHINEINSTRS from a comment. NFC The environment variable has been unused since r228079.	2020-01-27 00:31:23 -08:00
Wang, Pengfei	17b8f96d65	[FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION, and macro FUNCTION likewise. NFCI. Some functions like fmuladd don't really have a node, we should divide the declaration form those have node to avoid introducing fake nodes. Differential Revision: https://reviews.llvm.org/D72871	2020-01-27 10:38:05 +08:00
Simon Pilgrim	4a5f9d9faf	[TargetLowering] Respect recursive depth in SimplifyDemandedBits call to ComputeNumSignBits	2020-01-26 10:01:56 +00:00
Simon Pilgrim	3daa71ee00	[SelectionDAG] ComputeNumSignBits - add DemandedElts support for MIN/MAX ops	2020-01-25 20:21:14 +00:00
Simon Pilgrim	3f8916b2e8	[SelectionDAG] ComputeNumSignBits - add support for rotate non-uniform vector amounts	2020-01-25 19:15:05 +00:00
Simon Pilgrim	e3c26a9d1b	[SelectionDAG] ComputeNumSignBits - add support for rotate uniform vector amounts	2020-01-25 18:55:47 +00:00
Simon Pilgrim	c8de7c8f50	[TargetLowering] SimplifyDemandedBits - Remove ashr if all our demandedbits already match the sign bit Differential Revision: https://reviews.llvm.org/D73412	2020-01-25 17:36:46 +00:00
Vedant Kumar	802bec8961	Revert "Reland: [DWARF] Allow cross-CU references of subprogram definitions" ... as well as: Revert "[DWARF] Defer creating declaration DIEs until we prepare call site info" This reverts commit `fa4701e197`. This reverts commit `79daafc903`. There have been reports of this assert getting hit: CalleeDIE && "Could not find DIE for call site entry origin	2020-01-24 18:07:54 -08:00
Quentin Colombet	5d87b5d202	[GISelKnownBits] Add support for PHIs Teach the GISelKnowBits analysis how to deal with PHI operations. PHIs are essentially COPYs happening on edges, so we can just reuse the code for COPY. This is NFC COPY-wise has we leave Depth untouched when calling computeKnownBitsImpl for COPYs, like it was before this patch. Increasing Depth is however required for PHIs as they may loop back to themselves and we would end up in an infinite loop if we were not increasing Depth. Differential Revision: https://reviews.llvm.org/D73317	2020-01-24 16:43:52 -08:00
@justice_adams (Justice Adams)	daee63f974	[SelectionDag] Updated FoldConstantArithmetic method signature in preparation for merge with FoldConstantVectorArithmetic Updated FoldConstantArithmetic method signature to match that of FoldConstantVectorArithmetic in preparation for merging the two functions together https://bugs.llvm.org/show_bug.cgi?id=36544 This is the first step in combining the various FoldConstantVectorArithmetic and FoldConstantVectorArithmetic functions into one FoldConstantArithmetic function. Differential Revision: https://reviews.llvm.org/D72870	2020-01-24 18:00:58 -05:00
Craig Topper	d3bf06bc81	[DAGCombiner] Add combine for (not (strict_fsetcc)) to create a strict_fsetcc with the opposite condition. Unlike the existing code that I modified here, I only handle the case where the strict_fsetcc has a single use. Not sure exactly how to handle multiples uses. Testing this on X86 is hard because we already have a other combines that get rid of lowered version of the integer setcc that this xor will eventually become. So this combine really just saves a bunch of extra nodes being created. Not sure about other targets. Differential Revision: https://reviews.llvm.org/D71816	2020-01-24 14:15:36 -08:00
Stanislav Mekhanoshin	be8e38cbd9	Correct NumLoads in clustering Scheduler sends NumLoads argument into shouldClusterMemOps() one less the actual cluster length. So for 2 instructions it will pass just 1. Correct this number. This is NFC for in tree targets. Differential Revision: https://reviews.llvm.org/D73292	2020-01-24 12:45:28 -08:00
Stanislav Mekhanoshin	7a94d4f4ee	Allow combining of extract_subvector to extract element Differential Revision: https://reviews.llvm.org/D73132	2020-01-24 10:50:26 -08:00
Fangrui Song	50a3ff30e1	[PatchableFunction] Allow empty entry MachineBasicBlock Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D73301	2020-01-24 09:42:48 -08:00
Tom Weaver	f5147765ba	[DebugInfo][LiveDebugValues] Teach Live Debug Values About Meta Instructions Previously LiveDebugValues pass would consider meta instructions that 'fiddle' with liveness of registers as register definitions when transfering register defs. This would mean that, for example, a KILL instruction would cause LiveDebugValues to terminate the range of an earlier DBG_VALUE instruction resulting in the none propogation of said DBG_VALUE instructions into later blocks. This patch adds the check and a helpful comment, fixes a test that previously tested for the broken behaviour by coincidence and adds a test specifically for this. reviewers: vsk, dstenb, djtodoro Differential Revision: https://reviews.llvm.org/D73210	2020-01-24 16:29:05 +00:00
Guillaume Chatelet	805c157e8a	[Alignment][NFC] Deprecate Align::None() Summary: This is a follow up on https://reviews.llvm.org/D71473#inline-647262. There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case. Reviewers: xbolva00, courbet, bollu Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73099	2020-01-24 12:53:58 +01:00
Simon Pilgrim	0b45c2264a	[SelectionDAG] rot(x, y) --> x iff ComputeNumSignBits(x) == BitWidth(x) Rotating an 0/-1 value by any amount will always result in the same 0/-1 value	2020-01-24 10:35:57 +00:00
Fangrui Song	22467e2595	Add function attribute "patchable-function-prefix" to support -fpatchable-function-entry=N,M where M>0 Similar to the function attribute `prefix` (prefix data), "patchable-function-prefix" inserts data (M NOPs) before the function entry label. -fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry) will look like: ``` .type foo,@function .Ltmp0: # @foo nop foo: .Lfunc_begin0: # optional `bti c` (AArch64 Branch Target Identification) or # `endbr64` (Intel Indirect Branch Tracking) nop .section __patchable_function_entries,"awo",@progbits,get,unique,0 .p2align 3 .quad .Ltmp0 ``` -fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html): ``` (a) (b) func: func: .Ltmp0: bti c bti c .Ltmp0: nop nop ``` (a) needs no additional code. If the consensus is to go for (b), we will need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp . Differential Revision: https://reviews.llvm.org/D73070	2020-01-23 17:02:27 -08:00
Simon Pilgrim	e25eee4db7	[SelectionDAG] ComputeNumSignBits - add ISD::ADD demanded elts support	2020-01-23 17:48:07 +00:00
Sam Parker	05532575e8	[RDA] Skip debug values Skip debug instructions when iterating through a block to find uses. Differential Revision: https://reviews.llvm.org/D73273	2020-01-23 17:04:54 +00:00
Simon Pilgrim	0fec8acdd8	[SelectionDAG] ComputeNumSignBits - add ISD::ADD vector support Add missing handling for (ADD (AND X, 1), -1) uniform vectors	2020-01-23 16:42:12 +00:00
Guillaume Chatelet	59f95222d4	[Alignment][NFC] Use Align with CreateAlignedStore Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, bollu Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73274	2020-01-23 17:34:32 +01:00
Simon Pilgrim	fc5bbbf328	[SelectionDAG] ComputeNumSignBits - add ISD::SUB demanded elts support	2020-01-23 16:20:48 +00:00
Jay Foad	b482e1bfe2	[CodeGen] Make use of MachineInstrBuilder::getReg Reviewers: arsenm Subscribers: wdng, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73262	2020-01-23 13:38:13 +00:00
Sam Parker	0d1468db58	[NFC][RDA] Make the interface const Make all the public query methods const.	2020-01-23 13:32:11 +00:00
Simon Pilgrim	48d4ba8fb2	[SelectionDAG] Compute Known + Sign Bits - merge INSERT_VECTOR_ELT known/unknown index paths Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call.	2020-01-23 13:31:37 +00:00
Guillaume Chatelet	279fa8e006	[Alignement][NFC] Deprecate untyped CreateAlignedLoad Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73260	2020-01-23 13:34:32 +01:00
Simon Pilgrim	03cae086f4	[SelectionDAG] ComputeKnownBits - merge EXTRACT_VECTOR_ELT known/unknown index paths Match the approach in SimplifyDemandedBits/ComputeNumSignBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits call.	2020-01-23 11:29:16 +00:00
Simon Pilgrim	98da49d979	[SelectionDAG] Compute Known + Sign Bits - merge INSERT_SUBVECTOR known/unknown index paths Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call, additionally we only ever need original demanded elts of the base vector even if the index is unknown.	2020-01-23 11:29:15 +00:00
Djordje Todorovic	91b0956f38	[NFC][DwarfDebug] Use proper analog GNU attribute for the pc address The low_pc is analog to the DW_AT_call_return_pc, since it describes the return address after the call. The DW_AT_call_pc is the address of the call instruction, and we don't use it at the moment. Differential Revision: https://reviews.llvm.org/D73173	2020-01-23 12:15:35 +01:00
David Tenty	45a4aaea7f	[NFC][XCOFF] Refactor Csect creation into TargetLoweringObjectFile Summary: We create a number of standard types of control sections in multiple places for things like the function descriptors, external references and the TOC anchor among others, so it is possible for their properties to be defined inconsistently in different places. This refactor moves their creation and properties into functions in the TargetLoweringObjectFile class hierarchy, where functions for retrieving various special types of sections typically seem to reside. Note: There is one case in PPCISelLowering which is specific to function entry points which we don't address since we don't have access to the TLOF there. Reviewers: DiggerLin, jasonliu, hubert.reinterpretcast Reviewed By: jasonliu, hubert.reinterpretcast Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72347	2020-01-22 12:09:11 -05:00
Stanislav Mekhanoshin	2d0fcf786c	Precommit NFC part of DAGCombiner change. NFC. This is NFC part of DAGCombiner::visitEXTRACT_SUBVECTOR() change in the D73132.	2020-01-22 09:01:22 -08:00
Hiroshi Yamauchi	ddbc728828	[PGO][PGSO] Update BFI in CodeGenPrepare::optimizeSelectInst. Summary: Without the BFI update, some hot blocks are incorrectly treated as cold code. This fixes a FDO perf regression in the TSVC benchmark from D71288. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73146	2020-01-22 08:36:54 -08:00
Sander de Smalen	4cf16efe49	[AArch64][SVE] Add patterns for unpredicated load/store to frame-indices. This patch also fixes up a number of cases in DAGCombine and SelectionDAGBuilder where the size of a scalable vector is used in a fixed-width context (thus triggering an assertion failure). Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally Reviewed By: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D71215	2020-01-22 14:32:27 +00:00
Jay Foad	e0f0d0e55c	[MachineScheduler] Allow clustering mem ops with complex addresses The generic BaseMemOpClusterMutation calls into TargetInstrInfo to analyze the address of each load/store instruction, and again to decide whether two instructions should be clustered. Previously this had to represent each address as a single base operand plus a constant byte offset. This patch extends it to support any number of base operands. The old target hook getMemOperandWithOffset is now a convenience function for callers that are only prepared to handle a single base operand. It calls the new more general target hook getMemOperandsWithOffset. The only requirements for the base operands returned by getMemOperandsWithOffset are: - they can be sorted by MemOpInfo::Compare, such that clusterable ops get sorted next to each other, and - shouldClusterMemOps knows what they mean. One simple follow-on is to enable clustering of AMDGPU FLAT instructions with both vaddr and saddr (base register + offset register). I've left a FIXME in the code for this case. Differential Revision: https://reviews.llvm.org/D71655	2020-01-22 14:28:24 +00:00
Simon Pilgrim	80656fd7ae	[SelectionDAG] getShiftAmountConstant - assert the type is an integer.	2020-01-22 13:52:44 +00:00
Sander de Smalen	67d4c9924c	Add support for (expressing) vscale. In LLVM IR, vscale can be represented with an intrinsic. For some targets, this is equivalent to the constexpr: getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1 This can be used to propagate the value in CodeGenPrepare. In ISel we add a node that can be legalized to one or more instructions to materialize the runtime vector length. This patch also adds SVE CodeGen support for VSCALE, which maps this node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD] instructions (scaled multiples of 2, 4, or 8 bytes, respectively). Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner Reviewed by: efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D68203	2020-01-22 10:09:27 +00:00
Guillaume Chatelet	0957233320	[Alignment][NFC] Use Align with CreateMaskedStore Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73106	2020-01-22 11:04:39 +01:00
Amara Emerson	67a8775322	[AArch64] Don't generate gpr CSEL instructions in early-ifcvt if regclasses aren't compatible. In GlobalISel we may in some unfortunate circumstances generate PHIs with operands that are on separate banks. If-conversion doesn't currently check for that case and ends up generating a CSEL on AArch64 with incorrect register operands. Differential Revision: https://reviews.llvm.org/D72961	2020-01-21 16:51:31 -08:00
Quentin Colombet	ff1f3cc1a1	[GISelKnownBits] Make the max depth a parameter of the analysis Allow users of that analysis to define the cut off depth of the analysis instead of hardcoding 6. NFC as the default parameter is 6.	2020-01-21 11:35:31 -08:00
Thomas Lively	3ef169e586	[WebAssembly][InstrEmitter] Foundation for multivalue call lowering Summary: WebAssembly is unique among upstream targets in that it does not at any point use physical registers to store values. Instead, it uses virtual registers to model positions in its value stack. This means that some target-independent lowering activities that would use physical registers need to use virtual registers instead for WebAssembly and similar downstream targets. This CL generalizes the existing `usesPhysRegsForPEI` lowering hook to `usesPhysRegsForValues` in preparation for using it in more places. One such place is in InstrEmitter for instructions that have variadic defs. On register machines, it only makes sense for these defs to be physical registers, but for WebAssembly they must be virtual registers like any other values. This CL changes InstrEmitter to check the new target lowering hook to determine whether variadic defs should be physical or virtual registers. These changes are necessary to support a generalized CALL instruction for WebAssembly that is capable of returning an arbitrary number of arguments. Fully implementing that instruction will require additional changes that are described in comments here but left for a follow up commit. Reviewers: aheejin, dschuff, qcolombet Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71484	2020-01-21 11:13:46 -08:00
Fangrui Song	7a8b0b1595	[StackColoring] Remap PseudoSourceValue frame indices via MachineFunction::getPSVManager() Reviewed By: dantrushin Differential Revision: https://reviews.llvm.org/D73063	2020-01-21 09:46:27 -08:00
Krzysztof Parzyszek	020041d99b	Update spelling of {analyze,insert,remove}Branch in strings and comments These names have been changed from CamelCase to camelCase, but there were many places (comments mostly) that still used the old names. This change is NFC.	2020-01-21 10:15:38 -06:00
Simon Pilgrim	f04284cf1d	[TargetLowering] SimplifyDemandedBits ISD::SRA multi-use handling Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses	2020-01-21 15:12:07 +00:00
Simon Pilgrim	47f99d2ca8	[SelectionDAG] GetDemandedBits - remove ANY_EXTEND handling Rely on SimplifyMultipleUseDemandedBits fallback instead.	2020-01-21 14:39:00 +00:00
Simon Pilgrim	651fa669a2	[TargetLowering] SimplifyDemandedBits ANY_EXTEND/ANY_EXTEND_VECTOR_INREG multi-use handling Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses	2020-01-21 14:07:19 +00:00
Simon Pilgrim	5f5f478564	[DAG] Fold extract_vector_elt (scalar_to_vector), K to undef (K != 0) This was unconditionally folding this to the source operand, even if the access was out of bounds. Use undef instead of the extract is not the first element. This helps with some cases where 3-vectors are legalized and avoids processing the 4th component. Original Patch by: arsenm (Matt Arsenault) Differential Revision: https://reviews.llvm.org/D51589	2020-01-21 10:58:30 +00:00
Simon Pilgrim	8d2e6bdbe1	[TargetLowering] SimplifyDemandedBits - Pull out InDemandedMask variable to ISD::SHL. NFCI. Matches ISD::SRA + ISD::SRL variants.	2020-01-21 10:40:18 +00:00
Fangrui Song	d232c21566	[AsmPrinter] Don't emit __patchable_function_entries entry if "patchable-function-entry"="0" Add improve tests	2020-01-20 16:13:48 -08:00
Simon Pilgrim	9c06c10fba	[SelectionDAG] GetDemandedBits - fallback to SimplifyMultipleUseDemandedBits by default. First step towards removing SelectionDAG::GetDemandedBits entirely since it so similar to SimplifyMultipleUseDemandedBits anyhow.	2020-01-20 16:51:52 +00:00
Awanish Pandey	84c4c87e04	Recommit "[DWARF5][DebugInfo]: Added support for DebugInfo generation for auto return type for C++ member functions." Summary: This was reverted in `328e0f3dca` due to chromium bot failure. This revision addresses that case. Original commit message: Summary: This patch will provide support for auto return type for the C++ member functions. Before this return type of the member function is deduced and stored in the DIE. This patch includes llvm side implementation of this feature. Patch by: Awanish Pandey <Awanish.Pandey@amd.com> Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george Reviewed by: dblaikie Differential Revision: https://reviews.llvm.org/D70524	2020-01-20 15:13:13 +05:30
Fangrui Song	eaab1bf21e	[StackColoring] Remap FixedStackPseudoSourceValue frame index referenced by MachineMemOperand StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0) but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i) referenced by MachineMemoryOperand. This can cause an assertion failure when LiveDebugValues references a dead stack object. It is difficult to craft a test case. -g, va_copy and stack-coloring are required. I can only reproduce it on ppc32.	2020-01-19 22:53:45 -08:00
Fangrui Song	886d2c2ca7	[BranchRelaxation] Simplify offset computation and fix a bug in adjustBlockOffsets() If Start!=0, adjustBlockOffsets() may unnecessarily adjust the offset of Start. There is no correctness issue, but it can create more block splits.	2020-01-19 16:02:16 -08:00
Fangrui Song	9a24488cb6	[CodeGen] Move fentry-insert, xray-instrumentation and patchable-function before addPreEmitPass() This intention is to move patchable-function before aarch64-branch-targets (configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424). This also allows addPreEmitPass() passes to know the precise instruction sizes if they want. Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1. No output difference with this commit and the previous commit.	2020-01-19 00:09:46 -08:00
Fangrui Song	9583a3f262	[AsmPrinter] Delete dead takeDeletedSymbsForFunction() The code added in r98579 is dead now.	2020-01-18 17:08:00 -08:00
Michael Liao	6d0d86a64d	[DAG] Add helper for creating constant vector index with correct type. NFC.	2020-01-18 01:23:36 -05:00
David Blaikie	58b10df54f	DebugInfo: Move SectionLabel tracking into CU's addRange This makes the SectionLabel handling more resilient - specifically for future PROPELLER work which will have more CU ranges (rather than just one per function). Ultimately it might be nice to make this more general/resilient to arbitrary labels (rather than relying on the labels being created for CU ranges & then being reused by ranges, loclists, and possibly other addresses). It's possible that other (non-rnglist/loclist) uses of addresses will need the addresses to be in SectionLabels earlier (eg: move the CU.addRange to be done on function begin, rather than function end, so during function emission they are already populated for other use).	2020-01-17 18:12:34 -08:00
Derek Schuff	ff171acf84	[WebAssembly] Track frame registers through VReg and local allocation This change has 2 components: Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It describes how the Dwarf frame base will be encoded. That can be a register (the default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or a DW_OP_WASM_location descriptr. WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs has run. Make WebAssemblyExplicitLocals store the local it allocates for the frame register. Use this local information to implement getDwarfFrameBase The result is that the DW_AT_frame_base attribute is correctly encoded for each subprogram, and each param and local variable has a correct DW_AT_location that uses DW_OP_fbreg to refer to the frame base. This is a reland of rG3a05c3969c18 with fixes for the expensive-checks and Windows builds Differential Revision: https://reviews.llvm.org/D71681	2020-01-17 17:23:56 -08:00
Matt Arsenault	a4451d88ee	Consolidate internal denormal flushing controls Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls. - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features - NVPTX uses the nvptx-f32ftz attribute - ARM directly uses the denormal-fp-math attribute - Other targets indirectly use denormal-fp-math in one DAGCombine - cl-denorms-are-zero has a corresponding denorms-are-zero attribute AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name). Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere. Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag. -cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fp-denormal-math-f32=ieee or preserve-sign rather than the old attributes. Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero. This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation. The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attribute are now emitted. A future change will switch this and remove the subtarget features.	2020-01-17 20:09:53 -05:00
Reid Kleckner	423e3db6a8	Remove unneeded FoldingSet.h include from Attributes.h Avoids 637 extra FoldingSet.h and Allocator.h includes. FoldingSet.h needs Allocator.h, which is relatively expensive.	2020-01-17 16:36:09 -08:00
Evgenii Stepanov	d081962dea	Merge memtag instructions with adjacent stack slots. Summary: Detect a run of memory tagging instructions for adjacent stack frame slots, and replace them with a shorter instruction sequence * replace STG + STG with ST2G * replace STGloop + STGloop with STGloop This code needs to run when stack slot offsets are already known, but before FrameIndex operands in STG instructions are eliminated; that's the reason for the new hook in PrologueEpilogue. This change modifies STGloop and STZGloop pseudos to take the size as an immediate integer operand, and adds _untied variants of those pseudos that are allowed to take the base address as a FI operand. This is needed to simplify recognizing an STGloop instruction as operating on a stack slot post-regalloc. This improves memtag code size by ~0.25%, and it looks like an additional ~0.1% is possible by rearranging the stack frame such that consecutive STG instructions reference adjacent slots (patch pending). Reviewers: pcc, ostannard Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70286	2020-01-17 15:19:29 -08:00
Ian Levesque	97ba483026	[xray] Allow instrumenting only function entry and/or only function exit Extend -fxray-instrumentation-bundle to split function-entry and function-exit into two separate options, so that it is possible to instrument only function entry or only function exit. For use cases that only care about one or the other this will save significant overhead and code size. Differential Revision: https://reviews.llvm.org/D72890	2020-01-17 13:32:34 -08:00
Ian Levesque	7628e474a5	[xray] Add xray-ignore-loops option XRay allows tuning by minimum function size, but also always instruments functions with loops in them. If the minimum function size is set to a large value the loop instrumention ends up causing most functions to be instrumented anyway. This adds a new flag, xray-ignore-loops, to disable the loop detection logic. Differential Revision: https://reviews.llvm.org/D72659	2020-01-17 13:32:17 -08:00
Adrian Prantl	7b30370e5b	Move the sysroot attribute from DIModule to DICompileUnit [this re-applies `c0176916a4` with the correct commit message and phabricator link] This addresses point 1 of PR44213. https://bugs.llvm.org/show_bug.cgi?id=44213 The DW_AT_LLVM_sysroot attribute is used for Clang module debug info, to allow LLDB to import a Clang module from source. Currently it is part of each DW_TAG_module, however, it is the same for all modules in a compile unit. It is more efficient and less ambiguous to store it once in the DW_TAG_compile_unit. This should have no effect on DWARF consumers other than LLDB. Differential Revision: https://reviews.llvm.org/D71732	2020-01-17 12:55:40 -08:00
Adrian Prantl	c17aee67f1	Revert "Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysroot" This reverts commit `12e479475a`. I accidentally landed this patch with the wrong commit message ...	2020-01-17 12:52:36 -08:00
Adrian Prantl	12e479475a	Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysroot This is a purely cosmetic change that is NFC in terms of the binary output. I bugs me that I called the attribute DW_AT_LLVM_isysroot since the "i" is an artifact of GCC command line option syntax (-isysroot is in the category of -i options) and doesn't carry any useful information otherwise. This attribute only appears in Clang module debug info. Differential Revision: https://reviews.llvm.org/D71722	2020-01-17 09:36:48 -08:00
Simon Pilgrim	1dc2f25790	[SelectionDAG] ComputeKnownBits - assert we're computing the 0'th (difference) result for the SUB/SUBC cases Matches what we already do for the ADD/ADDC/ADDE case.	2020-01-17 13:53:57 +00:00
Sam Parker	42350cd893	[ARM][MVE] Tail Predicate IsSafeToRemove Introduce a method to walk through use-def chains to decide whether it's possible to remove a given instruction and its users. These instructions are then stored in a set until the end of the transform when they're erased. This is now used to perform checks on the iteration count (LoopDec chain), element count (VCTP chain) and the possibly redundant iteration count. As well as being able to remove chains of instructions, we know also check that the sub feeding the vctp is producing the expected value. Differential Revision: https://reviews.llvm.org/D71837	2020-01-17 13:19:14 +00:00
Simon Pilgrim	f611158350	[SelectionDAG] Better ISD::ANY_EXTEND/ISD::ANY_EXTEND_VECTOR_INREG ComputeKnownBits support Add DemandedElts handling to ISD::ANY_EXTEND and add missing ISD::ANY_EXTEND_VECTOR_INREG handling. Despite the lack of test changes this code IS being used - its just that the ANY_EXTEND ops are legalized later on (typically to ZERO_EXTEND equivalents) so we typically manage to combine later on.	2020-01-17 11:37:58 +00:00
Davide Italiano	30a8865142	[FastISel] Lower `llvm.dbg.value(undef, ...` correctly. Summary: Instead of just dropping them. <rdar://problem/58657146> Reviewers: aprantl, vsk, ab, paquette, echristo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72877	2020-01-16 16:22:20 -08:00
Derek Schuff	80906d9d16	Revert "[WebAssembly] Track frame registers through VReg and local allocation" This reverts commit `3a05c3969c`. It breaks under expensive-checks and on Windows	2020-01-16 14:38:00 -08:00
Derek Schuff	3a05c3969c	[WebAssembly] Track frame registers through VReg and local allocation This change has 2 components: Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It describes how the Dwarf frame base will be encoded. That can be a register (the default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or a DW_OP_WASM_location descriptr. WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs has run. Make WebAssemblyExplicitLocals store the local it allocates for the frame register. Use this local information to implement getDwarfFrameBase The result is that the DW_AT_frame_base attribute is correctly encoded for each subprogram, and each param and local variable has a correct DW_AT_location that uses DW_OP_fbreg to refer to the frame base. Differential Revision: https://reviews.llvm.org/D71681	2020-01-16 13:51:17 -08:00
Matt Arsenault	a66d2817ca	GlobalISel: Don't ignore requested ext narrowing type This was assuming the narrow target was the source type. Respect the requested type when these don't match by using intermediate merges. This avoids producing very wide, illegal shift expansions.	2020-01-16 14:29:37 -05:00
Matt Arsenault	be31a7b7ee	GlobalISel: Move extension scalar narrowing to separate function Also rename a few things. Handling a different requested type will require this to become much more complex.	2020-01-16 14:29:37 -05:00
Craig Topper	61a89e17df	[LegalizeDAG][Mips] Add an assert to protect a uint_to_fp implementation from double rounding. Add a i32->f32 uint_to_fp implementation that avoids this code. The algorithm here only works if the sint_to_fp doesn't do any rounding. Otherwise it can round before the offset fixup is applied. Add an assert to protect this. To avoid breaking the one test in tree that tested this code with a set of types that fail the assert, I've enabled i32->f32 to use the i64->f32 algorithm. This only occurs when f64 isn't a legal type. If f64 is legal then we do i32->f64->f32 instead. Differential Revision: https://reviews.llvm.org/D72794	2020-01-16 11:08:16 -08:00
Matt Arsenault	d0943537e1	GlobalISel: Apply target MMO flags to atomics Unify MMO flag handling with SelectionDAG like with loads and stores.	2020-01-16 13:49:43 -05:00
Matt Arsenault	0d0fce42b0	GlobalISel: Preserve load/store metadata in IRTranslator This was dropping the invariant metadata on dead argument loads, so they weren't deleted. Atomics still need to be fixed the same way. Also, apparently store was never preserving dereferencable which should also be fixed.	2020-01-16 13:49:43 -05:00
Jay Foad	885260d5d8	[GlobalISel] Don't arbitrarily limit a mask to 64 bits Reviewers: arsenm Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72853	2020-01-16 16:13:20 +00:00
Jay Foad	63f73545dd	[GlobalISel] Pass MachineOperands into MachineIRBuilder helper methods Reviewers: arsenm, aditya_nandakumar, aemerson Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72849	2020-01-16 16:04:21 +00:00
Sam Parker	760b175109	[ARM][LowOverheadLoops] Update liveness info Recommitting `e93e0d413f` after reverting due to test failures, which will hopefully now be fixed. Original commit message: After expanding the pseudo instructions, update the liveness info. We do this in a post-order traversal of the loop, including its exit blocks and preheader(s). Differential Revision: https://reviews.llvm.org/D72131	2020-01-16 15:44:25 +00:00
Jay Foad	28bb43bdf8	[GlobalISel] Use more MachineIRBuilder helper methods Reviewers: arsenm, nhaehnle Subscribers: wdng, rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72833	2020-01-16 15:34:51 +00:00
Jeremy Morse	c969335abd	Revert "[PHIEliminate] Move dbg values after phi and label" Testing compiler-rt, a new assertion failure occurs when building the GwpAsanTestObjects object. I'm uploading a reproducer to D70597. This reverts commit `75188b01e9`.	2020-01-16 14:01:27 +00:00
Chris Ye	75188b01e9	[PHIEliminate] Move dbg values after phi and label If there are DBG_VALUEs between phi and label (after phi and before label), DBG_VALUE will block PHI lowering after the LABEL. Moving all DBG_VALUEs after Labels in the function ScheduleDAGSDNodes::EmitSchedule to avoid impacting PHI lowering. before: PHI DBG_VALUE LABEL after: (move DBG_VALUE after label) PHI LABEL DBG_VALUE then: (phi lowering after label) LABEL COPY DBG_VALUE Fixes the issue: https://bugs.llvm.org/show_bug.cgi?id=43859 Differential Revision: https://reviews.llvm.org/D70597	2020-01-16 11:58:09 +00:00
Craig Topper	5cf1b01a01	[LegalizeDAG][TargetLowering] Move vXi64/i64->vXf32/f32 uint_to_fp legalizing code from TargetLowering::expandUINT_TO_FP back to LegalizeDAG. This was moved in October 2018, but we don't appear to be using this for vectors on any in tree target. Moving it back simplifies D72794 so we can share the code for i32->f32.	2020-01-15 22:04:50 -08:00
Stanislav Mekhanoshin	8b417dd3d6	Process BUNDLE in tail duplication When tail duplication estimates a size of tail it uses instruction count. Account for a number of instrictions in a bundle too. Differential Revision: https://reviews.llvm.org/D72783	2020-01-15 15:46:57 -08:00
Matt Arsenault	25e9938a45	GlobalISel: Handle more cases of G_SEXT narrowing This now develops the same problem G_ZEXT/G_ANYEXT have where the requested type is assumed to be the source type. This will be fixed separately by creating intermediate merges.	2020-01-15 18:33:15 -05:00
Vedant Kumar	43464509fc	DWARF: Simplify the way the return PC is attached to call site tags, NFC This cleanup was suggested by Djordje in D72489.	2020-01-15 14:16:21 -08:00

... 5 6 7 8 9 ...

28401 Commits