llvm-project

Commit Graph

Author	SHA1	Message	Date
Jeremy Morse	43de305704	[DebugInfo][InstrRef] Fix a tombstone-in-DenseMap crash from D117877 This is a follow-up to D117877: variable assignments of DBG_VALUE $noreg, or DBG_INSTR_REFs where no value can be found, are represented by a DbgValue object with Kind "Undef", explicitly meaning "there is no value". In D117877 I added a special-case to some assignment accounting faster, without considering this scenario. It causes variables to be given the value ValueIDNum::EmptyValue, which then ends up being a DenseMap key. The DenseMap asserts, because EmptyValue is the tombstone key. Fix this by handling the assign-undef scenario in the special case, to match what happens in the general case: the variable has no value if it's only ever assigned $noreg / undef. Differential Revision: https://reviews.llvm.org/D118715	2022-02-02 15:08:49 +00:00
Jeremy Morse	9fd9d56dc6	[DebugInfo][InstrRef][NFC] Use depth-first scope search for variable locs This patch aims to reduce max-rss from instruction referencing, by avoiding keeping variable value information in memory for too long. Instead of computing all the variable values then emitting them to DBG_VALUE instructions, this patch tries to stream the information out through a depth first search: * Make use of the fact LexicalScopes gives a depth-number to each lexical scope, * Produce a map that identifies the last lexical scope to make use of a block, * Enumerate each scope in LexicalScopes' DFS order, solving the variable value problem, * After each scope is processed, look for any blocks that won't be used by any other scope, and emit all the variable information to DBG_VALUE instructions. Differential Revision: https://reviews.llvm.org/D118460	2022-02-02 14:09:54 +00:00
Jeremy Morse	a80181a81e	[DebugInfo][InstrRef][NFC] Free resources at an earlier stage This patch releases some memory from InstrRefBasedLDV earlier that it would otherwise. The underlying problem is: * We store a big table of "live in values for each block", * We translate that into DBG_VALUE instructions in each block, And both exist in memory at the same time, which needlessly doubles that information. The most of what this patch does is: as we progressively translate live-in information into DBG_VALUEs, we free the variable-value / machine-value tracking information as we go, which significantly reduces peak memory. While I'm here, also add a clear method to wipe variable assignments that have been accumulated into VLocTracker objects, and turn a DenseMap into a SmallDenseMap to avoid an initial allocation. Differential Revision: https://reviews.llvm.org/D118453	2022-02-02 12:58:15 +00:00
Jeremy Morse	d556eb7e27	[DebugInfo][InstrRef][NFC] Cache some PHI resolutions Install a cache of DBG_INSTR_REF -> ValueIDNum resolutions, for scenarios where the value has to be reconstructed from several DBG_PHIs. Whenever this happens, it's because branch folding + tail duplication has messed with the SSA form of the program, and we have to solve a mini SSA problem to find the variable value. This is always called twice, so it makes sense to cache the value. This gives a ~0.5% geomean compile-time-performance improvement on CTMark. Differential Revision: https://reviews.llvm.org/D118455	2022-02-02 12:21:28 +00:00
Simon Pilgrim	5aa2acc86b	[DAG] SimplifyDemandedVectorElts - remove KnownZero/KnownUndef from DCI helper wrapper None of the external users actual touch these (they're purely used internally down the recursive call) - its trivial to add another wrapper if anything ever does want to track known elements.	2022-02-02 12:04:49 +00:00
Jeremy Morse	14aaaa1236	Re-apply `3fab2d138e`, now with a triple added Was reverted in `1c1b670a73` as it broke all non-x86 bots. Original commit message: [DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821. It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the amount of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them. In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then chose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero. Differential Revision: https://reviews.llvm.org/D118601	2022-02-02 11:04:00 +00:00
Sam Parker	281d29b8fe	[TypePromotion] Avoid some unnecessary truncs Check for legal zext 'sinks' before inserting a trunc. Differential Revision: https://reviews.llvm.org/D115451	2022-02-02 10:05:15 +00:00
Simon Moll	7d926b7177	[VE] LEGALAVL and staged VVP legalization The new LEGALAVL node annotates that the AVL refers to packs of 64bit. We use a two-stage lowering approach with LEGALAVL: First, standard SDNodes are translated into illegal VVP layer nodes. Regardless of source (VP or standard), all VVP nodes have a mask and AVL parameter. The AVL parameter refers to the element position (just as in VP intrinsics). Second, we legalize the AVL usage in VVP layer nodes. If the element size is < 64bit, the EVL parameter has to be adjusted to refer to packs of 64bits. We wrap the legalized AVL in a LEGALAVL node to track this. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D118321	2022-02-02 09:11:41 +01:00
Kevin Athey	1c1b670a73	Revert "[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out" This reverts commit `3fab2d138e`. Breaking PPC sanitizer build: https://lab.llvm.org/buildbot/#/builders/105/builds/20857	2022-02-01 18:37:02 -08:00
David Blaikie	f69f23396d	Revert "DebugInfo: Don't put types in type units if they reference internal linkage types" This reverts commit `ab4756338c`. Breaks some cases, including this: namespace { template <typename> struct a {}; } // namespace class c { c(); }; class b { b(); a<c> ax; }; b::b() {} c::c() {} By producing a reference to a type unit for "c" but not producing the type unit.	2022-02-01 16:13:07 -08:00
David Green	c89cfbd4dd	Revert "[DAG] Extend SearchForAndLoads with any_extend handling" This reverts commit `100763a88f` as it was making incorrect assumptions about implicit zero_extends.	2022-02-01 20:18:40 +00:00
Jeremy Morse	8e75536e51	[DebugInfo][InstrRef][NFC] Bypass a frequently-noop loop Bypass this loop if it would do nothing -- if there are no register masks to be examined, there's no point looking at each location to see if the location has been def'd. Awkwardly, this was responsible for almost an entire half a percent of performance improvement on CTMark. Differential Revision: https://reviews.llvm.org/D118613	2022-02-01 19:39:09 +00:00
Jeremy Morse	3fab2d138e	[DebugInfo][InstrRef] Add a max-stack-slots-to-track cut-out In certain circumstances with things like autogenerated code and asan, you can end up with thousands of Values live at the same time, causing a large working set and a lot of information spilled to the stack. Unfortunately InstrRefBasedLDV doesn't cope well with this and consumes a lot of memory when there are many many stack slots. See the reproducer in D116821. It seems very unlikely that a developer would be able to reason about hundreds of live named local variables at the same time, so a huge working set and many stack slots is an indicator that we're likely analysing autogenerated or instrumented code. In those cases: gracefully degrade by setting an upper bound on the amount of stack slots to track. This limits peak memory consumption, at the cost of dropping some variable locations, but in a rare scenario where it's unlikely someone is actually going to use them. In terms of the patch, this adds a cl::opt for max number of stack slots to track, and has the stack-slot-numbering code optionally return None. That then filters through a number of code paths, which can then chose to not track a spill / restore if it touches an untracked spill slot. The added test checks that we drop variable locations that are on the stack, if we set the limit to zero. Differential Revision: https://reviews.llvm.org/D118601	2022-02-01 19:25:29 +00:00
Jeremy Morse	91fb66cf91	[DebugInfo][InstrRef][NFC] Don't build a map of un-needed values When finding locations for variable values at the start of a block, we build a large map of every value to every location, and then pick out the locations for values that are desired. This takes up quite a lot of time, because, unsurprisingly, there are usually more values in registers and stack slots than there are variables. This patch instead creates a map of desired values to their locations, which are initially illegal locations. Then, as we examine every available value, we can select locations for values we care about, and ignore those that we don't. This substantially reduces the amount of work done (i.e., building a map up of values to locations that nothing wants or needs). Geomean performance improvement of 1% on CTMark, woo. Differential Revision: https://reviews.llvm.org/D118597	2022-02-01 18:58:06 +00:00
Mircea Trofin	22d3bbdf4e	[nfc][regalloc] Move DefaultEvictionAdvisor::* to RegAllocEvictionAdvisor.cpp This is leftover from the advisor refactoring. Straight-forward copy and paste.	2022-02-01 07:59:25 -08:00
Simon Pilgrim	904395ab8f	[DAG] SimplifyMultipleUseDemandedBits - add default Depth = 0 argument. Simplifies an upcoming change.	2022-02-01 12:34:38 +00:00
Simon Pilgrim	d83a96f59f	[DAG] Make it clear mul(x,x) knownbits bit[1] == 0 check should be for x is undef only As raised on rGffd0e464b4b9, if x is poison, this fold is still ok.	2022-02-01 11:32:14 +00:00
Bjorn Pettersson	3885879046	[DAGCombine] Add simple folds for SSHLSAT/USHLSAT Do "simplifyShift" and "FoldConstantArithmetic" folds for the SSHLSAT and USHLSAT DAG nodes. This includes folds such as: (shlsat undef/poison, x) -> 0 (shlsat x, undef/poison) -> undef (shlsat x, too_large_shamt) -> undef (shlsat 0, x) -> 0 (shlsat x, 0) -> x (shlsat c1, c2) -> c3 Differential Revision: https://reviews.llvm.org/D118603	2022-02-01 10:51:35 +01:00
David Sherwood	daa80339df	[CodeGen] Support folds of not(cmp(cc, ...)) -> cmp(!cc, ...) for scalable vectors I have updated TargetLowering::isConstTrueVal to also consider SPLAT_VECTOR nodes with constant integer operands. This allows the optimisation to also work for targets that support scalable vectors. Differential Revision: https://reviews.llvm.org/D117210	2022-02-01 09:50:00 +00:00
Mircea Trofin	a3f1491849	[nfc][mlgo][regalloc] 'hasPreferredPhys' out of feature components It isn't cacheable, it can be updated by other events than live interval resizing.	2022-01-31 18:59:47 -08:00
Mircea Trofin	9aa2c914b9	[mlgo][regalloc] Factor live interval feature calculation Factoring it out so we can subsequently cache it. This should be a NFC, however, for the float quantities, we see small errors in the least significant digits. This is because, before, we were summing up one by one. Now, we sum up results of sums. This shouldn't matter for ML, and will require rework when we do quantization (avoiding floats altogether), but meanwhile, it did require an update to the reference file used for testing. The patch also bumps the precision of the variables involved in this, to reduce the error (note they are casted back to float at the end by the SET macro, since we only work with float and not double in TF) Differential Revision: https://reviews.llvm.org/D118659	2022-01-31 15:19:15 -08:00
Mircea Trofin	d46305e22d	[NFC][regalloc] Move evict advisor initialization before VRAI This is because a subsequent patch will propose obtaining the VRAI from the advisor, which will enable feature caching for the ML advisor, for better compile time. Making this change first as it's both innocuous and keeps the future patch to be reviewed small.	2022-01-31 14:04:59 -08:00
Mircea Trofin	bc3b372161	[nfc][mlgo] De-const a parameter We plan to pass the MachineFunction& to APIs that expect it non-const (for legitimate reasons). The advisor still holds the ref as a const ref, though, so we keep most of the maintainability value of that.	2022-01-31 13:44:33 -08:00
Philip Reames	57cf29ac1b	[Statepoint] Remove another use of getActualReturnType [NFC] For the cross block gc.result projection case, we only care about the return type if there is a cross block gc.result, and if there is one, we can take the type from the gc.result. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:57:46 -08:00
Adrian Prantl	f85c6b79f3	Fix a fragment overflow problem when composing super-registers. Addresses https://github.com/llvm/llvm-project/issues/53342 Differential Revision: https://reviews.llvm.org/D118412	2022-01-31 09:47:29 -08:00
Philip Reames	6e4f7c0823	[Statepoints] Take result type from gc.result [NFC] When lowering a gc.result, we can assume that the result type of the gc.result matches the type of the underlying call. This is explicitly required in LangRef. At the moment, this makes little difference, but for opaque pointers we need a means to get result typing without relying on pointee types.	2022-01-31 09:42:34 -08:00
Philip Reames	093b43f48d	Sink getGCResultLocality to sole use [NFC]	2022-01-31 09:33:57 -08:00
Jeremy Morse	4a2cb01370	[DebugInfo][InstrRef][NFC] Refactor ahead of further optimisations This patch shuffles some functions around so that some blocks of code can be reused. In particular, * Move the determination of "which blocks are in scope" to its own function, as it's non-trivial to solve. Delete the "InScopeBlocks" collection too, which nothing reads from. * Split transfer emission (i.e., installing DBG_VALUEs into blocks) into its own function. * Name some useful types. * Rename "ScopeToBlocks" to "ScopeToAssignBlocks", as that's what the collection contains, blocks where assignments happen. Differential Revision: https://reviews.llvm.org/D118454	2022-01-31 16:45:53 +00:00
Jeremy Morse	e9739f116d	Revert "[DebugInfo][InstrRef][NFC] Add a missing assignment operator" This reverts commit `f18429372f`. Bitten by -Werror,-Wdeprecated-copy on a buildbot, alas!	2022-01-31 16:15:21 +00:00
Jeremy Morse	f18429372f	[DebugInfo][InstrRef][NFC] Add a missing assignment operator ValueIDNum is supposed to be a value type that boils down to a uint64_t, that has some bitfields for convenience. If we use the default operator=, we end up with each bit field being individually assigned, which is un-necessarily slow. Implement the assignment operator by just copying the uint64_t value of the object. This is quicker, and matches how the comparison operators work already. Doing so is 0.1% faster on the compile-time-tracker.	2022-01-31 16:08:38 +00:00
Kerry McLaughlin	002b944dfa	[SVE] Fix TypeSize->uint64_t implicit conversion in visitAlloca() Fixes a crash ('Invalid size request on a scalable vector') in visitAlloca() when we call this function for a scalable alloca instruction, caused by the implicit conversion of TySize to uint64_t. This patch changes TySize to a TypeSize as returned by getTypeAllocSize() and ensures the allocation size is multiplied by vscale for scalable vectors. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D118372	2022-01-31 14:37:23 +00:00
Dávid Bolvanský	ae990a3cbd	[Analysis] Attribute noundef should not prevent tail call optimization Very similar to https://reviews.llvm.org/D101230 Fixes https://github.com/llvm/llvm-project/issues/53501	2022-01-31 15:13:52 +01:00
Simon Pilgrim	7ec8fc2932	[X86] combineAnd() - per-element simplification - call SimplifyDemandedBits using mask demanded bits if SimplifyDemandedVectorElts fails We already call SimplifyDemandedVectorElts using whether each vector mask element is zero/nonzero, this just extends this to also try SimplifyDemandedBits using the demanded bits mask generated from the nonzero elements. This also requires an additional TargetLowering::SimplifyDemandedBits DemandedBits/DemandedElts wrapper.	2022-01-31 13:58:00 +00:00
Jeremy Morse	c703d77a61	[DebugInfo][InstrRef] Don't fully propagate single assigned variables If we only assign a variable value a single time, we can take a short-cut when computing its location: the variable value is only valid up to the dominance frontier of where the assignemnt happens. Past that point, there are other predecessors from where the variable has no value, meaning the variable has no location past that point. This patch recognises this scenario, and avoids expensive SSA computation, to improve compile-time performance. Differential Revision: https://reviews.llvm.org/D117877	2022-01-31 12:54:17 +00:00
Simon Pilgrim	2d1390efbe	[DAG] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero	2022-01-31 12:00:51 +00:00
Simon Pilgrim	48f45f6b25	[X86] Limit mul(x,x) knownbits tests with not undef/poison check We can only assume bit[1] == zero if its the only demanded bit or the source is not undef/poison	2022-01-31 11:55:10 +00:00
Fangrui Song	0e691aed7e	[mlgo][regalloc] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after `a8a7bf922c`	2022-01-30 15:18:30 -08:00
Mircea Trofin	a8a7bf922c	[mlgo][regalloc] Fix register masking If AllocationOrder has less than 32 elements, we were treating the extra positions as if they were valid. This was detected by a subsequent assert. The fix also tightens the asserts.	2022-01-30 14:59:08 -08:00
Markus Böck	e0b11c7659	[Support][NFC] Fix generic `ChildrenGetterTy` of `IDFCalculatorBase` Both IDFCalculatorBase and its accompanying DominatorTreeBase only supports pointer nodes. The template argument is the block type itself and any uses of GraphTraits is therefore done via a pointer to the node type. However, the ChildrenGetterTy type of IDFCalculatorBase has a use on just the node type instead of a pointer to the node type. Various parts of the monorepo has worked around this issue by providing specializations of GraphTraits for the node type directly, or not been affected by using specializations instead of the generic case. These are unnecessary however and instead the generic code should be fixed instead. An example from within Tree is eg. A use of IDFCalculatorBase in InstrRefBasedImpl.cpp. It basically instantiates a IDFCalculatorBase<MachineBasicBlock, false> but due to the bug above then goes on to specialize GraphTraits<MachineBasicBlock> although GraphTraits<MachineBasicBlock*> exists (and should be used instead). Similar dead code exists in clang which defines redundant GraphTraits to work around this bug. This patch fixes both the original issue and removes the dead code that was used to work around the issue. Differential Revision: https://reviews.llvm.org/D118386	2022-01-30 22:09:07 +01:00
Kazu Hirata	2bea207d26	[CodeGen] Use default member initialization (NFC) Identified with modernize-use-default-member-init.	2022-01-30 12:32:51 -08:00
Mircea Trofin	bc5644ee74	[MLGO] Regalloc: allow multiple occurences of -regalloc-enable-advisor This allows scearios where some central config sets it one way and a user wants to override it.	2022-01-29 09:00:52 -08:00
Fangrui Song	33b38339a0	[lld] Add module name to LTO inline asm diagnostic Close #52781: for LTO, the inline asm diagnostic uses `<inline asm>` as the file name (lib/CodeGen/AsmPrinter/AsmPrinterInlineAsm.cpp) and it is unclear which module has the issue. With this patch, we will see the module name (say `asm.o`) before `<inline asm>` with ThinLTO. ``` % clang -flto=thin -c asm.c && myld.lld asm.o -e f ld.lld: error: asm.o <inline asm>:1:2: invalid instruction mnemonic 'invalid' invalid ^~~~~~~ ``` For regular LTO, unfortunately the original module name is lost and we only get ld-temp.o. Reviewed By: #lld-macho, ychen, Jez Ng Differential Revision: https://reviews.llvm.org/D118434	2022-01-28 11:32:42 -08:00
Cullen Rhodes	5d089d9a83	[DAGCombiner] Fix invalid size request in combineRepeatedFPDivisors If we have a vector FP division with a splatted divisor, use getVectorMinNumElements when scaling the num of uses by splat factor. For AArch64 the combine kicks in for the <vscale x 4 x float> case since it's above the fdiv threshold (3) when scaling num uses by splat factor, but the codegen is worse (splat + vector fdiv + vector fmul) than the <vscale x 2 x double> case (splat + vector fdiv). If the combine could be converted into a scalar FP division by scalarizeBinOpOfSplats it may be cheaper, but it looks like this is predicated on the isExtractVecEltCheap TLI function which is implemented for x86 but not AArch64. Perhaps for now combineRepeatedFPDivisors should only scale num uses by splat if the division can be converted into scalar op. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118343	2022-01-28 17:01:08 +00:00
Jeremy Morse	76fd78b4b3	[MVerifier] Don't check liveness of any debug instruction operands Shiny new DBG_PHI instruction usually have physical registers as operands -- however, the machine verifier checks to see whether they're live, and occasionally this fails. There's a filter for DBG_VALUE instructions to not get verified in this way: expand it to exempt all debug instructions from liveness checking, which means DBG_PHIs get treated like DBG_VALUEs. This also future proofs against us adding new debug instructions. Differential Revision: https://reviews.llvm.org/D117891	2022-01-28 15:04:54 +00:00
Martin Storsjö	f7d2afbac9	[CodeGen] Emit COFF symbol type for function aliases On the level of the generated object files, both symbols (both original and alias) are generally indistinguishable - both are regular defined symbols. But previously, only the original function had the COFF ComplexType set to IMAGE_SYM_DTYPE_FUNCTION, while the symbol created via an alias had the type set to IMAGE_SYM_DTYPE_NULL. This matches what GCC does, which emits directives for setting the COFF symbol type for this kind of alias symbol too. This makes a difference when GNU ld.bfd exports symbols without dllexport directives or a def file - it seems to decide between function or data exports based on the COFF symbol type. This means that functions created via aliases, like some C++ constructors, are exported as data symbols (missing the thunk for calling without dllimport). The hasnt been an issue when doing the same with LLD, as LLD decides between function or data export based on the flags of the section that the symbol points at. This should fix the root cause of https://github.com/msys2/MINGW-packages/issues/10547. Differential Revision: https://reviews.llvm.org/D118328	2022-01-28 13:06:16 +02:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Simon Pilgrim	fdd3e2c943	[DAG] SelectionDAG::getNode(N1,N2) - detect N2 constant vector splats as well as scalars We already perform some basic folds (add/sub with zero etc.) on scalar types, this patch adds some basic support for constant splats as well in a few cases (we can add more with future test coverage). In the cases I've enabled, we can handle buildvector implicit truncation as we're not creating new constant nodes from the vector types - we're just returning existing nodes. This allows us to get a number of extra cases in the aarch64 tests. I haven't enabled support for undefs in buildvector splats, as we're often checking for zero/allones patterns that return the original constant and we shouldn't be returning undef elements in some of these cases - we can enable this later if we're OK with creating new constants. Differential Revision: https://reviews.llvm.org/D118264	2022-01-27 10:59:08 +00:00
Fraser Cormack	84e85e025e	[SelectionDAG][VP] Provide expansion for VP_MERGE This patch adds support for expanding VP_MERGE through a sequence of vector operations producing a full-length mask setting up the elements past EVL/pivot to be false, combining this with the original mask, and culminating in a full-length vector select. This expansion should work for any data type, though the only use for RVV is for boolean vectors, which themselves rely on an expansion for the VSELECT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118058	2022-01-27 09:00:41 +00:00
Adrian Prantl	ee72b17386	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Relanding (I accidentally pushed an earlier version of the patch previously). Differential Revision: https://reviews.llvm.org/D118183	2022-01-26 13:08:35 -08:00
Adrian Prantl	f400a6012c	Revert "Fix UB in DwarfExpression::emitLegacyZExt()" This reverts commit `216002c4bb` while investigating bot breakage.	2022-01-26 12:46:07 -08:00
Matt Arsenault	2d670de84c	GlobalISel: Avoid crash on asm with lying result types The physical register in the asm has the wrong type for the declared IR. It seems to work in the DAG by extracting the 4 elements that are defined in the IR from the register, but that isn't handled here. This doesn't seem to be a well tested path since other mismatched cases are crashing the DAG asm handling.	2022-01-26 15:23:59 -05:00
Adrian Prantl	216002c4bb	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Differential Revision: https://reviews.llvm.org/D118183	2022-01-26 10:57:11 -08:00
Chih-Ping Chen	28bfa57a73	[DebugInfo] Add stringLocationExp field to DIStringType DIStringType is used to encode the debug info of a character object in Fortran. A Fortran deferred-length character object is typically implemented as a pair of the following two pieces of info: An address of the raw storage of the characters, and the length of the object. The stringLocationExp field contains the DIExpression to get to the raw storage. This patch also enables the emission of DW_AT_data_location attribute in a DW_TAG_string_type debug info entry based on stringLocationExp in DIStringType. A test is also added to ensure that the bitcode reader is backward compatible with the old DIStringType format. Differential Revision: https://reviews.llvm.org/D117586	2022-01-26 11:56:57 -05:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
Sanjay Patel	63daea8b35	[SDAG] fix bug in ComputeNumSignBits of target constant The loop below the changed line assumes that the element width of the target constant is the same as the element width of the loaded value, but that is not always true. We could try harder to do some kind of min/max calc even if the sizes don't match, but that can be another patch if needed. This fixes #53401 (miscompile) and does not change the motivating cases added when this analysis was introduced: `ad298f86b7`	2022-01-26 10:22:41 -05:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
alex-t	5157f984ae	[AMDGPU] Enable divergence-driven XNOR selection Currently not (xor_one_use) pattern is always selected to S_XNOR irrelative od the node divergence. This relies on further custom selection pass which converts to VALU if necessary and replaces with V_NOT_B32 ( V_XOR_B32) on those targets which have no V_XNOR. Current change enables the patterns which explicitly select the not (xor_one_use) to appropriate form. We assume that xor (not) is already turned into the not (xor) by the combiner. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D116270	2022-01-26 15:33:10 +03:00
Sebastian Neubauer	4723f3cf03	[AMDGPU][GlobalISel] Combine unmerge of undef Fold (unmerge undef) -> undef, undef, ... Differential Revision: https://reviews.llvm.org/D118138	2022-01-26 12:30:36 +01:00
David Green	57356d6bb7	[DAG] Create fptoui.sat from clamped fptoui This is the unsigned variant of D111976, where we convert a clamped fptoui to a fptoui.sat. Because we are unsigned, the condition this time is only UMIN of UINT_MAX. Similarly to D111976 it handles ISD::UMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D114964	2022-01-26 08:37:44 +00:00
wangpc	8597458278	[regalloc] Fix assertion error when LiveInterval is empty When evicting interference, it causes an asseertion error since LiveIntervals::intervalIsInOneMBB assumes that input is not empty. This patch fixed bug mentioned in D118020. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D118124	2022-01-26 14:06:57 +08:00
Adrian Prantl	3efa016d4c	Revert accidentally pushed commit. It was not yet reviewed. "Fix UB in DwarfExpression::emitLegacyZExt()" This reverts commit `e37de5d36e`.	2022-01-25 13:53:14 -08:00
Adrian Prantl	e37de5d36e	Fix UB in DwarfExpression::emitLegacyZExt() A shift-left > 63 triggers a UBSAN failure. This patch kicks the can down the road (to the consumer) by emitting a more compact representation of the shift computation in DWARF expressions. Differential Revision: https://reviews.llvm.org/D118183	2022-01-25 13:49:14 -08:00
Sean Fertile	a2505bd063	[PowerPC][AIX] Override markFunctionEnd() During fast-isel calling 'markFunctionEnd' in the base class will call tidyLandingPads. This can cause an issue where we have determined that we need ehinfo and emitted a traceback table with the bits set to indicate that we will be emitting the ehinfo, but the tidying deletes all landing pads. In this case we end up emitting a reference to __ehinfo.N symbol, but not emitting a definition to said symbol and the resulting file fails to assemble. Differential Revision: https://reviews.llvm.org/D117040	2022-01-25 10:08:53 -05:00
Nikita Popov	a3a2239aaa	[GlobalISel] Avoid pointer element type access during InlineAsm lowering Same change as has been made for the SDAG lowering.	2022-01-25 14:26:47 +01:00
Simon Pilgrim	15e2be291f	[DAG] visitMULHS/MULHU/AND - remove some redundant LHS constant checks Now that we constant fold and canonicalize constants to the RHS, we don't need to check both LHS and RHS for specific constants	2022-01-25 11:54:23 +00:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Fraser Cormack	7cb452bfde	[SelectionDAG][VP] Add widening support for VP_MERGE This patch adds widening support for ISD::VP_MERGE, which widens identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118030	2022-01-25 10:59:40 +00:00
Fraser Cormack	5f5c5603ce	[SelectionDAG][VP] Add splitting support for VP_MERGE This patch adds splitting support for ISD::VP_MERGE, which splits identically to VP_SELECT and similarly to other select-like nodes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D118032	2022-01-25 10:33:23 +00:00
Victor Perez	2233befa5d	[LegalizeTypes][VP] Add splitting support for vp.gather and vp.scatter Split these nodes in a similar way as their masked versions. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D117760	2022-01-25 10:08:07 +00:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Nikita Popov	9554aaa275	[Dwarf] Optimize getOrCreateSourceID() for repeated calls on same file (NFCI) DwarfCompileUnit::getOrCreateSourceID() is often called many times in sequence with the same DIFile. This is currently very expensive, because it involves creating a string from directory and file name and looking it up in a string map. This patch remembers the last DIFile and its ID and directly returns that. This gives a geomean -1.3% compile-time improvement on CTMark O0-g. Differential Revision: https://reviews.llvm.org/D118041	2022-01-25 09:27:11 +01:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Paweł Bylica	9d32847b33	[DAGCombine] Remove unused param in combineCarryDiamond(). NFC	2022-01-24 20:57:00 +01:00
Mircea Trofin	b1af01fe6a	[NFC][MLGO] Simplify conditional compilation Most of the code that's shared between 'release' and 'development' modes doesn't depend on anything special.	2022-01-24 11:19:04 -08:00
Jeremy Morse	d27f022614	[NFC][DebugInfo] Strip out an undesired #if 0 block As mentioned in discussion of D116821, it's better to just delete this block than keep it hanging around.	2022-01-24 18:04:47 +00:00
Jeremy Morse	74db5c8c95	Revert rG6a605b97a200 due to excessive memory use Over in the comments for D116821, some use-cases have cropped up where there's a substantial increase in memory usage. A quick inspection shows that a) it's a lot of memory and b) there are several things to be done to reduce it. Reverting (via disabling this feature by default) to avoid bothering people in the meantime.	2022-01-24 17:08:21 +00:00
Sander de Smalen	699e22a083	[ISEL] Move trivial step_vector folds to FoldConstantArithmetic. Given that step_vector is practically a constant, doing this early helps with DAGCombine folds that happen before type legalization. There is currently no way to test this happens earlier, although existing tests for step_vector folds continue protect the folds happening at all. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117863	2022-01-24 16:37:21 +00:00
Craig Topper	a43ed49f5b	[DAGCombiner][RISCV] Canonicalize (bswap(bitreverse(x))->bitreverse(bswap(x)). If the bitreverse gets expanded, it will introduce a new bswap. By putting a bswap before the bitreverse, we can ensure it gets cancelled out when this happens. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118012	2022-01-24 08:31:53 -08:00
Craig Topper	b8c7cdcc81	[SelectionDAG][RISCV] Teach getNode to fold bswap(bswap(x))->x. This can show up during when bitreverse is expanded to bswap and swap of bits within a byte. If the input is already a bswap, we should cancel them out before we further transform them in a way that makes it harder to see the redundancy. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D118007	2022-01-24 08:17:46 -08:00
Matt Arsenault	99e8e17313	Reapply "Revert "GlobalISel: Add G_ASSERT_ALIGN hint instruction" This reverts commit `a97e20a3a8`.	2022-01-24 09:26:52 -05:00
serge-sans-paille	5f290c090a	Move STLFunctionalExtras out of STLExtras Only using that change in StringRef already decreases the number of preoprocessed lines from 7837621 to 7776151 for LLVMSupport Perhaps more interestingly, it shows that many files were relying on the inclusion of StringRef.h to have the declaration from STLExtras.h. This patch tries hard to patch relevant part of llvm-project impacted by this hidden dependency removal. Potential impact: - "llvm/ADT/StringRef.h" no longer includes <memory>, "llvm/ADT/Optional.h" nor "llvm/ADT/STLExtras.h" Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup/5831	2022-01-24 14:13:21 +01:00
Bjorn Pettersson	46cacdbb21	[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth In code review for D117104 two slightly weird checks were found in DAGCombiner::reduceLoadWidth. They were typically checking if BitsA was a mulitple of BitsB by looking at (BitsA & (BitsB - 1)), but such a comparison actually only make sense if BitsB is a power of two. The checks were related to the code that attempted to shrink a load based on the fact that the loaded value would be right shifted. Afaict the legality of the value types is checked later (typically in isLegalNarrowLdSt), so the existing checks were both overly conservative as well as being wrong whenever ExtVTBits wasn't a power of two. The latter was a situation triggered by a number of lit tests so we could not just assert on ExtVTBIts being a power of two). When attempting to simply remove the checks I found some problems, that seems to have been guarded by the checks (maybe just out of luck). A typical example would be a pattern like this: t1 = load i96* ptr t2 = srl t1, 64 t3 = truncate t2 to i64 When DAGCombine is visiting the truncate reduceLoadWidth is called attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then the SRL is detected and we set ShAmt to 64. In the past we've bailed out due to i96 not being a multiple of 64. If we simply remove that check then we would end up replacing the load with a new load that would read 64 bits but with a base pointer adjusted by 64 bits. So we would read 32 bits the wasn't accessed by the original load. This patch will instead utilize the fact that the logical left shift can be folded away by using a zextload. Thus, the pattern above will now be combined into t3 = load i32* ptr+offset, zext to i64 Another case is shown in the X86/shift-folding.ll test case: t1 = load i32* ptr t2 = srl i32 t1, 8 t3 = truncate t2 to i16 In the past we bailed out due to the shift count (8) not being a multiple of 16. Now the narrowing kicks in and we get t3 = load i16* ptr+offset Differential Revision: https://reviews.llvm.org/D117406	2022-01-24 12:22:04 +01:00
Nikita Popov	0d1308a7b7	[AArch64][GlobalISel] Support returned argument with multiple registers The call lowering code assumed that a returned argument could only consist of one register. Pass an ArrayRef<Register> instead of Register to make sure that all parts get assigned. Fixes https://github.com/llvm/llvm-project/issues/53315. Differential Revision: https://reviews.llvm.org/D117866	2022-01-24 10:55:28 +01:00
Nikita Popov	e7c9a6cae0	[SDAG] Don't move DBG_VALUE instructions after insertion point during scheduling (PR53243) EmitSchedule() shouldn't be touching instructions after the provided insertion point. The change introduced in D83561 performs a scan to the end of the block, and thus may move unrelated instructions. In particular, this ends up moving instructions that have been produced by FastISel and will later be deleted. Moving them means that more instructions than intended are removed. Fix this by stopping the iteration when the insertion point is reached. Fixes https://github.com/llvm/llvm-project/issues/53243. Differential Revision: https://reviews.llvm.org/D117489	2022-01-24 10:50:49 +01:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps making certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Abinav Puthan Purayil	68b70d17d8	[GlobalISel] Fold or of shifts with constant amount to funnel shift. This change folds (or (shl x, C0), (lshr y, C1)) to funnel shift iff C0 and C1 are constants where C0 + C1 is the bit-width of the shift instructions. Differential Revision: https://reviews.llvm.org/D116529	2022-01-24 10:43:32 +05:30
David Blaikie	2e58a18910	DebugInfo: Include template parameters for simplified template decls in type units LLVM DebugInfo CodeGen synthesizes type declarations in type units when referencing types that are not in type units. When those synthesized types are templates and simplified template names (or mangled simplified template names) are in use, the template arguments must be attached to those declarations. A deeper fix (with a CU or DICompositeType flag) that would also support other uses of clang's -debug-forward-template-args (such as Sony's platform) could/should be implemented to fix this more broadly.	2022-01-23 16:10:14 -08:00
David Blaikie	ab4756338c	DebugInfo: Don't put types in type units if they reference internal linkage types Doing this causes a declaration of the internal linkage (anonymous namespace) type to be emitted in the type unit, which would then be ambiguous as to which internal linkage definition it refers to (since the name is only valid internally). It's possible these internal linkage types could be resolved relative to the unit the TU is referred to from - but that doesn't seem ideal, and there's no reason to put the type in a type unit since it can only be defined in one CU anyway (since otherwise it'd be an ODR violation) & so avoiding the type unit should be a smaller DWARF encoding anyway. This also addresses an issue with Simplified Template Names where the template parameter could not be rebuilt from the declaration emitted into the TU (specifically for an enum non-type template parameter, where looking up the enumerators is necessary to rebuild the full template name)	2022-01-23 14:07:31 -08:00
Simon Pilgrim	accc07e654	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:36:25 +00:00
Simon Pilgrim	6605057992	Revert rG7c66aaddb128dc0f342830c1efaeb7a278bfc48c "[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312)" Noticed a typo in the getBooleanContents call just after I pressed commit :(	2022-01-23 16:28:44 +00:00
Simon Pilgrim	7c66aaddb1	[DAG] Fold (X & Y) != 0 --> zextOrTrunc(X & Y) iff everything but LSB is known zero (PR51312) Fixes parity codegen issue where we know all but the lowest bit is zero, we can replace the ICMPNE with 0 comparison with a ext/trunc Differential Revision: https://reviews.llvm.org/D117983	2022-01-23 16:20:42 +00:00
Simon Pilgrim	20d46fbd4a	[CodeGenPrepare] Use dyn_cast result to check for null pointers Simplifies logic and helps the static analyzer correctly check for nullptr dereferences	2022-01-23 12:47:52 +00:00
David Green	b27e5459d5	[DAG] Convert truncstore(extend(x)) back to store(x) Pulled out of D106237, this folds truncstore(extend(x)) back to store(x) if the original store was legal. This can come up due to the order we fold nodes. A fold from X86 needs to be adjusted to prevent infinite loops, to have it pick the operand of a trunc more directly. Differential Revision: https://reviews.llvm.org/D117901	2022-01-22 13:20:36 +00:00
OCHyams	b6a41fddcf	[DWARF][DebugInfo] Fix off-by-one error in size of DW_TAG_base_type types Fix PR53163 by rounding the byte size of DW_TAG_base_type types up. Without this fix we risk emitting types with a truncated size (including rounding less-than-byte-sized types' sizes down to zero). Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D117124	2022-01-21 11:37:49 +00:00
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Alexandre Ganea	5af2433e17	[clang-cl] Support the /HOTPATCH flag This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019 The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching). Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: `d703b92296/lld/COFF/Writer.cpp (L1298)` The outcome is that we can finally use Live++ or Recode along with clang-cl. NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result. Depends on D43002, D80833 and D81301 for the full feature. Differential Revision: https://reviews.llvm.org/D116511	2022-01-20 12:57:19 -05:00
Lucas Prates	283f5a198a	[GlobalISel] Fix incorrect sign extension when combining G_INTTOPTR and G_PTR_ADD The GlobalISel combiner currently uses sign extension when manipulating the LHS constant when combining a sequence of the following sequence of machine instructions into a single constant: ``` %0:_(s32) = G_CONSTANT i32 <CONSTANT> %1:_(p0) = G_INTTOPTR %0:_(s32) %2:_(s64) = G_CONSTANT i64 <CONSTANT> %3:_(p0) = G_PTR_ADD %1:_, %2:_(s64) ``` This causes an issue when the bit width of the first contant and the target pointer size are different, as G_INTTOPTR has no sign extension semantics. This patch fixes this by capture an arbitrary precision in when matching the constant, allowing the matching function to correctly zero extend it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D116941	2022-01-20 17:02:52 +00:00
Mircea Trofin	f29256a64a	[MLGO] Improved support for AOT cross-targeting scenarios The tensorflow AOT compiler can cross-target, but it can't run on (for example) arm64. We added earlier support where the AOT-ed header and object would be built on a separate builder and then passed at build time to a build host where the AOT compiler can't run, but clang can be otherwise built. To simplify such scenarios given we now support more than one AOT-able case (regalloc and inliner), we make the AOT scenario centered on whether files are generated, case by case (this includes the "passed from a different builder" scenario). This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of case by case control. A builder can opt out of an AOT case by passing that case's model path as `none`. Note that the overrides still take precedence. This patch controls conditional compilation with case-specific flags, which can be enabled locally, for the component where those are available. We still keep an overall flag for some tests. The 'development/training' mode is unchanged, because there the model is passed from the command line and interpreted. Differential Revision: https://reviews.llvm.org/D117752	2022-01-20 07:05:39 -08:00
Nikita Popov	81d35f27dd	[DebugInstrRef] Memoize variable order during sorting (NFC) Instead of constructing DebugVariables and looking up the order in the comparison function, compute the order upfront and then sort a vector of (order, instr). This improves compile-time by -0.4% geomean on CTMark ReleaseLTO-g. Differential Revision: https://reviews.llvm.org/D117575	2022-01-20 16:04:24 +01:00
Victor Perez	c10c748878	[LegalizeTypes][VP] Add widening support for vp.gather and vp.scatter Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117557	2022-01-20 08:57:57 +00:00

1 2 3 4 5 ...

31928 Commits