llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Ian Levesque	68a1f09107	[xray] Honor xray-never function-instrument attribute function-instrument=xray-never wasn't actually honored before. We were getting lucky that it worked because CodeGenFunction would omit the other xray attributes when a function was annotated with xray_never_instrument. This patch adds proper support. Differential Revision: https://reviews.llvm.org/D89441	2021-01-19 18:47:09 -05:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Jessica Paquette	cbf5246359	Fix buildbot after `cfc6073017` Windows buildbots were not happy with using find_if + instructionsWithoutDebug. In `cfc6073017`, instructionsWithoutDebug is not technically necessary. So, just iterate over the block directly. http://lab.llvm.org:8011/#/builders/127/builds/4732/steps/7/logs/stdio	2021-01-19 10:38:04 -08:00
Jessica Paquette	cfc6073017	[GlobalISel] Combine (a[0]) \| (a[1] << k1) \| ...\| (a[m] << kn) into a wide load This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`. (See D27861) This tries to recognize patterns like below (assuming a little-endian target): ``` s8* x = ... s32 val = a[0] \| (a[1] << 8) \| (a[2] << 16) \| (a[3] << 24) -> s32 val = ((i32)a) s8 x = ... s32 val = a[3] \| (a[2] << 8) \| (a[1] << 16) \| (a[0] << 24) -> s32 val = BSWAP(*((s32)a)) ``` (This patch also handles the big-endian target case as well, in which the first example above has a BSWAP, and the second example above does not.) To recognize the pattern, this searches from the last G_OR in the expression tree. E.g. ``` Reg Reg \ / OR_1 Reg \ / OR_2 \ Reg .. / Root ``` Each non-OR register in the tree is put in a list. Each register in the list is then checked to see if it's an appropriate load + shift logic. If every register is a load + potentially a shift, the combine checks if those loads + shifts, when OR'd together, are equivalent to a wide load (possibly with a BSWAP.) To simplify things, this patch (1) Only handles G_ZEXTLOADs (which appear to be the common case) (2) Only works in a single MachineBasicBlock (3) Only handles G_SHL as the bit twiddling to stick the small load into a specific location An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from test/CodeGen/AArch64/load-combine.ll) At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3, and a 0.4% improvement for CTMark/7zip-benchmark. Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was the first instruction in the block. Differential Revision: https://reviews.llvm.org/D94350	2021-01-19 10:24:27 -08:00
Luo, Yuanke	c535a7fdad	[X86] Fix tile spill merge issue. This is a additional bug fix for `c5be0e0cc0`. The distance for the spill instructions is wrong in previous patch. Differential Revision: https://reviews.llvm.org/D94772	2021-01-19 10:51:42 +08:00
Chen Zheng	a9b3303a88	Revert "[NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike." This reverts commit `3bdf4507b6`. Post commit comments need to be addressed first.	2021-01-18 21:33:31 -05:00
Craig Topper	79e798aca3	Recommit "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results." This recommits `2c51bef76c`. I've fixed the broken check line from when I renamed the test function. Original commit message: This builds on D94142 where scalable vectors are allowed in structs. I did have to fix one scalable vector issue in the vector type creation for these intrinsics where we used getVectorNumElements instead of ElementCount.	2021-01-18 11:08:28 -08:00
Craig Topper	5d431c3d32	Revert "[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results." This reverts commit `2c51bef76c`. I seem to have messed up the check lines in the test.	2021-01-18 11:00:20 -08:00
Craig Topper	2c51bef76c	[RISCV] Add a test of vector sadd.overflow to demonstrate intrinsics with multiple scalable vector results. This builds on D94142 where scalable vectors are allowed in structs. I did have to fix one scalable vector issue in the vector type creation for these intrinsics where we used getVectorNumElements instead of ElementCount. Differential Revision: https://reviews.llvm.org/D94149	2021-01-18 10:41:36 -08:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Denis Antrushin	f7443905af	[Statepoint] Handle `undef` operands in statepoint. Currently when spilling statepoint register operands in FixupStatepoints we do not pay attention that it might be `undef`. We just generate a spill, which may lead to verifier error because we have a use without def. To handle it, let FixupStateponts ignore `undef` register operands completely and change them to some constant value when generating stack map. Use same value as used by ISel for this purpose (0xFEFEFEFE). Reviewed By: reames Differential Revision: https://reviews.llvm.org/D94703	2021-01-18 15:20:54 +03:00
Tres Popp	3bd24574c7	Revert "[PowerPC] support register pressure reduction in machine combiner." This reverts commit `26a396c4ef`. See https://reviews.llvm.org/D92071 for a description of the issue.	2021-01-18 12:01:57 +01:00
Simon Pilgrim	207f32948b	[DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops Use the KnownBits icmp comparisons to determine when a ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other. Differential Revision: https://reviews.llvm.org/D94532	2021-01-18 10:29:23 +00:00
Craig Topper	cfec6cd50c	[IR] Allow scalable vectors in structs to support intrinsics returning multiple values. RISC-V would like to use a struct of scalable vectors to return multiple values from intrinsics. This woud also be needed for target independent intrinsics like llvm.sadd.overflow. This patch removes the existing restriction for this. I've modified StructType::isSized to consider a struct containing scalable vectors as unsized so the verifier won't allow loads/stores/allocas of these structs. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94142	2021-01-17 23:29:51 -08:00
Chen Zheng	26a396c4ef	[PowerPC] support register pressure reduction in machine combiner. Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071	2021-01-17 23:56:13 -05:00
Qiu Chaofan	f776d8b12f	[Legalizer] Promote result type in expanding FP_TO_XINT This patch promotes result integer type of FP_TO_XINT in expanding. So crash in conversion from ppc_fp128 to i1 will be fixed. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92473	2021-01-18 11:56:11 +08:00
Chen Zheng	3bdf4507b6	[NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike. add one use check to lookThruCopyLike. The root node is safe to be deleted if we are sure that every definition in the copy chain only has one use. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92069	2021-01-17 19:56:42 -05:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Bjorn Pettersson	4f15556731	[LegalizeDAG] Handle NeedInvert when expanding BR_CC This is a follow-up fix to commit `03c8d6a0c4`. Seems like we now end up with NeedInvert being set in the result from LegalizeSetCCCondCode more often than in the past, so we need to handle NeedInvert when expanding BR_CC. Not sure how to deal with the "Tmp4.getNode()" case properly, but current assumption is that that code path isn't impacted by the changes in `03c8d6a0c4` so we can simply move the old assert into the if-branch and only handle NeedInvert in the else-branch. I think that the test case added here, for PowerPC, might have failed also before commit `03c8d6a0c4`. But we started to hit the assert more often downstream when having merged that commit. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94762	2021-01-16 14:33:19 +01:00
Jeroen Dobbelaere	668827b648	Introduce llvm.noalias.decl intrinsic The ``llvm.experimental.noalias.scope.decl`` intrinsic identifies where a noalias scope is declared. When the intrinsic is duplicated, a decision must also be made about the scope: depending on the reason of the duplication, the scope might need to be duplicated as well. Reviewed By: nikic, jdoerfert Differential Revision: https://reviews.llvm.org/D93039	2021-01-16 09:20:45 +01:00
Kazu Hirata	8fd8ff1f67	[StringExtras] Rename SubsequentDelim to ListSeparator This patch renames SubsequentDelim to ListSeparator to clarify the purpose of the class. Differential Revision: https://reviews.llvm.org/D94649	2021-01-15 21:00:56 -08:00
Craig Topper	a9e939760c	[CodeGen] Removes unwanted optimisation for TargetConstantFP This 'FIXME' popped up in the development of an out-of-tree backend. Quick fix, but first llvm upstream patch, therefore I do not have commit rights, so if approved please commit? - Test is not included as this came up in an out-of-tree backend (if required, please hint on how to test this). Patch by simveg (Simon) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93219	2021-01-15 11:52:53 -08:00
Craig Topper	4c5066b078	[TargetLowering] Don't speculatively call ComputeNumSignBits. NFC These methods are recursive so a little costly. We only look at the result in one place in this function and it's conditional. We also only need the second call if the first had enough returned enough sign bits.	2021-01-15 09:09:35 -08:00
Simon Pilgrim	46aa3c6c33	[DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - improve shuffle(shuffle(x,y),shuffle(x,y)) merging MergeInnerShuffle currently attempts to merge shuffle(shuffle(x,y),z) patterns into a single shuffle, using 1 or 2 of the x,y,z ops. However if we already match 2 ops we might be able to handle the third op if its also a shuffle that references one of the previous ops, allowing us to handle some cases like: shuffle(shuffle(x,y),shuffle(x,y)) shuffle(shuffle(shuffle(x,z),y),z) shuffle(shuffle(x,shuffle(x,y)),z) etc. This isn't an exhaustive match and is dependent on the order the candidate ops are encountered - if one of the matched ops was a shuffle that was peek-able we don't go back and try to split that, I haven't found much need for that amount of analysis yet. This is a preliminary patch that will allow us to later improve x86 HADD/HSUB matching - but needs to be reviewed separately as its in generic code and affects existing Thumb2 tests. Differential Revision: https://reviews.llvm.org/D94671	2021-01-15 15:08:31 +00:00
Kazu Hirata	7dc3575ef2	[llvm] Remove redundant return and continue statements (NFC) Identified with readability-redundant-control-flow.	2021-01-14 20:30:34 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Kazu Hirata	9bcc0d1040	[CodeGen, Transforms] Use llvm::sort (NFC)	2021-01-14 20:30:31 -08:00
Jay Foad	868da2ea93	[SelectionDAG] Remove an early-out from computeKnownBits for smin/smax Even if we know nothing about LHS, it can still be useful to know that smax(LHS, RHS) >= RHS and smin(LHS, RHS) <= RHS. Differential Revision: https://reviews.llvm.org/D87145	2021-01-14 18:15:17 +00:00
Jay Foad	517196e569	[Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC. Differential Revision: https://reviews.llvm.org/D94588	2021-01-14 14:02:43 +00:00
Jay Foad	a1cba5b7a1	[SelectionDAG] Make use of KnownBits::commonBits. NFC. Differential Revision: https://reviews.llvm.org/D94587	2021-01-14 14:02:43 +00:00
Simon Pilgrim	7c30c05ff7	[DAG] visitVECTOR_SHUFFLE - MergeInnerShuffle - reset shuffle ops and reorder early-out and second op matching. NFCI. I'm hoping to reuse MergeInnerShuffle in some other folds - so ensure the candidate ops/mask are reset at the start of each run. Also, move the second op matching before bailing to make it simpler to try to match other things afterward.	2021-01-14 11:55:20 +00:00
Simon Pilgrim	af8d27a7a8	[DAG] visitVECTOR_SHUFFLE - pull out shuffle merging code into lambda helper. NFCI. Make it easier to reuse in a future patch.	2021-01-14 11:05:19 +00:00
David Stuttard	259936f491	[NFC][AsmPrinter] Windows warning: Use explicit cast static_cast for uint64_t to unsigned gives a MS VC build warning for Windows: warning C4309: 'static_cast': truncation of constant value Use an explicit cast instead. Change-Id: I692d335b4913070686a102780c1fb05b893a2f69 Differential Revision: https://reviews.llvm.org/D94592	2021-01-14 09:10:31 +00:00
Kazu Hirata	125ea20d55	[llvm] Use llvm::stable_sort (NFC)	2021-01-13 19:14:43 -08:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Simon Pilgrim	993c488ed2	[DAG] visitVECTOR_SHUFFLE - use all_of to check for all-undef shuffle mask. NFCI.	2021-01-13 17:19:41 +00:00
Matt Arsenault	d55d592a92	GlobalISel: Do not set observer of MachineIRBuilder in LegalizerHelper This fixes double printing of insertion debug messages in the legalizer. Try to cleanup usage of observers. Currently the use of observers is pretty hard to follow and it's not clear what is responsible for them. Observers are referenced in 3 places: 1. In the MachineFunction 2. In the MachineIRBuilder 3. In the LegalizerHelper The observers in the MachineFunction and MachineIRBuilder are both called only on insertions, and are redundant with each other. The source of the double printing was the same observer was added to both the MachineFunction, and the MachineIRBuilder. One of these references needs to be removed. Arguably observers in general should be fully removed from one or the other, but it may be useful to have a local observer in the MachineIRBuilder that is not added to the function's observers. Alternatively, the wrapper observer could manage a local observer in one place. The LegalizerHelper only ever calls the observer on changing/changed instructions, and never insertions. Logically these are two different types of observers, for changes and for insertions. Additionally, some places used the GISelObserverWrapper when they only needed a single observer they could use directly. Setting the observer in the LegalizerHelper constructor is not flexible enough if the LegalizerHelper is constructed anywhere outside the one used by the legalizer. AMDGPU calls the LegalizerHelper in RegBankSelect, and needs to use a local observer to apply the regbank to newly created instructions. Currently it accomplishes this by constructing a local MachineIRBuilder. I'm trying to move the MachineIRBuilder to be owned/maintained by the RegBankSelect pass itself, but the locally constructed LegalizerHelper would reset the observer. Mips also has a special case use of the LegalizationArtifactCombiner in applyMappingImpl; I think we do need to run the artifact combiner during RegBankSelect, but in a more consistent way outside of applyMappingImpl.	2021-01-13 10:44:31 -05:00
Kerry McLaughlin	2170e0ee60	[SVE][CodeGen] CTLZ, CTTZ & CTPOP operations (predicates) Canonicalise the following operations in getNode() for predicate types: - CTLZ(Pred) -> bitwise_NOT(Pred) - CTTZ(Pred) -> bitwise_NOT(Pred) - CTPOP(Pred) -> Pred Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D94428	2021-01-13 12:24:54 +00:00
Serguei Katkov	8f8c207b8f	[Verifier] Add tied-ness verification to statepoint intsruction Reviewers: reames, dantrushin Reviewed By: reames, dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D94483	2021-01-13 14:40:44 +07:00
Kazu Hirata	2c2d489b78	[CodeGen] Remove unused function isRegLiveInExitBlocks (NFC) The last use was removed on Jan 17, 2020 in commit `42350cd893`.	2021-01-12 21:43:48 -08:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
Serguei Katkov	fba9805ba3	[Verifier] Extend statepoint verifier to cover more constants Also old mir tests are updated to meet last changes in STATEPOINT format. Reviewers: reames, dantrushin Reviewed By: reames, dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D94482	2021-01-13 11:51:48 +07:00
Serguei Katkov	157efd84ab	[Statepoint Lowering] Add an option to allow use gc values in regs for landing pad Default value is not changed, so it is NFC actually. The option allows to use gc values on registers in landing pads. Reviewers: reames, dantrushin Reviewed By: reames, dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D94469	2021-01-13 11:39:34 +07:00
Serguei Katkov	f454c9f102	[InlineSpiller] Re-tie operands if folding failed InlineSpiller::foldMemoryOperand unties registers before an attempt to fold and does not restore tied-ness in case of failure. I do not have a particular test for demo of invalid behavior. This is something of clean-up. It is better to keep the behavior correct in case some time in future it happens. Reviewers: reames, dantrushin Reviewed By: dantrushin, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D94389	2021-01-13 10:31:43 +07:00
Juneyoung Lee	25eb7b08ba	[DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND) This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 . When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted. It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag. The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however. This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`. To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB. It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however. I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction). Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB. Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D92015	2021-01-13 09:36:52 +09:00
Craig Topper	03c8d6a0c4	[LegalizeDAG][RISCV][PowerPC][AMDGPU][WebAssembly] Improve expansion of SETONE/SETUEQ on targets without SETO/SETUO. If SETO/SETUO aren't legal, they'll be expanded and we'll end up with 3 comparisons. SETONE is equivalent to (SETOGT \|\| SETOLT) so if one of those operations is supported use that expansion. We don't need both since we can commute the operands to make the other. SETUEQ can be implemented with !(SETOGT \|\| SETOLT) or (SETULE && SETUGE). I've only implemented the first because it didn't look like most of the affected targets had legal SETULE/SETUGE. Reviewed By: frasercrmck, tlively, nemanjai Differential Revision: https://reviews.llvm.org/D94450	2021-01-12 10:45:03 -08:00
Jay Foad	f264f9ad7d	[SlotIndexes] Fix and simplify basic block splitting Remove the InsertionPoint argument from SlotIndexes::insertMBBInMaps because it was confusing: what does it mean to insert a new block between two instructions, in the middle of an existing block? Instead, support the case that MachineBasicBlock::splitAt really needs, where the new block contains some instructions that are already in the maps because they have been moved there from the tail of the previous block. In all other use cases the new block is empty. Based on work by Carl Ritson! Differential Revision: https://reviews.llvm.org/D94311	2021-01-12 10:50:14 +00:00
Sander de Smalen	c8a914db5c	[LiveDebugValues] Fix comparison operator in VarLocBasedImpl The issue was introduced in commit rG84a1120943a651184bae507fed5d648fee381ae4 and would cause a VarLoc's StackOffset to be compared with its own, instead of the StackOffset from the other VarLoc. This patch fixes that.	2021-01-12 08:44:58 +00:00
Craig Topper	df74c001fa	[DAGCombiner] Replace static helper function isConstantFPBuildVectorOrConstantFP with the identical version in SelectionDAG. NFC	2021-01-11 23:41:40 -08:00
Craig Topper	f9ef3a6003	[SelectionDAG] Make isConstantIntBuildVectorOrConstantInt and isConstantFPBuildVectorOrConstantFP methods const.	2021-01-11 23:26:53 -08:00
Craig Topper	b1c304c494	[CodeGen] Try to make the print of memory operand alignment a little more user friendly. Memory operands store a base alignment that does not factor in the effect of the offset on the alignment. Previously the printing code only printed the base alignment if it was different than the size. If there is an offset, the reader would need to figure out the effective alignment themselves. This has confused me before and someone else was recently confused on IRC. This patch prints the possibly offset adjusted alignment if it is different than the size. And prints the base alignment if it is different than the alignment. The MIR parser has been updated to read basealign in addition to align. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D94344	2021-01-11 19:58:47 -08:00
David Stuttard	5464baaae8	Fix minor build issue (NFC) Change [x86] Fix tile register spill issue was causing problems for our build using gcc-5.4.1 The problem was caused by this line: for (const MachineInstr &MI : make_range(MIS.begin(), MI)) where MI was previously defined as a MachineBasicBlock iterator. Differential Revision: https://reviews.llvm.org/D94415	2021-01-11 11:24:09 -08:00
Paul Robinson	1f9c29228c	[FastISel] NFC: Clean up unnecessary bookkeeping Now that we flush the local value map for every instruction, we don't need any extra flushes for specific cases. Also, LastFlushPoint is not used for anything. Follow-ups to #c161665 (D91734). This reapplies #3fd39d3. Differential Revision: https://reviews.llvm.org/D92338	2021-01-11 09:40:39 -08:00
Paul Robinson	be179b9946	[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option This option is not used for anything after #c161665 (D91737). This commit reapplies #a474657.	2021-01-11 09:32:49 -08:00
Paul Robinson	c161775dec	[FastISel] Flush local value map on every instruction Local values are constants or addresses that can't be folded into the instruction that uses them. FastISel materializes these in a "local value" area that always dominates the current insertion point, to try to avoid materializing these values more than once (per block). https://reviews.llvm.org/D43093 added code to sink these local value instructions to their first use, which has two beneficial effects. One, it is likely to avoid some unnecessary spills and reloads; two, it allows us to attach the debug location of the user to the local value instruction. The latter effect can improve the debugging experience for debuggers with a "set next statement" feature, such as the Visual Studio debugger and PS4 debugger, because instructions to set up constants for a given statement will be associated with the appropriate source line. There are also some constants (primarily addresses) that could be produced by no-op casts or GEP instructions; the main difference from "local value" instructions is that these are values from separate IR instructions, and therefore could have multiple users across multiple basic blocks. D43093 avoided sinking these, even though they were emitted to the same "local value" area as the other instructions. The patch comment for D43093 states: Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. This patch undoes most of D43093, and instead flushes the local value map after() every IR instruction, using that instruction's debug location. This avoids sometimes incorrect locations used previously, and emits instructions in a more natural order. In addition, constants materialized due to PHI instructions are not assigned a debug location immediately; instead, when the local value map is flushed, if the first local value instruction has no debug location, it is given the same location as the first non-local-value-map instruction. This prevents PHIs from introducing unattributed instructions, which would either be implicitly attributed to the location for the preceding IR instruction, or given line 0 if they are at the beginning of a machine basic block. Neither of those consequences is good for debugging. This does mean materialized values are not re-used across IR instruction boundaries; however, only about 5% of those values were reused in an experimental self-build of clang. () Actually, just prior to the next instruction. It seems like it would be cleaner the other way, but I was having trouble getting that to work. This reapplies commits `cf1c774d` and `dc35368c`, and adds the modification to PHI handling, which should avoid problems with debugging under gdb. Differential Revision: https://reviews.llvm.org/D91734	2021-01-11 08:32:36 -08:00
Joe Ellis	007358239d	[DAGCombiner] Use getVectorElementCount inside visitINSERT_SUBVECTOR This avoids TypeSize-/ElementCount-related warnings. Differential Revision: https://reviews.llvm.org/D92747	2021-01-11 14:15:11 +00:00
Luo, Yuanke	c5be0e0cc0	[X86] Fix tile register spill issue. The tile register spill need 2 instructions. %46:gr64_nosp = MOV64ri 64 TILESTORED %stack.2, 1, killed %46:gr64_nosp, 0, $noreg, %43:tile The first instruction load the stride to a GPR, and the second instruction store tile register to stack slot. The optimization of merge spill instruction is done after register allocation. And spill tile register need create a new virtual register to for stride, so we can't hoist tile spill instruction in postOptimization() of register allocation. We can't hoist TILESTORED alone and we can't hoist the 2 instuctions together because MOV64ri will clobber some GPR. This patch is to disble the spill merge for any spill which need 2 instructions. Differential Revision: https://reviews.llvm.org/D93898	2021-01-11 18:35:09 +08:00
Hsiangkai Wang	5e476061de	[NFC][AsmPrinter] Make comments for spill/reload more precise. The size of spill/reload may be unknown for scalable vector types. When the size is unknown, print it as "Unknown-size" instead of a very large number. Differential Revision: https://reviews.llvm.org/D94299	2021-01-11 15:00:27 +08:00
QingShan Zhang	7539c75bb4	[DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent. As the direction is to remove this global option, we need to remove the unsafe-fp-math check for sqrt and update the test with afn fast-math flags. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D93891	2021-01-11 02:25:53 +00:00
Kazu Hirata	407b1e65a4	[StringExtras] Add a helper class for comma-separated lists This patch introduces a helper class SubsequentDelim to simplify loops that generate a comma-separated lists. For example, consider the following loop, taken from llvm/lib/CodeGen/MachineBasicBlock.cpp: for (auto I = pred_begin(), E = pred_end(); I != E; ++I) { if (I != pred_begin()) OS << ", "; OS << printMBBReference(I); } The new class allows us to rewrite the loop as: SubsequentDelim SD; for (auto I = pred_begin(), E = pred_end(); I != E; ++I) OS << SD << printMBBReference(I); where SD evaluates to the empty string for the first time and ", " for subsequent iterations. Unlike interleaveComma, defined in llvm/include/llvm/ADT/STLExtras.h, SubsequentDelim can accommodate a wider variety of loops, including: - those that conditionally skip certain items, - those that need iterators to call getSuccProbability(I), and - those that iterate over integer ranges. As an example, this patch cleans up MachineBasicBlock::print. Differential Revision: https://reviews.llvm.org/D94377	2021-01-10 14:32:02 -08:00
Kazu Hirata	e3d3dbd339	[llvm] Ensure newlines at the end of files (NFC) This patch eliminates pesky "No newline at end of file" messages from git diff.	2021-01-10 09:24:57 -08:00
Kazu Hirata	9850d3b10a	[CodeGen, DebugInfo] Use llvm::find_if (NFC)	2021-01-10 09:24:53 -08:00
Juneyoung Lee	9f2d9364b0	[CodeGen] Update transformations to use poison for shufflevector/insertelem's initial vector elem This patch is a part of D93817 and makes transformations in CodeGen use poison for shufflevector/insertelem's initial vector element. The change in CodeGenPrepare.cpp is fine because the mask of shufflevector should be always zero. It doesn't touch the second element (which is poison). The change in InterleavedAccessPass.cpp is also fine becauses the mask is of the form <a, a+m, a+2m, .., a+km> where a+km is smaller than the size of the first vector operand. This is guaranteed by the caller of replaceBinOpShuffles, which is lowerInterleavedLoad. It calls isDeInterleaveMask and isDeInterleaveMaskOfFactor to check the mask is the desirable form. isDeInterleaveMask has the check that a+km is smaller than the vector size. To check my understanding, I added an assertion & added a test to show that this optimization doesn't fire in such case. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D94056	2021-01-10 18:03:51 +09:00
Fraser Cormack	41d06095b0	[SelectionDAG] Teach isConstOrConstSplat about ISD::SPLAT_VECTOR This improves llvm::isConstOrConstSplat by allowing it to analyze ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of operations using scalable vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94168	2021-01-09 20:54:34 +00:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Fraser Cormack	de373ef779	[SelectionDAG] Extend immAll(Ones\|Zeros)V to handle ISD::SPLAT_VECTOR The TableGen immAllOnesV and immAllZerosV helpers implicitly wrapped the ISD::isBuildVectorAll(Ones\|Zeros) helper functions. This was inhibiting their use for targets such as RISC-V which use ISD::SPLAT_VECTOR. In particular, RISC-V had to define its own 'vnot' fragment. In order to extend the scope of these nodes to include support for ISD::SPLAT_VECTOR, two new ISD predicate functions have been introduced: ISD::isConstantSplatVectorAll(Ones\|Zeros). These effectively supersede the older "isBuildVector" predicates, which are now simple wrappers for the new functions. They pass a defaulted boolean toggle which preserves the old behaviour. It is hoped that in time all call-sites can be ported to the "isConstantSplatVector" functions. While the use of ISD::isBuildVectorAll(Ones\|Zeros) has not changed, the behaviour of the TableGen immAll(Ones\|Zeros)V has. To test the new functionality, the custom RISC-V TableGen fragment has been removed and replaced with the built-in 'vnot'. To test their use as pattern-roots, two splat patterns have been updated accordingly. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94223	2021-01-09 17:05:31 +00:00
Heejin Ahn	52e240a072	[WebAssembly] Remove exnref and br_on_exn This removes `exnref` type and `br_on_exn` instruction. This is effectively NFC because most uses of these were already removed in the previous CLs. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94041	2021-01-09 02:02:54 -08:00
Heejin Ahn	9e4eadeb13	[WebAssembly] Update basic EH instructions for the new spec This implements basic instructions for the new spec. - Adds new versions of instructions: `catch`, `catch_all`, and `rethrow` - Adds support for instruction selection for the new instructions - `catch` needs a custom routine for the same reason `throw` needs one, to encode `__cpp_exception` tag symbol. - Updates `WebAssembly::isCatch` utility function to include `catch_all` and Change code that compares an instruction's opcode with `catch` to use that function. - LateEHPrepare - Previously in LateEHPrepare we added `catch` instruction to both `catchpad`s (for user catches) and `cleanuppad`s (for destructors). In the new version `catch` is generated from `llvm.catch` intrinsic in instruction selection phase, so we only need to add `catch_all` to the beginning of cleanup pads. - `catch` is generated from instruction selection, but we need to hoist the `catch` instruction to the beginning of every EH pad, because `catch` can be in the middle of the EH pad or even in a split BB from it after various code transformations. - Removes `addExceptionExtraction` function, which was used to generate `br_on_exn` before. - CFGStackfiy: Deletes `fixUnwindMismatches` function. Running this function on the new instruction causes crashes, and the new version will be added in a later CL, whose contents will be completely different. So deleting the whole function will make the diff easier to read. - Reenables all disabled tests in exception.ll and eh-lsda.ll and a single basic test in cfg-stackify-eh.ll. - Updates existing tests to use the new assembly format. And deletes `br_on_exn` instructions from the tests and FileCheck lines. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94040	2021-01-09 01:48:06 -08:00
Heejin Ahn	9724c3cff4	[WebAssembly] Update WasmEHPrepare for the new spec Clang generates `wasm.get.exception` and `wasm.get.ehselector` intrinsics, which respectively return a caught exception value (a pointer to some C++ exception struct) and a selector (an integer value that tells which C++ `catch` clause the current exception matches, or does not match any). WasmEHPrepare is a pass that does some IR-level preparation before instruction selection. Previously one of things we did in this pass was to convert `wasm.get.exception` intrinsic calls to `wasm.extract.exception` intrinsics. Their semantics were the same except `wasm.extract.exception` did not have a token argument. We maintained these two separate intrinsics with the same semantics because instruction selection couldn't handle token arguments. This `wasm.extract.exception` intrinsic was later converted to `extract_exception` instruction in instruction selection, which was a pseudo instruction to implement `br_on_exn`. Because `br_on_exn` pushed an extracted value onto the value stack after the `end` instruction of a `block`, but LLVM does not have a way of modeling that kind of behavior, so this pseudo instruction was used to pull an extracted value out of thin air, like this: ``` block $l0 ... br_on_exn $cpp_exception $l0 ... end extract_exception ;; pushes values onto the stack ``` In the new spec, we don't need this pseudo instruction anymore because `catch` itself returns a value and we don't have `br_on_exn` anymore. In the spec `catch` returns multiple values (like `br_on_exn`), but here we assume it only returns a single i32, which is sufficient to support C++. So this renames `wasm.get.exception` intrinsic to `wasm.catch`. Because this CL does not yet contain instruction selection for `wasm.catch` intrinsic, all `RUN` lines in exception.ll, eh-lsda.ll, and cfg-stackify-eh.ll, and a single `RUN` line in wasm-eh.cpp (which is an end-to-end test from C++ source to assembly) fail. So this CL temporarily disables those `RUN` lines, and for those test files without any valid remaining `RUN` lines, adds a dummy `RUN` line to make them pass. These tests will be reenabled in later CLs. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94039	2021-01-08 23:38:26 -08:00
Kazu Hirata	b7c5e0b02c	[Target, Transforms] Use *Set::contains (NFC)	2021-01-08 18:39:54 -08:00
Heejin Ahn	7be271537e	[WebAssembly] Rename wasm_rethrow_in_catch intrinsic/builtin `wasm_rethrow_in_catch` intrinsic and builtin are used in order to rethrow an exception when the exception is caught but there is no matching clause within the current `catch`. For example, ``` try { foo(); } catch (int n) { ... } ``` If the caught exception does not correspond to C++ `int` type, it should be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch` because at the time I thought there would be another intrinsic for C++'s `throw` keyword, which rethrows an exception. It turned out that `throw` keyword doesn't require wasm's `rethrow` instruction, so we rename `rethrow_in_catch` to just `rethrow` here. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D94038	2021-01-08 06:55:04 -08:00
Simon Moll	611d3c63f3	[VP] ISD helper functions [VE] isel for vp_add, vp_and This implements vp_add, vp_and for the VE target by lowering them to the VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode, getVPMaskIdx, getVPExplicitVectorLengthIdx). Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D93766	2021-01-08 14:29:45 +01:00
Sjoerd Meijer	8af859d514	[MachineLoop] New helper isLoopInvariant() This factors out code from MachineLICM that determines whether an instruction is loop-invariant, which is a generally useful function. Thus this allows to use that helper elsewhere too. Differential Revision: https://reviews.llvm.org/D94082	2021-01-08 09:04:56 +00:00
Christudasan Devadasan	ae25a397e9	AMDGPU/GlobalISel: Enable sret demotion	2021-01-08 10:56:35 +05:30
Kazu Hirata	8febb2e0f5	[CodeGen] Remove unused function isCallerPreservedOrConstPhysReg (NFC) The last use of the function was removed on Oct 20, 2018 in commit `8d6ff4c0af`.	2021-01-07 20:29:32 -08:00
Matt Arsenault	2cbbc6e87c	GlobalISel: Fail legalization on narrowing extload below memory size	2021-01-07 17:40:34 -05:00
Matt Arsenault	1f9b6ef91f	GlobalISel: Add combine for G_UREM by power of 2 Really I want this in the legalizer, but this is a start.	2021-01-07 16:36:35 -05:00
Wouter van Oortmerssen	5c38ae36c5	[WebAssembly] Fixed byval args missing DWARF DW_AT_LOCATION A struct in C passed by value did not get debug information. Such values are currently lowered to a Wasm local even in -O0 (not to an alloca like on other archs), which becomes a Target Index operand (TI_LOCAL). The DWARF writing code was not emitting locations in for TI's specifically if the location is a single range (not a list). In addition, the ExplicitLocals pass which removes the ARGUMENT pseudo instructions did not update the associated DBG_VALUEs, and couldn't even find these values since the code assumed such instructions are adjacent, which is not the case here. Also fixed asm printing of TIs needed by a test. Differential Revision: https://reviews.llvm.org/D94140	2021-01-07 10:31:38 -08:00
Matt Arsenault	c9122ddef5	CodeGen: Refactor regallocator command line and target selection Make the sequence of passes to select and rewrite instructions to physical registers be a target callback. This is to prepare to allow targets to split register allocation into multiple phases.	2021-01-07 13:13:25 -05:00
Simon Pilgrim	350ab7aa1c	[DAG] Simplify OR(X,SHL(Y,BW/2)) eq/ne 0/-1 'all/any-of' style patterns Attempt to simplify all/any-of style patterns that concatenate 2 smaller integers together into an and(x,y)/or(x,y) + icmp 0/-1 instead. This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers. Differential Revision: https://reviews.llvm.org/D93599	2021-01-07 12:03:19 +00:00
Sanjoy Das	a855c9403f	[NFC] Don't copy MachineFrameInfo on each invocation of HasAlias Also fix a typo in a comment. This fixes a compile time issue in XLA (https://www.tensorflow.org/xla). Differential Revision: https://reviews.llvm.org/D94182	2021-01-06 18:59:20 -08:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Simon Pilgrim	1307e3f6c4	[TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support.	2021-01-06 16:13:51 +00:00
Sander de Smalen	aa280c99f7	[AArch64][SVE] Emit DWARF location expr for SVE (dbg.declare) When using dbg.declare, the debug-info is generated from a list of locals rather than through DBG_VALUE instructions in the MIR. This patch is different from D90020 because it emits the DWARF location expressions from that list of locals directly. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D90044	2021-01-06 11:45:05 +00:00
Sander de Smalen	84a1120943	[LiveDebugValues] Handle spill locations with a fixed and scalable component. This patch fixes the two LiveDebugValues implementations (InstrRef/VarLoc)Based to handle cases where the StackOffset contains both a fixed and scalable component. This depends on the `TargetRegisterInfo::prependOffsetExpression` being added in D90020. Feel free to leave comments on that patch if you have them. Reviewed By: djtodoro, jmorse Differential Revision: https://reviews.llvm.org/D90046	2021-01-06 11:30:13 +00:00
Sander de Smalen	a7e3339f3b	[AArch64][SVE] Emit DWARF location expression for SVE stack objects. Extend PEI to emit a DWARF expression for StackOffsets that have a fixed and scalable component. This means the expression that needs to be added is either: <base> + offset or: <base> + offset + scalable_offset * scalereg where for SVE, the scale reg is the Vector Granule Dwarf register, which encodes the number of 64bit 'granules' in an SVE vector and which the debugger can evaluate at runtime. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D90020	2021-01-06 09:40:53 +00:00
Kazu Hirata	cea1c63756	[MachineSink] Construct SmallVector with iterator ranges (NFC)	2021-01-05 21:15:57 -08:00
Christudasan Devadasan	d68458bd56	[GlobalISel] Base implementation for sret demotion. If the return values can't be lowered to registers SelectionDAG performs the sret demotion. This patch contains the basic implementation for the same in the GlobalISel pipeline. Furthermore, targets should bring relevant changes during lowerFormalArguments, lowerReturn and lowerCall to make use of this feature. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92953	2021-01-06 10:30:50 +05:30
David Blaikie	ad18b075fd	DebugInfo: Add support for always using ranges (rather than low/high pc) in DWARFv5 Given the ability provided by DWARFv5 rnglists to reuse addresses in the address pool, it can be advantageous to object file size to use range encodings even when the range could be described by a direct low/high pc. Add a flag to allow enabling this in DWARFv5 for the purpose of experimentation/data gathering. It might be that it makes sense to enable this functionality by default for DWARFv5 + Split DWARF at least, where the tradeoff/desire to optimize for .o file size is more explicit and .o bytes are higher priority than .dwo bytes.	2021-01-05 16:36:22 -08:00
Craig Topper	4ef91f5871	[DAGCombiner] Don't speculatively create an all ones constant in visitREM that might not be used. This looks to have been done to save some duplicated code under two different if statements, but it ends up being harmful to D94073. This speculative constant can be called on a scalable vector type with i64 element size when i64 scalars aren't legal. The code tries and fails to find a vector type with i32 elements that it can use. So only create the node when we know it will be used.	2021-01-05 12:45:57 -08:00
Matt Arsenault	a427f15d60	GlobalISel: Add isKnownToBeAPowerOfTwo helper function	2021-01-05 12:59:08 -05:00
Jinsong Ji	f26bc0ddd5	[RegisterClassInfo] Return non-zero for RC without allocatable reg In some case, the RC may have 0 allocatable reg. eg: VRSAVERC in PowerPC, which has only 1 reg, but it is also reserved. The curreent implementation will keep calling the computePSetLimit because getRegPressureSetLimit assume computePSetLimit will return a non-zero value. The fix simply early return the value from TableGen for such special case. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92907	2021-01-05 16:18:34 +00:00
Fraser Cormack	9a1ac97d3a	[CodeGen] Format SelectionDAG::getConstant methods (NFC)	2021-01-05 12:59:46 +00:00
QingShan Zhang	2962f1149c	[NFC] Add the getSizeInBytes() interface for MachineConstantPoolValue Current implementation assumes that, each MachineConstantPoolValue takes up sizeof(MachineConstantPoolValue::Ty) bytes. For PowerPC, we want to lump all the constants with the same type as one MachineConstantPoolValue to save the cost that calculate the TOC entry for each const. So, we need to extend the MachineConstantPoolValue that break this assumption. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89108	2021-01-05 03:22:45 +00:00
Roman Lebedev	b4f519bddd	[NFCI] DwarfEHPrepare: update DomTree in non-permissive mode, when present Being stricter will catch issues that would be just papered over in permissive mode, and is likely faster.	2021-01-05 01:26:36 +03:00
Cameron McInally	92be640bd7	[FPEnv][AMDGPU] Disable FSUB(-0,X)->FNEG(X) DAGCombine when subnormals are flushed This patch disables the FSUB(-0,X)->FNEG(X) DAG combine when we're flushing subnormals. It requires updating the existing AMDGPU tests to use the fneg IR instruction, in place of the old fsub(-0,X) canonical form, since AMDGPU is the only backend currently checking the DenormalMode flags. Note that this will require follow-up optimizations to make sure the FSUB(-0,X) form is handled appropriately Differential Revision: https://reviews.llvm.org/D93243	2021-01-04 14:44:10 -06:00
Kazu Hirata	eb198f4c3c	[llvm] Use llvm::any_of (NFC)	2021-01-04 11:42:47 -08:00
Florian Hahn	ed936aad78	[InterleavedAccess] Return correct 'modified' status. Both tryReplaceExtracts and replaceBinOpShuffles may modify the IR, even if no interleaved loads are generated, but currently the pass pretends no changes were made. This patch updates the pass to return true if either of the functions made any changes. In case of tryReplaceExtracts, changes are made if there are any Extracts and true is returned. `replaceBinOpShuffles` always makes changes if BinOpShuffles is not empty. It also always returned true, so I went ahead and change it to just `replaceBinOpShuffles`. Fixes PR48208. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D93997	2021-01-04 15:49:47 +00:00
Kazu Hirata	ba82c0b315	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-01-03 09:57:47 -08:00
Brandon Bergren	8f004471c2	[PowerPC] Add the LLVM triple for powerpcle [1/5] Add a triple for powerpcle--. This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations: 1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs. Such a loader is implemented as a freestanding ELF32 LSB binary. 2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls. 3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D93918	2021-01-02 12:17:22 -06:00
Kazu Hirata	171c5fd43e	[llvm] Use llvm::erase_value and llvm::erase_if (NFC)	2021-01-02 09:24:15 -08:00
Roman Lebedev	f4ea21947d	[NFCI][CodeGen] DwarfEHPrepare: don't actually pass DTU into simplifyCFG by default also, don't verify DomTree unless we intend to maintain it. This is a very dumb think-o, i guess i was even warned about it by subconsciousness in 4b80647367950ba3da6a08260487fd0dbc50a9c5's commit message.. Fixes a compile-time regression reported by Martin Storsjö in post-commit review of `2461cdb417`.	2021-01-02 14:38:52 +03:00
Yang Fan	471dec3801	[CodeGen][NFC] Fix a build warning due to an extra semicolon	2021-01-02 10:42:58 +08:00
Roman Lebedev	2461cdb417	[CodeGen][SimplifyCFG] Teach DwarfEHPrepare to preserve DomTree Once the default for SimplifyCFG flips, we can no longer pass nullptr instead of DomTree to SimplifyCFG, so we need to propagate it here. We don't strictly need to actually preserve DomTree in DwarfEHPrepare, but we might as well do it, since it's trivial.	2021-01-02 01:01:19 +03:00
Roman Lebedev	e6b1a27fb9	[NFC][CodeGen] Split DwarfEHPrepare pass into an actual transform and an legacy-PM wrapper This is consistent with the layout of other passes, and simplifies further refinements regarding DomTree handling. This is indended to be a NFC commit.	2021-01-02 01:01:19 +03:00
Roman Lebedev	c38739ad8f	[NFC] clang-format the entire DwarfEHPrepare.cpp	2021-01-02 01:01:19 +03:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seem like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns that we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Fangrui Song	d1fd72343c	Refactor how -fno-semantic-interposition sets dso_local on default visibility external linkage definitions The idea is that the CC1 default for ELF should set dso_local on default visibility external linkage definitions in the default -mrelocation-model pic mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar. The refactoring is made available by `2820a2ca3a`. Currently only x86 supports local aliases. We move the decision to the driver. There are three CC1 states: * -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable. * (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local * -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out. Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.	2020-12-31 13:59:45 -08:00
Juneyoung Lee	5cdf6ed744	[CodeGen] recognize select form of and/ors when splitting branch conditions Recently a few patches are made to move towards using select i1 instead of and/or i1 to represent "a && b"/"a \|\| b" in C/C++. "a && b" in C/C++ does not evaluate b if a is false whereas 'and a, b' in IR evaluates b and uses its result regardless of the result of a. This is problematic because it can cause miscompilation if b was an erroneous operation (https://llvm.org/pr48353). In C/C++, the result is simply false because b is not evaluated, but in IR the result is poison. The discussion at D93065 has more context about this. This patch makes two branch-splitting optimizations (one in SelectionDAGBuilder, one in CodeGenPrepare) recognize select form of and/or as well using m_LogicalAnd/Or. Since it is CodeGen, I think this is semantically ok (at least as safe as what codegen already did). Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93853	2021-01-01 04:46:10 +09:00
Kazu Hirata	7bc76fd0ec	[CodeGen] Construct SmallVector with iterator ranges (NFC)	2020-12-31 09:39:11 -08:00
Fangrui Song	f731839584	[LowerEmuTls] Copy dso_local from <var> to __emutls_v.<var> This effect is not testable until we drop the implied dso_local for ELF static/PIE defined symbols from TargetMachine::shouldAssumeDSOLocal.	2020-12-30 16:11:32 -08:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. It can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depend on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Yuanfang Chen	480936e741	Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" (again) This reverts commit `16c8f6e913` with fix. -Wswitch catched an unhandled enum value due to recent commits in TargetPassConfig.cpp.	2020-12-29 16:39:55 -08:00
Yuanfang Chen	16c8f6e913	Revert "Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline"" This reverts commit `21314940c4`. Build failure in some bots.	2020-12-29 16:29:07 -08:00
Yuanfang Chen	21314940c4	Reland "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" This reverts commit `94427af60c` (relands `4646de5d75` with fix). Use "return std::move(AsmStreamer);" instead of "return AsmStreamer;" in LVMTargetMachine::createMCStreamer. Unlike Clang, GCC seems having trouble inserting a implicit lvalue->rvalue conversion.	2020-12-29 15:17:23 -08:00
Kazu Hirata	1e3ed09165	[CodeGen] Use llvm::append_range (NFC)	2020-12-28 19:55:16 -08:00
Yuanfang Chen	94427af60c	Revert "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" This reverts commit `4646de5d75`. Some bots have build failure.	2020-12-28 17:44:22 -08:00
Yuanfang Chen	4646de5d75	[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline Following up on D67687. Please refer to the RFC here http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html `CodeGenPassBuilder` is the NPM counterpart of `TargetPassConfig` with below differences. - Debugging features (MIR print/verify, disable pass, start/stop-before/after, etc.) living in `TargetPassConfig` are moved to use PassInstrument as much as possible. (Implementation also lives in `TargetPassConfig.cpp`) - `TargetPassConfig` is a polymorphic base (virtual inheritance) to build the target-dependent pipeline whereas `CodeGenPassBuilder` is the CRTP base/helper to implement the target-dependent pipeline. The motivation is flexibility for targets to customize the pipeline, inlining opportunity, and fits the overall NPM value semantics design. - `TargetPassConfig` is a legacy immutable pass to declare hooks for targets to customize some target-independent codegen layer behavior. This is partially ported to TargetMachine::options. The rest, such as `createMachineScheduler/createPostMachineScheduler`, are left out for now. They should be implemented in LLVMTargetMachine in the future. Reviewed By: arsenm, aeubanks Differential Revision: https://reviews.llvm.org/D83608	2020-12-28 17:36:36 -08:00
Gabriel Hjort Åkerlund	b9a7c89d43	[MIRPrinter] Fix incorrect output of unnamed stack names The MIRParser expects unnamed stack entries to have empty names (''). In case of unnamed alloca instructions, the MIRPrinter would output '<unnamed alloca>', which caused the MIRParser to reject the generated code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D93685	2020-12-28 18:01:40 +01:00
Chen Zheng	31c2b93d83	[MachineSink] add threshold in machinesink pass to reduce compiling time.	2020-12-27 23:23:07 -05:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00
Amara Emerson	7df3544e80	[GlobalISel] Fix assertion failures after "GlobalISel: Return APInt from getConstantVRegVal" landed. APInt binary ops don't promote types but instead assert, which a combine was relying on.	2020-12-26 23:51:44 -08:00
Kazu Hirata	e457896a6e	[CodeGen] Remove unused function hasInlineAsmMemConstraint (NFC) The last use of the function was removed on Sep 13, 2010 in commit `1094c80281`.	2020-12-24 09:17:58 -08:00
Kazu Hirata	df812115e3	[CodeGen, Transforms] Use llvm::any_of (NFC)	2020-12-24 09:08:36 -08:00
Evgeniy Brevnov	e0751234ef	[CodeGen] Add "noreturn" attirbute to _Unwind_Resume Currently 'resume' is lowered to _Unwind_Resume with out "noreturn" attribute. Semantically _Unwind_Resume library call is expected to never return and should be marked as such. Though I didn't find any changes in behavior of existing tests there will be a difference once https://reviews.llvm.org/D79485 lands. I was not able to come up with the test case anything better than just checking for presence of "noreturn" attribute. Please let me know if there is a better way to test the change. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D93682	2020-12-24 18:14:18 +07:00
Layton Kifer	d29f93bda5	[DAGCombiner] Don't create sexts of deleted xors when they were in-visit replaced Fixes a bug introduced by D91589. When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If we replace the not in-visit, then the now invalidated node will be returned, and subsequently we will return an invalid sext. In cases where the not is replaced in-visit we can simply return SDValue, as the not in the current sext should have already been replaced. Thanks @jgorbe, for finding the below reproducer. The following reduced test case crashes clang when built with `clang -O1 -frounding-math`: ``` template <class> class a { int b() { return c == 0.0 ? 0 : -1; } int c; }; template class a<long>; ``` A debug build of clang produces this "assertion failed" error: ``` clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm:: SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed. ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93274	2020-12-23 16:16:26 -08:00
Sriraman Tallam	34e70d722d	Append ".__part." to every basic block section symbol. Every basic block section symbol created by -fbasic-block-sections will contain ".__part." to know that this symbol corresponds to a basic block fragment of the function. This patch solves two problems: a) Like D89617, we want function symbols with suffixes to be properly qualified so that external tools like profile aggregators know exactly what this symbol corresponds to. b) The current basic block naming just adds a ".N" to the symbol name where N is some integer. This collides with how clang creates __cxx_global_var_init.N. clang creates these symbol names to call constructor functions and basic block symbol naming should not use the same style. Fixed all the test cases and added an extra test for __cxx_global_var_init breakage. Differential Revision: https://reviews.llvm.org/D93082	2020-12-23 11:35:44 -08:00
Matt Arsenault	581d13f8ae	GlobalISel: Return APInt from getConstantVRegVal Returning int64_t was arbitrarily limiting for wide integer types, and the functions should handle the full generality of the IR. Also changes the full form which returns the originally defined vreg. Add another wrapper for the common case of just immediately converting to int64_t (arguably this would be useful for the full return value case as well). One possible issue with this change is some of the existing uses did break without conversion to getConstantVRegSExtVal, and it's possible some without adequate test coverage are now broken.	2020-12-22 22:23:58 -05:00
Matt Arsenault	5bec082834	VirtRegMap: Use Register	2020-12-22 20:56:14 -05:00
Arthur O'Dwyer	22cf54a7fb	Replace `T(x)` with `reinterpret_cast<T>(x)` everywhere it means reinterpret_cast. NFC. Differential Revision: https://reviews.llvm.org/D76572	2020-12-22 19:54:29 -05:00
Sjoerd Meijer	9a6de74d5a	[MachineLICM] Add llvm debug messages to SinkIntoLoop. NFC. I am investigating sinking instructions back into the loop under high register pressure. This is just a first NFC step to add some debug messages that allows tracing of the decision making.	2020-12-22 09:19:43 +00:00
Pavel Labath	8d75d902a9	[DebugInfo] Don't use DW_OP_implicit_value for fragments Currently using DW_OP_implicit_value in fragments produces invalid DWARF expressions. (Such a case can occur in complex floats, for example.) This problem manifests itself as a missing DW_OP_piece operation after the last fragment. This happens because the function for printing constant float value skips printing the accompanying DWARF expression, as that would also print DW_OP_stack_value (which is not desirable in this case). However, this also results in DW_OP_piece being skipped. The reason that DW_OP_piece is missing only for the last piece is that the act of printing the next fragment corrects this. However, it does that for the wrong reason -- the code emitting this DW_OP_piece thinks that the previous fragment was missing, and so it thinks that it needs to skip over it in order to be able to print itself. In a simple scenario this works out, but it's likely that in a more complex setup (where some pieces are in fact missing), this logic would go badly wrong. In a simple setup gdb also seems to not mind the fact that the DW_OP_piece is missing, but it would also likely not handle more complex use cases. For this reason, this patch disables the usage of DW_OP_implicit_value in the frament scenario (we will use DW_OP_const*** instead), until we figure out the right way to deal with this. This guarantees that we produce valid expressions, and gdb can handle both kinds of inputs anyway. Differential Revision: https://reviews.llvm.org/D92013	2020-12-22 10:07:47 +01:00
Bing1 Yu	e8ade4569b	[LegalizeType] When LegalizeType procedure widens a masked_gather, set MemoryType's EltNum equal to Result's EltNum When LegalizeType procedure widens a masked_gather, set MemoryType's EltNum equal to Result's EltNum. As I mentioned in https://reviews.llvm.org/D91092, in previous code, If we have a v17i32's masked_gather in avx512, we widen it to a v32i32's masked_gather with a v17i32's MemoryType. When the SplitVecRes_MGATHER process this v32i32's masked_gather, GetSplitDestVTs will assert fail since what you are going to split is v17i32. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D93610	2020-12-22 13:27:38 +08:00
Fangrui Song	d9a0c40bce	[MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the intention clearer and matches the direction of reverted r240130 (to drop the unneeded parameters). No behavior change.	2020-12-21 14:04:13 -08:00
Denis Antrushin	6f45049fb6	[Statepoints] Disable VReg lowering for values used on exception path of invoke. Currently we lower invokes the same way as usual calls, e.g.: V1 = STATEPOINT ... V (tied-def 0) But this is incorrect is V1 is used on exceptional path. By LLVM rules V1 neither dominates its uses in landing pad, nor its live range is live on entry to landing pad. So compiler is allowed to do various weird transformations like splitting live range after statepoint and use split LR in catch block. Until (and if) we find better solution to this problem, let's use old lowering (spilling) for those values which are used on exceptional path and allow VReg lowering for values used only on normal path. Differential Revision: https://reviews.llvm.org/D93449	2020-12-21 20:27:05 +07:00
Fangrui Song	1635dea266	[AsmPrinter] Replace a reachable report_fatal_error with MCContext::reportError	2020-12-20 23:45:49 -08:00
Pushpinder Singh	e2303a448e	[FastRA] Fix handling of bundled MIs Fast register allocator skips bundled MIs, as the main assignment loop uses MachineBasicBlock::iterator (= MachineInstrBundleIterator) This was causing SIInsertWaitcnts to crash which expects all instructions to have registers assigned. This patch makes sure to set everything inside bundle to the same assignments done on BUNDLE header. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D90369	2020-12-21 02:10:55 -05:00
Chen Zheng	4dce7c2e20	[MachineLICM] delete dead flag if the duplicated def outside of loop is dead. Fixup dead flags for CSE-ed instructions. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92557	2020-12-20 19:26:22 -05:00
Kazu Hirata	3285ee143b	[Analysis, IR, CodeGen] Use llvm::erase_if (NFC)	2020-12-20 09:19:35 -08:00
Kazu Hirata	805d59593f	[Analysis, CodeGen, IR] Use contains (NFC)	2020-12-18 19:08:17 -08:00
Chih-Ping Chen	5f75dcf571	[DebugInfo] Support Fortran 'use <external module>' statement. The main change is to add a 'IsDecl' field to DIModule so that when IsDecl is set to true, the debug info entry generated for the module would be marked as a declaration. That way, the debugger would look up the definition of the module in the gloabl scope. Please see the comments in llvm/test/DebugInfo/X86/dimodule.ll for what the debug info entries would look like. Differential Revision: https://reviews.llvm.org/D93462	2020-12-18 13:10:57 -05:00
Craig Blackmore	698ae90f30	[RegisterScavenging] Fix assert in scavengeRegisterBackwards According to the documentation, if a spill is required to make a register available and AllowSpill is false, then NoRegister should be returned, however, this scenario was actually triggering an assertion failure. This patch moves the assertion after the handling of AllowSpill. Authored by: Lewis Revill Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D92104	2020-12-18 16:57:05 +00:00
Matt Arsenault	fd0f5fb8de	PEI: Only call updateLiveness once per function This only needs to be called once for the function, and it visits all the necessary blocks in the function. It looks like `631f6b888c` accidentally moved this into the loop over all save blocks.	2020-12-18 11:02:28 -05:00
Bjorn Pettersson	a89d751fb4	Add intrinsics for saturating float to int casts This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ The intrinsics have overloaded source and result type and support vector operands: i32 @llvm.fptoui.sat.i32.f32(float %f) i100 @llvm.fptoui.sat.i100.f64(double %f) <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f) // etc On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example: i19 @llvm.fptsi.sat.i19.f32(float %f) // builds i19 fp_to_sint_sat f, 19 // type legalizes (through integer result promotion) i32 fp_to_sint_sat f, 19 I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width. There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like: f = fmaxnum f, FP(MIN) f = fminnum f, FP(MAX) i = fptoxi f i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN If the bounds cannot be exactly represented, we expand to something like this instead: i = fptoxi f i = select f ult FP(MIN), MIN, i i = select f ogt FP(MAX), MAX, i i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN It should be noted that this expansion assumes a non-trapping fptoxi. Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening. Original patch by @nikic and @ebevhan (based on D54696). Differential Revision: https://reviews.llvm.org/D54749	2020-12-18 11:09:41 +01:00
Rong Xu	3733463dbb	[IR][PGO] Add hot func attribute and use hot/cold attribute in func section Clang FE currently has hot/cold function attribute. But we only have cold function attribute in LLVM IR. This patch adds support of hot function attribute to LLVM IR. This attribute will be used in setting function section prefix/suffix. Currently .hot and .unlikely suffix only are added in PGO (Sample PGO) compilation (through isFunctionHotInCallGraph and isFunctionColdInCallGraph). This patch changes the behavior. The new behavior is: (1) If the user annotates a function as hot or isFunctionHotInCallGraph is true, this function will be marked as hot. Otherwise, (2) If the user annotates a function as cold or isFunctionColdInCallGraph is true, this function will be marked as cold. The changes are: (1) user annotated function attribute will used in setting function section prefix/suffix. (2) hot attribute overwrites profile count based hotness. (3) profile count based hotness overwrite user annotated cold attribute. The intention for these changes is to provide the user a way to mark certain function as hot in cases where training input is hard to cover all the hot functions. Differential Revision: https://reviews.llvm.org/D92493	2020-12-17 18:41:12 -08:00
Layton Kifer	385e9a2a04	[DAGCombiner] Improve shift by select of constant Clean up a TODO, to support folding a shift of a constant by a select of constants, on targets with different shift operand sizes. Reviewed By: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D90349	2020-12-18 02:21:42 +00:00
dfukalov	9ed8e0caab	[NFC] Reduce include files dependency and AA header cleanup (part 2). Continuing work started in https://reviews.llvm.org/D92489: Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h". Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92852	2020-12-17 14:04:48 +03:00

1 2 3 4 5 ...

30139 Commits