llvm-project

Commit Graph

Author	SHA1	Message	Date
Dávid Bolvanský	49de6070a2	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `435785214f`. Still the same compile-time issues for -O0 -g, e.g. +1.3% for sqlite3.	2021-08-15 11:44:13 +02:00
Anshil Gandhi	435785214f	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-14 23:37:23 -06:00
Anshil Gandhi	29e11a1aa3	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `c4e5425aa5`.	2021-08-13 23:58:04 -06:00
Anshil Gandhi	c4e5425aa5	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpandPass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-13 22:44:08 -06:00
Jessica Paquette	50efbf9cbe	[GlobalISel] Narrow binops feeding into G_AND with a mask This is a fairly common pattern: ``` %mask = G_CONSTANT iN <mask val> %add = G_ADD %lhs, %rhs %and = G_AND %add, %mask ``` We have combines to eliminate G_AND with a mask that does nothing. If we combined the above to this: ``` %mask = G_CONSTANT iN <mask val> %narrow_lhs = G_TRUNC %lhs %narrow_rhs = G_TRUNC %rhs %narrow_add = G_ADD %narrow_lhs, %narrow_rhs %ext = G_ZEXT %narrow_add %and = G_AND %ext, %mask ``` We'd be able to take advantage of those combines using the trunc + zext. For this to work (or be beneficial in the best case) - The operation we want to narrow then widen must only be used by the G_AND - The G_TRUNC + G_ZEXT must be free - Performing the operation at a narrower width must not produce a different value than performing it at the original width after masking. Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign. Differential Revision: https://reviews.llvm.org/D107929	2021-08-13 18:31:13 -07:00
Matt Arsenault	cc56152f83	GlobalISel: Add helper function for getting EVT from LLT This can only give an imperfect approximation, but is enough to avoid crashing in places where we call into EVT functions starting from LLTs.	2021-08-13 21:10:13 -04:00
Arthur Eubanks	f80ae58068	[NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex) getAttribute() is confusing, use a clearer method.	2021-08-13 16:27:11 -07:00
Arthur Eubanks	d7593ebaee	[NFC] Clean up users of AttributeList::hasAttribute() AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.	2021-08-13 11:59:18 -07:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Ruiling Song	e1beebbac5	SplitKit: Don't further split subrange mask in buildCopy We may use several COPY instructions to copy the needed sub-registers during split. But the way we split the lanes during the COPYs may be different from the subranges of the old register. This would fail when we extend the subranges of the new register because the LaneMasks do not match exactly between subranges of new register and old register. Since we are bundling the COPYs, I think there is no need to further refine the subranges of the new register based on the set of LaneMasks of the inserted COPYs. I am not sure if there will be further breaking cases. But as the subranges of new register are created based on the LaneMasks of the subranges of old register, it will be highly possible we will always find an exact LaneMask match. We can think about how to make the extendPHIKillRanges() work for subrange mask mismatch case if we meet more such cases in the future. The test case was from D105065 by @arsenm. Differential Revision: https://reviews.llvm.org/D107829	2021-08-13 07:36:38 +08:00
Rong Xu	4c5909ba83	[SampleFDO] Add two passes of MIRAddFSDiscriminatorsPass This patch adds Pass1 of MIRAddFSDiscriminatorsPass before register allocation, and Pass2 of MIRAddFSDiscriminatorsPass before Block-Placement. This is still under the --enable-fs-discriminator option (default false). This would reduce the turn-around time for FSAFDO transition. Differential Revision: https://reviews.llvm.org/D104579	2021-08-11 11:11:04 -07:00
Fraser Cormack	885be620f9	[LegalizeTypes][NFC] Remove else-after-return Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107890	2021-08-11 16:48:28 +01:00
Rainer Orth	7bbbf29561	[ELF] Don't emit SHF_GNU_RETAIN on Solaris The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris. Initially, as reported in Bug 49437, it caused dozens of testsuite failures on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but `SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag (in the range for OS-specific values) is `SHF_SUNW_ABSENT` with completely different semantics, which confuses Solaris `ld` very much. Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris `ld` doesn't support, causing it to SEGV and break the build. The linker is currently being hardened to not accept non-native OS ABIs to avoid this. The need for linker support is already documented in `clang/include/clang/Basic/AttrDocs.td`, but not currently checked. This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D107747	2021-08-11 09:27:51 +02:00
madhur13490	61526b1262	[DAG] Reword comment for EnforceNodeIdInvariant and InvalidateNodeId. NFC. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D107845	2021-08-11 12:14:28 +05:30
Craig Topper	a8ae41fb51	[SelectionDAGBuilder] Save iterator to avoid second DenseMap lookup. NFC We were calling find and then using operator[]. Instead keep the iterator from find and use it to get the value. Just happened to notice while investigating how we decide what extends to use between basic blocks.	2021-08-10 22:37:48 -07:00
Christopher Di Bella	c874dd5362	[llvm][clang][NFC] updates inline licence info Some files still contained the old University of Illinois Open Source Licence header. This patch replaces that with the Apache 2 with LLVM Exception licence. Differential Revision: https://reviews.llvm.org/D107528	2021-08-11 02:48:53 +00:00
Amara Emerson	7ec4ce157b	[AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality. With contributions by Sebastian Neubauer Differential Revision: https://reviews.llvm.org/D105676	2021-08-10 16:41:18 -07:00
Adrian Prantl	d6b6880172	Streamline the API of salvageDebugInfoImpl (NFC) This patch refactors / simplifies salvageDebugInfoImpl(). The goal here is to simplify the implementation of coro::salvageDebugInfo() in a followup patch. 1. Change the return value to I.getOperand(0). Currently users of salvageDebugInfoImpl() assume that the first operand is I.getOperand(0). This patch makes this information explicit. A nice side-effect of this change is that it allows us to salvage expressions such as add i8 1, %a in the future. 2. Factor out the creation of a DIExpression and return an array of DIExpression operations instead. This change allows users that call salvageDebugInfoImpl() in a loop to avoid the costly creation of temporary DIExpressions and to defer the creation of a DIExpression until the end. This patch does not change any functionality. rdar://80227769 Differential Revision: https://reviews.llvm.org/D107383	2021-08-10 15:21:18 -07:00
Jinsong Ji	2cfd427626	[AIX] Don't crash on unimplemented lowerRelativeReference We may call lowerRelativeReference in MC to determine whether target supports this lowering. We should return nullptr instead of crashing when we haven't implemented the real lowering. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D107830	2021-08-10 17:43:06 +00:00
Matt Arsenault	1b41945da0	RegAllocGreedy: Add spaces between registers in debug message	2021-08-10 13:12:34 -04:00
Konstantin Schwarz	64bef13f08	[GlobalISel] Look through truncs and extends in narrowScalarShift If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be shifted individually by the constant shift amount. However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization code was used, producing intermediate shifts that are potentially illegal on some targets. This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D89100	2021-08-10 13:49:22 +02:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Jeremy Morse	d4ce9e463d	[DWARF] Revert sharing subprograms across CUs This patch is a revert of `e08f205f5c`. In that patch, DW_TAG_subprograms were permitted to be referenced across CU boundaries, to improve stack trace construction using call site information. Unfortunately, as documented in PR48790, the way that subprograms are "owned" by dwarf units is sufficiently complicated that subprograms end up in unexpected units, invalidating cross-unit references. There's no obvious way to easily fix this, and several attempts have failed. Revert this to ensure correct DWARF is always emitted. Three tests change in addition to the reversion, but they're all very light alterations. Differential Revision: https://reviews.llvm.org/D107076	2021-08-09 12:43:43 +01:00
Luo, Yuanke	53642d5b80	[NFC] Fix the formula for reciprocal calculation. Differential Revision: https://reviews.llvm.org/D107713	2021-08-09 16:03:56 +08:00
Amara Emerson	4c2e01232c	[GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs. We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() instead of eraseFromParent(). We should probably use that in other places too but fix this issue which affects clang bootstrap builds for now.	2021-08-07 01:41:02 -07:00
Nemanja Ivanovic	62fe3dcf98	Fix PPC buildbot break caused by `4c4093e6e3` This commit adds the isnan intrinsic and provides a default expansion for it in the SDAG. However, it makes the assumption that types it operates on are IEEE-compliant types. This is not always the case. An example of that is PPC "double double" which has a representation that - Does not need to conform to IEEE requirements for isnan as it is not an IEEE-compliant type - Does not have a representation that allows for straightforward reinterpreting as an integer and use of integer operations The result was that this commit broke __builtin_isnan for ppc_fp128, making many valid numeric values report a NaN. This patch simply changes the expansion to always expand to unordered comparison (regardless of whether FP exceptions are tracked). This is in line with previous semantics.	2021-08-06 22:10:20 -05:00
Amara Emerson	2b067e3335	Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG. DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.	2021-08-06 12:57:53 -07:00
Jon Roelofs	eae4a44c1d	[GlobalISel][KnownBits] Implement G_CTPOP Implementation copied almost verbatim from ValueTracking. Differential revision: https://reviews.llvm.org/D107606	2021-08-06 09:48:39 -07:00
Craig Topper	b2ca4dc935	[LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available. This isn't optimal, but prevents crashing when the libcall isn't available. It just calculates the full product and makes sure the high bits match the sign of the low half. Each of the pieces should go through their own type legalization. This can make D107420 unnecessary. Needs tests, but I wanted to start discussion about D107420. Reviewed By: FreddyYe Differential Revision: https://reviews.llvm.org/D107581	2021-08-06 09:43:01 -07:00
Kazu Hirata	276be84d0a	[CodeGen] Remove computeDefOperandLatency (NFC) The last use was removed on Oct 9, 2016 in commit `5c924d7117`.	2021-08-06 08:26:55 -07:00
Jay Foad	57b9107e3f	[GlobalISel] Improve widening of cttz/cttz_zero_undef Differential Revision: https://reviews.llvm.org/D107631	2021-08-06 14:25:56 +01:00
Jay Foad	cd2594e1c6	[GlobalISel] Improve legalization of narrow CTTZ Differential Revision: https://reviews.llvm.org/D107457	2021-08-06 09:40:48 +01:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. 
It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Sean Fertile	23651c5ae0	[PowerPC][AIX] Create multiple constant sections. Fixes issue where late materialized constants can be more strictly aligned than their containing csect. Differential Revision: https://reviews.llvm.org/D103103	2021-08-05 21:19:16 -04:00
Jon Roelofs	5fc7b1a260	Revert "[GlobalISel][KnownBits] Implement G_CTPOP" This reverts commit `ce6eb4f15a`. It's broken on the windows bots: https://reviews.llvm.org/D107606#2930121	2021-08-05 17:47:47 -07:00
Jon Roelofs	ce6eb4f15a	[GlobalISel][KnownBits] Implement G_CTPOP Implementation copied almost verbatim from ValueTracking. Differential revision: https://reviews.llvm.org/D107606	2021-08-05 17:17:29 -07:00
Craig Topper	f7076cfd3a	[DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits constant folding. We don't have real demanded bits support for MULHU, but we can still use the known bits based constant folding support at the end of SimplifyDemandedBits to simplify a MULHU. This helps with cases where we know the LHS and RHS have enough leading zeros so that the high multiply result is always 0. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D106471	2021-08-05 08:31:26 -07:00
Simon Pilgrim	2cbf9fd402	[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes. This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector. This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector. Fixes PR50053 Differential Revision: https://reviews.llvm.org/D107068	2021-08-05 15:40:48 +01:00
Paul Robinson	75aa3d520d	Add a DIExpression const-folder to prevent silly expressions. It's entirely possible (because it actually happened) for a bool variable to end up with a 256-bit DW_AT_const_value. This came about when a local bool variable was initialized from a bitfield in a 32-byte struct of bitfields, and after inlining and constant propagation, the variable did have a constant value. The sequence of optimizations had it carrying "i256" values around, but once the constant made it into the llvm.dbg.value, no further IR changes could affect it. Technically the llvm.dbg.value did have a DIExpression to reduce it back down to 8 bits, but the compiler is in no way ready to emit an oversized constant and a DWARF expression to manipulate it. Depending on the circumstances, we had either just the very fat bool value, or an expression with no starting value. The sequence of optimizations that led to this state did seem pretty reasonable, so the solution I came up with was to invent a DWARF constant expression folder. Currently it only does convert ops, but there's no reason it couldn't do other ops if that became useful. This broke three tests that depended on having convert ops survive into the DWARF, so I added an operator that would abort the folder to each of those tests. Differential Revision: https://reviews.llvm.org/D106915	2021-08-05 06:14:40 -07:00
Petar Avramovic	66de26b1f9	GlobalISel: Fix matchEqualDefs for instructions with multiple defs Instructions that produceSameValue produce same values for operands with same index. matchEqualDefs used to return true for any two values from different instructions that produce same values. Fix this by checking if values are defined by operands with the same index. Differential Revision: https://reviews.llvm.org/D107362	2021-08-05 15:05:45 +02:00
Dominik Montada	cc947e29ea	[GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107330	2021-08-05 13:52:10 +02:00
Fraser Cormack	0b8471e91b	[SelectionDAG] Correctly determine the VECREDUCE_SEQ_FMUL action The LegalizeAction for this node should follow the logic for `VECREDUCE_SEQ_FADD` and be determined using the vector operand's type. There isn't an in-tree target that makes use of this, but I think it's safe to say this is how it should behave, should a target want to customize the action for this node. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107478	2021-08-05 09:42:33 +01:00
Fangrui Song	a194438615	[CodeGen] Add -align-loops to `lib/CodeGen/CommandFlags.cpp`. It can replace -x86-experimental-pref-loop-alignment=. The loop alignment is only used by MachineBlockPlacement. The implementation uses a new `llvm::TargetOptions` for now, as an IR function attribute/module flags metadata may be overkill. This is the llvm part of D106701.	2021-08-04 12:45:18 -07:00
Craig Topper	c23405174a	[DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS. This allows special constants like 0 to be recognized. It's also expected by isel patterns if a target had a mulh with immediate instructions. The commuting done by tablegen won't commute patterns with immediates since it expects DAGCombine to have done it. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107486	2021-08-04 11:39:23 -07:00
David Green	eeddcba525	[RDA] Attempt to make RDA subreg aware This attempts to make more of RDA aware of potentially overlapping subregisters. Some of this was already in place, with it iterating through MCRegUnitIterators. This also replaces calls to LiveRegs.contains(..) with !LiveRegs.available(..), and updates the isValidRegUseOf and isValidRegDefOf to search subregs. Differential Revision: https://reviews.llvm.org/D107351	2021-08-04 14:21:32 +01:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. 
Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Heejin Ahn	9bd02c433b	[WebAssembly] Misc. cosmetic changes in EH (NFC) - Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are planning to add a separate `wasm.catch.longjmp` intrinsic which returns two values. - Rename several variables - Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall` from LowerEmscriptenEHSjLj pass - Add `-verify-machineinstrs` in a test for a safety measure - Add more comments + fix some errors in comments - Replace `std::vector` with `SmallVector` for cases likely with small number of elements - Renamed `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are soon going to add `EnableWasmSjLj`, so this makes the distinction clearer Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D107405	2021-08-03 21:03:46 -07:00
Arthur Eubanks	ad25344620	[MC][CodeGen] Emit constant pools earlier Previously we would emit constant pool entries for ldr inline asm at the very end of AsmPrinter::doFinalization(). However, if we're emitting dwarf aranges, that would end all sections with aranges. Then if we have constant pool entries to be emitted in those same sections, we'd hit an assert that the section has already been ended. We want to emit constant pool entries before emitting dwarf aranges. This patch splits out arm32/64's constant pool entry emission into its own MCTargetStreamer virtual method. Fixes PR51208 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107314	2021-08-03 20:55:31 -07:00
Simon Pilgrim	11396641e4	[DAG] Cleanup DAGCombiner::CombineConsecutiveLoads early-outs. NFCI. We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together. Noticed while triaging PR45116	2021-08-03 13:47:55 +01:00
Eli Friedman	1f62af6346	[AArch64][SelectionDAG] Support passing/returning scalable vectors with unusual types. This adds handling for two cases: 1. A scalable vector where the element type is promoted. 2. A scalable vector where the element count is odd (or more generally, not divisible by the element count of the part type). (Some element types still don't work; for example, <vscale x 2 x i128>, or <vscale x 2 x fp128>.) Differential Revision: https://reviews.llvm.org/D105591	2021-08-02 15:53:16 -07:00
Max Kazantsev	c5b63714b5	[GC][NFC] Make getGCStrategy by name available in IR We might want to use info from GC strategy in middle end analysis. The motivation for this is provided in D99135: we may want to ask a GC if it's going to work with a given pointer (currently this code makes naive check by the method name). Differential Revision: https://reviews.llvm.org/D100559 Reviewed By: reames	2021-08-02 14:26:04 +07:00
Matt Arsenault	ebc17a0d68	GlobalISel: Scalarize unaligned vector stores This has the same problems and limitations as the load path.	2021-07-31 10:37:15 -04:00
Simon Pilgrim	3a7c82efb8	[DAG] isGuaranteedNotToBeUndefOrPoison - handle ISD::BUILD_VECTOR nodes If all demanded elements of the BUILD_VECTOR pass a isGuaranteedNotToBeUndefOrPoison check, then we can treat this specific demanded use of the BUILD_VECTOR as guaranteed not to be undef or poison either. Differential Revision: https://reviews.llvm.org/D107174	2021-07-31 15:08:25 +01:00
Matt Arsenault	bc2cb91a20	GlobalISel: Have lowerStore handle some unaligned stores This is NFC until some of the AMDGPU legalization rules are ripped out.	2021-07-31 10:01:42 -04:00
Alexandros Lamprineas	7d940432c4	[AArch64] Legalize MVT::i64x8 in DAG isel lowering This patch legalizes the Machine Value Type introduced in D94096 for loads and stores. A new target hook named getAsmOperandValueType() is added which maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization. Differential Revision: https://reviews.llvm.org/D94097	2021-07-31 09:51:28 +01:00
Alexandros Lamprineas	3094e5389b	[AArch64] Add a Machine Value Type for 8 consecutive registers Adds MVT::i64x8, a Machine Value Type needed for lowering inline assembly operands which materialize a sequence of eight general purpose registers. Differential Revision: https://reviews.llvm.org/D94096	2021-07-31 09:51:28 +01:00
Rahman Lavaee	2256b359d7	Explain the symbols of basic block clusters with an example in the header comments. This prevents confusion with the ``labels`` option. Reviewed By: snehasish Differential Revision: https://reviews.llvm.org/D107128	2021-07-30 12:08:04 -07:00
Simon Pilgrim	3c0b596ecc	SelectionDAGDumper.cpp - remove nested if-else return chain. NFCI. Match style and don't use an else after a return.	2021-07-30 19:23:05 +01:00
Simon Pilgrim	986841cca2	SelectionDAGDumper.cpp - printrWithDepthHelper - remove dead code. NFCI. Fixes coverity warning - we have an early-out for unsigned depth == 0, so the depth < 1 early-out later on is dead code.	2021-07-30 19:23:04 +01:00
Matt Arsenault	e46badd4e9	GlobalISel: Have lowerLoad scalarize unaligned vectors This could be smarter by picking an ideal type, or at least splitting the vector in half first. Also handles lower for non-power-of-2, non-extending vector loads. Currently this just avoids failing to legalize some odd vector AMDGPU tests, but is a step towards removing the split logic from the NarrowScalar logic.	2021-07-30 13:23:29 -04:00
Matt Arsenault	f19226dda5	GlobalISel: Have load lowering handle some unaligned accesses The code for splitting an unaligned access into 2 pieces is essentially the same as for splitting a non-power-of-2 load for scalars. It would be better to pick an optimal memory access size and directly use it, but splitting in half is what the DAG does. As-is this fixes handling of some unaligned sextload/zextloads for AMDGPU. In the future this will help drop the ugly abuse of narrowScalar to handle splitting unaligned accesses.	2021-07-30 12:55:58 -04:00
Adrian Prantl	c5d84d2eb3	GlobalISel/AArch64: don't optimize away redundant branches at -O0 This patch prevents GlobalISel from optimizing out redundant branch instructions when compiling without optimizations. The motivating example is code like the following common pattern in Swift, where users expect to be able to set a breakpoint on the early exit: public func f(b: Bool) { guard b else { return // I would like to set a breakpoint here. } ... } The patch modifies two places in GlobalISel: The first one is in IRTranslator.cpp where the removal of redundant branches is made conditional on the optimization level. The second one is in AArch64InstructionSelector.cpp where an -O0 only optimization is being removed. Disabling these optimizations increases code size at -O0 by ~8%. However, doing so improves debuggability, and debug builds are the primary reason why developers compile without optimizations. We thus concluded that this is the right trade-off. rdar://79515454 This tentatively reapplies the patch without modifications, the LLDB test that has blocked this from landing previously has since been modified to hopefully no longer be sensitive to this change. Differential Revision: https://reviews.llvm.org/D105238	2021-07-29 16:04:22 -07:00
Amara Emerson	c54d5c9756	[GlobalISel] Use GMergeLikeOp to simplify a combine. NFC.	2021-07-29 13:53:16 -07:00
Amara Emerson	532c458fa8	[GlobalISel] Add GPtrAdd and use it in some combines.	2021-07-29 12:04:02 -07:00
Jessica Clarke	95ef464ac9	Handle subregs and superregs in callee-saved register mask If a target lists both a subreg and a superreg in a callee-saved register mask, the prolog will spill both aliasing registers. Instead, don't spill the subreg if a superreg is being spilled. This case is hit by the PowerPC SPE code, as well as a modified RISC-V backend for CHERI I maintain out of tree. Reviewed By: jhibbits Differential Revision: https://reviews.llvm.org/D73170	2021-07-29 16:53:29 +01:00
Sanjay Patel	fa6b2c9915	[DAGCombiner] don't try to partially reduce add-with-overflow ops This transform was added with D58874, but there were no tests for overflow ops. We need to change this one way or another because it can crash as shown in: https://llvm.org/PR51238 Note that if there are no uses of an overflow op's bool overflow result, we reduce it to a regular math op, so we continue to fold that case either way. If we have uses of both the math and the overflow bool, then we are likely not saving anything by creating an independent sub instruction as seen in the test diffs here. This patch makes the behavior in SDAG consistent with what we do in instcombine AFAICT. Differential Revision: https://reviews.llvm.org/D106983	2021-07-29 08:51:54 -04:00
Guozhi Wei	50b6273145	[MBP] findBestLoopTopHelper should exit if OldTop is not a chain header Function findBestLoopTopHelper tries to find a new loop top block which can also fall through to OldTop, but it's impossible if OldTop is not a chain header, so it should exit immediately. Differential Revision: https://reviews.llvm.org/D106329	2021-07-28 19:00:45 -07:00
Jeremy Morse	8612417e5a	[DebugInfo][InstrRef] Don't break up ret-sequences on debug-info instrs When we have a terminator sequence (i.e. a tailcall or return), MIIsInTerminatorSequence is used to work out where the preceding ABI-setup instructions end, i.e. the parts that were glued to the terminator instruction. This allows LLVM to split blocks safely without having to worry about ABI stuff. The function only ignores DBG_VALUE instructions, meaning that the two debug instructions I recently added can end terminator sequences early, causing various MachineVerifier errors. This patch promotes the test for debug instructions from "isDebugValue" to "isDebugInstr", thus avoiding any debug-info interfering with this function. Differential Revision: https://reviews.llvm.org/D106660	2021-07-28 15:56:00 +01:00
Juneyoung Lee	4f71f59bf3	[DAGCombiner] Fold SETCC(FREEZE(x),const) to FREEZE(SETCC(x,const)) if SETCC is used by BRCOND This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))` if the SETCC is only used by BRCOND. Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D105344	2021-07-28 09:22:15 +09:00
Anirudh Prasad	a8cfa4b9bd	[SystemZ][z/OS] Initial code to generate assembly files on z/OS - This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target. - Only the .text and the .bss sections are added for now. - The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections. - This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target - Further improvements and additions will be made in future patches. Reviewed By: tmatheson Differential Revision: https://reviews.llvm.org/D106380	2021-07-27 11:29:15 -04:00
Jeremy Morse	ec9da51724	[DebugInfo][InstrRef] Correctly update DBG_PHIs during instr scheduling Avoid several crashes when DBG_INSTR_REF and DBG_PHI instructions are fed to the instruction scheduler. DBG_INSTR_REFs should be treated like DBG_LABELs, and just ignored for the purpose of scheduling [0]. DBG_PHIs however behave much more like DBG_VALUEs: they refer to register operands, and if some register defs get shuffled around during instruction scheduling, there's a risk that the debug instr will refer to the wrong value. There's already a facility for updating DBG_VALUEs to reflect this; add DBG_PHI to the list of instructions that it will update. [0] Suboptimal, but it's what instr scheduling does right now. Differential Revision: https://reviews.llvm.org/D106663	2021-07-27 15:12:46 +01:00
Jeremy Morse	7dc9d73731	[DebugInfo][InstrRef] Handle llvm.frameaddress intrinsics gracefully When working out which instruction defines a value, the instruction-referencing variable location code has a few special cases for physical registers: * Arguments are never defined by instructions, * Constant physical registers always read the same value, are never def'd. This patch adds a third case for the llvm.frameaddress intrinsics: you can read the framepointer in any block if you so choose, and use it as a variable location, as shown in the added test. This rather violates one of the assumptions behind instruction referencing, that LLVM IR shouldn't be able to read from an arbitrary register at some arbitrary point in the program. The solution for now is to just emit a DBG_PHI that reads the register value: this works, but if we wanted to do something clever with DBG_PHIs in the future then this would probably get in the way. As it stands, this patch avoids a crash. Differential Revision: https://reviews.llvm.org/D106659	2021-07-27 13:44:37 +01:00
Jay Foad	dc4ca0dbbc	[GlobalISel] Constant fold G_SITOFP and G_UITOFP in CSEMIRBuilder Differential Revision: https://reviews.llvm.org/D104528	2021-07-27 11:27:58 +01:00
Fraser Cormack	7b33b849bd	[SelectionDAG] Support scalable splats in U(ADD\|SUB)SAT combines This patch builds on top of D106575 in which scalable-vector splats were supported in `ISD::matchBinaryPredicate`. It teaches the DAGCombiner how to perform a variety of the pre-existing saturating add/sub combines on scalable-vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106652	2021-07-27 10:52:34 +01:00
David Green	e00d67dc48	[NFC] Reflow some debug messages.	2021-07-27 10:11:51 +01:00
Johannes Doerfert	25a3130d89	[Local] Do not introduce a new `llvm.trap` before `unreachable` This is the second attempt to remove the `llvm.trap` insertion after https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6 reverted the first one. It is not clear what the exact issue was back then and it might already be gone by now, it has been >5 years after all. Replaces D106299. Differential Revision: https://reviews.llvm.org/D106308	2021-07-26 23:33:36 -05:00
Mitch Phillips	ae70b211eb	Revert "[GlobalISel] Add scalar widening for G_MERGE_VALUES destination" This reverts commit `0a37163d1d`. Reason: Broke the sanitizer msan bots. More details are available in the original Phabricator review: https://reviews.llvm.org/D106814.	2021-07-26 19:52:12 -07:00
Jon Roelofs	f2e8e46d78	Revert "[AArch64][GlobalISel] Legalize ctpop s128" This reverts commit `97e95fea53`. It broke test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll. Not sure why I didn't see that.	2021-07-26 17:06:43 -07:00
Jessica Paquette	0a37163d1d	[GlobalISel] Add scalar widening for G_MERGE_VALUES destination This adds support for the case where WideSize = DstSize + K * SrcSize In this case, we can pad the G_MERGE_VALUES instruction with K extra undef values with width SrcSize. Then the destination can be handled via widenScalarDst. Differential Revision: https://reviews.llvm.org/D106814	2021-07-26 17:00:00 -07:00
Jon Roelofs	97e95fea53	[AArch64][GlobalISel] Legalize ctpop s128 Differential revision: https://reviews.llvm.org/D106494	2021-07-26 16:33:50 -07:00
Amara Emerson	c658b472f3	[GlobalISel] Add a constant folding combine. Use it in the AArch64 post-legal combiner. These don't always get folded because when the instructions are created the constants are obscured by artifacts. Differential Revision: https://reviews.llvm.org/D106776	2021-07-26 14:53:33 -07:00
Heejin Ahn	a48ee9f255	[WebAssembly] Remove dominator dependency in WasmEHPrepare (NFC) Dominator trees were previously used for an optimization related to `wasm.lsda` but the optimization was removed in D97309. Currently dominators are not doing anything in this pass. Also removes some `include` lines without which it compiles. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D106811	2021-07-26 14:45:13 -07:00
Matheus Izvekov	f84c70a379	[CodeView] Saturate values bigger than supported by APInt. This fixes an assert firing when compiling code which involves 128 bit integrals. This would trigger runtime checks similar to this: ``` Assertion failed: getMinSignedBits() <= 64 && "Too many bits for int64_t", file llvm/include/llvm/ADT/APInt.h, line 1646 ``` To get around this, we just saturate those big values. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D105320	2021-07-26 22:15:26 +02:00
Craig Topper	14e356d121	[TypePromotion] Remove redundant if. NFC The same condition was checked in the previous if. Maybe this was a bad merge resolution?	2021-07-26 11:47:25 -07:00
Amara Emerson	dec34104bf	[GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner. Differential Revision: https://reviews.llvm.org/D106761	2021-07-26 10:37:31 -07:00
Stephen Tozer	31e7551217	[DebugInfo] Correctly update debug users of SSA values in tail duplication During tail duplication, SSA values may be updated and have their uses replaced with a virtual register, and any debug instructions that use that value are deleted. This patch fixes the implementation of the debug instruction deletion to work correctly for debug instructions that use the SSA value multiple times, by batching deletions so that we don't attempt to delete the same instruction twice. Differential Revision: https://reviews.llvm.org/D106557	2021-07-26 17:27:57 +01:00
Jeremy Morse	f86694cb80	[InstrRef][AArch64][1/4] Accept constant physreg variable locations Late in SelectionDAG we join up instruction numbers with their defining instructions, if it couldn't be done during the main part of SelectionDAG. One exception is function arguments, where we have to point a DBG_PHI instruction at the incoming live register, as they don't have a defining instruction. This patch adds another exception, for constant physregs, like aarch64 has. It may seem wasteful to use two instructions where we could use a single DBG_VALUE, however the whole point of instruction referencing is to decouple the identification of values from the specification of where variable location ranges start. (Part of my aarch64 work to ease adoption of instruction referencing, as in the meta comment on D104520) Differential Revision: https://reviews.llvm.org/D104520	2021-07-26 15:26:15 +01:00
Fraser Cormack	f924a3d474	[SelectionDAG] Support scalable-vector splats in yet more cases This patch extends support for (scalable-vector) splats in the DAGCombiner via the `ISD::matchBinaryPredicate` function, which enable a variety of simple combines of constants. Users of this function may now have to distinguish between `BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing with this in-tree follows the approach added for `ISD::matchUnaryPredicate` implemented in D94501. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106575	2021-07-26 10:15:08 +01:00
Esme-Yi	0d3e4d9d4d	[Debug-Info][llvm-dwarfdump] Don't use DW_FORM_data4/8 to encode the constants for DW_AT_data_member_location. Summary: In DWARF v3, DW_FORM_data4/8 in DW_AT_data_member_location are interpreted as location list pointers. Interpreting constants as pointers is not expected, so we use DW_FORM_udata to encode the constants. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D105687	2021-07-26 03:47:02 +00:00
Simon Pilgrim	478b22d95a	[CGP] despeculateCountZeros - Don't create is-zero branch if cttz/ctlz source is known non-zero If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from). Differential Revision: https://reviews.llvm.org/D106685	2021-07-24 13:11:49 +01:00
Simon Pilgrim	c261a06b7a	[DAG] Add initial SelectionDAG::isGuaranteedNotToBeUndefOrPoison framework (PR51129) I've set up the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it; further opcodes can be handled when we have test coverage. I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added. Differential Revision: https://reviews.llvm.org/D106668	2021-07-24 11:36:35 +01:00
David Truby	1528a4d400	[llvm][sve] Lowering for VLS truncating stores This adds custom lowering for truncating stores when operating on fixed length vectors in SVE. It also includes a DAG combine to fold extends followed by truncating stores into non-truncating stores in order to prevent this pattern appearing once truncating stores are supported. Currently truncating stores are not used in certain cases where the size of the vector is larger than the target vector width. Differential Revision: https://reviews.llvm.org/D104471	2021-07-23 14:04:55 +01:00
Paulo Matos	46667a1003	[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR Reland of `31859f896`. This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and lowering methods for load and stores of reference types from IR globals. Once the lowering creates the new nodes, tablegen pattern matches those and converts them to Wasm global.get/set. Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D104797	2021-07-22 22:07:24 +02:00
Roman Lebedev	af8fa36bf0	[NFCI][TLI] prepare[US]REMEqFold(): don't add nonsensical 'exact' flag to rotates created As pointed out by Craig Topper.	2021-07-22 23:02:58 +03:00
Simon Tatham	bd41136746	[clang] Use i64 for the !srcloc metadata on asm IR nodes. This is part of a patch series working towards the ability to make SourceLocation into a 64-bit type to handle larger translation units. !srcloc is generated in clang codegen, and pulled back out by llvm functions like AsmPrinter::emitInlineAsm that need to report errors in the inline asm. From there it goes to LLVMContext::emitError, is stored in DiagnosticInfoInlineAsm, and ends up back in clang, at BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true clang::SourceLocation from the integer cookie. Throughout this code path, it's now 64-bit rather than 32, which means that if SourceLocation is expanded to a 64-bit type, this error report won't lose half of the data. The compiler will tolerate both of i32 and i64 !srcloc metadata in input IR without faulting. Test added in llvm/MC. (The semantic accuracy of the metadata is another matter, but I don't know of any situation where that matters: if you're reading an IR file written by a previous run of clang, you don't have the SourceManager that can relate those source locations back to the original source files.) Original version of the patch by Mikhail Maltsev. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D105491	2021-07-22 10:24:52 +01:00
ShihPo Hung	8d86562e5f	[RegisterCoalescer] Make resolveConflicts aware of earlyclobber Prior to this patch, it skipped the instruction defining VNI when checking if the tainted lanes are used. In the given example, VRGATHER is an illegal instruction because its DstReg overlaps with SrcReg. Therefore we need to check the defining instruction as well when there is an earlyclobber constraint. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D105684	2021-07-22 12:11:10 +08:00
Stanislav Mekhanoshin	c54c76037b	Prevent dead uses in register coalescer after rematerialization The coalescer does not check if register uses are available at the point of rematerialization. If it attempts to rematerialize an instruction with such uses it can end up with a use without a def. LiveRangeEdit does such a check during rematerialization, so just call LiveRangeEdit::allUsesAvailableAt() to avoid the problem. Differential Revision: https://reviews.llvm.org/D106396	2021-07-21 15:19:55 -07:00
Eli Friedman	0ca46a1757	[SelectionDAG] Fix the representation of ISD::STEP_VECTOR. The existing rule about the operand type is strange. Instead, just say the operand is a TargetConstant with the right width. (Legalization ignores TargetConstants, so it doesn't matter if that width is legal.) Highlights: 1. I had to substantially rewrite the AArch64 isel patterns to expect a TargetConstant. Nothing too exotic, but maybe a little hairy. Maybe worth considering a target-specific node with some dagcombines instead of this complicated nest of isel patterns. 2. Our behavior on RV32 for vectors of i64 has changed slightly. In particular, we correctly preserve the width of the arithmetic through legalization. This changes the DAG a bit. Maybe room for improvement here. 3. I explicitly defined the behavior around overflow. This is necessary to make the DAGCombine transforms legal, and I don't think it causes any practical issues. Differential Revision: https://reviews.llvm.org/D105673	2021-07-21 10:58:40 -07:00
Jon Roelofs	4de74a7c4d	[MachineVerifier] Make INSERT_SUBREG diagnostic respect operand 2 subregs This came out of post-commit review: https://reviews.llvm.org/D105953#inline-1012919 Thanks uabelho!	2021-07-21 08:47:17 -07:00
Guillaume Chatelet	d6da02d952	[llvm] Add enum iteration to Sequence This patch allows iterating typed enum via the ADT/Sequence utility. It also changes the original design to better separate concerns: - `StrongInt` only deals with safe `intmax_t` operations, - `SafeIntIterator` presents the iterator and reverse iterator interface but only deals with safe `StrongInt` internally. - `iota_range` only deals with `SafeIntIterator` internally. This design ensures that operations are always valid. In particular, "Out of bounds" assertions fire when: - the `value_type` is not representable as an `intmax_t` - iterator operations make internal computation underflow/overflow - the internal representation cannot be converted back to `value_type` Differential Revision: https://reviews.llvm.org/D106279	2021-07-21 12:48:53 +00:00
Tim Northover	291e0daa6e	AArch64: support 8 & 16-bit atomic operations in GlobalISel We have SelectionDAG patterns for 8 & 16-bit atomic operations, but they assume the value types will have been legalized to 32-bits. So this adds the ability to widen them to both AArch64 & generic GISel infrastructure.	2021-07-21 09:35:14 +01:00
Jon Roelofs	be8738324c	[MachineVerifier] Diagnose invalid INSERT_SUBREGs Differential revision: https://reviews.llvm.org/D105953	2021-07-20 17:32:29 -07:00
Jon Roelofs	a14b4e34a4	[GlobalISel] Tail call memcpy/memmove/memset even in the presence of copies Differential revision: https://reviews.llvm.org/D105382	2021-07-20 17:04:33 -07:00
Jon Roelofs	afaf92826e	[GlobalISel] Mark memcpy/memmove/memset as thisreturn https://clang.godbolt.org/z/9az64j8W6 rdar://77466123 Differential revision: https://reviews.llvm.org/D105370	2021-07-20 17:04:33 -07:00
Fangrui Song	3924877932	[IR] Rename `comdat noduplicates` to `comdat nodeduplicate` In the textual format, `noduplicates` means no COMDAT/section group deduplication is performed. Therefore, if both sets of sections are retained, and they happen to define strong external symbols with the same names, there will be a duplicate definition linker error. In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`. The name describes the corollary instead of the immediate semantics. The name can cause confusion to other binary formats (ELF, wasm) which have implemented/ want to implement the "no deduplication" selection kind. Rename it to be clearer. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D106319	2021-07-20 12:47:10 -07:00
Stefan Pintilie	1a6dc92be7	[PowerPC] Inefficient register allocation of ACC registers results in many copies. ACC registers are a combination of four consecutive vector registers. If the vector registers are assigned first this often forces a number of copies to appear just before the ACC register is created. If the ACC register is assigned first then fewer copies are generated when the vector registers are assigned. This patch tries to force the register allocator to assign the ACC registers first and then the UACC registers and then the vector pair registers. It does this by changing the priority of the register classes. This patch also adds hints to help the register allocator assign UACC registers from known ACC registers and vector pair registers from known UACC registers. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D105854	2021-07-20 10:53:40 -05:00
Jeremy Morse	241f3e386c	[DebugInfo][InstrRef] Fix a broken substitution method, add test coverage This patch fixes a clearly-broken function that I absent-mindedly bodged many months ago. Over in D85749 I landed the substituteDebugValuesForInst, that creates substitution records for all the def operands from one debug-labelled instruction to the new one. Unfortunately it would crash if the two instructions had different numbers of operands; I tried to fix this in `537f0fbe82` by adding a "max operand" parameter to the method, but then didn't actually change the loop bound to take account of this. It passed all the tests because.... well there wasn't any real test coverage of this method. This patch fixes up the loop to be bounded by the MaxOperand bound; and adds test coverage for the x86-fixup-LEAs calls to this method, so that it's actually tested. Differential Revision: https://reviews.llvm.org/D105820	2021-07-20 11:45:13 +01:00
Matt Arsenault	c9ec807b11	CodeGen: Make MachineOptimizationRemarkEmitterPass a CFG analysis This avoids rerunning it a few times.	2021-07-19 21:08:26 -04:00
Matt Arsenault	904dab55ab	GlobalISel: Remove some mystery code that clears isReturned I don't understand what this is going for, and haven't found an analog in DAG code. No tests fail with this removed.	2021-07-19 20:21:05 -04:00
Amy Huang	fd972bb9fd	Revert "[llvm][sve] Lowering for VLS truncating stores" because it causes a seg fault (see https://reviews.llvm.org/D104471). This reverts commit `c305557acd`.	2021-07-19 11:03:33 -07:00
Amara Emerson	03cdb5221d	[GlobalISel] Fix load-or combine moving loads across potential aliasing stores. Although this combine checks that there are no load folding barriers between the loads that it's trying to merge, it was inserting the load at the MIRBuilder's default insertion point, which is the G_OR use inst. This was causing a miscompile in the test suite's SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-bswap-2 Differential Revision: https://reviews.llvm.org/D106251	2021-07-19 10:23:23 -07:00
Craig Topper	50302feb1d	[SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend. RISCV would prefer a sign extended constant since that works better with our constant materialization. We have an existing TLI hook we use to control sign extension of setcc operands in type legalization. That hook happens to do the right check we need here, but might be straying from its original purpose. With only RISCV defining this hook in tree, I wasn't sure if it was worth adding another hook with identical behavior. This is an alternative to D105785 where I tried to handle this in the RISCV backend by not creating ANY_EXTENDs in some places. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D105918	2021-07-19 09:25:28 -07:00
Matt Arsenault	67d6132463	GlobalISel: Preserve memory types for implicit sret load/stores	2021-07-19 11:52:42 -04:00
Matt Arsenault	9236125ec8	GlobalISel: Preserve LLT when bitcasting loads and stores This also avoids improperly legalizing some truncating vector stores.	2021-07-19 11:30:14 -04:00
Roman Lebedev	5b51bd1878	[TLI] prepareSREMEqFold(): use correct VT for the final VSELECT (PR51133) We were using the wrong VT for this final VSELECT, it should be in the final comparison VT, not the source value's VT. Fixes https://bugs.llvm.org/show_bug.cgi?id=51133	2021-07-19 16:44:00 +03:00
Eli Friedman	6601be4419	[X86] Remove incorrect use of known bits in shuffle simplification. This reverts commit `2a419a0b99`. The result of a shufflevector must not propagate poison from any element other than the one noted in the shuffle mask. The regressions outside of fptoui-may-overflow.ll can probably be recovered some other way; for example, using isGuaranteedNotToBePoison. See discussion on https://reviews.llvm.org/D106053 for more background. Differential Revision: https://reviews.llvm.org/D106222	2021-07-18 18:13:11 -07:00
Simon Pilgrim	fd7a54c709	[DAG] DAGCombiner::foldSelectOfBinops - propagate the common flags to the merged binop As discussed on D106058 - we were failing to keep the common flags. This matches the behaviour in InstCombinerImpl::foldSelectOpOp.	2021-07-18 18:38:59 +01:00
Simon Pilgrim	5643be96bc	[DAG] Enable foldSelectOfBinops on select(setcc(),binop(),binop()) calls	2021-07-18 18:38:59 +01:00
Simon Pilgrim	1a6a8443c2	[DAG] Move select(cc, binop(), binop()) folds into DAGCombiner::foldSelectOfBinops. NFCI. I'm going to extend the functionality started in D106058 so move the folds into their own method to reduce the amount of code in DAGCombiner::visitSELECT	2021-07-18 14:54:41 +01:00
Amara Emerson	4c55cdb00a	[GlobalISel] Fix known bits for G_BSWAP and G_BITREVERSE not doing anything. llvm::KnownBits::byteSwap() and reverse() don't modify in-place, so we weren't actually computing anything. This was causing a miscompile on an arm64 stage2 bootstrap clang build.	2021-07-17 23:07:16 -07:00
Kazu Hirata	1993b73755	[Analysis, CodeGen] Remove getHotSucc (NFC) These functions seem to be unused for at least 5 years.	2021-07-17 07:31:36 -07:00
Amara Emerson	9637848f51	[GlobalISel] Fix non-pow-2 legalization of s56 stores. s56 stores are broken down into s32 + s24 stores. During this step both of those new stores use an anyextended s64 value, resulting in truncating stores. With s56, the s24 requires another lower step to make it legal, and we were crashing because we didn't expect non-pow-2 stores to also be truncating. Differential Revision: https://reviews.llvm.org/D106183	2021-07-16 13:29:49 -07:00
Guozhi Wei	5609c8b607	[X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB This patch transforms the sequence `lea (reg1, reg2), reg3; sub reg3, reg4` to two sub instructions `sub reg1, reg4; sub reg2, reg4`. Similar optimization can also be applied to LEA/ADD sequence. The modifications to TwoAddressInstructionPass are to ensure the operands of the ADD instruction have the expected order (the dest register of LEA should be src register of ADD). Differential Revision: https://reviews.llvm.org/D104684	2021-07-16 10:16:03 -07:00
Jon Roelofs	6c40abb6fe	Revert "[MachineVerifier] Diagnose invalid INSERT_SUBREGs" This reverts commit `dd57ba1a17`. It broke some tests: http://45.33.8.238/linux/51314/step_12.txt	2021-07-16 09:53:55 -07:00
Simon Pilgrim	95995673d1	[DAG] SelectionDAG::MaskedElementsAreZero - assert we're calling with a vector. NFCI. Add an assertion that we're calling MaskedElementsAreZero with a vector op and that the DemandedElts arg is a matching width. Makes the error a lot easier to grok when something else accidentally gets used.	2021-07-16 17:43:35 +01:00
Jon Roelofs	dd57ba1a17	[MachineVerifier] Diagnose invalid INSERT_SUBREGs Differential revision: https://reviews.llvm.org/D105953	2021-07-16 09:43:12 -07:00
Matt Arsenault	5a0d940f2a	GlobalISel: Preserve memory type for memset expansion	2021-07-16 11:41:32 -04:00
Matt Arsenault	f57f8f7ccc	GlobalISel: Remove dead function	2021-07-16 08:59:25 -04:00
Jeremy Morse	231bf52119	[InstrRef][FastISel] Support emitting DBG_INSTR_REF from fast-isel If you attach __attribute__((optnone)) to a function when using optimisations, that function will use fast-isel instead of the usual SelectionDAG method. This is a problem for instruction referencing, because it means DBG_VALUEs of virtual registers will be created, triggering some safety assertions in LiveDebugVariables. Those assertions exist to detect exactly this scenario, where an unexpected piece of code is generating virtual register references in instruction referencing mode. Fix this by transforming the DBG_VALUEs created by fast-isel into half-formed DBG_INSTR_REFs, after which they get patched up in finalizeDebugInstrRefs. The test modified adds a fast-isel mode to the instruction referencing isel test. Differential Revision: https://reviews.llvm.org/D105694	2021-07-16 13:56:15 +01:00
Matt Arsenault	a2d7ace3e3	GlobalISel: Surface offsets parameter from ComputeValueVTs	2021-07-15 19:11:40 -04:00
Matt Arsenault	e91da668d0	GlobalISel: Track argument pointeriness with arg flags Since we're still building on top of the MVT based infrastructure, we need to track the pointer type/address space on the side so we can end up with the correct pointer LLTs when interpreting CCValAssigns.	2021-07-15 19:11:40 -04:00
Amara Emerson	4e3dc6b8dd	GlobalISel: Introduce GenericMachineInstr classes and derivatives for idiomatic LLVM RTTI. This adds some level of type safety, allows helper functions to be added for specific opcodes for free, and also allows us to succinctly check for class membership with the usual dyn_cast/isa/cast functions. To start off with, add variants for the different load/store operations with some places using it. Differential Revision: https://reviews.llvm.org/D105751	2021-07-15 15:21:57 -07:00
Jessica Paquette	5da0f9ab61	[GlobalISel] Fix infinite loop in reassociationCanBreakAddressingModePattern It didn't update the opcode while walking through G_INTTOPTR/G_PTRTOINT. Differential Revision: https://reviews.llvm.org/D106080	2021-07-15 10:09:07 -07:00
Simon Pilgrim	0aece73aba	[DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z)) Similar to the folds performed in InstCombinerImpl::foldSelectOpOp, this attempts to push a select further up to help merge a pair of binops. I'm primarily interested in select(cond,add(x,y),add(x,z)) folds to help expose pointer math (see https://bugs.llvm.org/show_bug.cgi?id=51069 etc.) but I've tried to use the more generic isBinOp(). Differential Revision: https://reviews.llvm.org/D106058	2021-07-15 16:08:30 +01:00
Tim Northover	5d7632ee72	MachO: don't emit L... private symbols in do_not_dead_strip sections. The linker can sometimes drop the do_not_dead_strip if it can't associate the atom with a symbol (the other place to specify no dead-stripping in MachO files).	2021-07-15 14:40:43 +01:00
Djordje Todorovic	fa2daaeff8	[2/2][RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs This patch adds the forward scan for finding redundant DBG_VALUEs. This analysis aims to remove redundant DBG_VALUEs by going forward in the basic block, considering the first DBG_VALUE valid until its first (location) operand is clobbered/modified. For example: (1) DBG_VALUE $edi, !"var1", ... (2) <block of code that does not affect $edi> (3) DBG_VALUE $edi, !"var1", ... ... in this case, we can remove (3). Differential Revision: https://reviews.llvm.org/D105280	2021-07-15 00:08:31 -07:00
Kai Luo	b9c3941cd6	[PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expands atomic operations post RA to avoid spilling that might prevent LL/SC progress. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D103614	2021-07-15 01:12:09 +00:00
Stanislav Mekhanoshin	76b7d3432e	[AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization Any def of EXEC prevents rematerialization of any VOP instruction because of the physreg use. Create a callback to check if the physreg use can be ignored to allow rematerialization. Differential Revision: https://reviews.llvm.org/D105836	2021-07-14 13:03:58 -07:00
Eli Friedman	1e30bf8621	[SelectionDAG] Add an overload of getStepVector that assumes step 1. This is mostly a minor convenience, but the pattern seems frequent enough to be worthwhile (and we'll probably add more uses in the future). Differential Revision: https://reviews.llvm.org/D105850	2021-07-14 11:37:01 -07:00
Matt Arsenault	47269da5d8	GlobalISel: Handle lowering non-power-of-2 extloads	2021-07-14 11:54:11 -04:00
Djordje Todorovic	df686842bc	[RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs This new MIR pass removes redundant DBG_VALUEs. After the register allocator is done, more precisely, after the Virtual Register Rewriter, we end up having duplicated DBG_VALUEs, since some virtual registers are rewritten into the same physical register as some of the existing DBG_VALUEs. Each DBG_VALUE should indicate (at least before LiveDebugValues) a variable assignment, but this is violated for function parameters during SelectionDAG, since it generates new DBG_VALUEs after COPY instructions even though the parameter has no assignment. For example, if we had a DBG_VALUE $regX as an entry debug value representing the parameter, followed by a COPY and then a DBG_VALUE $virt_reg, and virtregrewrite then rewrites $virt_reg into $regX, we'd end up with a redundant DBG_VALUE. This breaks the definition of the DBG_VALUE, since some analysis passes might be built on top of that premise, and this patch tries to fix the MIR with respect to that. This first patch performs a backward scan, trying to detect a sequence of consecutive DBG_VALUEs and to remove all DBG_VALUEs describing one variable but the last one. For example: (1) DBG_VALUE $edi, !"var1", ... (2) DBG_VALUE $esi, !"var2", ... (3) DBG_VALUE $edi, !"var1", ... ... in this case, we can remove (1). Combined with the forward scan that will be introduced in the next patch (from this stack), the statistics show that RemoveRedundantDebugValues removes 15032 instructions when using gdb-7.11 as a testbed. Differential Revision: https://reviews.llvm.org/D105279	2021-07-14 04:29:42 -07:00
Ruiling Song	40e3df2a1b	[RegisterCoalescer] Resolve conflict based on liveness of subregister Currently we are resolving lane/subregister conflict by visiting instructions sequentially in current block to see whether there is any use of the tainted lanes. To save compile time, we are not doing further check in successor blocks. This sounds reasonable without subregister liveness. But since we have added subregister liveness tracking capability to register coalescer, we can easily determine whether we have subregister liveness conflict by checking subranges. This would help coalescing more COPYs for targets that enable subregister liveness tracking. Reviewed by: arsenm, qcolombet Differential Revision: https://reviews.llvm.org/D104509	2021-07-14 14:43:22 +08:00
Hongtao Yu	74b99b5c2e	[CSSPGO] Do not import pseudo probe desc in thinLTO Previously we relied on pseudo probe descriptors to look up precomputed GUID during probe emission for inlined probes. Since we are moving to always using unique linkage names, GUID for functions can be computed in place from dwarf names. This eliminates the need of importing pseudo probe descs in thinlto, since those descs should be emitted by the original modules. This significantly reduces thinlto memory footprint in some extreme cases where the number of imported modules for a single module is massive. Test Plan: Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D105248	2021-07-13 18:26:36 -07:00
Matt Arsenault	eebe841a47	RegAlloc: Allow targets to split register allocation AMDGPU normally spills SGPRs to VGPRs. Previously, since all register classes are handled at the same time, this was problematic. We don't know ahead of time how many registers will be needed to be reserved to handle the spilling. If no VGPRs were left for spilling, we would have to try to spill to memory. If the spilled SGPRs were required for exec mask manipulation, it is highly problematic because the lanes active at the point of spill are not necessarily the same as at the restore point. Avoid this problem by fully allocating SGPRs in a separate regalloc run from VGPRs. This way we know the exact number of VGPRs needed, and can reserve them for a second run. This fixes the most serious issues, but it is still possible using inline asm to make all VGPRs unavailable. Start erroring in the case where we ever would require memory for an SGPR spill. This is implemented by giving each regalloc pass a callback which reports if a register class should be handled or not. A few passes need some small changes to deal with leftover virtual registers. In the AMDGPU implementation, a new pass is introduced to take the place of PrologEpilogInserter for SGPR spills emitted during the first run. One disadvantage of this is currently StackSlotColoring is no longer used for SGPR spills. It would need to be run again, which will require more work. Error if the standard -regalloc option is used. Introduce new separate -sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be controlled individually. PBQP is not currently supported, so this also prevents using the unhandled allocator.	2021-07-13 18:49:29 -04:00
Guillaume Chatelet	2c47b8847e	Revert "[llvm] Add enum iteration to Sequence" This reverts commit `a006af5d6e`.	2021-07-13 16:44:42 +00:00
Guillaume Chatelet	a006af5d6e	[llvm] Add enum iteration to Sequence This patch allows iterating typed enum via the ADT/Sequence utility. Differential Revision: https://reviews.llvm.org/D103900	2021-07-13 16:22:19 +00:00
Matt Arsenault	222fde1eec	GlobalISel: Use extension instead of merge with undef in common case This fixes not respecting signext/zeroext in these cases. In the anyext case, this avoids a larger merge with undef and should be a better canonical form. This should also handle this if a merge is needed, but I'm not aware of a case where that can happen. In a future change this will also allow AMDGPU to drop some custom code without introducing regressions.	2021-07-13 11:04:47 -04:00
Matt Arsenault	77a608d9de	GlobalISel: Remove getIntrinsicID utility function This is redundant with a method directly on MachineInstr	2021-07-13 11:04:10 -04:00
Qiu Chaofan	954a15d639	[SelectionDAG] Check use before combining into USUBSAT Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D105789	2021-07-13 14:50:26 +08:00

31133 Commits