llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	95ac3d15e9	[AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization. For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize completely if the source is <= 64b. This change adds support for that in the legalizer. If the source has a pow-2 num elements, then we can do a tree reduction using the scalar operation in the individual elements. Otherwise, we just create a sequential chain of operations. For AArch64, we only need to scalarize if the input is <64b. If it's great than 64b then we can first do a fewElements step to 64b, taking advantage of vector instructions until we reach the point of scalarization. I also had to relax the verifier checks for reductions because the intrinsics support <1 x EltTy> types, which we lower to scalars for GlobalISel. Differential Revision: https://reviews.llvm.org/D108276	2021-08-19 16:38:52 -07:00
Amara Emerson	a0051f7149	[AArch64][GlobalISel] Fix miscompile of <16 x s8> G_EXTRACT_VECTOR_ELT. When support for copying vector s8 lanes was added recently, this also had the side effect of fixing a fallback for <16 x s8> extracts since both used the same helper. However, there was a bug in another helper to get the regclass for a specific FPR-native type, which was assigning FPR16 to s8 instead of FPR8.	2021-08-19 16:22:32 -07:00
Tim Northover	edab411ee6	AArch64: copy all parts of the mem operand across when combining a store In particular we were dropping volatility, which can lead to unwanted transformations.	2021-08-19 18:26:39 +01:00
Owen Anderson	06a4c85890	Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64. This allows the instruction selector to realize that it can directly broadcast the low byte of the memset value, rather than replicating it to a 64-bit GPR before broadcasting. This fixes PR50985. Differential Revision: https://reviews.llvm.org/D108354	2021-08-19 16:54:07 +00:00
David Green	d10f23a25d	[ISel] Expand saddsat and ssubsat via asr and xor This changes the lowering of saddsat and ssubsat so that instead of using: r,o = saddo x, y c = setcc r < 0 s = c ? INTMAX : INTMIN ret o ? s : r into using asr and xor to materialize the INTMAX/INTMIN constants: r,o = saddo x, y s = ashr r, BW-1 x = xor s, INTMIN ret o ? x : r https://alive2.llvm.org/ce/z/TYufgD This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853	2021-08-19 16:08:07 +01:00
Jessica Paquette	3d91d5b757	[AArch64][GlobalISel] Mark G_FMINNUM/G_FMAXNUM as floating point opcodes We need to ensure that these end up on FPR to allow imported patterns to select them. This will also ensure that we get good regbank selection when dealing with instructions like G_PHI/G_LOAD/G_STORE which deduce their banks from their uses/users. Differential Revision: https://reviews.llvm.org/D108260	2021-08-18 13:32:19 -07:00
Jessica Paquette	45e1a6bd25	[AArch64][GlobalISel] Legalize scalar G_FMINNUM + G_FMAXNUM For subtargets with full FP16, this is legal for s16, s32, and s64. Without full FP16, it's legal for s32 and s64. For s128, this is a libcall. We also support some vector types, but for now, let's just support scalars. Differential Revision: https://reviews.llvm.org/D108259	2021-08-18 13:30:03 -07:00
Jessica Paquette	791006fb8c	[GlobalISel] Implement lowering for G_ISNAN + use it in AArch64 GlobalISel equivalent to `TargetLowering::expandISNAN`. Use it in AArch64 and add a testcase. Differential Revision: https://reviews.llvm.org/D108227	2021-08-18 10:54:25 -07:00
Jessica Paquette	d9873711cb	[GlobalISel] Add IRTranslator support for G_ISNAN Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it. This is pretty much the same as the associated SelectionDAGBuilder code. Main difference is that we don't expand it here. It makes more sense to do that during legalization in GlobalISel. GlobalISel will just legalize the generated illegal types. Differential Revision: https://reviews.llvm.org/D108226	2021-08-18 10:48:10 -07:00
Amara Emerson	284006079e	[AArch64][GlobalISel] Add support for selection of s8:fpr = G_UNMERGE <8 x s8>	2021-08-18 00:34:06 -07:00
Yunde Zhong	9790a2a72f	[tests] precommit tests for D107692	2021-08-17 13:05:02 +08:00
Paul Walker	cd0e196413	[DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation. visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when removing "redundant" INSERT_SUBVECTOR operations. This patch adds an extra check to ensure such combines only occur after operation legalisation if any resulting BITBAST is itself legal. Differential Revision: https://reviews.llvm.org/D108086	2021-08-15 18:25:49 +01:00
Nikita Popov	81b106584f	[AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476) This is a non-intrusive fix for https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport to the 13.x release branch. It expands on the current hack by distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have the obvious meaning and 2 means "anything else". The new optimization from D98564 should only be performed for CmpValue of 0 or 1. For main, I think we should switch the analyzeCompare() and optimizeCompare() APIs to use int64_t instead of int, which is in line with MachineOperand's notion of an immediate, and avoids this problem altogether. Differential Revision: https://reviews.llvm.org/D108076	2021-08-15 12:35:52 +02:00
Dávid Bolvanský	49de6070a2	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `435785214f`. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.	2021-08-15 11:44:13 +02:00
Anshil Gandhi	435785214f	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-14 23:37:23 -06:00
Jessica Paquette	50efbf9cbe	[GlobalISel] Narrow binops feeding into G_AND with a mask This is a fairly common pattern: ``` %mask = G_CONSTANT iN <mask val> %add = G_ADD %lhs, %rhs %and = G_AND %add, %mask ``` We have combines to eliminate G_AND with a mask that does nothing. If we combined the above to this: ``` %mask = G_CONSTANT iN <mask val> %narrow_lhs = G_TRUNC %lhs %narrow_rhs = G_TRUNC %rhs %narrow_add = G_ADD %narrow_lhs, %narrow_rhs %ext = G_ZEXT %narrow_add %and = G_AND %ext, %mask ``` We'd be able to take advantage of those combines using the trunc + zext. For this to work (or be beneficial in the best case) - The operation we want to narrow then widen must only be used by the G_AND - The G_TRUNC + G_ZEXT must be free - Performing the operation at a narrower width must not produce a different value than performing it at the original width after masking. Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign. Differential Revision: https://reviews.llvm.org/D107929	2021-08-13 18:31:13 -07:00
Jessica Paquette	ccfc079047	[AArch64][GlobalISel] Legalize scalar G_SSUBSAT + G_SADDSAT These are lowered, matching SDAG behaviour. (See llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll) These fall back ~159 times on a build of clang with GISel enabled. Differential Revision: https://reviews.llvm.org/D107777	2021-08-13 09:02:25 -07:00
David Truby	8f359a80e4	[llvm][sve] Fix erroneous tests for fixed length extending loads	2021-08-12 10:31:43 +00:00
David Truby	9c47d6b48d	[llvm][sve] Lowering for VLS extending loads This patch enables extending loads for fixed length SVE code generation. There is a slight regression here in the mulh tests; since these tests load the parameter and then extend it these are treated as extending loads which are merged, preventing the mulh instruction from being generated. As this affects scalable SVE codegen as well this should be addressed in a separate patch. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D107057	2021-08-12 09:43:39 +00:00
Amara Emerson	73056f239e	[AArch64][GlobalISel] Simplify/nuke the merge/unmerge legalizer rules. These rules were originally written when the new predicate based legalizer was introduced in an attempt to preserve existing behaviour. It wasn't properly kept up to date as things like vector support was split out into G_CONCAT_VECTORS, and frankly, even if it was, it was too complex. It's much easier to start from scratch with what we can actually support, which is just a few type combinations. Anything illegal we should either legalize, or should be eliminated as a side effect of artifact combination. Differential Revision: https://reviews.llvm.org/D107937	2021-08-11 16:45:23 -07:00
Usman Nadeem	9396c3ec7b	[AArch64][SVE] Remove assertion/range check for i16 values during immediate selection The assertion can fail in some cases when an i16 constant is promoted to i32. e.g. in the added test case the value `i16 -32768` is within the range of i16 but the assert fails when the constant is promoted to positive `i32 32768` by an earlier call to DAG.getConstant(). Differential Revision: https://reviews.llvm.org/D107880 Change-Id: I2f6179783cbc9630e6acab149a762b43c65664de	2021-08-11 14:50:20 -07:00
Amara Emerson	2c1789bc8c	[AArch64][GlobalISel] Add ptradd_immed_chain combine to post-legalizer combiner.	2021-08-11 13:59:23 -07:00
Amara Emerson	7ec4ce157b	[AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality. With contributions by Sebastian Neubauer Differential Revision: https://reviews.llvm.org/D105676	2021-08-10 16:41:18 -07:00
Tim Northover	5ad0860899	AArch64: support @llvm.va_copy in GISel	2021-08-10 13:11:03 +01:00
Konstantin Schwarz	64bef13f08	[GlobalISel] Look through truncs and extends in narrowScalarShift If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be shifted individually by the constant shift amount. However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization code was used, producing intermediate shifts that are potentially illegal on some targets. This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D89100	2021-08-10 13:49:22 +02:00
Eli Friedman	ac20e56911	[AArch64] Implement FCOPYSIGN for SVE. I was originally going to try to implement this in target-independent code, but it's actually sort of tricky to generate the correct sequence for vectors like nxv2f32. So just stick this in target-specific code, at least for now. Differential Revision: https://reviews.llvm.org/D107608	2021-08-09 12:06:48 -07:00
Bradley Smith	73ecb9987b	[AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter The patterns for fixed length gather/scatter with 32-bit offsets and 64-bit memory type are slightly different that the rest of the patterns, as such the lowering needs to be slightly different to ensure the correct types are used. Differential Revision: https://reviews.llvm.org/D107576	2021-08-09 14:05:22 +00:00
Amara Emerson	4c2e01232c	[GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs. We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() instead of eraseFromParent(). We should probably use that in other places too but fix this issue which affects clang bootstrap builds for now.	2021-08-07 01:41:02 -07:00
Jay Foad	57b9107e3f	[GlobalISel] Improve widening of cttz/cttz_zero_undef Differential Revision: https://reviews.llvm.org/D107631	2021-08-06 14:25:56 +01:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Jessica Paquette	e6a3944ea9	[AArch64][GlobalISel] Overhaul G_INSERT legalization Similar cleanup to G_EXTRACT (`51bd4e874f`). Also swap the order of clamp/widen to avoid unnecessary complex merges. Add a bunch of missing testcases to legalize-inserts while we're at it. Differential Revision: https://reviews.llvm.org/D107601	2021-08-05 18:28:22 -07:00
Jessica Paquette	562c8e14d9	[AArch64][GlobalISel] Widen G_IMPLICIT_DEF and G_FREEZE before clamping Similar to other cleanup commits which widen instructions before clamping during legalization. Purpose of this is to avoid weird type breakdowns. In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions. The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so this can help with emitting merges elsewhere. Differential Revision: https://reviews.llvm.org/D107604	2021-08-05 18:21:14 -07:00
Amara Emerson	1577c41090	[GlobalISel] Allow the ArtifactValueFinder to return the best available register on failure. In some cases, like with inserts, we may have a matching size register already, but still decide to try to look further. This change adds a CurrentBest register to the value finder state, and any time a method fails to make progress, returns that register (which may just be an empty Register). To facilitate this, add a new entry point to the findValueFromDef() function which initializes this state. Also fix the build vector finder to return the current build_vector if all sources are being requested. Differential Revision: https://reviews.llvm.org/D107017	2021-08-05 17:37:30 -07:00
Jessica Paquette	8a557d8311	[AArch64][GlobalISel] Widen extloads before clamping during legalization Allows us to avoid awkward type breakdowns on types like s88, like the other commits. Differential Revision: https://reviews.llvm.org/D107587	2021-08-05 16:14:06 -07:00
Jessica Paquette	36498374d4	[AArch64][GlobalISel] Widen G_BSWAP before clamping This allows us to avoid odd type breakdowns + allows us to legalize types like s88 in the first place. Add some testcases for known legal types + testcases for s4 and s88. Differential Revision: https://reviews.llvm.org/D107607	2021-08-05 15:16:00 -07:00
Jessica Paquette	51bd4e874f	[AArch64][GlobalISel] Overhaul G_EXTRACT legalization This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly changing this because it should make it easier to improve legalization for instructions which use G_EXTRACT as part of the legalization process. This also adds support for legalizing some weird types. Similar to other recent legalizer changes, this changes the order of widening/clamping. There was some dead code in our existing rules (e.g. the p0 case would never get hit), so this knocks those out and makes the types we want to handle explicit. This also removes some checks which, nowadays, are handled by the MachineVerifier. Differential Revision: https://reviews.llvm.org/D107505	2021-08-05 13:55:15 -07:00
Jon Roelofs	98f38c151b	[AArch64][GlobalISel] Legalize ctpop s128 This is re-landing the same patch again, but without the changes to LegalizerHelper that regressed the Mips test: test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll Differential revision: https://reviews.llvm.org/D106494	2021-08-05 11:54:53 -07:00
Jessica Paquette	f3f3098afe	[AArch64][GlobalISel] Mark v16s8 <- v8s8, v8s8 G_CONCAT_VECTOR as legal G_CONCAT_VECTORS shows up from time to time when legalizing other instructions. We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it as legal gives us selection for free. Differential Revision: https://reviews.llvm.org/D107512	2021-08-05 09:40:46 -07:00
Simon Pilgrim	2cbf9fd402	[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes. This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector. This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector. Fixes PR50053 Differential Revision: https://reviews.llvm.org/D107068	2021-08-05 15:40:48 +01:00
Dominik Montada	cc947e29ea	[GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107330	2021-08-05 13:52:10 +02:00
Jessica Paquette	ca2e053652	[AArch64][GlobalISel] Legalize wide vector G_PHIs Clamp the max number of elements when legalizing G_PHI. This allows us to legalize some common fallbacks like 4 x s64. Here's an example: https://godbolt.org/z/6YocsEYTd Had to add -global-isel-abort=0 to legalize-phi.mir to account for the G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI. Differential Revision: https://reviews.llvm.org/D107508	2021-08-04 16:48:59 -07:00
Jessica Paquette	d9279843b1	[AArch64][GlobalISel] Widen G_PHI before clamping it during legalization This allows us to handle weird types like s88; we first widen to s128, then clamp back down to s64. https://godbolt.org/z/9xqbP46Mz Also this makes it possible for GISel to legalize the case in pr48188.ll. It now does the same thing as SDAG, although regalloc chooses different registers. Differential Revision: https://reviews.llvm.org/D107417	2021-08-04 10:25:14 -07:00
Jessica Paquette	7d97de60b3	[AArch64][GlobalISel] Widen G_FPTO*I before clamping Going through our legalization rules and doing some cleanup. Widening and then clamping is usually easier than clamping and then widening. This allows us to legalize some weird types like s88. Differential Revision: https://reviews.llvm.org/D107413	2021-08-04 10:19:26 -07:00
Bradley Smith	d9cc5d84e4	[AArch64][SVE] Combine bitcasts of predicate types with vector inserts/extracts of loads/stores An insert subvector that is inserting the result of a vector predicate sized load into undef at index 0, whose result is casted to a predicate type, can be combined into a direct predicate load. Likewise the same applies to extract subvector but in reverse. The purpose of this optimization is to clean up cases that will be introduced in a later patch where casts to/from predicate types from i8 types will use insert subvector, rather than going through memory early. This optimization is done in SVEIntrinsicOpts rather than InstCombine to re-introduce scalable loads as late as possible, to give other optimizations the best chance possible to do a good job. Differential Revision: https://reviews.llvm.org/D106549	2021-08-04 15:51:14 +00:00
Simon Wallis	9269752671	[AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults Don't know how to custom expand this UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788 The fix is to provide missing expansions for: case ISD::STRICT_FP_TO_UINT: case ISD::STRICT_FP_TO_SINT: A test case is provided. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107452	2021-08-04 16:18:19 +01:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Jessica Paquette	5643736378	[AArch64][GlobalISel] Widen G_SELECT before clamping it This allows us to handle the s88 G_SELECTS: https://godbolt.org/z/5s18M4erY Weird types like this can result in weird merges. Widening to s128 first and then clamping down avoids that situation. Differential Revision: https://reviews.llvm.org/D107415	2021-08-03 18:31:17 -07:00
David Green	bd07c2e266	[AArch64] Prefer fmov over orr v.16b when copying f32/f64 This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to a fmov of the appropriate type. At least on some CPU's with 64bit NEON data paths this is expected to be faster, and shouldn't be slower on any CPU that treats fmov as a register rename. Differential Revision: https://reviews.llvm.org/D106365	2021-08-03 17:25:40 +01:00
Jason Molenda	0d8cd4e2d5	[AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted Add a comment when there is a shifted value, add x9, x0, #291, lsl #12 ; =1191936 but not when the immediate value is unshifted, subs x9, x0, #256 ; =256 when the comment adds nothing additional to the reader. Differential Revision: https://reviews.llvm.org/D107196	2021-08-03 02:28:46 -07:00

1 2 3 4 5 ...

4879 Commits