llvm-project

Commit Graph

Author	SHA1	Message	Date
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Chris Lattner	9e46dd965a	[APInt.h] Reduce the APInt header file interface a bit. NFC This moves one mid-size function out of line, inlines the trivial tcAnd/tcOr/tcXor/tcComplement methods into their only caller, and moves the magic/umagic functions into SelectionDAG since they are implementation details of its algorithm. This also removes the unit tests for magic, but these are already tested in the divide lowering logic for various targets. This also upgrades some C style comments to C++. Differential Revision: https://reviews.llvm.org/D109476	2021-09-08 18:17:07 -07:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC `79845ed6df` folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore i take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Roman Lebedev	f5753125f0	[Codegen][TLI][X86] SimplifyMultipleUseDemandedBits(): 0'th vec subreg widening is free, try to perform it earlier I believe the profitability reasoning here is correct: the "sub"reg is already located within the 0'th subreg of the wider reg, so if we have a subvector insertion at index 0 into undef, then it's always free to do. After this, D109065 finally avoids the regression in D108382. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D109074	2021-09-02 00:54:05 +03:00
Bjorn Pettersson	789f01283d	[SelectionDAG] Fix miscompile bugs related to smul.fix.sat with scale zero When expanding a SMULFIXSAT ISD node (usually originating from a smul.fix.sat intrinsic) we've applied some optimizations for the special case when the scale is zero. The idea has been that it would be cheaper to use an SMULO instruction (if legal) to perform the multiplication and at the same time detect any overflow. And in case of overflow we could use some SELECT:s to replace the result with the saturated min/max value. The only tricky part is to know if we overflowed on the min or max value, i.e. if the product is positive or negative. Unfortunately the implementation has been incorrect as it has looked at the product returned by the SMULO to determine the sign of the product. In case of overflow that product is truncated and won't give us the correct sign bit. This patch is adding an extra XOR of the multiplication operands, which is used to determine the sign of the non truncated product. This patch fixes PR51677. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D108938	2021-08-30 22:08:26 +02:00
Craig Topper	8bb24289f3	[SelectionDAG] Optimize bitreverse expansion to minimize the number of mask constants. We can halve the number of mask constants by masking before shl and after srl. This can reduce the number of mov immediate or constant materializations. Or reduce the number of constant pool loads for X86 vectors. I think we might be able to do something similar for bswap. I'll look at it next. Differential Revision: https://reviews.llvm.org/D108738	2021-08-26 09:33:24 -07:00
David Green	d10f23a25d	[ISel] Expand saddsat and ssubsat via asr and xor This changes the lowering of saddsat and ssubsat so that instead of using: r,o = saddo x, y c = setcc r < 0 s = c ? INTMAX : INTMIN ret o ? s : r into using asr and xor to materialize the INTMAX/INTMIN constants: r,o = saddo x, y s = ashr r, BW-1 x = xor s, INTMIN ret o ? x : r https://alive2.llvm.org/ce/z/TYufgD This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853	2021-08-19 16:08:07 +01:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Nemanja Ivanovic	62fe3dcf98	Fix PPC buildbot break caused by `4c4093e6e3` This commit adds the isnan intrinsic and provides a default expansion for it in the SDAG. However, it makes the assumption that types it operates on are IEEE-compliant types. This is not always the case. An example of that is PPC "double double" which has a representation that - Does not need to conform to IEEE requirements for isnan as it is not an IEEE-compliant type - Does not have a representation that allows for straightforward reinterpreting as an integer and use of integer operations The result was that this commit broke __builtin_isnan for ppc_fp128, making many valid numeric values report a NaN. This patch simply changes the expansion to always expand to unordered comparison (regardless of whether FP exceptions are tracked). This is in line with previous semantics.	2021-08-06 22:10:20 -05:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into a set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually do not have such a property; they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution at the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where it is lowered in a target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into a set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually do not have such a property; they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution at the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where it is lowered in a target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Alexandros Lamprineas	7d940432c4	[AArch64] Legalize MVT::i64x8 in DAG isel lowering This patch legalizes the Machine Value Type introduced in D94096 for loads and stores. A new target hook named getAsmOperandValueType() is added which maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization. Differential Revision: https://reviews.llvm.org/D94097	2021-07-31 09:51:28 +01:00
Fraser Cormack	f924a3d474	[SelectionDAG] Support scalable-vector splats in yet more cases This patch extends support for (scalable-vector) splats in the DAGCombiner via the `ISD::matchBinaryPredicate` function, which enables a variety of simple combines of constants. Users of this function may now have to distinguish between `BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing with this in-tree follows the approach added for `ISD::matchUnaryPredicate` implemented in D94501. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106575	2021-07-26 10:15:08 +01:00
Simon Pilgrim	c261a06b7a	[DAG] Add initial SelectionDAG::isGuaranteedNotToBeUndefOrPoison framework (PR51129) I've set up the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it; further Opcodes can be handled when we have test coverage. I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place. SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added. Differential Revision: https://reviews.llvm.org/D106668	2021-07-24 11:36:35 +01:00
Roman Lebedev	af8fa36bf0	[NFCI][TLI] prepare[US]REMEqFold(): don't add nonsensical 'exact' flag to rotates created As pointed out by Craig Topper.	2021-07-22 23:02:58 +03:00
Roman Lebedev	5b51bd1878	[TLI] prepareSREMEqFold(): use correct VT for the final VSELECT (PR51133) We were using the wrong VT for this final VSELECT, it should be in the final comparison VT, not the source value's VT. Fixes https://bugs.llvm.org/show_bug.cgi?id=51133	2021-07-19 16:44:00 +03:00
Arthur Eubanks	aad41e2299	[OpaquePtr] Use ArgListEntry::IndirectType for lowering ABI attributes Consolidate PreallocatedType and ByValType into IndirectType, and use that for inalloca.	2021-07-07 14:58:38 -07:00
Bradley Smith	2668727929	[SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR Inserting into a smaller-than-legal scalable vector would result in an internal compiler error. For example, inserting a <vscale x 4 x i8> into a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a crash. This crash was happening because there was no code to promote (legalise) the result of an INSERT_SUBVECTOR node. This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises the ISD node. This is currently done by going through memory. This is necessary because of the requirement that the SubVec parameter of the INSERT_SUBVECTOR node must be smaller than the Vec parameter, which means that INSERT_SUBVECTOR cannot always have a legal result/operand types. Co-Authored-by: Joe Ellis <joe.ellis@arm.com> Differential Revision: https://reviews.llvm.org/D102766	2021-07-01 17:05:53 +01:00
Bradley Smith	002911503f	[TargetLowering][AArch64][SVE] Take into account accessed type when clamping address When clamping the index for a memory access to a stacked vector we must take into account the entire type being accessed, not just assume that we are accessing only a single element. Differential Revision: https://reviews.llvm.org/D105016	2021-06-30 13:30:18 +01:00
Martin Storsjö	42f74e8249	[llvm] Rename StringRef _lower() method calls to _insensitive() This is a mechanical change. This actually also renames the similarly named methods in the SmallString class, however these methods don't seem to be used outside of the llvm subproject, so this doesn't break building of the rest of the monorepo.	2021-06-25 00:22:01 +03:00
Craig Topper	03f9e04bc3	[TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant. We don't constant fold based on demanded bits elsewhere in SimplifyDemandedBits, so I don't think we should shrink them either. The affected ARM test changes because a constant became non-opaque and eventually enabled some constant folding. This no longer happens. I checked and InstCombine is able to simplify this test. I'm not sure exactly what it was trying to test. Reviewed By: lebedev.ri, dmgreen Differential Revision: https://reviews.llvm.org/D104832	2021-06-24 10:09:36 -07:00
Roman Lebedev	585e65d330	[TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0)) Iff we have `SCALAR_TO_VECTOR` (and we demand it's only defined 0'th element), and said scalar was produced by `EXTRACT_VECTOR_ELT` from the 0'th element of some vector, then we can just continue traversal into said source vector. This comes up in X86 vector uniform shift lowering. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D104250	2021-06-14 23:52:53 +03:00
Arthur Eubanks	47211fa889	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" Needs to be discussed more. This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46 This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1 This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23 This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd	2021-06-07 16:07:44 -07:00
Arthur Eubanks	9255a5c1ba	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. Issues can be diagnosed with D103412. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-06-03 15:52:01 -07:00
Arthur Eubanks	71cca4f728	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" This reverts commit `1c7f32334d`. Some code still needs to properly set parameter ABI attributes, see D101806.	2021-05-29 23:08:15 -07:00
Arthur Eubanks	3a6f12f915	Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering" This reverts commit `bc7d15c61d`. Dependent change is to be reverted.	2021-05-29 22:40:33 -07:00
Arthur Eubanks	bc7d15c61d	[NFC] Use ArgListEntry indirect types more in ISel lowering For opaque pointers, we're trying to avoid uses of PointerType::getElementType(). A couple of ISel places use PointerType::getElementType(). Some of these are easy to fix by using ArgListEntry's indirect types. The inalloca type wasn't stored there, as opposed to preallocated and byval which have their indirect types available, so add it and use it. This is a reland after an MSan fix in D102667. Differential Revision: https://reviews.llvm.org/D101713	2021-05-18 14:30:22 -07:00
Arthur Eubanks	1c7f32334d	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. This is a reland after fixing MSan issues in D102667. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-05-18 14:30:22 -07:00
Simon Pilgrim	c29522d648	[TargetLowering] prepareUREMEqFold/prepareSREMEqFold - account for non legal shift types Ensure we tell getShiftAmountTy that we're working with pre-legalized types to prevent cases where the (legalized) shift type can no longer handle the (non-legalized) type width. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34366	2021-05-17 11:03:27 +01:00
Arthur Eubanks	341902672c	Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry" This reverts commit `16748bd2fb`. Causes https://crbug.com/1209013	2021-05-16 22:02:10 -07:00
Arthur Eubanks	7647cb14dc	Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering" This reverts commit `85af8a8c1b`.	2021-05-16 22:00:54 -07:00
Tim Northover	ea0eec69f1	IR+AArch64: add a "swiftasync" argument attribute. This extends any frame record created in the function to include that parameter, passed in X22. The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001 in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect of this is that tools walking the stack should expect to see one of three values there: * 0b0000 => a normal, non-extended record with just [FP, LR] * 0b0001 => the extended record [X22, FP, LR] * 0b1111 => kernel space, and a non-extended record. All other values are currently reserved. If compiling for arm64e this context pointer is address-discriminated with the discriminator 0xc31a and the DB (process-specific) key. There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing front-ends access to this slot (and forcing its creation initialized to nullptr if necessary).	2021-05-14 11:43:58 +01:00
Arthur Eubanks	85af8a8c1b	[NFC] Use ArgListEntry indirect types more in ISel lowering For opaque pointers, we're trying to avoid uses of PointerType::getElementType(). A couple of ISel places use PointerType::getElementType(). Some of these are easy to fix by using ArgListEntry's indirect types. The inalloca type wasn't stored there, as opposed to preallocated and byval which have their indirect types available, so add it and use it. Differential Revision: https://reviews.llvm.org/D101713	2021-05-10 13:05:15 -07:00
Arthur Eubanks	16748bd2fb	[TargetLowering] Only inspect attributes in the arguments for ArgListEntry Parameter attributes are considered part of the function [1], and like mismatched calling conventions [2], we can't have the verifier check for mismatched parameter attributes. [1] https://llvm.org/docs/LangRef.html#parameter-attributes [2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D101806	2021-05-10 12:35:11 -07:00
Simon Pilgrim	280aa3415e	[DAG] Add a generic expansion for SHIFT_PARTS opcodes using funnel shifts Based off a discussion on D89281 - where the AARCH64 implementations were being replaced to use funnel shifts. Any target that has efficient funnel shift lowering can handle the shift parts expansion using the same expansion, avoiding a lot of duplication. I've generalized the X86 implementation and moved it to TargetLowering - so far I've found that AARCH64 and AMDGPU benefit, but many other targets (ARM, PowerPC + RISCV in particular) could easily use this with a few minor improvements to their funnel shift lowering (or the folding of their target ops that funnel shifts lower to). NOTE: I'm trying to avoid adding full SHIFT_PARTS legalizer handling as I think it might actually be possible to remove these opcodes in the medium-term and use funnel shift / libcall expansion directly. Differential Revision: https://reviews.llvm.org/D101987	2021-05-07 13:12:30 +01:00
Craig Topper	3067520bf4	[SelectionDAG] Use a VTSDNode to store the saturation width for FP_TO_SINT_SAT/FP_TO_UINT_SAT Previously we used an i32 constant to store the saturation width, but i32 isn't legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the type legalizer. This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes it opaque to the type legalizer. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101262	2021-04-27 14:38:42 -07:00
Dávid Bolvanský	ef2dc7ed9f	[Analysis] Attribute alignment should not prevent tail call optimization Fixes tail folding issue mentioned in D100879. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D101230	2021-04-24 19:57:42 +02:00
Simon Pilgrim	d860bf2d0e	[DAG] TargetLowering.cpp - breakup if-else chains where each block returns. NFCI. Match style guide that requests that if+return blocks are separate.	2021-04-21 11:17:27 +01:00
David Sherwood	83f5fa519e	[CodeGen] Improve code generation for clamping of constant indices with scalable vectors When trying to clamp a constant index into a scalable vector we can test if the index is less than the minimum number of elements in the vector. If so, we can simply return the index because we know it is guaranteed to fit inside the vector. Differential Revision: https://reviews.llvm.org/D100639	2021-04-19 08:34:17 +01:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). Throughout the codebase, this led to inelegant checks ranging from if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") to if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true") Introduce a getValueAsBool that normalizes the check, with the following behavior: no attributes or attribute set to "false" => return false attribute set to "true" => return true Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Momchil Velikov	f9d932e673	[clang][AArch64] Correctly align HFA arguments when passed on the stack When we pass an AArch64 Homogeneous Floating-Point Aggregate (HFA) argument with increased alignment requirements, for example struct S { __attribute__ ((__aligned__(16))) double v[4]; }; Clang uses `[4 x double]` for the parameter, which is passed on the stack at alignment 8, whereas it should be at alignment 16, following Rule C.4 in AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules) Currently we don't have a way to express in LLVM IR the alignment requirements of the function arguments. The align attribute is applicable to pointers only, and only for some special ways of passing arguments (e.g. byval). When implementing AAPCS32/AAPCS64, clang resorts to dubious hacks of coercing to types, which naturally have the needed alignment. We don't have enough types to cover all the cases, though. This patch introduces a new use of the stackalign attribute to control stack slot alignment, when and if an argument is passed in memory. The attribute align is left as an optimizer hint - it still applies to pointer types only and pertains to the content of the pointer, whereas the alignment of the pointer itself is determined by the stackalign attribute. For byval arguments, the stackalign attribute assumes the role, previously performed by align, falling back to align if `stackalign` is absent. On the clang side, when passing arguments using the "direct" style (cf. `ABIArgInfo::Kind`), now we can optionally specify an alignment, which is emitted as the new `stackalign` attribute. Patch by Momchil Velikov and Lucas Prates. Differential Revision: https://reviews.llvm.org/D98794	2021-04-15 22:58:14 +01:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on AArch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Cullen Rhodes	2750f3ed31	[IR] Introduce llvm.experimental.vector.splice intrinsic This patch introduces a new intrinsic @llvm.experimental.vector.splice that constructs a vector of the same type as the two input vectors, based on an immediate where the sign of the immediate distinguishes two variants. A positive immediate specifies an index into the first vector and a negative immediate specifies the number of trailing elements to extract from the first vector. For example: @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E> ; index @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, -3) ==> <B, C, D, E> ; trailing element count These intrinsics support both fixed and scalable vectors, where the former is lowered to a shufflevector to maintain existing behaviour, although while marked as experimental the recommended way to express this operation for fixed-width vectors is to use shufflevector. For scalable vectors where it is not possible to express a shufflevector mask for this operation, a new ISD node has been implemented. This is one of the named shufflevector intrinsics proposed on the mailing-list in the RFC at [1]. Patch by Paul Walker and Cullen Rhodes. [1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D94708	2021-03-09 10:44:22 +00:00
Craig Topper	74e6030bcb	[TargetLowering] Use HandleSDNodes to prevent nodes from being deleted by recursive calls in getNegatedExpression. For binary or ternary ops we call getNegatedExpression multiple times and then compare costs. While we're doing this we need to hold a node from the first call across the second call, but it's not yet attached to the DAG. It's possible the second call creates an identical node and then decides it didn't need it so will try to delete it if it has no uses. This can cause a reference to the node we're holding further up the call stack to become invalidated. To prevent this, we can use a HandleSDNode to artificially give the node a use without connecting it to the DAG. I've used a std::list of HandleSDNodes so we can create handles only when we have a node to hold. HandleSDNode does not have a default constructor and cannot be copied or moved. Fixes PR49393. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97914	2021-03-04 22:48:25 -08:00
Simon Pilgrim	7d3d9fe8cd	[DAG] TargetLowering::BuildUDIV - use APInt as const ref. NFCI. Fixes clang-tidy warning.	2021-03-04 12:15:08 +00:00
Simon Pilgrim	73adc26ac0	[DAG] expandAddSubSat - break if-else chain. NFCI. Fix styleguide issue - each if() block always returns so we don't need to make them an if-else chain.	2021-02-26 11:02:08 +00:00
Kazu Hirata	ffba9e596d	[CodeGen] Use range-based for loops (NFC)	2021-02-21 19:58:07 -08:00
Craig Topper	064ada4ec6	[SelectionDAG][AArch64] Restrict matchUnaryPredicate to only handle SPLAT_VECTOR for scalable vectors. `fde2466171` added support for scalable vectors to matchUnaryPredicate by handling SPLAT_VECTOR in addition to BUILD_VECTOR. This was used to enable UDIV/SDIV/UREM/SREM by constant expansion in BuildUDIV/BuildSDIV in TargetLowering.cpp. The caller there expects to call getBuildVector from the match factors. This leads to a crash right now if there is a SPLAT_VECTOR of fixed vectors since the number of vectors won't match the number of elements. To fix this, this patch updates the callers to check the opcode instead of whether the type is fixed or scalable. This assumes that only 3 opcodes are handled by matchUnaryPredicate so I've added an assertion to the final else to check that opcode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96174	2021-02-16 09:22:46 -08:00
Craig Topper	eb75f250fe	[RISCV][LegalizeTypes] Try to expand BITREVERSE before promoting if the promoted BITREVERSE would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. Something similar was done for BSWAP previously. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96681	2021-02-15 12:33:16 -08:00
Simon Pilgrim	7ad0c573bd	[DAG] Fix shift amount limit in SimplifyDemandedBits trunc(shift(x,c)) to truncated bitwidth We lost this in D56387/rG69bc0990a9181e6eb86228276d2f59435a7fae67 - where I got the src/dst bitwidths mixed up and assumed getValidShiftAmountConstant would catch it. Patch by @craig.topper - confirmed by @Carrot that it fixes PR49162	2021-02-13 12:00:08 +00:00
Craig Topper	5744502a13	[TargetLowering][RISCV][AArch64][PowerPC] Enable BuildUDIV/BuildSDIV on illegal types before type legalization if we can find a larger legal type that supports MUL. If we wait until the type is legalized, we'll lose information about the original type and need to use larger magic constants. This gets especially bad on RISCV64 where i64 is the only legal type. I've limited this to simple scalar types so it only works for i8/i16/i32 which are most likely to occur. For more odd types we might want to do a small promotion to a type where MULH is legal instead. Unfortunately, this does prevent some urem/srem+seteq matching since that still requires legal types. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D96210	2021-02-11 09:43:13 -08:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Craig Topper	8cc9c42a0c	[TargetLowering] Use LegalOnly operand to isOperationLegalOrCustom to simplify some code. NFC	2021-02-04 12:30:37 -08:00
xgupta	94fac81fcc	[Branch-Rename] Fix some links According to the [[ https://foundation.llvm.org/docs/branch-rename/ \| status of branch rename ]], the master branch of the LLVM repository is removed on 28 Jan 2021. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D95766	2021-02-01 16:43:21 +05:30
Craig Topper	70289ea6f5	[RISCV][LegalizeTypes] Try to expand BSWAP before promoting if the promoted BSWAP would expand anyway. If we're going to end up expanding anyway, we should do it early so we don't create extra operations to handle the bytes added by promotion. This is helpful on RISCV where we might have to promote i16 all the way to i64. Differential Revision: https://reviews.llvm.org/D95756	2021-01-31 14:33:29 -08:00
Craig Topper	ea87cf2acd	[TargetLowering][RISCV] Don't transform (seteq/ne (sext_inreg X, VT), C1) -> (seteq/ne (zext_inreg X, VT), C1) if the sext_inreg is cheaper RISCV has to use 2 shifts for (i64 (zext_inreg X, i32)), but we can use addiw rd, rs1, x0 for sext_inreg. We already understood this when type legalizing i32 seteq/ne on rv64. But this transform in SimplifySetCC would sometimes undo it. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D95289	2021-01-25 16:37:21 -08:00
Fraser Cormack	fde2466171	[SelectionDAG] Support scalable-vector splats in more cases This patch adds support for scalable-vector splats in DAGCombiner's `isConstantOrConstantVector` and `ISD::matchUnaryPredicate` functions, which enable the SelectionDAG div/rem-by-constant optimizations for scalable vector types. It also fixes up one case where the UDIV optimization was generating a SETCC without first consulting the target for its preferred SETCC result type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94501	2021-01-25 10:58:15 +00:00
QingShan Zhang	ffc3e800c6	[NFC] [DAGCombine] Correct the result for sqrt even if the iteration is zero For now, we correct the result for sqrt only if iteration > 0. This doesn't make sense as the two are not strictly related. Reviewed By: dmgreen, spatel, RKSimon Differential Revision: https://reviews.llvm.org/D94480	2021-01-25 04:02:44 +00:00
Craig Topper	147c0c263d	[TargetLowering] Use isOneConstant to simplify some code. NFC	2021-01-22 19:32:19 -08:00
Craig Topper	c953a83347	[TargetLowering] Use getBoolConstant instead of assuming zero or one for boolean contents. Noticed while I was touching other nearby code. I don't have a test where this matters because the targets I work on use zero or one boolean contents. And the test cases I've seen this fire on happen before type legalization where the result type is MVT::i1 so the distinction doesn't matter.	2021-01-22 00:26:14 -08:00
Craig Topper	5660dc5968	[TargetLowering] Simplify some code in SimplifySetCC that tries to handle SIGN_EXTEND_INREG operand types that should never happen. NFCI There was code to handle the first operand being different than the result type. And code to handle first operand having the same type as the type to extend from. This should never happen for a correctly formed SIGN_EXTEND_INREG. I've replaced the code with asserts. I also noticed we created the same APInt twice so I've reused it.	2021-01-21 23:56:37 -08:00
Simon Pilgrim	69bc0990a9	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE (REAPPLIED). Add DemandedElts support inside the TRUNCATE analysis. REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724 and an additional test case added at rG0ca81b90d19d Differential Revision: https://reviews.llvm.org/D56387	2021-01-21 13:01:34 +00:00
Simon Pilgrim	935bacd3a7	[DAG] SimplifyDemandedBits - correctly adjust truncated shift amount type As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.	2021-01-21 12:38:36 +00:00
Hans Wennborg	a51226057f	Revert "[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE" It caused "Vector shift amounts must be in the same as their first arg" asserts in Chromium builds. See the code review for repro instructions. > Add DemandedElts support inside the TRUNCATE analysis. > > Differential Revision: https://reviews.llvm.org/D56387 This reverts commit `cad4275d69`.	2021-01-20 20:06:55 +01:00
Simon Pilgrim	cad4275d69	[DAGCombiner] Enable SimplifyDemandedBits vector support for TRUNCATE Add DemandedElts support inside the TRUNCATE analysis. Differential Revision: https://reviews.llvm.org/D56387	2021-01-20 15:39:58 +00:00
Simon Pilgrim	207f32948b	[DAG] SimplifyDemandedBits - use KnownBits comparisons to remove ISD::UMIN/UMAX ops Use the KnownBits icmp comparisons to determine when an ISD::UMIN/UMAX op is unnecessary should either op be known to be ULT/ULE or UGT/UGE than the other. Differential Revision: https://reviews.llvm.org/D94532	2021-01-18 10:29:23 +00:00
Craig Topper	4c5066b078	[TargetLowering] Don't speculatively call ComputeNumSignBits. NFC These methods are recursive so a little costly. We only look at the result in one place in this function and it's conditional. We also only need the second call if the first returned enough sign bits.	2021-01-15 09:09:35 -08:00
Jay Foad	517196e569	[Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC. Differential Revision: https://reviews.llvm.org/D94588	2021-01-14 14:02:43 +00:00
Jay Foad	a1cba5b7a1	[SelectionDAG] Make use of KnownBits::commonBits. NFC. Differential Revision: https://reviews.llvm.org/D94587	2021-01-14 14:02:43 +00:00
Fraser Cormack	41d06095b0	[SelectionDAG] Teach isConstOrConstSplat about ISD::SPLAT_VECTOR This improves llvm::isConstOrConstSplat by allowing it to analyze ISD::SPLAT_VECTOR nodes, in order to allow more constant-folding of operations using scalable vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D94168	2021-01-09 20:54:34 +00:00
Simon Pilgrim	350ab7aa1c	[DAG] Simplify OR(X,SHL(Y,BW/2)) eq/ne 0/-1 'all/any-of' style patterns Attempt to simplify all/any-of style patterns that concatenate 2 smaller integers together into an and(x,y)/or(x,y) + icmp 0/-1 instead. This is mainly to help some bool predicate reduction patterns where we end up concatenating bool vectors that have been bitcasted to integers. Differential Revision: https://reviews.llvm.org/D93599	2021-01-07 12:03:19 +00:00
Simon Pilgrim	1307e3f6c4	[TargetLowering] Add icmp ne/eq (srl (ctlz x), log2(bw)) vector support.	2021-01-06 16:13:51 +00:00
Bjorn Pettersson	a89d751fb4	Add intrinsics for saturating float to int casts This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ The intrinsics have overloaded source and result type and support vector operands: i32 @llvm.fptoui.sat.i32.f32(float %f) i100 @llvm.fptoui.sat.i100.f64(double %f) <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f) // etc On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example: i19 @llvm.fptosi.sat.i19.f32(float %f) // builds i19 fp_to_sint_sat f, 19 // type legalizes (through integer result promotion) i32 fp_to_sint_sat f, 19 I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width. There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like: f = fmaxnum f, FP(MIN) f = fminnum f, FP(MAX) i = fptoxi f i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN If the bounds cannot be exactly represented, we expand to something like this instead: i = fptoxi f i = select f ult FP(MIN), MIN, i i = select f ogt FP(MAX), MAX, i i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN It should be noted that this expansion assumes a non-trapping fptoxi. Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening. Original patch by @nikic and @ebevhan (based on D54696). Differential Revision: https://reviews.llvm.org/D54749	2020-12-18 11:09:41 +01:00
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand these as libcalls inside the target, and PowerPC also wants to expand them as libcalls for P8. So, propose an implementation in the legalizer to share the logic, and remove the X86/AArch64 code to avoid duplication. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Kerry McLaughlin	c5ced82c8e	[SVE][CodeGen] Lower scalable floating-point vector reductions Changes in this patch: - Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally to also work for scalable types - Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32) - Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D93050	2020-12-14 11:45:42 +00:00
Kai Luo	44bd8ea167	[DAGCombine][PowerPC] Simplify nabs by using legal `smin` operation Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation. Verification: https://alive2.llvm.org/ce/z/vpquFR Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92637	2020-12-08 03:24:07 +00:00
Simon Pilgrim	6dbd0d36a1	[DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds. The SSE42 test diffs are relatively benign - it's avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away. Differential Revision: https://reviews.llvm.org/D91876	2020-12-01 11:56:26 +00:00
Simon Pilgrim	969918e177	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code. Differential Revision: https://reviews.llvm.org/D92183	2020-11-27 11:18:58 +00:00
Simon Pilgrim	8057ebf4a0	Revert rG12d59b696b330 "[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal" This reverts commit `12d59b696b`. Prematurely pushed this to trunk	2020-11-26 15:07:45 +00:00
Simon Pilgrim	12d59b696b	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code.	2020-11-26 14:47:28 +00:00
Simon Pilgrim	9c86c5e8ad	[DAG] Legalize abs(x) -> umin(x,sub(0,x)) iff umin/sub are legal If umin() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method. Followup to D92095 Alive2: https://alive2.llvm.org/ce/z/8nuX6s https://alive2.llvm.org/ce/z/q2hB9w	2020-11-25 18:06:02 +00:00
Simon Pilgrim	0637dfe88b	[DAG] Legalize abs(x) -> smax(x,sub(0,x)) iff smax/sub are legal If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method. This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (and which was never updated when they added smax lowering). Alive2: https://alive2.llvm.org/ce/z/xRk3cD Differential Revision: https://reviews.llvm.org/D92095	2020-11-25 15:03:03 +00:00
Kai Luo	8e6d92026c	[DAG][PowerPC] Fix dropped `nsw` flag in `SimplifySetCC` by adding `doesNodeExist` helper `SimplifySetCC` invokes `getNodeIfExists` without passing the `Flags` argument, and `getNodeIfExists` uses a default `SDNodeFlags` to intersect the original flags; as a consequence, flags like `nsw` are dropped. Added a new helper function `doesNodeExist` to check if a node exists without modifying its flags. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D89938	2020-11-25 04:39:03 +00:00
Simon Pilgrim	791040cd8b	[DAG] LowerMINMAX - move default expansion to generic TargetLowering::expandIntMINMAX This is part of the discussion on D91876 about trying to reduce custom lowering of MIN/MAX ops on older SSE targets - if we can improve generic vector expansion we should be able to relax the limitations in SelectionDAGBuilder when it will let MIN/MAX ops be generated, and avoid having to flag so many ops as 'custom'.	2020-11-22 13:02:27 +00:00
Kerry McLaughlin	170947a5de	[SVE][CodeGen] Lower scalable masked scatters Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only) Changes included in this patch: - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts. - Added the getCanonicalIndexType function to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes) - Tests with 32 & 64-bit scaled & unscaled offsets Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90941	2020-11-11 11:50:22 +00:00
David Zarzycki	a41ea782c8	[SelectionDAG] Enable CTPOP optimization fine tuning Add a TLI hook to allow SelectionDAG to fine tune the conversion of CTPOP to a chain of "x & (x - 1)" when CTPOP isn't legal. A subsequent patch will attempt to fine tune the X86 code gen. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89952	2020-11-09 13:49:01 -05:00
David Zarzycki	57e46e7123	[SelectionDAG] NFC: Hoist is legal check This was requested during the code review of D89952.	2020-11-09 10:55:15 -05:00
Kerry McLaughlin	f2412d372d	[SVE][CodeGen] Lower scalable integer vector reductions This patch uses the existing LowerFixedLengthReductionToSVE function to also lower scalable vector reductions. A separate function has been added to lower VECREDUCE_AND & VECREDUCE_OR operations with predicate types using ptest. Lowering scalable floating-point reductions will be addressed in a follow up patch, for now these will hit the assertion added to expandVecReduce() in TargetLowering. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D89382	2020-11-04 11:38:49 +00:00
Cameron McInally	dda1e74b58	[Legalize] Add legalizations for VECREDUCE_SEQ_FADD Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass. Differential Revision: https://reviews.llvm.org/D90247	2020-10-30 16:02:55 -05:00
Nikita Popov	91bf172088	[SDAG] Extract helper to get vecreduce base opcode (NFC)	2020-10-29 20:22:22 +01:00
Sven van Haastregt	5d03080092	[TargetLowering] Add i1 condition for bit comparison fold For i1 types, boolean false is represented identically regardless of the boolean content, so we can allow optimizations that otherwise would not be correct for booleans with false represented as a negative one. Patch by Erik Hogeman. Differential Revision: https://reviews.llvm.org/D90145	2020-10-27 12:22:20 +00:00
Simon Pilgrim	62b17a7697	[LegalizeTypes] Legalize vector rotate operations Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps. This fixes https://bugs.llvm.org/show_bug.cgi?id=47320 Patch By: @rsanthir.quic (Ryan Santhirarajan) Differential Revision: https://reviews.llvm.org/D89497	2020-10-24 11:30:32 +01:00
Craig Topper	9e884169a2	[FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes. There are still problems with unsigned i32 conversions too which I'll try to fix in another patch. Fixes part of PR47393 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87115	2020-10-21 18:12:54 -07:00
Sven van Haastregt	bfc961aeb2	[TargetLowering] Check boolean content when folding bit compare Updates an optimization that relies on boolean contents being either 0 or 1 to properly check for this before triggering. The following: (X & 8) != 0 --> (X & 8) >> 3 Produces unexpected results when a boolean 'true' value is represented by negative one. Patch by Erik Hogeman. Differential Revision: https://reviews.llvm.org/D89390	2020-10-21 11:46:55 +01:00
Craig Topper	edd0cb11bd	[SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats This enables these transforms for vectors: (ctpop x) u< 2 -> (x & x-1) == 0 (ctpop x) u> 1 -> (x & x-1) != 0 (ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0) (ctpop x) != 1 --> (x == 0) \|\| ((x & x-1) != 0) All enabled if CTPOP isn't Legal. This differs from the scalar behavior where the first two are done unconditionally and the last two are done if CTPOP isn't Legal or Custom. The Legal check produced better results for vectors based on X86's custom handling. Might be worth re-visiting scalars here. I disabled the looking through truncate for vectors. The code that creates new setcc can use the same result VT as the original setcc even if we truncated the input. That may work for most scalars, but definitely wouldn't work for vectors unless it was a vector of i1. Fixes or at least improves PR47825 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89346	2020-10-19 12:56:59 -07:00
David Sherwood	35a531fb45	[SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents In certain places in llvm/lib/CodeGen we were relying upon the TypeSize comparison operators when in fact the code was only ever expecting either scalar values or fixed width vectors. I've changed some of these places to use the equivalent scalar operator. Differential Revision: https://reviews.llvm.org/D88482	2020-10-19 08:30:31 +01:00
David Sherwood	f693f915a0	[SVE][CodeGen] Replace uses of TypeSize comparison operators In certain places in the code we can never end up in a situation where we're mixing fixed width and scalable vector types. For example, we can't have truncations and extends that change the lane count. Also, in other places such as GenWidenVectorStores and GenWidenVectorLoads we know from the behaviour of FindMemType that we can never choose a vector type with a different scalable property. In various places I have used EVT::bitsXY functions instead of TypeSize::isKnownXY, where it probably makes sense to keep an assert that scalable properties match. Differential Revision: https://reviews.llvm.org/D88654	2020-10-19 08:08:41 +01:00
Craig Topper	278bd06891	[TargetLowering] Extract simplifySetCCs ctpop into a separate function. NFCI As requested in D89346. This allows us to add some early outs. I reordered some checks a little bit to make the more common bail outs happen earlier. Like checking opcode before checking hasOneUse. And I moved the bit width check to make sure it was safe to look through a truncate to the spot where we look through truncates instead of after. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89494	2020-10-16 19:47:56 -07:00
Craig Topper	50c9f1e11d	[TargetLowering] Replace Log2_32_Ceil with Log2_32 in SimplifySetCC ctpop combine. This combine can look through (trunc (ctpop X)). When doing this it tries to make sure the trunc doesn't lose any information from the ctpop. It does this by checking that the truncated type has more bits than Log2_32_Ceil of the ctpop type. The Ceil is unnecessary and pessimizes non-power of 2 types. For example, ctpop of i256 requires 9 bits to represent the max value of 256. But ctpop of i255 only requires 8 bits to represent the max result of 255. Log2_32_Ceil of 256 and 255 both return 8 while Log2_32 returns 8 for 256 and 7 for 255. The code with popcnt enabled is a regression for this test case, but it does match what already happens with i256 truncated to i9. Since power of 2 is more likely, I don't think it should block this change. Differential Revision: https://reviews.llvm.org/D89412	2020-10-15 01:05:07 -07:00
Fangrui Song	c3de9a9e69	Fix incorrect Register -> MCRegister conversion getReg returns a Register which may represent a virtual register.	2020-10-08 21:40:48 -07:00
Jay Foad	1aa8e6a51a	[SDag] SimplifyDemandedBits: simplify to FP constant if all bits known We were already doing this for integer constants. This patch implements the same thing for floating point constants. Differential Revision: https://reviews.llvm.org/D88570	2020-10-07 09:24:38 +01:00
Craig Topper	1127662c6d	[SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS. getNode handling for ISD::SETCC calls FoldSETCC which can canonicalize FP constants to the RHS. When this happens we should create the node with the FMF that was requested. By using FlagInserter we can ensure any calls to getNode/getSetcc during canonicalization will also get the flags. Differential Revision: https://reviews.llvm.org/D88063	2020-10-05 14:55:23 -07:00
Qiu Chaofan	b326d4ff94	[SelectionDAG] Don't remove unused negated constant immediately This partially reverts `a2fb5446` (actually, `2508ef01`), which removed a negated FP constant immediately if it had no uses. However, as discussed in bug 47517, there are cases where NegX is folded into a constant from other places while NegY is removed by that line of code and NegX is equal to NegY. In these cases, NegX is deleted before it is used and a crash happens. So revert the code and add the necessary test case.	2020-10-06 01:16:45 +08:00
David Sherwood	b8ce6a6756	[SVE][CodeGen] Add new EVT/MVT getFixedSizeInBits() functions When we know that a particular type is always going to be fixed width we have so far been writing code like this: getSizeInBits().getFixedSize() Since we are doing this in quite a few places now it seems to make sense to add a new helper function that allows us to replace these calls with a single getFixedSizeInBits() call. Differential Revision: https://reviews.llvm.org/D88649	2020-10-02 07:47:31 +01:00
Qiu Chaofan	a2fb5446be	[SelectionDAG] Check any use of negation result before removal `2508ef01` fixed a bug in constant removal during negation, but a sanitizer check showed there was still an issue, so it was reverted. Temporary nodes are removed if they turn out to be useless in negation. Before removal, they are checked for uses by other nodes, so the removal was moved after getNode. However, in rare cases the node to be removed is the same as the result of getNode. We missed that case, and this patch fixes it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D87614	2020-09-17 16:00:54 +08:00
Qiu Chaofan	e1669843f2	Revert "[SelectionDAG] Remove unused FP constant in getNegatedExpression" `2508ef01` doesn't totally fix the issue since we did not handle the case where the unused temporary negated result is the same as the result, which was found by address sanitizer.	2020-09-15 22:03:50 +08:00
Qiu Chaofan	2508ef014e	[SelectionDAG] Remove unused FP constant in getNegatedExpression `960cbc53` immediately removes nodes that won't be used to avoid compilation time explosion. This patch adds the removal to constants to fix PR47517. Reviewed By: RKSimon, steven.zhang Differential Revision: https://reviews.llvm.org/D87614	2020-09-15 17:59:10 +08:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Craig Topper	56b33391d3	[SelectionDAG] Move ISD::PARITY formation from DAGCombine to SimplifyDemandedBits. Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1) but the AND might be separated from the ctpop. For example if the parity result is multiplied by 2, we'll pull the AND through the shift. So to handle more cases, move to SimplifyDemandedBits where we can handle more cases that result in only the LSB of the CTPOP being used.	2020-09-13 21:04:13 -07:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Jay Foad	517202c720	[TargetLowering] Fix comments describing XOR -> OR/AND transformations	2020-09-10 13:56:34 +01:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact. This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to constant fold some undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
David Sherwood	73a3d350a4	[SVE][CodeGen] Fix up warnings in sve-split-insert/extract tests I have fixed up some more ElementCount/TypeSize related warnings in the following tests: CodeGen/AArch64/sve-split-extract-elt.ll CodeGen/AArch64/sve-split-insert-elt.ll In SelectionDAG::CreateStackTemporary we were relying upon the implicit cast from TypeSize -> uint64_t when calling MachineFrameInfo::CreateStackObject. I've fixed this by passing in the known minimum size instead, which I believe is fine because the associated stack id indicates whether this is a scalable object or not. I've also fixed up a case in TargetLowering::SimplifyDemandedBits when extracting a vector element from a scalable vector. The result is a scalar, hence it wasn't caught at the start of the function. If the vector is scalable we just bail out for now. Differential Revision: https://reviews.llvm.org/D86431	2020-09-04 09:51:31 +01:00
Jay Foad	099c089d4b	[APInt] New member function setBitVal Differential Revision: https://reviews.llvm.org/D87033	2020-09-02 21:40:31 +01:00
David Sherwood	9fbb113247	[SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll I have fixed up a number of warnings resulting from TypeSize -> uint64_t casts and calling getVectorNumElements() on scalable vector types. I think most of the changes are fairly trivial except for those in DAGTypeLegalizer::SplitVecRes_MLOAD, where I've tried to ensure we create the MachineMemoryOperands in a sensible way for scalable vectors. I have added a CHECK line to the following test: CodeGen/AArch64/sve-split-load.ll that ensures no new warnings are added. Differential Revision: https://reviews.llvm.org/D86697	2020-09-01 07:47:59 +01:00
Nikita Popov	51d34c0c53	[TargetLowering] Strip trailing whitespace (NFC)	2020-08-29 18:09:08 +02:00
QingShan Zhang	deb4b25807	[DAGCombine] Don't delete the node if it has uses immediately This is the follow-up patch for https://reviews.llvm.org/D86183. We must not delete the node if NegX == NegY, because it is still used after we create the new node. ``` if (NegX && (CostX <= CostY)) { Cost = std::min(CostX, CostZ); RemoveDeadNode(NegY); return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); #<-- NegY is used here if NegY == NegX. } ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86689	2020-08-28 16:13:43 +00:00
Jay Foad	b7e3599a22	[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT Differential Revision: https://reviews.llvm.org/D86449	2020-08-26 09:20:46 +01:00
Jay Foad	a522067692	[SDAG] Convert FSHL <--> FSHR if the target only supports one of them D77152 tried to do this but got it wrong in the shift-by-zero case. D86430 reverted the wrong code. Reimplement the optimization with different code depending on whether the shift amount is known to be non-zero (modulo bitwidth). This improves code quality for fshl tests on AMDGPU, which only has an fshr instruction. Differential Revision: https://reviews.llvm.org/D86438	2020-08-24 17:47:10 +01:00
Bjorn Pettersson	7a4e26adc8	[SelectionDAG] Fix miscompile bug in expandFunnelShift This is a fixup of commit `0819a6416f` (D77152) which could result in miscompiles. The miscompile could only happen for targets where isOperationLegalOrCustom could return different values for FSHL and FSHR. The commit mentioned above added logic in expandFunnelShift to convert between FSHL and FSHR by swapping direction of the funnel shift. However, that transform is only legal if we know that the shift count (modulo bitwidth) isn't zero. Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if Z modulo bitwidth could be zero. ``` $ ./alive-tv /tmp/test.ll ---------------------------------------- define i32 @src(i32 %x, i32 %y, i32 %z) { %0: %t0 = fshl i32 %x, i32 %y, i32 %z ret i32 %t0 } => define i32 @tgt(i32 %x, i32 %y, i32 %z) { %0: %t0 = sub i32 32, %z %t1 = fshr i32 %x, i32 %y, i32 %t0 ret i32 %t1 } Transformation doesn't verify! ERROR: Value mismatch Example: i32 %x = #x00000000 (0) i32 %y = #x00000400 (1024) i32 %z = #x00000000 (0) Source: i32 %t0 = #x00000000 (0) Target: i32 %t0 = #x00000020 (32) i32 %t1 = #x00000400 (1024) Source value: #x00000000 (0) Target value: #x00000400 (1024) ``` It could be possible to add back the transform, given that logic is added to check that (Z % BW) can't be zero. Since there were no test cases proving that such a transform actually would be useful I decided to simply remove the faulty code in this patch. Reviewed By: foad, lebedev.ri Differential Revision: https://reviews.llvm.org/D86430	2020-08-24 09:52:11 +02:00
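The shift-by-zero problem described in the entry above can be reproduced outside LLVM. The following is a minimal sketch, assuming 32-bit operands; `fshl32`, `fshr32`, and `checkFunnelShiftExamples` are hypothetical helper names written for this illustration, not LLVM APIs. They model the intrinsic semantics (concatenate X:Y, shift by Z modulo 32, keep the high or low half).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM APIs) modelling i32 funnel shifts.
// fshl keeps the high 32 bits of (X:Y) << (Z % 32).
inline uint32_t fshl32(uint32_t X, uint32_t Y, uint32_t Z) {
  uint32_t S = Z % 32;
  // A shift by zero returns the high half (X) unchanged.
  return S == 0 ? X : (X << S) | (Y >> (32 - S));
}

// fshr keeps the low 32 bits of (X:Y) >> (Z % 32).
inline uint32_t fshr32(uint32_t X, uint32_t Y, uint32_t Z) {
  uint32_t S = Z % 32;
  // A shift by zero returns the low half (Y) unchanged.
  return S == 0 ? Y : (X << (32 - S)) | (Y >> S);
}

inline void checkFunnelShiftExamples() {
  // The two values quoted in the commit message.
  assert(fshr32(0xFFFFFFFFu, 0, 0) == 0);
  assert(fshl32(0xFFFFFFFFu, 0, 0) == 0xFFFFFFFFu);

  const uint32_t X = 0x12345678u, Y = 0x9ABCDEF0u;
  // For a non-zero amount (modulo 32), fshr(X,Y,Z) == fshl(X,Y,0-Z).
  for (uint32_t Z = 1; Z < 32; ++Z)
    assert(fshr32(X, Y, Z) == fshl32(X, Y, 0u - Z));
  // For a zero amount the rewrite picks the wrong half.
  assert(fshr32(X, Y, 0) != fshl32(X, Y, 0u - 0u));
}
```

Calling `checkFunnelShiftExamples()` exercises both cases: the rewrite agrees for every amount from 1 to 31 and disagrees only when the amount is zero modulo the bit width, which is the case the patch guards against.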
QingShan Zhang	960cbc53ca	[DAGCombine] Remove dead node when it is created by getNegatedExpression We hit the compile-time problem reported by https://bugs.llvm.org/show_bug.cgi?id=46877, and the reason is the same as in D77319. So we need to remove the dead node we created to avoid increasing the problem size of DAGCombiner. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D86183	2020-08-24 02:50:58 +00:00
Jay Foad	0819a6416f	[SelectionDAG] Better legalization for FSHL and FSHR In SelectionDAGBuilder always translate the fshl and fshr intrinsics to FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and ORs. Improve the legalization of FSHL and FSHR to avoid code quality regressions. Differential Revision: https://reviews.llvm.org/D77152	2020-08-21 10:32:49 +01:00
David Sherwood	3f36561f69	[SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only ever deals with fixed width types, hence the offsets for each individual store never take 'vscale' into account. I've changed the code in that function to use TypeSize instead of unsigned for tracking the remaining load amount. In addition, I've changed the load loop to use the new IncrementPointer helper function for updating the addresses in each iteration, since this handles scalable vector types. Also, I've added report_fatal_errors in GenWidenVectorExtLoads, TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores, since these functions currently use a sequence of element-by-element scalar loads/stores. In a similar vein, I've also added a fatal error report in FindMemType for the case when we decide to return the element type for a scalable vector type. I've added new tests in CodeGen/AArch64/sve-split-load.ll CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll for the changes in GenWidenVectorLoads. Differential Revision: https://reviews.llvm.org/D85909	2020-08-19 07:54:32 +01:00
Matt Arsenault	5b53b17cd3	DAG: Add missing comment for transform	2020-08-17 10:01:12 -04:00
Matt Arsenault	c7191e3185	DAG: Don't pass 0 alignment value to allowsMisalignedMemoryAccesses I think not unconditionally passing getDstAlign is broken, but leave that for another change.	2020-08-13 09:33:17 -04:00
Kerry McLaughlin	455ed56d48	[SVE][CodeGen] Legalisation of INSERT_VECTOR_ELT for scalable vectors When the result type of insertelement needs to be split, SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a stack temporary, store the element at the location of the stack temporary plus the index, and reload the Lo/Hi parts. This patch does the following to ensure this works for scalable vectors: - Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary - Adds an IsScalable flag to getMemBasePlusOffset() and scales the offset by VScale when this is true - Ensures the immediate is clamped correctly by clampDynamicVectorIndex so that we don't try to use an out of range index Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D84874	2020-08-11 12:57:28 +01:00
Kerry McLaughlin	85c7e89f3b	[CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize Changes the Offset arguments to both functions from int64_t to TypeSize & updates all uses of the functions to create the offset using TypeSize::Fixed() Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85220	2020-08-11 12:17:10 +01:00
Craig Topper	fdfdee98ac	[DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1. These aren't the canonical forms we'd get from InstCombine, but we do have X86 tests for them. Recognizing them is pretty cheap. While there, make use of APInt::isSignedMinValue/isSignedMaxValue instead of creating a new APInt to compare with. Also use SelectionDAG::getAllOnesConstant helper to hide the all ones APInt creation.	2020-08-08 22:27:16 -07:00
Bevin Hansson	5de6c56f7e	[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics. Summary: This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat, which perform signed and unsigned saturating left shift, respectively. These are useful for implementing the Embedded-C fixed point support in Clang, originally discussed in http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html and http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html Reviewers: leonardchan, craig.topper, bjope, jdoerfert Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83216	2020-08-07 15:09:24 +02:00
Simon Pilgrim	b8ffbf0e02	[DAG] TargetLowering::expandMUL_LOHI - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods. This also exposes an issue where we were passing a SDNode as a SDLoc, relying on the implicit SDLoc(SDNode) constructor.	2020-08-02 15:31:36 +01:00
Simon Pilgrim	d14a22da5e	[DAG] TargetLowering::LowerAsmOutputForConstraint - pass SDLoc as const& Try to be more consistent with the SDLoc param in the TargetLowering methods.	2020-08-02 15:12:02 +01:00
Simon Pilgrim	fdc902774e	[DAG][AMDGPU][X86] Add SimplifyMultipleUseDemandedBits handling for SIGN/ZERO_EXTEND + SIGN/ZERO_EXTEND_VECTOR_INREG Peek through multiple use ops like we already do for ANY_EXTEND/ANY_EXTEND_VECTOR_INREG Differential Revision: https://reviews.llvm.org/D84863	2020-07-29 18:10:59 +01:00
Matt Arsenault	9d3e56e2ee	DAG: Try scalarizing when expanding saturating add/sub In an upcoming AMDGPU patch, the scalar cases will be legal and vector ops should be scalarized, rather than producing a long sequence of vector ops which will also need to be scalarized. Use a lazy heuristic that seems to work and improves the thumb2 MVE test.	2020-07-16 14:05:16 -04:00
Kerry McLaughlin	2762da0a16	[SVE][CodeGen] Legalisation of masked loads and stores Summary: This patch modifies IncrementMemoryAddress to use a vscale when calculating the new address if the data type is scalable. Also adds tablegen patterns which match an extract_subvector of a legal predicate type with zip1/zip2 instructions Reviewers: sdesmalen, efriedma, david-arm Reviewed By: efriedma, david-arm Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83137	2020-07-16 10:55:45 +01:00
Jay Foad	babbeafa00	[TargetLowering] Improve expansion of FSHL/FSHR by non-zero amount Use a simpler code sequence when the shift amount is known not to be zero modulo the bit width. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82540	2020-07-06 12:07:14 +01:00
Jay Foad	e7a4a24dc5	[TargetLowering] Improve expansion of ROTL/ROTR Using a negation instead of a subtraction from a constant can save an instruction on some targets. Nothing much uses this until D77152 changes the translation of fshl and fshr intrinsics. Differential Revision: https://reviews.llvm.org/D82539	2020-07-06 12:07:14 +01:00
Simon Pilgrim	3521ecf1f8	[X86] Add vector support to targetShrinkDemandedConstant for OR/XOR opcodes If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops. This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases. Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>. Differential Revision: https://reviews.llvm.org/D82257	2020-06-29 12:19:05 +01:00
Simon Pilgrim	973685fc78	[TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.	2020-06-29 11:46:58 +01:00
Eli Friedman	e9d4e34ab8	[AArch64][SVE] Add legalization support for i32/i64 vector srem/urem Implement them on top of sdiv/udiv, similar to what we do for integer types. Potential future work: implementing i8/i16 srem/urem, optimizations for constant divisors, optimizing the mul+sub to mls. Differential Revision: https://reviews.llvm.org/D81511	2020-06-23 16:27:52 -07:00
Paul Walker	499c63288f	[SVE] Code generation for fixed length vector loads & stores. Summary: This patch adds base support for code generating fixed length vector operations targeting a known SVE vector length. To achieve this we lower fixed length vector operations to equivalent scalable vector operations, whereby SVE predication is used to limit the elements processed to those present within the fixed length vector. Specifically this patch implements load and store operations, which get lowered to their masked counterparts thusly: V = load(Addr) => V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr)) store(V, (Addr)) => masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr)) Reviewers: rengolin, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80385	2020-06-23 09:39:03 +00:00
Simon Pilgrim	48d1a2d6d0	[DAG] Add SimplifyMultipleUseDemandedVectorElts helper for SimplifyMultipleUseDemandedBits. NFCI. We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.	2020-06-22 14:24:39 +01:00
Simon Pilgrim	ecc5d7ee0d	[DAG] SimplifyMultipleUseDemandedBits - drop unnecessary *_EXTEND_VECTOR_INREG cases For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly. We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.	2020-06-22 12:35:32 +01:00
David Sherwood	7edc7f6edb	[CodeGen] Fix SimplifyDemandedBits for scalable vectors For now I have changed SimplifyDemandedBits and its various callers to assume we know nothing for scalable vectors and to ignore the demanded bits completely. I have also done something similar for SimplifyDemandedVectorElts. These changes fix up lots of warnings due to calls to EVT::getVectorNumElements() for types with scalable vectors. These functions are all used for optimisations, rather than functional requirements. In future we can revisit this code if there is a need to improve code quality for SVE. Differential Revision: https://reviews.llvm.org/D80537	2020-06-19 07:59:35 +01:00
Simon Pilgrim	2474421398	[TargetLowering] SimplifyMultipleUseDemandedBits - drop already extended ISD::SIGN_EXTEND_INREG nodes. If the source of the SIGN_EXTEND_INREG node is already sign extended, use the source directly.	2020-06-18 16:41:08 +01:00
Matt Arsenault	45e1a22a92	GlobalISel: Make known bits/alignment API more consistent Just computing the alignment makes sense without caring about the general known bits, such as for non-integral pointers. Separate the two and start calling into the TargetLowering hooks for frame indexes. Start calling the TargetLowering implementation for FrameIndexes, which improves the AMDGPU matching for stack addressing modes. Also introduce a new hook for returning known alignment of target instructions. For AMDGPU, it would be useful to report the known alignment implied by certain intrinsic calls. Also stop using MaybeAlign.	2020-06-05 14:57:22 -04:00
Matt Arsenault	af867b7850	DAG: Change computeKnownBitsForFrameIndex to be usable by GISel This wasn't getting much value from the DAG or depth arguments, since it's only called on the frame index root nodes. FrameIndexes can also only return a scalar value, so it also didn't need DemandedElts.	2020-06-04 10:50:26 -04:00
Simon Pilgrim	ea80b40669	[DAG] SimplifyDemandedBits - peek through SHL if we only demand sign bits. If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly. This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI. Differential Revision: https://reviews.llvm.org/D80869	2020-06-03 16:11:54 +01:00
Simon Pilgrim	63824ad947	[TargetLowering] SimplifyDemandedBits - remove shift amount clamps from getValidShiftAmountConstant calls. NFC. getValidShiftAmountConstant only returns a value if the shift amount is in range, so we don't need to check it again.	2020-05-30 14:04:55 +01:00
Simon Pilgrim	9fa58d1bf2	[DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies. This helps with PR40502. Differential Revision: https://reviews.llvm.org/D79003	2020-05-25 12:41:22 +01:00
Simon Pilgrim	1603106725	[TargetLowering] Improve expandFunnelShift shift amount masking For the 'inverse shift', we currently always perform a subtraction of the original (masked) shift amount. But for the case where we are handling power-of-2 type widths, we can replace: (sub bw-1, (and amt, bw-1) ) -> (and (xor amt, bw-1), bw-1) -> (and ~amt, bw-1) This allows x86 shifts to fold away the and-mask. Followup to D77301 + D80466. http://volta.cs.utah.edu:8080/z/Nod0Gr Differential Revision: https://reviews.llvm.org/D80489	2020-05-24 11:25:09 +01:00
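The masking rewrite in the entry above relies on a small bit identity that can be checked by brute force. The sketch below assumes a power-of-2 bit width; `inverseAmtSub`, `inverseAmtNot`, and `identityHolds` are hypothetical names for this illustration, not LLVM APIs. Because `Amt & (BW - 1)` only ever sets bits inside the mask, subtracting it from the all-ones mask `BW - 1` never borrows, so the subtraction equals an XOR with the mask, which in turn equals `~Amt & (BW - 1)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM APIs). BW is assumed to be a power of 2.

// Original form: (sub bw-1, (and amt, bw-1)).
inline uint32_t inverseAmtSub(uint32_t Amt, uint32_t BW) {
  return (BW - 1) - (Amt & (BW - 1));
}

// Rewritten form: (and ~amt, bw-1).
inline uint32_t inverseAmtNot(uint32_t Amt, uint32_t BW) {
  return ~Amt & (BW - 1);
}

// Compare both forms over a range of shift amounts, including amounts
// larger than the bit width.
inline bool identityHolds(uint32_t BW) {
  for (uint32_t Amt = 0; Amt < 4096; ++Amt)
    if (inverseAmtSub(Amt, BW) != inverseAmtNot(Amt, BW))
      return false;
  return true;
}

inline void checkInverseAmount() {
  assert(identityHolds(8));
  assert(identityHolds(16));
  assert(identityHolds(32));
  assert(identityHolds(64));
}
```

The second form needs no subtraction, and on targets whose shift instructions already mask the amount the trailing `and` can fold away, which is the benefit the commit message describes for x86.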
Arthur Eubanks	8a88755610	Reland [X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer to the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 11:25:44 -07:00
Arthur Eubanks	b8cbff51d3	Revert "[X86] Codegen for preallocated" This reverts commit `810567dc69`. Some tests are unexpectedly passing	2020-05-20 10:04:55 -07:00
Arthur Eubanks	810567dc69	[X86] Codegen for preallocated See https://reviews.llvm.org/D74651 for the preallocated IR constructs and LangRef changes. In X86TargetLowering::LowerCall(), if a call is preallocated, record each argument's offset from the stack pointer and the total stack adjustment. Associate the call Value with an integer index. Store the info in X86MachineFunctionInfo with the integer index as the key. This adds two new target independent ISDOpcodes and two new target dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}. The setup ISelDAG node takes in a chain and outputs a chain and a SrcValue of the preallocated call Value. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an %esp adjustment, the exact amount determined by looking in X86MachineFunctionInfo with the integer index key. The arg ISelDAG node takes in a chain, a SrcValue of the preallocated call Value, and the arg index int constant. It produces a chain and the pointer to the arg. It is lowered to a target dependent node with the SrcValue replaced with the integer index key by looking in X86MachineFunctionInfo. In X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a lea of the stack pointer plus an offset determined by looking in X86MachineFunctionInfo with the integer index key. Force any function containing a preallocated call to use the frame pointer. Does not yet handle a setup without a call, or a conditional call. Does not yet handle musttail. That requires a LangRef change first. Tried to look at all references to inalloca and see if they apply to preallocated. I've made preallocated versions of tests testing inalloca whenever possible and when they make sense (e.g. not alloca related, inalloca edge cases). 
Aside from the tests added here, I checked that this codegen produces correct code for something like ``` struct A { A(); A(A&&); ~A(); }; void bar() { foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8); } ``` by replacing the inalloca version of the .ll file with the appropriate preallocated code. Running the executable produces the same results as using the current inalloca implementation. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77689	2020-05-20 09:20:38 -07:00
QingShan Zhang	2b59e9f1bd	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, while negating the expression the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch removes getNegatibleCost to avoid it getting out of sync with getNegatedExpression, and checks the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-20 02:12:16 +00:00
Simon Pilgrim	acb6f1ae09	TargetLowering.cpp - remove non-constant EXTRACT_SUBVECTOR/INSERT_SUBVECTOR handling. NFC. Now that D79814 has landed, we can assume that subvector ops use constant, in-range indices.	2020-05-14 18:13:58 +01:00
Jay Foad	17941437a2	[TargetLowering] Improve expansion of FSHL/FSHR Use an extra shift-by-1 instead of a compare and select to handle the shift-by-zero case. This sometimes saves one instruction (if the compare couldn't be combined with a previous instruction). It also works better on targets that don't have good select instructions. Note that currently this change doesn't affect most targets because expandFunnelShift is not used because funnel shift intrinsics are lowered early in SelectionDAGBuilder. But there is work afoot to change that; see D77152. Differential Revision: https://reviews.llvm.org/D77301	2020-05-14 16:36:22 +01:00
Sam McCall	728cf6d86b	Revert "[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression" This reverts commit `3c44c441db`. Causes infloops on some inputs, see https://reviews.llvm.org/D77319 for repro	2020-05-11 16:44:01 +02:00
QingShan Zhang	3c44c441db	[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression We have getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression. However, while negating the expression the cost might change because we are changing the DAG, and we then hit the assertion if we negated the wrong expression, as the cost is no longer trustworthy. This patch removes getNegatibleCost to avoid it getting out of sync with getNegatedExpression, and checks the cost while negating the expression. It also reduces the duplicated code between getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D77319	2020-05-11 02:41:10 +00:00
Simon Pilgrim	70293ba26f	[DAG] SimplifyMultipleUseDemandedBits - remove superfluous bitcasts If the SimplifyMultipleUseDemandedBits calls BITCASTs that peek through back to the original type then we can remove the BITCASTs entirely. Differential Revision: https://reviews.llvm.org/D79572	2020-05-08 19:04:49 +01:00
LemonBoy	7fa5abd343	[SelectionDAG] Fix assertion failure with big shift amounts Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for integer type it's applied to. Fixes the regression introduced by D79096 Differential Revision: https://reviews.llvm.org/D79405	2020-05-06 11:58:37 -07:00
LemonBoy	6d103ca855	[SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces. The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines. Differential Revision: https://reviews.llvm.org/D79096	2020-05-02 15:18:10 -07:00
Simon Pilgrim	7cb5a51f38	[DAG] SimplifyDemandedVectorElts - add INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling	2020-05-01 16:20:51 +01:00
Simon Pilgrim	65d32a9892	[DAG] SimplifyDemandedVectorElts - remove INSERT_SUBVECTOR if we don't demand the subvector	2020-05-01 16:20:51 +01:00
Simon Pilgrim	e3c0be596c	[DAG] SimplifyDemandedVectorElts - add EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling	2020-05-01 13:48:07 +01:00
Craig Topper	a58b62b4a2	[IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand(). This method has been commented as deprecated for a while. Remove it and replace all uses with the equivalent getCalledOperand(). I also made a few cleanups in here. For example, this removes uses of getElementType on a pointer when we could just use getFunctionType from the call. Differential Revision: https://reviews.llvm.org/D78882	2020-04-27 22:17:03 -07:00
QingShan Zhang	2957fa0cd1	[NFC][DAGCombine] Adding three helper functions and change the getNegatedExpression to negateExpression This is an NFC patch for D77319. The idea is to hide getNegatibleCost inside getNegatedExpression() so that it returns null if the cost is expensive, and to add some helper functions for ease of use. It also renames the old getNegatedExpression to negateExpression to avoid the semantic conflict. Reviewed By: RKSimon Differential revision: https://reviews.llvm.org/D78291	2020-04-27 04:11:42 +00:00
QingShan Zhang	c9f9c79c5a	[NFC][DAGCombine] Change the value of NegatibleCost to make it align with the semantics This is a minor NFC change to make the code clearer. NegatibleCost has the values cheaper, neutral, and expensive. Typically, a smaller value means a lower cost. The current implementation inverts this, which makes code like the following hard to read: if (CostX > CostY) negate(X) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D77993	2020-04-15 02:20:58 +00:00
Craig Topper	95192f548d	[CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface. Differential Revision: https://reviews.llvm.org/D77929	2020-04-12 11:26:25 -07:00
Jay Foad	c63aed890e	[KnownBits] Move AND, OR and XOR logic into KnownBits Summary: There are at least three clients for KnownBits calculations: ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the common logic should be moved out of these clients and into KnownBits itself. This patch does this for AND, OR and XOR calculations by implementing and using appropriate operator overloads KnownBits::operator& etc. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74060	2020-04-09 10:10:37 +01:00
Matt Arsenault	aa26dd9858	CodeGen: Use Register in more places	2020-04-07 15:59:40 -04:00
Craig Topper	c41685b16f	[SelectionDAG] Make getZeroExtendInReg take a vector VT if the operand VT is a vector. This removes a call to getScalarType from a bunch of call sites. It also makes the behavior consistent with SIGN_EXTEND_INREG. Differential Revision: https://reviews.llvm.org/D77631	2020-04-07 11:34:08 -07:00
Guillaume Chatelet	9068bccbae	[Alignment][NFC] Deprecate InstrTypes getRetAlignment/getParamAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77312	2020-04-03 13:21:58 +00:00
Guillaume Chatelet	3a78f44daf	[Alignment][NFC] Convert SelectionDAG::InferPtrAlignment to MaybeAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77212	2020-04-01 13:22:11 +00:00
Matt Arsenault	aa63eb6a46	GlobalISel: Add computeKnownBitsForTargetInstr I think we can save the MRI argument from these since it's in GISelKnownBits already, but currently not accessible. Implementation deferred to avoid dependency on other patches.	2020-03-23 15:02:30 -04:00
Sanjay Patel	56da41393d	[SDAG] reduce code duplication in getNegatedExpression(); NFCI	2020-03-19 13:55:15 -04:00
Simon Pilgrim	68224c1952	[TargetLowering] Only demand a rotation's modulo amount bits ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits. Differential Revision: https://reviews.llvm.org/D76201	2020-03-17 21:23:46 +00:00
Simon Pilgrim	2b3b453a82	[TargetLowering] Only demand a funnelshift's modulo amount bits ISD::FSHL/FSHR shift amount values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.	2020-03-16 13:52:17 +00:00
Simon Pilgrim	e71fb46a8f	[TargetLowering] SimplifyDemandedVectorElts - add DemandedElts mask to ISD::BITCAST SimplifyDemandedBits call. This fixes most of the regressions introduced in the rG4bc6f6332028 bugfix. The vector-trunc.ll issue should be fixed by D66004.	2020-03-10 13:39:10 +00:00
QingShan Zhang	3906ae387f	[DAGCombine] Check the uses of negated floating constant and remove the hack PowerPC hits an assertion for much the same reason as https://reviews.llvm.org/D70975. Although a hack is already in place, it still fails in some cases: when operand 0 is NOT a const fp but another fma with a const fp, and that const fp is negated, which results in multiple uses. A better fix is to check the uses of the negated const fp. If its negated value is already in use, we benefit because no extra node is added. Differential revision: https://reviews.llvm.org/D75501	2020-03-05 03:42:50 +00:00
Jordan Rupprecht	d7803c3832	Add default case to fix -Wswitch errors	2020-03-02 14:23:46 -08:00
Craig Topper	adc69729ec	[TargetLowering] Fix what look like copy/paste mistakes in compare with infinity handling SimplifySetCC. I expect that the isCondCodeLegal checks should match that CC of the node that we're going to create. Rewriting to a switch to minimize repeated mentions of the same constants.	2020-03-02 14:12:16 -08:00
Simon Pilgrim	d20fb7ea13	Fix shadow variable warning. NFC.	2020-03-02 11:41:20 +00:00
Simon Pilgrim	4bc6f63320	[TargetLowering] SimplifyDemandedBits - fix SCALAR_TO_VECTOR knownbits bug We can only report the knownbits for a SCALAR_TO_VECTOR node if we only demand the 0'th element - the upper elements are undefined and shouldn't be trusted. This is causing a number of regressions that need addressing but we need to get the bugfix in first.	2020-02-28 15:23:37 +00:00
Craig Topper	a5fa778882	[LegalizeTypes] Scalarize non-byte sized loads in WidenRecRes_Load and SplitVecResLoad Should fix PR42803 and PR44902 Differential Revision: https://reviews.llvm.org/D74590	2020-02-24 15:14:33 -08:00
Bevin Hansson	6e561d1c94	[Intrinsic] Add fixed point saturating division intrinsics. Summary: This patch adds intrinsics and ISelDAG nodes for signed and unsigned fixed-point division: `llvm.sdiv.fix.sat.*` and `llvm.udiv.fix.sat.*`. These intrinsics perform scaled, saturating division on two integers or vectors of integers. They are required for the implementation of the Embedded-C fixed-point arithmetic in Clang. Reviewers: bjope, leonardchan, craig.topper Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71550	2020-02-24 10:50:52 +01:00
Simon Pilgrim	42ec6fdce9	[TargetLowering] Apply basic shift combines before recursive SimplifyDemandedBits calls. Minor refactor/cleanup before we begin adding non-uniform support.	2020-02-21 16:31:20 +00:00
Simon Pilgrim	86c52af05a	[TargetLowering] SimplifyDemandedBits - use getValidShiftAmountConstant helper. Use the SelectionDAG::getValidShiftAmountConstant helper to get const/constsplat shift amounts, which allows us to drop the out of range shift amount early-out. First step towards better non-uniform shift amount support in SimplifyDemandedBits.	2020-02-21 14:23:53 +00:00
Simon Pilgrim	d6eef0614f	[TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC.	2020-02-18 19:53:50 +00:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Simon Pilgrim	61621f826a	[TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.	2020-02-03 16:50:04 +00:00
Simon Pilgrim	8fbc7fd567	[DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors If we don't demand any elements of the inserted subvector then just skip it.	2020-01-31 18:57:22 +00:00
Simon Pilgrim	5702dadf6f	[DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.	2020-01-31 18:02:34 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00

1293 Commits