llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	801c0f12b0	[DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible. Better handling of out-of-i64-range values due to large integer types or from fuzz tests. llvm-svn: 363955	2019-06-20 17:36:23 +00:00
Jordan Rupprecht	02508decf4	[DAGCombiner][NFC] Remove unused var llvm-svn: 363954	2019-06-20 17:30:01 +00:00
Simon Pilgrim	1d8093249f	[DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363929	2019-06-20 14:42:27 +00:00
Simon Pilgrim	98a0ac5c0f	[DAGCombine] Add TODOs for some combines that should support non-uniform vectors We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc. llvm-svn: 363924	2019-06-20 12:48:49 +00:00
Simon Pilgrim	a487628270	[DAGCombine] Reduce scope of ShAmtVal variable. NFCI. Fixes cppcheck warning. Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here. llvm-svn: 363921	2019-06-20 10:56:37 +00:00
Simon Pilgrim	046d49a8dc	[DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue(). Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something..... llvm-svn: 363887	2019-06-19 22:14:24 +00:00
Simon Pilgrim	f05369768c	[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case. llvm-svn: 363856	2019-06-19 18:34:58 +00:00
Simon Pilgrim	6016fb726c	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. llvm-svn: 363850	2019-06-19 18:00:24 +00:00
Simon Pilgrim	c3994f77cb	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. llvm-svn: 363802	2019-06-19 13:58:02 +00:00
Simon Pilgrim	9eed5d2f78	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363793	2019-06-19 12:41:37 +00:00
Simon Pilgrim	8c49366c9b	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths. llvm-svn: 363792	2019-06-19 12:25:29 +00:00
Simon Pilgrim	bb6b856183	[DAGCombiner] visitSHL - pull out repeated shift amount VT. NFCI. llvm-svn: 363789	2019-06-19 11:31:26 +00:00
Simon Pilgrim	d954a53633	[DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI. We pre-extend, not post. llvm-svn: 363787	2019-06-19 11:17:48 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Simon Pilgrim	5bef886cd8	[TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch. llvm-svn: 363716	2019-06-18 18:22:30 +00:00
Simon Pilgrim	032b54f8e8	[TargetLowering] SimplifyDemandedBits - Merge ZERO_EXTEND+ZERO_EXTEND_VECTOR_INREG handling Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches. llvm-svn: 363713	2019-06-18 18:08:30 +00:00
Simon Pilgrim	b6e7108dcd	[TargetLowering] SimplifyDemandedBits - Merge SIGN_EXTEND+SIGN_EXTEND_VECTOR_INREG handling Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches. llvm-svn: 363710	2019-06-18 17:57:53 +00:00
Simon Pilgrim	9aa25be149	[TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element. Fixes temporary regression introduced in rL363693. llvm-svn: 363694	2019-06-18 15:49:35 +00:00
Simon Pilgrim	83bacd8d72	[SelectionDAG] Legalize vaargs that require vector splitting This adds vector splitting for vaarg instructions during type legalization Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D60762 llvm-svn: 363671	2019-06-18 12:24:02 +00:00
Luis Marques	2e46312ffd	[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544	2019-06-17 10:54:12 +00:00
Simon Pilgrim	ef78e55205	[SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source. llvm-svn: 363542	2019-06-17 10:14:52 +00:00
Michael Berg	ad6bb86b2d	adding more fmf propagation for selects plus updated tests llvm-svn: 363484	2019-06-15 04:53:51 +00:00
Fangrui Song	968b5f84af	Revert "adding more fmf propagation for selects plus tests" This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482	2019-06-15 03:51:08 +00:00
Michael Berg	69394bedc5	adding more fmf propagation for selects plus tests llvm-svn: 363474	2019-06-14 23:30:52 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Simon Pilgrim	287e78c82b	[DAGCombine] GetNegatedExpression - constant float vector support (PR42105) Add support for negation of constant build vectors. Differential Revision: https://reviews.llvm.org/D62963 llvm-svn: 363040	2019-06-11 09:44:33 +00:00
Sander de Smalen	cbeb563cfb	Change semantics of fadd/fmul vector reductions. This patch changes how LLVM handles the accumulator/start value in the reduction, by never ignoring it regardless of the presence of fast-math flags on callsites. This change introduces the following new intrinsics to replace the existing ones: llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul and adds functionality to auto-upgrade existing LLVM IR and bitcode. Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D60261 llvm-svn: 363035	2019-06-11 08:22:10 +00:00
Francis Visoiu Mistrih	a438432acc	[FastISel] Skip creating unnecessary vregs for arguments This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it. FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done. In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins. The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg). A few tests are affected by this: * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore. * llvm/test/CodeGen//swiftself.ll: we tail-call now. llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used. * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy * llvm/test/CodeGen/Mips/: get rid of spills and copies llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64 * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed Differential Revision: https://reviews.llvm.org/D62361 llvm-svn: 362963	2019-06-10 16:53:37 +00:00
QingShan Zhang	ab846da7e8	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921	2019-06-10 05:40:21 +00:00
David Bolvansky	dcf5e6abdf	[TargetLowering] Simplify (ctpop x) == 1 Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912	2019-06-09 18:18:57 +00:00
Simon Pilgrim	5f337149fa	Use for-range loop. NFCI. llvm-svn: 362897	2019-06-09 09:07:30 +00:00
Simon Pilgrim	6bae6d5a5d	[DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI. Same codegen, only differ by the oneuse limit for the sextload case. llvm-svn: 362880	2019-06-08 17:02:00 +00:00
Amara Emerson	829037a914	Factor out SelectionDAG's switch analysis and lowering into a separate component. In order for GlobalISel to re-use the significant amount of analysis and optimization code in SDAG's switch lowering, we first have to extract it and create an interface to be used by both frameworks. No test changes as it's NFC. Differential Revision: https://reviews.llvm.org/D62745 llvm-svn: 362857	2019-06-08 00:05:17 +00:00
Simon Pilgrim	f0240ee76d	[DAGCombine] visitAND - fix local shadow variable warnings. NFCI. llvm-svn: 362825	2019-06-07 18:36:43 +00:00
Simon Pilgrim	4c9db2045a	[DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI. llvm-svn: 362820	2019-06-07 18:07:06 +00:00
Jason Liu	60ec248148	[AIX] Implement function descriptor on SDAG Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735	2019-06-06 19:13:36 +00:00
Simon Pilgrim	842c7792aa	[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123) This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores: 1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another. 2 - The merged load\store node must retain the non-temporal flag. Differential Revision: https://reviews.llvm.org/D62910 llvm-svn: 362723	2019-06-06 17:04:13 +00:00
Simon Pilgrim	da993d08c8	[DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI. Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s llvm-svn: 362695	2019-06-06 10:21:18 +00:00
Ulrich Weigand	6c5d5ce551	Allow target to handle STRICT floating-point nodes The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is both marked as mayRaiseFPException and FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663	2019-06-05 22:33:10 +00:00
Tim Northover	607c8a9d14	IR: make getParamByValType Just Work. NFC. Most parts of LLVM don't care whether the byval type is derived from an explicit Attribute or from the parameter's pointee type, so it makes sense for the main access function to just return the right value. The very few users who do care (only BitcodeReader so far) can find out how it's specified by accessing the Attribute directly. llvm-svn: 362642	2019-06-05 20:37:47 +00:00
Simon Pilgrim	77d6adc491	Fix shadow local variable warning. NFCI. llvm-svn: 362622	2019-06-05 17:26:29 +00:00
Simon Pilgrim	5a81af547c	[TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI. Will be used more in an upcoming patch. llvm-svn: 362595	2019-06-05 10:59:04 +00:00
Johannes Doerfert	6b432dca5d	[SelectionDAG][FIX] Allow "returned" arguments to be bit-casted Summary: An argument that is return by a function but bit-casted before can still be annotated as "returned". Make sure we do not crash for this case. Reviewers: sunfish, stephenwlin, niravd, arsenm Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59917 llvm-svn: 362546	2019-06-04 20:34:43 +00:00
Nemanja Ivanovic	aed7227b71	Revert r362472 as it is breaking PPC build bots The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots. Reverting it to bring the bots back to green. llvm-svn: 362539	2019-06-04 18:48:43 +00:00
Craig Topper	09a4415803	[DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1) This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial. Fixes PR42118 Differential Revision: https://reviews.llvm.org/D62828 llvm-svn: 362533	2019-06-04 17:44:18 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Roman Lebedev	3dce0326fe	[DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952) Summary: This might be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet. As far as i can tell, there are no regressions here (ignoring x86-32), all changes are either good or neutral. This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`) `@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/vMd3 Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma Reviewed By: RKSimon Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62774 llvm-svn: 362488	2019-06-04 11:06:21 +00:00
Roman Lebedev	be6ce7b3f2	[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold Summary: All changes except ARM look great. https://rise4fun.com/Alive/R2M The regression `test/CodeGen/ARM/addsubcarry-promotion.ll` is recovered fully by D62392 + D62450. Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma Reviewed By: efriedma Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62266 llvm-svn: 362487	2019-06-04 11:06:08 +00:00
Simon Pilgrim	ad298f86b7	[SelectionDAG] ComputeNumSignBits - support constant pool values from target As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits. The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select. It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup. Differential Revision: https://reviews.llvm.org/D62777 llvm-svn: 362486	2019-06-04 10:49:06 +00:00
Simon Pilgrim	3178546a27	[SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD comments. NFCI. Pre-commit requested for D62777. llvm-svn: 362485	2019-06-04 10:17:56 +00:00
Simon Pilgrim	3018d505a3	[SelectionDAG] Add fpto[us]i(undef) --> undef constant fold Follow up to D62807. Differential Revision: https://reviews.llvm.org/D62811 llvm-svn: 362483	2019-06-04 10:04:55 +00:00
QingShan Zhang	11de0e71b0	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D61843 llvm-svn: 362472	2019-06-04 08:53:53 +00:00
Michael Berg	6ff978ee05	Propagate fmf for setcc in SDAG for select folds llvm-svn: 362448	2019-06-03 21:53:26 +00:00
Michael Berg	0b7f98da65	Propagate fmf for setcc/select folds Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG. Reviewers: qcolombet, spatel Reviewed By: qcolombet Subscribers: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D62552 llvm-svn: 362439	2019-06-03 19:12:15 +00:00
Simon Pilgrim	cb7e4e8193	[SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205) We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction Differential Revision: https://reviews.llvm.org/D62807 llvm-svn: 362397	2019-06-03 13:02:07 +00:00
Florian Hahn	e71963c850	Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor. If we hit the limit, we do expand the outstanding tokenfactors. Otherwise, we might drop nodes with users in the unexpanded tokenfactors. This fixes the crashes reported by Jordan Rupprecht. Reviewers: niravd, spatel, craig.topper, rupprecht Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D62633 llvm-svn: 362350	2019-06-03 01:30:19 +00:00
Craig Topper	50b35caf30	[DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask. Similar to what was done for masked load and gather. llvm-svn: 362342	2019-06-02 22:52:38 +00:00
Craig Topper	a7bc31ebc6	[DAGCombiner] Replace masked loads with a zero mask with the passthru value Similar to what was recently done for gathers in r362015. llvm-svn: 362337	2019-06-02 18:58:46 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Simon Pilgrim	ffb4d2bff7	[DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020) Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot. PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe. Differential Revision: https://reviews.llvm.org/D62783 llvm-svn: 362323	2019-06-02 11:56:39 +00:00
Simon Pilgrim	88522ce388	[TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis. These might have been replaced in multiple use cases. llvm-svn: 362322	2019-06-02 10:12:55 +00:00
Simon Pilgrim	30a6caa3e7	[TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI. Helps with debugging as we recurse between them. llvm-svn: 362321	2019-06-02 10:03:56 +00:00
Craig Topper	f58ef87bb7	[DAGCombiner] Replace two unchecked dyn_casts with casts. The results of the dyn_casts were immediately dereferenced on the next line so they had better not be null. I don't think there's any way for these dyn_casts to fail, so use a cast of adding null check. llvm-svn: 362315	2019-06-02 03:31:01 +00:00
Craig Topper	bc9e04d0c3	[SelectionDAG] Make the code in mutateStrictFPToFP less aware of how many operands each node has. NFCI Just copy all of the operands except the chain and call MorphNode on that. This removes the IsUnary and IsTernary flags. Also always get the result type from the result type of the original nodes. Previously we got it from the operand except for two nodes where that didn't work. llvm-svn: 362269	2019-05-31 22:18:45 +00:00
Kevin P. Neal	ac79007205	Revert revert of r362112 with minor SystemZ test file corrections. [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: https://reviews.llvm.org/D62546 llvm-svn: 362241	2019-05-31 16:32:12 +00:00
Roman Lebedev	46511d75b5	[DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts I don't have a test case for these, but there is a test case for D62266 where, even after all the constant-folding patches, we still end up with endless combine loop. Which makes sense, since we don't constant fold for opaque constants. llvm-svn: 362156	2019-05-30 21:10:37 +00:00
Roman Lebedev	a4e3b50e26	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2 Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 362146	2019-05-30 20:37:49 +00:00
Roman Lebedev	57aa36ff91	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 362145	2019-05-30 20:37:39 +00:00
Roman Lebedev	63b4741534	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 362144	2019-05-30 20:37:29 +00:00
Roman Lebedev	05ad5fd213	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 362143	2019-05-30 20:37:18 +00:00
Roman Lebedev	1d9ec7a81b	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 362142	2019-05-30 20:36:54 +00:00
Roman Lebedev	7eb8b5b5dd	[DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold Summary: https://rise4fun.com/Alive/B0A Reviewers: t.p.northover, RKSimon, spatel, craig.topper Reviewed By: RKSimon Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62691 llvm-svn: 362135	2019-05-30 19:27:51 +00:00
Roman Lebedev	691b5e2ecc	[DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold Summary: https://rise4fun.com/Alive/Mb1M Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62689 llvm-svn: 362134	2019-05-30 19:27:42 +00:00
Roman Lebedev	0a3dbbcdfb	[DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold Summary: Direct sibling of D62662, the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62664 llvm-svn: 362133	2019-05-30 19:27:32 +00:00
Roman Lebedev	9ff3159b4a	[DAGCombine] Use FoldConstantArithmetic() to perform C2-(A+C1) -> (C2-C1)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62663 llvm-svn: 362132	2019-05-30 19:27:26 +00:00
Roman Lebedev	cc9a9cf237	[DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold Summary: This was the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, spatel, craig.topper, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62662 llvm-svn: 362131	2019-05-30 19:27:19 +00:00
Roman Lebedev	ef95679741	[DAGCombine] Use FoldConstantArithmetic() to perform ((c1-A)+c2) -> (c1+c2)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62661 llvm-svn: 362130	2019-05-30 19:27:10 +00:00
Tim Northover	b7141207a4	Reapply: IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. The original commit did not remap byval types when linking modules, which broke LTO. This version fixes that. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362128	2019-05-30 18:48:23 +00:00
Kevin P. Neal	51ce0b196a	Correct error in revert of r362112. Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362118	2019-05-30 17:21:45 +00:00
Kevin P. Neal	d3db7b40b0	Revert r362112, it broke the bots with the message "Unsupported vector argument or return type" Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362117	2019-05-30 17:10:21 +00:00
Kevin P. Neal	2e1807678d	[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362112	2019-05-30 16:44:47 +00:00
Roman Lebedev	019d270e43	[DAGCombine] Revert of recommit of "binop-with-const hoisting" patches I was looking into an endless combine loop the uncommitted follow-up patch was causing, and it appears even these patches can exibit such an endless loop. The root cause is that we try to hoist one binop (add/sub) with constant operand, and if we get two such binops both of which are eligible for this hoisting, we get stuck. Some cases may highlight missing constant-folds. Reverts r361871,r361872,r361873,r361874. llvm-svn: 362109	2019-05-30 16:07:11 +00:00
Tim Northover	71ee3d0237	Revert "IR: add optional type to 'byval' function parameters" The IRLinker doesn't delve into the new byval attribute when mapping types, and this breaks LTO. llvm-svn: 362029	2019-05-29 20:46:38 +00:00
Benjamin Kramer	107f8d9873	[DAGCombiner] Replace gathers with a zero mask with the passthru value These can be created by the legalizer when splitting a larger gather. See https://llvm.org/PR42055 for a motivating example. Differential Revision: https://reviews.llvm.org/D62613 llvm-svn: 362015	2019-05-29 19:24:19 +00:00
Tim Northover	6e07f16fae	IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362012	2019-05-29 19:12:48 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch add the ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Roman Lebedev	dfc34f0211	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361856, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361874	2019-05-28 20:40:10 +00:00
Roman Lebedev	d485c6bc9f	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361855, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361873	2019-05-28 20:40:03 +00:00
Roman Lebedev	96c9986199	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361853, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361872	2019-05-28 20:39:55 +00:00
Roman Lebedev	2feb7e56e2	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361871	2019-05-28 20:39:39 +00:00
Roman Lebedev	272d70c366	Revert DAGCombine "hoist binop with const" folds Appear to introduce test-suite compile-time hang. http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825 This reverts r361852,r361853,r361854,r361855,r361856 llvm-svn: 361865	2019-05-28 19:04:21 +00:00
Roman Lebedev	7669665432	[DAGCombine] (x - C) - y -> (x - y) - C fold Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361856	2019-05-28 17:54:21 +00:00
Roman Lebedev	8c9b3e4e4a	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361855	2019-05-28 17:54:13 +00:00
Roman Lebedev	6a24c9b9ab	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 361854	2019-05-28 17:54:04 +00:00
Roman Lebedev	1499f65ac1	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361853	2019-05-28 17:53:54 +00:00
Roman Lebedev	19f51ec04a	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361852	2019-05-28 17:53:43 +00:00
Simon Pilgrim	9cd9624fb6	[DAG] LegalizeVectorTypes - reduce scope of local variables. NFCI. Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings. llvm-svn: 361821	2019-05-28 13:46:26 +00:00
Benjamin Kramer	57e267a2e9	[X86] Custom lower CONCAT_VECTORS of v2i1 The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814	2019-05-28 12:52:57 +00:00
Sanjay Patel	2f99d009c1	[SelectionDAG] fold concat of extract subvectors This is derived from the related fold for build vectors. We also have a version of this in DAGCombiner. The benefit of having this fold at node creation time is (1) efficiency and (2) preventing infinite looping from creating patterns that should not exist in the first place. Currently, the inf-loop could happen with MergeConsecutiveStores() because it naively creates concat of extracts when forming a wider vector store. That could fight with target-specific store narrowing. llvm-svn: 361780	2019-05-27 20:26:21 +00:00
Sanjay Patel	e13ae3e4d8	[SelectionDAG] fix formatting and redundant comments; NFC There's a possible missing fold here for extracting from the same source vector. It's similar to a check that we use to squash a build vector with all extracted elements from the same source vector. llvm-svn: 361778	2019-05-27 18:26:43 +00:00
Michael Liao	9c70c574b4	[SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`. Summary: - The current implementation simplifies the case where the source of `copyto` is `implicit-def`ed. However, it only works when that `implicit-def` is single-used since it detects that from `implicit-def` and cannot determine which destination vreg should be used if there are multiple uses. - This patch changes that detection when `copyto` is being emitted. If that `copyto`'s source is defined from `implicit-def`, it simplifies it. Hence, it works even that `implicit-def` is multi-used. - Except it simplifies the internal IR, it won't improve the quality of code generation. However, it helps to detect 'implicit-def` in a straight-forward manner in some passes, such as `si-i1-copies`. A test case is added. Reviewers: sunfish, nhaehnle Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62342 llvm-svn: 361777	2019-05-27 18:26:29 +00:00
Simon Pilgrim	ebb053b139	[SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to. llvm-svn: 361773	2019-05-27 16:39:25 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Simon Pilgrim	06e02856ab	[SelectionDAG] GetDemandedBits - cleanup to more closely match SimplifyDemandedBits. NFCI. Prep work before adding demanded elts support. llvm-svn: 361739	2019-05-26 18:58:14 +00:00
Simon Pilgrim	2916b9e28c	[SelectionDAG] MaskedValueIsZero - add demanded elements implementation Will be used in an upcoming patch but I've updated the original implementation to call this to ensure test coverage. llvm-svn: 361738	2019-05-26 18:43:44 +00:00
Sanjay Patel	91131b6500	[SelectionDAG] soften assertion when legalizing narrow vector FP ops The test based on PR42010: https://bugs.llvm.org/show_bug.cgi?id=42010 ...may show an inaccuracy for PPC's target defs, but we should not be so aggressive with an assert here. There's no telling what out-of-tree targets look like. llvm-svn: 361696	2019-05-25 13:48:07 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Tim Northover	3d7a057b0d	CodeGen: factor out swifterror value tracking. llvm-svn: 361607	2019-05-24 08:39:43 +00:00
Sanjay Patel	7d6c0bce50	[DAGCombiner] make folds of binops safe for opcodes that produce >1 value This is no-functional-change-intended currently because the definition of isBinOp() only includes opcodes that produce 1 value. But if we share that implementation with isCommutativeBinOp() as proposed in D62191, then we need to make sure that the callers bail out for opcodes that they are not prepared to handle correctly. llvm-svn: 361547	2019-05-23 20:17:25 +00:00
Kees Cook	c2187c20a4	[TargetLowering] Extend bool args to inline-asm according to getBooleanType Summary: This extends Krzysztof Parzyszek's X86-specific solution (https://reviews.llvm.org/D60208) to the generic code pointed out by James Y Knight. Reviewers: kparzysz, craig.topper, nickdesaulniers Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D60224 llvm-svn: 361404	2019-05-22 16:16:15 +00:00
Kees Cook	a7a687e500	[TargetLowering] Add blank line (test commit) llvm-svn: 361403	2019-05-22 16:02:13 +00:00
Leonard Chan	0bada7ce6c	[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D55720 llvm-svn: 361289	2019-05-21 19:17:19 +00:00
Sanjay Patel	10f6b39899	[SelectionDAG] fold insert subvector of undef into undef DAGCombiner simplifies this more liberally as: // If inserting an UNDEF, just return the original vector. if (N1.isUndef()) return N0; So there's no way to make this visible in output AFAIK, but doing this at node creation time should be slightly more efficient. llvm-svn: 361287	2019-05-21 18:53:53 +00:00
Sanjay Patel	51dc59d090	[SelectionDAG] remove redundant code; NFCI getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS(): // Concat of UNDEFs is UNDEF. if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); })) return DAG.getUNDEF(VT); llvm-svn: 361284	2019-05-21 18:28:22 +00:00
Sanjay Patel	78c3f58122	[DAGCombiner] prevent unsafe reassociation of FP ops There are no FP callers of DAGCombiner::reassociateOps() currently, but we can add a fast-math check to make sure this API is not being misused. This was noted as a potential risk (and that risk might increase) with: D62191 llvm-svn: 361268	2019-05-21 14:47:38 +00:00
Dylan McKay	e967308da4	Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess Summary: The endianess used in the calling convention does not always match the endianess of the target on all architectures, namely AVR. When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianess that function arguments must be laid out in. This approach was recommended by Eli Friedman. Originally reported in https://github.com/avr-rust/rust/issues/129. Patch by Carl Peto. Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma Reviewed By: efriedma Subscribers: JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62003 llvm-svn: 361222	2019-05-21 06:38:02 +00:00
Craig Topper	97d4f7c194	[SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto. Since INLINEASM_BR is a terminator we need to flush the pending exports before emitting it. If we don't do this, a TokenFactor can be inserted between it and the BR instruction emitted to finish the callbr lowering. It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit the TokenFactor before that. Differential Revision: https://reviews.llvm.org/D59981 llvm-svn: 361177	2019-05-20 17:08:02 +00:00
Craig Topper	af7a188453	[Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64 We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well. Differential Revision: https://reviews.llvm.org/D62026 llvm-svn: 361169	2019-05-20 16:27:09 +00:00
Craig Topper	203bfdd0f0	[DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC This changes the isShift variable to include the constant operand check that was previously in the if statement. While there fix an 80 column violation and an unnecessary use of getNode. Also fix variable name capitalization. llvm-svn: 361168	2019-05-20 16:26:55 +00:00
Nikita Popov	9060b6df97	[SDAG] Vector op legalization for overflow ops Fixes issue reported by aemerson on D57348. Vector op legalization support is added for uaddo, usubo, saddo and ssubo (umulo and smulo were already supported). As usual, by extracting TargetLowering methods and calling them from vector op legalization. Vector op legalization doesn't really deal with multiple result nodes, so I'm explicitly performing a recursive legalization call on the result value that is not being legalized. There are some existing test changes because expansion happens earlier, so we don't get a DAG combiner run in between anymore. Differential Revision: https://reviews.llvm.org/D61692 llvm-svn: 361166	2019-05-20 16:09:22 +00:00
Guillaume Chatelet	e386a01e84	[NFC] Refactor visitIntrinsicCall so it doesn't return a const char* Summary: API simplification Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61306 llvm-svn: 361140	2019-05-20 11:01:30 +00:00
Petar Jovanovic	e85bbf564d	[DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC) Refactor DIExpression::With* into a flag enum in order to be less error-prone to use (as discussed on D60866). Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D61943 llvm-svn: 361137	2019-05-20 10:35:57 +00:00
Guillaume Chatelet	a760e69840	Revert "[NFC] Refactor visitIntrinsicCall so it doesn't return a const char*" This reverts commit 706d3cd6388cc3446aab282f3af879862b10cbed. llvm-svn: 361130	2019-05-20 09:00:12 +00:00
Guillaume Chatelet	fa8c152576	[NFC] Refactor visitIntrinsicCall so it doesn't return a const char* Summary: API simplification Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61306 llvm-svn: 361129	2019-05-20 08:52:10 +00:00
Roman Lebedev	64c756b991	[DAGCombiner] visitShiftByConstant(): drop bogus signbit check Summary: That check claims that the transform is illegal otherwise. That isn't true: 1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter https://rise4fun.com/Alive/K4A 2. For `ISD::AND`, there is no restriction on constants: https://rise4fun.com/Alive/Wy3 3. For `ISD::OR`, there is no restriction on constants: https://rise4fun.com/Alive/GOH 3. For `ISD::XOR`, there is no restriction on constants: https://rise4fun.com/Alive/ml6 So, why is it there then? This changes the testcase that was touched by @spatel in rL347478, but i'm not sure that test tests anything particular? Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin Reviewed By: spatel Subscribers: javed.absar, llvm-commits, spatel Tags: #llvm Differential Revision: https://reviews.llvm.org/D61918 llvm-svn: 361044	2019-05-17 15:52:58 +00:00
Tim Renouf	e3cbdaf1b5	[CodeGen] Fixed de-optimization of legalize subvector extract The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU 3-dword memory instructions, caused a de-optimization problem for code with such a load that then bitcasts via vector of i8, because v12i8 is not an MVT so it legalizes the bitcast by widening it. This commit adds the ability to widen a bitcast using extract_subvector on the result, so the value does not need to go via memory. Differential Revision: https://reviews.llvm.org/D60457 Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64 llvm-svn: 360942	2019-05-16 21:49:06 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Reid Kleckner	4882490349	[codeview] Fix SDNode representation of annotation labels Before this change, they were erroneously constructed with the EH_LABEL SDNode opcode, which caused other passes to interact with them in incorrect ways. See the FIXME about fastisel that this addresses in the existing test case. Fixes PR41890 llvm-svn: 360818	2019-05-15 21:46:05 +00:00
Clement Courbet	d9d0665d1c	[[DAGCombiner][NFC] Add a comment. As suggested in D61846. llvm-svn: 360755	2019-05-15 08:21:18 +00:00
Sanjay Patel	99d6420a82	[SDAG] fix unused variable warning and unneeded indirection; NFC llvm-svn: 360640	2019-05-14 00:57:31 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	d3cedee3c6	[TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG More work for PR39709. llvm-svn: 360592	2019-05-13 15:51:26 +00:00
Sanjay Patel	05dafb1c97	[DAGCombiner] narrow vector binop with inserts/extract We catch most of these patterns (on x86 at least) by matching a concat vectors opcode early in combining, but the pattern may emerge later using insert subvector instead. The AVX1 diffs for add/sub overflow show another missed narrowing pattern. That one may be falling though the cracks because of combine ordering and multiple uses. llvm-svn: 360585	2019-05-13 14:31:14 +00:00
Kevin P. Neal	5987749e33	Add constrained fptrunc and fpext intrinsics. The new fptrunc and fpext intrinsics are constrained versions of the regular fptrunc and fpext instructions. Reviewed by: Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D55897 llvm-svn: 360581	2019-05-13 13:23:30 +00:00
Simon Pilgrim	d845bc3d0c	TargetLowering::SimplifyDemandedBits - early-out for UNDEF ops. NFCI. llvm-svn: 360579	2019-05-13 12:44:03 +00:00
Clement Courbet	9afc4764dd	[DAGCombiner] Fix invalid alias analysis. Summary: When we know for sure whether two addresses do or do not alias, we should immediately return from DAGCombiner::isAlias(). I think this comes from a bad copy/paste, Sorry for not catching that during the code review. Fixes PR41855. Reviewers: niravd, gchatelet, EricWF Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61846 llvm-svn: 360566	2019-05-13 09:07:37 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
Sanjay Patel	a09e686821	[DAGCombiner] try to move bitcast after extract_subvector I noticed that we were failing to narrow an x86 ymm math op in a case similar to the 'madd' test diff. That is because a bitcast is sitting between the math and the extract subvector and thwarting our pattern matching for narrowing: t56: v8i32 = add t59, t58 t68: v4i64 = bitcast t56 t73: v2i64 = extract_subvector t68, Constant:i64<2> t96: v4i32 = bitcast t73 There are a few wins and neutral diffs in the other tests. Differential Revision: https://reviews.llvm.org/D61806 llvm-svn: 360541	2019-05-12 14:43:20 +00:00
Simon Pilgrim	605a840747	[DAG] Add SimplifyDemandedBits support for BITREVERSE Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC llvm-svn: 360534	2019-05-11 20:56:05 +00:00
Simon Pilgrim	aeed0a30c0	SelectionDAGISel::CodeGenAndEmitDAG - remove unused variable. NFCI. llvm-svn: 360514	2019-05-11 11:00:37 +00:00
Jordan Rupprecht	16c7fbd112	Revert [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor This reverts r360171 (git commit `a9d6c32eaf`). A repro showing the asan/msan failures is forthcoming. llvm-svn: 360481	2019-05-10 23:20:02 +00:00
Craig Topper	114f763f37	[LegalizeVectorOps] Remove calls to LegalizeOp on the return value from ExpandLoad/ExpandStore. We already updated the LegalizedNodes map at the end of the Expand call. This would have marked the new node as being mapped to itself. So the LegalizeOp call will find that an immediately return. llvm-svn: 360472	2019-05-10 21:42:27 +00:00
Nikita Popov	9f7537bd48	[SDAG] Recursively legalize both vector mulo results Split out from D61692 per RKSimon's suggestion. Vector op legalization will automatically recursively legalize the returned SDValue, but we need to take care of the other results ourselves. Otherwise it will end up getting legalized only during op legalization, by which point it might be too late (though I'm not aware of any specific cases right now). There are codegen differences because expansion occurs earlier now and we don't get a DAGCombiner run in between. Differential Revision: https://reviews.llvm.org/D61744 llvm-svn: 360470	2019-05-10 20:42:48 +00:00
Sanjay Patel	b37ddeafc0	[DAGCombiner] reduce code duplication; NFC llvm-svn: 360462	2019-05-10 20:02:30 +00:00
Tim Northover	6c1e3f9493	SelectionDAG: accommodate atomic floating stores. We were applying a pointer truncation to floating types, which crashed LLVM. That is Not A Good Thing(TM). llvm-svn: 360421	2019-05-10 11:23:04 +00:00
Cameron McInally	156eb28289	[CodeGen] Add comment about FSUB <-> FNEG xforms Differential Revision: https://reviews.llvm.org/D61741 llvm-svn: 360366	2019-05-09 19:28:52 +00:00
Florian Hahn	be10bc71f9	[DAGCombiner] Limit number of nodes explored as store candidates. To find the candidates to merge stores we iterate over all nodes in a chain for each store, which leads to quadratic compile times for large basic blocks with a large number of stores. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61511 llvm-svn: 360357	2019-05-09 17:05:52 +00:00
Leonard Chan	95b7abdcc5	[SelectionDAG] Expand ADD/SUBCARRY This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it. Differential Revision: https://reviews.llvm.org/D61411 llvm-svn: 360303	2019-05-09 01:17:48 +00:00
Sanjay Patel	902b3ecdad	[SelectionDAG] fold 'fneg undef' to undef This is extracted from the original draft of D61419 with some additional tests. We don't currently get this in IR (it's conservatively turned into a NaN), but presumably that'll get updated as we add real IR support for 'fneg' rather than 'fsub -0.0, x'. The x86-32 run shows the following, and I haven't looked further to see why, but that seems to be independent: Legalizing: t1: f32 = undef Trying to expand node Creating fp constant: t4: f32 = ConstantFP<0.000000e+00> Differential Revision: https://reviews.llvm.org/D61516 llvm-svn: 360296	2019-05-08 22:19:52 +00:00
Craig Topper	493aec3ef5	[FastISel][X86] Support FNeg instruction in target independent fast isel handling This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom Differential Revision: https://reviews.llvm.org/D61624 llvm-svn: 360273	2019-05-08 17:27:08 +00:00
Simon Pilgrim	2788ad3ee2	[LegalizeDAG] Assert non-power-of-2 load/store op splits are in range. NFCI. Fixes static analyzer undefined/out-of-range shift warnings. llvm-svn: 360245	2019-05-08 11:22:10 +00:00
Simon Pilgrim	97a0c54179	Fix cppcheck operator precedence warning. NFCI. llvm-svn: 360234	2019-05-08 10:07:34 +00:00
QingShan Zhang	e065af6a42	[NFC] Add a static function to do the endian check Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. Differential Revision: https://reviews.llvm.org/D61236 llvm-svn: 360226	2019-05-08 07:21:37 +00:00
Florian Hahn	a9d6c32eaf	[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor When simplifying TokenFactors, we potentially iterate over all operands of a large number of TokenFactors. This causes quadratic compile times in some cases and the large token factors cause additional scalability problems elsewhere. This patch adds some limits to the number of nodes explored for the cases mentioned above. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61397 llvm-svn: 360171	2019-05-07 16:47:27 +00:00
Simon Pilgrim	3044ac058b	Avoid use-after-move warnings by using swap instead. NFCI. Swap should be as quick in these cases, and leaves the original variables in a known (empty) state. llvm-svn: 360164	2019-05-07 15:45:00 +00:00
Craig Topper	c6d445f9c1	[FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub. Summary: If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd. I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that. Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma Reviewed By: cameron.mcinally Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61622 llvm-svn: 360111	2019-05-07 04:25:24 +00:00
Craig Topper	39f1a97417	[FastISel] Pass the fneg input operand to hasTrivialKill in FastISel::selectFNeg. We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here. llvm-svn: 360097	2019-05-06 23:09:09 +00:00
Philip Reames	2f53d79bff	Fix pr33010, a 2 year old crashing regression The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent it's creation. Arguably, we shouldn't be using TargetFrameIndices in StatepointLowering at all, but that's a much deeper change. llvm-svn: 360090	2019-05-06 22:09:31 +00:00
Craig Topper	ad56843dd7	[SelectionDAG][X86] Support inline assembly returning an mmx register into a type with fewer than 64 bits. It's possible to use the 'y' mmx constraint with a type narrower than 64-bits. This patch supports this by bitcasting the mmx type to 64-bits and then truncating to the desired type. There are probably other missing type combinations we need to support, but this is the case we have a bug report for. Fixes PR41748. Differential Revision: https://reviews.llvm.org/D61582 llvm-svn: 360069	2019-05-06 19:50:14 +00:00
Craig Topper	55a71b575c	Revert r359392 and r358887 Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066	2019-05-06 19:29:24 +00:00
Nikita Popov	cfe786a195	[SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635) This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635 by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is legal but former is not) for zero-or-all-ones boolean reductions (which are detected based on sign bits). Differential Revision: https://reviews.llvm.org/D61398 llvm-svn: 360054	2019-05-06 16:17:17 +00:00
Craig Topper	f723490e76	[SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with a report_fatal_error. Based on PR41748, not all cases are handled in this function. llvm_unreachable is treated as an optimization hint than can prune code paths in a release build. This causes weird behavior when PR41748 is encountered on a release build. It appears to generate an fp_round instruction from the floating point code. Making this a report_fatal_error prevents incorrect optimization of the code and will instead generate a message to file a bug report. llvm-svn: 360008	2019-05-06 04:01:49 +00:00
Simon Pilgrim	0f89b76b84	[SelectionDAG] Use any_of/all_of where possible. NFCI. llvm-svn: 359974	2019-05-05 10:30:04 +00:00
Simon Pilgrim	5d3b100750	[DAGCombine] Remove repeated variables. NFCI. llvm-svn: 359915	2019-05-03 18:20:28 +00:00
Simon Pilgrim	308b5ec1ff	[TargetLowering] SimplifySetCC - remove repeated variable. NFCI. Also reduce scope of Temp variable. llvm-svn: 359911	2019-05-03 18:02:33 +00:00
Simon Pilgrim	d857f64c31	[SelectionDAG] CreateTopologicalOrder - don't use iterator We shouldn't use an iterator to loop across a std::vector when the same loop is adding elements to that std::vector Found by cppcheck llvm-svn: 359900	2019-05-03 15:50:37 +00:00
Simon Pilgrim	bc876df3a5	[TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI. Only ever used in one block llvm-svn: 359890	2019-05-03 14:38:24 +00:00
Simon Pilgrim	e798e3a346	[TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI. Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings. llvm-svn: 359877	2019-05-03 12:55:25 +00:00
Simon Pilgrim	42d2b604b5	[SelectionDAG] Use INT_MIN as (1 << 31) is UB for signed integers. NFCI. llvm-svn: 359873	2019-05-03 11:32:00 +00:00
Simon Pilgrim	bfd00a6440	[SelectionDAG] computeKnownBits - remove some duplicate/shadow variables. NFCI. llvm-svn: 359872	2019-05-03 11:11:03 +00:00
Craig Topper	e8a1cde886	[SelectionDAG] Add asserts to verify the vectorness of input and output types of TRUNCATE/ZERO_EXTEND/ANY_EXTEND/SIGN_EXTEND agree As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR Differential Revision: https://reviews.llvm.org/D61463 llvm-svn: 359836	2019-05-02 22:26:26 +00:00
Sanjay Patel	1972826178	[DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try) The original patch was committed at rL359398 and reverted at rL359695 because of infinite looping. This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop. Original commit message: This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359793	2019-05-02 15:02:08 +00:00
Sanjay Patel	284472be6d	[SelectionDAG] remove constant folding limitations based on FP exceptions We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791	2019-05-02 14:47:59 +00:00
Sanjay Patel	64d5751254	Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate" This reverts commit `fb9a5307a9` (rL359398) because it can cause an infinite loop due to opposing combines. llvm-svn: 359695	2019-05-01 16:06:21 +00:00
Tim Northover	ee2474df9f	DAG: allow DAG pointer size different from memory representation. In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG builder code so that pointers are allowed to have a larger type when "live" in the DAG compared to memory. Pointers get zero-extended whenever they are loaded, and truncated prior to stores. In addition, a few not quite so obvious locations need updating: * A GEP that has not been marked inbounds needs to enforce the IR-documented 2s-complement wrapping at the memory pointer size. Inbounds GEPs are undefined if they overflow the address space, so no additional operations are needed. * Signed comparisons would give incorrect results if performed on the zero-extended values. This shouldn't affect CodeGen for now, but will become active when the AArch64 ILP32 support is committed. llvm-svn: 359676	2019-05-01 12:37:30 +00:00
Sanjay Patel	0387bf5269	[SelectionDAG] remove div-by-zero constant folding restriction We don't have this restriction in IR, so it should not be here either simply out of consistency. Code that wants to handle FP exceptions is expected to use the 'strict' variants of these nodes. We don't get the frem case because frem by 0.0 produces NaN (invalid), and that's the remaining check here (so the removed check for frem was dead code AFAIK). This is the only place in SDAG that uses "HasFPExceptions", so I think we should remove that entirely as a follow-up patch. llvm-svn: 359566	2019-04-30 14:37:15 +00:00
Sjoerd Meijer	0ed4619679	[TargetLowering] findOptimalMemOpLowering. NFCI. This was a local static funtion in SelectionDAG, which I've promoted to TargetLowering so that I can reuse it to estimate the cost of a memory operation in D59787. Differential Revision: https://reviews.llvm.org/D59766 llvm-svn: 359543	2019-04-30 10:09:15 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Zi Xuan Wu	49d60fdc2e	[DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), if adde is not legal for the target. Even it's at type-legalize phase. Because adde is special and will not be legalized at operation-legalize phase later. This fixes: PR40922 https://bugs.llvm.org/show_bug.cgi?id=40922 Differential Revision: https://reviews.llvm.org//D60854 llvm-svn: 359532	2019-04-30 03:01:14 +00:00
Bjorn Pettersson	820994572c	[DAG] Refactor DAGCombiner::ReassociateOps Summary: Extract the logic for doing reassociations from DAGCombiner::reassociateOps into a helper function DAGCombiner::reassociateOpsCommutative, and use that helper to trigger reassociation on the original operand order, or the commuted operand order. Codegen is not identical since the operand order will be different when doing the reassociations for the commuted case. That causes some unfortunate churn in some test cases. Apart from that this should be NFC. Reviewers: spatel, craig.topper, tstellar Reviewed By: spatel Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61199 llvm-svn: 359476	2019-04-29 17:50:10 +00:00
Sanjay Patel	fb9a5307a9	[DAGCombiner] try repeated fdiv divisor transform before building estimate This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359398	2019-04-28 12:23:43 +00:00
Simon Pilgrim	ef54b1dddf	[DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI. Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format. llvm-svn: 359326	2019-04-26 17:49:02 +00:00
Simon Pilgrim	5d6ef94c36	[X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293	2019-04-26 10:49:13 +00:00
Craig Topper	f9c30eddd0	[SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the input needs to be be split and the output type is a vector. We had special case handling here, but it uses a scalar any_extend for the promotion then bitcasts to the final type. This won't split up the input data into multiple promoted elements like we need. This patch falls back to doing the conversion through memory. Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll changes. The changes to vector-half-conversions.ll are fixing a previously unknown miscompile from this issue. Differential Revision: https://reviews.llvm.org/D61114 llvm-svn: 359219	2019-04-25 18:19:59 +00:00
Amy Huang	68c9199493	Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label). Summary: This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug info in codeview. Currently only changes FastISel, so emitting labels still needs to be implemented in SelectionDAG. Reviewers: rnk Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61083 llvm-svn: 359149	2019-04-24 23:02:48 +00:00
Sanjay Patel	6f41bf948b	[DAGCombiner] scale repeated FP divisor by splat factor If we have a vector FP division with a splatted divisor, use the existing transform that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then potentially be converted into a scalar FP division by existing combines (rL358984) as seen in the tests here. That can be a potentially big perf difference if scalar fdiv has better timing (including avoiding possible frequency throttling for vector ops). Differential Revision: https://reviews.llvm.org/D61028 llvm-svn: 359147	2019-04-24 22:28:58 +00:00
Bjorn Pettersson	71e8c6f20f	Add "const" in GetUnderlyingObjects. NFC Summary: Both the input Value pointer and the returned Value pointers in GetUnderlyingObjects are now declared as const. It turned out that all current (in-tree) uses of GetUnderlyingObjects were trivial to update, being satisfied with have those Value pointers declared as const. Actually, in the past several of the users had to use const_cast, just because of ValueTracking not providing a version of GetUnderlyingObjects with "const" Value pointers. With this patch we get rid of those const casts. Reviewers: hfinkel, materi, jkorous Reviewed By: jkorous Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61038 llvm-svn: 359072	2019-04-24 06:55:50 +00:00
Amy Huang	fc79ab9857	Revert "[MS] Emit S_HEAPALLOCSITE debug info" because of ToTWin64(db) buildbot failure. This reverts commit `d07d6d6177` and `c774f687b6`. llvm-svn: 359034	2019-04-23 21:12:58 +00:00
Fangrui Song	efd94c56ba	Use llvm::stable_sort While touching the code, simplify if feasible. llvm-svn: 358996	2019-04-23 14:51:27 +00:00
Sanjay Patel	06ff5eae5b	[DAGCombiner] generalize binop-of-splats scalarization If we only match build vectors, we can miss some patterns that use shuffles as seen in the affected tests. Note that the underlying calls within getSplatSourceVector() have the potential for compile-time explosion because of exponential recursion looking through binop opcodes, but currently the list of supported opcodes is very limited. Both of those problems should be addressed in follow-up patches. llvm-svn: 358984	2019-04-23 13:16:41 +00:00
Bjorn Pettersson	f97b29be88	[DAGCombiner] Combine OR as ADD when no common bits are set Summary: The DAGCombiner is rewriting (canonicalizing) an ISD::ADD with no common bits set in the operands as an ISD::OR node. This could sometimes result in "missing out" on some combines that normally are performed for ADD. To be more specific this could happen if we already have rewritten an ADD into OR, and later (after legalizations or combines) we expose patterns that could have been optimized if we had seen the OR as an ADD (e.g. reassociations based on ADD). To make the DAG combiner less sensitive to if ADD or OR is used for these "no common bits set" ADD/OR operations we now apply most of the ADD combines also to an OR operation, when value tracking indicates that the operands have no common bits set. Reviewers: spatel, RKSimon, craig.topper, kparzysz Reviewed By: spatel Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59758 llvm-svn: 358965	2019-04-23 10:01:08 +00:00
Sanjay Patel	bf8aacb715	[SelectionDAG] move splat util functions up from x86 lowering This was supposed to be NFC, but the change in SDLoc definitions causes instruction scheduling changes. There's nothing x86-specific in this code, and it can likely be used from DAGCombiner's simplifyVBinOp(). llvm-svn: 358930	2019-04-22 22:43:36 +00:00
Simon Pilgrim	6276ce0142	[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887	2019-04-22 14:04:35 +00:00
Sanjay Patel	9bc6c77220	[DAGCombiner] make variable name less ambiguous; NFC llvm-svn: 358886	2019-04-22 13:42:50 +00:00
Sanjay Patel	d6989daae9	[DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC llvm-svn: 358884	2019-04-22 13:36:07 +00:00
Amy Huang	c774f687b6	[MS] Emit S_HEAPALLOCSITE debug info Summary: This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug info in codeview. Currently only changes FastISel, so emitting labels still needs to be implemented in SelectionDAG. Reviewers: hans, rnk Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60800 llvm-svn: 358783	2019-04-19 21:09:11 +00:00
Sanjay Patel	e197c617a6	[SelectionDAG] soften splat mask assert/unreachable (PR41535) These are general queries, so they should not die when given a degenerate input like an all undef mask. Callers should be able to deal with an op that will eventually be simplified away. llvm-svn: 358761	2019-04-19 15:31:11 +00:00
Simon Pilgrim	e7fe6dd5ed	[DAGCombine] Add SimplifyDemandedBits helper that handles demanded elts mask as well The other SimplifyDemandedBits helpers become wrappers to this new demanded elts variant. llvm-svn: 358585	2019-04-17 15:45:44 +00:00
Florian Hahn	258a425c69	[ScheduleDAGRRList] Recompute topological ordering on demand. Currently there is a single point in ScheduleDAGRRList, where we actually query the topological order (besides init code). Currently we are recomputing the order after adding a node (which does not have predecessors) and then we add predecessors edge-by-edge. We can avoid adding edges one-by-one after we added a new node. In that case, we can just rebuild the order from scratch after adding the edges to the DAG and avoid all the updates to the ordering. Also, we can delay updating the DAG until we query the DAG, if we keep a list of added edges. Depending on the number of updates, we can either apply them when needed or recompute the order from scratch. This brings down the geomean compile time for of CTMark with -O1 down 0.3% on X86, with no regressions. Reviewers: MatzeB, atrick, efriedma, niravd, paquette Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D60125 llvm-svn: 358583	2019-04-17 15:05:29 +00:00
Simon Pilgrim	e5573f4f4e	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526	2019-04-16 20:57:28 +00:00
Luis Marques	eda370d4c8	[DAGCombiner] Add missing flag to addressing mode check The checks in `canFoldInAddressingMode` tested for addressing modes that have a base register but didn't set the `HasBaseReg` flag to true (it's false by default). This patch fixes that. Although the omission of the flag was technically incorrect it had no known observable impact, so no tests were changed by this patch. Differential Revision: https://reviews.llvm.org/D60314 llvm-svn: 358502	2019-04-16 15:09:18 +00:00
Tim Northover	2be3f868f9	DAG: propagate ConsecutiveRegs flags to returns too. Arguments already have a flag to inform backends when they have been split up. The AArch64 arm64_32 ABI makes use of these on return types too, so that code emitted for armv7k can be ABI-compliant. There should be no CodeGen changes yet, just making more information available. llvm-svn: 358399	2019-04-15 12:04:10 +00:00
Tim Northover	9db00f7e5b	DAG: propagate whether an arg is a pointer for CallingConv decisions. The arm64_32 ABI specifies that pointers (despite being 32-bits) should be zero-extended to 64-bits when passed in registers for efficiency reasons. This means that the SelectionDAG needs to be able to tell the backend that an argument was originally a pointer, which is implmented here. Additionally, some memory intrinsics need to be declared as taking an i8* instead of an iPTR. There should be no CodeGen change yet, but it will be triggered when AArch64 backend support for ILP32 is added. llvm-svn: 358398	2019-04-15 12:03:54 +00:00
Bjorn Pettersson	60569363a5	[SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry Summary: Use KnownBits::computeForAddSub/computeForAddCarry in SelectionDAG::computeKnownBits when doing value tracking for addition/subtraction. This should improve the precision of the known bits, as we only used to make a simple estimate of known zeroes. The KnownBits support functions are also able to deduce bits that are known to be one in the result. Reviewers: spatel, RKSimon, nikic, lebedev.ri Reviewed By: nikic Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60460 llvm-svn: 358372	2019-04-15 07:19:11 +00:00
Sanjay Patel	5e4ad39af7	[DAGCombiner] narrow shuffle of concatenated vectors // shuffle (concat X, undef), (concat Y, undef), Mask --> // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1) The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements. The x86 changes look neutral or better. There's one test with an extra instruction, but that could be reversed for a subtarget with the right attributes. But by default, we want to avoid the 256-bit op when possible (in my motivating benchmark, a handful of ymm ops sprinkled into a sequence of xmm ops are triggering frequency throttling on Haswell resulting in significantly worse perf). Differential Revision: https://reviews.llvm.org/D60545 llvm-svn: 358291	2019-04-12 16:31:56 +00:00
Craig Topper	3b1239d2a8	[TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes. If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding. Differential Revision: https://reviews.llvm.org/D60358 llvm-svn: 358257	2019-04-12 06:49:28 +00:00
Sanjay Patel	fd314eca8f	[DAGCombiner] refactor narrowing of extracted vector binop; NFC There's a TODO comment about handling patterns with insert_subvector, and we do want to match that. llvm-svn: 358187	2019-04-11 15:59:47 +00:00
Sanjay Patel	c0f4a35e68	[DAGCombiner][x86] scalarize inserted vector FP ops // bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) --> // build_vec ...undef, (bo x, y), undef... The lifetime of the nodes in these examples is different for variables versus constants, but they are all build vectors briefly, so I'm proposing to catch them in this form to handle all of the leading examples in the motivating test file. Before we have build vectors, we might have insert_vector_element. After that, we might have scalar_to_vector and constant pool loads. It's going to take more work to ensure that FP vector operands are getting simplified with undef elements, so this transform can apply more widely. In a non-loose FP environment, we are likely simplifying FP elements to NaN values rather than undefs. We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors yet. Differential Revision: https://reviews.llvm.org/D60514 llvm-svn: 358172	2019-04-11 14:21:57 +00:00
David Green	0861c87b06	Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not seeing through to the constant in other blocks. Revert this patch while we come up with a better way to handle that. I will try to follow this up with some better tests. llvm-svn: 358113	2019-04-10 18:00:41 +00:00
Craig Topper	61e77b11d1	[DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate. This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate. Differential Revision: https://reviews.llvm.org/D60020 llvm-svn: 358027	2019-04-09 18:33:56 +00:00
Simon Pilgrim	d7cc0ec581	[TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support llvm-svn: 358019	2019-04-09 16:52:21 +00:00
Simon Pilgrim	55f79ef9fe	[TargetLowering] SimplifyDemandedBits - Remove GetDemandedSrcMask lambda. NFCI. An older version of this could return false but now that this always succeeds we can just inline and simplify it. llvm-svn: 357999	2019-04-09 12:29:26 +00:00
Simon Pilgrim	345eacd555	[TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op. llvm-svn: 357992	2019-04-09 10:27:59 +00:00
Simon Pilgrim	9f74df7d5b	[TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts. llvm-svn: 357942	2019-04-08 20:59:38 +00:00
Simon Pilgrim	561ba38623	[DAG] Pull out ComputeNumSignBits call to make debugging easier. NFCI. llvm-svn: 357861	2019-04-07 11:49:33 +00:00
Simon Pilgrim	17586cda4a	[SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D60006 llvm-svn: 357765	2019-04-05 14:56:21 +00:00
Sanjay Patel	50a8652785	[DAGCombiner][x86] scalarize splatted vector FP ops There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file. For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename. Other targets should likely enable the hook in a similar way. Differential Revision: https://reviews.llvm.org/D60150 llvm-svn: 357760	2019-04-05 13:32:17 +00:00
Piotr Sobczak	0376ac1d94	[SelectionDAG] Compute known bits of CopyFromReg Summary: Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if the virtual reg used has one def only. This can be particularly useful when calling isBaseWithConstantOffset() with the ISD::CopyFromReg argument, as more optimizations may get enabled in the result. Also add a missing truncation on X86, found by testing of this patch. Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa Reviewers: bogner, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59535 llvm-svn: 357745	2019-04-05 07:44:09 +00:00
Serguei Katkov	c39636cc2c	[FastISel] Fix crash for gc.relocate lowring Lowering safepoint checks that all gc.relocaes observed in safepoint must be lowered. However Fast-Isel is able to skip dead gc.relocate. To resolve this issue we just ignore dead gc.relocate in the check. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60184 llvm-svn: 357742	2019-04-05 05:41:08 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
Serguei Katkov	fb44846e37	[FastISel] Fix the crash in gc.result lowering The Fast ISel has a fallback to SelectionDAGISel in case it cannot handle the instruction. This works as follows: Using reverse order, try to select instruction using Fast ISel, if it cannot handle instruction it fallbacks to SelectionDAGISel for these instructions if it is a call and continue fast instruction selections. However if unhandled instruction is not a call or statepoint related instruction it fallbacks to SelectionDAGISel for all remaining instructions in basic block. However gc.result instruction is missed and as a result it is possible that gc.result is processed earlier than statepoint causing breakage invariant the gc.results should be handled after statepoint. Test is updated because in the current form fast-isel cannot handle ret instruction (due to i1 ret type without explicit ext) and as a result test does not check fast-isel at all. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60182 llvm-svn: 357672	2019-04-04 04:19:56 +00:00
Simon Pilgrim	8d248dbd77	[DAGCombiner] Rename variables Demanded -> DemandedBits/DemandedElts. NFCI. Use consistent variable names down the SimplifyDemanded* call stack so debugging isn't such a annoyance. llvm-svn: 357602	2019-04-03 16:00:59 +00:00
Sanjay Patel	00dae6b22d	[DAGCombiner] loosen restrictions for moving shuffles after vector binop There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580	2019-04-03 13:42:06 +00:00
Simon Pilgrim	02599de2e1	[DAGCombine] Don't use getZExtValue() until we know the constant is in range. Noticed during prep for a patch for PR40758. llvm-svn: 357571	2019-04-03 11:00:55 +00:00
Hans Wennborg	94b867dc7c	Revert r357256 "[DAGCombine] Improve Lifetime node chains." As it caused a pathological compile-time regressionin V8, see PR41352. > Improve both start and end lifetime nodes chain dependencies. > > Reviewers: courbet > > Reviewed By: courbet > > Subscribers: hiraditya, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D59795 This also reverts the follow-up r357309: > [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. > > Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357563	2019-04-03 07:41:58 +00:00
Sanjay Patel	7cb7daabbb	[DAGCombiner] reduce code duplication; NFC llvm-svn: 357498	2019-04-02 17:20:54 +00:00
Nirav Dave	54f7118de5	[DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357309	2019-03-29 20:26:23 +00:00
Nirav Dave	7e84cacdbd	[DAG] Avoid redundancy in StoreMerge TokenFactor generation. Avoid generating redundant TokenFactor when all merged stores have the same chain. llvm-svn: 357299	2019-03-29 18:50:22 +00:00
Nirav Dave	fe59e14031	[DAGCombine] Prune unnused nodes. Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283	2019-03-29 17:35:56 +00:00
Nirav Dave	610036c506	[DAG] Set up infrastructure to avoid smart constructor-based dangling nodes Summary: Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations without fully pruning unused result values. This results in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter for the combiner to make sure such nodes have the chance of being pruned. This allows a number of additional peephole optimizations. Reviewers: efriedma, RKSimon, craig.topper, jyknight Reviewed By: jyknight Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58068 llvm-svn: 357279	2019-03-29 17:26:40 +00:00
Sanjay Patel	12685d0f7c	[DAGCombiner] simplify shuffle of shuffle After investigating the examples from D59777 targeting an SSE4.1 machine, it looks like a very different problem due to how we map illegal types (256-bit in these cases). We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand. We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that generality means it is limited to patterns with a one-use constraint, and the examples here have 2 uses. We don't need any uses or legality limitations for a simplification (no new value is created). It looks like we miss this pattern in IR too. In one of the zext examples here, we have shuffle masks like this: Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7> Shuf = vector_shuffle<4,u,6,7,u,u,u,u> ...so that's moving the high half of the 1st vector into the low half. But the high half of the 1st vector is already identical to the low half. Differential Revision: https://reviews.llvm.org/D59961 llvm-svn: 357258	2019-03-29 14:20:38 +00:00
Nirav Dave	9259de217e	[DAGCombine] Improve Lifetime node chains. Improve both start and end lifetime nodes chain dependencies. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59795 llvm-svn: 357256	2019-03-29 14:09:47 +00:00
Sanjay Patel	665a385035	[DAGCombiner] fold sext into decrement This is a sibling to rL357178 that I noticed we'd hit if we chose an alternate transform in D59818. %z = zext i8 %x to i32 %dec = add i32 %z, -1 %r = sext i32 %dec to i64 => %z2 = zext i8 %x to i64 %r = add i64 %z2, -1 https://rise4fun.com/Alive/kPP The x86 vector diffs show a slight regression, so there's a chance that we should limit this and the previous transform to scalars. But given that we allowed vectors before, I'm matching that behavior here. We should change both transforms together if that's the right thing to do. llvm-svn: 357254	2019-03-29 13:49:08 +00:00
Hans Wennborg	800b12f90a	Switch lowering: exploit unreachable fall-through when lowering case range cluster In the example below, we would previously emit two range checks, one for cases 1--3 and one for 4--6. This patch makes us exploit the fact that the fall-through is unreachable and only one range check is necessary. switch i32 %i, label %default [ i32 1, label %bb1 i32 2, label %bb1 i32 3, label %bb1 i32 4, label %bb2 i32 5, label %bb2 i32 6, label %bb2 ] default: unreachable llvm-svn: 357252	2019-03-29 13:40:05 +00:00
Craig Topper	ea626d8bdb	[SelectionDAGBuilder] Fix 80 column violation. NFC llvm-svn: 357213	2019-03-28 20:52:22 +00:00
Nirav Dave	8b9c9822a1	[DAG] Fix Lifetime Node ID hashing. llvm-svn: 357179	2019-03-28 15:53:01 +00:00
Sanjay Patel	ffa8d3def7	[DAGCombiner] fold sext into negation As noted in D59818: %z = zext i8 %x to i32 %neg = sub i32 0, %z %r = sext i32 %neg to i64 => %z2 = zext i8 %x to i64 %r = sub i64 0, %z2 https://rise4fun.com/Alive/KzSR llvm-svn: 357178	2019-03-28 15:46:02 +00:00
Simon Pilgrim	38a0616c1d	[DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161	2019-03-28 11:34:21 +00:00
Nirav Dave	6b741a8038	[DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations. Reviewers: courbet, jyknight Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59897 llvm-svn: 357121	2019-03-27 20:37:08 +00:00
Justin Bogner	b1650f0da9	[LegalizeVectorTypes] Allow single loads and stores for more short vectors When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.) This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which will lead to have a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1. Patch by Guillaume Marques. Thanks! Differential Revision: https://reviews.llvm.org/D56201 llvm-svn: 357120	2019-03-27 20:35:56 +00:00
Nirav Dave	c6dfaa0e83	Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes." This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116	2019-03-27 19:54:41 +00:00
Nikita Popov	6d855ea024	[ConstantRange] Rename isWrappedSet() to isUpperWrapped() Split out from D59749. The current implementation of isWrappedSet() doesn't do what it says on the tin, and treats ranges like [X, Max] as wrapping, because they are represented as [X, 0) when using half-inclusive ranges. This also makes it inconsistent with the semantics of isSignWrappedSet(). This patch renames isWrappedSet() to isUpperWrapped(), in preparation for the introduction of a new isWrappedSet() method with corrected behavior. llvm-svn: 357107	2019-03-27 18:19:33 +00:00
Nirav Dave	b5630a2ab1	[DAGCombiner] Unify Lifetime and memory Op aliasing. Rework BaseIndexOffset and isAlias to fully work with lifetime nodes and fold in lifetime alias analysis. This is mostly NFC. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59794 llvm-svn: 357070	2019-03-27 14:14:46 +00:00
Nirav Dave	96a264e053	[DAGCombine] Refactor GatherAllAliases. NFCI. llvm-svn: 357069	2019-03-27 14:14:35 +00:00
Hans Wennborg	5c0d7a24e8	Re-commit r355490 "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default" Original commit by Ayonam Ray. This commit adds a regression test for the issue discovered in the previous commit: that the range check for the jump table can only be omitted if the fall-through destination of the jump table is unreachable, which isn't necessarily true just because the default of the switch is unreachable. This addresses the missing optimization in PR41242. > During the lowering of a switch that would result in the generation of a > jump table, a range check is performed before indexing into the jump > table, for the switch value being outside the jump table range and a > conditional branch is inserted to jump to the default block. In case the > default block is unreachable, this conditional jump can be omitted. This > patch implements omitting this conditional branch for unreachable > defaults. > > Differential Revision: https://reviews.llvm.org/D52002 > Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 357067	2019-03-27 14:10:11 +00:00
Jonas Paulsson	38342a5185	[DAGCombiner] Don't allow addcarry if the carry producer is illegal. getAsCarry() checks that the input argument is a carry-producing node before allowing a transformation to addcarry. This patch adds a check to make sure that the carry-producing node is legal. If it is not, it may not remain in a form that is manageable by the target backend. The test case caused a compilation failure during instruction selection for this reason on SystemZ. Patch by Ulrich Weigand. Review: Sanjay Patel https://reviews.llvm.org/D59822 llvm-svn: 357052	2019-03-27 08:41:46 +00:00

... 3 4 5 6 7 ...

10115 Commits