llvm-project

Commit Graph

Author	SHA1	Message	Date
Eli Friedman	5df3a87224	[AArch64][X86] Don't assume __powidf2 is available on Windows. We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013	2019-11-08 12:43:21 -08:00
David Green	2179867ddc	[AArch64] Select saturating Neon instructions This adds some extra patterns to select AArch64 Neon SQADD, UQADD, SQSUB and UQSUB from the existing target independent sadd_sat, uadd_sat, ssub_sat and usub_sat nodes. It does not attempt to replace the existing int_aarch64_neon_uqadd intrinsic nodes as they are apparently used for both scalar and vector, and need to be legal on scalar types for some of the patterns to work. The int_aarch64_neon_uqadd on scalar would move the two integers into floating point registers, perform a Neon uqadd and move the value back. I don't believe this is good idea for uadd_sat to do the same as the scalar alternative is simpler (an adds with a csinv). For signed it may be smaller, but I'm not sure about it being better. So this just adds some extra patterns for the existing vector instructions, matching on the _sat nodes. Differential Revision: https://reviews.llvm.org/D69374	2019-10-31 17:28:36 +00:00
Ehsan Amiri	ed7bcb2cb1	[AArch64][SVE] Add patterns for some integer vector instructions Add pattern matching for SVE vector instructions: -- add, sub, and, or, xor instructions -- sqadd, uqadd, sqsub, uqsub target-independent intrinsics -- bic intrinsics -- predicated add, sub, subr intrinsics Patch Review: https://reviews.llvm.org/D69128 Patch authored by: dancgr (Danilo Carvalho Grael)	2019-10-30 21:52:19 -04:00
Andrew Paverd	d157a9bc8b	Add Windows Control Flow Guard checks (/guard:cf). Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761	2019-10-28 15:19:39 +00:00
Kerry McLaughlin	da720a38b9	[AArch64][SVE] Implement masked load intrinsics Summary: Adds support for codegen of masked loads, with non-extending, zero-extending and sign-extending variants. Reviewers: huntergr, rovka, greened, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68877	2019-10-28 10:06:14 +00:00
Graham Hunter	84da2596f9	[AArch64][SVE] Add SPLAT_VECTOR ISD Node Adds a new ISD node to replicate a scalar value across all elements of a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot be used. Fixes up default type legalization for scalable vectors after the new MVT type ranges were introduced. At present I only use this node for scalable vectors. A DAGCombine has been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all elements are the same, but only if the default operation action of Expand has been overridden by the target. I've only added result promotion legalization for scalable vector i8/i16/i32/i64 types in AArch64 for now. Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy Reviewed By: jmolloy Differential Revision: https://reviews.llvm.org/D47775 llvm-svn: 375222	2019-10-18 11:48:35 +00:00
Kerry McLaughlin	0c7cc383e5	[AArch64][SVE] Implement unpack intrinsics Summary: Implements the following intrinsics: - int_aarch64_sve_sunpkhi - int_aarch64_sve_sunpklo - int_aarch64_sve_uunpkhi - int_aarch64_sve_uunpklo This patch also adds AArch64ISD nodes for UNPK instead of implementing the intrinsics directly, as they are required for a future patch which implements the sign/zero extension of legal vectors. This patch includes tests for the Subdivide2Argument type added by D67549 Reviewers: sdesmalen, SjoerdMeijer, greened, rengolin, rovka Reviewed By: greened Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D67550 llvm-svn: 375210	2019-10-18 09:40:16 +00:00
Graham Hunter	b302561b76	[SVE][IR] Scalable Vector size queries and IR instruction support * Adds a TypeSize struct to represent the known minimum size of a type along with a flag to indicate that the runtime size is a integer multiple of that size * Converts existing size query functions from Type.h and DataLayout.h to return a TypeSize result * Adds convenience methods (including a transparent conversion operator to uint64_t) so that most existing code 'just works' as if the return values were still scalars. * Uses the new size queries along with ElementCount to ensure that all supported instructions used with scalable vectors can be constructed in IR. Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen Reviewed By: rovka, sdesmalen Differential Revision: https://reviews.llvm.org/D53137 llvm-svn: 374042	2019-10-08 12:53:54 +00:00
Nikola Prica	02682498b8	[ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers Support for tracking registers that forward function parameters into the following function frame. For now we only support cases when parameter is forwarded through single register. Reviewers: aprantl, vsk, t.p.northover Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D66953 llvm-svn: 374033	2019-10-08 09:43:05 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Hans Wennborg	3740ae3b8a	Revert r372893 "[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets" This caused severe compile-time regressions, see PR43455. > Modern processors predict the targets of an indirect branch regardless of > the size of any jump table used to glean its target address. Moreover, > branch predictors typically use resources limited by the number of actual > targets that occur at run time. > > This patch changes the semantics of the option `-max-jump-table-size` to limit > the number of different targets instead of the number of entries in a jump > table. Thus, it is now renamed to `-max-jump-table-targets`. > > Before, when `-max-jump-table-size` was specified, it could happen that > cluster jump tables could have targets used repeatedly, but each one was > counted and typically resulted in tables with the same number of entries. > With this patch, when specifying `-max-jump-table-targets`, tables may have > different lengths, since the number of unique targets is counted towards the > limit, but the number of unique targets in tables is the same, but for the > last one containing the balance of targets. > > Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 373060	2019-09-27 09:54:26 +00:00
Evandro Menezes	3bd8ba156b	[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets Modern processors predict the targets of an indirect branch regardless of the size of any jump table used to glean its target address. Moreover, branch predictors typically use resources limited by the number of actual targets that occur at run time. This patch changes the semantics of the option `-max-jump-table-size` to limit the number of different targets instead of the number of entries in a jump table. Thus, it is now renamed to `-max-jump-table-targets`. Before, when `-max-jump-table-size` was specified, it could happen that cluster jump tables could have targets used repeatedly, but each one was counted and typically resulted in tables with the same number of entries. With this patch, when specifying `-max-jump-table-targets`, tables may have different lengths, since the number of unique targets is counted towards the limit, but the number of unique targets in tables is the same, but for the last one containing the balance of targets. Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 372893	2019-09-25 16:10:20 +00:00
Florian Hahn	364a23427b	[AArch64] Convert neon_ushl and neon_sshl with positive constants to VSHL. I think we should be able to use shl instead of sshl and ushl for positive constant shift values, unless I am missing something. We already have the machinery in place to ensure we only replace nodes, if the shift value is positive and <= the element width. This is a generalization of an earlier patch rL372565. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D67955 llvm-svn: 372824	2019-09-25 08:22:05 +00:00
Florian Hahn	3e2fdbee80	[AArch64] support neon_sshl and neon_ushl in performIntrinsicCombine. Try to generate ushll/sshll for aarch64_neon_ushl/aarch64_neon_sshl, if their first operand is extended and the second operand is a constant Also adds a few tests marked with FIXME, where we can further increase codegen. Reviewers: t.p.northover, samparker, dmgreen, anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D62308 llvm-svn: 372565	2019-09-23 09:38:53 +00:00
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Kerry McLaughlin	e55b3bf40e	[SVE][Inline-Asm] Add constraints for SVE predicate registers Summary: Adds the following inline asm constraints for SVE: - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive - Upa: SVE predicate register with full range, P0 to P15 Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin Reviewed By: rovka Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66524 llvm-svn: 371967	2019-09-16 09:45:27 +00:00
Tim Northover	f1c2892912	AArch64: support arm64_32, an ILP32 slice for watchOS. This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM. FastISel is mostly disabled for now since it would generate incorrect code for ILP32. llvm-svn: 371722	2019-09-12 10:22:23 +00:00
Guillaume Chatelet	ad1cea0dda	[Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67267 llvm-svn: 371212	2019-09-06 15:03:49 +00:00
Guillaume Chatelet	9fcf066d0c	[Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210	2019-09-06 14:51:15 +00:00
Guillaume Chatelet	4fc3ad9e13	[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67229 llvm-svn: 371200	2019-09-06 12:48:34 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Kerry McLaughlin	7b5c6b8d86	[SVE][Inline-Asm] Fix -Wimplicit-fallthrough in AArch64ISelLowering.cpp Summary: Adds break to 'x' case in getRegForInlineAsmConstraint added by D66302, fixing the unintentional fallthrough. Reviewers: sdesmalen, rovka, cameron.mcinally, greened, gribozavr, ruiu Reviewed By: sdesmalen Subscribers: bjope, javed.absar, tschuett, kristof.beyls, rkruppe, psnobl, llvm-commits, cfe-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67095 llvm-svn: 370769	2019-09-03 15:45:42 +00:00
Kerry McLaughlin	da4ef9b4c8	[SVE][Inline-Asm] Support for SVE asm operands Summary: Adds the following inline asm constraints for SVE: - w: SVE vector register with full range, Z0 to Z31 - x: Restricted to registers Z0 to Z15 inclusive. - y: Restricted to registers Z0 to Z7 inclusive. This change also adds the "z" modifier to interpret a register as an SVE register. Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness. Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened Reviewed By: sdesmalen Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66302 llvm-svn: 370673	2019-09-02 16:12:31 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Hans Wennborg	cff90f07cb	[SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) Neither libgcc or compiler-rt are usually used on Windows, so these functions can't be called. Differential revision: https://reviews.llvm.org/D66880 llvm-svn: 370204	2019-08-28 13:55:10 +00:00
Tim Northover	a7f226f9db	AArch64: avoid creating cycle in DAG for post-increment NEON ops. Inserting a value into Visited has the effect of terminating a search for predecessors if that node is seen. This is legitimate for the base address, and acts as a slight performance optimization, but the vector-building node can be paert of a legitimate cycle so we shouldn't stop searching there. PR43056. llvm-svn: 370036	2019-08-27 10:21:11 +00:00
Simon Pilgrim	c88408cf85	Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI. llvm-svn: 369751	2019-08-23 12:37:09 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Eli Friedman	b5eb3e1e82	[AArch64] Remove incorrect usage of MONonTemporal. This has no effect at the moment, but might matter if we try to implement non-temporal loads in the future. llvm-svn: 368770	2019-08-13 23:12:14 +00:00
Daniel Sanders	5ae66e56cf	[aarch64] Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Manual fixups in: AArch64InstrInfo.cpp - genFusedMultiply() now takes a Register* instead of unsigned* AArch64LoadStoreOptimizer.cpp - Ternary operator was ambiguous between Register/MCRegister. Settled on Register Depends on D65919 Reviewers: aemerson Subscribers: jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision for full review was: https://reviews.llvm.org/D65962 llvm-svn: 368628	2019-08-12 22:40:53 +00:00
Evandro Menezes	a005c1ac4f	[AArch64] Expand bcmp() for small block lengths Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should the target support the it. Unless `TTI::MemCmpExpansionOptions()` is overridden by the target. In a proprietary benchmark we see a performance drop of about 12% on PNG compression before this patch, though it passes all tests. This patch mirrors X86 for AArch64 and initializes `TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when appropriate. No tuning of the parameters was performed, but, at this point, it's enough to recover the performance drop above. This problem also exists on ARM. Once a consensus is reached for AArch64, we can work to fix ARM as well. Authors: - Evandro Menezes (@evandro) <e.menezes@samsung.com> - Brian Rzycki (@brzycki) <b.rzycki@samsung.com> Differential revision: https://reviews.llvm.org/D64805 llvm-svn: 367898	2019-08-05 18:09:14 +00:00
Cullen Rhodes	2a48176373	[AArch64] Implement initial SVE calling convention support Summary: This patch adds initial support for the SVE calling convention such that SVE types can be passed as arguments and return values to/from a subroutine. The SVE AAPCS states [1]: z0-z7 are used to pass scalable vector arguments to a subroutine, and to return scalable vector results from a function. If a subroutine takes arguments in scalable vector or predicate registers, or if it is a function that returns results in such registers, it must ensure that the entire contents of z8-z23 are preserved across the call. In other cases it need only preserve the low 64 bits of z8-z15, as described in §5.1.2. p0-p3 are used to pass scalable predicate arguments to a subroutine and to return scalable predicate results from a function. If a subroutine takes arguments in scalable vector or predicate registers, or if it is a function that returns results in these registers, it must ensure that p4-p15 are preserved across the call. In other cases it need not preserve any scalable predicate register contents. SVE predicate and data registers are passed indirectly (i.e. spilled to the stack and pass the address) if they exceed the registers used for argument passing defined by the PCS referenced above. Until SVE stack support is merged we can't spill SVE registers to the stack, so currently an llvm_unreachable is used where we will eventually handle this. [1] https://static.docs.arm.com/100986/0000/100986_0000.pdf Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D65448 llvm-svn: 367859	2019-08-05 13:44:10 +00:00
Florian Hahn	e3ea97b049	[AArch64] Skip isZIPMask check for masks with an odd number of elements. We process 2 elements at a time and expect the number of elements to be even. Similar to D60690. Reviewers: dmgreen, samparker, t.p.northover Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D65400 llvm-svn: 367831	2019-08-05 11:12:23 +00:00
Guillaume Chatelet	c97a3d15d2	[LLVM][Alignment] Introduce Alignment Type Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828	2019-08-05 11:02:05 +00:00
Bill Wendling	41a2847a9a	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750	2019-08-03 05:52:47 +00:00
Peter Collingbourne	33773d5cfc	SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned. This makes the field wider than MachineOperand::SubReg_TargetFlags so that we don't end up silently truncating any higher bits. We should still catch any bits truncated from the MachineOperand field as a consequence of the assertion in MachineOperand::setTargetFlags(). Differential Revision: https://reviews.llvm.org/D65465 llvm-svn: 367474	2019-07-31 20:14:09 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
Amara Emerson	13af1ed8e3	[GlobalISel] Support for inlining memcpy, memset and memmove calls. This introduces a new family of combiner helper routines that re-use the target specific cost model from SelectionDAG, and generate inline implementations of the memcpy family of intrinsics. The combines are only enabled at optimization levels higher than -O0, and give very substantial performance improvements. Differential Revision: https://reviews.llvm.org/D65167 llvm-svn: 366951	2019-07-24 22:17:31 +00:00
Evgeniy Stepanov	d752f5e953	Basic codegen for MTE stack tagging. Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now. Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are used to implement a tagged stack frame pointer in a virtual register. Differential Revision: https://reviews.llvm.org/D64172 llvm-svn: 366360	2019-07-17 19:24:02 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Tom Tan	7ecb5145ba	[COFF, ARM64] Fix encoding of debugtrap for Windows On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break point instruction on ARM64 which is recognized by Windows debugger and exception handling code, so llvm.debugtrap should map to it instead of redirecting to llvm.trap (brk #1) as the default implementation. Differential Revision: https://reviews.llvm.org/D63635 llvm-svn: 364115	2019-06-21 23:38:05 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Sam Parker	a0bd6f8a1a	[AArch64] Check for simple type in FPToUInt DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365	2019-06-03 08:49:17 +00:00
Adhemerval Zanella	34d8daae53	[AArch64] Handle ISD::LRINT and ISD::LLRINT This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus fcvtzs. It currently only handles the scalar version. Reviewed By: SjoerdMeijer, mstorsjo Differential Revision: https://reviews.llvm.org/D62018 llvm-svn: 361877	2019-05-28 21:04:29 +00:00
Reid Kleckner	b7a78c7dff	[AArch64] Preserve X8 for thunks ending in variadic musttail calls Summary: On Windows, X8 may be used to pass in the address of an aggregate that is returned indirectly. Therefore, it should be forwarded to variadic musttail calls and preserved in thunks. Fixes PR41997 Reviewers: mgrang, efriedma Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62344 llvm-svn: 361585	2019-05-24 01:27:20 +00:00
Florian Hahn	4a8835c655	[AArch64] Skip mask checks for masks with an odd number of elements. Some checks in isShuffleMaskLegal expect an even number of elements, e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access invalid elements and crash. This patch adds checks to the impacted functions. Fixes PR41951 Reviewers: t.p.northover, dmgreen, samparker Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D60690 llvm-svn: 361235	2019-05-21 10:05:26 +00:00
Adhemerval Zanella	2d28db6b9f	[AArch64] Handle ISD::LROUND and ISD::LLROUND This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas instruction. It currently only handles the scalar version. llvm-svn: 360894	2019-05-16 13:30:18 +00:00
Mandeep Singh Grang	5dc8aeb26d	[COFF, ARM64] Fix ABI implementation of struct returns Summary: Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values Related clang patch: D60349 Reviewers: rnk, efriedma, TomTan, ssijaric Reviewed By: rnk, efriedma Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60348 llvm-svn: 359934	2019-05-03 21:12:36 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Craig Topper	35fe07916a	[AArch64] Teach getTestBitOperand to look through ANY_EXTENDS This patch teach getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression. Differential Revision: https://reviews.llvm.org/D60482 llvm-svn: 358108	2019-04-10 17:27:29 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
Evandro Menezes	0f797b8732	[CodeGen] Refactor the option for the maximum jump table size Refactor the option `max-jump-table-size` to default to the maximum representable number. Essentially, NFC. llvm-svn: 357280	2019-03-29 17:28:11 +00:00
Adhemerval Zanella	a3cefa5d64	[AArch64] Optimize floating point materialization This patch follows some ideas from r352866 to optimize the floating point materialization even further. It changes isFPImmLegal to considere up to 2 mov instruction or up to 5 in case subtarget has fused literals. The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but the mov+fmov sequence is always better because of the reduced d-cache pressure. The timings are still the same if you consider movw+movk+fmov vs. adrp+ldr will be fused (although one instruction longer). Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D58460 llvm-svn: 356390	2019-03-18 18:45:57 +00:00
Adhemerval Zanella	664c1ef528	[TargetLowering] Add code size information on isFPImmLegal. NFC This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389	2019-03-18 18:40:07 +00:00
Nikita Popov	1a26144ff5	[AArch64] Turn BIC immediate creation into a DAG combine Switch BIC immediate creation for vector ANDs from custom lowering to a DAG combine, which gives generic DAG combines a change to apply first. In particular this avoids (and x, -1) being turned into a (bic x, 0) instead of being eliminated entirely. Differential Revision: https://reviews.llvm.org/D59187 llvm-svn: 356299	2019-03-15 21:04:34 +00:00
Nikita Popov	aa7cfa75f9	[SDAG][AArch64] Legalize VECREDUCE Fixes https://bugs.llvm.org/show_bug.cgi?id=36796. Implement basic legalizations (PromoteIntRes, PromoteIntOp, ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes. There are more legalizations missing (esp float legalizations), but there's no way to test them right now, so I'm not adding them. This also includes a few more changes to make this work somewhat reasonably: * Add support for expanding VECREDUCE in SDAG. Usually experimental.vector.reduce is expanded prior to codegen, but if the target does have native vector reduce, it may of course still be necessary to expand due to legalization issues. This uses a shuffle reduction if possible, followed by a naive scalar reduction. * Allow the result type of integer VECREDUCE to be larger than the vector element type. For example we need to be able to reduce a v8i8 into an (nominally) i32 result type on AArch64. * Use the vector operand type rather than the scalar result type to determine the action, so we can control exactly which vector types are supported. Also change the legalize vector op code to handle operations that only have vector operands, but no vector results, as is the case for VECREDUCE. * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE), explicitly specify for which vector types the reductions are supported. This does not handle anything related to VECREDUCE_STRICT_*. Differential Revision: https://reviews.llvm.org/D58015 llvm-svn: 355860	2019-03-11 20:22:13 +00:00
Abderrazek Zaafrani	5ced596198	[AArch64] Improve FP16 instruction selection for vector round and vector conver from half instructions https://reviews.llvm.org/D58855 llvm-svn: 355545	2019-03-06 20:30:06 +00:00
Abderrazek Zaafrani	abfd10807c	[AArch64] Improve FP16 vector convert from short instructions. https://reviews.llvm.org/D58563 llvm-svn: 355134	2019-02-28 20:21:46 +00:00
Abderrazek Zaafrani	2fc498a652	[AArch64] Generate FP16 vector compare instructions. https://reviews.llvm.org/D58561 llvm-svn: 355050	2019-02-28 00:31:38 +00:00
Petr Hosek	fcbec02ea6	[AArch64] Support reserving arbitrary general purpose registers This is a follow up to D48580 and D48581 which allows reserving arbitrary general purpose registers with the exception of registers with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM (X0, X19). This change also generalizes some of the existing logic to rely entirely on values generated from tablegen. Differential Revision: https://reviews.llvm.org/D56305 llvm-svn: 353957	2019-02-13 17:28:47 +00:00
Nikita Popov	a3be17ea1c	[AArch64] Expand v8i8 cttz (PR39729) Fix for https://bugs.llvm.org/show_bug.cgi?id=39729. Rather than adding just a case for v8i8 I'm setting cttz to expand for all vector types. Differential Revision: https://reviews.llvm.org/D58008 llvm-svn: 353872	2019-02-12 18:55:53 +00:00
Eli Friedman	29c0609301	[AArch64] Fix condition for "high-vector" DUP optimizations. AArch64 NEON has a bunch of instructions with a "2" suffix that extract the top half of the source vectors, instead of the bottom half. We have some DAGCombines to try to take advantage of that. However, they assumed that any EXTRACT_VECTOR was extracting the high half of the vector in question. This issue has apparently existed since the AArch64 backend was merged. Fixes https://bugs.llvm.org/show_bug.cgi?id=40632 . Differential Revision: https://reviews.llvm.org/D57862 llvm-svn: 353486	2019-02-08 00:23:35 +00:00
Florian Hahn	3b251963c3	[CGP] Add support for sinking operands to their users, if they are free. This patch improves code generation for some AArch64 ACLE intrinsics. It adds support to CGP to duplicate and sink operands to their user, if they can be folded into a target instruction, like zexts and sub into usubl. It adds a TargetLowering hook shouldSinkOperands, which looks at the operands of instructions to see if sinking is profitable. I decided to add a new target hook, as for the sinking to be profitable, at least on AArch64, we have to look at multiple operands of an instruction, instead of looking at the users of a zext for example. The sinking is done in CGP, because it works around an instruction selection limitation. If instruction selection is not limited to a single basic block, this patch should not be needed any longer. Alternatively this could be done in the LoopSink pass, which tries to undo LICM for instructions in blocks that are not executed frequently. Note that we do not force the operands to sink to have a single user, because we duplicate them before sinking. Therefore this is only desirable if they really can be done for free. Additionally we could consider the impact on live ranges later on. This should fix https://bugs.llvm.org/show_bug.cgi?id=40025. As for performance, we have internal code that uses intrinsics and can be speed up by 10% by this change. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D57377 llvm-svn: 353152	2019-02-05 10:27:40 +00:00
Mandeep Singh Grang	70d484d94e	[COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects Summary: This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present. Reviewers: rnk, efriedma, ssijaric, TomTan Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D57183 llvm-svn: 352923	2019-02-01 21:41:33 +00:00
James Y Knight	7716075a17	[opaque pointer types] Pass value type to GetElementPtr creation. This cleans up all GetElementPtr creation in LLVM to explicitly pass a value type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57173 llvm-svn: 352913	2019-02-01 20:44:47 +00:00
James Y Knight	7976eb5838	[opaque pointer types] Pass function types to CallInst creation. This cleans up all CallInst creation in LLVM to explicitly pass a function type rather than deriving it from the pointer's element-type. Differential Revision: https://reviews.llvm.org/D57170 llvm-svn: 352909	2019-02-01 20:43:25 +00:00
Adhemerval Zanella	b3ccc5550d	[AArch64] Optimize floating point materialization This patch changes isFPImmLegal to return if the value can be enconded as the immediate operand of a logical instruction besides checking if for immediate field for fmov. This optimizes some floating point materization, inclusive values used on isinf lowering. Reviewed By: rengolin, efriedma, evandro Differential Revision: https://reviews.llvm.org/D57044 llvm-svn: 352866	2019-02-01 12:26:06 +00:00
James Y Knight	13680223b9	[opaque pointer types] Add a FunctionCallee wrapper type, and use it. Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc doesn't choke on it, hopefully. Original Message: The FunctionCallee type is effectively a {FunctionType,Value} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaraction with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352827	2019-02-01 02:28:03 +00:00
James Y Knight	fadf25068e	Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it." This reverts commit `f47d6b38c7` (r352791). Seems to run into compilation failures with GCC (but not clang, where I tested it). Reverting while I investigate. llvm-svn: 352800	2019-01-31 21:51:58 +00:00
James Y Knight	f47d6b38c7	[opaque pointer types] Add a FunctionCallee wrapper type, and use it. The FunctionCallee type is effectively a {FunctionType,Value} pair, and is a useful convenience to enable code to continue passing the result of getOrInsertFunction() through to EmitCall, even once pointer types lose their pointee-type. Then: - update the CallInst/InvokeInst instruction creation functions to take a Callee, - modify getOrInsertFunction to return FunctionCallee, and - update all callers appropriately. One area of particular note is the change to the sanitizer code. Previously, they had been casting the result of `getOrInsertFunction` to a `Function*` via `checkSanitizerInterfaceFunction`, and storing that. That would report an error if someone had already inserted a function declaraction with a mismatching signature. However, in general, LLVM allows for such mismatches, as `getOrInsertFunction` will automatically insert a bitcast if needed. As part of this cleanup, cause the sanitizer code to do the same. (It will call its functions using the expected signature, however they may have been declared.) Finally, in a small number of locations, callers of `getOrInsertFunction` actually were expecting/requiring that a brand new function was being created. In such cases, I've switched them to Function::Create instead. Differential Revision: https://reviews.llvm.org/D57315 llvm-svn: 352791	2019-01-31 20:35:56 +00:00
Reid Kleckner	96c581d7d0	[AArch64] Include AArch64GenCallingConv.inc once Summary: Avoids duplicating generated static helpers for calling convention analysis. This also means you can modify AArch64CallingConv.td without recompiling the AArch64ISelLowering.cpp monolith, so it provides faster incremental rebuilds. Saves 12K in llc.exe, but adds a new object file, which is large. Reviewers: efriedma, t.p.northover Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56948 llvm-svn: 352430	2019-01-28 21:28:40 +00:00
Matt Arsenault	39508331ef	Reapply "IR: Add fp operations to atomicrmw" This reapplies commits r351778 and r351782 with RISCV test fixes. llvm-svn: 351850	2019-01-22 18:18:02 +00:00
Chandler Carruth	285fe716c5	Revert r351778: IR: Add fp operations to atomicrmw This broke the RISCV build, and even with that fixed, one of the RISCV tests behaves surprisingly differently with asserts than without, leaving there no clear test pattern to use. Generally it seems bad for hte IR to differ substantially due to asserts (as in, an alloca is used with asserts that isn't needed without!) and nothing I did simply would fix it so I'm reverting back to green. This also required reverting the RISCV build fix in r351782. llvm-svn: 351796	2019-01-22 10:29:58 +00:00
Matt Arsenault	bfdba5e4fc	IR: Add fp operations to atomicrmw Add just fadd/fsub for now. llvm-svn: 351778	2019-01-22 03:32:36 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Matt Arsenault	0cb08e448a	Allow FP types for atomicrmw xchg llvm-svn: 351427	2019-01-17 10:49:01 +00:00
Mandeep Singh Grang	33c49c0c82	[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally Summary: This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler. We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape. Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric Reviewed By: rnk, efriedma Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53540 llvm-svn: 351370	2019-01-16 19:52:59 +00:00
Eli Friedman	33aecc8182	[AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 . Otherwise, with D56544, the intrinsic will be expanded to an integer csel, which is probably not what the user expected. This matches the general convention of using "v1" types to represent scalar integer operations in vector registers. While I'm here, also add some error checking so we don't generate illegal ABS nodes. Differential Revision: https://reviews.llvm.org/D56616 llvm-svn: 351141	2019-01-15 00:15:24 +00:00
Bryan Chan	7ce5775e62	[AArch64] Fix operation actions for FP16 vector intrinsics Summary: This patch changes the legalization action for some half-precision floating- point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops are not supported in hardware for half-precision vectors, but promotion is not always possible (for v8f16 operands). Changing the action to Expand fixes an assertion failure in the legalizer when the frontend produces such ops. In addition, a quick microbenchmark shows that, in the v4f16 case, expanding introduces fewer spills and is therefore slightly faster than promoting. Reviewers: t.p.northover, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56296 llvm-svn: 350825	2019-01-10 15:02:37 +00:00
Tim Northover	964eea7ad2	AArch64: avoid splitting vector truncating stores. We have code to split vector splats (of zero and non-zero) for performance reasons, but it ignores the fact that a store might be truncating. Actually, truncating stores are formed for vNi8 and vNi16 types. Since the truncation is from a legal type, the size of the store is always <= 64-bits and so they don't actually benefit from being split up anyway, so this patch just disables that transformation. llvm-svn: 350620	2019-01-08 13:30:27 +00:00
Simon Pilgrim	148957f336	[AArch64] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349908	2018-12-21 15:05:10 +00:00
Kristof Beyls	e66bc1f756	Introduce control flow speculation tracking pass for AArch64 The pass implements tracking of control flow miss-speculation into a "taint" register. That taint register can then be used to mask off registers with sensitive data when executing under miss-speculation, a.k.a. "transient execution". This pass is aimed at mitigating against SpectreV1-style vulnarabilities. At the moment, it implements the tracking of miss-speculation of control flow into a taint register, but doesn't implement a mechanism yet to then use that taint register to mask off vulnerable data in registers (something for a follow-on improvement). Possible strategies to mask out vulnerable data that can be implemented on top of this are: - speculative load hardening to automatically mask of data loaded in registers. - using intrinsics to mask of data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/). For AArch64, the following implementation choices are made. Some of these are different than the implementation choices made in the similar pass implemented in X86SpeculativeLoadHardening.cpp, as the instruction set characteristics result in different trade-offs. - The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because: . The AArch64 ABI doesn't guarantee X16 to be retained across any call. . The only way to request X16 to be used as a programmer is through inline assembly. In the rare case a function explicitly demands to use X16/W16, this pass falls back to hardening against speculation by inserting a DSB SYS/ISB barrier pair which will prevent control flow speculation. - It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags. - The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed. - The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction selectors to not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction. - On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0. Future extensions/improvements could be: - Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking. Note that this pass already inserts the full speculation barriers if the function for some niche reason makes use of X16/W16. - no indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables. Differential Revision: https://reviews.llvm.org/D54896 llvm-svn: 349456	2018-12-18 08:50:02 +00:00
Arnaud A. de Grandmaison	dfe861087d	[AArch64] Catch some more CMN opportunities. Fixes https://bugs.llvm.org/show_bug.cgi?id=33486 llvm-svn: 349022	2018-12-13 10:31:32 +00:00
Matthias Braun	d041212c07	AArch64: Fix invalid CCMP emission The code emitting AND-subtrees used to check whether any of the operands was an OR in order to figure out if the result needs to be negated. However the OR could be hidden in further subtrees and not immediately visible. Change the code so that canEmitConjunction() determines whether the result of the generated subtree needs to be negated. Cleanup emission logic to use this. I also changed the code a bit to make all negation decisions early before we actually emit the subtrees. This fixes http://llvm.org/PR39550 Differential Revision: https://reviews.llvm.org/D54137 llvm-svn: 348444	2018-12-06 01:40:23 +00:00
Craig Topper	129d529ab3	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
David Blaikie	1fecbec5fa	AArch64ISelLowering: Remove a return-of-assignment to allow NRVO Patch by Arthur O'Dwyer! llvm-svn: 347609	2018-11-26 22:57:18 +00:00
John Brawn	d6e0ebea10	[AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop. Fix this by making LowerBUILD_VECTOR not do this transformation for those vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element constant vector. Doing that means we get a DUP, which we then need to recognise in ISel as a copy. llvm-svn: 347456	2018-11-22 11:45:23 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Eli Friedman	ad1151cf6a	[ARM64] [Windows] Handle funclets This patch adds support for funclets in frame lowering and ISel lowering. Together with D50288 and D50166, it enables C++ exception handling. Patch by Sanjin Sijaric, with some fixes by me. Differential Revision: https://reviews.llvm.org/D51524 llvm-svn: 346568	2018-11-09 23:33:30 +00:00
Mandeep Singh Grang	397765bc51	[COFF, ARM64] Add support for MSVC buffer security check Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan Reviewed By: rnk Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D54248 llvm-svn: 346469	2018-11-09 02:48:36 +00:00
Matthias Braun	96d12513a1	AArch64: Cleanup CCMP code; NFC Cleanup CCMP pattern matching code in preparation for review/bugfix: - Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()` (it won't accept arbitrary disjunctions and is really about whether we can transform the subtree into a conjunction that we can emit). - Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()` llvm-svn: 346203	2018-11-06 03:15:22 +00:00
Craig Topper	0b5f8169b0	[TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit. llvm-svn: 346180	2018-11-05 23:26:13 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Mandeep Singh Grang	df19e57a1c	[COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53962 llvm-svn: 345892	2018-11-01 21:23:47 +00:00
Reid Kleckner	eb56894a4b	[AArch64] Fix unintended fallthrough and strengthen cast This was added in r330630. GCC's -Wimplicit-fallthrough seems to not fire when the previous case contains a switch itself. This fallthrough was bening because the helper function implementing the case used dyn_cast to re-check the type of the node in question. After fixing the fallthrough, we can strengthen the cast. llvm-svn: 345864	2018-11-01 18:02:27 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Mandeep Singh Grang	71e0cc2a0b	[COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg Summary: Thunk functions in Windows are varag functions that call a musttail function to pass the arguments after the fixup is done. We need to make sure that we forward the arguments from the caller vararg to the callee vararg function. This is the same mechanism that is used for Windows on X86. Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53843 llvm-svn: 345641	2018-10-30 20:46:10 +00:00
David Greene	3e89fa8e08	[AArch64] Create proper memoperand for multi-vector stores Re-apply r345315 with testcase fixes. Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345631	2018-10-30 19:17:51 +00:00
Vlad Tsyrklevich	21beeb29ea	Revert "[AArch64] Create proper memoperand for multi-vector stores" This reverts commit r345315, it was causing test failures on sanitizer-x86_64-linux-fast. llvm-svn: 345356	2018-10-26 02:00:14 +00:00
David Greene	53e869da7d	[AArch64] Create proper memoperand for multi-vector stores Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345315	2018-10-25 21:10:39 +00:00
Thomas Lively	30f1d69115	[NFC] Rename minnan and maxnan to minimum and maximum Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218	2018-10-24 22:49:55 +00:00
Tim Northover	1c353419ab	AArch64: add a pass to compress jump-table entries when possible. llvm-svn: 345188	2018-10-24 20:19:09 +00:00
Simon Pilgrim	095a7fe635	[AARCH64] Improve vector popcnt lowering with ADDLP AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554	2018-10-15 21:15:58 +00:00
Arnaud A. de Grandmaison	162435e7b5	[AArch64] Swap comparison operands if that enables some folding. Summary: AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense. This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751 Reviewers: efriedma, t.p.northover, javed.absar Subscribers: mcrosier, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53067 llvm-svn: 344439	2018-10-13 07:43:56 +00:00
Nirav Dave	e40e2bbd37	[AArch64] Share search bookkeeping in combines. NFCI. Share predecessor search bookkeeping in both perform PostLD1Combine and performNEONPostLDSTCombine. This should be approximately a 4x and 2x performance improvement. llvm-svn: 342986	2018-09-25 15:30:22 +00:00
Tri Vo	6c47c62588	[AArch64] Support adding X[8-15,18] registers as CSRs. Summary: Specifying X[8-15,18] registers as callee-saved is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. As part of this patch we: - use custom CSR list/mask when user specifies custom CSRs - update Machine Register Info's list of CSRs with additional custom CSRs in LowerCall and LowerFormalArguments. Reviewers: srhines, nickdesaulniers, efriedma, javed.absar Reviewed By: nickdesaulniers Subscribers: kristof.beyls, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52216 llvm-svn: 342824	2018-09-22 22:17:50 +00:00
Alex Bradbury	79518b02cd	[AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they will see no functional change. Previously this hook returned bool, but it now returns AtomicExpansionKind. This hook allows targets to select how a given cmpxchg is to be expanded. D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic. See my associated RFC for more info on the motivation for this change <http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>. Differential Revision: https://reviews.llvm.org/D48130 llvm-svn: 342550	2018-09-19 14:51:42 +00:00
Sander de Smalen	4dbc512676	[AArch64] Add parsing of aarch64_vector_pcs attribute. This patch adds parsing support for the 'aarch64_vector_pcs' calling convention attribute to calls and function declarations. More information describing the vector ABI and procedure call standard can be found here: https://developer.arm.com/products/software-development-tools/\ hpc/arm-compiler-for-hpc/vector-function-abi Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D51477 llvm-svn: 342030	2018-09-12 08:54:06 +00:00
Nick Desaulniers	287a3be379	[AArch64] Support reserving x1-7 registers. Summary: Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7. Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma Reviewed By: nickdesaulniers, efriedma Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48580 llvm-svn: 341706	2018-09-07 20:58:57 +00:00
JF Bastien	2920061105	ARM64: improve non-zero memset isel by ~2x Summary: I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated where the generated code was bad. This patch fixes the majority of the issues by requesting that a 2xi64 vector be used for memset of 32 bytes and above. The patch leaves the former request for f128 unchanged, despite f128 materialization being suboptimal: doing otherwise runs into other asserts in isel and makes this patch too broad. This patch hides the issue that was present in bzero_40_stack and bzero_72_stack because the code now generates in a better order which doesn't have the store offset issue. I'm not aware of that issue appearing elsewhere at the moment. <rdar://problem/44157755> Reviewers: t.p.northover, MatzeB, javed.absar Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51706 llvm-svn: 341558	2018-09-06 16:03:32 +00:00
JF Bastien	da33900b95	NFC: improve ARM64 isFPImmLegal debug print Forking this change from D51706. This just made it easier to understand llc output with -debug. llvm-svn: 341504	2018-09-05 23:38:11 +00:00
Martin Storsjo	fed420d6b6	[MinGW] [AArch64] Add stubs for potential automatic dllimported variables The runtime pseudo relocations can't handle the AArch64 format PC relative addressing in adrp+add/ldr pairs. By using stubs, the potentially dllimported addresses can be touched up by the runtime pseudo relocation framework. Differential Revision: https://reviews.llvm.org/D51452 llvm-svn: 341401	2018-09-04 20:56:21 +00:00
Martin Storsjo	5c984fb16d	[AArch64] Simplify code in LowerGlobalAddress. NFCI. When initial support for dllimport was added for aarch64 in SVN r316555, ClassifyGlobalReference didn't set the MO_DLLIMPORT flag - that was only completed in SVN r323810. Reuse the return value from ClassifyGlobalReference for this purpose as well. llvm-svn: 341310	2018-09-03 11:59:23 +00:00
Eli Friedman	071203bbf2	[AArch64] Reject inline asm with FP registers when FP is disabled. Otherwise, we would crash trying to deal with an illegal input. Differential Revision: https://reviews.llvm.org/D51202 llvm-svn: 340637	2018-08-24 19:12:13 +00:00
David Green	9dd1d451d9	[AArch64] Add Tiny Code Model for AArch64 This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397	2018-08-22 11:31:39 +00:00
Chandler Carruth	66654b72c9	[SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740	2018-08-14 23:30:32 +00:00
Eli Friedman	0d12e90bf5	[ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly. Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734	2018-08-14 22:10:25 +00:00
Bryan Chan	e023706471	[AArch64] Fix assertion failure on widened f16 BUILD_VECTOR Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013	2018-08-06 14:14:41 +00:00
Craig Topper	2f60ef2c78	[DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc. The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation. llvm-svn: 338329	2018-07-30 23:22:00 +00:00
Craig Topper	a568a27dfa	[DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2. llvm-svn: 338303	2018-07-30 21:04:34 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Tim Northover	70666e7765	[AArch64] Implement FLT_ROUNDS macro. Very similar to ARM implementation, just maps to an MRS. Should fix PR25191. Patch by Michael Brase. llvm-svn: 335118	2018-06-20 12:09:01 +00:00
Petr Hosek	7250908016	[AArch64] Support reserving x20 register Register x20 is a callee-saved register which may be used for other purposes in certain contexts, for example to hold special variables within the kernel. This change adds support for reserving this register both to frontend and backend to make this register usable for these purposes. Differential Revision: https://reviews.llvm.org/D46552 llvm-svn: 334531	2018-06-12 20:00:50 +00:00
Evandro Menezes	f8425340e4	[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change makes the inlining of `memset()` and `memcpy()` more aggressive when compiling for speed. The tuning remains the same when optimizing for size. Patch by: Sebastian Pop <s.pop@samsung.com> Evandro Menezes <e.menezes@samsung.com> Differential revision: https://reviews.llvm.org/D45098 llvm-svn: 333429	2018-05-29 15:58:50 +00:00
Sirish Pande	cabe50a308	[AArch64] Gangup loads and stores for pairing. Keep loads and stores together (target defines how many loads and stores to gang up), such that it will help in pairing and vectorization. Differential Revision https://reviews.llvm.org/D46477 llvm-svn: 332482	2018-05-16 15:36:52 +00:00
Peter Smith	c811758da6	[AArch64] Support "S" inline assembler constraint This patch re-introduces the "S" inline assembler constraint. This matches an absolute symbolic address or a label reference. The primary use case is asm("adrp %0, %1\n\t" "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var)); I say re-introduces as it seems like "S" was implemented in the original AArch64 backend, but it looks like it wasn't carried forward to the merged backend. The original implementation had A and L modifiers that could be used to print ":lo12:" to the string. It looks like gcc doesn't use these and :lo12: is expected to be written in the inline assembly string so I've not implemented A and L. Clang already supports the S modifier. Fixes PR37180 Differential Revision: https://reviews.llvm.org/D46745 llvm-svn: 332444	2018-05-16 09:33:25 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Geoff Berry	60460268c0	[AArch64] Fix performPostLD1Combine to check for constant lane index. Summary: performPostLD1Combine in AArch64ISelLowering looks for vector insert_vector_elt of a loaded value which it can optimize into a single LD1LANE instruction. The code checking for the pattern was not checking if the lane index was a constant which could cause two problems: - an assert when lowering the LD1LANE ISD node since it assumes an constant operand - an assert in isel if the lane index value depends on the post-incremented base register Both of these issues are avoided by simply checking that the lane index is a constant. Fixes bug 35822. Reviewers: t.p.northover, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46591 llvm-svn: 332103	2018-05-11 16:25:06 +00:00
Haicheng Wu	0aae2bc260	[CGP] Split large data structres to sink more GEPs Accessing the members of a large data structures needs a lot of GEPs which usually have large offsets due to the size of the underlying data structure. If the offsets are too large to fit into the r+i addressing mode, these GEPs cannot be sunk to their users' blocks and many extra registers are needed then to carry the values of these GEPs. This patch tries to split a large data struct starting from %base like the following. Before: BB0: %base = BB1: %gep0 = gep %base, off0 %gep1 = gep %base, off1 %gep2 = gep %base, off2 BB2: %load1 = load %gep0 %load2 = load %gep1 %load3 = load %gep2 After: BB0: %base = %new_base = gep %base, off0 BB1: %new_gep0 = %new_base %new_gep1 = gep %new_base, off1 - off0 %new_gep2 = gep %new_base, off2 - off0 BB2: %load1 = load i32, i32* %new_gep0 %load2 = load i32, i32* %new_gep1 %load3 = load i32, i32* %new_gep2 In the above example, the struct is split into two parts. The first part still starts from %base and the second part starts from %new_base. After the splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to BB2 and folded into their users. The algorithm to split data structure is simple and very similar to the work of merging SExts. First, it collects GEPs that have large offsets when iterating the blocks. Second, it splits the underlying data structures and updates the collected GEPs to use smaller offsets. Differential Revision: https://reviews.llvm.org/D42759 llvm-svn: 332015	2018-05-10 18:27:36 +00:00
Craig Topper	781aa181ab	Fix a bunch of places where operator-> was used directly on the return from dyn_cast. Inspired by r331508, I did a grep and found these. Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa. llvm-svn: 331577	2018-05-05 01:57:00 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Adhemerval Zanella	a57ef17ab6	[AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32 This patch adds a custom lowering for ISD::MULH{S,U} used on divide by constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV). New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL can be correctly lowered to smull2 and umull2. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46009 llvm-svn: 331522	2018-05-04 14:33:55 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Peter Collingbourne	5ab4a4793e	Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure. This reland includes a check to prevent the DAG combiner from folding an offset that is smaller than the existing one. This can cause oscillations between two possible DAGs, which was the cause of the hang and later assertion failure observed on the lnt-ctmark-aarch64-O3-flto bot. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ Original commit message: > This is a code size win in code that takes offseted addresses > frequently, such as C++ constructors that typically need to compute > an offseted address of a vtable. This reduces the size of Chromium > for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 330630	2018-04-23 19:09:34 +00:00
Peter Collingbourne	ab40e44ba1	Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses." Caused a hang and eventually an assertion failure in LTO builds of 7zip-benchmark on aarch64 iOS targets. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ llvm-svn: 330063	2018-04-13 20:21:00 +00:00
Peter Collingbourne	00db326b0d	AArch64: Introduce a DAG combine for folding offsets into addresses. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. This reduces the size of Chromium for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329956	2018-04-12 21:23:55 +00:00
Amara Emerson	e27d5016ef	[AArch64] Fix isel failure when BUILD_PAIR nodes are left over. rdar://39175175 llvm-svn: 329743	2018-04-10 19:01:58 +00:00
Peter Collingbourne	a7d936f0c0	Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF." Caused a build failure in check-tsan. llvm-svn: 329718	2018-04-10 16:19:30 +00:00
Peter Collingbourne	5cff2409ae	AArch64: Allow offsets to be folded into addresses with ELF. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access). Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329611	2018-04-09 19:59:57 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
John Brawn	e3b44f9de6	[AArch64] Don't reduce the width of loads if it prevents combining a shift Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321	2018-03-23 14:47:07 +00:00
Martin Storsjo	9a55c1b0dc	[ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897	2018-03-19 20:06:50 +00:00
Martin Storsjo	36d6419cc5	[AArch64] Skip an unnecessary getCopyToReg in DYNAMIC_STACKALLOC Differential Revision: https://reviews.llvm.org/D44586 llvm-svn: 327779	2018-03-17 20:08:48 +00:00
Martin Storsjo	bde677289a	[AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC Support for this relocation is missing in both LLD and GNU binutils at the moment. This reverts the ELF parts of SVN r327316. llvm-svn: 327503	2018-03-14 13:09:10 +00:00
Martin Storsjo	7bc64bd889	[AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str Differential Revision: https://reviews.llvm.org/D44355 llvm-svn: 327316	2018-03-12 18:47:43 +00:00
Martin Storsjo	cc24096d4d	[AArch64] Implement native TLS for Windows Differential Revision: https://reviews.llvm.org/D43971 llvm-svn: 327220	2018-03-10 19:05:21 +00:00
Sebastian Pop	41073e8046	[AArch64] define isExtractSubvectorCheap Following the ARM-neon backend, define isExtractSubvectorCheap to return true when extracting low and high part of a neon register. The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive" all DAG transforms until ISelLowering. The testcase is supposed to check that AArch64TargetLowering::ReconstructShuffle() works, and for that we need a BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to ISelLowering. As there is no way to disable the combiner to only exercise the code in ISelLowering, the patch disables the testcase. Differential revision: https://reviews.llvm.org/D43973 llvm-svn: 326811	2018-03-06 16:54:55 +00:00
Sebastian Pop	ac0bfb5938	fix PR36582 The error occurs when reading i16 elements (as in the testcase) from a v8i8 with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the operation is not a VUZP. The patch stops the pattern recognition of VUZP when EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR. llvm-svn: 326722	2018-03-05 17:35:49 +00:00
Evandro Menezes	cd855f70c5	[AArch64] Improve code generation of constant vectors Use the whole gammut of constant immediates available to set up a vector. Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which transfers between register files, use the more efficient `movi v0.4s, #-1` instead. Not limited to just a few values, but any immediate value that can be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating the need to there be patterns to optimize special cases. Differential revision: https://reviews.llvm.org/D42133 llvm-svn: 326718	2018-03-05 17:02:47 +00:00
Evandro Menezes	2bbb4a7c93	[AArch64] Clean up code (NFC) Clean up a couple of functions in `AArch64TargetLowering` by removing redundant statements. llvm-svn: 326486	2018-03-01 21:17:36 +00:00
Sebastian Pop	c33af715d7	[AArch64] generate vuzp instead of mov when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the BUILD_VECTOR with either vuzp1 or vuzp2. With this patch LLVM generates the following code for the first function fun1 in the testcase: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b ext v1.16b, v0.16b, v0.16b, #8 uzp1 v0.8b, v0.8b, v1.8b str d0, [x8] ret Without this patch LLVM currently generates this code: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b mov v1.16b, v0.16b mov v1.b[1], v0.b[2] mov v1.b[2], v0.b[4] mov v1.b[3], v0.b[6] mov v1.b[4], v0.b[8] mov v1.b[5], v0.b[10] mov v1.b[6], v0.b[12] mov v1.b[7], v0.b[14] str d1, [x8] ret llvm-svn: 326443	2018-03-01 15:47:39 +00:00
Chih-Hung Hsieh	9f9e4681ac	[TLS] use emulated TLS if the target supports only this mode Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, missing -emulated-tls flag would generate wrong TLS code for targets that supports only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341	2018-02-28 17:48:55 +00:00
Evandro Menezes	72f3983633	[AArch64] Refactor instructions using SIMD immediates Get rid of icky goto loops and make the code easier to maintain. Otherwise, NFC. Restore r324903 and fix PR36369. Differentail revision: https://reviews.llvm.org/D43364 llvm-svn: 325621	2018-02-20 20:31:45 +00:00
Martin Storsjo	a63a5b993e	[AArch64] Implement dynamic stack probing for windows This makes sure that alloca() function calls properly probe the stack as needed. Differential Revision: https://reviews.llvm.org/D42356 llvm-svn: 325433	2018-02-17 14:26:32 +00:00
Evandro Menezes	10ae20d80c	[AArch64] Fix BITCAST lowering crash The data type is assumed to be a vector, but sometimes it is not, leading to an assertion. Add simple test-case to verify this. Differential revision: https://reviews.llvm.org/D42599 llvm-svn: 325378	2018-02-16 20:00:57 +00:00
Hans Wennborg	f381e94ac8	Revert r324903 "[AArch64] Refactor identification of SIMD immediates" It caused "Cannot select: t33: f64 = AArch64ISD::FMOV Constant:i32<0>" in Chromium builds. See PR36369. > Get rid of icky goto loops and make the code easier to maintain (NFC). > > Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 325034	2018-02-13 18:14:38 +00:00
Oliver Stannard	02f08c9d1f	[AArch64] Improve v8.1-A code-gen for atomic load-and Armv8.1-A added an atomic load-clear instruction (which performs bitwise and with the complement of it's operand), but not a load-and instruction. Our current code-generation for atomic load-and always inserts an MVN instruction to invert its argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-and operation into an xor with -1 and a load-clear, allowing the normal DAG optimisations to work on it. To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't see any easy way to do this with an AArch64-specific ISD node, because the code-generation for atomic operations assumes the SDNodes are of type AtomicSDNode. I've left the old tablegen patterns in because they are still needed for global isel. Differential revision: https://reviews.llvm.org/D42478 llvm-svn: 324908	2018-02-12 17:03:11 +00:00
Evandro Menezes	7dc0f1ec45	[AArch64] Refactor identification of SIMD immediates Get rid of icky goto loops and make the code easier to maintain (NFC). Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 324903	2018-02-12 16:41:41 +00:00
Oliver Stannard	4269917304	[AArch64] Improve v8.1-A code-gen for atomic load-subtract Armv8.1-A added an atomic load-add instruction, but not a load-subtract instruction. Our current code-generation for atomic load-subtract always inserts a NEG instruction to negate it's argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-subtract operation into a subtract and a load-add, allowing the normal DAG optimisations to work on it. I've left the old tablegen patterns in because they are still needed for global isel. Some of the tests in this patch are copied from D35375 by Chad Rosier (which was abandoned). Differential revision: https://reviews.llvm.org/D42477 llvm-svn: 324892	2018-02-12 14:22:03 +00:00
Sjoerd Meijer	5ea465ded7	[AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled. I've not added any tests, because the problem was visible in: test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll, which I had to change: I don't think Cyclone has FullFP16 enabled by default, so it shouldn't be using this v8.2a instruction. I've also removed these rdar tags, please shout if there are any objections. Differential Revision: https://reviews.llvm.org/D43020 llvm-svn: 324581	2018-02-08 08:39:05 +00:00
Sanjay Patel	d7bed12192	[AArch64] remove bogus comment; NFC I added this comment with D42323, but as discussed in D42806, the architecture does the right thing for denorms. We don't even need the select on 0.0 here? llvm-svn: 323996	2018-02-01 19:59:33 +00:00
Sanjay Patel	657e5d8d41	[DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994) As shown in the example in PR34994: https://bugs.llvm.org/show_bug.cgi?id=34994 ...we can return a very wrong answer (inf instead of 0.0) for square root when using a reciprocal square root estimate instruction. Here, I've conditionalized the filtering out of denorms based on the function having "denormal-fp-math"="ieee" in its attributes. The other options for this attribute are 'preserve-sign' and 'positive-zero'. So we don't generate this extra code by default with just '-ffast-math' (because then there's no denormal attribute string at all), but it works if you specify '-ffast-math -fdenormal-fp-math=ieee' from clang. As noted in the review, there may be other problems in clang that affect the results depending on platform (Linux x86 at least), but this should allow creating the desired codegen. Differential Revision: https://reviews.llvm.org/D42323 llvm-svn: 323981	2018-02-01 16:57:18 +00:00
Jun Bum Lim	fc7d56d949	Revert "AArch64: Omit callframe setup/destroy when not necessary" This reverts commit r322917 due to multiple performance regressions in spec2006 and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially motivated this change. llvm-svn: 323683	2018-01-29 19:56:42 +00:00
Oliver Stannard	a9d2e004d2	[AArch64] Generate the CASP instruction for 128-bit cmpxchg The Large System Extension added an atomic compare-and-swap instruction that operates on a pair of 64-bit registers, which we can use to implement a 128-bit cmpxchg. Because i128 is not a legal type for AArch64 we have to do all of the instruction selection in C++, and the instruction requires even/odd register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM backend. Differential revision: https://reviews.llvm.org/D42104 llvm-svn: 323634	2018-01-29 09:18:37 +00:00
Joel Jones	0715092c65	[AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others. This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma). Test case demonstrating the effects on T99 is included. Patch by: steleman (Stefan Teleman) Differential Revision: https://reviews.llvm.org/D40696 llvm-svn: 323474	2018-01-25 21:55:39 +00:00
Pablo Barrio	9b3d4c01a0	[AArch64] Avoid unnecessary vector byte-swapping in big-endian Summary: Loads/stores of some NEON vector types are promoted to other vector types with different lane sizes but same vector size. This is not a problem in little-endian but, when in big-endian, it requires additional byte reversals required to preserve the lane ordering while keeping the right endianness of the data inside each lane. For example: %1 = load <4 x half>, <4 x half>* %p results in the following assembly: ld1 { v0.2s }, [x1] rev32 v0.4h, v0.4h This patch changes the promotion of these loads/stores so that the actual vector load/store (LD1/ST1) takes care of the endianness correctly and there is no need for further byte reversals. The previous code now results in the following assembly: ld1 { v0.4h }, [x1] Reviewers: olista01, SjoerdMeijer, efriedma Reviewed By: efriedma Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D42235 llvm-svn: 323325	2018-01-24 14:13:47 +00:00
Carey Williams	da15b5b116	[AArch64] optimise v4f16 fcmps to utilise vector instructions Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported. Generating FCTVL(s) rather than a longer series of FCVTs. Differential Revision: https://reviews.llvm.org/D41772 llvm-svn: 323118	2018-01-22 14:16:11 +00:00
Carey Williams	22c49c6470	Test commit llvm-svn: 322958	2018-01-19 16:55:23 +00:00
Matthias Braun	5c290dc206	AArch64: Fix emergency spillslot being out of reach for large callframes Re-commit of r322200: The testcase shouldn't hit machineverifiers anymore with r322917 in place. Large callframes (calls with several hundreds or thousands or parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322919	2018-01-19 03:16:36 +00:00
Matthias Braun	dc4b3e87f4	AArch64: Omit callframe setup/destroy when not necessary Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to setup and the callframe size is 0. - Fixes an invalid callframe nesting for byval arguments, which would look like this before this patch (as in `big-byval.ll`): ... ADJCALLSTACKDOWN 32768, 0, ... # Setup for extfunc ... ADJCALLSTACKDOWN 0, 0, ... # setup for memcpy ... BL &memcpy ... ADJCALLSTACKUP 0, 0, ... # destroy for memcpy ... BL &extfunc ADJCALLSTACKUP 32768, 0, ... # destroy for extfunc - Saves us two instructions in the common case of zero-sized stackframes. - Remove an unnecessary scheduling barrier (hence the small unittest changes). Differential Revision: https://reviews.llvm.org/D42006 llvm-svn: 322917	2018-01-19 02:45:38 +00:00
Matthias Braun	e3a8db7ba1	Revert "AArch64: Fix emergency spillslot being out of reach for large callframes" Revert for now as the testcase is hitting a pre-existing verifier error that manifest as a failure when expensive checks are enabled (or -verify-machineinstrs) is used. This reverts commit r322200. llvm-svn: 322231	2018-01-10 22:36:28 +00:00
Matthias Braun	b42ffa1283	AArch64: Fix emergency spillslot being out of reach for large callframes Large callframes (calls with several hundreds or thousands or parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). Which made no sense to me so I took this case out: Even though the emergency spillslot is technically not referenced by FP in this case we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322200	2018-01-10 18:16:24 +00:00
Craig Topper	a4f9997675	[SelectionDAG][X86][AArch64] Require targets to specify the promotion type when using setOperationAction Promote for INT_TO_FP and FP_TO_INT Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromote to just adds 1 to the SimpleVT, which has the affect of increasing the element count while keeping the scalar size the same. If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times. And fp_to_int will keep trying wider types in a loop until it finds one that works. getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening. FWIW, it's worth I think for any other vector operations that need to be promoted, we have to specify the type explicitly because the default behavior of getTypeToPromote isn't useful for vectors. The other types of promotion already require either the element count is constant or the total vector width is constant, but neither happens by incrementing the SimpleVT enum. Differential Revision: https://reviews.llvm.org/D40664 llvm-svn: 321629	2018-01-01 19:21:35 +00:00
Matthias Braun	a4852d2c19	X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI Note: - X86ISelLowering: setLibcallName(SINCOS) was superfluous as InitLibcalls() already does it. - ARMISelLowering: Setting libcallnames for sincos/sincosf seemed superfluous as in the darwin case it wouldn't be used while for all other cases InitLibcalls already does it. llvm-svn: 321036	2017-12-18 23:19:42 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
Matt Arsenault	1117133687	DAG: Expose all MMO flags in getTgtMemIntrinsic Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferencable set, but currently there is no way to do that directly during the initial intrinsic lowering. llvm-svn: 320746	2017-12-14 21:39:51 +00:00
Joel Galenson	3e40883e4c	[AArch64] Do not abort if overflow check does not use EQ or NE. As suggested by Eli Friedman, instead of aborting if an overflow check uses something other than SETEQ or SETNE, simply do not apply the optimization. Differential Revision: https://reviews.llvm.org/D39147 llvm-svn: 319837	2017-12-05 21:33:12 +00:00
Martin Storsjo	eca862de07	[AArch64] Allow using emulated tls on platforms other than ELF This matches how it is done on X86. This allows using emulated tls on windows; in MinGW environments, native tls isn't supported at the moment. Set the right Data*bitsDirective for windows to match the existing tests for other platforms. Make parts of the existing tests a regex, to allow matching .section .rdata for windows, to avoid having to duplicate the rest of the tests for windows. Differential Revision: https://reviews.llvm.org/D40770 llvm-svn: 319644	2017-12-04 09:09:04 +00:00
Nirav Dave	db77e57ea8	[DAG] Do MergeConsecutiveStores again before Instruction Selection Summary: Now that store-merge is only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well. Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33675 llvm-svn: 319036	2017-11-27 15:28:15 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Martin Storsjo	4629f52312	[ARM, AArch64] Fix an assert message, Darwin isn't the only target supporting TLS. NFC. llvm-svn: 318184	2017-11-14 19:57:59 +00:00
David Blaikie	3f833edc7c	Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647	2017-11-08 01:01:31 +00:00
Evandro Menezes	9dcf099944	[AArch64] Fix the number of iterations for the Newton series The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue. Differential revision: https://reviews.llvm.org/D39507 llvm-svn: 317349	2017-11-03 18:56:36 +00:00
Martin Storsjo	373c8efa1e	[AArch64] Add support for dllimport of values and functions Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer. This is based on SVN r212431 and r212430 where the same was implemented for ARM. Differential Revision: https://reviews.llvm.org/D38530 llvm-svn: 316555	2017-10-25 07:25:18 +00:00
Amara Emerson	24ca39ce71	[AArch64] Improve codegen for inverted overflow checking intrinsics E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the inverted condition code for the CSEL. rdar://28495949 Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D38160 llvm-svn: 315205	2017-10-09 15:15:09 +00:00
Geoff Berry	bb23df92b5	[AArch64] Fix bug in store of vector 0 DAGCombine. Summary: Avoid using XZR/WZR directly as operands to split stores of zero vectors. Doing so can lead to the XZR/WZR being used by an instruction that doesn't allow it (e.g. add). Fixes bug 34674. Reviewers: t.p.northover, efriedma, MatzeB Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38146 llvm-svn: 313916	2017-09-21 21:10:06 +00:00
Sjoerd Meijer	0c5ba21cbf	[AArch64] allow v8f16 types when FullFP16 is supported This adds support for allowing v8f16 vector types, thus avoiding conversions from/to single precision for these types. This is a follow up patch of commits r311154 and r312104, which added support for scalars and v4f16 types, respectively. Differential Revision: https://reviews.llvm.org/D37802 llvm-svn: 313351	2017-09-15 09:24:48 +00:00
Petr Hosek	c35fe2b70b	[Fuchsia] Magenta -> Zircon Fuchsia's lowest API layer has been renamed from Magenta to Zircon. In LLVM proper, this is only mentioned in comments. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D37763 llvm-svn: 313105	2017-09-13 01:18:06 +00:00
Sjoerd Meijer	bafde8f3e3	[AArch64] ISel: Add some debug messages to LowerBUILDVECTOR. NFC. Differential Revision: https://reviews.llvm.org/D37676 llvm-svn: 313017	2017-09-12 10:24:12 +00:00
Sjoerd Meijer	be5b60f735	[AArch64] allow v4f16 types when FullFP16 is supported Support for scalars was committed in r311154, this adds support for allowing v4f16 vector types (thus avoiding conversions from/to single precision for these types). Differential Revision: https://reviews.llvm.org/D37145 llvm-svn: 312104	2017-08-30 08:38:13 +00:00
Sjoerd Meijer	b0eb5fb317	[AArch64] Add FMOVH0: materialize 0 using zero register for f16 values Instead of loading 0 from a constant pool, it's of course much better to materialize it using an fmov and the zero register. Thanks to Ahmed Bougacha for the suggestion. Differential Revision: https://reviews.llvm.org/D37102 llvm-svn: 311662	2017-08-24 14:47:06 +00:00
Sjoerd Meijer	afc2cd3c9e	[AArch64] Custom lowering of copysign f16 This is a follow up patch of r311154 and introduces custom lowering of copysign f16 to avoid promotions to single precision types when the subtarget supports fullfp16. Differential Revision: https://reviews.llvm.org/D36893 llvm-svn: 311646	2017-08-24 09:21:10 +00:00
Sjoerd Meijer	046a969360	[AArch64] fix for fcos and frem f16 promotion Fix for copy-paste mistake in r311154; setOperationAction for fcos and frem f16 operands appeared twice (and it should be set to 'promote'). Differential Revision: https://reviews.llvm.org/D37071 llvm-svn: 311635	2017-08-24 07:43:52 +00:00
Krasimir Georgiev	3d55cef48b	[AArch64] Silence unused variable warning in opt mode after r311533 llvm-svn: 311535	2017-08-23 08:40:22 +00:00
Sjoerd Meijer	24c98189ed	[AArch64] ISel legalization debug messages. NFCI. Debugging AArch64 instruction legalization and custom lowering is really an unpleasant experience because it shows nodes that appear out of thin air. In commit r311444, some debug messages have been added to SelectionDAG, the target independent part, and this patch adds some AArch64 specific messages. Differential Revision: https://reviews.llvm.org/D36964 llvm-svn: 311533	2017-08-23 08:18:37 +00:00
Sjoerd Meijer	b9de2b4871	[AArch64] Cleanup of HasFullFP16 argument. NFC. This is a clean up of commit r311154; it's not necessary to pass HasFullFP16 as an argument, instead just query the DAG. Differential Revision: https://reviews.llvm.org/D36978 llvm-svn: 311438	2017-08-22 09:21:08 +00:00
Alex Bradbury	080f6976c0	Use report_fatal_error for unsupported calling conventions The calling convention can be specified by the user in IR. Failing to support a particular calling convention isn't a programming error, and so relying on llvm_unreachable to catch and report an unsupported calling convention is not appropriate. Differential Revision: https://reviews.llvm.org/D36830 llvm-svn: 311435	2017-08-22 09:11:41 +00:00
Sjoerd Meijer	ec9581e5e0	[AArch64] Do not promote f16 when subtarget HasFullFP16 Armv8.2-A adds FP16 support, i.e. f16 is not only a storage-only type, but it also supports performing data processing on 16-bit floating-point quantities. All the necessary (tablegen) groundwork of adding the ARMv8.2-A FP16 (scalar) instructions was done in D15014. To take advantage of this, this patch avoids promotion of f16 to f32 types when the subtarget supports FullFP16, which enables instruction selection of these FP16 instructions. Differential Revision: https://reviews.llvm.org/D36396 llvm-svn: 311154	2017-08-18 10:51:14 +00:00
Craig Topper	4e22ee6745	[ConstantInt] Use ConstantInt::getValue instead of Constant::getUniqueInteger in a few places where we obviously have a ConstantInt. NFC getUniqueInteger will ultimately call ConstantInt::getValue, but calling ConstantInt::getValue should be inlined. llvm-svn: 310069	2017-08-04 16:59:29 +00:00
Haicheng Wu	50692a203c	[AArch64] Fix a typo in isExtFreeImpl() next => not Differential Revision: https://reviews.llvm.org/D36104 llvm-svn: 309748	2017-08-01 21:26:45 +00:00
Ahmed Bougacha	87807c5a86	[AArch64] Fix legality info passed to demanded bits for TBI opt. The (seldom-used) TBI-aware optimization had a typo lying dormant since it was first introduced, in r252573: when asking for demanded bits, it told TLI that it was running after legalize, where the opposite was true. This is an important piece of information, that the demanded bits analysis uses to make assumptions about the node. r301019 added such an assumption, which was broken by the TBI combine. Instead, pass the correct flags to TLO. llvm-svn: 309323	2017-07-27 21:27:25 +00:00
Peter Collingbourne	081ffe2ff2	Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI. This was a use-after-free waiting to happen. llvm-svn: 309159	2017-07-26 19:15:29 +00:00
Zvi Rackover	1b73682243	TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. Changing mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>. This came up in D35700 where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code. llvm-svn: 309085	2017-07-26 08:06:58 +00:00
Martin Storsjo	8cb3667541	[AArch64] Reserve a 16 byte aligned amount of fixed stack for win64 varargs Create a dummy 8 byte fixed object for the unused slot below the first stored vararg. Alternative ideas tested but skipped: One could try to align the whole fixed object to 16, but I haven't found how to add an offset to the stack frame used in LowerWin64_VASTART. If only the size of the fixed stack object size is padded but not the offset, via MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false), PrologEpilogInserter crashes due to "Attempted to reset backwards range!". This fixes misconceptions about where registers are spilled, since AArch64FrameLowering.cpp assumes the offset from fixed objects is aligned to 16 bytes (and the Win64 case there already manually aligns the offset to 16 bytes). This fixes cases where local stack allocations could overwrite callee saved registers on the stack. Differential Revision: https://reviews.llvm.org/D35720 llvm-svn: 308950	2017-07-25 05:20:01 +00:00
Jonas Paulsson	024e319489	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Martin Storsjo	2f24e93481	[AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well Rename the enum value from X86_64_Win64 to plain Win64. The symbol exposed in the textual IR is changed from 'x86_64_win64cc' to 'win64cc', but the numeric value is kept, keeping support for old bitcode. Differential Revision: https://reviews.llvm.org/D34474 llvm-svn: 308208	2017-07-17 20:05:19 +00:00
Geoff Berry	b1e8714af9	[AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1) Summary: This patch is the first step in reducing HW prefetcher instruction tag collisions in inner loops for Falkor. It adds a pass that annotates IR loads with metadata to indicate that they are known to be strided loads, and adds a target lowering hook that translates this metadata to a target-specific MachineMemOperand flag. A follow on change will use this MachineMemOperand flag to re-write instructions to reduce tag collisions. Reviewers: mcrosier, t.p.northover Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34963 llvm-svn: 308059	2017-07-14 21:44:12 +00:00
Martin Storsjo	68266faa31	[AArch64] Implement support for windows style vararg functions Pass parameters properly in calls to such functions (pass all floats in integer registers), and handle va_start properly (allocate stack immediately below the arguments on the stack, to save the register arguments into a single continuous array). Differential Revision: https://reviews.llvm.org/D35006 llvm-svn: 307928	2017-07-13 17:03:12 +00:00
Matthew Simpson	06e6a6bdff	[AArch64] Add preliminary support for ARMv8.1 SUB/AND atomics This patch is a follow-up to r305893 and adds preliminary support for the fetch_sub and fetch_and operations. llvm-svn: 307913	2017-07-13 15:01:23 +00:00
Joel Jones	7466ccfc59	Doxygen formatting. NFCI llvm-svn: 307597	2017-07-10 22:11:50 +00:00
Dehao Chen	38f1bc7834	Fix the bug when handling shufflevector for aarch64. Summary: This Fixes https://bugs.llvm.org/show_bug.cgi?id=33600 Reviewers: mssimpso, davidxl, Carrot Reviewed By: mssimpso Subscribers: aemerson, rengolin, sanjoy, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D34641 llvm-svn: 306334	2017-06-26 21:33:51 +00:00
Christof Douma	c1c28051d2	[AARCH64][LSE] Preliminary support for ARMv8.1 LSE Atomics. Implemented support to AArch64 codegen for ARMv8.1 Large System Extensions atomic instructions. Where supported, these instructions can provide atomic operations with higher performance. Currently supported operations include: fetch_add, fetch_or, fetch_xor, fetch_smin, fetch_min/max (signed and unsigned), swap, and compare_exchange. This implementation implies sequential-consistency ordering, more relaxed ordering is under development. Subtarget->hasLSE is currently supported for Cavium ThunderX2T99. Patch by Ananth Jasty. Differential Revision: https://reviews.llvm.org/D33586 Change-Id: I82f6d3d64255622791ceb0715b7ab9f4dc4d4b2c llvm-svn: 305893	2017-06-21 10:58:31 +00:00
Nirav Dave	85e92223b4	[AArch64] Add indexed check to splitStores. NFC. Add explicit check for unhandled cases in preparation for delaying splitStores to post-legalization. llvm-svn: 305471	2017-06-15 14:47:44 +00:00
Chandler Carruth	6bda14b313	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Craig Topper	f6d4dc5b4a	[SelectionDAG] Set ISD::FPOWI to Expand by default Summary: Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie". This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default. Reviewers: spatel, RKSimon, efriedma Reviewed By: RKSimon Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits Differential Revision: https://reviews.llvm.org/D33530 llvm-svn: 304215	2017-05-30 15:27:55 +00:00
Nirav Dave	6ff50bf242	Fix signedness of constant. NFC. llvm-svn: 303980	2017-05-26 12:53:10 +00:00
Nirav Dave	bb20b5d5c3	[AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC In preparation for late-stage store merging. llvm-svn: 303800	2017-05-24 19:55:49 +00:00
Akira Hatanaka	e8ae3346a3	[AArch64] Fix PRR33100. This commit fixes a bug introduced in r301019 where optimizeLogicalImm would replace a logical node's immediate operand that was CSE'd and was also an operand of another node. This commit fixes the bug by replacing the logical node instead of its immediate operand. rdar://problem/32295276 llvm-svn: 303607	2017-05-23 06:08:37 +00:00
Amara Emerson	c9916d7e97	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Hans Wennborg	bd6e9e77a7	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Amara Emerson	816542ceb3	[AArch64] Enable use of reduction intrinsics. The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns are replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678	2017-05-10 15:15:38 +00:00
Serge Guelton	e38003f839	Suppress all uses of LLVM_END_WITH_NULL. NFC. Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable. Differential Revision: https://reviews.llvm.org/D32541 llvm-svn: 302571	2017-05-09 19:31:13 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Simon Pilgrim	7a28a3ac78	[AARCH64][NEON] Add support for ISD::ABS lowering Update int_aarch64_neon_abs intrinsic to use the ISD::ABS opcode directly Differential Revision: https://reviews.llvm.org/D32940 llvm-svn: 302415	2017-05-08 10:25:18 +00:00
Amara Emerson	d28f0cd448	Generalize the specialized flag-carrying SDNodes by moving flags into SDNode. This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803	2017-05-01 15:17:51 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00
Akira Hatanaka	22e839f4b2	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019	2017-04-21 18:53:12 +00:00
Joel Jones	a7c4a52188	[AArch64] Refactor instruction selection lowering for addresses. NFCI Factor out the common code used for generating addresses into common templated functions that call overloaded versions of a new function, getTargetNode. Tested with make check-llvm with targets AArch64. Differential Revision: https://reviews.llvm.org/D32169 llvm-svn: 301005	2017-04-21 17:31:03 +00:00
Akira Hatanaka	78ccba6a20	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940	2017-04-21 01:31:50 +00:00
Akira Hatanaka	e52caddae8	[AArch64] Use suffix ULL to shift a 64-bit value. llvm-svn: 300932	2017-04-21 00:35:27 +00:00
Akira Hatanaka	19077aaee0	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930	2017-04-21 00:05:16 +00:00
Akira Hatanaka	7b06cebe73	Revert "[AArch64] Improve code generation for logical instructions taking" This reverts r300913. This broke bots. llvm-svn: 300916	2017-04-20 23:03:30 +00:00
Akira Hatanaka	e327f09832	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913	2017-04-20 22:47:56 +00:00
Matt Arsenault	3138075dd4	DAG: Make mayBeEmittedAsTailCall parameter const llvm-svn: 300603	2017-04-18 21:16:46 +00:00
Davide Italiano	3e9986f368	[Target] Use hasOneUse() instead of getNumUses(). The latter does a liner scan over a linked list, therefore is much more expensive. llvm-svn: 300518	2017-04-18 00:29:54 +00:00
Tim Northover	879a0b2e1b	AArch64: support nonlazybind It's almost certainly not a good idea to actually use it in most cases (there's a pretty large code size overhead on AArch64), but we can't do those experiments until it's supported. llvm-svn: 300462	2017-04-17 17:27:56 +00:00
Adam Nemet	c5779460f4	[AArch64] Avoid partial register writes on lane 0 of BUILD_VECTOR for i8/i16/f16 This further improves Ahmed's change in rL299482. See the new comment for the rationale. The patch recovers most of the regression for bzip2 after D31965. We're down to +2.68% from +6.97%. Differential Revision: https://reviews.llvm.org/D32028 llvm-svn: 300276	2017-04-13 23:32:47 +00:00
Matthew Simpson	1468d3e04e	[ARM/AArch64] Ensure valid vector element types for interleaved accesses This patch refactors and strengthens the type checks performed for interleaved accesses. The primary functional change is to ensure that the interleaved accesses have valid element types. The added test cases previously failed because the element type is f128. Differential Revision: https://reviews.llvm.org/D31817 llvm-svn: 299864	2017-04-10 18:34:37 +00:00
Petr Hosek	c3a9e6db38	[AArch64] Allow global register asm("x18") or asm("w18") under -ffixed-x18 When using -ffixed-x18, the x18 (or w18) register can safely be used with the "global register variable" GCC extension, but the backend fails to recognize it. Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D31793 llvm-svn: 299799	2017-04-07 20:41:58 +00:00
Ahmed Bougacha	d3c03a5ddd	[AArch64] Avoid partial register deps on insertelt of load into lane 0. This improves upon r246462: that prevented FMOVs from being emitted for the cross-class INSERT_SUBREGs by disabling the formation of INSERT_SUBREGs of LOAD. But the ld1.s that we started selecting caused us to introduce partial dependencies on the vector register. Avoid that by using SCALAR_TO_VECTOR: it's a first-class citizen that is folded away by many patterns, including the scalar LDRS that we want in this case. Credit goes to Adam for finding the issue! llvm-svn: 299482	2017-04-04 22:55:53 +00:00
Petr Hosek	9eb0a1e09b	[AArch64][Fuchsia] Allow -mcmodel=kernel for --target=aarch64-fuchsia This mode is just like -mcmodel=small except that it moves the thread pointer from TPIDR_EL0 to TPIDR_EL1. Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D31624 llvm-svn: 299462	2017-04-04 19:51:53 +00:00
Simon Pilgrim	37b536e4b3	[DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201	2017-03-31 11:24:16 +00:00
Davide Italiano	a0bd28c4d9	[AArch64ISelLowering] Remove `else` after `return` in LowerGlobalTLSAddress. llvm-svn: 299103	2017-03-30 19:52:31 +00:00
Davide Italiano	de05686ec6	[AArch64] Simplify isSingExtended()/isZeroExtended(). NFCI. llvm-svn: 299102	2017-03-30 19:46:18 +00:00

... 3 4 5 6 7 ...

930 Commits