llvm-project

Commit Graph

Author	SHA1	Message	Date
Roger Ferrer Ibanez	4f92b4162f	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 313009	2017-09-12 07:40:09 +00:00
Hans Wennborg	075e5a2e2b	Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY" It caused PR34564. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312980	2017-09-11 23:52:02 +00:00
Roger Ferrer Ibanez	12b20f2307	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312898	2017-09-11 07:38:05 +00:00
Diana Picus	1d101d76c0	Move static helper into ARMTargetLowering. NFC This exposes the isReadOnly(GlobalValue *) in the ARMTargetLowering so we can make use of it in GlobalISel as well. llvm-svn: 312320	2017-09-01 10:44:48 +00:00
Evgeny Astigeevich	540a39adf7	[ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes for the Thumb1 target. This causes generation of redundant code and affects performance. This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106 Differential Revision: https://reviews.llvm.org/D36467 llvm-svn: 311649	2017-08-24 10:00:25 +00:00
Craig Topper	2251ef95a3	[X86][ARM][TargetLowering] Add SrcVT to isExtractSubvectorCheap Summary: Without the SrcVT its hard to know what is really being asked for. For example if your target has 128, 256, and 512 bit vectors. Maybe extracting 128 from 256 is cheap, but maybe extracting 128 from 512 is not. For x86 we do support extracting a quarter of a 512-bit register. But for i1 vectors we don't have isel patterns for extracting arbitrary pieces. So we need this to have a correct implementation of isExtractSubvectorCheap for mask vectors. Reviewers: RKSimon, zvi, efriedma Reviewed By: RKSimon Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D36649 llvm-svn: 310793	2017-08-13 17:29:07 +00:00
Nico Weber	bfde70b097	Revert r309923, it caused PR34045. llvm-svn: 309950	2017-08-03 15:41:26 +00:00
Roger Ferrer Ibanez	854980341b	[ARM] Use ADDCARRY / SUBCARRY This patch: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) <- (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) <- (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) -> C Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 309923	2017-08-03 07:45:10 +00:00
Zvi Rackover	1b73682243	TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. Changing mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>. This came up in D35700 where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code. llvm-svn: 309085	2017-07-26 08:06:58 +00:00
Jonas Paulsson	024e319489	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Nirav Dave	4dcad5dc6b	Add DAG argument to canMergeStoresTo NFC. llvm-svn: 307583	2017-07-10 20:25:54 +00:00
Alexandros Lamprineas	2b2b420563	[ARM] Support constant pools in data when generating execute-only code. Resubmission of r305387, which was reverted at r305390. The Address Sanitizer caught a stack-use-after-scope of a Twine variable. This is now fixed by passing the Twine directly as a function parameter. The ARM backend asserts against constant pool lowering when it generates execute-only code in order to prevent the generation of constant pools in the text section. It appears that target independent optimizations might generate DAG nodes that represent constant pools. By lowering such nodes as global addresses we don't violate the semantics of execute-only code and also it is guaranteed that execute-only behaves correct with the position-independent addressing modes that support execute-only code. Differential Revision: https://reviews.llvm.org/D33773 llvm-svn: 305776	2017-06-20 07:20:52 +00:00
Alexandros Lamprineas	1c15ee2631	Revert "[ARM] Support constant pools in data when generating execute-only code." This reverts commit 3a204faa093c681a1e96c5e0622f50649b761ee0. I've upset a buildbot which runs the address sanitizer: ERROR: AddressSanitizer: stack-use-after-scope lib/Target/ARM/ARMISelLowering.cpp:2690 That Twine variable is used illegally. llvm-svn: 305390	2017-06-14 15:00:08 +00:00
Alexandros Lamprineas	c582d6e133	[ARM] Support constant pools in data when generating execute-only code. The ARM backend asserts against constant pool lowering when it generates execute-only code in order to prevent the generation of constant pools in the text section. It appears that target independent optimizations might generate DAG nodes that represent constant pools. By lowering such nodes as global addresses we don't violate the semantics of execute-only code and also it is guaranteed that execute-only behaves correct with the position-independent addressing modes that support execute-only code. Differential Revision: https://reviews.llvm.org/D33773 llvm-svn: 305387	2017-06-14 13:22:41 +00:00
Nirav Dave	6c910c0dd8	[DAG] Add AddressSpace parameter to canMergeStoresTo. NFC. llvm-svn: 303673	2017-05-23 18:53:02 +00:00
Tim Shen	04de70d3a7	[Atomic] Remove IsStore/IsLoad in the interface, and pass the instruction instead. NFC. Now both emitLeadingFence and emitTrailingFence take the instruction itself, instead of taking IsLoad/IsStore pairs. Instruction::mayReadFromMemory and Instrucion::mayWriteToMemory are used for determining those two booleans. The instruction argument is also useful for later D32763, in emitTrailingFence. For emitLeadingFence, it seems to have cleaner interface with the proposed change. Differential Revision: https://reviews.llvm.org/D32762 llvm-svn: 302539	2017-05-09 15:27:17 +00:00
Matthias Braun	4682ac6c83	ARM: Compute MaxCallFrame size early This exposes a method in MachineFrameInfo that calculates MaxCallFrameSize and calls it after instruction selection in the ARM target. This avoids ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame() giving different answers in early/late phases of codegen. The testcase shows a particular nasty example result of that where we would fail to properly align an alloca. Differential Revision: https://reviews.llvm.org/D32622 llvm-svn: 302303	2017-05-05 22:04:05 +00:00
Sam Parker	df337704f0	[ARM] ACLE Chapter 9 intrinsics Added the integer data processing intrinsics from ACLE v2.1 Chapter 9 but I have missed out the saturation_occurred intrinsics for now. For the instructions that read and write the GE bits, a chain is included and the only instruction that reads these flags (sel) is only selectable via the implemented intrinsic. Differential Revision: https://reviews.llvm.org/D32281 llvm-svn: 302126	2017-05-04 07:31:28 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00
Matt Arsenault	3138075dd4	DAG: Make mayBeEmittedAsTailCall parameter const llvm-svn: 300603	2017-04-18 21:16:46 +00:00
Matthew Simpson	1468d3e04e	[ARM/AArch64] Ensure valid vector element types for interleaved accesses This patch refactors and strengthens the type checks performed for interleaved accesses. The primary functional change is to ensure that the interleaved accesses have valid element types. The added test cases previously failed because the element type is f128. Differential Revision: https://reviews.llvm.org/D31817 llvm-svn: 299864	2017-04-10 18:34:37 +00:00
Sanjay Patel	b2f1621bb1	[DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401) This is a generic combine enabled via target hook to reduce icmp logic as discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401 It's likely that other targets will want to enable this hook for scalar transforms, and there are probably other patterns that can use bitwise logic to reduce comparisons. Note that we are missing an IR canonicalization for these patterns, and we will probably prefer the pair-of-compares form in IR (shorter, more likely to fold). Differential Revision: https://reviews.llvm.org/D31483 llvm-svn: 299542	2017-04-05 14:09:39 +00:00
Simon Pilgrim	37b536e4b3	[DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201	2017-03-31 11:24:16 +00:00
Sam Parker	654cb8263a	[ARM] Enable SMLAL[B\|T] isel Enable the selection of the 64-bit signed multiply accumulate instructions which operate on 16-bit operands. These are enabled for ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb architectures. Differential Revision: https://reviews.llvm.org/D30044 llvm-svn: 297809	2017-03-15 08:27:11 +00:00
Sam Parker	916b1ba617	[ARM] Move SMULW[B\|T] isel to DAG Combine Create nodes for smulwb and smulwt and move their selection from DAGToDAG to DAG combine. smlawb and smlawt can then be selected using tablegen. Added some helper functions to detect shift patterns as well as a wrapper around SimplifyDemandBits. Added a couple of extra tests. Differential Revision: https://reviews.llvm.org/D30708 llvm-svn: 297716	2017-03-14 09:13:22 +00:00
Nirav Dave	54e22f33d9	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695	2017-03-14 00:34:14 +00:00
Chandler Carruth	ce52b80744	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862	2017-03-03 10:02:25 +00:00
Nirav Dave	f830dec3f2	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476	2017-02-28 14:24:15 +00:00
Nirav Dave	73cd0194cf	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279	2017-02-26 01:27:32 +00:00
Nirav Dave	beabf456df	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252	2017-02-25 11:43:58 +00:00
James Molloy	d508789668	[ARM] Use VCMP, not VCMPE, for floating point equality comparisons When generating a floating point comparison we currently unconditionally generate VCMPE. This has the sideeffect of setting the cumulative Invalid bit in FPSCR if any of the operands are QNaN. It is expected that use of a relational predicate on a QNaN value should raise Invalid. Quoting from the C standard: The relational and equality operators support the usual mathematical relationships between numeric values. For any ordered pair of numeric values exactly one of relationships the less, greater, equal and is true. Relational operators may raise the floating-point exception when argument values are NaNs. The standard doesn't explicitly state the expectation for equality operators, but the implication and obvious expectation is that equality operators should not raise Invalid on a QNaN input, as those predicates are wholly defined on unordered inputs (to return not equal). Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if QNaN should raise Invalid, and pipe that through to TableGen. llvm-svn: 294945	2017-02-13 12:32:47 +00:00
Nirav Dave	93f9d5ce04	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915	2017-02-02 18:24:55 +00:00
Nirav Dave	4442667fc5	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommiting after fixing X86 inc/dec chain bug. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893	2017-02-02 14:39:42 +00:00
Nirav Dave	d32a421f75	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188	2017-01-26 16:46:13 +00:00
Nirav Dave	de6516c466	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184	2017-01-26 16:02:24 +00:00
Saleem Abdulrasool	6ef45916c6	ARM: match GCC's behaviour for builtins GCC changes the CC between the user-code and the builtins based on the value of `-target` rather than `-mfloat-abi`. When a HF target is used, the VFP variant of the AAPCS CC is used. Otherwise, the AAPCS variant is used. In all cases, the AEABI functions use the AAPCS CC. Adjust the calling convention based on the target. Resolves PR30543! llvm-svn: 291909	2017-01-13 16:25:33 +00:00
Eugene Zelenko	c4ad1ce068	[Target] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 291641	2017-01-11 01:45:03 +00:00
Eli Friedman	d03df8145f	[ARM] Implement isExtractSubvectorCheap. See https://reviews.llvm.org/D6678 for the history of isExtractSubvectorCheap. Essentially the same considerations apply to ARM. This temporarily breaks the formation of vpadd/vpaddl in certain cases; AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles. See https://reviews.llvm.org/D27779 for followup fix. Differential Revision: https://reviews.llvm.org/D27774 llvm-svn: 290198	2016-12-20 20:05:07 +00:00
Eli Friedman	f624ec27b7	[ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently. Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Recommiting with fix to avoid forming vld1dup if the type of the load doesn't match the type of the vdup (see https://llvm.org/bugs/show_bug.cgi?id=31404). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289972	2016-12-16 18:44:08 +00:00
Diana Picus	2af9c389bf	[ARM] Expose methods to get the CCAssignFn. NFCI Add two public methods to ARMTargetLowering: CCAssignFnForCall and CCAssignFnForReturn, which are just calling the already existing private method CCAssignFnForNode. These will come in handy for GlobalISel on ARM. We also replace all calls to CCAssignFnForNode in ARMISelLowering.cpp, because the new methods are friendlier to the reader. llvm-svn: 289932	2016-12-16 10:35:20 +00:00
Nico Weber	11c2e6ee3a	Revert 279703, it caused PR31404. llvm-svn: 289923	2016-12-16 04:51:25 +00:00
Eli Friedman	10576e73c9	[ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently. Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289703	2016-12-14 20:25:26 +00:00
Javed Absar	85874a9360	[ARM]: Assign cost of scaling used in addressing mode for ARM cores This patch assigns cost of the scaling used in addressing. On many ARM cores, a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. For instance: LDR R0, [R1, R2 LSL #2] LDR R0, [R1, -R2 LSL #2] Above, (1) takes less cycles than (2). By assigning appropriate scaling factor cost, we enable the LLVM to make the right trade-offs in the optimization and code-selection phase. Differential Revision: http://reviews.llvm.org/D24857 Reviewers: jmolloy, rengolin llvm-svn: 284127	2016-10-13 14:57:43 +00:00
Saleem Abdulrasool	92e33a3ebc	ARM: move the builtins libcall CC setup Move the target specific setup into the target specific lowering setup. As pointed out by Anton, the initial change was moving this too high up the stack resulting in a violation of the layering (the target generic code path setup target specific bits). Sink this into the ARM specific setup. NFC. llvm-svn: 281088	2016-09-09 20:11:31 +00:00
Diana Picus	774d157a5d	[ARM] Honour ABI for rem under -O0 for EABI, GNUEABI, Android and Musl At higher optimization levels, we generate the libcall for DIVREM_Ix, which is fine: aeabi_{u\|i}divmod. At -O0 we generate the one for REM_Ix, which is the default {u}mod{q\|h\|s\|d}i3. This commit makes sure that we don't generate REM_Ix calls for ABIs that don't support them (i.e. where we need to use DIVREM_Ix instead). This is achieved by bailing out of FastISel, which can't handle non-double multi-reg returns, and letting the legalization infrastructure expand the REM_Ix calls. It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some Windows checks to it to make sure we don't break things for it. Fixes PR27068 Differential Revision: https://reviews.llvm.org/D21926 llvm-svn: 275773	2016-07-18 06:48:25 +00:00
Duncan P. N. Exon Smith	e4f5e4f4d1	CodeGen: Use MachineInstr& in TargetLowering, NFC This is a mechanical change to make TargetLowering API take MachineInstr& (instead of MachineInstr), since the argument is expected to be a valid MachineInstr. In one case, changed a parameter from MachineInstr to MachineBasicBlock::iterator, since it was used as an insertion point. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. llvm-svn: 274287	2016-06-30 22:52:52 +00:00
Rafael Espindola	ae0d866f56	Refactor a duplicated predicate. NFC. llvm-svn: 273826	2016-06-26 22:13:55 +00:00
Pablo Barrio	7a64346533	[ARM] Lower (select_cc k k (select_cc ~k ~k x)) into (SSAT l_k x) Summary: SSAT saturates an integer, making sure that its value lies within an interval [-k, k]. Since the constant is given to SSAT as the number of bytes set to one, k + 1 must be a power of 2, otherwise the optimization is not possible. Also, the select_cc must use < and > respectively so that they define an interval. Reviewers: mcrosier, jmolloy, rengolin Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D21372 llvm-svn: 273581	2016-06-23 16:53:49 +00:00
Sam Parker	d616cf07b2	[ARM] Enable isel of UMAAL TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL dags into UMAAL. Selection is split into the two phases because it is easier to match the two patterns at those different times. Differential Revision: http://http://reviews.llvm.org/D21461 llvm-svn: 273165	2016-06-20 16:47:09 +00:00
Rafael Espindola	0f89833c31	Add a isPositionIndependent predicate. Reduces a bit of code duplication and clarify where we are interested just on position independence and no the location of the symbol. llvm-svn: 273164	2016-06-20 16:43:17 +00:00
Benjamin Kramer	bdc4956bac	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Tim Northover	b5ece527a1	ARM: stop emitting blx instructions for most calls on MachO. I'm really not sure why we were in the first place, it's the linker's job to convert between BL/BLX as necessary. Even worse, using BLX left Thumb calls that could be locally resolved completely unencodable since all offsets to BLX are multiples of 4. rdar://26182344 llvm-svn: 269101	2016-05-10 19:17:47 +00:00
Silviu Baranga	82d04260b7	[ARM] Add support for the X asm constraint Summary: This patch adds support for the X asm constraint. To do this, we lower the constraint to either a "w" or "r" constraint depending on the operand type (both constraints are supported on ARM). Fixes PR26493 Reviewers: t.p.northover, echristo, rengolin Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D19061 llvm-svn: 267411	2016-04-25 14:29:18 +00:00
Manman Ren	5751814eda	Swift Calling Convention: swifterror target support. Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997	2016-04-11 21:08:06 +00:00
Peter Collingbourne	86b9fbe980	ARM: Better codegen for 64-bit compares. This introduces a custom lowering for ISD::SETCCE (introduced in r253572) that allows us to emit a short code sequence for 64-bit compares. Before: push {r7, lr} cmp r0, r2 mov.w r0, #0 mov.w r12, #0 it hs movhs r0, #1 cmp r1, r3 it ge movge.w r12, #1 it eq moveq r12, r0 cmp.w r12, #0 bne .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} After: push {r7, lr} subs r0, r0, r2 sbcs.w r0, r1, r3 bge .LBB1_2 @ BB#1: @ %bb1 bl f pop {r7, pc} .LBB1_2: @ %bb2 bl g pop {r7, pc} Saves around 80KB in Chromium's libchrome.so. Some notes on this patch: - I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I introduced (nothing else needs them). However, they are necessary in order to avoid poor codegen, and they seem similar to existing combines in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to (brcond Compare)). - No support for Thumb-1. This is in principle possible, but we'd need to implement ARMISD::SUBE for Thumb-1. Differential Revision: http://reviews.llvm.org/D15256 llvm-svn: 263962	2016-03-21 18:00:02 +00:00
James Y Knight	f44fc5219f	Tweak some atomics functions in preparation for larger changes; NFC. - Rename getATOMIC to getSYNC, as llvm will soon be able to emit both '__sync' libcalls and '__atomic' libcalls, and this function is for the '__sync' ones. - getInsertFencesForAtomic() has been replaced with shouldInsertFencesForAtomic(Instruction), so that the decision can be made per-instruction. This functionality will be used soon. - emitLeadingFence/emitTrailingFence are no longer called if shouldInsertFencesForAtomic returns false, and thus don't need to check the condition themselves. llvm-svn: 263665	2016-03-16 22:12:04 +00:00
Saleem Abdulrasool	f36005a358	ARM: support TLS for WoA Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data is needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676	2016-02-03 18:21:59 +00:00
Manman Ren	5e9e65e705	CXX_FAST_TLS calling convention: performance improvement for ARM. This is the same change on ARM as r255821 on AArch64. rdar://9001553 llvm-svn: 257424	2016-01-12 00:47:18 +00:00
Tim Northover	bd41cf880c	ARM: support TLS accesses on Darwin platforms Darwin TLS accesses most closely resemble ELF's general-dynamic situation, since they have to be able to handle all possible situations. The descriptors and so on are obviously slightly different though. llvm-svn: 257039	2016-01-07 09:03:03 +00:00
Martell Malone	d12292480a	ARM: address WOA unsigned division overflow crash Building on r253865 the crash is not limited to signed overflows. Disable custom handling of unsigned 32-bit and 64-bit integer divide. Add test cases for both 32-bit and 64-bit unsigned integer overflow. llvm-svn: 254158	2015-11-26 15:34:03 +00:00
Martell Malone	a6b867eb0d	ARM: address WoA division overflow crash Disable custom handling of signed 32-bit and 64-bit integer divide. Add test cases for both 32-bit and 64-bit integer overflow crashes. llvm-svn: 253865	2015-11-23 13:11:39 +00:00
James Molloy	b564098c62	[ARM] Replace ARMISD::RBIT with ISD::BITREVERSE ISD::BITREVERSE matches "rbit" completely, so remove ARMISD::RBIT and mark ISD::BITREVERSE as legal, adding a test for lowering. llvm-svn: 253047	2015-11-13 16:05:22 +00:00
Sanjay Patel	af1b48bfdc	[ARM] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz() ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2 implementation. The net result of allowing this speculation for the regression tests in this patch is that we get this code: ctlz: clz r0, r0 bx lr cttz: rbit r0, r0 clz r0, r0 bx lr Instead of: ctlz: cmp r0, #0 moveq r0, #32 clzne r0, r0 bx lr cttz: cmp r0, #0 moveq r0, #32 rbitne r0, r0 clzne r0, r0 bx lr This will help solve a general speculation/despeculation problem noted in PR24818: https://llvm.org/bugs/show_bug.cgi?id=24818 Differential Revision: http://reviews.llvm.org/D14469 llvm-svn: 252639	2015-11-10 19:24:31 +00:00
James Molloy	9d55f19cfa	Reapply "[ARM] Combine CMOV into BFI where possible" Added fixes for stage2 failures: CMOV is not commutable; commuting the operands results in the condition being flipped! d'oh! Original commit message: If we have a CMOV, OR and AND combination such as: if (x & CN) y \|= CM; And: * CN is a single bit; * All bits covered by CM are known zero in y; Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction). llvm-svn: 252606	2015-11-10 14:22:05 +00:00
Renato Golin	1d8a2c952f	Revert "[ARM] Combine CMOV into BFI where possible" This reverts commit r252057, as it broke ARM self-hosting buildbots, probably due to a code-gen fault. llvm-svn: 252460	2015-11-09 12:19:10 +00:00
Joseph Tremoulet	f748c8937e	[WinEH] Update exception pointer registers Summary: The CLR's personality routine passes these in rdx/edx, not rax/eax. Make getExceptionPointerRegister a virtual method parameterized by personality function to allow making this distinction. Similarly make getExceptionSelectorRegister a virtual method parameterized by personality function, for symmetry. Reviewers: pgavlin, majnemer, rnk Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D14344 llvm-svn: 252383	2015-11-07 01:11:31 +00:00
James Molloy	e7d679cf4c	[ARM] Combine CMOV into BFI where possible If we have a CMOV, OR and AND combination such as: if (x & CN) y \|= CM; And: * CN is a single bit; * All bits covered by CM are known zero in y; Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction). llvm-svn: 252057	2015-11-04 16:55:07 +00:00
Peter Collingbourne	97aae40880	ARM/ELF: Better codegen for global variable addresses. In PIC mode we were previously computing global variable addresses (or GOT entry addresses) by adding the PC, the PC-relative GOT displacement and the GOT-relative symbol/GOT entry displacement. Because the latter two displacements are fixed, we ended up performing one more addition than necessary. This change causes us to compute addresses using a single PC-relative displacement, resulting in a shorter code sequence. This reduces code size by about 4% in a recent build of Chromium for Android. As a result of this change we no longer need to compute the GOT base address in the ARM backend, which allows us to remove the Global Base Reg pass and SDAG lowering for the GOT. We also now no longer use the GOT when addressing a symbol which is known to be defined in the same linkage unit. Specifically, the symbol must have either hidden visibility or a strong definition in the current module in order to not use the the GOT. This is a change from the previous behaviour where we would use the GOT to address externally visible symbols defined in the same module. I think the only cases where this could matter are cases involving symbol interposition, but we don't really support that well anyway. Differential Revision: http://reviews.llvm.org/D13650 llvm-svn: 251322	2015-10-26 18:23:16 +00:00
James Molloy	72222f5dca	[ARM] Handle the inline asm constraint type 'o' This means "memory with offset" and requires very little plumbing to get working. This fixes PR25317. llvm-svn: 251280	2015-10-26 10:04:52 +00:00
Scott Douglass	953f908173	[ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM. We were previously codegen'ing memcpy as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen. A much simpler approach for achieving better codegen is to create MEMCPY pseudo instructions, attach scratch virtual registers to them and then, post register allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers. The register allocator will have picked arbitrary registers which we sort when expanding the MEMCPY. This approach also avoids the need to repeatedly calculate offsets which ultimately ought to be eliminated pre-RA in order to decrease register pressure. Fixes PR9199 and PR23768. [This is based on Peter Collingbourne's r238473 which was reverted.] Differential Revision: http://reviews.llvm.org/D13239 Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458 Author: Peter Collingbourne <peter@pcc.me.uk> llvm-svn: 249322	2015-10-05 14:49:54 +00:00
Saleem Abdulrasool	fe83b50289	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a psuedo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Ahmed Bougacha	81616a72ea	[ARM] Emit clrex in the expanded cmpxchg fail block. ARM counterpart to r248291: In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248294	2015-09-22 17:22:58 +00:00
Ahmed Bougacha	5246867384	[CodeGen] Refactor TLI/AtomicExpand interface to make LLSC explicit. We used to have this magic "hasLoadLinkedStoreConditional()" callback, which really meant two things: - expand cmpxchg (to ll/sc). - expand atomic loads using ll/sc (rather than cmpxchg). Remove it, and, instead, introduce explicit callbacks: - bool shouldExpandAtomicCmpXchgInIR(inst) - AtomicExpansionKind shouldExpandAtomicLoadInIR(inst) Differential Revision: http://reviews.llvm.org/D12557 llvm-svn: 247429	2015-09-11 17:08:28 +00:00
Ahmed Bougacha	9d677131c4	[CodeGen] Rename AtomicRMWExpansionKind to AtomicExpansionKind. This lets us generalize its usage to the other atomic instructions. llvm-svn: 247428	2015-09-11 17:08:17 +00:00
Scott Douglass	bdef60462d	[ARM] Use AEABI helpers for i64 div and rem Differential Revision: http://reviews.llvm.org/D12232 llvm-svn: 245830	2015-08-24 09:17:18 +00:00
James Molloy	ea3a687a33	[ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM NFCI. This replaces another custom ISDNode with a generic equivalent. llvm-svn: 244591	2015-08-11 12:06:22 +00:00
James Molloy	db8ee4b5a9	[ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN. NFCI. This removes a custom ISDNode. llvm-svn: 244590	2015-08-11 12:06:15 +00:00
Matthias Braun	3cd00c1739	Fix __builtin_setjmp in combination with sjlj exception handling. llvm.eh.sjlj.setjmp was used as part of the SjLj exception handling style but is also used in clang to implement __builtin_setjmp. The ARM backend needs to output additional dispatch tables for the SjLj exception handling style, these tables however can't be emitted if llvm.eh.sjlj.setjmp is simply used for __builtin_setjmp and no actual landing pad blocks exist. To solve this issue a new llvm.eh.sjlj.setup_dispatch intrinsic is introduced which is used instead of llvm.eh.sjlj.setjmp in the SjLj exception handling lowering, so we can differentiate between the case where we actually need to setup a dispatch table and the case where we just need the __builtin_setjmp semantic. Differential Revision: http://reviews.llvm.org/D9313 llvm-svn: 242481	2015-07-16 22:34:16 +00:00
Pat Gavlin	a717f255b6	Allow {e,r}bp as the target of {read,write}_register. This patch allows the read_register and write_register intrinsics to read/write the RBP/EBP registers on X86 iff the targeted register is the frame pointer for the containing function. Differential Revision: http://reviews.llvm.org/D10977 llvm-svn: 241827	2015-07-09 17:40:29 +00:00
Mehdi Amini	0cdec1e2ab	Make isLegalAddressingMode() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11040 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241778	2015-07-09 02:09:40 +00:00
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Benjamin Kramer	9bfb627a0e	[TargetLowering] StringRefize asm constraint getters. There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411	2015-07-05 19:29:18 +00:00
Hao Liu	2cd34bb585	[ARM] Lower interleaved memory accesses to vldN/vstN intrinsics. This patch also adds a function to calculate the cost of interleaved memory accesses. E.g. Lower an interleaved load: %wide.vec = load <8 x i32>, <8 x i32>* %ptr, align 4 %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> into: %vld2 = { <4 x i32>, <4 x i32> } call llvm.arm.neon.vld2(%ptr, 4) %vec0 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 0 %vec1 = extractelement { <4 x i32>, <4 x i32> } %vld2, i32 1 E.g. Lower an interleaved store: %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11> store <12 x i32> %i.vec, <12 x i32>* %ptr, align 4 into: %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> v1, <0, 1, 2, 3> %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> v1, <4, 5, 6, 7> %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> v1, <8, 9, 10, 11> call void llvm.arm.neon.vst3(%ptr, %sub.v0, %sub.v1, %sub.v2, 4) Differential Revision: http://reviews.llvm.org/D10533 llvm-svn: 240755	2015-06-26 02:45:36 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Alexander Kornienko	70bc5f1398	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-,llvm-namespace-comment -header-filter='llvm/.\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Peter Collingbourne	6679fc1a79	Revert r238473, "Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM." as it caused miscompilations and assertion failures (PR23768, http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150601/280380.html). llvm-svn: 239169	2015-06-05 18:01:28 +00:00
Daniel Sanders	1f58ef71ea	[arm] Distinguish the /U[qytnms]/, 'Uv', 'Q', and 'm' inline assembly memory constraints. Summary: But still handle them the same way since I don't know how they differ on this target. Of these, /U[qytnms]/ do not have backend tests but are accepted by clang. No functional change intended. Reviewers: t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D8203 llvm-svn: 238921	2015-06-03 12:33:56 +00:00
Matt Arsenault	bd7d80a4a6	Add address space argument to isLegalAddressingMode This is important because of different addressing modes depending on the address space for GPU targets. This only adds the argument, and does not update any of the uses to provide the correct address space. llvm-svn: 238723	2015-06-01 05:31:59 +00:00
Peter Collingbourne	450fbee6b2	Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM. We were previously codegen'ing these as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen. A much simpler approach for achiveing better codegen is to create LDM/STM instructions with identical sets of virtual registers, let the register allocator pick arbitrary registers and order register lists when printing an MCInst. This approach also avoids the need to repeatedly calculate offsets which ultimately ought to be eliminated pre-RA in order to decrease register pressure. This is implemented by lowering the memcpy intrinsic to a series of SD-only MCOPY pseudo-instructions which performs a memory copy using a given number of registers. During SD->MI lowering, we lower MCOPY to LDM/STM. This is a little unusual, but it avoids the need to encode register lists in the SD, and we can take advantage of SD use lists to decide whether to use the _UPD variant of the instructions. Fixes PR9199. Differential Revision: http://reviews.llvm.org/D9508 llvm-svn: 238473	2015-05-28 20:02:45 +00:00
Eric Christopher	824f42f209	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Matthias Braun	d04893fa36	Change getTargetNodeName() to produce compiler warnings for missing cases, fix them llvm-svn: 236775	2015-05-07 21:33:59 +00:00
Matthias Braun	eec4efcca5	Cleanup, remove unused return value llvm-svn: 235952	2015-04-28 00:37:05 +00:00
James Molloy	fa041153e5	[ARM] Remove target-specific ITOFP/FPTOI nodes Anton tried this 5 years ago but it was reverted due to extra VMOVs being emitted. This can be easily fixed with a liberal application of patterns - matching loads/stores and extractelts. llvm-svn: 232958	2015-03-23 16:15:16 +00:00
John Brawn	0dbcd65442	[ARM] Align stack objects passed to memory intrinsics Memcpy, and other memory intrinsics, typically tries to use LDM/STM if the source and target addresses are 4-byte aligned. In CodeGenPrepare look for calls to memory intrinsics and, if the object is on the stack, 4-byte align it if it's large enough that we expect that memcpy would want to use LDM/STM to copy it. Differential Revision: http://reviews.llvm.org/D7908 llvm-svn: 232627	2015-03-18 12:01:59 +00:00
Daniel Sanders	bf5b80f5f9	Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC. Summary: This is instead of doing this in target independent code and is the last non-functional change before targets begin to distinguish between different memory constraints when selecting code for the ISD::INLINEASM node. Next, each target will individually move away from the idea that all memory constraints behave like 'm'. Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8173 llvm-svn: 232373	2015-03-16 13:13:41 +00:00
Tim Northover	8cda34f5e7	ARM: simplify and extend byval handling The main issue being fixed here is that APCS targets handling a "byval align N" parameter with N > 4 were miscounting what objects were where on the stack, leading to FrameLowering setting the frame pointer incorrectly and clobbering the stack. But byval handling had grown over many years, and had multiple layers of cruft trying to compensate for each other and calculate padding correctly. This only really needs to be done once, in the HandleByVal function. Elsewhere should just do what it's told by that call. I also stripped out unnecessary APCS/AAPCS distinctions (now that Clang emits byvals with the correct C ABI alignment), which simplified HandleByVal. rdar://20095672 llvm-svn: 231959	2015-03-11 18:54:22 +00:00
Ahmed Bougacha	4200cc95b4	[ARM] Enable vector extload combine for legal types. This commit enables forming vector extloads for ARM. It only does so for legal types, and when we can't fold the extension in a wide/long form of the user instruction. Enabling it for larger types isn't as good an idea on ARM as it is on X86, because: - we pretend that extloads are legal, but end up generating vld+vmov - we have instructions like vld {dN, dM}, which can't be generated when we "manually expand" extloads to vld+vmov. For legal types, the combine doesn't fire that often: in the integration tests only in a big endian testcase, where it removes a pointless AND. Related to rdar://19723053 Differential Revision: http://reviews.llvm.org/D7423 llvm-svn: 231396	2015-03-05 19:37:53 +00:00
JF Bastien	f14889ee34	Mutate TargetLowering::shouldExpandAtomicRMWInIR to specifically dictate how AtomicRMWInsts are expanded. Summary: In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all. This does not represent a behavioural change and as such no tests were added. Patch by: Richard Diamond. Reviewers: jfb Reviewed By: jfb Subscribers: jfb, aemerson, t.p.northover, llvm-commits Differential Revision: http://reviews.llvm.org/D7713 llvm-svn: 231250	2015-03-04 15:47:57 +00:00
Eric Christopher	11e4df73c8	getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699	2015-02-26 22:38:43 +00:00
Eric Christopher	23a3a7c871	Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583	2015-02-26 00:00:24 +00:00

1 2 3 4 5 ...

444 Commits