llvm-project

Commit Graph

Author	SHA1	Message	Date
Jessica Paquette	f472f6159a	[MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across It isn't safe to outline sequences of instructions where x16/x17/nzcv are live across the sequence. This teaches the outliner to check whether a specific candidate has x16/x17/nzcv live across it and to discard the candidate if so. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758	2018-06-27 17:43:27 +00:00
Luke Geeson	316327150b	[AArch64] Reverting FP16 vcvth_n_s64_f16 to fix llvm-svn: 335737	2018-06-27 14:34:40 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h) and the default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extend the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectors. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjusts the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Luke Geeson	68cb233c0f	[AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns llvm-svn: 335715	2018-06-27 09:20:13 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Sirish Pande	b60acb9e48	Revert "[AArch64] Coalesce Copy Zero during instruction selection" This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7. Original Commit: Author: Haicheng Wu <haicheng@codeaurora.org> Date: Sun Feb 18 13:51:33 2018 +0000 [AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 Author's intention is to remove a BB that has one mov instruction. In order to do that, d8f571050 pessimizes MachineSinking by introducing a copy, such that the mov instruction is NOT moved to the BB. Optimization downstream gets rid of the BB with only a mov instruction. This works well if we have only one fall through branch as there is only one "extra" mov instruction. If we have multiple fall throughs, we will have a lot of redundant movs. In such a case, it's better to have this BB which has one mov instruction. This is causing degradation in jpeg, fft and other codebases. I believe if we want to remove a BB with only one branch instruction, we should not pessimize Machine Sinking at all, and find some other solution. llvm-svn: 335251	2018-06-21 16:05:24 +00:00
Tim Northover	70666e7765	[AArch64] Implement FLT_ROUNDS macro. Very similar to ARM implementation, just maps to an MRS. Should fix PR25191. Patch by Michael Brase. llvm-svn: 335118	2018-06-20 12:09:01 +00:00
Vlad Tsyrklevich	98724e582e	Revert r334980 and 334983 This reverts commits r334980 and r334983 because they were causing build timeouts on the x86_64-linux-ubsan bot. llvm-svn: 335085	2018-06-20 00:02:32 +00:00
Jessica Paquette	32de26d432	[MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue insertOutlinerPrologue was not used by any target, and prologue-esque code was beginning to appear in insertOutlinerEpilogue. Refactor that into one function, buildOutlinedFrame. This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue. llvm-svn: 335076	2018-06-19 21:14:48 +00:00
Sander de Smalen	067eee1c13	[AArch64][SVE] Asm: Fix predicate pattern diagnostics. This patch uses the DiagnosticPredicate for SVE predicate patterns to improve their diagnostics, now giving an 'invalid operand' diagnostic if the type is not an immediate or one of the expected pattern labels. Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48220 llvm-svn: 334983	2018-06-18 21:03:02 +00:00
Sander de Smalen	7ac9e193ec	[AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions. The variants added by this patch are: - SQINC signed increment, e.g. sqinc x0, w0, all, mul #4 - SQDEC signed decrement, e.g. sqdec x0, w0, all, mul #4 - UQINC unsigned increment, e.g. uqinc w0, all, mul #4 - UQDEC unsigned decrement, e.g. uqdec w0, all, mul #4 This patch includes asmparser changes to parse a GPR64 as a GPR32 in order to satisfy the constraint check: x0 == GPR64(w0) in: sqinc x0, w0, all, mul #4 ^___^ (must match) Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47716 llvm-svn: 334980	2018-06-18 20:50:33 +00:00
Sander de Smalen	13684d8400	[AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions. Summary: The variants added by this patch are: - SQINC (signed increment) - UQINC (unsigned increment) - SQDEC (signed decrement) - UQDEC (unsigned decrement) For example: uqincw x0, all, mul #4 Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Differential Revision: https://reviews.llvm.org/D47715 llvm-svn: 334948	2018-06-18 14:47:52 +00:00
Sander de Smalen	d521c4353e	[AArch64][SVE] Asm: Support for vector element compares. This patch adds instructions for comparing elements from two vectors, e.g. cmpgt p0.s, p0/z, z0.s, z1.s and also adds support for comparing to a 64-bit wide element vector, e.g. cmpgt p0.s, p0/z, z0.s, z1.d The patch also contains aliases for certain comparisons, e.g.: cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s llvm-svn: 334931	2018-06-18 10:59:19 +00:00
Sander de Smalen	279b7e74e7	[AArch64][SVE] Asm: Support for bitwise operations on predicate vectors. This patch adds support for instructions performing bitwise operations on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS. This patch also adds several aliases: orr p0.b, p1/z, p1.b, p1.b => mov p0.b, p1.b orrs p0.b, p1/z, p1.b, p1.b => movs p0.b, p1.b and p0.b, p1/z, p2.b, p2.b => mov p0.b, p1/z, p2.b ands p0.b, p1/z, p2.b, p2.b => movs p0.b, p1/z, p2.b eor p0.b, p1/z, p2.b, p1.b => not p0.b, p1/z, p2.b eors p0.b, p1/z, p2.b, p1.b => nots p0.b, p1/z, p2.b llvm-svn: 334906	2018-06-17 10:48:21 +00:00
Sander de Smalen	2c25b4cd36	[AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions. Support for SVE's predicated select instructions to select elements from either vector, both in a data-vector and a predicate-vector variant. llvm-svn: 334905	2018-06-17 10:11:04 +00:00
Sander de Smalen	a6edca72ba	[AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions. Predicated splat/copy of SIMD/FP register or general purpose register to SVE vector, along with MOV-aliases. llvm-svn: 334842	2018-06-15 16:39:46 +00:00
Sander de Smalen	18ac8f9f25	[AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions. Increment/decrement scalar register by (scaled) element count given by predicate pattern, e.g. 'incw x0, all, mul #4'. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47713 llvm-svn: 334838	2018-06-15 15:47:44 +00:00
Sander de Smalen	5eb51d7495	[AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: javed.absar Differential Revision: https://reviews.llvm.org/D47712 llvm-svn: 334831	2018-06-15 13:57:51 +00:00
Sander de Smalen	3cbf171479	[AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates. Some instructions require one of a limited set of FP immediates as operands, for example '#0.5 or #1.0' for SVE's FADD instruction. This patch adds support for parsing and printing such FP immediates as exact values (e.g. #0.499999 is not accepted for #0.5). Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47711 llvm-svn: 334826	2018-06-15 13:11:49 +00:00
Clement Courbet	5eeed77f87	[TableGen] Emit a fatal error on inconsistencies in resource units vs cycles. Summary: For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files. For more obvious cases, I've ventured a fix. Some notes: - Exynos is especially fishy. - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice. - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit. Also see PR37310. Reviewers: RKSimon, craig.topper, javed.absar Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46356 llvm-svn: 334586	2018-06-13 09:41:49 +00:00
Petr Hosek	7250908016	[AArch64] Support reserving x20 register Register x20 is a callee-saved register which may be used for other purposes in certain contexts, for example to hold special variables within the kernel. This change adds support for reserving this register both to frontend and backend to make this register usable for these purposes. Differential Revision: https://reviews.llvm.org/D46552 llvm-svn: 334531	2018-06-12 20:00:50 +00:00
Luke Geeson	dc82aa44e6	[AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns llvm-svn: 334488	2018-06-12 09:35:20 +00:00
Clement Courbet	f4f6899cdf	[ExynosM1][Sched] Fix resource usage in scheduling model. This is part of https://reviews.llvm.org/D46356. llvm-svn: 334391	2018-06-11 07:33:08 +00:00
Evandro Menezes	b2c8244715	[AArch64, ARM] Add support for Samsung Exynos M4 Create a separate feature set for Exynos M4 and add test cases. llvm-svn: 334115	2018-06-06 18:56:00 +00:00
Peter Smith	57f661bd7d	[MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 llvm-svn: 334078	2018-06-06 09:40:06 +00:00
Jessica Paquette	aa087327ce	[MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h This is setting up to fix bug 37573 cleanly. This moves data structures that are technically both used in some way by the target and the general-purpose outlining algorithm into MachineOutliner.h. In particular, the `Candidate` class is of importance. Before, the outliner passed the locations of `Candidates` to the target, which would then make some decisions about the prospective outlined function. This change allows us to just pass `Candidates` along to the target. This will allow the target to discard `Candidates` that would be considered unsafe before cost calculation. Thus, we will be able to remove the unsafe candidates described in the bug without resorting to torching the entire prospective function. Also, as a side-effect, it makes the outliner a bit cleaner. https://bugs.llvm.org/show_bug.cgi?id=37573 llvm-svn: 333952	2018-06-04 21:14:16 +00:00
Nicolai Haehnle	01d261f18d	TableGen: Streamline the semantics of NAME Summary: The new rules are straightforward. The main rules to keep in mind are: 1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm. 2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended. And for some additional subtleties, consider these: 3. defm with no name generates a unique name but has no special behavior otherwise. 4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME. Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow. The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows. Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem. Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation. Change-Id: I694095231565b30f563e6fd0417b41ee01a12589 Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D47430 llvm-svn: 333900	2018-06-04 14:26:05 +00:00
Luke Geeson	43e4367961	[AArch64] Audit on rL333634 to fix FP16 Disasm BitPatterns llvm-svn: 333879	2018-06-04 09:41:32 +00:00
Sander de Smalen	d0a6f6a502	[AArch64][SVE] Fix range for DUP immediates (16bit elts) For immediates used in DUP instructions that have the range -128 to 127, or a multiple of 256 in the range -32768 to 32512, one could argue that when the result element size is 16bits (.h), the value can be considered both signed and unsigned. Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47619 llvm-svn: 333873	2018-06-04 07:24:23 +00:00
Sander de Smalen	fd54a781f6	[AArch64][SVE] Asm: Print indexed element 0 as FPR. Print the first indexed element as a FP register, for example: mov z0.d, z1.d[0] Is now printed as: mov z0.d, d1 Next to printing, this patch also adds aliases to parse 'mov z0.d, d1'. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47571 llvm-svn: 333872	2018-06-04 07:07:35 +00:00
Sander de Smalen	c33d668ab7	[AArch64][SVE] Asm: Support for indexed DUP instructions. Unpredicated copy of indexed SVE element to SVE vector, along with MOV-aliases. For example: dup z0.h, z1.h[0] duplicates the first 16-bit element from z1 to all elements in the result vector z0. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47570 llvm-svn: 333871	2018-06-04 06:40:55 +00:00
Sander de Smalen	367a53b059	[AArch64][SVE] Asm: Support for FCPY immediate instructions. Predicated copy of floating-point immediate value to SVE vector, along with MOV-aliases. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: javed.absar Differential Revision: https://reviews.llvm.org/D47518 llvm-svn: 333869	2018-06-04 05:58:06 +00:00
Sander de Smalen	512d57f1a5	[AArch64][SVE] Asm: Support for CPY immediate instructions Predicated copy of possibly shifted immediate value into SVE vector, along with MOV-aliases. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47517 llvm-svn: 333868	2018-06-04 05:40:46 +00:00
Amara Emerson	5a3bb68e12	[AArch64][GlobalISel] Zero-extend s1 values when returning. Before we were relying on the any extend of the s1 to s32, but for AAPCS we need to zero-extend it to at least s8. Fixes PR36719 Differential Revision: https://reviews.llvm.org/D47425 llvm-svn: 333747	2018-06-01 13:20:32 +00:00
Sander de Smalen	f95ea047e5	[AArch64][SVE] Asm: Support for FDUP_ZI (copy fp immediate) instruction. Unpredicated copy of floating-point immediate value into SVE vector, along with MOV-aliases. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47482 llvm-svn: 333744	2018-06-01 12:54:46 +00:00
Sander de Smalen	97ca6b9e09	[AArch64][SVE] Asm: Support for DUPM (masked immediate) instruction. Unpredicated copy of repeating immediate pattern to SVE vector, along with MOV-aliases. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47328 llvm-svn: 333731	2018-06-01 07:25:46 +00:00
Francis Visoiu Mistrih	90aba024c5	[MC] Fallback on DWARF when generating compact unwind on AArch64 Instead of asserting when using the def_cfa directive with a register different from fp, fallback on DWARF. Easily triggered with: .cfi_def_cfa x1, 32; rdar://40249694 Differential Revision: https://reviews.llvm.org/D47593 llvm-svn: 333667	2018-05-31 16:33:26 +00:00
Luke Geeson	2e09995d42	[AArch64] Reverted rL333427 fixing Clang UnitTest Failure llvm-svn: 333634	2018-05-31 08:27:53 +00:00
Roman Tereshin	5a65eb75c7	[GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...) Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333618	2018-05-31 01:56:05 +00:00
Roman Tereshin	8f1753e994	[GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs This is to make it clear what kind of bugs the LegalizerInfo::verifier is able to catch and test its output Reviewers: aemerson, qcolombet Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D46338 llvm-svn: 333597	2018-05-30 22:10:04 +00:00
Tim Northover	d8949f5002	AArch64: print correct annotation for ADRP addresses. The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the actual PC-offset that will be calculated. llvm-svn: 333525	2018-05-30 09:54:59 +00:00
Sander de Smalen	bdf09fe7a2	[AArch64][AsmParser] Fix segfault on illegal fpimm. Floating point immediate combining a negative sign and a hexadecimal number, e.g. #-0x0 caused the compiler to crash. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: javed.absar Differential Revision: https://reviews.llvm.org/D47483 llvm-svn: 333524	2018-05-30 09:54:19 +00:00
Evandro Menezes	f8425340e4	[AArch64] Fix PR32384: bump up the number of stores per memset and memcpy As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change makes the inlining of `memset()` and `memcpy()` more aggressive when compiling for speed. The tuning remains the same when optimizing for size. Patch by: Sebastian Pop <s.pop@samsung.com> Evandro Menezes <e.menezes@samsung.com> Differential revision: https://reviews.llvm.org/D45098 llvm-svn: 333429	2018-05-29 15:58:50 +00:00
Amara Emerson	d5a9e7bbc9	Revert "[AArch64] added FP16 vcvth intrinsic support" This reverts commit r333410 due to bot failures. llvm-svn: 333427	2018-05-29 15:34:22 +00:00
Sander de Smalen	8704b03c4d	[AArch64][SVE] Asm: Support for predicated LSL/LSR (vectors) Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D47365 llvm-svn: 333422	2018-05-29 14:40:24 +00:00
Sander de Smalen	26b9b2a8c3	[AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions. This patch addresses the following variants: - bitmask immediate, e.g. 'and z0.d, z0.d, #0x6'. - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'. - predicated data vectors, e.g. 'and z0.d, p0/m, z0.d, z1.d'. And also several aliases, such as: - ORN, alias of ORR. - EON, alias of EOR. - BIC, alias of AND (immediate variant) - MOV, alias of ORR (if unpredicated and source register operands are the same) Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47363 llvm-svn: 333414	2018-05-29 13:08:43 +00:00
Luke Geeson	16092ab3c5	[AArch64] added FP16 vcvth intrinsic support Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705 Reviewers: javed.absar, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: llvm-commits, SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46311 llvm-svn: 333410	2018-05-29 11:40:33 +00:00
Sander de Smalen	98686c6b15	[AArch64][SVE] Asm: Support for ADD (immediate) instructions. This patch adds addsub_imm8_opt_lsl_(i8|i16|i32|i64) operands that are unsigned values in the range 0 to 255. For element widths of 16 bits or higher it may also be a signed multiple of 256 in the range 0 to 65280. Note: This also does some refactoring to reuse convenience function getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be accepted and mapped to SUB #4096. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47310 llvm-svn: 333408	2018-05-29 10:39:49 +00:00
Sander de Smalen	6e2a5b4cf0	Fix ubsan errors introduced by r333263 re. left-shifting negative values. llvm-svn: 333270	2018-05-25 11:41:04 +00:00
Sander de Smalen	62770795a5	[AArch64][SVE] Asm: Support for DUP (immediate) instructions. Unpredicated copy of optionally-shifted immediate to SVE vector, along with MOV-aliases. This patch contains parsing and printing support for cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in the range -128 to +127. For element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512. For element-width of 8 bits a range of -128 to 255 is accepted, since a copy of a byte can be considered either signed/unsigned. Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift() and moves the behaviour of trying to shift a plain immediate by an allowed shift-value to its addImmWithOptionalShiftOperands() method, so that the parsing itself is generic and allows immediates from multiple shifted operands. This is done because an immediate can be divisible by both shifted operands. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D47309 llvm-svn: 333263	2018-05-25 09:47:52 +00:00
Eli Friedman	9e177882aa	[AArch64] Improve orr+movk sequences for MOVi64imm. The existing code has three different ways to try to lower a 64-bit immediate to the sequence ORR+MOVK. The result is messy: it misses some possible sequences, and the order of the checks means we sometimes emit two MOVKs when we only need one. Instead, just use a simple loop to try all possible two-instruction ORR+MOVK sequences. Differential Revision: https://reviews.llvm.org/D47176 llvm-svn: 333218	2018-05-24 19:38:23 +00:00
Geoff Berry	98150e3a62	[AArch64] Take advantage of variable shift/rotate amount implicit mod operation. Summary: Optimize code generated for variable shifts/rotates by taking advantage of the implicit and/mod done on the variable shift amount register. Resolves bug 27582 and bug 37421. Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46844 llvm-svn: 333214	2018-05-24 18:29:42 +00:00
Chad Rosier	3f66363139	[CodeGen][AArch64] Use RegUnits to track register aliases. (NFC) Use RegUnits to track register aliases in AArch64RedundantCopyElimination. Differential Revision: https://reviews.llvm.org/D47269 llvm-svn: 333107	2018-05-23 17:49:38 +00:00
Alex Bradbury	0a59f18951	[AArch64] Use addAliasForDirective to support data directives The AArch64 asm parser currently has custom parsing logic for .hword, .word, and .xword. Rather than use this custom logic, we can just use addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue. Differential Revision: https://reviews.llvm.org/D47000 llvm-svn: 333077	2018-05-23 11:17:20 +00:00
Eli Friedman	785acce51d	Delete unused variable from r333015. (The assertion suppressed the unused variable warning on Release+Asserts builds, so I didn't notice.) llvm-svn: 333018	2018-05-22 19:38:07 +00:00
Eli Friedman	042dc9e092	[MachineOutliner] Add "thunk" outlining for AArch64. When we're outlining a sequence that ends in a call, we can save up to three instructions in the outlined function by turning the call into a tail-call. I refer to this as thunk outlining because the resulting outlined function looks like a thunk; suggestions welcome for a better name. In addition to making the outlined function shorter, thunk outlining allows outlining calls which would otherwise be illegal to outline: we don't need to save/restore LR, so we don't need to prove anything about the stack access patterns of the callee. To make this work effectively, I also added MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner code; this allows treating an arbitrary instruction as a terminator in the suffix tree. Differential Revision: https://reviews.llvm.org/D47173 llvm-svn: 333015	2018-05-22 19:11:06 +00:00
Roman Lebedev	7772de25d0	[DAGCombine][X86][AArch64] Masked merge unfolding: vector edition. Summary: This appears to be the last missing piece for the masked merge pattern handling in the backend. This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated. Differential Revision: https://reviews.llvm.org/D46528 llvm-svn: 332904	2018-05-21 21:41:02 +00:00
Peter Collingbourne	dcd7d6c331	MC: Separate creating a generic object writer from creating a target object writer. NFCI. With this we gain a little flexibility in how the generic object writer is created. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47045 llvm-svn: 332868	2018-05-21 19:20:29 +00:00
Peter Collingbourne	571a3301ae	MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI. To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47035 llvm-svn: 332857	2018-05-21 17:57:19 +00:00
Peter Collingbourne	e3f652973e	Support: Simplify endian stream interface. NFCI. Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47032 llvm-svn: 332757	2018-05-18 19:46:24 +00:00
Peter Collingbourne	f7b81db715	MC: Change the streamer ctors to take an object writer instead of a stream. NFCI. The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47050 llvm-svn: 332749	2018-05-18 18:26:45 +00:00
Clement Courbet	8892c7db08	[ExynosM3] Fix scheduling info. Differential Revision: https://reviews.llvm.org/D46356 llvm-svn: 332713	2018-05-18 13:10:41 +00:00
Eli Friedman	4081a57af7	[MachineOutliner] Count savings from outlining in bytes. Counting the number of instructions is both unintuitive and inaccurate. On AArch64, this only affects the generated remarks and certain rare pseudo-instructions, but it will have a bigger impact on other targets. Differential Revision: https://reviews.llvm.org/D46921 llvm-svn: 332685	2018-05-18 01:52:16 +00:00
Sander de Smalen	75cfa34156	[AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+scalar) store instructions. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46680 llvm-svn: 332584	2018-05-17 09:05:41 +00:00
Eli Friedman	ddbf6d6514	[MachineOutliner] Don't outline instructions that modify SP. This breaks the code which saves and restores LR, so we can't outline without doing something more complicated for stack adjustment. Found by inspection; we get lucky in most cases because getMemOpInfo only handles STRWpost, not any other pre/post-increment forms. But it hits a couple of artificial testcases in the tree. Differential Revision: https://reviews.llvm.org/D46920 llvm-svn: 332529	2018-05-16 21:20:16 +00:00
Eli Friedman	02709bcb78	[MachineOutliner] Don't save/restore LR for tail calls. The cost computation assumes we do this correctly, but the actual lowering was wrong. Differential Revision: https://reviews.llvm.org/D46923 llvm-svn: 332514	2018-05-16 19:49:01 +00:00
Sander de Smalen	22176a2242	[AArch64][SVE] Improve diagnostics for vectors with incorrect element-size. For regular SVE vector operands, this patch introduces a more sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b). For example: add z0.s, z1.s, z2.b -> invalid element width ^_____^ mismatch For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes a slightly different approach and instead returns an 'invalid operand' if the element size is not as expected. This is because the diagnostics are more specific, suggesting the right shift/extend suffix. This is a trade-off not to introduce more operand classes and still provide useful diagnostics for LD1 and PRF instructions. For example: ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw)' ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand ^________________^ mismatch For gather prefetches, both 'z0.s' and 'z0.d' would be allowed: prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw) #2' prfw #0, p0, [x0, z0.d] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2' Without this change, the diagnostic would unnecessarily suggest a different element size: prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2' Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46688 llvm-svn: 332483	2018-05-16 15:45:17 +00:00
Sirish Pande	cabe50a308	[AArch64] Gangup loads and stores for pairing. Keep loads and stores together (target defines how many loads and stores to gang up), such that it will help in pairing and vectorization. Differential Revision https://reviews.llvm.org/D46477 llvm-svn: 332482	2018-05-16 15:36:52 +00:00
Sander de Smalen	bbc4e9a4e3	[AArch64][SVE] Asm: Support for gather PRF prefetch instructions Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46686 llvm-svn: 332472	2018-05-16 14:16:01 +00:00
Amara Emerson	0d6a26dffc	[GlobalISel][IRTranslator] Split aggregates during IR translation. We currently handle all aggregates by creating one large LLT, and letting the legalizer deal with splitting them up. However using this approach means that we can't support big endian code correctly. This patch changes the way that the IRTranslator deals with aggregate values, by splitting them up into their constituent element values. To do this, parts of the translator need to be modified to deal with multiple VRegs for a single Value. A new Value to VReg mapper is introduced to help keep compile time under control, currently there is no measurable impact on CTMark despite the extra code being generated in some cases. Patch is based on the original work of Tim Northover. Differential Revision: https://reviews.llvm.org/D46018 llvm-svn: 332449	2018-05-16 10:32:02 +00:00
Peter Smith	c811758da6	[AArch64] Support "S" inline assembler constraint This patch re-introduces the "S" inline assembler constraint. This matches an absolute symbolic address or a label reference. The primary use case is asm("adrp %0, %1\n\t" "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var)); I say re-introduces as it seems like "S" was implemented in the original AArch64 backend, but it looks like it wasn't carried forward to the merged backend. The original implementation had A and L modifiers that could be used to print ":lo12:" to the string. It looks like gcc doesn't use these and :lo12: is expected to be written in the inline assembly string so I've not implemented A and L. Clang already supports the S modifier. Fixes PR37180 Differential Revision: https://reviews.llvm.org/D46745 llvm-svn: 332444	2018-05-16 09:33:25 +00:00
Sander de Smalen	a680f558be	[AArch64][SVE] Asm: Support for structured LD2, LD3 and LD4 (scalar+scalar) load instructions. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D46679 llvm-svn: 332442	2018-05-16 09:16:20 +00:00
Sander de Smalen	67f9154964	[AArch64][SVE] Asm: Support for contiguous PRF prefetch instructions. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46682 llvm-svn: 332433	2018-05-16 07:50:09 +00:00
Evandro Menezes	8d522d811a	[AArch64] Improve single vector lane unscaled stores When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead. In this case, also using the unscaled offset. Differential revision: https://reviews.llvm.org/D46762 llvm-svn: 332394	2018-05-15 20:41:12 +00:00
Evandro Menezes	14fa2e4fa5	[AArch64] Improve single vector lane stores When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead. Differential revision: https://reviews.llvm.org/D46655 llvm-svn: 332251	2018-05-14 15:26:35 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually change DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Sander de Smalen	93380371bb	[AArch64][SVE] Extend parsing of Prefetch operation for SVE. Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D46681 llvm-svn: 332234	2018-05-14 11:54:41 +00:00
Geoff Berry	60460268c0	[AArch64] Fix performPostLD1Combine to check for constant lane index. Summary: performPostLD1Combine in AArch64ISelLowering looks for vector insert_vector_elt of a loaded value which it can optimize into a single LD1LANE instruction. The code checking for the pattern was not checking if the lane index was a constant, which could cause two problems: - an assert when lowering the LD1LANE ISD node since it assumes a constant operand - an assert in isel if the lane index value depends on the post-incremented base register Both of these issues are avoided by simply checking that the lane index is a constant. Fixes bug 35822. Reviewers: t.p.northover, javed.absar Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46591 llvm-svn: 332103	2018-05-11 16:25:06 +00:00
Haicheng Wu	0aae2bc260	[CGP] Split large data structures to sink more GEPs Accessing the members of a large data structure needs a lot of GEPs which usually have large offsets due to the size of the underlying data structure. If the offsets are too large to fit into the r+i addressing mode, these GEPs cannot be sunk to their users' blocks and many extra registers are then needed to carry the values of these GEPs. This patch tries to split a large data struct starting from %base like the following. Before: BB0: %base = BB1: %gep0 = gep %base, off0 %gep1 = gep %base, off1 %gep2 = gep %base, off2 BB2: %load1 = load %gep0 %load2 = load %gep1 %load3 = load %gep2 After: BB0: %base = %new_base = gep %base, off0 BB1: %new_gep0 = %new_base %new_gep1 = gep %new_base, off1 - off0 %new_gep2 = gep %new_base, off2 - off0 BB2: %load1 = load i32, i32* %new_gep0 %load2 = load i32, i32* %new_gep1 %load3 = load i32, i32* %new_gep2 In the above example, the struct is split into two parts. The first part still starts from %base and the second part starts from %new_base. After the splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to BB2 and folded into their users. The algorithm to split data structure is simple and very similar to the work of merging SExts. First, it collects GEPs that have large offsets when iterating the blocks. Second, it splits the underlying data structures and updates the collected GEPs to use smaller offsets. Differential Revision: https://reviews.llvm.org/D42759 llvm-svn: 332015	2018-05-10 18:27:36 +00:00
Adhemerval Zanella	f384bc7166	[AArch64] Improve cost of vector division by constant With custom lowering for vector MULLH{S,U}, it is now profitable to vectorize a divide by constant loop for the custom types (v16i8, v8i16, and v4i32). The cost is based on TargetLowering::Build{S,U}DIV which uses a multiply by constant plus adjustment to express a divide by constant. Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by Instruction::AShr. llvm-svn: 331873	2018-05-09 12:48:22 +00:00
Daniel Sanders	618437459c	Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Reverting this to see if the clang-cmake-aarch64-global-isel and clang-cmake-aarch64-quick bots are failing because of this commit. We know it wasn't r331819. llvm-svn: 331846	2018-05-09 05:00:17 +00:00
Shiva Chen	801bf7ebbe	[DebugInfo] Examine all uses of isDebugValue() for debug instructions. Because we create a new kind of debug instruction, DBG_LABEL, we need to check all passes which use isDebugValue() to check MachineInstr is debug instruction or not. When expelling debug instructions, we should expel both DBG_VALUE and DBG_LABEL. So, I create a new function, isDebugInstr(), in MachineInstr to check whether the MachineInstr is debug instruction or not. This patch has no new test case. I have run regression test and there is no difference in regression test. Differential Revision: https://reviews.llvm.org/D45342 Patch by Hsiangkai Wang. llvm-svn: 331844	2018-05-09 02:42:00 +00:00
Daniel Sanders	d24dcdd1f7	[globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Reviewed By: aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 llvm-svn: 331816	2018-05-08 22:26:39 +00:00
Sander de Smalen	d8e76494fc	[AArch64][SVE] Asm: Support for LD1R load-and-replicate scalar instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D46251 llvm-svn: 331758	2018-05-08 10:46:55 +00:00
Sander de Smalen	20eede7093	[AArch64] Disallow vector operand if FPR128 Q register is required. Patch https://reviews.llvm.org/D41445 changed the behaviour of 'isReg()' to also return 'true' if the parsed register operand is a vector register. Code in the AsmMatcher checks if a register is a subclass of the expected register class. However, even though both parsed registers map to the same physical register, the 'v' register is of kind 'NeonVector', where 'q' is of type Scalar, where isSubclass() does not distinguish between the two cases. The solution is to use an AsmOperand instead of the register directly, and use the PredicateMethod to distinguish the two operands. This fixes for example: ldr v0, [x0] // 'v0' is an invalid operand for this instruction ldr q0, [x0] // valid Reviewers: aemerson, Gerolf, SjoerdMeijer, javed.absar Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D46310 llvm-svn: 331755	2018-05-08 10:01:04 +00:00
Daniel Sanders	f84bc3793e	[globalisel] Update GlobalISel emitter to match new representation of extending loads Summary: Previously, an extending load was represented as (G_EXT (G_LOAD x)). This had a few drawbacks: G_LOAD had to be legal for all sizes you could extend from, even if registers didn't naturally hold those sizes. * All sizes you could extend from had to be allocatable just in case the extend went missing (e.g. by optimization). * At minimum, G_EXT and G_TRUNC had to be legal for these sizes. As we improve optimization of extends and truncates, this legality requirement would spread without considerable care w.r.t when certain combines were permitted. The SelectionDAG importer required some ugly and fragile pattern rewriting to translate patterns into this style. This patch changes the representation to: * (G_[SZ]EXTLOAD x) * (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits() which resolves these issues by allowing targets to work entirely in their native register sizes, and by having a more direct translation from SelectionDAG patterns. Each extending load can be lowered by the legalizer into separate extends and loads, however a target that supports s1 will need the any-extending load to extend to at least s8 since LLVM does not represent memory accesses smaller than 8 bit. The legalizer can widenScalar G_LOAD into an any-extending load but sign/zero-extending loads need help from something else like a combiner pass. A follow-up patch that adds combiner helpers for this will follow. The new representation requires that the MMO correctly reflect the memory access so this has been corrected in a couple tests. I've also moved the extending loads to their own tests since they are (mostly) separate opcodes now. Additionally, the re-write appears to have invalidated two tests from select-with-no-legality-check.mir since the matcher table no longer contains loads that result in s1's and they aren't legal in AArch64 anymore. Depends on D45540 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar Reviewed By: rtereshin Subscribers: javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45541 llvm-svn: 331601	2018-05-05 20:53:24 +00:00
Craig Topper	781aa181ab	Fix a bunch of places where operator-> was used directly on the return from dyn_cast. Inspired by r331508, I did a grep and found these. Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa. llvm-svn: 331577	2018-05-05 01:57:00 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Adhemerval Zanella	a57ef17ab6	[AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32 This patch adds a custom lowering for ISD::MULH{S,U} used on divide by constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV). New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL can be correctly lowered to smull2 and umull2. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46009 llvm-svn: 331522	2018-05-04 14:33:55 +00:00
Martin Storsjo	d0b5034b8a	[COFF, ARM64] Hook up a few remaining relocations Differential Revision: https://reviews.llvm.org/D46355 llvm-svn: 331384	2018-05-02 18:24:37 +00:00
Sander de Smalen	659a48cd38	[AArch64][SVE] Asm: Support for LDR/STR fill and spill instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D46270 llvm-svn: 331352	2018-05-02 13:32:39 +00:00
Sander de Smalen	57da042e32	[AArch64][SVE] Asm: Support for scatter ST1 store instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46248 llvm-svn: 331349	2018-05-02 13:00:30 +00:00
Sander de Smalen	414d2358a4	[AArch64][SVE] Asm: Support for non-temporal, contiguous LDNT1/STNT1 load/store instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D46269 llvm-svn: 331343	2018-05-02 11:48:49 +00:00
Sander de Smalen	c1e44bdfc7	[AArch64][SVE] Asm: Support for LD1RQ load-and-replicate quad-word vector instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46250 llvm-svn: 331339	2018-05-02 08:49:08 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Sander de Smalen	788dc70c78	[AArch64][SVE] Asm: Support for contiguous ST1 (scalar+scalar) store instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D46121 llvm-svn: 331260	2018-05-01 13:36:03 +00:00
Daniel Sanders	2de9d4ad5d	Fix infinite loop after r331115 There are two separate fixes here: * The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction. * The target should not be requesting lowering of non-extending loads. llvm-svn: 331201	2018-04-30 17:20:01 +00:00
Sander de Smalen	5861c263e0	[AArch64][SVE] Asm: Improve diagnostics for gather loads. This patch extends the 'isSVEVectorRegWithShiftExtend' function to improve diagnostics for SVE's gather load (scalar + vector) addressing modes. Instead of always suggesting the 'unscaled' addressing mode, the use of DiagnosticPredicate enables a more specific error message in the context where the scaling is incorrect. For example: ld1h z0.d, p0/z, [x0, z0.d, lsl #2] ^ shift amount should be '1' Instead of suggesting the packed, unscaled addressing mode: expected 'z[0..31].d, (uxtw\|sxtw)' the assembler now suggests using the proper scaling: expected 'z[0..31].d, (lsl\|uxtw\|sxtw) #1' Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D46124 llvm-svn: 331162	2018-04-30 07:24:38 +00:00
Sander de Smalen	afe1ee2180	[AArch64][AsmParser] NFC: Cleanup of addOperands functions Most of the add<operandname>Operands() functions are the same and can be replaced by using a single 'RenderMethod' in the AArch64InstrFormats.td file. Since many of the scaled immediates (with different scaling/bits) are the same, most of these can reuse the same AsmOperandClass. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D46122 llvm-svn: 331146	2018-04-29 18:18:21 +00:00
Sander de Smalen	50ded90072	[AArch64][SVE] Asm: Support for gather LD1/LDFF1 (vector + imm) load instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D46120 llvm-svn: 331145	2018-04-29 17:33:38 +00:00
Daniel Sanders	5eb9f581b6	[globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them Summary: Previously, an extending load was represented as (G_EXT (G_LOAD x)). This had a few drawbacks: G_LOAD had to be legal for all sizes you could extend from, even if registers didn't naturally hold those sizes. * All sizes you could extend from had to be allocatable just in case the extend went missing (e.g. by optimization). * At minimum, G_EXT and G_TRUNC had to be legal for these sizes. As we improve optimization of extends and truncates, this legality requirement would spread without considerable care w.r.t when certain combines were permitted. The SelectionDAG importer required some ugly and fragile pattern rewriting to translate patterns into this style. This patch begins changing the representation to: * (G_[SZ]EXTLOAD x) * (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits() which resolves these issues by allowing targets to work entirely in their native register sizes, and by having a more direct translation from SelectionDAG patterns. This patch introduces the new generic instructions and new variation on G_LOAD and adds lowering for them to convert back to the existing representations. Depends on D45466 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar Reviewed By: aemerson Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45540 llvm-svn: 331115	2018-04-28 18:14:50 +00:00
Jessica Paquette	0b6724917a	[MachineOutliner] Add defs to calls + don't track liveness on outlined functions This commit makes it so that if you outline a def of some register, then the call instruction created by the outliner actually reflects that the register is defined by the call. It also makes it so that outlined functions don't have the TracksLiveness property. Outlined calls shouldn't break liveness assumptions that someone might make. This also un-XFAILs the noredzone test, and updates the calls test. llvm-svn: 331095	2018-04-27 23:36:35 +00:00
Daniel Sanders	27fe8a5011	[globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand Summary: Currently only the memory size is supported but others can be added as needed. narrowScalar for G_LOAD and G_STORE now correctly update the MachineMemOperand and will refuse to legalize atomics since those need more careful expansions to maintain atomicity. Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar Reviewed By: aemerson Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45466 llvm-svn: 331071	2018-04-27 19:48:53 +00:00
Jun Bum Lim	47aece1344	[CodeGen] Use RegUnits to track register aliases (NFC) Summary: Use RegUnits to track register aliases in PostRASink and AArch64LoadStoreOptimizer. Reviewers: thegameg, mcrosier, gberry, qcolombet, sebpop, MatzeB, t.p.northover, javed.absar Reviewed By: thegameg, sebpop Subscribers: javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45695 llvm-svn: 331066	2018-04-27 18:44:37 +00:00
Francis Visoiu Mistrih	c855e92ca9	[AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true Put the first ldp at the end, so that the load-store optimizer can run and merge the ldp and the add into a post-index ldp. This didn't work in case no frame was needed and resulted in code size regressions. llvm-svn: 331044	2018-04-27 15:30:54 +00:00
Oliver Stannard	76088a5929	[AArch64] Codegen for v8.2A dot product intrinsics This adds IR intrinsics for the AArch64 dot-product instructions introduced in v8.2-A. Differential revision: https://reviews.llvm.org/D46107 llvm-svn: 331036	2018-04-27 13:45:32 +00:00
Eli Friedman	da018e5687	[MachineOutliner] Don't outline from functions with a section marking. The program might have unusual expectations for functions; for example, the Linux kernel's build system warns if it finds references from .text to .init.data. I'm not sure this is something we actually want to make any guarantees about (there isn't any explicit rule that would disallow outlining in this case), but we might want to be conservative anyway. Differential Revision: https://reviews.llvm.org/D46091 llvm-svn: 331007	2018-04-27 00:21:34 +00:00
Geoff Berry	08ab8c9544	[AArch64] Fix scavenged spill slot base when stack realignment required. Summary: Use the FP for scavenged spill slot accesses to prevent corruption of the callee-save region when the SP is re-aligned. Based on problem and patch reported by @paulwalker-arm This is an alternative to solution proposed in D45770 Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46063 llvm-svn: 330976	2018-04-26 18:50:45 +00:00
Matthew Simpson	b4096ebe26	[TTI, AArch64] Add transpose shuffle kind This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost. Differential Revision: https://reviews.llvm.org/D45982 llvm-svn: 330941	2018-04-26 13:48:33 +00:00
Sander de Smalen	fe17a78b86	[AArch64][SVE] Enable DiagnosticPredicates for SVE LD1 instructions. This patch extends the PredicateMethod of AsmOperands used in SVE's LD1 instructions with a DiagnosticPredicate. This makes them 'context sensitive' to the operand that has been parsed and tells the user to use the right register (with expected shift/extend), rather than telling the immediate is out of range when it actually parsed a register. Patch [2/2] in a series to improve assembler diagnostics for SVE: - Patch [1/2]: https://reviews.llvm.org/D45879 - Patch [2/2]: https://reviews.llvm.org/D45880 Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D45880 llvm-svn: 330934	2018-04-26 12:54:42 +00:00
Sander de Smalen	74f9e6720b	[AArch64][SVE] Asm: Support for gather LD1/LDFF1 (scalar + vector) load instructions. Patch [2/3] in series to add support for SVE's gather load instructions that use scalar+vector addressing modes: - Patch [1/3]: https://reviews.llvm.org/D45951 - Patch [2/3]: https://reviews.llvm.org/D46023 - Patch [3/3]: https://reviews.llvm.org/D45958 Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D46023 llvm-svn: 330928	2018-04-26 08:19:53 +00:00
Amara Emerson	1f5d994119	[AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic. rdar://38674040 llvm-svn: 330831	2018-04-25 14:43:59 +00:00
Sander de Smalen	eb896b148b	[AArch64][SVE] Asm: Add AsmOperand classes for SVE gather/scatter addressing modes. This patch adds parsing support for 'vector + shift/extend' and corresponding asm operand classes, needed for implementing SVE's gather/scatter addressing modes. The added combinations of vector (ZPR) and Shift/Extend are: Unscaled: ZPR64ExtLSL8: signed 64-bit offsets (z0.d) ZPR32ExtUXTW8: unsigned 32-bit offsets (z0.s, uxtw) ZPR32ExtSXTW8: signed 32-bit offsets (z0.s, sxtw) Unpacked and unscaled: ZPR64ExtUXTW8: unsigned 32-bit offsets (z0.d, uxtw) ZPR64ExtSXTW8: signed 32-bit offsets (z0.d, sxtw) Unpacked and scaled: ZPR64ExtUXTW<scale>: unsigned 32-bit offsets (z0.d, uxtw #<shift>) ZPR64ExtSXTW<scale>: signed 32-bit offsets (z0.d, sxtw #<shift>) Scaled: ZPR32ExtUXTW<scale>: unsigned 32-bit offsets (z0.s, uxtw #<shift>) ZPR32ExtSXTW<scale>: signed 32-bit offsets (z0.s, sxtw #<shift>) ZPR64ExtLSL<scale>: unsigned 64-bit offsets (z0.d, lsl #<shift>) ZPR64ExtLSL<scale>: signed 64-bit offsets (z0.d, lsl #<shift>) Patch [1/3] in series to add support for SVE's gather load instructions that use scalar+vector addressing modes: - Patch [1/3]: https://reviews.llvm.org/D45951 - Patch [2/3]: https://reviews.llvm.org/D46023 - Patch [3/3]: https://reviews.llvm.org/D45958 Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D45951 llvm-svn: 330805	2018-04-25 09:26:47 +00:00
Jessica Paquette	4f56428de1	[MachineOutliner] Check for explicit uses of LR/W30 in MI operands Before, the outliner would grab ADRPs that used LR/W30. This patch fixes that by checking for explicit uses of those registers before the special-casing for ADRPs. This also adds a test that ensures that those sorts of ADRPs won't be outlined. llvm-svn: 330783	2018-04-24 22:38:15 +00:00
Sander de Smalen	eb1053f9d3	[AArch64][SVE] Asm: Support for contiguous, first-faulting LDFF1 (scalar+scalar) load instructions. Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar Reviewed By: rengolin Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45946 llvm-svn: 330697	2018-04-24 08:59:08 +00:00
Peter Collingbourne	5ab4a4793e	Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure. This reland includes a check to prevent the DAG combiner from folding an offset that is smaller than the existing one. This can cause oscillations between two possible DAGs, which was the cause of the hang and later assertion failure observed on the lnt-ctmark-aarch64-O3-flto bot. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ Original commit message: > This is a code size win in code that takes offseted addresses > frequently, such as C++ constructors that typically need to compute > an offseted address of a vtable. This reduces the size of Chromium > for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 330630	2018-04-23 19:09:34 +00:00
Nico Weber	5d53aed419	Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt llvm-svn: 330584	2018-04-23 12:49:34 +00:00
Sander de Smalen	7893f722b2	[AArch64][SVE] Asm: Support for contiguous, non-faulting LDNF1 (scalar+imm) load instructions Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45684 llvm-svn: 330583	2018-04-23 12:43:19 +00:00
Sander de Smalen	1b6d374422	[AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+imm) store instructions. Reviewers: fhahn, rengolin, javed.absar, SjoerdMeijer, t.p.northover, echristo, evandro, huntergr Reviewed By: rengolin Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45681 llvm-svn: 330565	2018-04-23 07:50:35 +00:00
Jessica Paquette	2e5ada5c81	[MachineOutliner] Change B instruction for tail calls to TCRETURNdi First off, this is more correct than having the B. Second off, this was making a bot upset. This fixes that. Update the test to include -verify-machineinstrs as well to prevent stuff like this slipping by non debug/assert builds in the future. llvm-svn: 330459	2018-04-20 18:03:21 +00:00
Sander de Smalen	30f9f11d51	[AArch64][SVE] Asm: Support for contiguous LD1 (scalar+scalar) load instructions. This is patch [4/4] in a series to add assembler/disassembler support for SVE's contiguous LD1 (scalar+scalar) instructions: - Patch [1/4]: https://reviews.llvm.org/D45687 - Patch [2/4]: https://reviews.llvm.org/D45688 - Patch [3/4]: https://reviews.llvm.org/D45689 - Patch [4/4]: https://reviews.llvm.org/D45690 Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D45690 llvm-svn: 330423	2018-04-20 12:52:01 +00:00
Sander de Smalen	137efb231e	[AArch64][SVE] Fix diagnostic for SVE LD4 instructions: Diagnostic: 'index must be multiple of 3 in range [-32, 28]' Must be: 'index must be multiple of 4 in range [-32, 28]' llvm-svn: 330407	2018-04-20 09:45:50 +00:00
Sander de Smalen	367694b093	[AArch64][SVE] Added GPR64shifted and GPR64NoXZRshifted register classes. Summary: This is patch [3/4] in a series to add assembler/disassembler support for SVE's contiguous LD1 (scalar+scalar) instructions: - Patch [1/4]: https://reviews.llvm.org/D45687 - Patch [2/4]: https://reviews.llvm.org/D45688 - Patch [3/4]: https://reviews.llvm.org/D45689 - Patch [4/4]: https://reviews.llvm.org/D45690 Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: SjoerdMeijer Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45689 llvm-svn: 330406	2018-04-20 08:54:49 +00:00
Sander de Smalen	149916d29a	[AArch64][AsmParser] Extend RegOp with integrated 'shift/extend'. Summary: In some cases the shift/extend needs to be explicitly parsed together with the register, rather than as a separate operand. This is needed for addressing modes where the instruction as a whole dictates the scaling/extend, rather than specific bits in the instruction. By parsing them as a single operand, we avoid the need to pass an extra operand in all CodeGen patterns (because all operands need to have an associated value), and we avoid the need to update TableGen to accept operands that have no associated bits in the instruction. An added benefit of parsing them together is that the assembler can give a sensible diagnostic if the scaling is not correct. This is patch [2/4] in a series to add assembler/disassembler support for SVE's contiguous LD1 (scalar+scalar) instructions: - Patch [1/4]: https://reviews.llvm.org/D45687 - Patch [2/4]: https://reviews.llvm.org/D45688 - Patch [3/4]: https://reviews.llvm.org/D45689 - Patch [4/4]: https://reviews.llvm.org/D45690 Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: fhahn, SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45688 llvm-svn: 330394	2018-04-20 07:24:20 +00:00
Sander de Smalen	50d8702f26	[AArch64][AsmParser] NFC: Cleanup parsing of scalar registers. Summary: - Renamed tryParseRegister to tryParseScalarRegister, which now returns an OperandMatchResultTy. - Moved matching of certain aliases into matchRegisterNameAlias. - Changed type of most 'Reg' variables to 'unsigned'. This is patch [1/4] in a series to add assembler/disassembler support for SVE's contiguous LD1 (scalar+scalar) instructions: - Patch [1/4]: https://reviews.llvm.org/D45687 - Patch [2/4]: https://reviews.llvm.org/D45688 - Patch [3/4]: https://reviews.llvm.org/D45689 - Patch [4/4]: https://reviews.llvm.org/D45690 Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro, samparker Reviewed By: samparker Subscribers: samparker, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45687 llvm-svn: 330311	2018-04-19 07:35:08 +00:00
Amara Emerson	9de072f8ae	[AArch64] Add isel pattern for v8i8->v2f32 NVCASTs. rdar://39454635 llvm-svn: 330276	2018-04-18 17:10:19 +00:00
Sander de Smalen	7a210db81e	[AArch64][SVE] Asm: Support for structured LD4 (scalar+imm) load instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45624 llvm-svn: 330120	2018-04-16 10:46:18 +00:00
Sander de Smalen	d239eb3ce3	[AArch64][SVE] Asm: Support for structured LD3 (scalar+imm) load instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45623 llvm-svn: 330116	2018-04-16 10:10:48 +00:00
Sander de Smalen	f836af869d	[AArch64][SVE] Asm: Support for structured LD2 (scalar+imm) load instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45622 llvm-svn: 330108	2018-04-16 07:09:29 +00:00
Tim Northover	271d3d2771	MachO: trap unreachable instructions Debuggability is more important than saving 4 bytes to let us fall through to nonsense. llvm-svn: 330073	2018-04-13 22:25:20 +00:00
Peter Collingbourne	ab40e44ba1	Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses." Caused a hang and eventually an assertion failure in LTO builds of 7zip-benchmark on aarch64 iOS targets. http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/ llvm-svn: 330063	2018-04-13 20:21:00 +00:00
Sander de Smalen	5b12db5d23	[AArch64][SVE] Asm: Support for contiguous LD1 (scalar+imm) load instructions Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45618 llvm-svn: 330024	2018-04-13 14:41:36 +00:00
Sander de Smalen	5c62598b0d	[AArch64][SVE] Asm: Support for contiguous ST1 (scalar+imm) store instructions. Summary: Added instructions for contiguous stores, ST1, with scalar+imm addressing modes and corresponding tests. The patch also adds parsing of 'mul vl' as needed for the VL-scaled immediate. This is patch [6/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45432 llvm-svn: 330014	2018-04-13 12:56:14 +00:00
Sander de Smalen	ea626e37ba	[AArch64][SVE] Asm: Add support for parsing and printing SVE vector lists. Summary: Added Z_(b\|h\|s\|d) vector list RegisterOperands along with support to add/print the vector lists. This is patch [5/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: fhahn Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45431 llvm-svn: 330000	2018-04-13 09:11:53 +00:00
Peter Collingbourne	00db326b0d	AArch64: Introduce a DAG combine for folding offsets into addresses. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. This reduces the size of Chromium for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329956	2018-04-12 21:23:55 +00:00
Jessica Paquette	8aa6cd5cb9	[AArch64] Move AFI->setRedZone(false) to top of emitPrologue AFI->setRedZone(false) was put in the wrong place before, and so it only fired on functions that didn't have stack frames. This moves that to the top of emitPrologue to make sure that every function without a redzone has it set correctly. This also adds a function representing one of the early exit cases (GHC calling convention) to the MachineOutliner noredzone test to ensure that we can outline from functions like these, where we never use a redzone. llvm-svn: 329922	2018-04-12 16:16:18 +00:00
Sander de Smalen	525e3225c2	[AArch64][AsmParser] Unify 'addVectorListOperands' functions. Summary: Merged 'addVectorList64Operands' and 'addVectorList128Operands' into a generic 'addVectorListOperands', which can be easily extended to work for SVE vectors. This is patch [4/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45430 llvm-svn: 329909	2018-04-12 13:19:32 +00:00
Sander de Smalen	650234ba36	[AArch64][AsmParser] Make parse function for VectorLists generic to other vector types. Summary: Added 'RegisterKind' to the VectorListOp structure, so that this operand type can be reused for SVE vector lists in a later patch. It also refactors the 'tryParseVectorList' function so it can be used directly in the ParserMethod of an operand. The parsing can now parse multiple kinds of vectors and recover if there is no match. This is patch [3/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45429 llvm-svn: 329900	2018-04-12 11:40:52 +00:00
Sander de Smalen	c88f9a1a57	[AArch64][AsmParser] Split index parsing from vector list. Summary: Place parsing of a vector index into a separate function to reduce duplication, since the code is duplicated in both the parsing of a Neon vector register operand and a Neon vector list. This is patch [2/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: rengolin Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45428 llvm-svn: 329809	2018-04-11 14:10:37 +00:00
Francis Visoiu Mistrih	6463922e3a	[AArch64] Fix regression after r329691 In r329691, we would choose FP even if the offset wouldn't fit, just because the offset is smaller than the one from BP. This made many accesses through FP need to scavenge a register, which resulted in slower and bigger code for no good reason. This patch now always picks the offset that fits first, even if FP is preferred. llvm-svn: 329797	2018-04-11 12:36:55 +00:00
Sander de Smalen	73937b7c9d	[AArch64][AsmParser] Unify code for parsing Neon/SVE vectors. Summary: Merged 'tryMatchVectorRegister' (specific to Neon) and 'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and created a generic 'parseVectorKind()' function that returns the #Elements and ElementWidth of a vector suffix. This reduces the duplication of this functionality between the two vector implementations. This is patch [1/6] in a series to add assembler/disassembler support for SVE's contiguous ST1 (scalar+imm) instructions. Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro Reviewed By: fhahn Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45427 llvm-svn: 329782	2018-04-11 07:36:10 +00:00
Geoff Berry	5696e075c3	[AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass. Summary: When inserting MOVs to avoid Falkor HWPF collisions, the non-base register operand of load instructions (e.g. a register offset) was not being considered live, so it could potentially have been used as a scratch register, clobbering the actual offset value. Reviewers: mcrosier Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45502 llvm-svn: 329761	2018-04-10 21:43:03 +00:00
Jessica Paquette	a450ed2352	Recommit r329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation" This commit fixes the bot failures that were coming up before with r329716. The fix was to move the check for "isInSection()" inside of the if condition and emit the error there instead of waiting to get past the unreachable statement. This should work in debug and release builds now. llvm-svn: 329746	2018-04-10 19:46:43 +00:00
Amara Emerson	e27d5016ef	[AArch64] Fix isel failure when BUILD_PAIR nodes are left over. rdar://39175175 llvm-svn: 329743	2018-04-10 19:01:58 +00:00
Jessica Paquette	c140bbddaf	Revert 329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation" This broke a bunch of bots so I'm reverting while I figure it out. llvm-svn: 329728	2018-04-10 17:53:41 +00:00
Peter Collingbourne	a7d936f0c0	Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF." Caused a build failure in check-tsan. llvm-svn: 329718	2018-04-10 16:19:30 +00:00
Jessica Paquette	e4b90d82a0	Add missing nullptr check to AArch64MachObjectWriter::recordRelocation There was missing nullptr check before a call to getSection() in recordRelocation. This would result in a segfault in code like the attached test. This adds the missing check and a test which makes sure we get the expected error output. llvm-svn: 329716	2018-04-10 15:53:28 +00:00
Chad Rosier	af7519e9af	Fix spelling. NFC. llvm-svn: 329709	2018-04-10 14:57:13 +00:00
Francis Visoiu Mistrih	f2c22050e8	[AArch64] Use FP to access the emergency spill slot In the presence of variable-sized stack objects, we always picked the base pointer when resolving frame indices if it was available. This makes us hit an assert where we can't reach the emergency spill slot if it's too far away from the base pointer. Since on AArch64 we decide to place the emergency spill slot at the top of the frame, it makes more sense to use FP to access it. The changes here don't affect only emergency spill slots but all the frame indices. The goal here is to try to choose between FP, BP and SP so that we minimize the offset and avoid scavenging, or worse, asserting when trying to access a slot allocated by the scavenger. Previously discussed here: https://reviews.llvm.org/D40876. Differential Revision: https://reviews.llvm.org/D45358 llvm-svn: 329691	2018-04-10 11:29:40 +00:00
Tim Northover	6a1c51bf6b	AArch64: diagnose unpredictable store-exclusive instructions Much like any written register in load/store instructions, the status register is not allowed to overlap with any others. So diagnose it like we already do with the other cases. llvm-svn: 329687	2018-04-10 11:04:29 +00:00
Sander de Smalen	f974e255fe	[AArch64][SVE] Asm: Add support for unpredicated LSL/LSR (shift by immediate) instructions. Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45371 llvm-svn: 329681	2018-04-10 10:03:13 +00:00
Sander de Smalen	30fda45c18	[AArch64][SVE] Asm: Add support for SVE INDEX instructions. Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro Reviewed By: rengolin, fhahn Subscribers: tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D45370 llvm-svn: 329674	2018-04-10 07:01:53 +00:00
Daniel Sanders	5281b02e84	[globalisel][legalizerinfo] Add support for the Lower action in getActionDefinitionsBuilder() and use it in AArch64. Lower is slightly odd. It often doesn't change the type but the lowerings do use the new type to decide what code to create. Treat it like a mutation but provide convenience functions that re-use the existing type. Re-uses the existing tests: test/CodeGen/AArch64/GlobalISel/legalize-rem.mir test/CodeGen/AArch64/GlobalISel//legalize-mul.mir test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir llvm-svn: 329623	2018-04-09 21:10:09 +00:00
Peter Collingbourne	5cff2409ae	AArch64: Allow offsets to be folded into addresses with ELF. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. It reduces the size of Chromium for Android's .text section by 46KB, or 56KB with ThinLTO (which exposes more opportunities to use a direct access rather than a GOT access). Because the addend range is limited in COFF and Mach-O, this is enabled for ELF only. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329611	2018-04-09 19:59:57 +00:00
Hiroshi Inoue	9ff2380ea6	[NFC] fix trivial typos in comments and error message "is is" -> "is", "are are" -> "are" llvm-svn: 329546	2018-04-09 04:37:53 +00:00
Sanjay Patel	0d7df36c66	[TargetSchedule] shrink interface for init(); NFCI The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init(). Differential Revision: https://reviews.llvm.org/D44789 llvm-svn: 329540	2018-04-08 19:56:04 +00:00
Peter Collingbourne	f11eb3ebe7	AArch64: Implement support for the shadowcallstack attribute. The implementation of shadow call stack on aarch64 is quite different to the implementation on x86_64. Instead of reserving a segment register for the shadow call stack, we reserve the platform register, x18. Any function that spills lr to sp also spills it to the shadow call stack, a pointer to which is stored in x18. Differential Revision: https://reviews.llvm.org/D45239 llvm-svn: 329236	2018-04-04 21:55:44 +00:00
Jessica Paquette	bccd18b816	[MachineOutliner] Add `useMachineOutliner` target hook The MachineOutliner has a bunch of target hooks that will call llvm_unreachable if the target doesn't implement them. Therefore, if you enable the outliner on such a target, it'll just crash. It'd be much better if it'd just not run the outliner at all in this case. This commit adds a hook to TargetInstrInfo that returns false by default. Targets that implement the hook make it return true. The outliner checks the return value of this hook to decide whether or not to continue. llvm-svn: 329220	2018-04-04 19:13:31 +00:00
Mandeep Singh Grang	93ab79d205	[AArch64] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches. Reviewers: t.p.northover, jmolloy, RKSimon, rengolin Reviewed By: rengolin Subscribers: dexonsmith, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D44853 llvm-svn: 329216	2018-04-04 18:20:28 +00:00
Nico Weber	1cbd096914	Sort targetgen calls in lib/Target/*/CMakeLists. Makes it easier to see mistakes such as the one fixed in r329178 and makes the different target CMakeLists more consistent. Also remove some stale-looking comments from the Nios2 target cmakefile. No intended behavior change. llvm-svn: 329181	2018-04-04 12:37:44 +00:00
John Brawn	21d9b33d62	[AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y) Differential Revision: https://reviews.llvm.org/D44573 llvm-svn: 329163	2018-04-04 10:12:53 +00:00
Evandro Menezes	6b8d8f4010	[AArch64] Adjust the cost model for Exynos M3 Fix typo and simplify matching expression. llvm-svn: 329130	2018-04-03 22:57:17 +00:00
Jessica Paquette	642f6c61a3	[MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It returns true if the function is known to use a redzone, false if it is known to not use a redzone, and no value otherwise. This removes the requirement to pass -mno-red-zone when outlining for AArch64. https://reviews.llvm.org/D45189 llvm-svn: 329120	2018-04-03 21:56:10 +00:00
Petr Hosek	934e5d5436	[AArch64] Reserve x18 register on Fuchsia This register is reserved as a platform register on Fuchsia. Differential Revision: https://reviews.llvm.org/D45105 llvm-svn: 328950	2018-04-01 23:44:04 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
David Blaikie	8ad9a97310	Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header dependency Thanks to echristo for the pointers on direction. llvm-svn: 328737	2018-03-28 22:28:50 +00:00
Jessica Paquette	4aa14dbcc2	[MachineOutliner] Simplify call outlining + require valid callee save info for call outlining This commit simplifies the call outlining logic by removing references to the Function associated with the callee. To do this, it requires that valid callee save info is available to the outliner. llvm-svn: 328719	2018-03-28 17:52:31 +00:00
Jessica Paquette	2519ee7081	[MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operands If an ADRP appears with, say, a CPI operand, we shouldn't outline it. This moves the check for unsafe operands so that it occurs before the special-case for ADRPs. Also add a test for outlining ADRPs. llvm-svn: 328674	2018-03-27 22:23:48 +00:00
Rafael Auler	d058b882be	[AArch64] Decorate AArch64 instrs with OPERAND_PCREL Summary: This is a canonical way to teach objdump to print the target symbols for branches when disassembling AArch64 code. Reviewers: evandro, t.p.northover, espindola Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D44851 llvm-svn: 328638	2018-03-27 16:58:01 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
David Blaikie	6054e650ff	Move TargetLoweringObjectFile from CodeGen to Target to fix layering It's implemented in Target & include from other Target headers, so the header should be in Target. llvm-svn: 328392	2018-03-23 23:58:19 +00:00
John Brawn	e3b44f9de6	[AArch64] Don't reduce the width of loads if it prevents combining a shift Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321	2018-03-23 14:47:07 +00:00
Florian Hahn	588e640ea1	[AArch64] Clean-up a few over-eager regexps in models. Patch by Simon Pilgrim <llvm-dev@redking.me.uk> That is a slightly modified version of the AArch64 changes from Simon's D44687 . llvm-svn: 328303	2018-03-23 11:00:42 +00:00
Michael Zolotukhin	fab7a676c2	State that CFG is preserved in 'Falkor HW Prefetch Fix Late Phase'. That removes some redundant recomputations from the passes pipeline. llvm-svn: 328272	2018-03-22 23:44:40 +00:00
Jun Bum Lim	2ecb7ba4c6	[CodeGen] Add a new pass for PostRA sink Summary: This pass sinks COPY instructions into a successor block, if the COPY is not used in the current block and the COPY is live-in to a single successor (i.e., doesn't require the COPY to be duplicated). This avoids executing the copy on paths where their results aren't needed. This also exposes additional opportunities for dead copy elimination and shrink wrapping. These copies were either not handled by or are inserted after the MachineSink pass. As an example of the former case, the MachineSink pass cannot sink COPY instructions with allocatable source registers; for AArch64 these types of copy instructions are frequently used to move function parameters (PhyReg) into virtual registers in the entry block. For the machine IR below, this pass will sink %w19 in the entry into its successor (%bb.1) because %w19 is only live-in in %bb.1. ``` %bb.0: %wzr = SUBSWri %w1, 1 %w19 = COPY %w0 Bcc 11, %bb.2 %bb.1: Live Ins: %w19 BL @fun %w0 = ADDWrr %w0, %w19 RET %w0 %bb.2: %w0 = COPY %wzr RET %w0 ``` As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be able to see %bb.0 as a candidate. With this change I observed 12% more shrink-wrapping candidates and 13% more dead copies deleted in spec2000/2006/2017 on AArch64. Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz Reviewed By: sebpop Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41463 llvm-svn: 328237	2018-03-22 20:06:47 +00:00
Evandro Menezes	36afbee1d8	[AArch64] Adjust the cost model for Exynos M3 Fix typo in the number of integer dividers. llvm-svn: 328027	2018-03-20 20:00:29 +00:00
Geoff Berry	0b64402adb	[AArch64][Falkor] Correct load/store increment scheduling details llvm-svn: 327982	2018-03-20 13:46:35 +00:00
Jessica Paquette	563548d8f3	[MachineOutliner] AArch64: Emit CFI instructions when outlining calls When outlining calls, the outliner needs to update CFI to ensure that, say, exception handling works. This commit adds that functionality and adds a test just for call outlining. Call outlining stuff in machine-outliner.mir should be moved into machine-outliner-calls.mir in a later commit. llvm-svn: 327917	2018-03-19 22:48:40 +00:00
Martin Storsjo	9a55c1b0dc	[ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897	2018-03-19 20:06:50 +00:00
Nicolai Haehnle	4186cc7c08	TableGen: Check the dynamic type of !cast<Rec>(string) Summary: The docs already claim that this happens, but so far it hasn't. As a consequence, existing TableGen files get this wrong a lot, but luckily the fixes are all reasonably straightforward. To make this work with all the existing forms of self-references (since the true type of a record is only built up over time), the lookup of self-references in !cast is delayed until the final resolving step. Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164 Reviewers: arsenm, craig.topper, tra, MartinO Subscribers: wdng, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D44475 llvm-svn: 327849	2018-03-19 14:14:20 +00:00
Craig Topper	75aeb62eb4	[AArch64] Fix a few InstRWs in the A53 scheduler model and enable FullInstRWOverlapCheck. This fixes the errors found by the new check added in r327808. llvm-svn: 327812	2018-03-18 22:16:53 +00:00
Craig Topper	e1d6a4df1c	[TableGen] When trying to reuse a scheduler class for instructions from an InstRW, make sure we haven't already seen another InstRW containing this instruction on this CPU. This is similar to the check later when we remap some of the instructions from one class to a new one. But if we reuse the class we don't get to do that check. So many CPUs have violations of this check that I had to add a flag to the SchedMachineModel to allow it to be disabled. Hopefully we can get those cleaned up quickly and remove this flag. A lot of the violations are due to overlapping regular expressions, but that's not the only kind of issue it found. llvm-svn: 327808	2018-03-18 19:56:15 +00:00
Martin Storsjo	36d6419cc5	[AArch64] Skip an unnecessary getCopyToReg in DYNAMIC_STACKALLOC Differential Revision: https://reviews.llvm.org/D44586 llvm-svn: 327779	2018-03-17 20:08:48 +00:00
Jessica Paquette	b3e7dc9144	[MachineOutliner] Make KILLs invisible At the point the outliner runs, KILLs don't impact anything, but they're still considered unique instructions. This commit makes them invisible like DebugValues so that they can still be outlined without impacting outlining decisions. llvm-svn: 327760	2018-03-16 22:53:34 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
Evandro Menezes	d4254ac1b9	[AArch64] Adjust the cost model for Exynos M3 Fix typo. llvm-svn: 327663	2018-03-15 20:37:32 +00:00
Evandro Menezes	5303f897d4	[AArch64] Adjust the cost model for Exynos M3 Add special case for rotate right. llvm-svn: 327662	2018-03-15 20:31:25 +00:00
Evandro Menezes	1515e859c6	[AArch64] Adjust the cost model for Exynos M3 Increase the number of cheap as move cases of register reset. llvm-svn: 327661	2018-03-15 20:31:13 +00:00
Francis Visoiu Mistrih	164560bd74	[AArch64] Emit CSR loads in the same order as stores Optionally allow the order of restoring the callee-saved registers in the epilogue to be reversed. The flag -reverse-csr-restore-seq generates the following code: ``` stp x26, x25, [sp, #-64]! stp x24, x23, [sp, #16] stp x22, x21, [sp, #32] stp x20, x19, [sp, #48] ; [..] ldp x24, x23, [sp, #16] ldp x22, x21, [sp, #32] ldp x20, x19, [sp, #48] ldp x26, x25, [sp], #64 ret ``` Note how the CSRs are restored in the same order as they are saved. One exception to this rule is the last `ldp`, which allows us to merge the stack adjustment and the ldp into a post-index ldp. This is done by first generating: ldp x26, x27, [sp] add sp, sp, #64 which gets merged by the arm64 load store optimizer into ldp x26, x25, [sp], #64 The flag is disabled by default. llvm-svn: 327569	2018-03-14 20:34:03 +00:00
Francis Visoiu Mistrih	084e7d8770	[AArch64] Keep track of MIFlags in the LoadStoreOptimizer Merging: * $x26, $x25 = frame-setup LDPXi $sp, 0 * $sp = frame-destroy ADDXri $sp, 64, 0 into an LDPXpost should preserve the flags from both instructions as following: * frame-setup frame-destroy LDPXpost Differential Revision: https://reviews.llvm.org/D44446 llvm-svn: 327533	2018-03-14 17:10:58 +00:00
Martin Storsjo	bde677289a	[AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC Support for this relocation is missing in both LLD and GNU binutils at the moment. This reverts the ELF parts of SVN r327316. llvm-svn: 327503	2018-03-14 13:09:10 +00:00
Martin Storsjo	7bc64bd889	[AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str Differential Revision: https://reviews.llvm.org/D44355 llvm-svn: 327316	2018-03-12 18:47:43 +00:00
Martin Storsjo	cc24096d4d	[AArch64] Implement native TLS for Windows Differential Revision: https://reviews.llvm.org/D43971 llvm-svn: 327220	2018-03-10 19:05:21 +00:00
Weiming Zhao	a4259cd3a6	[AArch64] Fix UB about shift amount exceeds data bit-width Summary: Fixes an UB caught by sanitizer. The shift amount might be larger than 32 so the operand should be 1ULL. In this patch, we replace the original expression with existing API with uint64_t type. Reviewers: eli.friedman, rengolin Reviewed By: rengolin Subscribers: rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D44234 llvm-svn: 326969	2018-03-08 00:28:25 +00:00
Rafael Espindola	06c064824e	Delete code that is probably dead since r249303. With r249303 the expression evaluation should expand variables that are not in sections (and so don't have an atom). llvm-svn: 326966	2018-03-08 00:17:13 +00:00
Evandro Menezes	f9bd871d32	[AArch64] Adjust the cost of integer vector division Since there is no instruction for integer vector division, factor in the cost of singling out each element to be used with the scalar division instruction. Differential revision: https://reviews.llvm.org/D43974 llvm-svn: 326955	2018-03-07 22:35:32 +00:00
Sebastian Pop	33bdb3e0e6	[AArch64] add missing pattern for insert_subvector undef The attached testcase started failing after the patch to define isExtractSubvectorCheap with the following pattern mismatch: ISEL: Starting pattern match Initial Opcode index to 85068 Match failed at index 85076 LLVM ERROR: Cannot select: t47: v8i16 = insert_subvector undef:v8i16, t43, Constant:i64<0> The code generated from llvm/lib/Target/AArch64/AArch64InstrInfo.td def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i32 0)), (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>; is in ninja/lib/Target/AArch64/AArch64GenDAGISel.inc At the location of the error it is: /* 85076*/ OPC_CheckChild2Type, MVT::i32, And it failed to match the type of operand 2. Adding another def-pat for i64 fixes the failed def-pat error: def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i64 0)), (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>; llvm-svn: 326949	2018-03-07 22:07:13 +00:00
Sebastian Pop	41073e8046	[AArch64] define isExtractSubvectorCheap Following the ARM-neon backend, define isExtractSubvectorCheap to return true when extracting low and high part of a neon register. The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive" all DAG transforms until ISelLowering. The testcase is supposed to check that AArch64TargetLowering::ReconstructShuffle() works, and for that we need a BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to ISelLowering. As there is no way to disable the combiner to only exercise the code in ISelLowering, the patch disables the testcase. Differential revision: https://reviews.llvm.org/D43973 llvm-svn: 326811	2018-03-06 16:54:55 +00:00
Sebastian Pop	ac0bfb5938	fix PR36582 The error occurs when reading i16 elements (as in the testcase) from a v8i8 with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the operation is not a VUZP. The patch stops the pattern recognition of VUZP when EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR. llvm-svn: 326722	2018-03-05 17:35:49 +00:00
Evandro Menezes	cd855f70c5	[AArch64] Improve code generation of constant vectors Use the whole gamut of constant immediates available to set up a vector. Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which transfers between register files, use the more efficient `movi v0.4s, #-1` instead. Not limited to just a few values, but any immediate value that can be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating the need for patterns to optimize special cases. Differential revision: https://reviews.llvm.org/D42133 llvm-svn: 326718	2018-03-05 17:02:47 +00:00
Evandro Menezes	2bbb4a7c93	[AArch64] Clean up code (NFC) Clean up a couple of functions in `AArch64TargetLowering` by removing redundant statements. llvm-svn: 326486	2018-03-01 21:17:36 +00:00
Martin Storsjo	c61ff3bef1	[AArch64] Add support for secrel add/load/store relocations for COFF Differential Revision: https://reviews.llvm.org/D43288 llvm-svn: 326480	2018-03-01 20:42:28 +00:00
Sebastian Pop	c33af715d7	[AArch64] generate vuzp instead of mov when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the BUILD_VECTOR with either vuzp1 or vuzp2. With this patch LLVM generates the following code for the first function fun1 in the testcase: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b ext v1.16b, v0.16b, v0.16b, #8 uzp1 v0.8b, v0.8b, v1.8b str d0, [x8] ret Without this patch LLVM currently generates this code: adrp x8, .LCPI0_0 ldr q0, [x8, :lo12:.LCPI0_0] tbl v0.16b, { v0.16b }, v0.16b mov v1.16b, v0.16b mov v1.b[1], v0.b[2] mov v1.b[2], v0.b[4] mov v1.b[3], v0.b[6] mov v1.b[4], v0.b[8] mov v1.b[5], v0.b[10] mov v1.b[6], v0.b[12] mov v1.b[7], v0.b[14] str d1, [x8] ret llvm-svn: 326443	2018-03-01 15:47:39 +00:00
Chih-Hung Hsieh	9f9e4681ac	[TLS] use emulated TLS if the target supports only this mode Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, a missing -emulated-tls flag would generate wrong TLS code for targets that support only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341	2018-02-28 17:48:55 +00:00
Aditya Nandakumar	599990530e	[GISel]: Don't assert when constraining RegisterOperands which are uses. Currently we assert that only non target specific opcodes can have missing RegisterClass constraints in the MCDesc. The backend can have instructions with register operands but don't have RegisterClass constraints (say using unknown_class) in which case the instruction defining the register will constrain it. Change the assert to only fire if a def has no regclass. https://reviews.llvm.org/D43409 llvm-svn: 326142	2018-02-26 22:56:21 +00:00
Evandro Menezes	1afffac05b	[PATCH] [AArch64] Add new target feature to fuse conditional select This feature enables the fusion of the comparison and the conditional select instructions together. Differential revision: https://reviews.llvm.org/D42392 llvm-svn: 325939	2018-02-23 19:27:43 +00:00
Geoff Berry	f8bf2ec0a8	[MachineOperand][Target] MachineOperand::isRenamable semantics changes Summary: Add a target option AllowRegisterRenaming that is used to opt in to post-register-allocation renaming of registers. This is set to 0 by default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq fields of all opcodes to be set to 1, causing MachineOperand::isRenamable to always return false. Set the AllowRegisterRenaming flag to 1 for all in-tree targets that have lit tests that were affected by enabling COPY forwarding in MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC, RISCV, Sparc, SystemZ and X86). Add some more comments describing the semantics of the MachineOperand::isRenamable function and how it is set and maintained. Change isRenamable to check the operand's opcode hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of relying on it being consistently reflected in the IsRenamable bit setting. Clear the IsRenamable bit when changing an operand's register value. Remove target code that was clearing the IsRenamable bit when changing registers/opcodes now that this is done conservatively by default. Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in one place covering all opcodes that have constant pipe read limit restrictions. Reviewers: qcolombet, MatzeB Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D43042 llvm-svn: 325931	2018-02-23 18:25:08 +00:00
Hans Wennborg	89c35fc44d	Support for the mno-stack-arg-probe flag Adds support for this flag. There is also another piece for clang (separate review). More info: https://bugs.llvm.org/show_bug.cgi?id=36221 By Ruslan Nikolaev! Differential Revision: https://reviews.llvm.org/D43107 llvm-svn: 325900	2018-02-23 13:46:25 +00:00
Evandro Menezes	5c986b010b	[AArch64] Refactor macro fusion (NFC) Move checks for each fusion case into separate functions for better legibility and maintainability. Differential revision: https://reviews.llvm.org/D43649 llvm-svn: 325844	2018-02-23 00:14:39 +00:00
Evandro Menezes	72f3983633	[AArch64] Refactor instructions using SIMD immediates Get rid of icky goto loops and make the code easier to maintain. Otherwise, NFC. Restore r324903 and fix PR36369. Differential revision: https://reviews.llvm.org/D43364 llvm-svn: 325621	2018-02-20 20:31:45 +00:00
Amara Emerson	db211892ed	[AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first. This is a follow on commit to r[x] where we fix the other direction of copy. For this case, after converting the source from gpr32 -> fpr32, we use a subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land. https://reviews.llvm.org/D43444 llvm-svn: 325550	2018-02-20 05:11:57 +00:00
Francis Visoiu Mistrih	68ced40a23	Revert "[CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print" This reverts commit r324681. llvm-svn: 325505	2018-02-19 15:08:49 +00:00
Amara Emerson	242efdb54b	Fix unused assertion variable warning. llvm-svn: 325464	2018-02-18 17:28:34 +00:00
Amara Emerson	7e9f348b2d	[AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied to gpr register banks. PR36345. rdar://36478867 Differential Revision: https://reviews.llvm.org/D43310 llvm-svn: 325463	2018-02-18 17:10:49 +00:00
Amara Emerson	bc03baef77	[AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits. These are needed for operations on fp16 types in a later patch. llvm-svn: 325462	2018-02-18 17:03:02 +00:00
Haicheng Wu	aed6e52b3c	[AArch64] Coalesce Copy Zero during instruction selection Add special case for copy of zero to avoid a double copy. Differential Revision: https://reviews.llvm.org/D36104 llvm-svn: 325459	2018-02-18 13:51:33 +00:00
Martin Storsjo	a63a5b993e	[AArch64] Implement dynamic stack probing for windows This makes sure that alloca() function calls properly probe the stack as needed. Differential Revision: https://reviews.llvm.org/D42356 llvm-svn: 325433	2018-02-17 14:26:32 +00:00
Simon Pilgrim	63db669013	Fix unused variable warning. NFCI. We were casting to AArch64InstrInfo but only using it for static methods which some compilers complain about. llvm-svn: 325432	2018-02-17 13:48:23 +00:00
Evandro Menezes	10ae20d80c	[AArch64] Fix BITCAST lowering crash The data type is assumed to be a vector, but sometimes it is not, leading to an assertion. Add simple test-case to verify this. Differential revision: https://reviews.llvm.org/D42599 llvm-svn: 325378	2018-02-16 20:00:57 +00:00
Daniel Sanders	7fc87360e9	[globalisel][legalizerinfo] Follow up on post-commit review comments after r323681 * Document most API's * Delete a useless function call * Fix a discrepancy between the single and multi-opcode variants of getActionDefinitions(). The multi-opcode variant now requires that more than one opcode is requested. Previously it acted much like the single-opcode form but unnecessarily enforced the requirements of the multi-opcode form. llvm-svn: 325067	2018-02-13 23:02:44 +00:00
Hans Wennborg	f381e94ac8	Revert r324903 "[AArch64] Refactor identification of SIMD immediates" It caused "Cannot select: t33: f64 = AArch64ISD::FMOV Constant:i32<0>" in Chromium builds. See PR36369. > Get rid of icky goto loops and make the code easier to maintain (NFC). > > Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 325034	2018-02-13 18:14:38 +00:00
Abderrazek Zaafrani	e72d99261f	[AArch64] Fixes for ARMv8.2-A FP16 scalar intrinsic - llvm portion https://reviews.llvm.org/D42993 llvm-svn: 324912	2018-02-12 17:35:42 +00:00
Oliver Stannard	02f08c9d1f	[AArch64] Improve v8.1-A code-gen for atomic load-and Armv8.1-A added an atomic load-clear instruction (which performs bitwise and with the complement of its operand), but not a load-and instruction. Our current code-generation for atomic load-and always inserts an MVN instruction to invert its argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-and operation into an xor with -1 and a load-clear, allowing the normal DAG optimisations to work on it. To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't see any easy way to do this with an AArch64-specific ISD node, because the code-generation for atomic operations assumes the SDNodes are of type AtomicSDNode. I've left the old tablegen patterns in because they are still needed for global isel. Differential revision: https://reviews.llvm.org/D42478 llvm-svn: 324908	2018-02-12 17:03:11 +00:00
Evandro Menezes	7dc0f1ec45	[AArch64] Refactor identification of SIMD immediates Get rid of icky goto loops and make the code easier to maintain (NFC). Differential revision: https://reviews.llvm.org/D42723 llvm-svn: 324903	2018-02-12 16:41:41 +00:00
Oliver Stannard	4269917304	[AArch64] Improve v8.1-A code-gen for atomic load-subtract Armv8.1-A added an atomic load-add instruction, but not a load-subtract instruction. Our current code-generation for atomic load-subtract always inserts a NEG instruction to negate its argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-subtract operation into a subtract and a load-add, allowing the normal DAG optimisations to work on it. I've left the old tablegen patterns in because they are still needed for global isel. Some of the tests in this patch are copied from D35375 by Chad Rosier (which was abandoned). Differential revision: https://reviews.llvm.org/D42477 llvm-svn: 324892	2018-02-12 14:22:03 +00:00
Daniel Neilson	4a58b4b52c	[AArch64FastISel] Replace deprecated calls to MemoryIntrinsic::getAlignment() (NFCI) Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes AArch64FastISel to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324773	2018-02-09 21:49:29 +00:00
Evandro Menezes	205c0e085e	[AArch64] Adjust the cost model for Exynos M3 Fix the modeling of transfers between a generic register and a partial ASIMD one. llvm-svn: 324766	2018-02-09 19:26:11 +00:00
Evandro Menezes	b5f12090fc	[AArch64] Refactor stand alone methods (NFC) Make stand alone methods in AArch64InstrInfo static. llvm-svn: 324745	2018-02-09 16:14:41 +00:00
Jonas Paulsson	7850601fa3	[AArch64] Return true in enableMultipleCopyHints(). Enable multiple COPY hints to eliminate more COPYs during register allocation. Note that this is something all targets should do, see https://reviews.llvm.org/D38128. Review: Martin Storsjö llvm-svn: 324720	2018-02-09 09:22:20 +00:00
Francis Visoiu Mistrih	d65438d0ca	[CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print MBB.print wasn't printing it, but the MIRPrinter is printing it. The goal is to unify that as much as possible. llvm-svn: 324681	2018-02-08 23:42:27 +00:00
Sjoerd Meijer	5ea465ded7	[AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled. I've not added any tests, because the problem was visible in: test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll, which I had to change: I don't think Cyclone has FullFP16 enabled by default, so it shouldn't be using this v8.2a instruction. I've also removed these rdar tags, please shout if there are any objections. Differential Revision: https://reviews.llvm.org/D43020 llvm-svn: 324581	2018-02-08 08:39:05 +00:00
Evandro Menezes	cb7959fd78	[AArch64] Adjust the cost model for Exynos M3 Fix the modeling of long division and SIMD conversion from integer and horizontal minimum and maximum. llvm-svn: 324417	2018-02-06 22:35:47 +00:00
Sander de Smalen	81fcf865be	[AArch64][SVE] Asm: Add AND_ZI instructions and aliases Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases. Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D42295 llvm-svn: 324343	2018-02-06 13:13:21 +00:00
Oliver Stannard	6df8f43c4d	[AArch64] Fix spelling of ICH_ELRSR_EL2 system register This register was mis-spelled as ICH_ELSR_EL2, but has the correct encoding for ICH_ELRSR_EL2. llvm-svn: 324325	2018-02-06 09:39:04 +00:00
Oliver Stannard	ee0ac39305	[ARM][AArch64] Add CSDB speculation barrier instruction This adds the CSDB instruction, which is a new barrier instruction described by the whitepaper at [1]. This is in encoding space which was previously executed as a NOP, so it is available for all targets that have the relevant NOP encoding space. This matches the binutils behaviour for these instructions [2][3]. [1] https://developer.arm.com/support/security-update [2] https://sourceware.org/ml/binutils/2018-01/msg00116.html [3] https://sourceware.org/ml/binutils/2018-01/msg00120.html llvm-svn: 324324	2018-02-06 09:24:47 +00:00
Amara Emerson	3838ed0370	[AArch64][GlobalISel] Use getRegClassForTypeOnBank() in selectCopy. Differential Revision: https://reviews.llvm.org/D42832 llvm-svn: 324110	2018-02-02 18:03:30 +00:00
Amara Emerson	58aea52bc4	[GlobalISel] Constrain the dest reg of IMPLICIT_DEF. This fixes a crash where the user is a COPY, which deliberately does not constrain its source operands, resulting in a vreg without a reg class escaping selection. Differential Revision: https://reviews.llvm.org/D42697 llvm-svn: 324047	2018-02-02 01:44:43 +00:00
Sanjay Patel	d7bed12192	[AArch64] remove bogus comment; NFC I added this comment with D42323, but as discussed in D42806, the architecture does the right thing for denorms. We don't even need the select on 0.0 here? llvm-svn: 323996	2018-02-01 19:59:33 +00:00
Sanjay Patel	657e5d8d41	[DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994) As shown in the example in PR34994: https://bugs.llvm.org/show_bug.cgi?id=34994 ...we can return a very wrong answer (inf instead of 0.0) for square root when using a reciprocal square root estimate instruction. Here, I've conditionalized the filtering out of denorms based on the function having "denormal-fp-math"="ieee" in its attributes. The other options for this attribute are 'preserve-sign' and 'positive-zero'. So we don't generate this extra code by default with just '-ffast-math' (because then there's no denormal attribute string at all), but it works if you specify '-ffast-math -fdenormal-fp-math=ieee' from clang. As noted in the review, there may be other problems in clang that affect the results depending on platform (Linux x86 at least), but this should allow creating the desired codegen. Differential Revision: https://reviews.llvm.org/D42323 llvm-svn: 323981	2018-02-01 16:57:18 +00:00
Clement Courbet	ea8d07eb76	[AArch64][NFC] Make all ProcResource definitions include their SchedModel. This makes targets ExynosM1,ExynosM3,ThunderX2T99 consistent with all other targets. llvm-svn: 323955	2018-02-01 12:12:01 +00:00
Martin Storsjo	708498a164	[AArch64] Properly handle dllimport of variables when using fast-isel Differential Revision: https://reviews.llvm.org/D42567 llvm-svn: 323810	2018-01-30 19:50:51 +00:00
Evandro Menezes	f1d01645a7	[AArch64] Add new target feature to fuse address generation with load or store This feature enables the fusion of the address generation and a corresponding load or store together. Differential revision: https://reviews.llvm.org/D42393 llvm-svn: 323782	2018-01-30 16:28:01 +00:00
Evandro Menezes	07c78eeeef	[AArch64] Add new target feature to handle cheap as move for Exynos This feature enables special handling of cheap as move in the existing custom handling specifically for Exynos processors. Differential revision: https://reviews.llvm.org/D42387 llvm-svn: 323774	2018-01-30 15:40:22 +00:00
Evandro Menezes	9f9daa1f14	[AArch64] Add pipeline model for Exynos M3 Add the scheduling and cost model for Exynos M3. Differential revision: https://reviews.llvm.org/D42387 llvm-svn: 323773	2018-01-30 15:40:16 +00:00
Evandro Menezes	1589d6e6a3	[AArch64] Change the filename of the Exynos M1 scheduling defs After request by Matthias Braun in https://reviews.llvm.org/D42387. llvm-svn: 323686	2018-01-29 20:22:24 +00:00
Jun Bum Lim	fc7d56d949	Revert "AArch64: Omit callframe setup/destroy when not necessary" This reverts commit r322917 due to multiple performance regressions in spec2006 and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially motivated this change. llvm-svn: 323683	2018-01-29 19:56:42 +00:00
Daniel Sanders	79cb839fcd	[globalisel][legalizer] Adapt LegalizerInfo to support inter-type dependencies and other things. Summary: As discussed in D42244, we have difficulty describing the legality of some operations. We're not able to specify relationships between types. For example, declaring the following setAction({..., 0, s32}, Legal) setAction({..., 0, s64}, Legal) setAction({..., 1, s32}, Legal) setAction({..., 1, s64}, Legal) currently declares these type combinations as legal: {s32, s32} {s64, s32} {s32, s64} {s64, s64} but we currently have no means to say that, for example, {s64, s32} is not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/ G_UNMERGE_VALUES have relationships between the types that are currently described incorrectly. Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics differently to atomics. The necessary information is in the MMO but we have no way to use this in the legalizer. Similarly, there is currently no way for the register type and the memory type to differ so there is no way to cleanly represent extending-load/truncating-store in a way that can't be broken by optimizers (resulting in illegal MIR). It's also difficult to control the legalization strategy. We've added support for legalizing non-power of 2 types but there are still some hardcoded assumptions about the strategy. The main one I've noticed is that type0 is always legalized before type1 which is not a good strategy for `type0 = G_EXTRACT type1, ...` if you need to widen the container. It will converge on the same result eventually but it will take a much longer route when legalizing type0 than if you legalize type1 first. Lastly, the definition of legality and the legalization strategy is kept separate which is not ideal. It's helpful to be able to look at one piece of code and see both what is legal and the method the legalizer will use to make illegal MIR more legal. 
This patch adds a layer onto the LegalizerInfo (to be removed when all targets have been migrated) which resolves all these issues. Here are the rules for shift and division: for (unsigned BinOp : {G_LSHR, G_ASHR, G_SDIV, G_UDIV}) getActionDefinitions(BinOp) .legalFor({s32, s64}) // If type0 is s32/s64 then it's Legal .clampScalar(0, s32, s64) // If type0 is <s32 then WidenScalar to s32 // If type0 is >s64 then NarrowScalar to s64 .widenScalarToPow2(0) // Round type0 scalars up to powers of 2 .unsupported(); // Otherwise, it's unsupported This describes everything needed to both define legality and describe how to make illegal things legal. Here's an example of a complex rule: getActionDefinitions(G_INSERT) .unsupportedIf([=](const LegalityQuery &Query) { // If type0 is not bigger than type1 then it's unsupported return Query.Types[0].getSizeInBits() <= Query.Types[1].getSizeInBits(); }) .legalIf([=](const LegalityQuery &Query) { // If type0 is s32/s64/p0 and type1 is a power of 2 other than 2 or 4 then it's legal // We don't need to worry about large type1's because unsupportedIf caught that. const LLT &Ty0 = Query.Types[0]; const LLT &Ty1 = Query.Types[1]; if (Ty0 != s32 && Ty0 != s64 && Ty0 != p0) return false; return isPowerOf2_32(Ty1.getSizeInBits()) && (Ty1.getSizeInBits() == 1 \|\| Ty1.getSizeInBits() >= 8); }) .clampScalar(0, s32, s64) .widenScalarToPow2(0) .maxScalarIf(typeInSet(0, {s32}), 1, s16) // If type0 is s32 and type1 is bigger than s16 then NarrowScalar type1 to s16 .maxScalarIf(typeInSet(0, {s64}), 1, s32) // If type0 is s64 and type1 is bigger than s32 then NarrowScalar type1 to s32 .widenScalarToPow2(1) // Round type1 scalars up to powers of 2 .unsupported(); This uses a lambda to say that G_INSERT is unsupported when type0 is not bigger than type1 (in practice, this would be a default rule for G_INSERT). It also uses one to describe the legal cases. 
This particular predicate is equivalent to: .legalFor({{s32, s1}, {s32, s8}, {s32, s16}, {s64, s1}, {s64, s8}, {s64, s16}, {s64, s32}}) In terms of performance, I saw a slight (~6%) performance improvement when AArch64 was around 30% ported but it's pretty much break even right now. I'm going to take a look at constexpr as a means to reduce the initialization cost. Future work: * Make it possible for opcodes to share rulesets. There's no need for G_LSHR/G_ASHR/G_SDIV/G_UDIV to have separate rule and ruleset objects. There's no technical barrier to this, it just hasn't been done yet. * Replace the type-index numbers with an enum to get .clampScalar(Type0, s32, s64) * Better names for things like .maxScalarIf() (clampMaxScalar?) and the vector rules. * Improve initialization cost using constexpr Possible future work: * It's possible to make these rulesets change the MIR directly instead of returning a description of how to change the MIR. This should remove a little overhead caused by parsing the description and routing to the right code, but the real motivation is that it removes the need for LegalizeAction::Custom. With Custom removed, there's no longer a requirement that Custom legalization change the opcode to something that's considered legal. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner Reviewed By: bogner Subscribers: hintonda, bogner, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42251 llvm-svn: 323681	2018-01-29 19:54:49 +00:00
Daniel Sanders	9ade5592d9	[globalisel] Make LegalizerInfo::LegalizeAction available outside of LegalizerInfo. NFC Summary: The improvements to the LegalizerInfo discussed in D42244 require that LegalizerInfo::LegalizeAction be available for use in other classes. As such, it needs to be moved out of LegalizerInfo. This has been done separately to the next patch to minimize the noise in that patch. llvm-svn: 323669	2018-01-29 17:37:29 +00:00
Sander de Smalen	a1c259c22c	[AArch64][AsmParser] NFC: Generalize LogicalImm[Not](32\|64) code Summary: All variants of isLogicalImm[Not](32\|64) can be combined into a single templated function, same for printLogicalImm(32\|64). By making it use a template instead, further SVE patches can use it for other data types as well (e.g. 8, 16 bits). Reviewers: fhahn, rengolin, aadg, echristo, kristof.beyls, samparker Reviewed By: samparker Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42294 llvm-svn: 323646	2018-01-29 13:05:38 +00:00
Oliver Stannard	a9d2e004d2	[AArch64] Generate the CASP instruction for 128-bit cmpxchg The Large System Extension added an atomic compare-and-swap instruction that operates on a pair of 64-bit registers, which we can use to implement a 128-bit cmpxchg. Because i128 is not a legal type for AArch64 we have to do all of the instruction selection in C++, and the instruction requires even/odd register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM backend. Differential revision: https://reviews.llvm.org/D42104 llvm-svn: 323634	2018-01-29 09:18:37 +00:00
Craig Topper	8f324bb1a4	[SelectionDAGISel] Add a debug print before call to Select. Adjust where blank lines are printed during isel process to make things more sensibly grouped. Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table. It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search. There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line. llvm-svn: 323551	2018-01-26 19:34:20 +00:00
Joel Jones	0715092c65	[AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others. This patch enables aggressive FMA by default on T99, and provides a -mllvm option to enable the same on other AArch64 micro-arch's (-mllvm -aarch64-enable-aggressive-fma). Test case demonstrating the effects on T99 is included. Patch by: steleman (Stefan Teleman) Differential Revision: https://reviews.llvm.org/D40696 llvm-svn: 323474	2018-01-25 21:55:39 +00:00
Amara Emerson	4f84f8862b	[AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load. The tablegen imported patterns for sext(load(a)) don't check for single uses of the load or delete the original after matching. As a result two loads are left in the generated code. This particular issue will be fixed by adding support for a G_SEXTLOAD opcode in future. There are however other potential issues around this that wouldn't be fixed by a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile loads at all in the AArch64 selector. Fixes/works around PR36018. llvm-svn: 323371	2018-01-24 20:35:37 +00:00
Pablo Barrio	9b3d4c01a0	[AArch64] Avoid unnecessary vector byte-swapping in big-endian Summary: Loads/stores of some NEON vector types are promoted to other vector types with different lane sizes but same vector size. This is not a problem in little-endian but, when in big-endian, it requires additional byte reversals to preserve the lane ordering while keeping the right endianness of the data inside each lane. For example: %1 = load <4 x half>, <4 x half>* %p results in the following assembly: ld1 { v0.2s }, [x1] rev32 v0.4h, v0.4h This patch changes the promotion of these loads/stores so that the actual vector load/store (LD1/ST1) takes care of the endianness correctly and there is no need for further byte reversals. The previous code now results in the following assembly: ld1 { v0.4h }, [x1] Reviewers: olista01, SjoerdMeijer, efriedma Reviewed By: efriedma Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D42235 llvm-svn: 323325	2018-01-24 14:13:47 +00:00
Matthias Braun	70fd374d1e	AArch64: Cyclone: Remove SlowMisaligned128Store tuning flag Remove FeatureSlowMisaligned128Store from cyclone flags. This flag causes splitting of 16 byte wide stores into 2 stores of 8 bytes. This was useful on older Apple CPUs which were slow for 16byte stores that were not aligned on 16byte. As the compiler often cannot predict the actual alignment, the splitting was chosen. This has been a topic for a lot of debate as the splitting also decreases performance for some benchmarks. Measuring the effects on newer Apple chips (rdar://35525421) shows that it harms more cases than it helps. So it is time to retire this workaround. llvm-svn: 323289	2018-01-24 00:39:53 +00:00
Tim Northover	f9b560aa8e	AArch64: get type from correct result when forming BFX Some nodes produce multiple values so when obtaining the type of an ISD::OR we need to make sure we ask for the correct one. Hopefully that's all of them. llvm-svn: 323205	2018-01-23 15:11:27 +00:00
Tim Northover	9f3003d08f	AArch64: get type from correct result when forming BFI/BFM Some nodes produce multiple values so when obtaining the type of an ISD::OR we need to make sure we ask for the correct one. llvm-svn: 323202	2018-01-23 14:37:03 +00:00
Evandro Menezes	312443fd83	[AArch64] Create a separate feature set for Exynos M3 Distinguish the features from Exynos M2. llvm-svn: 323139	2018-01-22 19:03:26 +00:00
Sander de Smalen	7ab96f534c	[AArch64][SVE] Asm: PTRUE and PTRUES instructions Summary: These instructions initialize a predicate vector from a pattern/immediate. Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover, samparker, olista01 Reviewed By: samparker Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41819 llvm-svn: 323124	2018-01-22 15:29:19 +00:00
Carey Williams	da15b5b116	[AArch64] optimise v4f16 fcmps to utilise vector instructions Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported, generating FCVTL(s) rather than a longer series of FCVTs. Differential Revision: https://reviews.llvm.org/D41772 llvm-svn: 323118	2018-01-22 14:16:11 +00:00
Sander de Smalen	245e0e67f3	[AArch64][SVE] Asm: Predicate patterns Summary: This patch adds support for parsing/printing of named or unnamed patterns that are used in SVE's PTRUE instruction, amongst others. The pattern can be specified as a named pattern to initialize the predicate vector or it can be specified as an immediate in the range 0-31. Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41818 llvm-svn: 323098	2018-01-22 10:46:00 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. 
s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void 
@llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. 
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Carey Williams	22c49c6470	Test commit llvm-svn: 322958	2018-01-19 16:55:23 +00:00
Sander de Smalen	909cf956a1	[AArch64][SVE] Asm: Add support for RDVL/ADDVL/ADDPL instructions Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: SjoerdMeijer, aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41900 llvm-svn: 322951	2018-01-19 15:22:00 +00:00
Matthias Braun	5c290dc206	AArch64: Fix emergency spillslot being out of reach for large callframes Re-commit of r322200: The testcase shouldn't hit machineverifiers anymore with r322917 in place. Large callframes (calls with several hundreds or thousands of parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available, leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). That made no sense to me, so I took this case out: even though the emergency spillslot is technically not referenced by FP in this case, we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322919	2018-01-19 03:16:36 +00:00
Matthias Braun	dc4b3e87f4	AArch64: Omit callframe setup/destroy when not necessary Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to setup and the callframe size is 0. - Fixes an invalid callframe nesting for byval arguments, which would look like this before this patch (as in `big-byval.ll`): ... ADJCALLSTACKDOWN 32768, 0, ... # Setup for extfunc ... ADJCALLSTACKDOWN 0, 0, ... # setup for memcpy ... BL &memcpy ... ADJCALLSTACKUP 0, 0, ... # destroy for memcpy ... BL &extfunc ADJCALLSTACKUP 32768, 0, ... # destroy for extfunc - Saves us two instructions in the common case of zero-sized stackframes. - Remove an unnecessary scheduling barrier (hence the small unittest changes). Differential Revision: https://reviews.llvm.org/D42006 llvm-svn: 322917	2018-01-19 02:45:38 +00:00
Amara Emerson	d5785775f8	[AArch64][GlobalISel] Add isel support for global values in the large code model. Fixes PR35958. Differential Revision: https://reviews.llvm.org/D42175 llvm-svn: 322878	2018-01-18 19:21:27 +00:00
Reid Kleckner	1aa9061c5f	[CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64 Every known PE COFF target emits /EXPORT: linker flags into a .drectve section. The AsmPrinter should handle this. While we're at it, use global_values() and emit each export flag with its own .ascii directive. This should make the .s file output more readable. llvm-svn: 322788	2018-01-17 23:55:23 +00:00
Volkan Keles	a79b0620a0	Add a TargetOption to enable/disable GlobalISel Summary: This patch adds a new target option in order to control GlobalISel. This will allow the users to enable/disable GlobalISel prior to the backend by calling `TargetMachine::setGlobalISel(bool Enable)`. No test case as there is already a test to check GlobalISel command line options. See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll. Reviewers: qcolombet, aemerson, ab, dsanders Reviewed By: qcolombet Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42137 llvm-svn: 322773	2018-01-17 22:34:21 +00:00
Aditya Nandakumar	18b3f9d384	[GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC https://reviews.llvm.org/D42149 llvm-svn: 322743	2018-01-17 19:31:33 +00:00
Pablo Barrio	f2c29571da	[AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian Summary: Loading a vector of 4 half-precision FP sometimes results in an LD1 of 2 single-precision FP + a reversal. This results in an incorrect byte swap due to the conversion from little endian to big endian. In order to generate the correct byte swap, it is easier to generate the correct LD1 of 4 half-precision FP, thus avoiding the subsequent reversal. Reviewers: craig.topper, jmolloy, olista01 Reviewed By: olista01 Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41863 llvm-svn: 322663	2018-01-17 14:39:29 +00:00
Volkan Keles	f7f2568613	[GlobalISel][TableGen] Add support for SDNodeXForm Summary: This patch adds CustomRenderer which renders the matched operands to the specified instruction. Targets can enable the matching of SDNodeXForm by adding a definition that inherits from GICustomOperandRenderer and GISDNodeXFormEquiv as follows. def gi_imm8 : GICustomOperandRenderer<"renderImm8">, GISDNodeXFormEquiv<imm8_xform>; Custom renderer functions should be of the form: void render(MachineInstrBuilder &MIB, const MachineInstr &I); Reviewers: dsanders, ab, rovka Reviewed By: dsanders Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet Differential Revision: https://reviews.llvm.org/D42012 llvm-svn: 322582	2018-01-16 18:44:05 +00:00
Sander de Smalen	5aa809db79	[AArch64][AsmParser] Cleanup isSImm7s4, isSImm7s8, (etc) functions. Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, samparker Reviewed By: fhahn, samparker Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41899 llvm-svn: 322481	2018-01-15 12:47:17 +00:00
Jessica Paquette	757e120379	[MachineOutliner] Move hasAddressTaken check to MachineOutliner.cpp Mostly NFC. Still updating the test though just for completeness. This moves the hasAddressTaken check to MachineOutliner.cpp and replaces it with a per-basic block test rather than a per-function test. The old test was too conservative and was preventing functions in C programs from being outlined even though they were safe to outline. This was mostly a problem in C sources. llvm-svn: 322425	2018-01-13 00:42:28 +00:00
Evandro Menezes	2e05279399	[AArch64] Fix scheduling resources for post indexed loads and stores Fix typos in the default scheduling resources when using the post indexed addressing modes. Differential revision: https://reviews.llvm.org/D40511 llvm-svn: 322392	2018-01-12 19:20:11 +00:00
Evgeniy Stepanov	99fa3e774d	[hwasan] Stack instrumentation. Summary: Very basic stack instrumentation using tagged pointers. Tag for N'th alloca in a function is built as XOR of: * base tag for the function, which is just some bits of SP (poor man's random) * small constant which is a function of N. Allocas are aligned to 16 bytes. On every ReturnInst allocas are re-tagged to catch use-after-return. This implementation has a bunch of issues that will be taken care of later: 1. lifetime intrinsics referring to tagged pointers are not recognized in SDAG. This effectively disables stack coloring. 2. Generated code is quite inefficient. There is one extra instruction at each memory access that adds the base tag to the untagged alloca address. It would be better to keep tagged SP in a callee-saved register and address allocas as an offset of that XOR retag, but that needs better coordination between hwasan instrumentation pass and prologue/epilogue insertion. 3. Lifetime intrinsics are ignored and use-after-scope is not implemented. This would be harder to do than in ASan, because we need to use a differently tagged pointer depending on which lifetime.start / lifetime.end the current instruction is dominated / post-dominated. Reviewers: kcc, alekseyshl Subscribers: srhines, kubamracek, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41602 llvm-svn: 322324	2018-01-11 22:53:30 +00:00
Joel Jones	90a60501c3	[AArch64] Remove Unsupported = 1 flag for the WriteAtomic WriteRes. In practice, this patch has no effect on scheduling. There is no test case as there already exists a comprehensive test case for LSE Atomics. Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D40694 llvm-svn: 322291	2018-01-11 16:50:56 +00:00
Matthias Braun	e3a8db7ba1	Revert "AArch64: Fix emergency spillslot being out of reach for large callframes" Revert for now as the testcase is hitting a pre-existing verifier error that manifests as a failure when expensive checks are enabled (or -verify-machineinstrs is used). This reverts commit r322200. llvm-svn: 322231	2018-01-10 22:36:28 +00:00
Jessica Paquette	c191f1097c	[MachineOutliner] Outline ADRPs ADRP instructions weren't being outlined because they're PC-relative and thus fail the LR checks. This patch adds a special case for ADRPs to getOutliningType to make sure that ADRPs can be outlined and updates the MIR test. llvm-svn: 322207	2018-01-10 18:49:57 +00:00
Matthias Braun	b42ffa1283	AArch64: Fix emergency spillslot being out of reach for large callframes Large callframes (calls with several hundreds or thousands of parameters) could lead to situations in which the emergency spillslot is out of range to be addressed relative to the stack pointer. This commit forces the use of a frame pointer in the presence of large callframes. This commit does several things: - Compute max callframe size at the end of instruction selection. - Add mirFileLoaded target callback. Use it to compute the max callframe size after loading a .mir file when the size wasn't specified in the file. - Let TargetFrameLowering::hasFP() return true if there exists a callframe > 255 bytes. - Always place the emergency spillslot close to FP if we have a frame pointer. - Note that `useFPForScavengingIndex()` would previously return false when a base pointer was available, leading to the emergency spillslot getting allocated late (that's the whole effect of this callback). That made no sense to me, so I took this case out: even though the emergency spillslot is technically not referenced by FP in this case, we still want it allocated early. Differential Revision: https://reviews.llvm.org/D40876 llvm-svn: 322200	2018-01-10 18:16:24 +00:00
Sander de Smalen	a7ec090eaa	[AArch64][SVE] Asm: Add support for (mov\|dup) of scalar Summary: This patch adds support for 'dup' (Scalar -> SVE) and its corresponding 'mov' alias. Reviewers: fhahn, rengolin, evandro, echristo Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41822 llvm-svn: 322172	2018-01-10 11:32:47 +00:00
Sander de Smalen	886510f350	[TableGen][AsmMatcherEmitter] Generate assembler checks for tied operands Summary: This extends TableGen's AsmMatcherEmitter with code that generates a table with tied-operand constraints. The constraints are checked when parsing the instruction. If an operand is not equal to its tied operand, the assembler will give an error. Patch [2/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB. Reviewers: olista01, rengolin, mcrosier, fhahn, craig.topper, evandro, echristo Reviewed By: fhahn Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D41446 llvm-svn: 322166	2018-01-10 10:10:56 +00:00
Sander de Smalen	906a5deace	Recommit r322073: [AArch64][SVE] Asm: Add predicated ADD/SUB instructions Fixed issue that was found on sanitizer-x86_64-linux-fast. I changed the result type of 'Parser.getTok().getString().lower()' in AArch64AsmParser::tryParseSVEPredicateVector() from 'StringRef' to 'auto', since StringRef::lower() returns a std::string. llvm-svn: 322092	2018-01-09 17:01:27 +00:00
Sander de Smalen	6595603187	Reverted r322073 because of AddressSanitizer failure on sanitizer-x86_64-linux-fast builder. llvm-svn: 322077	2018-01-09 13:51:09 +00:00
Sander de Smalen	1f97363e5f	[AArch64][SVE] Asm: Add predicated ADD/SUB instructions Summary: Add the predicated ADD/SUB instructions and corresponding tests. Patch [3/3] in a series to add predicated ADD/SUB instructions for SVE. Reviewers: rengolin, mcrosier, evandro, fhahn, echristo Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D41443 llvm-svn: 322073	2018-01-09 12:43:46 +00:00
Sander de Smalen	7868e74033	[AArch64][SVE] Asm: Add parsing of merging/zeroing suffix for SVE predicate vector operands Summary: Parsing of the '/m' (merging) or '/z' (zeroing) suffix of a predicate operand. Patch [2/3] in a series to add predicated ADD/SUB instructions for SVE. Reviewers: rengolin, mcrosier, evandro, fhahn, echristo, MatzeB, t.p.northover Reviewed By: fhahn Subscribers: t.p.northover, MatzeB, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D41442 llvm-svn: 322070	2018-01-09 11:17:06 +00:00
Jessica Paquette	3291e7353e	[MachineOutliner] AArch64: Handle instrs that use SP and will never need fixups This commit does two things. Firstly, it adds a collection of flags which can be passed along to the target to encode information about the MBB that an instruction lives in to the outliner. Second, it adds some of those flags to the AArch64 outliner in order to add more stack instructions to the list of legal instructions that are handled by the outliner. The two flags added check if - There are calls in the MachineBasicBlock containing the instruction - The link register is available in the entire block If the link register is available and there are no calls, then a stack instruction can always be outlined without fixups, regardless of what it is, since in this case, the outliner will never modify the stack to create a call or outlined frame. The motivation for doing this was checking which instructions are most often missed by the outliner. Instructions like, say %sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy are very common, but cannot be outlined in the case that the outliner might modify the stack. This commit allows us to outline instructions like this. llvm-svn: 322048	2018-01-09 00:26:18 +00:00
Reid Kleckner	5619669a5a	Fix -Wsign-compare warnings on Windows These arise because enums are 'int' by default. llvm-svn: 321887	2018-01-05 19:53:51 +00:00
Evandro Menezes	6161a0b3b0	[AArch64] Improve code generation of vector build Instead of using, for example, `dup v0.4s, wzr`, which transfers between register files, use the more efficient `movi v0.4s, #0` instead. Differential revision: https://reviews.llvm.org/D41515 llvm-svn: 321824	2018-01-04 21:43:12 +00:00
Sander de Smalen	dc5e081b93	[AArch64][SVE] Asm: Add restricted register classes for SVE predicate vectors. Summary: Add a register class for SVE predicate operands that can only be p0-p7 (as opposed to p0-p15) Patch [1/3] in a series to add predicated ADD/SUB instructions for SVE. Reviewers: rengolin, mcrosier, evandro, fhahn, echristo, olista01, SjoerdMeijer, javed.absar Reviewed By: fhahn Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41441 llvm-svn: 321699	2018-01-03 10:15:46 +00:00
Alex Bradbury	b22f751fa7	Thread MCSubtargetInfo through Target::createMCAsmBackend Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. D20830 threaded an MCSubtargetInfo reference through MCAsmBackend::relaxInstruction, but this isn't the only function that would benefit from access. This patch removes the Triple and CPUString arguments from createMCAsmBackend and replaces them with MCSubtargetInfo. This patch just changes the interface without making any intentional functional changes. Once in, several cleanups are possible: * Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend * Support 16-bit instructions when valid in MipsAsmBackend::writeNopData * Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl * Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221) This change initially exposed PR35686, which has since been resolved in r321026. Differential Revision: https://reviews.llvm.org/D41349 llvm-svn: 321692	2018-01-03 08:53:05 +00:00
Amara Emerson	854d10d10b	[AArch64][GlobalISel] Enable GlobalISel at -O0 by default Tests updated to explicitly use fast-isel at -O0 instead of implicitly. This change also allows an explicit -fast-isel option to override an implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0. Differential Revision: https://reviews.llvm.org/D41362 llvm-svn: 321655	2018-01-02 16:30:47 +00:00
Sander de Smalen	c9b3e1cf03	[AArch64][AsmParser] Add isScalarReg() and repurpose isReg() Summary: isReg() in AArch64AsmParser.cpp is a bit of a misnomer, and would be better named 'isScalarReg()' instead. Patch [1/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB. Reviewers: rengolin, mcrosier, evandro, fhahn, echristo Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D41445 llvm-svn: 321646	2018-01-02 13:39:44 +00:00
Craig Topper	a4f9997675	[SelectionDAG][X86][AArch64] Require targets to specify the promotion type when using setOperationAction Promote for INT_TO_FP and FP_TO_INT Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromoteTo just adds 1 to the SimpleVT, which has the effect of increasing the element count while keeping the scalar size the same. If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times. And fp_to_int will keep trying wider types in a loop until it finds one that works. getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening. FWIW, I think for any other vector operations that need to be promoted, we have to specify the type explicitly because the default behavior of getTypeToPromoteTo isn't useful for vectors. The other types of promotion already require that either the element count is constant or the total vector width is constant, but neither happens by incrementing the SimpleVT enum. Differential Revision: https://reviews.llvm.org/D40664 llvm-svn: 321629	2018-01-01 19:21:35 +00:00
Matthew Simpson	9439f54902	[AArch64] Change order of candidate FMLS patterns r319980 added new patterns to the machine combiner for transforming (fsub (fmul x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source operand is an fmul are transformed. We previously only matched the case where the second source operand of an fsub was an fmul, transforming (fsub z (fmul x y)) into (fmls z x y). Now, if we have an fsub where both source operands are fmuls, both of the above patterns are applicable. However, the order in which we add the patterns to the list of candidates determines the transformation that takes place, since only the first pattern that matches will be used. This patch changes the order these two patterns are added to the list of candidates such that we prefer the case where the second source operand is an fmul (the fmls case), rather than the other one (the fmla/fneg case). When both source operands are fmuls, this ordering results in fewer instructions. Differential Revision: https://reviews.llvm.org/D41587 llvm-svn: 321491	2017-12-27 15:25:01 +00:00
Sanjoy Das	26d11ca4b0	(Re-landing) Expose a TargetMachine::getTargetTransformInfo function Re-land r321234. It had to be reverted because it broke the shared library build. The shared library build broke because there was a missing LLVMBuild dependency from lib/Passes (which calls TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can tell, this problem was always there but was somehow masked before (perhaps because TargetMachine::getTargetIRAnalysis was a virtual function). Original commit message: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 llvm-svn: 321375	2017-12-22 18:21:59 +00:00
Sanjoy Das	747d1114d6	Revert "Expose a TargetMachine::getTargetTransformInfo function" This reverts commit r321234. It breaks the -DBUILD_SHARED_LIBS=ON build. llvm-svn: 321243	2017-12-21 02:34:39 +00:00
Sanjoy Das	0c3de350b4	Expose a TargetMachine::getTargetTransformInfo function Summary: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target. See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this. Reviewers: echristo, MatzeB, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits Differential Revision: https://reviews.llvm.org/D41464 llvm-svn: 321234	2017-12-21 01:06:58 +00:00
Sander de Smalen	cd6be960ce	[AArch64][SVE] Re-submit patch series for ZIP1/ZIP2 This patch resubmits the SVE ZIP1/ZIP2 patch series consisting of r320992, r320986, r320973, and r320970 by reverting https://reviews.llvm.org/rL321024. The issue that caused r321024 has been addressed in https://reviews.llvm.org/rL321158, so this patch-series should be safe to resubmit. llvm-svn: 321163	2017-12-20 11:02:42 +00:00
Tim Northover	6db5d027c6	AArch64: fix one more place movi.2d could be created. Somehow got missed out of r320965. llvm-svn: 321162	2017-12-20 10:45:39 +00:00
Sander de Smalen	c067c30d9e	[AArch64] Asm: Fix parsing of register aliases that have a name starting with 'z' Summary: This fixes an issue as identified by @rnk in https://reviews.llvm.org/rL321029. Reviewers: rnk, fhahn, rengolin, efriedma, echristo, olista01 Reviewed By: rnk, fhahn Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, rnk Differential Revision: https://reviews.llvm.org/D41382 llvm-svn: 321158	2017-12-20 09:45:45 +00:00
Sam Parker	daed9de622	[AArch64] CCSIDR2 system register Implement the 'Current Cache Size' register that has been introduced as part of the Armv8.3 architecture. I originally missed this, and this (hopefully) should be the final patch for assembler support. Differential Revision: https://reviews.llvm.org/D41396 llvm-svn: 321155	2017-12-20 08:56:41 +00:00
Martin Storsjo	2778fd0b59	[AArch64] Implement stack probing for windows Differential Revision: https://reviews.llvm.org/D41131 llvm-svn: 321150	2017-12-20 06:51:45 +00:00
Adrian Prantl	0e6694d111	Silence a bunch of implicit fallthrough warnings llvm-svn: 321114	2017-12-19 22:05:25 +00:00
Matthias Braun	a4852d2c19	X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI Note: - X86ISelLowering: setLibcallName(SINCOS) was superfluous as InitLibcalls() already does it. - ARMISelLowering: Setting libcallnames for sincos/sincosf seemed superfluous as in the darwin case it wouldn't be used while for all other cases InitLibcalls already does it. llvm-svn: 321036	2017-12-18 23:19:42 +00:00
Matthias Braun	a92cecfbda	AArch64/X86: Factor out common bzero logic; NFC llvm-svn: 321035	2017-12-18 23:14:28 +00:00
Jessica Paquette	8565d3af84	[MachineOutliner][NFC] Gardening: use std::any_of instead of bool + loop River Riddle suggested to use std::any_of instead of the bool + loop thing on r320229. This commit does that. llvm-svn: 321028	2017-12-18 21:44:52 +00:00
Reid Kleckner	37517a2ddd	Revert "[AArch64][SVE] Asm" changes, they broke libjpeg_turbo This reverts changes r320992, r320986, r320973, and r320970. r320970 by itself breaks the test case, and the rest depend on it. Test case will land soon. llvm-svn: 321024	2017-12-18 20:58:25 +00:00
Jessica Paquette	02c124d644	[MachineOutliner] Recommit r320229 LR was undefined entering outlined functions that contain calls. This made the machine verifier unhappy when expensive checks were enabled. This fixes that. llvm-svn: 321014	2017-12-18 19:33:21 +00:00
Sander de Smalen	09f56a54d0	[AArch64][SVE] Asm: Improve diagnostics further when +sve is not specified Summary: Patch [4/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. This patch further improves diagnostic messages for when the SVE feature is not specified. Reviewers: rengolin, fhahn, olista01, echristo, efriedma Reviewed By: fhahn Subscribers: sdardis, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D40363 llvm-svn: 320992	2017-12-18 16:48:53 +00:00
Sander de Smalen	fce0c1c45b	[AArch64][SVE] Asm: Add ZIP1/ZIP2 instructions (predicate/data vectors) Summary: Patch [2/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D40361 llvm-svn: 320973	2017-12-18 11:29:59 +00:00
Sander de Smalen	ce1e0975f4	[AArch64][SVE] Asm: Add SVE predicate register definitions and parsing support Summary: Patch [1/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro, echristo, efriedma Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D40360 llvm-svn: 320970	2017-12-18 11:26:34 +00:00
Tim Northover	9097a07e4e	AArch64: work around how Cyclone handles "movi.2d vD, #0". For Cyclone, the instruction "movi.2d vD, #0" is executed incorrectly in some rare circumstances. Work around the issue conservatively by avoiding the instruction entirely. This patch changes CodeGen so that problematic instructions are never generated, and the AsmParser so that an equivalent instruction is used (with a warning). llvm-svn: 320965	2017-12-18 10:36:00 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Evandro Menezes	a9134e86f1	[AArch64] Fix typo in the ASIMD instruction optimization pass Fix typo in the representative instruction replacement. Also, fix formatting and reword some comments. llvm-svn: 320839	2017-12-15 18:26:54 +00:00
Francis Visoiu Mistrih	0b5bdceabf	[CodeGen] Print stack object references as %(fixed-)stack.0 in both MIR and debug output Work towards the unification of MIR and debug output by printing `%stack.0` instead of `<fi#0>`, and `%fixed-stack.0` instead of `<fi#-4>` (supposing there are 4 fixed stack objects). Only debug syntax is affected. Differential Revision: https://reviews.llvm.org/D41027 llvm-svn: 320827	2017-12-15 16:33:45 +00:00
Evandro Menezes	d123f8ccf1	[AArch64] Test patch Fix formatting by adding a missing blank line to test new network setup. llvm-svn: 320760	2017-12-14 23:06:18 +00:00
Matt Arsenault	7d7adf4f2e	TLI: Allow using PSV for intrinsic mem operands llvm-svn: 320756	2017-12-14 22:34:10 +00:00
Sanjay Patel	0ab0c1a201	[SimplifyCFG] don't sink common insts too soon (PR34603) This should solve: https://bugs.llvm.org/show_bug.cgi?id=34603 ...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run. It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the sinking transform later in the optimization pipeline. Differential Revision: https://reviews.llvm.org/D38566 llvm-svn: 320749	2017-12-14 22:05:20 +00:00
Matt Arsenault	1117133687	DAG: Expose all MMO flags in getTgtMemIntrinsic Rather than adding more bits to express every MMO flag you could want, just directly use the MMO flags. Also fixes using a bunch of bool arguments to getMemIntrinsicNode. On AMDGPU, buffer and image intrinsics should always have MODereferencable set, but currently there is no way to do that directly during the initial intrinsic lowering. llvm-svn: 320746	2017-12-14 21:39:51 +00:00
Sander de Smalen	14e36ee5c3	Re-commit: [TableGen] AsmMatcher: Fix bug with reported diagnostic for operand. Summary: The generated diagnostic by the AsmMatcher isn't always applicable to the AsmOperand. This is because the code will only update the diagnostic if it is more specific than the previous diagnostic. However, when having validated operands and 'moved on' to a next operand (for some instruction/alias for which all previous operands are valid), if the diagnostic is InvalidOperand, then that should be set as the diagnostic, not the more specific message about a previous operand for some other instruction/alias candidate. (Re-committed with an extra whitespace in SVEInstrFormats.td to trigger rebuild of AArch64GenAsmMatcher.inc, since the llvm-clang-x86_64-expensive-checks-win builder does not seem to rebuild AArch64GenAsmMatcher.inc with the newly built TableGen due to a missing dependency somewhere (see: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119555.html)) Reviewers: craig.topper, olista01, rengolin, stoklund Reviewed By: olista01 Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D40011 llvm-svn: 320711	2017-12-14 16:09:48 +00:00
Fedor Sergeev	84693033b4	Remove redundant includes from lib/Target/AArch64. llvm-svn: 320686	2017-12-14 10:36:20 +00:00
Michael Zolotukhin	67b04bd8ac	Recover some overzealously removed includes. llvm-svn: 320648	2017-12-13 22:21:02 +00:00
Michael Zolotukhin	a859bd9ced	Remove redundant includes from lib/Target/AArch64. llvm-svn: 320634	2017-12-13 21:31:16 +00:00
Galina Kistanova	9dee3f0a97	Reverted r320229. It broke tests on builder llvm-clang-x86_64-expensive-checks-win. llvm-svn: 320588	2017-12-13 15:26:27 +00:00
Matthias Braun	f842297d50	Rename LiveIntervalAnalysis.h to LiveIntervals.h Headers/Implementation files should be named after the class they declare/define. Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in favor of `class LiveIntervals;` llvm-svn: 320546	2017-12-13 02:51:04 +00:00
Joel Jones	5cc21e83ce	[AArch64] Improve loop unrolling performance on Cavium T99 This patch improves performance on Cavium T99 as shown here (libquantum 0.2.4): https://docs.google.com/spreadsheets/d/1Lo1o2E1NjrpkwS7DvYYWsiVvPdd93h7KBaqeptMrZPY/edit?usp=sharing By increasing the LoopMicroOpsBufferSize in the Cavium T99 Scheduler file, loop unrolling becomes more aggressive. This helps performance on T99. Test case included. Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D40695 llvm-svn: 320272	2017-12-09 23:59:55 +00:00
Jessica Paquette	a249c4f513	[MachineOutliner] Outline calls The outliner previously would never outline calls. Calls are pretty common in files, so it makes sense to outline them. In fact, in the LLVM test suite, if you count the number of instructions that the outliner misses when you outline calls vs when you don't, it turns out that, on average, around 6% of the instructions encountered are calls. So, if we outline calls, we can find more candidates, and thus save some more space. This commit adds that functionality and updates the mir test to reflect that. llvm-svn: 320229	2017-12-09 00:43:49 +00:00
Abderrazek Zaafrani	5a2583f026	[AArch64] Rename AArch64VectorByElementOpt.cpp into AArch64SIMDInstrOpt.cpp to reflect the recently added features. The name change is discussed in https://reviews.llvm.org/D38196 llvm-svn: 320204	2017-12-08 22:04:13 +00:00
Abderrazek Zaafrani	2c80e4c7c3	[AArch64] Avoid SIMD interleaved store instruction for Exynos. Replace interleaved store instructions by equivalent and more efficient instructions based on latency cost model. https://reviews.llvm.org/D38196 llvm-svn: 320123	2017-12-08 00:58:49 +00:00
Jessica Paquette	59948666fb	[MachineOutliner] Fix offset overflow check The offset overflow check before was incorrect. It would always give the correct result, but it was comparing the SCALED potential fixed-up offset against an UNSCALED minimum/maximum. As a result, the outliner was missing a bunch of frame setup/destroy instructions that ought to have been safe to outline. This fixes that, and adds an instruction to the .mir test that failed the old test. llvm-svn: 320090	2017-12-07 21:51:43 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Florian Hahn	5d6a4e43ba	[AArch64] Add patterns to replace fsub fmul with fma fneg. Summary: This patch adds MachineCombiner patterns for transforming (fsub (fmul x y) z) into (fma x y (fneg z)). This has a lower latency on micro architectures where fneg is cheap. Patch based on work by George Steed. Reviewers: rengolin, joelkevinjones, joel_k_jones, evandro, efriedma Reviewed By: evandro Subscribers: aemerson, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D40306 llvm-svn: 319980	2017-12-06 22:48:36 +00:00
Nirav Dave	7d8f3e0c93	[ARM][AArch64][DAG] Reenable post-legalize store merge Reenable post-legalize stores with constant merging computation and corresponding test case. * Properly truncate store merge constants * Disable merging of truncated stores floating points * Ensure merges of constant stores into a single vector are constructed from legal elements. Reviewers: eastig, efriedma Reviewed By: eastig Subscribers: spatel, rengolin, aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40701 llvm-svn: 319899	2017-12-06 15:30:13 +00:00
Joel Galenson	3e40883e4c	[AArch64] Do not abort if overflow check does not use EQ or NE. As suggested by Eli Friedman, instead of aborting if an overflow check uses something other than SETEQ or SETNE, simply do not apply the optimization. Differential Revision: https://reviews.llvm.org/D39147 llvm-svn: 319837	2017-12-05 21:33:12 +00:00
Daniel Sanders	3c1c4c0ee0	Revert r319691: [globalisel][tablegen] Split atomic load/store into separate opcode and enable for AArch64. Some concerns were raised with the direction. Revert while we discuss it and look into an alternative llvm-svn: 319739	2017-12-05 05:52:07 +00:00
Daniel Sanders	04e4f47e93	[globalisel][tablegen] Split atomic load/store into separate opcode and enable for AArch64. This patch splits atomics out of the generic G_LOAD/G_STORE and into their own G_ATOMIC_LOAD/G_ATOMIC_STORE. This is a pragmatic decision rather than a necessary one. Atomic load/store has little in implementation in common with non-atomic load/store. They tend to be handled very differently throughout the backend. It also has the nice side-effect of slightly improving the common-case performance at ISel since there's no longer a need for an atomicity check in the matcher table. All targets have been updated to remove the atomic load/store check from the G_LOAD/G_STORE path. AArch64 has also been updated to mark G_ATOMIC_LOAD/G_ATOMIC_STORE legal. There is one issue with this patch though which also affects the extending loads and truncating stores. The rules only match when an appropriate G_ANYEXT is present in the MIR. For example, (G_ATOMIC_STORE (G_TRUNC:s16 (G_ANYEXT:s32 (G_ATOMIC_LOAD:s16 X)))) will match but: (G_ATOMIC_STORE (G_ATOMIC_LOAD:s16 X)) will not. This shouldn't be a problem at the moment, but as we get better at eliminating extends/truncates we'll likely start failing to match in some cases. The current plan is to fix this in a patch that changes the representation of extending-load/truncating-store to allow the MMO to describe a different type to the operation. llvm-svn: 319691	2017-12-04 20:39:32 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Martin Storsjo	eca862de07	[AArch64] Allow using emulated tls on platforms other than ELF This matches how it is done on X86. This allows using emulated tls on windows; in MinGW environments, native tls isn't supported at the moment. Set the right Data*bitsDirective for windows to match the existing tests for other platforms. Make parts of the existing tests a regex, to allow matching .section .rdata for windows, to avoid having to duplicate the rest of the tests for windows. Differential Revision: https://reviews.llvm.org/D40770 llvm-svn: 319644	2017-12-04 09:09:04 +00:00
Nirav Dave	839ff79a8d	[DAG][AArch64] Disable post-legalization store Disable post-legalization store for AArch64 backend which is causing errors out-of-tree. llvm-svn: 319607	2017-12-02 04:01:26 +00:00
Volkan Keles	a32ff00b00	GlobalISel: Enable the legalization of G_MERGE_VALUES and G_UNMERGE_VALUES Summary: LegalizerInfo assumes all G_MERGE_VALUES and G_UNMERGE_VALUES instructions are legal, so it is not possible to legalize vector operations on illegal vector types. This patch fixes the problem by removing the related check and adding default actions for G_MERGE_VALUES and G_UNMERGE_VALUES. Reviewers: qcolombet, ab, dsanders, aditya_nandakumar, t.p.northover, kristof.beyls Reviewed By: dsanders Subscribers: rovka, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D39823 llvm-svn: 319524	2017-12-01 08:19:10 +00:00
Daniel Sanders	0c43b3a023	[globalisel][tablegen] Add support for relative AtomicOrderings No test yet because the relevant rules are blocked on the atomic_load, and atomic_store nodes. llvm-svn: 319475	2017-11-30 21:05:59 +00:00
Daniel Sanders	aef1dfc690	[aarch64][globalisel] Legalize G_ATOMIC_CMPXCHG_WITH_SUCCESS and G_ATOMICRMW_* G_ATOMICRMW_* is generally legal on AArch64. The exception is G_ATOMICRMW_NAND. G_ATOMIC_CMPXCHG_WITH_SUCCESS needs to be lowered to G_ATOMIC_CMPXCHG with an external comparison. Note that IRTranslator doesn't generate these instructions yet. llvm-svn: 319466	2017-11-30 20:11:42 +00:00
Amara Emerson	d78d65c2a4	[GlobalISel][IRTranslator] Fix crash during translation of zero sized loads/stores/args/returns. This fixes PR35358. rdar://35619533 Differential Revision: https://reviews.llvm.org/D40604 llvm-svn: 319465	2017-11-30 20:06:02 +00:00
Francis Visoiu Mistrih	c71cced0aa	[CodeGen] Always use `printReg` to print registers in both MIR and debug output As part of the unification of the debug format and the MIR format, always use `printReg` to print all kinds of registers. Updated the tests using '_' instead of '%noreg' until we decide which one we want to be the default one. Differential Revision: https://reviews.llvm.org/D40421 llvm-svn: 319445	2017-11-30 16:12:24 +00:00
Francis Visoiu Mistrih	93ef145862	[CodeGen] Print "%vreg0" as "%0" in both MIR and debug output As part of the unification of the debug format and the MIR format, avoid printing "vreg" for virtual registers (which is one of the current MIR possibilities). Basically: * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g" * grep -nr '%vreg' . and fix if needed * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g" * grep -nr 'vreg[0-9]\+' . and fix if needed Differential Revision: https://reviews.llvm.org/D40420 llvm-svn: 319427	2017-11-30 12:12:19 +00:00
Sander de Smalen	6a3bf1f84a	Reverted r319315 because of unused functions (due to PPR not yet being used by any instructions). llvm-svn: 319321	2017-11-29 15:14:39 +00:00
Sander de Smalen	2b6338b2bc	[AArch64][SVE] Asm: Add SVE predicate register definitions and parsing support Summary: Patch [1/4] in a series to add parsing of predicates and properly parse SVE ZIP1/ZIP2 instructions. Reviewers: rengolin, kristof.beyls, fhahn, mcrosier, evandro, echristo, efriedma Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits, tschuett Differential Revision: https://reviews.llvm.org/D40360 llvm-svn: 319315	2017-11-29 14:34:18 +00:00
Simon Pilgrim	14d3fd29f8	Fix VS2017 narrowing conversion warning. NFCI llvm-svn: 319240	2017-11-28 22:32:43 +00:00
Daniel Sanders	7fe7acc6b1	[aarch64][globalisel] Define G_ATOMIC_CMPXCHG and G_ATOMICRMW_* and make them legal The IRTranslator cannot generate these instructions at the moment so there's no issue with not having implemented ISel for them yet. D40092 will add G_ATOMIC_CMPXCHG_WITH_SUCCESS and G_ATOMICRMW_* to the IRTranslator and a further patch will add support for lowering G_ATOMIC_CMPXCHG_WITH_SUCCESS into G_ATOMIC_CMPXCHG with an external success check via the `Lower` action. The separation of G_ATOMIC_CMPXCHG_WITH_SUCCESS and G_ATOMIC_CMPXCHG is to import SelectionDAG rules while still supporting targets that prefer to custom lower the original LLVM-IR-like operation. llvm-svn: 319216	2017-11-28 20:21:15 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Francis Visoiu Mistrih	9d419d3b0c	[CodeGen] Rename functions PrintReg* to printReg* LLVM Coding Standards: Function names should be verb phrases (as they represent actions), and command-like function should be imperative. The name should be camel case, and start with a lower case letter (e.g. openFile() or isFoo()). Differential Revision: https://reviews.llvm.org/D40416 llvm-svn: 319168	2017-11-28 12:42:37 +00:00
Nirav Dave	db77e57ea8	[DAG] Do MergeConsecutiveStores again before Instruction Selection Summary: Now that store-merge only generates type-safe stores, do a second pass just before instruction selection to allow lowered intrinsics to be merged as well. Reviewers: jyknight, hfinkel, RKSimon, efriedma, rnk, jmolloy Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33675 llvm-svn: 319036	2017-11-27 15:28:15 +00:00
Evandro Menezes	ed721e32cd	[AArch64] Adjust the cost model for Exynos M1 and M2 Fix the modeling of some loads and stores. llvm-svn: 318884	2017-11-22 22:48:50 +00:00
Chad Rosier	fe97d73674	[AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as having side effects. This partially reverts r298851. The underlying issue is that we don't currently model the dependency between mrs (read system register) and msr (write system register) instructions. Something like the below should never be reordered: msr TPIDR_EL0, x0 ;; set thread pointer mrs x8, TPIDR_EL0 ;; read thread pointer but was being reordered after r298851. The functional part of the patch that wasn't reverted needed to remain in place in order to not break r299462. PR35317 llvm-svn: 318788	2017-11-21 18:08:34 +00:00
Evandro Menezes	46f672b759	[AArch64] Adjust the cost model for Exynos M1 and M2 Fix the modeling of test and branch. llvm-svn: 318685	2017-11-20 19:11:56 +00:00
Sander de Smalen	0c5a29b6be	[AArch64][TableGen] Skip tied result operands for InstAlias Summary: This patch fixes an issue so that the right alias is printed when the instruction has tied operands. It checks the number of operands in the resulting instruction as opposed to the alias, and then skips over tied operands that should not be printed in the alias. This allows to generate the preferred assembly syntax for the AArch64 'ins' instruction, which should always be displayed as 'mov' according to the ARM Architecture Reference Manual. Several unit tests have changed as a result, but only to reflect the preferred disassembly. Some other InstAlias patterns (movk/bic/orr) needed a slight adjustment to stop them becoming the default and breaking other unit tests. Please note that the patch is mostly the same as https://reviews.llvm.org/D29219 which was reverted because of an issue found when running TableGen with the Address Sanitizer. That issue has been addressed in this iteration of the patch. Reviewers: rengolin, stoklund, huntergr, SjoerdMeijer, rovka Reviewed By: rengolin, SjoerdMeijer Subscribers: fhahn, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D40030 llvm-svn: 318650	2017-11-20 14:36:40 +00:00
Quentin Colombet	c0d34d38cb	[AArch64] Map G_LOAD on FPR when the definition goes to a copy to FPR We used to detect loads feeding fp instructions, but we were failing to take into account cases where this happens through copies. For instance, loads can feed copies coming from the ABI lowering of floating point arguments/results. llvm-svn: 318589	2017-11-18 04:28:59 +00:00
Quentin Colombet	63816c0957	[AArch64] Map G_STORE on FPR when the source comes from a FPR copy We used to detect that stores were fed by fp instructions, but we were failing to take into account cases where this happens through copies. For instance, stores can be fed by copies coming from the ABI lowering of floating point arguments. llvm-svn: 318588	2017-11-18 04:28:58 +00:00
Quentin Colombet	91801f68aa	[AArch64][RegisterBankInfo] Teach instruction mapping about gpr32 -> fpr16 cross copies Turns out these copies can actually occur because of the way we lower the ABI for half. llvm-svn: 318586	2017-11-18 04:28:56 +00:00
Evandro Menezes	4b964f2b95	[AArch64] Adjust the cost model for Exynos M1 and M2 Improve the accuracy of the model by specifying the proper number of uops. llvm-svn: 318531	2017-11-17 16:42:15 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Daniel Sanders	f76f315436	[globalisel][tablegen] Generate rule coverage and use it to identify untested rules Summary: This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV, causes TableGen to instrument the generated table to collect rule coverage information. However, LLVM_ENABLE_GISEL_COV goes a bit further than LLVM_ENABLE_DAGISEL_COV. The information is written to files (${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will read this information and use it to emit warnings about untested rules. This technique could also be used by SelectionDAG and can be further extended to detect hot rules and give them priority over colder rules. Usage: * Enable LLVM_ENABLE_GISEL_COV in CMake * Build the compiler and run some tests * cat gisel-coverage-[0-9]* > gisel-coverage-all * Delete lib/Target/*/*GenGlobalISel.inc* * Build the compiler Known issues: * ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual step due to a lack of a portable 'cat' command. It should be the concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files. * There's no mechanism to discard coverage information when the ruleset changes Depends on D39742 Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka Reviewed By: rovka Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D39747 llvm-svn: 318356	2017-11-16 00:46:35 +00:00
Daniel Sanders	725584e26d	Add backend name to Target to enable runtime info to be fed back into TableGen Summary: Make it possible to feed runtime information back to tablegen to enable profile-guided tablegen-eration, detection of untested tablegen definitions, etc. Being a cross-compiler by nature, LLVM will potentially collect data for multiple architectures (e.g. when running 'ninja check'). We therefore need a way for TableGen to figure out what data applies to the backend it is generating at the time. This patch achieves that by including the name of the 'def X : Target ...' for the backend in the TargetRegistry. Reviewers: qcolombet Reviewed By: qcolombet Subscribers: jholewinski, arsenm, jyknight, aditya_nandakumar, sdardis, nemanjai, ab, nhaehnle, t.p.northover, javed.absar, qcolombet, llvm-commits, fedor.sergeev Differential Revision: https://reviews.llvm.org/D39742 llvm-svn: 318352	2017-11-15 23:55:44 +00:00
Evandro Menezes	82665b1ec4	[AArch64] Adjust the cost model for Exynos M1 and M2 Fix the modeling of FP stores. llvm-svn: 318351	2017-11-15 23:49:58 +00:00
Evandro Menezes	5ba804bc11	[AArch64] Refactor the loads and stores optimizer Move remaining inline matching of instructions of some optimizations into separate functions, like in the other optimizations. Otherwise, NFC. Differential revision: https://reviews.llvm.org/D40090 llvm-svn: 318335	2017-11-15 21:06:22 +00:00
Evandro Menezes	cbf70486bc	[AArch64] Adjust the cost model for Exynos M1 and M2 Fix the modeling of loads and stores using the pre or post indexed addressing modes. llvm-svn: 318312	2017-11-15 17:39:37 +00:00
Sander de Smalen	8e607346af	[AArch64][SVE] Asm: Report SVE parsing diagnostics only once Summary: Prevent an issue where a diagnostic is reported multiple times by bailing out with a ParseFail if an invalid SVE register element qualifier/suffix is specified, for example: <stdin>:10:18: error: invalid sve vector kind qualifier add z20.h, z2.h, z31.x ^ <stdin>:10:18: error: invalid sve vector kind qualifier add z20.h, z2.h, z31.x ... <stdin>:10:18: error: invalid sve vector kind qualifier add z20.h, z2.h, z31.x ^ Reviewers: fhahn, rengolin Reviewed By: rengolin Subscribers: aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D39894 llvm-svn: 318297	2017-11-15 15:44:43 +00:00
Evandro Menezes	1c94538693	[AArch64] Adjust the cost model for Exynos M1 and M2 Fix the modeling of loads and stores of registers pairs. llvm-svn: 318186	2017-11-14 19:59:43 +00:00
Martin Storsjo	4629f52312	[ARM, AArch64] Fix an assert message, Darwin isn't the only target supporting TLS. NFC. llvm-svn: 318184	2017-11-14 19:57:59 +00:00
Sander de Smalen	070a7ff1ad	Test commit llvm-svn: 318027	2017-11-13 09:57:20 +00:00
David Blaikie	3f833edc7c	Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647	2017-11-08 01:01:31 +00:00
Florian Hahn	b936810833	[AArch64][SVE] Asm: Add support for (ADD|SUB)_ZZZ Patch [5/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39091 llvm-svn: 317591	2017-11-07 16:58:13 +00:00
Florian Hahn	91f11e5ad1	[AArch64][SVE] Asm: Add SVE (Z) Register definitions and parsing support Patch [3/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. To summarise, this patch adds: * SVE register definitions * Methods to parse SVE register operands * Methods to print SVE register operands * RegKind SVEDataVector to distinguish it from other data types like scalar register or Neon vector. * k_SVEDataRegister and SVEDataRegOp to describe SVE registers (which will be extended by further patches with e.g. ElementWidth and the shift-extend type). Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39089 llvm-svn: 317590	2017-11-07 16:45:48 +00:00
Florian Hahn	d825bbdc41	[AArch64][SVE] Asm: Set SVE as unsupported feature for existing scheduler models. Patch [4/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. We add SVE as unsupported feature for CPUs that don't have SVE to prevent errors from scheduler models saying it lacks information for these instructions. Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39090 llvm-svn: 317582	2017-11-07 15:03:11 +00:00
Florian Hahn	c4422247b3	[AArch64][SVE] Asm: Replace 'IsVector' by 'RegKind' in AArch64AsmParser (NFC) Patch [2/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. This change is a non functional change that adds RegKind as an alternative to 'isVector' to prepare it for newer types (SVE data vectors and predicate vectors) that will be added in next patches (where the SVE data vector is added as part of this patch set) Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39088 llvm-svn: 317569	2017-11-07 13:07:50 +00:00
Kristof Beyls	af9814a1fc	[GlobalISel] Enable legalizing non-power-of-2 sized types. This changes the interface of how targets describe how to legalize, see the below description. 1. Interface for targets to describe how to legalize. In GlobalISel, the API in the LegalizerInfo class is the main interface for targets to specify which types are legal for which operations, and what to do to turn illegal type/operation combinations into legal ones. For each operation the type sizes that can be legalized without having to change the size of the type are specified with a call to setAction. This isn't different to how GlobalISel worked before. For example, for a target that supports 32 and 64 bit adds natively: for (auto Ty : {s32, s64}) setAction({G_ADD, 0, Ty}, Legal); or for a target that needs a library call for a 32 bit division: setAction({G_SDIV, s32}, Libcall); The main conceptual change to the LegalizerInfo API is in specifying how to legalize the type sizes for which a change of size is needed. For example, in the above example, how to specify how all types from i1 to i8388607 (apart from s32 and s64 which are legal) need to be legalized and expressed in terms of operations on the available legal sizes (again, i32 and i64 in this case). Before, the implementation only allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0, s128}, NarrowScalar)). A worse limitation was that if you wanted to specify how to legalize all the sized types as allowed by the LLVM-IR LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times and probably would need a lot of memory to store all of these specifications. Instead, the legalization actions that need to change the size of the type are specified now using a "SizeChangeStrategy". 
For example: setLegalizeScalarToDifferentSizeStrategy( G_ADD, 0, widenToLargerAndNarrowToLargest); This example indicates that for type sizes for which there is a larger size that can be legalized towards, legalization is done by widening the size. For example, G_ADD on s17 will be legalized by first doing WidenScalar to make it s32, after which it's legal. The "NarrowToLargest" indicates what to do if there is no larger size that can be legalized towards. E.g. G_ADD on s92 will be legalized by doing NarrowScalar to s64. Another example, taken from the ARM backend, is: for (unsigned Op : {G_SDIV, G_UDIV}) { setLegalizeScalarToDifferentSizeStrategy(Op, 0, widenToLargerTypesUnsupportedOtherwise); if (ST.hasDivideInARMMode()) setAction({Op, s32}, Legal); else setAction({Op, s32}, Libcall); } For this example, G_SDIV on s8, on a target without a divide instruction, would be legalized by first doing action (WidenScalar, s32), followed by (Libcall, s32). The same principle is also followed when the number of vector lanes on vector data types needs to be changed, e.g.: setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal); setLegalizeVectorElementToDifferentSizeStrategy( G_ADD, 0, widenToLargerTypesUnsupportedOtherwise); As currently implemented here, vector types are legalized by first making the vector element size legal, and then making the number of lanes legal. The strategy to follow in the first step is set by a call to setLegalizeVectorElementToDifferentSizeStrategy, see example above. 
The strategy followed in the second step is "moreToWiderTypesAndLessToWidest" (see code for its definition), which indicates that vectors are widened to more elements so they map to natively supported vector widths or, when there isn't a legal wider vector, split to map to the widest vector supported. Therefore, for the above specification, some example legalizations are: * getAction({G_ADD, LLT::vector(3, 3)}) returns {WidenScalar, LLT::vector(3, 8)} * getAction({G_ADD, LLT::vector(3, 8)}) then returns {MoreElements, LLT::vector(8, 8)} * getAction({G_ADD, LLT::vector(20, 8)}) returns {FewerElements, LLT::vector(16, 8)} 2. Key implementation aspects. How to legalize a specific (operation, type index, size) tuple is represented by mapping intervals of integers representing a range of size types to an action to take, e.g.: setScalarAction({G_ADD, LLT::scalar(1)}, {{1, WidenScalar}, // bit sizes [ 1, 32[ {32, Legal}, // bit sizes [32, 33[ {33, WidenScalar}, // bit sizes [33, 64[ {64, Legal}, // bit sizes [64, 65[ {65, NarrowScalar} // bit sizes [65, +inf[ }); Please note that most of the code to do the actual lowering of non-power-of-2 sized types is currently missing; this change only makes it possible for targets to specify what is legal, and how non-legal types should be legalized. Probably quite a bit of further work is needed in the actual legalizing and the other passes in GlobalISel to support non-power-of-2 sized types. I hope the documentation in LegalizerInfo.h and the examples provided in the various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explain well enough how this is meant to be used. This drops the need for LLT::{half,double}...Size(). Differential Revision: https://reviews.llvm.org/D30529 llvm-svn: 317560	2017-11-07 10:34:34 +00:00
David Blaikie	1be62f0327	Move TargetFrameLowering.h to CodeGen where it's implemented This header already includes a CodeGen header and is implemented in lib/CodeGen, so move the header there to match. This fixes a link error with modular codegeneration builds - where a header and its implementation are circularly dependent and so need to be in the same library, not split between two like this. llvm-svn: 317379	2017-11-03 22:32:11 +00:00
Evandro Menezes	9dcf099944	[AArch64] Fix the number of iterations for the Newton series The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue. Differential revision: https://reviews.llvm.org/D39507 llvm-svn: 317349	2017-11-03 18:56:36 +00:00
Martin Storsjo	9befcd7d8d	[AArch64] Use dwarf exception handling on MinGW Ideally we should probably produce WinEH here as well, but until then, we can use dwarf exceptions, without any further changes required in clang, libunwind or libcxxabi. Differential Revision: https://reviews.llvm.org/D39535 llvm-svn: 317304	2017-11-03 07:33:20 +00:00
Quentin Colombet	b6afac1f9a	[AArch64][RegisterBankInfo] Add mapping for G_FPEXT. This fixes http://llvm.org/PR32560. We were missing a description for half floating point type and as a result were using the FPR 32 mapping. Because of the size mismatch the generic code was complaining that the default mapping is not appropriate. Fix the mapping description so that the default mapping can be properly applied. llvm-svn: 317287	2017-11-02 23:38:19 +00:00
Quentin Colombet	619d649878	[AArch64][RegisterBankInfo] Add FPR16 support in value mapping. NFC. llvm-svn: 317286	2017-11-02 23:38:13 +00:00
Javed Absar	d13d419d4a	[AArch64]: range loopify frame-lowering llvm-svn: 316960	2017-10-30 22:00:06 +00:00
Sanjay Patel	b049173157	[SimplifyCFG] use pass options and remove the latesimplifycfg pass This is no-functional-change-intended. This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and D35411 (defer folding unconditional branches) with pass parameters rather than a named "latesimplifycfg" pass. Now that we have individual options to control the functionality, we could decouple when these fire (but that's an independent patch if desired). The next planned step would be to add another option bit to disable the sinking transform mentioned in D38566. This should also make it clear that the new pass manager needs to be updated to limit simplifycfg in the same way as the old pass manager. Differential Revision: https://reviews.llvm.org/D38631 llvm-svn: 316835	2017-10-28 18:43:07 +00:00
David Blaikie	6265130054	InstructionSelectorImpl.h: Modularize/remove ODR violations by using a static member function to expose the debug name llvm-svn: 316715	2017-10-26 23:39:54 +00:00
Yichao Yu	221dae31a5	Clear LastMappingSymbols and LastEMS(Info) when resetting the ARM(AArch64)ELFStreamer Summary: Not clearing these causes a segfault on ARM when (I think) the pass manager is used multiple times. Reset sets the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then saves the LastEMSInfo for the NULL section, leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL, which causes the LastEMSInfo to be set to NULL when the section is used again. The reuse of the section (pointer) might mean that the map was previously holding dangling pointers, which is why I went for clearing the map and resetting the info, making the state as similar as possible to the one right after the constructor has run. The AArch64 one doesn't segfault (since LastEMS isn't a pointer) but it seems to have the same issue. The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue is older though. No test is included since the test is believed to be too complicated for such an obvious fix and not worth doing. Reviewers: llvm-commits, shankare, t.p.northover, peter.smith, rengolin Reviewed By: rengolin Subscribers: mgorny, aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D38588 llvm-svn: 316679	2017-10-26 17:36:43 +00:00
Craig Topper	0551556ed2	[AsmParser][TableGen] Add VariantID argument to the generated mnemonic spell check function so it can use the correct table based on variant. I'm considering implementing the mnemonic spell checker for x86, and that would require the separate intel and att variants. llvm-svn: 316641	2017-10-26 06:46:41 +00:00
Craig Topper	2a06028c0a	[AsmParser][TableGen] Make the generated mnemonic spell checker function a file local static function. Also only emit it in targets that specifically request it. This is required so we don't get an unused static function error. llvm-svn: 316640	2017-10-26 06:46:40 +00:00
Martin Storsjo	373c8efa1e	[AArch64] Add support for dllimport of values and functions Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer. This is based on SVN r212431 and r212430 where the same was implemented for ARM. Differential Revision: https://reviews.llvm.org/D38530 llvm-svn: 316555	2017-10-25 07:25:18 +00:00
Daniel Sanders	d66e0901ae	[globalisel][tablegen] Import stores and allow GISel to automatically substitute zero regs like WZR/XZR/$zero. This patch enables the import of stores. Unfortunately, doing so by itself loses an optimization where storing 0 to memory makes use of WZR/XZR. To mitigate this, this patch also introduces a new feature that allows register operands to nominate a zero register. When this is done, GlobalISel will substitute (G_CONSTANT 0) with the nominated register automatically. This is currently configured to only apply to the stores. Applying it to GPR32/GPR64 register classes in general will be done after review (see https://reviews.llvm.org/D39150). llvm-svn: 316360	2017-10-23 18:19:24 +00:00
Daniel Sanders	1e4569fdc1	[globalisel][tablegen] Fix small spelling nits. NFC ComplexRendererFn -> ComplexRendererFns Corrected a couple lingering references to tied operands that were missed. llvm-svn: 316237	2017-10-20 20:55:29 +00:00
Daniel Sanders	30247fd1d9	[aarch64][globalisel] Register banks and classes should have distinct names. Otherwise they are ambiguous in MIR. llvm-svn: 316047	2017-10-18 00:12:43 +00:00
Matthias Braun	a2f96b5bde	AArch64: Enable AES instruction fusion on Cyclone. Note that cyclone itself doesn't fuse, but newer apple chips do and we are using cyclone as the default when targeting apple OSes. The current code also does not capture all fusion patterns of apple CPUs yet; I am still looking for ways to refactor the code nicely to extend it. llvm-svn: 316036	2017-10-17 21:46:15 +00:00
Tim Northover	350a87eaf1	AArch64: account for possible frame index operand in compares. If the address of a local is used in a comparison, AArch64 can fold the address-calculation into the comparison via "adds". Unfortunately, a couple of places (both hit in this one test) are not ready to deal with that yet and just assume the first source operand is a register. llvm-svn: 316035	2017-10-17 21:43:52 +00:00
Quentin Colombet	0bd2825517	Re-apply [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY This reverts commit r315823, thus re-applying r315781. Also make sure we don't use the G_BITCAST mapping for non-generic registers. Non-generic registers don't have a type but do have a reg bank, something the COPY mapping now knows how to deal with but the G_BITCAST mapping doesn't. -- Original Commit Message -- We used to rely on the generic implementation to get the mappings for COPYs. The generic implementation relies on table lookup and dynamically allocated objects to get the valid mappings. Given we already know how to map G_BITCAST and have the static mappings for them, use that code path for COPY as well. This is much more efficient and improves the compile time of RegBankSelect by up to 20%. Note: When we eventually generate all the mappings via TableGen, we won't have to do that dance to shave compile time. The intent of this change was to make sure that moving to static structures really pays off. NFC. llvm-svn: 315947	2017-10-16 22:28:40 +00:00
Quentin Colombet	9f20af6135	[AArch64][RegisterBankInfo] Add mapping support for G_BITCAST of s128 Anything bigger than 64-bit just maps to FPR. llvm-svn: 315946	2017-10-16 22:28:38 +00:00
Quentin Colombet	7c114d3d70	[AArch64][LegalizerInfo] Mark s128 G_BITCAST legal We used to mark all G_BITCAST of 128-bit legal but only for vector types. Scalars of this size are just fine as well. llvm-svn: 315945	2017-10-16 22:28:27 +00:00
Daniel Sanders	01805b6747	[aarch64][globalisel] Fix a crash in selectAddrModeIndexed() caused by incorrect G_FRAME_INDEX handling The wrong operand was being rendered to the result instruction. The crash was detected by Bitcode/simd_ops/AArch64_halide_runtime.bc llvm-svn: 315890	2017-10-16 05:39:30 +00:00
Daniel Sanders	ea8711b88e	Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook. At this point, we can import the simplest G_LOAD rules and select load instructions using them. Further patches will add support for the predicates to enable additional loads as well as the stores. The previous commit failed on MSVC due to a failure to convert an initializer_list to a std::vector. Hopefully, MSVC will accept this version. Depends on D37457 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37458 llvm-svn: 315887	2017-10-16 03:36:29 +00:00
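
Note on commit af9814a1fc ([GlobalISel] Enable legalizing non-power-of-2 sized types): its message describes mapping intervals of bit sizes to legalization actions. The lookup this implies can be sketched as below. This is only an illustration under stated assumptions, not the LegalizerInfo code from the patch: `Action`, `SizeAndAction`, `findAction` and `AddTable` are hypothetical names introduced here, and the table reuses the G_ADD intervals quoted in the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the LLVM LegalizerInfo types.
enum class Action { Legal, WidenScalar, NarrowScalar };
using SizeAndAction = std::pair<uint32_t, Action>;

// Each entry {S, A} means "action A applies from bit size S up to, but not
// including, the start size of the next entry". The last entry is open-ended.
// The table must be sorted by start size.
Action findAction(const std::vector<SizeAndAction> &Table, uint32_t Size) {
  assert(!Table.empty() && Size >= Table.front().first &&
         "size must not be smaller than the first interval start");
  // Find the first entry whose start size is strictly greater than Size ...
  auto It = std::upper_bound(
      Table.begin(), Table.end(), Size,
      [](uint32_t S, const SizeAndAction &E) { return S < E.first; });
  // ... the entry just before it owns the interval containing Size.
  return std::prev(It)->second;
}

// The G_ADD intervals from the commit message: s32 and s64 are legal.
const std::vector<SizeAndAction> AddTable = {
    {1, Action::WidenScalar},   // bit sizes [ 1, 32[
    {32, Action::Legal},        // bit sizes [32, 33[
    {33, Action::WidenScalar},  // bit sizes [33, 64[
    {64, Action::Legal},        // bit sizes [64, 65[
    {65, Action::NarrowScalar}, // bit sizes [65, +inf[
};
```

With this table, `findAction(AddTable, 17)` yields `Action::WidenScalar` and `findAction(AddTable, 92)` yields `Action::NarrowScalar`, matching the s17 and s92 examples in the commit message. Storing only the interval start points keeps the table small no matter how many bit widths it covers, which is the memory concern the message raises about calling setAction once per size.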

3232 Commits