llvm-project

Commit Graph

Author	SHA1	Message	Date
Anna Welker	7cd1cfdd6b	[NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter] Add an extra parameter so alignment can be taken under consideration in gather/scatter legalization. Differential Revision: https://reviews.llvm.org/D71610	2019-12-18 09:14:39 +00:00
Sjoerd Meijer	049f9672d8	[ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC. In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have quite a few helper functions all looking at the opcodes of MVE instructions. This moves all these utility functions to ARMBaseInstrInfo. Differential Revision: https://reviews.llvm.org/D71426	2019-12-16 09:13:59 +00:00
Momchil Velikov	8e8e3181aa	[ARM] Fix an ICE when retrieving the number of micro-ops for vlldm/vlstm The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for `VLLDM` and `VLSTM`, which are currently defined with itineraries having a dynamic count of micro-ops. Assuming an optimistic case in which these instructions do not actually perform loads or stores, and with the idea that Armv8-m cores are supposed to use the new style scheduling models, this patch just sets the itinerary for those two instructions to `NoItinerary`. Differential Revision: https://reviews.llvm.org/D71266	2019-12-13 18:19:40 +00:00
Fangrui Song	f16377f11c	[ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71062	2019-12-13 09:26:26 -08:00
Sam Parker	84593f058b	[ARM][MVE] Make VPT invalid for tail predication We've been marking VPT incompatible instructions as invalid for tail predication too, though this may not strictly be true. VPT are incompatible and, unless it's the first predicate def in a loop, they shouldn't be compatible for tail predication either. Differential Revision: https://reviews.llvm.org/D71410	2019-12-13 15:01:08 +00:00
Mikhail Maltsev	99581fd4c8	[ARM][MVE] Add vector reduction intrinsics with two vector operands Summary: This patch adds intrinsics for the following MVE instructions: * VABAV * VMLADAV, VMLSDAV * VMLALDAV, VMLSLDAV * VRMLALDAVH, VRMLSLDAVH Each of the above 4 groups has a corresponding new LLVM IR intrinsic, since the instructions cannot be easily represented using general-purpose IR operations. Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71062	2019-12-13 13:17:29 +00:00
Simon Tatham	25305a9311	[ARM][MVE] Add intrinsics for more immediate shifts. Summary: This fills in the remaining shift operations that take a single vector input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and `vshll[bt]` families. `vshll[bt]` (which shifts each input lane left into a double-width output lane) is the most interesting one. There are separate MC instruction ids for shifting by exactly the input lane width and shifting by less than that, because the instruction encoding is so completely different for the lane-width special case. So I had to write two sets of patterns to match based on the immediate shift count, which involved adding a ComplexPattern matcher to avoid the general-case pattern accidentally matching the special case too. For that family I've made sure to add an llc codegen test for both versions of each instruction. I'm experimenting with a new strategy for parametrising the isel patterns for all these instructions: adding extra fields to the relevant `Instruction` subclass itself, which are ignored by the Tablegen backends that generate the MC data, but can be retrieved from each instance of that instruction subclass when it's passed as a template parameter to the multiclass that generates its isel patterns. A nice effect of that is that I can fill in those informational fields using `let` blocks, rather than having to type them out once per instruction at `defm` time. (As a result, quite a lot of existing instruction `def`s are reindented by this patch, so it's clearer to read with whitespace changes ignored.) Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71458	2019-12-13 13:07:39 +00:00
John Brawn	01ba201abc	[ARM] Add custom strict fp conversion lowering when non-strict is custom We have custom lowering for operations converting to/from floating-point types when we don't have hardware support for those types, and this doesn't interact well with the target-independent legalization of the strict versions of these operations. Fix this by adding similar custom lowering of the strict versions. This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics test, with the remaining failures due to poor instruction selection. Differential Revision: https://reviews.llvm.org/D71127	2019-12-13 13:00:00 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sjoerd Meijer	e91420e17d	Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC." This reverts commit `9468e3334b`. There's a test that doesn't like this change. The RDA analysis gets invalided by changes in the block, which is not taken into account. Revert while I work on a fix for this.	2019-12-13 11:56:44 +00:00
Mark Murray	228c74076d	[ARM][MVE][Intrinsics] Add _x() variants of my _m() intrinsics. Summary: Better use of multiclass is used, and this helped find some existing bugs in the predicated VMULL* intrinsics, which are now fixed. The refactored VMULL[TB]Q_(INT\|POLY)_M() intrinsics were discovered to have an argument ("inactive") with incorrect type, and this required a fix that is included in this whole patch. The argument "inactive" should have been the same width (per vector element) as the return type of the intrinsic, but was not in the case where the return type was double the element width of the input types. To assist in testing the multiclassing, and to thwart further gremlins, the unit tests are improved in scope. The .ll tests are all generated by a small bit of throw-away scripting from the corresponding .c tests, and as such the diffs are large and nasty. Look at the file rather than the diff. Reviewers: dmgreen, miyuki, ostannard, simon_tatham Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71421	2019-12-13 11:51:23 +00:00
Sjoerd Meijer	9468e3334b	[ARM][MVE] findVCMPToFoldIntoVPS. NFC. This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can reimplement findVCMPToFoldIntoVPS with just a few calls to RDA. Differential Revision: https://reviews.llvm.org/D71330	2019-12-12 15:41:20 +00:00
Sam Parker	1274ac3dc2	[ARM][MVE] Sink vector shift operand Recommit `e0b966643f`. sub instructions were being generated for the negated value, and for some reason they were the register only ones. I think the problem was because I was grabbing the 'zero' from vmovimm, which is a target constant. Now I'm just generating a new Constant zero and so rsb instructions are now generated. Original commit message: The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841	2019-12-12 14:34:00 +00:00
Sam Parker	f8ff3bf55b	Revert "[ARM][MVE] Sink vector shift operand" This reverts commit `e0b966643f`. Instruction selection is failing with expensive checks.	2019-12-12 07:52:57 +00:00
Sam Parker	e0b966643f	[ARM][MVE] Sink vector shift operand The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841	2019-12-12 07:35:21 +00:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Reid Kleckner	85ba5f637a	Rename TTI::getIntImmCost for instructions and intrinsics Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure. Split off from D71320 Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D71381	2019-12-11 18:00:20 -08:00
Sjoerd Meijer	d97cf1f889	[ARM][LowOverheadLoops] Remove dead loop update instructions. After creating a low-overhead loop, the loop update instruction was still lingering around hurting performance. This removes dead loop update instructions, which in our case are mostly SUBS instructions. To support this, some helper functions were added to MachineLoopUtils and ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses before a particular loop instruction, respectively. This is a first version that removes a SUBS instruction when there are no other uses inside and outside the loop block, but there are some more interesting cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which shows that there is room for improvement. For example, we can't handle this case yet: .. dlstp.32 lr, r2 .LBB0_1: mov r3, r2 subs r2, #4 vldrh.u32 q2, [r1], #8 vmov q1, q0 vmla.u32 q0, q2, r0 letp lr, .LBB0_1 @ %bb.2: vctp.32 r3 .. which is a lot more tricky because r2 is not only used by the subs, but also by the mov to r3, which is used outside the low-overhead loop by the vctp instruction, and that requires a bit of a different approach, and I will follow up on this. Differential Revision: https://reviews.llvm.org/D71007	2019-12-11 10:20:19 +00:00
Simon Tatham	bd0f271c9e	[ARM][MVE] Add intrinsics for immediate shifts. (reland) This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: dmgreen, MarkMurrayARM Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-11 10:10:09 +00:00
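The special case described in `bd0f271c9e` (a right shift by the full lane width) can be illustrated on a single lane. This is only a sketch in Python of the arithmetic the message describes, not the codegen the patch actually emits; the function name `vshr_n_lane` is hypothetical, and the lower bound of 1 on the immediate is an assumption about the ACLE range.

```python
def vshr_n_lane(value: int, n: int, bits: int, signed: bool) -> int:
    """Shift one `bits`-wide lane right by the immediate n (1 <= n <= bits).

    Returns the result as an unsigned `bits`-wide bit pattern.
    """
    if not 1 <= n <= bits:
        raise ValueError("immediate out of range")
    mask = (1 << bits) - 1
    value &= mask
    if signed:
        # Reinterpret the bit pattern as a two's complement number.
        if value >> (bits - 1):
            value -= 1 << bits
        # An arithmetic shift by the full lane width gives the same
        # result as a shift by one bit less, so clamp the amount.
        return (value >> min(n, bits - 1)) & mask
    # A logical shift by the full lane width clears every bit.
    if n == bits:
        return 0
    return value >> n
```

For example, `vshr_n_lane(0x80, 8, 8, signed=True)` yields `0xFF` (every bit is a copy of the sign bit), while the unsigned shift of any 8-bit value by 8 yields `0`.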
Mikhail Maltsev	e6d3261c67	[ARM][MVE] Refactor complex vector intrinsics [NFCI] Summary: This patch refactors instruction selection of the complex vector addition, multiplication and multiply-add intrinsics, so that it is now based on TableGen patterns rather than C++ code. It also changes the first parameter (halving vs non-halving) of the arm_mve_vcaddq IR intrinsic to match the corresponding instruction encoding, hence it requires some changes in the tests. The patch addresses David's comment in https://reviews.llvm.org/D71190 Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM Reviewed By: dmgreen Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71245	2019-12-10 16:21:52 +00:00
Eric Christopher	9c6b7f68b8	Revert "[ARM][MVE] Add intrinsics for immediate shifts." and two follow-on commits: one warning fix and one functionality. As it's breaking at least the lto bot: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio This reverts commits: `8d70f3c933` `ff4dceef92` `d97b3e3e65`	2019-12-09 16:47:38 -08:00
Mark Murray	fc3417cb5a	[ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics. Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen, miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71198	2019-12-09 17:41:47 +00:00
Mark Murray	2eb61fa5d6	[ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT\|POLY) intrinsics. Summary: Add VMULL[BT]Q_(INT\|POLY) intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71066	2019-12-09 17:41:47 +00:00
Simon Tatham	8d70f3c933	[ARM] Fix NEON failure introduced by D71065. I rewrote the isel tablegen for MVE immediate shifts, and accidentally removed the `let Predicates=[HasMVEInt]` that was wrapping the old version, which seems to have allowed those rules to cause trouble on non-MVE targets. That's what I get for only re-running the MVE tests.	2019-12-09 16:56:00 +00:00
Simon Tatham	d97b3e3e65	[ARM][MVE] Add intrinsics for immediate shifts. Summary: This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which shift every lane of a vector left or right by a compile-time immediate. They mostly work by expanding to the IR `shl`, `lshr` and `ashr` operations, with their second operand being a vector splat of the immediate. There's a fiddly special case, though. ACLE specifies that the immediate in `vshrq_n` can take values up to //and including// the bit size of the vector lane. But LLVM IR thinks that shifting right by the full size of the lane is UB, and feels free to replace the `lshr` with an `undef` half way through the optimization pipeline. Hence, to keep this legal in source code, I have to detect it at codegen time. Logical (unsigned) right shifts by the element size are handled by simply emitting the zero vector; arithmetic ones are converted into a shift of one bit less, which will always give the same output. In order to do that check, I also had to enhance the tablegen MveEmitter so that it can cope with converting a builtin function's operand into a bare integer to pass to a code-generating subfunction. Previously the only bare integers it knew how to handle were flags generated from within `arm_mve.td`. Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71065	2019-12-09 15:44:09 +00:00
Mikhail Maltsev	0d1490bf6a	[ARM][MVE] Add complex vector intrinsics Summary: This patch adds intrinsics for the following MVE instructions: * VCADD, VHCADD * VCMUL * VCMLA Each of the above 3 groups has a corresponding new LLVM IR intrinsic. Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen Reviewed By: MarkMurrayARM Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71190	2019-12-09 12:05:59 +00:00
David Green	b1aba0378e	[ARM] Enable MVE masked loads and stores With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does. Differential Revision: https://reviews.llvm.org/D70968	2019-12-09 11:37:34 +00:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
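The rule described in `be7a107070` (a shift is free when its single user can absorb it) can be sketched as follows. This is a simplified Python model, not the C++ in the patch; `FOLDABLE_USERS`, `shift_cost` and the default cost of 1 are hypothetical names and values chosen for illustration.

```python
# IR opcodes that, per the commit message, can fold a shifted operand.
FOLDABLE_USERS = {"add", "sub", "and", "or", "xor", "icmp"}


def shift_cost(user_opcodes, base_cost=1):
    """Return the modelled cost of a shift given the opcodes of its users.

    The shift is treated as free only when it has exactly one user and
    that user is an instruction the shift can be folded into.
    """
    if len(user_opcodes) == 1 and user_opcodes[0] in FOLDABLE_USERS:
        return 0
    return base_cost
```

Under this model `shift_cost(["and"])` is 0, whereas a shift feeding a `mul`, or feeding two users, keeps its base cost.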
David Green	f008b5b8ce	[ARM] Additional tests and minor formatting. NFC This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.	2019-12-09 10:24:33 +00:00
David Stenberg	6965f835b4	[DebugInfo] Make describeLoadedValue() reg aware Summary: Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes situations in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: ormris, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D70431	2019-12-09 10:47:49 +01:00
David Stenberg	f3696533f2	Revert "[DebugInfo] Make describeLoadedValue() reg aware" This reverts commit `3cd93a4efc`. I'll recommit with a well-formatted arcanist commit message.	2019-12-09 10:45:13 +01:00
David Stenberg	3cd93a4efc	[DebugInfo] Make describeLoadedValue() reg aware Currently the describeLoadedValue() hook is assumed to describe the value of the instruction's first explicit define. The hook will not be called for instructions with more than one explicit define. This commit adds a register parameter to the describeLoadedValue() hook, and invokes the hook for all registers in the worklist. This will allow us to for example describe instructions which produce more than two parameters' values; e.g. Hexagon's various combine instructions. This also fixes a case in our downstream target where we may pass smaller parameters in the high part of a register. If such a parameter's value is produced by a larger copy instruction, we can't describe the call site value using the super-register, and we instead need to know which sub-register that should be used. This also allows us to handle cases like this: $ebx = [...] $rdi = MOVSX64rr32 $ebx $esi = MOV32rr $edi CALL64pcrel32 @call The hook will first be invoked for the MOV32rr instruction, which will say that @call's second parameter (passed in $esi) is described by $edi. As $edi is not preserved it will be added to the worklist. When we get to the MOVSX64rr32 instruction, we need to describe two values; the sign-extended value of $ebx -> $rdi for the first parameter, and $ebx -> $edi for the second parameter, which is now possible. This commit modifies the dbgcall-site-lea-interpretation.mir test case. In the test case, the values of some 32-bit parameters were produced with LEA64r. Perhaps we can in general cases handle such by emitting expressions that AND out the lower 32-bits, but I have not been able to land in a case where a LEA64r is used for a 32-bit parameter instead of LEA64_32 from C code. I have not found a case where it would be useful to describe parameters using implicit defines, so in this patch the hook is still only invoked for explicit defines of forwarding registers.	2019-12-09 10:44:17 +01:00
David Green	792fab343b	[ARM] Attempt to use whole register vmovs for MVE shuffles. MVE doesn't have the range of shuffle instructions available in Neon. We also cannot use the trick of cutting a difficult vector shuffle in half to simplify things. Instead we need to be more careful about how we lower shuffles. This patch adds an extra combine that attempts to find "whole lane" vmovs when lowering shuffles of smaller types. This helps us make some shuffles a lot simpler, generating single lane movs for the parts that can make use of it, falling back to the original shuffle for the rest. Differential Revision: https://reviews.llvm.org/D69509	2019-12-08 10:53:54 +00:00
David Green	3a6eb5f160	[ARM] Disable VLD4 under MVE Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109	2019-12-08 10:37:29 +00:00
Alina Sbirlea	c7faa68142	Revert "ARM-Darwin: keep the frame register reserved even if not updated." This reverts commit `a7d90af1be`. This revision came back as the root-cause for crashes in internal ARM-IOS apps. Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.	2019-12-06 10:59:26 -08:00
Simon Tatham	3fab4276cb	[ARM][MVE] Fix copy-paste error in VQSHL instruction ids. Summary: The immediate forms of the MVE VQSHL instruction have MC names like `MVE_VSLIimms8` and `MVE_VSLIimmu32`. Those names are confusing, because VSLI is a completely different shift instruction with no semantic relation to VQSHL. But it just happens to be defined immediately before VQSHL in `ARMInstrMVE.td`, so this looks like a copy-paste error. Renamed the ids to match the instruction name. Reviewers: ostannard, dmgreen, MarkMurrayARM, miyuki Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71114	2019-12-06 15:23:23 +00:00
Mark Murray	d3f62ceac0	[ARM][MVE][Intrinsics] Add VMULH/VRMULH intrinsics. Summary: Add MVE VMULH/VRMULH intrinsics and unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70948	2019-12-04 14:27:12 +00:00
Sam Parker	bc76dadb3c	[CodeGen] Move ARMCodegenPrepare to TypePromotion Convert ARMCodeGenPrepare into a generic type promotion pass by: - Removing the insertion of arm specific intrinsics to handle narrow types as we weren't using this. - Removing ARMSubtarget references. - Now query a generic TLI object to know which types should be promoted and what they should be promoted to. - Move all codegen tests into Transforms folder and testing using opt and not llc, which is how they should have been written in the first place... The pass searches up from icmp operands in an attempt to safely promote types so we can avoid generating unnecessary unsigned extends during DAG ISel. Differential Revision: https://reviews.llvm.org/D69556	2019-12-03 11:12:52 +00:00
David Green	469ee617a0	[ARM] Add ARMVCCThen to tablegen and make use of it. NFC Similar to the parent, this adds some constants to tablegen to replace the existing magic values. Differential Revision: https://reviews.llvm.org/D70825	2019-12-02 19:57:12 +00:00
David Green	a223a4d66f	[ARM] Add ARMCC constants to tablegen. NFC I got tired of looking at magic constants in tablegen files. This adds condition codes like ARMCCeq and makes use of them. I also removed the extra patterns for reverse condition codes from D70296, they should now be covered by the parent commit. Differential Revision: https://reviews.llvm.org/D70824	2019-12-02 19:57:12 +00:00
David Green	57d96ab593	[ARM] Add some VCMP folding and canonicalisation The VCMP instructions in MVE can accept a register or ZR, but only as the right hand operand. Most of the time this will already be correct because the icmp will have been canonicalised that way already. There are some cases in the lowering of float conditions that this will not apply to though. This code should fix up those cases. Differential Revision: https://reviews.llvm.org/D70822	2019-12-02 19:57:12 +00:00
Simon Tatham	d173fb5d28	[ARM,MVE] Add intrinsics to deal with predicates. Summary: This commit adds the `vpselq` intrinsics which take an MVE predicate word and select lanes from two vectors; the `vctp` intrinsics which create a tail predicate word suitable for processing the first m elements of a vector (e.g. in the last iteration of a loop); and `vpnot`, which simply complements a predicate word and is just syntactic sugar for the `~` operator. The `vctp` ACLE intrinsics are lowered to the IR intrinsics we've already added (and which D70592 just reorganized). I've filled in the missing isel rule for VCTP64, and added another set of rules to generate the predicated forms. I needed one small tweak in MveEmitter to allow the `unpromoted` type modifier to apply to predicates as well as integers, so that `vpnot` doesn't pointlessly convert its input integer to an `<n x i1>` before complementing it. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70485	2019-12-02 16:20:30 +00:00
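To make the predicate operations in `d173fb5d28` concrete, here is a small Python model of the predicate words involved. It assumes (this is not stated in the commit message) that an MVE predicate word is 16 bits wide with one bit per byte of the 128-bit vector; the function names `vctp` and `vpnot` mirror the intrinsic families but are hypothetical helpers, not the ACLE API.

```python
VECTOR_BITS = 128  # assumed MVE vector width
PRED_MASK = 0xFFFF  # assumed: one predicate bit per vector byte


def vctp(elem_bits: int, n: int) -> int:
    """Predicate word enabling the first n elements of a vector."""
    lanes = VECTOR_BITS // elem_bits
    bytes_per_elem = elem_bits // 8
    # Clamp n to the number of lanes; negative counts enable nothing.
    active = max(0, min(n, lanes))
    return (1 << (active * bytes_per_elem)) - 1


def vpnot(pred: int) -> int:
    """Complement a predicate word, as the `~` operator would."""
    return ~pred & PRED_MASK
```

With these assumptions, `vctp(32, 3)` gives `0x0FFF` (three active 32-bit lanes out of four), and `vpnot` of that word gives `0xF000`, selecting only the last lane.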
Simon Tatham	48cce077ef	[ARM,MVE] Rename and clean up VCTP IR intrinsics. Summary: D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction, to use in tail predication. But the 64-bit one doesn't work properly: its predicate type is `<2 x i1>` / `v2i1`, which isn't a legal MVE type (due to not having a full set of instructions that manipulate it usefully). The test of `vctp64` in `basic-tail-pred.ll` goes through `opt` fine, as the test expects, but if you then feed it to `llc` it causes a type legality failure at isel time. The usual workaround we've been using in the rest of the MVE intrinsics family is to bodge `v2i1` into `v4i1`. So I've adjusted the `vctp64` IR intrinsic to do that, and completely removed the code (and test) that uses that intrinsic for 64-bit tail predication. That will allow me to add isel rules (upcoming in D70485) that actually generate the VCTP64 instruction. Also renamed all four of these IR intrinsics so that they have `mve` in the name, since its absence was confusing. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: MarkMurrayARM Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70592	2019-12-02 16:20:30 +00:00
Victor Campos	dcf11c5e86	[ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A Summary: Add support for vcadd_* family of intrinsics. This set of intrinsics is available in Armv8.3-A. The fp16 versions require the FP16 extension, which has been available (opt-in) since Armv8.2-A. Reviewers: t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70862	2019-12-02 14:38:39 +00:00
Mark Murray	510792a2e0	[ARM][MVE][Intrinsics] Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics. Summary: Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics and their predicated versions. Add unit tests. Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70829	2019-12-02 11:18:53 +00:00
David Green	e9e1daf2b9	[ARM] Remove VHADD patterns These instructions do not work quite like I expected them to. They perform the addition and then shift in a higher precision integer, so do not match up with the patterns that we added. For example with s8s, adding 100 and 100 should wrap leaving the shift to work on a negative number. VHADD will instead do the arithmetic in higher precision, giving 100 overall. The vhadd gives a "better" result, but not one that matches up with the input. I am just removing the patterns here. We might be able to re-add them in the future by checking for wrap flags or changing bitwidths. But for the moment just remove them to remove the problem cases.	2019-12-02 10:38:14 +00:00
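The mismatch described in `e9e1daf2b9` can be reproduced with plain integer arithmetic. The Python sketch below contrasts an 8-bit add that wraps before the shift (what the removed patterns matched) with a halving add computed in higher precision (what the message says VHADD does); both function names are hypothetical.

```python
def wrapping_add_then_shift(a: int, b: int, bits: int = 8) -> int:
    """Add in `bits`-wide two's complement, then arithmetic shift right by 1."""
    total = (a + b) & ((1 << bits) - 1)
    # Reinterpret the wrapped sum as a signed value.
    if total >> (bits - 1):
        total -= 1 << bits
    return total >> 1


def halving_add(a: int, b: int) -> int:
    """Add without wrapping, then shift right by 1 (higher precision)."""
    return (a + b) >> 1
```

For the example in the message, `wrapping_add_then_shift(100, 100)` gives -28 because the 8-bit sum wraps to -56, while `halving_add(100, 100)` gives 100. The two agree whenever the sum does not overflow the lane, which is why checking wrap flags is suggested as a way to re-add the patterns.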
Carey Williams	76fd58d0fe	Revert "[ARM] Allocatable Global Register Variables for ARM" This reverts commit `2d739f98d8`.	2019-11-29 17:01:05 +00:00
Victor Campos	e478385e77	[ARM] Fix instruction selection for ARMISD::CMOV with f16 type Summary: In the cases where the CMOV (f16) SDNode is used with condition codes LT, LE, VC or NE, it is successfully selected into a VSEL instruction. In the remaining cases, however, instruction selection fails since VSEL does not support other condition codes. This patch handles such cases by using the single-precision version of the VMOV instruction. Reviewers: ostannard, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70667	2019-11-29 10:40:37 +00:00
Mark Murray	a048bf87fb	[ARM][MVE][Intrinsics] Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests. Summary: Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70547	2019-11-27 16:52:05 +00:00
Mark Murray	e8a8dbe9c4	[ARM][MVE][Intrinsics] Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests. Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70546	2019-11-27 16:52:05 +00:00
Mark Murray	f4bba07b87	[ARM][MVE][Intrinsics] Add MVE VABD intrinsics. Add unit tests. Summary: Add MVE VABD intrinsics. Add unit tests. Reviewers: simon_tatham, ostannard, dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70545	2019-11-27 16:52:04 +00:00
David Green	9f15fcc271	[ARM] Replace arm_neon_vqadds with sadd_sat This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat. This helps generate vqadds from standard IR nodes, which might be produced from the vectoriser. The old variants are removed in the process. Differential Revision: https://reviews.llvm.org/D69350	2019-11-27 13:32:29 +00:00
David Green	4965779f17	[ARM] Clean up the load and store code. NFC Some of these patterns have grown quite organically. I've tried to organise them a little here, moving all the PatFlags together and giving them a more consistent naming scheme, to allow some of the later patterns to be merged into a single multiclass. Differential Revision: https://reviews.llvm.org/D70178	2019-11-26 16:21:01 +00:00
David Green	b5315ae8ff	[Codegen][ARM] Add addressing modes from masked loads and stores MVE has a basic symmetry between its normal load/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8 bits to 16 bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted, hopefully with no functional changes. Differential Revision: https://reviews.llvm.org/D70176	2019-11-26 16:21:01 +00:00
Sam Parker	28166816b0	[ARM][ReachingDefs] Remove dead code in loloops. Add some more helper functions to ReachingDefs to query the uses of a given MachineInstr and also to query whether two MachineInstrs use the same def of a register. For Arm, while tail-predicating, these helpers are used in the low-overhead loops to remove the dead code that calculates the number of loop iterations. Differential Revision: https://reviews.llvm.org/D70240	2019-11-26 10:27:46 +00:00
Sam Parker	cced971fd3	[ARM][ReachingDefs] RDA in LoLoops Add several new methods to ReachingDefAnalysis: - getReachingMIDef, instead of returning an integer, return the MachineInstr that produces the def. - getInstFromId, return a MachineInstr for which the given integer corresponds to. - hasSameReachingDef, return whether two MachineInstr use the same def of a register. - isRegUsedAfter, return whether a register is used after a given MachineInstr. These methods have been used in ARMLowOverhead to replace searching for uses/defs. Differential Revision: https://reviews.llvm.org/D70009	2019-11-26 10:13:46 +00:00
Sam Parker	4a59eedd2d	[ARM][ConstantIslands] Correct block size update When inserting a non-decrementing LE, the basic block was being resized to take into consideration that a tCMP and tBcc had been combined into one T1 instruction. This is not true in the LE case where we generate a CBN?Z and an LE. Differential Revision: https://reviews.llvm.org/D70536	2019-11-26 09:55:58 +00:00
Momchil Velikov	09555ce071	[ARM] Generate CMSE instructions from CMSE intrinsics This patch adds instruction selection patterns for the TT, TTT, TTA, and TTAT instructions and tests for llvm.arm.cmse.tt, llvm.arm.cmse.ttt, llvm.arm.cmse.tta, and llvm.arm.cmse.ttat intrinsics (added in a previous patch). Patch by Javed Absar. Differential Revision: https://reviews.llvm.org/D70407	2019-11-25 18:26:12 +00:00
Anna Welker	6fc3e6f2eb	[ARM][MVE] Select vqneg Adds a pattern to ARMInstrMVE.td to use a VQNEG instruction if an equivalent multi-instruction construct is found. Differential Revision: https://reviews.llvm.org/D70491	2019-11-25 11:29:14 +00:00
Tom Stellard	ab411801b8	[cmake] Explicitly mark libraries defined in lib/ as "Component Libraries" Summary: Most libraries are defined in the lib/ directory but there are also a few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining "Component Libraries" as libraries defined in lib/ that may be included in libLLVM.so. Explicitly marking the libraries in lib/ as component libraries allows us to remove some fragile checks that attempt to differentiate between lib/ libraries and tools/ libraries: 1. In tools/llvm-shlib, because llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of all libraries defined in the whole project, there was custom code needed to filter out libraries defined in tools/, none of which should be included in libLLVM.so. This code assumed that any library defined as static was from lib/ and everything else should be excluded. With this change, llvm_map_components_to_libnames(LIB_NAMES, "all") only returns libraries that have been added to the LLVM_COMPONENT_LIBS global cmake property, so this custom filtering logic can be removed. Doing this also fixes the build with BUILD_SHARED_LIBS=ON and LLVM_BUILD_LLVM_DYLIB=ON. 2. There was some code in llvm_add_library that assumed that libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or ARG_LINK_COMPONENTS set. This is only true because libraries defined in lib/ use LLVMBuild.txt and don't set these values. This code has been fixed now to check if the library has been explicitly marked as a component library, which should now make it easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds and the following combinations of CMake options: - "" (No options) - -DLLVM_BUILD_LLVM_DYLIB=ON - -DLLVM_LINK_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON Reviewers: beanz, smeenai, compnerd, phosek Reviewed By: beanz Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70179	2019-11-21 10:48:08 -08:00
Anna Welker	96e94e37e3	[ARM][MVE] Select vqabs Adds a pattern to ARMInstrMVE.td to use a VQABS instruction if an equivalent multi-instruction construct is found. Differential revision: https://reviews.llvm.org/D70181	2019-11-20 13:58:38 +00:00
David Green	882f23caea	[ARM] MVE interleaving load and stores. Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore. The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases. We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though. Differential Revision: https://reviews.llvm.org/D69392	2019-11-19 18:37:30 +00:00
Simon Tatham	254b4f2500	[ARM,MVE] Add intrinsics for scalar shifts. This fills in the small family of MVE intrinsics that have nothing to do with vectors: they implement bit-shift operations on 32- or 64-bit values held in one or two general-purpose registers. Most of these shift operations saturate if shifting left, and round to nearest if shifting right, although LSLL and ASRL behave like ordinary shifts. When these instructions take a variable shift count in a register, they pay attention to its sign, so that (for example) LSLL or UQRSHLL will shift left if given a positive number but right if given a negative one. That makes even LSLL and ASRL different enough from standard LLVM IR shift semantics that I couldn't see any better alternative than to simply model the whole family as a set of MVE-specific IR intrinsics. (The //immediate// forms of LSLL and ASRL, on the other hand, do behave exactly like a standard IR shift of a 64-bit value. In fact, those forms don't have ACLE intrinsics defined at all, because you can just write an ordinary C shift operation if you want one of those.) The 64-bit shifts have to be instruction-selected in C++, because they deliver two output values. But the 32-bit ones are simple enough that I could write a DAG isel pattern directly into each Instruction record. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70319	2019-11-19 14:47:29 +00:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Sam Parker	d43913ae38	[ARM][MVE] Enable narrow vectors for tail pred Remove the restriction, from the MVE tail predication pass, that all masked vector instructions need to be 128 bits. This allows us to support extending loads and truncating stores. Differential Revision: https://reviews.llvm.org/D69946	2019-11-19 08:51:12 +00:00
Sam Parker	8978c12b39	[ARM][MVE] Tail predication conversion This patch modifies ARMLowOverheadLoops to convert a predicated vector low-overhead loop into a tail-predicated one. This is currently a very basic conversion, with the following restrictions: - Operates only on single block loops. - The loop can only contain a single vctp instruction. - No other instructions can write to the vpr. - We only allow a subset of the mve instructions in the loop. TODO: Pass the number of elements, not the number of iterations to dlstp/wlstp. Differential Revision: https://reviews.llvm.org/D69945	2019-11-19 08:22:18 +00:00
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Anna Welker	2d739f98d8	[ARM] Allocatable Global Register Variables for ARM Provides support for using r6-r11 as globally scoped register variables. This requires a -ffixed-rN flag in order to reserve rN against general allocation. If for a given GRV declaration the corresponding flag is not found, or the register in question is the target's FP, we fail with a diagnostic. Differential Revision: https://reviews.llvm.org/D68862	2019-11-18 10:07:37 +00:00
James Y Knight	bf142fc433	MCObjectStreamer: assign MCSymbols in the dummy fragment to offset 0. In MCObjectStreamer, when there is no current fragment, initially symbols are created in a "pending" state and assigned to a dummy empty fragment. Previously, they were not being assigned an offset, and thus evaluateAbsolute would fail if trying to evaluate an expression 'a - b', where both 'a' and 'b' were in this pending state. Also slightly refactored the EmitLabel overload which takes an MCFragment for clarity. Fixes: https://llvm.org/PR41825 Differential Revision: https://reviews.llvm.org/D70062	2019-11-16 09:52:07 -05:00
Simon Tatham	b0c1900820	[ARM,MVE] Add reversed isel patterns for MVE `vcmp qN,rN` Summary: As well as vector/vector compare instructions, MVE also has a family of comparisons taking a vector and a scalar, which compare every lane of the vector against the same value. We generate those at isel time using isel patterns that match `(ARMvcmp vector, (ARMvdup scalar))`. This commit adds corresponding patterns for the operand-reversed form `(ARMvcmp (ARMvdup scalar), vector)`, with condition codes swapped as necessary. That way, we can still generate the vector/scalar compare instruction if the IR happens to have been rearranged to put the operands the other way round, which can happen in some optimization phases. Previously, a vcmp the other way round was handled by emitting a `vdup` instruction to //explicitly// replicate the scalar input into a vector, and then doing a vector/vector comparison. I haven't added a new test, because it turned out that several existing tests were already exhibiting that failure mode. So just updating the expected output in the existing MVE codegen tests demonstrates what's been improved. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70296	2019-11-15 14:06:00 +00:00
Sjoerd Meijer	71327707b0	[ARM][MVE] tail-predication This is a follow-up to `d90804d`, to also flag fcmp instructions as instructions that we do not support in tail-predicated vector loops. Differential Revision: https://reviews.llvm.org/D70295	2019-11-15 11:01:13 +00:00
Tim Northover	232cdb3d30	ARM: allow rewriting frame indexes for all prefetch variants. For some reason we could handle PLD but not PLDW or PLI, but all of them can potentially refer to the stack region (if weirdly for PLI).	2019-11-14 14:26:28 +00:00
Simon Pilgrim	b5f94adbf3	Fix uninitialized variable warnings. NFCI.	2019-11-14 14:21:16 +00:00
Anna Welker	e78083929d	[NFC] Fix typo in ARMBaseRegisterInfo	2019-11-14 09:56:18 +00:00
Quentin Colombet	de94cda81b	[LiveInterval] Allow updating subranges with slightly out-dated IR During register coalescing, we update the live-intervals on-the-fly. To do that we are in this strange mode where the live-intervals can be slightly out-of-sync (more precisely they are forward looking) compared to what the IR actually represents. This happens because the register coalescer only updates the IR when it is done with updating the live-intervals and it has to do it this way because updating the IR on-the-fly would actually clobber some information on what the live-ranges that are being updated look like. This is problematic for updates that rely on the IR to accurately represent the state of the live-ranges. Right now, we have only one of those: stripValuesNotDefiningMask. To reconcile this need of out-of-sync IR, this patch introduces a new argument to LiveInterval::refineSubRanges that allows the code doing the live range updates to reason about what the code should look like after the coalescer has rewritten the registers. Essentially this captures how a subregister index will be offset to match its position in a new register class. E.g., let's say we want to merge: V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32> We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32> overlap, i.e., by choosing a class where we can find "offset + 1 == 3". Put differently we align V2's sub3 with V1's sub1: V2: sub0 sub1 sub2 sub3 V1: <offset> sub0 sub1 This offset will look like a composed subregidx in the class: V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32> => V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32> Now if we didn't rewrite the uses and def of V1, all the checks for V1 need to account for this offset to match what the live intervals intend to capture. Prior to this patch, we would fail to recognize the uses and def of V1 and would end up with machine verifier errors: No live segment at def.
This could lead to a miscompile as we would drop some live-ranges and thus miss some interferences. For this problem to trigger, we need to reach stripValuesNotDefiningMask while having a mismatch between the IR and the live-ranges (i.e., we have to apply a subreg offset to the IR.) This requires the following three conditions: 1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1> 2. An update with Tuple registers with a possibility to coalesce the subreg index: e.g., v1.dsub_1 == v2.dsub_3 3. Subreg liveness enabled. looking at the IR to decide what is alive and what is not, i.e., calling stripValuesNotDefiningMask. coalescer maintains for the live-ranges information. None of the targets that currently use subreg liveness (i.e., the targets that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and #2, so this patch also artificially enables subreg liveness for ARM, so that a nice test case can be attached.	2019-11-13 11:17:56 -08:00
Matthew Malcomson	e5f3760e8c	Fix comment spelling {addresing -> addressing} (NFC)	2019-11-13 16:14:32 +00:00
Sjoerd Meijer	d90804d26b	[ARM][MVE] canTailPredicateLoop This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single block loops only. With this change, the vectoriser will now determine if tail-folding scalar remainder loops is possible/desired, which is the first step to generate MVE tail-predicated vector loops. This is disabled by default for now. I.e., this depends on option -disable-mve-tail-predication, which is off by default. I will follow up on this soon with a patch for the vectoriser to respect loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125. Differential Revision: https://reviews.llvm.org/D69845	2019-11-13 13:24:33 +00:00
Simon Tatham	5b9e4daef0	[ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement. MVE includes instructions that extract an 8- or 16-bit lane from a vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td` already included isel patterns to select those instructions in response to the `ARMISD::VGETLANEs` selection-DAG node type. But `ARMISD::VGETLANEs` was never actually generated, because the code that creates it was conditioned on NEON only. It's an easy fix to enable the same code for integer MVE, and now IR that sign-extends the result of an extractelement (whether explicitly or as part of the function call ABI) will use `vmov.s8` instead of `vmov.u8` followed by `sxtb`. Reviewers: SjoerdMeijer, dmgreen, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70132	2019-11-13 09:08:41 +00:00
Peter Collingbourne	1549b4699a	ARM: Don't emit R_ARM_NONE relocations to compact unwinding decoders in .ARM.exidx on Android. These relocations are specified by the ARM EHABI (section 6.3). As I understand it, their purpose is to accommodate unwinder implementations that wish to reduce code size by placing the implementations of the compact unwinding decoders in a separate translation unit, and using extern weak symbols to refer to them from the main unwinder implementation, so that they are only linked when something in the binary needs them in order to unwind. However, neither of the unwinders used on Android (libgcc, LLVM libunwind) use this technique, and in fact emitting these relocations ends up being counterproductive to code size because they cause a copy of the unwinder to be statically linked into most binaries, regardless of whether it is actually needed. Furthermore, these relocations create circular dependencies (between libc and the unwinder) in cases where the unwinder is dynamically linked and libc contains compact unwind info. Therefore, deviate from the EHABI here and stop emitting these relocations on Android. Differential Revision: https://reviews.llvm.org/D70027	2019-11-12 10:52:59 -08:00
Matt Arsenault	e6c9a9af39	Use MCRegister in copyPhysReg	2019-11-11 14:42:33 +05:30
Eli Friedman	5df3a87224	[AArch64][X86] Don't assume __powidf2 is available on Windows. We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013	2019-11-08 12:43:21 -08:00
Djordje Todorovic	8d2ccd1ac3	Reland: [TII] Use optional destination and source pair as a return value; NFC Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods to return optional machine operand pair of destination and source registers. Patch by Nikola Prica Differential Revision: https://reviews.llvm.org/D69622	2019-11-08 13:00:39 +01:00
Sjoerd Meijer	6c2a4f5ff9	[TTI][LV] preferPredicateOverEpilogue We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with: - a complete MVE implementation, see D69845. - a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint. Differential Revision: https://reviews.llvm.org/D69040	2019-11-06 10:14:20 +00:00
Simon Tatham	6c3fee47a6	[ARM,MVE] Add intrinsics for gather/scatter load/stores. This patch adds two new families of intrinsics, both of which are memory accesses taking a vector of locations to load from / store to. The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of base addresses, and an immediate offset to be added consistently to each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar base address, and a vector of offsets to add to it. The 'shifted_offset' variants also multiply each offset by the element size type, so that the vector is effectively of array indices. At the IR level, these operations are represented by a single set of four IR intrinsics: {gather,scatter} × {base,offset}. The other details (signed/unsigned, shift, and memory element size as opposed to vector element size) are all specified by IR intrinsic polymorphism and immediate operands, because that made the selection job easier than making a huge family of similarly named intrinsics. I considered using the standard IR representations such as llvm.masked.gather, but they're not a good fit. In order to use llvm.masked.gather to represent a gather_offset load with element size smaller than a pointer, you'd have to expand the <8 x i16> vector of offsets into an <8 x i16*> vector of pointers, which would be split up during legalization, so you'd spend most of your time undoing the mess it had made. Also, ISel support for llvm.masked.gather would be easy enough in a trivial way (you can expand it into a gather-base load with a zero immediate offset), but instruction-selecting lots of fiddly idioms back into all the _other_ MVE load instructions would be much more work. So I think dedicated IR intrinsics are the more sensible approach, at least for the moment. 
On the clang tablegen side, I've added two new features to the Tablegen source accepted by MveEmitter: a 'CopyKind' type node for defining a type that varies with the parameter type (it lets you ask for an unsigned integer type of the same width as the parameter), and an 'unsignedflag' value node for passing an immediate IR operand which is 0 for a signed integer type or 1 for an unsigned one. That lets me write each kind of intrinsic just once and get all its subtypes and immediate arguments generated automatically. Also I've tweaked the handling of pointer-typed values in the code generation part of MveEmitter: they're generated as Address rather than Value (i.e. including an alignment) so that they can be given to the ordinary IR load and store operations, but I'd omitted the code to convert them back to Value when they're going to be used as an argument to an IR intrinsic. On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you not only the full assembly-language suffix for a given vector type (like 's32' or 'u16') but also the numeric-only one used by store instructions (just '32' or '16'). Reviewers: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69791	2019-11-06 09:01:42 +00:00
Daniel Sanders	e74c5b9661	[globalisel] Rename G_GEP to G_PTR_ADD Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734	2019-11-05 10:31:17 -08:00
David Green	cf581d7977	[ARM] Always enable UseAA in the arm backend This feature controls whether AA is used in the backend, and was previously turned on for certain subtargets to help create less constrained scheduling graphs. This patch turns it on for all subtargets, so that they can all make use of the extra information to produce better code. Differential Revision: https://reviews.llvm.org/D69796	2019-11-05 10:46:56 +00:00
David Green	7d9af03ff7	[Scheduling][ARM] Consistently enable PostRA Machine scheduling In the ARM backend, for historical reasons we have only some targets using Machine Scheduling. The rest use the old list scheduler as they are using itineraries and the list scheduler seems to produce better code (and not crash running out of registers on v6m codes). So whether to use the MIScheduler or not is checked at runtime from the subtarget features. This is fine, except for post-ra scheduling. Whether to use the old post-ra list scheduler or the post-ra machine scheduler is decided as the pass manager is set up, in ARM's case from a newly constructed subtarget. Under some situations, like LTO, this won't include the correct cpu so can pick the wrong option. This can have a surprising effect on performance. To fix that, this patch overrides targetSchedulesPostRAScheduling and addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and picking at runtime which to execute. To pick between the two I've had to add an enablePostRAMachineScheduler() method that normally returns enableMachineScheduler() && enablePostRAScheduler(), which can be overridden to enable just one of PostRAMachineScheduler vs PostRAScheduler. Thanks to David Penry for identifying this problem. Differential Revision: https://reviews.llvm.org/D69775	2019-11-05 10:44:55 +00:00
Oliver Stannard	73c3137a82	Fix static analysis warnings in ARM calling convention lowering Fixes https://bugs.llvm.org/show_bug.cgi?id=43891	2019-11-04 17:17:55 +00:00
David Green	91b0cad813	[ARM] Use isFMAFasterThanFMulAndFAdd for MVE The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd, where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually available for scalar code. For MVE we don't have the non-fused version though. It makes more sense for isFMAFasterThanFMulAndFAdd to return true, allowing us to simplify some of the existing ISel patterns. The tests here are that none of the existing tests failed, and so we are still selecting VFMA and VFMS. The one test that changed shows we can now select from fast math flags, as opposed to just relying on the isFMADLegalForFAddFSub option. Differential Revision: https://reviews.llvm.org/D69115	2019-11-04 15:05:41 +00:00
David Green	6bae5d16a2	[ARM] Add vrev32 NEON fp16 patterns Fill in the gaps for vrev32.16 f16 patterns, extending the existing i16 patterns. Differential Revision: https://reviews.llvm.org/D69508	2019-11-04 13:37:01 +00:00
Diogo Sampaio	3169f0129a	[FIX] Removed duplicated v4f16 and v8f16 declarations Reviewers: RKSimon, ostannard Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69795	2019-11-04 11:33:21 +00:00
Simon Pilgrim	d397e29273	A15SDOptimizer::getPrefSPRLane - fix null dereference warning. NFCI	2019-11-02 21:49:12 +00:00
Simon Pilgrim	3842b94c4e	Revert rG57ee0435bd47f23f3939f402914c231b4f65ca5e - [TII] Use optional destination and source pair as a return value; NFC This is breaking MSVC builds: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20375	2019-10-31 18:00:29 +00:00
Djordje Todorovic	57ee0435bd	[TII] Use optional destination and source pair as a return value; NFC Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods to return optional machine operand pair of destination and source registers. Patch by Nikola Prica Differential Revision: https://reviews.llvm.org/D69622	2019-10-31 15:34:49 +01:00
Evandro Menezes	215da6606c	[clang][llvm] Obsolete Exynos M1 and M2	2019-10-30 15:02:59 -05:00
Djordje Todorovic	532815dd5c	[ARM][AArch64][DebugInfo] Improve call site instruction interpretation Extend the describeLoadedValue() with support for target specific ARM and AArch64 instructions interpretation. The patch provides specialization for ADD and SUB operations that include a register and an immediate/offset operand. Some of the instructions can operate with global string addresses or constant pool indexes but such cases are omitted since we currently lack flexible support for processing such operands at DWARF production stage. Patch by Nikola Prica Differential Revision: https://reviews.llvm.org/D67556	2019-10-30 13:58:14 +01:00
Sander de Smalen	d6a7da80aa	Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2) llvm/test/DebugInfo/MIR/X86/live-debug-values-reg-copy.mir failed with EXPENSIVE_CHECKS enabled, causing the patch to be reverted in rG2c496bb5309c972d59b11f05aee4782ddc087e71. This patch relands the patch with a proper fix to the live-debug-values-reg-copy.mir tests, by ensuring the MIR encodes the callee-saves correctly so that the CalleeSaved info is taken from MIR directly, rather than letting it be recalculated by the PEI pass. I've done this by running `llc -stop-before=prologepilog` on the LLVM IR as captured in the test files, adding the extra MOV instructions that were manually added in the original test file, then running `llc -run-pass=prologepilog` and finally re-added the comments for the MOV instructions.	2019-10-29 16:13:07 +00:00
Simon Pilgrim	2c496bb530	Revert rG70f5aecedef9a6e347e425eb5b843bf797b95319 - "Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2)" This fails on EXPENSIVE_CHECKS builds	2019-10-29 11:54:58 +00:00
David Tellenbach	e3a45a24d1	[ARM][Thumb2InstrInfo] Fix default `0` opcode when rewriting frame indices The static functions `positiveOffsetOpcode`, `negativeOffsetOpcode` and `immediateOffsetOpcode` (lib/Target/ARM/Thumb2InstrInfo.cpp) currently can return `0` as default opcode which is meaningless in this situation. This patch replaces this default value by llvm_unreachable. Reviewers: t.p.northover, tellenbach Reviewed By: tellenbach Subscribers: tellenbach, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69432 Patch By: Lorenzo Casalino <lorenzo.casalino93@gmail.com>	2019-10-28 18:58:45 +00:00
Sander de Smalen	70f5aecede	Reland [AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2) Fixed up test/DebugInfo/MIR/Mips/live-debug-values-reg-copy.mir that broke r375425.	2019-10-28 18:05:19 +00:00
Andrew Paverd	d157a9bc8b	Add Windows Control Flow Guard checks (/guard:cf). Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761	2019-10-28 15:19:39 +00:00
vhscampos	f6e11a36c4	[ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE Summary: Writing support for three ACLE functions: unsigned int __cls(uint32_t x) unsigned int __clsl(unsigned long x) unsigned int __clsll(uint64_t x) CLS stands for "Count number of leading sign bits". In AArch64, these two intrinsics can be translated into the 'cls' instruction directly. In AArch32, on the other hand, this functionality is achieved by implementing it in terms of clz (count number of leading zeros). Reviewers: compnerd Reviewed By: compnerd Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69250	2019-10-28 11:06:58 +00:00
Jian Cai	a6b0219fc4	Revert "[ARM] Uses "Sun Style" syntax for section switching" This reverts commit `03de2f84fc`.	2019-10-25 14:03:07 -07:00
Jian Cai	03de2f84fc	[ARM] Uses "Sun Style" syntax for section switching Summary: Support "Sun Style" syntax for section switching ("#alloc,#write" etc). https://bugs.llvm.org/show_bug.cgi?id=43759 Reviewers: peter.smith, eli.friedman, kristof.beyls, t.p.northover Reviewed By: peter.smith Subscribers: MaskRay, llozano, manojgupta, nickdesaulniers, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69296	2019-10-25 13:27:35 -07:00
Guillaume Chatelet	a4783ef58d	[Alignment][NFC] getMemoryOpCost uses MaybeAlign Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69307	2019-10-25 21:26:59 +02:00
Simon Tatham	e0ef4ebe2f	[ARM] Add IR intrinsics for MVE VLD[24] and VST[24]. The VST2 and VST4 instructions take two or four vector registers as input, and store part of each register to memory in an interleaved pattern. They come in variants indicating which part of each register they store (VST20 and VST21; VST40 to VST43 inclusive); the intention is that issuing each of those variants in turn has the combined effect of loading or storing the whole set of registers to a memory block of equal size. The corresponding VLD2 and VLD4 instructions load from memory in the same interleaved format: each one overwrites only part of its output register set, and again, the idea is that if you use VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the whole of each register. I've implemented the stores and loads quite differently. The loads were easiest to implement as a single intrinsic that expands to all four VLD4x instructions or both VLD2x, delivering four complete output registers. (Implementing each individual load as a separate instruction taking four input registers to partially overwrite is possible in theory, but pointless, and when I tried it, I found it would need extra work to get the register allocation not to be horrible.) Since that intrinsic delivers multiple outputs, it has to be instruction-selected in custom C++. But the store instructions are easier to model individually, because they don't overwrite any register at all and you can write a DAG Isel pattern in Tablegen for each one. Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load instructions, delivers four full output vectors, and is handled by C++ code, whereas `int_arm_mve_vst4q` expands to just one store instruction, takes four input vectors and a constant indicating which lanes to store, and is handled entirely in Tablegen. (And similarly for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do each one. 
Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68700	2019-10-24 16:33:13 +01:00
Simon Tatham	ceeff95ca4	[ARM] Add some sample IR MVE intrinsics with C++ isel. This adds some initial example IR intrinsics for MVE instructions that deliver multiple output values, and hence, have to be instruction- selected by custom C++ code instead of Tablegen patterns. I've added the writeback gather load instructions (taking a vector of base addresses and a single common offset, returning a vector of loaded values and an updated vector of base addresses); one example from the long shift family (taking and returning a 64-bit value in two GPRs); and the VADC instruction (which propagates a carry bit from each vector-lane addition to the next, taking an input carry flag in FPSCR and outputting the final one in FPSCR as well). To support the VPT-predicated forms of these instructions, I've written some helper functions to add the cluster of MVE predicate operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used when the instruction actually is predicated (so it takes a predicate mask argument), and `AddEmptyMVEPredicateToOps` is for when the instruction is unpredicated (so it fills in $noreg for the mask). Each one comes in a form suitable for `vpred_n`, and one for `vpred_r` which takes the extra 'inactive' parameter. For VADC, the representation of the carry flag in the IR intrinsic is a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e. with the carry flag in bit 29 of the word. (The user-facing ACLE intrinsic will want it to be in bit 0, but I'll do that on the clang side.) Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68699	2019-10-24 16:33:13 +01:00
Simon Tatham	1b45297e01	[ARM] Begin adding IR intrinsics for MVE instructions. This commit, together with the next few, will add a representative sample of the kind of IR intrinsics that we'll need in order to implement the user-facing ACLE intrinsics for MVE. Supporting all of them will take more work; the intention of this initial series of commits is to implement an intrinsic or two from lots of different categories, as examples and proofs of concept. This initial commit introduces a small number of IR intrinsics for instructions simple enough that they can use Tablegen ISel patterns: the predicated versions of the VADD and VSUB instructions (both integer and FP), VMIN and VMAX, and the float->half VCVT instruction (predicated and unpredicated). When using VPT-predicated instructions in automatic code generation, it will be convenient to specify the predicate value as a vector of the appropriate number of i1. To make it easy to specify all sizes of an instruction in one go and give each one the matching predicate vector type, I've added a system of Tablegen informational records describing MVE's vector types: each one gives the underlying LLVM IR ValueType (which may not be the same if the MVE vector is of explicitly signed or unsigned integers) and an appropriate vNi1 to use as the predicate vector. (Also, those info records include the usual encoding for the types, so that as we add associations between each instruction encoding and one of the new `MVEVectorVTInfo` records, we can remove some of the existing template parameters and replace them with references to the vector type info's fields.) The user-facing ACLE intrinsics will receive a predicate mask as a 16-bit integer, so I've also provided a pair of intrinsics i2v and v2i, to convert between an integer and a vector of i1 by just changing the register class. 
Reviewers: dmgreen, miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67158	2019-10-24 16:33:13 +01:00
Mirko Brkusanin	4b63ca1379	[Mips] Use appropriate private label prefix based on Mips ABI MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64 regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo we can find out Mips ABI and pick appropriate prefix. Tags: #llvm, #clang, #lldb Differential Revision: https://reviews.llvm.org/D66795	2019-10-23 12:24:35 +02:00
Sander de Smalen	8f2dac471a	Reverted r375425 as it broke some buildbots. llvm-svn: 375444	2019-10-21 19:11:40 +00:00
Sander de Smalen	814548ec8e	[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2) Commit message from D66935: This patch fixes a bug exposed by D65653 where a subsequent invocation of `determineCalleeSaves` ends up with a different size for the callee save area, leading to different frame-offsets in debug information. In the invocation by PEI, `determineCalleeSaves` tries to determine whether it needs to spill an extra callee-saved register to get an emergency spill slot. To do this, it calls 'estimateStackSize' and manually adds the size of the callee-saves to this. PEI then allocates the spill objects for the callee saves and the remaining frame layout is calculated accordingly. A second invocation in LiveDebugValues causes estimateStackSize to return the size of the stack frame including the callee-saves. Given that the size of the callee-saves is added to this, these callee-saves are counted twice, which leads `determineCalleeSaves` to believe the stack has become big enough to require spilling an extra callee-save as emergency spillslot. It then updates CalleeSavedStackSize with a larger value. Since CalleeSavedStackSize is used in the calculation of the frame offset in getFrameIndexReference, this leads to incorrect offsets for variables/locals when this information is recalculated after PEI. This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*` Changes after D66935: Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return the uninitialized CalleeSavedStackSize when running `llc` on a specific pass where the MIR code has already been expected to have gone through PEI. Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug mode, the compiler will assert the recalculated size equals the cached size as calculated through a call to determineCalleeSaves. 
This fixes two tests: test/DebugInfo/AArch64/asan-stack-vars.mir test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir that otherwise fail when compiled using msan. Reviewed By: omjavaid, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D68783 llvm-svn: 375425	2019-10-21 17:12:56 +00:00
David Green	0765a4c288	[ARM] Extra qdadd patterns This adds some new qdadd patterns to go along with the other recently added qadd's. Differential Revision: https://reviews.llvm.org/D68999 llvm-svn: 375414	2019-10-21 14:06:49 +00:00
David Green	d7b77f2203	[ARM] Add qadd lowering from a sadd_sat This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the same time. The qadd instruction sets the q flag, but we already have many cases where we do not model this in llvm. Differential Revision: https://reviews.llvm.org/D68976 llvm-svn: 375411	2019-10-21 12:33:46 +00:00
Guillaume Chatelet	bac5f6bd21	[Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69243 llvm-svn: 375407	2019-10-21 11:01:55 +00:00
David Green	fba831e791	[ARM] Lower sadd_sat to qadd8 and qadd16 Lower the target independent signed saturating intrinsics to qadd8 and qadd16. This custom lowers them from a sadd_sat, catching the node early before it is promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a qadd8/qadd16, so that we can call demand bits on it to show that it does not use the upper bits. Also handles QSUB8 and QSUB16. Differential Revision: https://reviews.llvm.org/D68974 llvm-svn: 375402	2019-10-21 09:53:38 +00:00
Guillaume Chatelet	3cc4835c00	Use Align for TFL::TransientStackAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69216 llvm-svn: 375398	2019-10-21 08:31:25 +00:00
Reid Kleckner	904cd3e06b	Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each Now X86ISelLowering doesn't depend on many IR analyses. llvm-svn: 375320	2019-10-19 01:31:09 +00:00
Reid Kleckner	1d7b41361f	Prune two MachineInstr.h includes, fix up deps MachineInstr.h included AliasAnalysis.h, which includes a world of IR constructs mostly unneeded in CodeGen. Prune it. Same for DebugInfoMetadata.h. Noticed with -ftime-trace. llvm-svn: 375311	2019-10-19 00:22:07 +00:00
Quentin Colombet	9f9151d494	[GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method The default implementation of isIncomingArgumentHandler could lead to generating incorrect code. Make it a pure virtual method, so that targets know they have to override it to produce correct code. NFC Differential Revision: https://reviews.llvm.org/D69187 llvm-svn: 375277	2019-10-18 20:13:42 +00:00
Sam Parker	8e6a638c74	[ARM][MVE] Enable truncating masked stores Allow us to generate truncating masked stores which take v4i32 and v8i16 vectors and can store to v4i8, v4i16 and v8i8 memory. Removed support for unaligned masked stores. Differential Revision: https://reviews.llvm.org/D68461 llvm-svn: 375108	2019-10-17 12:11:18 +00:00
Sam Parker	3ff961cabd	[ARM][MVE] Change VPST to use, not def, VPR Unlike VPT, VPST just uses the current value of VPR.P0. Differential Revision: https://reviews.llvm.org/D69037 llvm-svn: 375087	2019-10-17 08:46:31 +00:00
Sam Parker	39af8a3a3b	[DAGCombine][ARM] Enable extending masked loads Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085	2019-10-17 07:55:55 +00:00
Guillaume Chatelet	882c43d703	[Alignment][NFC] Use Align for TargetFrameLowering/Subtarget Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68993 llvm-svn: 375084	2019-10-17 07:49:39 +00:00
Mikhail Maltsev	95b5d459a0	[ARM] Add a register class for GPR pairs without SP and use it. NFCI Summary: Currently Thumb2InstrInfo.cpp uses a register class which is auto-generated by tablegen. Such approach is fragile because auto-generated classes might change when other register classes are added. For example, before https://reviews.llvm.org/D62667 we were using GPRPair_with_gsub_1_in_rGPRRegClass, but had to change it to GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass because the former class stopped being generated (this did not change the functionality though). This patch adds a register class consisting of even-odd GPR register pairs from (R0, R1) to (R10, R11), which excludes (R12, SP) and uses it in Thumb2InstrInfo.cpp instead of GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass. Reviewers: ostannard, simon_tatham, dmgreen, efriedma Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69026 llvm-svn: 374990	2019-10-16 10:40:57 +00:00
Sam Parker	1c3ca61294	[ARM][ParallelDSP] Change smlad insertion order Instead of inserting everything after the 'root' of the reduction, insert all instructions as close to their operands as possible. This can help reduce register pressure. Differential Revision: https://reviews.llvm.org/D67392 llvm-svn: 374981	2019-10-16 09:37:03 +00:00
Sam Parker	ce39278f25	[ARM][MVE] validForTailPredication insts Reverse the logic for valid tail predication instructions and create a whitelist instead. Added other instruction groups that aren't obviously safe: - instructions that 'narrow' their result. - lane moves. - byte swapping instructions. - interleaving loads and stores. - cross-beat carries. - top/bottom instructions. - complex operations. Hopefully we should be able to add more of these instructions to the whitelist, once we have a more concrete idea of the transform. Differential Revision: https://reviews.llvm.org/D67904 llvm-svn: 374887	2019-10-15 13:12:51 +00:00
Jian Cai	e9089c223c	[ARM][AsmParser] handles offset expression in parentheses Summary: Integrated assembler does not accept offset expressions surrounded by parentheses. Handle this case for GAS compatibility. https://bugs.llvm.org/show_bug.cgi?id=43631 Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68764 llvm-svn: 374832	2019-10-14 22:22:26 +00:00
David Green	543236232c	[ARM] Selection for MVE VMOVN This adds both VMOVNt and VMOVNb instruction selection from the appropriate shuffles. We detect shuffle masks of the form: 0, N, 2, N+2, 4, N+4, ... or 0, N+1, 2, N+3, 4, N+5, ... ISel will also try the opposite patterns, with inputs reversed. These are selected to VMOVNt and VMOVNb respectively. Differential Revision: https://reviews.llvm.org/D68283 llvm-svn: 374781	2019-10-14 15:19:33 +00:00
Sam Parker	527a35e155	[NFC][TTI] Add Alignment for isLegalMasked[Load/Store] Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763	2019-10-14 10:00:21 +00:00
Zi Xuan Wu	9802268ad3	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious regressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634	2019-10-12 02:53:04 +00:00
David Green	8628bb0491	[ARM] VQSUB instruction Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68567 llvm-svn: 374377	2019-10-10 16:34:30 +00:00
David Green	39596ec2fe	[ARM] VQADD instructions This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68566 llvm-svn: 374336	2019-10-10 13:05:04 +00:00
Oliver Stannard	4f454b2275	[IfCvt][ARM] Optimise diamond if-conversion for code size Currently, the heuristics the if-conversion pass uses for diamond if-conversion are based on execution time, with no consideration for code size. This adds a new set of heuristics to be used when optimising for code size. This is mostly target-independent, because the if-conversion pass can see the code size of the instructions which it is removing. For thumb, there are a few passes (insertion of IT instructions, selection of narrow branches, and selection of CBZ instructions) which are run after if conversion and affect these heuristics, so I've added target hooks to better predict the code-size effect of a proposed if-conversion. Differential revision: https://reviews.llvm.org/D67350 llvm-svn: 374301	2019-10-10 09:58:28 +00:00
Jinsong Ji	9912232b46	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit `9f41deccc0`. This reverts commit `18b6fe07bc`. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091	2019-10-08 17:32:56 +00:00
Nikola Prica	98603a8153	[DebugInfo][If-Converter] Update call site info during the optimization During the If-Converter optimization pay attention when copying or deleting call instructions in order to keep call site information in valid state. Reviewers: aprantl, vsk, efriedma Reviewed By: vsk, efriedma Differential Revision: https://reviews.llvm.org/D66955 llvm-svn: 374068	2019-10-08 15:43:12 +00:00
Nikola Prica	02682498b8	[ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers Support for tracking registers that forward function parameters into the following function frame. For now we only support cases when parameter is forwarded through single register. Reviewers: aprantl, vsk, t.p.northover Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D66953 llvm-svn: 374033	2019-10-08 09:43:05 +00:00
Kristof Beyls	78bfe3ab94	[ARM] Generate vcmp instead of vcmpe Based on the discussion in http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the conclusion was reached that the ARM backend should produce vcmp instead of vcmpe instructions by default, i.e. not be producing an Invalid Operation exception when either argument in a floating point compare is a quiet NaN. In the future, after constrained floating point intrinsics for floating point compare have been introduced, vcmpe instructions probably should be produced for those intrinsics - depending on the exact semantics they'll be defined to have. This patch logically consists of the following parts: - Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which implemented fine-tuning for when to produce vcmpe (i.e. not do it for equality comparisons). The complexity introduced by those patches isn't needed anymore if we just always produce vcmp instead. Maybe these patches need to be reintroduced again once support is needed to map potential LLVM-IR constrained floating point compare intrinsics to the ARM instruction set. - Simply select vcmp, instead of vcmpe, see simple changes in lib/Target/ARM/ARMInstrVFP.td - Adapt lots of tests that tested for vcmpe (instead of vcmp). For all of these tests, the intent of what is tested for isn't related to whether the vcmp should produce an Invalid Operation exception or not. Fixes PR43374. Differential Revision: https://reviews.llvm.org/D68463 llvm-svn: 374025	2019-10-08 08:25:42 +00:00
Zi Xuan Wu	9f41deccc0	[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious regressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017	2019-10-08 03:28:33 +00:00
Tim Northover	a7d90af1be	ARM-Darwin: keep the frame register reserved even if not updated. Darwin platforms need the frame register to always point at a valid record even if it's not updated in a leaf function. Backtraces are more important than one extra GPR. llvm-svn: 373738	2019-10-04 12:29:32 +00:00
Simon Pilgrim	b327dc1966	Fix uninitialized variable warning. NFCI llvm-svn: 373582	2019-10-03 11:21:46 +00:00
Benjamin Kramer	12e915b3fc	[ARM] Make helpers static. NFC. llvm-svn: 373503	2019-10-02 18:20:24 +00:00
David Green	c9b5ab8b1c	[ARM] Identity shuffles are legal Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE (they essentially just become bitcasts). We were not catching that in the existing set of what we considered legal though. On NEON, they would be covered by vext's, but that is not generally available in MVE. This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here but does what we want and prevents us from just rewriting what is the same function. Differential Revision: https://reviews.llvm.org/D68241 llvm-svn: 373446	2019-10-02 11:40:51 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Sam Parker	aac03ae06a	[ARM][MVE] Change VCTP operand The VCTP instruction will calculate the predicate mask based upon the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic and supplied it as the operand, but this is incorrect as the phi should directly feed the vctp. The sub is calculating the value for the next iteration. Differential Revision: https://reviews.llvm.org/D67921 llvm-svn: 373188	2019-09-30 08:03:23 +00:00
Sam Parker	b3438f1cc0	[ARM][CGP] Allow signext arguments As we perform a zext on any arguments used in the promoted tree, it doesn't matter if they're marked as signext. The only permitted user(s) in the tree which would interpret the sign bits are signed icmps. For these instructions, their promoted operands are truncated before the icmp uses them. Differential Revision: https://reviews.llvm.org/D68019 llvm-svn: 373186	2019-09-30 07:52:10 +00:00
David Green	120a5e9a74	[ARM] Cortex-M4 schedule additions This is an attempt to fill in some of the missing instructions from the Cortex-M4 schedule, and make it easier to do the same for other ARM cpus. - Some instructions are marked as hasNoSchedulingInfo as they are pseudos or otherwise do not require scheduling info - A lot of features have been marked not supported - Some WriteRes's have been added for cvt instructions. - Some extra instruction latencies have been added, notably by relaxing the regex for dsp instruction to catch more cases, and some fp instructions. This goes a long way to get the CompleteModel working for this CPU. It does not go far enough as to get all scheduling info for all output operands correct. Differential Revision: https://reviews.llvm.org/D67957 llvm-svn: 373163	2019-09-29 08:38:48 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Alexandros Lamprineas	c006b6f4cb	[MC][ARM] vscclrm disassembles as vldmia Happens only when the mve.fp subtarget feature is enabled: $ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b" .text vldmia pc, {d0, d1, d2, d3} $ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b" .text vscclrm {d0, d1, d2, d3, vpr} Assembling returns the correct encoding with or without mve.fp: $ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}" .text vscclrm {d0, d1, d2, d3, vpr} @ encoding: [0x9f,0xec,0x08,0x0b] $ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}" .text vscclrm {d0, d1, d2, d3, vpr} @ encoding: [0x9f,0xec,0x08,0x0b] The problem seems to be in the TableGen description of VSCCLRMD. The least significant bit should be set to zero. Differential Revision: https://reviews.llvm.org/D68025 llvm-svn: 373052	2019-09-27 08:22:24 +00:00
Simon Pilgrim	2cf54d7b71	ARMBaseInstrInfo getOperandLatency - silence static analyzer dyn_cast<> null dereference warnings. NFCI. The static analyzer is warning about potential null dereferences, but we should be able to use cast<> directly and if not assert will fire for us. llvm-svn: 372992	2019-09-26 16:05:55 +00:00
David Green	10d10102a4	[ARM] Ensure we do not attempt to create lsll #0 During legalisation we can end up with some pretty strange nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid assembly instructions. A long shift with a zero immediate actually encodes a shift by 32. Differential Revision: https://reviews.llvm.org/D67664 llvm-svn: 372839	2019-09-25 10:16:48 +00:00
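The zero-immediate pitfall described in the lsll #0 commit above can be sketched as follows. This is a minimal illustration, not the code from D67664: the 5-bit field width and the names `canFormLongShift`, `encodeLongShiftImm` and `decodeLongShiftImm` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: the shift amount lives in a 5-bit field,
// and a field value of 0 is read back as "shift by 32".
constexpr unsigned decodeLongShiftImm(uint8_t Field) {
  return (Field & 0x1F) == 0 ? 32u : static_cast<unsigned>(Field & 0x1F);
}

// A shift by 0 has no encoding of its own, so it must never be turned
// into a long shift. Amounts 1..32 are representable.
constexpr bool canFormLongShift(unsigned Amount) {
  return Amount >= 1 && Amount <= 32;
}

// Encodes a legal amount; 32 wraps around to the field value 0.
inline uint8_t encodeLongShiftImm(unsigned Amount) {
  assert(canFormLongShift(Amount) && "shift by 0 cannot be encoded");
  return static_cast<uint8_t>(Amount & 0x1F);
}
```

A lowering step would call `canFormLongShift` first and fall back to ordinary shift handling when it returns false.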
David Green	2fb41fc70c	[ARM] Split large widening MVE loads Similar to rL372717, we can force the splitting of extends of vector loads in MVE, in order to use the better widening loads as opposed to going through expensive extends. This adds a combine to early-on detect extends of loads and split the load in two, from where normal legalisation will kick in and we get a series of widening loads. Differential Revision: https://reviews.llvm.org/D67909 llvm-svn: 372721	2019-09-24 10:53:09 +00:00
David Green	49d851f403	[ARM] Split large truncating MVE stores MVE does not have a simple sign extend instruction that can move elements across lanes. We currently often end up moving each lane into and out of a GPR, in order to get elements into the correct places. When we have a store of a trunc (or an extend of a load), we can instead just split the store/load in two, using the narrowing/widening load/store instructions from each half of the vector. This does that for stores. It happens very early in a store combine, so as to easily detect the truncates. (It would be possible to do this later, but that would involve looking through a buildvector of extract elements. Not impossible but this way seemed simpler). By enabling store combines we also get a vmovdrr combine for free, helping some other tests. Differential Revision: https://reviews.llvm.org/D67828 llvm-svn: 372717	2019-09-24 10:10:41 +00:00
Pavel Labath	aaff1a631a	MCRegisterInfo: Merge getLLVMRegNum and getLLVMRegNumFromEH Summary: The functions differ in two ways: - getLLVMRegNum could return both "eh" and "other" dwarf register numbers, while getLLVMRegNumFromEH only returned the "eh" number. - getLLVMRegNum asserted if the register was not found, while the second function returned -1. The second distinction was pretty important, but it was very hard to infer that from the function name. Additionally, for the use case of dumping dwarf expressions, we needed a function which can work with both kinds of number, but does not assert. This patch solves both of these issues by merging the two functions into one, returning an Optional<unsigned> value. While the same thing could be achieved by adding an "IsEH" argument to the (renamed) getLLVMRegNumFromEH function, it seemed better to avoid the confusion of two functions and put the choice of asserting into the hands of the caller -- if he checks the Optional value, he can safely process "untrusted" input, and if he blindly dereferences the Optional, he gets the assertion. I've updated all call sites to the new API, choosing between the two options according to the function they were calling originally, except that I've updated the usage in DWARFExpression.cpp to use the "safe" method instead, and added a test case which would have previously triggered an assertion failure when processing (incorrect?) dwarf expressions. Reviewers: dsanders, arsenm, JDevlieghere Subscribers: wdng, aprantl, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67154 llvm-svn: 372710	2019-09-24 09:31:02 +00:00
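The "let the caller choose whether to assert" design from the MCRegisterInfo commit above can be sketched with `std::optional` (C++17). This is only an illustration: `RegisterTable`, `lookup`, `lookupTrusted` and `lookupOrMinusOne` are hypothetical names, and the real LLVM function takes different parameters.

```cpp
#include <cassert>
#include <map>
#include <optional>

// Hypothetical stand-in for a DWARF-to-LLVM register number table.
class RegisterTable {
  std::map<unsigned, unsigned> DwarfToLLVM;

public:
  void add(unsigned DwarfReg, unsigned LLVMReg) {
    DwarfToLLVM[DwarfReg] = LLVMReg;
  }

  // Single lookup entry point: an empty optional means "not found".
  std::optional<unsigned> lookup(unsigned DwarfReg) const {
    auto It = DwarfToLLVM.find(DwarfReg);
    if (It == DwarfToLLVM.end())
      return std::nullopt;
    return It->second;
  }
};

// Caller that trusts its input: a missing register is a programming error.
inline unsigned lookupTrusted(const RegisterTable &T, unsigned DwarfReg) {
  std::optional<unsigned> Reg = T.lookup(DwarfReg);
  assert(Reg && "unknown DWARF register");
  return *Reg;
}

// Caller that processes untrusted input: report -1 instead of asserting.
inline long lookupOrMinusOne(const RegisterTable &T, unsigned DwarfReg) {
  std::optional<unsigned> Reg = T.lookup(DwarfReg);
  return Reg ? static_cast<long>(*Reg) : -1L;
}
```

Both former behaviours become thin wrappers around one function, so the difference between them is visible at the call site rather than hidden in a function name.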
Mark Murray	c720f63845	Cosmetic; don't use the magic constant 35 when HASH is more readable. This matches other MCK__<THING>_* usage better. Summary: No functional change. This fixes a magic constant in MCK__*_... macros only. Reviewers: ostannard Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67840 llvm-svn: 372599	2019-09-23 12:52:42 +00:00
Guillaume Chatelet	c281b40814	[Alignment] Get DataLayout::StackAlignment as Align Summary: Internally it is needed to know if StackAlignment is set but we can expose it as llvm::Align. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67852 llvm-svn: 372585	2019-09-23 12:01:32 +00:00
Sam Parker	9feb429a33	[ARM][MVE] Remove old tail predicates Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also look into the exit block to see if there are any duplicates of the predicates that we've replaced and clone the vctp to be used there instead. Differential Revision: https://reviews.llvm.org/D67709 llvm-svn: 372567	2019-09-23 09:48:25 +00:00
Sam Parker	4ba6d0ded2	[ARM][LowOverheadLoops] Use subs during revert. Check whether there are any uses or defs between the LoopDec and LoopEnd. If there are none, then we can use a subs to set the cpsr and skip generating a cmp. Differential Revision: https://reviews.llvm.org/D67801 llvm-svn: 372560	2019-09-23 08:57:50 +00:00
Sam Parker	566127e376	[ARM][LowOverheadLoops] Use tBcc when reverting Check the branch target ranges and use a tBcc instead of t2Bcc when we can. Differential Revision: https://reviews.llvm.org/D67796 llvm-svn: 372557	2019-09-23 08:35:31 +00:00
Oliver Cruickshank	c84722ff27	[ARM] Fix CTTZ not generating correct instructions MVE CTTZ intrinsic should have been set to Custom, not Expand llvm-svn: 372401	2019-09-20 15:03:44 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them.
> > Most of the work here is to clean up target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, similar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
David Green	0cfb78e52a	[ARM] MVE i1 splat We needn't BFI each lane individually into a predicate register when each lane is the same. A simple sign extend and a vmsr will do. Differential Revision: https://reviews.llvm.org/D67653 llvm-svn: 372313	2019-09-19 12:17:41 +00:00
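Why a sign extend is enough for an i1 splat can be seen in a small model. The sketch below assumes a 16-bit predicate split evenly between the lanes; `splatPerLane` and `splatBySignExtend` are hypothetical names, and nothing here is taken from D67653 itself.

```cpp
#include <cassert>
#include <cstdint>

// Slow path: insert the same bit into every lane's slice of the predicate,
// one lane at a time (the per-lane insertion the commit avoids).
inline uint16_t splatPerLane(bool Bit, unsigned NumLanes) {
  assert((NumLanes == 4 || NumLanes == 8 || NumLanes == 16) &&
         "unexpected lane count");
  const unsigned Width = 16 / NumLanes;
  uint16_t Pred = 0;
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane)
    if (Bit)
      Pred |= static_cast<uint16_t>(((1u << Width) - 1u) << (Lane * Width));
  return Pred;
}

// Fast path: sign-extend the single bit to 0 or all-ones and keep the
// low 16 bits, which is what moving it to the predicate register retains.
inline uint16_t splatBySignExtend(bool Bit) {
  const int32_t Extended = Bit ? -1 : 0;
  return static_cast<uint16_t>(static_cast<uint32_t>(Extended) & 0xFFFFu);
}
```

Under these assumptions both paths yield the same predicate for every lane count, which is why the lane-by-lane insertion is unnecessary when all lanes are equal.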
Simon Pilgrim	da89495a3e	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. llvm-svn: 372308	2019-09-19 10:47:12 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands.
This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Guillaume Chatelet	d4c4671aa7	[Alignment][NFC] Remove LogAlignment functions Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67620 llvm-svn: 372231	2019-09-18 15:49:49 +00:00
Krasimir Georgiev	2f1bba7fd0	Revert "[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize" Summary: This reverts commit r372204. This change causes build bot failures under msan: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/35236/steps/check-llvm%20msan/logs/stdio: ``` FAIL: LLVM :: DebugInfo/AArch64/asan-stack-vars.mir (19531 of 33579) ****************** TEST 'LLVM :: DebugInfo/AArch64/asan-stack-vars.mir' FAILED **************** Script: -- : 'RUN: at line 1'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc -O0 -start-before=livedebugvalues -filetype=obj -o - /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llvm-dwarfdump -v - \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir -- Exit Code: 2 Command Output (stderr): -- ==62894==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0xdfcafb in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 #1 0xdfae8a in resolveFrameIndexReference /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1580:10 #2 0xdfae8a in llvm::AArch64FrameLowering::getFrameIndexReference(llvm::MachineFunction const&, int, unsigned int&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1536 #3 0x46642c1 in (anonymous namespace)::LiveDebugValues::extractSpillBaseRegAndOffset(llvm::MachineInstr const&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:582:21 #4 0x4647cb3 in transferSpillOrRestoreInst 
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:883:11 #5 0x4647cb3 in process /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1079 #6 0x4647cb3 in (anonymous namespace)::LiveDebugValues::ExtendRanges(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1361 #7 0x463ac0e in (anonymous namespace)::LiveDebugValues::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1415:18 #8 0x4854ef0 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13 #9 0x53b0b01 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1648:27 #10 0x53b15f6 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1685:16 #11 0x53b298d in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1750:27 #12 0x53b298d in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1863 #13 0x905f21 in compileModule(char, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:601:8 #14 0x8fdc4e in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:355:22 #15 0x7f67673632e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #16 0x882369 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x882369) MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction 
const&, int, bool, unsigned int&, bool, bool) const Exiting error: -: The file was not recognized as a valid object file FileCheck error: '-' is empty. FileCheck command line: /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir ``` Reviewers: bkramer Reviewed By: bkramer Subscribers: sdardis, aprantl, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67710 llvm-svn: 372228	2019-09-18 14:42:09 +00:00
Sander de Smalen	dc2a7f5b39	[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize This patch fixes a bug exposed by D65653 where a subsequent invocation of `determineCalleeSaves` ends up with a different size for the callee save area, leading to different frame-offsets in debug information. In the invocation by PEI, `determineCalleeSaves` tries to determine whether it needs to spill an extra callee-saved register to get an emergency spill slot. To do this, it calls 'estimateStackSize' and manually adds the size of the callee-saves to this. PEI then allocates the spill objects for the callee saves and the remaining frame layout is calculated accordingly. A second invocation in LiveDebugValues causes estimateStackSize to return the size of the stack frame including the callee-saves. Given that the size of the callee-saves is added to this, these callee-saves are counted twice, which leads `determineCalleeSaves` to believe the stack has become big enough to require spilling an extra callee-save as emergency spillslot. It then updates CalleeSavedStackSize with a larger value. Since CalleeSavedStackSize is used in the calculation of the frame offset in getFrameIndexReference, this leads to incorrect offsets for variables/locals when this information is recalculated after PEI. Reviewers: omjavaid, eli.friedman, thegameg, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D66935 llvm-svn: 372204	2019-09-18 09:02:44 +00:00
Eli Friedman	ddf5e86c22	[ARM] VFPv2 only supports 16 D registers. r361845 changed the way we handle "D16" vs. "D32" targets; there used to be a negative "d16" which removed instructions from the instruction set, and now there's a "d32" feature which adds instructions to the instruction set. This is good, but there was an oversight in the implementation: the behavior of VFPv2 was changed. In particular, the "vfp2" feature was changed to imply "d32". This is wrong: VFPv2 only supports 16 D registers. In practice, this means if you specify -mfpu=vfpv2, the compiler will generate illegal instructions. This patch gets rid of "vfp2d16" and "vfp2d16sp", and fixes "vfp2" and "vfp2sp" so they don't imply "d32". Differential Revision: https://reviews.llvm.org/D67375 llvm-svn: 372186	2019-09-17 21:42:38 +00:00
Simon Pilgrim	a9a27d1ded	[ARM][AsmParser] Don't dereference a dyn_cast result. NFCI. The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why it's working at the moment, and anyway cast<> will assert if they aren't. llvm-svn: 372145	2019-09-17 17:26:14 +00:00
David Green	91724b8530	[ARM] Add a SelectTAddrModeImm7 for MVE narrow loads and stores We were previously using the SelectT2AddrModeImm7 for both normal and narrowing MVE loads/stores. As the narrowing instructions do not accept sp as a register, it makes little sense to optimise a FrameIndex into the load, only to have to recover that later on. This adds a SelectTAddrModeImm7 which does not do that folding, and uses it for narrowing load/store patterns. Differential Revision: https://reviews.llvm.org/D67489 llvm-svn: 372134	2019-09-17 15:32:28 +00:00
David Green	22a2209433	[ARM] Reserve an emergency spill slot for fp16 addressing modes that need it Similar to D67327, but this time for the FP16 VLDR and VSTR instructions that use the AddrMode5FP16 addressing mode. We need to reserve an emergency spill slot for instructions that will be out of range to use sp directly. AddrMode5FP16 is 8 bits with a scale of 2. Differential Revision: https://reviews.llvm.org/D67483 llvm-svn: 372132	2019-09-17 15:23:09 +00:00
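The "8 bits with a scale of 2" figure quoted for AddrMode5FP16 above translates into a concrete reach, worked through below. The helper names and the threshold policy in `needsEmergencySpillSlot` are assumptions for illustration only; the real stack estimation code is more involved.

```cpp
#include <cassert>
#include <cstdint>

// Largest byte offset reachable by an unsigned immediate of Bits bits
// that is multiplied by Scale before being added to the base register.
constexpr uint32_t maxScaledOffset(unsigned Bits, uint32_t Scale) {
  return ((1u << Bits) - 1u) * Scale;
}

// Hypothetical policy: if the estimated frame is larger than the reach of
// the addressing mode, some accesses may need a scratch register, so an
// emergency spill slot should be reserved.
inline bool needsEmergencySpillSlot(uint32_t EstimatedStackSize,
                                    unsigned Bits, uint32_t Scale) {
  assert(Bits > 0 && Bits < 32 && Scale > 0 && "invalid addressing mode");
  return EstimatedStackSize > maxScaledOffset(Bits, Scale);
}
```

With 8 bits and a scale of 2 the reach is (2^8 - 1) * 2 = 510 bytes, so under this model any frame estimated above that size triggers the reservation.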
Sam Parker	1d9ba08543	[ARM] Fix for buildbots Remove setPreservesCFG from ARMConstantIslandPass and add a couple of -verify-machine-dom-info instances into the existing codegen tests. llvm-svn: 372126	2019-09-17 14:21:36 +00:00
David Green	1ff9553057	[ARM] Fix for MVE load/store stack accesses MVE loads and stores have a 7 bit immediate range, scaled by the length of the type. This needs to be taught to the stack estimation code to ensure that an emergency spill slot is reserved in case we run out of registers when materialising stack indices. Also the narrowing loads/stores can be created with frame indices even though they do not accept SP as a register. We need in those cases to make sure we have an emergency register to use as the frame base, as SP can never be used. Differential Revision: https://reviews.llvm.org/D67327 llvm-svn: 372114	2019-09-17 12:58:51 +00:00
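The "7 bit immediate range, scaled by the length of the type" rule from the MVE stack access commit above can be expressed as a small legality check. This is a sketch under assumptions: `fitsScaledImm7` is a hypothetical name, and the offset sign is assumed to be carried separately from the 7-bit magnitude.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Returns true if Offset (in bytes) can be encoded as a 7-bit magnitude
// multiplied by Scale, where Scale is the element size in bytes.
// Assumption: the sign is encoded separately, so the magnitude may be
// anything from 0 to 127 units in either direction.
inline bool fitsScaledImm7(int32_t Offset, int32_t Scale) {
  assert((Scale == 1 || Scale == 2 || Scale == 4) &&
         "unexpected element size");
  if (Offset % Scale != 0)
    return false; // not a multiple of the element size
  return std::abs(Offset / Scale) <= 127;
}
```

For 32-bit elements this gives a reach of 127 * 4 = 508 bytes, but only 127 bytes for byte elements, which is why the stack estimate has to take the access type into account.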
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Sam Parker	36c922278e	[ARM][LowOverheadLoops] Add LR def safety check Converting the *LoopStart pseudo instructions into DLS/WLS results in LR being defined. These instructions were inserted on the assumption that LR would already contain the loop counter because a mov is introduced during ISel as the consumers in the loop can only use LR. That assumption proved wrong! So perform a safety check, finding an appropriate place to insert the DLS/WLS instructions or revert if this isn't possible. Differential Revision: https://reviews.llvm.org/D67539 llvm-svn: 372111	2019-09-17 12:19:32 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Sam Parker	95b28a4c72	[ARM] LE support in ConstantIslands The low-overhead branch extension provides a loop-end 'LE' instruction that performs no decrement nor compare, it just jumps backwards. This patch modifies the constant islands pass to try to insert LE instructions in place of a Thumb2 conditional branch, instead of shrinking it. This only happens if a cmp can be converted to a cbn/z and used to exit the loop. Differential Revision: https://reviews.llvm.org/D67404 llvm-svn: 372085	2019-09-17 09:08:05 +00:00
Sam Parker	26a475afe5	[ARM][MVE] Add invalidForTailPredication to TSFlags Set this bit for the MVE reduction instructions to prevent a loop from becoming tail predicated in their presence. Differential Revision: https://reviews.llvm.org/D67444 llvm-svn: 372076	2019-09-17 07:43:04 +00:00
David Green	8d21460dc5	[ARM] A predicate cast of a predicate cast is a predicate cast This adds some very basic folding of PREDICATE_CASTS, removing cases when they are chained together. These would already be removed eventually, as these are lowered to copies. This just allows it to happen earlier, which can help other simplifications. Differential Revision: https://reviews.llvm.org/D67591 llvm-svn: 372012	2019-09-16 17:29:07 +00:00
Oliver Cruickshank	ee6fbebbaf	[ARM] Add patterns for BSWAP intrinsic on MVE BSWAP can use the VREV instruction on MVE to produce better results than expanding. llvm-svn: 372002	2019-09-16 15:20:10 +00:00
Oliver Cruickshank	e9510a6cad	[ARM] Add patterns for bitreverse intrinsic on MVE BITREVERSE can use the VBRSR which will reverse and right shift. Shifting right by 0 will just reverse the bits. llvm-svn: 372001	2019-09-16 15:20:03 +00:00
Oliver Cruickshank	5f799ef162	[ARM] Lower CTTZ on MVE Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and count the leading zeros, equivalent to a count trailing zeros (CTTZ). llvm-svn: 372000	2019-09-16 15:19:56 +00:00
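The identity behind the CTTZ lowering above, that counting trailing zeros equals counting leading zeros of the bit-reversed value, can be checked on a single 32-bit lane in plain C++. The function names are hypothetical and the code models the arithmetic only, not the MVE instructions.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit value.
inline uint32_t reverseBits32(uint32_t V) {
  uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (V & 1u);
    V >>= 1;
  }
  return R;
}

// Counts zero bits above the most significant set bit (32 for V == 0).
inline unsigned countLeadingZeros32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (V & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Trailing zeros computed as leading zeros of the reversed value.
inline unsigned countTrailingZeros32(uint32_t V) {
  return countLeadingZeros32(reverseBits32(V));
}

// Sanity check against a direct loop over the low bits.
inline void verifyLane(uint32_t V) {
  unsigned Naive = 0;
  while (Naive < 32 && ((V >> Naive) & 1u) == 0)
    ++Naive;
  assert(countTrailingZeros32(V) == Naive);
}
```

For example, 8 (binary 1000) reverses to 0x10000000, which has three leading zeros, matching the three trailing zeros of the input.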
Oliver Cruickshank	cd1a0b9271	[ARM] Add patterns for CTLZ on MVE CTLZ intrinsic can use the VCLS instruction on MVE, which produces better results than expanding. llvm-svn: 371999	2019-09-16 15:19:49 +00:00
David Green	ce7328cb61	[ARM] Fold VCMP into VPT MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in a single instruction, performing the compare and starting the VPT block in one. This teaches the MVEVPTBlockPass to fold them, searching back through the basicblock for a valid VCMP and creating the VPT from its operands. There are some changes to the VPT instructions to accommodate this, altering the order of the operands to match the VCMP better, and changing P0 register defs to be VPR defs, as is used in other places. Differential Revision: https://reviews.llvm.org/D66577 llvm-svn: 371982	2019-09-16 13:02:41 +00:00
David Green	b325c05732	[ARM] Masked loads and stores Masked loads and stores fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. Differential Revision: https://reviews.llvm.org/D67186 llvm-svn: 371932	2019-09-15 14:14:47 +00:00
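The passthru handling described in the masked load commit above amounts to a zeroing load followed by a select. The scalar model below is a sketch only: the four-lane types and the names `zeroingMaskedLoad` and `maskedLoad` are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<int32_t, 4>;
using Mask4 = std::array<bool, 4>;

// Models a predicated load that writes 0 to every inactive lane and
// does not touch memory for those lanes.
inline Vec4 zeroingMaskedLoad(const int32_t *Ptr, const Mask4 &Mask) {
  assert(Ptr != nullptr && "load from null pointer");
  Vec4 Result{}; // all lanes start at 0
  for (std::size_t I = 0; I < Result.size(); ++I)
    if (Mask[I])
      Result[I] = Ptr[I];
  return Result;
}

// IR-style masked load: inactive lanes take their value from PassThru.
// Built from the zeroing load plus a per-lane select.
inline Vec4 maskedLoad(const int32_t *Ptr, const Mask4 &Mask,
                       const Vec4 &PassThru) {
  const Vec4 Loaded = zeroingMaskedLoad(Ptr, Mask);
  Vec4 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = Mask[I] ? Loaded[I] : PassThru[I];
  return Result;
}
```

When the passthru is all zeros the select is redundant, which corresponds to the case the commit says needs no extra instruction.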
David Green	b7b7f26220	[ARM] Add earlyclobber for cross beat MVE instructions rL367544 added @earlyclobbers for the MVE VREV64 instruction. This adds the same for a number of other 32bit instructions that are similarly unpredictable if the destination equals the source (due to the cross beat nature of the instructions). This includes: VCADD.f32 VCADD.i32 VCMUL.f32 VHCADD.s32 VMULLT/B.s/u32 VQDMLADH{X}.s32 VQRDMLADH{X}.s32 VQDMLSDH{X}.s32 VQRDMLSDH{X}.s32 VQDMULLT/B.s32 with Qm and Rm No tests here as this would require intrinsics (or very interesting codegen) to manifest. The tests will follow naturally as the intrinsics are added. Differential Revision: https://reviews.llvm.org/D67462 llvm-svn: 371838	2019-09-13 11:20:17 +00:00
Sam Tebbs	1572b68509	[ARM] Add support for MVE vmaxv and vminv This patch adds vecreduce_smax, vecreduce_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and vminv. Differential Revision: https://reviews.llvm.org/D66413 llvm-svn: 371827	2019-09-13 09:11:46 +00:00
Guillaume Chatelet	af11cc7eb5	[Alignment] Move OffsetToAlignment to Alignment.h Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, JDevlieghere, alexshap, rupprecht, jhenderson Subscribers: sdardis, nemanjai, hiraditya, kbarton, jakehehrlich, jrtc27, MaskRay, atanasyan, jsji, seiya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67499 llvm-svn: 371742	2019-09-12 15:20:36 +00:00
Guillaume Chatelet	97264366fb	[Alignment][NFC] use llvm::Align for AsmPrinter::EmitAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: dschuff, sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67443 llvm-svn: 371616	2019-09-11 13:37:35 +00:00
Guillaume Chatelet	48904e9452	[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference. This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608	2019-09-11 11:16:48 +00:00
Guillaume Chatelet	b6722af068	[Alignment] Use Align for TargetLowering::MinStackArgumentAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67288 llvm-svn: 371498	2019-09-10 09:01:18 +00:00
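For readers unfamiliar with the helper being moved in the commit above, an offset-to-alignment computation can be sketched as below. This is a standalone illustration with a hypothetical name and signature (`offsetToNextAligned`), not the declaration from Alignment.h.

```cpp
#include <cassert>
#include <cstdint>

constexpr bool isPowerOf2(uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Number of bytes to add to Value to reach the next multiple of Alignment
// (0 if Value is already aligned). Alignment must be a power of two.
inline uint64_t offsetToNextAligned(uint64_t Value, uint64_t Alignment) {
  assert(isPowerOf2(Alignment) && "alignment must be a power of two");
  const uint64_t Mask = Alignment - 1;
  return (Alignment - (Value & Mask)) & Mask;
}
```

For instance, a value of 5 with 4-byte alignment needs 3 bytes of padding, while a value of 8 needs none.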
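The log2 versus pow2 confusion mentioned in the mir parsing commit above is easy to reproduce in miniature. The sketch below is an assumption-laden illustration: `PowerOfTwoAlign` and `looksLikeByteAlignment` are hypothetical names and do not mirror the real llvm::Align interface.

```cpp
#include <cassert>
#include <cstdint>

constexpr bool isPowerOf2(uint64_t V) {
  return V != 0 && (V & (V - 1)) == 0;
}

// Hypothetical alignment wrapper: it can only be built from a byte count
// that is a power of two, and stores the log2 of that count internally.
class PowerOfTwoAlign {
  uint8_t ShiftValue;

public:
  explicit PowerOfTwoAlign(uint64_t Bytes) : ShiftValue(0) {
    assert(isPowerOf2(Bytes) && "alignment is not a power of two");
    while ((uint64_t(1) << ShiftValue) != Bytes)
      ++ShiftValue;
  }
  uint64_t bytes() const { return uint64_t(1) << ShiftValue; }
  unsigned log2Value() const { return ShiftValue; }
};

// A field written with the old log2 convention (for example 3, meaning
// 8 bytes) is rejected when it is read as a byte count.
inline bool looksLikeByteAlignment(uint64_t Field) {
  return isPowerOf2(Field);
}
```

Note the limit of this kind of check: log2 values that happen to be powers of two themselves (1, 2, 4) still pass, so only some malformed inputs are caught.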
David Green	2b7089949e	[ARM] Fix loads and stores for predicate vectors These predicate vectors can usually be loaded and stored with a single instruction, a VSTR_P0. However this instruction will store the entire P0 predicate, 16 bits, zero-extended to 32bits. Each lane of the v4i1/v8i1/v16i1 representing 4/2/1 bits. As far as I understand, when llvm says "store this v4i1", it really does need to store 4 bits (or 8, that being the size of a byte, with this bottom 4 as the interesting bits). For example a bitcast from a v8i1 to a i8 is defined as a store followed by a load, which is how the code is expanded. So this instead lowers the v4i1/v8i1 load/store through some shuffles to get the bits into the correct positions. This, as you might imagine, is not as efficient as a single instruction. But I believe it is needed for correctness. v16i1 equally should not load/store 32bits, only storing the 16bits of data. Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not changing). This is fine as they are self-consistent, it is only "externally observable loads/stores" (from our point of view) that need to be corrected. Differential revision: https://reviews.llvm.org/D67085 llvm-svn: 371419	2019-09-09 16:35:49 +00:00
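The bit repositioning that the predicate vector commit above performs with shuffles can be modelled on plain integers. This sketch assumes that lane i of an N-lane predicate owns bits [i*W, (i+1)*W) of the 16-bit P0 with W = 16/N, and that the lowest bit of each slice is representative; `packPredicate` and `unpackPredicate` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Compresses the 16-bit register form into one bit per lane, which is the
// form a "store this v4i1/v8i1" is expected to write to memory.
inline uint16_t packPredicate(uint16_t P0, unsigned NumLanes) {
  assert((NumLanes == 4 || NumLanes == 8 || NumLanes == 16) &&
         "unexpected lane count");
  const unsigned Width = 16 / NumLanes;
  uint16_t Packed = 0;
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane)
    if ((P0 >> (Lane * Width)) & 1u)
      Packed |= static_cast<uint16_t>(1u << Lane);
  return Packed;
}

// Expands one bit per lane back into the 16-bit register form by
// replicating each bit across its lane's slice.
inline uint16_t unpackPredicate(uint16_t Packed, unsigned NumLanes) {
  assert((NumLanes == 4 || NumLanes == 8 || NumLanes == 16) &&
         "unexpected lane count");
  const unsigned Width = 16 / NumLanes;
  uint16_t P0 = 0;
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane)
    if ((Packed >> Lane) & 1u)
      P0 |= static_cast<uint16_t>(((1u << Width) - 1u) << (Lane * Width));
  return P0;
}
```

With 16 lanes both functions are the identity, matching the commit's remark that v16i1 only needs to store the 16 bits of data.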
Simon Tatham	0e48bd24e2	[ARM] Remove some spurious MVE reduction instructions. The family of 'dual-accumulating' vector multiply-add instructions (VMLADAV, VMLALDAV and VRMLALDAVH) can all operate on both signed and unsigned integer types, and they all have an 'exchange' variant (with an X in the name) that modifies which pairs of vector lanes in the two inputs are multiplied together. But there's a clause in the spec that says that the X variants //don't// operate on unsigned integer types, only signed. You can have X, or unsigned, or neither, but not both. We didn't notice that clause when we implemented the MC support for these instructions, so LLVM believes that things like VMLADAVX.U8 do exist, contradicting the spec. Here I fix that by conditioning them out in Tablegen. In order to do that, I've reversed the nesting order of the Tablegen multiclasses for those instructions. Previously, the innermost multiclass generated the X and not-X variants, and the one outside that generated the A and not-A variants. Now X is done by the outer multiclass, which allows me to bypass that one when I only want the two not-X variants. Changing the multiclass nesting order also changes the names of the instruction ids unless I make a special effort not to. I decided that while I was changing them anyway I'd make them look nicer; so now the instructions have names like MVE_VMLADAVs32 or MVE_VMLADAVaxs32, instead of cumbersome _noacc_noexch suffixes. The corresponding multiply-subtract instructions are unaffected. Those don't accept unsigned types at all, either in the spec or in LLVM. Reviewers: ostannard, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67214 llvm-svn: 371405	2019-09-09 15:17:26 +00:00
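The rule from the commit above ("X, or unsigned, or neither, but not both") can be written as a small predicate. The sketch is a C++ model of the constraint only: the struct and function names are hypothetical, and the actual fix is expressed in TableGen multiclasses rather than in code like this.

```cpp
#include <cassert>

// One instruction variant: exchange (X), unsigned element type, and
// accumulate (A) are modelled as independent flags.
struct ReductionVariant {
  bool Exchange;
  bool Unsigned;
  bool Accumulate;
};

// The exchange variants exist only for signed types; accumulation does
// not affect validity.
constexpr bool isValidVariant(const ReductionVariant &V) {
  return !(V.Exchange && V.Unsigned);
}

// Guard for code that should only ever see variants that exist.
inline void requireValid(const ReductionVariant &V) {
  assert(isValidVariant(V) && "exchange variants do not take unsigned types");
}

// Counts how many of the 2 * 2 * 2 flag combinations survive the rule.
inline unsigned countValidVariants() {
  unsigned Count = 0;
  for (int X = 0; X < 2; ++X)
    for (int U = 0; U < 2; ++U)
      for (int A = 0; A < 2; ++A) {
        const ReductionVariant V = {X != 0, U != 0, A != 0};
        if (isValidVariant(V))
          ++Count;
      }
  return Count;
}
```

Of the eight combinations, the two with both exchange and unsigned set are removed, leaving six per element size under this model.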
Sam Parker	1ad508e8e2	[ARM][MVE] VCTP instruction selection Add codegen support for vctp{8,16,32}. Differential Revision: https://reviews.llvm.org/D67344 llvm-svn: 371395	2019-09-09 12:54:47 +00:00
David Green	d936a6301b	[ARM] Prevent generating NEON stack accesses under MVE. We should not be generating Neon stack loads/stores even for these large registers. No test here because my understanding is we will only generate these QQPR regs for intrinsics and VLDn's. The tests will follow once those are available. Differential revision: https://reviews.llvm.org/D67169 llvm-svn: 371386	2019-09-09 10:46:25 +00:00
Oliver Stannard	6b9aedaec6	[ARM][MVE] Decoding of uqrshl and sqrshl accepts unpredictable encodings Specify the Unpredictable bits, and return softfails when appropriate. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66939 llvm-svn: 371374	2019-09-09 08:50:28 +00:00
Sam Parker	c363deb575	[ARM][ParallelDSP] Fix for sext input The incoming accumulator value can be discovered through a sext, in which case there will be a mismatch between the input and the result. So sign extend the accumulator input if we're performing a 64-bit mac. Differential Revision: https://reviews.llvm.org/D67220 llvm-svn: 371370	2019-09-09 08:39:14 +00:00
David Green	df2501adca	[ARM] Remove declaration of unimplemented function. NFC. llvm-svn: 371331	2019-09-08 13:13:15 +00:00
Simon Pilgrim	d7d8bb937a	Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. llvm-svn: 371302	2019-09-07 11:04:04 +00:00
Teresa Johnson	9c27b59cec	Change TargetLibraryInfo analysis passes to always require Function Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example. This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration. Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works. There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome. Reviewers: chandlerc, hfinkel Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66428 llvm-svn: 371284	2019-09-07 03:09:36 +00:00
Oliver Cruickshank	a050307c05	[ARM] Add patterns for VSUB with q and r registers Added patterns for VSUB to support q and r registers, which reduces pressure on q registers. llvm-svn: 371231	2019-09-06 17:02:42 +00:00
Oliver Cruickshank	3aed95af4e	[ARM] Add patterns for VADD with q and r registers Added support for VADD to use q and r registers, which reduces pressure on q registers. llvm-svn: 371230	2019-09-06 17:02:35 +00:00
Oliver Cruickshank	9bf27928e1	[ARM] Add patterns for VMUL with q and r registers Added support for VMUL to use an r register, this reduces pressure on the q registers. llvm-svn: 371229	2019-09-06 17:02:21 +00:00
Sam Tebbs	f1cdd95a2f	[ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers. Loop tests have been added to the vmla test file to make sure vmlas are generated in loops. Differential revision: https://reviews.llvm.org/D66295 llvm-svn: 371218	2019-09-06 16:01:32 +00:00
Guillaume Chatelet	9fcf066d0c	[Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210	2019-09-06 14:51:15 +00:00
Guillaume Chatelet	4fc3ad9e13	[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67229 llvm-svn: 371200	2019-09-06 12:48:34 +00:00
Sam Parker	312409e464	[ARM] MVE Tail Predication The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail predication' which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication are described in Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points being: - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not. - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved. - For vector store instructions, whether the store occurs or not. - For vector load instructions, whether the value is loaded or whether zeros are written to that element of the destination register. This patch implements a pass that takes a hardware loop, containing masked vector instructions, and converts it into something that resembles an MVE tail predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then set up the value of VPR.P0. The loads and stores would be placed in VPT blocks so this is not tail predication, but normal VPT predication with the predicate based upon an element counting induction variable. Further work needs to be done to finally produce a true tail predicated loop. Because only the loads and stores are predicated, in both the LLVM IR and MIR level, we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too. Another restriction, specific to MVE, is that all the vector instructions need to operate on the same number of elements. This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead.
Differential Revision: https://reviews.llvm.org/D65884 llvm-svn: 371179	2019-09-06 08:24:41 +00:00
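To make the idea of an element-counting predicate concrete, here is a small Python model of a predicated loop. It is only a sketch: `vctp` and `predicated_add` are hypothetical helper names, the model ignores byte-level predication, and it treats an inactive lane simply as "no store happens", which matches the store behaviour listed in the commit message.

```python
def vctp(remaining, lanes):
    """Model of an element-counting predicate: lane i is active if i < remaining."""
    return [i < remaining for i in range(lanes)]


def predicated_add(a, b, lanes=4):
    """Add two equal-length lists `lanes` elements at a time, with no scalar tail loop."""
    out = [0] * len(a)
    remaining = len(a)
    base = 0
    while remaining > 0:
        mask = vctp(remaining, lanes)      # predicate for this iteration
        for lane, active in enumerate(mask):
            if active:                     # inactive lanes perform no access
                out[base + lane] = a[base + lane] + b[base + lane]
        base += lanes
        remaining -= lanes
    return out
```

With six elements and four lanes, the second iteration runs with the mask `[True, True, False, False]`, so the last two lanes are skipped rather than handled by a separate remainder loop.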
David Candler	a59bffb576	[ARM] Add support for the s,j,x,N,O inline asm constraints A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang: Target independent constraints: s: An integer constant, but allowing only relocatable values ARM specific constraints: j: An immediate integer between 0 and 65535 (valid for MOVW) x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3 N: An immediate integer between 0 and 31 (Thumb1 only) O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only) This patch adds support to Clang for the missing constraints along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but just a couple of minor corrections to checks (V8M Baseline includes MOVW so should work with 'j', 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation. Differential Revision: https://reviews.llvm.org/D65863 Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a llvm-svn: 371079	2019-09-05 15:17:25 +00:00
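The immediate ranges quoted in the commit above can be written down as a small checker. This Python sketch is illustrative only: `fits_constraint` is a hypothetical name, it covers just the three immediate constraints whose ranges the message states, and it does not model the target or Thumb-mode checks that the patch also adds.

```python
def fits_constraint(constraint, value):
    """Return True if `value` is within the immediate range quoted for `constraint`."""
    if constraint == "j":   # 0..65535, valid for MOVW
        return 0 <= value <= 65535
    if constraint == "N":   # 0..31 (Thumb1 only)
        return 0 <= value <= 31
    if constraint == "O":   # multiple of 4 between -508 and 508 (Thumb1 only)
        return -508 <= value <= 508 and value % 4 == 0
    raise ValueError("constraint not covered by this sketch: " + constraint)
```

For example, `fits_constraint("O", -508)` holds, while `510` fails because it is not a multiple of 4 and `512` fails because it is out of range.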
David Green	83a3341246	[ARM] Fixup the creation of VPT blocks This attempts to just fix the creation of VPT blocks, fixing up the iterating, which instructions are considered in the bundle, and making sure that we do not overrun the end of the block. Differential Revision: https://reviews.llvm.org/D67219 llvm-svn: 371064	2019-09-05 13:37:04 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that take or return alignment as log2; this will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were dealing with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation. - `MachineFunction` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation. Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
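The confusion this commit addresses is easy to reproduce: the same integer means very different things depending on whether it is read as a byte alignment or as its base-2 logarithm. The following Python sketch uses hypothetical helper names and is not LLVM's API.

```python
def log2_to_alignment(log_align):
    """Convert log2(alignment) to the alignment in bytes."""
    if log_align < 0:
        raise ValueError("log2 alignment must be non-negative")
    return 1 << log_align


def alignment_to_log2(alignment):
    """Convert a power-of-two alignment in bytes to its log2."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return alignment.bit_length() - 1
```

A flag value of 4 read as a power-of-two alignment means 4 bytes, but read as log2(align) it means `log2_to_alignment(4)`, which is 16 bytes.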
Sam Parker	fea532230b	[ARM][ParallelDSP] SExt mul for accumulation For any unpaired muls, we accumulate them as an input to the reduction. Check the type of the mul and perform a sext if the existing accumulator input type is not the same. Differential Revision: https://reviews.llvm.org/D66993 llvm-svn: 370851	2019-09-04 08:41:34 +00:00
Amara Emerson	fbaf425b79	[GlobalISel][CallLowering] Add support for splitting types according to calling conventions. On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting for return types not yet implemented. Differential Revision: https://reviews.llvm.org/D66180 llvm-svn: 370822	2019-09-03 21:42:28 +00:00
Reid Kleckner	b2d10cf22e	[MC] Pass through .code16/32/64 and .syntax unified for COFF These flags should simply be passed through to the target, which will do the right thing. Add an MC/X86 test that uses these directives with the three primary object file formats and shows that they disassemble the same everywhere. There is a missing test for .code32 on Windows ARM, since I'm not sure exactly how to construct one. Fixes PR43203 llvm-svn: 370805	2019-09-03 18:16:52 +00:00
David Green	2f3574c168	[ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands The code here seems to date back to r134705, when tablegen lowering was first being added. I don't believe that we need to include CPSR implicit operands on the MCInst. This now works more like other backends (like AArch64), where all implicit registers are skipped. This allows the AliasInst for CSEL's to match correctly, as can be seen in the test changes. Differential revision: https://reviews.llvm.org/D66703 llvm-svn: 370745	2019-09-03 11:30:54 +00:00
David Green	61973d978b	[ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can also be used in ISel Lowering, adding codesize values to the computed costs, to be able to compare either approximate instruction counts or codesize costs. It also adds a HasLowerConstantMaterializationCost, which compares the ConstantMaterializationCost of two values, returning true if the first is smaller either in instruction count/codesize, or falling back to the other in the case that they are equal. This is used in constant CSEL lowering to invert the predicate if the opposite is easier to materialise. Differential revision: https://reviews.llvm.org/D66701 llvm-svn: 370741	2019-09-03 11:06:24 +00:00
David Green	57cc65ff47	[ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value. This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used. Code by Ranjeet Singh and Simon Tatham, with some modifications from me. Differential revision: https://reviews.llvm.org/D66483 llvm-svn: 370739	2019-09-03 10:53:07 +00:00
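As described in the commit above, each of these instructions returns the first value when the condition holds and a transformed copy of the second ("false") value otherwise. A minimal Python model, assuming 32-bit registers with wraparound and using hypothetical function names, looks like this:

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def csinc(cond, true_val, false_val):
    """Select true_val, or false_val + 1 when the condition is false."""
    return (true_val if cond else false_val + 1) & MASK32


def csinv(cond, true_val, false_val):
    """Select true_val, or the bitwise inverse of false_val."""
    return (true_val if cond else ~false_val) & MASK32


def csneg(cond, true_val, false_val):
    """Select true_val, or the two's-complement negation of false_val."""
    return (true_val if cond else -false_val) & MASK32
```

This is why constants such as 0 and 1, or 0 and -1, can be selected from a single source register: `csinc(cond, 0, 0)` yields 0 or 1, and `csinv(cond, 0, 0)` yields 0 or `0xFFFFFFFF`.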
David Green	3e8d5f335d	[ARM] Fix MVE ldst offset ranges We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to fold into MVE loads/stores. The instructions actually take a 7 bit unsigned integer which is either added or subtracted. So something more like isShiftedUInt<7, Shift>(abs(RHSC)). Instead I've changed this to use the isScaledConstantInRange method, same as in SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be getting this correct. Differential revision: https://reviews.llvm.org/D66997 llvm-svn: 370731	2019-09-03 09:57:02 +00:00
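The difference between the two range checks mentioned in the commit above can be seen with a small Python model. The helpers below mirror the meaning of LLVM's `isShiftedInt<N, S>` and `isShiftedUInt<N, S>` templates (an N-bit signed or unsigned value shifted left by S bits); the Python names and the `in_mve_offset_range` wrapper are hypothetical, and the model does not reproduce `isScaledConstantInRange` itself.

```python
def is_shifted_int(n, shift, x):
    """True if x is a signed n-bit value shifted left by `shift` bits."""
    scale = 1 << shift
    if x % scale != 0:
        return False
    return -(1 << (n - 1)) <= x // scale < (1 << (n - 1))


def is_shifted_uint(n, shift, x):
    """True if x is an unsigned n-bit value shifted left by `shift` bits."""
    scale = 1 << shift
    if x < 0 or x % scale != 0:
        return False
    return x // scale < (1 << n)


def in_mve_offset_range(shift, offset):
    """7-bit unsigned magnitude that is either added or subtracted."""
    return is_shifted_uint(7, shift, abs(offset))
```

For word accesses (shift of 2) the signed check accepts offsets from -256 to 252, whereas the magnitude-based check accepts -508 to 508, so offsets such as 508 were previously rejected.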
Oliver Stannard	3be2df2418	[ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set. Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g. Fill in the "should-be-(0)" bits. Designate the Unpredictable{} bits for both VMRS and VMSR. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66938 llvm-svn: 370729	2019-09-03 09:55:30 +00:00
Oliver Stannard	39bf484d92	Bug fix on function epilog optimization (ARM backend) To save an 'add sp,#val' instruction by adding registers to the final pop instruction, the first register transferred by this pop instruction needs to be found. If the function to be optimized has a non-void return value, the operand list contains r0 (implicit), which prevents the optimization from taking place. Therefore implicit register references should be skipped in the search loop, because these registers are never popped from the stack. Patch by Rainer Herbertz (rOptimizer)! Differential revision: https://reviews.llvm.org/D66730 llvm-svn: 370728	2019-09-03 09:51:19 +00:00
Sam Tebbs	8b2df85d02	[ARM] Select vmla This patch adds vmla selection. Differential revision: https://reviews.llvm.org/D66297 llvm-svn: 370704	2019-09-03 08:17:46 +00:00
David Green	a95ec59fa5	[ARM] Use MQPR not QPR for MVE registers We should be using MQPR, and if we don't we can get COPYs and PHIs created for QPR. These get folded into instructions, failing verification checks. Differential revision: https://reviews.llvm.org/D66214 llvm-svn: 370676	2019-09-02 17:18:23 +00:00
David Green	8469a39af3	[ARM] Remove MVE masked loads/stores These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues. This reverts r370329. llvm-svn: 370607	2019-09-01 10:11:40 +00:00
David Green	942c2e3795	[ARM] MVE Masked loads and stores Masked loads and stores fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. We also need to do something with unaligned loads/stores. Currently this uses a similar method to that used in big endian, using a VLDRB.8 (and potentially a VREV in BE). This does mean that the predicate mask is converted from, for example, a v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of the relevant mask lane, so this could potentially load different results if the predicate is a little odd. As the input is a v4i1 however, I believe this is OK and all the bits required should be set in the predicate, making the VLDRB.8 load the same data. Differential Revision: https://reviews.llvm.org/D66534 llvm-svn: 370329	2019-08-29 10:54:35 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixes the issue that RV64 didn't clear the upper bits when returning a complex floating value with the lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult | (ImageResult << 32)) The patch introduces the shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
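The packing step in the commit above only works if the upper 32 bits of the real part are actually zero before the OR. The Python sketch below illustrates that point with hypothetical helper names; it models the 64-bit return register as a plain integer and is not a description of the RISC-V backend itself.

```python
import struct

MASK32 = 0xFFFFFFFF


def float_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def pack_complex(real_reg, imag_reg, clear_upper=True):
    """Pack two 64-bit register values holding floats into one 64-bit result.

    real_reg may carry arbitrary upper bits; clear_upper models the zero_extend.
    """
    if clear_upper:
        real_reg &= MASK32
    return (real_reg | ((imag_reg & MASK32) << 32)) & 0xFFFFFFFFFFFFFFFF
```

If the register holding the real part has stray upper bits, for example `0xFFFFFFFF00000000 | float_bits(1.0)`, packing without clearing them corrupts the imaginary half, which is the kind of error the zero extension is there to prevent.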
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing it. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
David Green	379f6186dd	[ARM] Move MVEVPTBlockPass to a separate file. NFC This just pulls the MVEVPTBlockPass into a separate file, as opposed to being wrapped up in Thumb2ITBlockPass. Differential revision: https://reviews.llvm.org/D66579 llvm-svn: 370187	2019-08-28 11:37:31 +00:00
David Green	1c5b143c99	[MVE] VMOVX patterns This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some adjustments for MVE. It allows us to move fp16 registers without going into and out of gprs. VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits of another register, zeroing the rest. This can be used for odd MVE register lanes. The top bits are not read by fp16 instructions, so no move is required there if we are dealing with even lanes. Differential revision: https://reviews.llvm.org/D66793 llvm-svn: 370184	2019-08-28 10:13:23 +00:00
Sam Parker	a761ba0f2d	[ARM][ParallelDSP] Change search for muls rL369567 reverted a couple of recent changes made to ARMParallelDSP because of a miscompilation error: PR43073. The issue stemmed from an underlying bug that was caused by adding muls into a reduction before it was proved that they could be executed in parallel with another mul. Most of the changes here are from the previously reverted commits. The additional changes that have been made are: 1) The Search function now doesn't insert any muls into the Reduction object. That now happens once the search has successfully finished. 2) For any muls added into the reduction but that weren't paired, we accumulate their values as an input into the smlad. Differential Revision: https://reviews.llvm.org/D66660 llvm-svn: 370171	2019-08-28 08:51:13 +00:00
Sam Clegg	90b6bb75e8	[MC] Minor cleanup to MCFixup::Kind handling. NFC. Prefer `MCFixupKind` where possible and add getTargetKind() to convert to `unsigned` when needed rather than scattering cast operators around the place. Differential Revision: https://reviews.llvm.org/D59890 llvm-svn: 369720	2019-08-23 01:00:55 +00:00
Sam Tebbs	a69d9d6156	Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the changes from the above patch. This reverts commit `cd53ff6`, reapplying `7c6b229`. llvm-svn: 369638	2019-08-22 10:29:20 +00:00
Hans Wennborg	cd53ff6c0d	Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32" It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/ > This patch fixes shifts by a 128/256 bit shift amount. It also fixes > codegen for shifts of 32 by delegating to LLVM's default optimisation > instead of emitting a long shift. > > Tests that used to generate long shifts of 32 are updated to check for the > more optimised codegen. > > Differential revision: https://reviews.llvm.org/D66519 > > llvm-svn: 369626 llvm-svn: 369636	2019-08-22 09:16:53 +00:00
Sam Tebbs	7c6b229204	[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 This patch fixes shifts by a 128/256 bit shift amount. It also fixes codegen for shifts of 32 by delegating to LLVM's default optimisation instead of emitting a long shift. Tests that used to generate long shifts of 32 are updated to check for the more optimised codegen. Differential revision: https://reviews.llvm.org/D66519 llvm-svn: 369626	2019-08-22 08:12:06 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces the MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contains argument flags which will be passed to the makeLibCall function. The patch should not have any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Nico Weber	ed18e70c86	Revert r367389 (and follow-up r368404); it caused PR43073. llvm-svn: 369567	2019-08-21 19:53:42 +00:00
David Green	717feabdf0	[ARM] Formatting for ARMInstrMVE.td. NFC This is just some formatting cleanup, prior to the masked load and store patch in D66534. llvm-svn: 369545	2019-08-21 16:20:35 +00:00
Sam Tebbs	dcfc2d40d3	[ARM] Select vaddva This patch adds vaddva selection. Differential revision: https://reviews.llvm.org/D66410 llvm-svn: 369404	2019-08-20 16:33:34 +00:00
Sam Tebbs	f312c1ecf4	[ARM] Add support for MVE vaddv This patch adds vecreduce_add and the relevant instruction selection for vaddv. Differential revision: https://reviews.llvm.org/D66085 llvm-svn: 369245	2019-08-19 09:38:28 +00:00
David Green	2bfc13fde1	[ARM] MVE sext costs This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244	2019-08-19 09:13:22 +00:00
Jian Cai	16fa8b0970	Reland "[ARM] push LR before __gnu_mcount_nc" This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173	2019-08-16 23:30:16 +00:00
Eli Friedman	eaff844fe9	[ARM] Preserve liveness in ARMConstantIslands. We currently don't use liveness information after this point, but it can be useful to catch bugs using -verify-machineinstrs, and optimizations could potentially use this information in the future. Differential Revision: https://reviews.llvm.org/D66319 llvm-svn: 369162	2019-08-16 22:20:14 +00:00
Jian Cai	2d957cfe02	Revert "[ARM] push LR before __gnu_mcount_nc" This reverts commit `f4cf3b9593`. llvm-svn: 369149	2019-08-16 20:40:21 +00:00
Jian Cai	f4cf3b9593	[ARM] push LR before __gnu_mcount_nc Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147	2019-08-16 20:21:08 +00:00
David Green	b782e61e47	[ARM] MVE sext of a load is free MVE also has some sext of loads, which will be free just as scalar instructions are. Differential Revision: https://reviews.llvm.org/D66008 llvm-svn: 369118	2019-08-16 15:13:37 +00:00
David Green	6e1ac42474	[ARM] Correct register for narrowing and widening MVE loads and stores. The widening and narrowing MVE instructions like VLDRH.32 are only permitted to use low tGPR registers. This means that if they are used for a stack slot, where the register used is only decided during frame setup, we need to be able to correctly pick a thumb1 register over a normal GPR. This attempts to add the required logic into eliminateFrameIndex and rewriteT2FrameIndex, only picking the FrameReg if it is a valid register for the operands register class, and picking a valid scratch register for the register class. Differential Revision: https://reviews.llvm.org/D66285 llvm-svn: 369108	2019-08-16 13:42:39 +00:00
David Green	8c2c5f5045	[ARM] Don't pretend we know how to generate MVE VLDn We don't yet know how to generate these instructions for MVE. And in the case of VLD3, we don't even have the instruction. For the moment don't tell the vectoriser that we have VLD4, just to end up serialising the results. Differential Revision: https://reviews.llvm.org/D66009 llvm-svn: 369101	2019-08-16 13:06:49 +00:00
Eli Friedman	9b9a308452	[ARM][LowOverheadLoops] Fix generated code for "revert". Two issues: 1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't really have any visible effect at the moment, but it might matter in the future. 2. The t2CMPri generated for t2WhileLoopStart might need to use a register that isn't LR. My team found this because we have a patch to track register liveness late in the pass pipeline. I'll look into upstreaming it to help catch issues like this earlier. Differential Revision: https://reviews.llvm.org/D66243 llvm-svn: 369069	2019-08-15 23:35:53 +00:00
Philip Reames	5c38ca3534	[SDAG] Minor code cleanup/standardization of atomic accessors [NFC] llvm-svn: 369057	2019-08-15 22:21:14 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. 
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
David Green	3a99101812	[ARM] Fix alignment checks for BE VLDRH We need to allow any alignment at least 2, not just exactly 2, so that the big endian loads and stores can be selected successfully. I've also added extra BE testing for the load and store tests. Thanks to Oliver for the report. Differential Revision: https://reviews.llvm.org/D66222 llvm-svn: 368996	2019-08-15 12:54:47 +00:00
David Green	0ff2296a49	[ARM] MVE predicate store patterns Stack loads and stores were already working, but direct stores were not. This adds the patterns for them, same as predicate loads. Differential Revision: https://reviews.llvm.org/D66213 llvm-svn: 368988	2019-08-15 10:41:42 +00:00
David Green	04f2f32869	[ARM] MVE trunc to i1 vectors This adds patterns for selecting trunc instructions from full vectors to i1's vectors. Differential Revision: https://reviews.llvm.org/D66201 llvm-svn: 368981	2019-08-15 09:26:51 +00:00
David Green	a655393f17	[ARM] Add MVE beats vector cost model The MVE architecture has the idea of "beats", where a vector instruction can be executed over several ticks of the architecture. This adds a similar system into the Arm backend cost model, multiplying the cost of all vector instructions by a factor. This factor essentially becomes the expected difference between scalar code and vector code, on average. MVE Vector instructions can also overlap so the true cost of them is often lower. But equally scalar instructions can in some situations be dual issued, or have other optimisations such as unrolling or make use of dsp instructions. The default is chosen as 2. This should not prevent vectorisation in most cases (as the vector instructions will still be doing at least 4 times the work), but it will help prevent over vectorising in cases where the benefits are less likely. This adds things so far to the obvious places in ARMTargetTransformInfo, and updates a few related costs like not treating float instructions as cost 2 just because they are floats. Differential Revision: https://reviews.llvm.org/D66005 llvm-svn: 368733	2019-08-13 18:12:08 +00:00
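The effect of the cost factor described above can be sketched numerically. The Python below is a toy model with hypothetical names, not the code in ARMTargetTransformInfo: it scales vector instruction costs by the factor (default 2, as stated in the commit message) and compares the result against doing the same work with scalar instructions.

```python
MVE_VECTOR_COST_FACTOR = 2  # default value quoted in the commit message


def instruction_cost(base_cost, is_vector, factor=MVE_VECTOR_COST_FACTOR):
    """Scale the cost of vector instructions by the beat factor."""
    return base_cost * factor if is_vector else base_cost


def vectorisation_profitable(scalar_cost, vector_cost, lanes,
                             factor=MVE_VECTOR_COST_FACTOR):
    """Compare one scaled vector op against `lanes` scalar ops (toy comparison)."""
    return instruction_cost(vector_cost, True, factor) < scalar_cost * lanes
```

With four lanes a unit-cost vector instruction is charged 2 against 4 for the scalar equivalent, so vectorisation still wins; with only two lanes the costs tie and the toy model no longer reports a benefit.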
Momchil Velikov	114c37e72a	[ARM] Fix detection of duplicates when parsing reg list operands Differential Revision: https://reviews.llvm.org/D65957 llvm-svn: 368712	2019-08-13 16:13:00 +00:00
Momchil Velikov	f990e4a4c7	[ARM] Fix encoding of APSR in CLRM instruction The APSR is encoded by setting bit 15 in the register list of the CLRM instruction (cf. https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf). Differential Revision: https://reviews.llvm.org/D65873 llvm-svn: 368711	2019-08-13 16:12:46 +00:00
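A register list of this kind is a bitmask, and the commit above says APSR is selected by bit 15. The Python sketch below builds such a mask. Only the APSR bit comes from the commit message; the mapping of `rN` to bit N and `lr` to bit 14 is an assumption based on the usual Thumb register-list convention, and `encode_clrm_reglist` is a hypothetical name. Rejecting duplicates is likewise a choice of this sketch, not a statement about the assembler.

```python
APSR_BIT = 15  # from the commit message
LR_BIT = 14    # assumption: lr is register 14


def encode_clrm_reglist(regs):
    """Build a 16-bit register-list mask from names like 'r0', 'lr', 'apsr'."""
    mask = 0
    for name in regs:
        reg = name.lower()
        if reg == "apsr":
            bit = APSR_BIT
        elif reg == "lr":
            bit = LR_BIT
        elif reg.startswith("r") and reg[1:].isdigit() and int(reg[1:]) <= 12:
            bit = int(reg[1:])  # assumption: rN maps to bit N
        else:
            raise ValueError("register not handled by this sketch: " + name)
        if mask & (1 << bit):
            raise ValueError("duplicate register in list: " + name)
        mask |= 1 << bit
    return mask
```

Under these assumptions the list `{r0, r1, apsr}` encodes as `0x8003`.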
Matt Arsenault	5af9cf042f	GlobalISel: Change representation of shuffle masks Currently shufflemasks get emitted as any other constant, and you end up with a bunch of virtual registers of G_CONSTANT with a G_BUILD_VECTOR. The AArch64 selector then asserts on anything that doesn't fit this pattern. This isn't an ideal representation, and should avoid legalization and have fewer opportunities for a representational error. Rather than invent a new shuffle mask operand type, similar to what ShuffleVectorSDNode does, just track the original IR Constant mask operand. I don't completely like the idea of adding another link to the IR, but MIR is already quite dependent on IR constants already, and this will allow sharing the shuffle mask utility functions with the IR. llvm-svn: 368704	2019-08-13 15:34:38 +00:00
Amara Emerson	e14c91b71a	[GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained. Currently we can't keep any state in the selector object that we get from subtarget. As a result we have to plumb through all our variables through multiple functions. This change makes it non-const and adds a virtual init() method to allow further state to be captured for each target. AArch64 makes use of this in this patch to cache a call to hasFnAttribute() which is expensive to call, and is used on each selection of G_BRCOND. Differential Revision: https://reviews.llvm.org/D65984 llvm-svn: 368652	2019-08-13 06:26:59 +00:00
David Green	86876422ef	[ARM] sext of a load is free This teaches the cost model that the sext or zext of a load is going to be free. Differential Revision: https://reviews.llvm.org/D66006 llvm-svn: 368593	2019-08-12 17:39:56 +00:00
David Green	3e39f39ad9	[ARM] MVE shuffle broadcast costs A VDUP will perform a vector broadcast in a single instruction. Update the cost model for MVE accordingly. Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63448 llvm-svn: 368589	2019-08-12 16:54:07 +00:00
David Green	83bbfaa5e4	[ARM] Put some of the TTI costmodel behind hasNeon calls. This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon() checks, now that we have MVE, and updates all the tests accordingly. Differential Revision: https://reviews.llvm.org/D63447 llvm-svn: 368587	2019-08-12 15:59:52 +00:00
David Green	11c4602fce	[MVE] Don't try to unroll vectorised MVE loops Due to the nature of the beat system in the MVE architecture, along with tail predication and low-overhead loops, unrolling has less benefit compared to normal loops. You can not, for example, hide the latency of a load with other instructions as you can for scalar code. Preventing unrolling also makes the code easier to read and reason about. So if a loop contains vector code, don't enable the runtime unrolling. At least for the time being. Differential Revision: https://reviews.llvm.org/D65803 llvm-svn: 368530	2019-08-11 08:53:18 +00:00
David Green	44f8d635e2	[ARM] Permit auto-vectorization using MVE With enough codegen complete, we can now correctly report the number and size of vector registers for MVE, allowing auto vectorisation. This also allows FP auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE FP arithmetic and parity between scalar and vector FP behaviour. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63728 llvm-svn: 368529	2019-08-11 08:42:57 +00:00
Daniel Sanders	e9a57c2b23	[globalisel] Add G_SEXT_INREG Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. 
It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487	2019-08-09 21:11:20 +00:00
Tim Northover	e1a5f668b3	GlobalISel: pack various parameters for lowerCall into a struct. I've now needed to add an extra parameter to this call twice recently. Not only is the signature getting extremely unwieldy, but just updating all of the callsites and implementations is a pain. Putting the parameters in a struct sidesteps both issues. llvm-svn: 368408	2019-08-09 08:26:38 +00:00
Sam Parker	0dba791a25	[ARM][ParallelDSP] Replace SExt uses As loads are combined and widened, we replaced their sext users operands whereas we should have been replacing the uses of the sext. I've added a load of tests, with only a few of them originally causing assertion failures, the rest improve pattern coverage. Differential Revision: https://reviews.llvm.org/D65740 llvm-svn: 368404	2019-08-09 07:48:50 +00:00
David Green	27ca82f32a	[ARM] Add support for MVE pre and post inc loads and stores This adds pre- and post- increment and decrements for MVE loads and stores. It uses the builtin pre and post load/store detection, unlike Neon. Loads are selected with the code in tryT2IndexedLoad, stores are selected with tablegen patterns. The immediates have a +/-7bit range, multiplied by the size of the element. Differential Revision: https://reviews.llvm.org/D63840 llvm-svn: 368305	2019-08-08 15:27:58 +00:00
David Green	824ffd8b12	[ARM] MVE big endian loads/stores This adds some missing patterns for big endian loads/stores, allowing unaligned loads/stores to also be selected with an extra VREV, which produces better code than aligning through a stack. Also moves VLDR_P0 to not be LE only, and adjusts some of the tests to show all that working. Differential Revision: https://reviews.llvm.org/D65583 llvm-svn: 368304	2019-08-08 15:15:19 +00:00
Sam Tebbs	7ca980edcd	[ARM] Select VFMA llvm-svn: 368264	2019-08-08 08:21:01 +00:00
David Green	1becefd3f7	[ARM] Tighten up VLDRH.32 with low alignments VLDRH needs to have an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixed some incorrect shift amounts, which seemed to be passing a multiple not a shift. Differential Revision: https://reviews.llvm.org/D65580 llvm-svn: 368256	2019-08-08 06:22:03 +00:00
Oliver Cruickshank	4d4eefda6c	[ARM] Expand CTPOP intrinsic for MVE llvm-svn: 368180	2019-08-07 15:47:45 +00:00
Oliver Cruickshank	30dcae0956	[ARM] Generate MVE VHADDs/VHSUBs llvm-svn: 368146	2019-08-07 10:26:57 +00:00
Sam Parker	173de03740	[ARM][LowOverheadLoops] Revert after read/write Currently we check whether LR is stored/loaded to/from in between the loop decrement and loop end pseudo instructions. There are two problems here: - It relies on all load/store instructions being labelled as such in tablegen. - Actually any use of loop decrement is troublesome because the value doesn't exist! So we need to check for any read/write of LR that occurs between the two instructions and revert if we find anything. Differential Revision: https://reviews.llvm.org/D65792 llvm-svn: 368130	2019-08-07 07:39:19 +00:00
Matt Arsenault	f4d3113a5f	CodeGen: Migration to using Register llvm-svn: 367974	2019-08-06 03:59:31 +00:00
Amara Emerson	bc1172df14	[GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler() Previous name and comment incorrectly implied it was just for formal arg handlers, which is not true. llvm-svn: 367945	2019-08-05 23:05:28 +00:00
Matt Arsenault	3922392969	AMDGPU: Correct behavior of f16 buffer loads Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879	2019-08-05 15:59:07 +00:00
Guillaume Chatelet	c97a3d15d2	[LLVM][Alignment] Introduce Alignment Type Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828	2019-08-05 11:02:05 +00:00
Oliver Stannard	8ed8353fc4	Reland: Fix and test inter-procedural register allocation for ARM Add an explicit construction of the ArrayRef, gcc 5 and earlier don't seem to select the ArrayRef constructor which takes a C array when the construction is implicit. Original commit message: - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the actual code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367819	2019-08-05 09:04:10 +00:00
David Green	91296295d0	[ARM] MVE big endian bitcasts This adds big endian MVE patterns for bitcasts. They are defined in llvm as being the same as a store of the existing type and the load into the new. This means that they have to become a VREV between the two types, working in the same way that NEON works in big-endian. This also adds some example tests for bigendian, showing where code is and isn't different. The main difference, especially from a testing perspective is that vectors are passed as v2f64, and so are VREV into and out of call arguments, and the parameters are passed in a v2f64 format. Same happens for inline assembly where the register class is used, so it is VREV to a v16i8. So some of this is probably not correct yet, but it is (mostly) self-consistent and seems to be consistent with how llvm treats vectors. The rest we can hopefully fix later. More details about big endian neon can be found in https://llvm.org/docs/BigEndianNEON.html. Differential Revision: https://reviews.llvm.org/D65581 llvm-svn: 367780	2019-08-04 10:18:15 +00:00
Nikita Popov	4f8259bdbc	[Thumb] Fix invalid symbol redefinition due to duplicated jumptable (PR42760) Fix for https://bugs.llvm.org/show_bug.cgi?id=42760. A tBR_JTr instruction is duplicated by tail duplication, which results in the same jumptable with the same label being emitted twice. Fix this by marking tBR_JTr as not duplicable. The corresponding ARM/Thumb instructions are already marked as not duplicable. Additionally also mark tTBB_JT and tTBH_JT to be consistent with Thumb2, even though this shouldn't be strictly necessary. Differential Revision: https://reviews.llvm.org/D65606 llvm-svn: 367753	2019-08-03 06:47:23 +00:00
Bill Wendling	41a2847a9a	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750	2019-08-03 05:52:47 +00:00
Douglas Yung	42618b270d	Revert Fix and test inter-procedural register allocation for ARM This reverts r367669 (git commit `f6b00c279a`) This was breaking a build bot http://lab.llvm.org:8011/builders/netbsd-amd64/builds/21233 llvm-svn: 367731	2019-08-02 22:11:49 +00:00
Tim Northover	522fb7eedc	GlobalISel: support swiftself attribute llvm-svn: 367683	2019-08-02 14:09:49 +00:00
Oliver Stannard	4b7239ebac	[IPRA][ARM] Disable no-CSR optimisation for ARM This optimisation isn't generally profitable for ARM, because we can save/restore many registers in the prologue and epilogue using the PUSH and POP instructions, but mostly use individual LDR/STR instructions for other spills. Differential revision: https://reviews.llvm.org/D64910 llvm-svn: 367670	2019-08-02 10:23:17 +00:00
Oliver Stannard	f6b00c279a	Fix and test inter-procedural register allocation for ARM - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the actual code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367669	2019-08-02 10:23:05 +00:00
Sam Parker	cd38599275	[NFC][ARM[ParallelDSP] Rename/remove/change types Remove forward declaration, fold a couple of typedefs and change one to be more useful. llvm-svn: 367665	2019-08-02 08:21:17 +00:00
Sam Parker	14c6dfdfe2	[NFC][ARM][ParallelDSP] Remove ValueList We only care about the first element in the list. llvm-svn: 367660	2019-08-02 07:32:28 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
David Green	1343814fb4	[ARM] Fix for MVE VREV64 The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the cross-beat nature of the instruction. This adds an earlyclobber to Qd, which seems to be the same way we deal with this on other instructions like the write-back on loads and stores. Differential Revision: https://reviews.llvm.org/D65502 llvm-svn: 367544	2019-08-01 11:22:03 +00:00
Sam Parker	7ca8c6f6db	[NFC][ARM][ParallelDSP] Getters and renaming Add a couple of getters for Reduction and do some renaming of variables around CreateSMLAD for clarity. llvm-svn: 367522	2019-08-01 08:17:51 +00:00
Eli Friedman	89b80f1239	[ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1. This is extremely specific, but saves three instructions when it's legal. I don't think the code can be usefully generalized. Differential Revision: https://reviews.llvm.org/D65351 llvm-svn: 367492	2019-07-31 23:19:21 +00:00
Eli Friedman	2f45ec1c39	[ARM] Transform compare of masked value to shift on Thumb1. Thumb1 has very limited immediate modes, so turning an "and" into a shift can save multiple instructions. It's possible to simplify the generated code for test2 and test3 in cmp-and-fold.ll a little more, but I'll implement that as a followup. Differential Revision: https://reviews.llvm.org/D65175 llvm-svn: 367491	2019-07-31 23:17:34 +00:00
Mark Lacey	641ea2e701	[GISel] Address review feedback on passing MD_callees to lowerCall. Preserve the nullptr default for KnownCallees that appears in the base class. llvm-svn: 367477	2019-07-31 20:34:05 +00:00
Mark Lacey	7b8d3eb9e2	[GISel] Pass MD_callees metadata down in call lowering. Summary: This will make it possible to improve IPRA by taking into account register usage in indirect calls. NFC yet; this is just laying the groundwork to start building up patches to take advantage of the information for improved register allocation. Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65488 llvm-svn: 367476	2019-07-31 20:34:02 +00:00
Mikhail Maltsev	806231ecc3	[ARM] Reject CSEL instructions with invalid operands Summary: According to the Armv8.1-M manual CSEL, CSINC, CSINV and CSNEG are "constrained unpredictable" when SP is used as the source register Rn. The assembler should diagnose this case. Reviewers: momchil.velikov, dmgreen, ostannard, simon_tatham, t.p.northover Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65505 llvm-svn: 367433	2019-07-31 14:22:45 +00:00
Oliver Cruickshank	09a1b8172b	[ARM] Generate MVE VFMAs llvm-svn: 367408	2019-07-31 10:44:11 +00:00
Oliver Cruickshank	e7241e8592	[NFC] Test Commit llvm-svn: 367405	2019-07-31 10:08:09 +00:00
Sam Parker	5ea07f7c07	[NFC][ARMCGP] Use switch in isSupportedValue Use a switch instead of many isa<> while checking for supported values. Also be explicit about which cast instructions are supported; This allows the removal of SIToFP from GenerateSignBits. llvm-svn: 367402	2019-07-31 09:34:11 +00:00
Sam Parker	2200a9bdf3	[ARM][ParallelDSP] Convert to function pass Run across a whole function, visiting each basic block one at a time. Differential Revision: https://reviews.llvm.org/D65324 llvm-svn: 367389	2019-07-31 07:32:03 +00:00
Sam Parker	e3a4a13fcc	[ARM][LowOverheadLoops] Enable by default The code is now in a good enough state to pass the bunch of tests that I have run (after fixing the bugs), so let's enable it by default. Differential Revision: https://reviews.llvm.org/D65277 llvm-svn: 367297	2019-07-30 08:14:28 +00:00
Sam Parker	ed2ea3e46b	[ARM][LowOverheadLoops] Revert non-header LE target Revert the hardware loop upon finding a LoopEnd that doesn't target the loop header, instead of asserting a failure. Differential Revision: https://reviews.llvm.org/D65268 llvm-svn: 367296	2019-07-30 08:08:44 +00:00
Sam Parker	414dd1c946	[NFC][ARM[ParallelDSP] Cleanup of BinOpChain - Remove some unused typedefs. - Rename BinOpChain struct to MulCandidate. - Remove the size method of MulCandidate. - Store only the first input of the ValueList provided to MulCandidate, as it's the only value we care about. This means we don't have to perform any ugly (and unnecessary) iterations of the list later on. llvm-svn: 367208	2019-07-29 08:41:51 +00:00
Sam Parker	8538060103	[NFC][ARM][ParallelDSP] Remove AreSymmetrical We explicitly search for a parallel mac and we only care about its inputs, checking for symmetry doesn't add anything here. llvm-svn: 367205	2019-07-29 08:12:24 +00:00
Sam Parker	11ad33ede6	[NFC][ARM][ParallelDSP] Remove PopulateLoads We no longer have to check what loads are used, all this is performed at the start of the transform, so it's not doing anything now. llvm-svn: 367204	2019-07-29 08:07:23 +00:00
David Green	b8b8b46a51	[ARM] MVE VPNOT This adds the patterns required to transform xor P0, -1 to a VPNOT. The instruction operands have to change a little for this, adding an in and an out VCCR reg and using a custom DecodeMVEVPNOT for the decode. Differential Revision: https://reviews.llvm.org/D65133 llvm-svn: 367192	2019-07-28 14:07:48 +00:00
David Green	9cf344e739	[ARM] Better patterns for fp <> predicate vectors These are some better patterns for converting between predicates and floating points. Much like the extends, we select "1"/"-1" or "0" depending on the predicate value. Or we perform a compare against 0 to convert to a predicate. Differential Revision: https://reviews.llvm.org/D65103 llvm-svn: 367191	2019-07-28 13:53:39 +00:00
Sam Parker	3da59e5513	[ARM][ParallelDSP] Combine structs Combine OpChain and BinOpChain structs as OpChain is a base class to BinOpChain that is never used. llvm-svn: 367114	2019-07-26 14:11:40 +00:00
Sam Parker	7440065bd8	[NFC][ARM][ParallelDSP] Cleanup isNarrowSequence Remove unused logic. llvm-svn: 367099	2019-07-26 10:57:42 +00:00
Sam Parker	c760b5da11	[ARM][LowOverheadLoops] Add CPSR defs Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair, so add an implicit def to these pseudo instructions in case that WLS and LE aren't generated. Differential Revision: https://reviews.llvm.org/D65275 llvm-svn: 367089	2019-07-26 08:15:01 +00:00
Pablo Barrio	275954539d	[ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1 Summary: Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1. Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the Arm architecture. Neoverse N1 implements both AArch32 and AArch64. Cortex-A65: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65 Cortex-A65AE: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae Neoverse E1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1 Neoverse N1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1 Patch by Diogo Sampaio and Pablo Barrio Reviewers: samparker, LukeCheeseman, sbaranga, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64406 llvm-svn: 367007	2019-07-25 10:59:45 +00:00
Eli Friedman	82e109279d	[ARM] Remove dead code from ARMConstantIslands. tLDRHi is not a pc-relative load; it can't directly refer to a constant pool or jump table. llvm-svn: 366963	2019-07-24 23:36:14 +00:00
David Green	cd7a6fa314	[ARM] Rewrite how VCMP are lowered, using a single node This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and VCMPZ with an extra operand as the condition code. I believe this will make some combines simpler, allowing us to just look at these codes and not the operands. It also helps fill in a missing VCGTUZ MVE selection without adding extra nodes for it. Differential Revision: https://reviews.llvm.org/D65072 llvm-svn: 366934	2019-07-24 17:36:47 +00:00
David Green	047a0b6575	[ARM] Disable MVE fptosi and friends This prevents us from trying to convert an i1 predicate vector to a float, or vice-versa. Better patterns are possible, which will follow in a subsequent commit. For now we just expand them. Differential Revision: https://reviews.llvm.org/D65066 llvm-svn: 366931	2019-07-24 17:26:26 +00:00
David Green	b342bddbe2	[ARM] More MVE compare vector splat combines for ANDs Adds some extra r register compare combines, this time for ANDs. Differential Revision: https://reviews.llvm.org/D65062 llvm-svn: 366928	2019-07-24 17:08:09 +00:00
David Green	93b5f61295	[ARM] MVE compare vector splat combine MVE VCMP instructions can use a general purpose register as the second operand. This adds the combines for it, selecting from a compare of a vdup. Differential Revision: https://reviews.llvm.org/D65061 llvm-svn: 366924	2019-07-24 16:58:41 +00:00
David Green	bab4d8ac5a	[ARM] Better OR's for MVE compares This adds a DeMorgan combine for OR's of compares to turn them into AND's, helping prevent them from going into and out of gpr registers. It also fills in the VCLE and VCLT nodes that MVE can select, allowing it to invert more compares. Differential Revision: https://reviews.llvm.org/D65059 llvm-svn: 366920	2019-07-24 16:42:09 +00:00
David Green	69fba7434e	[ARM] Better AND's for MVE compares Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where the second vcmp becomes predicated on the first. The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the VPTBlockPass. Differential Revision: https://reviews.llvm.org/D65058 llvm-svn: 366910	2019-07-24 14:42:05 +00:00
David Green	4fc78c496e	[ARM] MVE floating point compares and selects Much like integers, this adds MVE floating point compares and select. It requires a lot more buildvector/shuffle code because we may need to expand the compares without mve.fp, and requires support for and/or because of the way we lower llvm condition codes. Some original code by David Sherwood Differential Revision: https://reviews.llvm.org/D65054 llvm-svn: 366909	2019-07-24 14:28:22 +00:00
David Green	a4a4698c16	[ARM] Basic And/Or/Xor handling for MVE predicates This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It does this by going into and out of GPRs, doing the operation on scalars. Code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65053 llvm-svn: 366907	2019-07-24 14:17:54 +00:00
Simi Pallipurath	724888af45	[ARM] Make sure that the constant pool does not keep in the middle of an IT block. This change makes sure that llvm does not emit an invalid IT block by putting the constant pool in the middle of an IT block. We have code to try to avoid putting a constant island in the middle of an IT block, but it only works if we see an IT between the one currently referencing CPE and possible insertion point. If the first instruction we look at is the VLDRD after the IT, we never see the IT and do not realize that the instruction doing the load could be in an IT block itself. Differential Revision: https://reviews.llvm.org/D64621 Change-Id: I24cecb37cded75e8992870bd997f6226853bd920 llvm-svn: 366905	2019-07-24 13:54:14 +00:00
Sjoerd Meijer	a19f5a76e6	Test commit. NFC. Removed 2 trailing whitespaces in 2 files that used to be in different repos to test my new github monorepo workflow. llvm-svn: 366904	2019-07-24 13:30:36 +00:00
David Green	c7e55d4f52	[ARM] MVE predicate register support This adds support code for building and shuffling i1 predicate registers. It generally uses two basic principles, either converting the predicate into a scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or by converting the register to a full vector register and back. Some of the code here is not super efficient but will hopefully cover most cases of moving i1 vectors around and can be improved in subsequent patches. Some code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65052 llvm-svn: 366890	2019-07-24 11:51:36 +00:00
David Green	b9d96ceca0	[ARM] MVE integer compares and selects This adds the very basics for MVE vector predication, adding integer VCMP and VSEL instruction support. This is done through predicate registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc). An extra VCNE was added, as this can be handled sensibly by MVE's expanded number of VCMP condition codes. (There are also VCLE and VCLT which are added later). VPSEL is also added here, simply selecting on the vselect. Original code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65051 llvm-svn: 366885	2019-07-24 11:08:14 +00:00
Sam Parker	aeb21b96a0	[ARM][ParallelDSP] Fix pointer operand reordering While combining two loads into a single load, we often need to reorder the pointer operands for the new load. This reordering was broken in the cases where there was a chain of values that built up the pointer. Differential Revision: https://reviews.llvm.org/D65193 llvm-svn: 366881	2019-07-24 09:38:39 +00:00
Eli Friedman	b27fc95e89	[ARM] Add opt-bisect support to ARMParallelDSP. llvm-svn: 366851	2019-07-23 20:48:46 +00:00
Sam Parker	57e87dd81b	[ARM][LowOverheadLoops] Fix branch target codegen While lowering test.set.loop.iterations, it wasn't checked how the brcond was using the result and so the wls could branch to the loop preheader instead of not entering it. The same was true for loop.decrement.reg. So brcond and br_cc are now lowered manually when using the hwloop intrinsics. During this we now check whether the result has been negated and whether we're using SETEQ or SETNE and 0 or 1. We can then figure out which basic block the WLS and LE should be targeting. Differential Revision: https://reviews.llvm.org/D64616 llvm-svn: 366809	2019-07-23 14:08:46 +00:00
David Green	fdedf240f8	[ARM] Rename NEONModImm to VMOVModImm. NFC Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE. llvm-svn: 366790	2019-07-23 09:19:24 +00:00
Sam Parker	4379a40088	[ARM][LowOverheadLoops] Revert remaining pseudos ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted. Differential Revision: https://reviews.llvm.org/D65080 llvm-svn: 366691	2019-07-22 14:16:40 +00:00
David Green	8876a312a8	[ARM] Fix for MVE VPT block pass We need to ensure that the number of T's is correct when adding multiple instructions into the same VPT block. Differential revision: https://reviews.llvm.org/D65049 llvm-svn: 366684	2019-07-22 12:51:38 +00:00
Oliver Stannard	6771a89fa0	[IPRA][ARM] Make use of the "returned" parameter attribute ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation. Differential revision: https://reviews.llvm.org/D64986 llvm-svn: 366669	2019-07-22 08:44:36 +00:00
Mikhail Maltsev	0b001f94a5	[ARM] Add <saturate> operand to SQRSHRL and UQRSHLL Summary: According to the new Armv8-M specification https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the instructions SQRSHRL and UQRSHLL now have an additional immediate operand <saturate>. The new assembly syntax is: SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm where <saturate> can be either 64 (the existing behavior) or 48, in that case the result is saturated to 48 bits. The new operand is encoded as follows: #64 Encoded as sat = 0 #48 Encoded as sat = 1 sat is bit 7 of the instruction bit pattern. This patch adds a new assembler operand class MveSaturateOperand which implements parsing and encoding. Decoding is implemented in DecodeMVEOverlappingLongShift. Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64810 llvm-svn: 366555	2019-07-19 09:46:28 +00:00
Diogo N. Sampaio	11512e742b	[ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine Summary: PerformVMOVRRDCombine omits adding an offset of 4 to the PointerInfo, when converting a f64 = load[M] to {i32, i32} = {load[M], load[M + 4]} Which would allow the machine scheduler to break dependencies with the second load. - pr42638 Reviewers: eli.friedman, dmgreen, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64870 llvm-svn: 366423	2019-07-18 10:05:56 +00:00
Diana Picus	37e403d18c	[ARM GlobalISel] Cleanup CallLowering. NFC Migrate CallLowering::lowerReturnVal to use the same infrastructure as lowerCall/FormalArguments and remove the now obsolete code path from splitToValueTypes. Forgot to push this earlier. llvm-svn: 366308	2019-07-17 10:01:27 +00:00
Kyrylo Tkachov	a3e26d1a6c	[NFC] Test commit: add full stop at end of comment llvm-svn: 366195	2019-07-16 09:15:01 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
David Green	dc56995c57	[ARM] MVE vector for 64bit types We need to make sure that we are sensibly dealing with vectors of types v2i64 and v2f64, even if most of the time we cannot generate native operations for them. This mostly adds a lot of testing, plus fixes up a couple of the issues found. And, or and xor can be legal for v2i64, and shifts combining needs a slight fixup. Differential Revision: https://reviews.llvm.org/D64316 llvm-svn: 366106	2019-07-15 18:42:54 +00:00
David Green	8e7eee617a	[ARM] Minor formatting in ARMInstrMVE.td. NFC llvm-svn: 366089	2019-07-15 17:29:06 +00:00
David Green	6e89887642	[ARM] MVE Vector Shifts This adds basic lowering for MVE shifts. There are many shifts in MVE, but the instructions handled here are: VSHL (imm) VSHRu (imm) VSHRs (imm) VSHL (vector) VSHL (register) MVE, like NEON before it, doesn't have shift right by a vector (or register). We instead have to negate the amount and shift in the opposite direction. This means we have to convert any SHR's into a form of SHL (that is still signed or unsigned) with a negated condition and selecting from there. MVE still does have shifting by an immediate for SHL, ASR and LSR. This adds lowering for these and for register forms, which work well for shift lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially work optimally for right shifts. Differential Revision: https://reviews.llvm.org/D64212 llvm-svn: 366056	2019-07-15 11:35:39 +00:00
David Green	f059147a10	[ARM] Move Shifts after Bits. NFC This just moves the shift instruction definitions further down the ARMInstrMVE.td file, to make positioning patterns slightly more natural. llvm-svn: 366054	2019-07-15 11:22:05 +00:00
David Green	da750b1688	[ARM] Adjust how NEON shifts are lowered This adjusts the way that we lower NEON shifts to use a DAG target node, not via a neon intrinsic. This is useful for handling MVE shift operations in the same way. It also renames some of the immediate shift nodes for consistency, and moves some of the processing of immediate shifts into LowerShift allowing it to capture more cases. Differential Revision: https://reviews.llvm.org/D64426 llvm-svn: 366051	2019-07-15 10:44:50 +00:00
David Green	458a720ec1	[ARM] Add sign and zero extend patterns for MVE The vmovlb instructions can be used to sign or zero extend vector registers between types. This adds some patterns for them and relevant testing. The VBICIMM generation is also put behind a hasNEON check (as is already done for VORRIMM). Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D64069 llvm-svn: 366008	2019-07-13 15:43:00 +00:00
David Green	07a7ec2021	[ARM] MVE VNEG instruction patterns This selects integer VNEG instructions, which can be especially useful with shifts. Differential Revision: https://reviews.llvm.org/D64204 llvm-svn: 366006	2019-07-13 15:26:51 +00:00
David Green	4ce648b5e8	[ARM] MVE integer abs Similar to floating point abs, we also have instructions for integers. Differential Revision: https://reviews.llvm.org/D64027 llvm-svn: 366005	2019-07-13 14:58:32 +00:00
David Green	701bf714db	[ARM] MVE integer min and max This simply makes the MVE integer min and max instructions legal and adds the relevant patterns for them. Differential Revision: https://reviews.llvm.org/D64026 llvm-svn: 366004	2019-07-13 14:48:54 +00:00
David Green	ac5bcbeb9f	[ARM] MVE VRINT support This adds support for the floor/ceil/trunc/... series of instructions, converting to various forms of VRINT. They use the same suffixes as their floating point counterparts. There is no VRINTR, so nearbyint is expanded. Also added a copysign test, to show it is expanded. Differential Revision: https://reviews.llvm.org/D63985 llvm-svn: 366003	2019-07-13 14:38:53 +00:00
David Green	ec8af0db6c	[ARM] MVE minnm and maxnm instructions This adds the patterns for minnm and maxnm from the fminnum and fmaxnum nodes, similar to scalar types. Original patch by Simon Tatham Differential Revision: https://reviews.llvm.org/D63870 llvm-svn: 366002	2019-07-13 14:29:02 +00:00
Sam Parker	08b4a8da07	[ARM][LowOverheadLoops] Correct offset checking This patch addresses a couple of problems: 1) The maximum supported offset of LE is -4094. 2) The offset of WLS also needs to be checked, this uses a maximum positive offset of 4094. The use of BasicBlockUtils has been changed because the block offsets weren't being initialised, but the isBBInRange checks both positive and negative offsets. ARMISelLowering has been tweaked because the test case presented another pattern that we weren't supporting. llvm-svn: 365749	2019-07-11 09:56:15 +00:00
Simon Tatham	7916198a41	[ARM] Remove nonexistent unsigned forms of MVE VQDMLAH. The VQDMLAH.U8, VQDMLAH.U16 and VQDMLAH.U32 instructions don't actually exist: the Armv8.1-M architecture spec only lists signed forms of that instruction. The unsigned ones were added in error: they existed in an early draft of the spec, but they were removed before the public version, and we missed that particular spec change. Also affects the variant forms VQDMLASH, VQRDMLAH and VQRDMLASH. Reviewers: miyuki Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64502 llvm-svn: 365747	2019-07-11 09:52:15 +00:00
Sam Parker	85ad78b1cf	[ARM][ParallelDSP] Change the search for smlads Two functional changes have been made here: - Now search up from any add instruction to find the chains of operations that we may turn into a smlad. This allows the generation of a smlad which doesn't accumulate into a phi. - The search function has been corrected to stop it falsely searching up through an invalid path. The bulk of the changes have been making the Reduction struct a class and making it more C++y with getters and setters. Differential Revision: https://reviews.llvm.org/D61780 llvm-svn: 365740	2019-07-11 07:47:50 +00:00
Sam Parker	775b2f598a	[NFC][ARM] Convert lambdas to static helpers Break up and convert some of the lambdas in ARMLowOverheadLoops into static functions. llvm-svn: 365623	2019-07-10 12:29:43 +00:00
Mikhail Maltsev	ed143c5d59	[ARM] Enable VPUSH/VPOP aliases when either MVE or VFP is present Summary: Use the same predicates as VSTMDB/VLDMIA since VPUSH/VPOP alias to these. Patch by Momchil Velikov. Reviewers: ostannard, simon_tatham, SjoerdMeijer, samparker, t.p.northover, dmgreen Reviewed By: dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64413 llvm-svn: 365604	2019-07-10 08:59:17 +00:00

10782 Commits