llvm-project

Commit Graph

Author	SHA1	Message	Date
Andrew Paverd	d157a9bc8b	Add Windows Control Flow Guard checks (/guard:cf). Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761	2019-10-28 15:19:39 +00:00
vhscampos	f6e11a36c4	[ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE Summary: Writing support for three ACLE functions: unsigned int __cls(uint32_t x) unsigned int __clsl(unsigned long x) unsigned int __clsll(uint64_t x) CLS stands for "Count number of leading sign bits". In AArch64, these two intrinsics can be translated into the 'cls' instruction directly. In AArch32, on the other hand, this functionality is achieved by implementing it in terms of clz (count number of leading zeros). Reviewers: compnerd Reviewed By: compnerd Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69250	2019-10-28 11:06:58 +00:00
Jian Cai	a6b0219fc4	Revert "[ARM] Uses "Sun Style" syntax for section switching" This reverts commit `03de2f84fc`.	2019-10-25 14:03:07 -07:00
Jian Cai	03de2f84fc	[ARM] Uses "Sun Style" syntax for section switching Summary: Support "Sun Style" syntax for section switching ("#alloc,#write" etc). https://bugs.llvm.org/show_bug.cgi?id=43759 Reviewers: peter.smith, eli.friedman, kristof.beyls, t.p.northover Reviewed By: peter.smith Subscribers: MaskRay, llozano, manojgupta, nickdesaulniers, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69296	2019-10-25 13:27:35 -07:00
Guillaume Chatelet	a4783ef58d	[Alignment][NFC] getMemoryOpCost uses MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69307	2019-10-25 21:26:59 +02:00
Simon Tatham	e0ef4ebe2f	[ARM] Add IR intrinsics for MVE VLD[24] and VST[24]. The VST2 and VST4 instructions take two or four vector registers as input, and store part of each register to memory in an interleaved pattern. They come in variants indicating which part of each register they store (VST20 and VST21; VST40 to VST43 inclusive); the intention is that issuing each of those variants in turn has the combined effect of loading or storing the whole set of registers to a memory block of equal size. The corresponding VLD2 and VLD4 instructions load from memory in the same interleaved format: each one overwrites only part of its output register set, and again, the idea is that if you use VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the whole of each register. I've implemented the stores and loads quite differently. The loads were easiest to implement as a single intrinsic that expands to all four VLD4x instructions or both VLD2x, delivering four complete output registers. (Implementing each individual load as a separate instruction taking four input registers to partially overwrite is possible in theory, but pointless, and when I tried it, I found it would need extra work to get the register allocation not to be horrible.) Since that intrinsic delivers multiple outputs, it has to be instruction-selected in custom C++. But the store instructions are easier to model individually, because they don't overwrite any register at all and you can write a DAG Isel pattern in Tablegen for each one. Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load instructions, delivers four full output vectors, and is handled by C++ code, whereas `int_arm_mve_vst4q` expands to just one store instruction, takes four input vectors and a constant indicating which lanes to store, and is handled entirely in Tablegen. (And similarly for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do each one. Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68700	2019-10-24 16:33:13 +01:00
Simon Tatham	ceeff95ca4	[ARM] Add some sample IR MVE intrinsics with C++ isel. This adds some initial example IR intrinsics for MVE instructions that deliver multiple output values, and hence, have to be instruction- selected by custom C++ code instead of Tablegen patterns. I've added the writeback gather load instructions (taking a vector of base addresses and a single common offset, returning a vector of loaded values and an updated vector of base addresses); one example from the long shift family (taking and returning a 64-bit value in two GPRs); and the VADC instruction (which propagates a carry bit from each vector-lane addition to the next, taking an input carry flag in FPSCR and outputting the final one in FPSCR as well). To support the VPT-predicated forms of these instructions, I've written some helper functions to add the cluster of MVE predicate operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used when the instruction actually is predicated (so it takes a predicate mask argument), and `AddEmptyMVEPredicateToOps` is for when the instruction is unpredicated (so it fills in $noreg for the mask). Each one comes in a form suitable for `vpred_n`, and one for `vpred_r` which takes the extra 'inactive' parameter. For VADC, the representation of the carry flag in the IR intrinsic is a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e. with the carry flag in bit 29 of the word. (The user-facing ACLE intrinsic will want it to be in bit 0, but I'll do that on the clang side.) Reviewers: dmgreen, miyuki, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68699	2019-10-24 16:33:13 +01:00
Simon Tatham	1b45297e01	[ARM] Begin adding IR intrinsics for MVE instructions. This commit, together with the next few, will add a representative sample of the kind of IR intrinsics that we'll need in order to implement the user-facing ACLE intrinsics for MVE. Supporting all of them will take more work; the intention of this initial series of commits is to implement an intrinsic or two from lots of different categories, as examples and proofs of concept. This initial commit introduces a small number of IR intrinsics for instructions simple enough that they can use Tablegen ISel patterns: the predicated versions of the VADD and VSUB instructions (both integer and FP), VMIN and VMAX, and the float->half VCVT instruction (predicated and unpredicated). When using VPT-predicated instructions in automatic code generation, it will be convenient to specify the predicate value as a vector of the appropriate number of i1. To make it easy to specify all sizes of an instruction in one go and give each one the matching predicate vector type, I've added a system of Tablegen informational records describing MVE's vector types: each one gives the underlying LLVM IR ValueType (which may not be the same if the MVE vector is of explicitly signed or unsigned integers) and an appropriate vNi1 to use as the predicate vector. (Also, those info records include the usual encoding for the types, so that as we add associations between each instruction encoding and one of the new `MVEVectorVTInfo` records, we can remove some of the existing template parameters and replace them with references to the vector type info's fields.) The user-facing ACLE intrinsics will receive a predicate mask as a 16-bit integer, so I've also provided a pair of intrinsics i2v and v2i, to convert between an integer and a vector of i1 by just changing the register class. Reviewers: dmgreen, miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67158	2019-10-24 16:33:13 +01:00
Mirko Brkusanin	4b63ca1379	[Mips] Use appropriate private label prefix based on Mips ABI MipsMCAsmInfo was using '$' prefix for Mips32 and '.L' for Mips64 regardless of -target-abi option. By passing MCTargetOptions to MCAsmInfo we can find out Mips ABI and pick appropriate prefix. Tags: #llvm, #clang, #lldb Differential Revision: https://reviews.llvm.org/D66795	2019-10-23 12:24:35 +02:00
Sander de Smalen	8f2dac471a	Reverted r375425 as it broke some buildbots. llvm-svn: 375444	2019-10-21 19:11:40 +00:00
Sander de Smalen	814548ec8e	[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize (Take 2) Commit message from D66935: This patch fixes a bug exposed by D65653 where a subsequent invocation of `determineCalleeSaves` ends up with a different size for the callee save area, leading to different frame-offsets in debug information. In the invocation by PEI, `determineCalleeSaves` tries to determine whether it needs to spill an extra callee-saved register to get an emergency spill slot. To do this, it calls 'estimateStackSize' and manually adds the size of the callee-saves to this. PEI then allocates the spill objects for the callee saves and the remaining frame layout is calculated accordingly. A second invocation in LiveDebugValues causes estimateStackSize to return the size of the stack frame including the callee-saves. Given that the size of the callee-saves is added to this, these callee-saves are counted twice, which leads `determineCalleeSaves` to believe the stack has become big enough to require spilling an extra callee-save as emergency spillslot. It then updates CalleeSavedStackSize with a larger value. Since CalleeSavedStackSize is used in the calculation of the frame offset in getFrameIndexReference, this leads to incorrect offsets for variables/locals when this information is recalculated after PEI. This patch fixes the lldb unit tests in `functionalities/thread/concurrent_events/*` Changes after D66935: Ensures AArch64FunctionInfo::getCalleeSavedStackSize does not return the uninitialized CalleeSavedStackSize when running `llc` on a specific pass where the MIR code has already been expected to have gone through PEI. Instead, getCalleeSavedStackSize (when passed the MachineFrameInfo) will try to recalculate the CalleeSavedStackSize from the CalleeSavedInfo. In debug mode, the compiler will assert the recalculated size equals the cached size as calculated through a call to determineCalleeSaves. This fixes two tests: test/DebugInfo/AArch64/asan-stack-vars.mir test/DebugInfo/AArch64/compiler-gen-bbs-livedebugvalues.mir that otherwise fail when compiled using msan. Reviewed By: omjavaid, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D68783 llvm-svn: 375425	2019-10-21 17:12:56 +00:00
David Green	0765a4c288	[ARM] Extra qdadd patterns This adds some new qdadd patterns to go along with the other recently added qadd's. Differential Revision: https://reviews.llvm.org/D68999 llvm-svn: 375414	2019-10-21 14:06:49 +00:00
David Green	d7b77f2203	[ARM] Add qadd lowering from a sadd_sat This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the same time. The qadd instruction sets the q flag, but we already have many cases where we do not model this in llvm. Differential Revision: https://reviews.llvm.org/D68976 llvm-svn: 375411	2019-10-21 12:33:46 +00:00
Guillaume Chatelet	bac5f6bd21	[Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69243 llvm-svn: 375407	2019-10-21 11:01:55 +00:00
David Green	fba831e791	[ARM] Lower sadd_sat to qadd8 and qadd16 Lower the target independent signed saturating intrinsics to qadd8 and qadd16. This custom lowers them from a sadd_sat, catching the node early before it is promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a qadd8/qadd16, so that we can call demand bits on it to show that it does not use the upper bits. Also handles QSUB8 and QSUB16. Differential Revision: https://reviews.llvm.org/D68974 llvm-svn: 375402	2019-10-21 09:53:38 +00:00
Guillaume Chatelet	3cc4835c00	Use Align for TFL::TransientStackAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69216 llvm-svn: 375398	2019-10-21 08:31:25 +00:00
Reid Kleckner	904cd3e06b	Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each Now X86ISelLowering doesn't depend on many IR analyses. llvm-svn: 375320	2019-10-19 01:31:09 +00:00
Reid Kleckner	1d7b41361f	Prune two MachineInstr.h includes, fix up deps MachineInstr.h included AliasAnalysis.h, which includes a world of IR constructs mostly unneeded in CodeGen. Prune it. Same for DebugInfoMetadata.h. Noticed with -ftime-trace. llvm-svn: 375311	2019-10-19 00:22:07 +00:00
Quentin Colombet	9f9151d494	[GISel][CallLowering] Make isIncomingArgumentHandler a pure virtual method The default implementation of isIncomingArgumentHandler could lead to generating incorrect code. Make it a pure virtual method, so that targets know they have to override it to produce correct code. NFC Differential Revision: https://reviews.llvm.org/D69187 llvm-svn: 375277	2019-10-18 20:13:42 +00:00
Sam Parker	8e6a638c74	[ARM][MVE] Enable truncating masked stores Allow us to generate truncating masked store which take v4i32 and v8i16 vectors and can store to v4i8, v4i16 and v8i8 and memory. Removed support for unaligned masked stores. Differential Revision: https://reviews.llvm.org/D68461 llvm-svn: 375108	2019-10-17 12:11:18 +00:00
Sam Parker	3ff961cabd	[ARM][MVE] Change VPST to use, not def, VPR Unlike VPT, VPST just uses the current value of VPR.P0. Differential Revision: https://reviews.llvm.org/D69037 llvm-svn: 375087	2019-10-17 08:46:31 +00:00
Sam Parker	39af8a3a3b	[DAGCombine][ARM] Enable extending masked loads Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085	2019-10-17 07:55:55 +00:00
Guillaume Chatelet	882c43d703	[Alignment][NFC] Use Align for TargetFrameLowering/Subtarget Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68993 llvm-svn: 375084	2019-10-17 07:49:39 +00:00
Mikhail Maltsev	95b5d459a0	[ARM] Add a register class for GPR pairs without SP and use it. NFCI Summary: Currently Thumb2InstrInfo.cpp uses a register class which is auto-generated by tablegen. Such approach is fragile because auto-generated classes might change when other register classes are added. For example, before https://reviews.llvm.org/D62667 we were using GPRPair_with_gsub_1_in_rGPRRegClass, but had to change it to GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass because the former class stopped being generated (this did not change the functionality though). This patch adds a register class consisting of even-odd GPR register pairs from (R0, R1) to (R10, R11), which excludes (R12, SP) and uses it in Thumb2InstrInfo.cpp instead of GPRPair_with_gsub_1_in_GPRwithAPSRnospRegClass. Reviewers: ostannard, simon_tatham, dmgreen, efriedma Reviewed By: simon_tatham Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69026 llvm-svn: 374990	2019-10-16 10:40:57 +00:00
Sam Parker	1c3ca61294	[ARM][ParallelDSP] Change smlad insertion order Instead of inserting everything after the 'root' of the reduction, insert all instructions as close to their operands as possible. This can help reduce register pressure. Differential Revision: https://reviews.llvm.org/D67392 llvm-svn: 374981	2019-10-16 09:37:03 +00:00
Sam Parker	ce39278f25	[ARM][MVE] validForTailPredication insts Reverse the logic for valid tail predication instructions and create a whitelist instead. Added other instruction groups that aren't obviously safe: - instructions that 'narrow' their result. - lane moves. - byte swapping instructions. - interleaving loads and stores. - cross-beat carries. - top/bottom instructions. - complex operations. Hopefully we should be able to add more of these instructions to the whitelist, once we have a more concrete idea of the transform. Differential Revision: https://reviews.llvm.org/D67904 llvm-svn: 374887	2019-10-15 13:12:51 +00:00
Jian Cai	e9089c223c	[ARM][AsmParser] handles offset expression in parentheses Summary: Integrated assembler does not accept offset expressions surrounded by parenthesis. Handle this case for GAS compability. https://bugs.llvm.org/show_bug.cgi?id=43631 Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68764 llvm-svn: 374832	2019-10-14 22:22:26 +00:00
David Green	543236232c	[ARM] Selection for MVE VMOVN The adds both VMOVNt and VMOVNb instruction selection from the appropriate shuffles. We detect shuffle masks of the form: 0, N, 2, N+2, 4, N+4, ... or 0, N+1, 2, N+3, 4, N+5, ... ISel will also try the opposite patterns, with inputs reversed. These are selected to VMOVNt and VMOVNb respectively. Differential Revision: https://reviews.llvm.org/D68283 llvm-svn: 374781	2019-10-14 15:19:33 +00:00
Sam Parker	527a35e155	[NFC][TTI] Add Alignment for isLegalMasked[Load/Store] Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763	2019-10-14 10:00:21 +00:00
Zi Xuan Wu	9802268ad3	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634	2019-10-12 02:53:04 +00:00
David Green	8628bb0491	[ARM] VQSUB instruction Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68567 llvm-svn: 374377	2019-10-10 16:34:30 +00:00
David Green	39596ec2fe	[ARM] VQADD instructions This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68566 llvm-svn: 374336	2019-10-10 13:05:04 +00:00
Oliver Stannard	4f454b2275	[IfCvt][ARM] Optimise diamond if-conversion for code size Currently, the heuristics the if-conversion pass uses for diamond if-conversion are based on execution time, with no consideration for code size. This adds a new set of heuristics to be used when optimising for code size. This is mostly target-independent, because the if-conversion pass can see the code size of the instructions which it is removing. For thumb, there are a few passes (insertion of IT instructions, selection of narrow branches, and selection of CBZ instructions) which are run after if conversion and affect these heuristics, so I've added target hooks to better predict the code-size effect of a proposed if-conversion. Differential revision: https://reviews.llvm.org/D67350 llvm-svn: 374301	2019-10-10 09:58:28 +00:00
Jinsong Ji	9912232b46	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit `9f41deccc0`. This reverts commit `18b6fe07bc`. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091	2019-10-08 17:32:56 +00:00
Nikola Prica	98603a8153	[DebugInfo][If-Converter] Update call site info during the optimization During the If-Converter optimization pay attention when copying or deleting call instructions in order to keep call site information in valid state. Reviewers: aprantl, vsk, efriedma Reviewed By: vsk, efriedma Differential Revision: https://reviews.llvm.org/D66955 llvm-svn: 374068	2019-10-08 15:43:12 +00:00
Nikola Prica	02682498b8	[ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers Support for tracking registers that forward function parameters into the following function frame. For now we only support cases when parameter is forwarded through single register. Reviewers: aprantl, vsk, t.p.northover Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D66953 llvm-svn: 374033	2019-10-08 09:43:05 +00:00
Kristof Beyls	78bfe3ab94	[ARM] Generate vcmp instead of vcmpe Based on the discussion in http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the conclusion was reached that the ARM backend should produce vcmp instead of vcmpe instructions by default, i.e. not be producing an Invalid Operation exception when either arguments in a floating point compare are quiet NaNs. In the future, after constrained floating point intrinsics for floating point compare have been introduced, vcmpe instructions probably should be produced for those intrinsics - depending on the exact semantics they'll be defined to have. This patch logically consists of the following parts: - Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which implemented fine-tuning for when to produce vcmpe (i.e. not do it for equality comparisons). The complexity introduced by those patches isn't needed anymore if we just always produce vcmp instead. Maybe these patches need to be reintroduced again once support is needed to map potential LLVM-IR constrained floating point compare intrinsics to the ARM instruction set. - Simply select vcmp, instead of vcmpe, see simple changes in lib/Target/ARM/ARMInstrVFP.td - Adapt lots of tests that tested for vcmpe (instead of vcmp). For all of these test, the intent of what is tested for isn't related to whether the vcmp should produce an Invalid Operation exception or not. Fixes PR43374. Differential Revision: https://reviews.llvm.org/D68463 llvm-svn: 374025	2019-10-08 08:25:42 +00:00
Zi Xuan Wu	9f41deccc0	[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017	2019-10-08 03:28:33 +00:00
Tim Northover	a7d90af1be	ARM-Darwin: keep the frame register reserved even if not updated. Darwin platforms need the frame register to always point at a valid record even if it's not updated in a leaf function. Backtraces are more important than one extra GPR. llvm-svn: 373738	2019-10-04 12:29:32 +00:00
Simon Pilgrim	b327dc1966	Fix uninitialized variable warning. NFCI llvm-svn: 373582	2019-10-03 11:21:46 +00:00
Benjamin Kramer	12e915b3fc	[ARM] Make helpers static. NFC. llvm-svn: 373503	2019-10-02 18:20:24 +00:00
David Green	c9b5ab8b1c	[ARM] Identity shuffles are legal Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE (they essentially just become bitcasts). We were not catching that in the existing set of what we considered legal though. On NEON, they would be covered by vext's, but that is not generally available in MVE. This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here but does what we want and prevents us from just rewriting what is the same function. Differential Revision: https://reviews.llvm.org/D68241 llvm-svn: 373446	2019-10-02 11:40:51 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Sam Parker	aac03ae06a	[ARM][MVE] Change VCTP operand The VCTP instruction will calculate the predicate masked based upon the number of elements that need to be processed. I had inserted the sub before the vctp intrinsic and supplied it as the operand, but this is incorrect as the phi should directly feed the vctp. The sub is calculating the value for the next iteration. Differential Revision: https://reviews.llvm.org/D67921 llvm-svn: 373188	2019-09-30 08:03:23 +00:00
Sam Parker	b3438f1cc0	[ARM][CGP] Allow signext arguments As we perform a zext on any arguments used in the promoted tree, it doesn't matter if they're marked as signext. The only permitted user(s) in the tree which would interpret the sign bits are signed icmps. For these instructions, their promoted operands are truncated before the icmp uses them. Differential Revision: https://reviews.llvm.org/D68019 llvm-svn: 373186	2019-09-30 07:52:10 +00:00
David Green	120a5e9a74	[ARM] Cortex-M4 schedule additions This is an attempt to fill in some of the missing instructions from the Cortex-M4 schedule, and make it easier to do the same for other ARM cpus. - Some instructions are marked as hasNoSchedulingInfo as they are pseudos or otherwise do not require scheduling info - A lot of features have been marked not supported - Some WriteRes's have been added for cvt instructions. - Some extra instruction latencies have been added, notably by relaxing the regex for dsp instruction to catch more cases, and some fp instructions. This goes a long way to get the CompleteModel working for this CPU. It does not go far enough as to get all scheduling info for all output operands correct. Differential Revision: https://reviews.llvm.org/D67957 llvm-svn: 373163	2019-09-29 08:38:48 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Alexandros Lamprineas	c006b6f4cb	[MC][ARM] vscclrm disassembles as vldmia Happens only when the mve.fp subtarget feature is enabled: $ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b" .text vldmia pc, {d0, d1, d2, d3} $ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -disassemble <<< "0x9f,0xec,0x08,0x0b" .text vscclrm {d0, d1, d2, d3, vpr} Assembling returns the correct encoding with or without mve.fp: $ llvm-mc -triple thumbv8.1m.main -mattr=+mve.fp,+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}" .text vscclrm {d0, d1, d2, d3, vpr} @ encoding: [0x9f,0xec,0x08,0x0b] $ llvm-mc -triple thumbv8.1m.main -mattr=+8msecext -show-encoding <<< "vscclrm {d0-d3, vpr}" .text vscclrm {d0, d1, d2, d3, vpr} @ encoding: [0x9f,0xec,0x08,0x0b] The problem seems to be in the TableGen description of VSCCLRMD. The least significant bit should be set to zero. Differential Revision: https://reviews.llvm.org/D68025 llvm-svn: 373052	2019-09-27 08:22:24 +00:00
Simon Pilgrim	2cf54d7b71	ARMBaseInstrInfo getOperandLatency - silence static analyzer dyn_cast<> null dereference warnings. NFCI. The static analyzer is warning about potential null dereferences, but we should be able to use cast<> directly and if not assert will fire for us. llvm-svn: 372992	2019-09-26 16:05:55 +00:00
David Green	10d10102a4	[ARM] Ensure we do not attempt to create lsll #0 During legalisation we can end up with some pretty strange nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid assembly instructions. A long shift with a zero immediate actually encodes a shift by 32. Differential Revision: https://reviews.llvm.org/D67664 llvm-svn: 372839	2019-09-25 10:16:48 +00:00
David Green	2fb41fc70c	[ARM] Split large widening MVE loads Similar to rL372717, we can force the splitting of extends of vector loads in MVE, in order to use the better widening loads as opposed to going through expensive extends. This adds a combine to early-on detect extends of loads and split the load in two, from where normal legalisation will kick in and we get a series of widening loads. Differential Revision: https://reviews.llvm.org/D67909 llvm-svn: 372721	2019-09-24 10:53:09 +00:00
David Green	49d851f403	[ARM] Split large truncating MVE stores MVE does not have a simple sign extend instruction that can move elements across lanes. We currently often end up moving each lane into and out of a GPR, in order to get elements into the correct places. When we have a store of a trunc (or a extend of a load), we can instead just split the store/load in two, using the narrowing/widening load/store instructions from each half of the vector. This does that for stores. It happens very early in a store combine, so as to easily detect the truncates. (It would be possible to do this later, but that would involve looking through a buildvector of extract elements. Not impossible but this way seemed simpler). By enabling store combines we also get a vmovdrr combine for free, helping some other tests. Differential Revision: https://reviews.llvm.org/D67828 llvm-svn: 372717	2019-09-24 10:10:41 +00:00
Pavel Labath	aaff1a631a	MCRegisterInfo: Merge getLLVMRegNum and getLLVMRegNumFromEH Summary: The functions different in two ways: - getLLVMRegNum could return both "eh" and "other" dwarf register numbers, while getLLVMRegNumFromEH only returned the "eh" number. - getLLVMRegNum asserted if the register was not found, while the second function returned -1. The second distinction was pretty important, but it was very hard to infer that from the function name. Aditionally, for the use case of dumping dwarf expressions, we needed a function which can work with both kinds of number, but does not assert. This patch solves both of these issues by merging the two functions into one, returning an Optional<unsigned> value. While the same thing could be achieved by adding an "IsEH" argument to the (renamed) getLLVMRegNumFromEH function, it seemed better to avoid the confusion of two functions and put the choice of asserting into the hands of the caller -- if he checks the Optional value, he can safely process "untrusted" input, and if he blindly dereferences the Optional, he gets the assertion. I've updated all call sites to the new API, choosing between the two options according to the function they were calling originally, except that I've updated the usage in DWARFExpression.cpp to use the "safe" method instead, and added a test case which would have previously triggered an assertion failure when processing (incorrect?) dwarf expressions. Reviewers: dsanders, arsenm, JDevlieghere Subscribers: wdng, aprantl, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67154 llvm-svn: 372710	2019-09-24 09:31:02 +00:00
Mark Murray	c720f63845	Cosmetic; don't use the magic constant 35 when HASH is more readable. This matches other MCK__<THING>_* usage better. Summary: No functional change. This fixes a magic constant in MCK__*_... macros only. Reviewers: ostannard Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67840 llvm-svn: 372599	2019-09-23 12:52:42 +00:00
Guillaume Chatelet	c281b40814	[Alignment] Get DataLayout::StackAlignment as Align Summary: Internally it is needed to know if StackAlignment is set but we can expose it as llvm::Align. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67852 llvm-svn: 372585	2019-09-23 12:01:32 +00:00
Sam Parker	9feb429a33	[ARM][MVE] Remove old tail predicates Remove any predicate that we replace with a vctp intrinsic, and try to remove their operands too. Also look into the exit block to see if there's any duplicates of the predicates that we've replaced and clone the vctp to be used there instead. Differential Revision: https://reviews.llvm.org/D67709 llvm-svn: 372567	2019-09-23 09:48:25 +00:00
Sam Parker	4ba6d0ded2	[ARM][LowOverheadLoops] Use subs during revert. Check whether there are any uses or defs between the LoopDec and LoopEnd. If there's not, then we can use a subs to set the cpsr and skip generating a cmp. Differential Revision: https://reviews.llvm.org/D67801 llvm-svn: 372560	2019-09-23 08:57:50 +00:00
Sam Parker	566127e376	[ARM][LowOverheadLoops] Use tBcc when reverting Check the branch target ranges and use a tBcc instead of t2Bcc when we can. Differential Revision: https://reviews.llvm.org/D67796 llvm-svn: 372557	2019-09-23 08:35:31 +00:00
Oliver Cruickshank	c84722ff27	[ARM] Fix CTTZ not generating correct instructions MVE CTTZ intrinsic should have been set to Custom, not Expand llvm-svn: 372401	2019-09-20 15:03:44 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
David Green	0cfb78e52a	[ARM] MVE i1 splat We needn't BFI each lane individually into a predicate register when each lane in the same. A simple sign extend and a vmsr will do. Differential Revision: https://reviews.llvm.org/D67653 llvm-svn: 372313	2019-09-19 12:17:41 +00:00
Simon Pilgrim	da89495a3e	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. llvm-svn: 372308	2019-09-19 10:47:12 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Guillaume Chatelet	d4c4671aa7	[Alignment][NFC] Remove LogAlignment functions Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67620 llvm-svn: 372231	2019-09-18 15:49:49 +00:00
Krasimir Georgiev	2f1bba7fd0	Revert "[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize" Summary: This reverts commit r372204. This change causes build bot failures under msan: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/35236/steps/check-llvm%20msan/logs/stdio: ``` FAIL: LLVM :: DebugInfo/AArch64/asan-stack-vars.mir (19531 of 33579) ****************** TEST 'LLVM :: DebugInfo/AArch64/asan-stack-vars.mir' FAILED **************** Script: -- : 'RUN: at line 1'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc -O0 -start-before=livedebugvalues -filetype=obj -o - /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llvm-dwarfdump -v - \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir -- Exit Code: 2 Command Output (stderr): -- ==62894==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0xdfcafb in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 #1 0xdfae8a in resolveFrameIndexReference /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1580:10 #2 0xdfae8a in llvm::AArch64FrameLowering::getFrameIndexReference(llvm::MachineFunction const&, int, unsigned int&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1536 #3 0x46642c1 in (anonymous namespace)::LiveDebugValues::extractSpillBaseRegAndOffset(llvm::MachineInstr const&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:582:21 #4 0x4647cb3 in transferSpillOrRestoreInst /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:883:11 #5 0x4647cb3 in process /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1079 #6 0x4647cb3 in (anonymous namespace)::LiveDebugValues::ExtendRanges(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1361 #7 0x463ac0e in (anonymous namespace)::LiveDebugValues::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1415:18 #8 0x4854ef0 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13 #9 0x53b0b01 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1648:27 #10 0x53b15f6 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1685:16 #11 0x53b298d in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1750:27 #12 0x53b298d in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1863 #13 0x905f21 in compileModule(char, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:601:8 #14 0x8fdc4e in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:355:22 #15 0x7f67673632e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #16 0x882369 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x882369) MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const Exiting error: -: The file was not recognized as a valid object file FileCheck error: '-' is empty. FileCheck command line: /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir ``` Reviewers: bkramer Reviewed By: bkramer Subscribers: sdardis, aprantl, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67710 llvm-svn: 372228	2019-09-18 14:42:09 +00:00
Sander de Smalen	dc2a7f5b39	[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize This patch fixes a bug exposed by D65653 where a subsequent invocation of `determineCalleeSaves` ends up with a different size for the callee save area, leading to different frame-offsets in debug information. In the invocation by PEI, `determineCalleeSaves` tries to determine whether it needs to spill an extra callee-saved register to get an emergency spill slot. To do this, it calls 'estimateStackSize' and manually adds the size of the callee-saves to this. PEI then allocates the spill objects for the callee saves and the remaining frame layout is calculated accordingly. A second invocation in LiveDebugValues causes estimateStackSize to return the size of the stack frame including the callee-saves. Given that the size of the callee-saves is added to this, these callee-saves are counted twice, which leads `determineCalleeSaves` to believe the stack has become big enough to require spilling an extra callee-save as emergency spillslot. It then updates CalleeSavedStackSize with a larger value. Since CalleeSavedStackSize is used in the calculation of the frame offset in getFrameIndexReference, this leads to incorrect offsets for variables/locals when this information is recalculated after PEI. Reviewers: omjavaid, eli.friedman, thegameg, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D66935 llvm-svn: 372204	2019-09-18 09:02:44 +00:00
Eli Friedman	ddf5e86c22	[ARM] VFPv2 only supports 16 D registers. r361845 changed the way we handle "D16" vs. "D32" targets; there used to be a negative "d16" which removed instructions from the instruction set, and now there's a "d32" feature which adds instructions to the instruction set. This is good, but there was an oversight in the implementation: the behavior of VFPv2 was changed. In particular, the "vfp2" feature was changed to imply "d32". This is wrong: VFPv2 only supports 16 D registers. In practice, this means if you specify -mfpu=vfpv2, the compiler will generate illegal instructions. This patch gets rid of "vfp2d16" and "vfp2d16sp", and fixes "vfp2" and "vfp2sp" so they don't imply "d32". Differential Revision: https://reviews.llvm.org/D67375 llvm-svn: 372186	2019-09-17 21:42:38 +00:00
Simon Pilgrim	a9a27d1ded	[ARM][AsmParser] Don't dereference a dyn_cast result. NFCI. The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't. llvm-svn: 372145	2019-09-17 17:26:14 +00:00
David Green	91724b8530	[ARM] Add a SelectTAddrModeImm7 for MVE narrow loads and stores We were previously using the SelectT2AddrModeImm7 for both normal and narrowing MVE loads/stores. As the narrowing instructions do not accept sp as a register, it makes little sense to optimise a FrameIndex into the load, only to have to recover that later on. This adds a SelectTAddrModeImm7 which does not do that folding, and uses it for narrowing load/store patterns. Differential Revision: https://reviews.llvm.org/D67489 llvm-svn: 372134	2019-09-17 15:32:28 +00:00
David Green	22a2209433	[ARM] Reserve an emergency spill slot for fp16 addressing modes that need it Similar to D67327, but this time for the FP16 VLDR and VSTR instructions that use the AddrMode5FP16 addressing mode. We need to reserve an emergency spill slot for instructions that will be out of range to use sp directly. AddrMode5FP16 is 8 bits with a scale of 2. Differential Revision: https://reviews.llvm.org/D67483 llvm-svn: 372132	2019-09-17 15:23:09 +00:00
Sam Parker	1d9ba08543	[ARM] Fix for buildbots Remove setPreservesCFG from ARMConstantIslandPass and add a couple of -verify-machine-dom-info instances into the existing codegen tests. llvm-svn: 372126	2019-09-17 14:21:36 +00:00
David Green	1ff9553057	[ARM] Fix for MVE load/store stack accesses MVE loads and stores have a 7 bit immediate range, scaled by the length of the type. This needs to be taught to the stack estimation code to ensure that an emergency spill slot is reserved in case we run out of registers when materialising stack indices. Also the narrowing loads/stores can be created with frame indices even though they do not accept SP as a register. We need in those cases to make sure we have an emergency register to use as the frame base, as SP can never be used. Differential Revision: https://reviews.llvm.org/D67327 llvm-svn: 372114	2019-09-17 12:58:51 +00:00
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Sam Parker	36c922278e	[ARM][LowOverheadLoops] Add LR def safety check Converting the *LoopStart pseudo instructions into DLS/WLS results in LR being defined. These instructions were inserted on the assumption that LR would already contain the loop counter because a mov is introduced during ISel as the the consumers in the loop can only use LR. That assumption proved wrong! So perform a safety check, finding an appropriate place to insert the DLS/WLS instructions or revert if this isn't possible. Differential Revision: https://reviews.llvm.org/D67539 llvm-svn: 372111	2019-09-17 12:19:32 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Sam Parker	95b28a4c72	[ARM] LE support in ConstantIslands The low-overhead branch extension provides a loop-end 'LE' instruction that performs no decrement nor compare, it just jumps backwards. This patch modifies the constant islands pass to try to insert LE instructions in place of a Thumb2 conditional branch, instead of shrinking it. This only happens if a cmp can be converted to a cbn/z and used to exit the loop. Differential Revision: https://reviews.llvm.org/D67404 llvm-svn: 372085	2019-09-17 09:08:05 +00:00
Sam Parker	26a475afe5	[ARM][MVE] Add invalidForTailPredication to TSFlags Set this bit for the MVE reduction instructions to prevent a loop from becoming tail predicated in their presence. Differential Revision: https://reviews.llvm.org/D67444 llvm-svn: 372076	2019-09-17 07:43:04 +00:00
David Green	8d21460dc5	[ARM] A predicate cast of a predicate cast is a predicate cast The adds some very basic folding of PREDICATE_CASTS, removing cases when they are chained together. These would already be removed eventually, as these are lowered to copies. This just allows it to happen earlier, which can help other simplifications. Differential Revision: https://reviews.llvm.org/D67591 llvm-svn: 372012	2019-09-16 17:29:07 +00:00
Oliver Cruickshank	ee6fbebbaf	[ARM] Add patterns for BSWAP intrinsic on MVE BSWAP can use the VREV instruction on MVE to produce better results than expanding. llvm-svn: 372002	2019-09-16 15:20:10 +00:00
Oliver Cruickshank	e9510a6cad	[ARM] Add patterns for bitreverse intrinsic on MVE BITREVERSE can use the VBRSR which will reverse and right shift. Shifting right by 0 will just reverse the bits. llvm-svn: 372001	2019-09-16 15:20:03 +00:00
Oliver Cruickshank	5f799ef162	[ARM] Lower CTTZ on MVE Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and count the leading zeros, equivalent to a count trailing zeros (CTTZ). llvm-svn: 372000	2019-09-16 15:19:56 +00:00
Oliver Cruickshank	cd1a0b9271	[ARM] Add patterns for CTLZ on MVE CTLZ intrinsic can use the VCLS instruction on MVE, which produces better results than expanding. llvm-svn: 371999	2019-09-16 15:19:49 +00:00
David Green	ce7328cb61	[ARM] Fold VCMP into VPT MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in a single instruction, performing the compare and starting the VPT block in one. This teaches the MVEVPTBlockPass to fold them, searching back through the basicblock for a valid VCMP and creating the VPT from its operands. There are some changes to the VPT instructions to accommodate this, altering the order of the operands to match the VCMP better, and changing P0 register defs to be VPR defs, as is used in other places. Differential Revision: https://reviews.llvm.org/D66577 llvm-svn: 371982	2019-09-16 13:02:41 +00:00
David Green	b325c05732	[ARM] Masked loads and stores Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. Differential Revision: https://reviews.llvm.org/D67186 llvm-svn: 371932	2019-09-15 14:14:47 +00:00
David Green	b7b7f26220	[ARM] Add earlyclobber for cross beat MVE instructions rL367544 added @earlyclobbers for the MVE VREV64 instruction. This adds the same for a number of other 32bit instructions that are similarly unpredictable if the destination equals the source (due to the cross beat nature of the instructions). This includes: VCADD.f32 VCADD.i32 VCMUL.f32 VHCADD.s32 VMULLT/B.s/u32 VQDMLADH{X}.s32 VQRDMLADH{X}.s32 VQDMLSDH{X}.s32 VQRDMLSDH{X}.s32 VQDMULLT/B.s32 with Qm and Rm No tests here as this would require intrinsics (or very interesting codegen) to manifest. The tests will follow naturally as the intrinsics are added. Differential Revision: https://reviews.llvm.org/D67462 llvm-svn: 371838	2019-09-13 11:20:17 +00:00
Sam Tebbs	1572b68509	[ARM] Add support for MVE vmaxv and vminv This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv. Differential Revision: https://reviews.llvm.org/D66413 llvm-svn: 371827	2019-09-13 09:11:46 +00:00
Guillaume Chatelet	af11cc7eb5	[Alignment] Move OffsetToAlignment to Alignment.h Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, JDevlieghere, alexshap, rupprecht, jhenderson Subscribers: sdardis, nemanjai, hiraditya, kbarton, jakehehrlich, jrtc27, MaskRay, atanasyan, jsji, seiya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67499 llvm-svn: 371742	2019-09-12 15:20:36 +00:00
Guillaume Chatelet	97264366fb	[Alignment][NFC] use llvm::Align for AsmPrinter::EmitAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: dschuff, sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67443 llvm-svn: 371616	2019-09-11 13:37:35 +00:00
Guillaume Chatelet	48904e9452	[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference, This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608	2019-09-11 11:16:48 +00:00
Guillaume Chatelet	b6722af068	[Alignment] Use Align for TargetLowering::MinStackArgumentAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67288 llvm-svn: 371498	2019-09-10 09:01:18 +00:00
David Green	2b7089949e	[ARM] Fix loads and stores for predicate vectors These predicate vectors can usually be loaded and stored with a single instruction, a VSTR_P0. However this instruction will store the entire P0 predicate, 16 bits, zeroextended to 32bits. Each lane of the the v4i1/v8i1/v16i1 representing 4/2/1 bits. As far as I understand, when llvm says "store this v4i1", it really does need to store 4 bits (or 8, that being the size of a byte, with this bottom 4 as the interesting bits). For example a bitcast from a v8i1 to a i8 is defined as a store followed by a load, which is how the code is expanded. So this instead lowers the v4i1/v8i1 load/store through some shuffles to get the bits into the correct positions. This, as you might imagine, is not as efficient as a single instruction. But I believe it is needed for correctness. v16i1 equally should not load/store 32bits, only storing the 16bits of data. Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not changing). This is fine as they are self-consistent, it is only "externally observable loads/stores" (from our point of view) that need to be corrected. Differential revision: https://reviews.llvm.org/D67085 llvm-svn: 371419	2019-09-09 16:35:49 +00:00
Simon Tatham	0e48bd24e2	[ARM] Remove some spurious MVE reduction instructions. The family of 'dual-accumulating' vector multiply-add instructions (VMLADAV, VMLALDAV and VRMLALDAVH) can all operate on both signed and unsigned integer types, and they all have an 'exchange' variant (with an X in the name) that modifies which pairs of vector lanes in the two inputs are multiplied together. But there's a clause in the spec that says that the X variants //don't// operate on unsigned integer types, only signed. You can have X, or unsigned, or neither, but not both. We didn't notice that clause when we implemented the MC support for these instructions, so LLVM believes that things like VMLADAVX.U8 do exist, contradicting the spec. Here I fix that by conditioning them out in Tablegen. In order to do that, I've reversed the nesting order of the Tablegen multiclasses for those instructions. Previously, the innermost multiclass generated the X and not-X variants, and the one outside that generated the A and not-A variants. Now X is done by the outer multiclass, which allows me to bypass that one when I only want the two not-X variants. Changing the multiclass nesting order also changes the names of the instruction ids unless I make a special effort not to. I decided that while I was changing them anyway I'd make them look nicer; so now the instructions have names like MVE_VMLADAVs32 or MVE_VMLADAVaxs32, instead of cumbersome _noacc_noexch suffixes. The corresponding multiply-subtract instructions are unaffected. Those don't accept unsigned types at all, either in the spec or in LLVM. Reviewers: ostannard, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67214 llvm-svn: 371405	2019-09-09 15:17:26 +00:00
Sam Parker	1ad508e8e2	[ARM][MVE] VCTP instruction selection Add codegen support for vctp{8,16,32}. Differential Revision: https://reviews.llvm.org/D67344 llvm-svn: 371395	2019-09-09 12:54:47 +00:00
David Green	d936a6301b	[ARM] Prevent generating NEON stack accesses under MVE. We should not be generating Neon stack loads/stores even for these large registers. No test here because my understanding is we will only generate these QQPR regs for intrinsics and VLDn's. The tests will follow once those are available. Differential revision: https://reviews.llvm.org/D67169 llvm-svn: 371386	2019-09-09 10:46:25 +00:00
Oliver Stannard	6b9aedaec6	[ARM][MVE] Decoding of uqrshl and sqrshl accepts unpredictable encodings Specify the Unpredictable bits, and return softfails when appropriate. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66939 llvm-svn: 371374	2019-09-09 08:50:28 +00:00
Sam Parker	c363deb575	[ARM][ParallelDSP] Fix for sext input The incoming accumulator value can be discovered through a sext, in which case there will be a mismatch between the input and the result. So sign extend the accumulator input if we're performing a 64-bit mac. Differential Revision: https://reviews.llvm.org/D67220 llvm-svn: 371370	2019-09-09 08:39:14 +00:00
David Green	df2501adca	[ARM] Remove declaration of unimplemented function. NFC. llvm-svn: 371331	2019-09-08 13:13:15 +00:00
Simon Pilgrim	d7d8bb937a	Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. llvm-svn: 371302	2019-09-07 11:04:04 +00:00
Teresa Johnson	9c27b59cec	Change TargetLibraryInfo analysis passes to always require Function Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example. This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration. Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works. There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome. Reviewers: chandlerc, hfinkel Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66428 llvm-svn: 371284	2019-09-07 03:09:36 +00:00
Oliver Cruickshank	a050307c05	[ARM] Add patterns for VSUB with q and r registers Added patterns for VSUB to support q and r registers, which reduces pressure on q registers. llvm-svn: 371231	2019-09-06 17:02:42 +00:00
Oliver Cruickshank	3aed95af4e	[ARM] Add patterns for VADD with q and r registers Added support for VADD to use q and r registers, which reduces pressure on q registers. llvm-svn: 371230	2019-09-06 17:02:35 +00:00
Oliver Cruickshank	9bf27928e1	[ARM] Add patterns for VMUL with q and r registers Added support for VMUL to use an r register, this reduces pressure on the q registers. llvm-svn: 371229	2019-09-06 17:02:21 +00:00
Sam Tebbs	f1cdd95a2f	[ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers. Loop tests have been added to the vmla test file to make sure vmlas are generated in loops. Differential revision: https://reviews.llvm.org/D66295 llvm-svn: 371218	2019-09-06 16:01:32 +00:00
Guillaume Chatelet	9fcf066d0c	[Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210	2019-09-06 14:51:15 +00:00
Guillaume Chatelet	4fc3ad9e13	[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67229 llvm-svn: 371200	2019-09-06 12:48:34 +00:00
Sam Parker	312409e464	[ARM] MVE Tail Predication The MVE and LOB extensions of Armv8.1m can be combined to enable 'tail predication' which removes the need for a scalar remainder loop after vectorization. Lane predication is performed implicitly via a system register. The effects of predication is described in Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points being: - For vector operations that perform reduction across the vector and produce a scalar result, whether the value is accumulated or not. - For non-load instructions, the predicate flags determine if the destination register byte is updated with the new value or if the previous value is preserved. - For vector store instructions, whether the store occurs or not. - For vector load instructions, whether the value that is loaded or whether zeros are written to that element of the destination register. This patch implements a pass that takes a hardware loop, containing masked vector instructions, and converts it something that resembles an MVE tail predicated loop. Currently, if we had code generation, we'd generate a loop in which the VCTP would generate the predicate and VPST would then setup the value of VPR.PO. The loads and stores would be placed in VPT blocks so this is not tail predication, but normal VPT predication with the predicate based upon a element counting induction variable. Further work needs to be done to finally produce a true tail predicated loop. Because only the loads and stores are predicated, in both the LLVM IR and MIR level, we will restrict support to only lane-wise operations (no horizontal reductions). We will perform a final check on MIR during loop finalisation too. Another restriction, specific to MVE, is that all the vector instructions need operate on the same number of elements. This is because predication is performed at the byte level and this is set on entry to the loop, or by the VCTP instead. Differential Revision: https://reviews.llvm.org/D65884 llvm-svn: 371179	2019-09-06 08:24:41 +00:00
David Candler	a59bffb576	[ARM] Add support for the s,j,x,N,O inline asm constraints A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang: Target independent constraints: s: An integer constant, but allowing only relocatable values ARM specific constraints: j: An immediate integer between 0 and 65535 (valid for MOVW) x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3 N: An immediate integer between 0 and 31 (Thumb1 only) O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only) This patch adds support to Clang for the missing constraints along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but just a couple of minor corrections to checks (V8M Baseline includes MOVW so should work with 'j', 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation. Differential Revision: https://reviews.llvm.org/D65863 Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a llvm-svn: 371079	2019-09-05 15:17:25 +00:00
David Green	83a3341246	[ARM] Fixup the creation of VPT blocks This attempts to just fix the creation of VPT blocks, fixing up the iterating, which instructions are considered in the bundle, and making sure that we do not overrun the end of the block. Differential Revision: https://reviews.llvm.org/D67219 llvm-svn: 371064	2019-09-05 13:37:04 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Sam Parker	fea532230b	[ARM][ParallelDSP] SExt mul for accumulation For any unpaired muls, we accumulate them as an input to the reduction. Check the type of the mul and perform a sext if the existing accumlator input type is not the same. Differential Revision: https://reviews.llvm.org/D66993 llvm-svn: 370851	2019-09-04 08:41:34 +00:00
Amara Emerson	fbaf425b79	[GlobalISel][CallLowering] Add support for splitting types according to calling conventions. On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting for return types not yet implemented. Differential Revision: https://reviews.llvm.org/D66180 llvm-svn: 370822	2019-09-03 21:42:28 +00:00
Reid Kleckner	b2d10cf22e	[MC] Pass through .code16/32/64 and .syntax unified for COFF These flags should simply be passed through to the target, which will do the right thing. Add an MC/X86 test that uses these directives with the three primary object file formats and shows that they disassemble the same everywhere. There is a missing test for .code32 on Windows ARM, since I'm not sure exactly how to construct one. Fixes PR43203 llvm-svn: 370805	2019-09-03 18:16:52 +00:00
David Green	2f3574c168	[ARM] Ignore Implicit CPSR regs when lowering from Machine to MC operands The code here seems to date back to r134705, when tablegen lowering was first being added. I don't believe that we need to include CPSR implicit operands on the MCInst. This now works more like other backends (like AArch64), where all implicit registers are skipped. This allows the AliasInst for CSEL's to match correctly, as can be seen in the test changes. Differential revision: https://reviews.llvm.org/D66703 llvm-svn: 370745	2019-09-03 11:30:54 +00:00
David Green	61973d978b	[ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can also be used in ISel Lowering, adding codesize values to the computed costs, to be able to compare either approximate instruction counts or codesize costs. It also adds a HasLowerConstantMaterializationCost, which compares the ConstantMaterializationCost of two values, returning true if the first is smaller either in instruction count/codesize, or falling back to the other in the case that they are equal. This is used in constant CSEL lowering to invert the predicate if the opposite is easier to materialise. Differential revision: https://reviews.llvm.org/D66701 llvm-svn: 370741	2019-09-03 11:06:24 +00:00
David Green	57cc65ff47	[ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value. This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used. Code by Ranjeet Singh and Simon Tatham, with some modifications from me. Differential revision: https://reviews.llvm.org/D66483 llvm-svn: 370739	2019-09-03 10:53:07 +00:00
David Green	3e8d5f335d	[ARM] Fix MVE ldst offset ranges We were using isShiftedInt<7, Shift>(RHSC) to detect the ranges of offsets to fold into MVE loads/stores. The instructions actually take a 7 bit unsigned integer which is either added or subtracted. So something more like isShiftedUInt<7, Shift>(abs(RHSC)). Instead I've changes this to use the isScaledConstantInRange method, same as in SelectT2AddrModeImm7Offset used by pre/post inc, which seemed to already be getting this correct. Differential revision: https://reviews.llvm.org/D66997 llvm-svn: 370731	2019-09-03 09:57:02 +00:00
Oliver Stannard	3be2df2418	[ARM][MVE] Decoding of VMSR doesn't diagnose some unpredictable encodings Decoding of VMSR doesn't diagnose some unpredictable encodings, as the unpredictable bits are not correctly set. Diff-reduce this instruction's internals WRT VMRS so I can see the differences better. Mostly this is s/src/Rt/g. Fill in the "should-be-(0)" bits. Designate the Unpredictable{} bits for both VMRS and VMSR. Patch by Mark Murray! Differential revision: https://reviews.llvm.org/D66938 llvm-svn: 370729	2019-09-03 09:55:30 +00:00
Oliver Stannard	39bf484d92	Bug fix on function epilog optimization (ARM backend) To save a 'add sp,#val' instruction by adding registers to the final pop instruction, the first register transferred by this pop instruction need to be found. If the function to be optimized has a non-void return value, the operand list contains r0 (implicit) which prevents the optimization to take place. Therefore implicit register references should be skipped in the search loop, because this registers are never popped from the stack. Patch by Rainer Herbertz (rOptimizer)! Differential revision: https://reviews.llvm.org/D66730 llvm-svn: 370728	2019-09-03 09:51:19 +00:00
Sam Tebbs	8b2df85d02	[ARM] Select vmla This patch adds vmla selection. Differential revision: https://reviews.llvm.org/D66297 llvm-svn: 370704	2019-09-03 08:17:46 +00:00
David Green	a95ec59fa5	[ARM] Use MQPR not QPR for MVE registers We should be using MQPR, and if we don't we can get COPYs and PHIs created for QPR. These get folded into instructions, failing verification checks. Differential revision: https://reviews.llvm.org/D66214 llvm-svn: 370676	2019-09-02 17:18:23 +00:00
David Green	8469a39af3	[ARM] Remove MVE masked loads/stores These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues. This reverts r370329. llvm-svn: 370607	2019-09-01 10:11:40 +00:00
David Green	942c2e3795	[ARM] MVE Masked loads and stores Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. We also need to do something with unaligned loads/stores. Currently this uses a similar method used in big endian, using an VLDRB.8 (and potentially a VREV in BE). This does mean that the predicate mask is converted from, for example, a v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of the relevant mask lane, so this could potentially load different results if the predicate is little odd. As the input is a v4i1 however, I believe this is OK and all the bits required should be set in the predicate, making the VLDRB.8 load the same data. Differential Revision: https://reviews.llvm.org/D66534 llvm-svn: 370329	2019-08-29 10:54:35 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
David Green	379f6186dd	[ARM] Move MVEVPTBlockPass to a separate file. NFC This just pulls the MVEVPTBlockPass into a separate file, as opposed to being wrapped up in Thumb2ITBlockPass. Differential revision: https://reviews.llvm.org/D66579 llvm-svn: 370187	2019-08-28 11:37:31 +00:00
David Green	1c5b143c99	[MVE] VMOVX patterns This adds fp16 VMOVX patterns, using the same patterns as rL362482 with some adjustments for MVE. It allows us to move fp16 registers without going into and out of gprs. VMOVX is able to move the top bits from a fp16 in a fp reg into the bottom bits of another register, zeroing the rest. This can be used for odd MVE register lanes. The top bits are not read by fp16 instructions, so no move is required there if we are dealing with even lanes. Differential revision: https://reviews.llvm.org/D66793 llvm-svn: 370184	2019-08-28 10:13:23 +00:00
Sam Parker	a761ba0f2d	[ARM][ParallelDSP] Change search for muls rL369567 reverted a couple of recent changes made to ARMParallelDSP because of a miscompilation error: PR43073. The issue stemmed from an underlying bug that was caused by adding muls into a reduction before it was proved that they could be executed in parallel with another mul. Most of the changes here are from the previously reverted commits. The additional changes have been made area: 1) The Search function now doesn't insert any muls into the Reduction object. That now happens once the search has successfully finished. 2) For any muls added into the reduction but that weren't paired, we accumulate their values as an input into the smlad. Differential Revision: https://reviews.llvm.org/D66660 llvm-svn: 370171	2019-08-28 08:51:13 +00:00
Sam Clegg	90b6bb75e8	[MC] Minor cleanup to MCFixup::Kind handling. NFC. Prefer `MCFixupKind` where possible and add getTargetKind() to convert to `unsigned` when needed rather than scattering cast operators around the place. Differential Revision: https://reviews.llvm.org/D59890 llvm-svn: 369720	2019-08-23 01:00:55 +00:00
Sam Tebbs	a69d9d6156	Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the changes from the above patch. This reverts commit `cd53ff6`, reapplying `7c6b229`. llvm-svn: 369638	2019-08-22 10:29:20 +00:00
Hans Wennborg	cd53ff6c0d	Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32" It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/ > This patch fixes shifts by a 128/256 bit shift amount. It also fixes > codegen for shifts of 32 by delegating to LLVM's default optimisation > instead of emitting a long shift. > > Tests that used to generate long shifts of 32 are updated to check for the > more optimised codegen. > > Differential revision: https://reviews.llvm.org/D66519 > > llvm-svn: 369626 llvm-svn: 369636	2019-08-22 09:16:53 +00:00
Sam Tebbs	7c6b229204	[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 This patch fixes shifts by a 128/256 bit shift amount. It also fixes codegen for shifts of 32 by delegating to LLVM's default optimisation instead of emitting a long shift. Tests that used to generate long shifts of 32 are updated to check for the more optimised codegen. Differential revision: https://reviews.llvm.org/D66519 llvm-svn: 369626	2019-08-22 08:12:06 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Nico Weber	ed18e70c86	Revert r367389 (and follow-up r368404); it caused PR43073. llvm-svn: 369567	2019-08-21 19:53:42 +00:00
David Green	717feabdf0	[ARM] Formatting for ARMInstrMVE.td. NFC This is just some formatting cleanup, prior to the masked load and store patch in D66534. llvm-svn: 369545	2019-08-21 16:20:35 +00:00
Sam Tebbs	dcfc2d40d3	[ARM] Select vaddva This patch adds vaddva selection. Differential revision: https://reviews.llvm.org/D66410 llvm-svn: 369404	2019-08-20 16:33:34 +00:00
Sam Tebbs	f312c1ecf4	[ARM] Add support for MVE vaddv This patch adds vecreduce_add and the relevant instruction selection for vaddv. Differential revision: https://reviews.llvm.org/D66085 llvm-svn: 369245	2019-08-19 09:38:28 +00:00
David Green	2bfc13fde1	[ARM] MVE sext costs This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244	2019-08-19 09:13:22 +00:00
Jian Cai	16fa8b0970	Reland "[ARM] push LR before __gnu_mcount_nc" This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173	2019-08-16 23:30:16 +00:00
Eli Friedman	eaff844fe9	[ARM] Preserve liveness in ARMConstantIslands. We currently don't use liveness information after this point, but it can be useful to catch bugs using -verify-machineinstrs, and optimizations could potentially use this information in the future. Differential Revision: https://reviews.llvm.org/D66319 llvm-svn: 369162	2019-08-16 22:20:14 +00:00
Jian Cai	2d957cfe02	Revert "[ARM] push LR before __gnu_mcount_nc" This reverts commit `f4cf3b9593`. llvm-svn: 369149	2019-08-16 20:40:21 +00:00
Jian Cai	f4cf3b9593	[ARM] push LR before __gnu_mcount_nc Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147	2019-08-16 20:21:08 +00:00
David Green	b782e61e47	[ARM] MVE sext of a load is free MVE also has some sext of loads, which will be free just as scalar instructions are. Differential Revision: https://reviews.llvm.org/D66008 llvm-svn: 369118	2019-08-16 15:13:37 +00:00
David Green	6e1ac42474	[ARM] Correct register for narrowing and widening MVE loads and stores. The widening and narrowing MVE instructions like VLDRH.32 are only permitted to use low tGPR registers. This means that if they are used for a stack slot, where the register used is only decided during frame setup, we need to be able to correctly pick a thumb1 register over a normal GPR. This attempts to add the required logic into eliminateFrameIndex and rewriteT2FrameIndex, only picking the FrameReg if it is a valid register for the operands register class, and picking a valid scratch register for the register class. Differential Revision: https://reviews.llvm.org/D66285 llvm-svn: 369108	2019-08-16 13:42:39 +00:00
David Green	8c2c5f5045	[ARM] Don't pretend we know how to generate MVE VLDn We don't yet know how to generate these instructions for MVE. And in the case of VLD3, we don't even have the instruction. For the moment don't tell the vectoriser that we have VLD4, just to end up serialising the results. Differential Revision: https://reviews.llvm.org/D66009 llvm-svn: 369101	2019-08-16 13:06:49 +00:00
Eli Friedman	9b9a308452	[ARM][LowOverheadLoops] Fix generated code for "revert". Two issues: 1. t2CMPri shouldn't use CPSR if it isn't predicated. This doesn't really have any visible effect at the moment, but it might matter in the future. 2. The t2CMPri generated for t2WhileLoopStart might need to use a register that isn't LR. My team found this because we have a patch to track register liveness late in the pass pipeline. I'll look into upstreaming it to help catch issues like this earlier. Differential Revision: https://reviews.llvm.org/D66243 llvm-svn: 369069	2019-08-15 23:35:53 +00:00
Philip Reames	5c38ca3534	[SDAG] Minor code cleanup/standardization of atomic accessors [NFC] llvm-svn: 369057	2019-08-15 22:21:14 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
David Green	3a99101812	[ARM] Fix alignment checks for BE VLDRH We need to allow any alignment at least 2, not just exactly 2, so that the big endian loads and stores can be selected successfully. I've also added extra BE testing for the load and store tests. Thanks to Oliver for the report. Differential Revision: https://reviews.llvm.org/D66222 llvm-svn: 368996	2019-08-15 12:54:47 +00:00
David Green	0ff2296a49	[ARM] MVE predicate store patterns Stack loads and stores were already working, but direct stores were not. This adds the patterns for them, same as predicate loads. Differential Revision: https://reviews.llvm.org/D66213 llvm-svn: 368988	2019-08-15 10:41:42 +00:00
David Green	04f2f32869	[ARM] MVE trunc to i1 vectors This adds patterns for selecting trunc instructions from full vectors to i1's vectors. Differential Revision: https://reviews.llvm.org/D66201 llvm-svn: 368981	2019-08-15 09:26:51 +00:00
David Green	a655393f17	[ARM] Add MVE beats vector cost model The MVE architecture has the idea of "beats", where a vector instruction can be executed over several ticks of the architecture. This adds a similar system into the Arm backend cost model, multiplying the cost of all vector instructions by a factor. This factor essentially becomes the expected difference between scalar code and vector code, on average. MVE Vector instructions can also overlap so the a true cost of them is often lower. But equally scalar instructions can in some situations be dual issued, or have other optimisations such as unrolling or make use of dsp instructions. The default is chosen as 2. This should not prevent vectorisation is a most cases (as the vector instructions will still be doing at least 4 times the work), but it will help prevent over vectorising in cases where the benefits are less likely. This adds things so far to the obvious places in ARMTargetTransformInfo, and updates a few related costs like not treating float instructions as cost 2 just because they are floats. Differential Revision: https://reviews.llvm.org/D66005 llvm-svn: 368733	2019-08-13 18:12:08 +00:00
Momchil Velikov	114c37e72a	[ARM] Fix detection of duplicates when parsing reg list operands Differential Revision: https://reviews.llvm.org/D65957 llvm-svn: 368712	2019-08-13 16:13:00 +00:00
Momchil Velikov	f990e4a4c7	[ARM] Fix encoding of APSR in CLRM instruction The APSR is encoded by setting bit 15 in the register list of the CLRM instruction (cf. https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf). Differential Revision: https://reviews.llvm.org/D65873 llvm-svn: 368711	2019-08-13 16:12:46 +00:00
Matt Arsenault	5af9cf042f	GlobalISel: Change representation of shuffle masks Currently shufflemasks get emitted as any other constant, and you end up with a bunch of virtual registers of G_CONSTANT with a G_BUILD_VECTOR. The AArch64 selector then asserts on anything that doesn't fit this pattern. This isn't an ideal representation, and should avoid legalization and have fewer opportunities for a representational error. Rather than invent a new shuffle mask operand type, similar to what ShuffleVectorSDNode does, just track the original IR Constant mask operand. I don't completely like the idea of adding another link to the IR, but MIR is already quite dependent on IR constants already, and this will allow sharing the shuffle mask utility functions with the IR. llvm-svn: 368704	2019-08-13 15:34:38 +00:00
Amara Emerson	e14c91b71a	[GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained. Currently we can't keep any state in the selector object that we get from subtarget. As a result we have to plumb through all our variables through multiple functions. This change makes it non-const and adds a virtual init() method to allow further state to be captured for each target. AArch64 makes use of this in this patch to cache a call to hasFnAttribute() which is expensive to call, and is used on each selection of G_BRCOND. Differential Revision: https://reviews.llvm.org/D65984 llvm-svn: 368652	2019-08-13 06:26:59 +00:00
David Green	86876422ef	[ARM] sext of a load is free This teaches the cost model that the sext or zext of a load is going to be free. Differential Revision: https://reviews.llvm.org/D66006 llvm-svn: 368593	2019-08-12 17:39:56 +00:00
David Green	3e39f39ad9	[ARM] MVE shuffle broadcast costs A VDUP will perform a vector broadcast in a single instruction. Update the cost model for MVE accordingly. Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63448 llvm-svn: 368589	2019-08-12 16:54:07 +00:00
David Green	83bbfaa5e4	[ARM] Put some of the TTI costmodel behind hasNeon calls. This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon() checks, now that we have MVE, and updates all the tests accordingly. Differential Revision: https://reviews.llvm.org/D63447 llvm-svn: 368587	2019-08-12 15:59:52 +00:00
David Green	11c4602fce	[MVE] Don't try to unroll vectorised MVE loops Due to the nature of the beat system in the MVE architecture, along with tail predication and low-overhead loops, unrolling has less benefit compared to normal loops. You can not, for example, hide the latency of a load with other instructions as you can for scalar code. Preventing unrolling also makes the code easier to read and reason about. So if a loop contains vector code, don't enable the runtime unrolling. At least for the time being. Differential Revision: https://reviews.llvm.org/D65803 llvm-svn: 368530	2019-08-11 08:53:18 +00:00
David Green	44f8d635e2	[ARM] Permit auto-vectorization using MVE With enough codegen complete, we can now correctly report the number and size of vector registers for MVE, allowing auto vectorisation. This also allows FP auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE FP arithmetic and parity between scalar and vector FP behaviour. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63728 llvm-svn: 368529	2019-08-11 08:42:57 +00:00
Daniel Sanders	e9a57c2b23	[globalisel] Add G_SEXT_INREG Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487	2019-08-09 21:11:20 +00:00
Tim Northover	e1a5f668b3	GlobalISel: pack various parameters for lowerCall into a struct. I've now needed to add an extra parameter to this call twice recently. Not only is the signature getting extremely unwieldy, but just updating all of the callsites and implementations is a pain. Putting the parameters in a struct sidesteps both issues. llvm-svn: 368408	2019-08-09 08:26:38 +00:00
Sam Parker	0dba791a25	[ARM][ParallelDSP] Replace SExt uses As loads are combined and widened, we replaced their sext users operands whereas we should have been replacing the uses of the sext. I've added a load of tests, with only a few of them originally causing assertion failures, the rest improve pattern coverage. Differential Revision: https://reviews.llvm.org/D65740 llvm-svn: 368404	2019-08-09 07:48:50 +00:00
David Green	27ca82f32a	[ARM] Add support for MVE pre and post inc loads and stores This adds pre- and post- increment and decrements for MVE loads and stores. It uses the builtin pre and post load/store detection, unlike Neon. Loads are selected with the code in tryT2IndexedLoad, stores are selected with tablegen patterns. The immediates have a +/-7bit range, multiplied by the size of the element. Differential Revision: https://reviews.llvm.org/D63840 llvm-svn: 368305	2019-08-08 15:27:58 +00:00
David Green	824ffd8b12	[ARM] MVE big endian loads/stores This adds some missing patterns for big endian loads/stores, allowing unaligned loads/stores to also be selected with an extra VREV, which produces better code than aligning through a stack. Also moves VLDR_P0 to not be LE only, and adjusts some of the tests to show all that working. Differential Revision: https://reviews.llvm.org/D65583 llvm-svn: 368304	2019-08-08 15:15:19 +00:00
Sam Tebbs	7ca980edcd	[ARM] Select VFMA llvm-svn: 368264	2019-08-08 08:21:01 +00:00
David Green	1becefd3f7	[ARM] Tighten up VLDRH.32 with low alignments VLDRH needs to have an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixed some incorrect shift amounts, which seemed to be passing a multiple not a shift. Differential Revision: https://reviews.llvm.org/D65580 llvm-svn: 368256	2019-08-08 06:22:03 +00:00
Oliver Cruickshank	4d4eefda6c	[ARM] Expand CTPOP intrinsic for MVE llvm-svn: 368180	2019-08-07 15:47:45 +00:00
Oliver Cruickshank	30dcae0956	[ARM] Generate MVE VHADDs/VHSUBs llvm-svn: 368146	2019-08-07 10:26:57 +00:00
Sam Parker	173de03740	[ARM][LowOverheadLoops] Revert after read/write Currently we check whether LR is stored/loaded to/from inbetween the loop decrement and loop end pseudo instructions. There's two problems here: - It relies on all load/store instructions being labelled as such in tablegen. - Actually any use of loop decrement is troublesome because the value doesn't exist! So we need to check for any read/write of LR that occurs between the two instructions and revert if we find anything. Differential Revision: https://reviews.llvm.org/D65792 llvm-svn: 368130	2019-08-07 07:39:19 +00:00
Matt Arsenault	f4d3113a5f	CodeGen: Migration to using Register llvm-svn: 367974	2019-08-06 03:59:31 +00:00
Amara Emerson	bc1172df14	[GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler() Previous name and comment incorrectly implied it was just for formal arg handlers, which is not true. llvm-svn: 367945	2019-08-05 23:05:28 +00:00
Matt Arsenault	3922392969	AMDGPU: Correct behavior of f16 buffer loads Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879	2019-08-05 15:59:07 +00:00
Guillaume Chatelet	c97a3d15d2	[LLVM][Alignment] Introduce Alignment Type Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828	2019-08-05 11:02:05 +00:00
Oliver Stannard	8ed8353fc4	Reland: Fix and test inter-procedural register allocation for ARM Add an explicit construction of the ArrayRef, gcc 5 and earlier don't seem to select the ArrayRef constructor which takes a C array when the construction is implicit. Original commit message: - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367819	2019-08-05 09:04:10 +00:00
David Green	91296295d0	[ARM] MVE big endian bitcasts This adds big endian MVE patterns for bitcasts. They are defined in llvm as being the same as a store of the existing type and the load into the new. This means that they have to become a VREV between the two types, working in the same way that NEON works in big-endian. This also adds some example tests for bigendian, showing where code is and isn't different. The main difference, especially from a testing perspective is that vectors are passed as v2f64, and so are VREV into and out of call arguments, and the parameters are passed in a v2f64 format. Same happens for inline assembly where the register class is used, so it is VREV to a v16i8. So some of this is probably not correct yet, but it is (mostly) self-consistent and seems to be consistent with how llvm treats vectors. The rest we can hopefully fix later. More details about big endian neon can be found in https://llvm.org/docs/BigEndianNEON.html. Differential Revision: https://reviews.llvm.org/D65581 llvm-svn: 367780	2019-08-04 10:18:15 +00:00
Nikita Popov	4f8259bdbc	[Thumb] Fix invalid symbol redefinition due to duplicated jumptable (PR42760) Fix for https://bugs.llvm.org/show_bug.cgi?id=42760. A tBR_JTr instruction is duplicated by tail duplication, which results in the same jumptable with the same label being emitted twice. Fix this by marking tBR_JTr as not duplicable. The corresponding ARM/Thumb instructions are already marked as not duplicable. Additionally also mark tTBB_JT and tTBH_JT to be consistent with Thumb2, even though this shouldn't be strictly necessary. Differential Revision: https://reviews.llvm.org/D65606 llvm-svn: 367753	2019-08-03 06:47:23 +00:00
Bill Wendling	41a2847a9a	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750	2019-08-03 05:52:47 +00:00
Douglas Yung	42618b270d	Revert Fix and test inter-procedural register allocation for ARM This reverts r367669 (git commit `f6b00c279a`) This was breaking a build bot http://lab.llvm.org:8011/builders/netbsd-amd64/builds/21233 llvm-svn: 367731	2019-08-02 22:11:49 +00:00
Tim Northover	522fb7eedc	GlobalISel: support swiftself attribute llvm-svn: 367683	2019-08-02 14:09:49 +00:00
Oliver Stannard	4b7239ebac	[IPRA][ARM] Disable no-CSR optimisation for ARM This optimisation isn't generally profitable for ARM, because we can save/restore many registers in the prologue and epilogue using the PUSH and POP instructions, but mostly use individual LDR/STR instructions for other spills. Differential revision: https://reviews.llvm.org/D64910 llvm-svn: 367670	2019-08-02 10:23:17 +00:00
Oliver Stannard	f6b00c279a	Fix and test inter-procedural register allocation for ARM - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367669	2019-08-02 10:23:05 +00:00
Sam Parker	cd38599275	[NFC][ARM[ParallelDSP] Rename/remove/change types Remove forward declaration, fold a couple of typedefs and change one to be more useful. llvm-svn: 367665	2019-08-02 08:21:17 +00:00
Sam Parker	14c6dfdfe2	[NFC][ARM][ParallelDSP] Remove ValueList We only care about the first element in the list. llvm-svn: 367660	2019-08-02 07:32:28 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
David Green	1343814fb4	[ARM] Fix for MVE VREV64 The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the cross-beat nature of the instruction. This adds an earlyclobber to Qd, which seems to be the same way we deal with this on other instructions like the write-back on loads and stores. Differential Revision: https://reviews.llvm.org/D65502 llvm-svn: 367544	2019-08-01 11:22:03 +00:00
Sam Parker	7ca8c6f6db	[NFC][ARM][ParallelDSP] Getters and renaming Add a couple of getters for Reduction and do some renaming of variables around CreateSMLAD for clarity. llvm-svn: 367522	2019-08-01 08:17:51 +00:00
Eli Friedman	89b80f1239	[ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1. This is extremely specific, but saves three instructions when it's legal. I don't think the code can be usefully generalized. Differential Revision: https://reviews.llvm.org/D65351 llvm-svn: 367492	2019-07-31 23:19:21 +00:00
Eli Friedman	2f45ec1c39	[ARM] Transform compare of masked value to shift on Thumb1. Thumb1 has very limited immediate modes, so turning an "and" into a shift can save multiple instructions. It's possible to simplify the generated code for test2 and test3 in cmp-and-fold.ll a little more, but I'll implement that as a followup. Differential Revision: https://reviews.llvm.org/D65175 llvm-svn: 367491	2019-07-31 23:17:34 +00:00
Mark Lacey	641ea2e701	[GISel] Address review feedback on passing MD_callees to lowerCall. Preserve the nullptr default for KnownCallees that appears in the base class. llvm-svn: 367477	2019-07-31 20:34:05 +00:00
Mark Lacey	7b8d3eb9e2	[GISel] Pass MD_callees metadata down in call lowering. Summary: This will make it possible to improve IPRA by taking into account register usage in indirect calls. NFC yet; this is just laying the groundwork to start building up patches to take advantage of the information for improved register allocation. Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65488 llvm-svn: 367476	2019-07-31 20:34:02 +00:00
Mikhail Maltsev	806231ecc3	[ARM] Reject CSEL instructions with invalid operands Summary: According to the Armv8.1-M manual CSEL, CSINC, CSINV and CSNEG are "constrained unpredictable" when SP is used as the source register Rn. The assembler should diagnose this case. Reviewers: momchil.velikov, dmgreen, ostannard, simon_tatham, t.p.northover Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65505 llvm-svn: 367433	2019-07-31 14:22:45 +00:00
Oliver Cruickshank	09a1b8172b	[ARM] Generate MVE VFMAs llvm-svn: 367408	2019-07-31 10:44:11 +00:00
Oliver Cruickshank	e7241e8592	[NFC] Test Commit llvm-svn: 367405	2019-07-31 10:08:09 +00:00
Sam Parker	5ea07f7c07	[NFC][ARMCGP] Use switch in isSupportedValue Use a switch instead of many isa<> while checking for supported values. Also be explicit about which cast instructions are supported; This allows the removal of SIToFP from GenerateSignBits. llvm-svn: 367402	2019-07-31 09:34:11 +00:00
Sam Parker	2200a9bdf3	[ARM][ParallelDSP] Convert to function pass Run across a whole function, visiting each basic block one at a time. Differential Revision: https://reviews.llvm.org/D65324 llvm-svn: 367389	2019-07-31 07:32:03 +00:00
Sam Parker	e3a4a13fcc	[ARM][LowOverheadLoops] Enable by default The code is now in a good enough state to pass the bunch of tests that I have run (after fixing the bugs), so let's enable it by default. Differential Revision: https://reviews.llvm.org/D65277 llvm-svn: 367297	2019-07-30 08:14:28 +00:00
Sam Parker	ed2ea3e46b	[ARM][LowOverheadLoops] Revert non-header LE target Revert the hardware loop upon finding a LoopEnd that doesn't target the loop header, instead of asserting a failure. Differential Revision: https://reviews.llvm.org/D65268 llvm-svn: 367296	2019-07-30 08:08:44 +00:00
Sam Parker	414dd1c946	[NFC][ARM[ParallelDSP] Cleanup of BinOpChain - Remove some unused typedefs. - Rename BinOpChain struct to MulCandidate. - Remove the size method of MulCandidate. - Store only the first input of the ValueList provided to MulCandidate, as it's the only value we care about. This means we don't have to perform any ugly (and unnecessary) iterations of the list later on. llvm-svn: 367208	2019-07-29 08:41:51 +00:00
Sam Parker	8538060103	[NFC][ARM][ParallelDSP] Remove AreSymmetrical We explicitly search for a parallel mac and we only care about its inputs, checking for symmetry doesn't add anything here. llvm-svn: 367205	2019-07-29 08:12:24 +00:00
Sam Parker	11ad33ede6	[NFC][ARM][ParallelDSP] Remove PopulateLoads We no longer have to check what loads are used, all this is performed at the start of the transform, so it's not doing anything now. llvm-svn: 367204	2019-07-29 08:07:23 +00:00
David Green	b8b8b46a51	[ARM] MVE VPNOT This adds the patterns required to transform xor P0, -1 to a VPNOT. The instruction operands have to change a little for this, adding an in and an out VCCR reg and using a custom DecodeMVEVPNOT for the decode. Differential Revision: https://reviews.llvm.org/D65133 llvm-svn: 367192	2019-07-28 14:07:48 +00:00
David Green	9cf344e739	[ARM] Better patterns for fp <> predicate vectors These are some better patterns for converting between predicates and floating points. Much like the extends, we select "1"/"-1" or "0" depending on the predicate value. Or we perform a compare against 0 to convert to a predicate. Differential Revision: https://reviews.llvm.org/D65103 llvm-svn: 367191	2019-07-28 13:53:39 +00:00
Sam Parker	3da59e5513	[ARM][ParallelDSP] Combine structs Combine OpChain and BinOpChain structs as OpChain is a base class to BinOpChain that is never used. llvm-svn: 367114	2019-07-26 14:11:40 +00:00
Sam Parker	7440065bd8	[NFC][ARM][ParallelDSP] Cleanup isNarrowSequence Remove unused logic. llvm-svn: 367099	2019-07-26 10:57:42 +00:00
Sam Parker	c760b5da11	[ARM][LowOverheadLoops] Add CPSR defs Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair, so add an implicit def to these pseudo instructions in case that WLS and LE aren't generated. Differential Revision: https://reviews.llvm.org/D65275 llvm-svn: 367089	2019-07-26 08:15:01 +00:00
Pablo Barrio	275954539d	[ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1 Summary: Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1. Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the Arm architecture. Neoverse N1 implements both AArch32 and AArch64. Cortex-A65: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65 Cortex-A65AE: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae Neoverse E1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1 Neoverse N1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1 Patch by Diogo Sampaio and Pablo Barrio Reviewers: samparker, LukeCheeseman, sbaranga, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64406 llvm-svn: 367007	2019-07-25 10:59:45 +00:00
Eli Friedman	82e109279d	[ARM] Remove dead code from ARMConstantIslands. tLDRHi is not a pc-relative load; it can't directly refer to a constant pool or jump table. llvm-svn: 366963	2019-07-24 23:36:14 +00:00
David Green	cd7a6fa314	[ARM] Rewrite how VCMP are lowered, using a single node This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and VCMPZ with an extra operand as the condition code. I believe this will make some combines simpler, allowing us to just look at these codes and not the operands. It also helps fill in a missing VCGTUZ MVE selection without adding extra nodes for it. Differential Revision: https://reviews.llvm.org/D65072 llvm-svn: 366934	2019-07-24 17:36:47 +00:00
David Green	047a0b6575	[ARM] Disable MVE fptosi and friends The prevents us from trying to convert an i1 predicate vector to a float, or vice-versa. Better patterns are possible, which will follow in a subsequent commit. For now we just expand them. Differential Revision: https://reviews.llvm.org/D65066 llvm-svn: 366931	2019-07-24 17:26:26 +00:00
David Green	b342bddbe2	[ARM] More MVE compare vector splat combines for ANDs Adds some extra r register compare combines, this time for ANDs. Differential Revision: https://reviews.llvm.org/D65062 llvm-svn: 366928	2019-07-24 17:08:09 +00:00
David Green	93b5f61295	[ARM] MVE compare vector splat combine MVE VCMP instructions can use a general purpose register as the second operand. This adds the combines for it, selecting from a compare of a vdup. Differential Revision: https://reviews.llvm.org/D65061 llvm-svn: 366924	2019-07-24 16:58:41 +00:00
David Green	bab4d8ac5a	[ARM] Better OR's for MVE compares This adds a DeMorgan combine for OR's of compares to turn them into AND's, helping prevent them from going into and out of gpr registers. It also fills in the VCLE and VCLT nodes that MVE can select, allowing it to invert more compares. Differential Revision: https://reviews.llvm.org/D65059 llvm-svn: 366920	2019-07-24 16:42:09 +00:00
David Green	69fba7434e	[ARM] Better AND's for MVE compares Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where the second vcmp becomes predicated on the first. The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the VPTBlockPass. Differential Revision: https://reviews.llvm.org/D65058 llvm-svn: 366910	2019-07-24 14:42:05 +00:00
David Green	4fc78c496e	[ARM] MVE floating point compares and selects Much like integers, this adds MVE floating point compares and select. It requires a lot more buildvector/shuffle code because we may need to expand the compares without mve.fp, and requires support for and/or because of the way we lower llvm condition codes. Some original code by David Sherwood Differential Revision: https://reviews.llvm.org/D65054 llvm-svn: 366909	2019-07-24 14:28:22 +00:00
David Green	a4a4698c16	[ARM] Basic And/Or/Xor handling for MVE predicates This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It does this by going into and out of GPRs, doing the operation on scalars. Code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65053 llvm-svn: 366907	2019-07-24 14:17:54 +00:00
Simi Pallipurath	724888af45	[ARM] Make sure that the constant pool does not keep in the middle of an IT block. This change make sure that llvm does not emit an invalid IT block by putting the constant pool in the middle of an IT block. We have code to try to avoid putting a constant island in the middle of an IT block, but it only works if we see an IT between the one currently referencing CPE and possible insertion point. If the first instruction we look at is the VLDRD after the IT , we never see the IT and does not realize that the instruction doing the load could be in an IT block itself. Differential Revision: https://reviews.llvm.org/D64621 Change-Id: I24cecb37cded75e8992870bd997f6226853bd920 llvm-svn: 366905	2019-07-24 13:54:14 +00:00
Sjoerd Meijer	a19f5a76e6	Test commit. NFC. Removed 2 trailing whitespaces in 2 files that used to be in different repos to test my new github monorepo workflow. llvm-svn: 366904	2019-07-24 13:30:36 +00:00
David Green	c7e55d4f52	[ARM] MVE predicate register support This adds support code for building and shuffling i1 predicate registers. It generally uses two basic principles, either converting the predicate into an scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or by converting the register to an full vector register and back. Some of the code here is a not super efficient but will hopefully cover most cases of moving i1 vectors around and can be improved in subsequent patches. Some code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65052 llvm-svn: 366890	2019-07-24 11:51:36 +00:00
David Green	b9d96ceca0	[ARM] MVE integer compares and selects This adds the very basics for MVE vector predication, adding integer VCMP and VSEL instruction support. This is done through predicate registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc). An extra VCNE was added, as this can be handled sensibly by MVE's expanded number of VCMP condition codes. (There are also VCLE and VCLT which are added later). VPSEL is also added here, simply selecting on the vselect. Original code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65051 llvm-svn: 366885	2019-07-24 11:08:14 +00:00
Sam Parker	aeb21b96a0	[ARM][ParallelDSP] Fix pointer operand reordering While combining two loads into a single load, we often need to reorder the pointer operands for the new load. This reordering was broken in the cases where there was a chain of values that built up the pointer. Differential Revision: https://reviews.llvm.org/D65193 llvm-svn: 366881	2019-07-24 09:38:39 +00:00
Eli Friedman	b27fc95e89	[ARM] Add opt-bisect support to ARMParallelDSP. llvm-svn: 366851	2019-07-23 20:48:46 +00:00
Sam Parker	57e87dd81b	[ARM][LowOverheadLoops] Fix branch target codegen While lowering test.set.loop.iterations, it wasn't checked how the brcond was using the result and so the wls could branch to the loop preheader instead of not entering it. The same was true for loop.decrement.reg. So brcond and br_cc and now lowered manually when using the hwloop intrinsics. During this we now check whether the result has been negated and whether we're using SETEQ or SETNE and 0 or 1. We can then figure out which basic block the WLS and LE should be targeting. Differential Revision: https://reviews.llvm.org/D64616 llvm-svn: 366809	2019-07-23 14:08:46 +00:00
David Green	fdedf240f8	[ARM] Rename NEONModImm to VMOVModImm. NFC Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE. llvm-svn: 366790	2019-07-23 09:19:24 +00:00
Sam Parker	4379a40088	[ARM][LowOverheadLoops] Revert remaining pseudos ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted. Differential Revision: https://reviews.llvm.org/D65080 llvm-svn: 366691	2019-07-22 14:16:40 +00:00
David Green	8876a312a8	[ARM] Fix for MVE VPT block pass We need to ensure that the number of T's is correct when adding multiple instructions into the same VPT block. Differential revision: https://reviews.llvm.org/D65049 llvm-svn: 366684	2019-07-22 12:51:38 +00:00
Oliver Stannard	6771a89fa0	[IPRA][ARM] Make use of the "returned" parameter attribute ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation. Differential revision: https://reviews.llvm.org/D64986 llvm-svn: 366669	2019-07-22 08:44:36 +00:00
Mikhail Maltsev	0b001f94a5	[ARM] Add <saturate> operand to SQRSHRL and UQRSHLL Summary: According to the new Armv8-M specification https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the instructions SQRSHRL and UQRSHLL now have an additional immediate operand <saturate>. The new assembly syntax is: SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm where <saturate> can be either 64 (the existing behavior) or 48, in that case the result is saturated to 48 bits. The new operand is encoded as follows: #64 Encoded as sat = 0 #48 Encoded as sat = 1 sat is bit 7 of the instruction bit pattern. This patch adds a new assembler operand class MveSaturateOperand which implements parsing and encoding. Decoding is implemented in DecodeMVEOverlappingLongShift. Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64810 llvm-svn: 366555	2019-07-19 09:46:28 +00:00
Diogo N. Sampaio	11512e742b	[ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine Summary: PerformVMOVRRDCombine ommits adding a offset of 4 to the PointerInfo, when converting a f64 = load[M] to {i32, i32} = {load[M], load[M + 4]} Which would allow the machine scheduller to break dependencies with the second load. - pr42638 Reviewers: eli.friedman, dmgreen, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64870 llvm-svn: 366423	2019-07-18 10:05:56 +00:00
Diana Picus	37e403d18c	[ARM GlobalISel] Cleanup CallLowering. NFC Migrate CallLowering::lowerReturnVal to use the same infrastructure as lowerCall/FormalArguments and remove the now obsolete code path from splitToValueTypes. Forgot to push this earlier. llvm-svn: 366308	2019-07-17 10:01:27 +00:00
Kyrylo Tkachov	a3e26d1a6c	[NFC] Test commit: add full stop at end of comment llvm-svn: 366195	2019-07-16 09:15:01 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib//.{cpp,h} ../clang/lib/*/.{cpp,h} ../lld/*/.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
David Green	dc56995c57	[ARM] MVE vector for 64bit types We need to make sure that we are sensibly dealing with vectors of types v2i64 and v2f64, even if most of the time we cannot generate native operations for them. This mostly adds a lot of testing, plus fixes up a couple of the issues found. And, or and xor can be legal for v2i64, and shifts combining needs a slight fixup. Differential Revision: https://reviews.llvm.org/D64316 llvm-svn: 366106	2019-07-15 18:42:54 +00:00
David Green	8e7eee617a	[ARM] Minor formatting in ARMInstrMVE.td. NFC llvm-svn: 366089	2019-07-15 17:29:06 +00:00
David Green	6e89887642	[ARM] MVE Vector Shifts This adds basic lowering for MVE shifts. There are many shifts in MVE, but the instructions handled here are: VSHL (imm) VSHRu (imm) VSHRs (imm) VSHL (vector) VSHL (register) MVE, like NEON before it, doesn't have shift right by a vector (or register). We instead have to negate the amount and shift in the opposite direction. This means we have to convert any SHR's into a form of SHL (that is still signed or unsigned) with a negated condition and selecting from there. MVE still does have shifting by an immediate for SHL, ASR and LSR. This adds lowering for these and for register forms, which work well for shift lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially work optimally for right shifts. Differential Revision: https://reviews.llvm.org/D64212 llvm-svn: 366056	2019-07-15 11:35:39 +00:00
David Green	f059147a10	[ARM] Move Shifts after Bits. NFC This just moves the shift instruction definitions further down the ARMInstrMVE.td file, to make positioning patterns slightly more natural. llvm-svn: 366054	2019-07-15 11:22:05 +00:00
David Green	da750b1688	[ARM] Adjust how NEON shifts are lowered This adjusts the way that we lower NEON shifts to use a DAG target node, not via a neon intrinsic. This is useful for handling MVE shifts operations in the same the way. It also renames some of the immediate shift nodes for consistency, and moves some of the processing of immediate shifts into LowerShift allowing it to capture more cases. Differential Revision: https://reviews.llvm.org/D64426 llvm-svn: 366051	2019-07-15 10:44:50 +00:00
David Green	458a720ec1	[ARM] Add sign and zero extend patterns for MVE The vmovlb instructions can be uses to sign or zero extend vector registers between types. This adds some patterns for them and relevant testing. The VBICIMM generation is also put behind a hasNEON check (as is already done for VORRIMM). Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D64069 llvm-svn: 366008	2019-07-13 15:43:00 +00:00
David Green	07a7ec2021	[ARM] MVE VNEG instruction patterns This selects integer VNEG instructions, which can be especially useful with shifts. Differential Revision: https://reviews.llvm.org/D64204 llvm-svn: 366006	2019-07-13 15:26:51 +00:00
David Green	4ce648b5e8	[ARM] MVE integer abs Similar to floating point abs, we also have instructions for integers. Differential Revision: https://reviews.llvm.org/D64027 llvm-svn: 366005	2019-07-13 14:58:32 +00:00
David Green	701bf714db	[ARM] MVE integer min and max This simply makes the MVE integer min and max instructions legal and adds the relevant patterns for them. Differential Revision: https://reviews.llvm.org/D64026 llvm-svn: 366004	2019-07-13 14:48:54 +00:00
David Green	ac5bcbeb9f	[ARM] MVE VRINT support This adds support for the floor/ceil/trunc/... series of instructions, converting to various forms of VRINT. They use the same suffixes as their floating point counterparts. There is not VTINTR, so nearbyint is expanded. Also added a copysign test, to show it is expanded. Differential Revision: https://reviews.llvm.org/D63985 llvm-svn: 366003	2019-07-13 14:38:53 +00:00
David Green	ec8af0db6c	[ARM] MVE minnm and maxnm instructions This adds the patterns for minnm and maxnm from the fminnum and fmaxnum nodes, similar to scalar types. Original patch by Simon Tatham Differential Revision: https://reviews.llvm.org/D63870 llvm-svn: 366002	2019-07-13 14:29:02 +00:00
Sam Parker	08b4a8da07	[ARM][LowOverheadLoops] Correct offset checking This patch addresses a couple of problems: 1) The maximum supported offset of LE is -4094. 2) The offset of WLS also needs to be checked, this uses a maximum positive offset of 4094. The use of BasicBlockUtils has been changed because the block offsets weren't being initialised, but the isBBInRange checks both positive and negative offsets. ARMISelLowering has been tweaked because the test case presented another pattern that we weren't supporting. llvm-svn: 365749	2019-07-11 09:56:15 +00:00
Simon Tatham	7916198a41	[ARM] Remove nonexistent unsigned forms of MVE VQDMLAH. The VQDMLAH.U8, VQDMLAH.U16 and VQDMLAH.U32 instructions don't actually exist: the Armv8.1-M architecture spec only lists signed forms of that instruction. The unsigned ones were added in error: they existed in an early draft of the spec, but they were removed before the public version, and we missed that particular spec change. Also affects the variant forms VQDMLASH, VQRDMLAH and VQRDMLASH. Reviewers: miyuki Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64502 llvm-svn: 365747	2019-07-11 09:52:15 +00:00
Sam Parker	85ad78b1cf	[ARM][ParallelDSP] Change the search for smlads Two functional changes have been made here: - Now search up from any add instruction to find the chains of operations that we may turn into a smlad. This allows the generation of a smlad which doesn't accumulate into a phi. - The search function has been corrected to stop it falsely searching up through an invalid path. The bulk of the changes have been making the Reduction struct a class and making it more C++y with getters and setters. Differential Revision: https://reviews.llvm.org/D61780 llvm-svn: 365740	2019-07-11 07:47:50 +00:00
Sam Parker	775b2f598a	[NFC][ARM] Convert lambdas to static helpers Break up and convert some of the lambdas in ARMLowOverheadLoops into static functions. llvm-svn: 365623	2019-07-10 12:29:43 +00:00
Mikhail Maltsev	ed143c5d59	[ARM] Enable VPUSH/VPOP aliases when either MVE or VFP is present Summary: Use the same predicates as VSTMDB/VLDMIA since VPUSH/VPOP alias to these. Patch by Momchil Velikov. Reviewers: ostannard, simon_tatham, SjoerdMeijer, samparker, t.p.northover, dmgreen Reviewed By: dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64413 llvm-svn: 365604	2019-07-10 08:59:17 +00:00
Mikhail Maltsev	ee81051fc9	[ARM] Relax constraints on operands of VQxDMLxDH instructions Summary: According to a recently updated Armv8-M spec (https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf) the 32-bit width versions of the following instructions: * VQDMLADH * VQDMLADHX * VQRDMLADH * VQRDMLADHX * VQDMLSDH * VQDMLSDHX * VQRDMLSDH * VQRDMLSDHX are no longer unpredictable when their output register is the same as one of the input registers. This patch updates the assembler parser and the corresponding tests and also removes @earlyclobber from the instruction constraints. Reviewers: simon_tatham, ostannard, dmgreen, SjoerdMeijer, samparker Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64250 llvm-svn: 365306	2019-07-08 09:44:52 +00:00
Fangrui Song	7d63be09b6	[ARM] Fix null pointer dereference in CodeGen/ARM/Windows/stack-protector-msvc.ll.test after D64292/r365283 CLI.CS may not be set. llvm-svn: 365299	2019-07-08 08:43:31 +00:00
Martin Storsjo	8d9d290d4c	[ARM] Add support for MSVC stack cookie checking Heavily based on the same for AArch64, from SVN r346469. Differential Revision: https://reviews.llvm.org/D64292 llvm-svn: 365283	2019-07-07 18:57:31 +00:00
David Green	47afdaa487	[ARM] MVE patterns for VMVN, VORR and VBIC This add simple Q register forms of bitwise not instructions. Differential Revision: https://reviews.llvm.org/D63983 llvm-svn: 365214	2019-07-05 15:21:29 +00:00
David Green	25cf705097	[ARM] MVE VMOV immediate handling This adds some handling for VMOVimm, using the same method that NEON uses. We create VMOVIMM/VMVNIMM/VMOVFPIMM nodes based on the immediate, and select them using the now renamed ARMvmovImm/etc. There is also an extra 64bit immediate mode that I have not yet added here. Code by David Sherwood Differential Revision: https://reviews.llvm.org/D63884 llvm-svn: 365178	2019-07-05 10:02:43 +00:00
David Green	bb7e97d783	[ARM] MVE fp to int conversions This adds the patterns needed for fptosi and sitofp. Differential Revision: https://reviews.llvm.org/D63729 llvm-svn: 365176	2019-07-05 09:34:30 +00:00
David Green	2b20ee4110	[ARM] Favour PL/MI over GE/LT when possible The arm condition codes for GE is N==V (and for LT is N!=V). If the source of flags cannot set V (overflow), such as a cmp against #0, then we can use the simpler PL and MI conditions that only check N. As these PL/MI conditions are simpler than GE/LT, other passes like the peephole optimiser can have a better time optimising away the redundant CMPs. The exception is the VSEL instruction, which cannot take the PL code, so there the transform favours GE. Differential Revision: https://reviews.llvm.org/D64160 llvm-svn: 365117	2019-07-04 08:58:58 +00:00
David Green	d2a9ec29d0	[ARM] MVE bitwise instruction patterns This adds patterns for the simpler VAND, VORR and VEOR bitwise vector instructions. It also adjusts the top16Zero PatLeaf to not match on vector instructions, which can otherwise cause problems. Code written by David Sherwood. Differential Revision: https://reviews.llvm.org/D63867 llvm-svn: 365113	2019-07-04 08:41:23 +00:00
Sam Parker	6005681ac6	[ARM] Fix for NDEBUG builds Fix unused variable warning as well as a nonsense assert. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 365046	2019-07-03 14:39:23 +00:00
Oliver Stannard	830b20344b	[ARM] Thumb2: favor R4-R7 over R12/LR in allocation order when opt for minsize For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow encoding. However, current allocation order is like: R0-R3, R12, LR, R4-R11 As a result, a lot of instructs that use R12/LR will be wide instrs. This patch changes the allocation order to: R0-R7, R12, LR, R8-R11 for thumb2 and -Osize. In most cases, there is no extra push/pop instrs as they will be folded into existing ones. There might be slight performance impact due to more stack usage, so we only enable it when opt for min size. https://reviews.llvm.org/D30324 llvm-svn: 365014	2019-07-03 09:58:52 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Eli Friedman	e97aa961d3	[ARM] Fix unwind info for Thumb1 functions that save high registers. There were two issues here: one, some of the relevant instructions were missing the expected "FrameSetup" flag, and two, ARMAsmPrinter::EmitUnwindingInstruction wasn't expecting "mov" instructions in the prologue. I'm sticking the additional state into ARMFunctionInfo so it's obvious it only applies to the current function. I considered a few alternative approaches where we would compute the correct unwind information as part of the prologue/epilogue lowering, but it seems like a lot of work to introduce pseudo-instructions, and the current code seems to be reliable enough. Fixes https://bugs.llvm.org/show_bug.cgi?id=42408. Differential Revision: https://reviews.llvm.org/D63964 llvm-svn: 364970	2019-07-02 21:35:15 +00:00
Simon Tatham	bffd099d15	[ARM] MVE: allow soft-float ABI to pass vector types. Passing a vector type over the soft-float ABI involves it being split into four GPRs, so the first thing that has to happen at the start of the function is to recombine those into a vector register. The ABI types all vectors as v2f64, so we need to support BUILD_VECTOR for that type, which I do in this patch by allowing it to be expanded in terms of INSERT_VECTOR_ELT, and writing an ISel pattern for that in turn. Similarly, I provide a rule for EXTRACT_VECTOR_ELT so that a returned vector can be marshalled back into GPRs. While I'm here, I've also added ISD::UNDEF to the list of operations we turn back on in `setAllExpand`, because I noticed that otherwise it gets expanded into a BUILD_VECTOR with explicit zero inputs, leading to pointless machine instructions to zero out a vector register that's about to have every lane overwritten of in any case. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63937 llvm-svn: 364910	2019-07-02 11:26:11 +00:00
Simon Tatham	7b63a9533c	[ARM] Stop using scalar FP instructions in integer-only MVE mode. If you compile with `-mattr=+mve` (enabling integer MVE instructions but not floating-point ones), then the scalar FP //registers// exist and it's legal to move things in and out of them, load and store them, but it's not legal to do arithmetic on them. In D60708, the calls to `addRegisterClass` in ARMISelLowering that enable use of the scalar FP registers became conditionalised on `Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so that loads, stores and moves of those registers would work. But I didn't realise that that would also enable all the operations on those types by default. Now, if the target doesn't have basic VFP, we follow up those `addRegisterClass` calls by turning back off all the nontrivial operations you can perform on f32 and f64. That causes several knock-on failures, which are fixed by allowing the `VMOVDcc` and `VMOVScc` instructions to be selected even if all you have is `HasFPRegs`, and adjusting several checks for 'is this a double in a single-precision-only world?' to the more general 'is this any FP type we can't do arithmetic on?'. Between those, the whole of the `float-ops.ll` and `fp16-instructions.ll` tests can now run in MVE-without-FP mode and generate correct-looking code. One odd side effect is that I had to relax the check lines in that test so that they permit test functions like `add_f` to be generated as tailcalls to software FP library functions, instead of ordinary calls. Doing that is entirely legal, but the mystery is why this is the first RUN line that's needed the relaxation: on the usual kind of non-FP target, no tailcalls ever seem to be generated. Going by the llc messages, I think `SoftenFloatResult` must be perturbing the code generation in some way, but that's as much as I can guess. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63938 llvm-svn: 364909	2019-07-02 11:26:00 +00:00
Mikhail Maltsev	8b2e304bc5	[ARM] Fix MVE_VQxDMLxDH instruction class Summary: According to the ARMARM, the VQDMLADH, VQRDMLADH, VQDMLSDH and VQRDMLSDH instructions handle their results as follows: "The base variant writes the results into the lower element of each pair of elements in the destination register, whereas the exchange variant writes to the upper element in each pair". I.e., the initial content of the output register affects the result, as usual, we model this with an additional input. Also, for 32-bit variants Qd is not allowed to be the same register as Qm and Qn, we use @earlyclobber to indicate this. This patch also changes vpred_r to vpred_n because the instructions don't have an explicit 'inactive' operand. Reviewers: dmgreen, ostannard, simon_tatham Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64007 llvm-svn: 364796	2019-07-01 16:07:58 +00:00
Mikhail Maltsev	4a9e3f15bb	[ARM] MVE: support QQPRRegClass and QQQQPRRegClass Summary: QQPRRegClass and QQQQPRRegClass are used by the interleaving/deinterleaving loads/stores to represent sequences of consecutive SIMD registers. Reviewers: ostannard, simon_tatham, dmgreen Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64009 llvm-svn: 364794	2019-07-01 16:05:23 +00:00
Sam Parker	98722691b0	[ARM] WLS/LE Code Generation Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics that guard the loop entry, as well as setting the loop trip count. 2) Lower the BRCOND that uses said intrinsic to an Arm specific node: ARMWLS. 3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart. 4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 364733	2019-07-01 08:21:28 +00:00
Sam Tebbs	e39e958da3	[ARM] Add support for the MVE long shift instructions MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers. The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl. test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions. Differential Revision: https://reviews.llvm.org/D63430 llvm-svn: 364654	2019-06-28 15:43:31 +00:00
David Green	9dbdfe6b78	[ARM] Add MVE mul patterns This simply adds integer and floating point VMUL patterns for MVE, same as we have add and sub. Differential Revision: https://reviews.llvm.org/D63866 llvm-svn: 364643	2019-06-28 11:44:03 +00:00
David Green	2883944035	[ARM] Mark math routines as non-legal for MVE This adds handling and tests for a number of floating point math routines, which have no MVE instructions. Differential Revision: https://reviews.llvm.org/D63725 llvm-svn: 364641	2019-06-28 11:17:38 +00:00
David Green	ff70cbc895	[ARM] MVE patterns for VABS and VNEG This simply adds the required patterns for fp neg and abs. Differential Revision: https://reviews.llvm.org/D63861 llvm-svn: 364640	2019-06-28 10:25:35 +00:00
David Green	eb7080ac6e	[ARM] Widening loads and narrowing stores MVE has instructions to widen as it loads, and narrow as it stores. This adds the required patterns and legalisation to make them work including specifying that they are legal, patterns to select them and test changes. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63839 llvm-svn: 364636	2019-06-28 09:47:55 +00:00
Simon Tatham	29ff1b4f46	[ARM] Fix integer UB in MVE load/store immediate handling. llvm-svn: 364635	2019-06-28 09:28:39 +00:00
David Green	07e53fee14	[ARM] MVE loads and stores This fills in the gaps for basic MVE loads and stores, allowing unaligned access and adding far too many tests. These will become important as narrowing/expanding and pre/post inc are added. Big endian might still not be handled very well, because we have not yet added bitcasts (and I'm not sure how we want it to work yet). I've included the alignment code anyway which maps with our current patterns. We plan to return to that later. Code written by Simon Tatham, with additional tests from Me and Mikhail Maltsev. Differential Revision: https://reviews.llvm.org/D63838 llvm-svn: 364633	2019-06-28 08:41:40 +00:00
David Green	fc4102417b	[ARM] Mark div and rem as expand for MVE We don't have vector operations for these, so they need to be expanded for both integer and float. Differential Revision: https://reviews.llvm.org/D63595 llvm-svn: 364631	2019-06-28 08:18:55 +00:00
David Green	62889b0ea5	[ARM] Select MVE fp add and sub The same as integer arithmetic, we can add simple floating point MVE addition and subtraction patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63257 llvm-svn: 364629	2019-06-28 07:41:09 +00:00
David Green	be05b85db9	[ARM] Select MVE add and sub This adds the first few patterns for MVE code generation, adding simple integer add and sub patterns. Initial code by David Sherwood Differential Revision: https://reviews.llvm.org/D63255 llvm-svn: 364627	2019-06-28 07:21:11 +00:00
David Green	8be372b190	[ARM] MVE vector shuffles This patch adds necessary shuffle vector and buildvector support for ARM MVE. It essentially adds support for VDUP, VREVs and some VMOVs, which are often required by other code (like upcoming patches). This mostly uses the same code from Neon that already generated NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and moved to ARMInstrInfo as they are common to both architectures. Most of the selection code seems to be applicable to both, but NEON does have some more instructions making some parts specific. Most code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63567 llvm-svn: 364626	2019-06-28 07:08:42 +00:00
Sam Tebbs	8747c5f482	[ARM] Fix formatting issue in ARMISelLowering.cpp Fix a formatting error in ARMISelLowering.cpp::Expand64BitShift. My test commit after receiving write access. llvm-svn: 364560	2019-06-27 16:28:28 +00:00
Simon Tatham	1a3dc8f678	[ARM] Fix bogus assertions in copyPhysReg v8.1-M cases. The code to generate register move instructions in and out of VPR and FPSCR_NZCV had assertions checking that the other register involved was a GPR _pair_, instead of a single GPR as it should have been. Reviewers: miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63865 llvm-svn: 364534	2019-06-27 12:41:12 +00:00
Simon Tatham	ffb2b347ff	[ARM] Fix handling of zero offsets in LOB instructions. The BF and WLS/WLSTP instructions have various branch-offset fields occupying different positions and lengths in the instruction encoding, and all of them were decoded at disassembly time by the function DecodeBFLabelOffset() which returned SoftFail if the offset was zero. In fact, it's perfectly fine and not even a SoftFail for most of those offset fields to be zero. The only one that can't be zero is the 4-bit field labelled `boff` in the architecture spec, occupying bits {26-23} of the BF instruction family. If that one is zero, the encoding overlaps other instructions (WLS, DLS, LETP, VCTP), so it ought to be a full Fail. Fixed by adding an extra template parameter to DecodeBFLabelOffset which controls whether a zero offset is accepted or rejected. Adjusted existing tests (only in error messages for bad disassemblies); added extra tests to demonstrate zero offsets being accepted in all the right places, and a few demonstrating rejection of zero `boff`. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63864 llvm-svn: 364533	2019-06-27 12:41:07 +00:00
Simon Tatham	e5ce56fb95	[ARM] Make coprocessor number restrictions consistent. Different versions of the Arm architecture disallow the use of generic coprocessor instructions like MCR and CDP on different sets of coprocessors. This commit centralises the check of the coprocessor number so that it's consistent between assembly and disassembly, and also updates it for the new restrictions in Arm v8.1-M. New tests added that check all the coprocessor numbers; old tests updated, where they used a number that's now become illegal in the context in question. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63863 llvm-svn: 364532	2019-06-27 12:40:55 +00:00
Simon Tatham	02449f9c3c	[ARM] Tighten restrictions on use of SP in v8.1-M CSEL. In the `CSEL Rd,Rm,Rn` instruction family (also including CSINC, CSINV and CSNEG), the architecture lists it as CONSTRAINED UNPREDICTABLE (i.e. SoftFail) to use SP in the Rd or Rm slot, but outright illegal to use it in the Rn slot, not least because some encodings of that form are used by MVE instructions such as UQRSHLL. MC was treating all three slots the same, as SoftFail. So the only reason UQRSHLL was disassembled correctly at all was because the MVE decode table is separate from the Thumb2 one and takes priority; if you turned off MVE, then encodings such as `[0x5f,0xea,0x0d,0x83]` would disassemble as spurious CSELs. Fixed by inventing another version of the `GPRwithZR` register class, which disallows SP completely instead of just SoftFailing it. Reviewers: DavidSpickett, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63862 llvm-svn: 364531	2019-06-27 12:40:40 +00:00
Diana Picus	43fb5ae50c	[GlobalISel] Accept multiple vregs for lowerCall's args Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63551 llvm-svn: 364512	2019-06-27 09:18:03 +00:00
Diana Picus	8138996128	[GlobalISel] Accept multiple vregs for lowerCall's result Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63550 llvm-svn: 364511	2019-06-27 09:15:53 +00:00
Diana Picus	c3dbe23977	[GlobalISel] Accept multiple vregs in lowerFormalArgs Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510	2019-06-27 08:54:17 +00:00
Diana Picus	69ce1c1319	[GlobalISel] Allow multiple VRegs in ArgInfo. NFC Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches. Differential Revision: https://reviews.llvm.org/D63548 llvm-svn: 364509	2019-06-27 08:50:53 +00:00
Eli Friedman	ab1d73ee32	[ARM] Don't reserve R12 on Thumb1 as an emergency spill slot. The current implementation of ThumbRegisterInfo::saveScavengerRegister is bad for two reasons: one, it's buggy, and two, it blocks using R12 for other optimizations. So this patch gets rid of it, and adds the necessary support for using an ordinary emergency spill slot on Thumb1. (Specifically, I think saveScavengerRegister was broken by r305625, and nobody noticed for two years because the codepath is almost never used. The new code will also probably not be used much, but it now has better tests, and if we fail to emit a necessary emergency spill slot we get a reasonable error message instead of a miscompile.) A rough outline of the changes in the patch: 1. Gets rid of ThumbRegisterInfo::saveScavengerRegister. 2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an emergency spill slot for Thumb1. 3. Implements useFPForScavengingIndex, so the emergency spill slot isn't placed at a negative offset from FP on Thumb1. 4. Modifies the heuristics for allocating an emergency spill slot to support Thumb1. This includes fixing ExtraCSSpill so we don't try to use "lr" as a substitute for allocating an emergency spill slot. 5. Allocates a base pointer in more cases, so the emergency spill slot is always accessible. 6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the right offset in the new cases where we're forcing a base pointer. 7. Ensures we never generate a load or store with an offset outside of its frame object. This makes the heuristics more straightforward. 8. Changes Thumb1 prologue and epilogue emission so it never uses register scavenging. Some of the changes to the emergency spill slot heuristics in determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow the compiler to avoid allocating an emergency spill slot in cases where it isn't necessary. The rest of the changes should only affect Thumb1. Differential Revision: https://reviews.llvm.org/D63677 llvm-svn: 364490	2019-06-26 23:46:51 +00:00
Mikhail Maltsev	6dcbb3161e	[ARM] Handle fixup_arm_pcrel_9 correctly on big-endian targets Summary: The getFixupKindContainerSizeBytes function returns the size of the instruction containing a given fixup. Currently fixup_arm_pcrel_9 is not handled in this function, this causes an assertion failure in the debug build and incorrect codegen in the release build. This patch fixes the problem. Reviewers: ostannard, simon_tatham Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63778 llvm-svn: 364404	2019-06-26 10:48:40 +00:00
Fangrui Song	6a4c68e187	[ARM] Fix -Wimplicit-fallthrough after D60709/r364331 llvm-svn: 364376	2019-06-26 02:34:10 +00:00
Simon Tatham	e8de8ba6a6	[ARM] Support inline assembler constraints for MVE. "To" selects an odd-numbered GPR, and "Te" an even one. There are some 8.1-M instructions that have one too few bits in their register fields and require registers of particular parity, without necessarily using a consecutive even/odd pair. Also, the constraint letter "t" should select an MVE q-register, when MVE is present. This didn't need any source changes, but some extra tests have been added. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60709 llvm-svn: 364331	2019-06-25 16:49:32 +00:00
Simon Tatham	a4b415a683	[ARM] Code-generation infrastructure for MVE. This provides the low-level support to start using MVE vector types in LLVM IR, loading and storing them, passing them to __asm__ statements containing hand-written MVE vector instructions, and if you have the hard-float ABI turned on, using them as function parameters. (In the soft-float ABI, vector types are passed in integer registers, and combining all those 32-bit integers into a q-reg requires support for selection DAG nodes like insert_vector_elt and build_vector which aren't implemented yet for MVE. In fact I've also had to add `arm_aapcs_vfpcc` to a couple of existing tests to avoid that problem.) Specifically, this commit adds support for: * spills, reloads and register moves for MVE vector registers * ditto for the VPT predication mask that lives in VPR.P0 * make all the MVE vector types legal in ISel, and provide selection DAG patterns for BITCAST, LOAD and STORE * make loads and stores of scalar FP types conditional on `hasFPRegs()` rather than `hasVFP2Base()`. As a result a few existing tests needed their llc command lines updating to use `-mattr=-fpregs` as their method of turning off all hardware FP support. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60708 llvm-svn: 364329	2019-06-25 16:48:46 +00:00
Sam Parker	bcf0eb7a64	[ARM] Fix for DLS/LE CodeGen The expensive buildbots highlighted the mir tests were broken, which I've now updated and added --verify-machineinstrs to them. This also uncovered a couple of bugs in the backend pass, so these have also been fixed. llvm-svn: 364323	2019-06-25 15:11:17 +00:00
Fangrui Song	807d2f442a	[ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692 llvm-svn: 364312	2019-06-25 13:28:44 +00:00
Simon Tatham	287f0403e3	[ARM] Fix buildbot failure due to -Werror. Including both 'case ARM_AM::uxtw' and 'default' in the getShiftOp switch caused a buildbot to fail with error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] llvm-svn: 364300	2019-06-25 12:23:46 +00:00
Sjoerd Meijer	74ec25a197	[ARM] MVE VPT Blocks A minor iteration on the MVE VPT Block pass to enable more efficient VPT Block code generation: consecutive VPT predicated statements, predicated on the same condition, will be placed within the same VPT Block. This essentially is also an exercise to write some more tests for the next step, which should be more generic also merging instructions when they are not consecutive. Differential Revision: https://reviews.llvm.org/D63711 llvm-svn: 364298	2019-06-25 12:04:31 +00:00
Simon Tatham	4cf18c2849	[ARM] Explicit lowering of half <-> double conversions. If an FP_EXTEND or FP_ROUND isel dag node converts directly between f16 and f32 when the target CPU has no instruction to do it in one go, it has to be done in two steps instead, going via f32. Previously, this was done implicitly, because all such CPUs had the storage-only implementation of f16 (i.e. the only thing you can do with one at all is to convert it to/from f32). So isel would legalize the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp node (or vice versa), and then the fp_extend would already be f32->f64 rather than f16->f64. But that technique can't support a target CPU which has full f16 support but _not_ f64, such as some variants of Arm v8.1-M. So now we provide custom lowering for FP_EXTEND and FP_ROUND, which checks support for f16 and f64 and decides on the best thing to do given the combination of flags it gets back. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60692 llvm-svn: 364294	2019-06-25 11:24:50 +00:00
Simon Tatham	86b7a1e660	[ARM] Add remaining miscellaneous MVE instructions. This final batch includes the tail-predicated versions of the low-overhead loop instructions (LETP); the VPSEL instruction to select between two vector registers based on the predicate mask without having to open a VPT block; and VPNOT which complements the predicate mask in place. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62681 llvm-svn: 364292	2019-06-25 11:24:33 +00:00
Simon Tatham	e6824160dd	[ARM] Add MVE vector load/store instructions. This adds the rest of the vector memory access instructions. It includes contiguous loads/stores, with an ordinary addressing mode such as [r0,#offset] (plus writeback variants); gather loads and scatter stores with a scalar base address register and a vector of offsets from it (written [r0,q1] or similar); and gather/scatters with a vector of base addresses (written [q0,#offset], again with writeback). Additionally, some of the loads can widen each loaded value into a larger vector lane, and the corresponding stores narrow them again. To implement these, we also have to add the addressing modes they need. Also, in AsmParser, the `isMem` query function now has subqueries `isGPRMem` and `isMVEMem`, according to which kind of base register is used by a given memory access operand. I've also had to add an extra check in `checkTargetMatchPredicate` in the AsmParser, without which our last-minute check of `rGPR` register operands against SP and PC was failing an assertion because Tablegen had inserted an immediate 0 in place of one of a pair of tied register operands. (This matches the way the corresponding check for `MCK_rGPR` in `validateTargetOperandClass` is guarded.) Apparently the MVE load instructions were the first to have ever triggered this assertion, but I think only because they were the first to have a combination of the usual Arm pre/post writeback system and the `rGPR` class in particular. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62680 llvm-svn: 364291	2019-06-25 11:24:18 +00:00
Sam Parker	a6fd919cb3	[ARM] DLS/LE low-overhead loop code generation Introduce three pseudo instructions to be used during DAG ISel to represent v8.1-m low-overhead loops. One maps to set_loop_iterations while loop_decrement_reg is lowered to two, so that we can separate the decrement and branching operations. The pseudo instructions are expanded pre-emission, where we can still decide whether we actually want to generate a low-overhead loop, in a new pass: ARMLowOverheadLoops. The pass currently bails, reverting to an sub, icmp and br, in the cases where a call or stack spill/restore happens between the decrement and branching instructions, or if the loop is too large. Differential Revision: https://reviews.llvm.org/D63476 llvm-svn: 364288	2019-06-25 10:45:51 +00:00
Matt Arsenault	faeaedf8e9	GlobalISel: Remove unsigned variant of SrcOp Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194	2019-06-24 16:16:12 +00:00
Matt Arsenault	e3a676e9ad	CodeGen: Introduce a class for registers Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191	2019-06-24 15:50:29 +00:00
Simon Tatham	fe8017621e	[ARM] Add MVE interleaving load/store family. This adds the family of loads and stores with names like VLD20.8 and VST42.32, which load and store parts of multiple q-registers in such a way that executing both VLD20 and VLD21, or all four of VLD40..VLD43, will distribute 2 or 4 vectors' worth of memory data across the lanes of the same number of registers but in a transposed order. In addition to the Tablegen descriptions of the instructions themselves, this patch also adds encode and decode support for the QQPR and QQQQPR register classes (representing the range of loaded or stored vector registers), and tweaks to the parsing system for lists of vector registers to make it return the right format in this case (since, unlike NEON, MVE regards q-registers as primitive, and not just an alias for two d-registers). llvm-svn: 364172	2019-06-24 10:00:39 +00:00
Simon Pilgrim	bdea88325f	Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. llvm-svn: 364068	2019-06-21 16:11:18 +00:00
Simon Tatham	0c7af66450	[ARM] Add MVE 64-bit GPR <-> vector move instructions. These instructions let you load half a vector register at once from two general-purpose registers, or vice versa. The assembly syntax for these instructions mentions the vector register name twice. For the move _into_ a vector register, the MC operand list also has to mention the register name twice (once as the output, and once as an input to represent where the unchanged half of the output register comes from). So we can conveniently assign one of the two asm operands to be the output $Qd, and the other $QdSrc, which avoids confusing the auto-generated AsmMatcher too much. For the move _from_ a vector register, there's no way to get round the fact that both instances of that register name have to be inputs, so we need a custom AsmMatchConverter to avoid generating two separate output MC operands. (And even that wouldn't have worked if it hadn't been for D60695.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62679 llvm-svn: 364041	2019-06-21 13:17:23 +00:00
Simon Tatham	bafb105e96	[ARM] Add MVE vector instructions that take a scalar input. This adds the `MVE_qDest_rSrc` superclass and all its instances, plus a few other instructions that also take a scalar input register or two. I've also belatedly added custom diagnostic messages to the operand classes for odd- and even-numbered GPRs, which required matching changes in two of the existing MVE assembly test files. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62678 llvm-svn: 364040	2019-06-21 13:17:08 +00:00
Simon Tatham	a6b6a15701	[ARM] Add a batch of similarly encoded MVE instructions. Summary: This adds the `MVE_qDest_qSrc` superclass and all instructions that inherit from it. It's not the complete class of _everything_ with a q-register as both destination and source; it's a subset of them that all have similar encodings (but it would have been hopelessly unwieldy to call it anything like MVE_111x11100). This category includes add/sub with carry; long multiplies; halving multiplies; multiply and accumulate, and some more complex instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62677 llvm-svn: 364037	2019-06-21 12:13:59 +00:00
Fangrui Song	d5cf95e41c	[ARM] Fix -Wimplicit-fallthrough after D62675 llvm-svn: 364028	2019-06-21 11:19:11 +00:00
Simon Tatham	7d76f8acf0	[ARM] Add MVE vector compare instructions. Summary: These take a pair of vector register to compare, and a comparison type (written in the form of an Arm condition suffix); they output a vector of booleans in the VPR register, where predication can conveniently use them. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62676 llvm-svn: 364027	2019-06-21 11:14:51 +00:00
Simon Tatham	c9b2cd4674	[ARM] Add a batch of MVE floating-point instructions. Summary: This includes floating-point basic arithmetic (add/sub/multiply), complex add/multiply, unary negation and absolute value, rounding to integer value, and conversion to/from integer formats. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62675 llvm-svn: 364013	2019-06-21 09:35:07 +00:00
Fangrui Song	dc8de6037c	Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC llvm-svn: 364006	2019-06-21 05:40:31 +00:00
Eli Friedman	25f08a17c3	[ARM GlobalISel] Add support for s64 G_ADD and G_SUB. Teach RegisterBankInfo to use the correct register class, and tell the legalizer it's legal. Everything else just works. The one thing that's slightly weird about this compared to SelectionDAG isel is that legalization can't distinguish between i64 and <1 x i64>, so we might end up with more NEON instructions than the user expects. Differential Revision: https://reviews.llvm.org/D63585 llvm-svn: 363989	2019-06-20 21:56:47 +00:00
Simon Tatham	232db11020	[ARM] Add a batch of MVE integer instructions. This includes integer arithmetic of various kinds (add/sub/multiply, saturating and not), and the immediate forms of VMOV and VMVN that load an immediate into all lanes of a vector. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62674 llvm-svn: 363936	2019-06-20 15:16:56 +00:00
Eli Friedman	d88e28d13e	[llvm-objdump] Switch between ARM/Thumb based on mapping symbols. The ARMDisassembler changes allow changing between ARM and Thumb mode based on the MCSubtargetInfo, rather than the Target, which simplifies the other changes a bit. I'm not really happy with adding more target-specific logic to tools/llvm-objdump/, but there isn't any easy way around it: the logic in question specifically applies to disassembling an object file, and that code simply isn't located in lib/Target, at least at the moment. Differential Revision: https://reviews.llvm.org/D60927 llvm-svn: 363903	2019-06-20 00:29:40 +00:00
Simon Tatham	2f5188fd58	[ARM] Add MVE vector bit-operations (register inputs). This includes all the obvious bitwise operations (AND, OR, BIC, ORN, MVN) in register-to-register forms, and the immediate forms of AND/OR/BIC/ORN; byte-order reverse instructions; and the VMOVs that access a single lane of a vector. Some of those VMOVs (specifically, the ones that access a 32-bit lane) share an encoding with existing instructions that were disassembled as accessing half of a d-register (e.g. `vmov.32 r0, d1[0]`), but in 8.1-M they're now written as accessing a quarter of a q-register (e.g. `vmov.32 r0, q0[2]`). The older syntax is still accepted by the assembler. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62673 llvm-svn: 363838	2019-06-19 16:43:53 +00:00
Chen Zheng	c5b918de58	[NFC] move some hardware loop checking code to a common place for other using. Differential Revision: https://reviews.llvm.org/D63478 llvm-svn: 363758	2019-06-19 01:26:31 +00:00
Huihui Zhang	d16779a732	[ARM] Comply with rules on ARMv8-A thumb mode partial deprecation of IT. Summary: When identifing instructions that can be folded into a MOVCC instruction, checking for a predicate operand is not enough, also need to check for thumb2 function, with restrict-IT, is the machine instruction eligible for ARMv8 IT or not. Notes in ARMv8-A Architecture Reference Manual, section "Partial deprecation of IT" https://usermanual.wiki/Pdf/ARM20Architecture20Reference20ManualARMv8.1667877052.pdf "ARMv8-A deprecates some uses of the T32 IT instruction. All uses of IT that apply to instructions other than a single subsequent 16-bit instruction from a restricted set are deprecated, as are explicit references to the PC within that single 16-bit instruction. This permits the non-deprecated forms of IT and subsequent instructions to be treated as a single 32-bit conditional instruction." Reviewers: efriedma, lebedev.ri, t.p.northover, jmolloy, aemerson, compnerd, stoklund, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63474 llvm-svn: 363739	2019-06-18 20:55:09 +00:00
Simon Tatham	cfc70782d7	[ARM] Add MVE vector shift instructions. This includes saturating and non-saturating shifts, both with immediate shift count and with the shift counts given by another vector register; VSHLC (in which the bits shifted out of each active vector lane are shifted in to the next active lane); and also VMOVL, which is enough like an immediate shift that it didn't fit too badly in this category. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62672 llvm-svn: 363696	2019-06-18 16:19:59 +00:00
Simon Tatham	faaf1a5366	[ARM] Add MVE integer vector min/max instructions. Summary: These form a small family of their own, to go with the floating-point VMINNM/VMAXNM instructions added in a previous commit. They introduce the first of many special cases in the mnemonic recognition code, because VMIN with the E suffix used by the VPT predication system needs to avoid being interpreted as the nonexistent instruction 'VMI' with an ordinary 'NE' condition suffix. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62671 llvm-svn: 363695	2019-06-18 15:51:46 +00:00
Simon Tatham	ed4a602515	[ARM] Rename MVE instructions in Tablegen for consistency. Summary: Their names began with a mishmash of `MVE_`, `t2` and no prefix at all. Now they all start with `MVE_`, which seems like a reasonable choice on the grounds that (a) NEON is the thing they're most at risk of being confused with, and (b) MVE implies Thumb-2, so a prefix indicating MVE is strictly more specific than one indicating Thumb-2. Reviewers: ostannard, SjoerdMeijer, dmgreen Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63492 llvm-svn: 363690	2019-06-18 15:05:42 +00:00
Sjoerd Meijer	7a7009f7c8	[ARM] Some Thumb2ITBlock clean ups. NFC Some more refactoring, like registering the IT Block pass, less cryptic variable names, and some simplification of loops. Differential Revision: https://reviews.llvm.org/D63419 llvm-svn: 363666	2019-06-18 12:13:11 +00:00
Sam Parker	1bd3d00e7e	[CodeGen] Check for HardwareLoop Latch ExitBlock The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556	2019-06-17 13:39:28 +00:00
Fangrui Song	4bde5d3c08	[ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 llvm-svn: 363535	2019-06-17 09:29:50 +00:00
Fangrui Song	89d6905c59	[ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 llvm-svn: 363534	2019-06-17 09:26:50 +00:00
Sam Parker	a059efa885	[ARM] Remove ARMComputeBlockSize Forgot to remove file! llvm-svn: 363532	2019-06-17 09:13:10 +00:00
Sam Parker	f7c0b3aeb2	[ARM] Add ARMBasicBlockInfo.cpp Forgot to add file! llvm-svn: 363531	2019-06-17 09:05:43 +00:00
Sam Parker	966f4e874e	[ARM] Extract some code from ARMConstantIslandPass Create the ARMBasicBlockUtils class for tracking and querying basic blocks sizes so we can use them when generating low-overhead loops. Differential Revision: https://reviews.llvm.org/D63265 llvm-svn: 363530	2019-06-17 08:49:09 +00:00
Mikhail Maltsev	d1cc2e1543	[ARM] Add MVE horizontal accumulation instructions This is the family of vector instructions that combine all the lanes in their input vector(s), and output a value in one or two GPRs. Differential Revision: https://reviews.llvm.org/D62670 llvm-svn: 363403	2019-06-14 14:31:13 +00:00
Sjoerd Meijer	3058a62b90	[ARM] MVE VPT Block Pass Initial commit of a new pass to create vector predication blocks, called VPT blocks, that are supported by the Armv8.1-M MVE architecture. This is a first naive implementation. I.e., for 2 consecutive predicated instructions I1 and I2, for example, it will generate 2 VPT blocks: VPST I1 VPST I2 A more optimal implementation would obviously put instructions in the same VPT block when they are predicated on the same condition and when it is allowed to do this: VPTT I1 I2 We will address this optimisation with follow up patches when the groundwork is in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis that we need for the more optimal implementation. VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes cannot interact with each other. Instructions allowed in VPT blocks must be MVE instructions that are marked as VPT compatible. Differential Revision: https://reviews.llvm.org/D63247 llvm-svn: 363370	2019-06-14 11:46:05 +00:00
Simon Tatham	286e1d2c2d	[ARM] Set up infrastructure for MVE vector instructions. This commit prepares the way to start adding the main collection of MVE instructions, which operate on the 128-bit vector registers. The most obvious thing that's needed, and the simplest, is to add the MQPR register class, which is like the existing QPR except that it has fewer registers in it. The more complicated part: MVE defines a system of vector predication, in which instructions operating on 128-bit vector registers can be constrained to operate on only a subset of the lanes, using a system of prefix instructions similar to the existing Thumb IT, in that you have one prefix instruction which designates up to 4 following instructions as subject to predication, and within that sequence, the predicate can be inverted by means of T/E suffixes ('Then' / 'Else'). To support instructions of this type, we've added two new Tablegen classes `vpred_n` and `vpred_r` for standard clusters of MC operands to add to a predicated instruction. Both include a flag indicating how the instruction is predicated at all (options are T, E and 'not predicated'), and an input register field for the register controlling the set of active lanes. They differ from each other in that `vpred_r` also includes an input operand for the previous value of the output register, for instructions that leave inactive lanes unchanged. `vpred_n` lacks that extra operand; it will be used for instructions that don't preserve inactive lanes in their output register (either because inactive lanes are zeroed, as the MVE load instructions do, or because the output register isn't a vector at all). This commit also adds the family of prefix instructions themselves (VPT / VPST), and all the machinery needed to work with them in assembly and disassembly (e.g. generating the 't' and 'e' mnemonic suffixes on disassembled instructions within a predicated block) I've added a couple of demo instructions that derive from the new Tablegen base classes and use those two operand clusters. The bulk of the vector instructions will come in followup commits small enough to be manageable. (One exception is that I've added the full version of `isMnemonicVPTPredicable` in the AsmParser, because it seemed pointless to carefully split it up.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62669 llvm-svn: 363258	2019-06-13 13:11:13 +00:00
Simon Tatham	848d3d0d2c	[ARM] Refactor handling of IT mask operands. During assembly, the mask operand to an IT instruction (storing the sequence of T/E for 'Then' and 'Else') is parsed out of the mnemonic into a representation that encodes 'Then' and 'Else' in the same way regardless of the condition code. At some point during encoding it has to be converted into the instruction encoding used in the architecture, in which the mask encodes a sequence of replacement low-order bits for the condition code, so that which bit value means 'then' and which 'else' depends on whether the original condition code had its low bit set. Previously, that transformation was done by processInstruction(), half way through assembly. So an MCOperand storing an IT mask would sometimes store it in one format, and sometimes in the other, depending on where in the assembly pipeline you were. You can see this in diagnostics from `llvm-mc -debug -triple=thumbv8a -show-inst`, for example: if you give it an instruction such as `itete eq`, you'd see an `<MCOperand Imm:5>` in a diagnostic become `<MCOperand Imm:11>` in the final output. Having the same data structure store values with time-dependent semantics is confusing already, and it will get more confusing when we introduce the MVE VPT instruction which reuses the Then/Else bitmask idea in a different context. So I'm refactoring: now, all `ARMOperand` and `MCOperand` representations of an IT mask work exactly the same way, namely, 0 means 'Then' and 1 means 'Else', regardless of what original predicate is being referred to. The architectural encoding of IT that depends on the original condition is now constructed at the point when we turn the `MCOperand` into the final instruction bit pattern, and decoded similarly in the disassembler. The previous condition-independent parse-time format used 0 for Else and 1 for Then. I've taken the opportunity to flip the sense of it while I'm changing all of this anyway, because it seems to me more natural to use 0 for 'leave the starting condition unchanged' and 1 for 'invert it', as if those bits were an XOR mask. Reviewers: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63219 llvm-svn: 363244	2019-06-13 10:01:52 +00:00
Sam Parker	179e0fa881	[NFC] Simplify Call query Use getIntrinsicID() directly from IntrinsicInst. llvm-svn: 363235	2019-06-13 08:32:56 +00:00
Sam Parker	9d28473a35	[ARM][TTI] Scan for existing loop intrinsics TTI should report that it's not profitable to generate a hardware loop if it, or one of its child loops, has already been converted. Differential Revision: https://reviews.llvm.org/D63212 llvm-svn: 363234	2019-06-13 08:28:46 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Mikael Holmen	030df51e27	[ARM] Fix compiler warning Without this fix clang 3.6 complains with: ../lib/Target/ARM/ARMAsmPrinter.cpp:1473:18: error: variable 'BranchTarget' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] } else if (MI->getOperand(1).isSymbol()) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1479:22: note: uninitialized use occurs here MCInst.addExpr(BranchTarget); ^~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1473:14: note: remove the 'if' if its condition is always true } else if (MI->getOperand(1).isSymbol()) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1465:33: note: initialize the variable 'BranchTarget' to silence this warning const MCExpr *BranchTarget; ^ = nullptr 1 error generated. Discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190610/661417.html llvm-svn: 363166	2019-06-12 14:19:22 +00:00
Sam Parker	757ac02dc8	[ARM] Implement TTI::isHardwareLoopProfitable Implement the backend target hook to drive the HardwareLoops pass. The low-overhead branch extension for Arm M-class cores is flexible enough that we don't have to ensure correctness at this point, except checking that the loop counter variable can be stored in LR - a 32-bit register. For it to be profitable, we want to avoid loops that contain function calls, or any other instruction that alters the PC. This implementation uses TargetLoweringInfo, to query type and operation actions, looks at intrinsic calls and also performs some manual checks for remainder/division and FP operations. I think this should be a good base to start and extra details can be filled out later. Differential Revision: https://reviews.llvm.org/D62907 llvm-svn: 363149	2019-06-12 12:00:42 +00:00
Mikhail Maltsev	7bd5c55cad	[ARM] First MVE instructions: scalar shifts. This introduces a new decoding table for MVE instructions, and starts by adding the family of scalar shift instructions that are part of the MVE architecture extension: saturating shifts within a single GPR, and long shifts across a pair of GPRs (both saturating and normal). Some of these shift instructions have only 3-bit register fields in the encoding, with the low bit fixed. So they can only address an odd or even numbered GPR (depending on the operand), and therefore I add two new register classes, GPREven and GPROdd. Differential Revision: https://reviews.llvm.org/D62668 Change-Id: Iad95d5f83d26aef70c674027a184a6b1e0098d33 llvm-svn: 363051	2019-06-11 12:04:32 +00:00
Simon Tatham	14241378d3	[ARM] Fix unused-variable warning in rL363039. The variable `OffsetMask` is currently only used in an assertion, so if assertions are compiled out and -Werror is enabled, it becomes a build failure. llvm-svn: 363043	2019-06-11 10:09:12 +00:00
Simon Tatham	8c865cacda	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need a new addressing mode. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 363039	2019-06-11 09:29:18 +00:00
Tom Stellard	4b0b26199b	Revert CMake: Make most target symbols hidden by default This reverts r362990 (git commit `374571301d`) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028	2019-06-11 03:21:13 +00:00
Tom Stellard	374571301d	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990	2019-06-10 22:12:56 +00:00
Simon Tatham	67065c5c70	Revert rL362953 and its followup rL362955. These caused a build failure because I managed not to notice they depended on a later unpushed commit in my current stack. Sorry about that. llvm-svn: 362956	2019-06-10 15:58:19 +00:00
Simon Tatham	42078d41d5	[ARM] Add the non-MVE instructions in Arm v8.1-M. This should have been part of r362953, but I had a finger-trouble incident and committed the old rather than new version of the patch. Sorry. llvm-svn: 362955	2019-06-10 15:41:58 +00:00
Simon Tatham	baeea91933	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need some new addressing modes. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 362953	2019-06-10 15:36:34 +00:00
Simon Tatham	b87669f166	[ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR. Arm v8.1-M supports the VMOV instructions that move a half-precision value to and from a GPR, but not if the GPR is SP or PC. To fix this, I've changed those instructions to use the rGPR register class instead of GPR. rGPR always excludes PC, and it excludes SP except in the presence of the HasV8Ops target feature (i.e. Arm v8-A). So the effect is that VMOV.F16 to and from PC is now illegal everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A cores (which I believe is all as it should be). Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60704 llvm-svn: 362942	2019-06-10 14:43:55 +00:00
David Green	d847aa573b	[ARM] Enable Unroll UpperBound This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928	2019-06-10 10:22:14 +00:00
David Green	c5471c2a57	[ARM] Adjust isLegalT1AddressImmediate for non-legal types Types such as float and i64's do not have legal loads in Thumb1, but will still be loaded with a LDR (or potentially multiple LDR's). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits. Differential Revision: https://reviews.llvm.org/D62968 llvm-svn: 362874	2019-06-08 10:32:53 +00:00
David Green	342d1b81a3	[ARM] Add MVE addressing to isLegalT2AddressImmediate Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored. Differential Revision: https://reviews.llvm.org/D62967 llvm-svn: 362873	2019-06-08 10:18:23 +00:00
David Green	4ecce205d5	[ARM] Add fp16 addressing to isLegalT2AddressImmediate The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependant upon hasFPRegs16 as this is what the load/store instructions require. Differential Revision: https://reviews.llvm.org/D62966 llvm-svn: 362872	2019-06-08 10:09:02 +00:00
David Green	10fbaa96c5	[ARM] Add HasNEON for all Neon patterns in ARMInstrNEON.td. NFCI We are starting to add an entirely separate vector architecture to the ARM backend. To do that we need at least some separation between the existing NEON and the new MVE code. This patch just goes through the Neon patterns and ensures that they are predicated on HasNEON, giving MVE a stable place to start from. No tests yet as this is largely an NFC, and we don't have the other target that will treat any of these intructions as legal. Differential Revision: https://reviews.llvm.org/D62945 llvm-svn: 362870	2019-06-08 09:36:49 +00:00
Mikhail Maltsev	08da01b496	[ARM] Add FP16 vector insert/extract patterns This change adds two FP16 extraction and two insertion patterns (one per possible vector length). Extractions are handled by copying a Q/D register into one of VFP2 class registers, where single FP32 sub-registers can be accessed. Then the extraction of even lanes are simple sub-register extractions (because we don't care about the top parts of registers for FP16 operations). Odd lanes need an additional VMOVX instruction. Unfortunately, insertions cannot be handled in the same way, because: * There is no instruction to insert FP16 into an even lane (VINS only works with odd lanes) * The patterns for odd lanes will have a form of a DAG (not a tree), and will not be implementable in pure tablegen Because of this insertions are handled in the same way as 16-bit integer insertions (with conversions between FP registers and GPRs using VMOVHR instructions). Without these patterns the ARM backend would sometimes fail during instruction selection. This patch also adds patterns which combine: * an FP16 element extraction and a store into a single VST1 instruction * an FP16 load and insertion into a single VLD1 instruction Differential Revision: https://reviews.llvm.org/D62651 llvm-svn: 362482	2019-06-04 09:39:55 +00:00
Simon Tatham	ac02445524	[ARM] Turn some undefined encoding bits into 0s. The family of 32-bit Thumb instruction encodings that include t2ORR, t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15. The Tablegen descriptions of those instructions listed them as ?. This change tightens that up by making them into 0 + Unpredictable. In the specific case of t2ORR, we tighten it up still further by making the zero bit mandatory. This change comes from Arm v8.1-M, in which encodings with that bit equal to 1 will now be used for different instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma Reviewed By: dmgreen, efriedma Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60705 llvm-svn: 362470	2019-06-04 08:28:48 +00:00
Diogo N. Sampaio	df92f84110	[ARM][FIX] Ran out of registers due tail recursion Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to a same function. However, it disconsiders the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (llvm limiation). If all those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells the function IsEligibleForTailCallOptimization if we intend to perform indirect calls, as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366	2019-06-03 08:58:05 +00:00
Sam Parker	913604a637	[NFC][ARM][ParallelDSP] Refactor narrow sequence Most of the code used for finding a 'narrow' sequence is not used, so I've removed it and simplified the calls from the smlad matcher. llvm-svn: 362104	2019-05-30 15:26:37 +00:00
Sjoerd Meijer	eb072b5a6a	[ARM] Change the MC names for VMAXNM/VMINNM Now the NEON ones have a prefix "NEON_", and the VFP ones have a prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be made to match both of _those_ classes of VMAXNM without also matching the MVE ones that are going to be introduced soon. NFCI. Patch by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60700 llvm-svn: 362097	2019-05-30 14:34:29 +00:00
Simon Pilgrim	5359bb4d31	[ARM] LowerVECTOR_SHUFFLE - fix uninitialized variable warnings. NFCI. llvm-svn: 362094	2019-05-30 14:01:24 +00:00
Sjoerd Meijer	930dee2c0b	[ARM] add target arch definitions for 8.1-M and MVE This adds: - LLVM subtarget features to make all the new instructions conditional on, - CPU and FPU names for use on clang's command line, with default FPUs set so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right FPU features, - architecture extension names "mve" and "mve.fp", - ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE (a new actual tag). Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60698 llvm-svn: 362090	2019-05-30 12:57:04 +00:00
Sjoerd Meijer	7eb95d672d	[ARM] Introduce separate features for FP registers The MVE extension in Arm v8.1-M permits the use of some move, load and store isntructions which access the FP registers, even if there's no actual FP support in the processor (in particular, if you have the integer-only version of MVE). Therefore, we need separate subtarget features to condition those instructions on, which are implied by both FP and MVE but are not part of either. Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60694 llvm-svn: 362088	2019-05-30 12:37:05 +00:00
Sjoerd Meijer	5857bf5d1e	[ARM] Add an MVE execution domain MVE architecturally specifies a 'beat' system in which a vector instruction executed now will complete its actual operation over the next four cycles, so it can overlap with the execution of the previous and next MVE instruction. This makes it generally an advantage to avoid moving values back and forth between MVE registers and anywhere else, if there's any sensible way to do the same processing in whatever register type the values already occupied. That's just what the 'execution domain' system is supposed to achieve. So here we add a new execution domain which will contain all the MVE vector instructions when they are added. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60703 llvm-svn: 362068	2019-05-30 08:07:06 +00:00
Sjoerd Meijer	24c5629625	[ARM] Split predicates out into their own .td file The new ARMPredicates.td is included from ARM.td, early enough that the predicate definitions are already in scope when ARMSchedule.td is included. This will make it possible to refer to them in UnsupportedFeatures fields of scheduling models. NFC: the chunk of Tablegen being moved here is copied and pasted verbatim. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60693 llvm-svn: 361958	2019-05-29 13:41:57 +00:00
Simon Tatham	760df47b77	[ARM] Replace fp-only-sp and d16 with fp64 and d32. Those two subtarget features were awkward because their semantics are reversed: each one indicates the _lack_ of support for something in the architecture, rather than the presence. As a consequence, you don't get the behavior you want if you combine two sets of feature bits. Each SubtargetFeature for an FP architecture version now comes in four versions, one for each combination of those options. So you can still say (for example) '+vfp2' in a feature string and it will mean what it's always meant, but there's a new string '+vfp2d16sp' meaning the version without those extra options. A lot of this change is just mechanically replacing positive checks for the old features with negative checks for the new ones. But one more interesting change is that I've rearranged getFPUFeatures() so that the main FPU feature is appended to the output list before rather than after the features derived from the Restriction field, so that -fp64 and -d32 can override defaults added by the main feature. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60691 llvm-svn: 361845	2019-05-28 16:13:20 +00:00
Diana Picus	68b20c589c	[ARM GlobalISel] Cleanup CallLowering a bit We never actually use the Offsets produced by ComputeValueVTs, so remove them until we need them. llvm-svn: 361755	2019-05-27 10:30:33 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
David Green	0dbafe191e	[ARM] Select fp16 fma This adds a pattern for fma, similar to the float and double patterns. Differential Revision: https://reviews.llvm.org/D62330 llvm-svn: 361719	2019-05-26 11:34:30 +00:00
David Green	21542cd6f4	[ARM] Select a number of fp16 rounding functions This add patterns for fp16 round and ceil etc. Same as the float and double patterns. Differential Revision: https://reviews.llvm.org/D62326 llvm-svn: 361718	2019-05-26 11:13:00 +00:00
David Green	c9f4b7d201	[ARM] Promote various fp16 math intrinsics Promote a number of fp16 math intrinsics to float, so that the relevant float math routines can be used. Copysign is expanded so as to be handled in-place. Differential Revision: https://reviews.llvm.org/D62325 llvm-svn: 361717	2019-05-26 10:59:21 +00:00
David Green	2881325b17	[ARM] Select fp16 fabs This adds a pattern for the fabs intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62324 llvm-svn: 361715	2019-05-26 10:51:58 +00:00
David Green	aeade651f3	[ARM] Select fp16 fsqrt This adds a pattern for the sqrt intrinsic, the same as float and double. Differential Revision: https://reviews.llvm.org/D62322 llvm-svn: 361714	2019-05-26 10:42:24 +00:00
David Green	caf8a11b65	[ARM] Promote fp16 frem Promote fp16 frem operations on ARM to floats so they call fmodf. Differential Revision: https://reviews.llvm.org/D62321 llvm-svn: 361713	2019-05-26 10:30:22 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Nick Desaulniers	9f7bd71cf5	[ARM] additionally check for ARM::INLINEASM_BR w/ ARM::INLINEASM Summary: We were observing failures for arm32 allyesconfigs of the Linux kernel with the asm goto Clang patch, where ldr's were being generated to offsets too far away to encode in imm12. It looks like since INLINEASM_BR was created off of INLINEASM, a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Link: https://github.com/ClangBuiltLinux/linux/issues/490 Reviewers: peter.smith, kristof.beyls, ostannard, rengolin, t.p.northover Reviewed By: peter.smith Subscribers: jyu2, javed.absar, hiraditya, llvm-commits, nathanchance, craig.topper, kees, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62400 llvm-svn: 361659	2019-05-24 18:58:21 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Sjoerd Meijer	937af54666	[ARM] ARMExpandPseudoInsts: add debug messages This pass wasn't printing any messages at all, which I find really inconvenient while debugging/tracing things. It now dumps the before and after of expanded instructions. It doesn't do this yet for all instructions, but this is a good start I guess. Differential Revision: https://reviews.llvm.org/D62297 llvm-svn: 361604	2019-05-24 08:25:02 +00:00
Sam Parker	617cdc5a6d	[ARM][CGP] Clear SafeWrap before each search The previous patch added a member set to store instructions that we could allow to wrap. But this wasn't cleared between searches meaning that they could get promoted, incorrectly, during the promotion of a separate valid chain. Differential Revision: https://reviews.llvm.org/D62254 llvm-svn: 361462	2019-05-23 07:46:39 +00:00
Sam Parker	3141bbd52d	[ARM][CGP] Skip nuw in PrepareConstants PrepareConstants step converts add/sub with 'negative' immediates to sub/add with a 'positive' imm to make promotion more simple. nuw already states that the add shouldn't cause an unsigned wrap, so it shouldn't need any tweaking. Plus, we also don't allow a sub with a 'negative' immediate to be safe wrap, so this functionality has been removed. The PrepareConstants step now just handles the add instructions that we've determined would be safe if they wrap around zero. Differential Revision: https://reviews.llvm.org/D62057 llvm-svn: 361227	2019-05-21 07:56:47 +00:00
Fangrui Song	43ca0e9eb8	[ARM] Support .reloc , R_ARM_NONE, R_ARM_NONE can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. Add a generic MCFixupKind FK_NONE as this kind of no-op relocation is ubiquitous on ELF and COFF, and probably available on many other binary formats. See D62014. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D61992 llvm-svn: 360980	2019-05-17 02:51:54 +00:00
David Green	0582b22f10	[ARM] Don't use the Machine Scheduler for cortex-m at minsize The new cortex-m schedule in rL360768 helps performance, but can increase the amount of high-registers used. This, on average, ends up increasing the codesize by a fair amount (because less instructions are converted from T2 to T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use the existing DAG scheduler with the RegPressure scheduling preference (at least until the issues around T2 vs T1 instructions can be improved). I have also made sure that the Sched::RegPressure dag scheduler is always chosen for MinSize. The test shows one case where we increase the number of registers used. Differential Revision: https://reviews.llvm.org/D61882 llvm-svn: 360769	2019-05-15 12:58:02 +00:00
David Green	d2d0f46cd2	[ARM] Cortex-M4 schedule This patch adds a simple Cortex-M4 schedule, renaming the existing M3 schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM: https://developer.arm.com/docs/ddi0439/latest Most of these are 1, with the important exception being loads taking 2 cycles. A few others are also higher, but I don't believe they make a large difference. I've repurposed the M3 schedule as the latencies are mostly the same between the two cores, with the M4 having more FP and DSP instructions. We also turn on MISched and UseAA for the cores that now use this. It also adds some schedule Write's to various instruction to make things simpler. Differential Revision: https://reviews.llvm.org/D54142 llvm-svn: 360768	2019-05-15 12:41:58 +00:00
Richard Trieu	f3011b9b10	[ARM] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360718	2019-05-14 22:29:50 +00:00
Sam Parker	a33e311a3b	[ARM][ParallelDSP] Relax alias checks When deciding the safety of generating smlad, we checked for any writes within the block that may alias with any of the loads that need to be widened. This is overly conservative because it only matters when there's a potential aliasing write to a location accessed by a pair of loads. Now we check for aliasing writes only once, during setup. If two loads are found to have an aliasing write between them, we don't add these loads to LoadPairs. This means that later during the transform, we can safely widened a pair without worrying about aliasing. However, to maintain correctness, we also need to change the way that wide loads are inserted because the order is now important. The MatchSMLAD method has also been changed, absorbing MatchReductions and AddMACCandidate to hopefully improve readability. Differential Revision: https://reviews.llvm.org/D6102 llvm-svn: 360567	2019-05-13 09:23:32 +00:00
Richard Trieu	5e3ee4b84e	[ARM] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360490	2019-05-11 00:34:07 +00:00
Sam Parker	d7b650cc72	[ARM][CGP] Guard against signext args and sitofp Add an Argument that has the SExtAttr attached, as well as SIToFP instructions, as values that generate sign bits. SIToFP doesn't strictly do this and could be treated as a sink to be sign-extended. Differential Revision: https://reviews.llvm.org/D61381 llvm-svn: 360331	2019-05-09 11:56:16 +00:00
Diana Picus	3531453371	[ARM GlobalISel] Map DBG_VALUE for types != s32 ...and make sure we fail elegantly for unsupported values. s64 goes into DPR, anything <= 32 into GPR. llvm-svn: 360321	2019-05-09 09:49:36 +00:00
Tim Northover	18adcf331b	ARM: disallow SP as Rn for Thumb2 TST & TEQ instructions Using SP in this position is unpredictable in ARMv7. CMP and CMN are not affected, and of course v8 relaxes this requirement, but that's handled elsewhere. llvm-svn: 360242	2019-05-08 10:59:08 +00:00
Diana Picus	0a47fb8884	[ARM GlobalISel] Widen G_SELECT operands ...except for the condition operand. llvm-svn: 360135	2019-05-07 11:39:30 +00:00
Diana Picus	d6d3808fa4	[ARM GlobalISel] Widen G_INTTOPTR/G_PTRTOINT We actually have a couple of G_PTRTOINT to s8 when building clang, so we should do something about them. llvm-svn: 360130	2019-05-07 10:48:01 +00:00
Diana Picus	d18bac5d19	[ARM GlobalISel] Widen G_GEP index operand llvm-svn: 360127	2019-05-07 10:11:57 +00:00
Eli Friedman	2ea088173d	[ARM] Glue register copies to tail calls. This generally follows what other targets do. I don't completely understand why the special case for tail calls existed in the first place; even when the code was committed in r105413, call lowering didn't work in the way described in the comments. Stack protector lowering breaks if the register copies are not glued to a tail call: we have to insert the stack protector check before the tail call, and we choose the location based on the assumption that all physical register dependencies of a tail call are adjacent to the tail call. (See FindSplitPointForStackProtector.) This is sort of fragile, but I don't see any reason to break that assumption. I'm guessing nobody has seen this before just because it's hard to convince the scheduler to actually schedule the code in a way that breaks; even without the glue, the only computation that could actually be scheduled after the register copies is the computation of the call address, and the scheduler usually prefers to schedule that before the copies anyway. Fixes https://bugs.llvm.org/show_bug.cgi?id=41417 Differential Revision: https://reviews.llvm.org/D60427 llvm-svn: 360099	2019-05-06 23:21:59 +00:00
Diana Picus	1136ea2d44	[ARM GlobalISel] Fixup r359768 Get rid of local variable used only in assertion. llvm-svn: 359772	2019-05-02 10:08:29 +00:00
Diana Picus	06a61ccc42	[ARM GlobalISel] Select extensions to < 32 bits Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in the exact same way as 32 bits. This overwrites the higher bits, but that should be ok since all legal users of types smaller than 32 bits ignore those bits anyway. llvm-svn: 359768	2019-05-02 09:28:00 +00:00
Diana Picus	53bcf6f2e7	[ARM GlobalISel] Legalize extensions to < 32 bits Make it legal to extend from e.g. s1 to s8 or s16. llvm-svn: 359766	2019-05-02 09:21:46 +00:00
Sjoerd Meijer	ea31ddb36f	[ARM] Implement TTI::getMemcpyCost This implements TargetTransformInfo method getMemcpyCost, which estimates the number of instructions to which a memcpy instruction expands to. Differential Revision: https://reviews.llvm.org/D59787 llvm-svn: 359547	2019-04-30 10:28:50 +00:00
Diana Picus	59a4c0481a	[ARM GlobalISel] Widen small shift operands The legalizer was already widening the shift amount. Add tests for that behaviour, and also support widening the shifted value. llvm-svn: 359542	2019-04-30 09:24:43 +00:00
Diana Picus	1e88ac213b	[ARM GlobalISel] Be more careful about bailing out Bail out on function arguments/returns with types aggregating an unsupported type. This fixes cases where we would happily and incorrectly lower functions taking e.g. [1 x i64] parameters, when we don't even support plain i64 yet. llvm-svn: 359540	2019-04-30 09:05:25 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Diogo N. Sampaio	d95abb170b	[ARM] Add bitcast/extract_subvec. of fp16 vectors Summary: This patch adds some basic operations for fp16 vectors, such as bitcast from fp16 to i16, required to perform extract_subvector (also added here) and extract_element. Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard Reviewed By: ostannard Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60618 llvm-svn: 359433	2019-04-29 10:28:07 +00:00
Diogo N. Sampaio	2078eb745d	[ARM] Add v4f16 and v8f16 types to the CallingConv Summary: The Procedure Call Standard for the Arm Architecture states that float16x4_t and float16x8_t behave just as uint16x4_t and uint16x8_t for argument passing. This patch adds the fp16 vectors to the ARMCallingConv.td file. Reviewers: miyuki, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60720 llvm-svn: 359431	2019-04-29 10:10:37 +00:00
Nick Desaulniers	7ab164c4a4	[AsmPrinter] refactor to support %c w/ GlobalAddress' Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overriden method in each base class. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target independent %a, %c, and %n. The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter. It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints. Fixes: - https://bugs.llvm.org/show_bug.cgi?id=41402 - https://github.com/ClangBuiltLinux/linux/issues/449 Reviewers: echristo, void Reviewed By: void Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60887 llvm-svn: 359337	2019-04-26 18:45:04 +00:00
Tim Northover	6af366be8a	ARM: disallow add/sub to sp unless Rn is also sp. The manual says that Thumb2 add/sub instructions are only allowed to modify sp if the first source is also sp. This is slightly different from the usual rGPR restriction since it's context-sensitive, so implement it in C++. llvm-svn: 358987	2019-04-23 13:50:13 +00:00
David Green	c519d3c403	[ARM] Update check for CBZ in Ifcvt The check for creating CBZ in constant island pass recently obtained the ability to search backwards to find a Cmp instruction. The code in IfCvt should mirror this to allow more conversions to the smaller form. The common code has been pulled out into a separate function to be shared between the two places. Differential Revision: https://reviews.llvm.org/D60090 llvm-svn: 358977	2019-04-23 12:11:26 +00:00
David Green	2f9eed6265	[ARM] Don't replicate instructions in Ifcvt at minsize Ifcvt can replicate instructions as it converts them to be predicated. This stops that from happening on thumb2 targets at minsize where an extra IT instruction is likely needed. Differential Revision: https://reviews.llvm.org/D60089 llvm-svn: 358974	2019-04-23 11:46:58 +00:00
Diogo N. Sampaio	2619f399f9	[ARM][FIX] Add missing f16.lane.vldN/vstN lowering Summary: Add missing D and Q lane VLDSTLane lowering for fp16 elements. Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60874 llvm-svn: 358962	2019-04-23 09:36:39 +00:00
David Green	0d741507f7	[ARM] Rewrite isLegalT2AddressImmediate This does two main things, firstly adding some at least basic addressing modes for i64 types, and secondly treats floats and doubles sensibly when there is no fpu. The floating point change can help codesize in some cases, especially with D60294. Most backends seems to not consider the exact VT in isLegalAddressingMode, instead switching on type size. That is now what this does when the target does not have an fpu (as the float data will be loaded using LDR's). i64's currently use the address range of an LDRD (even though they may be legalised and loaded with an LDR). This is at least better than marking them all as illegal addressing modes. I have not attempted to do much with vectors yet. That will need changing once MVE is added. Differential Revision: https://reviews.llvm.org/D60677 llvm-svn: 358845	2019-04-21 09:54:29 +00:00
Nick Desaulniers	9609ce2f33	[AsmPrinter] hoist %a output template to base class for ARM+Aarch64 Summary: X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do basically the same thing (Aarch64 did not correctly handle immediates, ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses %a with an immediate) for a flag that should be target independent anyways. Reviewers: echristo, peter.smith Reviewed By: echristo Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60841 llvm-svn: 358618	2019-04-17 22:21:10 +00:00
Nick Desaulniers	a2077bab40	[AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC Summary: None of these derived classes do anything that the base class cannot. If we remove these case statements, then the base class can handle them just fine. Reviewers: peter.smith, echristo Reviewed By: echristo Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60803 llvm-svn: 358603	2019-04-17 18:22:48 +00:00
Simon Pilgrim	e5573f4f4e	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526	2019-04-16 20:57:28 +00:00
Amara Emerson	d189680baa	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368	2019-04-15 04:53:46 +00:00
Oliver Stannard	6fa145e429	Test commit access llvm-svn: 358162	2019-04-11 12:53:33 +00:00
Nick Desaulniers	5277b3ff25	[AsmPrinter] refactor to remove remove AsmVariant. NFC Summary: The InlineAsm::AsmDialect is only required for X86; no architecture makes use of it and as such it gets passed around between arch-specific and general code while being unused for all architectures but X86. Since the AsmDialect is queried from a MachineInstr, which we also pass around, remove the additional AsmDialect parameter and query for it deep in the X86AsmPrinter only when needed/as late as possible. This refactor should help later planned refactors to AsmPrinter, as this difference in the X86AsmPrinter makes it harder to make AsmPrinter more generic. Reviewers: craig.topper Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60488 llvm-svn: 358101	2019-04-10 16:38:43 +00:00
Diogo N. Sampaio	651463e4a8	[ARM] [FIX] Add missing f16 vector operations lowering Summary: Add missing <8xhalf> shufflevectors pattern, when using concat_vector dag node. As well, allows <8xhalf> and <4xhalf> vldup1 operations. These instructions are required for v8.2a fp16 lowering of vmul_n_f16, vmulq_n_f16 and vmulq_lane_f16 intrinsics. Reviewers: olista01, pbarrio, LukeGeeson, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60319 llvm-svn: 358081	2019-04-10 13:28:06 +00:00
Diana Picus	6bdade85de	Fixup r358063 Fix warning/error about mixed signedness. llvm-svn: 358065	2019-04-10 09:31:28 +00:00
Diana Picus	4a7f8d8d6b	[ARM GlobalISel] Add some asserts. NFC. Make sure some arm opcodes don't unintentionally sneak into thumb mode. llvm-svn: 358064	2019-04-10 09:14:37 +00:00
Diana Picus	b6e83b98f9	[ARM GlobalISel] Select G_FCONSTANT for VFP3 Make it possible to TableGen code for FCONSTS and FCONSTD. We need to make two changes to the TableGen descriptions of vfp_f32imm and vfp_f64imm respectively: * add GISelPredicateCode to check that the immediate fits in 8 bits; * extract the SDNodeXForms into separate definitions and create a GISDNodeXFormEquiv and a custom renderer function for each of them. There's a lot of boilerplate to get the actual value of the immediate, but it basically just boils down to calling ARM_AM::getFP32Imm or ARM_AM::getFP64Imm. llvm-svn: 358063	2019-04-10 09:14:32 +00:00
Diana Picus	3533ad6801	[ARM GlobalISel] Select G_FCONSTANT into pools Put all floating point constants into constant pools and load their values from there. llvm-svn: 358062	2019-04-10 09:14:24 +00:00
Diana Picus	165846b031	[ARM GlobalISel] Map G_FCONSTANT llvm-svn: 358061	2019-04-10 09:14:16 +00:00
Amara Emerson	2b523f8162	[GlobalISel][AArch64] Allow CallLowering to handle types which are normally required to be passed as different register types. E.g. <2 x i16> may need to be passed as a larger <2 x i32> type, so formal arg lowering needs to be able truncate it back. Likewise, when dealing with returns of these types, they need to be widened in the appropriate way back. Differential Revision: https://reviews.llvm.org/D60425 llvm-svn: 358032	2019-04-09 21:22:33 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
Diana Picus	153c3887e4	[ARM GlobalISel] Support DBG_VALUE Make sure we can map and select DBG_VALUE. llvm-svn: 357681	2019-04-04 10:24:51 +00:00
Evandro Menezes	7c711ccf36	[IR] Create new method in `Function` class (NFC) Create method `optForNone()` testing for the function level equivalent of `-O0` and refactor appropriately. Differential revision: https://reviews.llvm.org/D59852 llvm-svn: 357638	2019-04-03 21:27:03 +00:00
Eli Friedman	3813fe0bda	[ARM] Optimize expressions like "return x != 0;" for Thumb1. There's an existing optimization for x != C, but somehow it was missing a special case for 0. While I'm here, also cleaned up the code/comments a bit: the second value produced by the MERGE_VALUES was actually dead, since a CMOV only produces one result. Differential Revision: https://reviews.llvm.org/D59616 llvm-svn: 357437	2019-04-02 00:01:23 +00:00
Eli Friedman	73af6ef2e7	[ARM] Don't try to create "push {r12, lr}" in Thumb1 at -Oz. It's a little tricky to make this issue show up because prologue/epilogue emission normally likes to push at least two registers... but it doesn't when lr is force-spilled due to function length. Not sure if that really makes sense, but I decided not to touch it for now. Differential Revision: https://reviews.llvm.org/D59385 llvm-svn: 357436	2019-04-01 23:55:57 +00:00
Diana Picus	52495c472f	[ARM GlobalISel] Fix G_STORE with s1 G_STORE for 1-bit values uses a STRBi12, which stores the whole byte. Zero out the undefined bits before writing. llvm-svn: 357154	2019-03-28 09:09:36 +00:00
Diana Picus	4d512df300	[ARM GlobalISel] Fix selection of G_SELECT G_SELECT uses a 1-bit scalar for the condition, and is currently implemented with a plain CMPri against 0. This means that values such as 0x1110 are interpreted as true, when instead the higher bits should be treated as undefined and therefore ignored. Replace the CMPri with a TSTri against 0x1, which performs an implicit AND, yielding the expected result. llvm-svn: 357153	2019-03-28 09:09:27 +00:00
Sam Clegg	b2978c0203	[ARM] Remove dead function ARMMCCodeEmitter::getSOImmOpValue The last reference to this function was removed from the ARM td files in 2015 in rL225266. Differential Revision: https://reviews.llvm.org/D59868 llvm-svn: 357130	2019-03-27 23:00:12 +00:00
Eli Friedman	c388bfa230	[ARM] Don't confuse the scheduler for very large VLDMDIA etc. ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the memory operands of load and store-multiple operations. This doesn't really fix the problem properly, but it's enough to prevent crashing, at least. Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 . Differential Revision: https://reviews.llvm.org/D59834 llvm-svn: 357109	2019-03-27 18:33:30 +00:00
Oliver Stannard	5c90238479	[ARM][Asm] Accept upper case coprocessor number and registers Differential revision: https://reviews.llvm.org/D59760 llvm-svn: 356984	2019-03-26 10:24:03 +00:00
Eli Friedman	1e5d569c8c	[ARM] Add missing memory operands to a bunch of instructions. This should hopefully lead to minor improvements in code generation, and more accurate spill/reload comments in assembly. Also fix isLoadFromStackSlotPostFE/isStoreToStackSlotPostFE so they don't lead to misleading assembly comments for merged memory operands; this is technically orthogonal, but in practice the relevant memory operand lists don't show up without this change. Differential Revision: https://reviews.llvm.org/D59713 llvm-svn: 356963	2019-03-25 22:42:30 +00:00
Diana Picus	254b11a0fd	[ARM GlobalISel] 64-bit memops should be aligned We currently use only VLDR/VSTR for all 64-bit loads/stores, so the memory operands must be word-aligned. Mark aligned operations as legal and narrow non-aligned ones to 32 bits. While we're here, also mark non-power-of-2 loads/stores as unsupported. llvm-svn: 356872	2019-03-25 08:54:29 +00:00
Eli Friedman	b906bba576	[ARM] Don't form "ands" when it isn't scheduled correctly. In r322972/r323136, the iteration here was changed to catch cases at the beginning of a basic block... but we accidentally deleted an important safety check. Restore that check to the way it was. Fixes https://bugs.llvm.org/show_bug.cgi?id=41116 Differential Revision: https://reviews.llvm.org/D59680 llvm-svn: 356809	2019-03-22 20:49:15 +00:00
Evandro Menezes	4a7739b681	[AArch64, ARM] Add support for Exynos M5 Add Exynos M5 support and test cases. llvm-svn: 356793	2019-03-22 18:42:14 +00:00
Eli Friedman	c7870cce80	[ARM] [NFC] Use tGPR in patterns where appropriate. This doesn't have any practical effect at the moment, as far as I know, because high registers aren't allocatable in Thumb1 mode. But it might matter in the future. Differential Revision: https://reviews.llvm.org/D59675 llvm-svn: 356791	2019-03-22 18:37:26 +00:00
Simon Pilgrim	87d261bfd3	[Thumb] Fix infinite loop in ABS expansion (PR41160) Don't expand ISD::ABS node if its legal. llvm-svn: 356661	2019-03-21 12:41:18 +00:00
Eli Friedman	638be660d7	[ARM] Eliminate redundant "mov rN, sp" instructions in Thumb1. This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them to something like "str r0, [sp]". For regular stack variables, this optimization was already implemented: we lower loads and stores using frame indexes, which are expanded later. However, when constructing a call frame for a call with more than four arguments, the existing optimization doesn't apply. We need to use stores which are actually relative to the current value of sp, and don't have an associated frame index. This patch adds a special case to handle that construct. At the DAG level, this is an ISD::STORE where the address is a CopyFromReg from SP (plus a small constant offset). This applies only to Thumb1: in Thumb2 or ARM mode, a regular store instruction can access SP directly, so the COPY gets eliminated by existing code. The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related cleanup: we shouldn't pretend that it can select anything other than frame indexes. Differential Revision: https://reviews.llvm.org/D59568 llvm-svn: 356601	2019-03-20 19:40:45 +00:00
Eli Friedman	2596e8b3e7	[ARM] Make sure to save/restore LR when we use tBfar. This change does two things. One, it ensures compilation will abort instead of miscompiling if ARMFrameLowering::determineCalleeSaves chooses not to save LR in a case where it's necessary. Two, it changes the way we estimate the size of a function to be more conservative in the presence of constant pool entries and jump tables. EstimateFunctionSizeInBytes probably still isn't really conservative enough, but I'm not sure how we can come up with a reliable estimate before constant islands runs. Differential Revision: https://reviews.llvm.org/D59439 llvm-svn: 356527	2019-03-19 21:48:08 +00:00
Simon Pilgrim	7a8e5051f4	Fix unused variable warning. NFCI. llvm-svn: 356474	2019-03-19 16:49:59 +00:00
Simon Pilgrim	a56f2822d0	[SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect These changes are related to PR37743 and include: SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node. Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner. Add promoting the integer ABS node in the LegalizeIntegerType. Expand-based legalization of integer result for the ABS nodes. Expand-based legalization of ABS vector operations. Add some integer abs testcases for different typesizes for Thumb arch Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to: tmp = (SRA, Hi, 31) Lo = (UADDO tmp, Lo) Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1)) Lo = (XOR tmp, Lo) The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern: (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))). Change integer abs testcases for codegen with the ABS node support for AArch64. Indicate that the ABS is legal for the i64 type when the NEON is supported. Change the integer abs testcases to show changing of codegen. Add combine and legalization of ABS nodes for Thumb arch. Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition. For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743 Patch by: @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D49837 llvm-svn: 356468	2019-03-19 16:24:55 +00:00
Adhemerval Zanella	664c1ef528	[TargetLowering] Add code size information on isFPImmLegal. NFC This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389	2019-03-18 18:40:07 +00:00
David Green	baa94ef03b	[ARM] Check that CPSR does not have other uses Fix up rL356335 by checking that CPSR is not read between the compare and the branch. llvm-svn: 356349	2019-03-17 21:36:15 +00:00
Tim Renouf	d1477e989c	[ARM] Fixed an assumption of power-of-2 vector MVT I am about to introduce some non-power-of-2 width vector MVTs. This commit fixes a power-of-2 assumption that my forthcoming change would otherwise break, as shown by test/CodeGen/ARM/vcvt_combine.ll and vdiv_combine.ll. Differential Revision: https://reviews.llvm.org/D58927 Change-Id: I56a282e365d3874ab0621e5bdef98a612f702317 llvm-svn: 356341	2019-03-17 20:48:54 +00:00
David Green	e0b48a8015	[ARM] Search backwards for CMP when combining into CBZ The constant island pass currently only looks at the instruction immediately before a branch for a CMP to fold into a CBZ/CBNZ. This extends it to search backwards for the instruction that defines CPSR. We need to ensure that the register is not overridden between the CMP and the branch. Differential Revision: https://reviews.llvm.org/D59317 llvm-svn: 356336	2019-03-17 16:11:22 +00:00
Eli Friedman	68d9a60573	[ARM] Add MachineVerifier logic for some Thumb1 instructions. tMOVr and tPUSH/tPOP/tPOP_RET have register constraints which can't be expressed in TableGen, so check them explicitly. I've unfortunately run into issues with both of these recently; hopefully this saves some time for someone else in the future. Differential Revision: https://reviews.llvm.org/D59383 llvm-svn: 356303	2019-03-15 21:44:49 +00:00
Sam Parker	f82d4ed771	[ARM] Remove EarlyCSE from backend There is an issue with early CSE hitting an assert, so temporarily remove the pass from the Arm backend. Bug: https://bugs.llvm.org/show_bug.cgi?id=41081 Differential Revision: https://reviews.llvm.org/D59410 llvm-svn: 356259	2019-03-15 13:36:37 +00:00
Sam Parker	9e73020bfa	[ARM][ParallelDSP] Disable for big-endian Bail early when we don't have a preheader and also if the target is big endian because it's written with only little endian in mind! Differential Revision: https://reviews.llvm.org/D59368 llvm-svn: 356243	2019-03-15 10:19:32 +00:00
Sam Parker	4c4ff13d3c	[ARM][ParallelDSP] Enable multiple uses of loads When choosing whether a pair of loads can be combined into a single wide load, we check that the load only has a sext user and that sext also only has one user. But this can prevent the transformation in the cases when parallel macs use the same loaded data multiple times. To enable this, we need to fix up any other uses after creating the wide load: generating a trunc and a shift + trunc pair to recreate the narrow values. We also need to keep a record of which loads have already been widened. Differential Revision: https://reviews.llvm.org/D59215 llvm-svn: 356132	2019-03-14 11:14:13 +00:00
Sam Parker	3b2ba20afd	[ARM] Run ARMParallelDSP in the IRPasses phase Run EarlyCSE before ParallelDSP and do this in the backend IR opt phase. Differential Revision: https://reviews.llvm.org/D59257 llvm-svn: 356130	2019-03-14 10:57:40 +00:00
Jason Liu	a03ae73c29	Add XCOFF triple object format type for AIX This patch adds an XCOFF triple object format type into LLVM. This XCOFF triple object file type will be used later by object file and assembly generation for the AIX platform. Differential Revision: https://reviews.llvm.org/D58930 llvm-svn: 355989	2019-03-12 22:01:10 +00:00
Stanislav Mekhanoshin	e98944ed47	Use bitset for assembler predicates AMDGPU target run out of Subtarget feature flags hitting the limit of 64. AssemblerPredicates uses at most uint64_t for their representation. At the same time CodeGen has exhausted this a long time ago and switched to a FeatureBitset with the current limit of 192 bits. This patch completes transition to the bitset for feature bits extending it to asm matcher and MC code emitter. Differential Revision: https://reviews.llvm.org/D59002 llvm-svn: 355839	2019-03-11 17:04:35 +00:00
Diogo N. Sampaio	c20c37ba7f	[ARM][FIX] Fix vfmal.f16 and vfmsl.f16 operand The indexed variant of vfmal.f16 and vfmsl.f16 instructions use the uppser bits of the indexed operand to store the index (1 bit for the double variant, 2 bits for the quad). This limits the usable registers to d0 - d7 or s0 - s15. This patch enforces this limitation. Differential Revision: https://reviews.llvm.org/D59021 llvm-svn: 355707	2019-03-08 17:11:20 +00:00
Michael Platings	308e82eceb	[IR][ARM] Add function pointer alignment to datalayout Use this feature to fix a bug on ARM where 4 byte alignment is incorrectly assumed. Differential Revision: https://reviews.llvm.org/D57335 llvm-svn: 355685	2019-03-08 10:44:06 +00:00
Mitch Phillips	92dd321a14	Rollback of rL355585. Introduces memory leak in FunctionTest.GetPointerAlignment that breaks sanitizer buildbots: ``` ================================================================= ==2453==ERROR: LeakSanitizer: detected memory leaks Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 0x610428 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:105 #1 0x16936bc in llvm::User::operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/IR/User.cpp:151:19 #2 0x7c3fe9 in Create /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/IR/Function.h:144:12 #3 0x7c3fe9 in (anonymous namespace)::FunctionTest_GetPointerAlignment_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/IR/FunctionTest.cpp:136 #4 0x1a836a0 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #5 0x1a836a0 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2474 #6 0x1a85c55 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11 #7 0x1a870d0 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28 #8 0x1aa5b84 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43 #9 0x1aa4d30 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #10 0x1aa4d30 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4257 #11 0x1a6b656 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46 #12 0x1a6b656 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50 #13 0x7f5af37a22e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) Indirect leak of 40 byte(s) in 1 object(s) allocated from: #0 0x610428 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:105 #1 0x151be6b in make_unique<llvm::ValueSymbolTable> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/ADT/STLExtras.h:1349:29 #2 0x151be6b in llvm::Function::Function(llvm::FunctionType, llvm::GlobalValue::LinkageTypes, unsigned int, llvm::Twine const&, llvm::Module) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/IR/Function.cpp:241 #3 0x7c4006 in Create /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/IR/Function.h:144:16 #4 0x7c4006 in (anonymous namespace)::FunctionTest_GetPointerAlignment_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/IR/FunctionTest.cpp:136 #5 0x1a836a0 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #6 0x1a836a0 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2474 #7 0x1a85c55 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11 #8 0x1a870d0 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28 #9 0x1aa5b84 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43 #10 0x1aa4d30 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #11 0x1aa4d30 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4257 #12 0x1a6b656 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46 #13 0x1a6b656 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50 #14 0x7f5af37a22e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) SUMMARY: AddressSanitizer: 168 byte(s) leaked in 2 allocation(s). ``` See http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/11358/steps/check-llvm%20asan/logs/stdio for more information. Also introduces use-of-uninitialized-value in ConstantsTest.FoldGlobalVariablePtr: ``` ==7070==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x14e703c in User /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/User.h:79:5 #1 0x14e703c in Constant /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/Constant.h:44 #2 0x14e703c in llvm::GlobalValue::GlobalValue(llvm::Type, llvm::Value::ValueTy, llvm::Use, unsigned int, llvm::GlobalValue::LinkageTypes, llvm::Twine const&, unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/GlobalValue.h:78 #3 0x14e5467 in GlobalObject /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/GlobalObject.h:34:9 #4 0x14e5467 in llvm::GlobalVariable::GlobalVariable(llvm::Type, bool, llvm::GlobalValue::LinkageTypes, llvm::Constant, llvm::Twine const&, llvm::GlobalValue::ThreadLocalMode, unsigned int, bool) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/Globals.cpp:314 #5 0x6938f1 in llvm::(anonymous namespace)::ConstantsTest_FoldGlobalVariablePtr_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm/unittests/IR/ConstantsTest.cpp:565:18 #6 0x1a240a1 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc #7 0x1a240a1 in testing::Test::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2474 #8 0x1a26d26 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11 #9 0x1a2815f in testing::TestCase::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28 #10 0x1a43de8 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43 #11 0x1a42c47 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc #12 0x1a42c47 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4257 #13 0x1a0dfba in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46 #14 0x1a0dfba in main /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50 #15 0x7f2081c412e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #16 0x4dff49 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/IR/IRTests+0x4dff49) SUMMARY: MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/User.h:79:5 in User ``` See http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30222/steps/check-llvm%20msan/logs/stdio for more information. llvm-svn: 355616	2019-03-07 18:13:39 +00:00

... 7 8 9 10 11 ...

10782 Commits