llvm-project

Commit Graph

Author	SHA1	Message	Date
QingShan Zhang	4bd186c0ff	[PowerPC] Exploit the rldicl + rldicl when and with mask If we are and the constant like 0xFFFFFFC00000, for now, we are using several instructions to generate this 48bit constant and final an "and". However, we could exploit it with two rotate instructions. MB ME MB+63-ME +----------------------+ +----------------------+ \|0000001111111111111000\| -> \|0000000001111111111111\| +----------------------+ +----------------------+ 0 63 0 63 Rotate left ME + 1 bit first, and then, mask it with (MB + 63 - ME, 63), finally, rotate back. Notice that, we need to round it with 64 bit for the wrapping case. Reviewed by: ChenZheng, Nemanjai Differential Revision: https://reviews.llvm.org/D71831	2020-04-17 05:24:00 +00:00
Wouter van Oortmerssen	48139ebc3a	[WebAssembly] Add int32 DW_OP_WASM_location variant This to allow us to add reloctable global indices as a symbol. Also adds R_WASM_GLOBAL_INDEX_I32 relocation type to support it. See discussion in https://github.com/WebAssembly/debugging/issues/12	2020-04-16 16:32:17 -07:00
Cameron McInally	1223255c2d	[AArch64][SVE] Add DestructiveBinaryImm SQSHLU patterns. Add DestructiveBinaryImm SQSHLU patterns and tests. These patterns allow the SQSHLU instruction to match with a MOVPRFX. Differential Revision: https://reviews.llvm.org/D76728	2020-04-16 13:48:08 -05:00
Stefan Pintilie	18b6050324	[PowerPC][Future] Initial support for PC Relative addressing for global values This patch adds PC Relative support for global values that are known at link time. If a global value requires access through the global offset table (GOT) it is not covered in this patch. Differential Revision: https://reviews.llvm.org/D75280	2020-04-16 12:45:22 -05:00
Stanislav Mekhanoshin	2e94a64b57	[AMDGPU] Define 16 bit SGPR subregs These are needed as a counterpart for VGPR subregs even though there are no scalar instructions which can operate 16 bit values. When we are materializing a constant that is done into an SGPR and that SGPR may/will be copied into a 16 bit VGPR subreg. Such copy is illegal. There are also similar problems if a source operand of a 16 bit VALU instruction is an SGPR. In addition we need to get a register with a lo16 subregister of an SGPR RC during selection and this fails as well. All of that makes me believe we need these subregisters as a syntactic glue. Differential Revision: https://reviews.llvm.org/D78250	2020-04-16 10:31:39 -07:00
Anna Welker	d736571538	[ARM][MVE] Fix location of optimized gather addresses Fix for the address optimization for gathers and scatters which would in some complex cases push out instructions not to the vector loop preheader, but to other locations as well which lead to a scrambled order and the compilation failing. This patch ensures that said instructions are always pushed to the end of the vector loop preheader. Differential Revision: https://reviews.llvm.org/D78293	2020-04-16 18:15:28 +01:00
Kang Zhang	513976df2e	[PowerPC] Ignore implicit register operands for MCInst Summary: When doing the conversion: MachineInst -> MCInst, we should ignore the implicit operands, it will expose more opportunity for InstiAlias. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D77118	2020-04-16 16:22:43 +00:00
Kazushi (Jam) Marukawa	48d64f5654	[VE] Update logical operation instructions Summary: Changing all mnemonic to match assembly instructions to simplify mnemonic naming rules. This time update all fixed-point arithmetic instructions. This also corrects bswp operand type. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D78177	2020-04-16 14:35:53 +02:00
Konstantin Schwarz	1a3e89aa2b	[MIR] Add comments to INLINEASM immediate flag MachineOperands Summary: The INLINEASM MIR instructions use immediate operands to encode the values of some operands. The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature. Reviewers: SjoerdMeijer, arsenm, efriedma Reviewed By: arsenm Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78088	2020-04-16 13:46:14 +02:00
Shengchen Kan	71303b753c	[X86] Add interface X86II::isPseudo Avoid duplicate code in X86MCCodeEmitter, NFCI.	2020-04-16 12:40:17 +08:00
Shengchen Kan	7aaaea5acd	[X86][MC][NFC] Code cleanup in X86MCCodeEmitter Make some function static, move the definitions of functions to a better place and use C++ style cast, etc.	2020-04-16 11:30:49 +08:00
Shengchen Kan	6c66bb393e	[X86][MC][NFC] Refine code in X86MCCodeEmitter As we mentioned in D78180, merge some if clauses and use CamelCase for variables, etc.	2020-04-16 10:43:42 +08:00
Shengchen Kan	322ac2e917	[X86][MC][NFC] Reduce the parameters of functions in X86MCCodeEmitter(Part I) Summary: The function in X86MCCodeEmitter has too many parameters to make it look messy, and some parameters are unnecessary. This is the first patch to reduce their parameters. The follwing operations are cheap ``` unsigned Opcode = MI.getOpcode(); const MCInstrDesc &Desc = MCII.get(Opcode); uint64_t TSFlags = Desc.TSFlags; ``` So if we pass a `MCInst`, we don't need to pass `MCInstrDesc`; if we pass a `MCInstrDesc`, we don't need to pass `TSFlags`. Reviewers: craig.topper, MaskRay, pengfei Reviewed By: craig.topper Subscribers: annita.zhang, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78180	2020-04-16 09:53:45 +08:00
Fangrui Song	7d1ff446b6	[MC] Rename MCSection::getSectionName() to getName(). NFC A pending change will merge MCSection::getName() to MCSection::getName().	2020-04-15 16:48:14 -07:00
Christopher Tetreault	85247c1e89	[SVE] Remove calls to getBitWidth from x86 Reviewers: efriedma, RKSimon, sdesmalen Reviewed By: RKSimon Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77901	2020-04-15 15:48:48 -07:00
Chris Bowler	bee6c234ed	[AIX][PowerPC] Implement caller byval arguments in stack memory Differential Revision: https://reviews.llvm.org/D77578	2020-04-15 17:57:31 -04:00
Francesco Petrogalli	89680f25e8	[llvm][CodeGen] Rename SVE gather prefetch intrinsics. [NFC] Summary: The renaming is necessary to make the naming scheme uniform with other gather/scatter load/stores SVE intrinsics. The naming of variables and functions have been adapted to make it explicit whether we are dealing with a scalar offset (which is unscaled) or an index (which is scaled according to the data type of the lanes of the vector). Reviewers: andwar, sdesmalen, rengolin Reviewed By: andwar Subscribers: tschuett, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77839	2020-04-15 21:49:16 +01:00
Nemanja Ivanovic	c196e2ca48	[PowerPC] Clear the set of symbols that need to be updated in MCTargetStreamer We have added code to correct the .localentry values on assignments. However, we never clear the set so presumably it will still contain the (now dangling) MCSymbol pointers across a call to finish() and reset() in the streamer. This is based on my speculation that it is the reason we are getting segmentation faults mentioned in https://bugs.llvm.org/show_bug.cgi?id=45366 Fixes: https://bugs.llvm.org/show_bug.cgi?id=45366 Differential revision: https://reviews.llvm.org/D78196	2020-04-15 15:42:02 -05:00
Craig Topper	8dfb9627b7	[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead. This moves v32i16/v64i8 to a model consistent with how we treat integer types with avx1. This does change the ABI for types vXi16/vXi8 vectors larger than 512 bits to pass in multiple zmms instead of multiple ymms. We'd already hacked some code to make v64i8/v32i16 pass in zmm. Cost model is still a bit of a mess. In some place I tried to match existing behavior. But really we need to account for splitting and concating costs. Cost model for shuffles is especially pessimistic. Differential Revision: https://reviews.llvm.org/D76212	2020-04-15 12:17:18 -07:00
Matt Arsenault	588bd7be36	AMDGPU/GlobalISel: Work around a selector crash Ideally types without a corresponding register class wouldn't reach here, but we're currently missing some (in particular a 192-bit class is missing).	2020-04-15 14:38:50 -04:00
Craig Topper	a916e81927	[X86] Various improvements to our vector splitting helpers for lowering. NFC -Consistently name the functions as split* -Add a helper for doing the two extractSubvector calls and determining the size of the split -Use getSplitDestVTs to get the result type for the split node. -Move the binary and unary helper to one place in the file near the extractSubvector functions. Left the VSETCC one near LowerVSETCC since that's its only caller. -Remove the 256/512 wrappers that just had asserts. I don't think they provided a lot of value and now with the routines called split* the call sites are more obvious what they do. -Make the unary routine support different source and dest types to support D76212. -Add some weaker asserts into the helpers to make up for losing the very specific asserts from the 256/512 wrappers. Differential Revision: https://reviews.llvm.org/D78176	2020-04-15 10:57:53 -07:00
Benjamin Kramer	316b49d373	Pass shufflevector indices as int instead of unsigned. No functionality change intended.	2020-04-15 15:52:49 +02:00
Victor Campos	d85b3877dc	[CodeGen][ARM] Error when writing to specific reserved registers in inline asm Summary: No error or warning is emitted when specific reserved registers are written to in inline assembly. Therefore, writes to the program counter or to the frame pointer, for instance, were permitted, which could have led to undesirable behaviour. Example: int foo() { register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM __asm__ __volatile__("mov %0, r1" : "=r"(a) : : ); return a; } In contrast, GCC issues an error in the same scenario. This patch detects writes to specific reserved registers in inline assembly for ARM and emits an error in such case. The detection works for output and input operands. Clobber operands are not handled here: they are already covered at a later point in AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers covered are: program counter, frame pointer and base pointer. This is ARM only. Therefore the implementation of other targets' counterparts remain open to do. Reviewers: efriedma Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76848	2020-04-15 14:40:42 +01:00
Benjamin Kramer	cc035d475f	Upgrade users of 'new ShuffleVectorInst' to pass indices as an int array No functionality change intended.	2020-04-15 14:29:43 +02:00
Jonas Paulsson	036242b868	[SystemZ] Bugfix in adjustSubwordCmp() adjustSubwordCmp() should not optimize a load of an i1 value. This is achieved by checking that the size and store-size of the MemoryVT are the same. Fixes https://bugs.llvm.org/show_bug.cgi?id=45511. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D78187	2020-04-15 12:58:39 +02:00
Benjamin Kramer	6f64daca8f	Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints No functionality change intended.	2020-04-15 12:51:38 +02:00
Sam Parker	dd8153b757	[ARM][MVE] Tail predicate VML[A\|S]LDAV Make the non-exchanging versions of the multiply add/sub instructions validForTailPredication. Differential Revision: https://reviews.llvm.org/D77648	2020-04-15 11:34:39 +01:00
Sameer Sahasrabuddhe	8c11bc0cd0	Introduce fix-irreducible pass An irreducible SCC is one which has multiple "header" blocks, i.e., blocks with control-flow edges incident from outside the SCC. This pass converts an irreducible SCC into a natural loop by introducing a single new header block and redirecting all the edges on the original headers to this new block. This is a useful workaround for a limitation in the structurizer which, which produces incorrect control flow in the presence of irreducible regions. The AMDGPU backend provides an option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D77198 This restores commit `2ada8e2525`. Originally reverted with commit `44e09b59b8`.	2020-04-15 15:05:51 +05:30
Kazushi (Jam) Marukawa	7a7f223042	[VE] Update integer arithmetic instructions Summary: Changing all mnemonic to match assembly instructions to simplify mnemonic naming rules. This time update all fixed-point arithmetic instructions. This also corrects smax/smin code generations. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D77856	2020-04-15 09:47:51 +02:00
Sameer Sahasrabuddhe	44e09b59b8	Revert "Introduce fix-irreducible pass" This reverts commit `2ada8e2525`. Buildbots produced compilation errors which I was not able to quickly reproduce locally. Need more time to investigate.	2020-04-15 12:19:50 +05:30
Sameer Sahasrabuddhe	2ada8e2525	Introduce fix-irreducible pass An irreducible SCC is one which has multiple "header" blocks, i.e., blocks with control-flow edges incident from outside the SCC. This pass converts an irreducible SCC into a natural loop by introducing a single new header block and redirecting all the edges on the original headers to this new block. This is a useful workaround for a limitation in the structurizer which, which produces incorrect control flow in the presence of irreducible regions. The AMDGPU backend provides an option to enable this pass before the structurizer, which may eventually be enabled by default. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D77198	2020-04-15 11:29:19 +05:30
Matt Arsenault	cc149172da	AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS This wasn't covered by existing tablegen patterns, but also suffers the same issues as G_FNEG. Workaround them by manually selecting, like G_FNEG.	2020-04-14 22:05:22 -04:00
Mircea Trofin	447e2c3067	[llvm][NFC][CallSite] Remove Implementation uses of CallSite Reviewers: dblaikie, davidxl, craig.topper Subscribers: arsenm, dschuff, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78142	2020-04-14 14:49:47 -07:00
Christopher Tetreault	e68f1f2d43	[SVE] Remove calls to getBitWidth from Hexagon Reviewers: efriedma, sdesmalen, kparzysz Reviewed By: kparzysz Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77899	2020-04-14 11:09:49 -07:00
Christopher Tetreault	0badd8f613	[SVE] Remove calls to getBitWidth from ARM Reviewers: efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77904	2020-04-14 10:56:38 -07:00
Christopher Tetreault	05a079895c	[SVE] Remove calls to getBitWidth from AArch64 Reviewers: efriedma Reviewed By: efriedma Subscribers: danielkiss, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77905	2020-04-14 10:26:37 -07:00
Andrzej Warzynski	8fc7e6dcd8	[AArch64][SVE] Refine node definitions for ff & nf loads/stores (NFC) Summary: Only first-faulting and non-faulting loads read/update the FFR register and hence only the corresponding SDNodes should be decorated with `SDNPOptInGlue` and `SDNPOutGlue`. This patch: * removes SDNPOptInGlue from regular loads stores (FFR is not read) * adds SDNPOutGlue to first-faulting and non-faulting loads (FFR is both read and updated) Differential Revision: https://reviews.llvm.org/D77724	2020-04-14 18:17:15 +01:00
Pierre-vh	13eb890139	[Target][ARM] Fix VPT Block Pass miscompilation The pass was incorrectly reverting back to a "T" when something wrote to VPR inside a "E" block. This is not the correct behaviour, the predicate should stay the same. Differential Revision: https://reviews.llvm.org/D77798	2020-04-14 15:16:27 +01:00
Pierre-vh	4563024356	[Target][ARM] Adding MVE VPT Optimisation Pass Differential Revision: https://reviews.llvm.org/D76709	2020-04-14 15:16:27 +01:00
Simon Pilgrim	426f37584e	[TTI][X86] Add X86TTIImpl::getScalarizationOverhead implementation. This is a currently just a wrapper to the base type, I'll be adding ISD::BUILD_VECTOR costs in a future patch.	2020-04-14 12:58:19 +01:00
Georgii Rymar	1647ff6e27	[ADT/STLExtras.h] - Add llvm::is_sorted wrapper and update callers. It can be used to avoid passing the begin and end of a range. This makes the code shorter and it is consistent with another wrappers we already have. Differential revision: https://reviews.llvm.org/D78016	2020-04-14 14:11:02 +03:00
Kerry McLaughlin	36c76de678	[AArch64][SVE] Add a pass for SVE intrinsic optimisations Summary: Creates the SVEIntrinsicOpts pass. In this patch, the pass tries to remove unnecessary reinterpret intrinsics which convert to and from svbool_t (llvm.aarch64.sve.convert.[to\|from].svbool) For example, the reinterprets below are redundant: %1 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %a) %2 = call <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1> %1) The pass also looks for ptest intrinsics and phi instructions where the operands are being needlessly converted to and from svbool_t. Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, c-rhodes, rengolin Reviewed By: efriedma Subscribers: mgorny, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76078	2020-04-14 10:41:49 +01:00
Peter Smith	31c8e11896	[MC][ARM] Emit R_ARM_BASE_PREL for _GLOBAL_OFFSET_TABLE_ expressions The _GLOBAL_OFFSET_TABLE_ in SysVr4 ELF is conventionally the base of the .got or .got.prel sections. Expressions such as _GLOBAL_OFFSET_TABLE_ - (.L1 +8) are used in assembler code to calculate offsets into the .got. At present MC outputs a R_ARM_REL32 with respect to the _GLOBAL_OFFSET_TABLE_ symbol, whereas gas outputs a R_ARM_BASE_PREL relocation with respect to the _GLOBAL_OFFSET_TABLE_ symbol. While both are correct the R_ARM_REL32 depends on the value of the _GLOBAL_OFFSET_TABLE_ symbol, wheras te R_ARM_BASE_PREL relocation is idependent of the symbol. The R_ARM_BASE_PREL is therefore slightly more robust to linker's that may not follow the conventional placement of _GLOBAL_OFFSET_TABLE_; for example LLD for some time defined _GLOBAL_OFFSET_TABLE_ to 0. Differential Revision: https://reviews.llvm.org/D46319	2020-04-14 10:13:21 +01:00
Kazushi (Jam) Marukawa	37db04dda6	[VE] Remove unnecessary iz pattern Summary: This iz pattern is a special pattern of im pattern. This im pattern has been supported by https://reviews.llvm.org/D77769, so removing iz pattern as a continuous patch. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D77770	2020-04-14 08:56:55 +02:00
Mircea Trofin	4aae4e3f48	[llvm][NFC] CallSite removal from inliner-related files Summary: This removes CallSite from inliner files. Some dependencies where thus affected. Reviewers: dblaikie, davidxl, craig.topper Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, aheejin, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77991	2020-04-13 21:28:58 -07:00
Craig Topper	2f60fbce6c	[X86] Use a more realisitic cost for truncate v16i64->v16i8 with avx512f. Still not great and we could probably codegen this better, but 11 was clearly ridiculous.	2020-04-13 21:09:43 -07:00
Craig Topper	535a566a01	[X86] Split AVX512 getCastInstrCost into tables that require useAVX512Regs() and those that just operate on 256 or smaller vectors. Use useAVX512Regs() to skip lookups instead of using type legalization action.	2020-04-13 21:09:42 -07:00
Craig Topper	071c64d68d	[X86] Add a more accurate truncate cost for v8i64->v8i8	2020-04-13 21:09:41 -07:00
Fangrui Song	d1a677cd33	[VE] Adapt D77995 CallSite removal	2020-04-13 19:54:49 -07:00
Matt Arsenault	f48fe2c36e	GlobalISel: Fix casted unmerge of G_CONCAT_VECTORS This was assuming a scalarizing unmerge, and would fail assert if the unmerge was to smaller vector types.	2020-04-13 22:03:05 -04:00
Matt Arsenault	0ba40d4ccf	AMDGPU/GlobalISel: Combines for V_CVT_F32_UBYTE[0-3] Ports the existing DAG combines, minus the simplify demanded bits which seems to have no equivalent now. Without these, this isn't particularly helpful in most of the IR sample cases.	2020-04-13 19:18:19 -04:00
Austin Kerbow	a69b3e010c	[AMDGPU][GlobalISel] Fix div_scale in FDIV lowering Differential Revision: https://reviews.llvm.org/D78004	2020-04-13 15:54:49 -07:00
Heejin Ahn	ba40896f99	[WebAssembly] Fix try placement in fixing unwind mismatches Summary: In CFGStackify, `fixUnwindMismatches` function fixes unwind destination mismatches created by `try` marker placement. For example, ``` try ... call @qux ;; This should throw to the caller! catch ... end ``` When `call @qux` is supposed to throw to the caller, it is possible that it is wrapped inside a `catch` so in case it throws it ends up unwinding there incorrectly. (Also it is possible `call @qux` is supposed to unwind to another `catch` within the same function.) To fix this, we wrap this inner `call @qux` with a nested `try`-`catch`-`end` sequence, and within the nested `catch` body, branch to the right destination: ``` block $l0 try ... try ;; new nested try call @qux catch ;; new nested catch local.set n ;; store exnref to a local br $l0 end catch ... end end local.get n ;; retrieve exnref back rethrow ;; rethrow to the caller ``` The previous algorithm placed the nested `try` right before the `call`. But it is possible that there are stackified instructions before the call from which the call takes arguments. ``` try ... i32.const 5 call @qux ;; This should throw to the caller! catch ... end ``` In this case we have to place `try` before those stackified instructions. ``` block $l0 try ... try ;; this should go before 'i32.const 5' i32.const 5 call @qux catch local.set n br $l0 end catch ... end end local.get n rethrow ``` We correctly handle this in the first normal `try` placement phase (`placeTryMarker` function), but failed to handle this in this `fixUnwindMismatches`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77950	2020-04-13 15:50:01 -07:00
Craig Topper	113f37a1f9	[CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase Differential Revision: https://reviews.llvm.org/D77995	2020-04-13 13:50:15 -07:00
Rahman Lavaee	05192e585c	Extend BasicBlock sections to allow specifying clusters of basic blocks in the same section. Differential Revision: https://reviews.llvm.org/D76954	2020-04-13 12:19:59 -07:00
Rahman Lavaee	4ddf7ab454	Revert "Extend BasicBlock sections to allow specifying clusters of basic blocks" This reverts commit `0d4ec16d3d` Because tests were not added to the commit.	2020-04-13 12:19:59 -07:00
Rahman Lavaee	0d4ec16d3d	Extend BasicBlock sections to allow specifying clusters of basic blocks in the same section. This allows specifying BasicBlock clusters like the following example: !foo !!0 1 2 !!4 This places basic blocks 0, 1, and 2 in one section in this order, and places basic block #4 in a single section of its own.	2020-04-13 11:46:11 -07:00
Matt Arsenault	e6605a209c	DAG: Fix wrong legality check for ISD::FMAD Since `1725f28841`, this should check isFMADLegalForFAddFSub rather than the the plain isOperationLegal. This would assert in a subset of cases due to an oddity in how FMAD is selected. We will allow FMA formation pre-legalize, but not FMAD even in cases where it would be valid. The current hook requires passing in the root fadd/fsub. However, in this distributed case, this would be far more complicated to pass in the relevant operand. AMDGPU doesn't get any value from the node, and only needs the type and is the only implementor, so I'm not sure why we have this complexity. Just rename and expand the assert to avoid the more complicated checks spread through the distribution logic.	2020-04-13 10:25:39 -07:00
Craig Topper	6dbf1a1229	[X86] Move X86ShuffleDecode.cpp/h into MCTargetDesc and remove X86Utils library. NFC The shuffle decoding is used by X86ISelLowering and MCTargetDesc/X86InstComments. The latter used to be in a separate InstPrinter library. The Utils library existed to allow InstPrinter and CodeGen to share the shuffle decoding. Since X86InstComments now lives in the MCTargetDesc, which CodeGen already depends on, we can sink the shuffle decoding there as well. Differential Revision: https://reviews.llvm.org/D77980	2020-04-13 10:14:08 -07:00
Jay Foad	bc78baec4c	[X86] Improve combineVectorShiftImm Summary: Fold (shift (shift X, C2), C1) -> (shift X, (C1 + C2)) for logical as well as arithmetic shifts. This is needed to prevent regressions from an upcoming funnel shift expansion change. While we're here, fold (VSRAI -1, C) -> -1 too. Reviewers: RKSimon, craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77300	2020-04-13 15:54:55 +01:00
Simon Pilgrim	401cbe373b	[X86][AVX] Attempt to scale masked shuffles to match the root type Improve the chances of folding the writemask into the combined shuffle by scaling a wider shuffle mask to match the root's original type. This creates a few minor issues with variable shuffles, preventing combines of shuffles because of the more limited support binary shuffle types. In most cases we're probably better off combining the shuffles and losing the writemask fold, but this isn't always going to be true.	2020-04-13 14:57:25 +01:00
Simon Pilgrim	fdd9ff9700	[X86][AVX] Create splitVectorIntBinary helper. Removes duplicate code from split256IntArith/split512IntArith.	2020-04-13 13:09:38 +01:00
Austin Kerbow	eab9a4f119	[AMDGPU] Don't assert on partial exec copy After Machine CSE and coalescing we can end up with copies of exec to subregister SGPRs. Differential Revision: https://reviews.llvm.org/D77992	2020-04-12 21:14:36 -07:00
Craig Topper	dbb272b0a3	[CallSite removal][FastISel] Use CallBase instead of CallSite in fastLowerCall.	2020-04-12 18:02:24 -07:00
Craig Topper	42fc7852f5	[X86] Print k-mask in FMA3 comments.	2020-04-12 13:16:53 -07:00
Craig Topper	95192f548d	[CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface. Differential Revision: https://reviews.llvm.org/D77929	2020-04-12 11:26:25 -07:00
Mircea Trofin	d2f1cd5d97	[llvm][NFC] Refactor uses of CallSite to CallBase - call promotion Summary: Updated CallPromotionUtils and impacted sites. Parameters that are expected to be non-null, and return values that are guranteed non-null, were replaced with CallBase references rather than pointers. Left FIXME in places where more changes are facilitated by CallBase, but aren't CallSites: Instruction* parameters or return values, for example, where the contract that they are actually CallBase values. Reviewers: davidxl, dblaikie, wmi Reviewed By: dblaikie Subscribers: arsenm, jvesely, nhaehnle, eraman, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77930	2020-04-12 08:27:29 -07:00
Sanjay Patel	d04db4825a	[x86] use vector instructions to lower FP->int->FP casts As discussed in PR36617: https://bugs.llvm.org/show_bug.cgi?id=36617#c13 ...we can avoid the likely slow round-trip from XMM to GPR to XMM by using the vector versions of the convert instructions. Based on experimental results from recent Intel/AMD chips, we don't need to worry about triggering denorm stalls while operating on garbage data in the high lanes with convert instructions, so this is expected to always be as good or better perf than the scalar instruction equivalent. FP exceptions are also not a concern because strict code should not be using the regular SDAG opcodes. Differential Revision: https://reviews.llvm.org/D77895	2020-04-12 10:26:43 -04:00
Simon Pilgrim	40581a0a2b	[X86] Use isAnyZero shuffle mask helper where possible. NFC.	2020-04-12 10:57:29 +01:00
Craig Topper	d3465e0691	[X86] Enable shuffle combining for AVX512 unless the root is used by a vselect A lot of vectorized code doesn't use masks so we shouldn't penalize them by not doing shuffle combining on avx512 targets. I've added support for VALIGNQ/VALIGND and 512-bit SHUF128 to prevent some regressions. I also prevented recombining 256-bit SHUF128 to PERM2X128. We may not need to add 256-bit SHUF128 support, but I don't think I found any cases requiring that in my testing. Differential Revision: https://reviews.llvm.org/D77928	2020-04-11 20:05:10 -07:00
Matt Arsenault	96819011ca	AMDGPU/GlobalISel: Fix RegBankSelect for v2s16 shifts These need to be promoted and scalarized for the SALU.	2020-04-11 20:55:33 -04:00
Matt Arsenault	ac8d51a3c6	AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16 The current selector depends on 16-bit shifts using 16-bit shift amount types, but really it should accept either for all types.	2020-04-11 18:12:26 -04:00
Craig Topper	d1da1b53ff	[X86] Cleanup ISD::BRIND handling code in X86DAGToDAGISel::Select. NFC -Drop llvm:: on MVT::i32 -Use getValueType instead of getSimpleValueType for an equality check just cause its shorter and doesn't matter. -Don't create a const SDValue & since its cheap to copy. -Remove explicit case from MVT enum to EVT. -Add message to assert.	2020-04-11 15:01:05 -07:00
Craig Topper	21a7d08e72	[X86] Move code that replaces ISD::VSELECT with X86ISD::BLENDV from X86DAGToDAGISel::Select to PreprocessISelDAG	2020-04-11 15:01:05 -07:00
Matt Arsenault	c5497e5399	AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects	2020-04-11 15:59:51 -04:00
Fangrui Song	0a55d3f557	[MC] Default MCAsmInfo::UseIntegratedAssembler to true	2020-04-11 10:13:52 -07:00
Fangrui Song	d2e5157c1f	[MC] Add UseIntegratedAssembler = false. NFC	2020-04-11 10:13:49 -07:00
Matt Arsenault	cf29333f40	AMDGPU/GlobalISel: Work around forming illegal zextload after legalize Selection would fail after the post legalize combiner put an illegal zextload back together. The base combiner has parameter to only allow legal operations, but they appear to not be used. I also don't see a nice way to remove a single entry from all_combines, so just hack around this.	2020-04-11 10:52:58 -04:00
Sanjay Patel	1318ddbc14	[VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC As proposed in D77881, we'll have the related widening operation, so this name becomes too vague. While here, change the function signature to take an 'int' rather than 'size_t' for the scaling factor, add an assert for overflow of 32-bits, and improve the documentation comments.	2020-04-11 10:05:49 -04:00
Nemanja Ivanovic	512600e3c0	[PowerPC] Handle f16 as a storage type only The PPC back end currently crashes (fails to select) with f16 input. This patch expands it on subtargets prior to ISA 3.0 (Power9) and uses the HW conversions on Power9. Fixes https://bugs.llvm.org/show_bug.cgi?id=39865 Differential revision: https://reviews.llvm.org/D68237	2020-04-11 07:34:47 -05:00
Craig Topper	9c1842d8af	Change FastISel::CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI. This is the same as what was done to the CallLoweringInfo in TargetLowering.h in r309159. This is just a step on the way to replacing this with CallBase.	2020-04-10 23:45:36 -07:00
Nemanja Ivanovic	04eae39617	[PowerPC] Another folow-up fix for `6c4b40def7` There was another issue introduced by this commit that the OP initially missed. Namely, for functions that are free to use R2 as a callee-saved register, we emit a TOC expression based on the address of the GEP label without emitting the GEP label. Since we only emit such expressions for the large code model, this issue only surfaced there. I have confirmed that with this fix, the kernel build is successful with target "all".	2020-04-10 21:09:59 -05:00
Scott Constable	0505181006	[X86] Fix to X86LoadValueInjectionRetHardeningPass for possible segfault `MBB.back()` could segfault if `MBB.empty()`. Fixed by checking for `MBB.empty()` in the loop. Differential Revision: https://reviews.llvm.org/D77584	2020-04-10 18:28:08 -07:00
Sam Clegg	16206ee07d	[WebAssembly] Minor cleanup to WebAssemblySubtarget. NFC. Pretty much all other platforms pass CPU string as arg0 of initializeSubtargetDependencies. Differential Revision: https://reviews.llvm.org/D77894	2020-04-10 16:47:39 -07:00
Craig Topper	b8a108140d	[CallSite removal][X86] Remove uses of CallSite from X86WinEHState.cpp Differential Revision: https://reviews.llvm.org/D77862	2020-04-10 11:34:06 -07:00
Fangrui Song	a7aaaf7016	[MC][RISCV] Make .reloc support arbitrary relocation types Similar to D76746 (ARM), D76754 (AArch64) and llvmorg-11-init-6967-g152d14da64c (x86) Differential Revision: https://reviews.llvm.org/D77018	2020-04-10 10:43:53 -07:00
Craig Topper	a6732069ee	[CallSite removal][X86] Remove unneeded use of CallSite. NFC We already have a CallInst, we can just get the calling convention from it.	2020-04-10 10:27:21 -07:00
Fangrui Song	7f36cb1f1a	[AArch64InstPrinter] Change printAlignedLabel to print the target address in hexadecimal form Similar to D76580 (x86) and D76591 (PPC). ``` // llvm-objdump -d output (before) 10000: 08 00 00 94 bl #32 10004: 08 00 00 94 bl #32 // llvm-objdump -d output (after) 10000: 08 00 00 94 bl 0x10020 10004: 08 00 00 94 bl 0x10024 // GNU objdump -d. The lack of 0x is not ideal due to ambiguity. 10000: 94000008 bl 10020 <bar+0x18> 10004: 94000008 bl 10024 <bar+0x1c> ``` The new output makes it easier to find the jump target. Differential Revision: https://reviews.llvm.org/D77853	2020-04-10 09:21:09 -07:00
Simon Pilgrim	1824ae0f42	[X86] Remove defunct EmitLoweredAtomicFP declaration. NFC.	2020-04-10 17:05:07 +01:00
Simon Pilgrim	dd84a2f77a	[X86] Remove defunct emitFMA3Instr declaration. NFC.	2020-04-10 17:05:06 +01:00
Christopher Tetreault	65b8b643b4	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: sdesmalen, efriedma, jonpa Reviewed By: sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77265	2020-04-10 08:43:32 -07:00
Simon Pilgrim	a88cc20456	ProfileSummaryInfo.h - remove unnecessary includes. NFC Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration. This exposed a couple of source files that were implicitly replying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.	2020-04-10 16:25:48 +01:00
Stanislav Mekhanoshin	44920e8566	[AMDGPU] Disable sub-dword scralar loads IR widening These will be widened in the DAG. In the meanwhile early widening prevents otherwise possible vectorization of such loads. Differential Revision: https://reviews.llvm.org/D77835	2020-04-10 08:20:49 -07:00
Simon Pilgrim	91bc50c0d7	[CostModel][X86] Improve InsertElement costs for sub-128bit vectors If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs	2020-04-10 14:55:46 +01:00
Michael Liao	b54b4ecac3	Fix `-Wextra` warning. NFC.	2020-04-10 03:22:02 -04:00
Kai Luo	b7d5229d78	[PowerPC] Update alignment for ReuseLoadInfo in LowerFP_TO_INTForReuse In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is not updated for following loads. It's related to failed alignment check reported in https://bugs.llvm.org/show_bug.cgi?id=45297 Differential Revision: https://reviews.llvm.org/D77624	2020-04-10 05:49:19 +00:00
Nemanja Ivanovic	7f3787c0f2	[PowerPC] Bail out of redundant LI elimination on an implicit kill The transformation currently does not differentiate between explicit and implicit kills. However, it is not valid to later simply clear an implicit kill flag since the kill could be due to a call or return. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45374	2020-04-09 22:17:29 -05:00
Heejin Ahn	b647de9925	[WebAssembly] Use dummy debug info in Emscripten SjLj Summary: D74269 added debug info to newly created instructions, including calls to `malloc` and `free`, by taking debug info from existing instructions around, whose debug info may or may not be empty. But there are cases debug info is required by the IR verifier: when both the caller and the callee functions have DISubprograms, meaning we already have declarations to `malloc` or `free` with a DISubprogram attached, newly created calls to `malloc` and `free` should have non-empty debug info. This patch creates a non-empty dummy debug info in this case to those calls to make the IR verifier pass. Fixes https://bugs.llvm.org/show_bug.cgi?id=45461. Reviewers: dschuff Subscribers: aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77784	2020-04-09 18:44:50 -07:00
Stefan Pintilie	5b18b6e9a8	[PowerPC][Future] Fix for `6c4b40def7` This is a fix for the previous patch `6c4b40def7`. In some cases it may be possible to have the compiler produce st_other=1 without the compiler using mcpu=future which should not be the case. This patch adds a guard to make sure that if we are using st_other=1 then we are also compiling for future CPU.	2020-04-10 01:12:11 +00:00
Craig Topper	5625e6ab37	[X86] Improve min/max reduction costs. This is similar to what I recently did for getArithmeticReductionCost. I'm trying to account for the narrowing from 512->256->128 as we go. I've also added a new helper method getMinMaxCost that tries to handle the cases where we have native min/max instructions and fall back to cmp+select when we don't. Differential Revision: https://reviews.llvm.org/D76634	2020-04-09 17:28:50 -07:00

1 2 3 4 5 ...

57083 Commits