llvm-project

Commit Graph

Author	SHA1	Message	Date
Nemanja Ivanovic	03e7fefff8	[PowerPC] Canonicalize shuffles on big endian targets as well Extend shuffle canonicalization and conversion of shuffles fed by vectorized scalars to big endian subtargets. For big endian subtargets, loads and direct moves of scalars into vector registers put the data in the correct element for SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is narrower, the value still ends up in the wrong place - although a different wrong place than on little endian targets. This patch extends the combine that keeps values where they are if they feed a shuffle to big endian targets. Differential revision: https://reviews.llvm.org/D100478	2021-04-20 07:29:47 -05:00
Jay Foad	edea476142	[AMDGPU] Use simpler alternatives to !foldl. NFC.	2021-04-20 12:59:04 +01:00
hsmahesha	840c4e4e90	[AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D100773	2021-04-20 16:17:15 +05:30
Ben Shi	30e2c7be99	[RISCV] Refactor an optimization of addition with immediate Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100769	2021-04-20 18:04:25 +08:00
Joe Ellis	c91cd4f3bb	[AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts... when the predicate used by last{a,b} specifies a known vector length. For example: aarch64_sve_lasta(VL1, D) -> extractelement(D, #1) aarch64_sve_lastb(VL1, D) -> extractelement(D, #0) Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100476	2021-04-20 10:01:33 +00:00
Fraser Cormack	b4a358a7ba	[RISCV] Fix missing emergency slots for scalable stack offsets This patch adds an additional emergency spill slot to RVV code. This is required as RVV stack offsets may require an additional register to compute. This patch includes an optimization by @HsiangKai <kai.wang@sifive.com> to reduce the number of registers required for the computation of stack offsets from 3 to 2. Otherwise we'd need two additional emergency spill slots. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100574	2021-04-20 09:59:41 +01:00
Qiu Chaofan	2432d80d3b	[PowerPC] Use mtvsrdd to put callee-saved GPR into VSR This patch exploits mtvsrdd instruction (available in ISA3.0+) to save two callee-saved GPR registers into a single VSR, making it more efficient. Reviewed By: jsji, nemanjai Differential Revision: https://reviews.llvm.org/D62565	2021-04-20 16:43:24 +08:00
Jay Foad	b22721f01a	[AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used Don't shrink VOP3 instructions if there are any uses of a carry-out operand, because the shrunken form of the instruction would write the carry-out to vcc instead of to a virtual register. Differential Revision: https://reviews.llvm.org/D100760	2021-04-20 09:17:52 +01:00
Qiu Chaofan	b820339752	[PowerPC] Support f128 under VSX This patch is the last one in backend to support fp128 type in pre-POWER9 subtargets with VSX, removing temporary option and updating remaining tests. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92374	2021-04-20 15:49:52 +08:00
Zi Xuan Wu	4bb60c285c	[CSKY 6/n] Add support for branch and symbol series instructions This patch adds basic CSKY branch instructions and symbol address series instructions. These two kinds of instructions are related to each other, and supporting them involves a lot of work on fixups. For now, basic instructions are enabled except for disassembler support. We first support generating basic codegen asm and defer the disassembler work until later. Differential Revision: https://reviews.llvm.org/D95029	2021-04-20 15:36:49 +08:00
Zi Xuan Wu	4216389c26	[CSKY 5/n] Add support for all CSKY basic integer instructions except for branch series This patch adds basic CSKY integer instructions except for branch series such as bsr, br. It mainly includes basic ALU, load & store, compare and data move instructions. Branch series instructions need to handle complex symbol operands and are left to a following patch. Differential Revision: https://reviews.llvm.org/D94007	2021-04-20 15:36:49 +08:00
Zi Xuan Wu	8ba622bae1	[CSKY 4/n] Add basic CSKYAsmParser and CSKYInstPrinter This basic parser will handle basic instructions with register or immediate operands. With the addition of CSKYInstPrinter, we can now make use of lit tests. Differential Revision: https://reviews.llvm.org/D93798	2021-04-20 15:36:49 +08:00
Zakk Chen	d5fa71e9ec	[RISCV] Handle PseudoVRELOAD and PseudoVSPILL in getInstSizeInBytes. It's necessary to calculate the correct instruction size because PseudoVRELOAD and PseudoVSPILL will be expanded into multiple instructions. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100702	2021-04-19 22:30:03 -07:00
Jun Ma	5c6ac3b4a2	[AArch64][SVE] Combine add and index_vector This patch tries to combine pattern add(index_vector(zero, step), dup(X)) into index_vector(X, step) TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D100107	2021-04-20 11:38:37 +08:00
Min-Yih Hsu	7ac461f6f7	[M68k] Put M68kDesc as the direct library dependency for disassembler M68kDisassembler should put M68kDesc as its direct library dependency since it uses logic related to code beads. Otherwise the build will fail when building LLVM libraries as shared objects (building LLVM libraries statically won't have this problem though).	2021-04-19 15:56:24 -07:00
Ricky Taylor	2221185776	[M68k] Implement Disassembler This is an implementation of a disassembler for M68k. Differential Revision: https://reviews.llvm.org/D98540	2021-04-19 22:24:12 +01:00
Ricky Taylor	6de262827c	[M68k] Change printing of absolute memory references This also includes PC-relative addresses since they are still referenced as absolute addresses in assembly and converted to relative addresses by the assembler. This changes, for example: - `bra #-2` -> `bra $100` - `jsr #16` -> `jsr $10` Differential Revision: https://reviews.llvm.org/D100697	2021-04-19 22:24:12 +01:00
David Penry	78a871abf7	[ARM] Use ProcResGroup in Cortex-M7 scheduling model Used to model structural hazards on FP issue, where some instructions take up 2 issue slots and others one as well as similar structural hazards on load issue, where some instructions take up two load lanes and others one. Differential Revision: https://reviews.llvm.org/D98977	2021-04-19 21:23:05 +01:00
Thomas Lively	e657c84fa1	[WebAssembly] Use v128.const instead of splats for constants We previously used splats instead of v128.const to materialize vector constants because V8 did not support v128.const. Now that V8 supports v128.const, we can use v128.const instead. Although this increases code size, it should also increase performance (or at least require fewer engine-side optimizations), so it is an appropriate change to make. Differential Revision: https://reviews.llvm.org/D100716	2021-04-19 12:43:59 -07:00
Jinsong Ji	d88d8c5b86	[PowerPC] Disable relative lookup table converter pass for AIX XCOFF hasn't implemented lowerRelativeReference. So we need to disable new pass introduced by https://reviews.llvm.org/D94355 for AIX for now. Reviewed By: gulfem Differential Revision: https://reviews.llvm.org/D100584	2021-04-19 19:28:11 +00:00
madhur13490	6a4d9cb7e0	[AMDGPU] Remove error check for indirect calls and add missing queue-ptr This patch removes -fixed-abi check for indirect calls and also adds queue-ptr which is required for indirect calls to work. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D100633	2021-04-20 00:35:17 +05:30
Pavel Iliin	2ec16103c6	[AArch64] Peephole rule to remove redundant cmp after cset. Comparisons to zero or one after cset instructions can be safely removed in examples like: `cset w9, eq; cmp w9, #1; b.ne .L1` ---> `cset w9, eq; b.ne .L1` and `cset w9, eq; cmp w9, #0; b.ne .L1` ---> `cset w9, eq; b.eq .L1`. This adds a peephole optimization that detects suitable cases and removes those comparisons. Differential Revision: https://reviews.llvm.org/D98564	2021-04-19 19:58:38 +01:00
Craig Topper	87afefcd22	[RISCV] Fix mistake in comment. NFC	2021-04-19 11:15:32 -07:00
Craig Topper	7ed01a420a	[RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte. As noted in the FIXME there's a sort of agreement that any extra bits stored will be 0. The generated code is pretty terrible. I was really hoping we could use a tail undisturbed trick, but tail undisturbed no longer applies to masked destinations in the current draft spec. Fingers crossed that it isn't common to do this. I doubt IR from clang or the vectorizer would ever create this kind of store. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100618	2021-04-19 11:05:18 -07:00
Jessica Paquette	65f257a215	[AArch64][GlobalISel] Implement custom legalization for s32 and s64 G_CTPOP This is a partial port of AArch64TargetLowering::LowerCTPOP. This custom lowering tries to use NEON instructions to give a more efficient CTPOP lowering when possible. In the non-NEON/noimplicitfloat case, this should use the generic lowering (see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after implementing the widening code for s16/s8 though. Differential Revision: https://reviews.llvm.org/D100399	2021-04-19 10:56:02 -07:00
Nick Desaulniers	c440b97d89	[TargetLowering] move "o" and "X" constraint handling to base class These constraints are machine agnostic; there's no reason to handle these per-arch. If arches don't support these constraints, then they will fail elsewhere during instruction selection. We don't need virtual calls to look these up; TargetLowering::getInlineAsmMemConstraint should only be overridden by architectures with additional unique memory constraints. Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D100416	2021-04-19 10:53:31 -07:00
Jessica Paquette	91bbb914e0	[AArch64][GlobalISel] Regbankselect + select @llvm.aarch64.neon.uaddlv It turns out we actually import a bunch of selection code for intrinsics. The imported code checks that the register banks on the G_INTRINSIC instruction are correct. If so, it goes ahead and selects it. This adds code to AArch64RegisterBankInfo to allow us to correctly determine register banks on intrinsics which have known register bank constraints. For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for porting AArch64TargetLowering::LowerCTPOP. Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction. This seems a little nicer than having to know about how intrinsic instructions are structured. Differential Revision: https://reviews.llvm.org/D100398	2021-04-19 10:47:49 -07:00
Jay Foad	a02aa91313	[AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC.	2021-04-19 14:20:46 +01:00
Jay Foad	ef443390a9	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Reapply after fixing dependent change D100188. Differential Revision: https://reviews.llvm.org/D100189	2021-04-19 12:08:02 +01:00
Jay Foad	323ef0eb45	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Reapply with a fix for using InstToErase after it had been erased. Differential Revision: https://reviews.llvm.org/D100188	2021-04-19 12:05:41 +01:00
Cullen Rhodes	f0bc2782f2	[TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100377	2021-04-19 11:01:34 +00:00
Dmitry Preobrazhensky	bcc29e0fcf	[AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3 Disabled constants as carry in/out operands. See bug 48711. Differential Revision: https://reviews.llvm.org/D100642	2021-04-19 13:42:31 +03:00
Roman Lebedev	df9597cf5a	[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap This is similar to the subvector extractions, except that the 0'th subvector isn't free to insert, because we generally don't know whether or not the upper elements need to be preserved: https://godbolt.org/z/rsxP5W4sW This is needed to avoid regressions in D100684 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100698	2021-04-19 13:24:58 +03:00
Fraser Cormack	c9a93c3e01	[RISCV] Lower vector shuffles to vrgather operations This patch extends the lowering of RVV fixed-length vector shuffles to avoid the default stack expansion and instead lower to vrgather instructions. For "permute"-style shuffles where one vector is swizzled, we can lower to one vrgather. For shuffles involving two vector operands, we lower to one unmasked vrgather (or splat, where appropriate) followed by a masked vrgather which blends in the second half. On occasion, when it's not possible to create a legal BUILD_VECTOR for the indices, we use vrgatherei16 instructions with 16-bit index types. For 8-bit element vectors where we may have indices over 255, we have a fairly blunt fallback to the stack expansion to avoid custom-splitting of the vector types. To enable the selection of masked vrgather instructions, this patch extends the various RISCVISD::VRGATHER nodes to take a passthru operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100549	2021-04-19 11:13:13 +01:00
Yaxun (Sam) Liu	3597f02fd5	[AMDGPU] Add GlobalDCE before internalization pass The internalization pass only internalizes global variables with no users. If the global variable has some dead user, the internalization pass will not internalize it. To be able to internalize global variables with dead users, a global dce pass is needed before the internalization pass. This patch adds that. Reviewed by: Artem Belevich, Matt Arsenault Differential Revision: https://reviews.llvm.org/D98783	2021-04-17 11:25:25 -04:00
Serge Guelton	d6de1e1a71	Normalize interaction with boolean attributes Such attributes can either be unset, or set to "true" or "false" (as string). Throughout the codebase, this led to inelegant checks ranging from `if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")` to `if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")`. Introduce a getValueAsBool that normalizes the check, with the following behavior: no attribute or attribute set to "false" => return false; attribute set to "true" => return true. Differential Revision: https://reviews.llvm.org/D99299	2021-04-17 08:17:33 +02:00
Nemanja Ivanovic	ff769dd111	[PowerPC] Minor improvement for insert_vector_elt codegen For v2f64, all VSX subtargets can insert an element with a single XXPERMDI.	2021-04-16 18:52:37 -05:00
Joe Nash	a0ed70abde	[AMDGPU] Remove redundant field from DPP8 def These lines set the value to what it already was, so they are redundant. NFC Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100664 Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3	2021-04-16 16:23:52 -04:00
Joe Nash	919236e608	[AMDGPU] NFC, Comment in disassembler for dpp8 Gives reasoning for convertDPP8. Also corrects typo in Operand type comment. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100665 Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4	2021-04-16 16:21:47 -04:00
Thomas Lively	5c729750a6	[WebAssembly] Remove saturating fp-to-int target intrinsics Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead. This includes removing the intrinsics for i32x4.trunc_sat_zero_f64x2_{s,u}, which are now represented in IR as a saturating truncation to a v2i32 followed by a concatenation with a zero vector. Differential Revision: https://reviews.llvm.org/D100596	2021-04-16 12:11:20 -07:00
Christudasan Devadasan	97618522dc	[AMDGPU] Remove dead code (NFC).	2021-04-16 23:03:31 +05:30
Joe Nash	7cc4a02fa2	[AMDGPU] Refactor VOP3P Profile and AsmParser, NFC Refactors VOP3P tablegen and the AsmParser for VOP3P for better extensibility. NFC intended Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100602 Change-Id: I038e3a772ac348bb18979cdf3e3ae2e9476dd411	2021-04-16 13:06:50 -04:00
Malhar Jajoo	093f1828e5	[ARM] Prevent phi-node-elimination from generating copy above t2WhileLoopStartLR This patch prevents phi-node-elimination from generating a COPY operation for the register defined by t2WhileLoopStartLR, as it is a terminator that defines a value. This happens because of the presence of phi-nodes in the loop body (the Preheader of which is the block containing the t2WhileLoopStartLR). If this is not done, the COPY is generated above/before the terminator (t2WhileLoopStartLR here), and since it uses the value defined by t2WhileLoopStartLR, MachineVerifier throws a 'use before define' error. This essentially adds on to the change in differential D91887/D97729. Differential Revision: https://reviews.llvm.org/D100376	2021-04-16 16:45:07 +01:00
Roman Lebedev	b06c55a698	[X86][CostModel] Fix cost model for non-power-of-two vector load/stores Sometimes LV has to produce really wide vectors, and sometimes they end up being not powers of two. As can be seen from the diff, the cost computation is currently completely nonsensical in those cases. Instead of just scalarizing everything, split/factorize the wide vector into a number of subvectors, each one having a power-of-two number of elements, and recurse to get the cost of the op on this subvector. Also, check how we'd legalize this subvector, and if the legalized type is scalar, also account for the scalarization cost. Note that for sub-vector loads, we might be able to do better when the vectors are properly aligned. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100099	2021-04-16 15:30:57 +03:00
David Green	00a6045473	[ARM] Combine sub 0, csinc X, Y, CC -> csinv -X, Y, CC Combine sub 0, csinc X, Y, CC to csinv -X, Y, CC providing that the negation of X is cheap, currently just handling constants. This comes up during the splat of an i1 to a predicate, where we now generate csetm, as opposed to cset; rsb. Differential Revision: https://reviews.llvm.org/D99940	2021-04-16 11:52:31 +01:00
Nick Desaulniers	bb7016f8f5	[Aarch64] handle "o" inline asm memory constraints The Linux kernel is making use of this inline asm constraint, which is causing an ICE. PR49956 Link: https://github.com/ClangBuiltLinux/linux/issues/1348 Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D100412	2021-04-15 23:36:21 -07:00
Jim Lin	2893570e86	[RISCV] Don't emit save-restore call if function is an interrupt handler It has to save all caller-saved registers before a call in the handler. So don't emit a call that saves/restores registers. Reviewed By: simoncook, luismarques, asb Differential Revision: https://reviews.llvm.org/D100532	2021-04-16 12:54:47 +08:00
hsmahesha	099dcb68a6	[AMDGPU] Refactor ds_read/ds_write related select code for better readability. Part of the code related to ds_read/ds_write ISel is refactored, and the corresponding comment is re-written for better readability, which would help while implementing any future ds_read/ds_write ISel related modifications. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100300	2021-04-16 08:24:00 +05:30
Momchil Velikov	f9d932e673	[clang][AArch64] Correctly align HFA arguments when passed on the stack When we pass an AArch64 Homogeneous Floating-Point Aggregate (HFA) argument with increased alignment requirements, for example struct S { __attribute__ ((__aligned__(16))) double v[4]; }; Clang uses `[4 x double]` for the parameter, which is passed on the stack at alignment 8, whereas it should be at alignment 16, following Rule C.4 in AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules) Currently we don't have a way to express in LLVM IR the alignment requirements of the function arguments. The align attribute is applicable to pointers only, and only for some special ways of passing arguments (e.g. byval). When implementing AAPCS32/AAPCS64, clang resorts to dubious hacks of coercing to types, which naturally have the needed alignment. We don't have enough types to cover all the cases, though. This patch introduces a new use of the stackalign attribute to control stack slot alignment, when and if an argument is passed in memory. The attribute align is left as an optimizer hint - it still applies to pointer types only and pertains to the content of the pointer, whereas the alignment of the pointer itself is determined by the stackalign attribute. For byval arguments, the stackalign attribute assumes the role previously performed by align, falling back to align if `stackalign` is absent. On the clang side, when passing arguments using the "direct" style (cf. `ABIArgInfo::Kind`), now we can optionally specify an alignment, which is emitted as the new `stackalign` attribute. Patch by Momchil Velikov and Lucas Prates. Differential Revision: https://reviews.llvm.org/D98794	2021-04-15 22:58:14 +01:00
Stanislav Mekhanoshin	13015ebd6f	[AMDGPU] Factor out predicate FmaakFmamkF32Insts Differential Revision: https://reviews.llvm.org/D100409	2021-04-15 12:29:16 -07:00
Stanislav Mekhanoshin	d4385e483d	[AMDGPU] Add new EmitDstSel field to VOPProfile. NFC. Differential Revision: https://reviews.llvm.org/D100589	2021-04-15 12:07:08 -07:00
hsmahesha	82787eb228	[AMDGPU] Move LDS lowering related utility functions to a separate utils file. Move some utility functions which are used within LDS lowering pass to a separate utils file so that other LDS related passes can make use of them when required. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100526	2021-04-16 00:15:48 +05:30
Krzysztof Parzyszek	280678122d	[Hexagon] Avoid infinite loops in type legalization when lowering SETCC Only widen SETCC if the operands can be widened. Not checking that caused infinite widen-split loops in legalization.	2021-04-15 13:34:37 -05:00
Craig Topper	1656df13da	[RISCV] Share RVInstIShift and RVInstIShiftW instruction format classes with the B extension. This generalizes RVInstIShift/RVInstIShiftW to take the upper 5 or 7 bits of the immediate as an input instead of only bit 30. Then we can share them. For RVInstIShift I left a hardcoded 0 at bit 26 where RV128 gets a 7th bit for the shift amount. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100424	2021-04-15 11:08:28 -07:00
Arthur Eubanks	c8f0a7c215	[NewPM] Cleanup IR printing instrumentation Being lazy with printing the banner seems hard to reason about; we should print it unconditionally first (it could also lead to duplicate banners if we have multiple functions in -filter-print-funcs). The printIR() functions were doing too many things. I separated out the call from PrintPassInstrumentation since we were essentially doing two completely separate things in printIR() from different callers. There were multiple ways to generate the name of some IR. That's all been moved to getIRName(). The printing of the IR name was also inconsistent; now it's always "IR Dump on $foo" where "$foo" is the name. For a function, it's the function name. For a loop, it's what's printed by Loop::print(), which is more detailed. For an SCC, it's the list of functions in parentheses. For a module it's "[module]", to differentiate between a possible SCC with a function called "module". To preserve D74814, we have to check if we're going to print anything at all first. This is unfortunate, but I would consider this a special case that shouldn't be handled in the core logic. Reviewed By: jamieschmeiser Differential Revision: https://reviews.llvm.org/D100231	2021-04-15 09:50:55 -07:00
Stefan Pintilie	f28cb01be0	[PowerPC] Add ROP Protection Instructions for PowerPC There are four new PowerPC instructions that are introduced in Power 10. They are hashst, hashchk, hashstp, hashchkp. These instructions will be used for ROP Protection. This patch adds the four instructions. Reviewed By: nemanjai, amyk, #powerpc Differential Revision: https://reviews.llvm.org/D99375	2021-04-15 11:38:38 -05:00
Sebastian Neubauer	7842e1725e	[AMDGPU] Fix large return values with amdgpu_gfx Returning in memory is not supported, so fall back to sret. Also, extend i1 and i16 to i32. Otherwise, they would be passed through memory. Differential Revision: https://reviews.llvm.org/D100543	2021-04-15 14:57:56 +02:00
Simon Pilgrim	9d57a77b81	[X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0) If we are truncating from an i32 source before comparing the result against zero, then see if we can directly compare the source value against zero. If the upper (truncated) bits are known to be zero then we can compare against that, hopefully increasing the chances of us folding the compare into an EFLAG result of the source's operation. Fixes PR49028. Differential Revision: https://reviews.llvm.org/D100491	2021-04-15 13:55:51 +01:00
Bradley Smith	22c017f0f9	[AArch64][NEON] Match (or (and -a b) (and (a+1) b)) => bit select With this patch vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction. Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D100304	2021-04-15 13:52:47 +01:00
Florian Hahn	acd9cc7495	[AArch64] Use type-legalization cost for code size memop cost. At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory operations on large vectors have a cost of one, even if they will get expanded to a large number of memory operations in the backend. This patch updates getMemoryOpCost to return the cost for the type legalization for both CodeSize and SizeAndLatency. This should more accurately reflect the number of memory operations required. I am not sure how latency should properly be included in SizeAndLatency from the description, but returning the size cost should be clearly more accurate. This does not cause any binary changes when building MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because large vector memops are not really formed by code emitted from Clang. But using the C/C++ matrix extension can easily result in code with very large vector operations directly from Clang, e.g. https://clang.godbolt.org/z/6xzxcTGvb Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100291	2021-04-15 10:11:05 +01:00
Martin Storsjö	5144f730a8	[AArch64] Fix windows vararg functions with floats in the fixed args On Windows, float arguments are normally passed in float registers in the calling convention for regular functions. For variable argument functions, floats are passed in integer registers. This has been done correctly for many years. However, the surprising bit was that floats among the fixed arguments also are supposed to be passed in integer registers, contrary to regular functions. (This also seems to be the behaviour on ARM though, both on Windows, but also on e.g. hardfloat linux.) In the calling convention, don't promote shorter floats to f64, but convert them to integers of the same length. (Floats passed as part of the actual variable arguments are promoted to double already on the C/Clang level; the LLVM vararg calling convention doesn't do any extra promotion of f32 to f64 - this matches how it works on X86 too.) Technically, this is an ABI break compared to older LLVM versions, but it fixes compatibility with the official platform ABI. (In practice, floats among the fixed arguments in variable argument functions are a pretty rare construct.) Differential Revision: https://reviews.llvm.org/D100365	2021-04-15 11:02:14 +03:00
Craig Topper	c3f1271464	[RISCV] Add a PatFrag to shorten repeated (XLenVT (VLOp GPR:$vl)) in V extension patterns. Reduces the amount of changes needed in D100288.	2021-04-14 22:36:35 -07:00
hsmahesha	4973b0c4e7	[AMDGPU] Disable forceful inline of non-kernel functions which use LDS. Now that LDS uses within non-kernel functions are handled in the LowerModuleLDS pass, we no longer need to forcefully inline non-kernel functions just because they use LDS. Do forceful inlining only when the LowerModuleLDS pass is not enabled. It is enabled by default. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100481	2021-04-15 09:12:56 +05:30
Thomas Lively	6a18cc23ef	[WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u} Removes the builtins and intrinsics used to opt in to using these instructions and replaces them with normal ISel patterns now that they are no longer prototypes. Differential Revision: https://reviews.llvm.org/D100402	2021-04-14 13:43:09 -07:00
Thomas Lively	af7925b4dd	[WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u} Add a custom DAG combine and ISD opcode for detecting patterns like (uint_to_fp (extract_subvector ...)) before the extract_subvector is expanded to ensure that they will ultimately lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions are no longer prototypes and can now be produced via standard IR, this commit also removes the target intrinsics and builtins that had been used to prototype the instructions. Differential Revision: https://reviews.llvm.org/D100425	2021-04-14 10:42:45 -07:00
Stanislav Mekhanoshin	b7ebb25e53	[AMDGPU] Factor out SelectSAddrFI() This is a service function generally useful for selection of a FI in an SADDR. NFC for now, needed for future patch. Differential Revision: https://reviews.llvm.org/D100406	2021-04-14 09:40:02 -07:00
Sander de Smalen	4f42d873c2	[TTI] NFC: Change getArithmeticInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100317	2021-04-14 17:20:36 +01:00
Sander de Smalen	1af35e77f4	[TTI] NFC: Change getVectorInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100315	2021-04-14 17:20:35 +01:00
Sander de Smalen	174e8f6c5e	[TTI] NFC: Change getShuffleCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100314	2021-04-14 17:20:35 +01:00
Sander de Smalen	14b934f8a6	[TTI] NFC: Change getCFInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100313	2021-04-14 17:20:34 +01:00
Sander de Smalen	596f669cfb	[TTI] NFC: Change getCallInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D100312	2021-04-14 17:20:34 +01:00
Thomas Lively	af7ab81ce3	[WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops Now that these instructions are no longer prototypes, we do not need to be careful about keeping them opt-in and can use the standard LLVM infrastructure for them. This commit removes the bespoke intrinsics we were using to represent these operations in favor of the corresponding target-independent intrinsics. The clang builtins are preserved because there is no standard way to easily represent these operations in C/C++. For consistency with the scalar codegen in the Wasm backend, the intrinsic used to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though @llvm.roundeven better captures the semantics of the underlying Wasm instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is left to a potential future patch. Differential Revision: https://reviews.llvm.org/D100411	2021-04-14 09:19:27 -07:00
hsmahesha	e3070db0f7	[AMDGPU] Rename "LDS lowering" pass name. Rename the "LDS lowering" pass from `amdgpu-disable-lower-module-lds` to `amdgpu-enable-lower-module-lds`, as the latter is consistent and reads better. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D100441	2021-04-14 20:19:53 +05:30
Simon Pilgrim	4fbe761572	[X86][SSE] canonicalizeShuffleWithBinOps - check for more combos of merge-able binary shuffles. In the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)), check if both X/Z AND Y/W have at least one merge-able shuffle in which case the total number of shuffles should still fall. Helps with instruction count regressions we saw while fixing PR48823	2021-04-14 15:24:41 +01:00
Pablo Barrio	cca40aa8d8	[AArch64][v8.5A] Add BTI to all function starts The existing BTI placement pass avoids inserting "BTI c" when the function has local linkage and is only directly called. However, even in this case, there is a (small) chance that the linker later adds a hunk with an indirect call to the function, e.g. if the function is placed in a separate section and moved far away from its callers. Make sure to add BTI for these functions too. Differential Revision: https://reviews.llvm.org/D99417	2021-04-14 15:24:01 +01:00
Sebastian Neubauer	929edd4375	[AMDGPU] Mark scavenged SGPR as used Otherwise it reuses the same register for storing the stack slot offset if the stack slot offset is big. Differential Revision: https://reviews.llvm.org/D100461	2021-04-14 14:55:01 +02:00
Zarko Todorovski	6b7838b68c	[AIX] Allow safe for 32bit P8 VSX pattern matching Pull some of the safe for 32bit pattern matching for Pwr8 and above. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D97909	2021-04-14 08:12:48 -04:00
Simon Pilgrim	73737fe990	[X86] Fold cmpeq/ne(trunc(x),0) --> cmpeq/ne(x,0) Relax the fold from rGbaadbe04bf75 to compare any op, not just logic ops, now that the movmsk regressions have been handled.	2021-04-14 11:02:02 +01:00
Simon Pilgrim	016ceb8382	[X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) fold Extension to rG74f98391a7a4, we can also include any of the upper (known zero) bits in the comparison in the shuffle removal fold, just as long as we demand all the elements of the movmsk source vector.	2021-04-14 11:02:01 +01:00
Nemanja Ivanovic	8be3181df6	[PowerPC] Fix incorrect subreg typo from `0148bf53f0`	2021-04-14 05:01:12 -05:00
Martin Storsjö	3b32dc4b84	[ARM] [COFF] Properly produce cross-section relative relocations Differential Revision: https://reviews.llvm.org/D99574	2021-04-14 12:31:28 +03:00
Martin Storsjö	d5c5cf5ce8	[AArch64] [COFF] Properly produce cross-section relative relocations This fixes breakage on Windows/ARM64 after D94355. Modelled after the corresponding code for X86; not entirely familiar with those aspects of that layer otherwise. Differential Revision: https://reviews.llvm.org/D99572	2021-04-14 12:31:26 +03:00
Bogdan Graur	0acf4e5005	[NFC] Fix unused warning. Differential Revision: https://reviews.llvm.org/D100449	2021-04-14 09:09:20 +02:00
Min-Yih Hsu	91b6ef64db	[M68k] Put M68kInfo as the direct library dependency for AsmParser M68kAsmParser uses `llvm::getTheM68kTarget` from M68kInfo, therefore we should put M68kInfo as its direct dependency. Otherwise the build will fail when building LLVM libraries as shared objects (building LLVM libraries statically won't have this problem though).	2021-04-13 21:21:02 -07:00
Wang, Pengfei	a3b52a9d13	[X86][AMX] Refactor for PostRA ldtilecfg pass. This is a follow up of D99010. We didn't consider the live range of shape registers when hoisting ldtilecfg. There may be risks, e.g. we may happen to insert it into an invalid range of some registers and get an unexpected error. This patch fixes this problem by storing the value to the corresponding stack place of ldtilecfg immediately after all of its definitions. This patch also fixes a problem in the previous code: If we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for other ldtilecfg. There are still some optimization points left, e.g. eliminating unused mov instructions, breaking the def-use dependency before RA, etc. Reviewed By: LuoYuanke, xiangzhangllvm Differential Revision: https://reviews.llvm.org/D99966	2021-04-14 10:08:23 +08:00
ShihPo Hung	d5e962f1f2	[RISCV] Implement COPY for Zvlsseg registers When copying Zvlsseg register tuples, we split the COPY to NF whole register moves as below: $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2 => $v10m2 = PseudoVMV2R_V $v4m2 $v12m2 = PseudoVMV2R_V $v6m2 This patch copies forwardCopyWillClobberTuple from AArch64 to check register overlapping. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100280	2021-04-13 18:55:51 -07:00
Nemanja Ivanovic	0148bf53f0	[PowerPC] Use correct node to get a super register from a subreg The VSX tablegen file has some rather egregious uses of COPY_TO_REGCLASS even in situations where it needs to use SUBREG_TO_REG. While this produces correct code, it often doesn't allow the register coalescer to coalesce copies and the resulting code ends up being suboptimal. This patch just changes over patterns that should use SUBREG_TO_REG.	2021-04-13 19:52:21 -05:00
root	645ce31c20	Title: [RISCV] Add missing part of instruction vmsge {u}. VX Review By: craig.topper Differential Revision : https://reviews.llvm.org/D100115	2021-04-14 06:41:59 +08:00
Craig Topper	6aa6f748ae	[RISCV] Add a generic PatGprImm class and use it to simplify patterns in RISCVInstrInfoB.td. NFC	2021-04-13 12:07:24 -07:00
Craig Topper	cb073f1bc0	[RISCV] Make use of PatGprGpr and PatGpr in RISCVInstrInfoB.td. NFC	2021-04-13 12:06:58 -07:00
Yonghong Song	a285bdb56f	BPF: remove default .extern data section Currently, for any extern variable, if it doesn't have a section attribute, it will be put into a default ".extern" btf DataSec. The initial design was to put every extern variable in a DataSec so libbpf can use it. But later on, libbpf actually requires extern variables to be put into special sections, e.g., ".kconfig", ".ksyms", etc. so they can be used properly based on section name. Andrii mentioned since ".extern" variables are not actually used, it makes sense to remove it from the compiler so libbpf does not need to deal with it, esp. for static linking. The BTF for these extern variables is still generated. With this patch, I tested kernel selftests/bpf and all tests passed. Indeed, removing the ".extern" DataSec seems to have no impact. Differential Revision: https://reviews.llvm.org/D100392	2021-04-13 11:35:52 -07:00
Craig Topper	1afdfc6169	[RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant. Prep work for adding intrinsics for these instructions in the future.	2021-04-13 11:04:28 -07:00
Jessica Paquette	516d09387b	[AArch64][GlobalISel] Mark G_CTPOP as legal for v16s8 and v8s8 G_CTPOP can be directly selected to CNT in these cases. Differential Revision: https://reviews.llvm.org/D100349	2021-04-13 11:03:39 -07:00
Simon Pilgrim	74f98391a7	[X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in CMP(MOVMSK(PACKSS())) -> CMP(MOVMSK()) fold We already allow the comparison of the upper bits of 'IsAllOf' (allbits) patterns, but we can safely compare the known zero bits for 'IsAnyOf' (zerobits) patterns as well. This fixes an issue where we are comparing a type wider than the number of vector elements, which avoids a regression mentioned in rGbaadbe04bf75.	2021-04-13 17:37:24 +01:00
Anirudh Prasad	7da22dfcd0	[SystemZ][z/OS] Introduce dialect querying helper functions - In the SystemZAsmParser, there will be a few queries to the type of dialect it is (AD_ATT, AD_HLASM) in future patches. - It would be nice to have two small helper functions `isParsingATT()` and `isParsingHLASM()` - Putting this as a separate smaller patch allows us to remove its definitions from other dependent patches. Reviewed By: uweigand, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D99891	2021-04-13 12:14:34 -04:00
Yonghong Song	968292cb93	BPF: generate proper BTF for globals with WeakODRLinkage For a global weak symbol defined as below: char g __attribute__((weak)) = 2; LLVM generates an allocated global with WeakAnyLinkage, for which BPF backend generates proper BTF info. For the above example, if a modifier "const" is added like const char g __attribute__((weak)) = 2; LLVM generates an allocated global with WeakODRLinkage, for which BPF backend didn't generate any BTF as it didn't handle WeakODRLinkage. This patch adds support for WeakODRLinkage and proper BTF info can be generated for weak symbol defined with "const" modifier. Differential Revision: https://reviews.llvm.org/D100362	2021-04-13 08:54:05 -07:00
Anirudh Prasad	f7eec83932	[AsmParser][SystemZ][z/OS] Add in support to allow use of additional comment strings. - Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute. - However, AsmLexer also supports a few additional comment syntaxes, in addition to what's specified as a CommentString attribute. This includes regular C-style block comments (/* ... */), regular C-style line comments (// .... ) and #. While I'm not sure as to why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference) For example: Consider a target which sets the CommentString attribute to '*'. The following strings are all lexed as comments. ``` "# abc" -> comment "// abc" -> comment "/* abc */" -> comment "* abc" -> comment ``` - In HLASM however, only "*" is accepted as a comment string, and nothing else. - To achieve this, an additional attribute (`AllowAdditionalComments`) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#". Depends on https://reviews.llvm.org/D99277 Reviewed By: abhina.sreeskantharajan, myiwanch Differential Revision: https://reviews.llvm.org/D99286	2021-04-13 11:15:09 -04:00
Sander de Smalen	03f47bdcb1	[TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100205	2021-04-13 14:21:02 +01:00
Sander de Smalen	d676b5749d	[TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100204	2021-04-13 14:21:01 +01:00
Sander de Smalen	db134e2428	[TTI] NFC: Change getCmpSelInstrCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100203	2021-04-13 14:21:01 +01:00
Sander de Smalen	2285dfb73f	[TTI] NFC: Change getMinMaxReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100202	2021-04-13 14:21:00 +01:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
Sander de Smalen	fd1f8a5462	[TTI] NFC: Change getGatherScatterOpCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100200	2021-04-13 14:20:59 +01:00
Sander de Smalen	92d8421f49	[TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100199	2021-04-13 14:20:58 +01:00
madhur13490	5682ae2fc6	[AMDGPU] Set implicit arg attributes for indirect calls This patch adds attributes corresponding to implicits to functions/kernels if 1. it has an indirect call OR 2. its address is taken. Once such attributes are set, rest of the codegen would work out-of-box for indirect calls. This patch eliminates the potential overhead -fixed-abi imposes even though indirect function calls are not used. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D99347	2021-04-13 13:15:13 +00:00
Ricky Taylor	6e098e133d	[M68k] Implement AsmParser This is a work-in-progress implementation of an assembler for M68k. Outstanding work: - Updating existing tests assembly syntax - Writing new tests for the assembler (and disassembler) I've left those until there's consensus that this approach is okay (I hope that's okay!). Questions I'm aware of: - Should this use Motorola or gas syntax? (At the moment it uses Motorola syntax.) - The disassembler produces a table at runtime for disassembly generated from the code beads. Is this okay? (This is less than ideal but as I mentioned in my llvm-dev post, it's quite complicated to write a table-gen parser for code beads.) Depends on D98519 Depends on D98532 Depends on D98534 Depends on D98535 Depends on D98536 Differential Revision: https://reviews.llvm.org/D98537	2021-04-13 09:25:34 +01:00
Craig Topper	7c9bbbf735	[RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate. Prep work for adding intrinsics in the future. Left an assert that the input is constant in ReplaceNodeResults, as the intrinsic shouldn't go through that path.	2021-04-12 23:46:50 -07:00
Chen Zheng	80aa9b0f7b	[PowerPC] stop reverse mem op generation for some cases. We should consider the feeder user number when we do reverse memory operation transformation. Otherwise, we may get negative impact. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D100166	2021-04-12 22:41:28 -04:00
Freddy Ye	3fc1fe8db8	[X86] Support -march=rocketlake Reviewed By: skan, craig.topper, MaskRay Differential Revision: https://reviews.llvm.org/D100085	2021-04-13 09:48:13 +08:00
Fangrui Song	0a614fff4f	[ARM] Fix -Wmissing-field-initializers	2021-04-12 14:28:23 -07:00
Jian Cai	ed1734931a	Fix up build failures after `cfce5b26a8` Build log: https://lab.llvm.org/buildbot/#/builders/37/builds/3538 Differential Revision: https://reviews.llvm.org/D98916	2021-04-12 14:09:15 -07:00
Jian Cai	cfce5b26a8	[ARM] support symbolic expression as immediate in memory instructions Currently the ARM backend only accepts constant expressions as the immediate operand in load and store instructions. This allows the result of symbolic expressions to be used in memory instructions. For example, 0: .space 2048 strb r2, [r0, #(.-0b)] would be assembled into the following instructions. strb r2, [r0, #2048] This only adds support to ldr, ldrb, str, and strb in arm mode to address the build failure of Linux kernel for now, but should facilitate adding support to similar instructions in the future if the need arises. Link: https://github.com/ClangBuiltLinux/linux/issues/1329 Reviewed By: peter.smith, nickdesaulniers Differential Revision: https://reviews.llvm.org/D98916	2021-04-12 12:13:55 -07:00
Fraser Cormack	d737c47137	[RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates This patch adds more optimized codegen for the above SETCC forms, by matching the '.vi' vector forms when the immediate is a 5-bit signed immediate plus 1. The immediate can be decremented and the corresponding SET[U]LE or SET[U]GT forms can be matched. This work was left as a TODO from D94168. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100096	2021-04-12 18:36:45 +01:00
David Green	dd31b2c6e5	[ARM] Add a number of intrinsics for MVE lane interleaving Add a number of intrinsics which natively lower to MVE operations to the lane interleaving pass, allowing it to efficiently interleave the lanes of chunks of operations containing these intrinsics. Differential Revision: https://reviews.llvm.org/D97293	2021-04-12 17:23:02 +01:00
Simon Pilgrim	baadbe04bf	[X86] Fold cmpeq/ne(trunc(logic(x)),0) --> cmpeq/ne(logic(x),0) Fixes the issues noted in PR48768, where the and/or/xor instruction had been promoted to avoid i8/i16 partial-dependencies, but the test against zero had not. We can almost certainly relax this fold to work for any truncation, although it breaks a number of existing folds (notably movmsk folds which tend to rely on the truncate to determine the demanded bits/elts in the source vector). There is a reverse combine in TargetLowering.SimplifySetCC so we must wait until after legalization before attempting this.	2021-04-12 16:05:34 +01:00
Wang, Pengfei	4cbaaf4a24	[X86][AMX] Hoist ldtilecfg The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop. This patch tries to calculate the nearest point where all shapes of AMX registers are reachable. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D99010	2021-04-12 22:36:41 +08:00
David Green	6c0a1ed3a9	[ARM] Add FP handling for MVE lane interleaving FP16 to FP32 converts can be handled in MVE lane interleaving, much like the sext/zext lowering we do. This expands the pass with fpext and fptrunc handling, and basic fp operations allowing more efficient lowering of fp vectors. Differential Revision: https://reviews.llvm.org/D97292	2021-04-12 15:28:13 +01:00
Malhar Jajoo	58f3201a20	[ARM] Updates to arm-block-placement pass The patch makes two updates to the arm-block-placement pass: - Handle arbitrarily nested loops - Extends the search (for t2WhileLoopStartLR) to the predecessor of the preHeader. Differential Revision: https://reviews.llvm.org/D99649	2021-04-12 14:46:23 +01:00
Andrew Savonichev	f037b07b5c	Revert "[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant" This reverts commit `cca9b5985c`. Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir: * Bad machine code: Virtual register killed in block, but needed live out. * - function: indexed_2s - basic block: %bb.0 entry (0x640fee8) Virtual register %7 is used after the block. * Bad machine code: Virtual register defs don't dominate all uses. * - function: indexed_2s - v. register: %7 LLVM ERROR: Found 2 machine code errors.	2021-04-12 16:28:49 +03:00
Andrew Savonichev	cca9b5985c	[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner. FMUL_indexed is normally selected during instruction selection, but it does not work in cases when VDUP and VMUL are in different basic blocks. Differential Revision: https://reviews.llvm.org/D99662	2021-04-12 16:08:39 +03:00
Sebastian Neubauer	6cc91adf1e	[AMDGPU] Kill temporary register after restoring Not a correctness issue, but the temporary register is not used afterwards and should be dead. Differential Revision: https://reviews.llvm.org/D100295	2021-04-12 14:20:03 +02:00
Bradley Smith	f2593a0bd1	[AArch64][SVE] Remove redundant PTEST of MATCH/NMATCH results Co-authored-by: Paul Walker <paul.walker@arm.com> Differential Revision: https://reviews.llvm.org/D99584	2021-04-12 12:55:00 +01:00
Dmitry Preobrazhensky	67b39661c8	[AMDGPU][MC][NFC] Removed extra spaces Fixed bugs 49646, 49647. Differential Revision: https://reviews.llvm.org/D100173	2021-04-12 13:33:19 +03:00
Sebastian Neubauer	7a8e65dd3d	[AMDGPU] Fix ubsan error The RegScavenger can be null sometimes, so a pointer is needed. Fixes UBSan error introduced in `f9a8c6a0e5`.	2021-04-12 12:14:00 +02:00
Sebastian Neubauer	b76c2a6c2b	[AMDGPU] Fix saving fp and bp Spilling the fp or bp to scratch could overwrite VGPRs of inactive lanes. Fix that by using only the active lanes of the scavenged VGPR. This builds on the assumptions that 1. a function is never called with exec=0 2. lanes do not die in a function, i.e. exec!=0 in the function epilog 3. no new lanes are active when exiting the function, i.e. exec in the epilog is a subset of exec in the prolog. Differential Revision: https://reviews.llvm.org/D96869	2021-04-12 11:52:55 +02:00
Sebastian Neubauer	32bc9a9bc3	[AMDGPU] Unify spill code Instead of reimplementing spilling in prolog and epilog, reuse buildSpillLoadStore. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D99269	2021-04-12 11:19:08 +02:00
Sebastian Neubauer	f9a8c6a0e5	[AMDGPU] Save VGPR of whole wave when spilling Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot determine if a VGPR is used in other lanes or not, so we need to save all lanes of the VGPR. We even need to save the VGPR if it is marked as dead. The generated code depends on two things: - Can we scavenge an SGPR to save EXEC? - And can we scavenge a VGPR? If we can scavenge an SGPR, we - save EXEC into the SGPR - set the needed lane mask - save the temporary VGPR - write the spilled SGPR into VGPR lanes - save the VGPR again to the target stack slot - restore the VGPR - restore EXEC If we were not able to scavenge an SGPR, we do the same operations, but every time the temporary VGPR is written to memory, we - write VGPR to memory - flip exec (s_not exec, exec) - write VGPR again (previously inactive lanes) Surprisingly often, we are able to scavenge an SGPR, even though we are at the brink of running out of SGPRs. Scavenging a VGPR does not have a great effect (saves three instructions if no SGPR was scavenged), but we need to know if the VGPR we use is live before or not, otherwise the machine verifier complains. Differential Revision: https://reviews.llvm.org/D96336	2021-04-12 11:01:38 +02:00
Stelios Ioannou	a655f250fe	[AArch64] Adds memory operands for indexed loads. This patch adds the memory operands for indexed loads so that certain optimizations can take place. Differential Revision: https://reviews.llvm.org/D100215/ Change-Id: I539fcf046ca4ad1e7df1d893f57d751419d8364d	2021-04-12 09:11:37 +01:00
Bing1 Yu	747111ea71	[X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D99244	2021-04-12 13:58:14 +08:00
Freddy Ye	5cb47be410	[X86] Remove FeatureCLWB from FeaturesICLClient Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100279	2021-04-12 12:08:59 +08:00
Qiu Chaofan	ece7345859	[PowerPC] Lower f128 SETCC/SELECT_CC as libcall if p9vector disabled XSCMPUQP is not available for pre-P9 subtargets. This patch will lower them into libcall for correct behavior on power7/power8. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D92083	2021-04-12 10:33:32 +08:00
Jim Lin	a3bfddbb6a	[RISCV][NFC] Remove unneeded explicit XLenVT type on codegen patterns The customized SDNodes already specify the explicit XLenVT type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100190	2021-04-12 10:16:06 +08:00
Craig Topper	cb4c793e46	[RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned. According to the 0.10 spec, VLEN is at least 128 bits and is a power of 2.	2021-04-11 17:54:23 -07:00
Craig Topper	ff902080a9	[RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64. We don't need the sign extending behavior here and SLLI/SRLI are able to compress to C.SLLI/C.SRLI.	2021-04-11 13:59:51 -07:00
Simon Pilgrim	231b87618b	[X86][AVX512] Fold not(kmov(x)) -> kmov(not(x)) and not(widen_subvector(x)) -> widen_subvector(not(x)) Improve AVX512 mask inversion, rG38c799bce801 exposed some missing opportunities to move scalar not() back onto the boolvector types for folding with setcc etc.	2021-04-11 20:07:09 +01:00
Thomas Lively	ea8dd3ee2e	[WebAssembly] Update v128.any_true In the final SIMD spec, there is only a single v128.any_true instruction, rather than one for each lane interpretation because the semantics do not depend on the lane interpretation. Differential Revision: https://reviews.llvm.org/D100241	2021-04-11 11:13:16 -07:00
Simon Pilgrim	13bdac5709	[X86] combineXor - Pull out repeated getOperand() calls. NFCI.	2021-04-11 19:01:59 +01:00
Simon Pilgrim	38c799bce8	[X86] Fold cmpeq/ne(and(X,Y),Y) --> cmpeq/ne(and(~X,Y),0) Followup to D100177, handle a similar (demorgan inverse style) case from PR47797 as well. The AVX512 test cases could be further improved if we folded not(iX bitcast(vXi1)) -> (iX bitcast(not(vXi1))) Alive2: https://alive2.llvm.org/ce/z/AnA_-W	2021-04-11 18:42:01 +01:00
Craig Topper	3ae71226ef	[RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf. The first source has the same EEW as the destination and the other source is a scalar so the overlap constraints don't apply to the unmasked version. For the masked version we have a constraint that the destination can't be V0 so that covers the only overlap issue there. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D100217	2021-04-11 10:19:45 -07:00
Craig Topper	bc0e052730	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported. Similar to what we do for zext.w. Disable the (srl (and X, 0xffff), C) custom isel when zext.h is available.	2021-04-11 10:03:35 -07:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00
Mitch Phillips	092f288d36	Revert "[AMDGPU] Remove MachineDCE after SIFoldOperands" This reverts commit `5a0117b2d0`. Reason: Dependent change `d19a42eba9` broke the ASan buildbots.	2021-04-09 15:47:44 -07:00
Mitch Phillips	3d4730a73f	Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs" This reverts commit `d19a42eba9`. Reason: Broke the ASan buildbots. See the original phabricator review for more details: https://reviews.llvm.org/D100188	2021-04-09 15:47:44 -07:00
Jessica Paquette	49c3565b9b	[AArch64][GlobalISel] Swap compare operands when it may be profitable This adds support for swapping comparison operands when it may introduce new folding opportunities. This is roughly the same as the code added to AArch64ISelLowering in `162435e7b5`. For an example of a testcase which exercises this, see llvm/test/CodeGen/AArch64/swap-compare-operands.ll (Godbolt for that testcase: https://godbolt.org/z/43WEMb) The idea behind this is that sometimes, we may be able to fold away, say, a shift or extend in a compare by swapping its operands. e.g. in the case of this compare: ``` lsl x8, x0, #1 cmp x8, x1 cset w0, lt ``` The following is equivalent: ``` cmp x1, x0, lsl #1 cset w0, gt ``` Most of the code here is just a reimplementation of what already exists in AArch64ISelLowering. (See `getCmpOperandFoldingProfit` and `getAArch64Cmp` for the equivalent code.) Note that most of the AND code in the testcase doesn't actually fold. It seems like we're missing selection support for that sort of fold right now, since SDAG happily folds these away (e.g testSwapCmpWithShiftedZeroExtend8_32 in the original .ll testcase) Differential Revision: https://reviews.llvm.org/D89422	2021-04-09 15:46:48 -07:00
Mitch Phillips	1a2756b777	Revert "[PowerPC] Add ROP Protection Instructions for PowerPC" This reverts commit `16fe741c69`. Reason: Broke the UBSan buildbots. More information available in the phabricator review: https://reviews.llvm.org/D99375	2021-04-09 13:36:41 -07:00
Jay Foad	5a0117b2d0	[AMDGPU] Remove MachineDCE after SIFoldOperands Remove the MachineDCE pass after the first SIFoldOperands pass now that SIFoldOperands deletes its own dead instructions. Differential Revision: https://reviews.llvm.org/D100189	2021-04-09 20:41:09 +01:00
Jay Foad	d19a42eba9	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Differential Revision: https://reviews.llvm.org/D100188	2021-04-09 20:41:09 +01:00
Stefan Pintilie	5bca7cdafb	Add correct types to the xxsplti32dx pattern. Register types for xxsplti32dx for two td file patterns were incorrect. Fixed the two types and added a test case that was reduced from a larger failing test. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D100223	2021-04-09 14:11:34 -05:00
Thomas Lively	f30c429da6	[WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR When lowering a BUILD_VECTOR SDNode, we choose among various possible vector creation instructions in an attempt to minimize the total number of instructions used. We previously considered using swizzles, consts, and splats, and this patch adds shuffles as well. A common pattern that now lowers to shuffles is when two 64-bit vectors are concatenated. Previously, concatenations generally lowered to sequences of extract_lane and replace_lane instructions when they could have been a single shuffle. Differential Revision: https://reviews.llvm.org/D100018	2021-04-09 11:21:49 -07:00
Amara Emerson	40e75cafc0	[AArch64][GlobalISel] Fix incorrect codegen for <16 x s8> G_ASHR. Fixes PR49904	2021-04-09 10:41:41 -07:00
Stefan Pintilie	16fe741c69	[PowerPC] Add ROP Protection Instructions for PowerPC There are four new PowerPC instructions that are introduced in Power 10. They are hashst, hashchk, hashstp, hashchkp. These instructions will be used for ROP Protection. This patch adds the four instructions. Reviewed By: nemanjai, amyk, #powerpc Differential Revision: https://reviews.llvm.org/D99375	2021-04-09 12:09:01 -05:00
Simon Pilgrim	d8bc4de3cf	[X86] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) on non-BMI targets (PR44136) Followup to D100177, enable the fold for non-BMI targets as well.	2021-04-09 16:11:11 +01:00
Simon Pilgrim	245036950a	[X86][BMI] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) (PR44136) I've initially just enabled this for BMI which has the ANDN instruction for i32/i64 - the i16/i8 cases give an idea of what we'd get when we enable it in all cases (I'll do this as a later commit). Additionally, the i16/i8 cases could be freely promoted to i32 (as the args are already zeroext) and we could then make use of ANDN + the free cmp0 there as well - this has come up in PR48768 and PR49028 so I'm going to look at this soon. https://alive2.llvm.org/ce/z/QVWHP_ https://alive2.llvm.org/ce/z/pLngT- Vector cases do not appear to benefit from this as we end up with having to generate the zero vector as well - this is one of the reasons I didn't try to tie this into hasAndNot/hasAndNotCompare. Differential Revision: https://reviews.llvm.org/D100177	2021-04-09 15:52:03 +01:00
Jay Foad	a4ced03d34	[AMDGPU] SIFoldOperands: eagerly delete dead copies This is cheap to implement, means less work for future passes like MachineDCE, and slightly improves the folding in some cases. Differential Revision: https://reviews.llvm.org/D100117	2021-04-09 13:52:54 +01:00
Sebastian Neubauer	cc7add5298	[AMDGPU] Use SIInstrFlags for flat variants. NFC Use SIInstrFlags to differentiate between the different variants of flat instructions (flat, global and scratch). This should make it easier to bundle the immediate offset logic in a single place and implement restrictions and bug workarounds. Fixed version of D99587, which does not rely on the address space. Differential Revision: https://reviews.llvm.org/D99743	2021-04-09 12:28:36 +02:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. Main reason is preparation to transform AliasResult to class that contains offset for PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Simon Pilgrim	3ae0a405fc	[X86] combineHorizOpWithShuffle - peek through one use bitcasts when decoding shuffles. Checking for one use, peek through bitcasts of the horizop args to allow us to merge shuffles of different widths through the horizop.	2021-04-09 10:51:04 +01:00
Sebastian Neubauer	36138db116	[AMDGPU] IsFlatScratch/Global -> FlatScratch/Global Remove 'Is' from IsFlatScratch/Global. NFC Differential Revision: https://reviews.llvm.org/D100108	2021-04-09 11:20:31 +02:00
Jim Lin	7eaa2810c4	[RISCV][NFC] Replace explicit type i64 with riscv customized SDTypeProfile. New SDTypeProfile can be reused for other word operation patterns without explicit i64 type in the future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100097	2021-04-09 17:06:17 +08:00
Jim Lin	6169f1537c	[RISCV][NFC] Fix formatting	2021-04-09 14:41:09 +08:00
Jim Lin	49c79e3b56	[RISCV][NFC] Add explicit type i64 to RV64 only patterns. Add explicit type i64 to RV64 only patterns to stop emitting unneeded i32 patterns. It can reduce the isel table size. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100089	2021-04-09 09:37:04 +08:00
David Blaikie	8294019633	Use default ref capture to avoid unused capture warning on assert-used variable	2021-04-08 17:37:55 -07:00
Craig Topper	872931e5d8	[RISCV] Use multiclass inheritance where possible for the VPat* multiclasses in RISCVInstrInfoVPseudos. NFCI Instead of instantiating multiclasses inside multiclasses, just inherit from them. We can do the same for the VPseudo* multiclasses, but that may interfere with the scheduler class work.	2021-04-08 15:14:06 -07:00
Craig Topper	ac347a8a0f	[RISCV] Remove empty string after 'defm' at top level of vector .td files. NFC This doesn't do anything so it's just wasted characters. I have other plans for the ones in multiclasses.	2021-04-08 15:14:06 -07:00
Konstantin Zhuravlyov	4fae63c612	AMDGPU: Add gfx90c support to code object v2 for backwards compatibility Differential Revision: https://reviews.llvm.org/D100126	2021-04-08 16:42:43 -04:00
Stanislav Mekhanoshin	627dab3dbf	[AMDGPU] Check for all meta instrs in GCNRegBankReassign It used to work correctly even with a KILL, but there is no reason to consider meta instructions since they do not create real HW uses. Differential Revision: https://reviews.llvm.org/D100135	2021-04-08 13:41:10 -07:00
Stanislav Mekhanoshin	189310a140	[AMDGPU] Allow -amdgpu-unsafe-fp-atomics to ignore denorm mode Fixes: SWDEV-274276 Differential Revision: https://reviews.llvm.org/D100072	2021-04-08 12:46:36 -07:00
Wouter van Oortmerssen	04e9cd09c8	[WebAssembly] Fix for PIC external symbol ISEL wasm64 was missing DAG ISEL patterns for external symbol based global.get, but simply adding these analogous to the existing 32-bit versions doesn't work. This is because we are conflating the 32-bit global index with the pointer represented by the external symbol, which for wasm32 happened to work. The simplest fix is to pretend we have a 64-bit global index. This sounds incorrect, but is immaterial since once this index is stored as a MachineOperand it becomes 64-bit anyway (and has been all along). As such, the EmitInstrWithCustomInserter based implementation I experimented with becomes a no-op and no further changes in the C++ code are required. Differential Revision: https://reviews.llvm.org/D99904	2021-04-08 12:07:38 -07:00
Levy Hsu	461b554999	[RISCV] Add InstAlias for Zbb Zbp and Zbs extension Add InstAlias that allows the last operand to be an imm for following instructions: 1. Zbb or Zbp: - ror - rorw (RV64 Only) 2. Zbs - bset - bclr - binv - bext Reviewed By: craig.topper, jrtc27 Differential Revision: https://reviews.llvm.org/D100083	2021-04-08 11:51:31 -07:00
Jay Foad	a1a372dfb5	[AMDGPU] SIFoldOperands: remove an unneeded isReg check. NFC.	2021-04-08 16:37:43 +01:00
Jay Foad	a250e91d10	[AMDGPU] SIFoldOperands: make use of emplace_back. NFC.	2021-04-08 14:34:10 +01:00
Jay Foad	2724b57ecd	[AMDGPU] SIFoldOperands: remove an unneeded make_early_inc_range. NFC.	2021-04-08 14:32:36 +01:00
Jay Foad	c28f79a0e3	[AMDGPU] SIFoldOperands: try harder to fold cndmask instructions Look through copies to find more cases where the two values being selected are identical. The motivation for this is just to be able to remove the weird special case where tryFoldCndMask was called from foldInstOperand, part way through folding a move-immediate into its users, without regressing any lit tests.	2021-04-08 14:26:12 +01:00
Jay Foad	3344cd3a14	[AMDGPU] SIFoldOperands: make tryFoldCndMask a member function. NFC.	2021-04-08 14:05:29 +01:00
Sebastian Neubauer	c10cc4ea27	[AMDGPU] Fix computing live registers in prolog ScratchExecCopy needs to be marked as live, we cannot use that register while EXEC is stored in there. Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as available is unnecessary, they should not be live at that point anyway. Differential Revision: https://reviews.llvm.org/D100098	2021-04-08 14:52:50 +02:00
David Sherwood	1206313f82	[CodeGen][AArch64] Fix isel crash for truncating FP stores When attempting to truncate a FP vector and store the result out to memory we crashed because we had no pattern for truncating FP stores. In fact, we don't support these types of stores and the correct fix is to stop marking these truncating stores as legal. Tests have been added here: CodeGen/AArch64/sve-fptrunc-store.ll Differential Revision: https://reviews.llvm.org/D100025	2021-04-08 13:21:29 +01:00
Jay Foad	94a6fe43de	[AMDGPU] SIFoldOperands: refactor tryFoldCndMask with early-outs. NFC.	2021-04-08 13:16:07 +01:00
Mikael Holmen	2a1f87167c	[NVPTX] Fix compiler warning in NDEBUG build [NFC] Without the fix we get ../lib/Target/NVPTX/NVPTXLowerArgs.cpp:236:24: error: lambda capture 'Arg' is not used [-Werror,-Wunused-lambda-capture] auto IsALoadChain = [Arg](Value *Start) { ^~~ 1 error generated.	2021-04-08 13:21:21 +02:00
Fraser Cormack	a5693445ca	[RISCV] Support OR/XOR/AND reductions on vector masks This patch adds RVV codegen support for OR/XOR/AND reductions for both scalable- and fixed-length vector types. There are a few possible codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be used to some extent -- but the vpopc.m instruction was chosen since it produces the scalar result in one instruction, after which scalar instructions can finish off the computation. The reductions are lowered identically for both scalable- and fixed-length vectors, although some alternate strategies may be more optimal on fixed-length vectors since it's cheaper to get the length of those types. Other reduction types were not deemed to be relevant for mask vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100030	2021-04-08 09:46:38 +01:00
Hsiangkai Wang	ba72bdef32	[RISCV] Add scalable offset under very large stack size. If the stack size is larger than 12 bits, we have to use a scratch register to store the stack size. Before we introduce the scalable stack offset, we could simplify %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %%scratch %0 = ADD %fp, %scratch However, if the offset contains scalable part, we need to consider it. %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %%scratch %scratch = ADD %fp, %scratch %scalable_offset = ... # sequence of instructions for vscaled-offset. %0 = ADD/SUB %scratch, %scalable_offset Differential Revision: https://reviews.llvm.org/D100035	2021-04-08 14:46:05 +08:00
Serge Pavlov	65b1103798	[RISCV] DAG nodes and pseudo instructions for CSR access New custom DAG nodes were added to represent operations on CSR. These nodes are lowered to the corresponding pseudo instructions. Using the pseudo instructions makes it possible to specify different scheduling information for operations on different system registers. It also makes it possible to specify dependencies of instructions on specific system registers. Differential Revision: https://reviews.llvm.org/D98936	2021-04-08 10:36:36 +07:00
hsmahesha	ac64995ceb	[AMDGPU] Only use ds_read/write_b128 for alignment >= 16 PS: Submitting on behalf of Jay. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100008	2021-04-08 08:12:05 +05:30
Chen Zheng	74e77295e7	[PowerPC] fixup killed flags for ri + addi to ri transformation Fixup killed flags if DefMI and MI are not in the same basic blocks. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D100023	2021-04-07 22:04:08 -04:00
Stanislav Mekhanoshin	37878de503	Disable use of SCC bit from asm Differential Revision: https://reviews.llvm.org/D100069	2021-04-07 15:32:17 -07:00
Tony Tye	4658cd4c18	[AMDGPU] Update gfx90a memory model support Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D100070	2021-04-07 22:17:58 +00:00
Stanislav Mekhanoshin	d5d412f2ae	[AMDGPU] Split GCNRegBankReassign Allow pass to work separately with SGPR, VGPR registers or both. This is NFC now but will be needed to split RA for separate SGPR and VGPR passes. Differential Revision: https://reviews.llvm.org/D100063	2021-04-07 14:45:13 -07:00
Craig Topper	56ea2e2fdd	[RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition. If the constants have a difference of 1 we can convert one to the other by adding or subtracting the condition. We have a DAG combine for this, but it only runs before type legalization. If the select is introduced later during type legalization or op legalization we will miss it. We don't need a specific condition, but some conditions are harder to materialize than others on RISCV. I know that SETLT will be a single instruction and it is what is used by the motivating pattern from signed saturating add/sub. Differential Revision: https://reviews.llvm.org/D99021	2021-04-07 13:47:17 -07:00
Craig Topper	9895285191	[RISCV] Replace 'return ReplaceNode' with 'ReplaceNode; return;' NFC ReplaceNode is a void function as is the function that we were doing this in. While this is valid code, it was a bit confusing.	2021-04-07 12:18:41 -07:00
Jonas Hahnfeld	6415f424bc	[AArch64] Materialize FP constant in code for large code model When using the large code model with FastISel (for example via clang -O0 which adds the optnone attribute), FP constants could still be materialized using adrp + ldr. Unconditionally enable the existing path for MachO to materialize the constant in code. For testing, restore literal_pools_float.ll to exercise the constant pool and add two optnone-functions that return a float and a double, respectively. Consolidate fpimm.ll and add a new fast-isel-fpimm.ll to check the code paths taken with FastISel. Differential Revision: https://reviews.llvm.org/D99607	2021-04-07 21:02:05 +02:00
Craig Topper	f087d7544a	[RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32. This can't use our normal strategy of splatting the scalar and using a .vv operation instead of .vx. Instead this patch bitcasts the vector to the equivalent SEW=32 vector and inserts the scalar parts using two vslide1up/down. We do that unmasked and apply the mask separately at the end with a vmerge. For vslide1up there may be some other options here like getting i64 into element 0 and using vslideup.vi with this vector as vd and the original source as vs1. Masking would still need to be done afterwards. That idea doesn't work for vslide1down. We need to slidedown and then insert a single scalar at vl-1 which we could do with a vslideup, but that assumes vl > 0 which I don't think we can assume. The i32 double slide1down implemented here is the best I could come up with and I just made vslide1up consistent. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99910	2021-04-07 10:44:53 -07:00
Sebastian Neubauer	2dc6be5209	[AMDGPU] Update SGPRSpillVGPRCSR name. NFC The struct is used for both, callee and caller-save registers now. The frame index is not set for entrypoints, as we do not need to save the registers then. Update the struct name to reflect that. Differential Revision: https://reviews.llvm.org/D99722	2021-04-07 16:30:40 +02:00
Simon Pilgrim	302e748065	[X86] Improve optimizeCompareInstr for signed comparisons after AND/OR/XOR instructions Extend D94856 to handle 'and', 'or' and 'xor' instructions as well We still fail on many i8/i16 cases as the test and the logic-op are performed on different widths	2021-04-07 14:28:42 +01:00
Jay Foad	bf6cab6f07	[AMDGPU] SIFoldOperands: don't dump extra '\n' after MachineInstr. NFC.	2021-04-07 14:13:00 +01:00
Simon Pilgrim	583258723f	[X86] Improve optimizeCompareInstr for signed comparisons after BZHI instructions Extend D94856 to handle 'bzhi' instructions as well	2021-04-07 12:07:26 +01:00
Qiu Chaofan	033c9c2552	[PowerPC] Fix use check of swap-reduction This will fix swap-reduction in DAGISel for cases where COPY_TO_REGCLASS has multiple uses.	2021-04-07 15:55:52 +08:00
Craig Topper	01a23dccb1	[RISCV] Add an assertion to the ReplaceNodeResults handling of bitcasts to make sure the VT is always a scalar integer.	2021-04-06 16:48:40 -07:00
Nicolás Alvarez	a1aada75f5	[docs] Fix doxygen comments wrongly attached to the llvm namespace Looking at the Doxygen-generated documentation for the llvm namespace currently shows all sorts of random comments from different parts of the codebase. These are mostly caused by: - File doc comments that aren't marked with \file, so they're attached to the next declaration, which is usually "namespace llvm {". - Class doc comments placed before the namespace rather than before the class. - Code comments before the namespace that (in my opinion) shouldn't be extracted by doxygen at all. This commit fixes these comments. The generated doxygen documentation now has proper docs for several classes and files, and the docs for the llvm and llvm::detail namespaces are now empty. Reviewed By: thakis, mizvekov Differential Revision: https://reviews.llvm.org/D96736	2021-04-07 01:20:18 +02:00
Craig Topper	2641c1f15e	[RISCV] Don't custom type legalize fixed vector to scalar integer bitcasts if the fixed vector type isn't legal. We encountered a hang in our internal code base. I'm having trouble creating a test case because the test that hit it was testing some code that is not upstream.	2021-04-06 15:00:33 -07:00
Artem Belevich	d0615a93bb	[NVPTX] Handle bitcast and ASC(101) when trying to avoid argument copy. This allows us to skip the copy in few more cases. Differential Revision: https://reviews.llvm.org/D99979	2021-04-06 13:06:00 -07:00
Amy Kwan	bd6033eca7	[PowerPC] Materialize 34-bit constants with pli directly Previously, 34-bit constants were materialized in selectI64Imm(), and we relied on td pattern matching to instead produce a pli. This becomes problematic as there is no guarantee that the 34-bit constant will reach the td pattern selection for pli. It is also possible for other transformations (such as complex bit permutations) to also produce and utilize the 34-bit constant materialized through selectI64Imm(). This patch instead produces pli on Power10 directly whenever the constant fits within 34-bits. Differential Revision: https://reviews.llvm.org/D99906	2021-04-06 13:38:11 -05:00
Craig Topper	3ae03f67fe	[RISCV] Add helper function to share some of the code for isel of vector load/store intrinsics. Many of the operands are handled the same or in the same order for all these intrinsics. Factor out the code for selecting and pushing them into the Operands vector. Differential Revision: https://reviews.llvm.org/D99923	2021-04-06 09:54:24 -07:00
Jay Foad	8f798566a3	[AMDGPU] SIFoldOperands: use isUseMIInFoldList. NFC.	2021-04-06 17:53:48 +01:00
Simon Pilgrim	53283cc2f1	[X86][SSE] canonicalizeShuffleWithBinOps - add MOVSD/MOVSS handling.	2021-04-06 16:42:18 +01:00
Konstantin Zhuravlyov	844012940e	AMDGPU: Add isBranch=1 to SOPP branch instructions Differential Revision: https://reviews.llvm.org/D99955	2021-04-06 10:59:30 -04:00
Jay Foad	efc7bf27f5	[AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	005dcd196e	[AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	ce9cca6c3a	[AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask This follows the pattern of the other tryFold* functions. NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	cf4f5292f6	[AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef We are in SSA so getVRegDef is equivalent but simpler. NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	e9608a84d8	[AMDGPU][SDag] Add IMG init also for image_gather4 instructions This fixes an oversight in D99747 which moved the IMG init code from SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the hasPostISelHook flag on gather4 instructions. Differential Revision: https://reviews.llvm.org/D99953	2021-04-06 14:47:20 +01:00
Simon Pilgrim	1dcb5b5e89	[X86] Improve optimizeCompareInstr for signed comparisons after ANDN instructions Extend D94856 to handle 'andn' instructions as well	2021-04-06 14:16:16 +01:00
Dmitry Preobrazhensky	3eadcb86ab	[AMDGPU][MC][GFX9] Corrected SMEM decoding Corrected SMEM decoding when IMM=0 and OFFSET>127 Fixed bug 49819 (https://bugs.llvm.org/show_bug.cgi?id=49819) Differential Revision: https://reviews.llvm.org/D99804	2021-04-06 14:10:46 +03:00
Simon Pilgrim	201877d572	[CostModel][X86] Improve accuracy of vXi8 multiply reduction costs After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....). Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.	2021-04-06 11:53:22 +01:00
Simon Pilgrim	ddbb58736a	[KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI. As promised in D98866	2021-04-06 10:11:41 +01:00
Sjoerd Meijer	d5f1131c81	[AArch64] Default to zero-cycle-zeroing FP registers It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this is most efficient across all cores; it is recognised as a zeroing idiom. For newer cores, fmov instructions can also be eliminated early and there is no difference with movi, but some implementations lack this, so this is not true for other/older cores. Thus this standardises on using movi as this should always give the same or better performance than the fmov with wzr. Differential Revision: https://reviews.llvm.org/D99586	2021-04-06 09:47:50 +01:00
Sjoerd Meijer	ef05b08c61	[AArch64] Use 64-bit movi for zeroing halfs/floats This was using the .2d variant which zeros 128 bits, but using the .2s variant that zeros 64 bits is faster on some cores. This is a prep step for D99586 to always use movi for zeroing floats. Differential Revision: https://reviews.llvm.org/D99710	2021-04-06 08:42:13 +01:00
Craig Topper	cb1028a0b9	[RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register. I missed a few intrinsics in `3dd4aa7d09` when I did this for masked loads and masked segment loads/stores. Found while trying to share more code between these custom isel functions.	2021-04-05 21:28:32 -07:00
Craig Topper	780a47285a	[RISCV] Add SDTCisInt to the SDTRVVSlide1 since it is only used for vslide1up.vx/vslide1down.vx. The scalar type is already marked as XLenVT. The floating point version would need a different rule.	2021-04-05 13:03:39 -07:00
Craig Topper	af2837675a	[RISCV] Split RISCVISD::VMV_S_XF_VL into separate integer and FP. It's a bit silly, but it allows us to write stricter type constraints for isel. There's still some extra type checks in the generated table due to some type interference limitations around HWMode.	2021-04-05 12:57:35 -07:00
Craig Topper	7edda698c0	[RISCV] Move VSLIDE1UP_VX pattern out of a loop that includes FP types. FP would need VFSLIDE1UP_VF which uses an FP register.	2021-04-05 12:05:54 -07:00
Ricky Taylor	4db18d62af	[M68k] Add support for Motorola literal syntax to AsmParser These look like $00A0cf for hex and %001010101 for binary. They are used in Motorola assembly syntax. Differential Revision: https://reviews.llvm.org/D98519	2021-04-05 20:02:29 +01:00
Fraser Cormack	af3a839c70	[RISCV] Add support for bitcasts between scalars and fixed-length vectors This patch supports bitcasts from scalar types to fixed-length vectors and vice versa. It custom-lowers and custom-legalizes them to EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using single-element vectors to hold the scalar where appropriate. Previously, some of these would fail to select, others would be expanded through stack loads and stores. Effort was made to ensure the codegen avoids the stack for both legal and illegal scalar types. Some of the codegen could be improved, but on first glance it looks like a general optimization of EXTRACT_VECTOR_ELT when extracting an i64 element on RV32. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99667	2021-04-05 17:21:55 +01:00
John Paul Adrian Glaubitz	62a94b725c	[M68k] Mark public functions with the LLVM_EXTERNAL_VISIBILITY macro In `0dbcb36394`, most target symbols were made hidden by default with the public ones marked with LLVM_EXTERNAL_VISIBILITY. When the M68k target was added, this particular change was forgotten so that external tools cannot make use of the public M68k target functions in libLLVM.so. Thus, add the missing LLVM_EXTERNAL_VISIBILITY macro to all public target functions in the M68k backend. Differential Revision: https://reviews.llvm.org/D99869	2021-04-05 09:24:30 -07:00
Fraser Cormack	3f0df4d7b0	[RISCV] Expand scalable-vector truncstores and extloads Caught in internal testing, these operations are assumed legal by default, even for scalable vector types. Expand them back into separate truncations and stores, or loads and extensions. Also add explicit fixed-length vector tests for these operations, even though they should have been correct already. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99654	2021-04-05 17:03:45 +01:00
Simon Pilgrim	36d4f6d7f8	[X86] Fold xor(zext(xor(x,c1)),c2) -> xor(zext(x),xor(zext(c1),c2)) Fixes PR47603 (second case) by extending rG89afec348dbd3e5078f176e978971ee2d3b5dec8	2021-04-05 11:40:37 +01:00
Craig Topper	4708a05da0	[RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled. The W version of orc.b does not exist in Zbp so we need to use gorci encoding. If we have Zbp, we can use gorciw which can avoid a sext.w in some cases.	2021-04-04 17:14:28 -07:00
Craig Topper	98d5db3e3a	[RISCV] Lower orc.b intrinsic to RISCVISD::GORCI. This will allow us to share any future known bits, demanded bits, or sign bits improvements.	2021-04-04 12:31:41 -07:00
Craig Topper	a2ea003fcb	[RISCV] Don't convert fshr/fshl to target specific FSL/FSR node if shift amount is a constant. As long as it's a constant we can directly pattern match it without any problems. It's only when it isn't a constant that we need to add an AND. In theory this should allow more target independent optimizations to remain active.	2021-04-03 23:13:30 -07:00
Roman Lebedev	7727cc242d	[NFC][X86] Split VPMOV* AVX2 instructions into their own sched class At least on all three Zen's, all such instructions cleanly map into this new class with no overrides needed.	2021-04-03 22:39:07 +03:00
Nikita Popov	665065821e	[FastISel] Remove kill tracking This is a followup to D98145: As far as I know, tracking of kill flags in FastISel is just a compile-time optimization. However, I'm not actually seeing any compile-time regression when removing the tracking. This probably used to be more important in the past, before FastRA was switched to allocate instructions in reverse order, which means that it discovers kills as a matter of course. As such, the kill tracking doesn't really seem to serve a purpose anymore, and just adds additional complexity and potential for errors. This patch removes it entirely. The primary changes are dropping the hasTrivialKill() method and removing the kill arguments from the emitFast methods. The rest is mechanical fixup. Differential Revision: https://reviews.llvm.org/D98294	2021-04-03 15:50:13 +02:00
Simon Pilgrim	89afec348d	[X86] Fold xor(truncate(xor(x,c1)),c2) -> xor(truncate(x),xor(truncate(c1),c2)) Fixes PR47603 This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.	2021-04-03 12:43:05 +01:00
Simon Pilgrim	7c17f1ea84	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper (REAPPLIED) Use the getTargetShuffleInputs helper for all shuffle decoding Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.	2021-04-03 11:59:19 +01:00
Levy Hsu	f78d932cf2	[RISCV] Add IR intrinsics for Zbc extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: clmul clmulh clmulr Differential Revision: https://reviews.llvm.org/D99711	2021-04-02 12:09:13 -07:00
Levy Hsu	944adbf285	Recommit "[RISCV] Add IR intrinsic for Zbb extension" Forgot to amend the Author. Original commit message: Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b Differential Revision: https://reviews.llvm.org/D99320	2021-04-02 11:50:19 -07:00
Craig Topper	1f0b309f24	Revert "[RISCV] Add IR intrinsic for Zbb extension" This reverts commit `1808194590`. I forgot to change the author.	2021-04-02 11:47:02 -07:00
Craig Topper	1808194590	[RISCV] Add IR intrinsic for Zbb extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b	2021-04-02 11:23:57 -07:00
Levy Hsu	b001d574d7	[RISCV] Add IR intrinsic for Zbr extension Implementation for RISC-V Zbr extension intrinsic. Header files are included in separate patch in case the name needs to be changed RV32 / 64: crc32b crc32h crc32w crc32cb crc32ch crc32cw RV64 Only: crc32d crc32cd Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99009	2021-04-02 10:58:45 -07:00
Craig Topper	d7ffa82a8e	[RISCV] Improve 64-bit integer constant materialization for more cases. For positive constants we try shifting left to remove leading zeros and fill the bottom bits with 1s. We then materialize that constant shift it right. This patch adds a new strategy to try filling the bottom bits with zeros instead. This catches some additional cases.	2021-04-02 10:18:08 -07:00
Brendon Cahoon	09a88278cb	[GlobalISel] Allow different types for G_SBFX and G_UBFX operands Change the definition of G_SBFX and G_UBFX so that the lsb and width can have different types than the src and dst operands. Differential Revision: https://reviews.llvm.org/D99739	2021-04-02 11:11:06 -04:00
Nico Weber	fa0aff6d69	Revert "[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper" This reverts commit `500969f1d0`. Makes clang assert compiling avx2 code, see https://bugs.chromium.org/p/chromium/issues/detail?id=1195353#c4 for a standalone repro.	2021-04-02 09:55:55 -04:00
Jun Ma	274ac9d40e	[AArch64][SVE] Lowering sve.dot to DOT node Differential Revision: https://reviews.llvm.org/D99699	2021-04-02 20:05:17 +08:00
Jun Ma	ab3c5fb282	[NFC][SVE] Use SVE_4_Op_Imm_Pat for sve_intx_dot_by_indexed_elem	2021-04-02 20:05:17 +08:00
Simon Pilgrim	500969f1d0	[X86][SSE] isHorizontalBinOp - use getTargetShuffleInputs helper Use the getTargetShuffleInputs helper for all shuffle decoding	2021-04-02 11:50:18 +01:00
Fraser Cormack	3b48d849d4	[RISCV] Optimize more redundant VSETVLIs D99717 introduced some test cases which showed that the output of one vsetvli into another would not be picked up by the RISCVCleanupVSETVLI pass. This patch teaches the optimization about such a pattern. The pattern is quite common when using the RVV vsetvli intrinsic to pass the VL onto other intrinsics. The second test case introduced by D99717 is left unoptimized by this patch. It is a rarer case and will require us to rewire any uses of the redundant vset[i]vli's output to the previous one's. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99730	2021-04-02 10:04:07 +01:00
Yang Fan	bc6001ce1e	[X86] Fix -Wunused-function warning (NFC) GCC warning: ``` /llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:9212:13: warning: ‘bool isHorizOp(unsigned int)’ defined but not used [-Wunused-function] 9212 \| static bool isHorizOp(unsigned Opcode) { \| ^~~~~~~~~ ```	2021-04-02 09:38:12 +08:00
Craig Topper	766d27dc85	[RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands. This occurs when we type legalize an i64 scalar input on RV32. We need to manually splat, which requires a vector input. Rather than special case this in lowering just pattern match it.	2021-04-01 14:10:21 -07:00
David Green	da98177cda	[ARM] Allow v6m runtime loop unrolling This removes the restriction that only Thumb2 targets enable runtime loop unrolling, allowing it for Thumb1 only cores as well. The existing T2 heuristics are used (for the time being) to control when and how unrolling is performed. Differential Revision: https://reviews.llvm.org/D99588	2021-04-01 21:21:40 +01:00
Craig Topper	dbbc95e3e5	[RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat. The default legalization strategy is PromoteFloat which keeps half in single precision format through multiple floating point operations. Conversion to/from float is done at loads, stores, bitcasts, and other places that care about the exact size being 16 bits. This patch switches to the alternative method softPromoteHalf. This aims to keep the type in 16-bit format between every operation. So we promote to float and immediately round for any arithmetic operation. This should be closer to the IR semantics since we are rounding after each operation and not accumulating extra precision across multiple operations. X86 is the only other target that enables this today. See https://reviews.llvm.org/D73749 I had to update getRegisterTypeForCallingConv to force f16 to use f32 when the F extension is enabled. This way we can still pass it in the lower bits of an FPR for ilp32f and lp64f ABIs. The softPromoteHalf would otherwise always give i16 as the argument type. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D99148	2021-04-01 12:41:57 -07:00
Martin Storsjö	4391d764e1	[ARM] Remove an unused parameter in ARMWinCOFFObjectWriter. NFC. This writer only ever operates on 32 bit arm code. Differential Revision: https://reviews.llvm.org/D99575	2021-04-01 21:25:41 +03:00
Nick Desaulniers	52338af569	[MC][ARM] add .w suffixes for RSB/RSBS T1 See also: F5.1.167 RSB, RSBS (register) T1 shift or rotate by value variant of the Arm ARM. Link: https://github.com/ClangBuiltLinux/linux/issues/1309 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D99542	2021-04-01 10:45:37 -07:00
Craig Topper	d157e3f387	[RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32. We need to splat the scalar separately and use .vv, but there is no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with swapped operands. We also need to get VT to use for the splat from an operand rather than the result since the result VT is nxvXi1. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99704	2021-04-01 10:38:05 -07:00

62465 Commits