Commit Graph

62287 Commits

Jay Foad ef443390a9 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Reapply after fixing dependent change D100188.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-19 12:08:02 +01:00
Jay Foad 323ef0eb45 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Reapply with a fix for using InstToErase after it had been erased.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-19 12:05:41 +01:00
Cullen Rhodes f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
Dmitry Preobrazhensky bcc29e0fcf [AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3
Disabled constants as carry in/out operands. See bug 48711.

Differential Revision: https://reviews.llvm.org/D100642
2021-04-19 13:42:31 +03:00
Roman Lebedev df9597cf5a
[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap
This is similar to the subvector extractions,
except that the 0'th subvector isn't free to insert,
because we generally don't know whether or not
the upper elements need to be preserved:
https://godbolt.org/z/rsxP5W4sW

This is needed to avoid regressions in D100684

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100698
2021-04-19 13:24:58 +03:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
Yaxun (Sam) Liu 3597f02fd5 [AMDGPU] Add GlobalDCE before internalization pass
The internalization pass only internalizes global variables
with no users. If a global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a GlobalDCE pass is needed before the
internalization pass.

This patch adds that pass.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783
2021-04-17 11:25:25 -04:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as a string).
Throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalizes the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true
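For illustration, a minimal sketch of the new API in use (assuming `Fn` is an `llvm::Function`):

```
#include "llvm/IR/Function.h"
using namespace llvm;

// Both of the checks above collapse into a single call: returns true
// only if the attribute is present and set to "true".
static bool noJumpTables(const Function &Fn) {
  return Fn.getFnAttribute("no-jump-tables").getValueAsBool();
}
```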

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Nemanja Ivanovic ff769dd111 [PowerPC] Minor improvement for insert_vector_elt codegen
For v2f64, all VSX subtargets can insert an element with a single
XXPERMDI.
2021-04-16 18:52:37 -05:00
Joe Nash a0ed70abde [AMDGPU] Remove redundant field from DPP8 def
These lines set the value to what it already was,
so they are redundant. NFC

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100664

Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3
2021-04-16 16:23:52 -04:00
Joe Nash 919236e608 [AMDGPU] NFC, Comment in disassembler for dpp8
Adds a comment explaining the reasoning behind convertDPP8.
Also corrects a typo in the Operand type comment.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100665

Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4
2021-04-16 16:21:47 -04:00
Thomas Lively 5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi.sat and @llvm.fptoui.sat intrinsics instead.
This includes removing the intrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Christudasan Devadasan 97618522dc [AMDGPU] Remove dead code (NFC). 2021-04-16 23:03:31 +05:30
Joe Nash 7cc4a02fa2 [AMDGPU] Refactor VOP3P Profile and AsmParser, NFC
Refactors the VOP3P TableGen definitions and the VOP3P AsmParser
for better extensibility. NFC intended

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100602

Change-Id: I038e3a772ac348bb18979cdf3e3ae2e9476dd411
2021-04-16 13:06:50 -04:00
Malhar Jajoo 093f1828e5 [ARM] Prevent phi-node-elimination from generating copy above t2WhileLoopStartLR
This patch prevents phi-node-elimination from generating a COPY
operation for the register defined by t2WhileLoopStartLR, as it is a
terminator that defines a value.

This happens because of the presence of phi-nodes in the loop body (the
Preheader of which is the block containing the t2WhileLoopStartLR). If
this is not done, the COPY is generated above/before the terminator
(t2WhileLoopStartLR here), and since it uses the value defined by
t2WhileLoopStartLR, MachineVerifier throws a 'use before define' error.

This essentially builds on the changes in D91887 and D97729.

Differential Revision: https://reviews.llvm.org/D100376
2021-04-16 16:45:07 +01:00
Roman Lebedev b06c55a698
[X86][CostModel] Fix cost model for non-power-of-two vector load/stores
Sometimes LV has to produce really wide vectors,
and sometimes their element counts end up not being powers of two.
As can be seen from the diff, the cost computation
is currently completely nonsensical in those cases.

Instead of just scalarizing everything, split/factorize the wide vector
into a number of subvectors, each with a power-of-two number of elements,
and recurse to get the cost of the operation on each subvector. Also, check
how we'd legalize each subvector, and if the legalized type is scalar,
also account for the scalarization cost.

Note that for sub-vector loads, we might be able to do better,
when the vectors are properly aligned.
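A standalone sketch of the splitting idea (illustrative only; `costByPow2Split` and `SubvecCost` are made-up names, not the actual TTI code):

```
#include <functional>

// Factor a non-power-of-two element count into power-of-two subvectors
// (e.g. 12 -> 8 + 4) and sum the per-subvector costs supplied by the caller.
unsigned costByPow2Split(unsigned NumElts,
                         const std::function<unsigned(unsigned)> &SubvecCost) {
  unsigned Cost = 0;
  while (NumElts) {
    // Peel off the largest power-of-two chunk <= NumElts.
    unsigned Pow2 = 1u << (31 - __builtin_clz(NumElts));
    Cost += SubvecCost(Pow2);
    NumElts -= Pow2;
  }
  return Cost;
}
```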

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100099
2021-04-16 15:30:57 +03:00
David Green 00a6045473 [ARM] Combine sub 0, csinc X, Y, CC -> csinv -X, Y, CC
Combine sub 0, csinc X, Y, CC to csinv -X, Y, CC, provided that the
negation of X is cheap, currently just handling constants. This comes up
during the splat of an i1 to a predicate, where we now generate csetm,
as opposed to cset; rsb.
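A hypothetical source pattern that hits this combine (the i1 splat mentioned above; illustrative, not from the commit):

```
// Splatting the i1 (a < b) into a 0/-1 mask: previously cset followed
// by rsb, now a single csetm.
int boolToMask(int a, int b) { return -(a < b); }
```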

Differential Revision: https://reviews.llvm.org/D99940
2021-04-16 11:52:31 +01:00
Nick Desaulniers bb7016f8f5 [Aarch64] handle "o" inline asm memory constraints
The Linux kernel is making use of this inline asm constraint, which is
causing an ICE.

PR49956

Link: https://github.com/ClangBuiltLinux/linux/issues/1348
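A minimal reproducer of the assumed shape (hypothetical; the real kernel code is in the linked issue):

```
// An "o" (offsettable memory) operand, which previously crashed the
// AArch64 backend.
void store_zero(long *p) {
  asm volatile("str xzr, %0" : "=o"(*p));
}
```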

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D100412
2021-04-15 23:36:21 -07:00
Jim Lin 2893570e86 [RISCV] Don't emit save-restore call if function is an interrupt handler
An interrupt handler has to save all caller-saved registers before making
a call, so don't emit save/restore libcalls for such functions.
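For illustration, a sketch of such a handler (assuming Clang's RISC-V interrupt attribute):

```
// A handler like this must preserve caller-saved registers around any
// call it makes itself, so the save/restore libcalls are not emitted.
__attribute__((interrupt)) void trap_handler(void) {
  // handler body
}
```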

Reviewed By: simoncook, luismarques, asb

Differential Revision: https://reviews.llvm.org/D100532
2021-04-16 12:54:47 +08:00
hsmahesha 099dcb68a6 [AMDGPU] Refactor ds_read/ds_write related select code for better readability.
Part of the code related to ds_read/ds_write ISel is refactored, and the
corresponding comment is re-written for better readability, which will help
when implementing any future ds_read/ds_write ISel-related modifications.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100300
2021-04-16 08:24:00 +05:30
Momchil Velikov f9d932e673 [clang][AArch64] Correctly align HFA arguments when passed on the stack
When we pass an AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example

    struct S {
      __attribute__ ((__aligned__(16))) double v[4];
    };

Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)

Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e.g. byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types that naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.

This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.

The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.

For byval arguments, the stackalign attribute assumes the
role previously performed by align, falling back to align if
stackalign is absent.

On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
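A hypothetical call shape that exercises the fix (the struct is the one from the example above; argument counts chosen so the HFA spills to the stack):

```
struct S {
  __attribute__((__aligned__(16))) double v[4];
};

// Each S is an HFA occupying four FP registers; with three of them, the
// third no longer fits in v0-v7 and is passed on the stack, where it
// must now be 16-byte aligned.
double f(S a, S b, S c) { return a.v[0] + b.v[0] + c.v[0]; }
```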

Patch by Momchil Velikov and Lucas Prates.

Differential Revision: https://reviews.llvm.org/D98794
2021-04-15 22:58:14 +01:00
Stanislav Mekhanoshin 13015ebd6f [AMDGPU] Factor out predicate FmaakFmamkF32Insts
Differential Revision: https://reviews.llvm.org/D100409
2021-04-15 12:29:16 -07:00
Stanislav Mekhanoshin d4385e483d [AMDGPU] Add new EmitDstSel field to VOPProfile. NFC.
Differential Revision: https://reviews.llvm.org/D100589
2021-04-15 12:07:08 -07:00
hsmahesha 82787eb228 [AMDGPU] Move LDS lowering related utility functions to a separate utils file.
Move some utility functions which are used within the LDS lowering pass to a separate utils
file so that other LDS-related passes can make use of them when required.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100526
2021-04-16 00:15:48 +05:30
Krzysztof Parzyszek 280678122d [Hexagon] Avoid infinite loops in type legalization when lowering SETCC
Only widen SETCC if the operands can be widened. Not checking that caused
infinite widen-split loops in legalization.
2021-04-15 13:34:37 -05:00
Craig Topper 1656df13da [RISCV] Share RVInstIShift and RVInstIShiftW instruction format classes with the B extension.
This generalizes RVInstIShift/RVInstIShiftW to take the upper
5 or 7 bits of the immediate as an input instead of only bit 30. Then
we can share them.

For RVInstIShift I left a hardcoded 0 at bit 26 where RV128 gets
a 7th bit for the shift amount.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100424
2021-04-15 11:08:28 -07:00
Arthur Eubanks c8f0a7c215 [NewPM] Cleanup IR printing instrumentation
Being lazy with printing the banner seems hard to reason about; we should print it
unconditionally first (being lazy could also lead to duplicate banners if we
have multiple functions in -filter-print-funcs).

The printIR() functions were doing too many things. I separated out the
call from PrintPassInstrumentation since we were essentially doing two
completely separate things in printIR() from different callers.

There were multiple ways to generate the name of some IR. That's all
been moved to getIRName(). The printing of the IR name was also
inconsistent, now it's always "IR Dump on $foo" where "$foo" is the
name. For a function, it's the function name. For a loop, it's what's
printed by Loop::print(), which is more detailed. For an SCC, it's the
list of functions in parentheses. For a module it's "[module]", to
differentiate between a possible SCC with a function called "module".

To preserve D74814, we have to check if we're going to print anything at
all first. This is unfortunate, but I would consider this a special
case that shouldn't be handled in the core logic.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D100231
2021-04-15 09:50:55 -07:00
Stefan Pintilie f28cb01be0 [PowerPC] Add ROP Protection Instructions for PowerPC
There are four new PowerPC instructions introduced in
Power10: hashst, hashchk, hashstp, and hashchkp.

These instructions will be used for ROP protection.
This patch adds the four instructions.

Reviewed By: nemanjai, amyk, #powerpc

Differential Revision: https://reviews.llvm.org/D99375
2021-04-15 11:38:38 -05:00
Sebastian Neubauer 7842e1725e [AMDGPU] Fix large return values with amdgpu_gfx
Returning in memory is not supported, so fall back to sret.
Also, extend i1 and i16 to i32. Otherwise, they would be passed through
memory.

Differential Revision: https://reviews.llvm.org/D100543
2021-04-15 14:57:56 +02:00
Simon Pilgrim 9d57a77b81 [X86] combineCMP - fold cmpEQ/NE(TRUNC(X),0) -> cmpEQ/NE(X,0)
If we are truncating from a i32 source before comparing the result against zero, then see if we can directly compare the source value against zero.

If the upper (truncated) bits are known to be zero then we can compare against that, hopefully increasing the chances of folding the compare into an EFLAGS result of the source's operation.

Fixes PR49028.
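A source-level sketch of the pattern (an assumed illustration, not a test from the patch):

```
// The masked value already has its upper 24 bits known zero, so the
// 8-bit compare-against-zero can be done on the full 32-bit value.
bool lowByteIsZero(unsigned x) {
  return (unsigned char)(x & 0xffu) == 0;
}
```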

Differential Revision: https://reviews.llvm.org/D100491
2021-04-15 13:55:51 +01:00
Bradley Smith 22c017f0f9 [AArch64][NEON] Match (or (and -a b) (and (a+1) b)) => bit select
With this patch vbslq_f32(vnegq_s32(a), b, c) lowers to a BIT instruction.
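For reference, the pattern from the description as compilable code (the reinterpret cast is an assumption added so it type-checks):

```
#include <arm_neon.h>

// vnegq_s32 turns a 0/1 lane into an all-zeros/all-ones mask, so the
// whole expression now selects per-lane between b and c with one BIT.
float32x4_t sel(int32x4_t a, float32x4_t b, float32x4_t c) {
  return vbslq_f32(vreinterpretq_u32_s32(vnegq_s32(a)), b, c);
}
```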

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100304
2021-04-15 13:52:47 +01:00
Florian Hahn acd9cc7495
[AArch64] Use type-legalization cost for code size memop cost.
At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is
CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory
operations on large vectors have a cost of one, even if they will get
expanded to a large number of memory operations in the backend.

This patch updates getMemoryOpCost to return the cost for the type
legalization for both CodeSize and SizeAndLatency. This should more
accurately reflect the number of memory operations required.

From the description, I am not sure how latency should properly be included
in SizeAndLatency, but returning the size cost should clearly be more
accurate.

This does not cause any binary changes when building
MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because
large vector memops are not really formed by code emitted from Clang.
But using the C/C++ matrix extension can easily result in code with very
large vector operations directly from Clang, e.g.
https://clang.godbolt.org/z/6xzxcTGvb
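A sketch of code that produces such wide operations directly, using Clang's vector extension (an assumed illustration; the link above shows the matrix-extension case):

```
// A single load of a 1024-byte vector; the backend splits this into many
// real loads, so a code-size cost of 1 was far too optimistic.
typedef double v128f64 __attribute__((vector_size(1024)));

v128f64 load_wide(const v128f64 *p) { return *p; }
```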

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100291
2021-04-15 10:11:05 +01:00
Martin Storsjö 5144f730a8 [AArch64] Fix windows vararg functions with floats in the fixed args
On Windows, float arguments are normally passed in float registers
in the calling convention for regular functions. For variable
argument functions, floats are passed in integer registers. This
has already been done correctly for many years.

However, the surprising bit was that floats among the fixed arguments
are also supposed to be passed in integer registers, contrary to regular
functions. (This also seems to be the behaviour on ARM, both
on Windows and on e.g. hardfloat Linux.)

In the calling convention, don't promote shorter floats to f64, but
convert them to integers of the same length. (Floats passed as part of
the actual variable arguments are promoted to double already on the
C/Clang level; the LLVM vararg calling convention doesn't do any
extra promotion of f32 to f64 - this matches how it works on X86 too.)

Technically, this is an ABI break compared to older LLVM versions,
but it fixes compatibility with the official platform ABI. (In practice,
floats among the fixed arguments of variable argument functions are
a pretty rare construct.)
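A hypothetical function of the affected shape (the fixed float argument `x` is the interesting part):

```
#include <cstdarg>

// On windows-arm64, the *fixed* float 'x' is passed in an integer
// register (as its bit pattern), unlike in a non-variadic function.
double add(float x, int n, ...) {
  va_list ap;
  va_start(ap, n);
  double r = x;
  for (int i = 0; i < n; ++i)
    r += va_arg(ap, double);
  va_end(ap);
  return r;
}
```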

Differential Revision: https://reviews.llvm.org/D100365
2021-04-15 11:02:14 +03:00
Craig Topper c3f1271464 [RISCV] Add a PatFrag to shorten repeated (XLenVT (VLOp GPR:$vl)) in V extension patterns.
Reduces the amount of changes needed in D100288.
2021-04-14 22:36:35 -07:00
hsmahesha 4973b0c4e7 [AMDGPU] Disable forceful inline of non-kernel functions which use LDS.
Now that LDS uses within non-kernel functions are handled in the
LowerModuleLDS pass, we no longer need to forcefully inline non-kernel
functions just because they use LDS. Do forceful inlining only when the
LowerModuleLDS pass is not enabled. It is enabled by default.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100481
2021-04-15 09:12:56 +05:30
Thomas Lively 6a18cc23ef [WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u}
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.

Differential Revision: https://reviews.llvm.org/D100402
2021-04-14 13:43:09 -07:00
Thomas Lively af7925b4dd [WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u}
Add a custom DAG combine and ISD opcode for detecting patterns like

  (uint_to_fp (extract_subvector ...))

before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.

Differential Revision: https://reviews.llvm.org/D100425
2021-04-14 10:42:45 -07:00
Stanislav Mekhanoshin b7ebb25e53 [AMDGPU] Factor out SelectSAddrFI()
This is a service function generally useful for the selection
of an FI in an SADDR. NFC for now; needed for a future patch.

Differential Revision: https://reviews.llvm.org/D100406
2021-04-14 09:40:02 -07:00
Sander de Smalen 4f42d873c2 [TTI] NFC: Change getArithmeticInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100317
2021-04-14 17:20:36 +01:00
Sander de Smalen 1af35e77f4 [TTI] NFC: Change getVectorInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100315
2021-04-14 17:20:35 +01:00
Sander de Smalen 174e8f6c5e [TTI] NFC: Change getShuffleCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100314
2021-04-14 17:20:35 +01:00
Sander de Smalen 14b934f8a6 [TTI] NFC: Change getCFInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D100313
2021-04-14 17:20:34 +01:00
Sander de Smalen 596f669cfb [TTI] NFC: Change getCallInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D100312
2021-04-14 17:20:34 +01:00
Thomas Lively af7ab81ce3 [WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.

For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.

Differential Revision: https://reviews.llvm.org/D100411
2021-04-14 09:19:27 -07:00
hsmahesha e3070db0f7 [AMDGPU] Rename "LDS lowering" pass name.
Rename the "LDS lowering" pass option from `amdgpu-disable-lower-module-lds` to
`amdgpu-enable-lower-module-lds`, as the latter is consistent and reads better.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D100441
2021-04-14 20:19:53 +05:30
Simon Pilgrim 4fbe761572 [X86][SSE] canonicalizeShuffleWithBinOps - check for more combos of merge-able binary shuffles.
In the fold SHUFFLE(BINOP(X,Y),BINOP(Z,W)) -> BINOP(SHUFFLE(X,Z),SHUFFLE(Y,W)), check if both X/Z AND Y/W have at least one merge-able shuffle, in which case the total number of shuffles should still fall.

Helps with instruction count regressions we saw while fixing PR48823
2021-04-14 15:24:41 +01:00
Pablo Barrio cca40aa8d8 [AArch64][v8.5A] Add BTI to all function starts
The existing BTI placement pass avoids inserting "BTI c" when the
function has local linkage and is only directly called. However,
even in this case, there is a (small) chance that the linker later
adds a hunk with an indirect call to the function, e.g. if the
function is placed in a separate section and moved far away from
its callers. Make sure to add BTI for these functions too.

Differential Revision: https://reviews.llvm.org/D99417
2021-04-14 15:24:01 +01:00
Sebastian Neubauer 929edd4375 [AMDGPU] Mark scavenged SGPR as used
Otherwise it reuses the same register for storing the stack slot
offset if the stack slot offset is big.

Differential Revision: https://reviews.llvm.org/D100461
2021-04-14 14:55:01 +02:00
Zarko Todorovski 6b7838b68c [AIX] Allow safe for 32bit P8 VSX pattern matching
Pull in some of the Power8-and-above VSX pattern matching that is safe for 32-bit.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D97909
2021-04-14 08:12:48 -04:00
Simon Pilgrim 73737fe990 [X86] Fold cmpeq/ne(trunc(x),0) --> cmpeq/ne(x,0)
Relax the fold from rGbaadbe04bf75 to compare any op, not just logic ops, now that the movmsk regressions have been handled.
2021-04-14 11:02:02 +01:00
Simon Pilgrim 016ceb8382 [X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X) fold
Extension to rG74f98391a7a4, we can also include any of the upper (known zero) bits in the comparison in the shuffle removal fold, just as long as we demand all the elements of the movmsk source vector.
2021-04-14 11:02:01 +01:00
Nemanja Ivanovic 8be3181df6 [PowerPC] Fix incorrect subreg typo from 0148bf53f0 2021-04-14 05:01:12 -05:00
Martin Storsjö 3b32dc4b84 [ARM] [COFF] Properly produce cross-section relative relocations
Differential Revision: https://reviews.llvm.org/D99574
2021-04-14 12:31:28 +03:00
Martin Storsjö d5c5cf5ce8 [AArch64] [COFF] Properly produce cross-section relative relocations
This fixes breakage on Windows/ARM64 after D94355.

Modelled after the corresponding code for X86; not entirely familiar
with those aspects of that layer otherwise.

Differential Revision: https://reviews.llvm.org/D99572
2021-04-14 12:31:26 +03:00
Bogdan Graur 0acf4e5005
[NFC] Fix unused warning.
Differential Revision: https://reviews.llvm.org/D100449
2021-04-14 09:09:20 +02:00
Min-Yih Hsu 91b6ef64db [M68k] Put M68kInfo as the direct library dependency for AsmParser
M68kAsmParser uses `llvm::getTheM68kTarget` from M68kInfo, therefore we
should put M68kInfo as its direct dependency. Otherwise the build will
fail when building LLVM libraries as shared objects (building LLVM
libraries statically won't have this problem though).
2021-04-13 21:21:02 -07:00
Wang, Pengfei a3b52a9d13 [X86][AMX] Refactor for PostRA ldtilecfg pass.
This is a follow-up of D99010. We didn't consider the live ranges of shape registers when hoisting ldtilecfg. There may be risks, e.g. we may happen to insert it into an invalid range of some registers and get unexpected errors.

This patch fixes the problem by storing each shape value to the corresponding stack slot of ldtilecfg immediately after its definition.

This patch also fixes a problem in the previous code: if we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for the other ldtilecfgs.

There are still some optimization points left, e.g. eliminating unused mov instructions, breaking the def-use dependency before RA, etc.

Reviewed By: LuoYuanke, xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D99966
2021-04-14 10:08:23 +08:00
ShihPo Hung d5e962f1f2 [RISCV] Implement COPY for Zvlsseg registers
When copying Zvlsseg register tuples, we split the COPY into NF whole-register moves
as below:

  $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2
=>
  $v10m2 = PseudoVMV2R_V $v4m2
  $v12m2 = PseudoVMV2R_V $v6m2

This patch copies forwardCopyWillClobberTuple from AArch64 to check
register overlapping.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100280
2021-04-13 18:55:51 -07:00
Nemanja Ivanovic 0148bf53f0 [PowerPC] Use correct node to get a super register from a subreg
The VSX tablegen file has some rather egregious uses of
COPY_TO_REGCLASS even in situations where it needs to use
SUBREG_TO_REG. While this produces correct code, it often doesn't
allow the register coalescer to coalesce copies and the resulting
code ends up being suboptimal. This patch just changes over
patterns that should use SUBREG_TO_REG.
2021-04-13 19:52:21 -05:00
root 645ce31c20 [RISCV] Add missing part of instruction vmsge{u}.vx
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100115
2021-04-14 06:41:59 +08:00
Craig Topper 6aa6f748ae [RISCV] Add a generic PatGprImm class and use it to simplify patterns in RISCVInstrInfoB.td. NFC 2021-04-13 12:07:24 -07:00
Craig Topper cb073f1bc0 [RISCV] Make use of PatGprGpr and PatGpr in RISCVInstrInfoB.td. NFC 2021-04-13 12:06:58 -07:00
Yonghong Song a285bdb56f BPF: remove default .extern data section
Currently, any extern variable that doesn't have a
section attribute will be put into a default ".extern"
BTF DataSec. The initial design was to put every extern
variable in a DataSec so libbpf can use it.

But later on, libbpf actually requires extern variables
to be put into special sections, e.g., ".kconfig", ".ksyms", etc.,
so they can be used properly based on the section name.
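For illustration, a simplified sketch of that convention (the variable name and type here are assumptions, not part of this patch):

```
// libbpf-style placement: the extern goes into a named section such as
// ".kconfig" rather than a default ".extern" DataSec.
extern unsigned LINUX_KERNEL_VERSION __attribute__((section(".kconfig")));
```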

Andrii mentioned that since ".extern" variables are
not actually used, it makes sense to remove the section from
the compiler so libbpf does not need to deal with it,
especially for static linking. The BTF for these extern variables
is still generated.

With this patch, I tested kernel selftests/bpf and all tests
passed. Indeed, removing the ".extern" DataSec seems to have no
impact.

Differential Revision: https://reviews.llvm.org/D100392
2021-04-13 11:35:52 -07:00
Craig Topper 1afdfc6169 [RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant.
Prep work for adding intrinsics for these instructions in the
future.
2021-04-13 11:04:28 -07:00
Jessica Paquette 516d09387b [AArch64][GlobalISel] Mark G_CTPOP as legal for v16s8 and v8s8
G_CTPOP can be directly selected to CNT in these cases.

Differential Revision: https://reviews.llvm.org/D100349
2021-04-13 11:03:39 -07:00
Simon Pilgrim 74f98391a7 [X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in CMP(MOVMSK(PACKSS())) -> CMP(MOVMSK()) fold
We already allow the comparison of the upper bits of 'IsAllOf' (allbits) patterns, but we can safely compare the known zero bits for 'IsAnyOf' (zerobits) patterns as well.

This fixes an issue where we were comparing a type wider than the number of vector elements, which avoids a regression mentioned in rGbaadbe04bf75.
2021-04-13 17:37:24 +01:00
Anirudh Prasad 7da22dfcd0 [SystemZ][z/OS] Introduce dialect querying helper functions
- In the SystemZAsmParser, future patches will contain a few queries about which dialect is being parsed (AD_ATT, AD_HLASM).
- It would be nice to have two small helper functions `isParsingATT()` and `isParsingHLASM()`
- Putting this as a separate smaller patch allows us to remove its definitions from other dependent patches.

Reviewed By: uweigand, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D99891
2021-04-13 12:14:34 -04:00
Yonghong Song 968292cb93 BPF: generate proper BTF for globals with WeakODRLinkage
For a global weak symbol defined as below:
  char g __attribute__((weak)) = 2;
LLVM generates an allocated global with WeakAnyLinkage,
for which the BPF backend generates proper BTF info.

For the above example, if a modifier "const" is added like
  const char g __attribute__((weak)) = 2;
LLVM generates an allocated global with WeakODRLinkage,
for which the BPF backend didn't generate any BTF, as it
didn't handle WeakODRLinkage.

This patch adds support for WeakODRLinkage so that proper
BTF info can be generated for weak symbols defined with the
"const" modifier.

Differential Revision: https://reviews.llvm.org/D100362
2021-04-13 08:54:05 -07:00
Anirudh Prasad f7eec83932 [AsmParser][SystemZ][z/OS] Add in support to allow use of additional comment strings.
- Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute.
- However, the AsmLexer also supports a few comment syntaxes in addition to what's specified as the CommentString attribute. This includes regular C-style block comments (/* ... */), regular C-style line comments (// ...) and #. While I'm not sure why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference)
For example:
Consider a target which sets the CommentString attribute to '*'.
The following strings are all lexed as comments.

```
"# abc" -> comment
"// abc" -> comment
"/* abc */ -> comment
"* abc" -> comment
```

- In HLASM however, only "*" is accepted as a comment string, and nothing else.
- To achieve this, an additional attribute (`AllowAdditionalComments`) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#".

Depends on https://reviews.llvm.org/D99277

Reviewed By: abhina.sreeskantharajan, myiwanch

Differential Revision: https://reviews.llvm.org/D99286
2021-04-13 11:15:09 -04:00
Sander de Smalen 03f47bdcb1 [TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100205
2021-04-13 14:21:02 +01:00
Sander de Smalen d676b5749d [TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100204
2021-04-13 14:21:01 +01:00
Sander de Smalen db134e2428 [TTI] NFC: Change getCmpSelInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100203
2021-04-13 14:21:01 +01:00
Sander de Smalen 2285dfb73f [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen fd1f8a5462 [TTI] NFC: Change getGatherScatterOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100200
2021-04-13 14:20:59 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
madhur13490 5682ae2fc6 [AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to
implicits to functions/kernels if
1. they have an indirect call, OR
2. their address is taken.

Once such attributes are set, the rest of the codegen works
out of the box for indirect calls. This patch eliminates
the potential overhead -fixed-abi imposes even when indirect function
calls are not used.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D99347
2021-04-13 13:15:13 +00:00
Ricky Taylor 6e098e133d [M68k] Implement AsmParser
This is a work-in-progress implementation of an assembler for M68k.

Outstanding work:
- Updating existing tests' assembly syntax
- Writing new tests for the assembler (and disassembler)

I've left those until there's consensus that this approach is okay (I hope that's okay!).

Questions I'm aware of:
- Should this use Motorola or gas syntax? (At the moment it uses Motorola syntax.)
- The disassembler produces a table at runtime for disassembly generated from the code beads. Is this okay? (This is less than ideal but as I mentioned in my llvm-dev post, it's quite complicated to write a table-gen parser for code beads.)

Depends on D98519

Depends on D98532

Depends on D98534

Depends on D98535

Depends on D98536

Differential Revision: https://reviews.llvm.org/D98537
2021-04-13 09:25:34 +01:00
Craig Topper 7c9bbbf735 [RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate.
Prep work for adding intrinsics in the future.

Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
2021-04-12 23:46:50 -07:00
Chen Zheng 80aa9b0f7b [PowerPC] stop reverse mem op generation for some cases.
We should consider the number of users of the feeder when we do the reverse
memory operation transformation. Otherwise, we may see a negative impact.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D100166
2021-04-12 22:41:28 -04:00
Freddy Ye 3fc1fe8db8 [X86] Support -march=rocketlake
Reviewed By: skan, craig.topper, MaskRay

Differential Revision: https://reviews.llvm.org/D100085
2021-04-13 09:48:13 +08:00
Fangrui Song 0a614fff4f [ARM] Fix -Wmissing-field-initializers 2021-04-12 14:28:23 -07:00
Jian Cai ed1734931a Fix up build failures after cfce5b26a8
Build log: https://lab.llvm.org/buildbot/#/builders/37/builds/3538

Differential Revision: https://reviews.llvm.org/D98916
2021-04-12 14:09:15 -07:00
Jian Cai cfce5b26a8 [ARM] support symbolic expression as immediate in memory instructions
Currently the ARM backend only accepts constant expressions as the
immediate operand in load and store instructions. This allows the
result of symbolic expressions to be used in memory instructions. For
example,

0:
.space 2048
strb r2, [r0, #(.-0b)]

would be assembled into the following instruction.

strb	r2, [r0, #2048]

This only adds support to ldr, ldrb, str, and strb in arm mode to
address the build failure of the Linux kernel for now, but should facilitate
adding support to similar instructions in the future if the need arises.

Link:
https://github.com/ClangBuiltLinux/linux/issues/1329

Reviewed By: peter.smith, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D98916
2021-04-12 12:13:55 -07:00
Fraser Cormack d737c47137 [RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.

This work was left as a TODO from D94168.
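A scalar stand-in for the rewrite (an assumed illustration, not a test from the patch):

```
// The immediate 16 is out of range for a 5-bit signed field ([-16, 15]),
// but rewriting x < 16 as x <= 15 decrements it back into range, so the
// .vi form can be used. (Scalar stand-in for the vector SETCC.)
bool lt16(int x) { return x < 16; } // matched as x <= 15
```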

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100096
2021-04-12 18:36:45 +01:00
David Green dd31b2c6e5 [ARM] Add a number of intrinsics for MVE lane interleaving
Add a number of intrinsics which natively lower to MVE operations to the
lane interleaving pass, allowing it to efficiently interleave the lanes
of chunks of operations containing these intrinsics.

Differential Revision: https://reviews.llvm.org/D97293
2021-04-12 17:23:02 +01:00
Simon Pilgrim baadbe04bf [X86] Fold cmpeq/ne(trunc(logic(x)),0) --> cmpeq/ne(logic(x),0)
Fixes the issues noted in PR48768, where the and/or/xor instruction had been promoted to avoid i8/i16 partial-dependencies, but the test against zero had not.

We can almost certainly relax this fold to work for any truncation, although it breaks a number of existing folds (notably movmsk folds, which tend to rely on the truncate to determine the demanded bits/elts in the source vector).

There is a reverse combine in TargetLowering.SimplifySetCC so we must wait until after legalization before attempting this.
2021-04-12 16:05:34 +01:00
Wang, Pengfei 4cbaaf4a24 [X86][AMX] Hoist ldtilecfg
The previous code calculated the placement of the first ldtilecfg by dominating all AMX registers' defs. This may result in the ldtilecfg being inserted into a loop.

This patch tries to calculate the nearest point where all shapes of AMX registers are reachable.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D99010
2021-04-12 22:36:41 +08:00
David Green 6c0a1ed3a9 [ARM] Add FP handling for MVE lane interleaving
FP16 to FP32 converts can be handled in MVE lane interleaving, much like
the sext/zext lowering we do. This expands the pass with fpext and
fptrunc handling, and basic fp operations allowing more efficient
lowering of fp vectors.

Differential Revision: https://reviews.llvm.org/D97292
2021-04-12 15:28:13 +01:00
Malhar Jajoo 58f3201a20 [ARM] Updates to arm-block-placement pass
The patch makes two updates to the arm-block-placement pass:
- Handle arbitrarily nested loops
- Extends the search (for t2WhileLoopStartLR) to the predecessor of the
  preHeader.

Differential Revision: https://reviews.llvm.org/D99649
2021-04-12 14:46:23 +01:00
Andrew Savonichev f037b07b5c Revert "[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant"
This reverts commit cca9b5985c.

Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir:

*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function:    indexed_2s
- basic block: %bb.0 entry (0x640fee8)
Virtual register %7 is used after the block.

*** Bad machine code: Virtual register defs don't dominate all uses. ***
- function:    indexed_2s
- v. register: %7
LLVM ERROR: Found 2 machine code errors.
2021-04-12 16:28:49 +03:00
Andrew Savonichev cca9b5985c [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant
This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner.
FMUL_indexed is normally selected during instruction selection, but it
does not work in cases when VDUP and VMUL are in different basic
blocks.

Differential Revision: https://reviews.llvm.org/D99662
2021-04-12 16:08:39 +03:00
Sebastian Neubauer 6cc91adf1e [AMDGPU] Kill temporary register after restoring
Not a correctness issue, but the temporary register is not used
afterwards and should be dead.

Differential Revision: https://reviews.llvm.org/D100295
2021-04-12 14:20:03 +02:00
Bradley Smith f2593a0bd1 [AArch64][SVE] Remove redundant PTEST of MATCH/NMATCH results
Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D99584
2021-04-12 12:55:00 +01:00
Dmitry Preobrazhensky 67b39661c8 [AMDGPU][MC][NFC] Removed extra spaces
Fixed bugs 49646, 49647.

Differential Revision: https://reviews.llvm.org/D100173
2021-04-12 13:33:19 +03:00
Sebastian Neubauer 7a8e65dd3d [AMDGPU] Fix ubsan error
The RegScavenger can be null sometimes, so a pointer is needed.

Fixes UBSan error introduced in f9a8c6a0e5.
2021-04-12 12:14:00 +02:00
Sebastian Neubauer b76c2a6c2b [AMDGPU] Fix saving fp and bp
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.

This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
   epilog is a subset of exec in the prolog.

Differential Revision: https://reviews.llvm.org/D96869
2021-04-12 11:52:55 +02:00
Sebastian Neubauer 32bc9a9bc3 [AMDGPU] Unify spill code
Instead of reimplementing spilling in prolog and epilog, reuse
buildSpillLoadStore.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D99269
2021-04-12 11:19:08 +02:00
Sebastian Neubauer f9a8c6a0e5 [AMDGPU] Save VGPR of whole wave when spilling
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.

The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but
every time the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)

Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know if the VGPR we use is
live before or not, otherwise the machine verifier complains.

Differential Revision: https://reviews.llvm.org/D96336
2021-04-12 11:01:38 +02:00
Stelios Ioannou a655f250fe [AArch64] Adds memory operands for indexed loads.
This patch adds the memory operands for indexed loads so
that certain optimizations can take place.

Differential Revision: https://reviews.llvm.org/D100215/

Change-Id: I539fcf046ca4ad1e7df1d893f57d751419d8364d
2021-04-12 09:11:37 +01:00