llvm-project

Commit Graph

Author	SHA1	Message	Date
Ard Biesheuvel	2caf85ad7a	[ARM] implement LOAD_STACK_GUARD for remaining targets Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register. So let's implement LOAD_STACK GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue. This code was split off from D112768: [ARM] implement support for TLS register based stack protector for which it is a prerequisite. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112811	2021-11-08 22:59:15 +01:00
Michael Liao	bf225939bc	[InferAddressSpaces] Support assumed addrspaces from addrspace predicates. - CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC. - This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g., ``` foo(float *p) { __builtin_assume(__isGlobal(p)); // From there, we could assume p is a global pointer instead of a // generic one. } ``` This makes the code portable without introducing the implementation-specific features. Note that NVCC starts to support __builtin_assume from version 11. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112041	2021-11-08 16:51:57 -05:00
Craig Topper	304edbb553	[RISCV] SMUL_LOHI/UMUL_LOHI should expand for RVV. These and MULHS/MULHU both default to Legal. Targets need to set the ones they don't support to Expand. I think MULHS/MULHU likely has priority in most places so this change probably isn't directly testable. I found it while looking at disabling MULHS/MULHU for nxvXi64 as required for Zve64x. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113325	2021-11-08 09:38:36 -08:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
Joe Nash	79f52af4cd	[AMDGPU] Make getInstSizeInBytes more generic NFC. This check mainly handles size affecting literals. Make it check all explicit operands instead of a few by name. Also make the isLiteral check handle the KIMM operands, see https://reviews.llvm.org/D111067 Change-Id: I1a362d55b2a10f5c74d445272e8b7829a8b77597 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D113318 Change-Id: Ie6c688f30a71e0335d1c6dd1ff65019bd7ce684e	2021-11-08 10:34:49 -05:00
Simon Pilgrim	7f32edea23	[X86] combineMulToPMADDWD - use ComputeMinSignedBits(). NFCI. Use ComputeMinSignedBits() to ensure the mul source operands at least sign-extend down from the bottom 16 bits. This will make it easier if/when we try to support handling of source types larger than 32-bits.	2021-11-08 15:28:31 +00:00
Mindong Chen	495e258fd7	[AArch64][SVE] Add FP types to the supported SVE structure load/stores vector type list This adds FP type support to the SVE Container type list as a supplement to D112303. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113333	2021-11-08 22:29:08 +08:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
David Sherwood	8d38c24fb6	[SVE][CodeGen] Improve codegen for some FP insert_subvector cases When inserting an unpacked FP subvector into a packed vector we can simply cast the unpacked value into a packed value, since both types are legal for SVE. We can then use this as the input for the UZP instruction. This avoids us expanding the operation by going through the stack. Differential Revision: https://reviews.llvm.org/D113270	2021-11-08 13:45:55 +00:00
Matt	4a59694ba1	[AArch64][SVE] Combine FADD and FMUL aarch64 intrinsics to FMLA This is a refinement to the work in https://reviews.llvm.org/D111638 Fold (fadd p a (fmul p b c)) into (fma p a b c) Differential Revision: https://reviews.llvm.org/D113095	2021-11-08 12:22:38 +00:00
Simon Pilgrim	f60d3ec0c7	[DAG] Add BuildVectorSDNode::getConstantRawBits helper We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation. This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data. Differential Revision: https://reviews.llvm.org/D113351	2021-11-08 12:07:38 +00:00
Simon Moll	c2b91eef27	[VE] default to integrated asm in AsmInfo VE integrated asm has been the default in Clang. Also use the default setting for integrated asm in the backend. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D113384	2021-11-08 11:58:29 +01:00
David Green	a982940eb5	[AArch64] Combine fptoi.sat(fmul) to fixed point cvtf We already have patterns for fptosi and fptoui plus fmul to fixed point convert, this adds equivalent patterns for fptosi.sat and fptoui.sat, which should apply equally well for the legal saturating variants. Differential Revision: https://reviews.llvm.org/D113199	2021-11-08 10:07:34 +00:00
Qiu Chaofan	9b5e2b5261	[PowerPC] Implement basic macro fusion in Power10 Including basic fusion types around arithmetic and logical instructions. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111693	2021-11-08 17:23:56 +08:00
Andrew Wei	bf3784b882	[AArch64] Canonicalize X(Y+1) or X(1-Y) to madd/msub Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern Reviewed By: dmgreen, sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D111862	2021-11-08 16:49:31 +08:00
skc7	a0633f5ccb	[AMDGPU] Test Commit. NFC Reviewed By: hsmhsm Differential Revision: https://reviews.llvm.org/D113379	2021-11-08 07:09:09 +00:00
Ben Shi	e32cf690df	[RISCV] Optimize (add (mul r, c0), c1) Optimize (add (mul x, c0), c1) -> (add (mul (add x, c1/c0+1), c0), c1%c0-c0), if c1/c0+1 and c1%c0-c0 are simm12, while c1 is not. Optimize (add (mul x, c0), c1) -> (add (mul (add x, c1/c0-1), c0), c1%c0+c0), if c1/c0-1 and c1%c0+c0 are simm12, while c1 is not. Reviewed By: craig.topper, asb Differential Revision: https://reviews.llvm.org/D111141	2021-11-08 02:58:25 +00:00
Chen Zheng	7c6f5950f0	[PowerPC] comment for different input register classes; nfc Add comments to explain why XXPERMDIs and XXPERMDI have different input register classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI. This addresses the comments in abandoned patch D113178, we keep using `f0` instead of using `vs0` for XXPERMDIs on purpose.	2021-11-08 02:21:30 +00:00
Zi Xuan Wu	4fb282fec5	[CSKY] Add CSKY 16-bit instruction format and encoding CSKY is a ARCH which supports mixture of 16-bit and 32-bit instructions natively, and there is not an indivual predictor or feature to enable/disable 16-bit instruction. So I think it's better to add 16-bit instruction early, and naturally to use 16-bit and 32-bit instructions. Differential Revision: https://reviews.llvm.org/D112919	2021-11-08 10:02:15 +08:00
Simon Pilgrim	55e4cd8485	[X86][AVX2] Recognise 256-bit truncation shuffles and mask 256-bit source For v8i16 shuffle patterns that are lowered with AND+PACKUS, check to see if the sources are from a 256-bit vector and perform the masking using BLENDW at the 256-bit level. With the test changes we can see more examples of duplicate XMM/YMM zero vectors (PR26018) :(	2021-11-07 21:24:55 +00:00
Nikita Popov	a8c318b50e	[BasicAA] Use index size instead of pointer size When accumulating the GEP offset in BasicAA, we should use the pointer index size rather than the pointer size. Differential Revision: https://reviews.llvm.org/D112370	2021-11-07 18:56:11 +01:00
Kazu Hirata	aee86f9b6c	[AMDGPU] Remove unused declaration selectSMRD (NFC) The function body proper was removed on Feb 20, 2019 in commit `79b5c3842b`.	2021-11-07 09:53:18 -08:00
Kazu Hirata	41ef3187e0	[ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-07 09:53:16 -08:00
Simon Pilgrim	d391e4fe84	[X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth. Helps prevent future scheduler model mismatches like those that were only addressed in D44687. Differential Revision: https://reviews.llvm.org/D113302	2021-11-07 15:06:54 +00:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Simon Pilgrim	b5ef56f0bc	[X86][AVX] Add missing X86ISD::VBROADCAST(v4f32 -> v8f32) isel pattern for AVX1 targets D109434 addressed the v2f64 -> v4f64 case, an internal test has found an equivalent crash for the v4f32 -> v8f32 case.	2021-11-07 12:59:35 +00:00
Kazu Hirata	22e21da47d	[WebAssembly] Remove unused declaration SelectExternRefAddr (NFC)	2021-11-06 19:31:22 -07:00
Kazu Hirata	e4bab21848	[AMDGPU] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-06 19:31:20 -07:00
Kazu Hirata	cefc01fa65	[X86] Simplify a call to MachineBasicBlock::erase (NFC)	2021-11-06 13:08:25 -07:00
Kazu Hirata	815e8b5a20	[Hexagon] Remove an extraneous variable (NFC)	2021-11-06 13:08:23 -07:00
Kazu Hirata	14d656b3d8	[Target] Use llvm::reverse (NFC)	2021-11-06 13:08:21 -07:00
Roman Lebedev	f8efc5c0ac	[NFC][TTI] Add/extract `getReplicationShuffleCost()` method, deduplicate it's implementations Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons, including testability and reuse, let's do better. In a followup `getUserCost()` will be taught to use to to estimate the mask costs, which will allow for better cost model tests for it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113313	2021-11-06 16:45:15 +03:00
Bin Cheng	54d891a7d5	[RISCV]: Fix typo by abstracting VWholeLoad* classes This patch abstracts VWholeLoad* classes into VWholeLoadN, simplifies existing code as well as fixes a typo. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109319	2021-11-06 10:48:03 +08:00
Bin Cheng	d488f1fff2	[RISCV][NFC]: Refactor classes for load/store instructions of RVV This patch refactors classes for load/store of V extension by: - Introduce new class for VUnitStrideLoadFF and VUnitStrideSegmentLoadFF so that uses of L/SUMOP* are not spread around different places. - Reorder classes for Unit-Stride load/store in line with table describing lumop/sumop in riscv-v-spec.pdf. Reviewed By: HsiangKai, craig.topper Differential Revision: https://reviews.llvm.org/D109318	2021-11-06 10:48:03 +08:00
Florian Hahn	f64580f8d2	[AArch64][GISel] Optimize 8 and 16 bit variants of uaddo. Try simplify G_UADDO with 8 or 16 bit operands to wide G_ADD and TBNZ if result is only used in the no-overflow case. It is restricted to cases where we know that the high-bits of the operands are 0. If there's an overflow, then the the 9th or 17th bit must be set, which can be checked using TBNZ. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D111888	2021-11-05 19:11:15 +01:00
Shao-Ce SUN	5c3d7184b4	[RISCV] Support Zfhmin extension According to RISC-V Unprivileged ISA 15.6. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D111866	2021-11-06 01:41:02 +08:00
Kazu Hirata	2c4ba3e9d3	[Target] Use make_early_inc_range (NFC)	2021-11-05 09:14:32 -07:00
Roman Lebedev	ad617183bb	[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: mask is i8 not i1 Even though AVX512's masked mem ops (unlike AVX1/2) have a mask that is a `VF x i1`, replication of said masks happens after promotion of it to `VF x i8`, so we should use `i8`, not `i1`, when calculating the cost of mask replication.	2021-11-05 17:27:02 +03:00
David Sherwood	657a1dcd0d	[AArch64] Add target DAG combine for UUNPKHI/LO When created a UUNPKLO/HI node with an undef input then the output should also be undef. I've added a target DAG combine function to ensure we avoid creating an unnecessary uunpklo/hi instruction. Differential Revision: https://reviews.llvm.org/D113266	2021-11-05 13:50:59 +00:00
Quinn Pham	c71fbdd87b	[NFC] Inclusive language: Remove instances of master in URLs [NFC] This patch fixes URLs containing "master". Old URLs were either broken or redirecting to the new URL. Reviewed By: #libc, ldionne, mehdi_amini Differential Revision: https://reviews.llvm.org/D113186	2021-11-05 08:48:41 -05:00
Jingu Kang	a7b1872593	[AArch64] Fix a bug from a pattern for uaddv(uaddlp(x)) ==> uaddlv A pattern has selected wrong uaddlv MI. It should be as below. uaddv(uaddlp(v8i8)) ==> uaddlv(v8i8) Differential Revision: https://reviews.llvm.org/D113263	2021-11-05 12:48:18 +00:00
Simon Pilgrim	5e9ac7c0a5	[X86] Enable v32i16 rotate lowering on non-BWI targets Fixes one of the regressions in D113192	2021-11-05 11:00:31 +00:00
Chen Zheng	fed2889f07	[PowerPC] use correct selection for v16i8/v8i16 splat load Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D113236	2021-11-05 10:04:03 +00:00
Jay Foad	c93bf53a3e	[AMDGPU] NFC formatting fixes in SIMemoryLegalizer	2021-11-05 09:10:24 +00:00
Qiu Chaofan	5fd406e254	[PowerPC] Add intrinsic to convert between ppc_fp128 and fp128 ppc_fp128 and fp128 are both 128-bit floating point types. However, we can't do conversion between them now, since trunc/ext are not allowed for same-size fp types. This patch adds two new intrinsics: llvm.ppc.convert.f128.to.ppcf128 and llvm.convert.ppcf128.to.f128, to support such conversion. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D109421	2021-11-05 16:58:38 +08:00
Chen Zheng	9695027066	[PowerPC] address post-commit comments for D106555; NFC Address namanjai post commit comments.	2021-11-05 05:30:53 +00:00
Shengchen Kan	be08e452f3	[X86][MS-InlineAsm] Add constraint m for memory access w/ global var Constraint `m` should be used when the address of a variable is passed as a value. And the constraint is missing for MS inline assembly when sth is written to the address of the variable. The missing would cause FE delete the definition of the static varible, and then result in "undefined reference to xxx" issue. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D113096	2021-11-05 09:11:41 +08:00
Yonghong Song	41860e602a	BPF: Support btf_type_tag attribute A new kind BTF_KIND_TYPE_TAG is defined. The tags associated with a pointer type are emitted in their IR order as modifiers. For example, for the following declaration: int __tag1 * __tag1 __tag2 *g; The BTF type chain will look like VAR(g) -> __tag1 --> __tag2 -> pointer -> __tag1 -> pointer -> int In the above "->" means BTF CommonType.Type which indicates the point-to type. Differential Revision: https://reviews.llvm.org/D113222	2021-11-04 17:01:36 -07:00
Ben Langmuir	a2639dcbe6	[ORC] Add a utility for adding missing "self" relocations to a Symbol If a tool wants to introduce new indirections via stubs at link-time in ORC, it can cause fidelity issues around the address of the function if some references to the function do not have relocations. This is known to happen inside the body of the function itself on x86_64 for example, where a PC-relative address is formed, but without a relocation. ``` _foo: leaq -7(%rip), %rax ## form pointer to '_foo' without relocation _bar: leaq (%rip), %rax ## uses X86_64_RELOC_SIGNED to '_foo' ``` The consequence of introducing a stub for such a function at link time is that if it forms a pointer to itself without relocation, it will not have the same value as a pointer from outside the function. If the function pointer is used as a key, this can cause problems. This utility provides best-effort support for adding such missing relocations using MCDisassembler and MCInstrAnalysis to identify the problematic instructions. Currently it is only implemented for x86_64. Note: the related issue with call/jump instructions is not handled here, only forming function pointers. rdar://83514317 Differential revision: https://reviews.llvm.org/D113038	2021-11-04 15:01:05 -07:00
Thomas Symalla	76cbe62262	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637	2021-11-04 21:50:18 +01:00
David Green	091244023a	[ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered. This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops. Differential Revision: https://reviews.llvm.org/D113094	2021-11-04 18:42:12 +00:00
Wouter van Oortmerssen	a320f877ce	[WebAssembly] Fix debug locations for ExplicitLocals pass This is a reworked version of the reverted patch: https://reviews.llvm.org/D112487 Note that a) it doesn't need the test changes anymore, and b) I checked at least locally it passes other.test_pthread_lsan_leak Differential Revision: https://reviews.llvm.org/D113208	2021-11-04 11:38:03 -07:00
Zakk Chen	0649dfebba	[RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0. Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic. Reviewed By: frasercrmck, jrtc27, craig.topper Differential Revision: https://reviews.llvm.org/D111062	2021-11-04 10:08:01 -07:00
Kazu Hirata	2887117d2c	[Hexagon] Use make_early_inc_range (NFC)	2021-11-04 08:51:05 -07:00
Sander de Smalen	1ea4296208	[NFC] Remove from UnivariateLinearPolyBase::getValue(). This interface should not have existed in the first place, let alone be a public member. It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous. The interfaces to be used are either getFixedValue() or getKnownMinValue().	2021-11-04 14:32:08 +00:00
Chen Zheng	f6db18fd4a	[PowerPC][NFC] make option ppc-formprep-max-vars can be set more than one time.	2021-11-04 13:44:58 +00:00
Simon Pilgrim	87d5bb66eb	[X86][SSE] Improve PMADDWD SimplifyDemandedVectorElts handling Check both operands for zero elements to remove unnecessary demanded elts. Try to help reduce some minor regressions noticed in D110995	2021-11-04 12:56:31 +00:00
Qiu Chaofan	a84118756c	[PowerPC] Enforce side effects to FPSCR read/set intrinsics Currently, FPSCR is not modeled, so in some early passes (such as early-cse), the read/set intrinsics to FPSCR may get incorrect simplification. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112380	2021-11-04 11:45:32 +08:00
RamNalamothu	539f500e78	[AMDGPU] Do not add debug locations to the code inside prologue There is no real source location for code inside prologue as it is generated by compiler but source locations are being added to code inside prologue as a side effect of https://reviews.llvm.org/D99269 because buildSpillLoadStore() is using source location of the real instruction in the basic block if any. Fixes: SWDEV-307590 Reviewed By: scott.linder, sebastian-ne Differential Revision: https://reviews.llvm.org/D113100	2021-11-04 08:02:41 +05:30
Craig Topper	5022ac0771	[RISCV] Use HasVInstructions and HasVInstructionsAnyF in more place in TableGen. NFC Change RISCVSubtarget.hasVInstructionAnyF() to call hasVInstructionsF32 so that any changes to hasVInstructionsF32 are reflected. The files were missed in D112496.	2021-11-03 14:32:45 -07:00
Matthias Braun	847a680733	X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr This is a re-commit of `e2c7ee0743` which was reverted in `a2a58d91e8`. This includes a fix to consistently check for EFLAGS being live-out. See phabricator review. Original Summary: This extends `optimizeCompareInstr` to re-use previous comparison results if the previous comparison was with an immediate that was 1 bigger or smaller. Example: CMP x, 13 ... CMP x, 12 ; can be removed if we change the SETg SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP Motivation: This often happens because SelectionDAG canonicalization tends to add/subtract 1 often when optimizing for fallthrough blocks. Example for `x > C` the fallthrough optimization switches true/false blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into `x < C + 1`. Differential Revision: https://reviews.llvm.org/D110867	2021-11-03 14:12:23 -07:00
alex-t	0a3d755ee9	[AMDGPU] Enable divergence-driven BFE selection Detailed description: This change enables the bit field extract patterns selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node divergence. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D110950	2021-11-03 23:26:59 +03:00
Harald van Dijk	889c2b97bd	[X86] Fix X32 indirect call generation The check for whether a zero extension was needed was subtly wrong and saw a value that was already 64 bits, so did not extend. Fixes PR52357. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112860	2021-11-03 16:43:44 +00:00
Kazu Hirata	4bef0304e1	[AArch64, AMDGPU] Use make_early_inc_range (NFC)	2021-11-03 09:22:51 -07:00
Hans Wennborg	a2a58d91e8	Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr" This casued miscompiles of switches, see comments on the code review. > This extends `optimizeCompareInstr` to re-use previous comparison > results if the previous comparison was with an immediate that was 1 > bigger or smaller. Example: > > CMP x, 13 > ... > CMP x, 12 ; can be removed if we change the SETg > SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP > > Motivation: This often happens because SelectionDAG canonicalization > tends to add/subtract 1 often when optimizing for fallthrough blocks. > Example for `x > C` the fallthrough optimization switches true/false > blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into > `x < C + 1`. > > Differential Revision: https://reviews.llvm.org/D110867 This reverts commit `e2c7ee0743`.	2021-11-03 17:01:36 +01:00
Roman Lebedev	df93c8a919	[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: fallback to scalarization cost computation for mask I don't really buy that masked interleaved memory loads/stores are supported on X86. There is zero costmodel test coverage, no actual cost modelling for the generation of the mask repetition, and basically only two LV tests. Additionally, i'm not very interested in AVX512. I don't know if this really helps "soft" block over at https://reviews.llvm.org/D111460#inline-1075467, but i think it can't make things worse at least. When we are being told that there is a masking, instead of completely giving up and falling back to fully scalarizing `BasicTTIImplBase::getInterleavedMemoryOpCost()`, let's correctly query the cost of masked memory ops, keep all the pretty shuffle cost modelling, but scalarize the cost computation for the mask replication. I think, not scalarizing the shuffles themselves may adjust the computed costs a bit, and maybe hopefully just enough to hide the "regressions" at https://reviews.llvm.org/D111460#inline-1075467 I do mean hide, because the test coverage is non-existent. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112873	2021-11-03 18:14:35 +03:00
Peter Waller	7a34145f40	Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `753eba6421`. Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 13:42:14 +00:00
Peter Waller	753eba6421	Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store" This reverts commit `1febf42f03`, which has a use-of-uninitialized-memory bug. See: https://reviews.llvm.org/D112076	2021-11-03 13:39:38 +00:00
Andrew Savonichev	123ad720f1	[NVPTX] Mark special registers as reserved A reserved register: - is not allocatable - is considered always live - is ignored by liveness tracking NVPTX special registers match the criteria, and marking them as reserved helps to avoid machine verifier error: * Bad machine code: Using an undefined physical register * - function: foo - basic block: %bb.0 (0x557bb178b708) - instruction: %0:int32regs = MOV_SPECIAL $envreg0 - operand 1: $envreg0 Differential Revision: https://reviews.llvm.org/D113008	2021-11-03 15:48:04 +03:00
Andrew Savonichev	0e70785538	[NVPTX] Add MoveParam instruction for TargetExternalSymbol operand TargetExternalSymbol is considered to be an immediate and not a register, so machine verifier emits an error: * Bad machine code: Expected a register operand. * - function: static_offset - basic block: %bb.0 bb (0x560e9b306028) - instruction: %3:int64regs = MoveParamI64 &static_offset_param_1 - operand 1: &static_offset_param_1 The patch adds variants of this instruction with an immediate operand for byval arguments on 64-bit and 32-bit targets. Differential Revision: https://reviews.llvm.org/D113006	2021-11-03 14:43:41 +03:00
David Green	3bc586b9aa	[ARM] Treat MVE gather add-like-or's like adds LLVM has the habit of turning adds with no common bits set into ors, which means we need to detect them and treat them like adds again in the MVE gather/scatter lowering pass. Differential Revision: https://reviews.llvm.org/D112922	2021-11-03 11:41:06 +00:00
Peter Waller	1febf42f03	[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store Contiguous gather => masked load: (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1)) => (masked.load (gep BasePtr IndexBase) Align Mask undef) Contiguous scatter => masked store: (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1)) => (masked.store Value (gep BasePtr IndexBase) Align Mask) Tests with <vscale x 2 x double>: [Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation]. Differential Revision: https://reviews.llvm.org/D112076	2021-11-03 11:02:44 +00:00
David Green	d36dd1f842	[ARM] Push gather/scatter shl index updates out of loops This teaches the MVE gather scatter lowering pass that SHL is essentially the same as Mul, where we are able to optimize the induction of a gather/scatter address by pushing them out of loops. https://alive2.llvm.org/ce/z/wG4VyT Differential Revision: https://reviews.llvm.org/D112920	2021-11-03 11:00:05 +00:00
Qiu Chaofan	741aeda97d	[PowerPC] Implement longdouble pack/unpack builtins Implement two builtins to pack/unpack IBM extended long double float, according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112055	2021-11-03 17:57:25 +08:00
Andrew Savonichev	30a3a17df8	[NVPTX] Copy machine operand flags in TII::insertBranch Before this patch, flags such as undef were dropped by TII::insertBranch (used by BranchFolding pass), resulting in the following error from machine verifier: * Bad machine code: Reading virtual register without a def * - function: hoge - basic block: %bb.0 bb (0x562e9c240e68) - instruction: CBranch %2:int1regs, %bb.3 - operand 0: %2:int1regs Differential Revision: https://reviews.llvm.org/D113001	2021-11-03 12:38:27 +03:00
Yi Kong	803d4f8a35	[ARM][AsmParser] Don't emit "deprecated instruction in IT block" warning if requested Also fixed formatting in AsmMatcherEmitter because it was confusing. Differential Revision: https://reviews.llvm.org/D112993	2021-11-03 17:18:04 +08:00
Ben Shi	59c3b48d99	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `3de3ca3137`.	2021-11-03 14:15:21 +08:00
Chen Zheng	5a8b196340	[PowerPC] handle more splat loads without stack operation This mostly improves splat loads code generation on Power7 Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106555	2021-11-03 05:17:41 +00:00
Abinav Puthan Purayil	fbe61fb0aa	[AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation. The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to optimize AGPR to AGPR copy but it missed checking for the SGPR clobbering for the S_MOV_B64_IMM_PSEUDO generation. Differential Revision: https://reviews.llvm.org/D113005	2021-11-03 09:09:24 +05:30
Ben Shi	3de3ca3137	[AArch64] Optimize add/sub with immediate Optimize ([add\|sub] r, imm) -> ([ADD\|SUB] ([ADD\|SUB] r, #imm0, lsl #12), #imm1), if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned integers. Optimize ([add\|sub] r, imm) -> ([SUB\|ADD] ([SUB\|ADD] r, #imm0, lsl #12), #imm1), if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned integers. Reviewed By: jaykang10, dmgreen Differential Revision: https://reviews.llvm.org/D111034	2021-11-03 03:06:43 +00:00
Phoebe Wang	8f101971b6	[X86][VARARG] Assign MMO earlier to avoid prolog insert point been sunk across VASTART_SAVE_XMM_REGS The changes in D80163 defered the assignment of MachineMemOperand (MMO) until the X86ExpandPseudo pass. This will result in crash due to prolog insert point been sunk across the pseudo instruction VASTART_SAVE_XMM_REGS. Moving the assignment to the creation of the node can avoid the problem. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D112859	2021-11-03 10:13:32 +08:00
Eli Friedman	c964afb2c8	[AArch64] Diagnose large adrp offset on Windows. On Windows, this relocation can only encode a 21-bit offset. Make sure we emit an error, instead of silently truncating the offset. Found investigating https://bugs.llvm.org/show_bug.cgi?id=52378 Differential Revision: https://reviews.llvm.org/D113051	2021-11-02 15:11:22 -07:00
Simon Pilgrim	53900a19fd	[X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper for splat of normal vector loads. NFCI. Reapplied from rG1cfecf4fc427 with fix for PR51226 - ensure the load is a normal (non-ext) load.	2021-11-02 20:03:25 +00:00
Simon Pilgrim	82e0eb22af	[X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper. NFCI. This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concating the broadcasts is OK, its the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case so haven't added the load handling again yet.	2021-11-02 18:04:35 +00:00
Fraser Cormack	d065b03801	[RISCV] Optimize vp.load with an all-ones mask Similar to D110206, this patch optimizes unmasked vp.load intrinsics to avoid the need of a vmset instruction to set the mask. It does so by selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D113022	2021-11-02 17:23:39 +00:00
Jay Foad	be1a8f8834	[AMDGPU] Really preserve LiveVariables in SILowerControlFlow https://bugs.llvm.org/show_bug.cgi?id=52204 Differential Revision: https://reviews.llvm.org/D112731	2021-11-02 15:03:37 +00:00
Matt	895145aacb	Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA" This reverts commit `fc28a2f8ce`.	2021-11-02 14:56:01 +00:00
Simon Pilgrim	e173631dd1	[X86][AVX] SimplifyDemandedVectorEltsForTargetNode - use getBROADCAST_LOAD helper. NFCI. Reduce width of X86ISD::SUBV_BROADCAST_LOAD node.	2021-11-02 14:07:22 +00:00
Simon Pilgrim	8ca666a280	[X86][AVX] lowerV2X128Shuffle - use getBROADCAST_LOAD helper. NFCI.	2021-11-02 14:07:21 +00:00
Martin Liska	c5029023fb	Fix building with GCC 12: Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380 Differential Revision: https://reviews.llvm.org/D112990	2021-11-02 14:28:00 +01:00
David Callahan	4ec1b8eeac	[RISCV] Fix invalid kill on callee save A callee save may be live (specifically X1) on entry and so a spill should not mark it killed. Differential Revision: https://reviews.llvm.org/D111285	2021-11-02 11:56:54 +00:00
Wouter van Oortmerssen	ac65366485	[WebAssembly] support "return" and unreachable code in asm type checker To support return (it not being supported well was the ground cause for https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have at least a basic notion of unreachable, which in this case just means to stop type checking until there is an end_block (an incoming control flow edge). This is conservative (may miss on some type checking opportunities) but is simple and an improvement over what we had before. Differential Revision: https://reviews.llvm.org/D112953	2021-11-01 15:42:58 -07:00
Yonghong Song	f63405f6e3	BPF: Workaround an InstCombine ICmp transformation with llvm.bpf.compare builtin Commit `acabad9ff6` ("[InstCombine] try to canonicalize icmp with trunc op into mask and cmp") added a transformation to convert "(conv)a < power_2_const" to "a & <const>" in certain cases and bpf kernel verifier has to handle the resulted code conservatively and this may reject otherwise legitimate program. This commit tries to prevent such a transformation. A bpf backend builtin llvm.bpf.compare is added. The ICMP insn, which is subject to above InstCombine transformation, is converted to the builtin function. The builtin function is later lowered to original ICMP insn, certainly after InstCombine pass. With this change, all affected bpf strobemeta* selftests are passed now. Differential Revision: https://reviews.llvm.org/D112938	2021-11-01 14:46:20 -07:00
Cameron McInally	702fd3d323	[SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive. For NEON, FMA matching is done in the MachineCombiner, and not the DAGCombiner. That causes problems with VLS lowering, since the vectors are fixed width at the DAGCombiner, but are scalable in the MachineCombiner. This patch corrects it by matching FMAs for VLS vectors in the DAGCombiner. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112557	2021-11-01 10:43:52 -07:00
Kazu Hirata	d000431fb2	[X86] Remove X86ELFObjectWriter in X86AsmBackend.cpp (NFC) Note that the identically named class is defined in an anonymous namespace in X86ELFObjectWriter.cpp.	2021-11-01 08:31:54 -07:00
Jay Foad	7afef22926	[AMDGPU] Use MachineInstrBuilder::addReg. NFC.	2021-11-01 15:29:51 +00:00
Jay Foad	2b548b18c1	[AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32 Differential Revision: https://reviews.llvm.org/D112917	2021-11-01 13:55:53 +00:00
Mubashar Ahmad	0b83a18a2b	[AArch64] Enablement of Cortex-X2 Enables support for Cortex-X2 cores. Differential Revision: https://reviews.llvm.org/D112459	2021-11-01 11:55:24 +00:00
Simon Pilgrim	6fc50e531d	[CostModel][X86] Remove old FIXME comments for AVX512F vector splitting Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bits vectors for arithmetic operations are typically hidden due to different used ports etc.	2021-11-01 11:11:11 +00:00
Simon Pilgrim	fd485d8cda	[X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances. This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source. There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses? Noticed while investigating the quality of interleaved load/store codegen. Differential Revision: https://reviews.llvm.org/D111960	2021-11-01 10:45:50 +00:00

1 2 3 4 5 ...

64867 Commits