Replace vector unpack operation with a scalar extend operation.
unpack(splat(X)) --> splat(extend(X))
If we have both unpkhi and unpklo for the same vector, then we may
save a register in some cases, e.g.:
Hi = unpkhi (splat(X))
Lo = unpklo(splat(X))
--> Hi = Lo = splat(extend(X))
Differential Revision: https://reviews.llvm.org/D106929
Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b
Move the last{a,b} operation to the vector operand of the binary instruction if
the binop's operand is a splat value. This essentially converts the binop
to a scalar operation.
Example:
// If x and/or y is a splat value:
lastX (binop (x, y)) --> binop(lastX(x), lastX(y))
Differential Revision: https://reviews.llvm.org/D106932
Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32. So just stick this in target-specific code,
at least for now.
Differential Revision: https://reviews.llvm.org/D107608
We use the CurrentBlock to determine whether we have already processed a
block. Don't reuse this variable for setting where we should insert the
rematerialization. The rematerialization block is different to the
current block when we rematerialize for coro suspend block users.
Differential Revision: https://reviews.llvm.org/D107573
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different from the rest of the patterns;
as such, the lowering needs to be slightly different to ensure the
correct types are used.
Differential Revision: https://reviews.llvm.org/D107576
This patch is a revert of e08f205f5c. In that patch, DW_TAG_subprograms
were permitted to be referenced across CU boundaries, to improve stack
trace construction using call site information. Unfortunately, as
documented in PR48790, the way that subprograms are "owned" by dwarf units
is sufficiently complicated that subprograms end up in unexpected units,
invalidating cross-unit references.
There's no obvious way to easily fix this, and several attempts have
failed. Revert this to ensure correct DWARF is always emitted.
Three tests change in addition to the reversion, but they're all very
light alterations.
Differential Revision: https://reviews.llvm.org/D107076
Shuffles which are broken into separate halves reveal splats in which
a half is accessed via one index; such operations can be optimized to
use "vrgather.vi".
This optimization could be achieved by adding extra patterns to match
`vrgather_vv_vl` which uses a splat as an index operand, but this patch
instead identifies splat earlier. This way, future optimizations can
build on top of the data gathered here, e.g., to splat-gather dominant
indices and insert any leftovers.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D107449
And assign RegClass (i.e. the operand class for all GPRs) as the super class
of ARegClass and DRegClass. Note that this is an NFC change because
we already had XRDReg to model either address or data register
operands (as well as test coverage for it). The new super class syntax
added here just makes the relations between the three RegClass-es more
explicit.
The decoder function and table are the same as FPR128's, so use that instead.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D107644
MIPS .debug_* sections should have the SHT_MIPS_DWARF section type to
distinguish between sections containing DWARF and ECOFF debug formats, but in
assembly files these sections have the SHT_PROGBITS (@progbits) type. Now the
assembler shows a 'changed section type for ...' error when parsing a
`.section .debug_*,"",@progbits` directive for MIPS targets.
The same problem exists for the x86-64 target, and this patch extends the
workaround implemented in D76151. The patch adds one more case
in which the assembler ignores a section type mismatch after a `SwitchSection()`
call.
Differential Revision: https://reviews.llvm.org/D107707
Previously we converted ISD condition codes to integers and stored
them directly in our MIR instructions. The ISD enum kind of belongs
to SelectionDAG so that seems like incorrect layering.
This patch instead uses a CondCode node on RISCV::SELECT_CC until
isel and then converts it from ISD encoding to a RISCV specific value.
This value can be converted to/from the RISCV branch opcodes in the
RISCV namespace.
My larger motivation is to possibly support a microarchitectural
feature of some CPUs where a short forward branch over a single
instruction can be predicated internally. This will require a new
pseudo instruction for select that needs to carry a branch condition
and live probably until RISCVExpandPseudos. At that point it can be
expanded to control flow without other instructions ending up in the
predicated basic block. Using an ISD encoding in RISCVExpandPseudos
doesn't seem like correct layering.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107400
This is the data to be stored so it should be an input.
To keep operand order similar between loads and stores, move the temp
register to the first dest operand of floating point loads. Rework
the assembler code accordingly.
This doesn't have any functional effect because this Pseudo is only
used by the assembler which doesn't use ins/outs.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107309
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).
The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.
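As a hedged illustration (not a test from the patch), a loop like the
following forms an interleaved store group with a gap, since only two of
the three struct members are written on each iteration:

struct S { int x, y, z; };
void store_with_gap(S *a, int n) {
  for (int i = 0; i < n; ++i) {
    a[i].x = i;      // member 0 of the stride-3 store group
    a[i].z = 2 * i;  // member 2; member 1 (y) is the gap
  }
}

With this change such a group can be vectorized with a masked store
instead of scatters or scalarization.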
Reviewed by: Ayal Zaks
Differential Revision: https://reviews.llvm.org/D104750
Previously ADD & ADDA (as well as SUB & SUBA) instructions were mixed
together, which not only violated Motorola assembly syntax but also
made asm parsing more difficult. This patch separates these two kinds of
instructions and migrates the rest of the tests from
test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic.
Note that we observed minor regressions in codegen quality: sometimes
isel uses ADD instead of ADDA even when the latter would lead to a shorter
code sequence. This issue implies that some isel patterns might need
to be updated.
The fcvt fp to integer instructions saturate if their input is
infinity or out of range, but the instructions produce a maximum
integer for nan instead of the 0 required by the ISD opcodes.
This means we can use the instructions to do the saturating
conversion, but we'll need to fix up the nan case at the end.
We can probably improve the i8 and i16 default codegen as well,
but I'll leave that for a follow up.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D107230
The MemorySSA-based implementation has been enabled for a few months
(since D94376). This patch drops the old MDA-based implementation
entirely.
I've kept this to only the basic cleanup of dropping various
conditions -- the code could be further cleaned up now that there
is only one implementation.
Differential Revision: https://reviews.llvm.org/D102113
In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs.
(i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub).
(ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg).
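For reference, a minimal C-level sketch of the kind of source pattern this
canonicalization targets (no fast-math flags assumed; names invented):

float abs_like(float x) {
  // A select between x and its negation keyed on an fcmp against zero;
  // this can now be canonicalized to a fabs call even without nnan.
  return x < 0.0f ? -x : x;
}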
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D106872
The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg
pattern on the inputs. Ideally we pattern match these away during
isel. It is possible for LICM or other middle end optimizations
to separate the extend from the mul. This prevents SelectionDAG
from removing it, or, depending on how the extend is lowered, we
may not be able to generate an AssertSExt/AssertZExt in the
mul basic block. This will prevent pmuldq/pmuludq from being
formed at all.
This patch teaches shouldSinkOperands to recognize this so
that CodeGenPrepare will clone the extend into the same basic
block as the mul.
Fixes PR51371.
Differential Revision: https://reviews.llvm.org/D107689
Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF),
so we should have a preference. It seems like mask+negation is better
than two shifts.
After refactoring the phi recipes, we can now iterate over all header
phis in a VPlan to detect reductions when it comes to fixing them up
when tail folding.
This reduces the coupling with the cost model & legal by using the
information directly available in VPlan. It also removes a call to
getOrAddVPValue, which references the original IR value which may
become outdated after VPlan transformations.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100102
We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
instead of eraseFromParent().
We should probably use that in other places too but fix this issue which
affects clang bootstrap builds for now.
This commit adds the isnan intrinsic and provides a default expansion
for it in the SDAG. However, it makes the assumption that types
it operates on are IEEE-compliant types. This is not always the case.
An example of that is PPC "double double" which has a representation
that
- Does not need to conform to IEEE requirements for isnan as it is
not an IEEE-compliant type
- Does not have a representation that allows for straightforward
reinterpreting as an integer and use of integer operations
The result was that this commit broke __builtin_isnan for ppc_fp128,
making many valid numeric values report a NaN.
This patch simply changes the expansion to always expand to unordered
comparison (regardless of whether FP exceptions are tracked). This
is in line with previous semantics.
There may be some generalizations (see test comments) of these patterns,
but this should handle the cases motivated by:
https://llvm.org/PR51315
https://llvm.org/PR51259
The backend may want to transform differently, but at least for
the x86 examples that I looked at, there does not appear to be
any significant perf diff either way.
The attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that
there are users that do not expect LLVM to materialize the `memset` intrinsic.
While other passes can do that, too, MemCpyOpt triggers it more frequently and
breaks sanitizers and some downstream users.
For now, introduce a flag to force-enable this behavior and opt in only CUDA
compilation with the NVPTX back-end.
Differential Revision: https://reviews.llvm.org/D106401
Fix an assertion due to mismatched types for Numerator and CacheLineSize in the loop cache analysis pass.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D107618
- Loads from constant memory (either explicit loads or loads that are the source
of memory transfer intrinsics) won't alias any stores.
Reviewed By: asbirlea, efriedma
Differential Revision: https://reviews.llvm.org/D107605
This isn't optimal, but prevents crashing when the libcall isn't
available. It just calculates the full product and makes sure the high bits
match the sign of the low half. Each of the pieces should go through their own
type legalization.
This can make D107420 unnecessary.
Needs tests, but I wanted to start discussion about D107420.
Reviewed By: FreddyYe
Differential Revision: https://reviews.llvm.org/D107581
Some of the Arm complex pattern functions call canExtractShiftFromMul,
which can modify the DAG in-place. For this to be valid and handled
successfully we need to define ComplexPatternFuncMutatesDAG.
Differential Revision: https://reviews.llvm.org/D107476
1) add some self-diagnosis (when asserts are enabled) to check that all
features have the same number of entries
2) avoid storing pointers to mutable fields because the proto API
contract doesn't actually guarantee those stay fixed even if no further
mutation of the object occurs.
Differential Revision: https://reviews.llvm.org/D107594
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.
HSAStreamer will use the function attributes "device-init" and
"device-fini" to distinguish init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and
dtors present in llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage.
I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.
getTargetConstantBitsFromNode was assuming that the Constant would be the same size as the subvector, resulting in incorrect packing of the per-element bits data.
This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize.
Differential Revision: https://reviews.llvm.org/D107158
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:
1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
are always uniform by definition.
2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
loop invariant input operands, then these are also uniform.
Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:
1. If the 'assume' intrinsic's operand is not loop invariant, then we
are free to treat this as uniform anyway since it's only a performance
hint. We will get the benefit for the first lane.
2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
variant then for scalable vectors we assume these still ultimately come
from the broadcast of an alloca. We do not support scalable vectorisation
of loops containing alloca instructions, hence the alloca itself would
be invariant. If the pointer does not come from an alloca then the
intrinsic itself has no effect.
I have updated the assume test for fixed width, since we now treat it
as uniform:
Transforms/LoopVectorize/assume.ll
I've also added new scalable vectorisation tests for other intrinsics:
Transforms/LoopVectorize/scalable-assume.ll
Transforms/LoopVectorize/scalable-lifetime.ll
Transforms/LoopVectorize/scalable-noalias-scope-decl.ll
Differential Revision: https://reviews.llvm.org/D107284
The function may get changed by RunSCCPSolver before specialization. In other
words, the pass may change the function without specialization happening.
Add a test and a comment to reveal this.
The pass may also report no change if the function was changed by
RunSCCPSolver before the specialization, which looks like a potential bug.
Test Plan: check-all
Reviewed By: https://reviews.llvm.org/D107622
Differential Revision: https://reviews.llvm.org/D107622
We can improve on the generic splitting by using ffbh/ffbl, which have a
defined result when the input is zero.
Differential Revision: https://reviews.llvm.org/D107442
This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These
instructions have a defined result of -1 when the input is zero.
Differential Revision: https://reviews.llvm.org/D107441
Noticed that the computation of the function specialization cost of a
function wouldn't change during the traversal of the arguments of that
function, so we could hoist the computation out of the traversal. I
observed about a 1% improvement in compile time for spec2017, but the
measurement may not be precise. This should be NFC and fine.
Reviewed By: Sjoerd Meijer
Differential Revision: https://reviews.llvm.org/D107621
This is a recommit of patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in a call to getFastMathFlags (the base type should be FPMathOperator,
not Instruction). The original commit message is duplicated below:
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into a set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. The corresponding IEEE 754 operation and C
function must never raise an FP exception, even if the argument is a
signaling NaN. Compare instructions usually do not have such a
property; they raise an 'invalid' exception in such cases. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if the argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, although some can do the check in a more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow implementing 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in a target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where it is lowered in a target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100113
Now the recursive functions may get specialized many times when
`func-specialization-max-iters` increases. See discussion in
https://reviews.llvm.org/D106426 for details.
I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.
Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, although it didn't impact correctness there from the looks of it.
For a very large module, __llvm_gcov_reset can become very large.
__llvm_gcov_reset previously emitted stores to a bunch of globals in one
huge basic block. MemCpyOpt would turn many of these stores into
memsets, and updating MemorySSA would be extremely slow.
Verified that this makes the compile time of certain files go down
drastically (20min -> 5min).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107538
This implements LanaiTargetLowering::CanLowerReturn, thereby ensuring
all return values conform to the RetCC and get sret-demoted as
necessary.
A regression test is also added that exercises this functionality.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D107086
Similar cleanup to G_EXTRACT (51bd4e874f).
Also swap the order of clamp/widen to avoid unnecessary complex merges.
Add a bunch of missing testcases to legalize-inserts while we're at it.
Differential Revision: https://reviews.llvm.org/D107601
Similar to other cleanup commits which widen instructions before clamping
during legalization. Purpose of this is to avoid weird type breakdowns.
In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions.
The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so
this can help with emitting merges elsewhere.
Differential Revision: https://reviews.llvm.org/D107604
Fixes issue where late materialized constants can be more strictly
aligned than their containing csect.
Differential Revision: https://reviews.llvm.org/D103103
Before D45736, getc_unlocked was available by default, but turned off
for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked
functions, which were unavailable by default, but it also:
* left getc_unlocked enabled by default,
* removed the disabling line for Windows, and
* added code to enable getc_unlocked for GNU, Android, and OSX.
For consistency, make getc_unlocked unavailable by default. Maybe this
was the intent of D45736 anyway.
Reviewed By: MaskRay, efriedma
Differential Revision: https://reviews.llvm.org/D107527
Using REG_SEQUENCE produces better code than INSERT_SUBREG, as
we can omit one move instruction in many cases.
Fixes: SWDEV-298028
Differential Revision: https://reviews.llvm.org/D107602
When there is a `setjmp` call in a function, we transform every callsite
of `setjmp` to record its information by calling the `saveSetjmp` function,
and we also transform every callsite of a function that can longjmp,
to check if a longjmp occurred and, if so, jump to the corresponding
post-setjmp BB. Currently we are doing this for every function that
contains a call to `setjmp`, but if there is no other function call
within that function that can longjmp, this transformation of `setjmp`
callsite and all the preparation of `setjmpTable` in the entry of the
function are not necessary.
This checks if a setjmp-calling function has any other calls that can
longjmp, and if not, skips the function for the SjLj transformation.
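As a hedged sketch (function and variable names invented), a function like
the following calls `setjmp` but contains no other call that could longjmp,
so the transformation can now be skipped for it entirely:

#include <setjmp.h>
int no_longjmp_possible(int v) {
  jmp_buf buf;
  if (setjmp(buf))   // no other call in this function can longjmp,
    return 0;        // so no setjmpTable setup or callsite rewriting is needed
  return v + 1;
}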
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107530
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.
In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.
Differential Revision: https://reviews.llvm.org/D106239
This allows us to avoid odd type breakdowns + allows us to legalize types like
s88 in the first place.
Add some testcases for known legal types + testcases for s4 and s88.
Differential Revision: https://reviews.llvm.org/D107607
This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly
changing this because it should make it easier to improve legalization for
instructions which use G_EXTRACT as part of the legalization process.
This also adds support for legalizing some weird types. Similar to other recent
legalizer changes, this changes the order of widening/clamping.
There was some dead code in our existing rules (e.g. the p0 case would never get
hit), so this knocks those out and makes the types we want to handle explicit.
This also removes some checks which, nowadays, are handled by the
MachineVerifier.
Differential Revision: https://reviews.llvm.org/D107505
This is re-landing the same patch again, but without the changes to
LegalizerHelper that regressed the Mips test:
test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll
Differential revision: https://reviews.llvm.org/D106494
The SCEV-based salvaging method caches dbg.value information pre-LSR so
that salvaging may be attempted post-LSR. If the dbg.values are already
undef pre-LSR then a salvage attempt would be fruitless, so avoid
caching them.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D107448
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.
This has no other intentional changes.
This is a recommit of 35c0848b57,
that was reverted to simplify reversion of an earlier change.
When thin archives are created with a symbol table of type SYM64, the tools will not work since they cannot read the files properly.
One can reproduce the problem as follows:
1. Take a hello world program and create an archive out of it. Setting SYM64_THRESHOLD=0 will force the generation of a SYM64 symbol table.
clang -c hello.cpp
SYM64_THRESHOLD=0 llvm-ar crsT mylib.a hello.o
2. Now try to use any of the tools on this mylib.a and it will fail.
llvm-nm -M mylib.a
This fix will eliminate these failures. A regression test is added in llvm/test/Object/archive-symtab.test.
Reviewed By: MaskRay, Ramesh
Differential Revision: https://reviews.llvm.org/D107322
G_CONCAT_VECTORS shows up from time to time when legalizing other instructions.
We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it
as legal gives us selection for free.
Differential Revision: https://reviews.llvm.org/D107512
If the vectorized insertelement instructions form an identity subvector
(a subvector at the beginning of the long vector), it is enough
to extend the vector itself; there is no need to generate an
insert-subvector shuffle.
Differential Revision: https://reviews.llvm.org/D107494
We don't have real demanded bits support for MULHU, but we can
still use the known bits based constant folding support at the end
of SimplifyDemandedBits to simplify a MULHU. This helps with cases
where we know the LHS and RHS have enough leading zeros so that
the high multiply result is always 0.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D106471
Since all operands to ExtractValue must be loop-invariant when we deem
the loop vectorizable, we can consider ExtractValue to be uniform.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D107286
In SimplifyCFG we may simplify the CFG by speculatively executing
certain stores, when they are preceded by a store to the same
location. This patch allows such speculation also when the stores are
similarly preceded by a load.
In order for this transformation to be correct we need to ensure that
the memory location is writable and the store in the new location does
not introduce a data race.
Local objects (created by an `alloca` instruction) are always
writable, so once we are past a read from a location it is valid to
also write to that same location.
Seeing just a load does not guarantee absence of a data race (unlike
if we see a store) - the load may still be part of a race, just not
causing undefined behaviour
(cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic).
In the original program, a data race might have been prevented by the
condition, but once we move the store outside the condition, we must
be sure a data race wasn't possible anyway, no matter what the
condition evaluates to.
One way to be sure that a local object is never concurrently
read/written is check that its address never escapes the function.
Hence this transformation is restricted to local, non-escaping
objects.
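A hedged C-level sketch of the shape of code this enables (names invented;
the local never escapes, and the conditional store is preceded by a load of
the same location):

int f(int c, int i) {
  int a[4] = {1, 2, 3, 4};   // local array, address never escapes
  int old = a[i & 3];        // load from the target location
  if (c)
    a[i & 3] = 7;            // store can now be speculated, e.g. a[i & 3] = c ? 7 : old;
  return a[i & 3] + old;
}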
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D107281
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.
This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.
This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.
Fixes PR50053
Differential Revision: https://reviews.llvm.org/D107068
We can only trust the range of the index if it is guaranteed
non-poison.
Fixes PR50949.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D107364
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:
1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
are always uniform by definition.
2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
loop invariant input operands, then these are also uniform.
Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:
1. If the 'assume' intrinsic's operand is not loop invariant, then we
are free to treat this as uniform anyway since it's only a performance
hint. We will get the benefit for the first lane.
2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
variant then for scalable vectors we assume these still ultimately come
from the broadcast of an alloca. We do not support scalable vectorisation
of loops containing alloca instructions, hence the alloca itself would
be invariant. If the pointer does not come from an alloca then the
intrinsic itself has no effect.
I have updated the assume test for fixed width, since we now treat it
as uniform:
Transforms/LoopVectorize/assume.ll
I've also added new scalable vectorisation tests for other intrinsics:
Transforms/LoopVectorize/scalable-assume.ll
Transforms/LoopVectorize/scalable-lifetime.ll
Transforms/LoopVectorize/scalable-noalias-scope-decl.ll
Differential Revision: https://reviews.llvm.org/D107284
When LLVM is used in other projects, it may happen that global
constructors will execute before the call to ParseCommandLineOptions.
Since OptBisect is initialized via a constructor, and has no ability
to be updated at a later time, passing "-opt-bisect-limit" to the
parse function may have no effect.
To avoid this problem use a cl::cb (callback) to set the bisection
limit when the option is actually processed.
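A rough sketch of the callback-based registration (the accessor and setter
used to update OptBisect are assumptions here, not necessarily the patch's
exact code):

#include "llvm/Support/CommandLine.h"
using namespace llvm;

static cl::opt<int> OptBisectLimit(
    "opt-bisect-limit", cl::Hidden, cl::init(-1), cl::Optional,
    cl::cb<void, int>([](int Limit) {
      // Assumed hook: push the parsed value into the OptBisect instance
      // at the moment the option is actually processed.
      getOptBisector().setLimit(Limit);
    }),
    cl::desc("Maximum optimization to perform"));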
Differential Revision: https://reviews.llvm.org/D104551
Function exploreDirections() in DependenceAnalysis implements a recursive
algorithm for refining direction vectors. This algorithm has worst-case
complexity of O(3^(n+1)) where n is the number of common loop levels.
In this patch I'm adding a threshold to control the amount of time we
spend doing MIV tests (which most of the time end up resulting in overly
pessimistic direction vectors anyway).
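For a rough sense of the blow-up the threshold guards against (simple
arithmetic, not a measurement from the patch):

#include <cmath>
#include <cstdio>
int main() {
  const int levels[] = {2, 4, 8, 12};   // n = number of common loop levels
  for (int n : levels)
    std::printf("n = %2d -> up to 3^(n+1) = %.0f direction-vector refinements\n",
                n, std::pow(3.0, n + 1));
  // n = 2 -> 27, n = 4 -> 243, n = 8 -> 19683, n = 12 -> 1594323
}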
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D107159
This change wasn't strictly necessary for D106164 and could be removed.
This patch addresses the post-commit comments from @fhahn on D106164, and
also changes sve-widen-gep.ll to use the same IR test as shown in
pointer-induction.ll.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D106878
It's entirely possible (because it actually happened) for a bool
variable to end up with a 256-bit DW_AT_const_value. This came about
when a local bool variable was initialized from a bitfield in a
32-byte struct of bitfields, and after inlining and constant
propagation, the variable did have a constant value. The sequence of
optimizations had it carrying "i256" values around, but once the
constant made it into the llvm.dbg.value, no further IR changes could
affect it.
Technically the llvm.dbg.value did have a DIExpression to reduce it
back down to 8 bits, but the compiler is in no way ready to emit an
oversized constant *and* a DWARF expression to manipulate it.
Depending on the circumstances, we had either just the very fat bool
value, or an expression with no starting value.
The sequence of optimizations that led to this state did seem pretty
reasonable, so the solution I came up with was to invent a DWARF
constant expression folder. Currently it only does convert ops, but
there's no reason it couldn't do other ops if that became useful.
This broke three tests that depended on having convert ops survive
into the DWARF, so I added an operator that would abort the folder to
each of those tests.
Differential Revision: https://reviews.llvm.org/D106915
Instructions that produceSameValue produce the same values for operands with
the same index. matchEqualDefs used to return true for any two values from
different instructions that produce the same values. Fix this by checking if
values are defined by operands with the same index.
Differential Revision: https://reviews.llvm.org/D107362
As suggested on D107370, this patch renames the tuning feature flags to start with 'Tuning' instead of 'Feature'.
Differential Revision: https://reviews.llvm.org/D107459
The LegalizeAction for this node should follow the logic for
`VECREDUCE_SEQ_FADD` and be determined using the vector operand's type.
There isn't an in-tree target that makes use of this, but I think it's safe to
say this is how it should behave, should a target want to customize the action
for this node.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107478
This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for
Arm so that the disassembler can print the target address of memory
operands that use PC+immediate addressing.
Differential Revision: https://reviews.llvm.org/D105979
Letting it take SCEV allows further modification of the function to optimize
when the StoreSize / Stride is determined at runtime.
This is a precursor to D107353.
The big picture is to let LoopIdiom deal with runtime-determined sizes.
Reviewed By: Whitney, lebedev.ri
Differential Revision: https://reviews.llvm.org/D104595
These functions don't exist in android API levels < 21. A change in
llvm-12 (rG6dbf0cfcf789) caused Oz builds to emit this symbol assuming
it's available and thus is causing link errors. Simply disable it here.
Differential Revision: https://reviews.llvm.org/D107509
Emit references to '__do_global_ctors' and '__do_global_dtors' to allow
constructor/destructor routines to run.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D107133
D106861 added usage of PseudoProbeAttributes::Reserved as TailCall; however, this usage hasn't been committed/reviewed. Removing this usage.
Testing
ninja check-all
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D107514
Kuniyuki Iwashima reported in [1] that the llvm compiler may
convert a loop exit condition with "i < bound" to "i != bound", where
"i" is the loop index variable and "bound" is the upper bound.
In case that "bound" is not a constant, verifier will always have "i != bound"
true, which will cause verifier failure since to verifier this is
an infinite loop.
The fix is to avoid transforming "i < bound" to "i != bound".
In llvm, the transformation is done by IndVarSimplify pass.
The compiler checks loop condition cost (i = i + 1) and if the
cost is lower, it may transform "i < bound" to "i != bound".
This patch implemented getArithmeticInstrCost() in BPF TargetTransformInfo
class to return a higher cost for such an operation, which
will prevent the transformation for the test case
added in this patch.
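A hedged sketch of the kind of BPF loop where keeping the "<" form matters
(names invented; the bound is not a compile-time constant):

int sum(const int *arr, int bound) {
  int s = 0;
  // With a non-constant bound, rewriting this exit test as "i != bound"
  // makes the kernel verifier treat the loop as potentially infinite.
  for (int i = 0; i < bound; i++)
    s += arr[i];
  return s;
}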
[1] https://lore.kernel.org/netdev/1994df05-8f01-371f-3c3b-d33d7836878c@fb.com/
Differential Revision: https://reviews.llvm.org/D107483
Clamp the max number of elements when legalizing G_PHI. This allows us to
legalize some common fallbacks like 4 x s64.
Here's an example: https://godbolt.org/z/6YocsEYTd
Had to add -global-isel-abort=0 to legalize-phi.mir to account for the
G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI.
Differential Revision: https://reviews.llvm.org/D107508
The `catch` instruction can have any number of result values depending on
its tag, but so far we have only needed a single i32 return value for
C++ exceptions, so the instruction was specified that way. But using the
instruction for SjLj handling requires multiple return values.
This makes `catch` instruction's results variadic and moves selection of
`throw` and `catch` instruction from ISelLowering to ISelDAGToDAG.
Moving `catch` to ISelDAGToDAG is necessary because I am not aware of
a good way to do instruction selection for variadic output instructions
in TableGen. This also moves `throw` because 1. `throw` and `catch`
share the same utility function and 2. there is really no reason we
should do that in ISelLowering in the first place. What we do is mostly
the same in both places, and moving them to ISelDAGToDAG allows us to
remove unnecessary mid-level nodes for `throw` and `catch` in
WebAssemblyISD.def and WebAssemblyInstrInfo.td.
This also adds handling for new `catch` instruction to AsmTypeCheck.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D107423
to `lib/CodeGen/CommandFlags.cpp`. It can replace
-x86-experimental-pref-loop-alignment=.
The loop alignment is only used by MachineBlockPlacement.
The implementation uses a new `llvm::TargetOptions` for now, as
an IR function attribute/module flags metadata may be overkill.
This is the llvm part of D106701.
Rather than blocking the whole MemCpyOpt pass if the libcalls are
not available, only disable creation of new memset/memcpy intrinsics
where only load/stores were used previously. This only affects the
store merging and load-store conversion optimization. Other
optimizations are derived from existing intrinsics, which are
well-defined in the absence of libcalls -- not having the libcalls
just means that call simplification won't convert them to intrinsics.
This is a weaker variation of D104801, which dropped these checks
entirely. Ideally we would not couple emission of intrinsics to
libcall availability at all, but as the intrinsics may be legalized
to libcalls we need to be a bit careful right now.
Differential Revision: https://reviews.llvm.org/D106769
This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region and thus must not be guarded. This check is implemented using the ParallelLevels AA.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D106892
This allows special constants like 0 to be recognized. It's also
expected by isel patterns if a target had a mulh with immediate instructions.
The commuting done by tablegen won't commute patterns with immediates since it
expects DAGCombine to have done it.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D107486
Use a tail policy operand instead. Inspired by the work in D105092,
but without the intrinsic interface changes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D106512
This allows us to handle weird types like s88; we first widen to s128, then
clamp back down to s64.
https://godbolt.org/z/9xqbP46Mz
Also this makes it possible for GISel to legalize the case in pr48188.ll. It
now does the same thing as SDAG, although regalloc chooses different registers.
Differential Revision: https://reviews.llvm.org/D107417
Going through our legalization rules and doing some cleanup.
Widening and then clamping is usually easier than clamping and then widening.
This allows us to legalize some weird types like s88.
Differential Revision: https://reviews.llvm.org/D107413
Rather than emitting the bias variable lazily as needed, emit it
eagerly. This allows profile runtime to refer to this variable
unconditionally without having to use the weak reference. The bias
variable is in a COMDAT so there'll never be more than one instance,
and if it's not needed, linker should be able to GC it, so the overhead
should be minimal.
Differential Revision: https://reviews.llvm.org/D107377
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)
This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).
Differential Revision: https://reviews.llvm.org/D107367
Migrate pseudo probe decoding logic in llvm-profgen to MC so that other LLVM-based programs can reuse the existing code. Redesign the object layout of encoded and decoded pseudo probes.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D106861
An insert subvector that is inserting the result of a vector predicate
sized load into undef at index 0, whose result is casted to a predicate
type, can be combined into a direct predicate load. Likewise the same
applies to extract subvector but in reverse.
The purpose of this optimization is to clean up cases that will be
introduced in a later patch where casts to/from predicate types from i8
types will use insert subvector, rather than going through memory early.
This optimization is done in SVEIntrinsicOpts rather than InstCombine to
re-introduce scalable loads as late as possible, to give other
optimizations the best chance possible to do a good job.
Differential Revision: https://reviews.llvm.org/D106549
Don't know how to custom expand this
UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788
The fix is to provide missing expansions for:
case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_TO_SINT:
A test case is provided.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107452
SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may
contain very large integers but the translation does not support
integers greater than 64 bits. This patch adds checks to ensure
conversions of these large integers are not attempted. A regression test
is added to ensure no such translation is attempted.
Reviewed by: StephenTozer
PR: https://bugs.llvm.org/show_bug.cgi?id=51329
Differential Revision: https://reviews.llvm.org/D107438
This patch introduces a new code object metadata field, ".kind"
which is used to add support for init and fini kernels.
HSAStreamer will use the function attributes "device-init" and
"device-fini" to distinguish init and fini kernels from
the regular kernels and will emit metadata with ".kind" set to
"init" and "fini" respectively.
To reduce the number of init and fini kernels, the ctors and
dtors present in llvm's global.ctors and global.dtors lists
are called from a single init and fini kernel respectively.
Reviewed by: yaxunl
Differential Revision: https://reviews.llvm.org/D105682
Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef`
to it is confusing. Instead, just move the `Mask` vector higher up,
and change the code that earlier had no access to it but now does
to use `Mask` instead of `BaseMask`.
This has no other intentional changes.
I want to hoist `Mask` variable higher up,
but then it would clash with this one.
So let's rename this one first.
There are no other intentional changes here other than said rename.
This assert is intended to ensure that the high registers are not
selected when passed to one of the Thumb UXT instructions. However,
it was triggering even for 32 bit, where no UXT instruction is emitted.
Fixes PR51313.
Differential Revision: https://reviews.llvm.org/D107363
Given a shuffle mask, if it is picking from an input that is splat
given the current granularity of the shuffle, then adjust the mask
to pick from the same lane of the input as the mask element is in.
This may result in a shuffle being simplified into a blend.
I believe this is correct given that the splat detection matches the one
just above the new code.
My basic thought is that we might be able to get fewer regressions
by handling multiple insertions of the same value into a vector
if we form broadcasts+blend here, as opposed to D105390,
but i have not really thought this through,
and did not try implementing it yet.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107009
This attempts to make more of RDA aware of potentially overlapping
subregisters. Some of this was already in place, with it iterating
through MCRegUnitIterators. This also replaces calls to
LiveRegs.contains(..) with !LiveRegs.available(..), and updates the
isValidRegUseOf and isValidRegDefOf to search subregs.
Differential Revision: https://reviews.llvm.org/D107351
Our list of slow/fast tuning feature flags has become pretty extensive and is randomly interleaved with ISA and Security (Retpoline etc.) flags, not even based on when the ISAs/flags were introduced, making it tricky to locate them. Plus we started treating tuning flags separately some time ago, so this patch tries to group the flags to match.
I've left them mostly in the same order within each group - I'm happy to rearrange them further if there are specific ISA or Tuning flags that you think should be kept closer together.
Differential Revision: https://reviews.llvm.org/D107370
If there's a region of the stack reserved for potential tail call arguments
(only the case when we guarantee tail calls will be honoured), this is right
next to the incoming stored return address, not necessarily next to the
callee-saved area, so combining the two into a single figure leads to incorrect
offsets in some edge cases.
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into a set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. The corresponding IEEE 754 operation and C
function must never raise an FP exception, even if the argument is a
signaling NaN. Compare instructions usually do not have such a
property; they raise an 'invalid' exception in such cases. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if the argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, although some can do the check in a more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow implementing 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in a target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where it is lowered in a target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
This adds support for specialising recursive functions. For example:
int Global = 1;
void recursiveFunc(int *arg) {
if (*arg < 4) {
print(*arg);
recursiveFunc(*arg + 1);
}
}
void main() {
recursiveFunc(&Global);
}
After 3 iterations of function specialisation, followed by inlining of the
specialised versions of recursiveFunc, the main function looks like this:
void main() {
print(1);
print(2);
print(3);
}
To support this, the following has been added:
- Update the solver and state of the new specialised functions,
- An optimisation to propagate constant stack values after each iteration of
function specialisation, which is necessary for the next iteration to
recognise the constant values and trigger.
Specialising recursive functions is (at the moment) controlled by option
-func-specialization-max-iters and is opt-in for compile-time reasons. I.e.,
the default is -func-specialization-max-iters=1, but for the example above we
would need to use -func-specialization-max-iters=3. Future work is to see if we
can increase the default, or improve the cost-model/heuristics to control
compile-times.
Differential Revision: https://reviews.llvm.org/D106426
This allows users to access options in libSupport before invoking
`cl::ParseCommandLineOptions`, and also matches the behavior before
D105959.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D106334
While collecting reachable callees (from kernels), ignore call graph nodes which
do not have an associated function or whose associated function is not a definition.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D107329
Constant::getSplatValue has O(N) time complexity in the worst case,
where N is the # of elements in a vector. So we call
Constant::getAggregateElement first and return early if possible to
avoid unnecessary getSplatValue calls.
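A minimal sketch of the check order (an assumed helper, not the patch's
exact code): element 0 is queried in O(1) first, and the O(N) splat scan
only runs when that element already matches.

#include "llvm/IR/Constants.h"

static bool isSplatOf(const llvm::Constant *C, const llvm::Constant *Scalar) {
  if (C->getAggregateElement(0u) != Scalar)
    return false;                       // cheap early exit, no O(N) walk
  return C->getSplatValue() == Scalar;  // confirm the remaining elements
}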
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107252
- Rename `wasm.catch` intrinsic to `wasm.catch.exn`, because we are
planning to add a separate `wasm.catch.longjmp` intrinsic which
returns two values.
- Rename several variables
- Remove an unnecessary parameter from `canLongjmp` and `isEmAsmCall`
from LowerEmscriptenEHSjLj pass
- Add `-verify-machineinstrs` in a test for a safety measure
- Add more comments + fix some errors in comments
- Replace `std::vector` with `SmallVector` for cases likely with small
number of elements
- Renamed `EnableEH`/`EnableSjLj` to `EnableEmEH`/`EnableEmSjLj`: We are
soon going to add `EnableWasmSjLj`, so this makes the distinction
clearer
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D107405
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.
We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.
Fixes PR51208
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107314
Currently, in OptimizeGlobalAddressOfMalloc, the transformation for global loads assumes that they have the same Type. With the support of ConstantExpr (https://reviews.llvm.org/D106589), this may not be true any more (as seen in the test case), and we miss the code to handle this. This patch fixes that.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107397
This transform has been restricted to legal types since
https://reviews.llvm.org/rG65df808f6254617b9eee931d00e95d900610b660
in 2012.
This is particularly restrictive on RISCV64 which only has i64
as a legal integer type. i32 is a very common type in code
generated from C, but we won't form a lookup table with it.
This also affects other common types like i8/i16 on ARM,
AArch64, RISCV, etc.
This patch proposes to allow power-of-2 types larger than 8 bits, if
they will fit in the largest legal integer type in DataLayout.
These types are common in C code so generally well handled in
the backends.
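For example (a hedged illustration, not a test from the patch), a switch
like this one returning a 32-bit value can now be turned into a lookup
table on RV64, where only i64 is legal, because i32 fits in the largest
legal integer type:

int map(int x) {
  switch (x) {
  case 0: return 13;
  case 1: return 42;
  case 2: return 7;
  case 3: return 29;
  default: return 0;
  }
}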
We could probably do this for other types like i24 and rely on
alignment and padding to allow the backend to use a single wider
load. This isn't my main concern right now and it will need more
tests.
We could also allow larger types up to some limit and let the
backend split into multiple loads, but we need to define that
limit. It's also not my main concern right now.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D107233
If the vectorized insertelement instructions form an identity subvector
(a subvector at the beginning of the long vector), it is enough
to extend the vector itself; there is no need to generate an
insert-subvector shuffle.
Differential Revision: https://reviews.llvm.org/D107344
Fixed a type assertion failure caused by trying to fold a masked load with a
select where the select condition is a scalar value
Reviewed By: sdesmalen, lebedev.ri
Differential Revision: https://reviews.llvm.org/D107372
This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to
a fmov of the appropriate type. At least on some CPU's with 64bit NEON
data paths this is expected to be faster, and shouldn't be slower on any
CPU that treats fmov as a register rename.
Differential Revision: https://reviews.llvm.org/D106365
Return false from runOnFunction if nothing changed. Curiously
we already returned a bool from detectAndFoldOffset, but didn't
use it.
Fix a couple of breaks after returns that I saw while auditing
detectAndFoldOffset.
Differential Revision: https://reviews.llvm.org/D107303
We had some similar hasOneUse/isNON_EXTLoad early-outs spread out over different parts of the method - we should pull them all together.
Noticed while triaging PR45116
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support to specific cases where this optimization works.
In this patch, we focus on phi-node operands with inttoptr casts.
We know that ptrtoint(inttoptr(ptrtoint x)) is the same as ptrtoint(x),
so we want to remove this roundtrip cast which goes through a phi node.
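A minimal sketch (hypothetical function, 64-bit pointers assumed) of the phi pattern being handled; the final ptrtoint can simply use the original integers:
```
define i64 @roundtrip(i1 %c, i64 %a, i64 %b) {
entry:
  %pa = inttoptr i64 %a to i8*
  br i1 %c, label %other, label %merge
other:
  %pb = inttoptr i64 %b to i8*
  br label %merge
merge:
  ; %phi only carries inttoptr'd integers, so ptrtoint(%phi) is equivalent
  ; to a phi of %a and %b, removing the roundtrip.
  %phi = phi i8* [ %pa, %entry ], [ %pb, %other ]
  %int = ptrtoint i8* %phi to i64
  ret i64 %int
}
```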
Reviewed By: aqjune
Differential Revision: https://reviews.llvm.org/D106289
This patch extends the optimization of VID-sequence BUILD_VECTORs
introduced in D104921 to include simple fractional steps composed of a
separated integer numerator and denominator.
A notable limitation in this sequence detection is that only sequences
with steps N/1 or 1/D are found, meaning that the step between elements
and the frequency with which it changes is consistent across the whole
sequence. Fractional steps such as 2/3 won't be matched as those would
involve more complex tracking of state or some level of backtracking.
As it stands, however, this patch is sufficient to match common
interleave-type shuffle indices, for example matching `<0,0,1,1>` (or
commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2.
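For instance, a shuffle such as the following (hypothetical IR) gives rise to that index vector: its mask `<0,0,1,1>` is the VID sequence 0,1,2,3 with every element divided by 2:
```
define <4 x i32> @repeat_lo(<4 x i32> %v) {
  %s = shufflevector <4 x i32> %v, <4 x i32> poison,
                     <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  ret <4 x i32> %s
}
```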
While the optimization is relatively `undef`-tolerant, due to greedy
pattern-matching there are even some simple patterns which confuse the
sequence detection into identifying either a suboptimal sequence or no
sequence at all.
Currently only fractional-step sequences identified as having a
power-of-two denominator are actually lowered to RVV instructions. This
is to avoid introducing divisions into the generated code.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106533
Add a comment when there is a shifted value,
add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.
Differential Revision: https://reviews.llvm.org/D107196
item of StringTable.
Summary: For the string table in XCOFF, the first 4 bytes
contain the length of the string table, so we should
print the string entries starting from the fifth byte. This patch
also adds tests for llvm-readobj dumping the string
table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D105522
I'm renaming the flag because a future patch will add a new
enableOrderedReductions() TTI interface and so the meaning of this
flag will change to be one of forcing the target to enable/disable
them. Also, since other places in LoopVectorize.cpp use the word
'Ordered' instead of 'strict' I changed the flag to match.
Differential Revision: https://reviews.llvm.org/D107264
These instructions have an implicit use of vcc which counts towards the
constant bus limit. Pre gfx10 this means that the explicit operands
cannot be sgprs. Use the custom inserter hook to call legalizeOperands
to enforce that restriction.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51217
Differential Revision: https://reviews.llvm.org/D106868
Add new pass LowerRefTypesIntPtrConv to generate debugtrap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has since been allowed (see ac81cb7e6).
Differential Revision: https://reviews.llvm.org/D107102
If a vsetvli instruction is not compatible with the next vector instruction,
and there is nothing else that may update or use VL/VTYPE, we can merge
it with the vsetvli instruction that would be inserted for the next vector
instruction.
This commit only merges VTYPE into the former vsetvli instruction when both
have the same VL.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106857
In RISCV's relocations, some relocations are composed of two relocation types. For example, R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_I compose a PC-relative relocation. In general the compiler will set a label at the position of R_RISCV_PCREL_HI20. So, to test the R_RISCV_PCREL_LO12_I relocation, we need to decode the instruction at the position of the label pointing to R_RISCV_PCREL_HI20, plus 4 (the size of a non-compressed RISC-V instruction).
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D105528
This adds handling for two cases:
1. A scalable vector where the element type is promoted.
2. A scalable vector where the element count is odd (or more generally,
not divisible by the element count of the part type).
(Some element types still don't work; for example, <vscale x 2 x i128>,
or <vscale x 2 x fp128>.)
Differential Revision: https://reviews.llvm.org/D105591
I'm not sure this is the best way to approach this,
but the situation is otherwise rather hard to detect unless we explicitly call it out when refusing to advise unrolling.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
When a value is expected to be extended, we should emit an extended load rather
than a normal G_LOAD.
Add checklines to arm64-abi.ll which show that we now emit the correct loads.
For ease of comparison: https://godbolt.org/z/8WvY6EfdE
Differential Revision: https://reviews.llvm.org/D107313
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.
Differential Revision: https://reviews.llvm.org/D107298
This aligns the multiple exit costing with all the other cost decisions. Note that UnrollAndJam, which is the only other caller of the original home of this code, unconditionally bails out of multiple exit loops.
The option to not preserve LCSSA is in fact not tested at all in upstream. I was tempted to just remove the code entirely, but realized I didn't need to for my actual goal.
When we build with split DWARF in single mode, the .o files contain both "normal" debug sections and dwo sections, along with relocation sections for the "normal" debug sections.
When we create a DWARF context in DWARFObjInMemory we process relocations and store them in the map for the .debug_info, etc. sections.
For the DWO context we also do this for the non-dwo DWARF sections, which I believe is not necessary. This leads to a lot of memory being wasted; we observed 70GB of extra memory being used.
I went with a context-sensitive approach where a flag is passed in. I am not sure if it's always safe not to process relocations for regular debug sections if the object contains .dwo sections.
If it is, an alternative might be to scan the sections in the constructor and, if there are .dwo sections, not process the regular debug ones.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D106624
Add new pass LowerRefTypesIntPtrConv to generate trap
instruction for an inttoptr and ptrtoint of a reference type instead
of erroring, since calling these instructions on non-integral pointers
has since been allowed (see ac81cb7e6).
Differential Revision: https://reviews.llvm.org/D107102
This patch updates VPInterleaveRecipe::print to print the actual defined
VPValues for load groups and the store VPValue operands for store
groups.
The IR references may become outdated while transforming the VPlan,
whereas the defined and stored VPValues are always up-to-date.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D107223
This optimizes out the mask when the result of a bitmask is interpreted as an i8
or i16 value. Resolves PR50507.
Differential Revision: https://reviews.llvm.org/D107103
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand; this includes "concat_vectors" patterns, as can be seen in the Arm shuffle cost changes. This also helped fix an x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188
This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.
The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.
The widening case will be addressed in a follow up patch that treats the cost as 0.
Masks with a high number of undef elts will still struggle to match optimal subvector widths - it's currently bounded by the minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.
Differential Revision: https://reviews.llvm.org/D107228
When the limit of the inner loop is a known integer, the InstCombine
pass now causes the transformation e.g. icmp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
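A small sketch (hypothetical values; step 1 and a constant trip count of 8) of the rewritten latch condition that is now recognised:
```
define i1 @latch_cond(i32 %j) {
  ; Originally: %inc = add nuw nsw i32 %j, 1
  ;             %cmp = icmp ult i32 %inc, 8
  ; InstCombine rewrites the compare to use the induction variable directly:
  %cmp = icmp ult i32 %j, 7
  ret i1 %cmp
}
```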
Differential Revision: https://reviews.llvm.org/D105802
If the block target for a WLSTP instruction is known to be out of range,
and cannot be fixed by the ARMBlockPlacementPass, we can relax it to a
DLSTP (and cmp/branch) to still allow the creation of tail predicated
loops. That is what this patch does, adding extra revert code to the
fallback path of ARMBlockPlacementPass.
Due to the code produced when reverting, this creates a DLSTP between a
Bcc and a Br. As a DLS isn't necessarily a terminator we need to split
the block to move the DLS/Br into.
Differential Revision: https://reviews.llvm.org/D104709
We might want to use info from GC strategy in middle end analysis.
The motivation for this is provided in D99135: we may want to ask
a GC if it's going to work with a given pointer (currently this code
makes a naive check based on the method name).
Differential Revision: https://reviews.llvm.org/D100559
Reviewed By: reames
Application of default mapping to BVH intrinsics was missing.
Copy parts of SelectionDAG test to GlobalISel test as these would
have indicated this error.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D107211
I'm working on extending the OptimizeGlobalAddressOfMalloc to handle some more general cases. This is to add support of the ConstantExpr use of the global variables. The function allUsesOfLoadedValueWillTrapIfNull is now iterative with the added CE use of GV. Also, the recursive function valueIsOnlyUsedLocallyOrStoredToOneGlobal is changed to iterative using a worklist with the GEP case added.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D106589
Currently, the default alignment is much larger than the actual size of
the vector in memory. Fix this to use a sane default.
For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.
This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway. If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).
Differential Revision: https://reviews.llvm.org/D88994
Target-dependent constant folding will fold these down to simple
constants (or at least, expressions that don't involve a GEP). We don't
need heroics to try to optimize the form of the expression before that
happens.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51232 .
Differential Revision: https://reviews.llvm.org/D107116
The check for size_t parameter 1 was already here for snprintf_chk,
but it wasn't applied to regular snprintf. This could lead to
mismatching and eventually crashing as shown in:
https://llvm.org/PR50885
I don't know much about this pass, but we need a stronger
check on the memset length arg to avoid an assert. The
current code was added with D59000.
The test is reduced from:
https://llvm.org/PR50910
Differential Revision: https://reviews.llvm.org/D106462
If all demanded elements of the BUILD_VECTOR pass a isGuaranteedNotToBeUndefOrPoison check, then we can treat this specific demanded use of the BUILD_VECTOR as guaranteed not to be undef or poison either.
Differential Revision: https://reviews.llvm.org/D107174
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.
Differential Revision: https://reviews.llvm.org/D94097
Adds MVT::i64x8, a Machine Value Type needed for lowering inline assembly
operands which materialize a sequence of eight general purpose registers.
Differential Revision: https://reviews.llvm.org/D94096
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
Additional asserts were added to ScalarEvolution to enforce
pointer/int type rules. An assert is triggered when the LSR pass
attempts to extend a pointer SCEV in GenerateTruncates.
This patch changes GenerateTruncates to exit early if the Formula
contains a ScaledReg or BaseReg with a pointer type.
Differential Revision: https://reviews.llvm.org/D107185
This fixes support for merging profiles which broke as a consequence
of e50a38840d. The issue was missing
adjustment in merge logic to account for the binary IDs which are
now included in the raw profile just after header.
In addition, this change also:
* Includes the version in module signature that's used for merging
to avoid accidental attempts to merge incompatible profiles.
* Moves the binary IDs size field after version field in the header
as was suggested in the review.
Differential Revision: https://reviews.llvm.org/D107143
Change `CountersPtr` in `__profd_` to a label difference, which is a link-time
constant. On ELF, when linking a shared object, this requires that `__profc_` is
either private or linkonce/linkonce_odr hidden. On COFF, we need D104564 so that
`.quad a-b` (64-bit label difference) can lower to a 32-bit PC-relative relocation.
```
# ELF: R_X86_64_PC64 (PC-relative)
.quad .L__profc_foo-.L__profd_foo
# Mach-O: a pair of 8-byte X86_64_RELOC_UNSIGNED and X86_64_RELOC_SUBTRACTOR
.quad l___profc_foo-l___profd_foo
# COFF: we actually use IMAGE_REL_AMD64_REL32/IMAGE_REL_ARM64_REL32 so
# the high 32-bit value is zero even if .L__profc_foo < .L__profd_foo
# As compensation, we truncate CountersDelta in the header so that
# __llvm_profile_merge_from_buffer and llvm-profdata reader keep working.
.quad .L__profc_foo-.L__profd_foo
```
(Note: link.exe sorts `.lprfc` before `.lprfd` even if the object writer
has `.lprfd` before `.lprfc`, so we cannot work around by reordering
`.lprfc` and `.lprfd`.)
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
`ld -pie` linked clang is 1.74% smaller due to fewer R_X86_64_RELATIVE relocations.
```
% readelf -r pie | awk '$3~/R.*/{s[$3]++} END {for (k in s) print k, s[k]}'
R_X86_64_JUMP_SLO 331
R_X86_64_TPOFF64 2
R_X86_64_RELATIVE 476059 # was: 607712
R_X86_64_64 2616
R_X86_64_GLOB_DAT 31
```
The absolute function address (used by llvm-profdata to collect indirect call
targets) can be converted to relative as well, but is not done in this patch.
Differential Revision: https://reviews.llvm.org/D104556
This distributes reductions based on the relative offset of loads, if
one is found from their operands. Given chains of reductions this will
then sort them in ascending load order, which in turn can help simple
prefetches latch on to increasing strides more easily.
Differential Revision: https://reviews.llvm.org/D106569
This could be smarter by picking an ideal type, or at least splitting
the vector in half first. Also handles lower for non-power-of-2,
non-extending vector loads.
Currently this just avoids failing to legalize some odd vector AMDGPU
tests, but is a step towards removing the split logic from the
NarrowScalar logic.
Replace insertelement instructions for splats with just a single
insertelement + broadcast shuffle. Also, try to merge these instructions
if they come from the same/shuffled gather node.
Differential Revision: https://reviews.llvm.org/D107104
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:
for.body:
  %phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
  %mask = and i32 %phi, 255
  %gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
  %load = load i8, i8* %gep
  %ext = zext i8 %load to i32
  %and = and i32 %mask, %ext
  ...
^ this will generate an and reduction intrinsic such as the following:
call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)
The same example for an add instruction would create an intrinsic of type i8:
call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)
This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.
Reviewed By: david-arm, dmgreen
Differential Revision: https://reviews.llvm.org/D105632
The code for splitting an unaligned access into 2 pieces is
essentially the same as for splitting a non-power-of-2 load for
scalars. It would be better to pick an optimal memory access size and
directly use it, but splitting in half is what the DAG does.
As-is this fixes handling of some unaligned sextload/zextloads for
AMDGPU. In the future this will help drop the ugly abuse of
narrowScalar to handle splitting unaligned accesses.
This adds a combine for adds of reductions, distributing them so that
they occur sequentially to enable better use of accumulating VADDVA
instructions. It combines:
add(X, add(vecreduce(Y), vecreduce(Z))) ->
add(add(X, vecreduce(Y)), vecreduce(Z))
and
add(add(A, reduce(B)), add(C, reduce(D))) ->
add(add(add(A, C), reduce(B)), reduce(D))
These together distribute the add's so that more reductions can be
selected to VADDVA.
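An IR-level sketch of the first pattern (the combine itself operates on ISD nodes during selection; the function and values here are hypothetical):
```
declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

define i32 @acc(i32 %x, <4 x i32> %y, <4 x i32> %z) {
  %ry = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %y)
  %rz = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %z)
  ; add(X, add(vecreduce(Y), vecreduce(Z))) is rewritten as
  ; add(add(X, vecreduce(Y)), vecreduce(Z)), so each reduction can fold
  ; into an accumulating VADDVA.
  %t = add i32 %ry, %rz
  %r = add i32 %x, %t
  ret i32 %r
}
```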
Differential Revision: https://reviews.llvm.org/D106532
For nodes with reused scalars the user may be not only of the size
of the final shuffle but also of the size of the scalars themselves;
we need to check for this. It is safe to just modify the check here, since
the order of the scalars themselves is preserved, only the indices of the
reused scalars are changed. So, users with the same size as the
number of scalars in the node will not be affected; they still get
the operands in the required order.
Reported by @mstorsjo in D105020.
Differential Revision: https://reviews.llvm.org/D107080
If the instruction was previously deleted, it should not be treated as
an external user. This fixes cost estimation and removes dead
extractelement instructions.
Differential Revision: https://reviews.llvm.org/D107106
Need to check that the minimum acceptable vector factor is at least 2,
not 0, to avoid compiler crash during gathered loads analysis.
Differential Revision: https://reviews.llvm.org/D107058
This introduces a builder function for emitting IR performing reductions in
OpenMP. Reduction variable privatization and initialization to the
reduction-neutral value is expected to be handled separately. The caller
provides the reduction functions. Further commits can provide implementation of
reduction functions for the reduction operators defined in the OpenMP
specification.
This implementation was tested on an MLIR fork targeting OpenMP from C and
produced correct executable code.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D104928
Under MVE we can use VADDV/VADDVA's to perform integer add reductions,
so it can be beneficial to use more reductions than summing subvectors
and reducing once. Especially for VMLAV/VMLAVA the mul can be
incorporated into the reduction, producing less instructions.
Some of the test cases currently get larger due to extra integer adds,
but will be improved in a followup patch.
Differential Revision: https://reviews.llvm.org/D106531
The Scalable Matrix Extension (SME) introduces a new execution mode
called Streaming SVE mode. In streaming mode a substantial subset of the
SVE and SVE2 instruction set is available, along with new outer product,
load, store, extract and insert instructions that operate on the new
architectural register state for the matrix.
To support streaming mode this patch introduces a new subtarget feature
+streaming-sve. If enabled, the subset of SVE(2) instructions are
available. The existing behaviour for SVE(2) remains unchanged, the
subset of instructions that are legal in streaming mode are enabled if
either +sve[2] or +streaming-sve is specified. Instructions that are
illegal in streaming mode remain predicated on +sve[2].
The SME target feature has been updated to imply +streaming-sve rather
than +sve.
The following changes are made to the SVE(2) tests:
* For instructions that are legal in streaming mode:
- added RUN line to verify +streaming-sve enables the instruction.
- updated diagnostic to 'instruction requires: streaming-sve or sve'.
* For instructions that are illegal in streaming-mode:
- added RUN line to verify +streaming-sve does not enable the
instruction.
SVE(2) instructions that are legal in streaming mode have:
if !HaveSVE[2]() && !HaveSME() then UNDEFINED;
at the top of the pseudocode in the XML.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D106272
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D107025
Register hints when copying to a UACC register do not always produce VSRp
registers. This patch makes sure that we do not produce hints in cases
where the subregister of the UACC is not a VSRp.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D107101
Add disassembler support for the NORM and NORMH instructions. These instructions
only exist when the ARC processor is configured with the "norm" extension.
Differential Revision: https://reviews.llvm.org/D107118
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
- openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
- openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
- openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
- openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D106802
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.
The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:
public func f(b: Bool) {
  guard b else {
    return // I would like to set a breakpoint here.
  }
  ...
}
The patch modifies two places in GlobalISEL: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.
Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.
rdar://79515454
This tentatively reapplies the patch without modifications; the LLDB
test that has blocked this from landing previously has since been
modified to hopefully no longer be sensitive to this change.
Differential Revision: https://reviews.llvm.org/D105238
Same as 91bd3ad128, this doesn't really
change anything but gives the registers better names than the ones
tablegen would define. And fills in the missing gaps.
Patch by Mohammad Fawaz
This issue started happening after
b373b5990d
Basically, if the memcpy is volatile, the collectUsers() function should
return false, just like we do for volatile loads.
Differential Revision: https://reviews.llvm.org/D106950
D106850 introduced a simplification for llvm.vscale by looking at the
surrounding function's vscale_range attributes. The call that's being
simplified may not yet have been inserted into the IR. This happens for
example during function cloning.
This patch fixes the issue by checking if the instruction is in a
parent basic block.
If a target lists both a subreg and a superreg in a callee-saved
register mask, the prolog will spill both aliasing registers. Instead,
don't spill the subreg if a superreg is being spilled. This case is hit by the
PowerPC SPE code, as well as a modified RISC-V backend for CHERI I maintain out
of tree.
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D73170
This transform was added with D58874, but there were no tests for overflow ops.
We need to change this one way or another because it can crash as shown in:
https://llvm.org/PR51238
Note that if there are no uses of an overflow op's bool overflow result, we
reduce it to a regular math op, so we continue to fold that case either way.
If we have uses of both the math and the overflow bool, then we are likely
not saving anything by creating an independent sub instruction as seen in
the test diffs here.
This patch makes the behavior in SDAG consistent with what we do in
instcombine AFAICT.
Differential Revision: https://reviews.llvm.org/D106983
The Load/Store unit is used to enforce the ordering of loads and stores if they
alias (controlled by the --noalias=false option).
Fixes PR50483 - [MCA] In-order pipeline doesn't track memory
load/store dependencies.
Differential Revision: https://reviews.llvm.org/D103955
An incorrect mask type when lowering an SVE gather/scatter was causing
a codegen fault which manifested as the incorrect predicate size being
used for an SVE gather/scatter (e.g. p0.b rather than p0.d).
Fixes PR51182.
Differential Revision: https://reviews.llvm.org/D106943
SCEVToIterCountExpr only expects to be fed affine expressions, but
DbgRewriteSalvageableDVIs is feeding it non-affine induction variables.
Following this up with an obvious fix, will add test coverage too if
this avoids D105207 being reverted.
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.
This fixes a Vulkan CTS test that would hang otherwise.
Differential Revision: https://reviews.llvm.org/D105709
When the trip count of the inner loop is a constant, the InstCombine
pass now causes the transformation e.g. icmp ult i32 %inc, tripcount ->
icmp ult %j, tripcount-step (where %j is the inner loop induction
variable and %inc is add %j, step), which is now accounted for when
identifying the trip count of the loop. This is also an acceptable use
of %j (provided the step is 1) so is ignored as long as the compare
that it's used in is also the condition of the inner branch.
Differential Revision: https://reviews.llvm.org/D105802
This patch aims to improve the performance of BUILD_VECTORs which are
identified as containing a dominant element. Given that most
floating-point constants themselves require a load from the constant
pool, it was possible for the optimization to actually increase the
number of individual loads on small vectors. The exception is the zero
constant -- +0.0 -- which can be materialized efficiently.
While this optimization could do with a proper cost model to weigh the
benefits of a single vector load vs. the manipulation of individual
elements -- even for integer vectors which often require several
instructions to materialize -- without a concrete RVV implementation to
work with any heuristic is likely to be both more obtuse and inaccurate.
Until then, this patch fixes at least one known obvious deficiency.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106963
Function findBestLoopTopHelper tries to find a new loop top block which can also
fall through to OldTop, but it's impossible if OldTop is not a chain header, so
it should exit immediately.
Differential Revision: https://reviews.llvm.org/D106329
We don't allow inlining for functions with blockaddress with uses other than strictly callbr. This is because if the blockaddress escapes the function via a global variable, inlining may lead to an invalid cross-function reference.
We check against such cases during inlining, however the check can fail for ThinLTO post-link because CFG simplification can incorrectly remove blocks based on wrong block reachability.
When we import a function with blockaddress taken in a global variable but without importing that variable, we won't go through value mapping to reflect the real address-taken-ness of the cloned blocks. For the imported clone, this leads to blocks reachable from indirect branch through global variable being incorrectly treated as unreachable and removed by SimplifyCFG.
Since inlining for such cases shouldn't be allowed in the first place, I'm marking them as ineligible for importing during pre-link to avoid the problem of missing address-taken-ness of the imported clone as well as bad DCE and inlining.
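A minimal sketch (hypothetical IR) of the shape now marked ineligible for importing: a blockaddress escaping through a global variable rather than being used only by callbr:
```
@dispatch = global i8* blockaddress(@state_machine, %handler)

define void @state_machine(i8* %target) {
entry:
  indirectbr i8* %target, [label %handler]
handler:
  ret void
}
```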
Differential Revision: https://reviews.llvm.org/D106930
The LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA, LC_SUB_CLIENT, and LC_SUB_LIBRARY
are used to indicate related libraries, binaries or framework names.
Their only payload is the string with the name of the object. Adding
those commands to the list of ignored/skipped load commands will avoid
an error that stops the process of copying/stripping and will copy their
contents verbatim.
Additionally, in order to have a test for this case, `yaml2obj` now
allows those four commands to contain a `Content`.
Differential Revision: https://reviews.llvm.org/D106412
Swap the order of widening so that we widen to the next power-of-2 first when
legalizing G_LOAD.
Also, provide a minimum type for the power of 2 to disallow s2 + s1. Clamping
ought to disallow s2 and s1, but I think it's better to be explicit about the
expected minimum size.
We probably need a similar change for G_STORE, but it seems to be a bit more
finicky. So, let's just handle G_LOAD for now.
Differential Revision: https://reviews.llvm.org/D107013
The current implementation of function internalization creates a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of each other. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106931
We were handling types like s88 like this:
1) clamp to the range
2) widen to the next power of 2
This isn't desirable because it causes an odd breakdown for types like s88.
If we widen to the next power of 2 (s128) first, then we get a clean breakdown
when we clamp back to s64.
Differential Revision: https://reviews.llvm.org/D106998
Reapply commit d675b594f4 that was
reverted due to buildbot failures. A simple fix has been applied to
remove an assertion.
Differential Revision: https://reviews.llvm.org/D105207
Apparently, the features were getting mixed up, so we'd try to
disassemble in ARM mode. Fix sub-architecture detection to compute the
correct triple if we're detecting it automatically, so the user doesn't
need to pass --triple=thumb etc.
It's possible we should be somehow tying the "+thumb-mode" target
feature more directly to Tag_CPU_arch_profile? But this seems to work
reasonably well, anyway.
While I'm here, fix up the other llvm-objdump tests that were explicitly
specifying an ARM triple; that shouldn't be necessary.
Differential Revision: https://reviews.llvm.org/D106912
The SCEV method getBackedgeTakenCount() returns a SCEVCouldNotCompute
object if the backedge-taken count is unpredictable. This fix ensures
there is no longer an attempt to use such an object to find the trip
count.
Patch by: Rosie Sumpter.
Differential Revision: https://reviews.llvm.org/D106970
This is a second attempt to fix the EXPENSIVE_CHECKS issue that was mentioned in D91661#2875179 by @jroelofs.
(The first attempt was in D105983)
D91661 more or less completely reverted D49126 and by doing so also removed the cleanup logic of the created declarations and calls.
This patch is a replacement for D91661 (which must itself be reverted first). It replaces the custom declaration creation with the
generic version and shows the test impact. It also tracks the number of NamedValues to detect if a new prototype was added instead
of looking at the available users of a prototype.
Reviewed By: jroelofs
Differential Revision: https://reviews.llvm.org/D106147
This reverts commit 77080a1eb6.
This change introduced issues detected with EXPENSIVE_CHECKS. Reverting to restore the
needed function cleanup. A next patch will then just improve on the name mangling.
[[noreturn]] can be used since Oct 2016 when the minimum compiler requirement was bumped to GCC 4.8/MSVC 2015.
Note: the definition of LLVM_ATTRIBUTE_NORETURN is kept for now.
The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.
This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D106754
Reapply commit 796b84d26f that was
reverted due to reports of crashes. A minor change now guards against
getVariableLocationOperand() returning a nullptr.
Differential Revision: https://reviews.llvm.org/D106659
This transform was added with e38b7e8948
and as shown in:
https://llvm.org/PR51241
...it could crash without an extra check of the blocks.
There might be a more compact way to write this constraint,
but we can't just count the successors/predecessors without
affecting a test that includes a switch instruction.
When we have a terminator sequence (i.e. a tailcall or return),
MIIsInTerminatorSequence is used to work out where the preceding ABI-setup
instructions end, i.e. the parts that were glued to the terminator
instruction. This allows LLVM to split blocks safely without having to
worry about ABI stuff.
The function only ignores DBG_VALUE instructions, meaning that the two
debug instructions I recently added can end terminator sequences early,
causing various MachineVerifier errors. This patch promotes the test for
debug instructions from "isDebugValue" to "isDebugInstr", thus avoiding any
debug-info interfering with this function.
Differential Revision: https://reviews.llvm.org/D106660
In this episode, we are trying to avoid an x86 micro-arch quirk where complex
(3 operand) LEA potentially costs significantly more than simple LEA. So we
simultaneously push and pull the math around the CMOV to balance the operations.
I looked at the debug spew during instruction selection and decided against
trying a later DAGToDAG transform -- it seems very difficult to match if the
trailing memops are already selected and managing the creation of extra
instructions at that level is always tricky.
Differential Revision: https://reviews.llvm.org/D106918
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of the
scheduling budget, which could lead to missing vectorization due to
exhausted scheduling resources.
This patch provides a 2-way approach to the graph reordering problem. At first, all
reordering is done in-place; it does not require tree
deleting/rebuilding, it just rotates the scalars/orders/reuses masks in
the graph node.
The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. Then it repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with larger vectorization factor; we can rotate just the subgraph,
not the whole graph.
The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and so are their users.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing extra shuffles, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.
Also, patch improves cost model for gathering of loads, which improves
x264 benchmark in some cases.
Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, and improves most other benchmarks.
Compile and link times are almost the same, though in some cases they
should be better (we're not doing an extra round of instruction scheduling
anymore), and we may again vectorize more code in large basic blocks
because of the saved scheduling budget.
Differential Revision: https://reviews.llvm.org/D105020
On AIX, the linker needs to check whether a given lto_module_t contains
any constructor/destructor functions, in order to implement the behavior
of the -bcdtors:all flag. See
https://www.ibm.com/docs/en/aix/7.2?topic=l-ld-command for the flag's
documentation.
In llvm IR, constructor (destructor) functions are added to a special
global array @llvm.global_ctors (@llvm.global_dtors).
However, because these two symbols are artificial, they are not visited
during the symbol traversal (using the
lto_module_get_[num_symbols|symbol_name|symbol_attribute] API).
This patch adds a new function to the libLTO interface that checks the
presence of one or both of these two symbols.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D106887
As suggested in D105008, move the code that fixes up the backedge value
for first order recurrences to VPlan::execute.
Now all that remains in fixFirstOrderRecurrences is the code responsible
for creating the exit values in the middle block.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D106244
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.
- The Arm implementation of getExtendedAddReductionCost is altered to
only provide costs for legal or smaller types. Larger than legal types
need to be split, which currently does not work very well, especially
for predicated reductions where the predicate may be legal but needs to
be split. Currently we limit it to legal or smaller input types.
- The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)))
is a pattern that can come up, and can be treated the same as
reduce(mul(ext, ext)) providing the extension types match (see the sketch
after this list).
- And it has been adjusted to not count the ext in reduce(mul(ext, ext))
as part of a reduce(mul) pattern.
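A hedged IR sketch (hypothetical function) of the reduce(ext(mul(ext, ext))) shape with matching extends mentioned above:
```
declare i64 @llvm.vector.reduce.add.v8i64(<8 x i64>)

define i64 @mla(<8 x i16> %a, <8 x i16> %b) {
  %ea = zext <8 x i16> %a to <8 x i32>
  %eb = zext <8 x i16> %b to <8 x i32>
  %mul = mul <8 x i32> %ea, %eb
  ; The outer extend of the mul result is costed as part of the same
  ; extended-reduction pattern as reduce(mul(ext, ext)).
  %ext = zext <8 x i32> %mul to <8 x i64>
  %red = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %ext)
  ret i64 %red
}
```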
Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.
Differential Revision: https://reviews.llvm.org/D106166
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.
In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.
In the future we will explore specializing for multiple values of NumThreads and NumTeams.
Depends on D106390
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D106033
This patch adds a peephole optimization `SETCC(FREEZE(x),const)` => `FREEZE(SETCC(x,const))`
if the SETCC is only used by BRCOND.
Combined with `BRCOND(FREEZE(X)) => BRCOND(X)`, this leads to a nice improvement in the generated assembly when x is a masked loaded value.
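An IR-level sketch (hypothetical values) of the shape that benefits; the peephole itself fires on the corresponding SETCC/FREEZE/BRCOND nodes during selection:
```
define i32 @branch_on_frozen(i8* %p) {
entry:
  %x = load i8, i8* %p
  %fr = freeze i8 %x
  ; setcc(freeze(x), 42) used only by a conditional branch can be selected
  ; as freeze(setcc(x, 42)), and the freeze on the branch condition folds away.
  %cmp = icmp eq i8 %fr, 42
  br i1 %cmp, label %then, label %else
then:
  ret i32 1
else:
  ret i32 0
}
```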
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D105344
This reapplies commit cbb709e251 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.
This reverts commit aa27430a62.
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.
We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
`StackAlignment` has only one use: `StackAlignment = std::max(StackAlignment, AI.getAlignment());` So it is redundant.
Reviewed By: vitalybuka, MTC
Differential Revision: https://reviews.llvm.org/D106741
As an instruction is replaced in optimizeTransposes RAUW will replace it in
the ShapeMap (ShapeMap is ValueMap so that uses are updated). In
finalizeLowering however we skip updating uses if they are in the ShapeMap
since they will be lowered separately at which point we pick up the lowered
operands.
In the testcase what happened was that since we replaced the doubled-transpose
with the shuffle, it ended up in the ShapeMap. As we lowered the
columnwise-load the use in the shuffle was not updated. Then as we removed
the original columnwise-load we changed that to an undef. I.e. we ended up
with:
```
%shuf = shufflevector <8 x double> undef, <8 x double> poison, <6 x i32>
                                   ^^^^^
                       <i32 0, i32 1, i32 2, i32 4, i32 5, i32 6>
```
Besides the fix itself, I have fortified this last bit. As we change uses to
undef when removing an instruction, we track the undefed instructions to make sure
we eventually remove those too. This would have caught the issue at compile
time.
Differential Revision: https://reviews.llvm.org/D106714
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.
This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.
This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:
```
       sw.bb                             sw.bb
      /  |  \                           /  |  \
 case1 case2 case3                 case1 case2 case3
      \  |  /                           /  |  \
     latch.bb                   latch.2 latch.3 latch.1
     br sw.bb                           /  |  \
                                 sw.bb.2 sw.bb.3 sw.bb.1
                                br case2 br case3 br case1
```
Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D99205
Patch by Mohammad Fawaz
This patch allows lifetime calls to be ignored (and later erased) if we
know that the copy-constant-to-alloca optimization is going to happen.
The case that is missed is when the global variable is in a different address
space than the alloca (as shown in the example added to the lit test.)
This used to work before 6da31fa4a6
Differential Revision: https://reviews.llvm.org/D106573
Consider the following loop:
  void foo(float *dst, float *src, int N) {
    for (int i = 0; i < N; i++) {
      dst[i] = 0.0;
      for (int j = 0; j < N; j++) {
        dst[i] += src[(i * N) + j];
      }
    }
  }
When we are not building with -Ofast we may attempt to vectorise the
inner loop using ordered reductions instead. In addition we also try
to select an appropriate interleave count for the inner loop. However,
when choosing a VF=1 the inner loop will be scalar and there is existing
code in selectInterleaveCount that limits the interleave count to 2
for reductions due to concerns about increasing the critical path.
For ordered reductions this problem is even worse due to the additional
data dependency, and so I've added code to simply disable interleaving
for scalar ordered reductions for now.
Test added here:
Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll
Differential Revision: https://reviews.llvm.org/D106646
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D106380
When hoisting/moving calls to locations, we strip unknown metadata. Such calls are usually marked `speculatable`, i.e. they are guaranteed to not cause undefined behaviour when run anywhere. So, we should strip attributes that can cause immediate undefined behaviour if those attributes are not valid in the context where the call is moved to.
This patch introduces such an API and uses it in relevant passes. See
updated tests.
Fix for PR50744.
Reviewed By: nikic, jdoerfert, lebedev.ri
Differential Revision: https://reviews.llvm.org/D104641
This reverts commit 1cfecf4fc4.
This commit broke LLVM code generated through XLA by removing a
conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD
This is not a perfect revert. The new function is left as other uses of
it exist now.