llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	cba40c4ede	[llvm] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 07:11:14 -08:00
Sergei Larin	a721ddbae9	Update MaxMinLatency even if dependencies have been already scheduled. Covers an extremely rare corner case on internal book keeping.	2021-11-09 06:47:49 -08:00
Kerry McLaughlin	0d748b4d32	[LoopVectorize] Extract the last lane from a uniform store Changes VPReplicateRecipe to extract the last lane from an unconditional, uniform store instruction. collectLoopUniforms will also add stores to the list of uniform instructions where Legal->isUniformMemOp is true. setCostBasedWideningDecision now sets the widening decision for all uniform memory ops to Scalarize, where previously GatherScatter may have been chosen for scalable stores. This fixes an assert ("Cannot yet scalarize uniform stores") in setCostBasedWideningDecision when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: sdesmalen, david-arm, fhahn Differential Revision: https://reviews.llvm.org/D112725	2021-11-09 14:43:16 +00:00
Sanjay Patel	c36b7e21bd	[InstCombine] enhance vector bitwise select matching (Cond & C) \| (~bitcast(Cond) & D) --> bitcast (select Cond, (bc C), (bc D)) This is part of fixing: https://llvm.org/PR34047 That report shows a case where a bitcast is sitting between the select condition candidate and its 'not' value due to current cast canonicalization rules. There's a bitcast type restriction that might be violated in existing matching, but I still need to investigate if that is possible - Alive2 shows we can only do this transform safely when the bitcast is from narrow to wide vector elements (otherwise poison could leak into elements that were safe in the original code): https://alive2.llvm.org/ce/z/Hf66qh Differential Revision: https://reviews.llvm.org/D113035	2021-11-09 08:54:59 -05:00
Chris Jackson	116dc70cf3	[DebugInfo][LSR] Add more stringent checks on IV selection and salvage attempts Prevent the selection of IVs that have a SCEV containing an undef. Also prevent salvaging attempts for values for which a SCEV could not be created by ScalarEvolution and have only SCEVUknown. Reviewed by: Orlando Differential Revision: https://reviews.llvm.org/D111810	2021-11-09 13:09:37 +00:00
Florian Hahn	2ead34716a	[SimplifyCFG] Add early bailout if Use is not in same BB. Without this patch, passingValueIsAlwaysUndefined will iterate over all instructions from I to the end of the basic block, even if the use is outside the block. This patch adds an early bail out, if the use instruction is outside I's BB. This can greatly reduce compile-time in cases where very large basic blocks are involved, with a large number of PHI nodes and incoming values. Note that the refactoring makes the handling of the case where I is a phi and Use is in PHI more explicit as well: for phi nodes, we can also directly bail out. In the existing code, we would iterate until we reach the end and return false. Based on an earlier patch by Matt Wala. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D113293	2021-11-09 12:57:03 +00:00
Andrew Savonichev	b702276ad0	[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner. FMUL_indexed is normally selected during instruction selection, but it does not work in cases when VDUP and VMUL are in different basic blocks. Differential Revision: https://reviews.llvm.org/D99662	2021-11-09 15:30:19 +03:00
Simon Pilgrim	58c01ef270	[SelectionDAG] Merge FoldConstantVectorArithmetic into FoldConstantArithmetic (PR36544) This patch merges FoldConstantVectorArithmetic back into FoldConstantArithmetic. Like FoldConstantVectorArithmetic we now handle vector ops with any operand count, but we currently still only handle binops for scalar types - this can be improved in future patches - in particular some common unary/trinary ops still have poor constant folding. There's one change in functionality causing test changes - FoldConstantVectorArithmetic bails early if the build/splat vector isn't all constant (with some undefs) elements, but FoldConstantArithmetic doesn't - it instead attempts to fold the scalar nodes and bails if they fail to regenerate a constant/undef result, allowing some additional identity/undef patterns to be handled. Differential Revision: https://reviews.llvm.org/D113300	2021-11-09 11:31:01 +00:00
Alexey Lapshin	c8ae08987d	[llvm-dwarfdump] dump link to the immediate parent. It is often useful to know which die is the parent of the current die. This patch adds information about parent offset into the dump: 0x0000000b: DW_TAG_compile_unit DW_AT_producer ("by_hand") 0x00000014: DW_TAG_base_type (0x0000000b) <<<<<<<<<<<<<< DW_AT_name ("int") Now it is easy to see which die is the parent of the current die. This patch makes that behaviour to be default. We can make it to be opt-in if neccessary. This functionality differs from already existed "--show-parents" in that sence that parent information is shown for all dies and only link to the immediate parent is shown. Differential Revision: https://reviews.llvm.org/D113406	2021-11-09 14:14:06 +03:00
Max Kazantsev	cb728cb8a9	[NFC] Get rid of hardcoded magical constant and use Optionals instead Refactor calculateIterationsToInvariance so that it doesn't need a magical constant to signify unknown answer.	2021-11-09 18:13:19 +07:00
Roman Lebedev	d484cc152b	[TTI] Adjust `getReplicationShuffleCost()` interface It is trivial to produce DemandedSrcElts given DemandedReplicatedElts, so don't pass the former. Also, it isn't really useful so far to have the overload taking the Mask, so just inline it.	2021-11-09 14:07:59 +03:00
Simon Pilgrim	32a4a883f6	Revert rGe1eec7601b6988b35ae3cdc8d67cf3cf4e1361dd "[XCOFF][yaml2obj] support for the auxiliary file header." This is failing on MSVC builds: https://lab.llvm.org/buildbot/#/builders/86/builds/23436	2021-11-09 11:02:13 +00:00
Dmitry Makogon	5ec2386332	Reapply `db28934` "[IndVars] Pass TTI to replaceCongruentIVs" This reapplies patch `db289340c8`. The test failures on build with expensive checks caused by the patch happened due to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort, which shuffles the given container if the expensive checks are enabled, so equivalent Phis in the sorted vector had different mutual order from run to run. replaceCongruentIVs tries to replace narrow Phis with truncations of wide ones. In some test cases there were several Phis with the same width, so if their order differs from run to run, the narrow Phis would be replaced with a different Phi, depending on the shuffling result. The patch `ae14fae0ff` fixed this issue by replacing llvm::sort with llvm::stable_sort.	2021-11-09 17:42:29 +07:00
Florian Hahn	acbefbf19f	[VPlan] Guard code to dump instructions after `d9361bfbe2`. This should fix build failures when built without assertions enabled, e.g. https://lab.llvm.org/buildbot/#/builders/205/builds/172	2021-11-09 10:29:05 +00:00
Florian Hahn	d9361bfbe2	[VPlan] Add initial inner-loop VPlan verification. This patch adds a function to verify general properties of VPlans. The first check makes sure that all phi-like recipes are at the beginning of a block, with no other recipes in between. Note that this currently may not hold for VPBlendRecipes at the moment, as other recipes may be inserted before the VPBlendRecipe during mask creation. Note that this patch depends on D111300 and D111301, which fix code that breaks the checked invariant. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111302	2021-11-09 10:18:28 +00:00
Esme-Yi	e1eec7601b	[XCOFF][yaml2obj] support for the auxiliary file header. Summary: This patch adds yaml2obj supporting for the auxiliary file header of XCOFF. Reviewed By: DiggerLin, jhenderson Differential Revision: https://reviews.llvm.org/D111487	2021-11-09 09:48:40 +00:00
Dmitry Makogon	ae14fae0ff	[SCEVExpander] Use stable_sort to sort loop Phis in SCEVExpander::replaceCongruentIVs This is a fix for test failures on expensive checks build caused by `db289340c8`. With LLVM_ENABLE_EXPENSIVE_CHECKS enabled the llvm::sort shuffles the given container. However, the sort is only called when the TTI is passed to replaceCongruentIVs. In the mentioned patch we pass it TTI, so the sort happens. But due to shuffling equivalent Phis may appear in different order from run to run. With the stable_sort instead of sort this is impossible - the order of sorted Phis is preserved.	2021-11-09 16:29:57 +07:00
Jay Foad	5c3c7adf3a	[CodeGen] Fix assertion failure in TwoAddressInstructionPass::rescheduleMIBelowKill This fixes an assertion failure with -early-live-intervals when trying to update the live intervals for a debug instruction, which don't even have slot indexes. Differential Revision: https://reviews.llvm.org/D113116	2021-11-09 09:24:21 +00:00
Shao-Ce SUN	1c81941f19	[NFC][RISCV] Fix wrong predicates of vfwredsum	2021-11-09 17:19:50 +08:00
Kazu Hirata	c375cdc932	[Hexagon] Use MachineBasicBlock::{successors,predecessors} (NFC)	2021-11-09 00:26:06 -08:00
Carlos Galvez	7ecec3f0f5	[CUDA] Bump supported CUDA version to 11.5 Differential Revision: https://reviews.llvm.org/D113249	2021-11-09 08:20:53 +00:00
Akira Hatanaka	1fe8993ad8	[ObjC][ARC] Replace uses of ObjC intrinsics that are arguments of operand bundle "clang.arc.attachedcall" with ObjC runtime functions The existing code only handles the case where the intrinsic being rewritten is used as the called function pointer of a call/invoke.	2021-11-08 21:19:07 -08:00
Liqiang Tao	6cad45d5c6	[llvm][Inline] Add a module level inliner Add module level inliner, which is a minimum viable product at this point. Also add some tests for it. RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-August/152297.html Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D106448	2021-11-09 11:03:29 +08:00
Akira Hatanaka	8f8d9f743d	[ObjC][ARC] Handle operand bundle "clang.arc.attachedcall" on targets that don't use the inline asm marker This patch makes the changes to the ARC middle-end passes that are needed to handle operand bundle "clang.arc.attachedcall" on targets that don't use the inline asm marker for the retainRV/autoreleaseRV handshake (e.g., x86-64). Note that anyone who wants to use the operand bundle on their target has to teach their backend to handle the operand bundle. The x86-64 backend already knows about the operand bundle (see https://reviews.llvm.org/D94597). Differential Revision: https://reviews.llvm.org/D111334	2021-11-08 18:38:39 -08:00
River Riddle	7480efd6f0	[Tablegen] Collect all global state into one managed static Tablegen uses copious amounts of global state for uniquing various records. This was fine under the original vision where tablegen was a tool, and not a library, but there are various usages of tablegen that want to use it as a library. One concrete example is that downstream we have a kythe indexer for tablegen constructs that allows for IDEs to serve go-to-definition/references/and more. We currently (kind of hackily) keep the tablegen parts in a shared library that gets loaded/unloaded. This revision starts to remedy this by globbing all of the static state into a managed static so that they can at least be unloaded with llvm_shutdown. A better solution would be to feed in a context variable (much like how the IR in LLVM/MLIR do), but that is a more invasive change that can come later. Differential Revision: https://reviews.llvm.org/D108934	2021-11-09 01:24:54 +00:00
Wouter van Oortmerssen	62eeb3e57e	[WebAssembly] fix __stack_pointer being added to .debug_aranges When emitting a reloc for the Wasm global __stack_pointer, it was inadvertedly added to the symbols used for generating aranges, which caused some aranges to use it as the end symbol in a symbol diff, which caused a reloc for it to be emitted, which then caused an assert in `wasm64` since we have no 64-bit relocs for Wasm globals. Fixes: https://bugs.llvm.org/show_bug.cgi?id=52376 Differential Revision: https://reviews.llvm.org/D113438	2021-11-08 16:30:31 -08:00
Wouter van Oortmerssen	4a0c89a6cf	[WebAssembly] Fix fixBrTableIndex removing instruction without checking uses Fixes: https://bugs.llvm.org/show_bug.cgi?id=52352 Differential Revision: https://reviews.llvm.org/D113230	2021-11-08 15:53:44 -08:00
Craig Topper	376233113e	[RISCV] Use TargetConstant for CSR number for READ_CSR/WRITE_CSR. This is consistent with what we do for other operands that are required to be constants. I don't think this results in any real changes. The pattern match code for isel treats ConstantSDNode and TargetConstantSDNode the same.	2021-11-08 15:10:24 -08:00
Arthur Eubanks	28a06a1b87	[NFC][FuncAttrs] Keep track of modified functions This is in preparation for only invalidating analyses on changed functions. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D113303	2021-11-08 15:04:56 -08:00
Akira Hatanaka	f2c7c3c7c7	[ObjC][ARC] Invalidate an entry of UnderlyingObjCPtrCache when the instruction the key points to is deleted Use weak value handles for both the key and the value. The entry is invalid if either value handle is null. This fixes an assertion failure in BasicAAResult::alias that is caused by UnderlyingObjCPtrCache returning a wrong value. I don't have a test case for this patch that fails reliably. rdar://83984790	2021-11-08 14:41:06 -08:00
Ard Biesheuvel	2caf85ad7a	[ARM] implement LOAD_STACK_GUARD for remaining targets Currently, LOAD_STACK_GUARD on ARM is only implemented for Mach-O targets, and other targets rely on the generic support which may result in spilling of the stack canary value or address, or may cause it to be kept in a callee save register across function calls, which means they essentially get spilled as well, only by the callee when it wants to free up this register. So let's implement LOAD_STACK GUARD for other targets as well. This ensures that the load of the stack canary is rematerialized fully in the epilogue. This code was split off from D112768: [ARM] implement support for TLS register based stack protector for which it is a prerequisite. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D112811	2021-11-08 22:59:15 +01:00
Michael Liao	bf225939bc	[InferAddressSpaces] Support assumed addrspaces from addrspace predicates. - CUDA cannot associate memory space with pointer types. Even though Clang could add extra attributes to specify the address space explicitly on a pointer type, it breaks the portability between Clang and NVCC. - This change proposes to assume the address space from a pointer from the assumption built upon target-specific address space predicates, such as `__isGlobal` from CUDA. E.g., ``` foo(float *p) { __builtin_assume(__isGlobal(p)); // From there, we could assume p is a global pointer instead of a // generic one. } ``` This makes the code portable without introducing the implementation-specific features. Note that NVCC starts to support __builtin_assume from version 11. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112041	2021-11-08 16:51:57 -05:00
Martin Storsjö	46ec93a457	[Support] [VirtualFileSystem] Detect the windows_slash path style This fixes the following clang VFS tests, if `windows_slash` is the default style: Clang :: VFS/implicit-include.c Clang :: VFS/relative-path.c Clang-Unit :: Frontend/./FrontendTests.exe/CompilerInstance.DefaultVFSOverlayFromInvocation Also clarify a couple references to `Style::windows` into `Style::windows_backslash`, to make it clearer that each of them are opinionated in different directions (even if it doesn't matter for calls to e.g. `is_absolute`). Differential Revision: https://reviews.llvm.org/D113272	2021-11-08 22:21:29 +02:00
Nikita Popov	1376301c87	[InstCombine] Canonicalize range test idiom InstCombine converts range tests of the form (X > C1 && X < C2) or (X < C1 \|\| X > C2) into checks of the form (X + C3 < C4) or (X + C3 > C4). It is possible to express all range tests in either of these forms (with different choices of constants), but currently neither of them is considered canonical. We may have equivalent range tests using either ult or ugt. This proposes to canonicalize all range tests to use ult. An alternative would be to canonicalize to either ult or ugt depending on the specific constants involved -- e.g. in practice we currently generate ult for && style ranges and ugt for \|\| style ranges when going through the insertRangeTest() helper. In fact, the "clamp like" fold was relying on this, which is why I had to tweak it to not assume whether inversion is needed based on just the predicate. Proof: https://alive2.llvm.org/ce/z/_SP_rQ Differential Revision: https://reviews.llvm.org/D113366	2021-11-08 21:15:46 +01:00
Adrian Prantl	8bd8dd16e2	Extend obj2yaml to optionally preserve raw __LINKEDIT/__DATA segments. I am planning to upstream MachOObjectFile code to support Darwin chained fixups. In order to test the new parser features we need a way to produce correct (and incorrect) chained fixups. Right now the only tool that can produce them is the Darwin linker. To avoid having to check in binary files, this patch allows obj2yaml to print a hexdump of the raw LINKEDIT and DATA segment, which both allows to bootstrap the parser and enables us to easily create malformed inputs to test error handling in the parser. This patch adds two new options to obj2yaml: -raw-data-segment -raw-linkedit-segment Differential Revision: https://reviews.llvm.org/D113234	2021-11-08 11:30:12 -08:00
Florian Hahn	e3bfb6a146	[VPlan] Make sure recurrence splice is not inserted between phis. All phi-like recipes should be at the beginning of a VPBasicBlock with no other recipes in between. Ensure that the recurrence-splicing recipe is not added between phi-like recipes, but after them. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111301	2021-11-08 17:42:32 +00:00
Craig Topper	304edbb553	[RISCV] SMUL_LOHI/UMUL_LOHI should expand for RVV. These and MULHS/MULHU both default to Legal. Targets need to set the ones they don't support to Expand. I think MULHS/MULHU likely has priority in most places so this change probably isn't directly testable. I found it while looking at disabling MULHS/MULHU for nxvXi64 as required for Zve64x. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D113325	2021-11-08 09:38:36 -08:00
Kazu Hirata	3c06920cd1	[llvm] Use make_early_inc_range (NFC)	2021-11-08 09:09:39 -08:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
Joe Nash	79f52af4cd	[AMDGPU] Make getInstSizeInBytes more generic NFC. This check mainly handles size affecting literals. Make it check all explicit operands instead of a few by name. Also make the isLiteral check handle the KIMM operands, see https://reviews.llvm.org/D111067 Change-Id: I1a362d55b2a10f5c74d445272e8b7829a8b77597 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D113318 Change-Id: Ie6c688f30a71e0335d1c6dd1ff65019bd7ce684e	2021-11-08 10:34:49 -05:00
Anton Afanasyev	ce4fa93db8	[SCCP] Tune cast instruction handling for overdefined operand Extended value is known to be inside range smaller than full one. Prevent SCCP to mark such value as overdefined. Fixes PR52253 Differential Revision: https://reviews.llvm.org/D112721	2021-11-08 18:34:30 +03:00
Simon Pilgrim	7f32edea23	[X86] combineMulToPMADDWD - use ComputeMinSignedBits(). NFCI. Use ComputeMinSignedBits() to ensure the mul source operands at least sign-extend down from the bottom 16 bits. This will make it easier if/when we try to support handling of source types larger than 32-bits.	2021-11-08 15:28:31 +00:00
David Sherwood	c63b0f471b	[NFC][LoopVectorize] Make the createStepForVF interface more caller-friendly The common use case for calling createStepForVF is currently something like: Value Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF); and it makes more sense to reduce overall lines of code and change the function to let it create the constant instead. With my patch this becomes: Value Step = createStepForVF(Builder, Ty, VF, UF); and the ConstantInt is created instead createStepForVF. A side-effect of this is that the code in createStepForVF is also becomes simpler. As part of this patch I've also replaced some calls to getRuntimeVF with calls to createStepForVF, i.e. getRuntimeVF(Builder, Count->getType(), VFactor * UFactor) -> createStepForVF(Builder, Count->getType(), VFactor, UFactor) because this feels semantically better. Differential Revision: https://reviews.llvm.org/D113122	2021-11-08 15:14:14 +00:00
Mindong Chen	495e258fd7	[AArch64][SVE] Add FP types to the supported SVE structure load/stores vector type list This adds FP type support to the SVE Container type list as a supplement to D112303. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113333	2021-11-08 22:29:08 +08:00
Simon Pilgrim	f059b04f7b	[DAG] Add SelectionDAG::ComputeMinSignedBits helper As suggested on D113371, this adds a wrapper to SelectionDAG::ComputeNumSignBits, similar to the llvm::ComputeMinSignedBits wrapper. I've included some usage, its not exhaustive, just the more obvious cases where the intention is obvious. Differential Revision: https://reviews.llvm.org/D113396	2021-11-08 14:12:45 +00:00
David Sherwood	8d38c24fb6	[SVE][CodeGen] Improve codegen for some FP insert_subvector cases When inserting an unpacked FP subvector into a packed vector we can simply cast the unpacked value into a packed value, since both types are legal for SVE. We can then use this as the input for the UZP instruction. This avoids us expanding the operation by going through the stack. Differential Revision: https://reviews.llvm.org/D113270	2021-11-08 13:45:55 +00:00
Anastasia Stulova	a10a69fe9c	[SPIR-V] Add SPIR-V triple and clang target info. Add new triple and target info for ‘spirv32’ and ‘spirv64’ and, thus, enabling clang (LLVM IR) code emission to SPIR-V target. The target for SPIR-V is mostly reused from SPIR by derivation from a common base class since IR output for SPIR-V is mostly the same as SPIR. Some refactoring are made accordingly. Added and updated tests for parts that are different between SPIR and SPIR-V. Patch by linjamaki (Henry Linjamäki)! Differential Revision: https://reviews.llvm.org/D109144	2021-11-08 13:34:10 +00:00
Dmitry Makogon	8d4eba6c0d	Revert "[IndVars] Pass TTI to replaceCongruentIVs" This reverts commit `db289340c8`. The patch caused 2 crashes with expensive checks enabled.	2021-11-08 19:35:14 +07:00
Matt	4a59694ba1	[AArch64][SVE] Combine FADD and FMUL aarch64 intrinsics to FMLA This is a refinement to the work in https://reviews.llvm.org/D111638 Fold (fadd p a (fmul p b c)) into (fma p a b c) Differential Revision: https://reviews.llvm.org/D113095	2021-11-08 12:22:38 +00:00
Dmitry Makogon	db289340c8	[IndVars] Pass TTI to replaceCongruentIVs In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'. This function optionally takes a TTI argument to be able to replace narrow IVs uses with truncates of the widest one. For some reason the TTI wasn't passed to the function, so it couldn't perform such transform. This patch fixes it. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D113024	2021-11-08 19:20:53 +07:00
Simon Pilgrim	f60d3ec0c7	[DAG] Add BuildVectorSDNode::getConstantRawBits helper We have several places where we need to extract the raw bits data from a BUILD_VECTOR node, so consolidate this to a single helper function that handles Undefs and Integer/FP constants, including implicit truncation. This should make it easier to extend D113202 to handle more constant folding of bitcasted constant data. Differential Revision: https://reviews.llvm.org/D113351	2021-11-08 12:07:38 +00:00
Simon Moll	c2b91eef27	[VE] default to integrated asm in AsmInfo VE integrated asm has been the default in Clang. Also use the default setting for integrated asm in the backend. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D113384	2021-11-08 11:58:29 +01:00
David Green	a982940eb5	[AArch64] Combine fptoi.sat(fmul) to fixed point cvtf We already have patterns for fptosi and fptoui plus fmul to fixed point convert, this adds equivalent patterns for fptosi.sat and fptoui.sat, which should apply equally well for the legal saturating variants. Differential Revision: https://reviews.llvm.org/D113199	2021-11-08 10:07:34 +00:00
David Sherwood	c42bb30b9e	[LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor we bail out if the main vector loop uses a scalable VF. This patch adds support for generating epilogue vector loops using a fixed-width VF when the main vector loop uses a scalable VF. I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor so that we convert the scalable VF into a fixed-width VF and do profitability checks on that instead. In addition, since the scalable and fixed-width VFs live in different VPlans that means I had to change the calls to LVP.hasPlanWithVFs so that we only pass in the fixed-width VF. New tests added here: Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll Differential Revision: https://reviews.llvm.org/D109432	2021-11-08 09:41:13 +00:00
Qiu Chaofan	9b5e2b5261	[PowerPC] Implement basic macro fusion in Power10 Including basic fusion types around arithmetic and logical instructions. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111693	2021-11-08 17:23:56 +08:00
Andrew Wei	bf3784b882	[AArch64] Canonicalize X(Y+1) or X(1-Y) to madd/msub Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern Reviewed By: dmgreen, sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D111862	2021-11-08 16:49:31 +08:00
skc7	a0633f5ccb	[AMDGPU] Test Commit. NFC Reviewed By: hsmhsm Differential Revision: https://reviews.llvm.org/D113379	2021-11-08 07:09:09 +00:00
Ben Shi	e32cf690df	[RISCV] Optimize (add (mul r, c0), c1) Optimize (add (mul x, c0), c1) -> (add (mul (add x, c1/c0+1), c0), c1%c0-c0), if c1/c0+1 and c1%c0-c0 are simm12, while c1 is not. Optimize (add (mul x, c0), c1) -> (add (mul (add x, c1/c0-1), c0), c1%c0+c0), if c1/c0-1 and c1%c0+c0 are simm12, while c1 is not. Reviewed By: craig.topper, asb Differential Revision: https://reviews.llvm.org/D111141	2021-11-08 02:58:25 +00:00
Chen Zheng	7c6f5950f0	[PowerPC] comment for different input register classes; nfc Add comments to explain why XXPERMDIs and XXPERMDI have different input register classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI. This addresses the comments in abandoned patch D113178, we keep using `f0` instead of using `vs0` for XXPERMDIs on purpose.	2021-11-08 02:21:30 +00:00
Zi Xuan Wu	4fb282fec5	[CSKY] Add CSKY 16-bit instruction format and encoding CSKY is a ARCH which supports mixture of 16-bit and 32-bit instructions natively, and there is not an indivual predictor or feature to enable/disable 16-bit instruction. So I think it's better to add 16-bit instruction early, and naturally to use 16-bit and 32-bit instructions. Differential Revision: https://reviews.llvm.org/D112919	2021-11-08 10:02:15 +08:00
Chen Zheng	50acbbe3cd	[AsmPrinter][ORE] use correct opcode name Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D113173	2021-11-08 01:51:24 +00:00
Kazu Hirata	0d182d9d1e	[Transforms] Use make_early_inc_range (NFC)	2021-11-07 17:03:15 -08:00
Simon Pilgrim	55e4cd8485	[X86][AVX2] Recognise 256-bit truncation shuffles and mask 256-bit source For v8i16 shuffle patterns that are lowered with AND+PACKUS, check to see if the sources are from a 256-bit vector and perform the masking using BLENDW at the 256-bit level. With the test changes we can see more examples of duplicate XMM/YMM zero vectors (PR26018) :(	2021-11-07 21:24:55 +00:00
Nikita Popov	2060895c9c	[ConstantRange] Add exact union/intersect (NFC) For some optimizations on comparisons it's necessary that the union/intersect is exact and not a superset. Add methods that return Optional<ConstantRange> only if the result is exact. For the sake of simplicity this is implemented by comparing the subset and superset approximations for now, but it should be possible to do this more directly, as unionWith() and intersectWith() already distinguish the cases where the result is imprecise for the preferred range type functionality.	2021-11-07 21:46:06 +01:00
Nikita Popov	cf71a5ea8f	[ConstantRange] Support zero size in isSizeLargerThan() From an API perspective, it does not make a lot of sense that 0 is not a valid argument to this function. Add the exact check needed to support it.	2021-11-07 21:22:45 +01:00
Nikita Popov	a8c318b50e	[BasicAA] Use index size instead of pointer size When accumulating the GEP offset in BasicAA, we should use the pointer index size rather than the pointer size. Differential Revision: https://reviews.llvm.org/D112370	2021-11-07 18:56:11 +01:00
Kazu Hirata	aee86f9b6c	[AMDGPU] Remove unused declaration selectSMRD (NFC) The function body proper was removed on Feb 20, 2019 in commit `79b5c3842b`.	2021-11-07 09:53:18 -08:00
Kazu Hirata	41ef3187e0	[ARM, X86] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-07 09:53:16 -08:00
Simon Pilgrim	f057756a1a	[SLP] Fix Wdocumentation warning - remove \returns from void function. NFC.	2021-11-07 15:08:39 +00:00
Simon Pilgrim	d391e4fe84	[X86] Update RET/LRET instruction to use the same naming convention as IRET (PR36876). NFC Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth. Helps prevent future scheduler model mismatches like those that were only addressed in D44687. Differential Revision: https://reviews.llvm.org/D113302	2021-11-07 15:06:54 +00:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Simon Pilgrim	b5ef56f0bc	[X86][AVX] Add missing X86ISD::VBROADCAST(v4f32 -> v8f32) isel pattern for AVX1 targets D109434 addressed the v2f64 -> v4f64 case, an internal test has found an equivalent crash for the v4f32 -> v8f32 case.	2021-11-07 12:59:35 +00:00
Simon Pilgrim	0ff1edeeec	[DAG] SimplifyVBinOp - replace FoldConstantVectorArithmetic with FoldConstantArithmetic Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.	2021-11-07 12:11:46 +00:00
Kazu Hirata	22e21da47d	[WebAssembly] Remove unused declaration SelectExternRefAddr (NFC)	2021-11-06 19:31:22 -07:00
Kazu Hirata	e4bab21848	[AMDGPU] Use MachineBasicBlock::{predecessors,successors} (NFC)	2021-11-06 19:31:20 -07:00
Kazu Hirata	843d1eda18	[llvm] Use llvm::reverse (NFC)	2021-11-06 19:31:18 -07:00
Fangrui Song	d9e2c8f54d	[yaml2obj][COFF] Make some PEHeader fields optional This makes it easy to write tests where the irrelevant fields are not needed.	2021-11-06 16:39:59 -07:00
Nikita Popov	9f0194be45	[ConstantRange] Add getEquivalentICmp() variant with offset (NFCI) Add a variant of getEquivalentICmp() that produces an optional offset. This allows us to create an equivalent icmp for all ranges. Use this in the with.overflow folding code, which was doing this adjustment separately -- this clarifies that the fold will indeed always apply.	2021-11-06 21:59:45 +01:00
Kazu Hirata	cefc01fa65	[X86] Simplify a call to MachineBasicBlock::erase (NFC)	2021-11-06 13:08:25 -07:00
Kazu Hirata	815e8b5a20	[Hexagon] Remove an extraneous variable (NFC)	2021-11-06 13:08:23 -07:00
Kazu Hirata	14d656b3d8	[Target] Use llvm::reverse (NFC)	2021-11-06 13:08:21 -07:00
Nikita Popov	e3cec17b2d	[InstSimplify] Remove incorrect icmp of gep fold (PR52429) As described in https://bugs.llvm.org/show_bug.cgi?id=52429 this fold is incorrect, because inbounds only guarantees that the pointers don't wrap in the unsigned space: It is possible that the sign boundary is crossed by an object. I'm dropping the fold entirely rather than adjusting it, because computePointerICmp() fully subsumes it (just with correct predicate handling). Differential Revision: https://reviews.llvm.org/D113343	2021-11-06 21:03:21 +01:00
Nikita Popov	f8627877a9	[SCEV] Make eraseValueFromMap() private (NFC) The public API for this functionality is forgetValue(). There was only one call from LoopVectorize, which was directly next to a forgetValue() call and as such redundant.	2021-11-06 17:14:02 +01:00
Roman Lebedev	a30ec4778a	[TTI][CostModel] `getUserCost()`: recognize replication shuffles and query their cost This finally creates proper test coverage for replication shuffles, that are used by LV for conditional loads, and will allow to add proper costmodel at least for AVX512. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113324	2021-11-06 16:45:15 +03:00
Roman Lebedev	f8efc5c0ac	[NFC][TTI] Add/extract `getReplicationShuffleCost()` method, deduplicate it's implementations Hiding it in `getInterleavedMemoryOpCost()` is problematic for a number of reasons, including testability and reuse, let's do better. In a followup `getUserCost()` will be taught to use to to estimate the mask costs, which will allow for better cost model tests for it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D113313	2021-11-06 16:45:15 +03:00
Sanjay Patel	39c4c7d391	[DAGCombiner] remove vselect fold that was accidentally added This diff snuck into the unrelated: `025a2f73a3` It's a suggested follow-up for D113212, but I need to add test coverage first.	2021-11-06 09:34:30 -04:00
Sanjay Patel	83c2fb9f66	[InstCombine] match usub.sat from umax intrinsic umax(X, Op1) - Op1 --> usub.sat(X, Op1) https://alive2.llvm.org/ce/z/HpcGiJ This happens in 2 or more steps with an icmp-select idiom instead of an intrinsic. This is another step towards canonicalization of the min/max intrinsics. See: D98152	2021-11-06 08:32:52 -04:00
Sanjay Patel	025a2f73a3	[InstCombine] add tests for umax with sub; NFC	2021-11-06 08:32:52 -04:00
David Blaikie	0a5c26f2ef	DebugInfo: Simplified Template Names: drop unneeded space in arrays Matching a recent clang change I've made, now 'int[3]' is formatted without the space between the type and array bound. This commit updates libDebugInfoDWARF/llvm-dwarfdump to match that formatting.	2021-11-05 22:50:57 -07:00
Bin Cheng	54d891a7d5	[RISCV]: Fix typo by abstracting VWholeLoad* classes This patch abstracts VWholeLoad* classes into VWholeLoadN, simplifies existing code as well as fixes a typo. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109319	2021-11-06 10:48:03 +08:00
Bin Cheng	d488f1fff2	[RISCV][NFC]: Refactor classes for load/store instructions of RVV This patch refactors classes for load/store of V extension by: - Introduce new class for VUnitStrideLoadFF and VUnitStrideSegmentLoadFF so that uses of L/SUMOP* are not spread around different places. - Reorder classes for Unit-Stride load/store in line with table describing lumop/sumop in riscv-v-spec.pdf. Reviewed By: HsiangKai, craig.topper Differential Revision: https://reviews.llvm.org/D109318	2021-11-06 10:48:03 +08:00
Kazu Hirata	87e53a0ad8	[llvm] Use make_early_inc_range (NFC)	2021-11-05 19:39:07 -07:00
David Blaikie	f57d0e2726	DWARF Simplified Template Names: Narrow down the handling for operator overloads Actually we can, for now, remove the explicit "operator" handling entirely - since clang currently won't try to flag any of these as rebuildable. That seems like a reasonable state for now, but it could be narrowed down to only apply to conversion operators, most likely - but would need more nuance for op> and op>> since they would be incorrectly flagged as already having their template arguments (due to the trailing '>').	2021-11-05 15:41:56 -07:00
Philip Reames	d24a0e8857	[SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic The basic idea here is that given a zero extended narrow IV, we can prove the inner IV to be NUW if we can prove there's a value the inner IV must take before overflow which must exit the loop. Differential Revision: https://reviews.llvm.org/D109457	2021-11-05 15:36:47 -07:00
Scott Linder	f82bdf0fcc	[NFC][Verifier] Remove redundant Module parameters These `M` parameters shadow the `M` member in `VerifierSupport`, and both always refer to the same module. Eliminate the redundant parameters and always use the member. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D106474	2021-11-05 21:30:02 +00:00
Jay Foad	bdaa181007	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 21:20:30 +00:00
Roman Lebedev	a5cd27880a	[IR] Improve member `ShuffleVectorInst::isReplicationMask()` When we have an actual shuffle, we can impose the additional restriction that the mask replicates the elements of the first operand, so we know the replication factor as a ratio of output and op0 vector sizes.	2021-11-06 00:09:27 +03:00
Martin Storsjö	86c01b1bc6	[DebugInfo] [PDB] Force injected source paths to use backslashes This fixes lld/COFF/pdb-natvis.test (which only is run on Windows) when using paths with forward slashes on Windows. Differential Revision: https://reviews.llvm.org/D113265	2021-11-05 21:50:42 +02:00
Sanjay Patel	7e30404c3b	[DAGCombiner] add fold for vselect based on mask of signbit, part 2 This is the 'or' sibling for the fold added with: D113212 https://alive2.llvm.org/ce/z/tgnp7K Note that neither of these transforms is poison-safe, but it does not seem to matter at this level. We have had the scalar version of D113212 for a long time, so this is just making optimizer behavior consistent. We do not have the scalar version of this fold, however, so that is another follow-up.	2021-11-05 15:02:12 -04:00
Yonghong Song	3466e00716	Reland "[Attr] support btf_type_tag attribute" This is to revert commit `f95bd18b5f` (Revert "[Attr] support btf_type_tag attribute") plus a bug fix. Previous change failed to handle cases like below: $ cat reduced.c void a(*); void a() {} $ clang -c reduced.c -O2 -g In such cases, during clang IR generation, for function a(), CGCodeGen has numParams = 1 for FunctionType. But for FunctionTypeLoc we have FuncTypeLoc.NumParams = 0. By using FunctionType.numParams as the bound to access FuncTypeLoc params, a random crash is triggered. The bug fix is to check against FuncTypeLoc.NumParams before accessing FuncTypeLoc.getParam(Idx). Differential Revision: https://reviews.llvm.org/D111199	2021-11-05 11:25:17 -07:00
Florian Hahn	f64580f8d2	[AArch64][GISel] Optimize 8 and 16 bit variants of uaddo. Try simplify G_UADDO with 8 or 16 bit operands to wide G_ADD and TBNZ if result is only used in the no-overflow case. It is restricted to cases where we know that the high-bits of the operands are 0. If there's an overflow, then the the 9th or 17th bit must be set, which can be checked using TBNZ. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D111888	2021-11-05 19:11:15 +01:00
Shao-Ce SUN	5c3d7184b4	[RISCV] Support Zfhmin extension According to RISC-V Unprivileged ISA 15.6. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D111866	2021-11-06 01:41:02 +08:00
Kazu Hirata	2c4ba3e9d3	[Target] Use make_early_inc_range (NFC)	2021-11-05 09:14:32 -07:00
Whitney Tsang	93421108d2	Add NoOpLoopNestPass and LOOPNEST_PASS macro Having a NoOpLoopNestPass can ensure that only outermost loop is invoked for a LoopNestPass with a lit test. There are some existing passes that are implemented as LoopNestPass, but they are still using LOOP_PASS macro. It would be easier to identify LoopNestPasses with a LOOPNEST_PASS macro. Differential Revision: https://reviews.llvm.org/D113185	2021-11-05 16:11:48 +00:00
Michael Liao	af2ae2cf42	[BranchRelaxation] Fix warning on unused variable. NFC.	2021-11-05 11:18:27 -04:00
David Green	08056e1888	[InstCombine] Generalize sadd.sat combine to compute sign bits. There is a combine in instcombine to transform a saturated add/sub into a saddsat/ssubsat, currently handling inputs which are both sign extended (https://alive2.llvm.org/ce/z/68qpTn). This can generalize to, for example ashr of at least the bitwidth (https://alive2.llvm.org/ce/z/4TFyX- and https://alive2.llvm.org/ce/z/qDWzFs for example). Which means it generalizes further to "the number of sign bits", needing to be enough to truncate to the size of the saturate. (An example using `or` for instance: https://alive2.llvm.org/ce/z/EI_h_A). So this patch makes use of ComputeNumSignBits (with the newly added ComputeMinSignedBits) in matchSAddSubSat to generalize the fold to any inputs with enough sign bits known, truncating the inputs to the new size of the saturate. Differential Revision: https://reviews.llvm.org/D112298	2021-11-05 15:05:09 +00:00
David Green	61225c0818	[ValueTracking][InstCombine] Introduce and use ComputeMinSignedBits This introduces a new ComputeMinSignedBits method for ValueTracking that returns the BitWidth - SignBits + 1 from ComputeSignBits, and represents the minimum bit size for the value as a signed integer. Similar to the existing APInt::getMinSignedBits method, this can make some of the reasoning around ComputeSignBits more natural. See https://reviews.llvm.org/D112298	2021-11-05 14:41:37 +00:00
Simon Pilgrim	9e6506299a	[DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic. We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code. Differential Revision: https://reviews.llvm.org/D113276	2021-11-05 14:36:17 +00:00
Roman Lebedev	ad617183bb	[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: mask is i8 not i1 Even though AVX512's masked mem ops (unlike AVX1/2) have a mask that is a `VF x i1`, replication of said masks happens after promotion of it to `VF x i8`, so we should use `i8`, not `i1`, when calculating the cost of mask replication.	2021-11-05 17:27:02 +03:00
Sanjay Patel	4fc1fc4005	[DAGCombiner] add fold for vselect based on mask of signbit (X s< 0) ? Y : 0 --> (X s>> BW-1) & Y We canonicalize to the icmp+select form in IR, and we already have this fold for scalar select in SDAG, so I think it's an oversight that we don't have the fold for vectors. It seems neutral for AArch64 and saves some instructions on x86. Whether we should also have the sibling folds for the inverse condition or all-ones true value may depend on target-specific factors such as whether there's an "and-not" instruction. Differential Revision: https://reviews.llvm.org/D113212	2021-11-05 10:06:16 -04:00
Roman Lebedev	01d8759ac9	[IR][ShuffleVector] Introduce `isReplicationMask()` matcher Avid readers of this saga may recall from previous installments, that replication mask replicates (lol) each of the `VF` elements in a vector `ReplicationFactor` times. For example, the mask for `ReplicationFactor=3` and `VF=4` is: `<0,0,0,1,1,1,2,2,2,3,3,3>`. More importantly, replication mask is used by LoopVectorizer when using masked interleaved memory operations. As discussed in previous installments, while it is used by LV, and we seem to support masked interleaved memory operations on X86, it's support in cost model leaves a lot to be desired: until basically yesterday even for AVX512 we had no cost model for it. As it has been witnessed in the recent AVX2 `X86TTIImpl::getInterleavedMemoryOpCost()` costmodel patches, while it is hard-enough to query the cost of a particular assembly sequence [from llvm-mca], afterwards the check lines LV costmodel tests must be updated manually. This is, at the very least, boring. Okay, now we have decent costmodel coverage for interleaving shuffles, but now basically the same mind-killing sequence has to be performed for replication mask. I think we can improve at least the second half of the problem, by teaching the `TargetTransformInfoImplCRTPBase::getUserCost()` to recognize `Instruction::ShuffleVector` that are repetition masks, adding exhaustive test coverage using `-cost-model -analyze` + `utils/update_analyze_test_checks.py` This way we can have good exhaustive coverage for cost model, and only basic coverage for the LV costmodel. This patch adds precise undef-aware `isReplicationMask()`, with exhaustive test coverage. * `InstructionsTest.ShuffleMaskIsReplicationMask` shows that it correctly detects all the known masks. * `InstructionsTest.ShuffleMaskIsReplicationMask_undef` shows that replacing some mask elements in a known replication mask still allows us to recognize it as a replication mask. Note, with enough undef elts, we may detect a different tuple. * `InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness` shows that if we detected the replication mask with given params, then if we actually generate a true replication mask with said params, it matches element-wise ignoring undef mask elements. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D113214	2021-11-05 16:53:47 +03:00
David Sherwood	657a1dcd0d	[AArch64] Add target DAG combine for UUNPKHI/LO When created a UUNPKLO/HI node with an undef input then the output should also be undef. I've added a target DAG combine function to ensure we avoid creating an unnecessary uunpklo/hi instruction. Differential Revision: https://reviews.llvm.org/D113266	2021-11-05 13:50:59 +00:00
Quinn Pham	c71fbdd87b	[NFC] Inclusive language: Remove instances of master in URLs [NFC] This patch fixes URLs containing "master". Old URLs were either broken or redirecting to the new URL. Reviewed By: #libc, ldionne, mehdi_amini Differential Revision: https://reviews.llvm.org/D113186	2021-11-05 08:48:41 -05:00
Simon Pilgrim	f2703c3c33	[DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC. NumOps represents the number of elements for vector constant folding, rename this NumElts so in future we can the consistently use NumOps to represent the number of operands of the opcode. Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.	2021-11-05 13:32:34 +00:00
Jingu Kang	a7b1872593	[AArch64] Fix a bug from a pattern for uaddv(uaddlp(x)) ==> uaddlv A pattern has selected wrong uaddlv MI. It should be as below. uaddv(uaddlp(v8i8)) ==> uaddlv(v8i8) Differential Revision: https://reviews.llvm.org/D113263	2021-11-05 12:48:18 +00:00
Alfredo Dal'Ava Junior	1cb9f37a17	[FreeBSD] Do not mark __stack_chk_guard as dso_local This symbol is defined in libc.so so it is definitely not DSO-Local. Marking it as such causes problems on some platforms (such as PowerPC). Differential revision: https://reviews.llvm.org/D109090	2021-11-05 07:29:50 -05:00
Simon Pilgrim	c1e7911c3b	[DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y)) To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands. The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support. One of the yak shaving fixes for D113192.... Differential Revision: https://reviews.llvm.org/D113202	2021-11-05 12:00:59 +00:00
Simon Pilgrim	5e9ac7c0a5	[X86] Enable v32i16 rotate lowering on non-BWI targets Fixes one of the regressions in D113192	2021-11-05 11:00:31 +00:00
Chen Zheng	fed2889f07	[PowerPC] use correct selection for v16i8/v8i16 splat load Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D113236	2021-11-05 10:04:03 +00:00
Jay Foad	0321bd64e6	Revert "[TwoAddressInstructionPass] Update existing physreg live intervals" This reverts commit `ec0e1e88d2`. It was pushed by mistake.	2021-11-05 09:54:26 +00:00
Jay Foad	c93bf53a3e	[AMDGPU] NFC formatting fixes in SIMemoryLegalizer	2021-11-05 09:10:24 +00:00
Jay Foad	ec0e1e88d2	[TwoAddressInstructionPass] Update existing physreg live intervals In TwoAddressInstructionPass::processTiedPairs with -early-live-intervals, update any preexisting physreg live intervals, as well as virtreg live intervals. By default (without -precompute-phys-liveness) physreg live intervals only exist for registers that are live-in to some basic block. Differential Revision: https://reviews.llvm.org/D113191	2021-11-05 09:10:24 +00:00
Qiu Chaofan	5fd406e254	[PowerPC] Add intrinsic to convert between ppc_fp128 and fp128 ppc_fp128 and fp128 are both 128-bit floating point types. However, we can't do conversion between them now, since trunc/ext are not allowed for same-size fp types. This patch adds two new intrinsics: llvm.ppc.convert.f128.to.ppcf128 and llvm.convert.ppcf128.to.f128, to support such conversion. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D109421	2021-11-05 16:58:38 +08:00
Martin Storsjö	df0ba47c36	[Support] Allow configuring the preferred type of slashes on Windows Default to preferring forward slashes when built for MinGW, as many usecases, when e.g. Clang is used as a drop-in replacement for GCC, requires the compiler to output paths with forward slashes. Not all tests pass yet, if configuring to prefer forward slashes though. Differential Revision: https://reviews.llvm.org/D112787	2021-11-05 10:42:02 +02:00
Martin Storsjö	f4d83c56c9	[Support] [Windows] Convert paths to the preferred form This normalizes most paths (except ones input from the user as command line arguments) into the preferred form, if `real_style()` evaluates to `windows_forward`. Differential Revision: https://reviews.llvm.org/D111880	2021-11-05 10:41:51 +02:00
Martin Storsjö	a8b54834a1	[Support] Add a new path style for Windows with forward slashes This behaves just like the regular Windows style, with both separator forms accepted, but with get_separator() returning forward slashes. Add a more descriptive name for the existing style, keeping the old name around as an alias initially. Add a new function `make_preferred()` (like the C++17 `std::filesystem::path` function with the same name), which converts windows paths to the preferred separator form (while this one works on any platform and takes a `path::Style` argument). Contrary to `native()` (just like `make_preferred()` in `std::filesystem`), this doesn't do anything at all on Posix, it doesn't try to reinterpret backslashes into forward slashes there. Differential Revision: https://reviews.llvm.org/D111879	2021-11-05 10:41:51 +02:00
Martin Storsjö	f95bd18b5f	Revert "[Attr] support btf_type_tag attribute" This reverts commits `737e4216c5` and `ce7ac9e66a`. After those commits, the compiler can crash with a reduced testcase like this: $ cat reduced.c void a(*); void a() {} $ clang -c reduced.c -O2 -g	2021-11-05 10:36:40 +02:00
Chen Zheng	9695027066	[PowerPC] address post-commit comments for D106555; NFC Address namanjai post commit comments.	2021-11-05 05:30:53 +00:00
Vitaly Buka	1caabbef8e	[OpaquePtr] Fix initialization-order-fiasco Asan detects it after D112732.	2021-11-04 19:29:06 -07:00
Shengchen Kan	be08e452f3	[X86][MS-InlineAsm] Add constraint m for memory access w/ global var Constraint `m` should be used when the address of a variable is passed as a value. And the constraint is missing for MS inline assembly when sth is written to the address of the variable. The missing would cause FE delete the definition of the static varible, and then result in "undefined reference to xxx" issue. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D113096	2021-11-05 09:11:41 +08:00
Kirill Stoimenov	3f1aca58df	[ASan] Added stack safety support in address sanitizer. Added and implemented -asan-use-stack-safety flag, which control if ASan would use the Stack Safety results to emit less code for operations which are marked as 'safe' by the static analysis. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D112098	2021-11-04 17:22:31 -07:00
Arthur Eubanks	7175886a0f	[NewPM] Make eager analysis invalidation per-adaptor Follow-up change to D111575. We don't need eager invalidation on every adaptor. Most notably, adaptors running passes that use very few analyses, or passes that purely invalidate specific analyses. Also allow testing of this via a pipeline string "function<eager-inv>()". The compile time/memory impact of this is very comparable to D111575. https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D113196	2021-11-04 17:16:11 -07:00
Yonghong Song	41860e602a	BPF: Support btf_type_tag attribute A new kind BTF_KIND_TYPE_TAG is defined. The tags associated with a pointer type are emitted in their IR order as modifiers. For example, for the following declaration: int __tag1 * __tag1 __tag2 *g; The BTF type chain will look like VAR(g) -> __tag1 --> __tag2 -> pointer -> __tag1 -> pointer -> int In the above "->" means BTF CommonType.Type which indicates the point-to type. Differential Revision: https://reviews.llvm.org/D113222	2021-11-04 17:01:36 -07:00
Philip Reames	dec15d9a0a	[indvars] Use loop guards when canonicalizing exit conditions This extends the logic in canonicalizeExitConditions to use loop guards to specialize the SCEV of the loop invariant term before quering it's range.	2021-11-04 15:23:34 -07:00
Arthur Eubanks	13317286f8	[NewPM] Use the default AA pipeline by default We almost always want to use the default AA pipeline. It's very easy for users of PassBuilder to forget to customize the AAManager to use the default AA pipeline (for example, the NewPM C API forgets to do this). If somebody wants a custom AA pipeline, similar to what is being done now with the default AA pipeline registration, they can FAM.registerPass([&] { return std::move(MyAA); }); before calling PB.registerFunctionAnalyses(FAM); For example, LTOBackend.cpp and NewPMDriver.cpp do this. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D113210	2021-11-04 15:10:34 -07:00
Ben Langmuir	a2639dcbe6	[ORC] Add a utility for adding missing "self" relocations to a Symbol If a tool wants to introduce new indirections via stubs at link-time in ORC, it can cause fidelity issues around the address of the function if some references to the function do not have relocations. This is known to happen inside the body of the function itself on x86_64 for example, where a PC-relative address is formed, but without a relocation. ``` _foo: leaq -7(%rip), %rax ## form pointer to '_foo' without relocation _bar: leaq (%rip), %rax ## uses X86_64_RELOC_SIGNED to '_foo' ``` The consequence of introducing a stub for such a function at link time is that if it forms a pointer to itself without relocation, it will not have the same value as a pointer from outside the function. If the function pointer is used as a key, this can cause problems. This utility provides best-effort support for adding such missing relocations using MCDisassembler and MCInstrAnalysis to identify the problematic instructions. Currently it is only implemented for x86_64. Note: the related issue with call/jump instructions is not handled here, only forming function pointers. rdar://83514317 Differential revision: https://reviews.llvm.org/D113038	2021-11-04 15:01:05 -07:00
David Blaikie	7cdd262351	DebugInfo: Fix incorrect line table lookup when resolving decl_file from a split unit Specifically in DWARFv5 the unit for the line table entry was correct but the context was incorrect - leading to looking up .debug_line_str in the dwp instead of the executable. (perhaps we could/should remove the context pointer entirely, and rely on the one in the unit... I might try that as a separate follow-up commit)	2021-11-04 14:54:27 -07:00
Philip Reames	c0d9bf2f6a	[indvars] Allow rotation (narrowing) of exit test when discovering trip count This relaxes the one-use requirement on the rotation transform specifically for the case where we know we're zexting an IV of the loop. This allows us to discover trip count information in SCEV, which seems worth a single extra loop invariant truncate. Honestly, I'd prefer if SCEV could just compute the trip count directly (e.g. D109457), but this unblocks practical benefit.	2021-11-04 14:49:24 -07:00
Yonghong Song	737e4216c5	[Attr] support btf_type_tag attribute This patch added clang codegen and llvm support for btf_type_tag support. Currently, btf_type_tag attribute info is preserved in DebugInfo IR only for pointer types associated with typedef, global variable and function declaration. Eventually, such information is emitted to dwarf. The following is an example: $ cat test.c #define __tag __attribute__((btf_type_tag("tag"))) int __tag g; $ clang -O2 -g -c test.c $ llvm-dwarfdump --debug-info test.o ... 0x0000001e: DW_TAG_variable DW_AT_name ("g") DW_AT_type (0x00000033 "int ") DW_AT_external (true) DW_AT_decl_file ("/home/yhs/test.c") DW_AT_decl_line (2) DW_AT_location (DW_OP_addr 0x0) 0x00000033: DW_TAG_pointer_type DW_AT_type (0x00000042 "int") 0x00000038: DW_TAG_LLVM_annotation DW_AT_name ("btf_type_tag") DW_AT_const_value ("tag") 0x00000041: NULL 0x00000042: DW_TAG_base_type DW_AT_name ("int") DW_AT_encoding (DW_ATE_signed) DW_AT_byte_size (0x04) 0x00000049: NULL Basically, a DW_TAG_LLVM_annotation tag will be inserted under DW_TAG_pointer_type tag if that pointer has a btf_type_tag associated with it. Differential Revision: https://reviews.llvm.org/D111199	2021-11-04 14:23:31 -07:00
Philip Reames	453fdebd48	[indvars] Extend canonicalizeExitConditions to inverted operands As discussed in the original reviews, but done in a follow on.	2021-11-04 14:20:37 -07:00
Thomas Symalla	76cbe62262	[AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values. This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers. Reviewed By: sebastian-ne Differential Revision: https://reviews.llvm.org/D111637	2021-11-04 21:50:18 +01:00
Noah Shutty	d788c44f5c	[Support] Improve Caching conformance with Support library behavior This diff makes several amendments to the local file caching mechanism which was migrated from ThinLTO to Support in rGe678c51177102845c93529d457b020f969125373 in response to follow-up discussion on that commit. Patch By: noajshu Differential Revision: https://reviews.llvm.org/D113080	2021-11-04 13:00:44 -07:00
David Green	091244023a	[ARM] Move VPTBlock pass after post-ra scheduling Currently when tail predicating loops, vpt blocks need to be created with the vctp predicate in case we need to revert to non-tail predicated form. This has the unfortunate side effect of severely hampering post-ra scheduling at times as the instructions are already stuck in vpt blocks, not allowed to be independently ordered. This patch addresses that by just moving the creation of VPT blocks later in the pipeline, after post-ra scheduling has been performed. This allows more optimal scheduling post-ra before the vpt blocks are created, leading to more optimal tail predicated loops. Differential Revision: https://reviews.llvm.org/D113094	2021-11-04 18:42:12 +00:00
Wouter van Oortmerssen	a320f877ce	[WebAssembly] Fix debug locations for ExplicitLocals pass This is a reworked version of the reverted patch: https://reviews.llvm.org/D112487 Note that a) it doesn't need the test changes anymore, and b) I checked at least locally it passes other.test_pthread_lsan_leak Differential Revision: https://reviews.llvm.org/D113208	2021-11-04 11:38:03 -07:00
Rahman Lavaee	f533ec37eb	Make the BBAddrMap struct binary-format-agnostic. The only binary-format-related field in the BBAddrMap structure is the function address (`Addr`), which will use uint64_t in 64B format and uint32_t in 32B format. This patch changes it to use uint64_t in both formats. This allows non-templated use of the struct, at the expense of a marginal additional size overhead for the 32-bit format. The size of the BB address map section does not change. Differential Revision: https://reviews.llvm.org/D112679	2021-11-04 10:27:24 -07:00
Zakk Chen	0649dfebba	[RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0. Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic. Reviewed By: frasercrmck, jrtc27, craig.topper Differential Revision: https://reviews.llvm.org/D111062	2021-11-04 10:08:01 -07:00
Kazu Hirata	2887117d2c	[Hexagon] Use make_early_inc_range (NFC)	2021-11-04 08:51:05 -07:00
Jamie Schmeiser	8720149d9b	Remove unused function from print-changed=dot-cfg code Summary: Remove unused function from print-changed=dot-cfg code to silence a gcc compiler warning. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: uabelho(Mikael Holmen) Differential Revision: https://reviews.llvm.org/D113188	2021-11-04 10:40:50 -04:00
Sander de Smalen	1ea4296208	[NFC] Remove from UnivariateLinearPolyBase::getValue(). This interface should not have existed in the first place, let alone be a public member. It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous. The interfaces to be used are either getFixedValue() or getKnownMinValue().	2021-11-04 14:32:08 +00:00
Sjoerd Meijer	3fd1902ad8	[FuncSpec] Enable it only with -O3 Function specialisation was running at all optimisation levels (if enabled on the command line, it is not on by default). That was an oversight and not something we want to do. Function specialisation duplicates functions when it triggers, so the backend is processing more functions/instructions resulting in compile-time increases, which seems more appropriate with -O3 and inline with GCC. Please note that since function specialisation is not enabled by default, this didn't require updating any pass manager tests. Differential Revision: https://reviews.llvm.org/D112129	2021-11-04 13:59:00 +00:00

1 2 3 4 5 ...

152407 Commits