llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	96400f179f	[VPlan] Record whether scalar IVs are needed in induction recipe. (NFC) This explicitly records whether a scalar IV is needed in the VPWidenIntOrFpInductionRecipe, to remove a dependence on the cost-model during its ::execute. It will also be used in D116123 to determine if a vector phi will be generated. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118167	2022-01-28 09:34:03 +00:00
Max Kazantsev	8599bb0f26	[InstCombine] Fold and-reduce idiom This patch introduces folding of and-reduce idiom and generates code that is easier to read and which is less costly in terms of icmp operations. The folding is ``` icmp eq (bitcast(icmp ne (lhs, rhs)), 0) ``` into ``` icmp eq(bitcast(lhs), bitcast(rhs)) ``` See PR53419. Differential Revision: https://reviews.llvm.org/D118317 Reviewed By: lebedev.ri, spatel	2022-01-28 11:20:08 +07:00
Ellis Hoag	11d3074267	[InstrProf] Add single byte coverage mode Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because * We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions * We use a single byte per function rather than 8 bytes per block The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code. When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%). [0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4 Reviewed By: kyulee Differential Revision: https://reviews.llvm.org/D116180	2022-01-27 17:38:55 -08:00
Vitaly Buka	bddc814b44	[msan] Copy origin of byval arguments Depends on D117278 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117285	2022-01-27 16:24:07 -08:00
Florian Hahn	9fd7a2e379	[ConstraintElimination] Use constraints with 0 or 1 coefficients. isConditionImplied is able to correctly handle 0 or 1 coefficients, so let it handle those cases, rather than skipping them.	2022-01-27 18:41:33 +00:00
Florian Hahn	258a0a3a55	[ConstraintElimination] Use simplified constraint for == 0. When checking x == 0, checking x u<= 0 is sufficient and simpler than x u>= 0 && x u<= 0. https://alive2.llvm.org/ce/z/btM7d3 ---------------------------------------- define i1 @src(i4 %a) { %0: %c = icmp eq i4 %a, 0 ret i1 %c } => define i1 @tgt(i4 %a) { %0: %c = icmp ule i4 %a, 0 ret i1 %c } Transformation seems to be correct!	2022-01-27 13:31:23 +00:00
Florian Hahn	a78ce48c37	[ConstraintElimination] Introduce struct to manage constraints. (NFC) This patch adds a struct to manage a list of constraints. It simplifies a follow-up change, that adds pre-conditions that must hold before a list of constraints can be used.	2022-01-27 12:40:09 +00:00
Nikita Popov	d839afe3f9	[InstCombine] Avoid pointer element type access in PointerReplacer This code replaces the address space of the pointers while keeping the element type. Use the appropriate helpers to make this work with opaque pointers.	2022-01-27 12:28:32 +01:00
Nikita Popov	648faa3b5d	[InstCombine] Mark element type access as non-opaque (NFC) Also make the function static to make it more obvious that it is only used in the one place.	2022-01-27 11:40:29 +01:00
Florian Hahn	bb5c1b0691	[LoopVersioning] Use IRBuilder for OR simplification.	2022-01-27 09:55:51 +00:00
Nikita Popov	2c736f666b	[InstCombine] Skip GEP of bitcast transform with opaque pointers This transform is fundamentally incompatible with opaque pointers. Usually we would not hit it anyway because the bitcast is folded away earlier, but due to worklist order it might survive until here, so make sure we bail out explicitly.	2022-01-27 10:51:45 +01:00
Nikita Popov	b7179d9279	[InstCombine] Extract GEP of bitcast folds into separate function (NFC)	2022-01-27 10:48:00 +01:00
Nikita Popov	73cd8e29ad	[InstCombine] Skip PromoteCastOfAllocation() transform under opaque pointers I think this can't be hit anyway (because a ptr-to-ptr bitcast would get folded earlier), but in the interest of being explicit skip this transform for opaque pointers entirely.	2022-01-27 10:25:45 +01:00
Nikita Popov	8d992862a0	[InstCombine] Remove some pointer element type accesses One of these is guarded against opaque pointers, and the others were accessing the call function type in a rather convoluted way.	2022-01-27 10:15:35 +01:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Nikita Popov	de8867a0b6	[AMDGPUEmitPrintf] Don't require specific pointer element type Rather than checking for i8, simply add a bitcast to i8, so the appendString() code sees the expected type.	2022-01-26 16:16:32 +01:00
Nikita Popov	903c3d2863	[SCEVExpander] Always use i8 GEP for reused value offset We could keep the non-i8 GEP code for non-opaque pointers, but there's two reasons I'm dropping it: First, this actually appears to be dead code, at least it isn't hit in any of our tests. I expect that this is because we usually expand trip counts, and those are never pointers (anymore). Second, the non-i8 GEP was actually incorrect in multiple ways, because it used SCEV type sizes, which don't match DL type sizes (for pointers) and certainly don't match type alloc sizes (which is what GEPs actually use). As such, I'm simplifying the code to always use the i8 GEP code path if it does get hit.	2022-01-26 15:38:58 +01:00
Nikita Popov	03d0acc545	[DSE] Use helper for unwind check (NFCI) This should be no functional change, as the cases supported by the helper and the cases supported by DSE are currently the same, the code structure is just slightly different.	2022-01-26 14:08:08 +01:00
Nikita Popov	6b69985da4	[MemCpyOpt] Use helper for unwind check This extends support to byval arguments. It would be further extended to handle the case of non-captured noalias returns.	2022-01-26 12:43:31 +01:00
Benjamin Kramer	0776f6e04d	[LSV] Vectorize loads of vectors by turning it into a larger vector Use shufflevector to do the subvector extracts. This allows a lot more load merging on AMDGPU and also on NVPTX when <2 x half> is involved. Differential Revision: https://reviews.llvm.org/D117219	2022-01-26 11:38:41 +01:00
Nuno Lopes	24a49e99f3	[NewGVN] Fix phi-of-ops in the presence of memory read operations The phi-of-ops functionality has a function OpIsSafeForPHIOfOps to determine when it's safe to create the new phi. But this function only checks for the obvious dominator conditions and ignores memory. This patch takes the conservative approach and disables phi-of-ops whenever there's a load that doesn't dominate the phi, as its value may be affected by a store inside the loop. This can be improved later to check aliasing between the load/stores. Fixes https://llvm.org/PR53277 Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D117999	2022-01-26 10:19:18 +00:00
Nikita Popov	44cfc3a816	[LICM] Generalize unwinding check during scalar promotion This extracts a common isNotVisibleOnUnwind() helper into AliasAnalysis, which handles allocas, byval arguments and noalias calls. After D116998 this could also handle sret arguments. We have similar logic in DSE and MemCpyOpt, which will be switched to use this helper as well. The noalias call case is a bit different from the others, because it also requires that the object is not captured. The caller is responsible for doing the appropriate check. Differential Revision: https://reviews.llvm.org/D117000	2022-01-26 11:15:03 +01:00
Nikita Popov	bec4e865de	[SCEVExpander] Remove pointer element type access in assertion Assert directly on i8 rather than the element type of i8*.	2022-01-26 10:35:57 +01:00
Nikita Popov	9e7a2bfcf7	[OpenMPOpt] Add const qualifier (NFC) Make it clear that this large lambda does not modify the vector.	2022-01-26 10:35:57 +01:00
Nikita Popov	c82cb5d000	[AddressSanitizer] Avoid pointer element type accesses Determine masked load/store type based on the value operand and result types, rather than pointer element type.	2022-01-26 10:16:15 +01:00
Giorgis Georgakoudis	7cb4c26173	[OMPIRBuilder] Generate aggregate argument for parallel region outlined functions Summary: This patch modifies code generation in OpenMPIRBuilder to pass arguments to the parallel region outlined function in an aggregate (struct), besides the global_tid and bound_tid arguments. It depends on the updated CodeExtractor (see D96854) for support. It mirrors functionality of Clang codegen (see D102107). Differential Revision: https://reviews.llvm.org/D110114	2022-01-25 20:53:45 -05:00
Giorgis Georgakoudis	95b981ca2a	[CodeExtractor] Enable partial aggregate arguments Summary: Enable CodeExtractor to construct output functions that partially aggregate inputs/outputs in their argument list. A use case is the OMPIRBuilder to create outlined functions for parallel regions that aggregate in a struct the payload variables for the region while passing as scalars thread and bound identifiers. Differential Revision: https://reviews.llvm.org/D96854	2022-01-25 20:50:34 -05:00
Andrew Litteken	ba79295c48	[NFC][IROutliner] fix namespace and unused variable	2022-01-25 18:41:30 -06:00
Andrew Litteken	e8f4e41b6b	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region. We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow for non entry blocks within the extracted region to have predecessors, so there are not conflicts to handle with respect to predecessors no longer contained in the function. Recommit of `515eec3553` Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106997	2022-01-25 18:25:50 -06:00
Andrew Litteken	e50b217b4e	Revert "[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region." This reverts commit `515eec3553`. By mistake, commit message was not complete.	2022-01-25 18:24:19 -06:00
Andrew Litteken	515eec3553	[IRSim][IROutliner] Add support for outlining PHINodes with the rest of the region.	2022-01-25 18:20:10 -06:00
Andrew Litteken	9c2daf648c	Revert "[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions" This reverts commit `8de76bd569`. Reverting due to failure of different-intrinsics.ll on lld-x86_64-win buildbot.	2022-01-25 18:19:33 -06:00
Dávid Bolvanský	fe30370b00	Reland "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 )"	2022-01-26 01:11:06 +01:00
Andrew Litteken	8de76bd569	[IRSim][IROutliner] Allowing Intrinsic Calls to be Used in Similarity Matching and Outlined Regions Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match. This also adds an additional command line flag debug option to disable outlining intrinsics. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109450	2022-01-25 17:06:09 -06:00
Dávid Bolvanský	90f185c964	Revert "[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 )" This reverts commit `ceec438368`. Clang tests fail.	2022-01-25 23:13:46 +01:00
Dávid Bolvanský	ceec438368	[AlwaysInliner] Enable call site inlining to make flatten attribute working again (#53360 ) Problem: Migration to new PM broke flatten attribute. This is one use case why LLVM should support inlining call-site with alwaysinline. The flatten attribute is nowadays broken, so we should either land patch like this one or remove everything related to flatten attribute from Clang. Second use case is something like "per call site inlining intrinsics" to control inlining even more; mentioned in https://lists.llvm.org/pipermail/cfe-dev/2018-September/059232.html Fixes https://github.com/llvm/llvm-project/issues/53360 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D117965	2022-01-25 22:55:30 +01:00
Andrew Litteken	f5f377d1fc	[IRSim][IROutliner] Adding support for recognizing and outlining indirect function calls, and function calls with different names, but the same type The outliner currently requires that function calls not be indirect calls, and that the function name and function type match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions. There are also debugging flags added to enforce the old behaviors where indirect calls are not allowed, and to enforce the old rule that function call names must also match. Reviewers: paquette, jroelofs Differential Revision: https://reviews.llvm.org/D109448	2022-01-25 15:19:28 -06:00
Andrew Litteken	dcc3e728ca	[IROutliner] Allowing Phi Nodes in exit blocks In addition to having multiple exit locations, there can be multiple blocks leading to the same exit location, which results in a potential phi node. If we find that multiple blocks within the region branch to the same block outside the region, resulting in a phi node, the code extractor pulls this phi node into the function and uses it as an output. We make sure that this phi node is given an output slot, and that the two values are removed from the outputs if they are not used anywhere else outside of the region. Across the extracted regions, the phi nodes are combined into a single block for each potential output block, similar to the previous patch. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106995	2022-01-25 11:33:53 -06:00
Nikita Popov	2f02c7e1f2	[SanitizerCoverage] Avoid pointer element type access Use the load/store type instead.	2022-01-25 17:22:20 +01:00
Nikita Popov	98db33349b	[SLC] Fix pointer diff type in sprintf() optimization We should always be calculating a byte-wise difference here. Previously this calculated the pointer difference while taking the pointer element type into account, which is incorrect.	2022-01-25 15:22:56 +01:00
Nikita Popov	7cc3e141d7	[MemProf] Avoid pointer element type access Determine the masked load/store access type from the value type of the intrinsics, rather than the pointer element type. For cleanliness, include the access type in InterestingMemoryAccess.	2022-01-25 14:52:54 +01:00
Nikita Popov	6a008de82a	[Evaluator] Simplify handling of bitcasted calls When fetching the function, strip casts. When casting the result, use the call result type. Don't actually inspect the bitcast.	2022-01-25 14:19:04 +01:00
Nikita Popov	78e1f70220	[ObjCARCOpts] Use standard non-terminator unreachable pattern This is what CreateNonTerminatorUnreachable() in InstCombine uses. Specific choice here doesn't really matter, but we should pick one that is pointer element type independent.	2022-01-25 13:08:03 +01:00
Nikita Popov	30d4a7e295	[IRBuilder] Require explicit element type in CreatePtrDiff() For opaque pointer compatibility, we cannot derive the element type from the pointer type.	2022-01-25 12:43:57 +01:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Ahmed Bougacha	e7298464c5	[ObjCARC] Use "UnsafeClaimRV" to refer to unsafeClaim in enums. NFC. This matches the actual runtime function more closely. I considered also renaming both RetainRV/UnsafeClaimRV to end with "ARV", for AutoreleasedReturnValue, but there's less potential for confusion there.	2022-01-24 19:37:01 -08:00
Ahmed Bougacha	03e9ba2740	[ObjCARC] Remove unused RetainRVDep dependency kind. NFC.	2022-01-24 19:37:01 -08:00
Joseph Huber	5eb49009eb	[OpenMP] Add more identifier to created shared globals Currently we push some variables to a global constant containing shared memory as an optimization. This generated constant had internal linkage and should not have collided with any known identifiers in the translation unit. However, there have been observed cases of this optimization unintentionally colliding with undocumented PTX identifiers. This patch adds a suffix to the created globals to hopefully bypass this. Depends on D118059 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D118068	2022-01-24 20:37:54 -05:00
Joseph Huber	06cfdd5224	[OpenMP][Fix] Properly inherit calling convention Previously in OpenMPOpt we did not correctly inherit the calling convention of the callee when creating new OpenMP runtime calls. This created issues when the calling convention was changed during `GlobalOpt` but a new call was created without the correct calling convention. This led to the call being replaced with a poison value in `InstCombine` due to undefined behaviour and causing large portions of the program to be incorrectly eliminated. This patch correctly inherits the existing calling convention from the callee. Reviewed By: tianshilei1992, jdoerfert Differential Revision: https://reviews.llvm.org/D118059	2022-01-24 20:37:52 -05:00
Florian Hahn	8a15caaae5	[ConstraintElimination] Fix sign of sub decomposition. Update the decomposition code to make sure the right coefficient (-1) is used for the second operand of the subtract. Fixes PR53123.	2022-01-24 18:32:32 +00:00
eopXD	6be77561f8	[SLP][NFC] Add debug logs for entry. Tell the users they are specifying something without vector register. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D117980	2022-01-24 09:05:21 -08:00
Sjoerd Meijer	ada6d78a78	[LoopFlatten] Address FIXME about getTripCountFromExitCount. NFC. Together with the previous commit which mainly documents better LoopFlatten's overall strategy, this addresses a concern added as a FIXME comment in D110587; the code refactoring (NFC) introduces functions (also for the SCEV usage) to make this clearer.	2022-01-24 13:46:19 +00:00
Sjoerd Meijer	f6ac8088b0	[LoopFlatten] Added comments about usage of various Loop APIs. NFC.	2022-01-24 13:46:19 +00:00
Kerry McLaughlin	8082ab2fc3	[LoopVectorize] Support epilogue vectorisation of loops with reductions isCandidateForEpilogueVectorization will currently return false for loops which contain reductions. This patch removes this restriction and makes the following changes to support epilogue vectorisation with reductions: - `fixReduction`: If fixReduction is being called during vectorisation of the epilogue, the phi node it creates will need to additionally carry incoming values from the middle block of the main loop. - `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi created by fixReduction are updated after the vec.epilog.iter.check block is added. The phi is also moved to the preheader of the epilogue. - `processLoop`: The start value of any VPReductionPHIRecipes are updated before vectorising the epilogue loop. The getResumeInstr function added to the ILV will return the resume instruction associated with the recurrence descriptor. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D116928	2022-01-24 12:03:31 +00:00
Nikita Popov	67346b43e0	[Attributor] Use MemoryLocation to get pointer operand and accessed type (NFCI) This relies on existing APIs and avoids accessing the pointer element type. The alternative would be to extend getPointerOperand() to also return the accessed type, but I figured going through MemoryLocation would be cleaner. Differential Revision: https://reviews.llvm.org/D117868	2022-01-24 10:10:13 +01:00
Nikita Popov	d29e319263	[OpaquePtrs] Add getNonOpaquePointerElementType() method (NFC) This method is intended for use in places that cannot be reached with opaque pointers, or part of deprecated methods. This makes it easier to see that some uses of getPointerElementType() don't need further action. Differential Revision: https://reviews.llvm.org/D117870	2022-01-24 10:03:49 +01:00
Kazu Hirata	f63a9cd99d	[Vectorize] Remove unused variables (NFC)	2022-01-23 20:32:54 -08:00
Sanjay Patel	2e26633af0	[IR] document and update ctlz/cttz intrinsics to optionally return poison rather than undef The behavior in Analysis (knownbits) implements poison semantics already, and we expect the transforms (for example, in instcombine) derived from those semantics, so this patch changes the LangRef and remaining code to be consistent. This is one more step in removing "undef" from LLVM. Without this, I think https://github.com/llvm/llvm-project/issues/53330 has a legitimate complaint because that report wants to allow subsequent code to mask off bits, and that is allowed with undef values. The clang builtins are not actually documented anywhere AFAICT, but we might want to add that to remove more uncertainty. Differential Revision: https://reviews.llvm.org/D117912	2022-01-23 11:22:48 -05:00
Sanjay Patel	39e602b6c4	[InstCombine] try to fold binop with phi operands This is an alternate version of D115914 that handles/tests all binary opcodes. I suspect that we don't see these patterns too often because -simplifycfg would convert the minimal cases into selects rather than leave them in phi form (note: instcombine has logic holes for combining the select patterns too though, so that's another potential patch). We only create a new binop in a predecessor that unconditionally branches to the final block. https://alive2.llvm.org/ce/z/C57M2F https://alive2.llvm.org/ce/z/WHwAoU (not safe to speculate an sdiv for example) https://alive2.llvm.org/ce/z/rdVUvW (but it is ok on this path) Differential Revision: https://reviews.llvm.org/D117110	2022-01-22 15:00:06 -05:00
Florian Hahn	5f2854f1da	[LV] Always create VPWidenCanonicalIVRecipe, optimize away later. This patch updates createBlockInMask to always generate VPWidenCanonicalIVRecipe and adds a transform to optimize it away later, if it is not needed. This is a step towards breaking up VPWidenIntOrFpInductionRecipe and explicitly distinguishing between vector phis and scalarizing. Split off from D116123. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117140	2022-01-22 15:34:20 +00:00
Florian Mayer	754d6af7c3	[NFC] Improve code reuse. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D116711	2022-01-21 10:50:54 -08:00
Caroline Concatto	ad43217a04	[InstCombine] Fold for masked gather when loading the same value each time. This patch checks in the masked gather when the first operand value is a splat and the mask is all one, because the masked gather is reloading the same value each time. This patch replaces this pattern of masked gather by a scalar load of the value and splats it in a vector. Differential Revision: https://reviews.llvm.org/D115726	2022-01-21 14:19:51 +00:00
Nikita Popov	bfbdb5e43e	[Coroutines] Avoid some pointer element type accesses These are just verifying that pointer types are correct, which is no longer relevant under opaque pointers.	2022-01-21 12:36:19 +01:00
Nikita Popov	9c5b856dac	[CoroSplit] Avoid pointer element type accesses Use isOpaqueOrPointeeTypeMatches() for the assertions instead.	2022-01-21 12:22:09 +01:00
Nikita Popov	e7762653d3	[Attributor] Avoid some pointer element type accesses	2022-01-21 11:20:10 +01:00
Florian Hahn	55689904d2	[VPlan] Move ::isCanonical outside ifdef. This fixes a build failure with assertions disabled.	2022-01-21 09:44:31 +00:00
Florian Hahn	c0cf209076	[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI). This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if an induction recipe is canonical. The code is also updated to use it instead of isCanonicalID. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D117551	2022-01-21 09:35:06 +00:00
Paweł Bylica	1d7604fdce	[InstCombine] Simplify bswap -> shift Simplify bswap(x) to shl(x) or lshr(x) if x has exactly one "active byte", i.e. all active bits are contained in boundaries of a single byte of x. https://alive2.llvm.org/ce/z/nvbbU5 https://alive2.llvm.org/ce/z/KiiL3J Reviewed By: spatel, craig.topper, lebedev.ri Differential Revision: https://reviews.llvm.org/D117680	2022-01-21 01:25:30 +01:00
Johannes Doerfert	37e0c58559	[Attributor][FIX] AAValueConstantRange should not loop unconstrained The old method to avoid unconstrained expansion of the constant range in a loop did not work as soon as there were multiple instructions in between the phi and its input. We now take a generic approach and limit the number of updates as a fallback. The old method is kept as it catches "the common case" early.	2022-01-20 18:07:04 -06:00
Johannes Doerfert	7bf9065ad7	[Attributor][NFC] Clang format	2022-01-20 18:06:53 -06:00
Philip Reames	c0906f6b21	[SLP] Remove stray semicolon to make bots happy Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict c++98 flags which disallow ; at global scope.	2022-01-20 14:09:28 -08:00
Philip Reames	5a670f1378	[SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC]	2022-01-20 13:58:20 -08:00
Philip Reames	60f6191879	[SLP] Extract formBundle helper for readability [NFC]	2022-01-20 13:08:37 -08:00
Sanjay Patel	a7a2860d0e	[InstCombine] convert mul with sexted bool and constant to select We already have the related folds for zext-of-bool, so it should make things more consistent to have this transform to select for sext-of-bool too: https://alive2.llvm.org/ce/z/YikdfA Fixes #53319	2022-01-20 15:57:01 -05:00
Philip Reames	118babe67a	[SLP] Use for loops for walking bundle elements	2022-01-20 12:44:33 -08:00
Philip Reames	860038e0d7	[SLP] Rename a couple lambdas to be more clearly separate from method names	2022-01-20 12:13:30 -08:00
Roman Lebedev	ba8eb31bd9	[InstCombine] Instruction sinking: fix check for function terminating block Checking for specific function terminating opcodes means we don't handle other non-hardcoded ones :) This should probably be generalized to something similar to the `IsBlockFollowedByDeoptOrUnreachable()`. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117810	2022-01-20 22:41:31 +03:00
Sanjay Patel	2d031ec5e5	[InstCombine] add one-use check to opposite shift folds Test comments say this might be intentional, but I don't see any hard evidence to support it. The extra instruction shows up as a potential regression in D117680. One test does show a missed fold that might be recovered with better demanded bits analysis.	2022-01-20 13:49:23 -05:00
Craig Topper	9abc593e98	[TargetLowering][InstCombine] Simplify BSwap demanded bits code a little. NFC Use alignDown instead of &= ~7. Replace ResultBit with NLZ. (BitWidth - NLZ - NTZ == 8) so (BitWidth - NTZ - 8 == NLZ). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D117804	2022-01-20 10:45:17 -08:00
Nadav Rotem	191a6e9dfa	optimize icmp-ugt-ashr This diff optimizes the sequence icmp-ugt(ashr,C_1) C_2. InstCombine already implements this optimization for sgt, and this patch adds support for ugt. This patch adds the check for UGT. @craig.topper came up with the idea and proof: define i1 @src(i8 %x, i8 %y, i8 %c) { %cp1 = add i8 %c, 1 %i = shl i8 %cp1, %y %i.2 = ashr i8 %i, %y %cmp = icmp eq i8 %cp1, %i.2 ;Assume: C + 1 == (((C + 1) << y) >> y) call void @llvm.assume(i1 %cmp) ; uncomment for the sgt case %j = shl i8 %cp1, %y %j.2 = sub i8 %j, 1 %cmp2 = icmp ne i8 %j.2, 127 ;Assume (((c + 1 ) << y) - 1) != 127 call void @llvm.assume(i1 %cmp2) %s = ashr i8 %x, %y %r = icmp sgt i8 %s, %c ret i1 %r } define i1 @tgt(i8 %x, i8 %y, i8 %c) { %cp1 = add i8 %c, 1 %j = shl i8 %cp1, %y %j.2 = sub i8 %j, 1 %r = icmp sgt i8 %x, %j.2 ret i1 %r } declare void @llvm.assume(i1) This change is related to the optimizations in D117252. Differential Revision: https://reviews.llvm.org/D117365	2022-01-20 09:31:46 -08:00
Sjoerd Meijer	fabf1de132	[FuncSpec] Add a reference, and some other clarifying comments. NFC.	2022-01-20 17:01:08 +00:00
Philip Reames	c104fca36b	[SLP] Delete dead code in favor of proper assert [NFC]	2022-01-20 08:54:12 -08:00
Philip Reames	c43ebae838	[SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC]	2022-01-20 08:46:44 -08:00
Philip Reames	3c422cbe6b	[SLP] Add an assert to make a non-obvious precondition clear [NFC]	2022-01-20 08:24:10 -08:00
Nikita Popov	0d20407d1a	Reapply [MemCpyOpt] Look through pointer casts when checking capture This is a recommit of the patch without changes. The reason for the revert has been addressed in D117679. ----- The user scanning loop above looks through pointer casts, so we also need to strip pointer casts in the capture check. Previously the source was incorrectly considered not captured if a bitcast was passed to the call.	2022-01-20 09:30:21 +01:00
Nikita Popov	655a7024db	Reapply [MemCpyOpt] Make capture check during call slot optimization more precise This is a recommit of the patch without changes. The reason for the revert has been addressed in D117679. ----- Call slot optimization is currently supposed to be prevented if the call can capture the source pointer. Due to an implementation bug, this check currently doesn't trigger if a bitcast of the source pointer is passed instead. I'm somewhat afraid of the fallout of fixing this bug (due to heavy reliance on call slot optimization in rust), so I'd like to strengthen the capture reasoning a bit first. In particular, I believe that the capture is fine as long as a) the call itself cannot depend on the pointer identity, because neither dest has been captured before/at nor src before the call and b) there is no potential use of the captured pointer before the lifetime of the source alloca ends, either due to lifetime.end or a return from a function. At that point the potentially captured pointer becomes dangling. Differential Revision: https://reviews.llvm.org/D115615	2022-01-20 09:30:20 +01:00
Nikita Popov	d7bff2e9d2	[MemCpyOpt] Fix metadata merging during call slot optimization Call slot optimization currently merges the metadata between the call and the load. However, we also need to merge in the metadata of the store. Part of the reason why we might have gotten away with this previously is that usually the load and the store are the same instruction (a memcpy), this can only happen if call slot optimization occurs on an actual load/store pair. This addresses the issue reported in https://reviews.llvm.org/D115615#3251386. Differential Revision: https://reviews.llvm.org/D117679	2022-01-20 09:25:13 +01:00
Heejin Ahn	eb675e972d	[WebAssembly] Support Wasm EH + Wasm SjLj D108960 added support for SjLj using Wasm EH instructions, which we call Wasm SjLj going forward. (We call the old SjLj Emscripten SjLj) But it did not support using Wasm EH and Wasm SjLj together. So far users of Wasm EH had to use Wasm EH with Emscripten SjLj, which had a certain limitation and it suffered from bigger code size increases as well. This enables using Wasm EH and Wasm SjLj together. 1. This redirects `catchswitch` and `cleanupret` that unwind to caller to `catch.dispatch.longjmp` BB, which is a `catchswitch` BB that handles longjmps. 2. D108960 converted all longjmpable `call`s to `invokes` that unwind to `catch.dispatch.longjmp`. This CL checks if the `call` is embedded within another `catchpad`, and if so, makes it unwind to its nearest parent's unwind destination, rather than `catch.dispatch.longjmp`. This is necessary to preserve the scoping structure. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D117610	2022-01-19 20:13:54 -08:00
Craig Topper	02d9a4d56d	[LoopPeel] Pass TripCount to computePeelCount by value instead of by reference. NFC The TripCount is not modified by the function so it doesn't need to be passed by reference. Verified by passing it as const reference before changing to value. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D117735	2022-01-19 17:54:45 -08:00
Craig Topper	1507786c22	[LoopPeeling] Fix stale comments. NFC These comments were not updated when PeelingPreferences split from UnrollingPreferences.	2022-01-19 17:00:12 -08:00
Johannes Doerfert	b4a7559844	[OpenMP][FIX] Replace ICVs only with values valid at the getter position While we might know the value of an ICV at a getter position it is not always clear that we can simply use it. Verify the value is valid first to avoid invalid IR. Fixes #53300.	2022-01-19 18:40:13 -06:00
Eli Friedman	86cdff0e21	[OpenMPOpt] Use SetVector to store list of kernels. Fixes test failures on reverse-iteration buildbot.	2022-01-19 13:55:32 -08:00
Wenlei He	7cca13bc3a	[PartialInline] Bail out on asm-goto/callbr Fixing ICE when partial inline tries to deal with blockaddress uses of function which is typical for asm-goto/callbr. We ran into this with PGO multi-region partial inline. Differential Revision: https://reviews.llvm.org/D117509	2022-01-19 10:57:57 -08:00
Nikita Popov	4dc4815f56	[MemCpyOpt] Add some debug output to call slot optimization (NFC)	2022-01-19 15:51:10 +01:00
Nikita Popov	5ba73c924d	[BuildLibCalls] Mark calloc as inaccessiblememonly Now that DSE handles inaccessiblememonly calloc, mark it as such, as we do with other memory allocation functions.	2022-01-19 12:55:09 +01:00
Nikita Popov	26f81984e7	[DSE] Handle inaccessiblememonly calloc Change the DSE calloc handling to assume that it is inaccessiblememonly, i.e. the defining access is liveOnEntry. Differential Revision: https://reviews.llvm.org/D117543	2022-01-19 12:55:09 +01:00
Sjoerd Meijer	d544a89a37	[LoopFlatten] Update MemorySSA state I would like to move LoopFlatten from LoopPass Manager LPM2 to LPM1 (D116612), but that is a LPM that is using MemorySSA and so LoopFlatten needs to preserve MemorySSA and this adds that. More specifically, LoopFlatten restructures the CFG and with this change the MSSA state is updated accordingly, where we also update the DomTree. LoopFlatten doesn't rewrite/optimise/delete load or store instructions, so I have not added any MSSA updates for that. Differential Revision: https://reviews.llvm.org/D116660	2022-01-19 10:57:33 +00:00
Nikita Popov	d56b0ad441	[ConstantHoist] Remove check for notional overindexing ConstantHoist currently only hoists GEPs if there is no notional overindexing. As this transform only hoists address arithmetic, it shouldn't care about whether any overindexing occurs or not. There is one caveat: If the hoisted base GEP is inbounds, and a later non-inbounds GEP is rewritten in terms of it, the value may be incorrectly poisoned. To avoid this, restrict the transform to inbounds GEPs for now, as the notional overindexing check effectively did that as well. The inbounds restriction could be dropped by dropping inbounds from the base GEP expression. Differential Revision: https://reviews.llvm.org/D117201	2022-01-19 11:32:10 +01:00
Nikita Popov	a115bbea9b	[Attributor] Remove notional overindexing check AAPointerInfo currently bails on constant expression GEPs with notional overindexing. I don't think this is necessary, as the following code handling GEPOperator will deal with arbitrary indices appropriately. Differential Revision: https://reviews.llvm.org/D117203	2022-01-19 11:30:04 +01:00
Florian Hahn	165e36bf18	[VPlan] Assert can IV is only used by increments during epilogue vec. After resetting the start value of the canonical IV, it might not be canonical any more. Add an assertion to make sure it is only used by its increment, to avoid potential mis-use. Suggested in D117140.	2022-01-19 10:10:05 +00:00
Chuanqi Xu	c8ecf12bc3	[Coroutines] Offering llvm.coro.align intrinsic It is a known problem that we can't align the switch-based coroutine frame if the alignment exceeds std::max_align_t (which is 16 usually). We could solve the problem on the middle-end by dynamically transforming or in the frontend by emitting an aligned allocation function. If we need to solve it in the frontend, the middle end needs to offer an intrinsic to tell the alignment at least. This patch tries to offer such an intrinsic called llvm.coro.align. Reviewed By: https://reviews.llvm.org/D117542 Differential revision: https://reviews.llvm.org/D117542	2022-01-19 09:52:45 +08:00
spupyrev	13d1364a34	A better profi rebalancer This is an extension of profi post-processing step that rebalances counts in CFGs that have basic blocks w/o probes (aka "unknown" blocks). Specifically, the new version finds many more "unknown" subgraphs and marks more "unknown" basic blocks as hot (which prevents unwanted optimization passes). I see up to 0.5% perf on some (large) binaries, e.g., clang-10 and gcc-8. The algorithm is still linear and yields no build time overhead.	2022-01-18 12:14:24 -08:00
Ellis Hoag	5b9358d774	[InstrProf][NFC] Add InstrProfInstBase base The `InstrProfInstBase` class is for all `llvm.instrprof.*` intrinsics. In a later diff we will add a new intrinsic of this type. Also refactor some logic in `InstrProfiling.cpp`. Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D117261	2022-01-18 11:12:00 -08:00
Adrian Tong	ea27adb45b	[NFC] Test commit. This is just a test commit to check whether I got commit permission.	2022-01-18 19:01:04 +00:00
Mircea Trofin	3e8553aab4	[mlgo][inline] Improve global state tracking The global state refers to the number of the nodes currently in the module, and the number of direct calls between nodes, across the module. Node counts are not a problem; edge counts are because we want strictly the kind of edges that affect inlining (direct calls), and that is not easily obtainable without iteration over the whole module. This patch avoids relying on analysis invalidation because it turned out to be too aggressive in some cases. It leverages the fact that Node objects are stable - they do not get deleted while cgscc passes are run over the module; and cgscc pass manager invariants. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D115847	2022-01-18 17:45:34 +00:00
Jan Svoboda	5f4ae56457	[llvm] Remove uses of `std::vector<bool>` LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement. This patch does just that for llvm. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D117121	2022-01-18 18:20:45 +01:00
Sanjay Patel	2d50630efb	[InstCombine] reduce code duplication; NFC	2022-01-18 12:13:45 -05:00
Hans Wennborg	53a51acc36	Revert "[MemCpyOpt] Make capture check during call slot optimization more precise" This caused a miscompile due to call slot optimization replacing a call argument without considering the call's !noalias metadata, see discussion on the code review. > Call slot optimization is currently supposed to be prevented if > the call can capture the source pointer. Due to an implementation > bug, this check currently doesn't trigger if a bitcast of the source > pointer is passed instead. I'm somewhat afraid of the fallout of > fixing this bug (due to heavy reliance on call slot optimization > in rust), so I'd like to strengthen the capture reasoning a bit first. > > In particular, I believe that the capture is fine as long as a) > the call itself cannot depend on the pointer identity, because > neither dest has been captured before/at nor src before the > call and b) there is no potential use of the captured pointer > before the lifetime of the source alloca ends, either due to > lifetime.end or a return from a function. At that point the > potentially captured pointer becomes dangling. > > Differential Revision: https://reviews.llvm.org/D115615 Also reverting the dependent commit: > [MemCpyOpt] Look through pointer casts when checking capture > > The user scanning loop above looks through pointer casts, so we > also need to strip pointer casts in the capture check. Previously > the source was incorrectly considered not captured if a bitcast > was passed to the call. This reverts commit `487a34ed9d` and `00e6869463`.	2022-01-18 17:41:49 +01:00
Daniil Kovalev	d8e0e125a2	[InstCombine] Simplify addends reordering logic Previously some constants were not pushed to the top of the resulting expression tree as intended by the algorithm. We can remove the logic from simplifyFAdd and rely on SimplifyAssociativeOrCommutative to do that. Differential Revision: https://reviews.llvm.org/D117302	2022-01-18 16:00:47 +03:00
David Sherwood	e781620dee	[LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled When SVE is enabled for AArch64 targets it makes more sense to use the get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping from the intrinsic to the 'whilelo' instruction for legal vector types. This instruction neatly takes overflow into account as well. This patch fixes an issue in VPInstruction::generateInstruction that assumed we are only dealing with fixed-width vectors. Differential Revision: https://reviews.llvm.org/D117109	2022-01-18 11:59:30 +00:00
pvellien	4e1c207726	[SimplifyCFG] Fix assertion failure when reusing table switch comparison After D116332, some icmps no longer fold with the target-independent constant folder. The SimplifyCFG code assumed that the comparison would always fold, which is not guaranteed. Explicitly check that the result is either true or false. Differential Revision: https://reviews.llvm.org/D117184	2022-01-18 09:30:54 +01:00
Philip Reames	26049b8ce3	[GlobalOpt] Generalize malloc-to-global for any allocation function We can generalize the malloc-to-global transform for other allocation functions which are both a) removable, and b) have a known initialization value. One subtlety that I want to point out - mostly because I hadn't realized it was true until I took a closer look - is that the existing code doesn't prove that initialization/malloc happens only once. The initialization function can be called multiple times. This is correct without special handling for malloc as undef can map to any value previously written, but for a non-undef initializing allocation it means we may end up memsetting the new global repeatedly. In particular, this means it's not legal to fold the memset into the initializer of the global. Differential Revision: https://reviews.llvm.org/D117503	2022-01-17 15:06:23 -08:00
Philip Reames	6ca192de58	[LoopDeletion] Add back statistic update lost in `523573e` Caught by a couple of builders as an unused variable warning (e.g. https://lab.llvm.org/buildbot#builders/57/builds/13973).	2022-01-17 12:20:51 -08:00
Philip Reames	523573e90d	[LoopDeletion] Revert `3af8a11` and add test coverage for breakage This reverts `3af8a11` because I'd used an upper bound where a lower bound was required. The included reduced test case demonstrates the issue.	2022-01-17 11:44:03 -08:00
Stephen Tozer	32417b3203	[DebugInfo] ValueMapper impl for DIArgList respects IgnoreMissingLocals This patch fixes an issue in which SSA value reference within a DIArgList would be unnecessarily dropped by llvm-link, even when invoking on a single file (which should be a no-op). The reason for the difference is that the ValueMapper does not refer to the RF_IgnoreMissingLocals flag for LocalAsMetadata contained within a DIArgList; this flag is used for direct LocalAsMetadata uses to preserve SSA references even when the ValueMapper does not have an explicit mapping for the referenced SSA value, which appears to always be the case when using llvm-link in this manner. Differential Revision: https://reviews.llvm.org/D114355	2022-01-17 17:17:32 +00:00
Sanjay Patel	4cdf30d9d3	[InstCombine] FP with reassoc FMF: (X * C) + X --> X * (MulC + 1.0) This fold already exists for scalars via FAddCombine (and that's why 2 of the tests are only changed cosmetically), but that code misses vectors and has largely been replaced by simpler folds over time, so this is another step towards removing it.	2022-01-17 10:38:05 -05:00
Florian Hahn	aa7f0e6a55	[DSE] Remove commented-out InvisibleToCallerBeforeRet. (NFC) This code is a leftover from earlier changes and should be removed.	2022-01-17 13:59:13 +00:00
Sanjay Patel	7037d110fa	[InstCombine] propagate IR flags from binop through select The tests with constant folding that produces poison could potentially remove the select entirely: https://alive2.llvm.org/ce/z/e-WUqF ...but this patch just removes the FMF-only limitation on propagation.	2022-01-17 08:42:48 -05:00
Florian Hahn	500fe60957	[VPlan] Drop unnecessary uses of getVPSingleValue (NFC).	2022-01-17 13:27:33 +00:00
Nikita Popov	12bee2c054	[GlobalOpt] Drop an incorrect check This was a last-minute addition to D117249, and of course I ended up inverting the condition in a way that caused an uninitialized memory read. I've dropped it entirely, as I don't think we actually care whether the size is zero or not here. The previous code wasn't checking this either.	2022-01-17 10:10:56 +01:00
Nikita Popov	499f1ca79f	[GlobalOpt] Use generic type when converting malloc to global The malloc to global transform currently determines the type of the global by looking at bitcasts of the malloc. This is limited (the transform fails if there are multiple different types) and incompatible with opaque pointers. My initial approach was to construct an appropriate struct type based on usage in loads/stores. What this patch does instead is to always create an [i8 x AllocSize] global, without trying to guess types at all. This does mean that other transforms that require a certain global type may break. I fixed two of these in D117034 and D117223, which I believe should be sufficient to avoid regressions. In particular, the global SRA change should end up splitting the global into naturally-typed sub-globals, at which point all other optimizations should work. Differential Revision: https://reviews.llvm.org/D117092	2022-01-17 09:55:33 +01:00
Nikita Popov	4796b4ae7b	[GlobalOpt] Make global SRA offset based Currently global SRA uses the GEP structure to determine how to split the global. This patch instead analyses the loads and stores that are performed on the global, and collects which types are used at which offset, and then splits the global according to those. This is both more general, and works fine with opaque pointers. This is also closer to how ordinary SROA is performed. Differential Revision: https://reviews.llvm.org/D117223	2022-01-17 09:28:36 +01:00
Nikita Popov	00b77d917c	[DSE] Remove alloc function check in canSkipDef() canSkipDef() currently skips inaccessiblememonly calls, but not if they are allocation functions. This check was added in D103009, but actually seems to be a leftover from a previous implementation in D101440. canSkipDef() is not used on the storeIsNoop() path, where the relevant transform ended up being implemented. Differential Revision: https://reviews.llvm.org/D117005	2022-01-17 09:23:51 +01:00
Florian Hahn	070d1034da	[LV] Restore metadata to disable runtime unrolling for epilogue loop. After `d4a8fc3a87` LV stopped adding metadata to disable runtime unrolling to the vectorized epilogue loop. This was missed because `278aa65cc4` removed the relevant test coverage. This patch fixes that by adding the relevant metadata after vector loop generation.	2022-01-16 13:14:16 +00:00
Florian Hahn	62739204d4	[LV] Move AddRuntimeUnrollDisableMetaData so it can be used earlier (NFC) Move up the definition of AddRuntimeUnrollDisableMetaData, so it can be re-used earlier in the file in a follow-up patch.	2022-01-16 10:30:24 +00:00
Nikita Popov	c63a3175c2	[AttrBuilder] Remove ctor accepting AttributeList and Index Use the AttributeSet constructor instead. There's no good reason why AttrBuilder itself should extract the AttributeSet from the AttributeList. Moving this out of the AttrBuilder generally results in cleaner code.	2022-01-15 22:39:31 +01:00
Nikita Popov	d1675e4944	[AttrBuilder] Remove empty() / td_empty() methods The empty() method is a footgun: It only checks whether there are non-string attributes, which is not at all obvious from its name, and of dubious usefulness. td_empty() is entirely unused. Drop these methods in favor of hasAttributes(), which checks whether there are any attributes, regardless of whether these are string or enum attributes.	2022-01-15 17:57:18 +01:00
Florian Hahn	e00158ed5c	[LoopUtils] Use InstSimplifyFolder in addRuntimeChecks. Use the InstSimplifyFolder introduced earlier to perform initial simplification during runtime check construction.	2022-01-15 15:21:16 +00:00
Vitaly Buka	35d00fdc10	[msan] Reset shadow of byval before call If function is not sanitized we must reset shadow, not copy. Depends on D117285 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117286	2022-01-14 22:35:43 -08:00
Quentin Colombet	a8ca4046e2	[LSR] Fix crash in Phi node with EHPad block This fixes a crash I observed in issue #48708 where the LSR pass tries to insert an instruction in a basic block with only a catchswitch statement in there. This happens because the Phi node being evaluated assumes the same value for different basic blocks. If the basic block associated with the incoming value of the operand being evaluated has an EHPad terminator LSR skips optimizing it. But if that incoming value can come from multiple different blocks there can be some incoming basic blocks which are terminated in an EHPad. If these are then rewritten in RewriteForPhi the ones containing an EHPad terminator will hit the "Insertion point must be a normal instruction" assert in AdjustInsertPositionForExpand. This fix makes CollectLoopInvariantFixupsAndFormulae also ignore cases where the same value has another incoming basic block with an EHPad, same as it already does in case the primary value has one. Patch by Lorenz Brun <lorenz@brun.one> Differential Revision: https://reviews.llvm.org/D98378	2022-01-14 18:53:18 -08:00
Vitaly Buka	0a46b6ec4e	[msan] Clear byval shadow in ignored functions If function has no sanitize_memory we still reset shadow for nested calls. The first return from getShadow() correctly returned shadow for argument, but it didn't reset shadow of byval pointee. Depends on D117277 Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D117278	2022-01-14 17:32:07 -08:00
Vitaly Buka	4959708502	[NFC][msan] Consolidate clean shadow handling Depends on D117276 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117277	2022-01-14 17:06:39 -08:00
Vitaly Buka	18e4369e19	[NFC][msan] Don't setOrigin for byval pointer It's NFC because shadow of pointer is clean so origins will not be propagated anyway. Depends on D117275 Reviewed By: kda, eugenis Differential Revision: https://reviews.llvm.org/D117276	2022-01-14 16:42:26 -08:00
Heejin Ahn	c3a68c5d63	[SROA] Bail out on PHIs in catchswitch BBs In the process of rewriting `alloca`s and `phi`s that use them, the SROA pass can try to insert a non-PHI instruction by calling `getFirstInsertionPt()`, which is not possible in a catchswitch BB. This CL makes us bail out on these cases. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D117168	2022-01-14 14:55:07 -08:00
Congzhe Cao	fa6a2876c7	[LoopInterchange] Enable interchange with multiple inner loop indvars Currently loop interchange only supports loops with one inner loop induction variable. This patch adds support for transformation with more than one inner loop induction variables. The induction PHIs and induction increment instructions are moved/duplicated properly to the new outer header and the new outer latch, respectively. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D114917	2022-01-14 16:28:41 -05:00
Vitaly Buka	3552177229	[NFC][msan] Reorder branches in complex if Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D117274	2022-01-14 13:22:43 -08:00
Nadav Rotem	9551fc57b7	Fold ashr-exact into an icmp-ugt. This commit optimizes the code sequence: icmp-XXX (ashr-exact (X, C_1), C_2). Instcombine already implements this optimization for sgt, and this patch adds support for additional predicates. The transformation is legal for all predicates if the 'exact' flag is set, and to SGE, UGE, SLT, ULT when the exact flag is not present. This pattern is found in the std::vector bounds checks code of the at() method. Alive2 proof: https://alive2.llvm.org/ce/z/JT_WL8 Differential Revision: https://reviews.llvm.org/D117252	2022-01-14 12:58:44 -08:00
Jessica Paquette	acb8de565e	[JumpThreading] Change asserts for WantInteger into actual checks After `e734e8286b`, it is possible to end up in a situation where an `indirectbr` is fed by a cast, which is in turn fed by an operation which only produces integers. `indirectbr` expects a block address, however these operations can't produce that. There were several asserts in `computeValueKnownInPredecessorsImpl` which check that we're not looking for a block address if we're walking through something which can never produce one. Since it's now possible to hit these asserts, this changes them into actual checks which return false if `Preference` is not `WantInteger`. This adds a testcase which verifies that we don't crash anymore in these situations. Differential Revision: https://reviews.llvm.org/D99814	2022-01-14 11:15:14 -08:00
Florian Hahn	42b34facfd	Recommit "[LV] Inline CreateSplatIV call for scalar VFs." This reverts the revert commit `073c27b5e5`. A reduced test case has been added in `5e4966cbae` and the code has been updated to handle the case where getInductionOpcode returns BinaryOpsEnd. In this case, the original code was always using Instruction::Add. Do the same in the patch. Note this commit may slightly change the value naming, because it now also assigns the 'induction' name in the floating point case.	2022-01-14 19:03:49 +00:00
Sanjay Patel	02455bea6b	[InstCombine] remove unnecessary use check on X >>exact == 0 fold The transform replaces one icmp with another, so we should not care if the shift has another use.	2022-01-14 12:52:16 -05:00
Florian Hahn	1ef9bfa013	[InstSimplify] Pass pointer and indices separately to SimplifyGEPInst. This doesn't require callers to put the pointer operand and the indices in a container like a vector when calling the function. This is not really an issue with the existing callers. But when using it from IRBuilder the inputs are available as separate pointer value and indices ArrayRef. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D117038	2022-01-14 09:59:52 +00:00
Caroline Concatto	8e5a5b619d	[InstCombine] Fold for masked scatters to a uniform address When masked scatter intrinsic does a uniform store to a destination address from a source vector, and in this case, the mask is all one value. This patch replaces the masked scatter with an extracted element of the last lane of the source vector and stores it in the destination vector. This patch also folds when the value in the masked scatter is a splat. In this case, the mask cannot be all zero, and it folds to a scalar store of the value in the destination pointer. Differential Revision: https://reviews.llvm.org/D115724	2022-01-14 09:44:34 +00:00
Bryce Wilson	28b6e2cb3d	[Attributor] [NFC] Use canonical variable name Differential Revision: https://reviews.llvm.org/D117241	2022-01-13 23:06:00 -08:00
Vitaly Buka	71a4fde397	[NFC][msan] Init few vars later	2022-01-13 22:00:37 -08:00
Vitaly Buka	36138d8252	[NFC][msan] Declare some getShadow vars later	2022-01-13 21:36:37 -08:00
James Y Knight	073c27b5e5	Revert "[LV] Inline CreateSplatIV call for scalar VFs (NFC)." Causes a crash with the following (creduce'd) test-case: clang -O3 '--target=aarch64-grtev4-linux-gnu' -xc - -c -o /dev/null <<EOF int e; int f; int g() { int h; int j = 0; while (&f - j > 0) { int k; k = j; if (e == j && *e) k = 5; h = k; j++; } return h; } EOF This reverts commit `7ce48be0fd`.	2022-01-14 00:00:02 +00:00
Philip Reames	5d5d4d94f0	[Attributor] Generalize heap to stack to any allocator with relevant properties This completes removal of the isXLike queries, and depends on a whole series of earlier patches which have already landed. Differential Revision: https://reviews.llvm.org/D117242	2022-01-13 15:33:24 -08:00
Philip Reames	cf66f01ec1	[Attributor] Share code for abstract interpretation of allocation sizes with getObjectSize [NFC-ish] The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during its abstract interpretation, and allows us to have one copy of the code. Note this is not NFC for two reasons: * The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space. * I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test wise contradicts that. Differential Revision: https://reviews.llvm.org/D117241	2022-01-13 15:33:24 -08:00
Arthur Eubanks	9a0fe1b0fc	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Relanding with fix for compile times D117236. Reviewed By: nikic, mtrofin Differential Revision: https://reviews.llvm.org/D115545	2022-01-13 14:48:38 -08:00
Arthur Eubanks	757e044dce	[Inliner] Don't removeDeadConstantUsers() when checking if a function is dead If a function has many uses, this can take a good chunk of compile times. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117236	2022-01-13 14:29:45 -08:00
Congzhe Cao	37e34b74e9	[LoopInterchange] Enable interchange with multiple outer loop indvars This patch enables loop interchange with multiple outer loop induction variables, and hence removes the limitation that only a single outer loop induction variable is supported. In fact, it turns out that the current pass already trivially supports multiple outer indvars, which is the result of a previous patch `https://reviews.llvm.org/D102743`. Therefore, this patch removed that limitation and provides test cases for multiple outer indvars. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D114916	2022-01-13 16:51:32 -05:00
Roman Lebedev	82c8aca934	[SimplifyCFG] Be more aggressive when sinking into block followed by unreachable I strongly believe we need some variant of this. The main problem is e.g. that the glibc's assert has 4 parameters, but the profitability check is only okay with one extra phi node, so D116692 doesn't even trigger on most of the expected cases. While that restriction probably makes sense in normal code, if we are about to run off of a cliff (into an `unreachable`), this successor block is unlikely so the cost to setup these PHI nodes should not be on the hotpath, and shouldn't matter performance-wise. Likewise, we don't sink if there are unconditional predecessors UNLESS we'd sink at least one non-speculatable instruction, which is a performance workaround, but if we are about to run into `unreachable`, it shouldn't matter. Note that we only allow the case where there are at most unconditional branches on the way to the unreachable block. Differential Revision: https://reviews.llvm.org/D117045	2022-01-13 23:30:31 +03:00
Florian Hahn	3f2fb767e3	[VPlan] Make IV operand explicit for VPWidenCanonicalIVRecipe (NFC). This makes the def-use relationship between VPCanonicalIVPHIRecipe and VPWidenCanonicalIVRecipe explicit. Needed for D117140.	2022-01-13 11:13:05 +00:00
Nikita Popov	1cbb456123	[GlobalOpt] Fix global to select transform under opaque pointers We need to check that the load/store type is also the same, as this is no longer implicitly checked through the pointer type.	2022-01-13 11:13:06 +01:00
Florian Hahn	7ce48be0fd	[LV] Inline CreateSplatIV call for scalar VFs (NFC). This is a NFC change split off from D116123, as suggested there. D116123 will remove the last user of CreateSplatIV.	2022-01-13 09:34:31 +00:00
James Y Knight	55fcbf0a84	Revert "[Inline] Attempt to delete any discardable if unused functions" Somehow this ends up causing an infinite loop in the inliner. This reverts commit `d5be48c66d`.	2022-01-13 03:06:47 +00:00
Philip Reames	9979299705	[Attributor] Simplify how we handle required alignment during heap-to-stack [NFC] The existing code duplicated the same concern in two places, and (weirdly) changed the inference of the allocation size based on whether we could meet the alignment requirement. Instead, just directly check the allocation requirement.	2022-01-12 17:34:17 -08:00
Philip Reames	d1f4c6a611	[Attributor] Generalize calloc handling in heap-to-stack for any init value [NFC] Rewrite the calloc specific handling in heap-to-stack to allow arbitrary init values. The basic problem being solved is that if an allocation is initialized to anything other than zero, this must be explicitly done for the formed alloca as well. This covers the calloc case today, but once a couple of earlier guards are removed in this code, downstream allocators with other init values could also be handled. Inspired by discussion on D116971	2022-01-12 16:58:39 -08:00
Philip Reames	8e76720cf2	[Attributor] Reuse object size evaluation code [NFC]	2022-01-12 16:58:39 -08:00
Philip Reames	db57065b36	[Attributor] Use getAllocAlignment where possible [NFC] Inspired by D116971.	2022-01-12 16:58:39 -08:00
Arthur Eubanks	fe827a93f6	[ModuleInliner] Properly delete dead functions Followup to D116964 where we only did this in the CGSCC inliner. Fixes leaks reported in D116964.	2022-01-12 09:57:43 -08:00
Arthur Eubanks	d5be48c66d	[Inline] Attempt to delete any discardable if unused functions Previously we limited ourselves to only internal/private functions. We can also delete linkonce_odr functions. Minor compile time wins: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=instructions Major memory wins on tramp3d: https://llvm-compile-time-tracker.com/compare.php?from=d51e3474e060cb0e90dc2e2487f778b0d3e6a8de&to=bccffe3f8d5dd4dda884c9ac1f93e51772519cad&stat=max-rss Reviewed By: nikic, mtrofin Differential Revision: https://reviews.llvm.org/D115545	2022-01-12 08:36:04 -08:00
Florian Hahn	d4a8fc3a87	[VPlan] Introduce and use BranchOnCount VPInstruction. This patch adds a new BranchOnCount VPInstruction opcode with 2 operands. It first compares its 2 operands (increment of canonical induction and vector trip count), followed by a branch to either the exit block or back to the vector header. It must be the last recipe in the exit block of the topmost vector loop region. This extracts parts from D113224 and was discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116479	2022-01-12 13:42:13 +00:00
Rosie Sumpter	552eb372cb	[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter This is required to query the legality more precisely in the LoopVectorizer. This adds another TTI function named 'forceScalarizeMaskedGather/Scatter' function to work around the hack introduced for MVE, where isLegalMaskedGather/Scatter would return an answer by second-guessing where the function was called from, based on the Type passed in (vector vs scalar). The new interface makes this explicit. It is also used by X86 to check for vector widths where gather/scatters aren't profitable (or don't exist) for certain subtargets. Differential Revision: https://reviews.llvm.org/D115329	2022-01-12 13:34:12 +00:00
Florian Hahn	e3275cfa94	[BuildLibCalls] Add nounwind,willreturn to memset_pattern{4,8,16}. Similar to memset, memset_pattern{4,8,16} all will return and do not unwind. Use fallthrough to include all attributes also set for memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D114904	2022-01-12 10:32:53 +00:00
Nikita Popov	5642ce5ac2	[GlobalOpt] Drop redundant setExternallyInitialized() call (NFC) This is part of copyAttributesFrom().	2022-01-12 09:42:58 +01:00
Nikita Popov	47a47733f0	[GlobalStatus] Remove unused HasNonInstructionUser member (NFC) This hasn't been used in a long time.	2022-01-12 09:40:54 +01:00
Nikita Popov	f3e87176e1	[GlobalOpt] Support "stored once" optimization for different types GlobalOpt can optimize a global with undef initializer and a single store to put the stored value into the initializer instead. Currently, this requires the type of the global and the store to match. This patch extends support to cases with different types (but same size), in which case we create a new global to replace the old one. Differential Revision: https://reviews.llvm.org/D117034	2022-01-12 09:39:31 +01:00
Chuanqi Xu	22225cc5e6	[Coroutines] Handle lifetime markers, bitcast and unused instruction for symmetric transfer This fixes bug49888. The root cause for this is that simplifyTerminatorLeadingToRet didn't handle lifetime markers well. Another issue also noted in D116327 is that we deleted some inlined optimization passes in CoroSplit, so simplifyTerminatorLeadingToRet needs to remove dead instructions by hand. This patch fixes bug49888 by skipping lifetime markers and bitcast instructions and removing dead instructions by hand in simplifyTerminatorLeadingToRet. Reviewed By: junparser Differential Revision: https://reviews.llvm.org/D116330	2022-01-12 15:58:38 +08:00
Mircea Trofin	248d55af3e	[NFC][MLGO] Use LazyCallGraph::Node to track functions. This avoids the InlineAdvisor carrying the responsibility of deleting Function objects. We use LazyCallGraph::Node objects instead, which are stable in memory for the duration of the Module-wide performance of CGSCC passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which is the case here) Differential Revision: https://reviews.llvm.org/D116964	2022-01-11 19:23:47 -08:00
Chuanqi Xu	403772ff1c	[Coroutines] Enhance symmetric transfer for constant CmpInst This fixes bug52896. Simply put, some symmetric transfer optimization opportunities are invalidated because we deleted some inlined optimization passes in `822b92a`. This can cause a stack overflow in some situations, which the design of coroutines is supposed to avoid. This patch tries to fix this by transforming the constant CmpInst instruction, which was previously done in the deleted passes. Reviewed By: rjmccall, junparser Differential Revision: https://reviews.llvm.org/D116327	2022-01-12 10:14:37 +08:00
Kevin Athey	7ea175d1c6	Add 'eager-checks' as a module parameter to MSAN. This creates a way to configure MSAN for eager checks that will be leveraged by the introduction of a clang flag (-fsanitize-memory-param-retval). This is redundant with the existing flag: -mllvm -msan-eager-checks. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D116855	2022-01-11 14:30:49 -08:00
Philip Reames	6bf590d6e8	[InstCombine] Pull out a helper function to simplify upcoming patch [NFC]	2022-01-11 13:05:25 -08:00
Philip Reames	75de92d3e2	[DSE] Separate malloc+memset -> calloc transform from noop store detection [NFC] This transformation has nothing to do with whether the store is a noop. The memset becomes a noop, but only after we replace the malloc with a calloc.	2022-01-11 12:55:59 -08:00
Philip Reames	e2e7ecf25d	[DSE] Minor style improvements to calloc formation code [NFC]	2022-01-11 12:18:23 -08:00
Philip Reames	a1bf4ddac6	[DSE] Generalize store null to calloc allocated memory [NFC-ish] This change removes a direct check for calloc-like allocation functions, and instead handles the generic case where we're storing a constant to constant initialized memory. This is mostly to remove the call to isCallocLike, but if someone downstream happens to have an initialized alloc which initializes to e.g. -1, this will also kick in for them. (I don't know of such an example ftr.)	2022-01-11 12:02:51 -08:00
Philip Reames	3712372fa5	[DSE] Style improvements after `3cef3cf` - remove redundant dyn_casts [NFC] I'd been working on exactly the same patch when Nikita landed his, so this patch is basically the style diff between the two. :)	2022-01-11 08:39:18 -08:00
Nikita Popov	94d6263391	[GlobalStatus] Look through non-constexpr casts analyzeGlobal() looks through non-constexpr cast instructions when looking for users. However, this particular place only strips the casts again if they are constexprs. We should be looking through all casts here.	2022-01-11 16:02:35 +01:00
Nikita Popov	3cef3cf02f	[DSE] Check for noalias calls rather than alloc functions For these "visible on unwind/ret" checks we only care about the fact that no other code has access to the pointer (unless it escapes). A noalias call is sufficient for this, it does not have to be a known allocation function. This is basically the same change as D116728, but for DSE rather than LICM.	2022-01-11 12:22:16 +01:00
Florian Hahn	2d67a86b7c	[SCEVExpander] Use IntToPtr for temporary instruction. Use PtrToInt instead of Add when creating temporary instructions. The add might get folded away with more sophisticated folding.	2022-01-11 09:40:21 +00:00
Philip Reames	abc787fbf3	Delete a stale comment	2022-01-10 18:18:34 -08:00
Philip Reames	5265ac72c6	[MemoryBuiltin] Add an API for checking if an unused allocation can be removed [NFC] Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.	2022-01-10 15:43:39 -08:00
Craig Topper	38b30eb2b2	[LowerMatrixIntrinsics] Call getRegisterClassForType before getNumberOfRegisters. getNumberOfRegisters takes a ClassID as its argument. It shouldn't be passed a bool. Assuming the bool meant vector or not, we should call getRegisterClassForType first. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D116903	2022-01-10 15:32:13 -08:00
Roman Lebedev	82fb4f4b22	[SCEV] Sequential/in-order `UMin` expression As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692, SCEV is forbidden from reasoning about 'backedge taken count' if the branch condition is a poison-safe logical operation, which is conservatively correct, but is severely limiting. Instead, we should have a way to express those poison blocking properties in SCEV expressions. The proposed semantics is: ``` Sequential/in-order min/max SCEV expressions are non-commutative variants of commutative min/max SCEV expressions. If none of their operands are poison, then they are functionally equivalent, otherwise, if the operand that represents the saturation point* of given expression, comes before the first poison operand, then the whole expression is not poison, but is said saturation point. ``` * saturation point - the maximal/minimal possible integer value for the given type The lowering is straight-forward: ``` compare each operand to the saturation point, perform sequential in-order logical-or (poison-safe!) ordered reduction over those checks, and if reduction returned true then return saturation point else return the naive min/max reduction over the operands ``` https://alive2.llvm.org/ce/z/Q7jxvH (2 ops) https://alive2.llvm.org/ce/z/QCRrhk (3 ops) Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97 That allows us to handle the patterns in question. Reviewed By: nikic, reames Differential Revision: https://reviews.llvm.org/D116766	2022-01-10 20:51:26 +03:00
Bryce Wilson	fb936595fa	[MemoryBuiltins] Add field for alignment argument [NFC] There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically. This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for purpose of having the large change be NFC. The latter will be reviewed separately. Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)	2022-01-10 09:15:20 -08:00
Philip Reames	f4c54683d6	[instcombine] Infer alignment for aligned_alloc with potentially zero size This change removes a previous restriction where we had to prove the allocation performed by aligned_alloc was non-zero in size before using the align parameter to annotate the result. I believe this was conservatism around the C11 specification of this routine which allowed UB when size was not a multiple of alignment, but if so, it was a partial one at best. (ex: align 32, size 16 was equally UB, but not restricted) The spec has since been clarified to require nullptr return, not UB. A nullptr - the documented return for this function on failure for all cases after UB mentioned above was removed - is trivially aligned for any power of two. This isn't totally new behavior even for this transform, we'd previously annotate potentially failing allocs (e.g. huge sizes) meaning we were putting align on potentially null pointers anyways. This change simply does the same for all failure modes.	2022-01-10 08:48:49 -08:00
Johannes Doerfert	7b39dccbe4	[Attributor][FIX] Ensure "IsExact" is false for non-exact accesses If we look at potentially interfering accesses we need to ensure the "IsExact" flag is set appropriately. Accesses that have an "unknown" size or offset cannot be exact matches and we failed to flag that. Error and test reported by Serguei N. Dmitriev.	2022-01-10 10:09:36 -06:00
Simon Pilgrim	c1f1359882	[PGOInstrumentation] populateEHOperandBundle - earlyout if !isa<CallBase> All paths (that actually do anything) require a successful dyn_cast<CallBase> - so just earlyout if the cast fails Fixes static analyzer nullptr dereference warning	2022-01-10 15:34:37 +00:00
Simon Pilgrim	353484d191	[LowerExpectIntrinsic] Use cast<> instead of dyn_cast<> to avoid dereference of nullptr. NFC	2022-01-10 15:34:37 +00:00
David Sherwood	b0922a9dcd	[LoopVectorize] Make VPWidenCanonicalIVRecipe::execute work for scalable vectors The code in VPWidenCanonicalIVRecipe::execute only worked for fixed-width vectors due to the way we generate the values per lane. This patch changes the code to use a combination of vector splats and step vectors to get the same result. This then works for both fixed-width and scalable vectors. Tests that exercise this code path for scalable vectors have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Differential Revision: https://reviews.llvm.org/D113180	2022-01-10 14:12:32 +00:00
Nuno Lopes	7b1cb72ad9	[SROA] Switch replacement of dead/UB/unreachable ops from undef to poison SROA has 3 data-structures where it stores sets of instructions that should be deleted: - DeadUsers -> instructions that are UB or have no users - DeadOperands -> instructions that are UB or operands of useless phis - DeadInsts -> "dead" instructions, including loads of uninitialized memory with users The first 2 sets can be RAUW with poison instead of undef. No brainer as UB can be replaced with poison, and for instructions with no users RAUW is a NOP. The 3rd case cannot be currently replaced with poison because the set mixes the loads of uninit memory. I leave that alone for now. Another case where we can use poison is in the construction of vectors from multiple loads. The base vector for the first insertelement is now poison as it doesn't matter as it is fully overwritten by inserts. Differential Revision: https://reviews.llvm.org/D116887	2022-01-10 14:04:26 +00:00
Serge Guelton	d2cc6c2d0c	Use a sorted array instead of a map to store AttrBuilder string attributes Using a std::map<SmallString, SmallString> for target dependent attributes is inefficient: it makes its constructor slightly heavier, and involves extra allocation for each new string attribute. Storing the attribute key/value as strings implies extra allocation/copy step. Use a sorted vector instead. Given the low number of attributes generally involved, this is cheaper, as showcased by https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions Differential Revision: https://reviews.llvm.org/D116599	2022-01-10 14:49:53 +01:00
Florian Hahn	003ac239d8	[SROA] Reduce the number of times an IRBuilder is constructed (NFC). This patch reduces the number of times IRBuilders need to be constructed in SROA.cpp by passing existing ones by reference to the appropriate places.	2022-01-10 12:09:13 +00:00
Florian Hahn	aecad5828e	[SCEVExpander] Only create trunc when needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to create TruncTripCount if it is actually used. Sink the TruncTripCount creating into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 11:31:27 +00:00
David Sherwood	e3c84fb948	[LoopVectorize] Add support for tail folding using scalable vectors This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount whereby we weren't correctly generating the runtime trip count for scalable vectors when tail-folding. It also removes some asserts in the tail-folding path for cases when the VF is not scalable. In this patch I have only permitted tail-folding to be enabled explicitly for scalable vectors when the user has specified one of the following flags: -prefer-predicate-over-epilogue=predicate-dont-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue For now it's best not to enable tail-folding with scalable vectors for low trip counts or when optimising for code size, since there has been no analysis on whether this is worth it. Various tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll The tests cannot be target independent because they require masked load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore need to return true. Differential Revision: https://reviews.llvm.org/D113003	2022-01-10 10:55:40 +00:00
Florian Hahn	ad1b8772cf	[SCEVExpander] Only create multiplication if needed. `9345ab3a45` updated generateOverflowCheck to skip creating checks that always evaluate to false. This in turn means that we only need to compute \|Step\| * Trip count if the result of the multiplication is actually used. Sink the multiplication into ComputeEndCheck, so it is only created when there's an actual check.	2022-01-10 08:49:25 +00:00
Nikita Popov	92d55e7336	[MemoryBuiltins] Remove isNoAliasFn() in favor of isNoAliasCall() We currently have two similar implementations of this concept: isNoAliasCall() only checks for the noalias return attribute. isNoAliasFn() also checks for allocation functions. We should switch to only checking the attribute. SLC is responsible for inferring the noalias return attribute for non-new allocation functions (with a missing case fixed in `348bc76e35`). For new, clang is responsible for setting the attribute, if -fno-assume-sane-operator-new is not passed. Differential Revision: https://reviews.llvm.org/D116800	2022-01-10 09:18:15 +01:00
Johannes Doerfert	4e8a02e7f4	[Attributor][FIX] Remove assumption that doesn't have to hold There is no guarantee we strip all GEPOperators and the conservative handling doesn't even require us to.	2022-01-09 13:15:53 -06:00
Florian Hahn	1ce01b7dfe	[SCEVExpander] Simplify cleanup, skip sorting by dominance. There is no need to sort inserted instructions by dominance, as the deletion loop still requires RAUW with undef before deleting. Removing instructions in reverse insertion order should still ensure that the number of uselist updates is kept to a minimum.	2022-01-09 18:38:41 +00:00
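
The lowering described in commit 82fb4f4b22 (sequential/in-order `UMin`) can be modeled outside LLVM to see why the expression is not commutative. The following is a minimal sketch, not LLVM code: `POISON`, `logical_or`, `umin`, and `umin_seq` are hypothetical names introduced only for illustration, poison is modeled as a Python sentinel, and the saturation point for an unsigned minimum is assumed to be 0.

```python
# Toy model of the sequential umin lowering from the commit message.
# All names are hypothetical; this is not an LLVM API.

POISON = object()  # sentinel standing in for an LLVM poison value


def logical_or(a, b):
    """Poison-safe logical or, modeled on `select a, true, b`."""
    if a is POISON:
        return POISON
    if a:
        return True  # b is ignored, even if it is poison
    return b


def umin(ops):
    """Naive commutative min: any poison operand poisons the result."""
    if any(op is POISON for op in ops):
        return POISON
    return min(ops)


def umin_seq(ops, saturation=0):
    """Sequential min: a saturation point seen before the first poison wins."""
    hit = False
    # The last operand does not need a check, as the commit message notes.
    for op in ops[:-1]:
        check = POISON if op is POISON else (op == saturation)
        hit = logical_or(hit, check)
    if hit is POISON:
        return POISON
    if hit:
        return saturation
    return umin(ops)
```

With this model, `umin_seq([0, POISON])` yields the saturation point while `umin_seq([POISON, 0])` yields poison, which mirrors the non-commutativity the commit message points out.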

29671 Commits