LoopInfo doesn't give all loops in a loop nest; it gives only the top-level
loops. Meanwhile, isLoopSimplifyForm() only checks the outermost loop of a
loop nest. As a result, inner loops that are not in simplified form could
not be simplified by the original code.
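For illustration, a minimal sketch of the intended fix (assuming the `simplifyLoop()` utility from LoopSimplify.h; the helper name and analysis plumbing are ours, not the patch's):
```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/Transforms/Utils/LoopSimplify.h"
using namespace llvm;

// Visit every loop in every nest (not just the top-level loops reported by
// plain LoopInfo iteration) and simplify the ones not in simplified form.
static bool ensureSimplifiedLoops(LoopInfo &LI, DominatorTree &DT,
                                  ScalarEvolution *SE, AssumptionCache *AC) {
  bool Changed = false;
  for (Loop *L : LI.getLoopsInPreorder())
    if (!L->isLoopSimplifyForm())
      Changed |= simplifyLoop(L, &DT, &LI, SE, AC, /*MSSAU=*/nullptr,
                              /*PreserveLCSSA=*/true);
  return Changed;
}
```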
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D137672
The 'and' case showed up in a recent bug report and prevented
more follow-on transforms from happening.
We could handle more patterns (for example, when the select arms
simplify, but not to constant values), but this seems
like a safe, conservative enhancement. The backend can
convert select-of-constants to math/logic in many cases
if it is profitable.
There is a lot of overlapping logic for these kinds of patterns
(see SimplifySelectsFeedingBinaryOp() and FoldOpIntoSelect()),
so there may be some opportunity to improve efficiency.
There are also optimization gaps/inconsistency because we do
not call this code for all bin-opcodes (see TODO for ashr test).
A local linkage GlobalObject in a non-prevailing COMDAT remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled).
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.
Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
A GlobalAlias may refer to a ConstantExpr; not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche,
so for simplicity we don't handle it yet). In addition, an available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.
See windows-vftable.ll: Windows vftable uses an alias pointing to a
private constant where the alias is the COMDAT leader. The COMDAT use case
is dubious, and ThinLTO does not discard the alias in the non-prevailing COMDAT.
This patch retains the behavior.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.
We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining the loads in those patterns is not
solved yet, though.
Differential Revision: https://reviews.llvm.org/D137341
Gather nodes are vectorized as a simple vector of the scalars instead of
relying on the actual node. As a result, in some cases we may fail to detect
an incorrect transformation (a non-matching set of scalars just ends up as
a gather node instead of a possible vectorized/gather node). It is better
to rely on the actual nodes; this improves stability and helps detect
missed cases.
Differential Revision: https://reviews.llvm.org/D135174
`CreateSecStartEnd()` returns a pointer to the input type, so when called as `CreateSecStartEnd(M, SanCovCFsSectionName, IntptrPtrTy)`, `SecStartEnd.first` and `SecStartEnd.second` will have type `IntptrPtrPtrTy`, not `IntptrPtrTy`.
This problem should not impact functionality, and with opaque pointers enabled it will not trigger any alarm. But when run with `-no-opaque-pointers`, the mismatched pointer type causes the type-check assertion in `CallInst::init()` to fail.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D137310
With https://reviews.llvm.org/D136627, we now have metrics for profile staleness based on profile statistics, and monitoring profile staleness in real time can help users quickly identify performance issues. In a production scenario, the build is usually incremental, so to get real-time metrics we would have to store/cache all the old objects' metrics somewhere and pull them at post-build time. To make this more convenient, this patch adds an option to persist the metrics into the object binary; they can then be reported right away by decoding the binary rather than polling previous stdout/stderr output from a cache system.
For the implementation, the statistics are first written into a new metadata section (llvm.stats) and then encoded into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pairs so that future statistics can easily be added. This is guarded by a new switch (`-persist-profile-staleness`).
In terms of size overhead, the metrics are computed at module level, so the overhead should be small; measured on one of our internal services, it costs less than 1MB for a 10GB+ binary.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D136698
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Plumb in salvaging for the address part of dbg.assign intrinsics.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133293
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Add method:
Instruction::mergeDIAssignID(
ArrayRef<const Instruction *> SourceInstructions)
which merges the DIAssignID metadata attachments on `SourceInstructions` and
`this` and replaces uses of the original IDs with the new shared one.
This is used when stores are merged, for example when sinking stores out of
an if-diamond CFG or vectorizing contiguous stores.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133291
ProvenanceAnalysis::relatedCheck was giving different answers depending
on the order in which the pointers were passed.
Specifically, it was returning different values when A and B were both
loads and were both referring to identifiable objects, but only one was
used by a store instruction.
Reverted in b22d80dc6a.
Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the
Assignment Tracking patch stack and remove redundant parameter Src.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D132357
Need to check if the insertelement mask size is reached during cost analysis to avoid a compiler crash.
Differential Revision: https://reviews.llvm.org/D137639
This was done as a test for D137302, and it makes sense to push these changes.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D137493
This reverts commit 80378a4ca7.
I am reverting this patch because I need to revert 171f7024cc, and without reverting this patch, reverting 171f7024cc causes conflicts.
Patch 171f7024cc introduced a cyclic dependency in the module build.
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/48197/consoleFull#-69937453049ba4694-19c4-4d7e-bec5-911270d8a58c
```
In file included from <module-includes>:1:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/Argument.h:18:10: fatal error: cyclic dependency in module 'LLVM_IR': LLVM_IR -> LLVM_intrinsic_gen -> LLVM_IR
^
While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
While building module 'LLVM_IR' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:
In file included from <module-includes>:12:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/DebugInfo.h:24:10: fatal error: could not build module 'LLVM_intrinsic_gen'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
In file included from <module-includes>:15:
In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCContext.h:23:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:10: fatal error: could not build module 'LLVM_IR'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:10: fatal error: could not build module 'LLVM_MC'
~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
4 errors generated.
```
A local linkage GlobalObject in a non-prevailing COMDAT remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.
To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled).
This fixes two problems.
(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```
(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.
```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
bar();
foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
foo();
}
eof
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```
If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.
Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
A GlobalAlias may refer to a ConstantExpr; not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche,
so for simplicity we don't handle it yet). In addition, an available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D135427
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands is the result of a first-vector-element splat, can be
simplified to splatting the first vector element of the result of the BinOp.
Differential Revision: https://reviews.llvm.org/D135876
Try to simplify comparisons with the smallest normalized value. If
denormals will be treated as 0, we can simplify by using an equality
comparison with 0.
fcmp olt fabs(x), smallest_normalized_number -> fcmp oeq x, 0.0
fcmp ult fabs(x), smallest_normalized_number -> fcmp ueq x, 0.0
fcmp oge fabs(x), smallest_normalized_number -> fcmp one x, 0.0
fcmp uge fabs(x), smallest_normalized_number -> fcmp une x, 0.0
The device libraries have a few range checks that look like
this for denormal handling paths.
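For reference, a runnable C++ illustration of the source-level shape of such a range check (our example, not the device-library code; `FLT_MIN` is the smallest normalized float, and the equivalence holds only when denormals are flushed to zero, e.g. FTZ/DAZ mode):
```cpp
#include <cfloat>
#include <cmath>

// Under denormal flushing, fabs(x) < FLT_MIN can only hold when x is (or
// flushes to) zero, so the range check collapses to an equality with 0.0f.
bool isDenormalOrZero(float x) {
  return std::fabs(x) < FLT_MIN; // fcmp olt fabs(x), smallest_normalized
}

bool isZeroWhenFlushed(float x) {
  return x == 0.0f;              // fcmp oeq x, 0.0: equivalent under FTZ/DAZ
}
```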
Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the
Assignment Tracking patch stack and remove redundant parameter Src.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D132357
Gather nodes are vectorized as a simple vector of the scalars instead of
relying on the actual node. As a result, in some cases we may fail to detect
an incorrect transformation (a non-matching set of scalars just ends up as
a gather node instead of a possible vectorized/gather node). It is better
to rely on the actual nodes; this improves stability and helps detect
missed cases.
Differential Revision: https://reviews.llvm.org/D135174
Currently call slot optimization may be prevented because the
lifetime markers for the destination only start after the call.
In this case, rather than aborting the transform, we should move
the lifetime.start before the call to enable the transform.
Differential Revision: https://reviews.llvm.org/D135886
It is possible that we can do better on some of these transforms
by passing some subset of attributes, but we were not doing that
in any of the changed code. So it's better to give that a name
to indicate we're clearing attributes or make that more obvious
by using the default-constructed empty list.
As noted in the code comment, we could generalize this:
https://alive2.llvm.org/ce/z/N5m-eZ
It saves an instruction even without a constant operand,
but the 'and' is wider. We can do that as another step
if it doesn't harm anything.
I noticed that this missing pattern with a constant operand
inhibited other transforms in a recent bug report, so this
is enough to solve that case.
Unswitching adjusts the CFG in ways that may invalidate cached loop
dispositions. Clear all cached block and loop dispositions during
trivial unswitching. The same is already done for non-trivial
unswitching.
Fixes #58751.
Previously, we only checked for duplicate zero entries when merging a
MemOPSize table (see D92074), but a user recently provided a reproducer
demonstrating that other entries can also be duplicated. As demonstrated
by the test in this patch, PGOMemOPSizeOpt can potentially generate
invalid IR for non-zero, non-consecutive duplicate entries. This seems
to be a rare case, since the duplicate entry is often below the
threshold, but possible. This patch extends the existing warning to
check for any duplicate values in the table, both in the optimization
and in llvm-profdata.
Differential Revision: https://reviews.llvm.org/D136211
Additional SCEV verification highlighted a case where the cached loop
dispositions were incorrect after simplifying a phi node in IndVars.
Fix it by invalidating the phi before replacing it.
Fixes #58750.
If PRE is performed as part of the main GVN pass (to PRE GEP
operands before processing loads), and it is performed across a
backedge, we will end up adding the new instruction to the leader
table of a block that has not yet been processed. When it will be
processed, GVN will incorrectly assume that the value is already
available, even though it is only available at the end of the
block.
Avoid this by not performing PRE across backedges.
Fixes https://github.com/llvm/llvm-project/issues/58418.
Differential Revision: https://reviews.llvm.org/D136095
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.
Differential Revision: https://reviews.llvm.org/D135780
SpeculativelyExecuteBB(), which converts a branch + phi structure
into a select, currently bails out if the block contains an assume
(because it is not speculatable).
Adjust the fold to ignore ephemeral values (i.e. assumes and values
only used in assumes) for cost modelling purposes, and drop them
when performing the fold.
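As an illustration, here is the kind of source pattern affected (our example, using clang's `__builtin_assume`, not code from the patch):
```cpp
// The taken arm contains only the speculated value plus an assume. The fold
// can now cost-model the assume as ephemeral, drop it, and emit a select.
int pick(bool c, int a, int b) {
  int v = b;
  if (c) {
    __builtin_assume(a > 0); // previously blocked SpeculativelyExecuteBB()
    v = a;
  }
  return v; // effectively: v = c ? a : b
}
```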
Theoretically, we could try to preserve the assume information by
generating an assume(br_cond || assume_cond) style assume, but this
is very unlikely to be useful (because we don't do anything
useful with assumes of this form) and it would make things
substantially more complicated once we take operand bundle assumes
into account (which don't really support a || operation).
I'd prefer not to do that without good motivation.
Differential Revision: https://reviews.llvm.org/D137339
This is the bugfix to the miscompile mentioned in
https://reviews.llvm.org/D132055#3814831. The IR
that reproduced the bug is added as the test case in
this patch.
What this patch does is that, during the legality phase,
instead of checking the phi nodes only in `InnerLoop`
and `OuterLoop`, we check phi nodes in all subloops
of the `OuterLoop`. For example, if the loop nest is triply
nested, and `InnerLoop` and `OuterLoop` are the middle
loop and the outermost loop respectively, we'll check
phi nodes in the innermost loop as well, in addition to
the ones in the middle and outermost loops.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D134930
CVP currently only tries to simplify comparisons if there is a
constant operand. However, even if both are non-constant, we may
be able to determine the result of the comparison based on range
information.
IPSCCP is already capable of doing this, but because it runs very
early, it may miss some cases.
Differential Revision: https://reviews.llvm.org/D137253
This patch addresses the test cases in which the load has to be inserted at the right point. This happens when there is a store between the loads.
This patch reverts the load merge in all cases where stores are present between the loads; this will eventually be replaced with a proper fix and test cases.
Differential Revision: https://reviews.llvm.org/D137333
Instrumentation passes now use the proper shadow offset. There will be many
asan test failures without this patch. For example:
```
$ ./lib/asan/tests/LOONGARCH64LinuxConfig/Asan-loongarch64-calls-Test
AddressSanitizer:DEADLYSIGNAL
=================================================================
==651209==ERROR: AddressSanitizer: SEGV on unknown address 0x1ffffe2dfa9b (pc 0x5555585e151c bp 0x7ffffb9ec070 sp 0x7ffffb9ebfd0 T0)
==651209==The signal is caused by a UNKNOWN memory access.
```
Before the patch:
```
$ make check-asan
Testing Time: 36.13s
Unsupported : 205
Passed : 83
Expectedly Failed: 1
Failed : 239
```
After the patch:
```
$ make check-asan
Testing Time: 58.98s
Unsupported : 205
Passed : 421
Expectedly Failed: 1
Failed : 89
```
Differential Revision: https://reviews.llvm.org/D137013
This enables odr indicators on all platforms and private aliases on non-Windows.
Note that GCC also uses private aliases: this fixes bogus
`The following global variable is not properly aligned.` errors for interposed global variables.
Fix https://github.com/google/sanitizers/issues/398
Fix https://github.com/google/sanitizers/issues/1017
Fix https://github.com/llvm/llvm-project/issues/36893 (we can restore D46665)
Global variables of non-hasExactDefinition() linkages (i.e.
linkonce/linkonce_odr/weak/weak_odr/common/external_weak) are not instrumented.
If an instrumented variable gets interposed to an uninstrumented variable due to
symbol interposition (e.g. in issue 36893, _ZTS1A in foo.so is resolved to _ZTS1A in
the executable), there may be a bogus error.
With private aliases, the register code will not resolve to a definition in
another module, and thus prevent the issue.
Cons: minor size increase. This is mainly due to extra `__odr_asan_gen_*` symbols.
(ELF) In addition, in relocatable files private aliases replace some relocations
referencing global symbols with .L symbols and may introduce some STT_SECTION symbols.
For lld, with -g0, the size increase is 0.07~0.09% for many configurations I
have tested: -O0, -O1, -O2, -O3, -O2 -ffunction-sections -fdata-sections
-Wl,--gc-sections. With -g1 or above, the size increase ratio will be even smaller.
This patch obsoletes D92078.
Don't migrate Windows for now: the static data member of a specialization
`std::num_put<char>::id` is a weak symbol, as well as its ODR indicator.
Unfortunately, link.exe (and lld without -lldmingw) generally doesn't support
duplicate weak definitions (weak symbols in different TUs likely pick different
defined external symbols and conflict).
Differential Revision: https://reviews.llvm.org/D137227
For some auto-generated sources, we have a huge number of critical
edges (like from switch statements). We have seen an instance of 183777
critical edges in one function.
After we split the critical edges in PGO instrumentation/profile-use
pass, the CFG is so large that we have compiler time issues in
downstream passes (like in machine CSE and block placement). Here I
add a threshold to skip PGO if the number of critical edges is too
large.
The threshold is large enough so that it will not affect the majority
of PGO compilation.
Also sync the logic for skipping instrumentation and profile-use. I
think this is the correct thing to do.
Differential Revision: https://reviews.llvm.org/D137184
This is a corrected version of:
bc886e9b58
I made a copy-paste error that created an "add" instead of the
intended "sub" on that attempt. The regression tests showed the
bug, but I overlooked that.
As I said in a comment on issue #58717, the bug reports resulting
from the botched patch confirm that the pattern does occur in
many real-world applications, so hopefully eliminating the multiply
results in better code.
I added one more regression test in this version of the patch,
and here's an Alive2 proof to show that exact example:
https://alive2.llvm.org/ce/z/dge7VC
Original commit message:
This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e
I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.
Differential Revision: https://reviews.llvm.org/D137157
Using a DebugVariable as the set key rather than std::pair<DIVariable *,
DIExpression *> ensures we don't accidentally confuse multiple instances of
inlined variables.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D133303
Follow-on to:
ec0b406e16
This should prevent crashing for example like issue #58552
by not matching a select-of-vectors-with-scalar-condition.
The test that shows a regression seems unlikely to occur
in real code.
This also picks up an optimization in the case where a real
(bitwise) logic op is used. We could already convert some
similar select ops to real logic via impliesPoison(), so
we don't see more diffs on commuted tests. Using commutative
matchers (when safe) might also handle one of the TODO tests.
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands is the result of a first-vector-element splat, can be
simplified to splatting the first vector element of the result of the BinOp.
Differential Revision: https://reviews.llvm.org/D135876
This prevents duplicates from being pushed onto the stack and hypothetically
should reduce the memory footprint in ugly corner cases with multiple repeating
duplicates in an 'and' tree.
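The usual shape of such a guard, as a self-contained sketch (not the actual pass code):
```cpp
#include <unordered_set>
#include <vector>

// Push each value at most once: a visited set filters duplicates before they
// reach the worklist, bounding its size by the number of distinct values.
template <typename T>
void pushOnce(std::vector<T *> &Worklist, std::unordered_set<T *> &Visited,
              T *V) {
  if (Visited.insert(V).second) // second is false if V was already present
    Worklist.push_back(V);
}
```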
Use DL-aware ConstantFoldCompareInstOperands() API instead of
ConstantExpr API. The practical effect of this is that SCCP can
now fold comparisons that require DL.
Relative to the previous attempt, this also updates the
ValueLattice unit tests.
-----
Resolve the TODO about incorrect getCompare() behavior. This can
be made more precise (e.g. by materializing the undef value and
performing constant folding on it), but for now just return an
unknown result to fix the correctness issue.
This should be NFC in terms of user-visible behavior, because the
only user of this method (SCCP) was already guarding against
UndefValue results.
Commit 359bc5c541 caused
Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"'
failures in decomposeGEP when the GEP pointer operand is a vector.
Fix is to use DataLayout::getIndexTypeSizeInBits when fetching the
index size, as it will use the scalar type in case of a ptr vector.
Differential Revision: https://reviews.llvm.org/D137185
This should prevent crashing for the example in issue #58552
by not matching a select-of-vectors-with-scalar-condition.
A similar change is likely needed for the related fold to
properly fix that kind of bug.
The test that shows a regression seems unlikely to occur
in real code.
This also picks up an optimization in the case where a real
(bitwise) logic op is used. We could already convert some
similar select ops to real logic via impliesPoison(), so
we don't see more diffs on commuted tests. Using commutative
matchers (when safe) might also handle one of the TODO tests.
Varargs and inalloca have a weird interaction where varargs are actually
passed via the inalloca alloca. Removing inalloca breaks the varargs
because they're still not passed as separate arguments.
Fixes #58718.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D137182
Resolve the TODO about incorrect getCompare() behavior. This can
be made more precise (e.g. by materializing the undef value and
performing constant folding on it), but for now just return an
unknown result.
At the moment, the implementation requires that the outer GEP has a
single index; the inner GEP can have arbitrary indices, because the
general `decompose` helper is used.
AAPointerInfo now maintains a list of all Access objects that it owns, along
with the following maps:
- OffsetBins: OffsetAndSize -> { Access }
- InstTupleMap: RemoteI x LocalI -> Access
A RemoteI is any instruction that accesses memory. RemoteI is different from
LocalI if and only if LocalI is a call; then RemoteI is some instruction in the
callgraph starting from LocalI.
Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets
the value to Unknown if the new offset is not the same as the old offset. The
instruction must now be moved from its current bin to the bin corresponding to
the new offset. This happens, for example, when:
- A PHINode has operands that result in different offsets.
- The same remote inst is reachable from the same local inst via different paths
in the callgraph:
```
A (local inst)
|
B
/ \
C1 C2
\ /
D (remote inst)
```
This fixes a bug where a store is incorrectly eliminated in a lit test.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136526
Do not duplicate a BB if it has a lot of PHI nodes.
If a threadable chain is too long then the number of duplicated PHI nodes
can add up, leading to a substantial increase in compile time when rewriting
the SSA.
Fixes https://github.com/llvm/llvm-project/issues/58203
Differential Revision: https://reviews.llvm.org/D136716
The threshold of 76 in this patch is reasonably high and reduces the compile
time of cldwat2m_macro.f90 in SPEC2017/cam4 from 80+min to <2min.
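A sketch of such a guard (the value 76 comes from this message; the helper name is ours, not the patch's):
```cpp
#include "llvm/IR/BasicBlock.h"
#include <iterator>
using namespace llvm;

// Refuse to duplicate blocks whose PHI count would make SSA rewriting of a
// long threadable chain too expensive.
static constexpr unsigned PHIDuplicateThreshold = 76;

static bool hasTooManyPHIsToDuplicate(const BasicBlock &BB) {
  auto Phis = BB.phis();
  return (unsigned)std::distance(Phis.begin(), Phis.end()) >
         PHIDuplicateThreshold;
}
```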
The transform was just added with:
115d2f69a5
...but as noted in post-commit feedback, it was
confusingly coded. Now, we create the final
expected canonicalized form directly and put
an extra use check on the match, so we should
not ever end up with more instructions.
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.
It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.
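A self-contained sketch of the masking idea (a toy mirror of LLVM's `ModRefInfo`, not the real type):
```cpp
#include <cstdint>

// Mod/Ref lattice as bits: the mask is an upper bound on an access's effect,
// so intersecting with '&' can only tighten the result.
enum ModRefInfo : uint8_t { NoModRef = 0, Ref = 1, Mod = 2, ModRef = Ref | Mod };

constexpr ModRefInfo operator&(ModRefInfo A, ModRefInfo B) {
  return ModRefInfo(uint8_t(A) & uint8_t(B));
}

// A call that may read and write, masked by locally-invariant memory
// (mask = Ref), leaves only a read; masked by globally-invariant memory
// (mask = NoModRef), leaves no effect at all.
static_assert((ModRef & Ref) == Ref, "locally-invariant: read remains");
static_assert((ModRef & NoModRef) == NoModRef, "globally-invariant: nothing");
```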
The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.
Differential Revision: https://reviews.llvm.org/D136659
https://alive2.llvm.org/ce/z/EfHlWN
In the motivating case from issue #58313,
this allows forming a duplicate 'not' op
which then gets CSE'd and simplifyCFG'd
and combined into the expected 'xor'.
When translating offset info from the callee at a call site, first check if the
offset is Unknown. Any offset in the caller should be added only if the callee
offset is valid.
Differential Revision: https://reviews.llvm.org/D137011
Loop Fusion (a function pass) requires loops in simplified form. With
the legacy PM, the loop-simplify pass is added as a dependency of
loop-fusion, but the new pass manager does not always ensure this form.
This patch invokes simplifyLoop() on loops that are not in simplified
form, for the new PM only.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D136781
The existing way of creating the predicate in the guard blocks uses
a boolean value per outgoing block. This increases the number of live
booleans as the number of outgoing blocks increases. The new way added
in this change is to store one integer to represent the outgoing block
we want to branch to; then at each guard block, an integer equality
check is performed to decide whether a specific outgoing block is taken.
Using an integer reduces the number of live values and decreases
register pressure especially in cases where there are a large number
of outgoing blocks. The integer based approach is used when the
number of outgoing blocks crosses a threshold, which is currently set
to 32.
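Schematically (plain C++ standing in for the structurizer's IR; block numbers are illustrative):
```cpp
// With booleans, each of the N outgoing blocks needs its own live i1 flag.
// With the integer scheme, one live value encodes the target, and each guard
// block performs a single equality test against its block's number.
int selectSuccessor(int TargetBlock) {
  if (TargetBlock == 0) return 100; // guard for outgoing block 0
  if (TargetBlock == 1) return 200; // guard for outgoing block 1
  return 300;                       // remaining case: outgoing block 2
}
```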
Patch by Ruiling Song.
Differential Revision: https://reviews.llvm.org/D127831
This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e
I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.
Currently, InstCombine can elide a memcpy from a constant to a local alloca if
that alloca is passed as a nocapture parameter to a *function* that's readnone
or readonly, but it can't forward the memcpy if the *argument* is marked
readonly nocapture, even though readonly guarantees that the callee won't
mutate the pointee through that pointer. This patch adds support for detecting
and handling such situations, which arise relatively frequently in Rust, a
frontend that liberally emits readonly.
A more general version of this optimization would use alias analysis to check
the call's ModRef info for the pointee, but I was concerned about blowing up
compile time, so for now I'm just checking for one of readnone on the function,
readonly on the function, or readonly on the parameter.
Differential Revision: https://reviews.llvm.org/D136822
InstCombine can replace memcpy to an alloca with a pointer directly to the
source in certain cases. Unfortunately, it also did so for volatile memcpys.
This patch makes it stop doing that.
This was discovered in D136822.
Differential Revision: https://reviews.llvm.org/D137031
Replace custom code to check if only the first lane is used by generic
helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few
additional cases and was suggested in D133760.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D136368
X * ((1 << Z) + 1) --> (X << Z) + X
https://alive2.llvm.org/ce/z/P-7WK9
It's possible that we could do better with propagating
no-wrap, but this carries over the existing logic and
appears to be correct.
The naming differences on the existing folds are a result
of using getName() to set the final value via Builder.
That makes it easier to transfer no-wrap rather than the
gymnastics required from the raw create instruction APIs.
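As a sanity check, the identity holds under wrapping unsigned arithmetic; here is an exhaustive 8-bit verification (our test, not part of the patch):
```cpp
#include <cassert>
#include <cstdint>

int main() {
  // X * ((1 << Z) + 1) == (X << Z) + X for all 8-bit X and in-range Z.
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Z = 0; Z < 8; ++Z) {
      uint8_t Lhs = uint8_t(X * ((1u << Z) + 1));
      uint8_t Rhs = uint8_t(uint8_t(X << Z) + X);
      assert(Lhs == Rhs);
    }
  return 0;
}
```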
The `FunctionSpecialization` pass needs loop analysis results for its
cost function. For this purpose, it computes the `DominatorTree` and
`LoopInfo` for a function in `getSpecializationBonus`. This function,
however, is called O(number of call sites x number of arguments) times, but
the DominatorTree/LoopInfo can be computed just once.
This patch plugs into the PassManager infrastructure to obtain
LoopInfo for a function and removes the ad-hoc computation from
`getSpecializationBonus`.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136332
[recommitting after recommitting a dependency]
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` )
only once per argument, i.e. doing N-args checks instead of
N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls
lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
In InstCombine we treat i8/i16 as desirable, even if they are not legal.
The current logic in shouldChangeType will decide to convert from an
illegal but desirable type (such as an i8) to an illegal and undesirable
type (such as i3). This patch prevents changing the switch conditions to
an irregular type on targets like Arm/AArch64 where i8/i16 are not legal.
This is the same issue as https://reviews.llvm.org/D54115. In the case I
was looking at, it was converting an i32 switch to an i8 switch, which
then became an i3 switch.
Differential Revision: https://reviews.llvm.org/D136763
There's no reason to shrink a constant or simplify
an operand in 2 steps.
This matches what we currently do for 'add' (although that
seems like it should be altered to handle the commutative
case).
LSR may suggest a less profitable transformation for the loop. This
patch adds a check to prevent LSR from generating worse code than what
we already have.
Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution' and defaults to disabled for now.
The next step should be extending a TTI interface to allow target(s)
to enable this enhancement.
A debug log is added to remind the user of the choice to skip the LSR
solution.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D126043
When calculating the specialization bonus for a given function argument,
we recursively traverse the chain of (certain) users, accumulating the
instruction costs. Then we exponentially increase the bonus to account
for loop nests. This is problematic for two reasons: (a) the users might
not themselves be inside the loop nest; (b) if they are, we are accounting
for it multiple times. Instead we should be adjusting the bonus before
traversing the user chain.
This reduces the instruction count for CTMark (newPM-O3) when Function
Specialization is enabled, without actually reducing the number of
specializations performed (geomean: -0.001% non-LTO, -0.406% LTO).
Differential Revision: https://reviews.llvm.org/D136692
This is copying the code that was added for 'add' with D130075.
(That patch removed a fallthrough in the cases, but we can
probably still share at least some code again as a follow-up
cleanup; I didn't want to risk it here.)
The reasoning is similar to the carry propagation for 'add':
if we don't demand low bits of the subtraction and the
subtrahend (aka RHS or operand 1) is known zero in those low
bits, then there can't be any borrowing required from the
higher bits of operand 0, so the low bits don't matter.
Also, the no-wrap flags can be propagated (and I think that
should be true for add too).
Here's an attempt to prove that in Alive2:
https://alive2.llvm.org/ce/z/xqh7Pa
(can add nsw or nuw to src and tgt, and it should still pass)
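The borrow argument can also be checked exhaustively at 8 bits, demanding only the top nibble with the subtrahend's low nibble known zero (our test, not part of the patch):
```cpp
#include <cassert>
#include <cstdint>

int main() {
  // If b's low 4 bits are zero, the top 4 bits of (a - b) never depend on
  // the low 4 bits of a: no borrow can propagate out of the low nibble.
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; b += 16) { // low nibble of b is zero
      uint8_t Full = uint8_t(a - b);
      uint8_t LowCleared = uint8_t((a & 0xF0) - b); // low bits of a zeroed
      assert((Full & 0xF0) == (LowCleared & 0xF0));
    }
  return 0;
}
```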
Differential Revision: https://reviews.llvm.org/D136788
[fixed test to work with reverse iteration]
The `FunctionSpecialization` pass has support for specialising
functions which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
It also changes the criteria for picking an actual argument to
specialise; the argument must:
* be an LLVM IR constant, or
* have a `constant` lattice value, or
* have a `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
The struct OffsetAndSize is a simple tuple of two int64_t. Treating it as a
derived class of std::pair has no special benefit, but it makes the code
verbose since we need get/set functions that avoid using "first" and "second" in
client code. Eliminating the std::pair makes this more readable.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136745
When a profile is stale and a profile mismatch could happen, the mismatched samples are discarded, so we'd like to compute mismatch metrics to quantify how stale the profile is; a high number suggests the user should refresh the profile.
Two sets of metrics are introduced here:
- (Num_of_mismatched_funchash / Total_profiled_funchash), (Samples_of_mismatched_func_hash / Samples_of_profiled_function): this leverages the checksum attribute of FunctionSamples, which is a feature of pseudo probes. When the source code CFG changes, the function checksums differ, and the sample loader later discards the whole function's samples; this metric shows the percentage of samples discarded for that reason.
- (Num_of_mismatched_callsite / Total_profiled_callsite), (Samples_of_mismatched_callsite / Samples_of_profiled_callsite): this shows how much mismatch there is for callsite locations; a callsite location mismatch affects inlining, which is highly correlated with performance. It goes through all the callsite locations in the IR and the profile, matches them by call target name, and reports the number of samples in the profile that don't match an IR callsite.
This is implemented in a new class (SampleProfileMatcher) and guarded by a switch (`--report-profile-staleness`); we plan to extend it with a fuzzy profile matching feature in the future.
Reviewed By: hoy, wenlei, davidxl
Differential Revision: https://reviews.llvm.org/D136627
If we run LTO optimization we might end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a
follow-up will introduce a check for the SPMD conversion, this already
prevents the eager custom state machine generation: only if the kernel init
function is defined, rather than declared, will we emit a custom state
machine. SPMD-ization can happen eagerly though. Tests are adjusted via a
weak definition. The LTO
test was added to verify this works as expected.
Differential Revision: https://reviews.llvm.org/D136740
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Then reverted again because it broke tests on macOS; they should be
fixed now.
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
This patch reorders the traversal of function call sites and function
formal parameters to:
* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.
* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D135968
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function
void f(int x, int y)
a call like `f(1, 2)` could be matched by either
void f.1(int x /* int y == 2 */);
or
void f.2(/* int x == 1, int y == 2 */);
The `FunctionSpecialisation` pass tries to match specialisations in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.
This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.
Reviewed By: ChuanqiXu, labrinea
Differential Revision: https://reviews.llvm.org/D136180
The `FunctionSpecialization` pass has support for specialising
functions which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` . There are a few
issues with the implementation, though:
* even with the default, the pass will still specialise based on
floating-point literals
* even when it's enabled, the pass will specialise only for the `i1`
type (or `i2` if all of the possible 4 values occur, or `i3` if all
of the possible 8 values occur, etc)
The reason for this is an incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.
This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:
* unknown or undef
* a constant
* a constant range with a single element
on the basis that specialisation is pointless for those cases.
It also changes the criteria for picking an actual argument to
specialise; the argument must:
* be an LLVM IR constant, or
* have a `constant` lattice value, or
* have a `constantrange` lattice value with a single element.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135893
When collecting the possible constant arguments with which to
specialise a function, the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.
With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135867
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or to leave a huge remainder.
In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.
In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.
As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.
As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.
Differential Revision: https://reviews.llvm.org/D136695
Small functions with size under a given threshold are not
considered for specialisation, on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135862
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove DataFlowSanitizerLegacyPass.
Differential Revision: https://reviews.llvm.org/D124594
This reverts commit bd7949bcd8.
Revert this patch since reviewers have different opinions regarding
the approach in post-commit review.
Will open RFC for further discussion.
Differential Revision: https://reviews.llvm.org/D132408
This reverts commit 04877284b4.
Looks like this is still breaking the test
Profile-x86_64 :: instrprof-darwin-dead-strip.c
(see comment on https://reviews.llvm.org/D135340).
This addresses a bug where vector versions of ctlz are creating false positive reports.
Depends on D136369
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D136523
When -asan-max-inline-poisoning-size=0, all shadow memory access should be
outlined (through asan calls). This was not occurring when partial poisoning
was required on the right side of a variable's redzone. This diff contains
the changes necessary to implement and utilize __asan_set_shadow_01() through
__asan_set_shadow_07(). The change is necessary for the full abstraction of
the asan implementation and will enable experimentation with alternate strategies.
Differential Revision: https://reviews.llvm.org/D136197
This is a sibling transform to the fold just above it. That was changed
to allow the corresponding commuted patterns with:
3073074562e1bd759ea58628e6df70
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Improve O(N^2) to O(N) in some cases, and reduce the number of allocations
by reserving memory.
Also, improve the analysis of load reduction values to avoid analyzing
non-vectorizable cases.
This teaches the SCCP Solver how to constant fold more intrinsics. Constant
folding appears to be just as good as D115737 but with much, much lower
code-change impact, as suggested by nikic.
The constrained floating-point intrinsics all take at least one metadata
argument and were the motivation for the change.
Differential Revision: https://reviews.llvm.org/D136466
(sign|resign) + (auth|resign) can be folded by omitting the middle
sign+auth component if the key and discriminator match.
Differential Revision: https://reviews.llvm.org/D132383
Without a freeze, this transform can leak poison to the output:
https://alive2.llvm.org/ce/z/GJuF9i
This makes the transform as uniform as possible, and it can help
reduce patterns like issue #58313 (although that particular
example probably still needs another transform).
Differential Revision: https://reviews.llvm.org/D136527
This doesn't touch objc-arc-contract because that's in the codegen pipeline.
However, this does move its corresponding initialize function into initializeCodegen().
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D135041
When the common value is part of either select condition,
this is safe to reduce. Otherwise, it is not poison-safe
(with the select form of the pattern):
https://alive2.llvm.org/ce/z/FxQTzB
This is another patch motivated by issue #58313.
1) Use a static array of pointers to retain the dummy vars.
2) Associate liveness of the array with that of the runtime hook variable
__llvm_profile_runtime.
3) Perform the runtime initialization through the runtime hook variable.
4) Preserve the runtime hook variable using the -u linker flag.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D136192
This is obviously correct for real logic instructions,
and it also works for the poison-safe variants that use
selects:
https://alive2.llvm.org/ce/z/wyHiwX
This is motivated by the lack of 'xor' folding seen in issue #58313.
This more general fold should help reduce some of those patterns,
but I'm not sure if this specific case does anything for that
particular example.
This allows the regular bitwise logic opcodes in addition to the
poison-safe select variants:
https://alive2.llvm.org/ce/z/8xB9gy
Handling commuted variants safely is likely trickier, so that's
left to another patch.
Additional SCEV verification highlighted a case where the cached loop
dispositions were incorrect after simplifying a condition in IndVars
and moving the user in LoopDeletion. Fix it by invalidating ICmp and all
its users.
Fixes #58515.
This implements IR and bitcode support for the memory attribute,
as specified in https://reviews.llvm.org/D135597.
The new attribute is not used for anything yet (and as such, the
old memory attributes are unaffected).
Differential Revision: https://reviews.llvm.org/D135592
Currently, compiling a program with the `-pg` flag will result in an
undefined symbol error for `.mcount`. This revision fixes the call to
use `__mcount`, which requires a pointer argument to a pointer-sized
object (unique per inserted call) on AIX.
This is only a partial fix. This patch should fix the `-pg` flag's
behaviour on AIX to work with code you are compiling, but it will not
link against standard libraries with `mcount` instrumentation calls. The
next step is to add profiled libraries to the linker search paths in the
Clang driver for the AIX toolchain when linking with `-pg`.
Differential Revision: https://reviews.llvm.org/D135384
Canonicalize GEP of GEP by swapping a GEP with some suffix constant indices to the back (and a GEP with all constant indices to the back of that); this allows more constant-index GEP merging to happen. Exceptions are when swapping would violate use-def relations or anti-optimize LICM.
For constant-indexed GEP of GEP, if they cannot be merged directly, they will be cast to i8* and merged.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D125845
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.
Suggested by @Ayal in D133760.
Reapplying after the fix for volatile modelling in D135863.
-----
Don't add argmem if the pointer is clearly not an argument (e.g.
a global). I don't think this makes a difference right now, but
gives more obvious results with D135780.
Bug introduced in e239198cdb.
The assert() makes the assumption that the resulting shuffle mask
will always select elements from both vectors; this is untrue in the
case of two shuffles being folded if the former shuffle has a mask with
undef elements in it. In such a case, folding the shuffles might result
in a mask which only selects from one of the vectors, because the other
elements (in the mask) are undef.
Differential Revision: https://reviews.llvm.org/D136256
Per LangRef, volatile operations are allowed to access the location
of their pointer argument, plus inaccessible memory:
> Any volatile operation can have side effects, and any volatile
> operation can read and/or modify state which is not accessible
> via a regular load or store in this module.
> [...]
> The allowed side-effects for volatile accesses are limited. If
> a non-volatile store to a given address would be legal, a volatile
> operation may modify the memory at that address. A volatile
> operation may not modify any other memory accessible by the
> module being compiled. A volatile operation may not call any
> code in the current module.
FunctionAttrs currently does not model this and ends up marking
functions with volatile accesses on arguments as argmemonly,
even though they should be inaccessiblemem_or_argmemonly.
Differential Revision: https://reviews.llvm.org/D135863
This reverts commit 8ef3fd8d59.
I mentioned that GlobalAlias was not handled. It turns out GlobalAlias has to be handled in the same patch (as opposed to in a follow-up),
as otherwise clang codegen of C5/D5 constructor/destructor would regress (https://reviews.llvm.org/D135427#3869003).
@Ayal suggested a better-named helper than using `!getDef()` to check if
a value is invariant across all parts.
The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133666
This patch extends the load merge/widen in AggressiveInstCombine() to handle reverse load patterns.
Differential Revision: https://reviews.llvm.org/D135137
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.
Generalized the cost model estimation: improved the estimate for
repeated scalars (their cost no longer needs to be counted) and
improved the cost model for extractelement instructions.
cpu2017
511.povray_r 0.57
520.omnetpp_r -0.98
521.wrf_r -0.01
525.x264_r 3.59 <+
526.blender_r -0.12
531.deepsjeng_r -0.07
538.imagick_r -1.42
Geometric mean: 0.21
Differential Revision: https://reviews.llvm.org/D115757
Fixes an SROA crash.
This is fallout from opaque pointers: with typed pointers, we would have
bailed out at the bitcast.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136119
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.
Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the start expression S was removed from BlockDispositions in the loop
but not the current values. This was also exposed by the new test case.
Fixes #58439.
When SimplifyLibCalls fails to optimize printf and sprintf, it adds
NoUndef/NonNull/Dereferenceable attributes. This patch adds the same
attributes when SimplifyLibCalls optimizes printf/sprintf into the
integer-only iprintf/siprintf.
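A sketch of the intended result (the exact attribute placement here is
illustrative, not copied from the patch):
```
@.fmt = private unnamed_addr constant [4 x i8] c"%d\0A\00"
declare i32 @iprintf(ptr, ...)

define void @f(i32 %x) {
  ; the simplified call carries the pointer attributes that the original
  ; printf call would have received
  %r = call i32 (ptr, ...) @iprintf(ptr noundef nonnull dereferenceable(4) @.fmt, i32 %x)
  ret void
}
```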
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136140
When unrolling, the exit values in LCSSA phis will get updated.
Invalidate cached SCEV values for those phis in case SCEV looked through
an exit phi.
Fixes #58340.
If the arithmetic for indices of inbounds GEPs overflows, the result is
poison, so it is also OK for the coefficients to overflow. GEP
decomposition is limited to cases where the index size is <= 64 bits,
so all offsets can be represented by the int64_t used for the
coefficients in the constraint system.
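For instance (a hypothetical case):
```
define ptr @f(ptr %p, i64 %off) {
  ; the byte offset is 4 * %off; with inbounds, overflow in that
  ; arithmetic makes %gep poison anyway, so the int64_t coefficient in
  ; the constraint system is allowed to wrap as well
  %gep = getelementptr inbounds i32, ptr %p, i64 %off
  ret ptr %gep
}
```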
This reverts commit 233659c7ae.
I see some sanitizer buildbot failures. Not sure if this change is
causing them, but let's see if a revert returns the bots to green...
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).
By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
Relative to the previous attempt, this is rebased over the
InstSimplify fix in ac74e7a780,
which addresses the miscompile reported in PR58401.
-----
foldOpIntoPhi() currently only folds operations into the phi if all
but one operands constant-fold. The two exceptions to this are freeze
and select, where we allow more general simplification.
This patch makes foldOpIntoPhi() generally simplification based and
removes all the instruction-specific logic. We just try to simplify
the instruction for each operand, and for the (potentially) one
non-simplified operand, we move it into the new block with adjusted
operands.
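A sketch of the general shape (hypothetical IR, not a test from the
patch):
```
define i32 @f(i1 %c, i32 %x) {
entry:
  br i1 %c, label %if, label %join
if:
  br label %join
join:
  ; 'or %phi, 7' simplifies for the -1 input (to -1) but not for %x
  %phi = phi i32 [ %x, %if ], [ -1, %entry ]
  %r = or i32 %phi, 7
  ret i32 %r
}
; after the fold (sketch), the 'or' moves into the predecessor block
; with the non-simplified operand:
;   if:   %r.if = or i32 %x, 7
;   join: %r = phi i32 [ %r.if, %if ], [ -1, %entry ]
```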
This fixes https://github.com/llvm/llvm-project/issues/57448, which
was my original motivation for the change.
Differential Revision: https://reviews.llvm.org/D134954
LoopFlatten has been in the code base off by default for years; this
enables it to run by default. Downstream, it has been running for years,
so it has been exposed to quite a lot of code. Around the time we
switched to the NPM, several fixes went in related to updating the
MemorySSA state, and we moved it to a loop pass manager, which both
helped prevent rerunning certain analysis passes and thus helped a bit
with compile-times.
Regarding compile-times, adding a pass isn't free, but this should see
only very minor increases. The pass is relatively simple, and there
shouldn't be anything algorithmically expensive, because all it does is
look at inner/outer loops and check assumptions on loop increments and
indices. If we do see increases, I expect them to come mainly from
invalidation of analysis info, and perhaps from subsequent passes
triggering and doing more. Despite its simplicity/restrictions, it
triggers in most code-bases, which makes it worth enabling by default.
Differential Revision: https://reviews.llvm.org/D109958
Another alternative to fix the thread identification problem in
coroutines.
We plan to fix this problem by unifying the memory-effect attributes. See
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
But that may be a long-term project, and it is a pity that coroutines
haven't been able to resume in different threads for years. So this is a
temporary fix. It may cause unnecessary performance regressions for
coroutines, but correctness is more important. This is planned to be
reverted once we actually unify the memory-effect attributes.
Reviewed By: jdoerfert, rjmccall
Differential Revision: https://reviews.llvm.org/D135550
Instead of duplicating the existing decomposition code for GEP indices,
reuse it by calling the decompose function on the index expression and
multiplying the result's coefficients by the scale of the index.
This both reduces code duplication and generalizes the pattern we can
handle.
If both operands of an `add nsw` are known positive, it can be treated
the same as `add nuw` and added to the unsigned system.
https://alive2.llvm.org/ce/z/6gprff
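A hypothetical example that becomes provable with this change:
```
declare void @llvm.assume(i1)

define i1 @f(i32 %a, i32 %b) {
  %a.pos = icmp sgt i32 %a, 0
  call void @llvm.assume(i1 %a.pos)
  %b.pos = icmp sgt i32 %b, 0
  call void @llvm.assume(i1 %b.pos)
  ; both operands are positive, so this 'add nsw' cannot wrap in the
  ; unsigned sense either and can be added to the unsigned system
  %sum = add nsw i32 %a, %b
  %cmp = icmp uge i32 %sum, %a
  ret i1 %cmp   ; can now be folded to true
}
```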
If the divisor is a power-of-2 or negative-power-of-2 and the dividend
is known to have at least as many trailing zeros as the divisor, the
division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)
This isn't the most direct optimization (we could create ashr in these
examples instead of relying on existing folds for exact divides), but
it's possible that there's a more general constraint than just a pow2
divisor, so this might be extended in the future.
This should solve issue #58348.
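For example (a sketch; the 'exact' flag then lets existing folds turn
the division into a shift):
```
define i32 @f(i32 %x) {
  %shl = shl i32 %x, 3     ; at least 3 trailing zero bits
  %div = sdiv i32 %shl, 8  ; remainder is always 0, so this can be
                           ; marked 'sdiv exact'
  ret i32 %div
}
```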
Differential Revision: https://reviews.llvm.org/D135970
This should be functionally equivalent - both calls are thin
wrappers around computeKnownBits(). We'll probably want to use
known-bits directly in follow-up patches because that could
determine "exact" for example (see issue #58348).
Support decomposition for `mul/shl nuw` with constant operand for unsigned
queries. Those expressions should not wrap in the unsigned sense and can
be added directly to the unsigned system.
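For instance (hypothetical):
```
define i1 @f(i8 %a) {
  ; 'shl nuw' by 2 is exactly 4 * %a with no unsigned wrap, so the
  ; equality %s == 4 * %a can be added to the unsigned system
  %s = shl nuw i8 %a, 2
  %c = icmp uge i8 %s, %a
  ret i1 %c   ; provable: 4 * %a >= %a when the shift cannot wrap
}
```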
makeLoopInvariant may recursively move its operands to make them
invariant, before moving the passed in instruction. Those recursively
moved instructions are currently missed when invalidating block and loop
dispositions.
To address this, move the invalidation code to Loop::makeLoopInvariant.
Fixes #58314.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135909
`ProvenanceAnalysis::related()` was assuming that the order of the
parameters to `relatedCheck()` did not affect the result, but this was
not the case when both parameters were `PHINode`s.
Due to this assumption, `ProvenanceAnalysis::related()` was ordering the
parameters based on pointer value, which resulted in non-deterministic
behavior.
To address this, change `relatedPHI()` so that it gives the same result
independent of the parameter order.
rdar://100325456
Differential Revision: https://reviews.llvm.org/D135376
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.
For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135977
A BB with a nonzero count whose successor blocks all have 0 counts
could trigger an assertion. Don't create any branch weights in this
case, avoiding the assertion.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D134203
(A | B) & ~(A & B) --> A ^ B
https://alive2.llvm.org/ce/z/qpFMns
We already have the equivalent fold for real
logic instructions, but this pattern may occur
with selects too.
This is part of solving issue #58313.
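The select (poison-safe logical) form of the pattern looks like this
(sketch):
```
define i1 @f(i1 %a, i1 %b) {
  %or  = select i1 %a, i1 true, i1 %b     ; logical A | B
  %and = select i1 %a, i1 %b, i1 false    ; logical A & B
  %not = xor i1 %and, true
  %r   = select i1 %or, i1 %not, i1 false ; (A | B) & ~(A & B)
  ret i1 %r                               ; --> xor i1 %a, %b
}
```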
This was reverted because it broke Darwin targets, which tried to
export these symbols that are now hidden. It should be safe to simply
stop attempting to export these symbols in the clang driver, though
Apple folks will need to change their TAPI allow list described in the
commit where these symbols were originally exported:
f538018562
Bug: https://github.com/llvm/llvm-project/issues/58265
Differential Revision: https://reviews.llvm.org/D135340
Move logic to check and replace conditions to a helper function. This
isolates the code, allows using early returns, reduces the
indentation and simplifies eliminateConstraints.
When generating masked gather nodes, the SLP vectorizer accounts for
the cost of the GEPs for loads as part of the scalar-to-vector
transformation cost estimation. But it does not do so for vectorized
loads/stores, even though vectorization may remove some of the GEPs
entirely. Because of this, in some cases a masked gather operation can
appear much more profitable than regular vectorization (masked-gather
cost + vector GEP - scalar loads + GEPs, compared to vectorized loads -
scalar loads).
Added analysis of the removed scalar GEPs for vectorized load/store
nodes for better cost estimation.
Differential Revision: https://reviews.llvm.org/D135282
This reverts commit b05f5b90a1.
There are thread sanitizer buildbot failures in simple_stack.c.
I think that's because this ended up affecting the handling of
volatile accesses to allocas. Reverting for now.
Limit pointer decomposition to pointers with index sizes of at most 64
bits. int64_t is used for coefficients, so as long as the index size <=
64 bits we should be able to represent all pointer offsets.
Pointer decomposition is limited to inbounds GEPs, where an index
computation that overflows yields poison, so it doesn't matter if the
coefficient overflows.
This allows replacing MulOverflow with regular multiplications.
We have to account for accesses to argument memory via captures.
I don't think there's any way to make this produce incorrect
results right now (because as soon as "other" is set, we lose the
ability to infer argmemonly), but this avoids incorrect results
once we have a more precise representation.
The code for inferring memory attributes on arguments claims that
inalloca/preallocated arguments are always clobbered:
d71ad41080/llvm/lib/Transforms/IPO/FunctionAttrs.cpp (L640-L642)
However, we would still infer memory attributes for the whole
function without taking this into account, so we could still end
up inferring readnone for the function. This adds an argument
clobber if there are any inalloca/preallocated arguments.
Differential Revision: https://reviews.llvm.org/D135783
Add a check (which can be disabled via a flag) that the pipeline we
generate is actually parsable. It can be disabled because we don't
expect to handle every pass in -print-pipeline-passes.
Fixes #58280.
Reviewed By: ChuanqiXu
Differential Revision: https://reviews.llvm.org/D135703
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/CLKzqT
This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.
This extends 008a89037a and is another transform
derived from issue #58137.
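A sketch with no-wrap flags on both shifts (the alive2 link above has
the precise preconditions):
```
define i8 @f(i8 %x, i8 %y, i8 %z) {
  %xs = shl nuw nsw i8 %x, %z
  %ys = shl nuw nsw i8 %y, %z
  ; folds to 'sdiv i8 %x, %y'; the nuw requirement guards against
  ; creating an INT_MIN / -1 overflow that the original could not hit
  %d = sdiv i8 %xs, %ys
  ret i8 %d
}
```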
Need to set the insert point for extractelement to the first
instruction in the node to avoid a possible crash during the
external-uses combining process. Without it, we may end up with an
incorrect transformation.
Differential Revision: https://reviews.llvm.org/D135591
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/E5eaxU
This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
The freeze instruction in some cases makes codegen worse, so we need to
be very careful when emitting it. Instead, improve the analysis in the
isUndefVector function to generate a mask of unused elements and use it
in the analysis.
Differential Revision: https://reviews.llvm.org/D135382
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D133762