llvm-project

Commit Graph

Author	SHA1	Message	Date
Congzhe Cao	c42772752a	[CodeMoverUtils] Enhance isSafeToMoveBefore() when control flow equivalence is satisfied With improved analysis in determining CFG equivalence that does not require strict dominance and post-dominance conditions, we now relax isSafeToMoveBefore() such that an instruction I can be moved before InsertPoint even if they do not strictly dominate each other, as long as they follow the same control flow path. For example, we can move Instruction 0 before Instruction 1, and vice versa. ``` if (cond1) // Instruction 0: %add = add i32 1, 2 if (cond1) // Instruction 1: %add2 = add i32 2, 1 ``` Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110456	2021-09-27 18:37:36 -04:00
modimo	20faf78919	[ThinLTO] Add noRecurse and noUnwind thinlink function attribute propagation Thinlink provides an opportunity to propagate function attributes across modules, enabling additional propagation opportunities. This change propagates (currently default off, turn on with `disable-thinlto-funcattrs=1`) noRecurse and noUnwind based off of function summaries of the prevailing functions in bottom-up call-graph order. Testing on clang self-build: 1. There's a 35-40% increase in noUnwind functions due to the additional propagation opportunities. 2. Throughput is measured at 10-15% increase in thinlink time which itself is 1.5% of E2E link time. Implementation-wise this adds the following summary function attributes: 1. noUnwind: function is noUnwind 2. mayThrow: function contains a non-call instruction that `Instruction::mayThrow` returns true on (e.g. windows SEH instructions) 3. hasUnknownCall: function contains calls that don't make it into the summary call-graph thus should not be propagated from (e.g. indirect for now, could add no-opt functions as well) Testing: Clang self-build passes and 2nd stage build passes check-all ninja check-all with newly added tests passing Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D36850	2021-09-27 12:28:07 -07:00
Sanjay Patel	fdba1dccbe	[InstCombine] reduce code for shl-of-sub transform; NFC	2021-09-27 14:56:01 -04:00
Sanjay Patel	623f93ed1c	[InstCombine] add use check to shl transform This bug was introduced with the refactoring in: `9075edc89b` ...but there were no tests to detect it.	2021-09-27 14:10:26 -04:00
Jameson Nash	e27a6db529	Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location We see that it might otherwise do: %10 = getelementptr {}, <2 x {}> %9, <2 x i32> <i32 10, i32 4> %11 = bitcast <2 x {}*> %10 to <2 x i64> ... %27 = extractelement <2 x i64> %11, i32 0 %28 = bitcast i64 %27 to <2 x i64>* store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2 Which is an out-of-bounds store (the extractelement got offset 10 instead of offset 4 as intended). With the fix, we correctly generate extractelement for i32 1 and generate correct code. Differential Revision: https://reviews.llvm.org/D106613	2021-09-27 14:06:13 -04:00
Kazu Hirata	59540b29f8	[InstCombine] Fix an "unused variable" warning	2021-09-27 09:49:32 -07:00
Sanjay Patel	9075edc89b	[InstCombine] move shl-only folds out from under commonShiftTransforms(); NFCI This is no-functional-change-intended, but it hopefully makes things slightly clearer and more efficient to have transforms that require 'shl' be called only from visitShl(). Further cleanup is possible.	2021-09-27 12:09:47 -04:00
Sanjay Patel	21429cf43a	[InstCombine] generalize fold for (trunc (X u>> C1)) u>> C This is another step towards trying to re-apply D110170 by eliminating conflicting transforms that cause infinite loops. `a47c8e40c7` was a previous patch in this direction. The diffs here are mostly cosmetic, but intentional: 1. The existing code that would handle this pattern in FoldShiftByConstant() is limited to 'shl' only now. The formatting change to IsLeftShift shows that we could move several transforms into visitShl() directly for efficiency because they are not common shift transforms. 2. The tests are regenerated to show new instruction names to prove that we are getting (almost) identical logic results. 3. The one case where we differ ("trunc_sandwich_small_shift1") shows that we now use a narrow 'and' instruction. Previously, we relied on another transform to do that, but it is limited to legal types. That seems to be a legacy constraint from when IR analysis and codegen were less robust. https://alive2.llvm.org/ce/z/JxyGA4 declare void @llvm.assume(i1) define i8 @src(i32 %x, i32 %c0, i8 %c1) { ; The sum of the shifts must not overflow the source width. %z1 = zext i8 %c1 to i32 %sum = add i32 %c0, %z1 %ov = icmp ult i32 %sum, 32 call void @llvm.assume(i1 %ov) %sh1 = lshr i32 %x, %c0 %tr = trunc i32 %sh1 to i8 %sh2 = lshr i8 %tr, %c1 ret i8 %sh2 } define i8 @tgt(i32 %x, i32 %c0, i8 %c1) { %z1 = zext i8 %c1 to i32 %sum = add i32 %c0, %z1 %maskc = lshr i8 -1, %c1 %s = lshr i32 %x, %sum %t = trunc i32 %s to i8 %a = and i8 %t, %maskc ret i8 %a }	2021-09-27 10:57:31 -04:00
Sanjay Patel	025a805d7c	[InstCombine] match variable names and code comments; NFC Similar to: `29c09c7` Planned follow-up is to add a transform here to allow removing a common shift fold that is conflicting with D110170.	2021-09-27 10:57:31 -04:00
Sjoerd Meijer	eba76056a3	[FuncSpec] Don't specialise (or crash) on poison or constexpr values Function specialization was crashing on poison values and constexpr values. The problem is that these values are not added to the solver, so it crashes when a lookup is performed for these values. This fixes that by not specialising on these values. For poison that is obvious, but for constexpr this is a change in behaviour. Thus, in one way this is a bit of a stopgap, but specialising on constexpr values wasn't done very intentionally, and need some more work and tests if we wanted to support this. As a follow up, we need to look if the solver should exit more gracefully and return a "don't know", or that it should really support these constexprs. This should fix PR51600 (https://bugs.llvm.org/show_bug.cgi?id=51600). Differential Revision: https://reviews.llvm.org/D110529	2021-09-27 14:58:53 +01:00
Jun Ma	3a998c06a8	Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""" This reverts commit `8ba2adcf9e`.	2021-09-27 20:39:05 +08:00
Nikita Popov	14a49f5840	[DSE] Don't check getUnderlyingObject() return value (NFC) getUnderlyingObject() never returns null. It will simply return something that is not the "root" underlying object. Also drop a stale comment.	2021-09-26 18:01:26 +02:00
Nikita Popov	f3c74b72f4	[DSE] Make DSEState non-copyable (NFC) As it contains a self-reference, the default copy/move ctors would not be safe. Move the DSEState::get() method into the ctor to make sure no move occurs here even without NRVO. This is a speculative fix for test failures on llvm-clang-x86_64-expensive-checks-win.	2021-09-26 17:54:38 +02:00
Sanjay Patel	6063e6b499	[InstCombine] move add after min/max intrinsic This is another regression noted with the proposal to canonicalize to the min/max intrinsics in D98152. Here are Alive2 attempts to show correctness without specifying exact constants: https://alive2.llvm.org/ce/z/bvfCwh (smax) https://alive2.llvm.org/ce/z/of7eqy (smin) https://alive2.llvm.org/ce/z/2Xtxoh (umax) https://alive2.llvm.org/ce/z/Rm4Ad8 (umin) (if you comment out the assume and/or no-wrap, you should see failures) The different output for the umin test is due to a fold added with `c4fc2cb5b2` : // umin(x, 1) == zext(x != 0) We probably want to adjust that, so it applies more generally (umax --> sext or patterns where we can fold to select-of-constants). Some folds that were ok when starting with cmp+select may increase instruction count for the equivalent intrinsic, so we have to decide if it's worth altering a min/max. Differential Revision: https://reviews.llvm.org/D110038	2021-09-26 09:49:10 -04:00
Nikita Popov	ba664d9066	[AA] Move earliest escape tracking from DSE to AA This is a followup to D109844 (and alternative to D109907), which integrates the new "earliest escape" tracking into AliasAnalysis. This is done by replacing the pre-existing context-free capture cache in AAQueryInfo with a replaceable (virtual) object with two implementations: The SimpleCaptureInfo implements the previous behavior (check whether object is captured at all), while EarliestEscapeInfo implements the new behavior from DSE. This combines the "earliest escape" analysis with the full power of BasicAA: It subsumes the call handling from D109907, considers a wider range of escape sources, and works with AA recursion. The compile-time cost is slightly higher than with D109907. Differential Revision: https://reviews.llvm.org/D110368	2021-09-25 22:40:41 +02:00
Nikita Popov	327bbbb10b	[DSE] Make capture check more precise It is sufficient that the object has not been captured before the load that produces the pointer we're loading. A capture after that can not affect the already loaded pointer. This is small part of D110368 applied separately.	2021-09-25 22:23:19 +02:00
Simon Pilgrim	5a14edd8ed	[InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold. We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078	2021-09-25 12:57:43 +01:00
Sanjay Patel	a47c8e40c7	[InstCombine] fold lshr(trunc(lshr X, C1)) C2 Only the multi-use cases are changing here because there's another fold that catches the simpler patterns. But that other fold is the source of infinite loops when we try to add D110170, so removing that is planned as a follow-up. Attempt to show the general proof in Alive2: https://alive2.llvm.org/ce/z/Ns1uS2 Note that the overshift fold-to-zero tests are not currently handled by instsimplify. If they were, we could assert that the shift amount sum is less than the source bitwidth.	2021-09-24 15:44:07 -04:00
Sanjay Patel	29c09c7653	[InstCombine] match variable names and code comments; NFC	2021-09-24 15:44:07 -04:00
Teresa Johnson	96cb97c453	[ThinLTO] Update combined index for SamplePGO indirect calls to locals In ThinLTO for locals we normally compute the GUID from the name after prepending the source path to get a unique global id. SamplePGO indirect call profiles contain the target GUID without this uniquification, however (unless compiling with -funique-internal-linkage-names). In order to correctly handle the call edges added to the combined index for these indirect calls, during importing and bitcode writing we consult a map of original to full GUID to identify the actual callee. However, for a large application this was consuming a lot of compile time as we need to do this repeatedly (especially during importing where we may traverse call edges multiple times). To fix this implement a suggestion in one of the FIXME comments, and actually modify the call edges during a single traversal after the index is built to perform the fixups once. I combined this fixup with the dead code analysis performed on the index in order to avoid adding an additional walk of the index. The dead code analysis is the first analysis performed on the index. This reduced the time required for a large thin link with SamplePGO by about 20%. No new test added, but I confirmed that there are existing tests that will fail when no fixup is performed. Differential Revision: https://reviews.llvm.org/D110374	2021-09-24 12:29:49 -07:00
Hans Wennborg	1e9afab875	Re-apply "[JumpThreading] Ignore free instructions" It seems the crashes we saw wasn't caused by this (see comments on the review). > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `4604695d7c`.	2021-09-24 18:52:30 +02:00
Florian Hahn	6f28fb7081	Recommit "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts the revert commit `df56fc6ebb`. This version of the patch adjusts the location where the EarliestEscapes cache is cleared when an instruction gets removed. The earliest escaping instruction does not have to be a memory instruction. It could be a ptrtoint instruction like in the added test @earliest_escape_ptrtoint, which subsequently gets removed. We need to invalidate the EarliestEscape entry referring to the ptrtoint when deleting it. This fixes the crash mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6	2021-09-24 17:13:27 +01:00
Sanjay Patel	3c5500907b	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)" This reverts commit `bb9333c350`. This exposes another existing bug that causes an infinite loop as shown in D110170 ...so reverting while I look at another fix.	2021-09-24 10:47:35 -04:00
Hans Wennborg	4604695d7c	Revert "[JumpThreading] Ignore free instructions" It caused compiler crashes, see comment on the code review for repro. > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `1e3c6fc7cb`.	2021-09-24 16:14:53 +02:00
Nico Weber	df56fc6ebb	Revert "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts commit `5ce89279c0`. Makes clang crash, see comments on https://reviews.llvm.org/D109844	2021-09-24 09:57:59 -04:00
Congzhe Cao	751be2a064	[CodeMoverUtils] Enhance isSafeToMoveBefore() when moving BBs When moving an entire basic block BB before InsertPoint, currently we check for all instructions whether the operands dominates InsertPoint, however, this can be improved such that even an operand does not dominate InsertPoint, as long as it appears as a previous instruction in the same BB, it is safe to move. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110378	2021-09-24 05:48:15 -04:00
David Sherwood	c2634fc6ab	[Analysis] Fix issues when querying vscale attributes on functions There are several places in the code that are currently broken as they assume an Instruction always has a parent Function when attempting to get the vscale_range attribute. This patch adds checks that an Instruction has a parent. I've added a test for a parentless @llvm.vscale intrinsic call here: unittests/Analysis/ValueTrackingTest.cpp Differential Revision: https://reviews.llvm.org/D110158	2021-09-24 09:58:10 +01:00
Nikita Popov	1e3c6fc7cb	[JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290	2021-09-23 18:28:36 +02:00
Fangrui Song	1a6e1ee42a	Resolve {GlobalValue,GloalIndirectSymol}::getBaseObject confusion While both GlobalAlias and GlobalIFunc are GlobalIndirectSymbol, their `getIndirectSymbol()` usage is quite different (GlobalIFunc's resolver is an entity different from GlobalIFunc itself). As discussed on https://lists.llvm.org/pipermail/llvm-dev/2020-September/144904.html ("[IR] Modelling of GlobalIFunc"), the name `getBaseObject` is confusing when used with GlobalIFunc. To resolve the confusion: * Move GloalIndirectSymol::getBaseObject to GlobalAlias:: (GlobalIFunc should use `getResolver` instead) * Change GlobalValue::getBaseObject not to inspect GlobalIFunc. Note: the function has 7 references. * Add GlobalIFunc::getResolverFunction to peel off potential ConstantExpr indirection (`strlen` in `test/LTO/Resolution/X86/ifunc.ll`) Note: GlobalIFunc::getResolver (like GlobalAlias::getAliasee which does not peel off ConstantExpr indirection) is kept to be used by ValueEnumerator. Reviewed By: ibookstein Differential Revision: https://reviews.llvm.org/D109792	2021-09-23 09:23:35 -07:00
Simon Pilgrim	5f2c53bdf4	Pass some DataLayout arguments by const-ref Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 15:50:31 +01:00
Sanjay Patel	bb9333c350	[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try) The 1st try at this was reverted because it caused an infinite loop in instcombine. That should be fixed after: `1cd6b44f26` (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-23 09:41:37 -04:00
Florian Hahn	5ce89279c0	[DSE] Track earliest escape, use for loads in isReadClobber. At the moment, DSE only considers whether a pointer may be captured at all in a function. This leads to cases where we fail to remove stores to local objects because we do not check if they escape before potential read-clobbers or after. Doing context-sensitive escape queries in isReadClobber has been removed a while ago in `d1a1cce5b1` to save compile-time. See PR50220 for more context. This patch introduces a new capture tracker, which keeps track of the 'earliest' capture. An instruction A is considered earlier than instruction B, if A dominates B. If 2 escapes do not dominate each other, the terminator of the common dominator is chosen. If not all uses cannot be analyzed, the earliest escape is set to the first instruction in the function entry block. If the query instruction dominates the earliest escape and is not in a cycle, then pointer does not escape before the query instruction. This patch uses this information when checking if a load of a loaded underlying object may alias a write to a stack object. If the stack object does not escape before the load, they do not alias. I will share a follow-up patch to also use the information for call instructions to fix PR50220. In terms of compile-time, the impact is low in general, NewPM-O3: +0.05% NewPM-ReleaseThinLTO: +0.05% NewPM-ReleaseLTO-g: +0.03 with the largest change being tramp3d-v4 (+0.30%) http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Compared to always computing the capture information on demand, we get the following benefits from the caching: NewPM-O3: -0.03% NewPM-ReleaseThinLTO: -0.08% NewPM-ReleaseLTO-g: -0.04% The biggest speedup is tramp3d-v4 (-0.21%). http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Overall there is a small, but noticeable benefit from caching. I am not entirely sure if the speedups warrant the extra complexity of caching. The way the caching works also means that we might miss a few cases, as it is less precise. Also, there may be a better way to cache things. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109844	2021-09-23 12:45:05 +01:00
Bjorn Pettersson	85a586501b	[BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor The NFC commit `e5692a564a` changed the logic for DomTreeUpdates to use the range [succ_begin, succ_begin) when looking for SuccsOfPredBB rather than using [succ_begin, succ_end). As the commit was NFC this is identified as a typo (it has been discussed briefly in phabricator). The typo was found when inspecting the code, so I've got no idea if changing back to the old range has any significant impact (such as solving any PR:s or causing some new problems). But at least this restores the code to the originally indented behavior.	2021-09-23 13:03:26 +02:00
Alex Richardson	05663dc146	[InstSimplify] Don't lose inbounds when simplifying a GEP I noticed this while working on a (ptrtoint (gep null, x)) -> x fold. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D110168	2021-09-23 09:25:06 +01:00
Bjorn Pettersson	c5e0313e44	[ModuleInlinerWrapperPass] Do some naive printing of wrapped pipeline with -print-pipeline-passes Bisecting and reducing opt pipelines that includes the ModuleInlinerWrapperPass has turned out to be a bit problematic. This is far from perfect (it still lacks information about inline advisor params etc.), but it should give some kind of hint to what the wrapped pipeline looks like when using -print-pipeline-passes. Reviewed By: aeubanks, mtrofin Differential Revision: https://reviews.llvm.org/D109878	2021-09-23 09:54:42 +02:00
Johannes Doerfert	c6457dcae8	[OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state This patch fixes a problem when the AAKernelInfo state was invalidated, e.g., due to `optnone` for a kernel, but not all parts indicated the invalidation properly. We further eliminate most full state invalidations as they should never be necessary. Differential Revision: https://reviews.llvm.org/D109468	2021-09-23 00:04:30 -05:00
Johannes Doerfert	0a16c56010	[OpenMP][NFC] Improve debug output	2021-09-23 00:04:29 -05:00
hyeongyu kim	10a5632550	[NFC][InstCombine] Fix inconsistent comments	2021-09-23 09:31:39 +09:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Sanjay Patel	1cd6b44f26	[InstCombine] add one-use check to shift-shift transform We don't want to create extra instructions, and this could infinite loop with the proposed transform in D110170.	2021-09-22 16:31:12 -04:00
Arthur Eubanks	e7249e4acf	[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837	2021-09-22 09:52:37 -07:00
Simon Pilgrim	8a44281f47	[SLP] getReductionCost - use explicit TTI::TCK_RecipThroughput CostKind. NFCI. Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.	2021-09-22 16:52:22 +01:00
hyeongyu kim	98e96663f6	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110230	2021-09-23 00:48:24 +09:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will be come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to bitset based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
hyeongyu kim	ec8311444a	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110227	2021-09-23 00:14:50 +09:00
hyeongyu kim	e5aaf03326	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3) This patch is for fixing potential shufflevector-related bugs like D93818. As D93818, this patch change shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110226	2021-09-22 23:18:51 +09:00
Joseph Huber	1cf86df883	[OpenMP] Make sure the Thread ID function is not removed Summary: The thread ID function was reintroduced in D110195, but could potentially be removed by the optimizer. Make the function noinline to preserve the call sites and add it to the externalization RAII so its definition is not removed by the attributor.	2021-09-22 10:13:18 -04:00
Florian Hahn	a7c6471a85	[Passes] Run vector-combine early with -fenable-matrix. IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in CLang, like using matrix_t = double __attribute__((matrix_type(15, 15))); void foo(unsigned i, matrix_t &A, matrix_t &B) { for (unsigned j = 0; j < 4; ++j) for (unsigned k = 0; k < i; k++) B[k][j] -= A[k][j] * B[i][j]; } https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496	2021-09-22 12:48:32 +01:00
Sanjay Patel	c6013f71a4	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded" This reverts commit `2f6b07316f`. This caused several bots to hit an infinite loop at stage 2, so it needs to be reverted while figuring out how to fix that.	2021-09-22 07:45:21 -04:00
Yi Kong	d0746f2e9b	Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar. SELECT VecC, ScalarIdx, 0 We could splat Idx to a vector but it defeats the purpose of optimisation. Don't apply the folding rule in this case. This fixes a regression from commit `d561b6fbdb`.	2021-09-22 18:11:33 +08:00

1 2 3 4 5 ...

28616 Commits