llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	c71e996aed	SimplifyDemandedVectorElts for all intrinsics The point is that this simplifies integration of new intrinsics into SimplifiedDemandedVectorElts, and ensures we don't miss any existing ones. This is intended to be NFC-ish, but as seen from the diffs, can produce slightly different output. This is due to order of transforms w/in instcombine resulting in two slightly different fixed points. That's something we should fix, but isn't a problem w/this patch per se. Differential Revision: https://reviews.llvm.org/D57398 llvm-svn: 352653	2019-01-30 19:21:11 +00:00
Max Kazantsev	365021cc15	Properly use DT.verify in LoopSimplifyCFG llvm-svn: 352621	2019-01-30 12:32:19 +00:00
Max Kazantsev	34eeeec3ae	Enable IRCE for narrow latch by defailt llvm-svn: 352619	2019-01-30 11:25:12 +00:00
Alina Sbirlea	f9027e554a	Check bool attribute value in getOptionalBoolLoopAttribute. Summary: Check the bool value of the attribute in getOptionalBoolLoopAttribute not just its existance. Eliminates the warning noise generated when vectorization is explicitly disabled. Reviewers: Meinersbur, hfinkel, dmgreen Subscribers: jlebar, sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D57260 llvm-svn: 352555	2019-01-29 22:33:20 +00:00
Sanjay Patel	18db56209c	[InstCombine] canonicalize cmp/select form of uadd saturate with constant I'm circling back around to a loose end from D51929. The backend (either CGP or DAG) doesn't recognize this pattern, so we end up with different asm for these IR variants. Regardless of any future changes to canonicalize to saturation/overflow intrinsics, we want to get raw IR variations into the minimal number of raw IR forms. If/when we can canonicalize to intrinsics, that will make that step easier. Pre: C2 == ~C1 %a = add i32 %x, C1 %c = icmp ugt i32 %x, C2 %r = select i1 %c, i32 -1, i32 %a => %a = add i32 %x, C1 %c2 = icmp ult i32 %x, C2 %r = select i1 %c2, i32 %a, i32 -1 https://rise4fun.com/Alive/pkH Differential Revision: https://reviews.llvm.org/D57352 llvm-svn: 352536	2019-01-29 20:02:45 +00:00
Bjorn Pettersson	d014d576a9	[IPCP] Don't crash due to arg count/type mismatch between caller/callee Summary: This patch avoids an assert in IPConstantPropagation when there is a argument count/type mismatch between the caller and the callee. While this is actually UB on C-level (clang emits a warning), the IR verifier seems to accept it. I'm not sure what other frontends/languages might think about this, so simply bailing out to avoid hitting an assert (in CallSiteBase<>::getArgOperand or Value::doRAUW) seems like a simple solution. The problem is exposed by the fact that AbstractCallSites will look through a bitcast at the callee position of a call/invoke. Reviewers: jdoerfert, reames, efriedma Reviewed By: jdoerfert, efriedma Subscribers: eli.friedman, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D57052 llvm-svn: 352469	2019-01-29 10:19:44 +00:00
Philip Reames	6c5341bc5a	Demanded elements support for vector GEPs GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations Differential Revision: https://reviews.llvm.org/D57177 llvm-svn: 352440	2019-01-28 23:24:49 +00:00
Teresa Johnson	5b2f6a1bc2	[ThinLTO] Refine reachability check to fix compile time increase Summary: A recent fix to the ThinLTO whole program dead code elimination (D56117) increased the thin link time on a large MSAN'ed binary by 2x. It's likely that the time increased elsewhere, but was more noticeable here since it was already large and ended up timing out. That change made it so we would repeatedly scan all copies of linkonce symbols for liveness every time they were encountered during the graph traversal. This was needed since we only mark one copy of an aliasee as live when we encounter a live alias. This patch fixes the issue in a more efficient manner by simply proactively visiting the aliasee (thus marking all copies live) when we encounter a live alias. Two notes: One, this requires a hash table lookup (finding the aliasee summary in the index based on aliasee GUID). However, the impact of this seems to be small compared to the original pre-D56117 thin link time. It could be addressed if we keep the aliasee ValueInfo in the alias summary instead of the aliasee GUID, which I am exploring in a separate patch. Second, we only populate the aliasee GUID field when reading summaries from bitcode (whether we are reading individual summaries and merging on the fly to form the compiled index, or reading in a serialized combined index). Thankfully, that's currently the only way we can get to this code as we don't yet support reading summaries from LLVM assembly directly into a tool that performs the thin link (they must be converted to bitcode first). I added a FIXME, however I have the fix under test already. The easiest fix is to simply populate this field always, which isn't hard, but more likely the change I am exploring to store the ValueInfo instead as described above will subsume this. I don't want to hold up the regression fix for this though. Reviewers: trentxintong Subscribers: mehdi_amini, inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D57203 llvm-svn: 352438	2019-01-28 22:27:05 +00:00
Vedant Kumar	1c3694a4d4	[CodeExtractor] Add support for the `swifterror` attribute When passing a `swifterror` argument or alloca as an input to an extraction region, mark the input parameter `swifterror`. llvm-svn: 352408	2019-01-28 19:13:37 +00:00
Alina Sbirlea	932108703a	[SimpleLoopUnswitch] Early check exit for trivial unswitch with MemorySSA. Summary: If MemorySSA is avaiable, we can skip checking all instructions if block has any Defs. (volatile loads are also Defs). We still need to check all instructions for "canThrow", even if no Defs are found. Reviewers: chandlerc Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D57129 llvm-svn: 352393	2019-01-28 17:48:45 +00:00
Alina Sbirlea	a34bcbf335	Revert rL352238. llvm-svn: 352241	2019-01-25 21:12:08 +00:00
Alina Sbirlea	890a8e575f	[WarnMissedTransforms] Set default to 1. Summary: Set default value for retrieved attributes to 1, since the check is against 1. Eliminates the warning noise generated when the attributes are not present. Reviewers: sanjoy Subscribers: jlebar, llvm-commits Differential Revision: https://reviews.llvm.org/D57253 llvm-svn: 352238	2019-01-25 20:51:55 +00:00
Vedant Kumar	db3f9774ee	[HotColdSplit] Introduce a cost model to control splitting behavior The main goal of the model is to avoid increasing function size, as that would eradicate any memory locality benefits from splitting. This happens when: - There are too many inputs or outputs to the cold region. Argument materialization and reloads of outputs have a cost. - The cold region has too many distinct exit blocks, causing a large switch to be formed in the caller. - The code size cost of the split code is less than the cost of a set-up call. A secondary goal is to prevent excessive overall binary size growth. With the cost model in place, I experimented to find a splitting threshold that works well in practice. To make warm & cold code easily separable for analysis purposes, I moved split functions to a "cold" section. I experimented with thresholds between [0, 4] and set the default to the threshold which minimized geomean __text size. Experiment data from building LNT+externals for X86 (N = 639 programs, all sizes in bytes): \| Configuration \| __text geom size \| __cold geom size \| TEXT geom size \| \| -Os \| 1736.3 \| 0, n=0 \| 10961.6 \| \| -Os, thresh=0 \| 1740.53 \| 124.482, n=134 \| 11014 \| \| -Os, thresh=1 \| 1734.79 \| 57.8781, n=90 \| 10978.6 \| \| -Os, thresh=2 \| 1733.85 \| 65.6604, n=61 \| 10977.6 \| \| -Os, thresh=3 \| 1733.85 \| 65.3071, n=61 \| 10977.6 \| \| -Os, thresh=4 \| 1735.08 \| 67.5156, n=54 \| 10965.7 \| \| -Oz \| 1554.4 \| 0, n=0 \| 10153 \| \| -Oz, thresh=2 \| 1552.2 \| 65.633, n=61 \| 10176 \| \| -O3 \| 2563.37 \| 0, n=0 \| 13105.4 \| \| -O3, thresh=2 \| 2559.49 \| 71.1072, n=61 \| 13162.4 \| Picking thresh=2 reduces the geomean __text section size by 0.14% at -Os, -Oz, and -O3 and causes ~0.2% growth in the TEXT segment. Note that TEXT size is page-aligned, whereas section sizes are byte-aligned. Experiment data from building LNT+externals for ARM64 (N = 558 programs, all sizes in bytes): \| Configuration \| __text geom size \| __cold geom size \| TEXT geom size \| \| -Os \| 1763.96 \| 0, n=0 \| 42934.9 \| \| -Os, thresh=2 \| 1760.9 \| 76.6755, n=61 \| 42934.9 \| Picking thresh=2 reduces the geomean __text section size by 0.17% at -Os and causes no growth in the TEXT segment. Measurements were done with D57082 (r352080) applied. Differential Revision: https://reviews.llvm.org/D57125 llvm-svn: 352228	2019-01-25 18:30:37 +00:00
Max Kazantsev	38cd9acbb9	[LoopSimplifyCFG] Fix inconsistency in blocks in loop markup 2nd part of D57095 with the same reason, just in another place. We never fold branches that are not immediately in the current loop, but this check is missing in `IsEdgeLive` As result, it may think that the edge in subloop is dead while it's live. It's a pessimization in the current stance. Differential Revision: https://reviews.llvm.org/D57147 Reviewed By: rupprecht llvm-svn: 352170	2019-01-25 05:05:02 +00:00
Vedant Kumar	9d70f2b939	[HotColdSplit] Describe the pass in more detail, NFC llvm-svn: 352161	2019-01-25 03:22:38 +00:00
Vedant Kumar	65de025d64	[HotColdSplit] Split more aggressively before/after cold invokes While a cold invoke itself and its unwind destination can't be extracted, code which unconditionally executes before/after the invoke may still be profitable to extract. With cost model changes from D57125 applied, this gives a 3.5% increase in split text across LNT+externals on arm64 at -Os. llvm-svn: 352160	2019-01-25 03:22:23 +00:00
Peter Collingbourne	1a8acfb768	hwasan: If we split the entry block, move static allocas back into the entry block. Otherwise they are treated as dynamic allocas, which ends up increasing code size significantly. This reduces size of Chromium base_unittests by 2MB (6.7%). Differential Revision: https://reviews.llvm.org/D57205 llvm-svn: 352152	2019-01-25 02:08:46 +00:00
Haojian Wu	b9613a39b8	Fix a compiler error introduced in r352093. llvm-svn: 352098	2019-01-24 20:30:48 +00:00
Alina Sbirlea	0a4367209c	[LICM] Cleanup duplicated code. [NFCI] llvm-svn: 352093	2019-01-24 19:57:30 +00:00
Alina Sbirlea	52f6e2a173	[MemorySSA +LICM CFHoist] Solve PR40317. Summary: MemorySSA needs updating each time an instruction is moved. LICM and control flow hoisting re-hoists instructions, thus needing another update when re-moving those instructions. Pending cleanup: the MSSA update is duplicated, should be moved inside moveInstructionBefore. Reviewers: jnspaulsson Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D57176 llvm-svn: 352092	2019-01-24 19:48:35 +00:00
Vedant Kumar	ef1ebed1c6	[HotColdSplit] Move splitting earlier in the pipeline Performing splitting early has several advantages: - Inhibiting inlining of cold code early improves code size. Compared to scheduling splitting at the end of the pipeline, this cuts code size growth in half within the iOS shared cache (0.69% to 0.34%). - Inhibiting inlining of cold code improves compile time. There's no need to inline split cold functions, or to inline as much within those split functions as they are marked `minsize`. - During LTO, extra work is only done in the pre-link step. Less code must be inlined during cross-module inlining. An additional motivation here is that the most common cold regions identified by the static/conservative splitting heuristic can (a) be found before inlining and (b) do not grow after inlining. E.g. __assert_fail, os_log_error. The disadvantages are: - Some opportunities for splitting out cold code may be missed. This gap can potentially be narrowed by adding a worklist algorithm to the splitting pass. - Some opportunities to reduce code size may be lost (e.g. store sinking, when one side of the CFG diamond is split). This does not outweigh the code size benefits of splitting earlier. On net, splitting early in the pipeline has substantial code size benefits, and no major effects on memory locality or performance. We measured memory locality using ktrace data, and consistently found that 10% fewer pages were needed to capture 95% of text page faults in key iOS benchmarks. We measured performance on frequency-stabilized iOS devices using LNT+externals. This reverses course on the decision made to schedule splitting late in r344869 (D53437). Differential Revision: https://reviews.llvm.org/D57082 llvm-svn: 352080	2019-01-24 18:55:49 +00:00
Julian Lettner	b62e9dc46b	Revert "[Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls" This reverts commit `cea84ab93a`. llvm-svn: 352069	2019-01-24 18:04:21 +00:00
Philip Reames	4d683ee7e3	[RS4GC] Be slightly less conservative for gep vector_base, scalar_idx After submitting https://reviews.llvm.org/D57138, I realized it was slightly more conservative than needed. The scalar indices don't appear to be a problem on a vector gep, we even had a test for that. Differential Revision: https://reviews.llvm.org/D57161 llvm-svn: 352061	2019-01-24 16:34:00 +00:00
Philip Reames	a657510eb7	[RS4GC] Avoid crashing on gep scalar_base, vector_idx This is an alternative to https://reviews.llvm.org/D57103. After discussion, we dedicided to check this in as a temporary workaround, and pursue a true fix under the original thread. The issue at hand is that the base rewriting algorithm doesn't consider the fact that GEPs can turn a scalar input into a vector of outputs. We had handling for scalar GEPs and fully vector GEPs (i.e. all vector operands), but not the scalar-base + vector-index forms. A true fix here requires treating GEP analogously to extractelement or shufflevector. This patch is merely a workaround. It simply hides the crash at the cost of some ugly code gen for this presumable very rare pattern. Differential Revision: https://reviews.llvm.org/D57138 llvm-svn: 352059	2019-01-24 16:08:18 +00:00
Florian Hahn	bed7f9eab2	Revert "[HotColdSplitting] Get DT and PDT from the pass manager." This reverts commit `a6982414ed` (llvm-svn: 352036), because it causes a memory leak in the pass manager. Failing bot http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/10351/steps/check-llvm%20asan/logs/stdio llvm-svn: 352041	2019-01-24 11:22:08 +00:00
Florian Hahn	a6982414ed	[HotColdSplitting] Get DT and PDT from the pass manager. Instead of manually computing DT and PDT, we can get the from the pass manager, which ideally has them already cached. With the new pass manager, we could even preserve DT/PDT on a per function basis in a module pass. I think this also addresses the TODO about re-using the computed DTs for BFI. IIUC, GetBFI will fetch the DT from the pass manager and when we will fetch the cached version later. Reviewers: vsk, hiraditya, tejohnson, thegameg, sebpop Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D57092 llvm-svn: 352036	2019-01-24 09:44:52 +00:00
Max Kazantsev	56515a2c76	[LoopSimplifyCFG] Fix inconsistency in live blocks markup When we choose whether or not we should mark block as dead, we have an inconsistent logic in markup of live blocks. - We take candidate IF its terminator branches on constant AND it is immediately in current loop; - We mark successor live IF its terminator doesn't branch by constant OR it branches by constant and the successor is its always taken block. What we are missing here is that when the terminator branches on a constant but is not taken as a candidate because is it not immediately in the current loop, we will mark only one (always taken) successor as live. Therefore, we do NOT do the actual folding but may NOT mark one of the successors as live. So the result of markup is wrong in this case, and we may then hit various asserts. Thanks Jordan Rupprech for reporting this! Differential Revision: https://reviews.llvm.org/D57095 Reviewed By: rupprecht llvm-svn: 352024	2019-01-24 05:20:29 +00:00
Julian Lettner	cea84ab93a	[Sanitizers] UBSan unreachable incompatible with ASan in the presence of `noreturn` calls Summary: UBSan wants to detect when unreachable code is actually reached, so it adds instrumentation before every `unreachable` instruction. However, the optimizer will remove code after calls to functions marked with `noreturn`. To avoid this UBSan removes `noreturn` from both the call instruction as well as from the function itself. Unfortunately, ASan relies on this annotation to unpoison the stack by inserting calls to `_asan_handle_no_return` before `noreturn` functions. This is important for functions that do not return but access the the stack memory, e.g., unwinder functions like `longjmp` (`longjmp` itself is actually "double-proofed" via its interceptor). The result is that when ASan and UBSan are combined, the `noreturn` attributes are missing and ASan cannot unpoison the stack, so it has false positives when stack unwinding is used. Changes: # UBSan now adds the `expect_noreturn` attribute whenever it removes the `noreturn` attribute from a function # ASan additionally checks for the presence of this attribute Generated code: ``` call void @__asan_handle_no_return // Additionally inserted to avoid false positives call void @longjmp call void @__asan_handle_no_return call void @__ubsan_handle_builtin_unreachable unreachable ``` The second call to `__asan_handle_no_return` is redundant. This will be cleaned up in a follow-up patch. rdar://problem/40723397 Reviewers: delcypher, eugenis Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D56624 llvm-svn: 352003	2019-01-24 01:06:19 +00:00
David Callahan	d2eeb2516d	Update entry count for cold calls Summary: Profile sample files include the number of times each entry or inlined call site is sampled. This is translated into the entry count metadta on functions. When sample data is being read, if a call site that was inlined in the sample program is considered cold and not inlined, then the entry count of the out-of-line functions does not reflect the current compilation. In this patch, we note call sites where the function was not inlined and as a last action of the sample profile loading, we update the called function's entry count to reflect the calls from these call sites which are not included in the profile file. Reviewers: danielcdh, wmi, Kader, modocache Reviewed By: wmi Subscribers: davidxl, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D52845 llvm-svn: 352001	2019-01-24 00:55:23 +00:00
Mircea Trofin	ec02630278	[llvm] Clarify responsiblity of some of DILocation discriminator APIs Summary: Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match similar APIs. Also changed its behavior to copy over the other discriminator components, instead of eliding them. Renamed cloneWithDuplicationFactor to cloneByMultiplyingDuplicationFactor, which more closely matches what this API does. Reviewers: dblaikie, wmi Reviewed By: dblaikie Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D56220 llvm-svn: 351996	2019-01-24 00:10:25 +00:00
Hideki Saito	4e4ecae028	[LV][VPlan] Change to implement VPlan based predication for VPlan-native path Context: Patch Series #2 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). Patch series #2 checks that inner loops are still trivially lock-step among all vector elements. Non-loop branches are blindly assumed as divergent. Changes here implement VPlan based predication algorithm to compute predicates for blocks that need predication. Predicates are computed for the VPLoop region in reverse post order. A block's predicate is computed as OR of the masks of all incoming edges. The mask for an incoming edge is computed as AND of predecessor block's predicate and either predecessor's Condition bit or NOT(Condition bit) depending on whether the edge from predecessor block to the current block is true or false edge. Reviewers: fhahn, rengolin, hsaito, dcaballe Reviewed By: fhahn Patch by Satish Guggilla, thanks! Differential Revision: https://reviews.llvm.org/D53349 llvm-svn: 351990	2019-01-23 22:43:12 +00:00
Peter Collingbourne	020ce3f026	hwasan: Read shadow address from ifunc if we don't need a frame record. This saves a cbz+cold call in the interceptor ABI, as well as a realign in both ABIs, trading off a dcache entry against some branch predictor entries and some code size. Unfortunately the functionality is hidden behind a flag because ifunc is known to be broken on static binaries on Android. Differential Revision: https://reviews.llvm.org/D57084 llvm-svn: 351989	2019-01-23 22:39:11 +00:00
Florian Hahn	68cea130df	[HotColdSplitting] Remove unused SSAUpdater.h include (NFC). llvm-svn: 351945	2019-01-23 11:51:38 +00:00
Max Kazantsev	d9aee3c0d1	[IRCE] Support narrow latch condition for wide range checks This patch relaxes restrictions on types of latch condition and range check. In current implementation, they should match. This patch allows to handle wide range checks against narrow condition. The motivating example is the following: int N = ... for (long i = 0; (int) i < N; i++) { if (i >= length) deopt; } In this patch, the option that enables this support is turned off by default. We'll wait until it is switched to true. Differential Revision: https://reviews.llvm.org/D56837 Reviewed By: reames llvm-svn: 351926	2019-01-23 07:20:56 +00:00
Peter Collingbourne	73078ecd38	hwasan: Move memory access checks into small outlined functions on aarch64. Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses The problem with this is that these code blocks typically bloat code size significantly. An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems: - The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path. - The functions take the address to be checked in a fixed register, which increases register pressure. Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls. The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call. To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks. Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease. Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%. The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though. I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below: Regs \| .text size \| Perf hit -----+------------+--------- ~all \| 171619972 \| 6.24% 16 \| 171765192 \| 7.03% 8 \| 172917788 \| 5.82% 4 \| 177054016 \| 6.89% Differential Revision: https://reviews.llvm.org/D56954 llvm-svn: 351920	2019-01-23 02:20:10 +00:00
Vedant Kumar	cde65c0fac	[HotColdSplit] Calculate BFI lazily to reduce compile-time, NFC The splitting pass does not need BFI unless the Module actually has a profile summary. Do not calcualte BFI unless the summary is present. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.4% of the total to under 0.1%. llvm-svn: 351894	2019-01-22 22:49:22 +00:00
Vedant Kumar	38874c8f7b	[HotColdSplit] Calculate domtrees lazily to reduce compile-time, NFC The splitting pass does not need (post)domtrees until after it's found a cold block. Defer domtree calculation until a cold block is found. For the sqlite3 amalgamation, this reduces time spent in the splitting pass from 0.8% of the total to 0.4%. llvm-svn: 351892	2019-01-22 22:49:08 +00:00
Jordan Rupprecht	0efe27e62b	Revert r351520, "Re-enable terminator folding in LoopSimplifyCFG" This is still causing compilation crashes in some targets. Will follow up shortly with a repro. llvm-svn: 351845	2019-01-22 17:39:02 +00:00
Max Kazantsev	feb475f4cf	[LoopPredication] Support guards expressed as branches by widenable condition This patch adds support of guards expressed as branches by widenable conditions in Loop Predication. Differential Revision: https://reviews.llvm.org/D56081 Reviewed By: reames llvm-svn: 351805	2019-01-22 11:49:06 +00:00
Max Kazantsev	ca45087826	[NFC] Factor out some reusable logic llvm-svn: 351794	2019-01-22 10:13:36 +00:00
Philip Reames	390c0e2f72	[CVP] Use LVI to constant fold deopt operands Deopt operands are generally intended to record information about a site in code with minimal perturbation of the surrounding code. Idiomatically, they also tend to appear down rare paths. Putting these together, we have an obvious case for extending CVP w/deopt operand constant folding. Arguably, we should be doing this for all operands on all instructions, but that's definitely a much larger and risky change. Differential Revision: https://reviews.llvm.org/D55678 llvm-svn: 351774	2019-01-22 01:34:33 +00:00
Serge Guelton	be88539b85	Replace llvm::isPodLike<...> by llvm::is_trivially_copyable<...> As noted in https://bugs.llvm.org/show_bug.cgi?id=36651, the specialization for isPodLike<std::pair<...>> did not match the expectation of std::is_trivially_copyable which makes the memcpy optimization invalid. This patch renames the llvm::isPodLike trait into llvm::is_trivially_copyable. Unfortunately std::is_trivially_copyable is not portable across compiler / STL versions. So a portable version is provided too. Note that the following specialization were invalid: std::pair<T0, T1> llvm::Optional<T> Tests have been added to assert that former specialization are respected by the standard usage of llvm::is_trivially_copyable, and that when a decent version of std::is_trivially_copyable is available, llvm::is_trivially_copyable is compared to std::is_trivially_copyable. As of this patch, llvm::Optional is no longer considered trivially copyable, even if T is. This is to be fixed in a later patch, as it has impact on a long-running bug (see r347004) Note that GCC warns about this UB, but this got silented by https://reviews.llvm.org/D50296. Differential Revision: https://reviews.llvm.org/D54472 llvm-svn: 351701	2019-01-20 21:19:56 +00:00
Simon Pilgrim	e1143c1322	[X86] Auto upgrade VPCOM/VPCOMU intrinsics to generic integer comparisons This causes a couple of changes in the upgrade tests as signed/unsigned eq/ne are equivalent and we constant fold true/false codes, these changes are the same as what we already do for avx512 cmp/ucmp. Noticed while cleaning up vector integer comparison costs for PR40376. llvm-svn: 351697	2019-01-20 19:27:40 +00:00
Vedant Kumar	857cacd9de	[ConstantMerge] Factor out check for un-mergeable globals, NFC llvm-svn: 351671	2019-01-20 02:44:43 +00:00
Nikita Popov	6515db205a	[InstCombine] Simplify cttz/ctlz + icmp ugt/ult Followup to D55745, this time handling comparisons with ugt and ult predicates (which are the canonical forms for non-equality predicates). For ctlz we can convert into a simple icmp, for cttz we can convert into a mask check. Differential Revision: https://reviews.llvm.org/D56355 llvm-svn: 351645	2019-01-19 09:56:01 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Johannes Doerfert	36872b5db9	Enable IPConstantPropagation to work with abstract call sites This modification of the currently unused inter-procedural constant propagation pass (IPConstantPropagation) shows how abstract call sites enable optimization of callback calls alongside direct and indirect calls. Through minimal changes, mostly dealing with the partial mapping of callbacks, inter-procedural constant propagation was enabled for callbacks, e.g., OpenMP runtime calls or pthreads_create. Differential Revision: https://reviews.llvm.org/D56447 llvm-svn: 351628	2019-01-19 05:19:12 +00:00
Vedant Kumar	b537b946b8	[MergeFunc] Allow merging identical vararg functions using aliases Thanks to Nikita Popov for pointing out this missed case. This is a follow-up to r351411, which disabled function merging for vararg functions outright due to a miscompile (see llvm.org/PR40345). Differential Revision: https://reviews.llvm.org/D56865 llvm-svn: 351624	2019-01-19 02:46:22 +00:00
Vedant Kumar	b755a2df51	[HotColdSplit] Mark inherently cold functions as such If an inherently cold function is found, mark it as cold. For now this means applying the `cold` and `minsize` attributes. As a drive-by, revisit and clean up the criteria for considering a function for splitting. Add tests. llvm-svn: 351623	2019-01-19 02:38:47 +00:00
Vedant Kumar	4de1962bb9	[HotColdSplit] Remove a set which tracked split functions (NFC) Use the begin/end iterator idiom to avoid visiting split functions, instead of doing a set lookup. llvm-svn: 351622	2019-01-19 02:38:17 +00:00
Vedant Kumar	17d9f14bff	[CodeExtractor] Emit lifetime markers around reloads of outputs CodeExtractor permits extracting a region of blocks from a function even when values defined within the region are used outside of it. This is typically done by creating an alloca in the original function and reloading the alloca after a call to the extracted function. Wrap the reload in lifetime start/end markers to promote stack coloring. Suggested by Sergei Kachkov! Differential Revision: https://reviews.llvm.org/D56045 llvm-svn: 351621	2019-01-19 02:37:59 +00:00
Florian Hahn	be7cbe3f70	[LCSSA] Skip blocks in sub-loops when scanning for uses. Summary: Scanning blocks in sub-loops for uses is unnecessary, as they were already handled while dealing with the containing sub-loop. This speeds up LCSSA for highly nested loops. For the test case in PR37202, it halves the time spent in LCSSA. In cases were we won't be able to skip any blocks, the additional lookup should be negligible. Time-passes without this patch for test case from PR37202: Total Execution Time: 48.5505 seconds (48.5511 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 10.0822 ( 21.0%) 0.1406 ( 27.0%) 10.2228 ( 21.1%) 10.2228 ( 21.1%) Loop-Closed SSA Form Pass 10.0417 ( 20.9%) 0.1467 ( 28.2%) 10.1884 ( 21.0%) 10.1890 ( 21.0%) Loop-Closed SSA Form Pass #2 4.2703 ( 8.9%) 0.0040 ( 0.8%) 4.2742 ( 8.8%) 4.2742 ( 8.8%) Unswitch loops 2.7376 ( 5.7%) 0.0229 ( 4.4%) 2.7605 ( 5.7%) 2.7611 ( 5.7%) Loop-Closed SSA Form Pass #5 2.7332 ( 5.7%) 0.0214 ( 4.1%) 2.7546 ( 5.7%) 2.7546 ( 5.7%) Loop-Closed SSA Form Pass #3 2.7088 ( 5.6%) 0.0230 ( 4.4%) 2.7319 ( 5.6%) 2.7324 ( 5.6%) Loop-Closed SSA Form Pass #4 2.6855 ( 5.6%) 0.0236 ( 4.5%) 2.7091 ( 5.6%) 2.7090 ( 5.6%) Loop-Closed SSA Form Pass #6 2.1648 ( 4.5%) 0.0018 ( 0.4%) 2.1666 ( 4.5%) 2.1664 ( 4.5%) Unroll loops 1.8371 ( 3.8%) 0.0009 ( 0.2%) 1.8379 ( 3.8%) 1.8380 ( 3.8%) Value Propagation 1.8149 ( 3.8%) 0.0021 ( 0.4%) 1.8170 ( 3.7%) 1.8169 ( 3.7%) Loop Invariant Code Motion 1.6755 ( 3.5%) 0.0226 ( 4.3%) 1.6981 ( 3.5%) 1.6980 ( 3.5%) Loop-Closed SSA Form Pass #7 Time-passes with this patch Total Execution Time: 29.9285 seconds (29.9276 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 5.2786 ( 17.7%) 0.0021 ( 1.2%) 5.2806 ( 17.6%) 5.2808 ( 17.6%) Unswitch loops 4.3739 ( 14.7%) 0.0303 ( 18.1%) 4.4042 ( 14.7%) 4.4042 ( 14.7%) Loop-Closed SSA Form Pass 4.2658 ( 14.3%) 0.0192 ( 11.5%) 4.2850 ( 14.3%) 4.2851 ( 14.3%) Loop-Closed SSA Form Pass #2 2.2307 ( 7.5%) 0.0013 ( 0.8%) 2.2320 ( 7.5%) 2.2318 ( 7.5%) Loop Invariant Code Motion 2.0888 ( 7.0%) 0.0012 ( 0.7%) 2.0900 ( 7.0%) 2.0897 ( 7.0%) Unroll loops 1.6761 ( 5.6%) 0.0013 ( 0.8%) 1.6774 ( 5.6%) 1.6774 ( 5.6%) Value Propagation 1.3686 ( 4.6%) 0.0029 ( 1.8%) 1.3716 ( 4.6%) 1.3714 ( 4.6%) Induction Variable Simplification 1.1457 ( 3.8%) 0.0010 ( 0.6%) 1.1468 ( 3.8%) 1.1468 ( 3.8%) Loop-Closed SSA Form Pass #4 1.1384 ( 3.8%) 0.0005 ( 0.3%) 1.1389 ( 3.8%) 1.1389 ( 3.8%) Loop-Closed SSA Form Pass #6 1.1360 ( 3.8%) 0.0027 ( 1.6%) 1.1387 ( 3.8%) 1.1387 ( 3.8%) Loop-Closed SSA Form Pass #5 1.1331 ( 3.8%) 0.0010 ( 0.6%) 1.1341 ( 3.8%) 1.1340 ( 3.8%) Loop-Closed SSA Form Pass #3 Reviewers: davide, efriedma, mzolotukhin Reviewed By: davide, efriedma Subscribers: hiraditya, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D56848 llvm-svn: 351567	2019-01-18 17:36:22 +00:00
Max Kazantsev	b4cd50be56	Re-enable terminator folding in LoopSimplifyCFG: underlying bugs fixed llvm-svn: 351520	2019-01-18 04:57:32 +00:00
Vedant Kumar	f529b50702	[HotColdSplit] Allow outlining with live outputs Prior to r348205, extracting code regions with live output values was disabled because of a miscompilation (PR39433). Lift the restriction as PR39433 has been addressed. Tested on LNT+externals, on a run of check-llvm in a stage2 build, and with a full build of iOS (with hot/cold splitting enabled). As a drive-by, remove an errant TODO. llvm-svn: 351492	2019-01-17 22:36:05 +00:00
Vedant Kumar	b70e20db62	[HotColdSplit] Consider resume instructions to be cold Resuming exception unwinding is roughly as unlikely as throwing an exception. Tested on LNT+externals (in particular, the C++ EH regression tests provide end-to-end test coverage), as well as with a full build of iOS. llvm-svn: 351491	2019-01-17 22:35:47 +00:00
Vedant Kumar	4541be0686	[HotColdSplit] Relax requirement that the cold sink block be extractable Relaxing this requirement creates opportunities to split code dominated by an EH pad. Tested on LNT+externals. llvm-svn: 351483	2019-01-17 21:42:36 +00:00
Vedant Kumar	32a014d048	[HotColdSplit] Simplify tests by lowering their splitting thresholds This gets rid of the brittle/mysterious calls to @sink()/@sideeffect() peppered throughout the test cases. They are no longer needed to force splitting to occur. llvm-svn: 351480	2019-01-17 21:29:34 +00:00
Wei Mi	3bcccdfe38	[SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink If the sample profile has no inlining hierachy information included, we call the sample profile is flattened. For flattened profile, in ThinLTO postlink phase, SampleProfileLoader's hot function inlining and profile annotation will do nothing, so it is better to save the effort to read in the profile and run the sample profile loader pass. It is helpful for reducing compile time when the flattened profile is huge. Differential Revision: https://reviews.llvm.org/D54819 llvm-svn: 351476	2019-01-17 20:48:34 +00:00
Reid Kleckner	edd653bc07	[InstCombine] Don't sink dynamic allocas Summary: InstCombine's sinking algorithm only thinks about memory. It doesn't think about non-memory constraints like stack object lifetime. It can sink dynamic allocas across a stacksave call, which may be used with stackrestore, which can incorrectly reduce the lifetime of the dynamic alloca. Fixes PR40365 Reviewers: hfinkel, efriedma Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56872 llvm-svn: 351475	2019-01-17 20:46:53 +00:00
Teresa Johnson	8d86f1ba47	Revert "[ThinLTO] Add summary entries for index-based WPD" Mistaken commit of something still under review! This reverts commit r351453. llvm-svn: 351455	2019-01-17 16:05:04 +00:00
Teresa Johnson	4fcf3b1621	[ThinLTO] Add summary entries for index-based WPD Summary: If LTOUnit splitting is disabled, the module summary analysis computes the summary information necessary to perform single implementation devirtualization during the thin link with the index and no IR. The information collected from the regular LTO IR in the current hybrid WPD algorithm is summarized, including: 1) For vtable definitions, record the function pointers and their offset within the vtable initializer (subsumes the information collected from IR by tryFindVirtualCallTargets). 2) A record for each type metadata summarizing the vtable definitions decorated with that metadata (subsumes the TypeIdentiferMap collected from IR). Also added are the necessary bitcode records, and the corresponding assembly support. The index-based WPD will be sent as a follow-on. Depends on D53890. Reviewers: pcc Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54815 llvm-svn: 351453	2019-01-17 15:49:03 +00:00
Max Kazantsev	61a8d3fb33	[LoopSimplifyCFG] Form LCSSA when a parent loop becomes a sibling During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the parent loop may stop being reachable from the child loop, and therefore they become siblings. If the former child loop had uses of some values from its former parent loop, now such uses will require LCSSA Phis, even if they weren't needed before. So we must form LCSSA for all loops that stopped being ancestors of the current loop in this case. Differential Revision: https://reviews.llvm.org/D56144 Reviewed By: fedor.sergeev llvm-svn: 351434	2019-01-17 12:51:10 +00:00
Max Kazantsev	8b134169f5	[LoopSimplifyCFG] Fix order of deletion of complex dead subloops Function `DeleteDeadBlock` requires that all predecessors of a block being deleted have already been deleted, with the exception of a single-block loop. When we use it for removal of dead subloops that contain more than one block, we may not fulfull this requirement and fail an assertion. This patch replaces invocation of `DeleteDeadBlock` with a generalized version `DeleteDeadBlocks` that is able to deal with multiple dead blocks, even if they contain some cycles. Differential Revision: https://reviews.llvm.org/D56121 Reviewed By: fedor.sergeev llvm-svn: 351433	2019-01-17 12:25:40 +00:00
Max Kazantsev	ee61308595	[NFC] Factor out some local vars llvm-svn: 351416	2019-01-17 06:20:42 +00:00
Vedant Kumar	a9906c1e5e	[MergeFunc] Prevent silent miscompile of vararg functions The function merging pass miscompiles identical vararg functions. The forwarding thunk it emits doesn't forward the full variable-length list of arguments. Disable merging for vararg functions for now. I've filed llvm.org/PR40345 to track the issue. rdar://47326238 llvm-svn: 351411	2019-01-17 02:15:05 +00:00
Vedant Kumar	e21ab22115	[FunctionComparator] Consider tail call kinds Essentially, do not treat `call` and `musttail call` as the same thing. As a drive-by, fold CallInst and InvokeInst handling together using the CallSite helper. Differential Revision: https://reviews.llvm.org/D56815 llvm-svn: 351405	2019-01-17 00:29:14 +00:00
Wei Mi	79c4408aa2	Fix a mistake in rL351392. PGOInstrGen should be initialized to "" instead of false. llvm-svn: 351397	2019-01-16 23:31:40 +00:00
Wei Mi	c876e3d42b	[PGO] Make pgo related options in opt more consistent. Currently we have pgo options defined in PassManagerBuilder.cpp only for instrument pgo, but not for sample pgo. We also have pgo options defined in NewPMDriver.cpp in opt only for new pass manager and for all kinds of pgo. They have some inconsistency. To make the options more consistent and make tests writing easier, the patch let old pass manager to share the same pgo options with new pass manager in opt, and removes the options in PassManagerBuilder.cpp. Differential Revision: https://reviews.llvm.org/D56749 llvm-svn: 351392	2019-01-16 23:19:02 +00:00
Alexey Bataev	18809a6bbb	[SLP] Fix PR40310: The reduction nodes should stay scalar. Summary: Sometimes the SLP vectorizer tries to vectorize the horizontal reduction nodes during regular vectorization. This may happen inside of the loops, when there are some vectorizable PHIs. Patch fixes this by checking if the node is the reduction node and thus it must not be vectorized, it must be gathered. Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56783 llvm-svn: 351349	2019-01-16 15:39:52 +00:00
Gabor Buella	3ec170c85a	Assertion in isAllocaPromotable due to extra bitcast goes into lifetime marker For the given test SROA detects possible replacement and creates a correct alloca. After that SROA is adding lifetime markers for this new alloca. The function getNewAllocaSlicePtr is trying to deduce the pointer type based on the original alloca, which is split, to use it later in lifetime intrinsic. For the test we ended up with such code (rA is initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation) ``` %rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]* <----- this one causing the assertion and is an extra bitcast %5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8* call void @llvm.lifetime.start.p0i8(i64 4, i8* %5) ``` isAllocaPromotable code assumes that a user of alloca may go into lifetime marker through bitcast but it must be the only one bitcast to i8* type. In the test it's not a i8* type, return false and throw the assertion. As we are creating a pointer, which will be used in lifetime markers only, the proposed fix is to create a bitcast to i8* immediately to avoid extra bitcast creation. The test is a greatly simplified to just reproduce the assertion. Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com> Reviewers: chandlerc, craig.topper Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D55934 llvm-svn: 351325	2019-01-16 12:06:17 +00:00
Philip Pfaffe	81101de585	[MSan] Apply the ctor creation scheme of TSan Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan. Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan Subscribers: hiraditya, bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D56734 llvm-svn: 351322	2019-01-16 11:14:07 +00:00
Philip Pfaffe	685c76d7a3	[NewPM][TSan] Reiterate the TSan port Summary: Second iteration of D56433 which got reverted in rL350719. The problem in the previous version was that we dropped the thunk calling the tsan init function. The new version keeps the thunk which should appease dyld, but is not actually OK wrt. the current semantics of function passes. Hence, add a helper to insert the functions only on the first time. The helper allows hooking into the insertion to be able to append them to the global ctors list. Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan Subscribers: hiraditya, bollu, llvm-commits Differential Revision: https://reviews.llvm.org/D56538 llvm-svn: 351314	2019-01-16 09:28:01 +00:00
Tom Stellard	3d36e5c3e6	Only promote args when function attributes are compatible Summary: Check to make sure that the caller and the callee have compatible function arguments before promoting arguments. This uses the same TargetTransformInfo queries that are used to determine if attributes are compatible for inlining. The goal here is to avoid breaking ABI when a called function's ABI depends on a target feature that is not enabled in the caller. This is a very conservative fix for PR37358. Ideally we would have a more sophisticated check for ABI compatiblity rather than checking if the attributes are compatible for inlining. Reviewers: echristo, chandlerc, eli.friedman, craig.topper Reviewed By: echristo, chandlerc Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits Differential Revision: https://reviews.llvm.org/D53554 llvm-svn: 351296	2019-01-16 05:15:31 +00:00
Serguei Katkov	a5b0e5585b	[InstCombine]Avoid introduction of unaligned mem access InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair. It might result in generation of unaligned atomic load/store which later in backend will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is and handle it at backend. Reviewers: reames, anna, apilipenko, mkazantsev Reviewed By: reames Subscribers: t.p.northover, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56582 llvm-svn: 351295	2019-01-16 04:36:26 +00:00
David Callahan	d129d3e93f	treat invoke like call Summary: InvokeInst should be treated like CallInst and assigned a separate discriminator. This is particularly import when an Invoke is converted to a Call during compilation and so can invalidate sample profile data collected wtih different link time optimizations Reviewers: twoh, Kader, danielcdh, wmi Reviewed By: wmi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56491 llvm-svn: 351251	2019-01-15 21:26:51 +00:00
Matt Morehouse	19ff35c481	[SanitizerCoverage] Don't create comdat for interposable functions. Summary: Comdat groups override weak symbol behavior, allowing the linker to keep the comdats for weak symbols in favor of comdats for strong symbols. Fixes the issue described in: https://bugs.chromium.org/p/chromium/issues/detail?id=918662 Reviewers: eugenis, pcc, rnk Reviewed By: pcc, rnk Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56516 llvm-svn: 351247	2019-01-15 21:21:01 +00:00
David Callahan	dee001207e	We can improve the performance (generally) by memo-izing the action to map a debug location to its function summary. Summary: Here are timings (as reported by "opt -time-passes") for sample-profile pass for some files holding hot functions from a major service©r. Average 17% reduction. Delta column is 100*(old-new)/old. ``` Old New Delta 0.0537 0.0538 -0.2% 0.8155 0.6522 20.0% 0.0779 0.0751 3.6% 0.0727 0.0913 -25.6% 0.1622 0.1302 19.7% 0.0627 0.0594 5.3% 0.0766 0.0744 2.9% 0.6426 0.4387 31.7% 0.3521 0.2776 21.2% 0.3549 0.2721 23.3% 0.0912 0.0904 0.9% 0.1236 0.1059 14.3% 0.0854 0.0866 -1.4% 0.0757 0.0722 4.6% 0.1293 0.1147 11.3% 0.1354 0.1122 17.1% 0.0767 0.0770 -0.4% 0.1135 0.0968 14.7% 0.0524 0.0608 -16.0% 0.1279 0.1106 13.5% ========== 3.6820 3.0520 17.1% Total ``` Reviewers: twoh, Kader, danielcdh, wmi Reviewed By: wmi Subscribers: dblaikie, llvm-commits Differential Revision: https://reviews.llvm.org/D56435 llvm-svn: 351211	2019-01-15 17:45:54 +00:00
Zaara Syeda	b7dff9c9af	[SimpleLoopUnswitch] Increment stats counter for unswitching switch instruction Increment statistics counter NumSwitches at unswitchNontrivialInvariants() for unswitching a non-trivial switch instruction. This is to fix a bug that it increments NumBranches even for the case of switch instruction. There is no functional change in this patch. Differential Revision: https://reviews.llvm.org/D56408 llvm-svn: 351193	2019-01-15 15:08:01 +00:00
Florian Hahn	4094f34f78	[InstCombine] Don't undo 0 - (X * Y) canonicalization when combining subs. Otherwise instcombine gets stuck in a cycle. The canonicalization was added in D55961. This patch fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=12400 llvm-svn: 351187	2019-01-15 11:18:21 +00:00
Max Kazantsev	f8a0e0ddf0	[NFC] Remove some code duplication llvm-svn: 351185	2019-01-15 11:16:14 +00:00
Max Kazantsev	80242ee87e	[NFC] Remove obsolete enum RangeCheckKind llvm-svn: 351183	2019-01-15 10:48:45 +00:00
Max Kazantsev	78a5435284	[NFC] Decrease if nest llvm-svn: 351180	2019-01-15 10:01:46 +00:00
Max Kazantsev	a78dc4d6c8	[NFC] Move some functions to LoopUtils llvm-svn: 351179	2019-01-15 09:51:34 +00:00
Marek Olsak	33eb4d947d	AMDGPU: Add a fast path for icmp.i1(src, false, NE) Summary: This allows moving the condition from the intrinsic to the standard ICmp opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic is an identity for retrieving the SGPR mask. And we can also get the mask from and i1, or i1, xor i1. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52060 llvm-svn: 351150	2019-01-15 02:13:18 +00:00
Jonathan Metzman	e159a0dd1a	[SanitizerCoverage][NFC] Use appendToUsed instead of include Summary: Use appendToUsed instead of include to ensure that SanitizerCoverage's constructors are not stripped. Also, use isOSBinFormatCOFF() to determine if target binary format is COFF. Reviewers: pcc Reviewed By: pcc Subscribers: hiraditya Differential Revision: https://reviews.llvm.org/D56369 llvm-svn: 351118	2019-01-14 21:02:02 +00:00
David Callahan	957795973b	Ignore PhiNodes when mapping sample profile data Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored. Reviewers: danielcdh, twoh, Kader, wmi Reviewed By: wmi Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55094 llvm-svn: 351102	2019-01-14 19:05:59 +00:00
David Callahan	b1853a6a94	Revert "Merge branch 'arcpatch-D55094'" This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing changes made to f1309ffebf718d16aec4fab83380556c660e2825. unintended merge pushed llvm-svn: 351095	2019-01-14 18:49:27 +00:00
David Callahan	1b46231764	Merge branch 'arcpatch-D55094' llvm-svn: 351092	2019-01-14 18:35:43 +00:00
Tom Stellard	d0a7676087	cmake: Don't install plugins used for examples or tests Summary: This patch drops install targets for LLVMHello.so, TestPlugin.so, and BugpointPasses.so. Reviewers: chandlerc, beanz, thakis, philip.pfaffe Reviewed By: chandlerc Subscribers: SquallATF, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D55965 llvm-svn: 351087	2019-01-14 18:25:35 +00:00
Jeremy Morse	f216da7ee0	[DebugInfo] Remove un-necessary logic from HoistThenElseCodeToIf Following PR39807, the way in which SimplifyCFG hoists common code on branch paths was fixed in r347782. However this left extra code hanging around HoistThenElseCodeToIf that wasn't necessary and needlessly complicated matters -- we no longer need to look up through the 'if' basic block to find a location for hoisted 'select' insts, we can instead use the location chosen by applyMergedLocation. This patch deletes that extra logic, and updates a regression test to reflect the new logic (selects get the merged location, not a previous insts location). Differential Revision: https://reviews.llvm.org/D55272 llvm-svn: 351058	2019-01-14 12:13:12 +00:00
David Stuttard	f77079f892	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b llvm-svn: 351054	2019-01-14 11:55:24 +00:00
Max Kazantsev	1f73310e1e	[BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocks Utility function `DeleteDeadBlock` expects that all predecessors of a block being deleted are already deleted, with the exception of single-block loop. It makes it hard to use for deletion of a set of blocks that may contain cyclic dependencies. The is no correct order of invocations of this function that does not produce dangling pointers on already deleted blocks. This patch introduces a generalized version of this function `DeleteDeadBlocks` that allows us to remove multiple blocks at once, even if there are cycles among them. The only requirement is that no block being deleted should have a predecessor that is not being deleted. The logic of `DeleteDeadBlocks` is following: for each block create relevant DT updates; remove all instructions (replace with undef if needed); replace terminator with unreacheable; apply DT updates; for each block delete block; Therefore, `DeleteDeadBlock` becomes a particular case of the general algorithm called for a single block. Differential Revision: https://reviews.llvm.org/D56120 Reviewed By: skatkov llvm-svn: 351045	2019-01-14 10:26:26 +00:00
Benjamin Kramer	b17d2136ea	Give helper classes/functions local linkage. NFC. llvm-svn: 351016	2019-01-12 18:36:22 +00:00
Sanjay Patel	7d65fe5cd5	[LoopVectorizer] give more advice in remark about failure to vectorize call Something like this is requested by: https://bugs.llvm.org/show_bug.cgi?id=40265 ...and it seems like a common enough case that we should acknowledge it. Differential Revision: https://reviews.llvm.org/D56551 llvm-svn: 351010	2019-01-12 15:27:15 +00:00
Teresa Johnson	290a839891	[LTO] Record whether LTOUnit splitting is enabled in index Summary: Records in the module summary index whether the bitcode was compiled with the option necessary to enable splitting the LTO unit (e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit). The information is passed down to the ModuleSummaryIndex builder via a new module flag "EnableSplitLTOUnit", which is propagated onto a flag on the summary index. This is then used during the LTO link to check whether all linked summaries were built with the same value of this flag. If not, an error is issued when we detect a situation requiring whole program visibility of the class hierarchy. This is the case when both of the following conditions are met: 1) We are performing LowerTypeTests or Whole Program Devirtualization. 2) There are type tests or type checked loads in the code. Note I have also changed the ThinLTOBitcodeWriter to also gate the module splitting on the value of this flag. Reviewers: pcc Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D53890 llvm-svn: 350948	2019-01-11 18:31:57 +00:00
Vedant Kumar	ee10ef737e	[MergeFunc] Erase unused duplicate functions if they are discardable MergeFunc only deletes unused duplicate functions if they have local linkage, but it should be safe to relax this to any "discardable if unused" linkage type. Differential Revision: https://reviews.llvm.org/D56574 llvm-svn: 350939	2019-01-11 17:56:35 +00:00
Vedant Kumar	08fe7e02fb	[MergeFunc] Use Instruction::getFunction as a cleanup, NFC llvm-svn: 350938	2019-01-11 17:56:21 +00:00
Ehsan Amiri	f452f116d2	[Jump Threading] Unfold a select insn that feeds a switch via a phi node Currently when a select has a constant value in one branch and the select feeds a conditional branch (via a compare/ phi and compare) we unfold the select statement. This results in threading the conditional branch later on. Similar opportunity exists when a select (with a constant in one branch) feeds a switch (via a phi node). The patch unfolds select under this condition. A testcase is provided. llvm-svn: 350931	2019-01-11 15:52:57 +00:00
Matt Davis	9cd9f41f0e	[GVN] Update BlockRPONumber prior to use. Summary: The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below, to what we are trying to fix, by always ensuring a updated BlockRPONumber. == Backstory == I was investigating an ICE (segfault accessing a DenseMap item). This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018). After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`. Our test case that triggered this warning and periodic crash is rather involved. But the problematic line looks to be: GVN.cpp: Line 2197 ``` if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] && ``` To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs. My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above). In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm. 1) There is no sequence point for the `>=` operation. 2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`). 3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) . A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid. The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison. The assembly generated from the Window's host compiler does show the result of the first access to the map via `operator[]` produces a pointer to an unsigned int. And that pointer is being stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid. Reviewers: t.p.northover, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D55974 llvm-svn: 350880	2019-01-10 19:56:03 +00:00
Alina Sbirlea	cae12edaaa	Use MemorySSA in LICM to do sinking and hoisting. Summary: Step 2 in using MemorySSA in LICM: Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag. Promotion is disabled. Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which relied on promotion, in order to test all sinking tests. Reviewers: sanjoy, davide, gberry, george.burgess.iv Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D40375 llvm-svn: 350879	2019-01-10 19:29:04 +00:00
James Y Knight	62df5eed16	[opaque pointer types] Remove some calls to generic Type subtype accessors. That is, remove many of the calls to Type::getNumContainedTypes(), Type::subtypes(), and Type::getContainedType(N). I'm not intending to remove these accessors -- they are useful/necessary in some cases. However, removing the pointee type from pointers would potentially break some uses, and reducing the number of calls makes it easier to audit. llvm-svn: 350835	2019-01-10 16:07:20 +00:00
Eli Friedman	d4e7a0d83c	[SimplifyLibCalls] Fix memchr expansion for constant strings. The C standard says "The memchr function locates the first occurrence of c (converted to an unsigned char)[...]". The expansion was missing the conversion to unsigned char. Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 . Differential Revision: https://reviews.llvm.org/D55947 llvm-svn: 350775	2019-01-09 23:39:26 +00:00
Easwaran Raman	b45994b843	Refactor synthetic profile count computation. NFC. Summary: Instead of using two separate callbacks to return the entry count and the relative block frequency, use a single callback to return callsite count. This would allow better supporting hybrid mode in the future as the count of callsite need not always be derived from entry count (as in sample PGO). Reviewers: davidxl Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D56464 llvm-svn: 350755	2019-01-09 20:10:27 +00:00
Florian Hahn	9697d2a764	Revert r350647: "[NewPM] Port tsan" This patch breaks thread sanitizer on some macOS builders, e.g. http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/52725/ llvm-svn: 350719	2019-01-09 13:32:16 +00:00
Max Kazantsev	4615a505f8	[IPT] Drop cache less eagerly in GVN and LoopSafetyInfo Current strategy of dropping `InstructionPrecedenceTracking` cache is to invalidate the entire basic block whenever we change its contents. In fact, `InstructionPrecedenceTracking` has 2 internal strictures: `OrderedInstructions` that is needed to be invalidated whenever the contents changes, and the map with first special instructions in block. This second map does not need an update if we add/remove a non-special instuction because it cannot affect the contents of this map. This patch changes API of `InstructionPrecedenceTracking` so that it now accounts for reasons under which we invalidate blocks. This should lead to much less recalculations of the map and should save us some compile time because in practice we don't typically add/remove special instructions. Differential Revision: https://reviews.llvm.org/D54462 Reviewed By: efriedma llvm-svn: 350694	2019-01-09 07:28:13 +00:00
Sanjay Patel	d023dd60e9	[InstCombine] canonicalize another raw IR rotate pattern to funnel shift This is matching the equivalent of the DAG expansion, so it should never end up with worse perf than the original code even if the target doesn't have a rotate instruction. llvm-svn: 350672	2019-01-08 22:39:55 +00:00
Philip Pfaffe	82f995db75	[NewPM] Port tsan A straightforward port of tsan to the new PM, following the same path as D55647. Differential Revision: https://reviews.llvm.org/D56433 llvm-svn: 350647	2019-01-08 19:21:57 +00:00
Anna Thomas	2dfa412efe	[UnrollRuntime] Fix domTree failures in multiexit unrolling Summary: This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case. We initially had a restrictive check that the IDom is only updated when it is the header of the loop. However, we also need to update the IDom to the correct one when the IDom is any block within the original loop. See added test cases (which fail dom tree verification without the patch). Reviewers: reames, mzolotukhin, mkazantsev, hfinkel Reviewed by: brzycki, kuhar Subscribers: zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D56284 llvm-svn: 350640	2019-01-08 17:16:25 +00:00
Chandler Carruth	57578aaf96	[CallSite removal] Port `IndirectCallSiteVisitor` to use `CallBase` and update client code. Also rename it to use the more generic term `call` instead of something that could be confused with a praticular type. Differential Revision: https://reviews.llvm.org/D56183 llvm-svn: 350508	2019-01-07 07:15:51 +00:00
Chandler Carruth	363ac68374	[CallSite removal] Migrate all Alias Analysis APIs to use the newly minted `CallBase` class instead of the `CallSite` wrapper. This moves the largest interwoven collection of APIs that traffic in `CallSite`s. While a handful of these could have been migrated with a minorly more shallow migration by converting from a `CallSite` to a `CallBase`, it hardly seemed worth it. Most of the APIs needed to migrate together because of the complex interplay of AA APIs and the fact that converting from a `CallBase` to a `CallSite` isn't free in its current implementation. Out of tree users of these APIs can fairly reliably migrate with some combination of `.getInstruction()` on the `CallSite` instance and casting the resulting pointer. The most generic form will look like `CS` -> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there is a more elegant migration. Hopefully, this migrates enough APIs for users to fully move from `CallSite` to the base class. All of the in-tree users were easily migrated in that fashion. Thanks for the review from Saleem! Differential Revision: https://reviews.llvm.org/D55641 llvm-svn: 350503	2019-01-07 05:42:51 +00:00
Nikita Popov	65038515ee	[InstCombine] Relax cttz/ctlz with select on zero The cttz/ctlz intrinsics have a parameter specifying whether the result is undefined for zero. cttz(x, false) can be relaxed to cttz(x, true) if x is known non-zero, and in fact such an optimization is already performed. However, this currently doesn't work if x is non-zero as a result of a select rather than an explicit branch. This patch adds handling for this case, thus allowing x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y. Differential Revision: https://reviews.llvm.org/D55786 llvm-svn: 350463	2019-01-05 09:48:16 +00:00
Easwaran Raman	366a873f14	[Inliner] Optimize shouldBeDeferred This has some minor optimizations to shouldBeDeferred. This is not strictly NFC because the early exit inside the loop assumes TotalSecondaryCost is monotonically non-decreasing, which is not true if the threshold used by CostAnalyzer is negative. AFAICT the thresholds do not go below 0 for the default values of the various options we use. llvm-svn: 350456	2019-01-05 02:26:29 +00:00
Evgeniy Stepanov	0184c53cbd	Revert "Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)"" This reapplies commit r348983. llvm-svn: 350448	2019-01-05 00:44:58 +00:00
Nikita Popov	6658fce4fc	[BDCE] Remove dead uses of arguments In addition to finding dead uses of instructions, also find dead uses of function arguments, and replace them with zero as well. I'm changing the way the known bits are computed here to remove the coupling between the transfer function and the algorithm. It previously relied on the first op being visited first and computing known bits -- unless the first op is not an instruction, in which case they're computed on the second op. I could have adjusted this to check for "instruction or argument", but I think it's better to avoid the repeated calculation with an explicit flag. Differential Revision: https://reviews.llvm.org/D56247 llvm-svn: 350435	2019-01-04 21:21:43 +00:00
Peter Collingbourne	87f477b5e4	hwasan: Implement lazy thread initialization for the interceptor ABI. The problem is similar to D55986 but for threads: a process with the interceptor hwasan library loaded might have some threads started by instrumented libraries and some by uninstrumented libraries, and we need to be able to run instrumented code on the latter. The solution is to perform per-thread initialization lazily. If a function needs to access shadow memory or add itself to the per-thread ring buffer its prologue checks to see whether the value in the sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter and reloads from the TLS slot. The runtime does the same thing if it needs to access this data structure. This change means that the code generator needs to know whether we are targeting the interceptor runtime, since we don't want to pay the cost of lazy initialization when targeting a platform with native hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform} has been introduced for selecting the runtime ABI to target. The default ABI is set to interceptor since it's assumed that it will be more common that users will be compiling application code than platform code. Because we can no longer assume that the TLS slot is initialized, the pthread_create interceptor is no longer necessary, so it has been removed. Ideally, lazy initialization should only cost one instruction in the hot path, but at present the call may cause us to spill arguments to the stack, which means more instructions in the hot path (or theoretically in the cold path if the spills are moved with shrink wrapping). With an appropriately chosen calling convention for the per-thread initialization function (TODO) the hot path should always need just one instruction and the cold path should need two instructions with no spilling required. Differential Revision: https://reviews.llvm.org/D56038 llvm-svn: 350429	2019-01-04 19:27:04 +00:00
Teresa Johnson	853b962416	[ThinLTO] Handle chains of aliases At -O0, globalopt is not run during the compile step, and we can have a chain of an alias having an immediate aliasee of another alias. The summaries are constructed assuming aliases in a canonical form (flattened chains), and as a result only the base object but no intermediate aliases were preserved. Fix by adding a pass that canonicalize aliases, which ensures each alias is a direct alias of the base object. Reviewers: pcc, davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54507 llvm-svn: 350423	2019-01-04 19:04:54 +00:00
Vedant Kumar	a1778df474	[CodeExtractor] Do not extract unsafe lifetime markers Lifetime markers which reference inputs to the extraction region are not safe to extract. Example ('rhs' will be extracted): ``` entry: +------------+ \| x = alloca \| \| y = alloca \| +------------+ / \ lhs: rhs: +-------------------+ +-------------------+ \| lifetime_start(x) \| \| lifetime_start(x) \| \| use(x) \| \| lifetime_start(y) \| \| lifetime_end(x) \| \| use(x, y) \| \| lifetime_start(y) \| \| lifetime_end(y) \| \| use(y) \| \| lifetime_end(x) \| \| lifetime_end(y) \| +-------------------+ +-------------------+ ``` Prior to extraction, the stack coloring pass sees that the slots for 'x' and 'y' are in-use at the same time. After extraction, the coloring pass infers that 'x' and 'y' are not in-use concurrently, because markers from 'rhs' are no longer available to help decide otherwise. This leads to a miscompile, because the stack slots actually are in-use concurrently in the extracted function. Fix this by moving lifetime start/end markers for memory regions defined in the calling function around the call to the extracted function. Fixes llvm.org/PR39671 (rdar://45939472). Differential Revision: https://reviews.llvm.org/D55967 llvm-svn: 350420	2019-01-04 17:43:22 +00:00
Sanjay Patel	722466e1f1	[InstCombine] reduce raw IR narrowing rotate patterns to funnel shift Similar to rL350199 - there are no known analysis/codegen holes for funnel shift intrinsics now, so we can canonicalize the 6+ regular instructions to funnel shift to improve vectorization, inlining, unrolling, etc. llvm-svn: 350419	2019-01-04 17:38:12 +00:00
John Brawn	39ac159c24	[LICM] Adjust how moving the re-hoist point works In some cases the order that we hoist instructions in means that when rehoisting (which uses the same order as hoisting) we can rehoist to a block A, then a block B, then block A again. This currently causes an assertion failure as it expects that when changing the hoist point it only ever moves to a block that dominates the hoist point being moved from. Fix this by moving the re-hoist point when it doesn't dominate the dominator of hoisted instruction, or in other words when it wouldn't dominate the uses of the instruction being rehoisted. Differential Revision: https://reviews.llvm.org/D55266 llvm-svn: 350408	2019-01-04 17:12:09 +00:00
Xin Tong	47beee2f3f	[memcpyopt] Remove a few unnecessary isVolatile() checks. NFC We already checked for isSimple() on the store. llvm-svn: 350378	2019-01-04 02:13:22 +00:00
Anna Thomas	a470aa6701	[UnrollRuntime] Move the DomTree verification under expensive checks Suggested by Hal as done in r349871. llvm-svn: 350349	2019-01-03 19:43:33 +00:00
Anna Thomas	0785e7307e	[UnrollRuntime] Add DomTree verification under debug mode NFC: This adds the dom tree verification under debug mode at a point just before we start unrolling the loop. This allows us to verify dom tree at a state where it is much smaller and before the unrolling actually happens. This also implies we do not need to run -verify-dom-info everytime to see if the DT is in a valid state when we transform the loop for runtime unrolling. llvm-svn: 350334	2019-01-03 17:44:44 +00:00
Philip Pfaffe	b39a97c8f6	[NewPM] Port Msan Summary: Keeping msan a function pass requires replacing the module level initialization: That means, don't define a ctor function which calls __msan_init, instead just declare the init function at the first access, and add that to the global ctors list. Changes: - Pull the actual sanitizer and the wrapper pass apart. - Add a newpm msan pass. The function pass inserts calls to runtime library functions, for which it inserts declarations as necessary. - Update tests. Caveats: - There is one test that I dropped, because it specifically tested the definition of the ctor. Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji Differential Revision: https://reviews.llvm.org/D55647 llvm-svn: 350305	2019-01-03 13:42:44 +00:00
Pete Cooper	697281df42	Teach ObjCARC optimizer about equivalent PHIs when eliminating autoreleaseRV/retainRV pairs OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have equivalent arguments (either identical or equivalent PHIs). It then assumes that ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead. Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue. This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence. rdar://problem/47005143 Reviewed By: ahatanak Differential Revision: https://reviews.llvm.org/D56235 llvm-svn: 350284	2019-01-03 01:38:08 +00:00
Xin Tong	33e3b4b9b3	[ThinLTO] Scan all variants of vague symbol for reachability. Summary: Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable from somewhere else. Reviewers: tejohnson, grimar Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D56117 llvm-svn: 350269	2019-01-02 23:18:20 +00:00
Pete Cooper	8d58048024	Fix assert in ObjCARC optimizer when deleting retainBlock of null or undef. The caller to EraseInstruction had this conditional: // ARC calls with null are no-ops. Delete them. if (IsNullOrUndef(Arg)) but the assert inside EraseInstruction only allowed ConstantPointerNull and not undef or bitcasts. This adds support for both of these cases. rdar://problem/47003805 llvm-svn: 350261	2019-01-02 21:00:02 +00:00
Nikita Popov	cc6ef7f153	[BDCE] Remove instructions without demanded bits If an instruction has no demanded bits, remove it directly during BDCE, instead of leaving it for something else to clean up. Differential Revision: https://reviews.llvm.org/D56185 llvm-svn: 350257	2019-01-02 20:02:14 +00:00
Pawel Bylica	119aa8fa5f	Format AggresiveInstCombine.cpp. NFC llvm-svn: 350255	2019-01-02 19:51:46 +00:00
Sanjay Patel	654e6aabb9	[InstCombine] canonicalize raw IR rotate patterns to funnel shift The final piece of IR-level analysis to allow this was committed with: rL350188 Using the intrinsics should improve transforms based on cost models like vectorization and inlining. The backend should be prepared too, so we can now canonicalize more sequences of shift/logic to the intrinsics and know that the end result should be equal or better to the original code even if the target does not have an actual rotate instruction. llvm-svn: 350199	2019-01-01 21:51:39 +00:00
Nikita Popov	bc9986e9ad	Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions" This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses those user is also dead. This case is checked separately instead. The previous attempt to land this lead to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now and the added test case demanstrates such an instance. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 350188	2019-01-01 10:05:26 +00:00
Chen Zheng	4952e668f8	[InstCombine] canonicalize MUL with NEG operand -X * Y --> -(X * Y) X * -Y --> -(X * Y) Differential Revision: https://reviews.llvm.org/D55961 llvm-svn: 350185	2019-01-01 01:09:20 +00:00
Alexander Potapenko	cea4f83371	[MSan] Handle llvm.is.constant intrinsic MSan used to report false positives in the case the argument of llvm.is.constant intrinsic was uninitialized. In fact checking this argument is unnecessary, as the intrinsic is only used at compile time, and its value doesn't depend on the value of the argument. llvm-svn: 350173	2018-12-31 09:42:23 +00:00
Max Kazantsev	201534d753	Drop SE cache early because loop parent can change in LoopSimplifyCFG llvm-svn: 350145	2018-12-29 04:26:22 +00:00
Anna Thomas	98743fa77a	[UnrollRuntime] NFC: Add comment and verify LCSSA Added -verify-loop-lcssa to test cases. Updated comments in ConnectProlog. llvm-svn: 350131	2018-12-28 18:52:16 +00:00
Max Kazantsev	530ff8f3cc	Temporarily disable term folding in LoopSimplifyCFG, add tests llvm-svn: 350117	2018-12-28 06:22:39 +00:00
Max Kazantsev	80e4b40f3e	[LoopSimplifyCFG] Delete dead blocks in RPO Deletion of dead blocks in arbitrary order may lead to failure of assertion in `DeleteDeadBlock` that requires that we have deleted all predecessors before we can delete the current block. We should instead delete them in RPO order. llvm-svn: 350116	2018-12-28 06:08:51 +00:00
Craig Topper	c9a6000755	[LoopIdiomRecognize] Add CTTZ support Summary: Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom) This commit: 1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left. 2. Prepare for BitScan idiom recognition. Patch by Yuanfang Chen (tabloid.adroit) Reviewers: craig.topper, evstupac Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55876 llvm-svn: 350074	2018-12-26 21:59:48 +00:00
Max Kazantsev	28298e9647	[NFC] Use utility function for guards detection llvm-svn: 350064	2018-12-26 08:22:25 +00:00
Max Kazantsev	9b25bf3960	[NFC] Reuse variables instead of re-calling getParent llvm-svn: 350062	2018-12-25 07:20:06 +00:00
Eugene Leviant	4dc3a3f746	[HWASAN] Instrument memorty intrinsics by default Differential revision: https://reviews.llvm.org/D55926 llvm-svn: 350055	2018-12-24 16:02:48 +00:00
Max Kazantsev	edabb9ae56	[LoopSimplifyCFG] Delete dead exiting edges This patch teaches LoopSimplifyCFG to remove dead exiting edges from loops. Differential Revision: https://reviews.llvm.org/D54025 Reviewed By: fedor.sergeev llvm-svn: 350049	2018-12-24 07:41:33 +00:00
Max Kazantsev	347c583772	Return "[LoopSimplifyCFG] Delete dead in-loop blocks" The underlying bug that caused the revert should be fixed by rL348567. Differential Revision: https://reviews.llvm.org/D54023 llvm-svn: 350045	2018-12-24 06:06:17 +00:00
George Burgess IV	7e12875c89	[LoopIdioms] More LocationSize::precise annotations; NFC Both of these places reference memset-like loops. Memset is precise. Trying to keep these patches super small so they're easily post-commit verifiable, as requested in D44748. llvm-svn: 350044	2018-12-24 05:55:50 +00:00
George Burgess IV	5e4a03a089	[MemCpyOpt] Use LocationSize instead of ints; NFC Trying to keep these patches super small so they're easily post-commit verifiable, as requested in D44748. srcSize is derived from the size of an alloca, and we quit out if the size of that is > the size of the thing we're copying to. Hence, we should always copy everything over, so these sizes are precise. Don't make srcSize itself a LocationSize, since optionality isn't helpful, and we do some comparisons against other sizes elsewhere in that function. llvm-svn: 350019	2018-12-23 06:40:39 +00:00
Mircea Trofin	b53eeb6f4c	[llvm] API for encoding/decoding DWARF discriminators. Summary: Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling) The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues: - communicates overflow back to the user - supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported. Reviewers: davidxl, danielcdh, wmi, dblaikie Reviewed By: dblaikie Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D55681 llvm-svn: 349973	2018-12-21 22:48:50 +00:00
Vedant Kumar	b264d69de7	[IR] Add Instruction::isLifetimeStartOrEnd, NFC Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an llvm.lifetime.start or an llvm.lifetime.end intrinsic. This was suggested as a cleanup in D55967. Differential Revision: https://reviews.llvm.org/D56019 llvm-svn: 349964	2018-12-21 21:49:40 +00:00
Anna Thomas	18be3cb606	[RuntimeUnrolling] NFC: Add TODO and comments in connectProlog Currently, runtime unrolling does not support loops where multiple exiting blocks exit to the latchExit. Added TODO and other code clarifications for ConnectProlog code. llvm-svn: 349944	2018-12-21 19:45:05 +00:00
Simon Pilgrim	5d403f6bf8	[X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm) This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics. Clang counterpart: https://reviews.llvm.org/D55890 Differential Revision: https://reviews.llvm.org/D55894 llvm-svn: 349892	2018-12-21 09:04:14 +00:00
Reid Kleckner	b894ecf903	[memcpyopt] Add debug logs when forwarding memcpy src to dst llvm-svn: 349873	2018-12-21 01:41:20 +00:00
Eli Friedman	3af2f53456	[LoopUnroll] Don't verify domtree by default with +Asserts. This verification is linear in the size of the function, so it can cause a quadratic compile-time explosion in a function with many loops to unroll. Differential Revision: https://reviews.llvm.org/D54732 llvm-svn: 349871	2018-12-21 01:28:49 +00:00
Tom Stellard	2f44fbe936	cmake: Remove add_llvm_loadable_module() Summary: This function is very similar to add_llvm_library(), so this patch merges it into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...) with add_llvm_library(lib MODULE ...) Reviewers: philip.pfaffe, beanz, chandlerc Reviewed By: philip.pfaffe Subscribers: chapuni, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D51748 llvm-svn: 349839	2018-12-20 22:04:08 +00:00
Michael Kruse	199427100b	[InstCombine] Preserve access-group metadata. Preserve llvm.access.group metadata when combining store instructions. This was forgotten in r349725. Fixes llvm.org/PR40117 llvm-svn: 349774	2018-12-20 17:11:02 +00:00
Piotr Sobczak	deaacc17fe	[InstCombine][AMDGPU] Handle more buffer intrinsics Summary: Include the following intrinsics in the InsctCombine simplification: * amdgcn_raw_buffer_load * amdgcn_raw_buffer_load_format * amdgcn_struct_buffer_load * amdgcn_struct_buffer_load_format Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55882 llvm-svn: 349735	2018-12-20 10:08:18 +00:00
Alexander Potapenko	0e3b85a730	[MSan] Don't emit __msan_instrument_asm_load() calls LLVM treats void* pointers passed to assembly routines as pointers to sized types. We used to emit calls to __msan_instrument_asm_load() for every such void*, which sometimes led to false positives. A less error-prone (and truly "conservative") approach is to unpoison only assembly output arguments. llvm-svn: 349734	2018-12-20 10:05:00 +00:00
Eugene Leviant	2d98eb1b2e	[HWASAN] Add support for memory intrinsics Differential revision: https://reviews.llvm.org/D55117 llvm-svn: 349728	2018-12-20 09:04:33 +00:00
Michael Kruse	978ba61536	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 llvm-svn: 349725	2018-12-20 04:58:07 +00:00
Vitaly Buka	07a55f27dc	[asan] Undo special treatment of linkonce_odr and weak_odr Summary: On non-Windows these are already removed by ShouldInstrumentGlobal. On Window we will wait until we get actual issues with that. Reviewers: pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55899 llvm-svn: 349707	2018-12-20 00:30:27 +00:00
Vitaly Buka	d414e1bbb5	[asan] Prevent folding of globals with redzones Summary: ICF prevented by removing unnamed_addr and local_unnamed_addr for all sanitized globals. Also in general unnamed_addr is not valid here as address now is important for ODR violation detector and redzone poisoning. Before the patch ICF on globals caused: 1. false ODR reports when we register global on the same address more than once 2. globals buffer overflow if we fold variables of smaller type inside of large type. Then the smaller one will poison redzone which overlaps with the larger one. Reviewers: eugenis, pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55857 llvm-svn: 349706	2018-12-20 00:30:18 +00:00
Nikita Popov	3817ee7908	Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions" This reverts commit r349674. It causes a failure in test-suite enc-3des.execution_time. llvm-svn: 349684	2018-12-19 22:09:02 +00:00
Nikita Popov	649e125451	[BDCE][DemandedBits] Detect dead uses of undead instructions This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses those user is also dead. This case is checked separately instead. The test case has a couple of cases that are not simplified yet. In particular, we're only looking at uses of instructions right now. I think it would make sense to also extend this to arguments. Furthermore DemandedBits doesn't yet know some of the tricks that InstCombine does for the demanded bits or bitwise or/and/xor in combination with known bits information. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 349674	2018-12-19 19:56:21 +00:00
Anton Afanasyev	ce28791e20	Test commit Fix typos. llvm-svn: 349644	2018-12-19 17:18:40 +00:00
Vitaly Buka	4e4920694c	[asan] Restore ODR-violation detection on vtables Summary: unnamed_addr is still useful for detecting of ODR violations on vtables Still unnamed_addr with lld and --icf=safe or --icf=all can trigger false reports which can be avoided with --icf=none or by using private aliases with -fsanitize-address-use-odr-indicator Reviewers: eugenis Reviewed By: eugenis Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55799 llvm-svn: 349555	2018-12-18 22:23:30 +00:00
Kuba Mracek	3760fc9f3d	[asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list. Differential Revision: https://reviews.llvm.org/D55794 llvm-svn: 349544	2018-12-18 21:20:17 +00:00
Pete Cooper	be4f571107	Change the objc ARC optimizer to use the new objc.* intrinsics We're moving ARC optimisation and ARC emission in clang away from runtime methods and towards intrinsics. This is the part which actually uses the intrinsics in the ARC optimizer when both analyzing the existing calls and emitting new ones. Differential Revision: https://reviews.llvm.org/D55348 Reviewers: ahatanak llvm-svn: 349534	2018-12-18 20:32:49 +00:00
Nikita Popov	20853a7807	[InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check Checking whether a number has a certain number of trailing / leading zeros means checking whether it is of the form XXXX1000 / 0001XXXX, which can be done with an and+icmp. Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next step, this can be extended to non-equality predicates. Differential Revision: https://reviews.llvm.org/D55745 llvm-svn: 349530	2018-12-18 19:59:50 +00:00
Florian Hahn	5c014037b3	[SCCP] Get rid of redundant call for getPredicateInfoFor (NFC). We can use the result fetched a few lines above. llvm-svn: 349527	2018-12-18 19:37:07 +00:00
Sanjay Patel	e51d5bdb3c	[InstCombine] refactor isCheapToScalarize(); NFC As the FIXME indicates, this has the potential to go overboard. So I'm not sure if it's even worth keeping this vs. iteratively doing simple matches, but we might as well clean it up. llvm-svn: 349523	2018-12-18 19:07:38 +00:00
Michael Kruse	d4eb13c880	[LoopVectorize] Rename pass options. NFC. Rename: NoUnrolling to InterleaveOnlyWhenForced and AlwaysVectorize to !VectorizeOnlyWhenForced Contrary to what the name 'AlwaysVectorize' suggests, it does not unconditionally vectorize all loops, but applies a cost model to determine whether vectorization is profitable to all loops. Hence, passing false will disable the cost model, except when a loop is marked with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested by @hfinkel in D55716) better matches this behavior. Similarly, 'NoUnrolling' disables the profitability cost model for interleaving (a term to distinguish it from unrolling by the LoopUnrollPass); rename it for consistency. Differential Revision: https://reviews.llvm.org/D55785 llvm-svn: 349513	2018-12-18 17:46:09 +00:00
Michael Kruse	3284775b70	[LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops. When using clang with `-fno-unroll-loops` (implicitly added with `-O1`), the LoopUnrollPass is not not added to the (legacy) pass pipeline. This also means that it will not process any loop metadata such as llvm.loop.unroll.enable (which is generated by #pragma unroll or WarnMissedTransformationsPass emits a warning that a forced transformation has not been applied (see https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html). Such explicit transformations should take precedence over disabling heuristics. This patch unconditionally adds LoopUnrollPass to the optimizing pipeline (that is, it is still not added with `-O0`), but passes a flag indicating whether automatic unrolling is dis-/enabled. This is the same approach as LoopVectorize uses. The new pass manager's pipeline builder has no option to disable unrolling, hence the problem does not apply. Differential Revision: https://reviews.llvm.org/D55716 llvm-svn: 349509	2018-12-18 17:16:05 +00:00
Dylan McKay	f920da009e	[IPO][AVR] Create new Functions in the default address space specified in the data layout This modifies the IPO pass so that it respects any explicit function address space specified in the data layout. In targets with nonzero program address spaces, all functions should, by default, be placed into the default program address space. This is required for Harvard architectures like AVR. Without this, the functions will be marked as residing in data space, and thus not be callable. This has no effect to any in-tree official backends, as none use an explicit program address space in their data layouts. Patch by Tim Neumann. llvm-svn: 349469	2018-12-18 09:52:52 +00:00
Tim Northover	856628f707	SROA: preserve alignment tags on loads and stores. When splitting up an alloca's uses we were dropping any explicit alignment tags, which means they default to the ABI-required default alignment and this can cause miscompiles if the real value was smaller. Also refactor the TBAA metadata into a parent class since it's shared by both children anyway. llvm-svn: 349465	2018-12-18 09:29:39 +00:00
Peter Collingbourne	d3a3e4b46d	hwasan: Move ctor into a comdat. Differential Revision: https://reviews.llvm.org/D55733 llvm-svn: 349413	2018-12-17 22:56:34 +00:00
Sanjay Patel	200885e654	[AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924) Now, that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it. In the worst case (a target that doesn't have rotate instructions), we will expand this into a branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very likely to be a perf improvement over the original code. The motivating source code pattern for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=34924 Background: I looked at several different options before deciding where to try this - instcombine, simplifycfg, CGP - because it doesn't fit cleanly anywhere AFAIK. The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to have the IR converted before we reach things like vectorization because the reduced code can make a loop much simpler to transform. Technically, this could be included in instcombine, but it's a large pattern match that includes control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch. So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. This only runs at -O3, but that seems like a reasonable limitation given that source code has many options to avoid this pattern (including the recently added clang intrinsics for rotates). I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug with the new pass manager that's independent of this change (but it will be masked if we canonicalize harder to funnel shift intrinsics in instcombine). Differential Revision: https://reviews.llvm.org/D55604 llvm-svn: 349396	2018-12-17 21:14:51 +00:00
Sanjay Patel	1a6e9ec434	[InstCombine] don't widen an arbitrary sequence of vector ops (PR40032) The problem is shown specifically for a case with vector multiply here: https://bugs.llvm.org/show_bug.cgi?id=40032 ...and this might mask the original backend bug for ARM shown in: https://bugs.llvm.org/show_bug.cgi?id=39967 As the test diffs here show, we were (and probably still aren't) doing these kinds of transforms in a principled way. We are producing more or equal wide instructions than we started with in some cases, so we still need to restrict/correct other transforms from overstepping. If there are perf regressions from this change, we can either carve out exceptions to the general IR rules, or improve the backend to do these transforms when we know the transform is profitable. That's probably similar to a change like D55448. Differential Revision: https://reviews.llvm.org/D55744 llvm-svn: 349389	2018-12-17 20:27:43 +00:00
Davide Italiano	e41e1d015f	[EarlyCSE] If DI can't be salvaged, mark it as unavailable. Fixes PR39874. llvm-svn: 349323	2018-12-17 01:42:39 +00:00
Kamil Rytarowski	21e270a479	Add NetBSD support in needsRuntimeRegistrationOfSectionRange. Use linker script magic to get data/cnts/name start/end. llvm-svn: 349277	2018-12-15 16:51:35 +00:00
Kamil Rytarowski	15ae738bc8	Register kASan shadow offset for NetBSD/amd64 The NetBSD x86_64 kernel uses the 0xdfff900000000000 shadow offset. llvm-svn: 349276	2018-12-15 16:32:41 +00:00
Florian Hahn	c214bc2b8d	[NewGVN] Update use counts for SSA copies when replacing them by their operands. The current code relies on LeaderUseCount to determine if we can remove an SSA copy, but in that the LeaderUseCount does not refer to the SSA copy. If a SSA copy is a dominating leader, we use the operand as dominating leader instead. This means we removed a user of a ssa copy and we should decrement its use count, so we can remove the ssa copy once it becomes dead. Fixes PR38804. Reviewers: efriedma, davide Reviewed By: davide Differential Revision: https://reviews.llvm.org/D51595 llvm-svn: 349217	2018-12-15 00:32:38 +00:00
Vedant Kumar	9d1827331f	[Util] Refer to [s\|z]exts of args when converting dbg.declares (fix PR35400) When converting dbg.declares, if the described value is a [s\|z]ext, refer to the ext directly instead of referring to its operand. This fixes a narrowing bug (the debugger got the sign of a variable wrong, see llvm.org/PR35400). The main reason to refer to the ext's operand was that an optimization may remove the ext itself, leading to a dropped variable. Now that InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this is less of a concern. Other passes can/should adopt this API as needed to fix dropped variable bugs. Differential Revision: https://reviews.llvm.org/D51813 llvm-svn: 349214	2018-12-15 00:03:33 +00:00
Michael Kruse	ea9ef34558	[TransformWarning] Do not warn missed transformations in optnone functions. Optimization transformations are intentionally disabled by the 'optnone' function attribute. Therefore do not warn if transformation metadata is still present. Using the legacy pass manager structure, the `skipFunction` method takes care for the optnone attribute (already called before this patch). For the new pass manager, there is no equivalent, so we check for the 'optnone' attribute manually. Differential Revision: https://reviews.llvm.org/D55690 llvm-svn: 349184	2018-12-14 19:45:43 +00:00
Michael Kruse	5948b7f30f	[Transforms] Preserve metadata when converting invoke to call. The `changeToCall` function did not preserve the invoke's metadata. Currently, there is probably no metadata that depends on being applied on a CallInst or InvokeInst. Therefore we can replace the instruction's metadata. This fixes http://llvm.org/PR39994 Suggested-by: Moritz Kreutzer <moritz.kreutzer@siemens.com> Differential Revision: https://reviews.llvm.org/D55666 llvm-svn: 349170	2018-12-14 18:15:11 +00:00
Evgeniy Stepanov	eb238ecf0f	Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)" Breaks sanitizer-android buildbot. This reverts commit af8443a984c3b491c9ca2996b8d126ea31e5ecbe. llvm-svn: 349092	2018-12-13 23:47:50 +00:00
Wei Mi	66c6c5abea	[SampleFDO] handle ProfileSampleAccurate when initializing function entry count ProfileSampleAccurate is used to indicate the profile has exact match to the code to be optimized. Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle ProfileSampleAccurate in multiple places. Differential Revision: https://reviews.llvm.org/D55660 llvm-svn: 349088	2018-12-13 21:51:42 +00:00
Nikita Popov	dc73a6edde	Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail" Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. The previous version of this patch did not perform dependency checking properly: While the dependency is checked at the position of the memset, the used size must be that of the memcpy. Previously the size of the memset was used, which missed modification in the region MemSetSize..CopySize, resulting in miscompiles. The added tests cover variations of this issue. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 349078	2018-12-13 20:04:27 +00:00
Easwaran Raman	5a7056fa03	[ThinLTO] Compute synthetic function entry count Summary: This patch computes the synthetic function entry count on the whole program callgraph (based on module summary) and writes the entry counts to the summary. After function importing, this count gets attached to the IR as metadata. Since it adds a new field to the summary, this bumps up the version. Reviewers: tejohnson Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D43521 llvm-svn: 349076	2018-12-13 19:54:27 +00:00
Davide Italiano	9737096bb1	[LoopUtils] Use i32 instead of `void`. The actual type of the first argument of the @dbg intrinsic doesn't really matter as we're setting it to `undef`, but the bitcode reader is picky about `void` types. llvm-svn: 349069	2018-12-13 18:37:23 +00:00
Vitaly Buka	a257639a69	[asan] Don't check ODR violations for particular types of globals Summary: private and internal: should not trigger ODR at all. unnamed_addr: current ODR checking approach fail and rereport false violation if a linker merges such globals linkonce_odr, weak_odr: could cause similar problems and they are already not instrumented for ELF. Reviewers: eugenis, kcc Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55621 llvm-svn: 349015	2018-12-13 09:47:39 +00:00
David L. Jones	54c01ad6a9	Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail" This revision caused trucated memsets for structs with padding. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html llvm-svn: 349002	2018-12-13 03:15:11 +00:00
Davide Italiano	8ee59ca653	[LoopUtils] Prefer a set over a map. NFCI. llvm-svn: 348999	2018-12-13 01:11:52 +00:00
Davide Italiano	744c3c327f	[LoopDeletion] Update debug values after loop deletion. When loops are deleted, we don't keep track of variables modified inside the loops, so the DI will contain the wrong value for these. e.g. int b() { int i; for (i = 0; i < 2; i++) ; patatino(); return a; -> 6 patatino(); 7 return a; 8 } 9 int main() { b(); } (lldb) frame var i (int) i = 0 We mark instead these values as unavailable inserting a @llvm.dbg.value(undef to make sure we don't end up printing an incorrect value in the debugger. We could consider doing something fancier, for, e.g. constants, in the future. PR39868. rdar://problem/46418795) Differential Revision: https://reviews.llvm.org/D55299 llvm-svn: 348988	2018-12-12 23:32:35 +00:00
Nikita Popov	36e03ac6ee	[InstCombine] Fix negative GEP offset evaluation for 32-bit pointers This fixes https://bugs.llvm.org/show_bug.cgi?id=39908. The evaluateGEPOffsetExpression() function simplifies GEP offsets for use in comparisons against zero, basically by converting XScale+Offset==0 to X+Offset/Scale==0 if Scale divides Offset. However, before this is done, Offset is masked down to the pointer size. This results in incorrect results for negative Offsets, because we basically end up dividing the 32-bit offset zero* extended to 64-bit bits (rather than sign extended). Fix this by explicitly sign extending the truncated value. Differential Revision: https://reviews.llvm.org/D55449 llvm-svn: 348987	2018-12-12 23:19:03 +00:00
Ryan Prichard	e028c818f5	[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6) Summary: The change is needed to support ELF TLS in Android. See D55581 for the same change in compiler-rt. Reviewers: srhines, eugenis Reviewed By: eugenis Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D55592 llvm-svn: 348983	2018-12-12 22:45:06 +00:00
Michael Kruse	7244852557	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 llvm-svn: 348944	2018-12-12 17:32:52 +00:00
Mikael Holmen	c06b01cb22	Fix compiler warning about unused variable [NFC] llvm-svn: 348913	2018-12-12 06:33:45 +00:00
Gor Nishanov	20d833d5e3	[coroutines] Improve suspend point simplification Summary: Enable suspend point simplification for cases where: * coro.save and coro.suspend are in different basic blocks * where there are intervening intrinsics Reviewers: modocache, tks2103, lewissbaker Reviewed By: modocache Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D55160 llvm-svn: 348897	2018-12-11 21:23:09 +00:00
Fedor Sergeev	a1d95c3fc4	[NewPM] fixing asserts on deleted loop in -print-after-all IR-printing AfterPass instrumentation might be called on a loop that has just been invalidated. We should skip printing it to avoid spurious asserts. Reviewed By: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D54740 llvm-svn: 348887	2018-12-11 19:05:35 +00:00
Vedant Kumar	b3a7cae045	[HotColdSplitting] Disable outlining landingpad instructions (PR39917) It's currently not safe to outline landingpad instructions (see llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of previous landingpad instructions in a function alters the lowering of subsequent landingpads by renumbering type info ID's. Outlining a landingpad therefore breaks exception handling & unwinding. llvm-svn: 348870	2018-12-11 18:05:31 +00:00
Sanjay Patel	2aa2dc76c2	[InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927) call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM This has the potential to create less-than-8-bit scalar types as shown in some of the test diffs, but it looks like the backend knows how to deal with that in these patterns. This is the simple part of the fix suggested in: https://bugs.llvm.org/show_bug.cgi?id=39927 Differential Revision: https://reviews.llvm.org/D55529 llvm-svn: 348862	2018-12-11 16:38:03 +00:00
David Stenberg	2474ce5862	[DeadArgElim] Fixes for dbg.values using dead arg/return values Summary: When eliminating a dead argument or return value in a function with local linkage, all uses, including in dbg.value intrinsics, would be replaced with null constants. This would mean that, for example for an integer argument, the debug info would incorrectly express that the value is 0. Instead, replace all uses with undef to indicate that the argument/return value is optimized out. Also, make sure that metadata uses of return values are rewritten even if there are no non-metadata uses of the value. As a bit of historical curiosity, the code that emitted null constants was introduced in the initial check-in of the pass in 2003, before 'undef' values even existed in LLVM. This fixes PR23260. Reviewers: dblaikie, aprantl, vsk, djtodoro Reviewed By: aprantl Subscribers: llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D55513 llvm-svn: 348837	2018-12-11 10:33:38 +00:00
Davide Italiano	8ec7709f58	[Local] Promote an utility that could be used elsewhere. NFCI. llvm-svn: 348804	2018-12-10 22:17:04 +00:00
Matt Arsenault	9ccde61f81	InstCombine: Scalarize single use icmp/fcmp llvm-svn: 348801	2018-12-10 21:50:54 +00:00
Nikita Popov	94b8e2ea4e	[MemCpyOpt] memset->memcpy forwarding with undef tail Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 348645	2018-12-07 21:16:58 +00:00
Vedant Kumar	03f9f15b16	[HotColdSplitting] Refine definition of unlikelyExecuted The splitting pass uses its 'unlikelyExecuted' predicate to statically decide which blocks are cold. - Do not treat noreturn calls as if they are cold unless they are actually marked cold. This is motivated by functions like exit() and longjmp(), which are not beneficial to outline. - Do not treat inline asm as an outlining barrier. In practice asm("") is frequently used to inhibit basic block merging; enabling outlining in this case results in substantial memory savings. - Treat invokes of cold functions as cold. As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's no longer needed. The pass can identify & outline blocks dominated by EH pads, so there's no need to special-case __cxa_begin_catch etc. Differential Revision: https://reviews.llvm.org/D54244 llvm-svn: 348640	2018-12-07 20:24:04 +00:00
Vedant Kumar	03aaa3e2aa	[HotColdSplitting] Outline more than once per function Algorithm: Identify maximal cold regions and put them in a worklist. If a candidate region overlaps with another, discard it. While the worklist is full, remove a single-entry sub-region from the worklist and attempt to outline it. By the non-overlap property, this should not invalidate parts of the domtree pertaining to other outlining regions. Testing: LNT results on X86 are clean. With test-suite + externals, llvm outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file 483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm outlines over 100 times in some functions (mostly EH paths). There was not a significant performance impact pre vs. post-patch. Differential Revision: https://reviews.llvm.org/D53887 llvm-svn: 348639	2018-12-07 20:23:52 +00:00
Nikita Popov	110cf05203	Reapply "[DemandedBits][BDCE] Support vectors of integers" DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbirary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348602	2018-12-07 15:38:13 +00:00
Max Kazantsev	b9e65cbddf	Introduce llvm.experimental.widenable_condition intrinsic This patch introduces a new instinsic `@llvm.experimental.widenable_condition` that allows explicit representation for guards. It is an alternative to using `@llvm.experimental.guard` intrinsic that does not contain implicit control flow. We keep finding places where `@llvm.experimental.guard` is not supported or treated too conservatively, and there are 2 reasons to that: - `@llvm.experimental.guard` has memory write side effect to model implicit control flow, and this sometimes confuses passes and analyzes that work with memory; - Not all passes and analysis are aware of the semantics of guards. These passes treat them as regular throwing call and have no idea that the condition of guard may be used to prove something. One well-known place which had caused us troubles in the past is explicit loop iteration count calculation in SCEV. Another example is new loop unswitching which is not aware of guards. Whenever a new pass appears, we potentially have this problem there. Rather than go and fix all these places (and commit to keep track of them and add support in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible. The only significant difference between guards and regular explicit branches is that guard's condition can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor, and it is always legal to go there no matter what the guard condition is. The other successor is a guarded block, and it is only legal to go there if the condition is true. This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard` intrinsic. Now a widenable guard can be represented in the CFG explicitly like this: %widenable_condition = call i1 @llvm.experimental.widenable.condition() %new_condition = and i1 %cond, %widenable_condition br i1 %new_condition, label %guarded, label %deopt guarded: ; Guarded instructions deopt: call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ] The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an `undef`, but the intrinsic prevents the optimizer from folding it early. This form should exploit all optimization boons provided to `br` instuction, and it still can be widened by replacing the result of `@llvm.experimental.widenable.condition()` with `and` with any arbitrary boolean value (as long as the branch that is taken when it is `false` has a deopt and has no side-effects). For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using implicit control flow in guards". This patch introduces this new intrinsic with respective LangRef changes and a pass that converts old-style guards (expressed as intrinsics) into the new form. The naming discussion is still ungoing. Merging this to unblock further items. We can later change the name of this intrinsic. Reviewed By: reames, fedor.sergeev, sanjoy Differential Revision: https://reviews.llvm.org/D51207 llvm-svn: 348593	2018-12-07 14:39:46 +00:00
Markus Lavin	4dc4ebd606	[PM] Port LoadStoreVectorizer to the new pass manager. Differential Revision: https://reviews.llvm.org/D54848 llvm-svn: 348570	2018-12-07 08:23:37 +00:00
Max Kazantsev	a523a21175	[LoopSimplifyCFG] Do not deal with loops with irreducible CFG inside The current algorithm that collects live/dead/inloop blocks relies on some invariants related to RPO and PO traversals. In particular, the important fact it requires is that the only loop's latch is the first block in PO traversal. It also relies on fact that during RPO we visit all prececessors of a block before we visit this block (backedges ignored). If a loop has irreducible non-loop cycle inside, both these assumptions may break. This patch adds detection for this situation and prohibits the terminator folding for loops with irreducible CFG. We can in theory support this later, for this some algorithmic changes are needed. Besides, irreducible CFG is not a frequent situation and we can just don't bother. Thanks @uabelho for finding this! Differential Revision: https://reviews.llvm.org/D55357 Reviewed By: skatkov llvm-svn: 348567	2018-12-07 05:44:45 +00:00
Vedant Kumar	b2a6f8e505	[CodeExtractor] Store outputs at the first valid insertion point When CodeExtractor outlines values which are used by the original function, it must store those values in some in-out parameter. This store instruction must not be inserted in between a PHI and an EH pad instruction, as that results in invalid IR. This fixes the following verifier failure seen while outlining within ObjC methods with live exit values: The unwind destination does not have an exception handling instruction! %call35 = invoke i8* bitcast (i8* (i8, i8, ...)* @objc_msgSend to i8* (i8, i8))(i8 %exn.adjusted, i8* %1) to label %invoke.cont34 unwind label %lpad33, !dbg !4183 The unwind destination does not have an exception handling instruction! invoke void @objc_exception_throw(i8* %call35) #12 to label %invoke.cont36 unwind label %lpad33, !dbg !4184 LandingPadInst not the first non-PHI instruction in the block. %3 = landingpad { i8, i32 } catch i8 null, !dbg !1411 rdar://46540815 llvm-svn: 348562	2018-12-07 03:01:54 +00:00
Nikita Popov	14ca9a8355	Revert "[DemandedBits][BDCE] Support vectors of integers" This reverts commit r348549. Causing assertion failures during clang build. llvm-svn: 348558	2018-12-07 00:42:03 +00:00
Nikita Popov	cf65b9207b	[DemandedBits][BDCE] Support vectors of integers DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. The getDemandedBits() method can now only be called on instructions that have integer or vector of integer type. Previously it could be called on any sized instruction (even if it was not particularly useful). The size of the return value is now always the scalar size in bits (while previously it was the type size in bits). Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348549	2018-12-06 23:50:32 +00:00
Adrian Prantl	fbeeac0e1e	Reapply "Adapt gcov to changes in CFE." This reverts commit r348203 and reapplies D55085 with an additional GCOV bugfix to make the change NFC for relative file paths in .gcno files. Thanks to Ilya Biryukov for additional testing! Original commit message: Update Diagnostic handling for changes in CFE. The clang frontend no longer emits the current working directory for DIFiles containing an absolute path in the filename: and will move the common prefix between current working directory and the file into the directory: component. https://reviews.llvm.org/D55085 llvm-svn: 348512	2018-12-06 18:44:48 +00:00
Alexandros Lamprineas	e4c91f5c4c	[GVN] Don't perform scalar PRE on GEPs Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from sinking the addressing mode computation of memory instructions back to its uses. The problem comes from the insertion of PHIs, which confuse CGP and make it bail. I've autogenerated the check lines of an existing test and added a store instruction to demonstrate the motivation behind this change. The store is now using the gep instead of a phi. Differential Revision: https://reviews.llvm.org/D55009 llvm-svn: 348496	2018-12-06 16:11:58 +00:00
Ilya Biryukov	cb5331eb93	Revert "[LoopSimplifyCFG] Delete dead in-loop blocks" This reverts commit r348457. The original commit causes clang to crash when doing an instrumented build with a new pass manager. Reverting to unbreak our integrate. llvm-svn: 348484	2018-12-06 13:21:01 +00:00
Roman Lebedev	98cb1216a6	[InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts I was finally able to quantify what i thought was missing in the fix, it was vector constants. If we have a scalar (and %x, -1), it will be instsimplified before we reach this code, but if it is a vector, we may still have a -1 element. Thus, we want to avoid the fold if at least one element is -1. Or in other words, ignoring the undef elements, no sign bits should be set. Thus, m_NonNegative(). A follow-up for rL348181 https://bugs.llvm.org/show_bug.cgi?id=39861 llvm-svn: 348462	2018-12-06 08:14:24 +00:00
Max Kazantsev	0b1d069d64	[LoopSimplifyCFG] Delete dead in-loop blocks This patch teaches LoopSimplifyCFG to delete loop blocks that have become unreachable after terminator folding has been done. Differential Revision: https://reviews.llvm.org/D54023 Reviewed By: anna llvm-svn: 348457	2018-12-06 05:45:02 +00:00
Sanjay Patel	998ececef0	[InstCombine] remove dead code from visitExtractElement Extracting from a splat constant is always handled by InstSimplify. Move the test for this from InstCombine to InstSimplify to make sure that stays true. llvm-svn: 348423	2018-12-05 23:09:33 +00:00
Sanjay Patel	47b3b4b5aa	[InstCombine] reduce duplication in visitExtractElementInst; NFC llvm-svn: 348418	2018-12-05 21:57:51 +00:00
Vedant Kumar	09415a850e	[CodeExtractor] Do not marked outlined calls which may resume EH as noreturn Treat terminators which resume exception propagation as returning instructions (at least, for the purposes of marking outlined functions `noreturn`). This is to avoid inserting traps after calls to outlined functions which unwind. rdar://46129950 llvm-svn: 348404	2018-12-05 19:35:37 +00:00
Christian Bruel	4ead99b3ac	Allow norecurse attribute on functions that have debug infos. Summary: debug intrinsics might be marked norecurse to enable the caller function to be norecurse and optimized if needed. This avoids code gen optimisation differences when -g is used, as in globalOpt.cpp:processInternalGlobal checks. Reviewers: chandlerc, jmolloy, aprantl Reviewed By: aprantl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D55187 llvm-svn: 348381	2018-12-05 16:48:00 +00:00
Sanjay Patel	baffae91b2	[InstCombine] simplify icmps with same operands based on dominating cmp The tests here are based on the motivating cases from D54827. More background: 1. We don't get these cases in general with SimplifyCFG because the root of the pattern match is an icmp, not a branch. I'm not sure how often we encounter this pattern vs. the seemingly more likely case with branches, but I don't see evidence to leave the minimal pattern unoptimized. 2. This has a chance of increasing compile-time because we're using a ValueTracking call to handle the match. The motivating cases could be handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/ isImpliedFalseByMatchingCmp, but I saw that we have a more comprehensive wrapper around those, so we might as well use it here unless there's evidence that it's significantly slower. 3. Ideally, we'd handle the fold to constants in InstSimplify, but as with the existing code here, we could extend this to handle cases where the result is not a constant, but a new combined predicate. That would mean splitting the logic across the 2 passes and possibly duplicating the pattern-matching cost. 4. As mentioned in D54827, this seems like the kind of thing that should be handled in Correlated Value Propagation, but that pass is currently limited to dealing with instructions with constant operands, so extending this bit of InstCombine is the smallest/easiest way to get these patterns optimized. llvm-svn: 348367	2018-12-05 15:04:00 +00:00
Alina Sbirlea	0e216854f9	[LICM] Actually disable ControlFlowHoisting. Summary: The remaining code paths that ControlFlowHoisting introduced that were not disabled, increased compile time by 3x for some benchmarks. The time is spent in DominatorTree updates. Reviewers: john.brawn, mkazantsev Subscribers: sanjoy, jlebar, llvm-commits Differential Revision: https://reviews.llvm.org/D55313 llvm-svn: 348345	2018-12-05 10:16:21 +00:00
Vitaly Buka	8076c57fd2	[asan] Add clang flag -fsanitize-address-use-odr-indicator Reviewers: eugenis, m.ostapenko, ygribov Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55157 llvm-svn: 348327	2018-12-05 01:44:31 +00:00
Vitaly Buka	d6bab09b4b	[asan] Split -asan-use-private-alias to -asan-use-odr-indicator Reviewers: eugenis, m.ostapenko, ygribov Subscribers: mehdi_amini, kubamracek, hiraditya, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D55156 llvm-svn: 348316	2018-12-04 23:17:41 +00:00
Sanjay Patel	3d5bb15a1d	[CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC The old function underspecified the return type, took an unused parameter, and had a misleading name. llvm-svn: 348292	2018-12-04 18:53:27 +00:00
Sanjay Patel	d23b5ed857	[InstCombine] rearrange foldICmpWithDominatingICmp; NFC Move it out from under the constant check, reorder predicates, add comments. This makes it easier to extend to handle the non-constant case. llvm-svn: 348284	2018-12-04 17:44:24 +00:00
Ilya Biryukov	449a7f0dbb	Revert "Adapt gcov to changes in CFE." This reverts commit r348203. Reason: this produces absolute paths in .gcno files, breaking us internally as we rely on them being consistent with the filenames passed in the command line. Also reverts r348157 and r348155 to account for revert of r348154 in clang repository. llvm-svn: 348279	2018-12-04 16:30:31 +00:00
Sanjay Patel	a40bf9fff7	[InstCombine] add helper for icmp with dominator; NFC There's a potential small enhancement to this code that could solve the cases currently under proposal in D54827 via SimplifyCFG. Whether instcombine should be doing this kind of semi-non-local analysis in the first place is an open question, but separating the logic out can only help if/when we decide to move it to a different pass. AFAICT, any proposal to do this in SimplifyCFG could also be seen as an overreach + it would be incomplete to start the fold from a branch rather than an icmp. There's another question here about the code for processUGT_ADDCST_ADD(). That part may be completely dead after rL234638 ? llvm-svn: 348273	2018-12-04 15:35:17 +00:00
Alina Sbirlea	797935f4f1	[SimpleLoopUnswitch] Remove debug dump. llvm-svn: 348267	2018-12-04 14:43:24 +00:00
Alina Sbirlea	a2eebb828e	Update MemorySSA in SimpleLoopUnswitch. Summary: Teach SimpleLoopUnswitch to preserve MemorySSA. Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D47022 llvm-svn: 348263	2018-12-04 14:23:37 +00:00
Vitaly Buka	537cfc0352	[asan] Reduce binary size by using unnamed private aliases Summary: --asan-use-private-alias increases binary sizes by 10% or more. Most of this space was long names of aliases and new symbols. These symbols are not needed for the ODC check at all. Reviewers: eugenis Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55146 llvm-svn: 348221	2018-12-04 00:36:14 +00:00
Vedant Kumar	d129569e34	[CodeExtractor] Split PHI nodes with incoming values from outlined region (PR39433) If a PHI node out of extracted region has multiple incoming values from it, split this PHI on two parts. First PHI has incomings only from region and extracts with it (they are placed to the separate basic block that added to the list of outlined), and incoming values in original PHI are replaced by first PHI. Similar solution is already used in CodeExtractor for PHIs in entry block (severSplitPHINodes method). It covers PR39433 bug. Patch by Sergei Kachkov! Differential Revision: https://reviews.llvm.org/D55018 llvm-svn: 348205	2018-12-03 22:40:21 +00:00
Adrian Prantl	40eb622325	Adapt gcov to changes in CFE. The clang frontend no longer emits the current working directory for DIFiles containing an absolute path in the filename: and will move the common prefix between current working directory and the file into the directory: component. This fixes the GCOV tests in compiler-rt that were broken by the Clang change. llvm-svn: 348203	2018-12-03 22:37:48 +00:00
Sanjay Patel	8c65515082	[InstCombine] fix undef propagation bug with shuffle+binop When we have a shuffle that extends a source vector with undefs and then do some binop on that, we must make sure that the extra elements remain undef with that binop if we reverse the order of the binop and shuffle. 'or' is probably the easiest example to show the bug because 'or C, undef --> -1' (not undef). But there are other opcode/constant combinations where this is true as shown by the 'shl' test. llvm-svn: 348191	2018-12-03 21:15:17 +00:00
Roman Lebedev	7bf2fed167	[InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds. These two folds are invalid for this non-constant pattern when the mask ends up being all-ones: https://rise4fun.com/Alive/9au https://rise4fun.com/Alive/UcQM Fixes https://bugs.llvm.org/show_bug.cgi?id=39861 llvm-svn: 348181	2018-12-03 20:07:58 +00:00
Sanjay Patel	f2bda5e43f	[InstCombine] rearrange shuffle+binop fold; NFC This code has a bug dealing with undefs, so we need to add another escape hatch, so doing some cleanup ahead of that. llvm-svn: 348175	2018-12-03 19:53:04 +00:00
Sanjay Patel	472652ef68	[CmpInstAnalysis] fix formatting; NFC There are potential improvements to the structure of this API raised by D54994, but remove some cosmetic blemishes before making any functional changes. llvm-svn: 348149	2018-12-03 15:48:30 +00:00
Alexander Potapenko	7502e5fc56	[KMSAN] Enable -msan-handle-asm-conservative by default This change enables conservative assembly instrumentation in KMSAN builds by default. It's still possible to disable it with -msan-handle-asm-conservative=0 if something breaks. It's now impossible to enable conservative instrumentation for userspace builds, but it's not used anyway. llvm-svn: 348112	2018-12-03 10:15:43 +00:00
Sanjay Patel	7d82d37854	[ValueTracking] add helper function for testing implied condition; NFCI We were duplicating code around the existing isImpliedCondition() that checks for a predecessor block/dominating condition, so make that a wrapper call. llvm-svn: 348088	2018-12-02 13:26:03 +00:00
Nikita Popov	0c5d6ccbfc	[InstCombine] Support ssub.sat canonicalization for non-splats Extend ssub.sat(X, C) -> sadd.sat(X, -C) canonicalization to also support non-splat vector constants. This is done by generalizing the implementation of the isNotMinSignedValue() helper to return true for constants that are non-splat, but don't contain any signed min elements. Differential Revision: https://reviews.llvm.org/D55011 llvm-svn: 348072	2018-12-01 10:58:34 +00:00
Joseph Tremoulet	27b1e3bd4f	[Mem2Reg] Fix nondeterministic corner case Summary: When mem2reg inserts phi nodes in blocks with unreachable predecessors, it adds undef operands for those incoming edges. When there are multiple such predecessors, the order is currently based on the address of the BasicBlocks. This change fixes that by using the BBNumbers in the sort/search predicates, as is done elsewhere in mem2reg to ensure determinism. Also adds a testcase with a bunch of unreachable preds, which (nodeterministically) fails without the fix. Reviewers: majnemer Reviewed By: majnemer Subscribers: mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D55077 llvm-svn: 348024	2018-11-30 19:20:02 +00:00
Alexey Bataev	3689747619	[SLP]PR39774: Update references of the replaced external instructions. Summary: An additional fix for PR39774. Need to update the references for the RedcutionRoot instruction when it is replaced during the vectorization phase to avoid compiler crash on reduction vectorization. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55017 llvm-svn: 347997	2018-11-30 15:14:20 +00:00
Max Kazantsev	9cf417db78	[LoopSimplifyCFG] Update MemorySSA in terminator folding. PR39783 Terminator folding transform lacks MemorySSA update for memory Phis, while they exist within MemorySSA analysis. They need exactly the same type of updates as regular Phis. Failing to update them properly ends up with inconsistent MemorySSA and manifests in various assertion failures. This patch adds Memory Phi updates to this transform. Thanks to @jonpa for finding this! Differential Revision: https://reviews.llvm.org/D55050 Reviewed By: asbirlea llvm-svn: 347979	2018-11-30 10:06:23 +00:00
David Stuttard	c6603861d8	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911	2018-11-29 20:14:17 +00:00
Sanjay Patel	d802270808	[InstSimplify] fold select with implied condition This is an almost direct move of the functionality from InstCombine to InstSimplify. There's no reason not to do this in InstSimplify because we never create a new value with this transform. (There's a question of whether any dominance-based transform belongs in either of these passes, but that's a separate issue.) I've changed 1 of the conditions for the fold (1 of the blocks for the branch must be the block we started with) into an assert because I'm not sure how that could ever be false. We need 1 extra check to make sure that the instruction itself is in a basic block because passes other than InstCombine may be using InstSimplify as an analysis on values that are not wired up yet. The 3-way compare changes show that InstCombine has some kind of phase-ordering hole. Otherwise, we would have already gotten the intended final result that we now show here. llvm-svn: 347896	2018-11-29 18:44:39 +00:00
John Brawn	a7eb2c863f	[LICM] Reapply r347776 "Make LICM able to hoist phis" with fix This commit caused a large compile-time slowdown in some cases when NDEBUG is off due to the dominator tree verification it added. Fix this by only doing dominator tree and loop info verification when something has been hoisted. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347889	2018-11-29 17:10:00 +00:00
Teresa Johnson	93f9996278	[ThinLTO] Import local variables from the same module as caller Summary: We can sometimes end up with multiple copies of a local variable that have the same GUID in the index. This happens when there are local variables with the same name that are in different source files having the same name/path at compile time (but compiled into different bitcode objects). In this case make sure we import the copy in the caller's module. This enables importing both of the variables having the same GUID (but which will have different promoted names since the module paths, and therefore the module hashes, will be distinct). Importing the wrong copy is particularly problematic for read only variables, since we must import them as a local copy whenever referenced. Otherwise we get undefs at link time. Note that the llvm-lto.cpp and ThinLTOCodeGenerator changes are needed for testing the distributed index case via clang, which will be sent as a separate clang-side patch shortly. We were previously not doing the dead code/read only computation before computing imports when testing distributed index generation (like it was for testing importing and other ThinLTO mechanisms alone). Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D55047 llvm-svn: 347886	2018-11-29 17:02:42 +00:00
Joseph Tremoulet	926ee459c4	[CallSiteSplitting] Report edge deletion to DomTreeUpdater Summary: When splitting musttail calls, the split blocks' original terminators get removed; inform the DTU when this happens. Also add a testcase that fails an assertion in the DTU without this fix. Reviewers: fhahn, junbuml Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55027 llvm-svn: 347872	2018-11-29 15:27:04 +00:00
David Stuttard	de02e4b1cc	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871	2018-11-29 15:21:13 +00:00
Sanjay Patel	8242c82de4	[CVP] tidy processCmp(); NFC 1. The variables were confusing: 'C' typically refers to a constant, but here it was the Cmp. 2. Formatting violations. 3. Simplify code to return true/false constant. llvm-svn: 347868	2018-11-29 14:41:21 +00:00
Martin Storsjo	bfd1d27585	Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details. llvm-svn: 347867	2018-11-29 14:39:39 +00:00
Max Kazantsev	24c186ff00	Disable TermFolding in LoopSimplifyCFG until PR39783 is fixed llvm-svn: 347844	2018-11-29 09:00:19 +00:00
Sam Parker	d6ebf0108e	[LoopStrengthReduce] ComplexityLimit as an option Convert ComplexityLimit into a command line value. Differential Revision: https://reviews.llvm.org/D54899 llvm-svn: 347843	2018-11-29 08:34:22 +00:00
Jeremy Morse	9b4cfa55b1	[DebugInfo] Give inlinable calls DILocs (PR39807) In PR39807 we incorrectly handle circumstances where calls are common'd from conditional blocks into the parent BB. Calls that can be inlined must always have DebugLocs, however we strip them during commoning, which the IR verifier asserts on. Fix this by using applyMergedLocation: it will perform the same DebugLoc stripping of conditional Locs, but will also generate an unknown location DebugLoc that satisfies the requirement for inlinable calls to always have locations. Some of the prior logic for selecting a DebugLoc is now likely redundant; I'll generate a follow-up to remove it (involves editing more regression tests). Differential Revision: https://reviews.llvm.org/D54997 llvm-svn: 347782	2018-11-28 17:58:45 +00:00
John Brawn	4557ffeb63	[LICM] Enable control flow hoisting by default Differential Revision: https://reviews.llvm.org/D54949 llvm-svn: 347778	2018-11-28 17:23:03 +00:00
John Brawn	31c9769580	[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix This commit caused failures because it failed to correctly handle cases where we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We need to make sure that we rehoist the use to _after_ the hoisted phi, which we do by always rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader. An option is also added to control whether control flow is hoisted, which is off in this commit but will be turned on in a subsequent commit. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347776	2018-11-28 17:21:49 +00:00
Nikita Popov	8d63aed459	[InstCombine] Combine saturating add/sub with constant operands Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347773	2018-11-28 16:37:15 +00:00
Nikita Popov	42f89989a1	[InstCombine] Canonicalize ssub.sat to sadd.sat Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347772	2018-11-28 16:37:09 +00:00
Nikita Popov	78a9295e15	[InstCombine] Use known overflow information for saturating add/sub If ValueTracking can determine that the add/sub can newer overflow, replace it with the corresponding nuw/nsw add/sub. Additionally, for the unsigned case, if ValueTracking determines that the add/sub always overflows, replace the result with the saturation value. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347770	2018-11-28 16:36:59 +00:00
Nikita Popov	085d24a8b3	[InstCombine] Canonicalize const arg for saturating adds If a saturating add intrinsic has one constant argument, make sure it is on the RHS. This will simplify further transformations. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347769	2018-11-28 16:36:52 +00:00
Xin Tong	53e52e47e8	[ThinLTO] Correct linkonce_any function import linkage. NFC. Summary: This is a NFC as we do not import non-odr vague linkage when computing for import list for a module. Reviewers: tejohnson, pcc Subscribers: inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54928 llvm-svn: 347763	2018-11-28 15:16:35 +00:00
Alexey Bataev	579c2d9d64	[SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized. Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759	2018-11-28 14:34:11 +00:00
Florian Hahn	fd6ea134f4	[PartialInliner] Make PHIs free in cost computation. InlineCost also treats them as free and the current implementation can cause assertion failures if PHI nodes are moved outside the region from entry BBs to the region. It also updates the code to use the instructionsWithoutDebug iterator. Reviewers: davidxl, davide, vsk, graham-yiu-huawei Reviewed By: davidxl Differential Revision: https://reviews.llvm.org/D54748 llvm-svn: 347683	2018-11-27 18:17:27 +00:00
Tim Northover	81bff5e6ea	InstCombine: add comment explaining malloc deletion. NFC. I tried to change this, not quite realising the logic behind what we were doing. Hopefully this comment will help the next person to come along. llvm-svn: 347653	2018-11-27 11:08:14 +00:00
Max Kazantsev	70b11c6d31	[LoopSimplifyCFG] Turn on term folding after underlying bug fixed llvm-svn: 347641	2018-11-27 06:19:42 +00:00
Max Kazantsev	c4e4d6449a	[LoopSimplifyCFG] Fix corner case with duplicating successors It fixes a bug that doesn't update Phi inputs of the only live successor that is in the list of block's successors more than once. Thanks @uabelho for finding this. Differential Revision: https://reviews.llvm.org/D54849 Reviewed By: anna llvm-svn: 347640	2018-11-27 06:17:21 +00:00
Xin Tong	04d49779a1	[ICP] Remove incompatible attributes at indirect-call promoted callsites. Summary: Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in at least a IR verification error. Reviewers: davidxl, xur, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54913 llvm-svn: 347605	2018-11-26 22:03:52 +00:00
Sanjay Patel	790af91803	[InstCombine] add helper function to reduce code duplication; NFC llvm-svn: 347604	2018-11-26 22:00:41 +00:00
Florian Hahn	6615a7132a	[IPSCCP] Use input operand instead of OriginalOp for ssa_copy. OriginalOp of a Predicate refers to the original IR value, before renaming. While solving in IPSCCP, we have to use the operand of the ssa_copy instead, to avoid missing updates for nested conditions on the same IR value. Fixes PR39772. llvm-svn: 347524	2018-11-25 16:32:02 +00:00
Nikita Popov	2c779c0e34	[InstCombine] Determine demanded and known bits for funnel shifts Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result. If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778. Differential Revision: https://reviews.llvm.org/D54869 llvm-svn: 347515	2018-11-24 19:00:45 +00:00
Nikita Popov	6e81d421e1	[InstCombine] Simplify funnel shift with zero/undef operand to shift The following simplifications are implemented: * `fshl(X, 0, C) -> shl X, C%BW` * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0) * `fshl(0, X, C) -> lshr X, BW-C%BW` * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0) * `fshr(X, 0, C) -> shl X, (BW-C%BW)` * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0) * `fshr(0, X, C) -> lshr X, C%BW` * `fshr(undef, X, C) -> lshr, X, C%BW` (assuming undef = 0) The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case. Differential Revision: https://reviews.llvm.org/D54778 llvm-svn: 347505	2018-11-23 22:45:08 +00:00
Max Kazantsev	e1c2dc27d3	Disable LoopSimplifyCFG terminator folding by default llvm-svn: 347486	2018-11-23 09:14:53 +00:00
Max Kazantsev	cb8e240334	[LoopSimplifyCFG] Don't delete LCSSA Phis When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved. Thanks @dmgreen for finding this! Differential Revision: https://reviews.llvm.org/D54841 llvm-svn: 347484	2018-11-23 07:56:47 +00:00
Max Kazantsev	b565e6093b	[NFC] Assert that all blocks staying in loop are live llvm-svn: 347458	2018-11-22 12:43:27 +00:00
Max Kazantsev	56a2443024	[NFC] Ensure deterministic order of dead exit blocks llvm-svn: 347457	2018-11-22 12:33:41 +00:00
Max Kazantsev	d9f59f8c80	[NFC] Simplify code by using standard exit blocks collection llvm-svn: 347454	2018-11-22 10:48:30 +00:00
Fedor Sergeev	59246b6bfe	[PM] correcting return value for new-pass-manager version of Scalarizer Obvious mistake missed during D54695 review. llvm-svn: 347432	2018-11-21 22:01:19 +00:00
Nikita Popov	6f54fb0052	[MergeFuncs] Generate alias instead of thunk if possible The MergeFunctions pass was originally intended to emit aliases instead of thunks where possible (unnamed_addr). However, for a long time this functionality was behind a flag hardcoded to false, bitrotted and was eventually removed in r309313. Originally the functionality was first disabled in r108417 due to lack of support for aliases in Mach-O. I believe that this is no longer the case nowadays, but not really familiar with this area. In the interest of being conservative, this patch reintroduces the aliasing functionality behind a default disabled -mergefunc-use-aliases flag. Differential Revision: https://reviews.llvm.org/D53285 llvm-svn: 347407	2018-11-21 19:37:19 +00:00
Mikael Holmen	b6f76002d9	[PM] Port Scalarizer to the new pass manager. Patch by: markus (Markus Lavin) Reviewers: chandlerc, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: llvm-commits, Ka-Ka, bjope Differential Revision: https://reviews.llvm.org/D54695 llvm-svn: 347392	2018-11-21 14:00:17 +00:00
Guozhi Wei	c21fba1bab	[LoopSink] Add preheader to alias set This patch fixes PR39695. The original LoopSink only considers memory alias in loop body. But PR39695 shows that instructions following sink candidate in preheader should also be checked. This is a conservative patch, it simply adds whole preheader block to alias set. It may lose some optimization opportunity, but I think that is very rare because: 1 in the most common case st/ld to the same address, the load should already be optimized away. 2 usually preheader is not very large. Differential Revision: https://reviews.llvm.org/D54659 llvm-svn: 347325	2018-11-20 16:49:07 +00:00
Max Kazantsev	c04b5307d1	Recommit "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches" The initial version of patch lacked Phi nodes updates in destinations of removed edges. This version contains this update and tests on this situation. Differential Revision: https://reviews.llvm.org/D54021 llvm-svn: 347289	2018-11-20 05:43:32 +00:00
Reid Kleckner	994a8451ba	[Transforms] Prefer static and avoid namespaces, NFC Put 'static' on three functions in an anonymous namespace as per our coding style. Remove the 'namespace llvm {}' around the .cpp file and explicitly declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'. I prefer this style for free functions because the compiler will error out if the .h and .cpp files don't agree on the function name or prototype. llvm-svn: 347269	2018-11-19 22:19:05 +00:00
Benjamin Kramer	fdd9b4fc8f	Revert "[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches" This reverts commits r347183 & r347184. Crashes while building libxml. llvm-svn: 347260	2018-11-19 20:01:20 +00:00
Vedant Kumar	238533ec2e	[InstCombine] Set debug loc on `mergeStoreIntoSuccessor` phi Assigning a merged debug location to the `mergeStoreIntoSuccessor` phi improves backtrace quality. Fixes llvm.org/PR38083. llvm-svn: 347257	2018-11-19 19:55:02 +00:00
Vedant Kumar	4de31bba51	[IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock Add methods to BasicBlock which make it easier to efficiently check whether a block has N (or more) predecessors. This can be more efficient than using pred_size(), which is a linear time operation. We might consider adding similar methods for successors. I haven't done so in this patch because succ_size() is already O(1). With this patch applied, I measured a 0.065% compile-time reduction in user time for running `opt -O3` on the sqlite3 amalgamation (30 trials). The change in mergeStoreIntoSuccessor alone saves 45 million linked list iterations in a stage2 Release build of llc. See llvm.org/PR39702 for a harder but more general way of achieving similar results. Differential Revision: https://reviews.llvm.org/D54686 llvm-svn: 347256	2018-11-19 19:54:27 +00:00
Benjamin Kramer	2cad359c91	Revert "[LICM] Make LICM able to hoist phis" This reverts commit r347190. llvm-svn: 347225	2018-11-19 16:51:57 +00:00
Anna Thomas	5e9215f02b	[LV] Avoid vectorizing unsafe dependencies in uniform address Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220	2018-11-19 15:39:59 +00:00
John Brawn	12c046fba0	[LICM] Make LICM able to hoist phis The general approach taken is to make note of loop invariant branches, then when we see something conditional on that branch, such as a phi, we create a copy of the branch and (empty versions of) its successors and hoist using that. This has no impact by itself that I've been able to see, as LICM typically doesn't see such phis as they will have been converted into selects by the time LICM is run, but once we start doing phi-to-select conversion later it will be important. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347190	2018-11-19 11:31:24 +00:00
Max Kazantsev	8e3e33d138	[LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches This patch introduces infrastructure and the simplest case for constant-folding of branch and switch instructions within loop into unconditional branches. It is useful as a cleanup for such passes as loop unswitching that sometimes produce such branches. Only the simplest case supported in this patch: after the folding, no block should become dead or stop being part of the loop. Support for more sophisticated cases will go separately in follow-up patches. Differential Revision: https://reviews.llvm.org/D54021 Reviewed By: anna llvm-svn: 347183	2018-11-19 05:54:38 +00:00
Vedant Kumar	e7b789b529	[ProfileSummary] Standardize methods and fix comment Every Analysis pass has a get method that returns a reference of the Result of the Analysis, for example, BlockFrequencyInfo &BlockFrequencyInfoWrapperPass::getBFI(). I believe that ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning a pointer. Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock, respectively. Most methods use BB as the argument of variable names while methods usually refer to Basic Blocks as Blocks, instead of BB. For example, Function::getEntryBlock, Loop:getExitBlock, etc. I also fixed one of the comments. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D54669 llvm-svn: 347182	2018-11-19 05:23:16 +00:00
Vedant Kumar	35f504c113	[CorrelatedValuePropagation] Preserve debug locations (PR38178) Fix all of the missing debug location errors in CVP found by debugify. This includes the missing-location-after-udiv-truncation case described in llvm.org/PR38178. llvm-svn: 347147	2018-11-18 00:29:58 +00:00
Fangrui Song	7570932977	Use llvm::copy. NFC llvm-svn: 347126	2018-11-17 01:44:25 +00:00
Fedor Sergeev	2e3e224e71	[SimpleLoopUnswitch] adding cost multiplier to cap exponential unswitch with We need to control exponential behavior of loop-unswitch so we do not get run-away compilation. Suggested solution is to introduce a multiplier for an unswitch cost that makes cost prohibitive as soon as there are too many candidates and too many sibling loops (meaning we have already started duplicating loops by unswitching). It does solve the currently known problem with compile-time degradation (PR 39544). Tests are built on top of a recently implemented CHECK-COUNT-<num> FileCheck directives. Reviewed By: chandlerc, mkazantsev Differential Revision: https://reviews.llvm.org/D54223 llvm-svn: 347097	2018-11-16 21:16:43 +00:00
Adrian Prantl	83d87520ed	GlobalDCE: Teach isEmptyFunction() to ignore debug intrinsics. This fixes PR39669. https://bugs.llvm.org/show_bug.cgi?id=39669 llvm-svn: 347065	2018-11-16 17:47:21 +00:00
Eugene Leviant	bf46e7410c	[ThinLTO] Internalize readonly globals An attempt to recommit r346584 after failure on OSX build bot. Fixed cache key computation in ThinLTOCodeGenerator and added test case llvm-svn: 347033	2018-11-16 07:08:00 +00:00
Xin Tong	642c8d3575	[LTO] Load sample profile in LTO link step. Summary: Load sample profile in LTO link step. ThinLTO calls populateModulePassManager to load the profile Reviewers: tejohnson, davidxl, danielcdh Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54564 llvm-svn: 346971	2018-11-15 18:06:42 +00:00
Sanjay Patel	bc56b2432d	[InstCombine] fix rotate narrowing bug for non-pow-2 types llvm-svn: 346968	2018-11-15 17:19:14 +00:00
Mandeep Singh Grang	0905fc77c1	[InstCombine] Remove a couple of asserts based on incorrect assumptions Summary: These asserts are based on the assumption that the order of true/false operands in a select and those in the compare would always be the same. This fixes PR39595. Reviewers: craig.topper, spatel, dmgreen Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54359 llvm-svn: 346874	2018-11-14 17:55:07 +00:00
Sanjay Patel	6072842770	[InstCombine] fix formatting for matchBSwap(); NFC We should have a similar function for matching rotate and/or funnel shift, so tidy up the related existing call. llvm-svn: 346871	2018-11-14 16:03:36 +00:00
Florian Hahn	6df11868b5	[VPlan, SLP] Use SmallPtrSet for Candidates. This slightly improves the candidate handling in getBest(). llvm-svn: 346870	2018-11-14 15:58:40 +00:00
Florian Hahn	02cb67deb9	[VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle. The caller should take care of only calling it with debug enabled. llvm-svn: 346860	2018-11-14 13:33:44 +00:00
Florian Hahn	2eca3728ee	[VPlan] Update ifdef. llvm-svn: 346858	2018-11-14 13:21:26 +00:00
Florian Hahn	09e516c54b	[VPlan, SLP] Add simple SLP analysis on top of VPlan. This patch adds an initial implementation of the look-ahead SLP tree construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha, Luís F. W. Góes'. It returns an SLP tree represented as VPInstructions, with combined instructions represented as a single, wider VPInstruction. This initial version does not support instructions with multiple different users (either inside or outside the SLP tree) or non-instruction operands; it won't generate any shuffles or insertelement instructions. It also just adds the analysis that builds an SLP tree rooted in a set of stores. It does not include any cost modeling or memory legality checks. The plan is to integrate it with VPlan based cost modeling, once available and to only apply it to operations that can be widened. A follow-up patch will add a support for replacing instructions in a VPlan with their SLP counter parts. Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D4949 llvm-svn: 346857	2018-11-14 13:11:49 +00:00
Florian Hahn	505091a8f2	Recommit r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site). The underlying problem causing the expensive-check failure was fixed in rL346769. llvm-svn: 346843	2018-11-14 10:04:30 +00:00
Reid Kleckner	41390b47de	Revert r346810 "Preserve loop metadata when splitting exit blocks" It broke the Windows self-host: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/1457 llvm-svn: 346823	2018-11-14 01:47:32 +00:00
Sanjay Patel	a139564896	[InstCombine] fold funnel shift amount based on demanded bits The shift amount of a funnel shift is modulo the scalar bitwidth: http://llvm.org/docs/LangRef.html#llvm-fshl-intrinsic ...so we can use demanded bits analysis on that operand to simplify it when we have a power-of-2 bitwidth. This is another step towards canonicalizing {shift/shift/or} to the intrinsics in IR. Differential Revision: https://reviews.llvm.org/D54478 llvm-svn: 346814	2018-11-13 23:27:23 +00:00
Craig Topper	3c87c2a3c5	Preserve loop metadata when splitting exit blocks LoopUtils.cpp contains a utility that splits an loop exit block, so that the new block contains only edges coming from the loop. In the case of nested loops, the exit path for the inner loop might also be the back-edge of the outer loop. The new block which is inserted on this path, is now a latch for the outer loop, and it needs to hold the loop metadata for the outer loop. (The test case gives a more concrete view of the situation.) Patch by Chang Lin (clin1) Differential Revision: https://reviews.llvm.org/D53876 llvm-svn: 346810	2018-11-13 23:06:49 +00:00
Sanjay Patel	f8f12272e8	[InstCombine] canonicalize rotate patterns with cmp/select The cmp+branch variant of this pattern is shown in: https://bugs.llvm.org/show_bug.cgi?id=34924 ...and as discussed there, we probably can't transform that without a rotate intrinsic. We do have that now via funnel shift, but we're not quite ready to canonicalize IR to that form yet. The case with 'select' should already be transformed though, so that's this patch. The sequence with negation followed by masking is what we use in the backend and partly in clang (though that part should be updated). https://rise4fun.com/Alive/TplC %cmp = icmp eq i32 %shamt, 0 %sub = sub i32 32, %shamt %shr = lshr i32 %x, %shamt %shl = shl i32 %x, %sub %or = or i32 %shr, %shl %r = select i1 %cmp, i32 %x, i32 %or => %neg = sub i32 0, %shamt %masked = and i32 %shamt, 31 %maskedneg = and i32 %neg, 31 %shl2 = lshr i32 %x, %masked %shr2 = shl i32 %x, %maskedneg %r = or i32 %shl2, %shr2 llvm-svn: 346807	2018-11-13 22:47:24 +00:00
Florian Hahn	107d0a8756	[CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater. This patch updates DuplicateInstructionsInSplitBetween to update a DTU instead of applying updates to the DT directly. Given that there only are 2 users, also updated them in this patch to avoid churn. I slightly moved the code in CallSiteSplitting around to reduce the places where we have to pass in DTU. If necessary, I could split those changes in a separate patch. This fixes missing DT updates when dealing with musttail calls in CallSiteSplitting, by using DTU->deleteBB. Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki Reviewed By: NutshellySima llvm-svn: 346769	2018-11-13 17:54:43 +00:00
Steven Wu	fa43892d6f	Revert "[ThinLTO] Internalize readonly globals" This reverts commit 10c84a8f35cae4a9fc421648d9608fccda3925f2. llvm-svn: 346768	2018-11-13 17:35:04 +00:00
Florian Hahn	a4dc7feeea	[VPlan] VPlan version of InterleavedAccessInfo. This patch turns InterleaveGroup into a template with the instruction type being a template parameter. It also adds a VPInterleavedAccessInfo class, which only contains a mapping from VPInstructions to their respective InterleaveGroup. As we do not have access to scalar evolution in VPlan, we can re-use convert InterleavedAccessInfo to VPInterleavedAccess info. Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D49489 llvm-svn: 346758	2018-11-13 15:58:18 +00:00
Zhizhou Yang	cc633af55b	Introduce DebugCounter into ConstProp pass Summary: This patch introduces DebugCounter into ConstProp pass at per-transformation level. It will provide an option to skip first n or stop after n transformations for the whole ConstProp pass. This will make debug easier for the pass, also providing chance to do transformation level bisecting. Reviewers: davide, fhahn Reviewed By: fhahn Subscribers: llozano, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D50094 llvm-svn: 346720	2018-11-13 00:31:22 +00:00
Sanjay Patel	35b1c2d19d	[InstCombine] narrow width of rotate patterns, part 3 This is a longer variant for the pattern handled in rL346713 This one includes zexts. Eventually, we should canonicalize all rotate patterns to the funnel shift intrinsics, but we need a bit more infrastructure to make sure the vectorizers handle those intrinsics as well as the shift+logic ops. https://rise4fun.com/Alive/FMn Name: narrow rotateright %neg = sub i8 0, %shamt %rshamt = and i8 %shamt, 7 %rshamtconv = zext i8 %rshamt to i32 %lshamt = and i8 %neg, 7 %lshamtconv = zext i8 %lshamt to i32 %conv = zext i8 %x to i32 %shr = lshr i32 %conv, %rshamtconv %shl = shl i32 %conv, %lshamtconv %or = or i32 %shl, %shr %r = trunc i32 %or to i8 => %maskedShAmt2 = and i8 %shamt, 7 %negShAmt2 = sub i8 0, %shamt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shl2 = lshr i8 %x, %maskedShAmt2 %shr2 = shl i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2 llvm-svn: 346716	2018-11-12 22:52:25 +00:00
Sanjay Patel	98e427ccf2	[InstCombine] narrow width of rotate patterns, part 2 (PR39624) The sub-pattern for the shift amount in a rotate can take on several different forms, and there's apparently no way to canonicalize those without seeing the entire rotate sequence. This is the form noted in: https://bugs.llvm.org/show_bug.cgi?id=39624 https://rise4fun.com/Alive/qnT %zx = zext i8 %x to i32 %maskedShAmt = and i32 %shAmt, 7 %shl = shl i32 %zx, %maskedShAmt %negShAmt = sub i32 0, %shAmt %maskedNegShAmt = and i32 %negShAmt, 7 %shr = lshr i32 %zx, %maskedNegShAmt %rot = or i32 %shl, %shr %r = trunc i32 %rot to i8 => %truncShAmt = trunc i32 %shAmt to i8 %maskedShAmt2 = and i8 %truncShAmt, 7 %shl2 = shl i8 %x, %maskedShAmt2 %negShAmt2 = sub i8 0, %truncShAmt %maskedNegShAmt2 = and i8 %negShAmt2, 7 %shr2 = lshr i8 %x, %maskedNegShAmt2 %r = or i8 %shl2, %shr2 llvm-svn: 346713	2018-11-12 22:11:09 +00:00
Sanjay Patel	ceab2329b6	[InstCombine] refactor code for matching shift amount of a rotate; NFC As shown in existing test cases and with: https://bugs.llvm.org/show_bug.cgi?id=39624 ...we're missing at least 2 more patterns for rotate narrowing. llvm-svn: 346711	2018-11-12 22:00:00 +00:00
Philip Reames	b8d8db30ea	[GC][InstCombine] Fix a potential iteration issue Noticed via inspection. Appears to be largely innocious in practice, but slight code change could have resulted in either visit order dependent missed optimizations or infinite loops. May be a minor compile time problem today. llvm-svn: 346698	2018-11-12 20:00:53 +00:00
Simon Pilgrim	631f2bf51e	[CostModel] Add more realistic SK_ExtractSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656	2018-11-12 14:25:23 +00:00
Max Kazantsev	7d49a3a816	[LICM] Hoist guards from non-header blocks This patch relaxes overconservative checks on whether or not we could write memory before we execute an instruction. This allows us to hoist guards out of loops even if they are not in the header block. Differential Revision: https://reviews.llvm.org/D50891 Reviewed By: fedor.sergeev llvm-svn: 346643	2018-11-12 09:29:58 +00:00
Calixte Denizet	c6fabeac11	[GCOV] Add options to filter files which must be instrumented. Summary: When making code coverage, a lot of files (like the ones coming from /usr/include) are removed when post-processing gcno/gcda so finally they doen't need to be instrumented nor to appear in gcno/gcda. The goal of the patch is to be able to filter the files we want to instrument, there are several advantages to do that: - improve speed (no overhead due to instrumentation on files we don't care) - reduce gcno/gcda size - it gives the possibility to easily instrument only few files (e.g. ones modified in a patch) without changing the build system - need to accept this patch to be enabled in clang: https://reviews.llvm.org/D52034 Reviewers: marco-c, vsk Reviewed By: marco-c Subscribers: llvm-commits, sylvestre.ledru Differential Revision: https://reviews.llvm.org/D52033 llvm-svn: 346641	2018-11-12 09:01:43 +00:00
Florian Hahn	9026d4ee9b	[IPSCCP,PM] Preserve PDT in the new pass manager. Reviewers: kuhar, chandlerc, NutshellySima, brzycki Reviewed By: NutshellySima, brzycki Differential Revision: https://reviews.llvm.org/D54317 llvm-svn: 346618	2018-11-11 20:22:45 +00:00
Sanjay Patel	4a12aa9791	[InstCombine] simplify code for merging stores; NFCI llvm-svn: 346596	2018-11-10 20:29:25 +00:00
Eugene Leviant	be8d19967a	[ThinLTO] Internalize readonly globals This patch allows internalising globals if all accesses to them (from live functions) are from non-volatile load instructions Differential revision: https://reviews.llvm.org/D49362 llvm-svn: 346584	2018-11-10 08:31:21 +00:00
Eli Friedman	15930bf352	[JumpThreading] Fix exponential time algorithm computing known values. ComputeValueKnownInPredecessors has a "visited" set to prevent infinite loops, since a value can be visited more than once. However, the implementation didn't prevent the algorithm from taking exponential time. Instead of removing elements from the RecursionSet one at a time, we should keep around the whole set until ComputeValueKnownInPredecessors finishes, then discard it. The testcase is synthetic because I was having trouble effectively reducing the original. But it's basically the same idea. Instead of failing, we could theoretically cache the result instead. But I don't think it would help substantially in practice. Differential Revision: https://reviews.llvm.org/D54239 llvm-svn: 346562	2018-11-09 22:35:26 +00:00
Florian Hahn	9f878e9bae	Revert r346483: [CallSiteSplitting] Only record conditions up to the IDom(call site). This cause a failure with EXPENSIVE_CHECKS llvm-svn: 346492	2018-11-09 13:28:58 +00:00
Florian Hahn	a1062f4b68	[IPSCCP,PM] Preserve DT in the new pass manager. After D45330, Dominators are required for IPSCCP and can be preserved. This patch preserves DominatorTreeAnalysis in the new pass manager. AFAIK the legacy pass manager cannot preserve function analysis required by a module analysis. Reviewers: davide, dberlin, chandlerc, efriedma, kuhar, NutshellySima Reviewed By: chandlerc, kuhar, NutshellySima Differential Revision: https://reviews.llvm.org/D47259 llvm-svn: 346486	2018-11-09 11:52:27 +00:00
Florian Hahn	52578f95c9	[CallSiteSplitting] Only record conditions up to the IDom(call site). We can stop recording conditions once we reached the immediate dominator for the block containing the call site. Conditions in predecessors of the that node will be the same for all paths to the call site and splitting is not beneficial. This patch makes CallSiteSplitting dependent on the DT anlysis. because the immediate dominators seem to be the easiest way of finding the node to stop at. I had to update some exiting tests, because they were checking for conditions that were true/false on all paths to the call site. Those should now be handled by instcombine/ipsccp. Reviewers: davide, junbuml Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D44627 llvm-svn: 346483	2018-11-09 10:23:46 +00:00
Carlos Alberto Enciso	fa9cf89734	[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG. In SimplifyCFG when given a conditional branch that goes to BB1 and BB2, the hoisted common terminator instruction in the two blocks, caused debug line records associated with subsequent select instructions to become ambiguous. It causes the debugger to display unreachable source lines. Differential Revision: https://reviews.llvm.org/D53390 llvm-svn: 346481	2018-11-09 09:42:10 +00:00
Max Kazantsev	9883d1e1a7	[NFC] Add utility function for SafetyInfo updates for moveBefore llvm-svn: 346472	2018-11-09 05:39:04 +00:00
Florian Hahn	a684a99441	[LoopInterchange] Support reductions across inner and outer loop. This patch adds logic to detect reductions across the inner and outer loop by following the incoming values of PHI nodes in the outer loop. If the incoming values take part in a reduction in the inner loop or come from outside the outer loop, we found a reduction spanning across inner and outer loop. With this change, ~10% more loops are interchanged in the LLVM test-suite + SPEC2006. Fixes https://bugs.llvm.org/show_bug.cgi?id=30472 Reviewers: mcrosier, efriedma, karthikthecool, davide, hfinkel, dmgreen Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D43245 llvm-svn: 346438	2018-11-08 20:44:19 +00:00
Pirama Arumuga Nainar	e61652a384	[LTO] Drop non-prevailing definitions only if linkage is not local or appending Summary: This fixes PR 37422 In ELF, non-weak symbols can also be non-prevailing. In this particular PR, the __llvm_profile_* symbols are non-prevailing but weren't getting dropped - causing multiply-defined errors with lld. Also add a test, strong_non_prevailing.ll, to ensure that multiple copies of a strong symbol are dropped. To fix the test regressions exposed by this fix, - do not mark prevailing copies for symbols with 'appending' linkage. There's no one prevailing copy for such symbols. - fix the prevailing version in dead-strip-fulllto.ll - explicitly pass exported symbols to llvm-lto in fumcimport.ll and funcimport_var.ll Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, dang, srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D54125 llvm-svn: 346436	2018-11-08 20:10:07 +00:00
Tom Stellard	28d662164d	InstCombine: Avoid introducing poison values when lowering llvm.amdgcn.[us]bfe Summary: When the 3rd argument to these intrinsics is zero, lowering them to shift instructions produces poison values, since we end up with shift amounts equal to the number of bits in the shifted value. This means we can only lower these intrinsics if we can prove that the 3rd argument is not zero. Reviewers: arsenm Reviewed By: arsenm Subscribers: bnieuwenhuizen, jvesely, wdng, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D53739 llvm-svn: 346422	2018-11-08 17:57:57 +00:00
Vedant Kumar	d6699423f1	[CodeExtractor] Mark functions noreturn when applicable This eliminates the outlining penalty for llvm.trap/unreachable, because callers no longer have to emit cleanup/ret instructions after calling an outlined `noreturn` function. rdar://45523626 llvm-svn: 346421	2018-11-08 17:57:09 +00:00
Max Kazantsev	266c087b9d	Return "[IndVars] Smart hard uses detection" The patch has been reverted because it ended up prohibiting propagation of a constant to exit value. For such values, we should skip all checks related to hard uses because propagating a constant is always profitable. Differential Revision: https://reviews.llvm.org/D53691 llvm-svn: 346397	2018-11-08 11:54:35 +00:00
Gil Rapaport	7b88bab386	[LSR] Combine unfolded offset into invariant register LSR reassociates constants as unfolded offsets when the constants fit as immediate add operands, which currently prevents such constants from being combined later with loop invariant registers. This patch modifies GenerateCombinations() to generate a second formula which includes the unfolded offset in the combined loop-invariant register. This commit fixes a bug in the original patch (committed at r345114, reverted at r345123). Differential Revision: https://reviews.llvm.org/D51861 llvm-svn: 346390	2018-11-08 09:01:19 +00:00
whitequark	73cb978495	[MergeFuncs] Improve ordering of equal functions Summary: MergeFunctions currently tries to process strong functions before weak functions, because weak functions can simply call strong functions, while a strong/weak function cannot call a weak function (a backing strong function is needed). This patch additionally tries to process external functions before local functions, because we definitely have to keep the external function, but may be able to drop the local one (and definitely can if it is also unnamed_addr). Unfortunately, this exposes an existing bug in the implementation: The FnTree and FNodesInTree structures can currently go out of sync in the case where two weak functions are merged, because the function in FnTree/FNodesInTree is RAUWed. This leaves it behind in FnTree (this is intended, as it is the strong backing function which should be used for further merges), while it is replaced in FNodesInTree (this is not intended). This is fixed by switching FNodesInTree from using a ValueMap to using a DenseMap of AssertingVH. This exposes another minor issue: Currently FNodesInTree is not cleared after MergeFunctions finishes running. Currently, this is potentially dangerous (e.g. if something else wants to RAUW a function with a non-function), but at the very least it is unnecessary/inefficient. After the change to use AssertingVH it becomes more problematic, because there are certainly passes that remove functions. This issue is fixed by clearing FNodesInTree at the end of the pass. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D53271 llvm-svn: 346386	2018-11-08 03:58:01 +00:00
whitequark	3580ac6125	[MergeFuncs] Call removeUsers() prior to unnamed_addr RAUW Summary: For unnamed_addr functions we RAUW instead of only replacing direct callers. However, functions in which replacements were performed currently are not added back to the worklist, resulting in missed merging opportunities. Fix this by calling removeUsers() prior to RAUW. Reviewers: jfb, whitequark Reviewed By: whitequark Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D53262 llvm-svn: 346385	2018-11-08 03:57:55 +00:00
Reid Kleckner	b41b372171	[sancov] Put .SCOV* sections into the right comdat groups on COFF Avoids linker errors about relocations against discarded sections. This was uncovered during the Chromium clang roll here: https://chromium-review.googlesource.com/c/chromium/src/+/1321863#message-717516acfcf829176f6a2f50980f7a4bdd66469a After this change, Chromium's libGLESv2 links successfully for me. Reviewers: metzman, hans, morehouse Differential Revision: https://reviews.llvm.org/D54232 llvm-svn: 346381	2018-11-08 00:57:33 +00:00
Rong Xu	fb4bcc452c	[PGO] Exit early if all count values are zero If all the edge counts for a function are zero, skip count population and annotation, as nothing will happen. This can save some compile time. Differential Revision: https://reviews.llvm.org/D54212 llvm-svn: 346370	2018-11-07 23:51:20 +00:00
Fedor Sergeev	f9a02a7006	[SimpleLoopUnswitch] partial unswitch needs to be careful when replacing invariants with constants When partial unswitch operates on multiple conditions at once, .e.g: if (Cond1 \|\| Cond2 \|\| NonInv) ... it should infer (and replace) values for individual conditions only on one side of unswitch and not another. More precisely only these derivations hold true: (Cond1 \|\| Cond2) == false => Cond1 == Cond2 == false (Cond1 && Cond2) == true => Cond1 == Cond2 == true By the way we organize unswitching it means only replacing on "continue" blocks and never on "unswitched" ones. Since trivial unswitch does not have "unswitched" blocks it does not have this problem. Fixes PR 39568. Reviewers: chandlerc, asbirlea Differential Revision: https://reviews.llvm.org/D54211 llvm-svn: 346350	2018-11-07 20:05:11 +00:00
Mandeep Singh Grang	d47d188b6f	[LoopSink] Do not sink instructions into non-cold blocks Summary: This fixes PR39570. Reviewers: danielcdh, rnk, bkramer Reviewed By: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54181 llvm-svn: 346337	2018-11-07 18:26:24 +00:00
Florian Hahn	ac86038b40	[NewGVN] Make sure we do not add a user to itself. If we simplify an instruction to itself, we do not need to add a user to itself. For congruence classes with a defining expression, we already use a similar logic. Fixes PR38259. Reviewers: davide, efriedma, mcrosier Reviewed By: davide Differential Revision: https://reviews.llvm.org/D51168 llvm-svn: 346335	2018-11-07 17:20:07 +00:00
Sanjay Patel	57a08b3343	[InstCombine] propagate FMF for fcmp+fabs folds By morphing the instruction rather than deleting and creating a new one, we retain fast-math-flags and potentially other metadata (profile info?). llvm-svn: 346331	2018-11-07 16:15:01 +00:00
Sanjay Patel	bb521e63af	[InstCombine] peek through fabs() when checking isnan() That should be the end of the missing cases for this fold. See earlier patches in this series: rL346321 rL346324 llvm-svn: 346327	2018-11-07 15:44:26 +00:00
Sanjay Patel	fa5f146872	[InstCombine] add folds for fcmp Pred fabs(X), 0.0 Similar to rL346321, we had folds for the ordered versions of these compares already, so add the unordered siblings for completeness. llvm-svn: 346324	2018-11-07 15:33:03 +00:00
James Y Knight	72f76bf230	Add support for llvm.is.constant intrinsic (PR4898) This adds the llvm-side support for post-inlining evaluation of the __builtin_constant_p GCC intrinsic. Also fixed SCCPSolver::visitCallSite to not blow up when seeing a call to a function where canConstantFoldTo returns true, and one of the arguments is a struct. Updated from patch initially by Janusz Sobczak. Differential Revision: https://reviews.llvm.org/D4276 llvm-svn: 346322	2018-11-07 15:24:12 +00:00
Sanjay Patel	76faf5145d	[InstCombine] add fold for fabs(X) u< 0.0 The sibling fold for 'oge' --> 'ord' was already here, but this half was missing. The result of fabs() must be positive or nan, so asking if the result is negative or nan is the same as asking if the result is nan. This is another step towards fixing: https://bugs.llvm.org/show_bug.cgi?id=39475 llvm-svn: 346321	2018-11-07 15:11:32 +00:00
Sanjay Patel	de58e93666	fix typos aggressively; NFC llvm-svn: 346316	2018-11-07 14:35:36 +00:00
Sanjay Patel	7552d0d2e6	[InstCombine] do not shrink switch conditions to illegal types (PR29009) This patch makes shrinking switch conditions less aggressive which was introduced by: rL274233 Note that we have 2 new bugs to track potential follow-ups that might have solved PR29009 in different ways: https://bugs.llvm.org/show_bug.cgi?id=39569 https://bugs.llvm.org/show_bug.cgi?id=39578 Patch by: @dendibakh (Denis Bakhvalov) Differential Revision: https://reviews.llvm.org/D54115 llvm-svn: 346315	2018-11-07 14:12:41 +00:00
Calixte Denizet	c3bed1e8e6	[GCOV] Flush counters before to avoid counting the execution before fork twice and for exec functions we must flush before the call Summary: This is replacement for patch in https://reviews.llvm.org/D49460. When we fork, the counters are duplicate as they're and so the values are finally wrong when writing gcda for parent and child. So just before to fork, we flush the counters and so the parent and the child have new counters set to zero. For exec functions, we need to flush before the call to have some data. Reviewers: vsk, davidxl, marco-c Reviewed By: marco-c Subscribers: llvm-commits, sylvestre.ledru, marco-c Differential Revision: https://reviews.llvm.org/D53593 llvm-svn: 346313	2018-11-07 13:49:17 +00:00
Sanjay Patel	d1172a0c20	[IR] add optional parameter for copying IR flags to compare instructions As shown, this is used to eliminate redundant code in InstCombine, and there are more cases where we should be using this pattern, but we're currently unintentionally dropping flags. llvm-svn: 346282	2018-11-07 00:00:42 +00:00
Teresa Johnson	cb397461e1	[ThinLTO] Split NotEligibleToImport into legality and inlinability flags Summary: The NotEligibleToImport flag on the GlobalValueSummary was set if it isn't legal to import (e.g. because it references unpromotable locals) and when it can't be inlined (in which case importing is pointless). I split out the inlinable piece into a separate flag on the FunctionSummary (doesn't make sense for aliases or global variables), because in the future we may want to import for reasons other than inlining. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D53345 llvm-svn: 346261	2018-11-06 19:41:35 +00:00
Vedant Kumar	1e209e284f	[CodeExtractor] Do not extract calls to eh_typeid_for (PR39545) The lowering for a call to eh_typeid_for changes when it's moved from one function to another. There are several proposals for fixing this issue in llvm.org/PR39545. Until some solution is in place, do not allow CodeExtractor to extract calls to eh_typeid_for, as that results in serious miscompilations. llvm-svn: 346256	2018-11-06 19:06:08 +00:00
Vedant Kumar	09b7aa443d	[CodeExtractor] Erase use-without-def debug intrinsics in parent func When CodeExtractor moves instructions to a new function, debug intrinsics referring to those instructions within the parent function become invalid. This results in the same verifier failure which motivated r344545, about function-local metadata being used in the wrong function. llvm-svn: 346255	2018-11-06 19:05:53 +00:00
Sanjay Patel	724014adde	[InstCombine] allow vector types for fcmp+fpext fold llvm-svn: 346245	2018-11-06 17:20:20 +00:00
Sanjay Patel	46bf3922c1	[InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2 llvm-svn: 346242	2018-11-06 16:45:27 +00:00
Sanjay Patel	7c3ee4da42	[InstCombine] rearrange code for fcmp+fpext; NFCI llvm-svn: 346241	2018-11-06 16:37:35 +00:00
Sanjay Patel	1b85f00201	[InstCombine] propagate fast-math-flags when folding fcmp+fpext llvm-svn: 346240	2018-11-06 16:23:03 +00:00
Sanjay Patel	2fd5b0ebfb	[InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2 llvm-svn: 346238	2018-11-06 15:58:57 +00:00
Sanjay Patel	05e70fb978	[InstCombine] reduce code; NFC llvm-svn: 346235	2018-11-06 15:53:58 +00:00
Sanjay Patel	70282a0501	[InstCombine] propagate fast-math-flags when folding fcmp+fneg This is another part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 This might be enough to fix that particular issue, but as noted with the FIXME, we're still dropping FMF on other folds around here. llvm-svn: 346234	2018-11-06 15:49:45 +00:00
Simon Pilgrim	c1da5f757e	[InstCombine] Ensure nested shifts are in range (OSS-Fuzz #9880 ) llvm-svn: 346225	2018-11-06 11:28:22 +00:00
Max Kazantsev	0042c06e8b	[LICM] Remove too conservative IsMustExecute variable LICM relies on variable `MustExecute` which is conservatively set to `false` in all non-headers. It is used when we decide whether or not we want to hoist an instruction or a guard. For the guards, it might be too conservative to use this variable, we can instead use a more precise logic from LoopSafetyInfo. Currently it is only NFC because `IsMemoryNotModified` is also conservatively set to `false` for all non-headers, and we cannot hoist guards from non-header blocks. However once we give up using `IsMemoryNotModified` and use a smarter check instead, this will allow us to hoist guards from all mustexecute non-header blocks. Differential Revision: https://reviews.llvm.org/D50888 Reveiwed By: fedor.sergeev llvm-svn: 346204	2018-11-06 04:17:40 +00:00
Max Kazantsev	69f6dfa0f8	[LICM] Use ICFLoopSafetyInfo in LICM This patch makes LICM use `ICFLoopSafetyInfo` that is a smarter version of LoopSafetyInfo that leverages power of Implicit Control Flow Tracking to keep track of throwing instructions and give less pessimistic answers to queries related to throws. The ICFLoopSafetyInfo itself has been introduced in rL344601. This patch enables it in LICM only. Differential Revision: https://reviews.llvm.org/D50377 Reviewed By: apilipenko llvm-svn: 346201	2018-11-06 02:44:49 +00:00
Max Kazantsev	e059f4452b	Revert "[IndVars] Smart hard uses detection" This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402. It seems that the check that we still should do the transform if we know the result is constant is missing in this code. So the logic that has been deleted by this change is still sometimes accidentally useful. I revert the change to see what can be done about it. The motivating case is the following: @Y = global [400 x i16] zeroinitializer, align 1 define i16 @foo() { entry: br label %for.body for.body: ; preds = %entry, %for.body %i = phi i16 [ 0, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i store i16 0, i16* %arrayidx, align 1 %inc = add nuw nsw i16 %i, 1 %cmp = icmp ult i16 %inc, 400 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body %inc.lcssa = phi i16 [ %inc, %for.body ] ret i16 %inc.lcssa } We should be able to figure out that the result is constant, but the patch breaks it. Differential Revision: https://reviews.llvm.org/D51584 llvm-svn: 346198	2018-11-06 02:02:05 +00:00
Sanjay Patel	1440107821	[InstSimplify] fold select (fcmp X, Y), X, Y This is NFCI for InstCombine because it calls InstSimplify, so I left the tests for this transform there. As noted in the code comment, we can allow this fold more often by using FMF and/or value tracking. llvm-svn: 346169	2018-11-05 21:51:39 +00:00
Taewook Oh	2b7ae47ccb	[MergeICmps] Do not perform the transformation if GEP is used outside of block Summary: This patch prevents MergeICmps to performn the transformation if the address operand GEP of the load instruction has a use outside of the load's parent block. Without this patch, compiler crashes with the given test case because the use of `%first.i` is still around when the basic block is erased from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/MergeICmps.cpp#L620. I think checking `isUsedOutsideOfBlock` with `GEP` is the original intention of the code, as the checking for `LoadI` is already performed in the same function. This patch is incomplete though, as this makes the pass overly conservative and fails the test `tuple-four-int8.ll`. I believe what needs to be done is checking if GEP has a use outside of block that is not the part of "Comparisons" chain. Submit the patch as of now to prevent compiler crash. Reviewers: courbet, trentxintong Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54089 llvm-svn: 346151	2018-11-05 18:16:32 +00:00
Sanjay Patel	c26fd1e772	[InstCombine] canonicalize -0.0 to +0.0 in fcmp As stated in IEEE-754 and discussed in: https://bugs.llvm.org/show_bug.cgi?id=38086 ...the sign of zero does not affect any FP compare predicate. Known regressions were fixed with: rL346097 (D54001) rL346143 The transform will help reduce pattern-matching complexity to solve: https://bugs.llvm.org/show_bug.cgi?id=39475 ...as well as improve CSE and codegen (a zero constant is almost always easier to produce than 0x80..00). llvm-svn: 346147	2018-11-05 17:26:42 +00:00
Sanjay Patel	87aa10062c	[InstCombine] loosen FP 0.0 constraint for fcmp+select substitution It looks like we correctly removed edge cases with 0.0 from D50714, but we were a bit conservative because getBinOpIdentity() doesn't distinguish between +0.0 and -0.0 and 'nsz' is effectively always true for fcmp (see discussion in: https://bugs.llvm.org/show_bug.cgi?id=38086 Without this change, we would get regressions by canonicalizing to +0.0 in all fcmp, and that's a step towards solving: https://bugs.llvm.org/show_bug.cgi?id=39475 llvm-svn: 346143	2018-11-05 16:50:44 +00:00
Vedant Kumar	d2a895a972	[HotColdSplitting] Use TTI to inform outlining threshold Using TargetTransformInfo allows the splitting pass to factor in the code size cost of instructions as it decides whether or not outlining is profitable. This did not regress the overall amount of outlining seen on the handful of internal frameworks I tested. Thanks to Jun Bum Lim for suggesting this! Differential Revision: https://reviews.llvm.org/D53835 llvm-svn: 346108	2018-11-04 23:11:57 +00:00
Jordan Rupprecht	80e7e86c29	[DebugInfo][InstMerge] Fix -debugify for phi node created by -mldst-motion Summary: -mldst-motion creates a new phi node without any debug info. Use the merged debug location from the incoming stores to fix this. Fixes PR38177. The test case here is (somewhat) simplified from: ``` struct S { int foo; void fn(int bar); }; void S::fn(int bar) { if (bar) foo = 1; else foo = 0; } ``` Reviewers: dblaikie, gbedwell, aprantl, vsk Reviewed By: vsk Subscribers: vsk, JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D54019 llvm-svn: 346027	2018-11-02 18:25:41 +00:00
Ayal Zaks	45a3ca7be7	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 llvm-svn: 345959	2018-11-02 09:16:12 +00:00
Max Kazantsev	872bb74b0a	[NFC][LICM] Factor out instruction erasing logic This patch factors out a function that makes all required updates whenever an instruction gets erased. Differential Revision: https://reviews.llvm.org/D54011 Reviewed By: apilipenko llvm-svn: 345914	2018-11-02 00:21:45 +00:00
Florian Hahn	de4f774783	[LoopInterchange] Fix unused variables in release build llvm-svn: 345881	2018-11-01 19:51:13 +00:00
Florian Hahn	c8bd6ea35e	[LoopInterchange] Remove support for inner-only reductions. Inner-loop only reductions require additional checks to make sure they form a load-phi-store cycle across inner and outer loop. Otherwise the reduction value is not properly preserved. This patch disables interchanging such loops for now, as it causes miscompiles in some cases and it seems to apply only for a tiny amount of loops. Across the test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are interchange with inner loop reduction support disabled. With -loop-interchange-threshold=-1000, 3256 instead of 3267. See the discussion and history of D53027 for an outline of how such legality checks could look like. Reviewers: efriedma, mcrosier, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D53027 llvm-svn: 345877	2018-11-01 19:25:00 +00:00
Reid Kleckner	3f756fbabe	Remove unnecessary fallthrough annotation after unreachable Clang's -Wimplicit-fallthrough implementation warns on this. I built clang with GCC 7.3 in +asserts and -asserts mode, and GCC doesn't warn on this in either configuration. I think it is unnecessary. I separated it from the large mechanical patch (https://reviews.llvm.org/D53950) in case I am wrong and it has to be reverted. llvm-svn: 345876	2018-11-01 19:11:05 +00:00
Max Kazantsev	46955b58ee	[NFC] Reorganize code to prepare it for more transforms llvm-svn: 345820	2018-11-01 09:42:50 +00:00
Max Kazantsev	3d347bf545	[IndVars] Smart hard uses detection When rewriting loop exit values, IndVars considers this transform not profitable if the loop instruction has a loop user which it believes cannot be optimized away. In current implementation only calls that immediately use the instruction are considered as such. This patch extends the definition of "hard" users to any side-effecting instructions (which usually cannot be optimized away from the loop) and also allows handling of not just immediate users, but use chains. Differentlai Revision: https://reviews.llvm.org/D51584 Reviewed By: etherzhhb llvm-svn: 345814	2018-11-01 06:47:01 +00:00
Volkan Keles	3ca146d083	[InstCombine] Combine nested min/max intrinsics with constants Reviewers: arsenm, spatel Reviewed By: spatel Subscribers: lebedev.ri, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D53774 llvm-svn: 345751	2018-10-31 17:50:52 +00:00
Sanjay Patel	1c254c6716	[InstCombine] refactor fabs+fcmp fold; NFC Also, remove/replace/minimize/enhance the tests for this fold. The code drops FMF, so it needs more tests and at least 1 fix. llvm-svn: 345734	2018-10-31 16:34:43 +00:00
Sanjay Patel	b9fe3fbb57	[InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC The 'OLT' case was updated at rL266175, so I assume it was just an oversight that 'UGE' was not included because that patch handled both predicates in InstSimplify. llvm-svn: 345727	2018-10-31 15:31:45 +00:00
Sanjay Patel	85cba3b6fb	[InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 ...but given the current implementation (no FMF on casts), this is likely the only way to predicate the transform. This is part of solving PR39475: https://bugs.llvm.org/show_bug.cgi?id=39475 Differential Revision: https://reviews.llvm.org/D53874 llvm-svn: 345725	2018-10-31 14:57:23 +00:00
Fedor Sergeev	412ed34744	[LoopUnroll] allow customization for new-pass-manager version of LoopUnroll Unlike its legacy counterpart new pass manager's LoopUnrollPass does not provide any means to select which flavors of unroll to run (runtime, peeling, partial), relying on global defaults. In some cases having ability to run a restricted LoopUnroll that does more than LoopFullUnroll is needed. Introduced LoopUnrollOptions to select optional unroll behaviors. Added 'unroll<peeling>' to PassRegistry mainly for the sake of testing. Reviewers: chandlerc, tejohnson Differential Revision: https://reviews.llvm.org/D53440 llvm-svn: 345723	2018-10-31 14:33:14 +00:00
Max Kazantsev	541f824d32	[IndVars] Strengthen restricton in rewriteLoopExitValues For some unclear reason rewriteLoopExitValues considers recalculation after the loop profitable if it has some "soft uses" outside the loop (i.e. any use other than call and return), even if we have proved that it has a user inside the loop which we think will not be optimized away. There is no existing unit test that would explain this. This patch provides an example when rematerialisation of exit value is not profitable but it passes this check due to presence of a "soft use" outside the loop. It makes no sense to recalculate value on exit if we are going to compute it due to some irremovable within the loop. This patch disallows applying this transform in the described situation. Differential Revision: https://reviews.llvm.org/D51581 Reviewed By: etherzhhb llvm-svn: 345708	2018-10-31 10:30:50 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Alexander Potapenko	c1c4c9a494	[MSan] another take at instrumenting inline assembly - now with calls Turns out it's not always possible to figure out whether an asm() statement argument points to a valid memory region. One example would be per-CPU objects in the Linux kernel, for which the addresses are calculated using the FS register and a small offset in the .data..percpu section. To avoid pulling all sorts of checks into the instrumentation, we replace actual checking/unpoisoning code with calls to msan_instrument_asm_load(ptr, size) and msan_instrument_asm_store(ptr, size) functions in the runtime. This patch doesn't implement the runtime hooks in compiler-rt, as there's been no demand in assembly instrumentation for userspace apps so far. llvm-svn: 345702	2018-10-31 09:32:47 +00:00
Matthias Braun	9fd397b423	ADT/STLExtras: Introduce llvm::empty; NFC This is modeled after C++17 std::empty(). Differential Revision: https://reviews.llvm.org/D53909 llvm-svn: 345679	2018-10-31 00:23:23 +00:00
Sanjay Patel	4c39dfc91e	[InstCombine] use 'match' to reduce code; NFC llvm-svn: 345647	2018-10-30 20:52:25 +00:00
Quentin Colombet	900678227c	[InstCombine] Teach the move free before null test opti how to deal with noop casts InstCombine features an optimization that essentially replaces: if (a) free(a) into: free(a) Right now, this optimization is gated by the minsize attribute and therefore we only perform it if we can prove that we are going to be able to eliminate the branch and the destination block. However when casts are involved the optimization would fail to apply, because the optimization was not smart enough to realize that it is possible to also move the casts away from the destination block and that is harmless to the performance since they are just noops. E.g., foo(int a) if (a) free((char)a) Wouldn't be optimized by instcombine, because - We would refuse to hoist the `bitcast i32* %a to i8` in the source block - We would fail to see that `bitcast i32* %a to i8` and %a are the same value. This patch fixes both these problems: - It teaches the pattern matching of the comparison how to look through casts. - It checks that whether the additional instruction in the destination block can be hoisted and are harmless performance-wise. - It hoists all the code of the destination block in the source block. Differential Revision: D53356 llvm-svn: 345644	2018-10-30 20:51:04 +00:00
Calixte Denizet	38d50545fe	[GCOV] Function counters are wrong when on one line Summary: After commit https://reviews.llvm.org/rL344228, the function definitions have a counter but when on one line the counter is wrong (e.g. void foo() { }) I added a test in: https://reviews.llvm.org/D53601 Reviewers: marco-c Reviewed By: marco-c Subscribers: llvm-commits, sylvestre.ledru Differential Revision: https://reviews.llvm.org/D53600 llvm-svn: 345624	2018-10-30 18:41:31 +00:00
Simon Pilgrim	44a9a71d2a	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368) Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617	2018-10-30 18:10:02 +00:00
Sanjay Patel	68a61cb07c	[InstCombine] use getFltSemantics() instead of duplicating it; NFC llvm-svn: 345613	2018-10-30 16:21:56 +00:00
Sanjay Patel	b12e410082	[InstCombine] try to turn shuffle into insertelement shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC' The motivating case is at least a couple of steps away: I noticed that SLPVectorizer does not analyze shuffles as well as sequences of insert/extract in PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 ...so SLP may fail to vectorize when source code has shuffles to start with or instcombine has converted insert/extract to shuffles. Independent of that, an insertelement is always a simpler op for IR analysis vs. a shuffle, so we should transform to insert when possible. I don't think there's any codegen concern here - if a target can't insert a scalar directly to some fixed element in a vector (x86?), then this should get expanded to the insert+shuffle that we started with. Differential Revision: https://reviews.llvm.org/D53507 llvm-svn: 345607	2018-10-30 15:26:39 +00:00
Jonas Paulsson	1f067c94dc	[LoopVectorizer] Fix for cost values of memory accesses. This commit is a combination of two patches: * "Fix in getScalarizationOverhead()" If target returns false in TTI.prefersVectorizedAddressing(), it means the address registers will not need to be extracted. Therefore, there should be no operands scalarization overhead for a load instruction. * "Don't pass the instruction pointer from getMemInstScalarizationCost." Since VF is always > 1, this is a cost query for an instruction in the vectorized loop and it should not be evaluated within the scalar context of the instruction. Review: Ulrich Weigand, Hal Finkel https://reviews.llvm.org/D52351 https://reviews.llvm.org/D52417 llvm-svn: 345603	2018-10-30 14:34:15 +00:00
Nicola Zaghen	f96383c99e	[SROA] Use offset sizes from the DataLayout instead of the pointer siezes. This fixes an assertion when constant folding a GEP when the part of the offset was in i32 (IndexSize, as per DataLayout) and part in the i64 (PointerSize) in the newly created test case. Differential Revision: https://reviews.llvm.org/D52609 llvm-svn: 345585	2018-10-30 11:15:04 +00:00
Vedant Kumar	dd4be53b20	[HotColdSplitting] Allow outlining single-block cold regions It can be profitable to outline single-block cold regions because they may be large. Allow outlining single-block regions if they have over some threshold of non-debug, non-terminator instructions. I chose 3 as the threshold after experimenting with several internal frameworks. In practice, reducing the threshold further did not give much improvement, whereas increasing it resulted in substantial regressions. Differential Revision: https://reviews.llvm.org/D53824 llvm-svn: 345524	2018-10-29 19:15:39 +00:00
Florian Hahn	fc7654a67b	[Local] Keep K's range if K does not move when combining metadata. As K has to dominate I, IIUC I's range metadata must be a subset of K's. After Eli's recent clarification to the LangRef, loading a value outside of the range is undefined behavior. Therefore if I's range contains elements outside of K's range and we would load one such value, K would cause undefined behavior. In cases like hoisting/sinking, we still want the most generic range over all code paths to/from the hoist/sink point. As suggested in the patches related to D47339, I will refactor the handling of those scenarios and try to decouple it from this function as follow up, once we switched to a similar handling of metadata in most of combineMetadata. I updated some tests checking mostly the merging of metadata to keep the metadata of to dominating load. The most interesting one is probably test8 in test/Transforms/JumpThreading/thread-loads.ll. It contained a comment about the alias metadata preventing us to eliminate the branch, but it seem like the actual problem currently is that we merge the ranges of both loads and cannot eliminate the icmp afterwards. With this patch, we manage to eliminate the icmp, as the range of the first load excludes 8. Reviewers: efriedma, nlopes, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D51629 llvm-svn: 345456	2018-10-27 16:53:45 +00:00
Simon Pilgrim	a132016d4d	Fix -Wdocumentation warning. NFCI. llvm-svn: 345454	2018-10-27 15:14:42 +00:00
Leonard Chan	eebecb3214	Revert "[PassManager/Sanitizer] Enable usage of ported AddressSanitizer passes with -fsanitize=address" This reverts commit 8d6af840396f2da2e4ed6aab669214ae25443204 and commit b78d19c287b6e4a9abc9fb0545de9a3106d38d3d which causes slower build times by initializing the AddressSanitizer on every function run. The corresponding revisions are https://reviews.llvm.org/D52814 and https://reviews.llvm.org/D52739. llvm-svn: 345433	2018-10-26 22:51:51 +00:00
Christy Lee	3cc0e935c4	Pointer types were treated as zero-size by MergeICmps Summary: The visitICmp analysis function would record compares of pointer types, as size 0. This causes the resulting memcmp() call to have the wrong total size. Found with "self-build" of clang/LLVM on Windows. Reviewers: christylee, trentxintong, courbet Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53536 llvm-svn: 345413	2018-10-26 18:02:06 +00:00
Max Kazantsev	619a83463f	[SimpleLoopUnswitch] Unswitch by experimental.guard intrinsics This patch adds support of `llvm.experimental.guard` intrinsics to non-trivial simple loop unswitching. These intrinsics represent implicit control flow which has pretty much the same semantics as usual conditional branches. The algorithm of dealing with them is following: - Consider guards as unswitching candidates; - If a guard is considered the best candidate, turn it into a branch; - Apply normal unswitching algorithm on this branch. The patch has no compile time effect on code that does not contain any guards. Differential Revision: https://reviews.llvm.org/D53744 Reviewed By: chandlerc llvm-svn: 345387	2018-10-26 14:20:11 +00:00
Max Kazantsev	bde31000b1	[SimpleLoopUnswitch] Make all checks before actual non-trivial unswitch We should be able to make all relevant checks before we actually start the non-trivial unswitching, so that we could guarantee that once we have started the transform, it will always succeed. Reviewed By: chandlerc Differential Revision: https://reviews.llvm.org/D53747 llvm-svn: 345375	2018-10-26 09:52:58 +00:00
Cameron McInally	384a74b0e6	[FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes Replacing BinaryOperator::isFNeg(...) to avoid regressions when we separate FNeg from the FSub IR instruction. Differential Revision: https://reviews.llvm.org/D53650 llvm-svn: 345295	2018-10-25 18:09:33 +00:00
Carlos Alberto Enciso	9a24e1a7cd	[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG. When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines. Differential Revision: https://reviews.llvm.org/D53287 llvm-svn: 345250	2018-10-25 09:58:59 +00:00
Gabor Buella	1f6ca0ba15	Add -instcombine-code-sinking option Reviewers: craig.topper, andrew.w.kaylor, efriedma Reviewed By: craig.topper, andrew.w.kaylor, efriedma Differential Revision: https://reviews.llvm.org/D52709 llvm-svn: 345248	2018-10-25 08:32:29 +00:00
Alina Sbirlea	ad4d018202	Update MemorySSA in LoopRotate. Summary: Teach LoopRotate to preserve MemorySSA. Enable tests for correctness, dependency disabled by default. Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D51718 llvm-svn: 345216	2018-10-24 22:46:45 +00:00
Vedant Kumar	c299006879	[HotColdSplitting] Identify larger cold regions using domtree queries The current splitting algorithm works in three stages: 1) Identify cold blocks, then 2) Use forward/backward propagation to mark hot blocks, then 3) Grow a SESE region of blocks outside of the set of hot blocks and start outlining. While testing this pass on Apple internal frameworks I noticed that some kinds of control flow (e.g. loops) are never outlined, even though they unconditionally lead to / follow cold blocks. I noticed two other issues related to how cold regions are identified: - An inconsistency can arise in the internal state of the hotness propagation stage, as a block may end up in both the ColdBlocks set and the HotBlocks set. Further inconsistencies can arise as these sets do not match what's in ProfileSummaryInfo. - It isn't necessary to limit outlining to single-exit regions. This patch teaches the splitting algorithm to identify maximal cold regions and outline them. A maximal cold region is defined as the set of blocks post-dominated by a cold sink block, or dominated by that sink block. This approach can successfully outline loops in the cold path. As a side benefit, it maintains less internal state than the current approach. Due to a limitation in CodeExtractor, blocks within the maximal cold region which aren't dominated by a single entry point (a so-called "max ancestor") are filtered out. Results: - X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to 47KB pre-patch, or a ~3x improvement. Did not see a performance impact across two runs. - AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB of TEXT were outlined. Ditto re: performance impact. - Outlining results improve marginally in the internal frameworks I tested. Follow-ups: - Outline more than once per function, outline large single basic blocks, & try to remove unconditional branches in outlined functions. Differential Revision: https://reviews.llvm.org/D53627 llvm-svn: 345209	2018-10-24 22:15:41 +00:00
Teresa Johnson	c8dba682bb	[hot-cold-split] Name split functions with ".cold" suffix Summary: The current default of appending "_"+entry block label to the new extracted cold function breaks demangling. Change the deliminator from "_" to "." to enable demangling. Because the header block label will be empty for release compile code, use "extracted" after the "." when the label is empty. Additionally, add a mechanism for the client to pass in an alternate suffix applied after the ".", and have the hot cold split pass use "cold."+Count, where the Count is currently 1 but can be used to uniquely number multiple cold functions split out from the same function with D53588. Reviewers: sebpop, hiraditya Subscribers: llvm-commits, erik.pilkington Differential Revision: https://reviews.llvm.org/D53534 llvm-svn: 345178	2018-10-24 18:53:47 +00:00
Sanjay Patel	3b206305fd	[InstCombine] try harder to form select from logic ops (2nd try) The original patch was committed here: rL344609 ...and reverted: rL344612 ...because it did not properly check/test data types before calling ComputeNumSignBits(). The tests that caused bot failures for the previous commit are over-reaching front-end tests that run the entire -O optimizer pipeline: Clang :: CodeGen/builtins-systemz-zvector.c Clang :: CodeGen/builtins-systemz-zvector2.c I've added a negative test here to ensure coverage for that case. The new early exit check also tests the type of the 'B' parameter, so we don't waste time on matching if either value is unsuitable. Original commit message: This is part of solving PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 The patterns shown here are a special case of something that we already convert to select. Using ComputeNumSignBits() catches that case (but not the more complicated motivating patterns yet). The backend has hooks/logic to convert back to logic ops if that's better for the target. llvm-svn: 345149	2018-10-24 15:17:56 +00:00
Cameron McInally	678f43f666	[FPEnv] Convert more BinaryOperator::isFNeg(...) to m_FNeg(...) This work is to avoid regressions when we seperate FNeg from the FSub IR instruction. Differential Revision: https://reviews.llvm.org/D53205 llvm-svn: 345146	2018-10-24 14:45:18 +00:00
Gil Rapaport	c523036fd2	Revert r345114 Investigating fails. llvm-svn: 345123	2018-10-24 08:41:22 +00:00
Dorit Nuzman	5114390e48	[LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remiander, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 llvm-svn: 345115	2018-10-24 07:11:38 +00:00
Gil Rapaport	5012e7f6ac	[LSR] Combine unfolded offset into invariant register LSR reassociates constants as unfolded offsets when the constants fit as immediate add operands, which currently prevents such constants from being combined later with loop invariant registers. This patch modifies GenerateCombinations() to generate a second formula which includes the unfolded offset in the combined loop-invariant register. Differential Revision: https://reviews.llvm.org/D51861 llvm-svn: 345114	2018-10-24 07:08:38 +00:00
Wei Mi	80a0c97e07	[PM] keeping history when original SCC split and then merge into itself in the same round of SCC update. In https://reviews.llvm.org/rL309784, inline history is added to prevent infinite inlining across multiple run of inliner and SCC update, but the history will only be kept when new SCC is actually generated during SCC update. We found a case that SCC can be split and then merge into itself in the same round of SCC update, so the same SCC will be pop out from UR.CWorklist and then added back immediately, without any new SCC generated, that is why the existing patch cannot catch the infinite inline case. What the patch does is even if no new SCC is generated, if only the current SCC appears in UR.CWorklist again, then keep the inline history. Differential Revision: https://reviews.llvm.org/D52915 llvm-svn: 345103	2018-10-23 23:29:45 +00:00
Vedant Kumar	503154615d	[HotColdSplitting] Attach MinSize to outlined code Outlined code is cold by assumption, so it makes sense to optimize it for minimal code size rather than performance. After r344869 moved the splitting pass to the end of the IR pipeline, this does not result in much of a code size reduction. This is probably because a comparatively small number backend transforms make use of the MinSize hint. Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of 919 bytes of TEXT reduction. I didn't measure a significant performance impact. Differential Revision: https://reviews.llvm.org/D53518 llvm-svn: 345072	2018-10-23 19:41:12 +00:00
Sanjay Patel	95790c546f	[InstCombine] use 'match' to simplify code There's probably some vector-with-undef-element pattern that shows an improvement, so this is probably not quite 'NFC'. This is the last step towards removing the fake binop queries for not/neg. Ie, there are no more uses of those functions in trunk. Fneg should follow. llvm-svn: 345050	2018-10-23 16:54:28 +00:00
Jordan Rupprecht	2fed6ac186	[DebugInfo][GlobalOpt] Fix -debugify for globalopt shrinking globals to booleans. Summary: TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass. Fixes PR37959. The test case here is the simplified .ll for: ``` static int foo; int bar() { foo = 5; return foo; } ``` Reviewers: dblaikie, gbedwell, aprantl Reviewed By: dblaikie Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D53531 llvm-svn: 345046	2018-10-23 16:35:51 +00:00
Sanjay Patel	5b6b090cf2	[Reassociate] replace fake binop queries with 'match' API We need to update this code before introducing an 'fneg' instruction in IR, so we might as well kill off the integer neg/not queries too. This is no-functional-change-intended for scalar code and most vector code. For vectors, we can see that the 'match' API allows for undef elements in constants, so we optimize those cases better. Ideally, there would be a test for each code diff, but I don't see evidence of that for the existing code, so I didn't try very hard to come up with new vector tests for each code change. Differential Revision: https://reviews.llvm.org/D53533 llvm-svn: 345042	2018-10-23 15:55:06 +00:00
Simon Pilgrim	532a0f122e	[SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions Expand arithmetic reduction to include mul/and/or/xor instructions. This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests. This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason. Differential Revision: https://reviews.llvm.org/D53473 llvm-svn: 345037	2018-10-23 15:13:09 +00:00
Sanjay Patel	747feb28e4	[InstCombine] use 'match' to handle vectors and simplify code This is another step towards completely removing the fake binop queries for not/neg/fneg. llvm-svn: 345036	2018-10-23 15:05:12 +00:00
Sanjay Patel	ad76c682c7	[InstCombine] swap select profile metadata when swapping select ops llvm-svn: 345034	2018-10-23 14:43:31 +00:00
Sanjay Patel	5141435d23	[SLSR] use 'match' to simplify code; NFC This pass could probably be modified slightly to allow vector splat transforms for practically no cost, but it only works on scalars for now. So the use of the newer 'match' API should make no functional difference. llvm-svn: 345030	2018-10-23 14:07:39 +00:00
Dorit Nuzman	da5dc13355	Leftover bits from https://reviews.llvm.org/D53420 that were accidentally left out of revision 344883 llvm-svn: 345021	2018-10-23 11:51:55 +00:00
Kostya Serebryany	af95597c3c	[hwasan] add stack frame descriptions. Summary: At compile-time, create an array of {PC,HumanReadableStackFrameDescription} for every function that has an instrumented frame, and pass this array to the run-time at the module-init time. Similar to how we handle pc-table in SanitizerCoverage. The run-time is dummy, will add the actual logic in later commits. Reviewers: morehouse, eugenis Reviewed By: eugenis Subscribers: srhines, llvm-commits, kubamracek Differential Revision: https://reviews.llvm.org/D53227 llvm-svn: 344985	2018-10-23 00:50:40 +00:00
Sanjay Patel	dd1c3df72d	[Reassociate] add 'using namespace' to reduce bloat; NFC llvm-svn: 344959	2018-10-22 21:37:02 +00:00
Teresa Johnson	f431a2f261	[hot-cold-split] Add opt remark on success Summary: Emit optimization remark on successful hot cold split. Reviewers: sebpop, hiraditya Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53512 llvm-svn: 344938	2018-10-22 19:06:42 +00:00
Benjamin Kramer	3e778165d6	[CGProfile] Turn constant-size SmallVector into array No functionality change. llvm-svn: 344893	2018-10-22 10:51:34 +00:00
Dorit Nuzman	3ec99fe21b	[IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 llvm-svn: 344883	2018-10-22 06:17:09 +00:00
Aditya Kumar	d9e2e383a9	Schedule Hot Cold Splitting pass after most optimization passes Summary: In the new+old pass manager, hot cold splitting was schedule too early. Thanks to Vedant for pointing this out. Reviewers: sebpop, vsk Reviewed By: sebpop, vsk Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D53437 llvm-svn: 344869	2018-10-21 18:11:56 +00:00
Sanjay Patel	0522b0da31	[InstCombine] use 'match' to simplify code; NFC llvm-svn: 344855	2018-10-20 17:15:57 +00:00
Sanjay Patel	ec572ade20	[InstCombine] make code more flexible with lambda; NFC I couldn't tell from svn history when these checks were added, but it pre-dates the split of instcombine into its own directory at rL92459. The motivation for changing the check is partly shown by the code in PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 There are also existing regression tests for SLPVectorizer with sequences of extract+insert that are likely assumed to become shuffles by the vectorizer cost models. llvm-svn: 344854	2018-10-20 16:58:27 +00:00
Sanjay Patel	729c4362cf	[InstCombine] add explanatory comment for strange vector logic; NFC llvm-svn: 344852	2018-10-20 16:25:55 +00:00
Evandro Menezes	164ea101ab	[NFC][InstCombine] Undo stray change Undo stray change introduced by r344725. llvm-svn: 344814	2018-10-19 20:57:45 +00:00
Thomas Lively	c339250e12	[InstCombine] InstCombine and InstSimplify for minimum and maximum Summary: Depends on D52765 Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52766 llvm-svn: 344799	2018-10-19 19:01:26 +00:00
Sanjay Patel	70daf85bc2	[InstCombine] use m_Neg() in dyn_castNegVal() to match vectors with undef elts llvm-svn: 344793	2018-10-19 17:54:53 +00:00
Fangrui Song	2e83b2e9ee	Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC llvm-svn: 344774	2018-10-19 06:12:02 +00:00
Ayal Zaks	b0b5312e67	[LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 llvm-svn: 344743	2018-10-18 15:03:15 +00:00
Mikael Holmen	e3605d0f70	Add a emitUnaryFloatFnCall version that fetches the function name from TLI Summary: In several places in the code we use the following pattern: if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) { [...] Value Res = emitUnaryFloatFnCall(X, TLI.getName(LibFunc_tan), B, Attrs); [...] } In short, we check if there is a lib-function for a certain type, and then we _always_ fetch the name of the "double" version of the lib function and construct a call to the appropriate function, that we just checked exists, using that "double" name as a basis. This is of course a problem in cases where the target doesn't support the "double" version, but e.g. only the "float" version. In that case TLI.getName(LibFunc_tan) returns "", and emitUnaryFloatFnCall happily appends an "f" to "", and we erroneously end up with a call to a function called "f". To solve this, the above pattern is changed to if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) { [...] Value Res = emitUnaryFloatFnCall(X, &TLI, LibFunc_tan, LibFunc_tanf, LibFunc_tanl, B, Attrs); [...] } I.e instead of first fetching the name of the "double" version and then letting emitUnaryFloatFnCall() add the final "f" or "l", we let emitUnaryFloatFnCall() fetch the right name from TLI. Reviewers: eli.friedman, efriedma Reviewed By: efriedma Subscribers: efriedma, bjope, llvm-commits Differential Revision: https://reviews.llvm.org/D53370 llvm-svn: 344725	2018-10-18 06:27:53 +00:00
Chandler Carruth	60b2e054dc	[TI removal] Switch simple loop unswitch to `Instruction`. llvm-svn: 344719	2018-10-18 00:40:26 +00:00
Chandler Carruth	c6cad4251e	[TI removal] Switch NewGVN to directly use `Instruction`. llvm-svn: 344718	2018-10-18 00:39:46 +00:00
Chandler Carruth	c8eaea71c9	[TI removal] Use `Instruction` instead of `TerminatorInst` for a variable's type. llvm-svn: 344717	2018-10-18 00:39:18 +00:00
Chandler Carruth	8b7a8123dd	[TI removal] Update CodeExtractor to use Instruction directly. llvm-svn: 344716	2018-10-18 00:38:54 +00:00
Chandler Carruth	c1e3ee29a4	[TI removal] Switch ObjCARC code to directly use the nice range-based successors API or directly build the iterators out of the terminator instruction and avoid requiring a TerminatorInst variable. llvm-svn: 344715	2018-10-18 00:38:34 +00:00
Chandler Carruth	93cf2ea27a	[TI removal] Switch MergeFunctions to directly use Instruction API. llvm-svn: 344714	2018-10-18 00:37:37 +00:00
Nicolai Haehnle	0823050b9f	StructurizeCFG: Simplify inserted PHI nodes Summary: This improves subsequent divergence analysis in some cases. Change-Id: I5e95e7ec7fd3fa80d414d1a53a02fea23e3d67d3 Reviewers: arsenm, rampitec Subscribers: jvesely, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D53316 llvm-svn: 344697	2018-10-17 15:37:41 +00:00
Fedor Sergeev	c297e84b97	[LoopPredication] add some simple stats Just adding some useful statistics to LoopPredication pass which was lacking any of these. llvm-svn: 344681	2018-10-17 09:02:54 +00:00
Leonard Chan	423957ad3a	[Sanitizer][PassManager] Fix for failing ASan tests on arm-linux-gnueabihf Forgot to initialize the legacy pass in it's constructor. Differential Revision: https://reviews.llvm.org/D53350 llvm-svn: 344659	2018-10-17 00:16:07 +00:00
Teresa Johnson	d2c234a4cc	[ThinLTO] Add importing stats to thin link Summary: Previously we could only get the number of imported functions and variables from the backend. This adds stats to the thin link where the importing is decided. Reviewers: wmi Subscribers: inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D53337 llvm-svn: 344658	2018-10-16 23:49:50 +00:00
Jonathan Metzman	5eb8cba280	[SanitizerCoverage] Don't duplicate code to get section pointers Summary: Merge code used to get section start and section end pointers for SanitizerCoverage constructors. This includes code that handles getting the start pointers when targeting MSVC. Reviewers: kcc, morehouse Reviewed By: morehouse Subscribers: kcc, hiraditya Differential Revision: https://reviews.llvm.org/D53211 llvm-svn: 344657	2018-10-16 23:43:57 +00:00
David Bolvansky	7c7760da7e	[InstCombine] Cleanup libfunc attribute inferring Reviewers: efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53338 llvm-svn: 344645	2018-10-16 21:18:31 +00:00
Anna Thomas	6f732bfb79	[LV] Teach vectorizer about variant value store into uniform address Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 llvm-svn: 344613	2018-10-16 15:46:26 +00:00
Sanjay Patel	bb3dd34e62	revert rL344609: [InstCombine] try harder to form select from logic ops I noticed a missing check and added it at rL344610, but there actually are codegen tests that will fail without that, so I'll edit those and submit a fixed patch with more tests. llvm-svn: 344612	2018-10-16 15:26:08 +00:00
Sanjay Patel	f6a7c8b1fc	[InstCombine] make sure type is integer before calling ComputeNumSignBits llvm-svn: 344610	2018-10-16 14:44:50 +00:00
Sanjay Patel	0c48c977b8	[InstCombine] try harder to form select from logic ops This is part of solving PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 The patterns shown here are a special case of something that we already convert to select. Using ComputeNumSignBits() catches that case (but not the more complicated motivating patterns yet). The backend has hooks/logic to convert back to logic ops if that's better for the target. llvm-svn: 344609	2018-10-16 14:35:21 +00:00
Max Kazantsev	9c90ec2fae	[NFC] Make LoopSafetyInfo abstract to allow alternative implementations llvm-svn: 344592	2018-10-16 08:31:05 +00:00
Max Kazantsev	8d56be7070	[NFC] Encapsulate work with BlockColors in LoopSafetyInfo llvm-svn: 344590	2018-10-16 08:07:14 +00:00
David Stenberg	c9163855dd	[DebugInfo][LCSSA] Rewrite pre-existing debug values outside loop Summary: Extend LCSSA so that debug values outside loops are rewritten to use the PHI nodes that the pass creates. This fixes PR39019. In that case, we ran LCSSA on a loop that was later on vectorized, which left us with something like this: for.cond.cleanup: %add.lcssa = phi i32 [ %add, %for.body ], [ %34, %middle.block ] call void @llvm.dbg.value(metadata i32 %add, ret i32 %add.lcssa for.body: %add = [...] br i1 %exitcond, label %for.cond.cleanup, label %for.body which later resulted in the debug.value becoming undef when removing the scalar loop (and the location would have probably been wrong for the vectorized case otherwise). As we now may need to query the AvailableVals cache more than once for a basic block, FindAvailableVals() in SSAUpdaterImpl is changed so that it updates the cache for blocks that we do not create a PHI node for, regardless of the block's number of predecessors. The debug value in the attached IR reproducer would not be properly rewritten without this. Debug values residing in blocks where we have not inserted any PHI nodes are currently left as-is by this patch. I'm not sure what should be done with those uses. Reviewers: mattd, aprantl, vsk, probinson Reviewed By: mattd, aprantl Subscribers: jmorse, gbedwell, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D53130 llvm-svn: 344589	2018-10-16 08:06:48 +00:00
Max Kazantsev	c8466f937c	[NFC] Turn isGuaranteedToExecute into a method llvm-svn: 344587	2018-10-16 06:34:53 +00:00
Lang Hames	8f9a2446e0	Change a TerminatorInst* to an Instruction* in HotColdSplitting.cpp. r344558 added an assignment to a TerminatorInst* from BasicBlock::getTerminatorInst(), but BasicBlock::getTerminatorInst() returns an Instruction* rather than a TerminatorInst* since r344504 so this fails to compile. Changing the variable to an Instruction* should get the bots building again. llvm-svn: 344566	2018-10-15 22:27:03 +00:00
Sebastian Pop	542e522b87	[hot-cold-split] fix static analysis of cold regions Make the code of blockEndsInUnreachable to match the function blockEndsInUnreachable in CodeGen/BranchFolding.cpp. I also have added a note to make sure the code of this function will not be modified unless the back-end version is also modified. An early return before outlining has been added to avoid outlining the full function body when the first block in the function is marked cold. The static analysis of cold code has been amended to avoid marking the whole function as cold by back-propagation because the back-propagation would mark blocks with return statements as cold. The patch adds debug statements to help discover these problems. Differential Revision: https://reviews.llvm.org/D52904 llvm-svn: 344558	2018-10-15 21:43:11 +00:00
Vedant Kumar	15718a6190	[CodeExtractor] Erase debug intrinsics in outlined thunks (fix PR22900) Variable updates within the outlined function are invisible to debuggers. This could be improved by defining a DISubprogram for the new function. For the moment, simply erase the debug intrinsics instead. This fixes verifier failures about function-local metadata being used in the wrong function, seen while testing the hot/cold splitting pass. rdar://45142482 Differential Revision: https://reviews.llvm.org/D53267 llvm-svn: 344545	2018-10-15 19:22:20 +00:00
Chandler Carruth	e303c87e19	[TI removal] Make `getTerminator()` return a generic `Instruction`. This removes the primary remaining API producing `TerminatorInst` which will reduce the rate at which code is introduced trying to use it and generally make it much easier to remove the remaining APIs across the codebase. Also clean up some of the stragglers that the previous mechanical update of variables missed. Users of LLVM and out-of-tree code generally will need to update any explicit variable types to handle this. Replacing `TerminatorInst` with `Instruction` (or `auto`) almost always works. Most of these edits were made in prior commits using the perl one-liner: ``` perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator)/Instruction\1/g' ``` This also my break some rare use cases where people overload for both `Instruction` and `TerminatorInst`, but these should be easily fixed by removing the `TerminatorInst` overload. llvm-svn: 344504	2018-10-15 10:42:50 +00:00
Chandler Carruth	52eaaf3ff8	[TI removal] Rework `InstVisitor` to support visiting instructions that are terminators without relying on the specific `TerminatorInst` type. This required cleaning up two users of `InstVisitor`s usage of `TerminatorInst` as well. llvm-svn: 344503	2018-10-15 10:10:54 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Chandler Carruth	ae98759ec5	[TI removal] Remove `TerminatorInst` from GVN.h and GVN.cpp. This is the last interesting usage in all of LLVM's headers. The remaining usages in headers are the core typesystem bits (Core.h, instruction types, and InstVisitor) and as the return of `BasicBlock::getTerminator`. The latter is the big remaining API point that I'll remove after mass updates to user code. llvm-svn: 344501	2018-10-15 10:00:15 +00:00
Chandler Carruth	4a2d58e16a	[TI removal] Remove `TerminatorInst` from BasicBlockUtils.h This requires updating a number of .cpp files to adapt to the new API. I've just systematically updated all uses of `TerminatorInst` within these files te `Instruction` so thta I won't have to touch them again in the future. llvm-svn: 344498	2018-10-15 09:34:05 +00:00
Chandler Carruth	b99a24689b	[TI removal] Remove TerminatorInst as an input parameter from all public LLVM APIs. There weren't very many. We still have the instruction visitor, and APIs with TerminatorInst as a return type or an output parameter. llvm-svn: 344494	2018-10-15 09:17:09 +00:00
Ayal Zaks	e567b5b526	[LV] Fix comments reported when not vectorizing single iteration loops; NFC Landing this as a separate part of https://reviews.llvm.org/D50480, being a seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). llvm-svn: 344483	2018-10-14 17:53:02 +00:00
Sanjay Patel	7181146c6c	[InstCombine] combine a shuffle and an extract subvector shuffle This is part of the missing IR-level folding noted in D52912. This should be ok as a canonicalization because the new shuffle mask can't be any more complicated than the existing shuffle mask. If there's some target where the shorter vector shuffle is not legal, it should just end up expanding to something like the pair of shuffles that we're starting with here. Differential Revision: https://reviews.llvm.org/D53037 llvm-svn: 344476	2018-10-14 15:25:06 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Benjamin Kramer	c55e997556	Move some helpers from the global namespace into anonymous ones. llvm-svn: 344468	2018-10-13 22:18:22 +00:00
Sanjay Patel	47579b21e2	[InstCombine] fix complexity canonicalization with fake unary vector ops This is a preliminary step to avoid regressions when we add an actual 'fneg' instruction to IR. See D52934 and D53205. llvm-svn: 344458	2018-10-13 16:15:37 +00:00
David Bolvansky	e8b3bba717	[InstCombine] Fixed crash with aliased functions Summary: Fixes PR39177 Reviewers: spatel, jbuening Reviewed By: jbuening Subscribers: jbuening, llvm-commits Differential Revision: https://reviews.llvm.org/D53129 llvm-svn: 344454	2018-10-13 15:21:55 +00:00
Kostya Serebryany	bc504559ec	move GetOrCreateFunctionComdat to Instrumentation.cpp/Instrumentation.h Summary: GetOrCreateFunctionComdat is currently used in SanitizerCoverage, where it's defined. I'm planing to use it in HWASAN as well, so moving it into a common location. NFC Reviewers: morehouse Reviewed By: morehouse Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53218 llvm-svn: 344433	2018-10-12 23:21:48 +00:00
Jonathan Metzman	0b94e88007	[SanitizerCoverage] Prevent /OPT:REF from stripping constructors Summary: Linking with the /OPT:REF linker flag when building COFF files causes the linker to strip SanitizerCoverage's constructors. Prevent this by giving the constructors WeakODR linkage and by passing the linker a directive to include sancov.module_ctor. Include a test in compiler-rt to verify libFuzzer can be linked using /OPT:REF Reviewers: morehouse, rnk Reviewed By: morehouse, rnk Subscribers: rnk, morehouse, hiraditya Differential Revision: https://reviews.llvm.org/D52119 llvm-svn: 344391	2018-10-12 18:11:47 +00:00
Max Moroz	4d010ca35b	[SanitizerCoverage] Make Inline8bit and TracePC counters dead stripping resistant. Summary: Otherwise, at least on Mac, the linker eliminates unused symbols which causes libFuzzer to error out due to a mismatch of the sizes of coverage tables. Issue in Chromium: https://bugs.chromium.org/p/chromium/issues/detail?id=892167 Reviewers: morehouse, kcc, george.karpenkov Reviewed By: morehouse Subscribers: kubamracek, llvm-commits Differential Revision: https://reviews.llvm.org/D53113 llvm-svn: 344345	2018-10-12 13:59:31 +00:00
Tim Northover	487780678f	SCCP: avoid caching DenseMap entry that might be invalidated. Later calls to getValueState might insert entries into the ValueState map and cause reallocation, invalidating a reference. llvm-svn: 344327	2018-10-12 09:01:59 +00:00
Eugene Leviant	eddf6b5df5	[ThinLTO] Don't import GV which contains blockaddress Differential revision: https://reviews.llvm.org/D53139 llvm-svn: 344325	2018-10-12 07:24:02 +00:00
Kostya Serebryany	d891ac9794	merge two near-identical functions createPrivateGlobalForString into one Summary: We have two copies of createPrivateGlobalForString (in asan and in esan). This change merges them into one. NFC Reviewers: vitalybuka Reviewed By: vitalybuka Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D53178 llvm-svn: 344314	2018-10-11 23:03:27 +00:00
Leonard Chan	64e21b5cfd	[PassManager/Sanitizer] Port of AddresSanitizer pass from legacy to new PassManager This patch ports the legacy pass manager to the new one to take advantage of the benefits of the new PM. This involved moving a lot of the declarations for `AddressSantizer` to a header so that it can be publicly used via PassRegistry.def which I believe contains all the passes managed by the new PM. This patch essentially decouples the instrumentation from the legacy PM such hat it can be used by both legacy and new PM infrastructure. Differential Revision: https://reviews.llvm.org/D52739 llvm-svn: 344274	2018-10-11 18:31:51 +00:00
Amara Emerson	54f60255a2	[InstCombine] Fix SimplifyLibCalls erasing an instruction while IC still had references to it. InstCombine keeps a worklist and assumes that optimizations don't eraseFromParent() the instruction, which SimplifyLibCalls violates. This change adds a new callback to SimplifyLibCalls to let clients specify their own hander for erasing actions. Differential Revision: https://reviews.llvm.org/D52729 llvm-svn: 344251	2018-10-11 14:51:11 +00:00
David Green	8066198442	[InstCombine] Demand bits of UMin This is the umin alternative to the umax code from rL344237. We use DeMorgans law on the umax case to bring us to the same thing on umin, but using countLeadingOnes, not countLeadingZeros. Differential Revision: https://reviews.llvm.org/D53036 llvm-svn: 344239	2018-10-11 11:28:27 +00:00
David Green	30c0e98b9c	[InstCombine] Demand bits of UMax Use the demanded bits of umax(A,C) to prove we can just use A so long as the lowest non-zero bit of DemandMask is higher than the highest non-zero bit of C Differential Revision: https://reviews.llvm.org/D53033 llvm-svn: 344237	2018-10-11 11:04:09 +00:00
Florian Hahn	18e07bb822	[LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC). We assign indices sequentially for seen instructions, so we can just use a vector and push back the seen instructions. No need for using a DenseMap. Reviewers: hsaito, rengolin, nadav, dcaballe Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D53089 llvm-svn: 344233	2018-10-11 09:46:25 +00:00
Florian Hahn	7eb5cb4ebc	[LV] Ignore more debug info. We can avoid doing some unnecessary work by skipping debug instructions in a few loops. It also helps to ensure debug instructions do not prevent vectorization, although I do not have any concrete test cases for that. Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk Reviewed By: rengolin, dcaballe Differential Revision: https://reviews.llvm.org/D53091 llvm-svn: 344232	2018-10-11 09:27:24 +00:00
Calixte Denizet	d2f290b034	[gcov] Display the hit counter for the line of a function definition Summary: Right now there is no hit counter on the line of function. So the idea is add the line of the function to all the lines covered by the entry block. Tests in compiler-rt/profile will be fixed in another patch: https://reviews.llvm.org/D49854 Reviewers: marco-c, davidxl Reviewed By: marco-c Subscribers: sylvestre.ledru, llvm-commits Differential Revision: https://reviews.llvm.org/D49853 llvm-svn: 344228	2018-10-11 08:53:43 +00:00
Max Kazantsev	b2e51090a4	[IndVars] Drop "exact" flag from lshr and udiv when substituting their args There is a transform that may replace `lshr (x+1), 1` with `lshr x, 1` in case if it can prove that the result will be the same. However the initial instruction might have an `exact` flag set, and it now should be dropped unless we prove that it may hold. Incorrectly set `exact` attribute may then produce poison. Differential Revision: https://reviews.llvm.org/D53061 Reviewed By: sanjoy llvm-svn: 344223	2018-10-11 07:22:26 +00:00
Richard Smith	6c67662816	Add a flag to remap manglings when reading profile data information. This can be used to preserve profiling information across codebase changes that have widespread impact on mangled names, but across which most profiling data should still be usable. For example, when switching from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI, or even from a 32-bit to a 64-bit build. The user can provide a remapping file specifying parts of mangled names that should be treated as equivalent (eg, std::__1 should be treated as equivalent to std::__cxx11), and profile data will be treated as applying to a particular function if its name is equivalent to the name of a function in the profile data under the provided equivalences. See the documentation change for a description of how this is configured. Remapping is supported for both sample-based profiling and instruction profiling. We do not support remapping indirect branch target information, but all other profile data should be remapped appropriately. Support is only added for the new pass manager. If someone wants to also add support for this for the old pass manager, doing so should be straightforward. This is the LLVM side of Clang r344199. Reviewers: davidxl, tejohnson, dlj, erik.pilkington Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51249 llvm-svn: 344200	2018-10-10 23:13:47 +00:00
George Burgess IV	6ef8002c2c	Replace most users of UnknownSize with LocationSize::unknown(); NFC Moving away from UnknownSize is part of the effort to migrate us to LocationSizes (e.g. the cleanup promised in D44748). This doesn't entirely remove all of the uses of UnknownSize; some uses require tweaks to assume that UnknownSize isn't just some kind of int. This patch is intended to just be a trivial replacement for all places where LocationSize::unknown() will Just Work. llvm-svn: 344186	2018-10-10 21:28:44 +00:00
Sanjay Patel	05aadf885d	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try Re-trying r344082 because it unintentionally included extra diffs. Original commit message: icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344181	2018-10-10 20:47:46 +00:00
Sanjay Patel	58fc00d0bc	revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization This commit accidentally included the diffs from D53057. llvm-svn: 344178	2018-10-10 20:39:39 +00:00
Renato Golin	d8e7ca4a32	[VPlan] Fix CondBit quoting in dumpBasicBlock Quotes were being printed for VPInstructions but not the rest. llvm-svn: 344161	2018-10-10 17:55:21 +00:00
Scott Linder	3759efc650	Relax trivial cast requirements in CallPromotionUtils Differential Revision: https://reviews.llvm.org/D52792 llvm-svn: 344153	2018-10-10 16:35:47 +00:00
Carlos Alberto Enciso	c0952c8a08	Revert "[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG." This reverts commit r344120. It was causing buildbot failures. llvm-svn: 344135	2018-10-10 12:09:34 +00:00
Neil Henning	3d4579829e	Fix an ordering bug in the scalarizer. I've added a new test case that causes the scalarizer to try and use dead-and-erased values - caused by the basic blocks not being in domination order within the function. To fix this, instead of iterating through the blocks in function order, I walk them in reverse post order. Differential Revision: https://reviews.llvm.org/D52540 llvm-svn: 344128	2018-10-10 09:27:45 +00:00
Carlos Alberto Enciso	e7a347e5f8	[DebugInfo][Dexter] Unreachable line stepped onto after SimplifyCFG. When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines. Differential Revision: https://reviews.llvm.org/D52887 llvm-svn: 344120	2018-10-10 08:29:55 +00:00
George Burgess IV	40dc63e1f0	[Analysis] Make LocationSizes carry an 'imprecise' bit There are places where we need to merge multiple LocationSizes of different sizes into one, and get a sensible result. There are other places where we want to optimize aggressively based on the value of a LocationSizes (e.g. how can a store of four bytes be to an area of storage that's only two bytes large?) This patch makes LocationSize hold an 'imprecise' bit to note whether the LocationSize can be treated as an upper-bound and lower-bound for the size of a location, or just an upper-bound. This concludes the series of patches leading up to this. The most recent of which is r344108. Fixes PR36228. Differential Revision: https://reviews.llvm.org/D44748 llvm-svn: 344114	2018-10-10 06:39:40 +00:00
Max Kazantsev	1d893bfcea	[NFC] Make a variable const llvm-svn: 344113	2018-10-10 04:19:38 +00:00
Sanjay Patel	e9ca7ea3e5	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344082	2018-10-09 21:26:01 +00:00
Sanjay Patel	88194dfe1a	[InstCombine] make helper function 'static'; NFC llvm-svn: 344056	2018-10-09 15:29:26 +00:00
George Burgess IV	f96e618017	Make LocationSize a proper Optional type; NFC This is the second in a series of changes intended to make https://reviews.llvm.org/D44748 more easily reviewable. Please see that patch for more context. The first change being r344012. Since I was requested to do all of this with post-commit review, this is about as small as I can make this patch. This patch makes LocationSize into an actual type that wraps a uint64_t; users are required to call getValue() in order to get the size now. If the LocationSize has an Unknown size (e.g. if LocSize == MemoryLocation::UnknownSize), getValue() will assert. This also adds DenseMap specializations for LocationInfo, which required taking two more values from the set of values LocationInfo can represent. Hence, heavy users of multi-exabyte arrays or structs may observe slightly lower-quality code as a result of this change. The intent is for getValue()s to be very close to a corresponding hasValue() (which is often spelled `!= MemoryLocation::UnknownSize`). Sadly, small diff context appears to crop that out sometimes, and the last change in DSE does require a bit of nonlocal reasoning about control-flow. :/ This also removes an assert, since it's now redundant with the assert in getValue(). llvm-svn: 344013	2018-10-09 03:18:56 +00:00
George Burgess IV	fefc42c9bb	Use locals instead of struct fields; NFC This is one of a series of changes intended to make https://reviews.llvm.org/D44748 more easily reviewable. Please see that patch for more context. Since I was requested to do all of this with post-commit review, this is about as small as I can make it (beyond committing changes to these few files separately, but they're incredibly similar in spirit, so...) On its own, this change doesn't make a great deal of sense. I plan on having a follow-up Real Soon Now(TM) to make the bits here make more sense. :) In particular, the next change in this series is meant to make LocationSize an actual type, which you have to call .getValue() on in order to get at the uint64_t inside. Hence, this change refactors code so that: - we only need to call the soon-to-come getValue() once in most cases, and - said call to getValue() happens very closely to a piece of code that checks if the LocationSize has a value (e.g. if it's != UnknownSize). llvm-svn: 344012	2018-10-09 02:14:33 +00:00
Robert Lougher	0c93ea2634	[TailCallElim] Enable marking of calls with byval as tails In r339636 the alias analysis rules were changed with regards to tail calls and byval arguments. Previously, tail calls were assumed not to alias allocas from the current frame. This has been updated, to not assume this for arguments with the byval attribute. This patch aligns TailCallElim with the new rule. Tail marking can now be more aggressive and mark more calls as tails, e.g.: define void @test() { %f = alloca %struct.foo call void @bar(%struct.foo* byval %f) ret void } define void @test2(%struct.foo* byval %f) { call void @bar(%struct.foo* byval %f) ret void } define void @test3(%struct.foo* byval %f) { %agg.tmp = alloca %struct.foo %0 = bitcast %struct.foo* %agg.tmp to i8* %1 = bitcast %struct.foo* %f to i8* call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false) call void @bar(%struct.foo* byval %agg.tmp) ret void } The problematic case where a byval parameter is captured by a call is still handled correctly, and will not be marked as a tail (see PR7272). llvm-svn: 343986	2018-10-08 18:03:40 +00:00
Xin Tong	bfdad33b82	[ThinLTO] Keep non-prevailing (linkonce\|weak)_odr symbols live Summary: If we have a symbol with (linkonce\|weak)_odr linkage, we do not want to dead strip it even it is not prevailing. IR level (linkonce\|weak)_odr symbol can become non-prevailing when we mix ELF objects and IR objects where the (linkonce\|weak)_odr symbol in the ELF object is prevailing and the ones in the IR objects are not. Stripping them will prevent us from doing optimizations with them. By not dead stripping them, We will convert these symbols to available_externally linkage as a result of non-prevailing and eventually dropping them after inlining. I modified cache-prevailing.ll to use linkonce linkage as it is testing whether cache prevailing bit is effective or not, not we should treat linkonce_odr alive or not Reviewers: tejohnson, pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D52893 llvm-svn: 343970	2018-10-08 15:12:48 +00:00
Neil Henning	57f5d0a885	[IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle. The IRBuilder CreateIntrinsic method wouldn't allow you to specify the types that you wanted the intrinsic to be mangled with. To fix this I've: - Added an ArrayRef<Type > member to both CreateIntrinsic overloads. - Used that array to pass into the Intrinsic::getDeclaration call. - Added a CreateUnaryIntrinsic to replace the most common use of CreateIntrinsic where the type was auto-deduced from operand 0. - Added a bunch more unit tests to test CreateIntrinsic calls that weren't being tested (including the FMF flag that wasn't checked). This was suggested as part of the AMDGPU specific atomic optimizer review (https://reviews.llvm.org/D51969). Differential Revision: https://reviews.llvm.org/D52087 llvm-svn: 343962	2018-10-08 10:32:33 +00:00
Ewan Crawford	fa120cbdbc	[InstCombine] Fix incongruous GEP type addrspace Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR. Input: ``` define void @insertelem_after_gep(<16 x i32>* %t0) { %t1 = bitcast <16 x i32>* %t0 to [16 x i32]* %t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)* %t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0 %t4 = insertelement <16 x i32 addrspace(3)> undef, i32 addrspace(3) %t3, i32 0 call void @extern_vec_pointers_func(<16 x i32 addrspace(3)> %t4) ret void } ``` Output: ``` define void @insertelem_after_gep(<16 x i32> %t0) { %t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0 %t4 = insertelement <16 x i32 addrspace(3)> undef, i32 addrspace(3) %t3, i32 0 call void @my_extern_func(<16 x i32 addrspace(3)> %t4) ret void } ``` Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space. ``` opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32' but expected 'i32 addrspace(3)' %t4 = insertelement <16 x i32 addrspace(3)> undef, i32 addrspace(3)* %t3, i32 0 ``` I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier. Reviewers: spatel, lebedev.ri Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52294 llvm-svn: 343956	2018-10-08 08:40:45 +00:00
Max Kazantsev	b07369651e	[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160 At the point when we perform `emitTransformedIndex`, we have a broken IR (in particular, we have Phis for which not every incoming value is properly set). On such IR, it is illegal to create SCEV expressions, because their internal simplification process may try to prove some predicates and break when it stumbles across some broken IR. The only purpose of using SCEV in this particular place is attempt to simplify the generated code slightly. It seems that the result isn't worth it, because some trivial cases (like addition of zero and multiplication by 1) can be handled separately if needed, but more generally InstCombine is able to achieve the goals we want to achieve by using SCEV. This patch fixes a functional crash described in PR39160, and as side-effect it also generates a bit smarter code in some simple cases. It also may cause some optimality loss (i.e. we will now generate `mul` by power of `2` instead of shift etc), but there is nothing what InstCombine could not handle later. In case of dire need, we can support more trivial cases just in place. Note that this patch only fixes one particular case of the general problem that LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR. The general solution, however, seems complex enough. Differential Revision: https://reviews.llvm.org/D52881 Reviewed By: fhahn, hsaito llvm-svn: 343954	2018-10-08 05:46:29 +00:00
Jonas Paulsson	29d80f07ee	[LoopVectorizer] Use TTI.getOperandInfo() Call getOperandInfo() instead of using (near) duplicated code in LoopVectorizationCostModel::getInstructionCost(). This gets the OperandValueKind and OperandValueProperties values for a Value passed as operand to an arithmetic instruction. getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but is now instead a public member. Review: Florian Hahn https://reviews.llvm.org/D52883 llvm-svn: 343852	2018-10-05 14:34:04 +00:00
Neil Henning	d2261f617b	Add missing period to comment to match style of file. This is a test commit to show that my commit access is working. llvm-svn: 343842	2018-10-05 09:39:07 +00:00
Craig Topper	029d1ef6eb	[SimplifyCFG] Pass AggressiveInsts to DominatesMergePoint by reference. Remove null check. Summary: At some point in the past the recursion in DominatesMergePoint used to pass null for AggressiveInsts as part of the recursion. It no longer does this. So there is no way for AggressiveInsts to be null. This passes it by reference and removes the null check to make this explicit. Reviewers: efriedma, reames Reviewed By: efriedma Subscribers: xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D52575 llvm-svn: 343828	2018-10-04 23:40:31 +00:00
Sanjay Patel	3436dc2923	[InstCombine] drop poison flags in SimplifyVectorDemandedElts We established the (unfortunately complicated) rules for UB/poison propagation with vector ops in: D48893 D48987 D49047 It's clear from the affected tests that we are potentially creating poison where none existed before the transforms. For add/sub/mul, the answer is simple: just drop the flags because the extra undef vector lanes are generally more valuable for analysis and codegen. llvm-svn: 343819	2018-10-04 21:36:50 +00:00
Craig Topper	1d15f7b02b	[SimplifyCFG] Change recursive calls to llvm::SimplifyCFG to instead use an outer while loop to revisit. Summary: The llvm::SimplifyCFG function creates a SimplifyCFGOpt object and calls run on it. There were numerous places reached from this run function that called back out llvm::SimplifyCFG which would create another SimplifyCFGOpt object. This is an inefficient use of stack space at minimum. We are also not passing along the LoopHeaders pointer passed into the outer llvm::SimplifyCFG call. So if its not null we lose it on the first recursion and get nullptr from there on. This patch adds an outer loop around the main BasicBlock simplifying code and adds a flag to the SimplifyCFGOpt class that can be set by to request another iteration. I don't think we can iterate based just on the change flag alone since some of the simplifications delete a basic block entirely leaving nothing to iterate on. Reviewers: bogner, eli.friedman, reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52760 llvm-svn: 343816	2018-10-04 21:11:52 +00:00
Sanjay Patel	9d6688e38d	[InstCombine] reduce code duplication in SimplifyDemandedVectorElts; NFCI llvm-svn: 343806	2018-10-04 19:12:07 +00:00
Sanjay Patel	3746e11abe	[InstCombine] allow bitcast to/from FP for vector insert/extract transform This is a follow-up to rL343482 / D52439. This was a pattern that initially caused the commit to be reverted because the transform requires a bitcast as shown here. llvm-svn: 343794	2018-10-04 16:25:05 +00:00
Sanjay Patel	cafdeb1aa6	[InstCombine] allow SimplifyDemandedVectorElts to work with FP binops We're a long way from D50992 and D51553, but this is where we have to start. We weren't back-propagating undefs into binop constant values for anything but add/sub/mul/and/or/xor. This is likely because we have to be careful about not introducing UB/poison with div/rem/shift. But I suspect we already are getting the poison part wrong for add/sub/mul (although it may not be possible to expose the bug currently because we use SimplifyDemandedVectorElts from a limited set of opcodes). See the discussion/implementation from D48987 and D49047. This patch just enables functionality for FP ops because those do not have UB/poison potential. llvm-svn: 343727	2018-10-03 21:44:59 +00:00
Sanjay Patel	306f14ceb8	[InstCombine] clean up foldVectorBinop(); NFC 1. Fix include ordering. 2. Improve variable name (width is bitwidth not number-of-elements). 3. Add local Opcode variable to reduce code duplication. llvm-svn: 343694	2018-10-03 15:46:03 +00:00
Sanjay Patel	79dceb2903	[InstCombine] name change: foldShuffledBinop -> foldVectorBinop; NFC This function will deal with more than shuffles with D50992, and I have another potential per-element fold that could live here. llvm-svn: 343692	2018-10-03 15:20:58 +00:00
Florian Hahn	11a1423348	[LoopInterchange] Remove unused variable PreserveLCSSA (NFC). llvm-svn: 343676	2018-10-03 11:01:23 +00:00
Aditya Kumar	a27014b851	Improve static analysis of cold basic blocks Differential Revision: https://reviews.llvm.org/D52704 Reviewers: sebpop, tejohnson, brzycki, SirishP Reviewed By: sebpop llvm-svn: 343663	2018-10-03 06:21:05 +00:00
Aditya Kumar	9e20ade72a	Add support for new pass manager Modified the testcases to use both pass managers Use single commandline flag for both pass managers. Differential Revision: https://reviews.llvm.org/D52708 Reviewers: sebpop, tejohnson, brzycki, SirishP Reviewed By: tejohnson, brzycki llvm-svn: 343662	2018-10-03 05:55:20 +00:00
David Green	1e44c3b62c	[InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A This is an attempt to get out of a local-minimum that instcombine currently gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at least neutral. This involves using IsFreeToInvert, which has been expanded a little to include selects that can be easily inverted. This is trying to fix PR35875, using the ideas from Sanjay. It is a large improvement to one of our rgb to cmy kernels. Differential Revision: https://reviews.llvm.org/D52177 llvm-svn: 343569	2018-10-02 09:48:34 +00:00
Craig Topper	d616d33a96	[SimplifyCFG] Use Value::hasNUses instead of 'getNumUses() =='. NFCI getNumUses is linear in the number of uses. Since we're looking for a specific use count, we can use hasNUses which will stop as soon as it determines there are more than N uses instead of walking all of them. llvm-svn: 343550	2018-10-01 23:09:52 +00:00
Craig Topper	90c0a0621c	[SimplifyCFG] Update comments that refer to CondBB to say ThenBB instead. NFC There is no variable in this function named CondBB, but there is one named ThenBB and I believe the comments are all refering to it. llvm-svn: 343548	2018-10-01 22:56:11 +00:00
Eric Christopher	dcf1d97c5c	Temporarily revert "[GVNHoist] Re-enable GVNHoist by default" This reverts commit r342387 as it's showing significant performance regressions in a number of benchmarks. Followed up with the committer and original thread with an example and will get performance numbers before recommitting. llvm-svn: 343522	2018-10-01 18:57:08 +00:00
Jesper Antonsson	c954b86391	[InstCombine] Handle vector compares in foldGEPIcmp(), take 2 Summary: This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.) This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type. The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger. Reviewers: lebedev.ri, majnemer, spatel Reviewed By: lebedev.ri Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52494 llvm-svn: 343486	2018-10-01 14:59:25 +00:00
Sanjay Patel	31b07198f1	[InstCombine] try to convert vector insert+extract to trunc; 2nd try This was originally committed at rL343407, but reverted at rL343458 because it crashed trying to handle a case where the destination type is FP. This version of the patch adds a check for that possibility. Tests added at rL343480. Original commit message: This transform is requested for the backend in: https://bugs.llvm.org/show_bug.cgi?id=39016 ...but I figured it was worth doing in IR too, and it's probably easier to implement here, so that's this patch. In the simplest case, we are just truncating a scalar value. If the extract index doesn't correspond to the LSBs of the scalar, then we have to shift-right before the truncate. Endian-ness makes this tricky, but hopefully the ASCII-art helps visualize the transform. Differential Revision: https://reviews.llvm.org/D52439 llvm-svn: 343482	2018-10-01 14:40:00 +00:00
Hans Wennborg	a60aa91374	Revert r343407 "[InstCombine] try to convert vector insert+extract to trunc" This caused Chromium builds to fail with "Illegal Trunc" assertion. See https://crbug.com/890723 for repro. > This transform is requested for the backend in: > https://bugs.llvm.org/show_bug.cgi?id=39016 > ...but I figured it was worth doing in IR too, and it's probably > easier to implement here, so that's this patch. > > In the simplest case, we are just truncating a scalar value. If the > extract index doesn't correspond to the LSBs of the scalar, then we > have to shift-right before the truncate. Endian-ness makes this tricky, > but hopefully the ASCII-art helps visualize the transform. > > Differential Revision: https://reviews.llvm.org/D52439 llvm-svn: 343458	2018-10-01 12:07:45 +00:00
Florian Hahn	8600fee52e	Recommit r343308: [LoopInterchange] Turn into a loop pass. llvm-svn: 343450	2018-10-01 09:59:48 +00:00
Fangrui Song	3507c6e884	Use the container form llvm::sort(C, ...) There are a few leftovers in rL343163 which span two lines. This commit changes these llvm::sort(C.begin(), C.end, ...) to llvm::sort(C, ...) llvm-svn: 343426	2018-09-30 22:31:29 +00:00
Sanjay Patel	1e0f1f645a	[InstCombine] try to convert vector insert+extract to trunc This transform is requested for the backend in: https://bugs.llvm.org/show_bug.cgi?id=39016 ...but I figured it was worth doing in IR too, and it's probably easier to implement here, so that's this patch. In the simplest case, we are just truncating a scalar value. If the extract index doesn't correspond to the LSBs of the scalar, then we have to shift-right before the truncate. Endian-ness makes this tricky, but hopefully the ASCII-art helps visualize the transform. Differential Revision: https://reviews.llvm.org/D52439 llvm-svn: 343407	2018-09-30 14:34:01 +00:00
Sanjay Patel	26c119a9c2	[InstCombine] allow lengthening of insertelement to eliminate shuffles As noted in post-commit comments for D52548, the limitation on increasing vector length can be applied by opcode. As a first step, this patch only allows insertelement to be widened because that has no logical downsides for IR and has little risk of pessimizing codegen. This may cause PR39132 to go into hiding during a full compile, but that bug is not fixed. llvm-svn: 343406	2018-09-30 13:50:42 +00:00
Sanjay Patel	54d31ef87e	[InstCombine] fix formatting in vector evaluators; NFC We need to alter the functionality as shown in D52548. llvm-svn: 343379	2018-09-29 15:05:24 +00:00
Vitaly Buka	0509070811	[cxx2a] Fix warning triggered by r343285 llvm-svn: 343369	2018-09-29 02:17:12 +00:00
whitequark	29b2980159	Revert "[LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints" This reverts commit c4baf7c2f06ff5459c4f5998ce980346e72bff97. Broke the bots, and should really be in Transforms/Coroutines instead. llvm-svn: 343337	2018-09-28 16:45:18 +00:00
whitequark	937afbc365	[LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: mehdi_amini, modocache, llvm-commits Differential Revision: https://reviews.llvm.org/D51642 llvm-svn: 343336	2018-09-28 16:38:11 +00:00
Sanjay Patel	242f90fe82	[InstCombine] don't propagate wider shufflevector arguments to predecessors InstCombine would propagate shufflevector insts that had wider output vectors onto predecessors, which would sometimes push undef's onto the divisor of a div/rem and result in bad codegen. I've fixed this by just banning propagating shufflevector back if the result of the shufflevector is wider than the input vectors. Patch by: @sheredom (Neil Henning) Differential Revision: https://reviews.llvm.org/D52548 llvm-svn: 343329	2018-09-28 15:24:41 +00:00
Florian Hahn	8d72ecc36f	Revert r343308: [LoopInterchange] Turn into a loop pass. llvm-svn: 343310	2018-09-28 10:20:07 +00:00
Florian Hahn	0694c159f7	[LoopInterchange] Turn into a loop pass. This patch turns LoopInterchange into a loop pass. It now only considers top-level loops and tries to move the innermost loop to the optimal position within the loop nest. By only looking at top-level loops, we might miss a few opportunities the function pass would get (e.g. if we have a loop nest of 3 loops, in the function pass we might process loops at level 1 and 2 and move the inner most loop to level 1, and then we process loops at levels 0, 1, 2 and interchange again, because we now have a different inner loop). But I think it would be better to handle such cases by picking the best inner loop from the start and avoid re-visiting the same loops again. The biggest advantage of it being a function pass is that it interacts nicely with the other loop passes. Without this patch, there are some performance regressions on AArch64 with loop interchanging enabled, where no loops were interchanged, but we missed out on some other loop optimizations. It also removes the SimplifyCFG run. We are just changing branches, so the CFG should not be more complicated, besides the additional 'unique' preheaders this pass might create. Reviewers: chandlerc, efriedma, mcrosier, javed.absar, xbolva00 Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D51702 llvm-svn: 343308	2018-09-28 09:45:50 +00:00
Sanjay Patel	c3f50ff92e	[InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0) When C is not zero and infinites are not allowed (C / X) > 0 is a sign test. Depending on the sign of C, the predicate must be swapped. E.g.: foo(double X) { if ((-2.0 / X) <= 0) ... } => foo(double X) { if (X >= 0) ... } Patch by: @marels (Martin Elshuber) Differential Revision: https://reviews.llvm.org/D51942 llvm-svn: 343228	2018-09-27 15:59:24 +00:00
Teresa Johnson	f24136f17a	[WPD] Fix incorrect devirtualization after indirect call promotion Summary: Add a dominance check to ensure that the possible devirtualizable call is actually dominated by the type test/checked load intrinsic being analyzed. With PGO, after indirect call promotion is performed during the compile step, followed by inlining, we may have a type test in the promoted and inlined sequence that allows an indirect call in that sequence to be devirtualized. That indirect call (inserted by inlining after promotion) will share the same vtable pointer as the fallback indirect call that cannot be devirtualized. Before this patch the code was incorrectly devirtualizing the fallback indirect call. See the new test and the example described there for more details. Reviewers: pcc, vitalybuka Subscribers: mehdi_amini, Prazek, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D52514 llvm-svn: 343226	2018-09-27 14:55:32 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Florian Hahn	6feb637124	[LoopInterchange] Preserve LCSSA. This patch extends LoopInterchange to move LCSSA to the right place after interchanging. This is required for LoopInterchange to become a function pass. An alternative to the manual moving of the PHIs, we could also re-form the LCSSA phis for a set of interchanged loops, but that's more expensive. Reviewers: efriedma, mcrosier, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D52154 llvm-svn: 343132	2018-09-26 19:34:25 +00:00
Vyacheslav Zakharin	e06831a3b2	Remove LoopID metadata from the branch instruction that follows the peeled iterations. Differential Revision: https://reviews.llvm.org/D52176 llvm-svn: 343054	2018-09-26 01:03:21 +00:00
Zhaoshi Zheng	95710337b4	Revert "Revert "[ConstHoist] Do not rebase single (or few) dependent constant"" This reverts commit bd7b44f35ee9fbe365eb25ce55437ea793b39346. Reland r342994: disabled the optimization and explicitly enable it in test. -mllvm -consthoist-min-num-to-rebase<unsigned>=0 [ConstHoist] Do not rebase single (or few) dependent constant If an instance (InsertionPoint or IP) of Base constant A has only one or few rebased constants depending on it, do NOT rebase. One extra ADD instruction is required to materialize each rebased constant, assuming A and the rebased have the same materialization cost. Differential Revision: https://reviews.llvm.org/D52243 llvm-svn: 343053	2018-09-26 00:59:09 +00:00
Anna Thomas	b1e3d45318	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 llvm-svn: 343028	2018-09-25 20:57:20 +00:00
Jessica Paquette	e02de05b32	Revert "[ConstHoist] Do not rebase single (or few) dependent constant" This caused a couple test failures on a bot: CodeGen/X86/constant-hoisting-bfi.ll Transforms/ConstantHoisting/X86/ehpad.ll Example: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/53575/ llvm-svn: 343005	2018-09-25 18:41:40 +00:00
Zhaoshi Zheng	2c1a09188f	[ConstHoist] Do not rebase single (or few) dependent constant If an instance (InsertionPoint or IP) of Base constant A has only one or few rebased constants depending on it, do NOT rebase. One extra ADD instruction is required to materialize each rebased constant, assuming A and the rebased have the same materialization cost. Differential Revision: https://reviews.llvm.org/D52243 llvm-svn: 342994	2018-09-25 17:45:37 +00:00
Sanjay Patel	69ed4710b8	[InstCombine] narrow binops on concatenated vectors (PR33026) The motivating case from: https://bugs.llvm.org/show_bug.cgi?id=33026 ...has no shuffles now. This kind of pattern may occur during vectorization when targets have lumpy ISAs like SSE/AVX. llvm-svn: 342988	2018-09-25 15:57:37 +00:00
David Green	9108c2b921	[LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder, similar to how it is already done in the llvm::UnrollLoop. The compiler would crash if this function is called with a malformed loop. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D51486 llvm-svn: 342958	2018-09-25 10:08:47 +00:00
Evgeniy Stepanov	090f0f9504	[hwasan] Record and display stack history in stack-based reports. Summary: Display a list of recent stack frames (not a stack trace!) when tag-mismatch is detected on a stack address. The implementation uses alignment tricks to get both the address of the history buffer, and the base address of the shadow with a single 8-byte load. See the comment in hwasan_thread_list.h for more details. Developed in collaboration with Kostya Serebryany. Reviewers: kcc Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52249 llvm-svn: 342923	2018-09-24 23:03:34 +00:00
Evgeniy Stepanov	20c4999e8b	Revert "[hwasan] Record and display stack history in stack-based reports." This reverts commit r342921: test failures on clang-cmake-arm* bots. llvm-svn: 342922	2018-09-24 22:50:32 +00:00
Evgeniy Stepanov	9043e17edd	[hwasan] Record and display stack history in stack-based reports. Summary: Display a list of recent stack frames (not a stack trace!) when tag-mismatch is detected on a stack address. The implementation uses alignment tricks to get both the address of the history buffer, and the base address of the shadow with a single 8-byte load. See the comment in hwasan_thread_list.h for more details. Developed in collaboration with Kostya Serebryany. Reviewers: kcc Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52249 llvm-svn: 342921	2018-09-24 21:38:42 +00:00
Christy Lee	e94374809e	Re-submitting changes in D51550 because it failed to patch. Reviewers: javed.absar, trentxintong, courbet Reviewed By: trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52433 llvm-svn: 342919	2018-09-24 20:47:12 +00:00
Sanjay Patel	4674c7765d	[InstCombine] add bitcast+extelt helper function; NFC We can handle patterns where the elements have different sizes, so refactoring ahead of trying to add another blob within these clauses. llvm-svn: 342918	2018-09-24 20:41:22 +00:00
Sanjay Patel	7a52626a08	[InstCombine] improve variable name and use 'match'; NFC 'width' of a vector usually refers to the bit-width. https://bugs.llvm.org/show_bug.cgi?id=39016 shows a case where we could extend this fold to handle a case where the number of elements in the bitcasted vector is not equal to the resulting value. llvm-svn: 342902	2018-09-24 16:39:03 +00:00
Petar Jovanovic	c451c9ef50	[deadargelim] Update dbg.value of 'unused' parameters DeadArgElim pass marks unused function arguments as ‘undef’ without updating existing dbg.values referring to it. As a consequence the debug info metadata in the final executable was wrong. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D51968 llvm-svn: 342871	2018-09-24 10:01:24 +00:00
Eugene Leviant	2b70d616f0	[WholeProgramDevirt] Don't process declarations when building type id map Differential revision: https://reviews.llvm.org/D52175 llvm-svn: 342836	2018-09-23 13:27:47 +00:00
Sanjay Patel	09e02fbf51	[InstCombine][x86] try even harder to convert blendv intrinsic to generic IR (PR38814) Follow-up to rL342324 (D52059): Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 This is an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. llvm-svn: 342806	2018-09-22 14:43:55 +00:00
Craig Topper	2b3f5df73a	[InstCombine] Fold (min/max ~X, Y) -> ~(max/min X, ~Y) when Y is freely invertible Summary: This restores the combine that was reverted in r341883. The infinite loop from the failing test no longer occurs due to changes from r342163. Reviewers: spatel, dmgreen Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52070 llvm-svn: 342797	2018-09-22 05:53:27 +00:00
Warren Ristow	4f27730eaf	[Loop Vectorizer] Abandon vectorization when no integer IV found Support for vectorizing loops with secondary floating-point induction variables was added in r276554. A primary integer IV is still required for vectorization to be done. If an FP IV was found, but no integer IV was found at all (primary or secondary), the attempt to vectorize still went forward, causing a compiler-crash. This change abandons that attempt when no integer IV is found. (Vectorizing FP-only cases like this, rather than bailing out, is discussed as possible future work in D52327.) See PR38800 for more information. Differential Revision: https://reviews.llvm.org/D52327 llvm-svn: 342786	2018-09-21 23:03:50 +00:00
Sameer Sahasrabuddhe	0807e94951	revert changes from r342722 "[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 llvm-svn: 342743	2018-09-21 16:31:51 +00:00
Sameer Sahasrabuddhe	2de7653fd5	[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll Differential Revision: https://reviews.llvm.org/D52221 llvm-svn: 342722	2018-09-21 11:26:55 +00:00
JF Bastien	73d8e4e531	Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue Summary: his code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types. clang part of this patch: D51752 Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51751 llvm-svn: 342709	2018-09-21 05:17:42 +00:00
Xin Tong	9b7e45159f	[GlobalDCE] AvailableExternal linkage is checked in isDiscardableIfUnused [NFC]. Summary: AvailableExternal was not handled in isDiscardableIfUnused when isDiscardableIfUnused was added in r158476. Till it was handled in r247044. This is a NFC. Reviewers: pcc, tejohnson Reviewed By: tejohnson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52319 llvm-svn: 342684	2018-09-20 21:16:16 +00:00
Fedor Sergeev	ee8d31c49e	[New PM] Introducing PassInstrumentation framework Pass Execution Instrumentation interface enables customizable instrumentation of pass execution, as per "RFC: Pass Execution Instrumentation interface" posted 06/07/2018 on llvm-dev@ The intent is to provide a common machinery to implement all the pass-execution-debugging features like print-before/after, opt-bisect, time-passes etc. Here we get a basic implementation consisting of: * PassInstrumentationCallbacks class that handles registration of callbacks and access to them. * PassInstrumentation class that handles instrumentation-point interfaces that call into PassInstrumentationCallbacks. * Callbacks accept StringRef which is just a name of the Pass right now. There were some ideas to pass an opaque wrapper for the pointer to pass instance, however it appears that pointer does not actually identify the instance (adaptors and managers might have the same address with the pass they govern). Hence it was decided to go simple for now and then later decide on what the proper mental model of identifying a "pass in a phase of pipeline" is. * Callbacks accept llvm::Any serving as a wrapper for const IRUnit, to remove direct dependencies on different IRUnits (e.g. Analyses). PassInstrumentationAnalysis analysis is explicitly requested from PassManager through usual AnalysisManager::getResult. All pass managers were updated to run that to get PassInstrumentation object for instrumentation calls. * Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra args out of a generic PassManager's extra args. This is the only way I was able to explicitly run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or RepeatedPass::run. TODO: Upon lengthy discussions we agreed to accept this as an initial implementation and then get rid of getAnalysisResult by improving RepeatedPass implementation. * PassBuilder takes PassInstrumentationCallbacks object to pass it further into PassInstrumentationAnalysis. Callbacks registration should be performed directly through PassInstrumentationCallbacks. * new-pm tests updated to account for PassInstrumentationAnalysis being run * Added PassInstrumentation tests to PassBuilderCallbacks unit tests. Other unit tests updated with registration of the now-required PassInstrumentationAnalysis. Made getName helper to return std::string (instead of StringRef initially) to fix asan builtbot failures on CGSCC tests. Reviewers: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D47858 llvm-svn: 342664	2018-09-20 17:08:45 +00:00
Jesper Antonsson	719fa055d0	[InstCombine] Handle vector compares in foldGEPIcmp() Summary: This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding": https://bugs.llvm.org/show_bug.cgi?id=38984 Reviewers: majnemer, spatel, lattner, lebedev.ri Reviewed By: lebedev.ri Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52263 llvm-svn: 342647	2018-09-20 13:37:28 +00:00
Bjorn Pettersson	cd53b7f54e	[IPSCCP] Fix a problem with removing labels in a switch with undef condition Summary: Before removing basic blocks that ipsccp has considered as dead all uses of the basic block label must be removed. That is done by calling ConstantFoldTerminator on the users. An exception is when the branch condition is an undef value. In such scenarios ipsccp is using some internal assumptions regarding which edge in the control flow that should remain, while ConstantFoldTerminator don't know how to fold the terminator. The problem addressed here is related to ConstantFoldTerminator's ability to rewrite a 'switch' into a conditional 'br'. In such situations ConstantFoldTerminator returns true indicating that the terminator has been rewritten. However, ipsccp treated the true value as if the edge to the dead basic block had been removed. So the code for resolving an undef branch condition did not trigger, and we ended up with assertion that there were uses remaining when deleting the basic block. The solution is to resolve indeterminate branches before the call to ConstantFoldTerminator. Reviewers: efriedma, fhahn, davide Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52232 llvm-svn: 342632	2018-09-20 09:00:17 +00:00
Calixte Denizet	eb7f60201c	[IR] Add a boolean field in DILocation to know if a line must covered or not Summary: Some lines have a hit counter where they should not have one. For example, in C++, some cleanup is adding at the end of a scope represented by a '}'. So such a line has a hit counter where a user expects to not have one. The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp. A following patch in clang will add this information when generating IR (https://reviews.llvm.org/D49916). Reviewers: marco-c, davidxl, vsk, javed.absar, rnk Reviewed By: rnk Subscribers: eraman, xur, danielcdh, aprantl, rnk, dblaikie, #debug-info, vsk, llvm-commits, sylvestre.ledru Tags: #debug-info Differential Revision: https://reviews.llvm.org/D49915 llvm-svn: 342631	2018-09-20 08:53:06 +00:00
Eric Christopher	019889374b	Temporarily Revert "[New PM] Introducing PassInstrumentation framework" as it was causing failures in the asan buildbot. This reverts commit r342597. llvm-svn: 342616	2018-09-20 05:16:29 +00:00
Fedor Sergeev	a5f279ea89	[New PM] Introducing PassInstrumentation framework Pass Execution Instrumentation interface enables customizable instrumentation of pass execution, as per "RFC: Pass Execution Instrumentation interface" posted 06/07/2018 on llvm-dev@ The intent is to provide a common machinery to implement all the pass-execution-debugging features like print-before/after, opt-bisect, time-passes etc. Here we get a basic implementation consisting of: * PassInstrumentationCallbacks class that handles registration of callbacks and access to them. * PassInstrumentation class that handles instrumentation-point interfaces that call into PassInstrumentationCallbacks. * Callbacks accept StringRef which is just a name of the Pass right now. There were some ideas to pass an opaque wrapper for the pointer to pass instance, however it appears that pointer does not actually identify the instance (adaptors and managers might have the same address with the pass they govern). Hence it was decided to go simple for now and then later decide on what the proper mental model of identifying a "pass in a phase of pipeline" is. * Callbacks accept llvm::Any serving as a wrapper for const IRUnit, to remove direct dependencies on different IRUnits (e.g. Analyses). PassInstrumentationAnalysis analysis is explicitly requested from PassManager through usual AnalysisManager::getResult. All pass managers were updated to run that to get PassInstrumentation object for instrumentation calls. * Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra args out of a generic PassManager's extra args. This is the only way I was able to explicitly run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or RepeatedPass::run. TODO: Upon lengthy discussions we agreed to accept this as an initial implementation and then get rid of getAnalysisResult by improving RepeatedPass implementation. * PassBuilder takes PassInstrumentationCallbacks object to pass it further into PassInstrumentationAnalysis. Callbacks registration should be performed directly through PassInstrumentationCallbacks. * new-pm tests updated to account for PassInstrumentationAnalysis being run * Added PassInstrumentation tests to PassBuilderCallbacks unit tests. Other unit tests updated with registration of the now-required PassInstrumentationAnalysis. Reviewers: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D47858 llvm-svn: 342597	2018-09-19 22:42:57 +00:00
Matt Morehouse	e62fc3d0b6	[InstCombine] Disable strcmp->memcmp transform for MSan. Summary: The strcmp->memcmp transform can make the resulting memcmp read uninitialized data, which MSan doesn't like. Resolves https://github.com/google/sanitizers/issues/993. Reviewers: eugenis, xbolva00 Reviewed By: eugenis Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52272 llvm-svn: 342582	2018-09-19 19:37:24 +00:00
Fedor Sergeev	25de3f83be	Revert rL342544: [New PM] Introducing PassInstrumentation framework A bunch of bots fail to compile unittests. Reverting. llvm-svn: 342552	2018-09-19 14:54:48 +00:00
Roman Lebedev	f50023d37c	[InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask Summary: The last low-bit-mask-pattern-producing-pattern i can think of. https://rise4fun.com/Alive/UGzE <- non-canonical But we can not canonicalize it because of extra uses. https://bugs.llvm.org/show_bug.cgi?id=38123 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52148 llvm-svn: 342548	2018-09-19 13:35:46 +00:00
Roman Lebedev	ca2bdb03d6	[InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask Summary: Same as to D52146. `((1 << y)+(-1))` is simply non-canoniacal version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl We can not canonicalize it due to the extra uses. But we can handle it here. Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52147 llvm-svn: 342547	2018-09-19 13:35:40 +00:00
Roman Lebedev	183a465dc6	[InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask Summary: Two folds are happening here: 1. https://rise4fun.com/Alive/oaFX 2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4 This change doesn't just add the handling for eq/ne predicates, it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work, so all the 16 fold variants* are immediately supported. I'm indeed only testing these two predicates. I do not feel like re-proving all 16 folds, because they were already proven for the general case of constant with all-ones in low bits. So as long as the mask produces all-ones in low bits, i'm pretty sure the fold is valid. But required, i can re-prove, let me know. eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total. https://bugs.llvm.org/show_bug.cgi?id=38123 https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52146 llvm-svn: 342546	2018-09-19 13:35:27 +00:00
Fedor Sergeev	875c938fec	[New PM] Introducing PassInstrumentation framework Summary: Pass Execution Instrumentation interface enables customizable instrumentation of pass execution, as per "RFC: Pass Execution Instrumentation interface" posted 06/07/2018 on llvm-dev@ The intent is to provide a common machinery to implement all the pass-execution-debugging features like print-before/after, opt-bisect, time-passes etc. Here we get a basic implementation consisting of: * PassInstrumentationCallbacks class that handles registration of callbacks and access to them. * PassInstrumentation class that handles instrumentation-point interfaces that call into PassInstrumentationCallbacks. * Callbacks accept StringRef which is just a name of the Pass right now. There were some ideas to pass an opaque wrapper for the pointer to pass instance, however it appears that pointer does not actually identify the instance (adaptors and managers might have the same address with the pass they govern). Hence it was decided to go simple for now and then later decide on what the proper mental model of identifying a "pass in a phase of pipeline" is. * Callbacks accept llvm::Any serving as a wrapper for const IRUnit, to remove direct dependencies on different IRUnits (e.g. Analyses). PassInstrumentationAnalysis analysis is explicitly requested from PassManager through usual AnalysisManager::getResult. All pass managers were updated to run that to get PassInstrumentation object for instrumentation calls. * Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra args out of a generic PassManager's extra args. This is the only way I was able to explicitly run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or RepeatedPass::run. TODO: Upon lengthy discussions we agreed to accept this as an initial implementation and then get rid of getAnalysisResult by improving RepeatedPass implementation. * PassBuilder takes PassInstrumentationCallbacks object to pass it further into PassInstrumentationAnalysis. Callbacks registration should be performed directly through PassInstrumentationCallbacks. * new-pm tests updated to account for PassInstrumentationAnalysis being run * Added PassInstrumentation tests to PassBuilderCallbacks unit tests. Other unit tests updated with registration of the now-required PassInstrumentationAnalysis. Reviewers: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D47858 llvm-svn: 342544	2018-09-19 12:25:52 +00:00
Benjamin Kramer	e5e1ea79fd	[InstCombine] Don't transform sin/cos -> tanl if for half types This is still unsafe for long double, we will transform things into tanl even if tanl is for another type. But that's for someone else to fix. llvm-svn: 342542	2018-09-19 12:01:38 +00:00
Carlos Alberto Enciso	ba4e437c6a	[DebugInfo][Dexter] Speculated BB presents illegal variable value to debugger. When SimplifyCFG changes the PHI node into a select instruction, the debug information becomes ambiguous. It causes the debugger to display wrong variable value. Differential Revision: https://reviews.llvm.org/D51976 llvm-svn: 342527	2018-09-19 08:16:56 +00:00
Christy Lee	c85da8bd9a	Do not optimize atomic load to non-atomic memcmp Differential Revision: https://reviews.llvm.org/D51998 llvm-svn: 342498	2018-09-18 17:02:42 +00:00
Hiroshi Yamauchi	fd2c699dd6	[PGO][CHR] Add opt remarks. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52056 llvm-svn: 342495	2018-09-18 16:50:10 +00:00
Teresa Johnson	5e1c0e7610	[LTO] Make detection of WPD remark enablement more robust Summary: Currently only the first function in the module is checked to see if it has remarks enabled. If that first function is a declaration, remarks will be incorrectly skipped. Change to look for the first non-empty function. Reviewers: pcc Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51556 llvm-svn: 342477	2018-09-18 13:42:24 +00:00
whitequark	e7e14f464c	[LLVM-C][OCaml] Add UnifyFunctionExitNodes pass to C and OCaml APIs Summary: Adds LLVMAddUnifyFunctionExitNodesPass to expose createUnifyFunctionExitNodesPass to the C and OCaml APIs. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52212 llvm-svn: 342476	2018-09-18 13:36:03 +00:00
whitequark	1f50560e56	[LLVM-C][OCaml] Add LowerAtomic pass to C and OCaml APIs Summary: Adds LLVMAddLowerAtomicPass to expose createLowerAtomicPass in the C and OCaml APIs. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52211 llvm-svn: 342475	2018-09-18 13:35:50 +00:00
Max Kazantsev	0994abda3a	[IndVars] Remove unreasonable checks in rewriteLoopExitValues A piece of logic in rewriteLoopExitValues has a weird check on number of users which allowed an unprofitable transform in case if an instruction has more than 6 users. Differential Revision: https://reviews.llvm.org/D51404 Reviewed By: etherzhhb llvm-svn: 342444	2018-09-18 04:57:18 +00:00
Matt Arsenault	c640798597	LSV: Fix adjust alloca alignment trick for AMDGPU This was checking the hardcoded address space 0 for the stack. Additionally, this should be checking for legality with the adjusted alignment, so defer the alignment check. Also try to split if the unaligned access isn't allowed. llvm-svn: 342442	2018-09-18 02:05:44 +00:00
Alina Sbirlea	a782a70ad9	[EarlyCSEwMemorySSA] Add MSSA verification and tests to make EarlyCSE failures easier to track. Summary: EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result. Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA. Adding some tests to track this and a basic verify for future potential failures. Reviewers: george.burgess.iv, gberry Subscribers: sanjoy, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D51960 llvm-svn: 342422	2018-09-17 22:35:21 +00:00
Xin Tong	8a505c64b0	[CVP] Handle instructions with no user. No need to create CVPLattice state. This handles terminator instructions and more. Summary: I tested this patch by compiling sqlite3.ll (clang -O3 -mllvm -disable-llvm-optzns sqlite3.c.) opt -called-value-propagation sqlite3.ll -time-passes -f -o out.ll I get 10+% speedup for the pass. I expect some of the gain come from skipping terminator instructions. === BEFORE THE PATCH === ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 0.5562 seconds (0.5582 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2485 ( 46.4%) 0.0120 ( 57.7%) 0.2605 ( 46.8%) 0.2615 ( 46.8%) Bitcode Writer 0.1607 ( 30.0%) 0.0079 ( 37.7%) 0.1685 ( 30.3%) 0.1693 ( 30.3%) Called Value Propagation 0.1262 ( 23.6%) 0.0009 ( 4.5%) 0.1271 ( 22.9%) 0.1275 ( 22.8%) Module Verifier 0.5353 (100.0%) 0.0209 (100.0%) 0.5562 (100.0%) 0.5582 (100.0%) Total === AFTER THE PATCH === ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 0.5338 seconds (0.5355 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.2498 ( 48.6%) 0.0118 ( 59.3%) 0.2615 ( 49.0%) 0.2629 ( 49.1%) Bitcode Writer 0.1377 ( 26.8%) 0.0075 ( 37.8%) 0.1452 ( 27.2%) 0.1455 ( 27.2%) Called Value Propagation 0.1264 ( 24.6%) 0.0006 ( 3.0%) 0.1270 ( 23.8%) 0.1271 ( 23.7%) Module Verifier 0.5139 (100.0%) 0.0199 (100.0%) 0.5338 (100.0%) 0.5355 (100.0%) Total Reviewers: davide, mssimpso Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49108 llvm-svn: 342398	2018-09-17 15:28:01 +00:00
Alexandros Lamprineas	8a1c374b2e	[GVNHoist] Re-enable GVNHoist by default Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912 has been fixed by rL342055. Precommit testing performed: * Overnight runs of csmith comparing the output between programs compiled with gvn-hoist enabled/disabled. * Bootstrap builds of clang with UbSan/ASan configurations. llvm-svn: 342387	2018-09-17 12:24:55 +00:00
Max Kazantsev	5fe3620261	[NFC] Turn unsigned counters into boolean flags llvm-svn: 342360	2018-09-17 06:33:29 +00:00
Craig Topper	2da7381678	[InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y)) Summary: If the sub doesn't overflow in the original type we can move it above the sext/zext. This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported. Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52075 llvm-svn: 342335	2018-09-15 18:54:10 +00:00
Sanjay Patel	296d35a5e9	[InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814) Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 If this works, it's an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. I don't think that's too likely, but I've kept this patch minimal with a 'TODO', so we can test that theory in the wild before expanding the transform. Differential Revision: https://reviews.llvm.org/D52059 llvm-svn: 342324	2018-09-15 14:25:44 +00:00
Roman Lebedev	1b7fc87020	[InstCombine] Inefficient pattern for high-bits checking 3 (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) The last (as far i know?) pattern, non-canonical due to the extra use. https://godbolt.org/z/aCMsPk https://rise4fun.com/Alive/I6f https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52062 llvm-svn: 342321	2018-09-15 12:04:13 +00:00
Sanjay Patel	90a36346bc	[InstCombine] refactor mul narrowing folds; NFCI Similar to rL342278: The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. D52075 should be able to use this code too rather than duplicating all of the logic. llvm-svn: 342292	2018-09-14 22:23:35 +00:00
Sanjay Patel	46945b9e9d	[InstCombine] add/use overflowing math helper functions; NFC The mul case can already be refactored to use this similar to rL342278. The sub case is proposed in D52075. llvm-svn: 342289	2018-09-14 21:30:07 +00:00
Wei Mi	6a14325dff	[SampleFDO] Add FunctionOffsetTable in compact binary format profile. The patch saves a function offset table which maps function name index to the offset of its function profile to the start of the binary profile. By using the function offset table, for those function profiles which will not be used when compiling a module, the profile reader does't have to read them. For profile size around 10~20M, it saves ~10% compile time. Differential Revision: https://reviews.llvm.org/D51863 llvm-svn: 342283	2018-09-14 20:52:59 +00:00
Sanjay Patel	2426eb46dd	[InstCombine] refactor add narrowing folds; NFCI The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. llvm-svn: 342278	2018-09-14 20:40:46 +00:00
Sebastian Pop	0f30f08b02	HotColdSplit: fix invalid SSA due to outlining The test used to fail with an invalid phi node: the two predecessors were outlined and the SSA representation was left invalid. The patch adds the exit block to the cold region. llvm-svn: 342277	2018-09-14 20:36:19 +00:00
Sebastian Pop	3abcf69074	HotColdSplit: fix isSingleEntrySingleExit remove duplicate entries from isSingleEntrySingleExit: the Entry block is already added by the loop over the dominance frontier. Remove the heuristic from isOutlineCandidate that a region is too small when it only contains a basic block. With this change we now grow regions starting from a block and we continue adding to the ValidColdRegion. Check the heuristic just before code generation. llvm-svn: 342276	2018-09-14 20:36:14 +00:00
Sebastian Pop	1217160bb3	HotColdSplit: add back propagation to extend cold regions Also fix a problem in forward propagation: const TerminatorInst *TI = It->getTerminator(); was set outside the while loop that iterates over It. llvm-svn: 342275	2018-09-14 20:36:10 +00:00
Florian Hahn	3afb974aa5	[LoopInterchange] Preserve ScalarEvolution, by forgetting about interchanged loops. As preparation for LoopInterchange becoming a loop pass, it needs to preserve ScalarEvolution. Even though interchanging should not change the trip count of the loop, it modifies loop entry, latch and exit blocks. I added -verify-scev to some loop interchange tests, but the verification does not catch problems caused by missing invalidation of SE in loop interchange, as the trip counts themselves do not change. So there might be potential to make the SE verification covering more stuff in the future. Reviewers: mkazantsev, efriedma, karthikthecool Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D52026 llvm-svn: 342209	2018-09-14 07:50:20 +00:00
Max Kazantsev	e9765ac275	[NFC] Remove meaningless code from GVN llvm-svn: 342202	2018-09-14 04:50:38 +00:00
Hideki Saito	d19851ac7e	Fix for the buildbot failure http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/23635 from the commit (r342197) of https://reviews.llvm.org/D50820. llvm-svn: 342201	2018-09-14 02:02:57 +00:00
Hideki Saito	ea7f3035a0	[VPlan] Implement initial vector code generation support for simple outer loops. Summary: [VPlan] Implement vector code generation support for simple outer loops. Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following: - force vector code generation using explicit vectorize_width - add conservative early returns in cost model and other places for VPlanNativePath - add code for setting up outer loop inductions - support for widening non-induction PHIs that can result from inner loops and uniform conditional branches - support for generating uniform inner branches We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer. Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal Reviewed By: fhahn, hsaito Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D50820 llvm-svn: 342197	2018-09-14 00:36:00 +00:00
Matt Morehouse	3bea25e554	[SanitizerCoverage] Create comdat for global arrays. Summary: Place global arrays in comdat sections with their associated functions. This makes sure they are stripped along with the functions they reference, even on the BFD linker. Reviewers: eugenis Reviewed By: eugenis Subscribers: eraman, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D51902 llvm-svn: 342186	2018-09-13 21:45:55 +00:00
Roman Lebedev	6dc87004fa	[InstCombine] Inefficient pattern for high-bits checking 2 (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) More complicated, canonical pattern: https://rise4fun.com/Alive/uhA We do need to have two `switch()`'es like this, to not mismatch the swappable predicates. https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52001 llvm-svn: 342173	2018-09-13 20:33:12 +00:00
George Burgess IV	d565b0f017	[PartiallyInlineLibCalls] Add DebugCounter support This adds DebugCounter support to the PartiallyInlineLibCalls pass, which should make debugging/automated bisection easier in the future. Patch by Zhizhou Yang! Differential Revision: https://reviews.llvm.org/D50093 llvm-svn: 342172	2018-09-13 20:33:04 +00:00
George Burgess IV	7a8dc3dafa	[DCE] Add DebugCounter support Patch by Zhizhou Yang! Differential Revision: https://reviews.llvm.org/D50092 llvm-svn: 342170	2018-09-13 20:29:50 +00:00
Craig Topper	8fc05ce340	[InstCombine] Fold (xor (min/max X, Y), -1) -> (max/min ~X, ~Y) when X and Y are freely invertible. This allows the xor to be removed completely. This might help with recomitting r341674, but seems good regardless. Coincidentally fixes PR38915. Differential Revision: https://reviews.llvm.org/D51964 llvm-svn: 342163	2018-09-13 18:52:58 +00:00
Sanjay Patel	37e464876b	[InstCombine] remove checks for IsFreeToInvert() I accidentally committed this diff with rL342147 because I had applied D51964. We probably do need those checks, but D51964 has tests and more discussion/motivation, so they should be re-added with that patch. llvm-svn: 342149	2018-09-13 16:18:12 +00:00
Sanjay Patel	6f00fc3317	[InstCombine] reorder folds to reduce chance of infinite loops I don't have a test case for this, but it's motivated by the discussion in D51964, and I've added TODO comments for the better fix - move simplifications into instsimplify because that's more efficient and reduces risk of infinite loops in instcombine caused by transforms trying to do the opposite folds. In this case, we know that the transform that tries to move 'not' through min/max can be fooled by the multiple uses of a value in another min/max, so try to squash the foldSPFofSPF() patterns first. llvm-svn: 342147	2018-09-13 16:04:06 +00:00
Sanjay Patel	d341988c86	revert r341288 - [Reassociate] swap binop operands to increase factoring potential This causes or exposes indeterminism that is visible in the output of -reassociate. llvm-svn: 342083	2018-09-12 21:29:11 +00:00
Roman Lebedev	75404fb9f8	[InstCombine] Inefficient pattern for high-bits checking (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only `n` bits wide (where `n` is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) Let's handle the second variant first, since it is much simpler. https://rise4fun.com/Alive/LYjY https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51985 llvm-svn: 342067	2018-09-12 18:19:43 +00:00
Alexandros Lamprineas	54d56f274c	[GVNHoist] computeInsertionPoints() miscalculates IDF Fix for https://bugs.llvm.org/show_bug.cgi?id=38912. In GVNHoist::computeInsertionPoints() we iterate over the Value Numbers and calculate the Iterated Dominance Frontiers without clearing the IDFBlocks vector first. IDFBlocks ends up accumulating an insane number of basic blocks, which bloats the compilation time of SemaChecking.cpp with ubsan enabled. Differential Revision: https://reviews.llvm.org/D51980 llvm-svn: 342055	2018-09-12 14:28:23 +00:00
David Green	2352b30c96	[SimplifyCFG] Put an alignment on generated switch tables Previously the alignment on the newly created switch table data was not set, meaning that DataLayout::getPreferredAlignment was free to overalign it to 16 bytes. This causes unnecessary code bloat. Differential Revision: https://reviews.llvm.org/D51800 llvm-svn: 342039	2018-09-12 09:54:17 +00:00
Florian Hahn	1086ce2397	[LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC) Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use them for VPlan outside LoopVectorize.cpp Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00 Reviewed By: rengolin, xbolva00 Differential Revision: https://reviews.llvm.org/D49488 llvm-svn: 342027	2018-09-12 08:01:57 +00:00
Vikram TV	7e98d69847	Break LoopUtils into an Analysis file. Summary: The InductionDescriptor and RecurrenceDescriptor classes basically analyze the IR to identify the respective IVs. So, it is better to have them in the "Analysis" directory instead of the "Transforms" directory. The rationale for this is to make the Induction and Recurrence descriptor classes available for analysis passes. Currently including them in an analysis pass produces link error (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124456.html). Induction and Recurrence descriptors are moved from Transforms/Utils/LoopUtils.h\|cpp to Analysis/IVDescriptors.h\|cpp. Reviewers: dmgreen, llvm-commits, hfinkel Reviewed By: dmgreen Subscribers: mgorny Differential Revision: https://reviews.llvm.org/D51153 llvm-svn: 342016	2018-09-12 01:59:43 +00:00
Sanjay Patel	1cf0734b2f	[InstCombine] add folds for unsigned-overflow compares Name: op_ugt_sum %a = add i8 %x, %y %r = icmp ugt i8 %x, %a => %notx = xor i8 %x, -1 %r = icmp ugt i8 %y, %notx Name: sum_ult_op %a = add i8 %x, %y %r = icmp ult i8 %a, %x => %notx = xor i8 %x, -1 %r = icmp ugt i8 %y, %notx https://rise4fun.com/Alive/ZRxI AFAICT, this doesn't interfere with any add-saturation patterns because those have >1 use for the 'add'. But this should be better for IR analysis and codegen in the basic cases. This is another fold inspired by PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 llvm-svn: 342004	2018-09-11 22:40:20 +00:00
Alexandros Lamprineas	fe0512d575	Revert "[GVNHoist] Re-enable GVNHoist by default" This reverts rL341954. The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been failing with timeouts at stage2 clang/ubsan: [3065/3073] Linking CXX executable bin/lld command timed out: 1200 seconds without output running python ../sanitizer_buildbot/sanitizers/buildbot_selector.py, attempting to kill llvm-svn: 342001	2018-09-11 22:10:57 +00:00
Sanjay Patel	26725bdc50	[InstCombine] add folds for icmp with xor mask constant These are the folds in Alive; Name: xor_ult Pre: isPowerOf2(-C1) %xor = xor i8 %x, C1 %r = icmp ult i8 %xor, C1 => %r = icmp ugt i8 %x, ~C1 Name: xor_ugt Pre: isPowerOf2(C1+1) %xor = xor i8 %x, C1 %r = icmp ugt i8 %xor, C1 => %r = icmp ugt i8 %x, C1 https://rise4fun.com/Alive/Vty The ugt case in its simplest form was already handled by DemandedBits, but that's not ideal as shown in the multi-use test. I'm not sure if these are all of the symmetrical folds, but I adjusted the existing code for one of the folds to try to show the similarities. There's no obvious connection, but this is another preliminary step for PR14613... https://bugs.llvm.org/show_bug.cgi?id=14613 llvm-svn: 341997	2018-09-11 22:00:15 +00:00
Matt Morehouse	f0d7daa972	Revert "[SanitizerCoverage] Create comdat for global arrays." This reverts r341987 since it will cause trouble when there's a module ID collision. llvm-svn: 341995	2018-09-11 21:15:41 +00:00
Matt Morehouse	7ce6032432	[SanitizerCoverage] Create comdat for global arrays. Summary: Place global arrays in comdat sections with their associated functions. This makes sure they are stripped along with the functions they reference, even on the BFD linker. Reviewers: eugenis Reviewed By: eugenis Subscribers: eraman, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D51902 llvm-svn: 341987	2018-09-11 20:10:40 +00:00
Alina Sbirlea	a496143c9e	Update MemorySSA in LoopUnswitch. Summary: Update MemorySSA in old LoopUnswitch pass. Actual dependency and update is disabled by default. Subscribers: sanjoy, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D45301 llvm-svn: 341984	2018-09-11 19:19:21 +00:00
Sanjay Patel	342c3bcf11	[InstCombine] enhance vector demanded elements to look at a vector select condition operand I noticed that we were not back-propagating undef lanes to shuffle masks when we have a shuffle that reduces the vector width. This is part of investigating/solving PR38691: https://bugs.llvm.org/show_bug.cgi?id=38691 The DAG equivalent was proposed with: D51696 Differential Revision: https://reviews.llvm.org/D51433 llvm-svn: 341981	2018-09-11 18:49:00 +00:00
Vedant Kumar	727d89526e	[gcov] Fix branch counters with switch statements (fix PR38821) Right now, the counters are added in regards of the number of successors for a given BasicBlock: it's good when we've only 1 or 2 successors (at least with BranchInstr). But in the case of a switch statement, the BasicBlock after switch has several predecessors and we need know from which BB we're coming from. So the idea is to revert what we're doing: add a PHINode in each block which will select the counter according to the incoming BB. They're several pros for doing that: - we fix the "switch" bug - we remove the function call to "__llvm_gcov_indirect_counter_increment" and the lookup table stuff - we replace by PHINodes, so the optimizer will probably makes a better job. Patch by calixte! Differential Revision: https://reviews.llvm.org/D51619 llvm-svn: 341977	2018-09-11 18:38:34 +00:00
Craig Topper	4e63db8387	[InstCombine] Fix incorrect usage of getPrimitiveSizeInBits when we should be using the element size for vectors For vectors, getPrimitiveSizeInBits returns the full vector width. This code should using the element size for vectors. This could be fixed by calling getScalarSizeInBits, but its even easier to just get it from the APInt we're checking. Differential Revision: https://reviews.llvm.org/D51938 llvm-svn: 341971	2018-09-11 17:57:20 +00:00
Florian Hahn	5b7e21a6b7	[CallSiteSplitting] Add debug location to created PHI nodes. There are 2 cases when we create PHI nodes: * For the result of the call that was duplicated in the split blocks. Those PHI nodes should have the debug location of the call. * For values produced before the call. Those instructions need to be duplicated in the split blocks and the PHI nodes should have the debug locations of those instructions. Fixes PR37962. Reviewers: junbuml, gbedwell, vsk Reviewed By: junbuml Tags: #debug-info Differential Revision: https://reviews.llvm.org/D51919 llvm-svn: 341970	2018-09-11 17:55:58 +00:00
Matt Morehouse	40fbdd0c4f	Revert "[SanitizerCoverage] Create comdat for global arrays." This reverts r341951 due to bot breakage. llvm-svn: 341965	2018-09-11 17:20:14 +00:00
Craig Topper	12fd6bd4ad	[InstCombine] Use dyn_cast instead of match(m_Constant). NFC llvm-svn: 341962	2018-09-11 16:51:26 +00:00
Craig Topper	a57bb61a3e	[InstCombine] Support (mul (sext x), cst) --> (sext (mul x, cst')) and (mul (zext x), cst) --> (zext (mul x, cst')) for vectors constants. Similar to D51236, but for mul instead of add. Differential Revision: https://reviews.llvm.org/D51900 llvm-svn: 341961	2018-09-11 16:51:24 +00:00
Alexandros Lamprineas	db18e972d7	[GVNHoist] Re-enable GVNHoist by default Rebase rL340922 since https://bugs.llvm.org/show_bug.cgi?id=38807 has been fixed by rL341947. llvm-svn: 341954	2018-09-11 15:55:45 +00:00
Matt Morehouse	eac270caf4	[SanitizerCoverage] Create comdat for global arrays. Summary: Place global arrays in comdat sections with their associated functions. This makes sure they are stripped along with the functions they reference, even on the BFD linker. Reviewers: eugenis Reviewed By: eugenis Subscribers: eraman, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D51902 llvm-svn: 341951	2018-09-11 15:23:14 +00:00
Johannes Doerfert	ae3cfeb3ad	[FuncAttrs] Remove "access range attributes" for read-none functions The presence of readnone and an access range attribute (argmemonly, inaccessiblememonly, inaccessiblemem_or_argmemonly) is considered an error by the verifier. This seems strict but also not wrong. This patch makes sure function attribute detection will remove all access range attributes for readnone functions. llvm-svn: 341927	2018-09-11 11:51:29 +00:00
Serguei Katkov	5f4a9e9ea0	[LICM] Avoid duplicate work during building AliasSetTracker Currently we re-use cached info from sub loops or traverse them to populate AliasSetTracker. But after that we traverse all basic blocks from the current loop. This is redundant work. All what we need is traversing the all basic blocks from the loop except those which are used to get the data from the cache. This should improve compile time only. Reviewers: mkazantsev, reames, kariddi, anna Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51715 llvm-svn: 341896	2018-09-11 04:07:36 +00:00
Max Kazantsev	e6413919ce	[IndVars][NFC] Refactor to make modifications of Changed transparent IndVarSimplify's design is somewhat odd in the way how it reports that some transform has made a change. It has a `Changed` field which can be set from within any function, which makes it hard to track whether or not it was set properly after a transform was made. It leads to oversights in setting this flag where needed, see example in PR38855. This patch removes the `Changed` field, turns it into a local and unifies the signatures of all relevant transform functions to return boolean value which designates whether or not this transform has made a change. Differential Revision: https://reviews.llvm.org/D51850 Reviewed By: skatkov llvm-svn: 341893	2018-09-11 03:57:22 +00:00
Philip Reames	1f52e38e8e	[LICM] (re-)simplify code using MemoryLocation API [NFC] I'd made exactly this same change before, but it appears to have been accidentally reverted in another change. (I'm assuming accidental since it was without comment or test case, and in an unrelated change.) llvm-svn: 341892	2018-09-11 03:28:28 +00:00
Alina Sbirlea	116caa2920	[InstCombine] Partially revert rL341674 due to PR38897. Summary: Revert min/max changes in rL341674 dues to high compile times causing timeouts (PR38897). Checking in to unblock failing builds. Patch available for post-commit review and re-revert once resolved. Working on a smaller reproducer for PR38897. Reviewers: craig.topper, spatel Subscribers: sanjoy, jlebar, llvm-commits Differential Revision: https://reviews.llvm.org/D51897 llvm-svn: 341883	2018-09-10 23:47:21 +00:00
Sanjay Patel	691d1a40e2	[InstCombine] use SelectInst operand names to make code clearer; NFC Cleanup step for D51433. llvm-svn: 341850	2018-09-10 18:37:59 +00:00
Sebastian Pop	a1f20fcc96	HotColdSplitting: check that target supports cold calling convention Before tagging a function with coldcc make sure the target supports cold calling convention. Without this patch HotColdSplitting pass fails on aarch64 with: fatal error: error in backend: Unsupported calling convention. llvm-svn: 341838	2018-09-10 15:08:02 +00:00
Sebastian Pop	6284b54ccd	add flag instead of using a constant [NFC] llvm-svn: 341837	2018-09-10 15:07:59 +00:00
Sebastian Pop	ea0a91298d	make flag name more specific to gvn [NFC] llvm-svn: 341836	2018-09-10 15:07:56 +00:00
Tim Northover	12c1f7675f	InstCombine: move hasOneUse check to the top of foldICmpAddConstant There were two combines not covered by the check before now, neither of which actually differed from normal in the benefit analysis. The most recent seems to be because it was just added at the top of the function (naturally). The older is from way back in 2008 (r46687) when we just didn't put those checks in so routinely, and has been diligently maintained since. llvm-svn: 341831	2018-09-10 14:26:44 +00:00
Benjamin Kramer	28559a2605	Don't create a temporary vector of loop blocks just to iterate over them. Loop's getBlocks returns an ArrayRef. llvm-svn: 341821	2018-09-10 12:32:06 +00:00
John Brawn	8967e18c4a	[GVN] Invalidate cached info for values replaced by equality propagation When GVN propagates an equality by replacing one value with another it also needs to invalidate the cached information for the value being replaced. Differential Revision: https://reviews.llvm.org/D51218 llvm-svn: 341820	2018-09-10 12:23:05 +00:00
Max Kazantsev	fde88578e5	[IndVars] Set Changed if rewriteFirstIterationLoopExitValues changes IR. PR38863 Currently, `rewriteFirstIterationLoopExitValues` does not set Changed flag even if it makes changes in the IR. There is no clear evidence that it can cause a crash, but it looks highly suspicious and likely invalid. Differential Revision: https://reviews.llvm.org/D51779 Reviewed By: skatkov llvm-svn: 341779	2018-09-10 06:50:16 +00:00
Max Kazantsev	4d10ba37b9	[IndVars] Set Changed if sinkUnusedInvariants changes IR. PR38863 Currently, `sinkUnusedInvariants` does not set Changed flag even if it makes changes in the IR. There is no clear evidence that it can cause a crash, but it looks highly suspicious and likely invalid. Differential Revision: https://reviews.llvm.org/D51777 Reviewed By: skatkov llvm-svn: 341777	2018-09-10 06:32:00 +00:00
Vikram TV	09be521d4d	Move a transformation routine from LoopUtils to LoopVectorize. Summary: Move InductionDescriptor::transform() routine from LoopUtils to its only uses in LoopVectorize.cpp. Specifically, the function is renamed as InnerLoopVectorizer::emitTransformedIndex(). This is a child to D51153. Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51837 llvm-svn: 341776	2018-09-10 06:16:44 +00:00
Vikram TV	6594dc377d	Move createMinMaxOp() out of RecurrenceDescriptor. Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51838 llvm-svn: 341773	2018-09-10 05:05:08 +00:00
Abderrazek Zaafrani	c30dfb2dfc	[SimplifyIndVar] Avoid generating truncate instructions with non-hoisted Laod operand. Differential Revision: https://reviews.llvm.org/D49151 llvm-svn: 341726	2018-09-07 22:41:57 +00:00
Alina Sbirlea	f98c2c5e6d	[MemorySSA] Update MemoryPhi wiring for block splitting to consider if identical edges were merged. Summary: Block splitting is done with either identical edges being merged, or not. Only critical edges can be split without merging identical edges based on an option. Teach the memoryssa updater to take this into account: for the same edge between two blocks only move one entry from the Phi in Old to the new Phi in New. Reviewers: george.burgess.iv Subscribers: sanjoy, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D51563 llvm-svn: 341709	2018-09-07 21:14:48 +00:00
Sanjay Patel	c1416b60f2	[InstCombine] narrow vector select with padded condition and extracted result (PR38691) shuf (sel (shuf NarrowCond, undef, WideMask), X, Y), undef, NarrowMask) --> sel NarrowCond, (shuf X, undef, NarrowMask), (shuf Y, undef, NarrowMask) The motivating case from: https://bugs.llvm.org/show_bug.cgi?id=38691 ...is the last regression test. In that case, we're just left with the narrow select. Note that if we do create new shuffles, they use the existing extraction identity mask, so there's no danger that this transform creates arbitrary shuffles. Differential Revision: https://reviews.llvm.org/D51496 llvm-svn: 341708	2018-09-07 21:03:34 +00:00
Fangrui Song	b3b61de09a	[PGO] Fix some style issue of ControlHeightReduction Reviewers: yamauchi Reviewed By: yamauchi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51811 llvm-svn: 341702	2018-09-07 20:23:15 +00:00
Hiroshi Yamauchi	06650941a2	[PGO][CHR] Build/warning fix llvm-svn: 341692	2018-09-07 18:44:53 +00:00
JF Bastien	7e2dd2d2fa	NFC: remove magic bool in LoopIdiomRecognize Use an enum class instead. llvm-svn: 341684	2018-09-07 18:17:59 +00:00
Hiroshi Yamauchi	5fb509b763	[PGO][CHR] Small cleanup. Summary: Do away with demangling. It wasn't really necessary. Declared some local functions to be static. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51740 llvm-svn: 341681	2018-09-07 18:00:58 +00:00
Craig Topper	040c2b0acf	[InstCombine] Fold (min/max ~X, Y) -> ~(max/min X, ~Y) when Y is freely invertible If the ~X wasn't able to simplify above the max/min, we might be able to simplify it by moving it below the max/min. I had to modify the ~(min/max ~X, Y) transform to prevent getting stuck in a loop when we saw the new ~(max/min X, ~Y) before the ~Y had been folded away to remove the new not. Differential Revision: https://reviews.llvm.org/D51398 llvm-svn: 341674	2018-09-07 16:19:50 +00:00
Anna Thomas	110df11a1a	[LV] Fix code gen for conditionally executed loads and stores Fix a latent bug in loop vectorizer which generates incorrect code for memory accesses that are executed conditionally. As pointed in review, this bug definitely affects uniform loads and may affect conditional stores that should have turned into scatters as well). The code gen for conditionally executed uniform loads on architectures that support masked gather instructions is broken. Without this patch, we were unconditionally executing the conditional load in the vectorized version. This patch does the following: 1. Uniform conditional loads on architectures with gather support will have correct code generated. In particular, the cost model (setCostBasedWideningDecision) is fixed. 2. For the recipes which are handled after the widening decision is set, we use the isScalarWithPredication(I, VF) form which is added in the patch. 3. Fix the vectorization cost model for scalarization (getMemInstScalarizationCost): implement and use isPredicatedInst to identify all predicated instructions, not just scalar+predicated. So, now the cost for scalarization will be increased for maskedloads/stores and gather/scatter operations. In short, we should be choosing the gather/scatter in place of scalarization on archs where it is profitable. 4. We needed to weaken the assert in useEmulatedMaskMemRefHack. Reviewers: Ayal, hsaito, mkuper Differential Revision: https://reviews.llvm.org/D51313 llvm-svn: 341673	2018-09-07 15:53:48 +00:00
Aditya Kumar	801394a3d7	Hot cold splitting pass Find cold blocks based on profile information (or optionally with static analysis). Forward propagate profile information to all cold-blocks. Outline a cold region. Set calling conv and prof hint for the callsite of the outlined function. Worked in collaboration with: Sebastian Pop <s.pop@samsung.com> Differential Revision: https://reviews.llvm.org/D50658 llvm-svn: 341669	2018-09-07 15:03:49 +00:00
Florian Hahn	e32ff4b28a	[InstCombine] Do not fold scalar ops over select with vector condition. If OtherOpT or OtherOpF have scalar types and the condition is a vector, we would create an invalid select. Reviewers: spatel, john.brawn, mssimpso, craig.topper Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D51781 llvm-svn: 341666	2018-09-07 14:40:06 +00:00
Florian Hahn	b30f7aeeeb	[NewGVN] Mark function as changed if we erase instructions. Currently eliminateInstructions only returns true if any instruction got replaced. In the test case for this patch, we eliminate the trivially dead calls, for which eliminateInstructions not do a replacement and the function is not marked as changed, which is why the inliner crashes while traversing the call graph. Alternatively we could also change eliminateInstructions to return true in case we mark instructions for deletion, but that's slightly more code and doing it at the place where the replacement happens seems safer. Fixes PR37517. Reviewers: davide, mcrosier, efriedma, bjope Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D51169 llvm-svn: 341651	2018-09-07 11:41:34 +00:00
Alexander Potapenko	6301574cb4	[MSan] don't access MsanCtorFunction when using KMSAN MSan has found a use of uninitialized memory in MSan, fix it. llvm-svn: 341646	2018-09-07 09:56:36 +00:00
Alexander Potapenko	8fe99a0ef2	[MSan] Add KMSAN instrumentation to MSan pass Introduce the -msan-kernel flag, which enables the kernel instrumentation. The main differences between KMSAN and MSan instrumentations are: - KMSAN implies msan-track-origins=2, msan-keep-going=true; - there're no explicit accesses to shadow and origin memory. Shadow and origin values for a particular X-byte memory location are read and written via pointers returned by __msan_metadata_ptr_for_load_X(u8 addr) and __msan_store_shadow_origin_X(u8 addr, uptr shadow, uptr origin); - TLS variables are stored in a single struct in per-task storage. A call to a function returning that struct is inserted into every instrumented function before the entry block; - __msan_warning() takes a 32-bit origin parameter; - local variables are poisoned with __msan_poison_alloca() upon function entry and unpoisoned with __msan_unpoison_alloca() before leaving the function; - the pass doesn't declare any global variables or add global constructors to the translation unit. llvm-svn: 341637	2018-09-07 09:10:30 +00:00
Max Kazantsev	9e6845d8e1	[IndVars] Set Changed when we delete dead instructions. PR38855 IndVars does not set `Changed` flag when it eliminates dead instructions. As result, it may make IR modifications and report that it has done nothing. It leads to inconsistent preserved analyzes results. Differential Revision: https://reviews.llvm.org/D51770 Reviewed By: skatkov llvm-svn: 341633	2018-09-07 07:23:39 +00:00
Wei Mi	94d44c97bc	[SampleFDO] Make sample profile loader unaware of compact format change. The patch tries to make sample profile loader independent of profile format change. It moves compact format related code into FunctionSamples and SampleProfileReader classes, and sample profile loader only has to interact with those two classes and will be unaware of profile format changes. The cleanup also contain some fixes to further remove the difference between compactbinary format and binary format. After the cleanup using different formats originated from the same profile will generate the same binaries, which we verified by compiling two large server benchmarks w/wo thinlto. Differential Revision: https://reviews.llvm.org/D51643 llvm-svn: 341591	2018-09-06 22:03:37 +00:00
Sanjay Patel	93bd15a005	[InstCombine] add xor+not folds This fold is needed to avoid a regression when we try to recommit rL300977. We can't see the most basic win currently because demanded bits changes the patterns: https://rise4fun.com/Alive/plpp llvm-svn: 341559	2018-09-06 16:23:40 +00:00
Alexander Potapenko	7f270fcf0a	[MSan] store origins for variadic function parameters in __msan_va_arg_origin_tls Add the __msan_va_arg_origin_tls TLS array to keep the origins for variadic function parameters. Change the instrumentation pass to store parameter origins in this array. This is a reland of r341528. test/msan/vararg.cc doesn't work on Mips, PPC and AArch64 (because this patch doesn't touch them), XFAIL these arches. Also turned out Clang crashed on i80 vararg arguments because of incorrect origin type returned by getOriginPtrForVAArgument() - fixed it and added a test. llvm-svn: 341554	2018-09-06 15:14:36 +00:00
Sanjay Patel	1a00ffd656	[InstCombine] fix formatting in SimplifyDemandedVectorElts->Select; NFCI I'm preparing to add the same functionality both here and to the DAG version of this code in D51696 / D51433, so try to make those cases as similar as possible to avoid bugs. llvm-svn: 341545	2018-09-06 13:19:22 +00:00
Alexander Potapenko	ac6595bd53	[MSan] revert r341528 to unbreak the bots llvm-svn: 341541	2018-09-06 12:19:27 +00:00
Florian Hahn	c51d08825f	[LoopInterchange] Cleanup unused variables. llvm-svn: 341537	2018-09-06 10:41:01 +00:00
Florian Hahn	236f6feb8f	[LoopInterchange] Move preheader creation to transform stage and simplify. There is no need to create preheaders in the analysis stage, we only need them when adjusting the branches. Also, the only cases we need to create our own preheaders is when they have more than 1 predecessors or PHI nodes (even with only 1 predecessor, we could have an LCSSA phi node). I have simplified the conditions and added some assertions to be sure. Because we know the inner and outer loop need to be tightly nested, it is sufficient to check if the inner loop preheader is the outer loop header to check if we need to create a new preheader. Reviewers: efriedma, mcrosier, karthikthecool Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D51703 llvm-svn: 341533	2018-09-06 09:57:27 +00:00
Alexander Potapenko	1a10ae0def	[MSan] store origins for variadic function parameters in __msan_va_arg_origin_tls Add the __msan_va_arg_origin_tls TLS array to keep the origins for variadic function parameters. Change the instrumentation pass to store parameter origins in this array. llvm-svn: 341528	2018-09-06 08:50:11 +00:00
Alexander Potapenko	d518c5fc87	[MSan] Make sure variadic function arguments do not overflow __msan_va_arg_tls Turns out that calling a variadic function with too many (e.g. >100 i64's) arguments overflows __msan_va_arg_tls, which leads to smashing other TLS data with function argument shadow values. getShadow() already checks for kParamTLSSize and returns clean shadow if the argument does not fit, so just skip storing argument shadow for such arguments. llvm-svn: 341525	2018-09-06 08:21:54 +00:00
Max Kazantsev	f90154069c	Revert "[IndVars] Turn isValidRewrite into an assertion" because it seems wrong llvm-svn: 341517	2018-09-06 05:52:47 +00:00
Max Kazantsev	51690c4f52	[IndVars] Turn isValidRewrite into an assertion Function rewriteLoopExitValues contains a check on isValidRewrite which is needed to make sure that SCEV does not convert the pattern `gep Base, (&p[n] - &p[0])` into `gep &p[n], Base - &p[0]`. This problem has been fixed in SCEV long ago, so this check is just obsolete. This patch converts it into an assertion to make sure that the SCEV will not mess up this case in the future. Differential Revision: https://reviews.llvm.org/D51582 Reviewed By: atrick llvm-svn: 341516	2018-09-06 05:21:25 +00:00
Benjamin Kramer	9abad4814d	[ControlHeightReduction] Remove unused includes Also clang-format them. llvm-svn: 341468	2018-09-05 13:51:05 +00:00
Benjamin Kramer	837f93af7d	[Aggressive InstCombine] Move C bindings to their own header file. llvm-svn: 341461	2018-09-05 11:41:12 +00:00
Richard Trieu	47c2bc58b3	Prevent unsigned overflow. The sum of the weights is caculated in an APInt, which has a width smaller than 64. In certain cases, the sum of the widths would overflow when calculations are done inside an APInt, but would not if done with uint64_t. Since the values will be passed as uint64_t in the function call anyways, do all the math in 64 bits. Also added an assert in case the probabilities overflow 64 bits. llvm-svn: 341444	2018-09-05 04:19:15 +00:00
Fangrui Song	c8f348cba7	Fix -Wunused-function in release build after rL341386 llvm-svn: 341443	2018-09-05 03:10:20 +00:00
Sanjay Patel	63cf26cf01	[InstCombine] fix xor-or-xor fold to check uses and handle commutes I'm probably missing some way to use m_Deferred to remove the code duplication, but that can be a follow-up. The improvement in demand_shrink_nsw.ll is an example of missing the fold because the pattern matching was deficient. I didn't try to follow the bits in that test, but Alive says it's correct: https://rise4fun.com/Alive/ugc llvm-svn: 341426	2018-09-04 23:22:13 +00:00
Zhaoshi Zheng	a0aa41d793	Revert "Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions" Reland r341269. Use std::stable_sort when sorting constant condidates. Reverting commit, r341365: Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions One of the tests is failing 50% of the time when expensive checks are enabled. Not sure how deep the problem is so just reverting while the author can investigate so that the bots stop repeatedly failing and blaming things incorrectly. Will respond with details on the original commit. Original commit, r341269: [Constant Hoisting] Hoisting Constant GEP Expressions Leverage existing logic in constant hoisting pass to transform constant GEP expressions sharing the same base global variable. Multi-dimensional GEPs are rewritten into single-dimensional GEPs. https://reviews.llvm.org/D51396 Differential Revision: https://reviews.llvm.org/D51654 llvm-svn: 341417	2018-09-04 22:17:03 +00:00
Anna Thomas	dbacea188b	[LV] First order recurrence phis should not be treated as uniform This is fix for PR38786. First order recurrence phis were incorrectly treated as uniform, which caused them to be vectorized as uniform instructions. Patch by Ayal Zaks and Orivej Desh! Reviewed by: Anna Differential Revision: https://reviews.llvm.org/D51639 llvm-svn: 341416	2018-09-04 22:12:23 +00:00
Hiroshi Yamauchi	bd897a02a0	Fix a memory leak after rL341386. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51658 llvm-svn: 341412	2018-09-04 21:28:22 +00:00
Sanjay Patel	0f70f86ce0	[InstCombine] make ((X & C) ^ C) form consistent for vectors It would be better to create a 'not' here, but that's not possible yet. llvm-svn: 341410	2018-09-04 21:17:14 +00:00
Sanjay Patel	a89f183253	[InstCombine] simplify code for xor folds; NFCI This is just a cleanup step. The TODO comments show what is wrong with the 'and' version of the fold. Fixing this should be part of recommitting: rL300977 llvm-svn: 341405	2018-09-04 21:00:13 +00:00
Reid Kleckner	792a4f8a21	Fix unused variable warning llvm-svn: 341400	2018-09-04 20:34:47 +00:00
Fedor Sergeev	8b6effd969	[SimpleLoopUnswitch] remove a chain of dead blocks at once Recent change to deleteDeadBlocksFromLoop was not enough to fix all the problems related to dead blocks after nontrivial unswitching of switches. We need to delete all the dead blocks that were created during unswitching, otherwise we will keep having problems with phi's or dead blocks. This change removes all the dead blocks that are reachable from the loop, not trying to track whether these blocks are newly created by unswitching or not. While not completely correct, we are unlikely to get loose but reachable dead blocks that do not belong to our loop nest. It does fix all the failures currently known, in particular PR38778. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D51519 llvm-svn: 341398	2018-09-04 20:19:41 +00:00
Hiroshi Yamauchi	72ee6d6000	Fix build failures after rL341386. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51647 llvm-svn: 341391	2018-09-04 18:10:54 +00:00
Hiroshi Yamauchi	9775a620b0	[PGO] Control Height Reduction Summary: Control height reduction merges conditional blocks of code and reduces the number of conditional branches in the hot path based on profiles. if (hot_cond1) { // Likely true. do_stg_hot1(); } if (hot_cond2) { // Likely true. do_stg_hot2(); } -> if (hot_cond1 && hot_cond2) { // Hot path. do_stg_hot1(); do_stg_hot2(); } else { // Cold path. if (hot_cond1) { do_stg_hot1(); } if (hot_cond2) { do_stg_hot2(); } } This speeds up some internal benchmarks up to ~30%. Reviewers: davidxl Reviewed By: davidxl Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D50591 llvm-svn: 341386	2018-09-04 17:19:13 +00:00
Chandler Carruth	6cb12444cc	Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions One of the tests is failing 50% of the time when expensive checks are enabled. Not sure how deep the problem is so just reverting while the author can investigate so that the bots stop repeatedly failing and blaming things incorrectly. Will respond with details on the original commit. llvm-svn: 341365	2018-09-04 13:36:44 +00:00
Chandler Carruth	664aa868f5	[x86/SLH] Add a real Clang flag and LLVM IR attribute for Speculative Load Hardening. Wires up the existing pass to work with a proper IR attribute rather than just a hidden/internal flag. The internal flag continues to work for now, but I'll likely remove it soon. Most of the churn here is adding the IR attribute. I talked about this Kristof Beyls and he seemed at least initially OK with this direction. The idea of using a full attribute here is that we do expect at least some forms of this for other architectures. There isn't anything inherently x86-specific about this technique, just that we only have an implementation for x86 at the moment. While we could potentially expose this as a Clang-level attribute as well, that seems like a good question to defer for the moment as it isn't 100% clear whether that or some other programmer interface (or both?) would be best. We'll defer the programmer interface side of this for now, but at least get to the point where the feature can be enabled without relying on implementation details. This also allows us to do something that was really hard before: we can enable just the indirect call retpolines when using SLH. For x86, we don't have any other way to mitigate indirect calls. Other architectures may take a different approach of course, and none of this is surfaced to user-level flags. Differential Revision: https://reviews.llvm.org/D51157 llvm-svn: 341363	2018-09-04 12:38:00 +00:00
Nicola Zaghen	9588ad9611	[InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2) Support for sgt/slt was added in rL294898, this adds the same cases also for unsigned compares. This is the Alive proof: https://rise4fun.com/Alive/nyY Differential Revision: https://reviews.llvm.org/D50972 llvm-svn: 341353	2018-09-04 10:29:48 +00:00
Max Kazantsev	f34115c627	[NFC] Add assert to detect LCSSA breaches early llvm-svn: 341347	2018-09-04 06:34:40 +00:00
Max Kazantsev	2cbba56337	[IndVars] Fix usage of SCEVExpander to not mess with SCEVConstant. PR38674 This patch removes the function `expandSCEVIfNeeded` which behaves not as it was intended. This function tries to make a lookup for exact existing expansion and only goes to normal expansion via `expandCodeFor` if this lookup hasn't found anything. As a result of this, if some instruction above the loop has a `SCEVConstant` SCEV, this logic will return this instruction when asked for this `SCEVConstant` rather than return a constant value. This is both non-profitable and in some cases leads to breach of LCSSA form (as in PR38674). Whether or not it is possible to break LCSSA with this algorithm and with some non-constant SCEVs is still in question, this is still being investigated. I wasn't able to construct such a test so far, so maybe this situation is impossible. If it is, it will go as a separate fix. Rather than do it, it is always correct to just invoke `expandCodeFor` unconditionally: it behaves smarter about insertion points, and as side effect of this it will choose a constant value for SCEVConstants. For other SCEVs it may end up finding a better insertion point. So it should not be worse in any case. NOTE: So far the only known case for which this transform may break LCSSA is mapping of SCEVConstant to an instruction. However there is a suspicion that the entire algorithm can compromise LCSSA form for other cases as well (yet not proved). Differential Revision: https://reviews.llvm.org/D51286 Reviewed By: etherzhhb llvm-svn: 341345	2018-09-04 05:01:35 +00:00
Sanjay Patel	2fe1f62c88	[InstCombine] simplify xor/not folds; NFCI llvm-svn: 341336	2018-09-03 18:40:56 +00:00
Sanjay Patel	d75064e6d5	[InstCombine] allow add+not --> sub for arbitrary vector constants. llvm-svn: 341335	2018-09-03 18:21:59 +00:00
Florian Hahn	cc9dc599ba	[SLC] Support expanding pow(x, n+0.5) to x * x * ... * sqrt(x) Reviewers: evandro, efriedma, spatel Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D51435 llvm-svn: 341330	2018-09-03 17:37:39 +00:00
Argyrios Kyrtzidis	c30340b207	Add header guards to some headers that are missing them Also adjust some of dsymutil's headers to put the header guards at the top, otherwise the compiler will not recognize them as header guards. llvm-svn: 341323	2018-09-03 16:22:05 +00:00
Sanjay Patel	17e709b66a	[InstCombine] allow not+sub fold for arbitrary vector constants The fold was implemented for the general case but use-limitation, but the later constant version which didn't check uses was only matching splat constants. llvm-svn: 341292	2018-09-02 19:31:45 +00:00
Sanjay Patel	ca36eb4e33	[Reassociate] swap binop operands to increase factoring potential If we have a pair of binops feeding another pair of binops, rearrange the operands so the matching pair are together because that allows easy factorization folds to happen in instcombine: ((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation) --> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization) This is part of solving PR37098: https://bugs.llvm.org/show_bug.cgi?id=37098 Note that there's an instcombine version of this patch attached there, but we're trying to make instcombine have less responsibility to improve compile-time efficiency. For reasons I still don't completely understand, reassociate does this kind of transform sometimes, but misses everything in my motivating cases. This patch on its own is gluing an independent cleanup chunk to the end of the existing RewriteExprTree() loop. We can build on it and do something stronger to better order the full expression tree like D40049. That might be an alternative to the proposal to add a separate reassociation pass like D41574. Differential Revision: https://reviews.llvm.org/D45842 llvm-svn: 341288	2018-09-02 14:22:54 +00:00
Sanjay Patel	099b1a4b0c	[InstCombine] simplify code for 'or' fold This is no-outwardly-visible-change intended, so no test. But the code is smaller and more efficient. The check for a 'not' op is intended to avoid the expensive value tracking call when it should not be necessary, and it might prevent infinite looping when we resurrect: rL300977 llvm-svn: 341280	2018-09-01 15:08:59 +00:00
Zhaoshi Zheng	f5297fb24b	[Constant Hoisting] Hoisting Constant GEP Expressions Leverage existing logic in constant hoisting pass to transform constant GEP expressions sharing the same base global variable. Multi-dimensional GEPs are rewritten into single-dimensional GEPs. Differential Revision: https://reviews.llvm.org/D51396 llvm-svn: 341269	2018-09-01 00:04:56 +00:00
Matt Arsenault	c807ce0ee4	SLPVectorizer: Fix assert with different sized address spaces llvm-svn: 341215	2018-08-31 14:34:53 +00:00
Evandro Menezes	2123ea7d5c	[InstCombine] Expand the simplification of pow() into exp2() Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all scalar and vector types. This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as 252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on x86-64 or A64. Differential revision: https://reviews.llvm.org/D49273 llvm-svn: 341095	2018-08-30 19:04:51 +00:00
Eli Friedman	94d3e4dd77	[SROA] Fix alignment for uses of PHI nodes. Splitting an alloca can decrease the alignment of GEPs into the partition. Normally, rewriting accounts for this, but the code was missing for uses of PHI nodes and select instructions. Fixes https://bugs.llvm.org/show_bug.cgi?id=38707 . Differential Revision: https://reviews.llvm.org/D51335 llvm-svn: 341094	2018-08-30 18:59:24 +00:00
Matt Morehouse	7e042bb1d1	[libFuzzer] Port to Windows Summary: Port libFuzzer to windows-msvc. This patch allows libFuzzer targets to be built and run on Windows, using -fsanitize=fuzzer and/or fsanitize=fuzzer-no-link. It allows these forms of coverage instrumentation to work on Windows as well. It does not fix all issues, such as those with -fsanitize-coverage=stack-depth, which is not usable on Windows as of this patch. It also does not fix any libFuzzer integration tests. Nearly all of them fail to compile, fixing them will come in a later patch, so libFuzzer tests are disabled on Windows until them. Patch By: metzman Reviewers: morehouse, rnk Reviewed By: morehouse, rnk Subscribers: #sanitizers, delcypher, morehouse, kcc, eraman Differential Revision: https://reviews.llvm.org/D51022 llvm-svn: 341082	2018-08-30 15:54:44 +00:00
Nicolai Haehnle	35617ed4cb	[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433). The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes. Patch by: Simon Moll Reviewed By: nhaehnle Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50434 Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb llvm-svn: 341071	2018-08-30 14:21:36 +00:00
Martin Storsjo	22dcddf651	Revert "[SimplifyCFG] Common debug handling [NFC]" This reverts commit r340997. This change turned out not to be NFC after all, but e.g. causes clang to crash when building the linux kernel for aarch64. llvm-svn: 341031	2018-08-30 08:06:50 +00:00
Max Kazantsev	d3a4cbe153	[NFC] Move OrderedInstructions and InstructionPrecedenceTracking to Analysis These classes don't make any changes to IR and have no reason to be in Transform/Utils. This patch moves them to Analysis folder. This will allow us reusing these classes in some analyzes, like MustExecute. llvm-svn: 341015	2018-08-30 04:49:03 +00:00
Max Kazantsev	3c284bde3f	Re-enable "[NFC] Unify guards detection" rL340921 has been reverted by rL340923 due to linkage dependency from Transform/Utils to Analysis which is not allowed. In this patch this has been fixed, a new utility function moved to Analysis. Differential Revision: https://reviews.llvm.org/D51152 llvm-svn: 341014	2018-08-30 03:39:16 +00:00
Philip Reames	bed556133d	[SimplifyCFG] Rename a variable for readibility of a future change [NFC] llvm-svn: 341004	2018-08-30 00:12:29 +00:00
Philip Reames	6bd16b5850	[SimplifyCFG] Fix a cost modeling oversight in branch commoning The cost modeling was not accounting for the fact we were duplicating the instruction once per predecessor. With a default threshold of 1, this meant we were actually creating #pred copies. Adding to the fun, there is absolutely no test coverage for this. Simply bailing for more than one predecessor passes all checked in tests. llvm-svn: 341001	2018-08-30 00:03:02 +00:00
Philip Reames	7c57dac955	[SimplifyCFG] Common debug handling [NFC] llvm-svn: 340997	2018-08-29 23:22:07 +00:00
Reid Kleckner	9397c2a23b	Revert r340947 "[InstCombine] Expand the simplification of pow() into exp2()" It broke the clang-cl self-host. llvm-svn: 340991	2018-08-29 22:58:33 +00:00
Philip Reames	1887c40b22	Add a todo and tests to Address a review commnt from D50925 [NFC] llvm-svn: 340978	2018-08-29 22:09:21 +00:00
Philip Reames	f562fc8dbf	[LICM] Hoist stores of invariant values to invariant addresses out of loops Teach LICM to hoist stores out of loops when the store writes to a location otherwise unused in the loop, writes a value which is invariant, and is guaranteed to execute if the loop is entered. Worth noting is that this transformation is partially overlapping with the existing promotion transformation. Reasons this is worthwhile anyway include: * For multi-exit loops, this doesn't require duplication of the store. * It kicks in for case where we can't prove we exit through a normal exit (i.e. we may throw), but can prove the store executes before that possible side exit. Differential Revision: https://reviews.llvm.org/D50925 llvm-svn: 340974	2018-08-29 21:49:30 +00:00
Fedor Sergeev	7b49aa03af	[SimpleLoopUnswitch] After unswitch delete dead blocks in parent loops Summary: Assert from PR38737 happens on the dead block inside the parent loop after unswitching nontrivial switch in the inner loop. deleteDeadBlocksFromLoop now takes extra care to detect/remove dead blocks in all the parent loops in addition to the blocks from original loop being unswitched. Reviewers: asbirlea, chandlerc Reviewed By: asbirlea Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51415 llvm-svn: 340955	2018-08-29 19:10:44 +00:00
Matt Morehouse	cf311cfc20	Revert "[libFuzzer] Port to Windows" This reverts r340949 due to bot breakage again. llvm-svn: 340954	2018-08-29 18:40:41 +00:00
Sanjay Patel	0f29e953b7	[InstCombine] canonicalize fneg with llvm.sin This is a follow-up to rL339604 which did the same transform for a sin libcall. The handling of intrinsics vs. libcalls is unfortunately scattered, so I'm just adding this next to the existing transform for llvm.cos for now. This should resolve PR38458: https://bugs.llvm.org/show_bug.cgi?id=38458 If the call was already negated, the negates will cancel each other out. llvm-svn: 340952	2018-08-29 18:27:49 +00:00
Matt Morehouse	245ebd71ef	[libFuzzer] Port to Windows Summary: Port libFuzzer to windows-msvc. This patch allows libFuzzer targets to be built and run on Windows, using -fsanitize=fuzzer and/or fsanitize=fuzzer-no-link. It allows these forms of coverage instrumentation to work on Windows as well. It does not fix all issues, such as those with -fsanitize-coverage=stack-depth, which is not usable on Windows as of this patch. It also does not fix any libFuzzer integration tests. Nearly all of them fail to compile, fixing them will come in a later patch, so libFuzzer tests are disabled on Windows until them. Reviewers: morehouse, rnk Reviewed By: morehouse, rnk Subscribers: #sanitizers, delcypher, morehouse, kcc, eraman Differential Revision: https://reviews.llvm.org/D51022 llvm-svn: 340949	2018-08-29 18:08:34 +00:00
Evandro Menezes	22e0bdf4ed	[InstCombine] Expand the simplification of pow() with nested exp{,2}() Expand the simplification of `pow(exp{,2}(x), y)` to all FP types. This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as 252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on x86-64 or A64. Differential revision: https://reviews.llvm.org/D51195 llvm-svn: 340948	2018-08-29 17:59:48 +00:00
Evandro Menezes	a3a7b53571	[InstCombine] Expand the simplification of pow() into exp2() Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all scalar and vector types. This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as 252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on x86-64 or A64. Differential revision: https://reviews.llvm.org/D49273 llvm-svn: 340947	2018-08-29 17:59:34 +00:00
Craig Topper	2bcb1eeee1	[InstCombine] Replace two calls to getNumUses() with !hasNUsesOrMore We were calling getNumUses to check for 1 or 2 uses. But getNumUses is linear in the number of uses. We can instead use !hasNUsesOrMore(3) which will stop the linear scan as soon as it determines there are at least 3 uses even if there are more. llvm-svn: 340939	2018-08-29 17:09:21 +00:00
Sanjay Patel	d4e19d272a	[InstCombine] move declarations closer to uses; NFC llvm-svn: 340930	2018-08-29 14:42:12 +00:00
Sanjay Patel	7a05641fa8	[InstCombine] remove unnecessary shuffle undef folding Add a test for constant folding to show that (shuffle undef, undef, mask) should already be handled via instsimplify. llvm-svn: 340926	2018-08-29 13:24:34 +00:00
Alexandros Lamprineas	f6db5bcd38	Revert r340922 "[GVNHoist] Re-enable GVNHoist by default" Another sanitizer buildbot failed this time at bootstrap when compiling SemaTemplateInstantiate.cpp with this assertion: `dominates(MD, U) && "Memory Def does not dominate it's uses"'. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/15047 llvm-svn: 340925	2018-08-29 13:00:55 +00:00
Hans Wennborg	2c390c54f6	Revert r340921 "[NFC] Unify guards detection" This broke the build, see e.g. http://lab.llvm.org:8011/builders/clang-cmake-armv8-lnt/builds/4626/ http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/18647/ http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/5856/ http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/22800/ > We have multiple places in code where we try to identify whether or not > some instruction is a guard. This patch factors out this logic into a separate > utility function which works uniformly in all places. > > Differential Revision: https://reviews.llvm.org/D51152 > Reviewed By: fedor.sergeev llvm-svn: 340923	2018-08-29 12:21:32 +00:00
Alexandros Lamprineas	c03b9b8854	[GVNHoist] Re-enable GVNHoist by default Rebase rL338240 since the excessive memory usage observed when using GVNHoist with UBSan has been fixed by rL340818. Differential Revision: https://reviews.llvm.org/D49858 llvm-svn: 340922	2018-08-29 11:58:34 +00:00
Max Kazantsev	1dafaa87d9	[NFC] Unify guards detection We have multiple places in code where we try to identify whether or not some instruction is a guard. This patch factors out this logic into a separate utility function which works uniformly in all places. Differential Revision: https://reviews.llvm.org/D51152 Reviewed By: fedor.sergeev llvm-svn: 340921	2018-08-29 11:37:34 +00:00
Max Kazantsev	8b4ffe66d6	[NFC] Factor out guard utility methods into a separate file This patch creates file GuardUtils which will contain logic for work with guards that can be shared across different passes. Differential Revision: https://reviews.llvm.org/D51151 Reviewed By: fedor.sergeev llvm-svn: 340914	2018-08-29 10:51:59 +00:00
Hans Wennborg	e0f3e9283f	LoopSink: Don't sink into blocks without an insertion point (PR38462) In the PR, LoopSink was trying to sink into a catchswitch block, which doesn't have a valid insertion point. Differential Revision: https://reviews.llvm.org/D51307 llvm-svn: 340900	2018-08-29 06:55:27 +00:00
Zhaoshi Zheng	35818e2789	[QTOOL-37352] Consider isLegalAddressingImm in Constant Hoisting In Thumb1, legal imm range is [0, 255] for ADD/SUB instructions. However, the legal imm range for LD/ST in (R+Imm) addressing mode is [0, 127]. Imms in [128, 255] are materialized by mov R, #imm, and LD/STs use them in (R+R) addressing mode. This patch checks if a constant is used as offset in (R+Imm), if so, it checks isLegalAddressingMode passing the constant value as BaseOffset. Differential Revision: https://reviews.llvm.org/D50931 llvm-svn: 340882	2018-08-28 23:00:59 +00:00
Alina Sbirlea	52e97a28d4	[SimpleLoopUnswitch] Form dedicated exits after trivial unswitches. Summary: Form dedicated exits after trivial unswitches. Fixes PR38737, PR38283. Reviewers: chandlerc, fedor.sergeev Subscribers: sanjoy, jlebar, uabelho, llvm-commits Differential Revision: https://reviews.llvm.org/D51375 llvm-svn: 340871	2018-08-28 20:41:05 +00:00
Matt Morehouse	bab8556f01	Revert "[libFuzzer] Port to Windows" This reverts commit r340860 due to failing tests. llvm-svn: 340867	2018-08-28 19:07:24 +00:00
Matt Morehouse	c6fff3b6f5	[libFuzzer] Port to Windows Summary: Port libFuzzer to windows-msvc. This patch allows libFuzzer targets to be built and run on Windows, using -fsanitize=fuzzer and/or fsanitize=fuzzer-no-link. It allows these forms of coverage instrumentation to work on Windows as well. It does not fix all issues, such as those with -fsanitize-coverage=stack-depth, which is not usable on Windows as of this patch. It also does not fix any libFuzzer integration tests. Nearly all of them fail to compile, fixing them will come in a later patch, so libFuzzer tests are disabled on Windows until them. Patch By: metzman Reviewers: morehouse, rnk Reviewed By: morehouse, rnk Subscribers: morehouse, kcc, eraman Differential Revision: https://reviews.llvm.org/D51022 llvm-svn: 340860	2018-08-28 18:34:32 +00:00
Matt Arsenault	10de2775bd	AMDGPU: Remove nan tests in class if src is nnan llvm-svn: 340850	2018-08-28 18:10:02 +00:00
David Bolvansky	c1b27b562b	[Inliner] Attribute callsites with inline remarks Summary: Sometimes reading an output *.ll file it is not easy to understand why some callsites are not inlined. We can read output of inline remarks (option --pass-remarks-missed=inline) and try correlating its messages with the callsites. An easier way proposed by this patch is to add to every callsite processed by Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//. For example in the provided test the result method //@test1// has two callsites //@bar// and inline remarks report different inlining missed reasons: remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6) remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive It is not clear which remark correspond to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites: define void @test1() { call void @bar(i1 true) #0 call void @bar(i1 false) #2 ret void } attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" } .. attributes #2 = { "inline-remark"="(cost=never): recursive" } Patch by: yrouban (Yevgeny Rouban) Reviewers: xbolva00, tejohnson, apilipenko Reviewed By: xbolva00, tejohnson Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D50435 llvm-svn: 340834	2018-08-28 15:27:25 +00:00
Mikael Holmen	4d652c4ce7	[CloneFunction] Constant fold terminators before checking single predecessor Summary: This fixes PR31105. There is code trying to delete dead code that does so by e.g. checking if the single predecessor of a block is the block itself. That check fails on a block like this bb: br i1 undef, label %bb, label %bb since that has two (identical) predecessors. However, after the check for dead blocks there is a call to ConstantFoldTerminator on the basic block, and that call simplifies the block to bb: br label %bb Therefore we now do the call to ConstantFoldTerminator before the check if the block is dead, so it can realize that it really is. The original behavior lead to the block not being removed, but it was simplified as above, and then we did a call to Dest->replaceAllUsesWith(&*I); with old and new being equal, and an assertion triggered. Reviewers: chandlerc, fhahn Reviewed By: fhahn Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D51280 llvm-svn: 340820	2018-08-28 12:40:11 +00:00
Alexandros Lamprineas	484bd13e2d	[GVNHoist] Prune out useless CHI insertions Fix for the out-of-memory error when compiling SemaChecking.cpp with GVNHoist and ubsan enabled. I've used a cache for inserted CHIs to avoid excessive memory usage. Differential Revision: https://reviews.llvm.org/D50323 llvm-svn: 340818	2018-08-28 11:07:54 +00:00
Max Kazantsev	0c4b84e2df	[NFC] A loop can never contain Ret instruction llvm-svn: 340808	2018-08-28 09:26:28 +00:00
Craig Topper	a6cd4b9bce	[InstCombine] Extend (add (sext x), cst) --> (sext (add x, cst')) and (add (zext x), cst) --> (zext (add x, cst')) to work for vectors Differential Revision: https://reviews.llvm.org/D51236 llvm-svn: 340796	2018-08-28 02:02:29 +00:00
Sanjay Patel	c615910be5	[InstCombine] fix formatting; NFC llvm-svn: 340790	2018-08-27 23:01:10 +00:00
Sanjay Patel	42d31c20a8	[InstCombine] allow shuffle+binop canonicalization with widening shuffles This lines up with the behavior of an existing transform where if both operands of the binop are shuffled, we allow moving the binop before the shuffle regardless of whether the shuffle changes the size of the vector. llvm-svn: 340787	2018-08-27 22:41:44 +00:00
Evandro Menezes	253991cfaf	[PATCH] [InstCombine] Fix issue in the simplification of pow() with nested exp{,2}() Fix the issue of duplicating the call to `exp{,2}()` when it's nested in `pow()`, as exposed by rL340462. Differential revision: https://reviews.llvm.org/D51194 llvm-svn: 340784	2018-08-27 22:11:15 +00:00
George Burgess IV	aa09a82b4b	s/std::set/DenseSet/; NFC We only use this set for `insert` and `count`, so a hashing container seems better here. llvm-svn: 340783	2018-08-27 22:10:59 +00:00
Nico Weber	e75fd1b184	fix comment typo llvm-svn: 340744	2018-08-27 14:25:22 +00:00
Max Kazantsev	b5dd092051	[NFC] Split logic of ImplicitControlFlowTracking to allow generalization We have a class `ImplicitControlFlowTracking` which allows us to keep track of instructions that can abnormally exit and answer queries like "whether or not there is side-exiting instruction above this instruction in its block". We may want to have the similar tracking for other types of "special" instructions, for example instructions that write memory. This patch separates ImplicitControlFlowTracking into two classes, isolating all general logic not related to implicit control flow into its parent class. We can later make another child of this class to keep track of instructions that write memory. The motivation for that is that we want to make these checks efficiently in the patch https://reviews.llvm.org/D50891. NOTE: The naming of the parent class is not super cool, but the other options we have are hardly better. Please feel free to rename it as NFC if you think you've found a more informative name for it. Differential Revision: https://reviews.llvm.org/D50954 Reviewed By: fedor.sergeev llvm-svn: 340728	2018-08-27 09:43:16 +00:00
Chandler Carruth	9ae926b973	[IR] Replace `isa<TerminatorInst>` with `isTerminator()`. This is a bit awkward in a handful of places where we didn't even have an instruction and now we have to see if we can build one. But on the whole, this seems like a win and at worst a reasonable cost for removing `TerminatorInst`. All of this is part of the removal of `TerminatorInst` from the `Instruction` type hierarchy. llvm-svn: 340701	2018-08-26 09:51:22 +00:00
Chandler Carruth	698fbe7b59	[IR] Sink `isExceptional` predicate to `Instruction`, rename it to `isExceptionalTermiantor` and implement it for opcodes as well following the common pattern in `Instruction`. Part of removing `TerminatorInst` from the `Instruction` type hierarchy to make it easier to share logic and interfaces between instructions that are both terminators and not terminators. llvm-svn: 340699	2018-08-26 08:56:42 +00:00
Chandler Carruth	96fc1de77d	[IR] Begin removal of TerminatorInst by removing successor manipulation. The core get and set routines move to the `Instruction` class. These routines are only valid to call on instructions which are terminators. The iterator and generic range based access move to `CFG.h` where all the other generic successor and predecessor access lives. While moving the iterator here, simplify it using the iterator utilities LLVM provides and updates coding style as much as reasonable. The APIs remain pointer-heavy when they could better use references, and retain the odd behavior of `operator*` and `operator->` that is common in LLVM iterators. Adjusting this API, if desired, should be a follow-up step. Non-generic range iteration is added for the two instructions where there is an especially easy mechanism and where there was code attempting to use the range accessor from a specific subclass: `indirectbr` and `br`. In both cases, the successors are contiguous operands and can be easily iterated via the operand list. This is the first major patch in removing the `TerminatorInst` type from the IR's instruction type hierarchy. This change was discussed in an RFC here and was pretty clearly positive: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123407.html There will be a series of much more mechanical changes following this one to complete this move. Differential Revision: https://reviews.llvm.org/D47467 llvm-svn: 340698	2018-08-26 08:41:15 +00:00
Xinliang David Li	bcf726a32d	[PGO] add target md5sum in warning message for icall Differential revision: http://reviews.llvm.org/D51193 llvm-svn: 340657	2018-08-24 21:38:24 +00:00

... 13 14 15 16 17 ...

21919 Commits