llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	6911bfe263	[ScalarizeMaskedMemIntrin] When expanding masked gathers, start with the passthru vector and insert the new load results into it. Previously we started with undef and did a final merge with the passthru at the end. llvm-svn: 343273	2018-09-27 21:28:59 +00:00
Craig Topper	45ad631b4c	[ScalarizeMaskedMemIntrin] Add some IR only test cases for masked gather expansion. llvm-svn: 343272	2018-09-27 21:28:55 +00:00
Craig Topper	7d234d6628	[ScalarizeMaskedMemIntrin] When expanding masked loads, start with the passthru value and insert each conditional load result over their element. Previously we started with undef and did one final merge at the end with a select. llvm-svn: 343271	2018-09-27 21:28:52 +00:00
Craig Topper	dfc0f289fa	[ScalarizeMaskedMemIntrin] Handle the case where the mask is an all zero vector. This shouldn't really happen in practice I hope, but we tried to handle other constant cases. We missed this one because we checked for ConstantVector without realizing that zero becomes ConstantAggregateZero instead. So instead just check for Constant and use getAggregateElement which will do the dirty work for us. llvm-svn: 343270	2018-09-27 21:28:46 +00:00
Craig Topper	a6478ac5d4	[ScalarizeMaskedMemIntrin] Add dedicated IR only tests for masked load expansion so I can begin making modifications. llvm-svn: 343269	2018-09-27 21:28:43 +00:00
Sanjay Patel	c3f50ff92e	[InstCombine] Without infinities, fold (C / X) < 0.0 --> (X < 0) When C is not zero and infinities are not allowed, (C / X) > 0 is a sign test. Depending on the sign of C, the predicate must be swapped. E.g.: foo(double X) { if ((-2.0 / X) <= 0) ... } => foo(double X) { if (X >= 0) ... } Patch by: @marels (Martin Elshuber) Differential Revision: https://reviews.llvm.org/D51942 llvm-svn: 343228	2018-09-27 15:59:24 +00:00
Sanjay Patel	95a816b34a	[InstCombine] add tests for FP sign-bit cmp optimization with fdiv; NFC These are baseline tests for D51942. Patch by: @marels (Martin Elshuber) llvm-svn: 343222	2018-09-27 14:24:29 +00:00
Nicola Zaghen	436c012702	[InstCombine] Add new tests in preparation for a combine of icmp (mul nsw/nuw X, C2), C Proofs for the future optimisations are here: - eq/neq: https://rise4fun.com/Alive/9PBA - sgt/ugt: https://rise4fun.com/Alive/58yr - slt/ult: https://rise4fun.com/Alive/VCQ Differential Revision: https://reviews.llvm.org/D51625 llvm-svn: 343190	2018-09-27 10:08:38 +00:00
Sanjay Patel	150afce75a	[InstCombine] add tests that show undef propagation failures from D52548; NFC Differential Revision: https://reviews.llvm.org/D52556 llvm-svn: 343140	2018-09-26 20:30:47 +00:00
Florian Hahn	6feb637124	[LoopInterchange] Preserve LCSSA. This patch extends LoopInterchange to move LCSSA to the right place after interchanging. This is required for LoopInterchange to become a function pass. An alternative to the manual moving of the PHIs, we could also re-form the LCSSA phis for a set of interchanged loops, but that's more expensive. Reviewers: efriedma, mcrosier, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D52154 llvm-svn: 343132	2018-09-26 19:34:25 +00:00
Sanjay Patel	d938d0d4f5	[InstCombine] add tests for vector insert/extract; NFC Preliminary step for D52439. llvm-svn: 343128	2018-09-26 17:57:38 +00:00
David Green	353cb3d4e5	[CodeGen] Enable tail calls for functions with NonNull attributes. Adding NonNull as attributes to returned pointers has the unfortunate side effect of disabling tail calls. This patch ignores the NonNull attribute when we decide whether to tail merge, in the same way that we ignore the NoAlias attribute, as it has no effect on the call sequence. Differential Revision: https://reviews.llvm.org/D52238 llvm-svn: 343091	2018-09-26 10:46:18 +00:00
Vyacheslav Zakharin	e06831a3b2	Remove LoopID metadata from the branch instruction that follows the peeled iterations. Differential Revision: https://reviews.llvm.org/D52176 llvm-svn: 343054	2018-09-26 01:03:21 +00:00
Sanjay Patel	f23727d972	[InstCombine] add fneg variation of shuffle-binop fold; NFC If the fsub in this pattern was replaced by an actual fneg instruction, we would need to add a fold to recognize that because fneg would not be a binop. llvm-svn: 343041	2018-09-25 22:48:58 +00:00
Anna Thomas	b1e3d45318	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 llvm-svn: 343028	2018-09-25 20:57:20 +00:00
Teresa Johnson	7fb39dfa7c	[ThinLTO] Efficiency fix for writing type id records in per-module indexes Summary: In D49565/r337503, the type id record writing was fixed so that only referenced type ids were emitted into each per-module index for ThinLTO distributed builds. However, this still left an efficiency issue: each per-module index checked all type ids for membership in the referenced set, yielding O(M*N) performance (M indexes and N type ids). Change the TypeIdMap in the summary to be indexed by GUID, to facilitate correlating with type identifier GUIDs referenced in the function summary TypeIdInfo structures. This allowed simplifying other places where a map from type id GUID to type id map entry was previously being used to aid this correlation. Also fix AsmWriter code to handle the rare case of type id GUID collision. For a large internal application, this reduced the thin link time by almost 15%. Reviewers: pcc, vitalybuka Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51330 llvm-svn: 343021	2018-09-25 20:14:40 +00:00
Sanjay Patel	69ed4710b8	[InstCombine] narrow binops on concatenated vectors (PR33026) The motivating case from: https://bugs.llvm.org/show_bug.cgi?id=33026 ...has no shuffles now. This kind of pattern may occur during vectorization when targets have lumpy ISAs like SSE/AVX. llvm-svn: 342988	2018-09-25 15:57:37 +00:00
David Green	9108c2b921	[LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder, similar to how it is already done in the llvm::UnrollLoop. The compiler would crash if this function is called with a malformed loop. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D51486 llvm-svn: 342958	2018-09-25 10:08:47 +00:00
Christy Lee	e94374809e	Re-submitting changes in D51550 because it failed to patch. Reviewers: javed.absar, trentxintong, courbet Reviewed By: trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52433 llvm-svn: 342919	2018-09-24 20:47:12 +00:00
Christy Lee	bf112ea25b	Reland r342494 after fixing LIT checks. llvm-svn: 342907	2018-09-24 17:26:30 +00:00
Sanjay Patel	7b86bc22de	[InstCombine] add/move tests for extractelement; NFC llvm-svn: 342905	2018-09-24 17:17:16 +00:00
Matt Arsenault	f432011d33	AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879	2018-09-24 13:18:15 +00:00
Petar Jovanovic	c451c9ef50	[deadargelim] Update dbg.value of 'unused' parameters DeadArgElim pass marks unused function arguments as ‘undef’ without updating existing dbg.values referring to it. As a consequence the debug info metadata in the final executable was wrong. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D51968 llvm-svn: 342871	2018-09-24 10:01:24 +00:00
Eugene Leviant	2b70d616f0	[WholeProgramDevirt] Don't process declarations when building type id map Differential revision: https://reviews.llvm.org/D52175 llvm-svn: 342836	2018-09-23 13:27:47 +00:00
Sanjay Patel	09e02fbf51	[InstCombine][x86] try even harder to convert blendv intrinsic to generic IR (PR38814) Follow-up to rL342324 (D52059): Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 This is an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. llvm-svn: 342806	2018-09-22 14:43:55 +00:00
Craig Topper	2b3f5df73a	[InstCombine] Fold (min/max ~X, Y) -> ~(max/min X, ~Y) when Y is freely invertible Summary: This restores the combine that was reverted in r341883. The infinite loop from the failing test no longer occurs due to changes from r342163. Reviewers: spatel, dmgreen Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52070 llvm-svn: 342797	2018-09-22 05:53:27 +00:00
Warren Ristow	4f27730eaf	[Loop Vectorizer] Abandon vectorization when no integer IV found Support for vectorizing loops with secondary floating-point induction variables was added in r276554. A primary integer IV is still required for vectorization to be done. If an FP IV was found, but no integer IV was found at all (primary or secondary), the attempt to vectorize still went forward, causing a compiler-crash. This change abandons that attempt when no integer IV is found. (Vectorizing FP-only cases like this, rather than bailing out, is discussed as possible future work in D52327.) See PR38800 for more information. Differential Revision: https://reviews.llvm.org/D52327 llvm-svn: 342786	2018-09-21 23:03:50 +00:00
Sanjay Patel	72d627e5ec	[InstCombine] add tests for extractelement; NFC There are folds under visitExtractElementInst() that don't appear to have any test coverage, so adding a few basic cases here. llvm-svn: 342740	2018-09-21 14:43:49 +00:00
JF Bastien	73d8e4e531	Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue Summary: This code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types. clang part of this patch: D51752 Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51751 llvm-svn: 342709	2018-09-21 05:17:42 +00:00
Sanjay Patel	18c29b7d74	[InstCombine] rename test file, simplify tests, regenerate full checks; NFC Fast-math is irrelevant for these transforms. llvm-svn: 342683	2018-09-20 21:10:14 +00:00
Sameer AbuAsal	77beee4136	[inline Cost] Don't mark functions accessing varargs as non-inlinable Summary: rL323619 marks functions that are calling va_end as not viable for inlining. This patch reverses that since va_end doesn't need access to the variadic argument list that is saved on the stack, only va_start does. Reviewers: efriedma, fhahn Reviewed By: fhahn Subscribers: eraman, haicheng, llvm-commits Differential Revision: https://reviews.llvm.org/D52067 llvm-svn: 342675	2018-09-20 18:39:34 +00:00
Sanjay Patel	dfe4380440	[InstCombine] add tests for vector concat with binop (PR33026); NFC llvm-svn: 342665	2018-09-20 17:10:38 +00:00
Fedor Sergeev	ee8d31c49e	[New PM] Introducing PassInstrumentation framework Pass Execution Instrumentation interface enables customizable instrumentation of pass execution, as per "RFC: Pass Execution Instrumentation interface" posted 06/07/2018 on llvm-dev@ The intent is to provide a common machinery to implement all the pass-execution-debugging features like print-before/after, opt-bisect, time-passes etc. Here we get a basic implementation consisting of: * PassInstrumentationCallbacks class that handles registration of callbacks and access to them. * PassInstrumentation class that handles instrumentation-point interfaces that call into PassInstrumentationCallbacks. * Callbacks accept StringRef which is just a name of the Pass right now. There were some ideas to pass an opaque wrapper for the pointer to pass instance, however it appears that pointer does not actually identify the instance (adaptors and managers might have the same address with the pass they govern). Hence it was decided to go simple for now and then later decide on what the proper mental model of identifying a "pass in a phase of pipeline" is. * Callbacks accept llvm::Any serving as a wrapper for const IRUnit, to remove direct dependencies on different IRUnits (e.g. Analyses). PassInstrumentationAnalysis analysis is explicitly requested from PassManager through usual AnalysisManager::getResult. All pass managers were updated to run that to get PassInstrumentation object for instrumentation calls. * Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra args out of a generic PassManager's extra args. This is the only way I was able to explicitly run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or RepeatedPass::run. TODO: Upon lengthy discussions we agreed to accept this as an initial implementation and then get rid of getAnalysisResult by improving RepeatedPass implementation. 
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into PassInstrumentationAnalysis. Callbacks registration should be performed directly through PassInstrumentationCallbacks. * new-pm tests updated to account for PassInstrumentationAnalysis being run * Added PassInstrumentation tests to PassBuilderCallbacks unit tests. Other unit tests updated with registration of the now-required PassInstrumentationAnalysis. Made getName helper to return std::string (instead of StringRef initially) to fix asan buildbot failures on CGSCC tests. Reviewers: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D47858 llvm-svn: 342664	2018-09-20 17:08:45 +00:00
Jesper Antonsson	719fa055d0	[InstCombine] Handle vector compares in foldGEPIcmp() Summary: This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding": https://bugs.llvm.org/show_bug.cgi?id=38984 Reviewers: majnemer, spatel, lattner, lebedev.ri Reviewed By: lebedev.ri Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D52263 llvm-svn: 342647	2018-09-20 13:37:28 +00:00
Bjorn Pettersson	cd53b7f54e	[IPSCCP] Fix a problem with removing labels in a switch with undef condition Summary: Before removing basic blocks that ipsccp has considered as dead all uses of the basic block label must be removed. That is done by calling ConstantFoldTerminator on the users. An exception is when the branch condition is an undef value. In such scenarios ipsccp is using some internal assumptions regarding which edge in the control flow that should remain, while ConstantFoldTerminator doesn't know how to fold the terminator. The problem addressed here is related to ConstantFoldTerminator's ability to rewrite a 'switch' into a conditional 'br'. In such situations ConstantFoldTerminator returns true indicating that the terminator has been rewritten. However, ipsccp treated the true value as if the edge to the dead basic block had been removed. So the code for resolving an undef branch condition did not trigger, and we ended up with assertion that there were uses remaining when deleting the basic block. The solution is to resolve indeterminate branches before the call to ConstantFoldTerminator. Reviewers: efriedma, fhahn, davide Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52232 llvm-svn: 342632	2018-09-20 09:00:17 +00:00
Eric Christopher	019889374b	Temporarily Revert "[New PM] Introducing PassInstrumentation framework" as it was causing failures in the asan buildbot. This reverts commit r342597. llvm-svn: 342616	2018-09-20 05:16:29 +00:00
Fedor Sergeev	a5f279ea89	[New PM] Introducing PassInstrumentation framework Pass Execution Instrumentation interface enables customizable instrumentation of pass execution, as per "RFC: Pass Execution Instrumentation interface" posted 06/07/2018 on llvm-dev@ The intent is to provide a common machinery to implement all the pass-execution-debugging features like print-before/after, opt-bisect, time-passes etc. Here we get a basic implementation consisting of: * PassInstrumentationCallbacks class that handles registration of callbacks and access to them. * PassInstrumentation class that handles instrumentation-point interfaces that call into PassInstrumentationCallbacks. * Callbacks accept StringRef which is just a name of the Pass right now. There were some ideas to pass an opaque wrapper for the pointer to pass instance, however it appears that pointer does not actually identify the instance (adaptors and managers might have the same address with the pass they govern). Hence it was decided to go simple for now and then later decide on what the proper mental model of identifying a "pass in a phase of pipeline" is. * Callbacks accept llvm::Any serving as a wrapper for const IRUnit, to remove direct dependencies on different IRUnits (e.g. Analyses). PassInstrumentationAnalysis analysis is explicitly requested from PassManager through usual AnalysisManager::getResult. All pass managers were updated to run that to get PassInstrumentation object for instrumentation calls. * Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra args out of a generic PassManager's extra args. This is the only way I was able to explicitly run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or RepeatedPass::run. TODO: Upon lengthy discussions we agreed to accept this as an initial implementation and then get rid of getAnalysisResult by improving RepeatedPass implementation. 
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into PassInstrumentationAnalysis. Callbacks registration should be performed directly through PassInstrumentationCallbacks. * new-pm tests updated to account for PassInstrumentationAnalysis being run * Added PassInstrumentation tests to PassBuilderCallbacks unit tests. Other unit tests updated with registration of the now-required PassInstrumentationAnalysis. Reviewers: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D47858 llvm-svn: 342597	2018-09-19 22:42:57 +00:00
Matt Morehouse	e62fc3d0b6	[InstCombine] Disable strcmp->memcmp transform for MSan. Summary: The strcmp->memcmp transform can make the resulting memcmp read uninitialized data, which MSan doesn't like. Resolves https://github.com/google/sanitizers/issues/993. Reviewers: eugenis, xbolva00 Reviewed By: eugenis Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D52272 llvm-svn: 342582	2018-09-19 19:37:24 +00:00
Fedor Sergeev	25de3f83be	Revert rL342544: [New PM] Introducing PassInstrumentation framework A bunch of bots fail to compile unittests. Reverting. llvm-svn: 342552	2018-09-19 14:54:48 +00:00
Roman Lebedev	f50023d37c	[InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask Summary: The last low-bit-mask-pattern-producing-pattern i can think of. https://rise4fun.com/Alive/UGzE <- non-canonical But we can not canonicalize it because of extra uses. https://bugs.llvm.org/show_bug.cgi?id=38123 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52148 llvm-svn: 342548	2018-09-19 13:35:46 +00:00
Roman Lebedev	ca2bdb03d6	[InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask Summary: Same as D52146. `((1 << y)+(-1))` is simply a non-canonical version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl We can not canonicalize it due to the extra uses. But we can handle it here. Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52147 llvm-svn: 342547	2018-09-19 13:35:40 +00:00
Roman Lebedev	183a465dc6	[InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask Summary: Two folds are happening here: 1. https://rise4fun.com/Alive/oaFX 2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4 This change doesn't just add the handling for eq/ne predicates, it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work, so all the 16 fold variants* are immediately supported. I'm indeed only testing these two predicates. I do not feel like re-proving all 16 folds, because they were already proven for the general case of constant with all-ones in low bits. So as long as the mask produces all-ones in low bits, i'm pretty sure the fold is valid. But if required, i can re-prove, let me know. eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total. https://bugs.llvm.org/show_bug.cgi?id=38123 https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52146 llvm-svn: 342546	2018-09-19 13:35:27 +00:00
Fedor Sergeev	875c938fec	[New PM] Introducing PassInstrumentation framework Summary: Pass Execution Instrumentation interface enables customizable instrumentation of pass execution, as per "RFC: Pass Execution Instrumentation interface" posted 06/07/2018 on llvm-dev@ The intent is to provide a common machinery to implement all the pass-execution-debugging features like print-before/after, opt-bisect, time-passes etc. Here we get a basic implementation consisting of: * PassInstrumentationCallbacks class that handles registration of callbacks and access to them. * PassInstrumentation class that handles instrumentation-point interfaces that call into PassInstrumentationCallbacks. * Callbacks accept StringRef which is just a name of the Pass right now. There were some ideas to pass an opaque wrapper for the pointer to pass instance, however it appears that pointer does not actually identify the instance (adaptors and managers might have the same address with the pass they govern). Hence it was decided to go simple for now and then later decide on what the proper mental model of identifying a "pass in a phase of pipeline" is. * Callbacks accept llvm::Any serving as a wrapper for const IRUnit, to remove direct dependencies on different IRUnits (e.g. Analyses). PassInstrumentationAnalysis analysis is explicitly requested from PassManager through usual AnalysisManager::getResult. All pass managers were updated to run that to get PassInstrumentation object for instrumentation calls. * Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra args out of a generic PassManager's extra args. This is the only way I was able to explicitly run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or RepeatedPass::run. TODO: Upon lengthy discussions we agreed to accept this as an initial implementation and then get rid of getAnalysisResult by improving RepeatedPass implementation. 
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into PassInstrumentationAnalysis. Callbacks registration should be performed directly through PassInstrumentationCallbacks. * new-pm tests updated to account for PassInstrumentationAnalysis being run * Added PassInstrumentation tests to PassBuilderCallbacks unit tests. Other unit tests updated with registration of the now-required PassInstrumentationAnalysis. Reviewers: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D47858 llvm-svn: 342544	2018-09-19 12:25:52 +00:00
Benjamin Kramer	e5e1ea79fd	[InstCombine] Don't transform sin/cos -> tanl for half types This is still unsafe for long double, we will transform things into tanl even if tanl is for another type. But that's for someone else to fix. llvm-svn: 342542	2018-09-19 12:01:38 +00:00
Douglas Yung	de94ea140c	Revert r342494 as it was failing on a bot and the author cannot look at it until tomorrow. Failing bot: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/36708 llvm-svn: 342509	2018-09-18 19:34:05 +00:00
Christy Lee	c85da8bd9a	Do not optimize atomic load to non-atomic memcmp Differential Revision: https://reviews.llvm.org/D51998 llvm-svn: 342498	2018-09-18 17:02:42 +00:00
Christy Lee	1bc0fb4a3f	Check lines before using alias analysis to check for interference This diff is to show the difference before and after D51550 Differential Revision: https://reviews.llvm.org/D52044 llvm-svn: 342494	2018-09-18 16:43:44 +00:00
Max Kazantsev	0994abda3a	[IndVars] Remove unreasonable checks in rewriteLoopExitValues A piece of logic in rewriteLoopExitValues has a weird check on number of users which allowed an unprofitable transform in case if an instruction has more than 6 users. Differential Revision: https://reviews.llvm.org/D51404 Reviewed By: etherzhhb llvm-svn: 342444	2018-09-18 04:57:18 +00:00
Matt Arsenault	c640798597	LSV: Fix adjust alloca alignment trick for AMDGPU This was checking the hardcoded address space 0 for the stack. Additionally, this should be checking for legality with the adjusted alignment, so defer the alignment check. Also try to split if the unaligned access isn't allowed. llvm-svn: 342442	2018-09-18 02:05:44 +00:00
Alina Sbirlea	a782a70ad9	[EarlyCSEwMemorySSA] Add MSSA verification and tests to make EarlyCSE failures easier to track. Summary: EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result. Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA. Adding some tests to track this and a basic verify for future potential failures. Reviewers: george.burgess.iv, gberry Subscribers: sanjoy, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D51960 llvm-svn: 342422	2018-09-17 22:35:21 +00:00
Matt Arsenault	80ea6dd1d5	Fix vectorization of canonicalize llvm-svn: 342390	2018-09-17 13:24:30 +00:00
Roman Lebedev	6356864e6d	[NFC][InstCombine] One more test pattern for comparisons with low-bit-mask. https://rise4fun.com/Alive/UGzE <- non-canonical, but has extra uses. https://bugs.llvm.org/show_bug.cgi?id=38123 llvm-svn: 342345	2018-09-16 12:51:09 +00:00
Roman Lebedev	3fb9414d02	[NFC][InstCombine] Some more tests for comparisons with low-bit-mask. https://bugs.llvm.org/show_bug.cgi?id=38123 https://bugs.llvm.org/show_bug.cgi?id=38708 llvm-svn: 342343	2018-09-16 08:05:06 +00:00
Craig Topper	2da7381678	[InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y)) Summary: If the sub doesn't overflow in the original type we can move it above the sext/zext. This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported. Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52075 llvm-svn: 342335	2018-09-15 18:54:10 +00:00
Sanjay Patel	296d35a5e9	[InstCombine][x86] try harder to convert blendv intrinsic to generic IR (PR38814) Missing optimizations with blendv are shown in: https://bugs.llvm.org/show_bug.cgi?id=38814 If this works, it's an easier and more powerful solution than adding pattern matching for a few special cases in the backend. The potential danger with this transform in IR is that the condition value can get separated from the select, and the backend might not be able to make a blendv out of it again. I don't think that's too likely, but I've kept this patch minimal with a 'TODO', so we can test that theory in the wild before expanding the transform. Differential Revision: https://reviews.llvm.org/D52059 llvm-svn: 342324	2018-09-15 14:25:44 +00:00
Roman Lebedev	1b7fc87020	[InstCombine] Inefficient pattern for high-bits checking 3 (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) The last (as far i know?) pattern, non-canonical due to the extra use. https://godbolt.org/z/aCMsPk https://rise4fun.com/Alive/I6f https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52062 llvm-svn: 342321	2018-09-15 12:04:13 +00:00
Vedant Kumar	1b02dad9f2	[CodeGenPrepare] Preserve debug locs in OptimizeExtractBits CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it easier for the backend to emit fancy extract-bits instructions (e.g UBFX). Teach it to preserve debug locations and salvage debug values. llvm-svn: 342319	2018-09-15 04:08:52 +00:00
Sanjay Patel	90a36346bc	[InstCombine] refactor mul narrowing folds; NFCI Similar to rL342278: The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. D52075 should be able to use this code too rather than duplicating all of the logic. llvm-svn: 342292	2018-09-14 22:23:35 +00:00
Wei Mi	6a14325dff	[SampleFDO] Add FunctionOffsetTable in compact binary format profile. The patch saves a function offset table which maps function name index to the offset of its function profile to the start of the binary profile. By using the function offset table, for those function profiles which will not be used when compiling a module, the profile reader doesn't have to read them. For profile size around 10~20M, it saves ~10% compile time. Differential Revision: https://reviews.llvm.org/D51863 llvm-svn: 342283	2018-09-14 20:52:59 +00:00
Sanjay Patel	2426eb46dd	[InstCombine] refactor add narrowing folds; NFCI The test diffs are all cosmetic due to the change in value naming, but I'm including that to show that the new code does perform these folds rather than something else in instcombine. llvm-svn: 342278	2018-09-14 20:40:46 +00:00
Sebastian Pop	0f30f08b02	HotColdSplit: fix invalid SSA due to outlining The test used to fail with an invalid phi node: the two predecessors were outlined and the SSA representation was left invalid. The patch adds the exit block to the cold region. llvm-svn: 342277	2018-09-14 20:36:19 +00:00
Sanjay Patel	fcf8c7c908	[InstCombine] add more tests for add narrowing folds; NFC llvm-svn: 342274	2018-09-14 20:33:40 +00:00
Sanjay Patel	003f452522	[InstCombine] rename test file to better describe the fold; NFC The folds are not limited to zext, and the real goal is width reduction of a math op. D52075 is proposing to extend this to subtracts. llvm-svn: 342254	2018-09-14 18:12:30 +00:00
Sanjay Patel	5a9462e42a	[InstCombine] remove unnecessary target constraints for tests; NFC These are universal folds. llvm-svn: 342253	2018-09-14 18:06:36 +00:00
Sanjay Patel	7b9e1afd1f	[InstCombine] move test next to related tests; NFC llvm-svn: 342251	2018-09-14 18:05:14 +00:00
Sanjay Patel	1f4f26a2bb	[InstCombine] remove stale comment from test file; NFC llvm-svn: 342250	2018-09-14 18:02:17 +00:00
Sanjay Patel	f7ba0ac0b5	[InstCombine] regenerate test checks; NFC There was a bug in a check line regex that could cause the test to fail with a naming difference. The auto-gen script seems to work as expected now. llvm-svn: 342249	2018-09-14 17:53:44 +00:00
Sanjay Patel	b437238e95	[InstCombine] add more tests for x86 blendv (PR38814); NFC llvm-svn: 342237	2018-09-14 13:47:33 +00:00
Florian Hahn	3afb974aa5	[LoopInterchange] Preserve ScalarEvolution, by forgetting about interchanged loops. As preparation for LoopInterchange becoming a loop pass, it needs to preserve ScalarEvolution. Even though interchanging should not change the trip count of the loop, it modifies loop entry, latch and exit blocks. I added -verify-scev to some loop interchange tests, but the verification does not catch problems caused by missing invalidation of SE in loop interchange, as the trip counts themselves do not change. So there might be potential to make the SE verification covering more stuff in the future. Reviewers: mkazantsev, efriedma, karthikthecool Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D52026 llvm-svn: 342209	2018-09-14 07:50:20 +00:00
Craig Topper	e385365c40	[InstCombine] Add some test cases for (add (sext x), (sext y)) --> (sext (add int x, y)) and (mul (sext x), (sext y)) --> (sext (mul x, y)). NFC llvm-svn: 342203	2018-09-14 05:16:58 +00:00
Hideki Saito	ea7f3035a0	[VPlan] Implement initial vector code generation support for simple outer loops. Summary: [VPlan] Implement vector code generation support for simple outer loops. Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following: - force vector code generation using explicit vectorize_width - add conservative early returns in cost model and other places for VPlanNativePath - add code for setting up outer loop inductions - support for widening non-induction PHIs that can result from inner loops and uniform conditional branches - support for generating uniform inner branches We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer. Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal Reviewed By: fhahn, hsaito Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D50820 llvm-svn: 342197	2018-09-14 00:36:00 +00:00
Roman Lebedev	d2316d756e	[NFC][InstCombine] PR38708 - inefficient pattern for high-bits checking 3. The last, non-canonical variant: https://godbolt.org/z/aCMsPk https://rise4fun.com/Alive/I6f It can only happen due to the extra use on the inner shift. But here it is ok. https://bugs.llvm.org/show_bug.cgi?id=38708 llvm-svn: 342184	2018-09-13 21:34:47 +00:00
Roman Lebedev	6dc87004fa	[InstCombine] Inefficient pattern for high-bits checking 2 (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only n bits wide (where n is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) More complicated, canonical pattern: https://rise4fun.com/Alive/uhA We do need to have two `switch()`'es like this, to not mismatch the swappable predicates. https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52001 llvm-svn: 342173	2018-09-13 20:33:12 +00:00
Roman Lebedev	083744852a	[NFC][InstCombine] Test what happens if 'inefficient high bit check' pattern is on both sides. Came up in https://reviews.llvm.org/D52001#1233827 While we don't do a good job here, we at least want to make sure that we don't have any inf-loops. llvm-svn: 342171	2018-09-13 20:33:02 +00:00
Craig Topper	8fc05ce340	[InstCombine] Fold (xor (min/max X, Y), -1) -> (max/min ~X, ~Y) when X and Y are freely invertible. This allows the xor to be removed completely. This might help with recommitting r341674, but seems good regardless. Coincidentally fixes PR38915. Differential Revision: https://reviews.llvm.org/D51964 llvm-svn: 342163	2018-09-13 18:52:58 +00:00
Craig Topper	3fc5e72d84	[InstCombine] Add test cases for D51964. NFC llvm-svn: 342162	2018-09-13 18:52:56 +00:00
Matt Arsenault	9de2fb58fa	AMDGPU: Fix some outdated datalayouts in tests llvm-svn: 342131	2018-09-13 11:56:28 +00:00
Max Kazantsev	b2724d9af8	[NFC] Add Requires: asserts where needed llvm-svn: 342108	2018-09-13 04:43:24 +00:00
Max Kazantsev	0e0e19c980	[NFC] Use expensive asserts in relevant LICM tests llvm-svn: 342107	2018-09-13 04:00:39 +00:00
Sanjay Patel	d341988c86	revert r341288 - [Reassociate] swap binop operands to increase factoring potential This causes or exposes indeterminism that is visible in the output of -reassociate. llvm-svn: 342083	2018-09-12 21:29:11 +00:00
Sanjay Patel	31017cd10a	[InstCombine] add tests for unsigned add overflow; NFC llvm-svn: 342082	2018-09-12 21:13:37 +00:00
Roman Lebedev	e14b0282bb	[NFC][InstCombine] Drop newly-added interference-tests-for-high-bit-check.ll Now that i have actually double-checked, no, there is no such interference possible... llvm-svn: 342076	2018-09-12 20:06:46 +00:00
Roman Lebedev	91c668a276	[NFC][InstCombine] PR38708 - inefficient pattern for high-bits checking. More complicated, canonical pattern: https://rise4fun.com/Alive/uhA https://godbolt.org/z/o4RB8D Also, we need to be careful not to skip some patterns... https://bugs.llvm.org/show_bug.cgi?id=38708 llvm-svn: 342074	2018-09-12 19:44:26 +00:00
Roman Lebedev	75404fb9f8	[InstCombine] Inefficient pattern for high-bits checking (PR38708) Summary: It is sometimes important to check that some newly-computed value is non-negative and only `n` bits wide (where `n` is a variable.) There are many ways to check that: https://godbolt.org/z/o4RB8D The last variant seems best? (I'm sure there are some other variations i haven't thought of..) Let's handle the second variant first, since it is much simpler. https://rise4fun.com/Alive/LYjY https://bugs.llvm.org/show_bug.cgi?id=38708 Reviewers: spatel, craig.topper, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51985 llvm-svn: 342067	2018-09-12 18:19:43 +00:00
Roman Lebedev	99359f391e	[NFC][InstCombine] PR38708 - inefficient pattern for high-bits checking. The simplest pattern for now: https://rise4fun.com/Alive/LYjY https://godbolt.org/z/o4RB8D https://bugs.llvm.org/show_bug.cgi?id=38708 llvm-svn: 342054	2018-09-12 14:11:37 +00:00
David Green	e27e87cdcb	[CGP] Ensure splitgep gives deterministic output The output of splitLargeGEPOffsets does not appear to be deterministic because of the way that we iterate over a DenseMap. I've changed it to a MapVector for consistent output. The test here isn't particularly great, only showing a cosmetic difference in output. The original reproducer is much larger but shows a difference in instruction ordering, leading to different codegen. Differential Revision: https://reviews.llvm.org/D51851 llvm-svn: 342043	2018-09-12 10:19:10 +00:00
David Green	2352b30c96	[SimplifyCFG] Put an alignment on generated switch tables Previously the alignment on the newly created switch table data was not set, meaning that DataLayout::getPreferredAlignment was free to overalign it to 16 bytes. This causes unnecessary code bloat. Differential Revision: https://reviews.llvm.org/D51800 llvm-svn: 342039	2018-09-12 09:54:17 +00:00
Florian Hahn	1086ce2397	[LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC) Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use them for VPlan outside LoopVectorize.cpp Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00 Reviewed By: rengolin, xbolva00 Differential Revision: https://reviews.llvm.org/D49488 llvm-svn: 342027	2018-09-12 08:01:57 +00:00
Sanjay Patel	1cf0734b2f	[InstCombine] add folds for unsigned-overflow compares Name: op_ugt_sum %a = add i8 %x, %y %r = icmp ugt i8 %x, %a => %notx = xor i8 %x, -1 %r = icmp ugt i8 %y, %notx Name: sum_ult_op %a = add i8 %x, %y %r = icmp ult i8 %a, %x => %notx = xor i8 %x, -1 %r = icmp ugt i8 %y, %notx https://rise4fun.com/Alive/ZRxI AFAICT, this doesn't interfere with any add-saturation patterns because those have >1 use for the 'add'. But this should be better for IR analysis and codegen in the basic cases. This is another fold inspired by PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 llvm-svn: 342004	2018-09-11 22:40:20 +00:00
Sanjay Patel	26725bdc50	[InstCombine] add folds for icmp with xor mask constant These are the folds in Alive; Name: xor_ult Pre: isPowerOf2(-C1) %xor = xor i8 %x, C1 %r = icmp ult i8 %xor, C1 => %r = icmp ugt i8 %x, ~C1 Name: xor_ugt Pre: isPowerOf2(C1+1) %xor = xor i8 %x, C1 %r = icmp ugt i8 %xor, C1 => %r = icmp ugt i8 %x, C1 https://rise4fun.com/Alive/Vty The ugt case in its simplest form was already handled by DemandedBits, but that's not ideal as shown in the multi-use test. I'm not sure if these are all of the symmetrical folds, but I adjusted the existing code for one of the folds to try to show the similarities. There's no obvious connection, but this is another preliminary step for PR14613... https://bugs.llvm.org/show_bug.cgi?id=14613 llvm-svn: 341997	2018-09-11 22:00:15 +00:00
Sanjay Patel	c79d964fdd	[InstCombine] add tests for icmp with xor; NFC llvm-svn: 341993	2018-09-11 21:13:20 +00:00
Alina Sbirlea	a496143c9e	Update MemorySSA in LoopUnswitch. Summary: Update MemorySSA in old LoopUnswitch pass. Actual dependency and update is disabled by default. Subscribers: sanjoy, jlebar, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D45301 llvm-svn: 341984	2018-09-11 19:19:21 +00:00
Sanjay Patel	342c3bcf11	[InstCombine] enhance vector demanded elements to look at a vector select condition operand I noticed that we were not back-propagating undef lanes to shuffle masks when we have a shuffle that reduces the vector width. This is part of investigating/solving PR38691: https://bugs.llvm.org/show_bug.cgi?id=38691 The DAG equivalent was proposed with: D51696 Differential Revision: https://reviews.llvm.org/D51433 llvm-svn: 341981	2018-09-11 18:49:00 +00:00
Sanjay Patel	44c1b3a331	[InstCombine] add tests for add-with-overflow compares; NFC llvm-svn: 341979	2018-09-11 18:45:28 +00:00
Craig Topper	4e63db8387	[InstCombine] Fix incorrect usage of getPrimitiveSizeInBits when we should be using the element size for vectors For vectors, getPrimitiveSizeInBits returns the full vector width. This code should be using the element size for vectors. This could be fixed by calling getScalarSizeInBits, but it's even easier to just get it from the APInt we're checking. Differential Revision: https://reviews.llvm.org/D51938 llvm-svn: 341971	2018-09-11 17:57:20 +00:00
Florian Hahn	5b7e21a6b7	[CallSiteSplitting] Add debug location to created PHI nodes. There are 2 cases when we create PHI nodes: * For the result of the call that was duplicated in the split blocks. Those PHI nodes should have the debug location of the call. * For values produced before the call. Those instructions need to be duplicated in the split blocks and the PHI nodes should have the debug locations of those instructions. Fixes PR37962. Reviewers: junbuml, gbedwell, vsk Reviewed By: junbuml Tags: #debug-info Differential Revision: https://reviews.llvm.org/D51919 llvm-svn: 341970	2018-09-11 17:55:58 +00:00
Craig Topper	a57bb61a3e	[InstCombine] Support (mul (sext x), cst) --> (sext (mul x, cst')) and (mul (zext x), cst) --> (zext (mul x, cst')) for vector constants. Similar to D51236, but for mul instead of add. Differential Revision: https://reviews.llvm.org/D51900 llvm-svn: 341961	2018-09-11 16:51:24 +00:00
Alexandros Lamprineas	96762b37e1	[MemorySSAUpdater] Avoid creating self-referencing MemoryDefs Fix for https://bugs.llvm.org/show_bug.cgi?id=38807, which occurred while compiling SemaTemplateInstantiate.cpp with clang and GVNHoist enabled. In the following example: 1=def(entry) / \ 2=def(1) 4=def(1) 3=def(2) 5=def(4) When removing the MemoryDef 2=def(1) from its basic block, and just before adding it to the end of the parent basic block, we first replace all its uses with the defining memory access: 3=def(2) -> 3=def(1) Then we call insertDef for adding 2=def(1) to the parent basic block, where we replace the uses of 1=def(entry) with 2=def(1). Doing so we create a self reference: 2=def(1) -> 2=def(2) (bad) 3=def(1) -> 3=def(2) (ok) 4=def(1) -> 4=def(2) (ok) Differential Revision: https://reviews.llvm.org/D51801 llvm-svn: 341947	2018-09-11 14:29:59 +00:00
Johannes Doerfert	ae3cfeb3ad	[FuncAttrs] Remove "access range attributes" for read-none functions The presence of readnone and an access range attribute (argmemonly, inaccessiblememonly, inaccessiblemem_or_argmemonly) is considered an error by the verifier. This seems strict but also not wrong. This patch makes sure function attribute detection will remove all access range attributes for readnone functions. llvm-svn: 341927	2018-09-11 11:51:29 +00:00
Max Kazantsev	9aacaffd98	[NFC] Specify test's option to reduce reliance on defaults llvm-svn: 341904	2018-09-11 06:34:43 +00:00


11553 Commits