llvm-project

Commit Graph

Author	SHA1	Message	Date
Benjamin Kramer	14e915f7b4	InstCombine: Don't claim to be able to evaluate any shl in a zexted type. The shift amount may be larger than the type leading to undefined behavior. Limit the transform to constant shift amounts. While there update the bits to clear in the result which may enable additional optimizations. PR15959. llvm-svn: 181604	2013-05-10 16:26:37 +00:00
Benjamin Kramer	a6645e8b8f	InstCombine: Verify the type before transforming uitofp into select. PR15952. llvm-svn: 181586	2013-05-10 09:16:52 +00:00
Rafael Espindola	007521673b	Don't replace an alias in llvm.used with its target. When we replace an internal alias with its target, be careful not to replace the entry in llvm.used (and llvm.compiler_used). llvm-svn: 181524	2013-05-09 17:22:59 +00:00
Benjamin Kramer	21b972ae94	InstCombine: Don't just copy known bits from the first operand of an srem. That's obviously wrong. Conservatively restrict it to the sign bit, which matches the original intention of this analysis. Fixes PR15940. llvm-svn: 181518	2013-05-09 16:32:32 +00:00
Arnold Schwaighofer	2e8c69cf97	LoopVectorizer: Don't assert on the absence of induction variables A computable loop exit count does not imply the presence of an induction variable. Scalar evolution can return a value for an infinite loop. Fixes PR15926. llvm-svn: 181495	2013-05-09 00:32:18 +00:00
Daniel Malea	abdd7f197a	Revert 181475 as the DebugIR tests are breaking (automake) buildbots that re-use build dirs - the temporaries "-debug.ll" files generated by DebugIR pass are considered tests, even though they are not llvm-svn: 181476	2013-05-08 21:55:31 +00:00
Daniel Malea	0233db70f0	DebugIR tests -- lit tests for the line number transform - simple one-function case - function-calling case - external function calling case - exception throwing case - vector case Note: these tests are somewhat coupled to the current format of debug metadata. llvm-svn: 181469	2013-05-08 21:03:00 +00:00
Arnold Schwaighofer	3610139ac5	LoopVectorizer: Improve reduction variable identification The two nested loops were confusing and also conservative in identifying reduction variables. This patch replaces them by a worklist based approach. llvm-svn: 181369	2013-05-07 21:55:37 +00:00
Arnold Schwaighofer	e78b76fbed	LoopVectorize: getConsecutiveVector must respect signed arithmetic We were passing an i32 to ConstantInt::get where an i64 was needed and we must also pass the sign if we pass negatives numbers. The start index passed to getConsecutiveVector must also be signed. Should fix PR15882. llvm-svn: 181286	2013-05-07 04:37:05 +00:00
David Majnemer	70f286d95f	InstCombine: (X ^ signbit) + C -> X + (signbit ^ C) llvm-svn: 181249	2013-05-06 21:21:31 +00:00
Jean-Luc Duprat	1610d571db	Test results verified using FileCheck rather than grep \| count llvm-svn: 181234	2013-05-06 18:45:16 +00:00
Andrew Trick	9c72b071fe	Rotate multi-exit loops even if the latch was simplified. Test case by Michele Scandale! Fixes PR10293: Load not hoisted out of loop with multiple exits. There are few regressions with this patch, now tracked by rdar:13817079, and a roughly equal number of improvements. The regressions are almost certainly back luck because LoopRotate has very little idea of whether rotation is profitable. Doing better requires a more comprehensive solution. This checkin is a quick fix that lacks generality (PR10293 has a counter-example). But it trivially fixes the case in PR10293 without interfering with other cases, and it does satify the criteria that LoopRotate is a loop canonicalization pass that should avoid heuristics and special cases. I can think of two approaches that would probably be better in the long run. Ultimately they may both make sense. (1) LoopRotate should check that the current header would make a good loop guard, and that the loop does not already has a sufficient guard. The artifical SimplifiedLoopLatch check would be unnecessary, and the design would be more general and canonical. Two difficulties: - We need a strong guarantee that we won't endlessly rotate, so the analysis would need to be precise in order to avoid the SimplifiedLoopLatch precondition. - Analysis like this are usually based on SCEV, which we don't want to rely on. (2) Rotate on-demand in late loop passes. This could even be done by shoving the loop back on the queue after the optimization that needs it. This could work well when we find LICM opportunities in multi-branch loops. This requires some work, and it doesn't really solve the problem of SCEV wanting a loop guard before the analysis. llvm-svn: 181230	2013-05-06 17:58:18 +00:00
Jean-Luc Duprat	4189ef456a	Fix add4.ll test cmdline so that it passes llvm-svn: 181219	2013-05-06 17:18:47 +00:00
Jean-Luc Duprat	3e4fc3ef24	Provide InstCombines for the following 3 cases: A * (1 - (uitofp i1 C)) -> select C, 0, A B * (uitofp i1 C) -> select C, B, 0 select C, 0, A + select C, B, 0 -> select C, B, A These come up in code that has been hand-optimized from a select to a linear blend, on platforms where that may have mattered. We want to undo such changes with the following transform: A(1 - uitofp i1 C) + B(uitofp i1 C) -> select C, A, B llvm-svn: 181216	2013-05-06 16:55:50 +00:00
Nadav Rotem	c70ef4e93c	Revert r164763 because it introduces new shuffles. Thanks Nick Lewycky for pointing this out. llvm-svn: 181177	2013-05-06 02:39:09 +00:00
Matt Arsenault	c23753a53e	Fix unchecked uses of DominatorTree in MemoryDependenceAnalysis. Use unknown results for places where it would be needed llvm-svn: 181176	2013-05-06 02:07:24 +00:00
Rafael Espindola	c229a4fff4	Fix const merging when an alias of a const is llvm.used. We used to disable constant merging not only if a constant is llvm.used, but also if an alias of a constant is llvm.used. This change fixes that. llvm-svn: 181175	2013-05-06 01:48:55 +00:00
Arnold Schwaighofer	d96e427eac	LoopVectorize: Add support for floating point min/max reductions Add support for min/max reductions when "no-nans-float-math" is enabled. This allows us to assume we have ordered floating point math and treat ordered and unordered predicates equally. radar://13723044 llvm-svn: 181144	2013-05-05 01:54:48 +00:00
Arnold Schwaighofer	a670a0a3aa	LoopVectorize: We don't need an identity element for min/max reductions We can just use the initial element that feeds the reduction. max(max(x, y), z) == max(max(x,y), max(x,z)) radar://13723044 llvm-svn: 181141	2013-05-05 01:54:42 +00:00
Nadav Rotem	4ce060b3da	LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values. By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements. We can now vectorize this loop: int foo(int A, int B, int n) { for (int i=0; i < n; i++) { int x = 9; if (A[i] > B[i]) { if (A[i] > 19) { x = 3; } else if (B[i] < 4 ) { x = 4; } else { x = 5; } } A[i] = x; } } llvm-svn: 181037	2013-05-03 17:42:55 +00:00
Manman Ren	16649b0107	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180935	2013-05-02 18:11:35 +00:00
David Majnemer	a18dfe6b96	Add a test for the foldSelectICmpAndOr fix committed in r180779. This tests a case where C1 and C2 were the same but X and Y were different widths. llvm-svn: 180907	2013-05-02 02:44:23 +00:00
Nadav Rotem	1e211913b5	SROA: Generate selects instead of shuffles when blending values because this is the cannonical form. Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often. llvm-svn: 180875	2013-05-01 19:53:30 +00:00
Jim Grosbach	d11584a7f7	Revert "InstCombine: Fold more shuffles of shuffles." This reverts commit r180802 There's ongoing discussion about whether this is the right place to make this transformation. Reverting for now while we figure it out. llvm-svn: 180834	2013-05-01 00:25:27 +00:00
Jim Grosbach	0b914fe839	InstCombine: Fold more shuffles of shuffles. Always fold a shuffle-of-shuffle into a single shuffle when there's only one input vector in the first place. Continue to be more conservative when there's multiple inputs. rdar://13402653 PR15866 llvm-svn: 180802	2013-04-30 20:43:52 +00:00
Manman Ren	1a5ff287fd	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180796	2013-04-30 17:52:57 +00:00
David Majnemer	8d048d0482	Fix "Combine bit test + conditional or into simple math" This fixes the optimization introduced in r179748 and reverted in r179750. While the optimization was sound, it did not properly respect differences in bit-width. llvm-svn: 180777	2013-04-30 08:57:58 +00:00
Arnold Schwaighofer	474df6d3ed	SimplifyCFG: If convert single conditional stores This resurrects r179957, but adds code that makes sure we don't touch atomic/volatile stores: This transformation will transform a conditional store with a preceeding uncondtional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outway the potential case where the branch would be correctly predicted and the cost of the executing the second store would be noticably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2 % performance improvement on a ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: This optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducable below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. llvm-svn: 180731	2013-04-29 21:28:24 +00:00
Michael Gottesman	214ca90f8e	[objc-arc] Apply the RV optimization to retains next to calls in ObjCARCContract instead of ObjCARCOpts. Turning retains into retainRV calls disrupts the data flow analysis in ObjCARCOpts. Thus we move it as late as we can by moving it into ObjCARCContract. We leave in the conversion from retainRV -> retain in ObjCARCOpt since it enables the dataflow analysis. rdar://10813093 llvm-svn: 180698	2013-04-29 06:53:53 +00:00
Shuxin Yang	04a4fd43aa	Fix a XOR reassociation bug. When Reassociator optimize "(x \| C1)" ^ "(X & C2)", it may swap the two subexpressions, however, it forgot to swap cached constants (of C1 and C2) accordingly. rdar://13739160 llvm-svn: 180676	2013-04-27 18:02:12 +00:00
Michael Gottesman	b33b6cb84a	[objc-arc] Test cleanups. Mainly adding paranoid checks for the closing brace of a function to help with FileCheck error readability. Also some other minor changes. No actual CHECK changes. llvm-svn: 180668	2013-04-27 05:25:54 +00:00
Nadav Rotem	13306816fc	LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes. llvm-svn: 180593	2013-04-26 05:08:59 +00:00
Nadav Rotem	f43cbeee15	LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers. llvm-svn: 180570	2013-04-25 19:55:03 +00:00
Arnold Schwaighofer	a6578f7056	LoopVectorize: Scalarize padded types This patch disables memory-instruction vectorization for types that need padding bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on x86_64. Because the load/store vectorization is performed by the bit casting to a packed vector, which has incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent result for memory instructions of those types. This patch checks an equality of the AllocSize of a scalar type and allocated size for each vector element, to ensure that there is no padding bytes and the array can be read/written using vector operations. Patch by Daisuke Takahashi! Fixes PR15758. llvm-svn: 180196	2013-04-24 16:16:01 +00:00
Arnold Schwaighofer	23a0589bce	LoopVectorizer: Bail out if we don't have datalayout we need it llvm-svn: 180195	2013-04-24 16:15:58 +00:00
Nadav Rotem	71c9d6d333	LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order. This fixes a miscompilation in FreeBSD's regex library. llvm-svn: 180121	2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen	d3c90e132a	Call the potentially costly isAnnotatedParallel() only once. Made the uniform write test's checks a bit stricter. llvm-svn: 180119	2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen	6f2f66b63f	Refuse to (even try to) vectorize loops which have uniform writes, even if erroneously annotated with the parallel loop metadata. Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata" llvm-svn: 180081	2013-04-23 08:08:51 +00:00
Anat Shemer	10260a75e3	Changed back (relative to commit 179786) the operations executed when extract(cast) is transformed to cast(extract). It uses the Builder class as before. In addition the result node is added to the Worklist, so all the previous extract users will become the new scalar cast users. llvm-svn: 180045	2013-04-22 20:51:10 +00:00
David Blaikie	f55abeaf4c	Revert "Revert "PR14606: debug info imported_module support"" This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even though the debug info was clearly invalid on all of them, but this ought to fix it. llvm-svn: 179996	2013-04-22 06:12:31 +00:00
Benjamin Kramer	0212dc27ed	SROA: Don't crash on a select with two identical operands. This is an edge case that can happen if we modify a chain of multiple selects. Update all operands in that case and remove the assert. PR15805. llvm-svn: 179982	2013-04-21 17:48:39 +00:00
Arnold Schwaighofer	6eb32b31bd	Revert "SimplifyCFG: If convert single conditional stores" There is the temptation to make this tranform dependent on target information as it is not going to be beneficial on all (sub)targets. Therefore, we should probably do this in MI Early-Ifconversion. This reverts commit r179957. Original commit message: "SimplifyCFG: If convert single conditional stores This transformation will transform a conditional store with a preceeding uncondtional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outway the potential case were the branch would be correctly predicted and the cost of the executing the second store would be noticably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2 % performance improvement on a ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: This optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducable below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the builtbots and will revert this if anything unexpected comes up." llvm-svn: 179980	2013-04-21 13:09:04 +00:00
Nadav Rotem	c57af326a4	SLPVectorize: Add support for vectorization of casts. llvm-svn: 179975	2013-04-21 08:05:59 +00:00
Michael Gottesman	d5b701faf1	[objc-arc] Cleaned up tail-call-invariant-enforcement.ll. Specifically: 1. Added checks that unwind is being properly added to various instructions. 2. Fixed the declaration/calling of objc_release to have a return type of void. 3. Moved all checks to precede the functions and added checks to ensure that the checks would only match inside the specific function that we are attempting to check. llvm-svn: 179973	2013-04-21 02:59:44 +00:00
Michael Gottesman	77aa946321	[objc-arc] Check that objc-arc-expand properly handles all strictly forwarding calls and does not touch calls which are not strictly forwarding (i.e. objc_retainBlock). llvm-svn: 179972	2013-04-21 01:57:46 +00:00
Michael Gottesman	524052fec1	[objc-arc] Renamed the test file clang-arc-used-intrinsic-removed-if-isolated.ll -> intrinsic-use-isolated.ll to match the other test file intrinsic-use.ll. llvm-svn: 179971	2013-04-21 01:42:24 +00:00
Nadav Rotem	8aca44a623	Fix PR15800. Do not try to vectorize vectors and structs. llvm-svn: 179960	2013-04-20 22:29:43 +00:00
Arnold Schwaighofer	3546ccf465	SimplifyCFG: If convert single conditional stores This transformation will transform a conditional store with a preceeding uncondtional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outway the potential case were the branch would be correctly predicted and the cost of the executing the second store would be noticably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2 % performance improvement on a ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: This optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducable below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the builtbots and will revert this if anything unexpected comes up. llvm-svn: 179957	2013-04-20 21:42:09 +00:00
Nuno Lopes	36e827602a	recommit tests llvm-svn: 179955	2013-04-20 17:39:52 +00:00
Nadav Rotem	83c7c41bc2	SLPVectorizer: Improve the cost model for loop invariant broadcast values. llvm-svn: 179930	2013-04-20 06:13:47 +00:00
Benjamin Kramer	630e6e1422	MergeFunc: Make pointer and integer types generate the same hash. The logic that actually compares the types considers pointers and integers the same if they are of the same size. This created a strange mismatch between hash and reality and made the test case for this fail on some platforms (yay, test cases). llvm-svn: 179905	2013-04-19 23:06:44 +00:00
Bill Wendling	24e8a0d5f0	Make variable match any name. llvm-svn: 179903	2013-04-19 22:30:43 +00:00
Bill Wendling	81c8cf5ef9	Try explicitly setting the target triple to see if this gets it to pass on ARM. llvm-svn: 179890	2013-04-19 21:24:51 +00:00
Chad Rosier	11ebe05643	Attempt to pacify this test for the buildbots. llvm-svn: 179874	2013-04-19 19:27:33 +00:00
Bill Wendling	b670649067	Add test to make sure that a int-to-ptr can be merged correctly. llvm-svn: 179869	2013-04-19 18:16:06 +00:00
Benjamin Kramer	ec1bb4fdaf	ConstantFolding: ComputeMaskedBits wants the scalar size for vectors. Fixes PR15791. llvm-svn: 179859	2013-04-19 16:56:24 +00:00
Jakub Staszak	9b59d14fc4	Revert 179826. Tests were worthless. llvm-svn: 179845	2013-04-19 09:32:30 +00:00
Eric Christopher	0e89ade8ff	Revert "PR14606: debug info imported_module support" This reverts commit r179836 as it seems to have caused test failures. llvm-svn: 179840	2013-04-19 07:47:16 +00:00
David Blaikie	88564f3cf7	PR14606: debug info imported_module support Adding another CU-wide list, in this case of imported_modules (since they should be relatively rare, it seemed better to add a list where each element had a "context" value, rather than add a (usually empty) list to every scope). This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll need to expand this to cover DW_TAG_imported_declaration too. llvm-svn: 179836	2013-04-19 06:57:04 +00:00
Jakub Staszak	2c1daf75b9	Don't run expensive -O2 and -O3 in tests. llvm-svn: 179825	2013-04-19 01:10:45 +00:00
Anat Shemer	5570318f43	In the function InstCombiner::visitExtractElementInst() removed the limitation that extract is promoted over a cast only if the cast has only one use. llvm-svn: 179786	2013-04-18 19:56:44 +00:00
Anat Shemer	0c95efad7e	Added a function scalarizePHI() that sclarizes a vector phi instruction if it has only 2 uses: one to promote the vector phi in a loop and the other use is an extract operation of one element at a constant location. llvm-svn: 179783	2013-04-18 19:35:39 +00:00
Arnold Schwaighofer	4cd6aa110c	LoopVectorizer: Recognize min/max reductions A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just as any other commutative binary instruction and reduce it. This appears to help bzip2 by about 1.5% on an imac12,2. radar://12960601 llvm-svn: 179773	2013-04-18 17:22:34 +00:00
Benjamin Kramer	8df2cfb858	LoopVectorize: Use a set to avoid longer cycles in the reduction chain too. Fixes PR15748. llvm-svn: 179757	2013-04-18 14:29:13 +00:00
David Majnemer	81af06e003	Revert "Combine bit test + conditional or into simple math" It is causing stage2 builds to fail, let's get them running again. llvm-svn: 179750	2013-04-18 08:42:33 +00:00
David Majnemer	bdf0caf6b1	Combine bit test + conditional or into simple math Simplify: (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) Into: (or (shl (and X, C1), C3), y) Where: C3 = Log(C2) - Log(C1) If: C1 and C2 are both powers of two llvm-svn: 179748	2013-04-18 07:30:07 +00:00
Michael Gottesman	323964ca9e	[objc-arc] Do not mismatch up retains inside a for loop with releases outside said for loop in the presense of differing provenance caused by escaping blocks. This occurs due to an alloca representing a separate ownership from the original pointer. Thus consider the following pseudo-IR: objc_retain(%a) for (...) { objc_retain(%a) %block <- %a F(%block) objc_release(%block) } objc_release(%a) From the perspective of the optimizer, the %block is a separate provenance from the original %a. Thus the optimizer pairs up the inner retain for %a and the outer release from %a, resulting in segfaults. This is fixed by noting that the signature of a mismatch of retain/releases inside the for loop is a Use/CanRelease top down with an None bottom up (since bottom up the Retain-CanRelease-Use-Release sequence is completed by the inner objc_retain, but top down due to the differing provenance from the objc_release said sequence is not completed). In said case in CheckForCFGHazards, we now clear the state of %a implying that no pairing will occur. Additionally a test case is included. rdar://12969722 llvm-svn: 179747	2013-04-18 05:39:45 +00:00
Michael Gottesman	a15ab25238	Streamline arc-annotation test (removing some cases which do not add any extra coverage) and set it up to use FileCheck variables to make the test more robust. llvm-svn: 179745	2013-04-18 04:34:06 +00:00
Peter Collingbourne	37ae72b508	Do not optimise fprintf() calls if its return value is used. Differential Revision: http://llvm-reviews.chandlerc.com/D620 llvm-svn: 179661	2013-04-17 02:01:10 +00:00
Hans Wennborg	c9e1d99279	simplifycfg: Fix integer overflow converting switch into icmp. If a switch instruction has a case for every possible value of its type, with the same successor, SimplifyCFG would replace it with an icmp ult, but the computation of the bound overflows in that case, which inverts the test. Patch by Jed Davis! llvm-svn: 179587	2013-04-16 08:35:36 +00:00
Bill Wendling	3789171972	We are not able to bitcast a pointer to an integral value. Two return types are not equivalent if one is a pointer and the other is an integral. This is because we cannot bitcast a pointer to an integral value. PR15185 llvm-svn: 179569	2013-04-15 22:33:50 +00:00
Nadav Rotem	b9116e6966	SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops. llvm-svn: 179562	2013-04-15 22:00:26 +00:00
Eric Christopher	13637e900e	Revert "Recommit r179497 after fixing uninitialized variable." until I can fix the testcases here: http://lab.llvm.org:8011/builders/clang-native-arm-cortex-a9/builds/6952 This reverts commit r179512 due to testcases specifying triples that they didn't actually mean and causing failures on other platforms. llvm-svn: 179513	2013-04-15 07:31:37 +00:00
Eric Christopher	fc2beaa136	Recommit r179497 after fixing uninitialized variable. llvm-svn: 179512	2013-04-15 07:07:21 +00:00
Nadav Rotem	5d393c416f	SLPVectorizer: Add support for vectorizing trees that start at compare instructions. llvm-svn: 179504	2013-04-15 04:25:27 +00:00
Eric Christopher	1f140317e3	Revert "Remove some unused triple and data layout." This reverts commit r179497 and the accompanying commit as it broke random platforms that aren't osx. llvm-svn: 179499	2013-04-14 23:35:36 +00:00
Eric Christopher	4eebd14ad0	Remove some unused triple and data layout. llvm-svn: 179498	2013-04-14 23:32:44 +00:00
David Majnemer	1fae195557	Reorders two transforms that collide with each other One performs: (X == 13 \| X == 14) -> X-13 <u 2 The other: (A == C1 \|\| A == C2) -> (A & ~(C1 ^ C2)) == C1 The problem is that there are certain values of C1 and C2 that trigger both transforms but the first one blocks out the second, this generates suboptimal code. Reordering the transforms should be better in every case and allows us to do interesting stuff like turn: %shr = lshr i32 %X, 4 %and = and i32 %shr, 15 %add = add i32 %and, -14 %tobool = icmp ne i32 %add, 0 into: %and = and i32 %X, 240 %tobool = icmp ne i32 %and, 224 llvm-svn: 179493	2013-04-14 21:15:43 +00:00
Nadav Rotem	6ebddae118	Make the command line triple match the module triple. llvm-svn: 179492	2013-04-14 20:13:05 +00:00
Nadav Rotem	029208ceeb	Remove unused function attributes. llvm-svn: 179476	2013-04-14 05:47:04 +00:00
Nadav Rotem	54b413d157	SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree. llvm-svn: 179475	2013-04-14 05:15:53 +00:00
Nadav Rotem	0b9cf8567b	SLPVectorizer: add initial support for reduction variable vectorization. llvm-svn: 179470	2013-04-14 03:22:20 +00:00
Benjamin Kramer	adc1727c39	GlobalDCE: Fix an oversight in my last commit that could lead to crashes. There is a Constant with non-constant operands: blockaddress. llvm-svn: 179460	2013-04-13 16:11:14 +00:00
Benjamin Kramer	89ca4bc6d4	Fix a scalability issue with complex ConstantExprs. This is basically the same fix in three different places. We use a set to avoid walking the whole tree of a big ConstantExprs multiple times. For example: (select cmp, (add big_expr 1), (add big_expr 2)) We don't want to visit big_expr twice here, it may consist of thousands of nodes. The testcase exercises this by creating an insanely large ConstantExprs out of a loop. It's questionable if the optimizer should ever create those, but this can be triggered with real C code. Fixes PR15714. llvm-svn: 179458	2013-04-13 12:53:18 +00:00
Benjamin Kramer	e89c705030	InstCombine: Check the operand types before merging fcmp ord & fcmp ord. Fixes PR15737. llvm-svn: 179417	2013-04-12 21:56:23 +00:00
Nadav Rotem	8543ba3e52	SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from. llvm-svn: 179414	2013-04-12 21:16:54 +00:00
Nadav Rotem	87a0af6e1b	CostModel: increase the default cost of supported floating point operations from 1 to two. Fixed a few tests that changes because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles. llvm-svn: 179413	2013-04-12 21:15:03 +00:00
David Majnemer	1a08accbb7	Simplify (A & ~B) in icmp if A is a power of 2 The transform will execute like so: (A & ~B) == 0 --> (A & B) != 0 (A & ~B) != 0 --> (A & B) == 0 llvm-svn: 179386	2013-04-12 17:25:07 +00:00
Arnold Schwaighofer	f9cea17f75	LoopVectorizer: integer division is not a reduction operation Don't classify idiv/udiv as a reduction operation. Integer division is lossy. For example : (1 / 2) * 4 != 4/2. Example: int a[] = { 2, 5, 2, 2} int x = 80; for() x /= a[i]; Scalar: x /= 2 // = 40 x /= 5 // = 8 x /= 2 // = 4 x /= 2 // = 2 Vectorized: <80, 1> / <2,5> //= <40,0> <40, 0> / <2,2> //= <20,0> 20*0 = 0 radar://13640654 llvm-svn: 179381	2013-04-12 15:15:19 +00:00
David Majnemer	b81cd63c4b	Optimize icmp involving addition better Allows LLVM to optimize sequences like the following: %add = add nsw i32 %x, 1 %cmp = icmp sgt i32 %add, %y into: %cmp = icmp sge i32 %x, %y as well as: %add1 = add nsw i32 %x, 20 %add2 = add nsw i32 %y, 57 %cmp = icmp sge i32 %add1, %add2 into: %add = add nsw i32 %y, 37 %cmp = icmp sle i32 %cmp, %x llvm-svn: 179316	2013-04-11 20:05:46 +00:00
Benjamin Kramer	a95f87494a	Fix for wrong instcombine on vector insert/extract When trying to collapse sequences of insertelement/extractelement instructions into single shuffle instructions, there is one specific case where the Instruction Combiner wrongly updates the resulting Mask of shuffle indexes. The problem is in function CollectShuffleElments. If we have a sequence of insert/extract element instructions like the one below: %tmp1 = extractelement <4 x float> %LHS, i32 0 %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1 %tmp3 = extractelement <4 x float> %RHS, i32 2 %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3 Where: . %RHS will have a mask of [4,5,6,7] . %LHS will have a mask of [0,1,2,3] The Mask of shuffle indexes is wrongly computed to [4,1,6,7] instead of [4,0,6,7]. When analyzing %tmp2 in order to compute the Mask for the resulting shuffle instruction, the algorithm forgets to update the mask index at position 1 with the index associated to the element extracted from %LHS by instruction %tmp1. Patch by Andrea DiBiagio! llvm-svn: 179291	2013-04-11 15:10:09 +00:00
Benjamin Kramer	b50682e156	Add missing colons to check lines. llvm-svn: 179277	2013-04-11 12:41:41 +00:00
Benjamin Kramer	3960c1cd56	FileCheckize a bunch of tests. llvm-svn: 179276	2013-04-11 12:32:23 +00:00
Nadav Rotem	73dffa4184	Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions. llvm-svn: 179207	2013-04-10 19:41:36 +00:00
Nadav Rotem	2d9dec322e	Add support for bottom-up SLP vectorization infrastructure. This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users: 1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]). 2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute. 3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization. This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code: void SAXPY(int x, int y, int a, int i) { x[i] = a * x[i] + y[i]; x[i+1] = a * x[i+1] + y[i+1]; x[i+2] = a * x[i+2] + y[i+2]; x[i+3] = a * x[i+3] + y[i+3]; } llvm-svn: 179117	2013-04-09 19:44:35 +00:00
Nadav Rotem	abcc64fd13	Revert r176408 and r176407 to address PR15540. llvm-svn: 179111	2013-04-09 18:16:05 +00:00
Michael Gottesman	ccc93e72e1	Converted 8x tests of SimplifyCFG to use FileCheck instead of grep. llvm-svn: 179087	2013-04-09 05:18:53 +00:00
Nadav Rotem	7b7585d153	Revert 179071 because it is not the right way to support non standard new/new[] operators. llvm-svn: 179084	2013-04-09 04:43:46 +00:00
Nadav Rotem	9dd90ac5b4	c++ new operators are not malloc-like functions because they do not return uninitialized memory. Users may overide new-operators and implement any function that they like. llvm-svn: 179071	2013-04-08 23:40:47 +00:00
Chandler Carruth	0e8a52d18f	Fix PR15674 (and PR15603): a SROA think-o. The fix for PR14972 in r177055 introduced a real think-o in the store side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of memory access! Lock down the code path in the store rewriting which would do this to only handle the intended circumstance. All of the existing tests continue to pass, and I've added a test from the PR. llvm-svn: 178974	2013-04-07 11:47:54 +00:00
Michael Gottesman	31ba23aa56	An objc_retain can serve as a use for a different pointer. This is the counterpart to commit r160637, except it performs the action in the bottomup portion of the data flow analysis. llvm-svn: 178922	2013-04-05 22:54:32 +00:00
Michael Gottesman	1d8d25777d	Properly model precise lifetime when given an incomplete dataflow sequence. The normal dataflow sequence in the ARC optimizer consists of the following states: Retain -> CanRelease -> Use -> Release The optimizer before this patch stored the uses that determine the lifetime of the retainable object pointer when it bottom up hits a retain or when top down it hits a release. This is correct for an imprecise lifetime scenario since what we are trying to do is remove retains/releases while making sure that no ``CanRelease'' (which is usually a call) deallocates the given pointer before we get to the ``Use'' (since that would cause a segfault). If we are considering the precise lifetime scenario though, this is not correct. In such a situation, we DO care about the previous sequence, but additionally, we wish to track the uses resulting from the following incomplete sequences: Retain -> CanRelease -> Release (TopDown) Retain <- Use <- Release (BottomUp) NOTE This patch looks large but the most of it consists of updating test cases. Additionally this fix exposed an additional bug. I removed the test case that expressed said bug and will recommit it with the fix in a little bit. llvm-svn: 178921	2013-04-05 22:54:28 +00:00
Shuxin Yang	95adf5258f	Disable the optimization about promoting vector-element-access with symbolic index. This optimization is unstable at this moment; it 1) block us on a very important application 2) PR15200 3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll (the CHECK command compare the output against wrong result) I personally believe this optimization should not have any impact on the autovectorized code, as auto-vectorizer is supposed to put gather/scatter in a "right" way. Although in theory downstream optimizaters might reveal some gather/scatter optimization opportunities, the chance is quite slim. For the hand-crafted vectorizing code, in term of redundancy elimination, load-CSE, copy-propagation and DSE can collectively achieve the same result, but in much simpler way. On the other hand, these optimizers are able to improve the code in a incremental way; in contrast, SROA is sort of all-or-none approach. However, SROA might slighly win in stack size, as it tries to figure out a stretch of memory tightenly cover the area accessed by the dynamic index. rdar://13174884 PR15200 llvm-svn: 178912	2013-04-05 21:07:08 +00:00
Arnold Schwaighofer	df6f67ed87	LoopVectorizer: Pass OperandValueKind information to the cost model Pass down the fact that an operand is going to be a vector of constants. This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my pervious shift cost change that made all shifts expensive on x86. radar://13576547 llvm-svn: 178809	2013-04-04 23:26:27 +00:00
Michael Gottesman	b8c8836594	Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue. The semantics of ARC implies that a pointer passed into an objc_autorelease must live until some point (potentially down the stack) where an autorelease pool is popped. On the other hand, an objc_autoreleaseReturnValue just signifies that the object must live until the end of the given function at least. Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in terms of the semantics of ARC* implying that performing the given strength reduction without any knowledge of how this relates to the autorelease pool pop that is further up the stack violates the semantics of ARC. *Even though objc_autoreleaseReturnValue if you know that no RV optimization will occur is more computationally expensive. llvm-svn: 178612	2013-04-03 02:57:24 +00:00
Bill Wendling	88d06c3b2d	Use a worklist to avoid a sneaky iterator invalidation. The iterator could be invalidated when it's recursively deleting a whole bunch of constant expressions in a constant initializer. Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was run on a `.ll' file, it wouldn't crash. This is why the test first pushes the `.ll' file through `llvm-as' before feeding it to `opt'. PR15440 llvm-svn: 178531	2013-04-02 08:16:45 +00:00
Shuxin Yang	6662fd0f15	Correct assertion condition llvm-svn: 178484	2013-04-01 18:13:05 +00:00
Benjamin Kramer	52ceb44331	X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts. llvm-svn: 178459	2013-04-01 10:23:49 +00:00
Shuxin Yang	7b0c94e207	Implement XOR reassociation. It is based on following rules: rule 1: (x \| c1) ^ c2 => (x & ~c1) ^ (c1^c2), only useful when c1=c2 rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2)) rule 3: (x \| c1) ^ (x \| c2) = (x & c3) ^ c3 where c3 = c1 ^ c2 rule 4: (x \| c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2 It reduces an application's size (in terms of # of instructions) by 8.9%. Reviwed by Pete Cooper. Thanks a lot! rdar://13212115 llvm-svn: 178409	2013-03-30 02:15:01 +00:00
Michael Gottesman	9412830090	Updated test0 of retain-not-declared.ll to reflect the fact that objc-arc-expand runs before objc-arc/objc-arc-contract. Specifically, objc-arc-expand will make sure that the objc_retainAutoreleasedReturnValue, objc_autoreleaseReturnValue, and ret will all have %call as an argument. llvm-svn: 178382	2013-03-29 22:44:59 +00:00
Michael Gottesman	3b8f877860	Add clang.arc.used to ModuleHasARC so ARC always runs if said call is present in a module. clang.arc.used is an interesting call for ARC since ObjCARCContract needs to run to remove said intrinsic to avoid a linker error (since the call does not exist). llvm-svn: 178369	2013-03-29 21:15:23 +00:00
Michael Gottesman	49f9885a2a	Non optimizable objc_retainBlock calls are not forwarding. Since we handle optimizable objc_retainBlocks through strength reduction in OptimizableIndividualCalls, we know that all code after that point will only see non-optimizable objc_retainBlock calls. IsForwarding is only called by functions after that point, so it is ok to just classify objc_retainBlock as non-forwarding. <rdar://problem/13249661>. llvm-svn: 178285	2013-03-28 20:11:30 +00:00
Michael Gottesman	158fdf699e	[ObjCARC] Strength reduce objc_retainBlock -> objc_retain if the objc_retainBlock is optimizable. If an objc_retainBlock has the copy_on_escape metadata attached to it AND if the block pointer argument only escapes down the stack, we are allowed to strength reduce the objc_retainBlock to to an objc_retain and thus optimize it. Current there is logic in the ARC data flow analysis to handle this case which is complicated and involved making distinctions in between objc_retainBlock and objc_retain in certain places and considering them the same in others. This patch simplifies said code by: 1. Performing the strength reduction in the initial ARC peephole analysis (ObjCARCOpts::OptimizeIndividualCalls). 2. Changes the ARC dataflow analysis (which runs after the peephole analysis) to consider all objc_retainBlock calls to not be optimizable (since if the call was optimizable, we would have strength reduced it already). This patch leaves in the infrastructure in the ARC dataflow analysis to handle this case, which due to 2 will just be dead code. I am doing this on purpose to separate the removal of the old code from the testing of the new code. <rdar://problem/13249661>. llvm-svn: 178284	2013-03-28 20:11:19 +00:00
Akira Hatanaka	19468cafad	Remove -O3. llvm-svn: 178278	2013-03-28 19:34:14 +00:00
David Blaikie	5692e72f30	Revert "Adding DIImportedModules to DIScopes." This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7. Turns out we're going with a different schema design to represent DW_TAG_imported_modules so we won't need this extra field. llvm-svn: 178215	2013-03-28 02:44:59 +00:00
Akira Hatanaka	99866dd535	Check if Type is a vector before calling function Type::getVectorNumElements. llvm-svn: 178208	2013-03-28 01:28:02 +00:00
Michael Gottesman	46ebe53ead	Added back in the test for arc-annotations. The test was removed since I had not turned off the test during release builds. This fails since ARC annotations support is conditionally compiled out during release builds. I added the proper requires header to assuage this issue. llvm-svn: 178101	2013-03-27 00:09:58 +00:00
David Blaikie	a26d70358f	Adding DIImportedModules to DIScopes. This is just the basic groundwork for supporting DW_TAG_imported_module but I wanted to commit this before pushing support further into Clang or LLVM so that this rather churny change is isolated from the rest of the work. The major churn here is obviously adding another field (within the common DIScope prefix) to all DIScopes (files, classes, namespaces, lexical scopes, etc). This should be the last big churny change needed for DW_TAG_imported_module/using directive support/PR14606. llvm-svn: 178099	2013-03-27 00:07:26 +00:00
Ulrich Weigand	b1e02b2af2	Add test case for commit r178031. llvm-svn: 178038	2013-03-26 17:30:02 +00:00
Bill Wendling	1ee6d8cef4	Remove testcase. It's failing on some platforms but not others. llvm-svn: 177956	2013-03-26 01:10:03 +00:00
Bill Wendling	d78cfa81be	Hmm...not failing...odd llvm-svn: 177955	2013-03-26 01:08:02 +00:00
Bill Wendling	a5a44d4fd6	Temporarily XFAIL this test until Michael can look at it. llvm-svn: 177953	2013-03-26 00:46:31 +00:00
Michael Gottesman	cd4de0f9bb	[ObjCARC Annotations] Added support for displaying the state of pointers at the bottom/top of BBs of the ARC dataflow analysis for both bottomup and topdown analyses. This will allow for verification and analysis of the merge function of the data flow analyses in the ARC optimizer. The actual implementation of this feature is by introducing calls to the functions llvm.arc.annotation.{bottomup,topdown}.{bbstart,bbend} which are only declared. Each such call takes in a pointer to a global with the same name as the pointer whose provenance is being tracked and a pointer whose name is one of our Sequence states and points to a string that contains the same name. To ensure that the optimizer does not consider these annotations in any way, I made it so that the annotations are considered to be of IC_None type. A test case is included for this commit and the previous ObjCARCAnnotation commit. llvm-svn: 177952	2013-03-26 00:42:09 +00:00
John McCall	a237239097	Add an optimizer-side test case for ARC bug <rdar://13195034>, fixed in the frontend with @clang.arc.use. llvm-svn: 177928	2013-03-25 22:09:52 +00:00
Shuxin Yang	389ed4b8f7	Fix a bug in fast-math fadd/fsub simplification. The problem is that the code mistakenly took for granted that following constructor is able to create an APFloat from a SIGNED integer: APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value) rdar://13486998 llvm-svn: 177906	2013-03-25 20:43:41 +00:00
NAKAMURA Takumi	951a9b169b	Disable, for now, llvm/test/Transforms/GCOVProfiling on win32. I'll investigate them later. llvm-svn: 177894	2013-03-25 19:47:20 +00:00
Arnaud A. de Grandmaison	3ee88e8a77	Address issues found by Duncan during post-commit review of r177856. llvm-svn: 177863	2013-03-25 11:47:38 +00:00
Arnaud A. de Grandmaison	9c383d68cf	InstCombine: simplify comparisons to zero of (shl %x, Cst) or (mul %x, Cst) This simplification happens at 2 places : - using the nsw attribute when the shl / mul is used by a sign test - when the shl / mul is compared for (in)equality to zero llvm-svn: 177856	2013-03-25 09:48:49 +00:00
John McCall	20182ac0c7	Kill every call to @clang.arc.use in the ARC contract phase. llvm-svn: 177769	2013-03-22 21:38:36 +00:00
Bill Wendling	d96a7a6be8	Update test. There may be multiple catches, but those will be cleaned up. llvm-svn: 177758	2013-03-22 20:36:39 +00:00
Arnaud A. de Grandmaison	f364bc63e7	InstCombine: Improve the result bitvect type when folding (cmp pred (load (gep GV, i)) C) to a bit test. The original code used i32, and i64 if legal. This introduced unneeded casts when they aren't legal, or when the index variable i has another type. In order of preference: try to use i's type; use the smallest fitting legal type (using an added DataLayout method); default to i32. A testcase checks that this works when the index gep operand is i16. Patch by : Ahmed Bougacha <ahmed.bougacha@gmail.com> Reviewed by : Duncan llvm-svn: 177712	2013-03-22 08:25:01 +00:00
David Blaikie	0d7d62e4b2	Move the DIFile in DISubprogram to the beginning to be a common prefix along with other DIScopes llvm-svn: 177674	2013-03-21 22:29:36 +00:00
David Blaikie	cc8d090163	Remove unused field in DISubprogram llvm-svn: 177661	2013-03-21 20:28:52 +00:00
Bill Wendling	5b98172115	Update some EH tests that were violating the new EH model. The landingpad instruction needs to be the first non-PHI instruction in the unwind destination block. llvm-svn: 177650	2013-03-21 18:30:10 +00:00
Meador Inge	6b6a161ccf	Move library call prototype attribute inference to functionattrs The simplify-libcalls pass implemented a doInitialization hook to infer function prototype attributes for well-known functions. Given that the simplify-libcalls pass is going away and that the functionattrs pass is already in place to deduce function attributes, I am moving this logic to the functionattrs pass. This approach was discussed during patch review: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html. llvm-svn: 177619	2013-03-21 00:55:59 +00:00
David Blaikie	43a729d165	Remove unused field in DICompileUnit llvm-svn: 177590	2013-03-20 22:34:33 +00:00
Nick Lewycky	2b7fe9fc93	Don't assume the test directory is writable, use %T to find a writable directory. llvm-svn: 177488	2013-03-20 05:59:40 +00:00
Arnaud A. de Grandmaison	87c473f0d1	IndVarSimplify: do not recompute an IV value outside of the loop if : - it is trivially known to be used inside the loop in a way that can not be optimized away - there is no use outside of the loop which can take advantage of the computation hoisting llvm-svn: 177432	2013-03-19 20:00:22 +00:00
Nick Lewycky	d67186337a	Emit the linkage name instead of the function name, when available. This means that we'll prefer to emit the mangled C++ name (pending a clang change). llvm-svn: 177371	2013-03-19 01:37:55 +00:00
Manman Ren	1217112d11	Check whether a pointer is non-null (isKnownNonNull) in isKnownNonZero. This handles the case where we have an inbounds GEP with alloca as the pointer. This fixes the regression in PR12750 and rdar://13286434. Note that we can also fix this by handling some GEP cases in isKnownNonNull. llvm-svn: 177321	2013-03-18 21:23:25 +00:00
David Tweed	d505b24277	Initially forgotten-to-svn-add test case for r177279. llvm-svn: 177280	2013-03-18 12:07:24 +00:00
Michael Gottesman	a8b60a4fda	Reduced dont-infinite-loop-during-block-escape-analysis.ll with bugpoint and moved it to retain-block-escape-analysis.ll. NOTE I verified that the original bug behind dont-infinite-loop-during-block-escape-analysis.ll occurs when using opt on retain-block-escape-analysis.ll. llvm-svn: 177240	2013-03-17 21:31:12 +00:00
David Blaikie	8fb8224578	Split out filename & directory from DIFile to start generalizing over DIScopes This is the first step to making all DIScopes have a common metadata prefix (so that things (using directives, for example) that can appear in any scope can be added to that common prefix). DIFile is itself a DIScope so the common prefix of all DIScopes cannot be a DIFile - instead it's the raw filename/directory name pair. llvm-svn: 177239	2013-03-17 21:13:55 +00:00
David Blaikie	2e488d1f0d	Generalize debug info test to be resilient to changes in metadata node numbering llvm-svn: 177238	2013-03-17 21:08:22 +00:00
Michael Gottesman	9782183126	The promised test case for r175939. This test makes sure that the ObjCARC escape analysis looks at the uses of instructions which copy the block pointer value by checking all four cases where that can occur. llvm-svn: 177232	2013-03-17 08:42:58 +00:00
Arnold Schwaighofer	9b55e31bcb	LoopVectorizer: Insert some white space to make test case more readable Also remove some unneeded function attributes. llvm-svn: 177114	2013-03-14 21:31:09 +00:00
Arnold Schwaighofer	4991ce9d49	Add missing asserts flag to test - it uses debug flags llvm-svn: 177102	2013-03-14 19:01:58 +00:00
Arnold Schwaighofer	c63cf3a0ae	LoopVectorize: Invert case when we use a vector cmp value to query select cost We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around. llvm-svn: 177098	2013-03-14 18:54:36 +00:00
Shuxin Yang	2eca602f8b	Perform factorization as a last resort of unsafe fadd/fsub simplification. Rules include: 1)1 xy +/- xz => x*(y +/- z) (the order of operands dosen't matter) 2) y/x +/- z/x => (y +/- z)/x The transformation is disabled if the new add/sub expr "y +/- z" is a denormal/naz/inifinity. rdar://12911472 llvm-svn: 177088	2013-03-14 18:08:26 +00:00
Chandler Carruth	a1c54bbe34	PR14972: SROA vs. GVN exposed a really bad bug in SROA. The fundamental problem is that SROA didn't allow for overly wide loads where the bits past the end of the alloca were masked away and the load was sufficiently aligned to ensure there is no risk of page fault, or other trapping behavior. With such widened loads, SROA would delete the load entirely rather than clamping it to the size of the alloca in order to allow mem2reg to fire. This was exposed by a test case that neatly arranged for GVN to run first, widening certain loads, followed by an inline step, and then SROA which miscompiles the code. However, I see no reason why this hasn't been plaguing us in other contexts. It seems deeply broken. Diagnosing all of the above took all of 10 minutes of debugging. The really annoying aspect is that fixing this completely breaks the pass. ;] There was an implicit reliance on the fact that no loads or stores extended past the alloca once we decided to rewrite them in the final stage of SROA. This was used to encode information about whether the loads and stores had been split across multiple partitions of the original alloca. That required threading explicit tracking of whether a use of a partition is split across multiple partitions. Once that was done, another problem arose: we allowed splitting of integer loads and stores iff they were loads and stores to the entire alloca. This is a really arbitrary limitation, and splitting at least some integer loads and stores is crucial to maximize promotion opportunities. My first attempt was to start removing the restriction entirely, but currently that does Very Bad Things by causing many common alloca patterns to be fully decomposed into i8 operations and lots of or-ing together to produce larger integers on demand. The code bloat is terrifying. That is still the right end-goal, but substantial work must be done to either merge partitions or ensure that small i8 values are eagerly merged in some other pass. Sadly, figuring all this out took essentially all the time and effort here. So the end result is that we allow splitting only when the load or store at least covers the alloca. That ensures widened loads and stores don't hurt SROA, and that we don't rampantly decompose operations more than we have previously. All of this was already fairly well tested, and so I've just updated the tests to cover the wide load behavior. I can add a test that crafts the pass ordering magic which caused the original PR, but that seems really brittle and to provide little benefit. The fundamental problem is that widened loads should Just Work. llvm-svn: 177055	2013-03-14 11:32:24 +00:00

1 2 3 4 5 ...

3778 Commits