llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	9db26ffc9a	Carry facts about nullness and undef across GC relocation This change implements four basic optimizations: If a relocated value isn't used, it doesn't need to be relocated. If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.) If the value being relocated is undef, the relocation is meaningless. If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.) I outlined other planned work in comments. Differential Revision: http://reviews.llvm.org/D6600 llvm-svn: 224968	2014-12-29 23:27:30 +00:00
Philip Reames	b35f46ce06	Refine the notion of MayThrow in LICM to include a header specific version In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up. This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs. define void @nothrow_header(i64 %x, i64 %y, i1 %cond) { ; CHECK-LABEL: nothrow_header ; CHECK-LABEL: entry ; CHECK: %div = udiv i64 %x, %y ; CHECK-LABEL: loop ; CHECK: call void @use(i64 %div) entry: br label %loop loop: ; preds = %entry, %for.inc %div = udiv i64 %x, %y br i1 %cond, label %loop-if, label %exit loop-if: call void @use(i64 %div) br label %loop exit: ret void } The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load. The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata. Differential Revision: http://reviews.llvm.org/D6725 llvm-svn: 224965	2014-12-29 23:00:57 +00:00
Philip Reames	5ad26c353c	Loading from null is valid outside of addrspace 0 This patches fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen. This transform is perfectly legal in address space 0, but is not neccessarily legal in other address spaces. We really should introduce a hook to control this property on a per target per address space basis. We may be loosing valuable optimizations in some address spaces by being too conservative. Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me. llvm-svn: 224961	2014-12-29 22:46:21 +00:00
David Majnemer	b1296ec0fd	InstCombine: Infer nuw for multiplies A multiply cannot unsigned wrap if there are bitwidth, or more, leading zero bits between the two operands. llvm-svn: 224849	2014-12-26 09:50:35 +00:00
David Majnemer	54c2ca2539	InstCombe: Infer nsw for multiplies We already utilize this logic for reducing overflow intrinsics, it makes sense to reuse it for normal multiplies as well. llvm-svn: 224847	2014-12-26 09:10:14 +00:00
Michael Kuperstein	be8032c875	[ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits() GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects. Differential Revision: http://reviews.llvm.org/D6758 llvm-svn: 224765	2014-12-23 11:33:41 +00:00
Michael Liao	5313da3263	[SimplifyCFG] Revise common code sinking - Fix the case where more than 1 common instructions derived from the same operand cannot be sunk. When a pair of value has more than 1 derived values in both branches, only 1 derived value could be sunk. - Replace BB1 -> (BB2, PN) map with joint value map, i.e. map of (BB1, BB2) -> PN, which is more accurate to track common ops. llvm-svn: 224757	2014-12-23 08:26:55 +00:00
Bruno Cardoso Lopes	bad65c3b70	[LCSSA] Handle PHI insertion in disjoint loops Take two disjoint Loops L1 and L2. LoopSimplify fails to simplify some loops (e.g. when indirect branches are involved). In such situations, it can happen that an exit for L1 is the header of L2. Thus, when we create PHIs in one of such exits we are also inserting PHIs in L2 header. This could break LCSSA form for L2 because these inserted PHIs can also have uses in L2 exits, which are never handled in the current implementation. Provide a fix for this corner case and test that we don't assert/crash on that. Differential Revision: http://reviews.llvm.org/D6624 rdar://problem/19166231 llvm-svn: 224740	2014-12-22 22:35:46 +00:00
David Majnemer	6eed0e0d20	This should have been part of r224676. llvm-svn: 224677	2014-12-20 04:48:34 +00:00
David Majnemer	b0362e4ee6	InstCombine: Squash an icmp+select into bitwise arithmetic (X & INT_MIN) == 0 ? X ^ INT_MIN : X into X \| INT_MIN (X & INT_MIN) != 0 ? X ^ INT_MIN : X into X & INT_MAX This fixes PR21993. llvm-svn: 224676	2014-12-20 04:45:35 +00:00
David Majnemer	0b6a0b0257	InstSimplify: Optimize away pointless comparisons (X & INT_MIN) ? X & INT_MAX : X into X & INT_MAX (X & INT_MIN) ? X : X & INT_MAX into X (X & INT_MIN) ? X \| INT_MIN : X into X (X & INT_MIN) ? X : X \| INT_MIN into X \| INT_MIN llvm-svn: 224669	2014-12-20 03:04:38 +00:00
Bruno Cardoso Lopes	f6cf8ad4e5	Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. Also, fix code to also return the modified switch when only the truncation is performed. This fixes an assertion crash. Differential Revision: http://reviews.llvm.org/D6644 rdar://problem/19191835 llvm-svn: 224588	2014-12-19 17:12:35 +00:00
Sanjay Patel	ea3c802887	use -0.0 when creating an fneg instruction Backends recognize (-0.0 - X) as the canonical form for fneg and produce better code. Eg, ppc64 with 0.0: lis r2, ha16(LCPI0_0) lfs f0, lo16(LCPI0_0)(r2) fsubs f1, f0, f1 blr vs. -0.0: fneg f1, f1 blr Differential Revision: http://reviews.llvm.org/D6723 llvm-svn: 224583	2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes	3be15b2fa6	Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr" Reverts commit r224574 to appease buildbots: The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. llvm-svn: 224576	2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes	c9005f2f2b	[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. Differential Revision: http://reviews.llvm.org/D6644 rdar://problem/19191835 llvm-svn: 224574	2014-12-19 14:23:15 +00:00
David Majnemer	824e011ad7	ConstantFold: Shifting undef by zero results in undef llvm-svn: 224553	2014-12-18 23:54:43 +00:00
Suyog Sarda	43fae93da8	Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it." This was re-ordering floating point data types resulting in mismatch in output. llvm-svn: 224424	2014-12-17 10:34:27 +00:00
Elena Demikhovsky	028e966a54	Added 5 more tests related to sink store revision 224247 - by Ella Bolshinsky http://reviews.llvm.org/D6420 llvm-svn: 224418	2014-12-17 08:12:59 +00:00
Erik Eckstein	a451b9b0b5	Strength reduce intrinsics with overflow into regular arithmetic operations if possible. Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced. This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow. It completes the work on PR20194. llvm-svn: 224417	2014-12-17 07:29:19 +00:00
David Majnemer	65c52ae8ca	InstSimplify: shl nsw/nuw undef, %V -> undef We can always choose an value for undef which might cause %V to shift out an important bit except for one case, when %V is zero. However, shl behaves like an identity function when the right hand side is zero. llvm-svn: 224405	2014-12-17 01:54:33 +00:00
Elena Demikhovsky	f5b72afff4	Masked Load and Store Intrinsics in loop vectorizer. The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics. This decision is target dependent. http://reviews.llvm.org/D6527 llvm-svn: 224334	2014-12-16 11:50:42 +00:00
Sanjoy Das	4555b6d870	Teach ScalarEvolution to exploit min and max expressions when proving isKnownPredicate. The motivation for this change is to optimize away checks in loops like this: limit = min(t, len) for (i = 0 to limit) if (i >= len \|\| i < 0) throw_array_of_of_bounds(); a[i] = ... Differential Revision: http://reviews.llvm.org/D6635 llvm-svn: 224285	2014-12-15 22:50:15 +00:00
Duncan P. N. Exon Smith	be7ea19b58	IR: Make metadata typeless in assembly Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257	2014-12-15 19:07:53 +00:00
Elena Demikhovsky	bf74736290	Added a test related to 224247 revision llvm-svn: 224248	2014-12-15 14:14:10 +00:00
Suyog Sarda	2b27fc78a2	Typo Correction in Test Case. NFC. llvm-svn: 224244	2014-12-15 12:19:46 +00:00
Ahmed Bougacha	0cb861634b	Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores." r223862 tried to also combine base-updating load/stores. r224198 reverted it, as "it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown." Reapply, with a fix to ignore non-normal load/stores. Truncstores are handled elsewhere (you can actually write a pattern for those, whereas for postinc loads you can't, since they return two values), but it should be possible to also combine extloads base updates, by checking that the memory (rather than result) type is of the same size as the addend. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). Differential Revision: http://reviews.llvm.org/D6585 llvm-svn: 224203	2014-12-13 23:22:12 +00:00
Renato Golin	df8f9b6dc9	Revert "[ARM] Combine base-updating/post-incrementing vector load/stores." This reverts commit r223862, as it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown. We'll investigate the issue and re-apply when safe. llvm-svn: 224198	2014-12-13 20:23:18 +00:00
David Majnemer	9b6097586c	ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume Respect the MaxDepth recursion limit, doing otherwise will trigger an assert in computeKnownBits. This fixes PR21891. llvm-svn: 224168	2014-12-12 23:59:29 +00:00
Suyog Sarda	384095e65c	This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it. Test case : float hadd(float* a) { return (a[0] + a[1]) + (a[2] + a[3]); } AArch64 assembly before patch : ldp s0, s1, [x0] ldp s2, s3, [x0, #8] fadd s0, s0, s1 fadd s1, s2, s3 fadd s0, s0, s1 ret AArch64 assembly after patch : ldp d0, d1, [x0] fadd v0.2s, v0.2s, v1.2s faddp s0, v0.2s ret Reviewed Link : http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html llvm-svn: 224119	2014-12-12 12:53:44 +00:00
Steven Wu	881916dea5	Fix another infinite loop in InstCombine Summary: InstCombine infinite-loops for the testcase added It is because InstCombine is generating instructions that can be optimized by itself. Fix by not optimizing frem if the optimized type is the same as original type. rdar://problem/19150820 Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6634 llvm-svn: 224097	2014-12-12 04:34:07 +00:00
Andrea Di Biagio	72b05aa59c	[InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi. This patch teaches the instruction combiner how to fold a call to 'insertqi' if the 'length field' (3rd operand) is set to zero, and if the sum between field 'length' and 'bit index' (4th operand) is bigger than 64. From the AMD64 Architecture Programmer's Manual: 1. If the sum of the bit index + length field is greater than 64, then the results are undefined; 2. A value of zero in the field length is defined as a length of 64. This patch improves the existing combining logic for intrinsic 'insertqi' adding extra checks to address both point 1. and point 2. Differential Revision: http://reviews.llvm.org/D6583 llvm-svn: 224054	2014-12-11 20:44:59 +00:00
David Majnemer	f532fcb889	InstSimplify: Remove usesless %a parameter from tests No functional change intended. llvm-svn: 224016	2014-12-11 12:56:17 +00:00
Michael Kuperstein	fffb6996c9	The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value. Patch by Amjad Aboud Differential Revision: http://reviews.llvm.org/D6525 llvm-svn: 224015	2014-12-11 12:41:10 +00:00
David Majnemer	5a7717e498	ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0 Zero is usually a nicer constant to have than -1. llvm-svn: 223969	2014-12-10 21:58:15 +00:00
David Majnemer	89cf6d79eb	ConstantFold: an undef shift amount results in undef X shifted by undef results in undef because the undef value can represent values greater than the width of the operands. llvm-svn: 223968	2014-12-10 21:38:05 +00:00
David Majnemer	7b86b77248	ConstantFold: div undef, 0 should fold to undef, not zero Dividing by zero yields an undefined value. llvm-svn: 223924	2014-12-10 09:14:55 +00:00
David Majnemer	ae707582c0	InstSimplify: [al]shr exact undef, %X -> undef Exact shifts always keep the non-zero bits of their input. This means it keeps it's undef bits. llvm-svn: 223923	2014-12-10 09:14:52 +00:00
David Majnemer	71dc8fb867	InstSimplify: div %X, 0 -> undef We already optimized rem %X, 0 to undef, we should do the same for div. llvm-svn: 223919	2014-12-10 07:52:18 +00:00
Ahmed Bougacha	7efbac74ec	[ARM] Combine base-updating/post-incrementing vector load/stores. We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). Differential Revision: http://reviews.llvm.org/D6585 llvm-svn: 223862	2014-12-10 00:07:37 +00:00
Chandler Carruth	a7f247ea56	Revert r223764 which taught instcombine about integer-based elment extraction patterns. This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely also on ARM and PPC. I really don't know how, but reverting now that I've confirmed this is actually the culprit. I have a reproduction as well and so should be able to restore this shortly. This reverts commit r223764. Original commit log follows: Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores. Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and tsores which remain, it uses integer math to recombine the values into a large integer to load or store. All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is terrible for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaching InstCombine to use this representation. With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library. For the record, I looked extensively at fixing this in other parts of the compiler, but it just doesn't work: - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer. - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA. - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently. - We can effectively fix this in instcombine, so it isn't that hard of a choice to make IMO. llvm-svn: 223813	2014-12-09 19:21:16 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Sonam Kumari	72ccc3c428	Removal Of Duplicate Test Cases and Addition Of Missing Check Statements llvm-svn: 223768	2014-12-09 10:46:38 +00:00
Ankur Garg	51eeba70da	[test/Transforms/InstCombine/shift.ll] Removed duplicate test cases. NFC. Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll. test54 and test57 were duplicates of each other. test55 and test58 were duplicates of each other. (Removed test57 and test58) llvm-svn: 223767	2014-12-09 10:35:19 +00:00
Chandler Carruth	7415205113	Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores. Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and tsores which remain, it uses integer math to recombine the values into a large integer to load or store. All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is terrible for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaching InstCombine to use this representation. With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library. For the record, I looked extensively at fixing this in other parts of the compiler, but it just doesn't work: - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer. - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA. - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently. - We can effectively fix this in instcombine, so it isn't that hard of a choice to make IMO. Differential Revision: http://reviews.llvm.org/D6548 llvm-svn: 223764	2014-12-09 08:55:32 +00:00
David Majnemer	d5b3aa49ac	InstSimplify: Try to bring back the rest of r223583 This reverts r223624 with a small tweak, hopefully this will make stage3 equivalent. llvm-svn: 223679	2014-12-08 18:30:43 +00:00
Sonam Kumari	90d266c0a9	Removal Of Duplicate Test Case from shift.ll file llvm-svn: 223648	2014-12-08 09:40:43 +00:00
NAKAMURA Takumi	2b6e662672	Revert a part of r223583, for now. It seems causing different emission between stage2(gcc-clang) and stage3 clang. Investigating. llvm-svn: 223624	2014-12-08 02:07:22 +00:00
David Majnemer	1af36e5baf	InstSimplify: Optimize away useless unsigned comparisons Code like X < Y && Y == 0 should always be folded away to false. llvm-svn: 223583	2014-12-06 10:51:40 +00:00
Duncan P. N. Exon Smith	da41af9e94	IR: Disallow complicated function-local metadata Disallow complex types of function-local metadata. The only valid function-local metadata is an `MDNode` whose sole argument is a non-metadata function-local value. Part of PR21532. llvm-svn: 223564	2014-12-06 01:26:49 +00:00
Hans Wennborg	67ead59108	Add some tests for SimplifyCFG's TurnSwitchRangeIntoICmp(). NFC. llvm-svn: 223396	2014-12-04 22:19:28 +00:00
Hans Wennborg	1096539335	Add some tests for SimplifyCFG's ConstantFoldTerminator(). NFC. llvm-svn: 223395	2014-12-04 22:19:25 +00:00
Philip Reames	5b3ce71b62	Add a test case for argument type coercion in an invoke of a vararg function This would have caught the bug I fixed in 223370. llvm-svn: 223378	2014-12-04 19:13:45 +00:00
Hal Finkel	aa19bafc9c	Revert "r223364 - Revert r223347 which has caused crashes on bootstrap bots." Reapply r223347, with a fix to not crash on uninserted instructions (or more precisely, instructions in uninserted blocks). bugpoint was able to reduce the test case somewhat, but it is still somewhat large (and relies on setting things up to be simplified during inlining), so I've not included it here. Nevertheless, it is clear what is going on and why. Original commit message: Restrict somewhat the memory-allocation pointer cmp opt from r223093 Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223371	2014-12-04 17:45:19 +00:00
Alexander Potapenko	76770e4930	Revert r223347 which has caused crashes on bootstrap bots. llvm-svn: 223364	2014-12-04 14:22:27 +00:00
Simon Pilgrim	be24ab367b	[InstCombine] Minor optimization for bswap with binary ops Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 llvm-svn: 223349	2014-12-04 09:44:01 +00:00
Hal Finkel	8b24b32c44	Restrict somewhat the memory-allocation pointer cmp opt from r223093 Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223347	2014-12-04 09:22:28 +00:00
Matthias Braun	d34e4d2354	[SimplifyLibCalls] Improve double->float shrinking to consider constants This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x); rdar://19049359 Differential Revision: http://reviews.llvm.org/D6496 llvm-svn: 223270	2014-12-03 21:46:33 +00:00
Matthias Braun	892c923c46	[SimplifyLibCalls] Enable double to float shrinking for copysign rdar://19049359 Differential Revision: http://reviews.llvm.org/D6495 llvm-svn: 223269	2014-12-03 21:46:29 +00:00
Nick Lewycky	8ede1ef09b	Fix test to use the right metadata node (reapply r223239 plus a fix) and also to use the correct path to the GCNO file. llvm-svn: 223244	2014-12-03 17:32:44 +00:00
Alexander Potapenko	ff5326f0d1	Revert r223239, which broke some bots. llvm-svn: 223240	2014-12-03 16:03:08 +00:00
Alexander Potapenko	1b42ad9b43	Fix the metadata number used by llvm.gcov to match the number of the inserted metadata node. llvm-svn: 223239	2014-12-03 15:15:58 +00:00
Erik Eckstein	d181752be0	InstCombine: simplify signed range checks Try to convert two compares of a signed range check into a single unsigned compare. Examples: (icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n (icmp slt x, 0) \| (icmp sgt x, n) --> icmp ugt x, n llvm-svn: 223224	2014-12-03 10:39:15 +00:00
Tom Stellard	1f0dded057	StructurizeCFG: Use LoopInfo analysis for better loop detection We were assuming that each back-edge in a region represented a unique loop, which is not always the case. We need to use LoopInfo to correctly determine which back-edges are loops. llvm-svn: 223199	2014-12-03 04:28:32 +00:00
Nick Lewycky	2e8a6219fc	Emit the entry block first and the exit block second, then all the blocks in between afterwards. This is what gcc always does, and some out of tree tools depend on that. llvm-svn: 223193	2014-12-03 02:45:01 +00:00
Michael Zolotukhin	ea8327b80f	PR21302. Vectorize only bottom-tested loops. rdar://problem/18886083 llvm-svn: 223171	2014-12-02 22:59:06 +00:00
Michael Zolotukhin	540580ca06	Apply loop-rotate to several vectorizer tests. Such loops shouldn't be vectorized due to the loops form. After applying loop-rotate (+simplifycfg) the tests again start to check what they are intended to check. llvm-svn: 223170	2014-12-02 22:59:02 +00:00
Bruno Cardoso Lopes	15520db9ad	[SwitchLowering] Handle destinations on multiple phi instructions Follow up from r222926. Also handle multiple destinations from merged cases on multiple and subsequent phi instructions. rdar://problem/19106978 llvm-svn: 223135	2014-12-02 18:31:53 +00:00
Bruno Cardoso Lopes	d035fbb96f	[LICM] Avoind store sinking if no preheader is available Load instructions are inserted into loop preheaders when sinking stores and later removed if not used by the SSA updater. Avoid sinking if the loop has no preheader and avoid crashes. This fixes one more side effect of not handling indirectbr instructions properly on LoopSimplify. llvm-svn: 223119	2014-12-02 14:22:34 +00:00
Sonam Kumari	f2eacabd66	[signext.ll] Removal Of Duplicate Test Cases Removed the duplicate test case existing in signext.ll file. llvm-svn: 223109	2014-12-02 05:29:47 +00:00
Hal Finkel	afcd8dbbcf	Simplify pointer comparisons involving memory allocation functions System memory allocation functions, which are identified at the IR level by the noalias attribute on the return value, must return a pointer into a memory region disjoint from any other memory accessible to the caller. We can use this property to simplify pointer comparisons between allocated memory and local stack addresses and the addresses of global variables. Neither the stack nor global variables can overlap with the region used by the memory allocator. Fixes PR21556. llvm-svn: 223093	2014-12-01 23:38:06 +00:00
Reid Kleckner	35fc363ce8	Parse 'ghccc' in .ll files as the GHC convention (cc 10) Previously we just used "cc 10" in the .ll files, but that isn't very human readable. llvm-svn: 223076	2014-12-01 21:04:44 +00:00
Hans Wennborg	5bef5b522b	Revert r223049, r223050 and r223051 while investigating test failures. I didn't foresee affecting the Clang test suite :/ llvm-svn: 223054	2014-12-01 17:36:43 +00:00
Hans Wennborg	269ebb612e	SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable They would get optimized away later, but we might as well not emit them. llvm-svn: 223051	2014-12-01 17:08:38 +00:00
Hans Wennborg	5a1e5c05d8	SimplifyCFG: don't remove unreachable default switch destinations An unreachable default destination can be exploited by other optimizations, and SDag lowering is now prepared to handle them efficiently. For example, branches to the unreachable destination will be optimized away, such as in the case of range checks for switch lookup tables. On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and Chromium by 30 kB). llvm-svn: 223050	2014-12-01 17:08:35 +00:00
Sonam Kumari	237cfa9916	Removed extra whitespace. (Testing commit access). NFC. llvm-svn: 222994	2014-12-01 09:27:46 +00:00
Rafael Espindola	a4b2ee4548	Relax an assert a bit to avoid a crash on unreachable code. Patch by Duncan Exon Smith with a small tweak by me. llvm-svn: 222984	2014-12-01 02:55:24 +00:00
Duncan P. N. Exon Smith	910f05d181	DebugIR: Delete -debug-ir llvm-svn: 222945	2014-11-29 03:15:47 +00:00
Duncan P. N. Exon Smith	9bc81fbe92	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
David Majnemer	3d6f80b619	InstCombine: FoldOrOfICmps harder We may be in a situation where the icmps might not be near each other in a tree of or instructions. Try to dig out related compare instructions and see if they combine. N.B. This won't fire on deep trees of compares because rewritting the tree might end up creating a net increase of IR. We may have to resort to something more sophisticated if this is a real problem. llvm-svn: 222928	2014-11-28 19:58:29 +00:00
Bruno Cardoso Lopes	46d5bf2982	[LICM] Store sink and indirectbr instructions Loop simplify skips exit-block insertion when exits contain indirectbr instructions. This leads to an assertion in LICM when trying to sink stores out of non-dedicated loop exits containing indirectbr instructions. This patch fix this issue by re-checking for dedicated exits in LICM prior to store sink attempts. Differential Revision: http://reviews.llvm.org/D6414 rdar://problem/18943047 llvm-svn: 222927	2014-11-28 19:47:46 +00:00
Bruno Cardoso Lopes	bc7ba2c766	[SwitchLowering] Handle multiple destinations on condensed case stmts Switch cases statements with sequential values that branch to the same destination BB may often be handled together in a single new source BB. In this scenario we need to remove remaining incoming values from PHI instructions in the destination BB, as to match the number of source branches. Differential Revision: http://reviews.llvm.org/D6415 rdar://problem/19040894 llvm-svn: 222926	2014-11-28 19:47:33 +00:00
Erik Eckstein	0d86c7623f	reinstate r222872: Peephole optimization in switch table lookup: reuse the guarding table comparison if possible. Fixed missing dominance check. Original commit message: This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch. Example: if (idx < tablesize) r = table[idx]; // table does not contain default_value else r = default_value; if (r != default_value) ... Is optimized to: cond = idx < tablesize; if (cond) r = table[idx]; else r = default_value; if (cond) ... Jump threading will then eliminate the second if(cond). llvm-svn: 222891	2014-11-27 15:13:14 +00:00
Suyog Sarda	f8516e1662	Use FileCheck instead of grep. Change by Ankur Garg. Differential Revision: http://reviews.llvm.org/D6430 llvm-svn: 222879	2014-11-27 11:22:49 +00:00
Erik Eckstein	2190cd9ffa	Revert "Peephole optimization in switch table lookup: reuse the guarding table comparison if possible." It is breaking the clang bootstrag. llvm-svn: 222877	2014-11-27 10:59:08 +00:00
Suyog Sarda	c3024c75e0	Use FileCheck instead of grep. Change by Sonam. Differential Revision: http://reviews.llvm.org/D6432 llvm-svn: 222876	2014-11-27 10:57:24 +00:00
Erik Eckstein	e73e308ab9	Peephole optimization in switch table lookup: reuse the guarding table comparison if possible. This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch. Example: if (idx < tablesize) r = table[idx]; // table does not contain default_value else r = default_value; if (r != default_value) ... Is optimized to: cond = idx < tablesize; if (cond) r = table[idx]; else r = default_value; if (cond) ... \endcode Jump threading will then eliminate the second if(cond). llvm-svn: 222872	2014-11-27 08:33:51 +00:00
David Majnemer	40157d5c4d	InstCombine: Restore optimizations lost in r210006 This restores our ability to optimize: (X & C) == 0 ? X ^ C : X into X \| C (X & C) != 0 ? X ^ C : X into X & ~C llvm-svn: 222871	2014-11-27 07:25:21 +00:00
David Majnemer	c6a5e1dd4f	InstSimplify: Restore optimizations lost in r210006 This restores our ability to optimize: (X & C) ? X & ~C : X into X & ~C (X & C) ? X : X & ~C into X (X & C) ? X \| C : X into X (X & C) ? X : X \| C into X \| C llvm-svn: 222868	2014-11-27 06:32:46 +00:00
David Majnemer	5468e86469	Revert "Added inst combine transforms for single bit tests from Chris's note" This reverts commit r210006, it miscompiled libapr which is used in who knows how many projects. A test has been added to ensure that we don't regress again. I'll work on a rewrite of what the optimization was trying to do later. llvm-svn: 222856	2014-11-26 23:00:38 +00:00
Hans Wennborg	bda193edff	Remove useless rdar:// comment from switch_to_lookup_table.ll test. llvm-svn: 222772	2014-11-25 18:45:23 +00:00
Hans Wennborg	45172aceb3	LazyValueInfo: Actually re-visit partially solved block-values in solveBlockValue() If solveBlockValue() needs results from predecessors that are not already computed, it returns false with the intention of resuming when the dependencies have been resolved. However, the computation would never be resumed since an 'overdefined' result had been placed in the cache, preventing any further computation. The point of placing the 'overdefined' result in the cache seems to have been to break cycles, but we can check for that when inserting work items in the BlockValue stack instead. This makes the "stop and resume" mechanism of solveBlockValue() work as intended, unlocking more analysis. Using this patch shaves 120 KB off a 64-bit Chromium build on Linux. I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in compile time. Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me. Differential Revision: http://reviews.llvm.org/D6397 llvm-svn: 222768	2014-11-25 17:23:05 +00:00
Chandler Carruth	816d26fe5e	[InstCombine] Change LLVM To canonicalize toward the value type being stored rather than the pointer type. This change is analogous to r220138 which changed the canonicalization for loads. The rationale is the same: memory does not have a type, operations (and thus the values they produce) have a type. We should match that type as closely as possible rather than reading some form of semantics into the pointer type. With this change, loads and stores should no longer be made with nonsensical types for the values that tehy load and store. This is particularly important when trying to match specific loaded and stored types in the process of doing other instcombines, which is what led me down this twisty maze of miscanonicalization. I've put quite some effort into looking through IR to find places where LLVM's optimizer was being unreasonably conservative in the face of mismatched load and store types, however it is possible (let's say, likely!) I have missed some. If you see regressions here, or from r220138, the likely cause is some part of LLVM failing to cope with load and store types differing. Test cases appreciated, it is important that we root all of these out of LLVM. llvm-svn: 222748	2014-11-25 10:09:51 +00:00
Suyog Sarda	99c9c1f2b0	Change the test case file to use FileCheck instead of grep. NFC. Change by Ankur Garg. Differential Revision: http://reviews.llvm.org/D6382 llvm-svn: 222740	2014-11-25 08:44:56 +00:00
Chandler Carruth	1a3c2c414c	Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite clearly only exactly equal width ptrtoint and inttoptr casts are no-op casts, it says so right there in the langref. Make the code agree. Original log from r220277: Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 222739	2014-11-25 08:20:27 +00:00
David Majnemer	bd9ce4ea51	InstSimplify: Handle some simple tautological comparisons This handles cases where we are comparing a masked value against itself. The analysis could be further improved by making it recursive but such expense is not currently justified. llvm-svn: 222716	2014-11-25 02:55:48 +00:00
Matt Arsenault	238ff1ad1e	Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons llvm-svn: 222705	2014-11-24 23:15:18 +00:00
Matt Arsenault	ea515d33c9	Convert test to FileCheck and use CHECK-LABEL llvm-svn: 222704	2014-11-24 23:03:17 +00:00
David Majnemer	8e6f6a98b5	InstCombine: Don't create an unused instruction We would create an instruction but not inserting it. Not inserting the unused instruction would lead us to verification failure. This fixes PR21653. llvm-svn: 222659	2014-11-24 16:41:13 +00:00
David Majnemer	b2a6e7458d	InstCombine: Don't assume DataLayout is always available We tried to get the result of DataLayout::getLargestLegalIntTypeSize but we didn't have a DataLayout. This resulted in opt crashing. This fixes PR21651. llvm-svn: 222645	2014-11-24 07:26:20 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
David Majnemer	fb3805576b	InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2) llvm-svn: 222625	2014-11-22 20:00:41 +00:00
David Majnemer	ec6e481bc5	InstCombine: Propagate exact for (sdiv X, Y) -> (udiv X, Y) llvm-svn: 222624	2014-11-22 20:00:38 +00:00
David Majnemer	fa4699e65f	InstCombine: Propagate exact for (sdiv -X, C) -> (sdiv X, -C) llvm-svn: 222623	2014-11-22 20:00:34 +00:00
David Majnemer	a3aeb15613	InstCombine: Propagate exact in (udiv (lshr X,C1),C2) -> (udiv x,C1<<C2) llvm-svn: 222620	2014-11-22 18:16:54 +00:00
David Majnemer	546f81064c	InstCombine: Propagate NSW/NUW for X*(1<<Y) -> X<<Y llvm-svn: 222613	2014-11-22 08:57:02 +00:00
David Majnemer	8279a7506d	InstCombine: Propagate NSW for -X * -Y -> X * Y llvm-svn: 222612	2014-11-22 07:25:19 +00:00
David Majnemer	4efa9ff8ca	InstSimplify: Simplify (sub 0, X) -> X if it's NUW This is a generalization of the X - (0 - Y) -> X transform. llvm-svn: 222611	2014-11-22 07:15:16 +00:00
David Majnemer	80c8f627db	InstCombine: Preserve nsw when folding X*(2^C) -> X << C llvm-svn: 222606	2014-11-22 04:52:55 +00:00
David Majnemer	fd4a6d2b7a	InstCombine: Preserve nsw/nuw for ((X << C2)C1) -> (X (C1 << C2)) llvm-svn: 222605	2014-11-22 04:52:52 +00:00
David Majnemer	027bc80928	InstCombine: Preserve nsw for (mul %V, -1) -> (sub 0, %V) llvm-svn: 222604	2014-11-22 04:52:38 +00:00
Gerolf Hoflehner	ec6217c929	[InstCombine] Re-commit of r218721 (Optimize icmp-select-icmp sequence) Fixes the self-host fail. Note that this commit activates dominator analysis in the combiner by default (like the original commit did). llvm-svn: 222590	2014-11-21 23:36:44 +00:00
David Majnemer	c0a313b57c	SROA: The alloca type isn't a candidate promotion type for vectors The alloca's type is irrelevant, only those types which are used in a load or store of the exact size of the slice should be considered. This manifested as an assertion failure when we compared the various types: we had a size mismatch. This fixes PR21480. llvm-svn: 222499	2014-11-21 02:34:55 +00:00
Michael Zolotukhin	0dcae71449	Fix a trip-count overflow issue in LoopUnroll. Currently LoopUnroll generates a prologue loop before the main loop body to execute first N%UnrollFactor iterations. Also, this loop is used if trip-count can overflow - it's determined by a runtime check. However, we've been mistakenly optimizing this loop to a linear code for UnrollFactor = 2, not taking into account that it also serves as a safe version of the loop if its trip-count overflows. llvm-svn: 222451	2014-11-20 20:19:55 +00:00
Chad Rosier	90a2f9b110	Revert "[Reassociate] As the expression tree is rewritten make sure the operands are" This reverts commit r222142. This is causing/exposing an execution-time regression in spec2006/gcc and coremark on AArch64/A57/Ofast. Conflicts: test/Transforms/Reassociate/optional-flags.ll llvm-svn: 222398	2014-11-19 23:21:20 +00:00
Suyog Sarda	aba97f4aba	Vectorize a reduction chain feeding into a 'return' statement. e.x return (a[0]+b[0]) + (a[1]+b[1]) Differential Revision: http://reviews.llvm.org/D6227 llvm-svn: 222364	2014-11-19 16:07:38 +00:00
Arnaud A. de Grandmaison	7b9dc28060	Fix tail recursion elimination When the BasicBlock containing the return instrution has a PHI with 2 incoming values, FoldReturnIntoUncondBranch will remove the no longer used incoming value and remove the no longer needed phi as well. This leaves us with a BB that no longer has a PHI, but the subsequent call to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not remove the return instruction (which still uses the result of the call instruction). This prevents EliminateRecursiveTailCall to remove the value, as it is still being used in a basicblock which has no predecessors. The basicblock can not be erased on the spot, because its iterator is still being used in runTRE. This issue was exposed when removing the threshold on size for lifetime marker insertion for named temporaries in clang. The testcase is a much reduced version of peelOffOuterExpr(const Expr, const ExplodedNode ) from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp. llvm-svn: 222354	2014-11-19 13:32:51 +00:00
David Majnemer	b7adf34ee0	AliasSetTracker: UnknownInsts should contribute to the refcount AliasSetTracker::addUnknown may create an AliasSet devoid of pointers just to contain an instruction if no suitable AliasSet already exists. It will then AliasSet::addUnknownInst and we will be done. However, it's possible for addUnknown to choose an existing AliasSet to addUnknownInst. If this were to occur, we are in a bit of a pickle: removing pointers from the AliasSet can cause the entire AliasSet to become destroyed, taking our unknown instructions out with them. Instead, keep track whether or not our AliasSet has any unknown instructions. This fixes PR21582. llvm-svn: 222338	2014-11-19 09:41:05 +00:00
Manman Ren	c67109313c	Revert r222039 because of bot failure. http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/ Hopefully, bot will be green. If not, we will re-submit the commit. llvm-svn: 222287	2014-11-19 00:13:26 +00:00
David Majnemer	c6b8e20a5c	InstCombine: Fix another infinite loop caused by visitFPTrunc We would attempt to replace an frem's operand with the same operand. This would cause InstCombine to think real work was done, causing InstCombine to enter an infinite loop. This fixes the second part of PR21576. llvm-svn: 222265	2014-11-18 22:06:45 +00:00
David Majnemer	b32eaddf11	Revert "Revert r222040 because of bot failure." This reverts commit r222203, reverting r222040 didn't end up turning the bot green. llvm-svn: 222261	2014-11-18 21:30:02 +00:00
Chad Rosier	b83c6d9c08	[Reassociate] Use test cases that can actually be optimized to verify optional flags are cleared. The reassociation pass was just reordering the leaf nodes in the previous test cases. llvm-svn: 222250	2014-11-18 20:34:01 +00:00
Philip Reames	018dbf18c4	Tweak EarlyCSE to recognize series of dead stores EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!). Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D6301 llvm-svn: 222241	2014-11-18 17:46:32 +00:00
David Majnemer	6fdb6b8fd4	InstCombine: Fold away tautological masked compares It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true. While this sort of reasoning should normally live in InstSimplify, the machinery that derives this result is not trivial to split out. llvm-svn: 222230	2014-11-18 09:31:41 +00:00
David Majnemer	9a91e4a18a	IndVarSimplify: Allow LFTR to fire more often I added a pessimization in r217102 to prevent miscompiles when the incremented induction variable was used in a comparison; it would be poison. Try to use the incremented induction variable more often when we can be sure that the increment won't end in poison. Differential Revision: http://reviews.llvm.org/D6222 llvm-svn: 222213	2014-11-18 02:20:58 +00:00
Manman Ren	a64bd44fd8	Revert r222040 because of bot failure. http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/ Hopefully, bot will be green. llvm-svn: 222203	2014-11-18 00:33:22 +00:00
Juergen Ributzka	c9591e9bdb	[SimplifyCFG] Make the value type of the hole check bitmask a power-of-2. When converting a switch to a lookup table we might have to generate a bitmaks to encode and check for holes in the original switch statement. The type of this mask depends on the number of switch statements, which can result in illegal types for pretty much all architectures. To avoid unnecessary type legalization and help FastISel this commit increases the size of the bitmask to next power-of-2 value when necessary. This fixes rdar://problem/18984639. llvm-svn: 222168	2014-11-17 19:39:56 +00:00
Chad Rosier	bc0b869be9	[Reassociate] As the expression tree is rewritten make sure the operands are emitted in canonical form. llvm-svn: 222142	2014-11-17 16:33:50 +00:00
Chad Rosier	9a1ac6e494	[Reassociate] Canonicalize constants to RHS operand. Fix a thinko where the RHS was already a constant. llvm-svn: 222139	2014-11-17 15:52:51 +00:00
Erik Eckstein	105374fe5e	Optimize switch lookup tables with linear mapping. This is a simple optimization for switch table lookup: It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output. Example: int f1(int x) { switch (x) { case 0: return 10; case 1: return 11; case 2: return 12; case 3: return 13; } return 0; } generates: define i32 @f1(i32 %x) #0 { entry: %0 = icmp ult i32 %x, 4 br i1 %0, label %switch.lookup, label %return switch.lookup: %switch.offset = add i32 %x, 10 ret i32 %switch.offset return: ret i32 0 } llvm-svn: 222121	2014-11-17 09:13:57 +00:00
Rafael Espindola	a3b5b60753	Add back r222061 with a fix. This adds back r222061, but now calls initializePAEvalPass from the correct library to avoid link problems. Original message: Don't make assumptions about the name of private global variables. Private variables are can be renamed, so it is not reliable to make decisions on the name. The name is also dropped by the assembler before getting to the linker, so using the name causes a disconnect between how llvm makes a decision (var name) and how the linker makes a decision (section it is in). This patch changes one case where we were looking at the variable name to use the section instead. Test tuning by Michael Gottesman. llvm-svn: 222117	2014-11-17 02:28:27 +00:00
Reid Kleckner	007239863e	Revert "Don't make assumptions about the name of private global variables." This reverts commit r222061. It's causing linker errors. llvm-svn: 222077	2014-11-15 02:03:53 +00:00
Rafael Espindola	2fc723099f	Don't make assumptions about the name of private global variables. Private variables are can be renamed, so it is not reliable to make decisions on the name. The name is also dropped by the assembler before getting to the linker, so using the name causes a disconnect between how llvm makes a decision (var name) and how the linker makes a decision (section it is in). This patch changes one case where we were looking at the variable name to use the section instead. Test tuning by Michael Gottesman. llvm-svn: 222061	2014-11-14 23:17:47 +00:00
David Majnemer	8c3d92e7e5	InstCombine: Fix infinite loop caused by visitFPTrunc We would attempt to replace a fptrunc of an frem with an identical fptrunc. This would cause the new fptrunc to be added to the worklist. Of course, this results in an infinite loop because we will keep visiting the newly created fptruncs. This fixes PR21576. llvm-svn: 222040	2014-11-14 21:21:15 +00:00
Chad Rosier	1ff4c0bf0b	Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" This commit updates the failing test in Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll The failing test is sensitive to the order in which we process loads. This version turns on the RPO traversal instead of the while DT traversal in GVN. The new test code is functionally same just the order of loads that are eliminated is swapped. This new version also fixes an issue where GVN splits a critical edge and potentially invalidate the RPO/DT iterator. llvm-svn: 222039	2014-11-14 21:09:13 +00:00
Chad Rosier	df8f2a23cb	[Reassociate] Canonicalize the operands of all binary operators. llvm-svn: 222008	2014-11-14 17:09:19 +00:00
Chad Rosier	d99df68e19	[Reassociate] Canonicalize operands of vector binary operators. Prior to this commit fmul and fadd binary operators were being canonicalized for both scalar and vector versions. We now canonicalize add, mul, and, or, and xor vector instructions. llvm-svn: 222006	2014-11-14 17:08:15 +00:00
Chad Rosier	f8b55f1bc5	[Reassociate] Canonicalize constants to RHS operand. llvm-svn: 222005	2014-11-14 17:05:59 +00:00
Reid Kleckner	29d880bdc7	Relax the gcov version.ll test to check '.' instead of '\' The escaping of the '\' doesn't work with my combination of testing tools. llvm-svn: 221944	2014-11-13 23:07:55 +00:00
Chad Rosier	8716b58583	Revert "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE." This reverts commit r221924. It appears the commit was a bit premature and is causing bot failures that need further investigation. llvm-svn: 221939	2014-11-13 22:54:59 +00:00
Chad Rosier	dd526665fc	[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE. Phabricator Revision: http://reviews.llvm.org/D6103 Patch by "Balaram Makam" <bmakam@codeaurora.org>! llvm-svn: 221924	2014-11-13 21:17:58 +00:00
Sanjoy Das	c5676df3ec	Teach ScalarEvolution to sharpen range information. If x is known to have the range [a, b), in a loop predicated by (icmp ne x, a) its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. This change was originally landed in r219834 but had a bug and broke ASan. It was reverted in r219878, and is now being re-landed after fixing the original bug. phabricator: http://reviews.llvm.org/D5639 reviewed by: atrick llvm-svn: 221839	2014-11-13 00:00:58 +00:00
Ahmed Bougacha	0788d49a40	[CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead. Fixes PR21548. Related to PR20474. llvm-svn: 221820	2014-11-12 22:16:55 +00:00
Sanjay Patel	4c219fd248	CGSCC should not treat intrinsic calls like function calls (PR21403) Make the handling of calls to intrinsics in CGSCC consistent: they are not treated like regular function calls because they are never lowered to function calls. Without this patch, we can get dangling pointer asserts from the subsequent loop that processes callsites because it already ignores intrinsics. See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion. Differential Revision: http://reviews.llvm.org/D6124 llvm-svn: 221802	2014-11-12 18:25:47 +00:00
Jingyue Wu	8a12cea5f1	Disable indvar widening if arithmetics on the wider type are more expensive Summary: Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test was run whether NVPTX was enabled or not. IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. This test is run only when NVPTX is enabled. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221799	2014-11-12 18:09:15 +00:00
Jingyue Wu	a48273390c	Reverts r221772 which fails tests llvm-svn: 221773	2014-11-12 07:19:25 +00:00
Jingyue Wu	635a9b14fa	Disable indvar widening if arithmetics on the wider type are more expensive Summary: IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221772	2014-11-12 06:58:45 +00:00
Bill Schmidt	729547847f	[PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC, which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions. New LLVM intrinsics are provided to represent these four instructions in IntrinsicsPowerPC.td. These are patterned after the similar intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied to the code gen patterns, with additional patterns to allow plain vanilla loads and stores to still generate these instructions. At -O1 and higher the intrinsics are immediately converted to loads and stores in InstCombineCalls.cpp. This will open up more optimization opportunities while still allowing the correct instructions to be generated. (Similar code exists for aligned Altivec loads and stores.) The new intrinsics are added to the code that checks for consecutive loads and stores in PPCISelLowering.cpp, as well as to PPCTargetLowering::getTgtMemIntrinsic(). There's a new test to verify the correct instructions are generated. The loads and stores tend to be reordered, so the test just counts their number. It runs at -O2, as it's not very effective to test this at -O0, when many unnecessary loads and stores are generated. I ended up having to modify vsx-fma-m.ll. It turns out this test case is slightly unreliable, but I don't know a good way to prevent problems with it. The xvmaddmdp instructions read and write the same register, which is one of the multiplicands. Commutativity allows either to be chosen. If the FMAs are reordered differently than expected by the test, the register assignment can be different as a result. Hopefully this doesn't change often. There is a companion patch for Clang. llvm-svn: 221767	2014-11-12 04:19:40 +00:00
Chad Rosier	f53f07046b	[Reassociate] Canonicalize negative constants out of expressions. Add support for FDiv, which was regressed by the previous commit. llvm-svn: 221738	2014-11-11 23:36:42 +00:00
Philip Reames	66c6de61ee	Canonicalize an assume(load != null) into !nonnull metadata We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the second in to the former. This avoids a need to implement optimizations using both forms. We should probably extend this basic idea to metadata of other forms; in particular, range metadata. We view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something. Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D5951 llvm-svn: 221737	2014-11-11 23:33:19 +00:00
Chad Rosier	094ac7735b	[Reassociate] Canonicalize negative constants out of expressions. This is a reapplication of r221171, but we only perform the transformation on expressions which include a multiplication. We do not transform rem/div operations as this doesn't appear to be safe in all cases. llvm-svn: 221721	2014-11-11 22:58:35 +00:00
Suyog Sarda	beb064bd94	Addition to r216371 (SLP and Loop Vectorization) and r218607 where cost model for signed division by power of 2 was improved for AArch64. The revision r218607 missed test case for Loop Vectorization. Adding it in this revision. Differential Revision: http://reviews.llvm.org/D6181 llvm-svn: 221674	2014-11-11 07:39:27 +00:00
Juergen Ributzka	d441725d3d	[SwitchLowering] Fix the "fixPhis" function. Switch statements may have more than one incoming edge into the same BB if they all have the same value. When the switch statement is converted these incoming edges are now coming from multiple BBs. Updating all incoming values to be from a single BB is incorrect and would generate invalid LLVM IR. The fix is to only update the first occurrence of an incoming value. Switch lowering will perform subsequent calls to this helper function for each incoming edge with a new basic block - updating all edges in the process. This fixes rdar://problem/18916275. llvm-svn: 221627	2014-11-10 21:05:27 +00:00
Chad Rosier	b3eb452e83	[Reassociate] Better preserve NSW/NUW flags. Part of PR12985. Phabricator Revision: http://reviews.llvm.org/D6172 llvm-svn: 221555	2014-11-07 22:12:57 +00:00
David Majnemer	2098b86f64	SCCP: overdefined calls cannot become constant We would attempt to fold away a call instruction which had been marked overdefined. However, it's not valid to transition to constant from overdefined. This fixes PR21512. llvm-svn: 221513	2014-11-07 08:54:19 +00:00
David Majnemer	bf93e7c7d3	LoopVectorize: Don't assume pointees are sized A pointer's pointee might not be sized: the pointee could be a function. Report this as IK_NoInduction when calculating isInductionVariable. This fixes PR21508. llvm-svn: 221501	2014-11-07 00:31:14 +00:00
David Majnemer	c1eca5ad7c	InstCombine: Rely on cmpxchg's return code when it's strong Comparing the result of a cmpxchg instruction can be replaced with an extractvalue of the cmpxchg success indicator. llvm-svn: 221498	2014-11-06 23:23:30 +00:00
Chad Rosier	ac6a2f532c	[Reassociate] Don't reassociate when mixing regular and fast-math FP instructions. Inlining might cause such cases and it's not valid to reassociate floating-point instructions without the unsafe algebra flag. Patch by Mehdi Amini <mehdi_amini@apple.com>! llvm-svn: 221462	2014-11-06 16:46:37 +00:00
Justin Bogner	58e41344f9	GCOV: Make sure that function idents in the .gcda and .gcno match When generating gcov compatible profiling, we sometimes skip emitting data for functions for one reason or another. However, this was emitting different function IDs in the .gcno and .gcda files, because the .gcno case was using the loop index before skipping functions and the .gcda the array index after. This resulted in completely invalid gcov data. This fixes the problem by making the .gcno loop track the ID separately from the loop index. llvm-svn: 221441	2014-11-06 06:55:02 +00:00
NAKAMURA Takumi	61b2f48453	llvm/test/Transforms/GCOVProfiling: Avoid to parse backslashes in MDString. Use %/T instead of %T. LLVM Parser decodes "\bb" as hex in "C:\bb-win7\buildername\build...", with MDString. See also, http://llvm.org/docs/LangRef.html#metadata-nodes-and-metadata-strings This reverts r221270, "Disable 3 tests in llvm/test/Transforms/GCOVProfiling/ for now. Investigating." FIXME: Please check EC in GCOVProfiler::emitProfileNotes(). llvm-svn: 221334	2014-11-05 06:29:05 +00:00
David Majnemer	bf7550e7ec	InstSimplify: Exact shifts of X by Y are X if X has the lsb set Exact shifts may not shift out any non-zero bits. Use computeKnownBits to determine when this occurs and just return the left hand side. This fixes PR21477. llvm-svn: 221325	2014-11-05 00:59:59 +00:00
David Majnemer	f20d7c4c61	Analysis: Make isSafeToSpeculativelyExecute fire less for divides Divides and remainder operations do not behave like other operations when they are given poison: they turn into undefined behavior. It's really hard to know if the operands going into a div are or are not poison. Because of this, we should only choose to speculate if there are constant operands which we can easily reason about. This fixes PR21412. llvm-svn: 221318	2014-11-04 23:49:08 +00:00
Reid Kleckner	941e93e9a8	Revert "[Reassociate] Canonicalize negative constants out of expressions." This reverts commit r221171. It performs this invalid transformation: - %div.i = urem i64 -1, %add - %sub.i = sub i64 -2, %div.i + %div.i = urem i64 1, %add + %sub.i1 = add i64 %div.i, -2 llvm-svn: 221317	2014-11-04 23:42:45 +00:00
NAKAMURA Takumi	14e3fc6ad4	Disable 3 tests in llvm/test/Transforms/GCOVProfiling/ for now. Investigating. llvm-svn: 221270	2014-11-04 14:41:53 +00:00
NAKAMURA Takumi	06ac98299f	Remove "REQUIRES:shell" from tests. They work for me. llvm-svn: 221269	2014-11-04 13:41:33 +00:00
NAKAMURA Takumi	217ee5bf92	llvm/test/Transforms/GCOVProfiling/linezero.ll: Use %/T instead of %T in regex. This works on win32. llvm-svn: 221262	2014-11-04 13:00:48 +00:00
David Majnemer	d28edfea03	Minimize test case further No functional change intended. llvm-svn: 221237	2014-11-04 05:17:58 +00:00
Reid Kleckner	dd3f3edafa	Revert "Transforms: reapply SVN r219899" This reverts commit r220811 and r220839. It made an incorrect change to musttail handling. llvm-svn: 221226	2014-11-04 02:02:14 +00:00
Hal Finkel	840257a49c	Use AA in LoadCombine LoadCombine can be smarter about aborting when a writing instruction is encountered, instead of aborting upon encountering any writing instruction, use an AliasSetTracker, and only abort when encountering some write that might alias with the loads that could potentially be combined. This was originally motivated by comments made (and a test case provided) by David Majnemer in response to PR21448. It turned out that LoadCombine was not responsible for that PR, but LoadCombine should also be improved so that unrelated stores (and @llvm.assume) don't interrupt load combining. llvm-svn: 221203	2014-11-03 23:19:16 +00:00
David Majnemer	7e2b9882b1	InstCombine: Remove infinite loop caused by FoldOpIntoPhi FoldOpIntoPhi could create an infinite loop if the PHI could potentially reach a BB it was considering inserting instructions into. The instructions it would insert would eventually lead to other combines firing which would, again, lead to FoldOpIntoPhi firing. The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to insert instructions that the PHI might reach. This fixes PR21377. llvm-svn: 221187	2014-11-03 21:55:12 +00:00
Hal Finkel	1e16fa302e	EarlyCSE should ignore calls to @llvm.assume EarlyCSE uses a simple generation scheme for handling memory-based dependencies, and calls to @llvm.assume (which are marked as writing to memory to ensure the preservation of control dependencies) disturb that scheme unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative (adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that). Fixes PR21448. llvm-svn: 221175	2014-11-03 20:21:32 +00:00
Chad Rosier	005505b027	[Reassociate] Canonicalize negative constants out of expressions. This gives CSE/GVN more options to eliminate duplicate expressions. This is a follow up patch to http://reviews.llvm.org/D4904. http://reviews.llvm.org/D5363 llvm-svn: 221171	2014-11-03 19:11:30 +00:00
Paul Robinson	ad06e430ce	Normally an 'optnone' function goes through fast-isel, which does not call DAGCombiner. But we ran into a case (on Windows) where the calling convention causes argument lowering to bail out of fast-isel, and we end up in CodeGenAndEmitDAG() which does run DAGCombiner. So, we need to make DAGCombiner check for 'optnone' after all. Commit includes the test that found this, plus another one that got missed in the original optnone work. llvm-svn: 221168	2014-11-03 18:19:26 +00:00
David Majnemer	72a643dc8f	InstCombine: Combine (X \| Y) - X to (~X & Y) This implements the transformation from (X \| Y) - X to (~X & Y). Differential Revision: http://reviews.llvm.org/D5791 llvm-svn: 221129	2014-11-03 05:53:55 +00:00
Elena Demikhovsky	27152aea88	Use Alias Analysis to hoist 2 loads from diamond to the common predecessor basic block. Alias Analysis allows to detect real barriers for load hoisting. Review in http://reviews.llvm.org/D5991 llvm-svn: 221091	2014-11-02 08:03:05 +00:00
David Majnemer	634ca236dc	InstCombine: Don't assume that m_ZExt matches an Instruction m_ZExt might bind against a ConstantExpr instead of an Instruction. Assuming this, using cast<Instruction>, results in InstCombine crashing. Instead, introduce ZExtOperator to bridge both Instruction and ConstantExpr ZExts. This fixes PR21445. llvm-svn: 221069	2014-11-01 23:46:05 +00:00
David Majnemer	549f4f2510	InstCombine: Combine (X+cst) < 0 --> X < -cst This can happen pretty often in code that looks like: int foo = bar - 1; if (foo < 0) do stuff In this case, bar < 1 is an equivalent condition. This transform requires that the add instruction be annotated with nsw. llvm-svn: 221045	2014-11-01 09:09:51 +00:00
Michael Zolotukhin	9b9624de0c	Correctly update dom-tree after loop vectorizer. llvm-svn: 221009	2014-10-31 22:28:03 +00:00
Bradley Smith	9992b167ae	[SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags In a case where we have a no {un,}signed wrap flag on the increment, if RHS - Start is constant then we can avoid inserting a max operation bewteen the two, since we can statically determine which is greater. This allows us to unroll loops such as: void testcase3(int v) { for (int i=v; i<=v+1; ++i) f(i); } llvm-svn: 220960	2014-10-31 11:40:32 +00:00
NAKAMURA Takumi	d9913e6d35	llvm/test/Transforms/SampleProfile/syntax.ll: Relax MISSING-FILE not to check locale-aware message catalog. llvm-svn: 220934	2014-10-30 22:28:46 +00:00
Philip Reames	4cb4d3e048	Add handling for range metadata in ValueTracking isKnownNonZero If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes. This helps to remove redundant checks and canonicalize checks for other optimization passes. This particular patch checks whether a value is known to be non-zero from the range metadata. Currently, these tests are against InstCombine. In theory, all of these should be InstSimplify since we're not inserting any new instructions. Moving the code may follow in a separate change. Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D5947 llvm-svn: 220925	2014-10-30 20:25:19 +00:00
Diego Novillo	c572e92c76	Add profile writing capabilities for sampling profiles. Summary: This patch finishes up support for handling sampling profiles in both text and binary formats. The new binary format uses uleb128 encoding to represent numeric values. This makes profiles files about 25% smaller. The profile writer class can write profiles in the existing text and the new binary format. In subsequent patches, I will add the capability to read (and perhaps write) profiles in the gcov format used by GCC. Additionally, I will be adding support in llvm-profdata to manipulate sampling profiles. There was a bit of refactoring needed to separate some code that was in the reader files, but is actually common to both the reader and writer. The new test checks that reading the same profile encoded as text or raw, produces the same results. Reviewers: bogner, dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6000 llvm-svn: 220915	2014-10-30 18:00:06 +00:00
NAKAMURA Takumi	27b6d47f36	llvm/test/Transforms/LoopRotate/nosimplifylatch.ll: Fix possibly mis-repeatedly-pasted test. llvm-svn: 220880	2014-10-29 23:05:12 +00:00
Yi Jiang	323d57336c	Test Case for r220872:Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance llvm-svn: 220873	2014-10-29 20:20:33 +00:00
Yi Jiang	ab19fff4d8	Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance llvm-svn: 220872	2014-10-29 20:19:47 +00:00
Saleem Abdulrasool	56ade2991b	test: tweak inlined-allocs test Remove pointless checks for storage of uninteresting values. Ensure that we perform basic alias analysis to make the test more correct. Finally, apply a stylistic change to the test. llvm-svn: 220839	2014-10-29 06:31:11 +00:00
Saleem Abdulrasool	d178ada55e	Transforms: reapply SVN r219899 This restores the commit from SVN r219899 with an additional change to ensure that the CodeGen is correct for the case that was identified as being incorrect (originally PR7272). In the case that during inlining we need to synthesize a value on the stack (i.e. for passing a value byval), then any function involving that alloca must be stripped of its tailness as the restriction that it does not access the parent's stack no longer holds. Unfortunately, a single alloca can cause a rippling effect through out the inlining as the value may be aliased or may be mutated through an escaped external call. As such, we simply track if an alloca has been introduced in the frame during inlining, and strip any tail calls. llvm-svn: 220811	2014-10-28 18:27:37 +00:00
David Majnemer	c8bdd23acf	InstCombine: Fix a combine assuming that icmp operands were integers An icmp may have pointer arguments, it isn't limited to integers or vectors of integers. This fixes PR21388. llvm-svn: 220664	2014-10-27 05:47:49 +00:00
Jingyue Wu	fe72fcebf6	[SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo The dividend in "signed % unsigned" is treated as unsigned instead of signed, causing unexpected behavior such as -64 % (uint64_t)24 == 0. Added a regression test in split-gep.ll Patched by Hao Liu. llvm-svn: 220618	2014-10-25 18:34:03 +00:00
Jingyue Wu	b723152379	[SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions The two operands of the new OR expression should be NextInChain and TheOther instead of the two original operands. Added a regression test in split-gep.ll. Hao Liu reported this bug, and provded the test case and an initial patch. Thanks! llvm-svn: 220615	2014-10-25 17:36:21 +00:00
Sanjay Patel	848309da7c	Handle sqrt() shrinking in SimplifyLibCalls like any other call This patch removes a chunk of special case logic for folding (float)sqrt((double)x) -> sqrtf(x) in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls. No functional change intended, but I loosened the restriction on the existing sqrt testcases to allow for this optimization even without unsafe-fp-math because that's the existing behavior. I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic in case the result is used as a double. Differential Revision: http://reviews.llvm.org/D5919 llvm-svn: 220514	2014-10-23 21:52:45 +00:00
Justin Bogner	72d1f2b61b	test: Make this test runnable in directories with @ in their names Jenkins likes to use directories with names involving the '@' character, which breaks the sed expression in this test. Switch to use '\|' on the assumption that it's less likely to show up in a path. llvm-svn: 220401	2014-10-22 18:18:54 +00:00
Sanjay Patel	a92fa44740	Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850) When a call to a double-precision libm function has fast-math semantics (via function attribute for now because there is no IR-level FMF on calls), we can avoid fpext/fptrunc operations and use the float version of the call if the input and output are both float. We already do this optimization using a command-line option; this patch just adds the ability for fast-math to use the existing functionality. I moved the cl::opt from InstructionCombining into SimplifyLibCalls because it's only ever used internally to that class. Modified the existing test cases to use the unsafe-fp-math attribute rather than repeating all tests. This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850 Differential Revision: http://reviews.llvm.org/D5893 llvm-svn: 220390	2014-10-22 15:29:23 +00:00
Diego Novillo	a67c0b43e1	Change error to warning when a profile cannot be found. When the profile for a function cannot be applied, we use to emit an error. This seems extreme. The compiler can continue, it's just that the optimization opportunities won't include profile information. llvm-svn: 220386	2014-10-22 13:36:35 +00:00
Diego Novillo	8027b80b41	Support using sample profiles with partial debug info. Summary: When using a profile, we used to require the use -gmlt so that we could get access to the line locations. This is used to match line numbers in the input profile to the line numbers in the function's IR. But this is actually not necessary. The driver can provide source location tracking without the emission of debug information. In these cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the actual line location annotations are still present. This patch adds a new way of looking for the start of the current function. Instead of looking through the compile units in llvm.dbg.cu, we can walk up the scope for the first instruction in the function with a debug loc. If that describes the function, we use it. Otherwise, we keep looking until we find one. If no such instruction is found, we then give up and produce an error. Reviewers: echristo, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5887 llvm-svn: 220382	2014-10-22 12:59:00 +00:00
Bruno Cardoso Lopes	c29520c5b3	[InstSimplify] Support constant folding to vector of pointers ConstantFolding crashes when trying to InstSimplify the following load: @a = private unnamed_addr constant %mst { i8* inttoptr (i64 -1 to i8), i8 inttoptr (i64 -1 to i8) }, align 8 %x = load <2 x i8>* bitcast (%mst* @a to <2 x i8>), align 8 This patch fix this by adding support to this type of folding: %x = load <2 x i8> bitcast (%mst* @a to <2 x i8>), align 8 ==> gets folded to: %x = <2 x i8> <i8 inttoptr (i64 -1 to i8), i8 inttoptr (i64 -1 to i8*)> llvm-svn: 220380	2014-10-22 12:18:48 +00:00
Hans Wennborg	0b39fc0d16	Revert "Teach the load analysis to allow finding available values which require" (r220277) This seems to have caused PR21330. llvm-svn: 220349	2014-10-21 23:49:52 +00:00
Matt Arsenault	d6511b49ac	Add minnum / maxnum intrinsics These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which return NaN if either operand is to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data. llvm-svn: 220341	2014-10-21 23:00:20 +00:00
David Majnemer	d205602a0b	InstCombine: Simplify FoldICmpCstShrCst This function was complicated by the fact that it tried to perform canonicalizations that were already preformed by InstSimplify. Remove this extra code and move the tests over to InstSimplify. Add asserts to make sure our preconditions hold before we make any assumptions. llvm-svn: 220314	2014-10-21 19:51:55 +00:00
Chandler Carruth	aa72a6dd3b	Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 220277	2014-10-21 09:00:40 +00:00
Philip Reames	cdb72f369f	Introduce a 'nonnull' metadata on Load instructions. The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns. Long term, it would be nice to combine these into a single construct. The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull. Reviewed by: Hal Finkel Differential Revision: http://reviews.llvm.org/D5220 llvm-svn: 220240	2014-10-20 22:40:55 +00:00
Chandler Carruth	a32038b006	Fix a miscompile introduced in r220178. The original code had an implicit assumption that if the test for allocas or globals was reached, the two pointers were not equal. With my changes to make the pointer analysis more powerful here, I also had to guard against circumstances where the results weren't useful. That in turn violated the assumption and gave rise to a circumstance in which we could have a store with both the queried pointer and stored pointer rooted at the same alloca. Clearly, we cannot ignore such a store. There are other things we might do in this code to better handle the case of both pointers ending up at the same alloca or global, but it seems best to at least make the test explicit in what it intends to check. I've added tests for both the alloca and global case here. llvm-svn: 220190	2014-10-20 10:03:01 +00:00
Chandler Carruth	6665d62117	Fix a somewhat subtle pair of issues with JumpThreading I introduced in r220178. First, the creation routine doesn't insert prior to the terminator of the basic block provided, but really at the end of the basic block. Instead, get the terminator and insert before that. The next issue was that we need to ensure multiple PHI node entries for a single predecessor re-use the same cast instruction rather than creating new ones. All of the logic here was without tests previously. I've reduced and added a test case from the test suite that crashed without both of these fixes. llvm-svn: 220186	2014-10-20 05:34:36 +00:00
Chandler Carruth	eeec35ae1c	Teach the load analysis driving core instcombine logic and other bits of logic to look through pointer casts, making them trivially stronger in the face of loads and stores with intervening pointer casts. I've included a few test cases that demonstrate the kind of folding instcombine can do without pointer casts and then variations which obfuscate the logic through bitcasts. Without this patch, the variations all fail to optimize fully. This is more important now than it has been in the past as I've started moving the load canonicialization to more closely follow the value type requirements rather than the pointer type requirements and thus this needs to be prepared for more pointer casts. When I made the same change to stores several test cases regressed without logic along these lines so I wanted to systematically improve matters first. llvm-svn: 220178	2014-10-20 00:24:14 +00:00
Chandler Carruth	b5f4c32830	Add a datalayout string to this test so that it exercises the full gamut of InstCombine rather than just the bits enabled when datalayout is optional. The primary fixes here are because now things are little endian. In good news, silliness like this seems like it will be going away as we've got pretty stong consensus on dropping optional datalayout entirely. llvm-svn: 220176	2014-10-20 00:11:31 +00:00
Chandler Carruth	bc6378defb	Do a better and more complete job of preserving metadata when combining loads. This handles many more cases than just the AA metadata, some of them suggested by Hal in his review of the AA metadata handling patch. I've tried to test this behavior where tractable to do so. I'll point out that I have specifically not included a test for debuginfo because it was going to require 2 or 3 times as much work to craft some input which would survive the "helpful" stripping of debug info metadata that doesn't match the desired schema. This is another good example of why the current state of write-ability for our debug info metadata is unacceptable. I spent over 30 minutes trying to conjure some test case that would survive, even copying from other debug info tests, but it always failed to survive with no explanation of why or how I might fix it. =[ llvm-svn: 220165	2014-10-19 10:46:46 +00:00
Chandler Carruth	5b8cd2f73c	Move previously dead code to handle computing the known bits of an alias up to where it actually works as intended. The problem is that a GlobalAlias isa GlobalValue and so the prior block handled all of the cases. This allows us to constant fold based on the actual constant expression in the global alias. As an example, see the last function in the newly added test case which explicitly aligns an unaligned pointer using constant expression math. Without this change, we fail to see that and fold an alignment test to zero. llvm-svn: 220164	2014-10-19 09:06:56 +00:00
David Majnemer	312c3e5f39	InstCombine: (sub (or A B) (xor A B)) --> (and A B) The following implements the transformation: (sub (or A B) (xor A B)) --> (and A B). Patch by Ankur Garg! Differential Revision: http://reviews.llvm.org/D5719 llvm-svn: 220163	2014-10-19 08:32:32 +00:00
David Majnemer	59939acd26	InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1 The following implements the optimization for sequences of the form: icmp eq/ne (shl Const2, A), Const1 Such sequences can be transformed to: icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2)) This handles only the equality operators for now. Other operators need to be handled. Patch by Ankur Garg! llvm-svn: 220162	2014-10-19 08:23:08 +00:00
Chandler Carruth	a801dd5799	Fix a long-standing miscompile in the load analysis that was uncovered by my refactoring of this code. The method isSafeToLoadUnconditionally assumes that the load will proceed with the preferred type alignment. Given that, it has to ensure that the alloca or global is at least that aligned. It has always done this historically when a datalayout is present, but has never checked it when the datalayout is absent. When I refactored the code in r220156, I exposed this path when datalayout was present and that turned the latent bug into a patent bug. This fixes the issue by just removing the special case which allows folding things without datalayout. This isn't worth the complexity of trying to tease apart when it is or isn't safe without actually knowing the preferred alignment. llvm-svn: 220161	2014-10-19 08:17:50 +00:00
Chandler Carruth	be9dccd64d	Preserve AA metadata when combining (cast (load (...))) -> (load (cast (...))). llvm-svn: 220141	2014-10-18 11:00:12 +00:00
Chandler Carruth	2f75fcfef3	[InstCombine] Do an about-face on how LLVM canonicalizes (cast (load ...)) and (load (cast ...)): canonicalize toward the former. Historically, we've tried to load using the type of the pointer, and tried to match that type as closely as possible removing as many pointer casts as we could and trading them for bitcasts of the loaded value. This is deeply and fundamentally wrong. Repeat after me: memory does not have a type! This was a hard lesson for me to learn working on SROA. There is only one thing that should actually drive the type used for a pointer, and that is the type which we need to use to load from that pointer. Matching up pointer types to the loaded value types is very useful because it minimizes the physical size of the IR required for no-op casts. Similarly, the only thing that should drive the type used for a loaded value is how that value is used! Again, this minimizes casts. And in fact, the only thing motivating types in any part of LLVM's IR are the types used by the operations in the IR. We should match them as closely as possible. I've ended up removing some tests here as they were testing bugs or behavior that is no longer present. Mostly though, this is just cleanup to let the tests continue to function as intended. The only fallout I've found so far from this change was SROA and I have fixed it to not be impeded by the different type of load. If you find more places where this change causes optimizations not to fire, those too are likely bugs where we are assuming that the type of pointers is "significant" for optimization purposes. llvm-svn: 220138	2014-10-18 06:36:22 +00:00
Chandler Carruth	71009cad95	Remove a test that was ported from the old llvm-gcc frontend test suite. This test is pretty awesome. It is claiming to test devirtualization. However, the code in question is not in fact devirtualized by LLVM. If you take the original C++ test case and run it through Clang at -O3 we fail to devirtualize it completely. It also isn't a sufficiently focused test case. The reason we fail to devirtualize it isn't because of any missing instcombine though. Instead, it is because we fail to emit an available externally vtable and thus the vtable is just an external and completely opaque. If I cause the vtable to be emitted, we successfully devirtualize things. Anyways, I'm just removing it because it is providing negative value at this point: it isn't representative of the output of Clang really, LLVM isn't doing the transform it claims to be testing, LLVM's failure to do the transform isn't actually an LLVM bug at all and we shouldn't be testing for it here, and finally the test is written in such a way that it will trivially pass even when the point of the test is failing. llvm-svn: 220137	2014-10-18 06:36:18 +00:00
Chandler Carruth	2dc9682e59	[SROA] Change how SROA does vector-based promotion of allocas to handle cases where the alloca type, the load types, and the store types used all disagree. Previously, the only way that vector-based promotion occured was if the alloca type was a vector type. This was one of the very few remaining uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns out it was a bad idea. The alloca type can change very easily based on the mixture of types loaded and stored to that alloca. We shouldn't be relying on it as a signal for very much. Instead, the source of truth should be loads and stores. We should canonicalize the loads and stores as much as possible and then rely on them exclusively in SROA. When looking and loads and stores, we may find many different candidate vector types. This change will let SROA try all of them to find a vector type which is a viable way to promote the entire alloca to a vector register. With this change, it becomes possible to do better canonicalization and optimization of loads and stores without breaking SROA in random ways, and that should allow fixing a core source of performance loss in hot numerical loops such as those in Eigen. llvm-svn: 220116	2014-10-18 00:44:02 +00:00
Rafael Espindola	7da1ea83a9	Revert "TRE: make TRE a bit more aggressive" This reverts commit r219899. This also updates byval-tail-call.ll to make it clear what was breaking. Adding r219899 again will cause the load/store to disappear. llvm-svn: 220093	2014-10-17 21:25:48 +00:00
Hal Finkel	dd38c0b876	[DSE] Remove no-data-layout-only type-based overlap checking DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage. Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout. llvm-svn: 220035	2014-10-17 11:56:00 +00:00
Rafael Espindola	11aaaeebe0	Delete -std-compile-opts. These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951	2014-10-16 20:00:02 +00:00
Bjorn Steinbrink	d20816fde9	Allow call-slop optzn for destinations with a suitable dereferenceable attribute Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5832 llvm-svn: 219950	2014-10-16 19:43:08 +00:00
Sanjay Patel	c699a6117b	fold: sqrt(x * x * y) -> fabs(x) * sqrt(y) If a square root call has an FP multiplication argument that can be reassociated, then we can hoist a repeated factor out of the square root call and into a fabs(). In the simplest case, this: y = sqrt(x * x); becomes this: y = fabs(x); This patch relies on an earlier optimization in instcombine or reassociate to put the multiplication tree into a canonical form, so we don't have to search over every permutation of the multiplication tree. Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to use function-level attributes to do this optimization. This needs to be fixed for both the intrinsics and in the backend. Differential Revision: http://reviews.llvm.org/D5787 llvm-svn: 219944	2014-10-16 18:48:17 +00:00
Akira Hatanaka	5c221ef98f	Reapply r219832 - InstCombine: Narrow switch instructions using known bits. The code committed in r219832 asserted when it attempted to shrink a switch statement whose type was larger than 64-bit. llvm-svn: 219902	2014-10-16 06:00:46 +00:00
Saleem Abdulrasool	7f52921976	TRE: make TRE a bit more aggressive Make tail recursion elimination a bit more aggressive. This allows us to get tail recursion on functions that are just branches to a different function. The fact that the function takes a byval argument does not restrict it from being optimised into just a tail call. llvm-svn: 219899	2014-10-16 03:27:30 +00:00
Akira Hatanaka	40c2cf4afc	Revert r219832. llvm-svn: 219884	2014-10-16 01:17:02 +00:00
Sanjoy Das	360b1ed5f2	Revert "r219834 - Teach ScalarEvolution to sharpen range information" This change breaks the asan buildbots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468 llvm-svn: 219878	2014-10-15 23:46:04 +00:00
Hal Finkel	68dc3c7ab2	Preserve non-byval pointer alignment attributes using @llvm.assume when inlining For pointer-typed function arguments, enhanced alignment can be asserted using the 'align' attribute. When inlining, if this enhanced alignment information is not otherwise available, preserve it using @llvm.assume-based alignment assumptions. llvm-svn: 219876	2014-10-15 23:44:41 +00:00
Sanjoy Das	90c2f1455a	Teach ScalarEvolution to sharpen range information. If x is known to have the range [a, b) in a loop predicated by (icmp ne x, a), its range can be sharpened to [a + 1, b). Get ScalarEvolution and hence IndVars to exploit this fact. This change triggers an optimization to widen-loop-comp.ll, so it had to be edited to get it to pass. phabricator: http://reviews.llvm.org/D5639 llvm-svn: 219834	2014-10-15 19:25:28 +00:00
Akira Hatanaka	5bb9346a45	InstCombine: Narrow switch instructions using known bits. Truncate the operands of a switch instruction to a narrower type if the upper bits are known to be all ones or zeros. rdar://problem/17720004 llvm-svn: 219832	2014-10-15 19:05:50 +00:00
Hal Finkel	3b7fc86677	[SLPVectorize] Basic ephemeral-value awareness The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions). Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation). llvm-svn: 219816	2014-10-15 17:35:01 +00:00
Hal Finkel	1a600faba0	[LoopVectorize] Ignore @llvm.assume for cost estimates and legality A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen). Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example). llvm-svn: 219741	2014-10-14 22:59:49 +00:00
Sanjay Patel	0ca42bb5a8	Optimize away fabs() calls when input is squared (known positive). Eliminate library calls and intrinsic calls to fabs when the input is a squared value. Note that no unsafe-math / fast-math assumptions are needed for this optimization. Differential Revision: http://reviews.llvm.org/D5777 llvm-svn: 219717	2014-10-14 20:43:11 +00:00
David Majnemer	dad2103801	InstCombine: Don't miscompile X % ((Pow2 << A) >>u B) We assumed that A must be greater than B because the right hand side of a remainder operator must be nonzero. However, it is possible for A to be less than B if Pow2 is a power of two greater than 1. Take for example: i32 %A = 0 i32 %B = 31 i32 Pow2 = 2147483648 ((Pow2 << 0) >>u 31) is non-zero but A is less than B. This fixes PR21274. llvm-svn: 219713	2014-10-14 20:28:40 +00:00
Hal Finkel	171c2ec008	Revert "r216914 - Revert: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'" Reapply r216913, a fix for PR20832 by Andrea Di Biagio. The commit was reverted because of buildbot failures, and credit goes to Ulrich Weigand for isolating the underlying issue (which can be confirmed by Valgrind, which does helpfully light up like the fourth of July). Uli explained the problem with the original patch as: It seems the problem is calling multiplySignificand with an addend of category fcZero; that is not expected by this routine. Note that for fcZero, the significand parts are simply uninitialized, but the code in (or rather, called from) multiplySignificand will unconditionally access them -- in effect using uninitialized contents. This version avoids using a category == fcZero addend within multiplySignificand, which avoids this problem (the Valgrind output is also now clean). Original commit message: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'. When folding a fused multiply-add builtin call, make sure that we propagate the correct result in the case where the addend is zero, and the two other operands are finite non-zero. Example: define double @test() { %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0) ret double %1 } Before this patch, the instruction simplifier wrongly folded the builtin call in function @test to constant 'double 7.0'. With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and propagates the expected result (i.e. 56.0). Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence of NaN/Inf operands. This fixes PR20832. llvm-svn: 219708	2014-10-14 19:23:07 +00:00
Hal Finkel	a3f23e3725	[LVI] Check for @llvm.assume dominating the edge branch When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value constraints, we should check for intrinsics that dominate the edge's branch, not just any potential context instructions. An assumption that dominates the edge's branch represents a truth on that edge. This is specifically useful, for example, if multiple predecessors assume a pointer to be nonnull, allowing us to simplify a later null comparison. The test case, and an initial patch, were provided by Philip Reames. Thanks! llvm-svn: 219688	2014-10-14 16:04:49 +00:00
Marcello Maggioni	5bbe3df63f	Switch to select optimization for two-case switches This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block and a test to check that this condition is handled. llvm-svn: 219656	2014-10-14 01:58:26 +00:00
David Majnemer	db0773089f	InstCombine: Fix miscompile in X % -Y -> X % Y transform We assumed that negation operations of the form (0 - %Z) resulted in a negative number. This isn't true if %Z was originally negative. Substituting the negative number into the remainder operation may result in undefined behavior because the dividend might be INT_MIN. This fixes PR21256. llvm-svn: 219639	2014-10-13 22:37:51 +00:00
David Majnemer	a252138942	InstCombine: Don't miscompile (x lshr C1) udiv C2 We have a transform that changes: (x lshr C1) udiv C2 into: x udiv (C2 << C1) However, it is unsafe to do so if C2 << C1 discards any of C2's bits. This fixes PR21255. llvm-svn: 219634	2014-10-13 21:48:30 +00:00
Joerg Sonnenberger	5ca10d0edb	Revert r219223, it creates invalid PHI nodes. llvm-svn: 219587	2014-10-12 17:16:04 +00:00
Benjamin Kramer	240b85eec5	InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1) llvm-svn: 219585	2014-10-12 14:02:34 +00:00
David Majnemer	fe7fccff11	InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X Consider the case where X is 2. (2 <<s 31)/s-2147483648 is zero but we would fold to X. Note that this is valid when we are in the unsigned domain because we require NUW: 2 <<u 31 results in poison. This fixes PR21245. llvm-svn: 219568	2014-10-11 10:20:04 +00:00
David Majnemer	cb9d596655	InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow consider: C1 = INT_MIN C2 = -1 C1 * C2 overflows without a doubt but consider the following: %x = i32 INT_MIN This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1. N. B. Move the unsigned version of this transform to InstSimplify, it doesn't create any new instructions. This fixes PR21243. llvm-svn: 219567	2014-10-11 10:20:01 +00:00
David Majnemer	3cac85e071	InstCombine: mul to shl shouldn't preserve nsw consider: mul i32 nsw %x, -2147483648 this instruction will not result in poison if %x is 1 however, if we transform this into: shl i32 nsw %x, 31 then we will be generating poison because we just shifted into the sign bit. This fixes PR21242. llvm-svn: 219566	2014-10-11 10:19:52 +00:00
Sanjay Patel	ad8b666624	Return undef on FP <-> Int conversions that overflow (PR21330). The LLVM Lang Ref states for signed/unsigned int to float conversions: "If the value cannot fit in the floating point value, the results are undefined." And for FP to signed/unsigned int: "If the value cannot fit in ty2, the results are undefined." This matches the C definitions. The existing behavior pins to infinity or a max int value, but that may just lead to more confusion as seen in: http://llvm.org/bugs/show_bug.cgi?id=21130 Returning undef will hopefully lead to a less silent failure. Differential Revision: http://reviews.llvm.org/D5603 llvm-svn: 219542	2014-10-10 23:00:21 +00:00
Sanjoy Das	1f05c51e5e	This patch teaches ScalarEvolution to pick and use !range metadata. It also makes it more aggressive in querying range information by adding a call to isKnownPredicateWithRanges to isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond. phabricator: http://reviews.llvm.org/D5638 Reviewed by: atrick, hfinkel llvm-svn: 219532	2014-10-10 21:22:34 +00:00
Mark Heffernan	2beab5f0b4	This patch de-pessimizes the calculation of loop trip counts in ScalarEvolution in the presence of multiple exits. Previously all loops exits had to have identical counts for a loop trip count to be considered computable. This pessimization was implemented by calling getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock) inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME in the comments of that function). The pessimization was added to fix a corner case involving undefined behavior (pr/16130). This patch more precisely handles the undefined behavior case allowing the pessimization to be removed. ControlsExit replaces IsSubExpr to more precisely track the case where undefined behavior is expected to occur. Because undefined behavior is tracked more precisely we can remove MustExit from ExitLimit. MustExit was used to track the case where the limit was computed potentially assuming undefined behavior even if undefined behavior didn't necessarily occur. llvm-svn: 219517	2014-10-10 17:39:11 +00:00
Arnold Schwaighofer	d7d010eb2a	SimplifyCFG: Don't convert phis into selects if we could remove undef behavior instead We used to transform this: define void @test6(i1 %cond, i8* %ptr) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br label %bb2 bb2: %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ] store i8 2, i8* %ptr.2, align 8 ret void } into this: define void @test6(i1 %cond, i8* %ptr) { %ptr.2 = select i1 %cond, i8* null, i8* %ptr store i8 2, i8* %ptr.2, align 8 ret void } because the simplifycfg transformation into selects would happen to happen before the simplifycfg transformation that removes unreachable control flow (We have 'unreachable control flow' due to the store to null which is undefined behavior). The existing transformation that removes unreachable control flow in simplifycfg is: /// If BB has an incoming value that will always trigger undefined behavior /// (eg. null pointer dereference), remove the branch leading here. static bool removeUndefIntroducingPredecessor(BasicBlock BB) Now we generate: define void @test6(i1 %cond, i8 %ptr) { store i8 2, i8* %ptr.2, align 8 ret void } I did not see any impact on the test-suite + externals. rdar://18596215 llvm-svn: 219462	2014-10-10 01:27:02 +00:00
Chad Rosier	bd64d46188	[Reassociate] Don't canonicalize X - undef to X + (-undef). Phabricator Revision: http://reviews.llvm.org/D5674 PR21205 llvm-svn: 219434	2014-10-09 20:06:29 +00:00
Andrea Di Biagio	458a669f49	[InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values. This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we wrongly computed the distance between the highest bits set of two negative values. This fixes PR21222. Differential Revision: http://reviews.llvm.org/D5700 llvm-svn: 219406	2014-10-09 12:41:49 +00:00
David Majnemer	ac07703842	Inliner: Non-local functions in COMDATs shouldn't be dropped A function with discardable linkage cannot be discarded if its a member of a COMDAT group without considering all the other COMDAT members as well. This sort of thing is already handled by GlobalOpt/GlobalDCE. This fixes PR21206. llvm-svn: 219335	2014-10-08 19:32:32 +00:00
Justin Bogner	894eff7a9f	Revert "[InstCombine] re-commit r218721 with fix for pr21199" This seems to cause a miscompile when building clang, which causes a bootstrapped clang to fail or crash in several of its tests. See: http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184 http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813 This reverts commit r219282. llvm-svn: 219317	2014-10-08 16:30:22 +00:00
David Majnemer	1b3b70e371	GlobalOpt: Don't drop unused memberes of a Comdat A linkonce_odr member of a COMDAT shouldn't be dropped if we need to keep the entire COMDAT group. This fixes PR21191. llvm-svn: 219283	2014-10-08 07:23:31 +00:00
Gerolf Hoflehner	e2ff5b9223	[InstCombine] re-commit r218721 with fix for pr21199 The icmp-select-icmp optimization targets select-icmp.eq only. This is now ensured by testing the branch predicate explictly. This commit also includes the test case for pr21199. llvm-svn: 219282	2014-10-08 06:42:19 +00:00
Hans Wennborg	1256198bbc	Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization This seems to have caused PR21199. llvm-svn: 219264	2014-10-08 01:05:57 +00:00
Duncan P. N. Exon Smith	c46cfcbbc6	LoopUnroll: Create sub-loops in LoopInfo `LoopUnrollPass` says that it preserves `LoopInfo` -- make it so. In particular, tell `LoopInfo` about copies of inner loops when unrolling the outer loop. Conservatively, also tell `ScalarEvolution` to forget about the original versions of these loops, since their inputs may have changed. Fixes PR20987. llvm-svn: 219241	2014-10-07 21:19:00 +00:00
Marcello Maggioni	963bc87dbd	Two case switch to select optimization This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block to a select or a couple of selects (depending if the default block is reachable or not). The typical case this optimization wants to be able to optimize is this one: Example: switch (a) { case 10: %0 = icmp eq i32 %a, 10 return 10; %1 = select i1 %0, i32 10, i32 4 case 20: ----> %2 = icmp eq i32 %a, 20 return 2; %3 = select i1 %2, i32 2, i32 %1 default: return 4; } It also sets the base for further optimizations that are planned and being reviewed. llvm-svn: 219223	2014-10-07 18:16:44 +00:00
David Blaikie	17364d4e05	DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are updated even when DAE removes both varargs and non-varargs arguments on the same function. After some stellar (& inspired) help from Reid Kleckner providing a test case for some rather unstable undefined behavior showing up as assertions produced by r214761, I was able to fix this issue in DAE involving the application of both varargs removal, followed by normal argument removal. Indeed I introduced this same bug into ArgumentPromotion (r212128) by copying the code from DAE, and when I fixed the bug in ArgPromo (r213805) and commented in that patch that I didn't need to address the same issue in DAE because it was a single pass. Turns out it's two pass, one for the varargs and one for the normal arguments, so the same fix is needed (at least during varargs removal). So here it is. (the observable/net effect of this bug, even when it didn't result in assertion failure, is that debug info would describe the DAE'd function in the abstract, but wouldn't provide high/low_pc, variable locations, line table, etc (it would appear as though the function had been entirely optimized away), see the original PR14016 for details of the general problem) I'm not recommitting the assertion just yet, as there's been another regression of it since I last tried. It might just be a few test cases weren't adequately updated after Adrian or Duncan's recent schema changes. llvm-svn: 219210	2014-10-07 15:10:23 +00:00
Suyog Sarda	181cc9a029	Remove Extra lines. NFC. llvm-svn: 219201	2014-10-07 11:31:31 +00:00
David Majnemer	e025321d36	GlobalDCE: Don't drop any COMDAT members If we require a single member of a comdat, require all of the other members as well. This fixes PR20981. llvm-svn: 219191	2014-10-07 07:07:19 +00:00
Gerolf Hoflehner	c0b4c20e5e	[InstCombine] re-commit r218721 icmp-select-icmp optimization Takes care of the assert that caused build fails. Rather than asserting the code checks now that the definition and use are in the same block, and does not attempt to optimize when that is not the case. llvm-svn: 219175	2014-10-07 00:16:12 +00:00
Owen Anderson	8373d338f6	Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions. Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical. Patch by Dmitri Shtilman. llvm-svn: 219092	2014-10-05 23:41:26 +00:00
Hal Finkel	04a156139e	[InstCombine] Remove redundant @llvm.assume intrinsics For any @llvm.assume intrinsic, if there is another which dominates it and uses the same condition, then it is redundant and can be removed. While this does not alter the semantics of the @llvm.assume intrinsics, it makes subsequent handling more efficient (and the resulting IR easier to read). llvm-svn: 219067	2014-10-04 21:27:06 +00:00
Richard Smith	1ed4229f6f	PR21145: Teach LLVM about C++14 sized deallocation functions. C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014	2014-10-03 20:17:06 +00:00
Duncan P. N. Exon Smith	176b691d32	Revert "Revert "DI: Fold constant arguments into a single MDString"" This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010	2014-10-03 20:01:09 +00:00
James Molloy	cb7449d058	Revert r215343. This was contentious and needs invesigation. llvm-svn: 218971	2014-10-03 09:29:24 +00:00
Duncan P. N. Exon Smith	786cd049fc	Revert "DI: Fold constant arguments into a single MDString" This reverts commit r218914 while I investigate some bots. llvm-svn: 218918	2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith	571f97bd90	DI: Fold constant arguments into a single MDString This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914	2014-10-02 21:56:57 +00:00
Sanjay Patel	13a657819b	Remove unused function attribute params. llvm-svn: 218909	2014-10-02 21:12:04 +00:00
Sanjay Patel	12d1ce5408	Optimize square root squared (PR21126). When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X. This can happen in the real world when calculating x ** 3/2. This occurs in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c. Differential Revision: http://reviews.llvm.org/D5584 llvm-svn: 218906	2014-10-02 21:10:54 +00:00
Zinovy Nis	ccc3e3733b	[BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]' thus inverting index expression. This patch fixes it. Thanks to Jörg Sonnenberger for pointing. Differential Revision: http://reviews.llvm.org/D5576 llvm-svn: 218867	2014-10-02 13:01:15 +00:00
Sanjay Patel	7b2cd9ad86	Make the sqrt intrinsic return undef for a negative input. As discussed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html And again here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html The sqrt of a negative number when using the llvm intrinsic is undefined. We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref. This change should not affect any code that isn't using "no-nans-fp-math"; ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call. Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. We knowingly approve of this difference with the other compilers in an attempt to flag code that is invoking undefined behavior. A front-end warning should also try to convince the user that the program will fail: http://llvm.org/bugs/show_bug.cgi?id=21093 Differential Revision: http://reviews.llvm.org/D5527 llvm-svn: 218803	2014-10-01 20:36:33 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Evgeniy Stepanov	815f2869ad	Revert r218721, r218735. Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 llvm-svn: 218752	2014-10-01 10:07:28 +00:00
Gerolf Hoflehner	08cc4b950c	[InstCombine] Optimize icmp-select-icmp In special cases select instructions can be eliminated by replacing them with a cheaper bitwise operation even when the select result is used outside its home block. The instances implemented are patterns like %x=icmp.eq %y=select %x,%r, null %z=icmp.eq\|neq %y, null br %z,true, false ==> %x=icmp.ne %y=icmp.eq %r,null %z=or %x,%y br %z,true,false The optimization is integrated into the instruction combiner and performed only when all uses of the select result can be replaced by the select operand proper. For this dominator information is used and dominance is now a required analysis pass in the combiner. The optimization itself is iterative. The critical step is to replace the select result with the non-constant select operand. So the select becomes local and the combiner iteratively works out simpler code pattern and eventually eliminates the select. rdar://17853760 llvm-svn: 218721	2014-10-01 00:13:22 +00:00
Jingyue Wu	fc0296704c	[SimplifyCFG] threshold for folding branches with common destination Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int input, int output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i \| c ) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i \| c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i \| c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i \| c ) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711	2014-09-30 22:23:38 +00:00
Chad Rosier	aab5d7bd33	[IndVarSimplify] Widen loop unsigned compares. This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 llvm-svn: 218659	2014-09-30 03:17:42 +00:00
Chad Rosier	70d54ac848	[AArch64] Improve cost model to handle sdiv by a pow-of-two. This patch improves the target-specific cost model to better handle signed division by a power of two. The immediate result is that this enables the SLP vectorizer to do a better job. http://reviews.llvm.org/D5469 PR20714 llvm-svn: 218607	2014-09-29 13:59:31 +00:00
Kevin Qin	fc02e3c363	Use a loop to simplify the runtime unrolling prologue. Runtime unrolling will create a prologue to execute the extra iterations which is can't divided by the unroll factor. It generates an if-then-else sequence to jump into a factor -1 times unrolled loop body, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be 7 times unrolled, 3 are in loop prologue, and 4 are in the loop. This commit is to use a loop to execute the extra iterations in prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when unroll factor is 4, the loop body will be copied by only 5 times, 1 in the prologue loop, 4 in the original loop. And if the unroll factor is 2, new loop won't be created, just as the original solution. llvm-svn: 218604	2014-09-29 11:15:00 +00:00
Chad Rosier	7b974b73ae	[IndVar] Don't widen loop compare unless IV user is sign extended. PR21030 llvm-svn: 218539	2014-09-26 20:05:35 +00:00
David Peixotto	472b05b36c	Ignore annotation function calls in cost computation The annotation instructions are dropped during codegen and have no impact on size. In some cases, the annotations were preventing the unroller from unrolling a loop because the annotation calls were pushing the cost over the unrolling threshold. Differential Revision: http://reviews.llvm.org/D5335 llvm-svn: 218525	2014-09-26 17:48:40 +00:00
David Peixotto	0d4d5e64ec	Fix assertion in LICM doFinalization() The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 llvm-svn: 218387	2014-09-24 16:48:31 +00:00
Reid Kleckner	78927e884b	GlobalOpt: Preserve comdats of unoptimized initializers Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped. llvm-svn: 218337	2014-09-23 22:33:01 +00:00
Chad Rosier	307b50b0f6	[IndVarSimplify] Partially revert r217953 to see if this fixes the bots. Specifically, disable widening of unsigned compare instructions. llvm-svn: 217962	2014-09-17 16:35:09 +00:00
Chad Rosier	bb99f40530	[IndVarSimplify] Widen loop compare instructions. This improves other optimizations such as LSR. A sext may be added to the compare's other operand, but this can often be hoisted outside of the loop. llvm-svn: 217953	2014-09-17 14:10:33 +00:00
Andrea Di Biagio	5b92b4971a	[InstCombine] Fix wrong folding of constant comparison involving ahsr and negative quantities (PR20945). Example: define i1 @foo(i32 %a) { %shr = ashr i32 -9, %a %cmp = icmp ne i32 %shr, -5 ret i1 %cmp } Before this fix, the instruction combiner wrongly thought that %shr could have never been equal to -5. Therefore, %cmp was always folded to 'true'. However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore, in this example, it is not valid to fold %cmp to 'true'. The problem was only affecting the case where the comparison was between negative quantities where one of the quantities was obtained from arithmetic shift of a negative constant. This patch fixes the problem with the wrong folding (fixes PR20945). With this patch, the 'icmp' from the example is now simplified to a comparison between %a and 1. This still allows us to get rid of the arithmetic shift (%shr). llvm-svn: 217950	2014-09-17 11:32:31 +00:00
David Majnemer	b435a4214e	InstSimplify: Don't allow (x srem y) urem y -> x srem y Let's consider the case where: %x i16 = 32768 %y i16 = 384 %x srem %y = 65408 (%x srem %y) urem %y = 128 llvm-svn: 217939	2014-09-17 04:16:35 +00:00
David Majnemer	ac717f0972	InstSimplify: ((X % Y) % Y) -> (X % Y) Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D5350 llvm-svn: 217937	2014-09-17 03:34:34 +00:00
Tilmann Scheller	40fc9595c8	[InstCombine] Remove redundant test case. Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D5284 llvm-svn: 217865	2014-09-16 08:50:10 +00:00
David Majnemer	a315bd80c2	InstSimplify: Simplify trivial and/or of icmps Some ICmpInsts when anded/ored with another ICmpInst trivially reduces to true or false depending on whether or not all integers or no integers satisfy the intersected/unioned range. This sort of trivial looking code can come about when InstCombine performs a range reduction-type operation on sdiv and the like. This fixes PR20916. llvm-svn: 217750	2014-09-15 08:15:28 +00:00
Chad Rosier	e668f61076	FileCheckize. NFC. llvm-svn: 217698	2014-09-12 17:55:16 +00:00
Hal Finkel	f83e1f7f66	[AlignmentFromAssumptions] Don't crash just because the target is 32-bit We used to crash processing any relevant @llvm.assume on a 32-bit target (because we'd ask SE to subtract expressions of differing types). I've copied our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get some meaningful test coverage here. llvm-svn: 217574	2014-09-11 08:40:17 +00:00
Hal Finkel	71b7084112	[AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment The routine that determines an alignment given some SCEV returns zero if the answer is unknown. In a case where we could determine the increment of an AddRec but not the starting alignment, we would compute the integer modulus by zero (which is illegal and traps). Prevent this by returning early if either the start or increment alignment is unknown (zero). llvm-svn: 217544	2014-09-10 21:05:52 +00:00
Sanjay Patel	b653de1ada	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Bjorn Steinbrink	3c33150801	Add a test for hoisting instructions with metadata out of then/else blocks Test for the bug fixed in r215723. llvm-svn: 217453	2014-09-09 17:10:21 +00:00
Hal Finkel	93873cc10e	Check for all known bits on ret in InstCombine From a combination of @llvm.assume calls (and perhaps through other means, such as range metadata), it is possible that all bits of a return value might be known. Previously, InstCombine did not check for this (which is understandable given assumptions of constant propagation), but means that we'd miss simple cases where assumptions are involved. llvm-svn: 217346	2014-09-07 21:28:34 +00:00
Hal Finkel	7e1844940e	Make use of @llvm.assume from LazyValueInfo This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with the known-bits change (r217342), this requires feeding a "context" instruction pointer through many functions. Aside from a little refactoring to reuse the logic that turns predicates into constant ranges in LVI, the only new code is that which can 'merge' the range from an assumption into that otherwise computed. There is also a small addition to JumpThreading so that it can have LVI use assumptions in the same block as the comparison feeding a conditional branch. With this patch, we can now simplify this as expected: int foo(int a) { __builtin_assume(a > 5); if (a > 3) { bar(); return 1; } return 0; } llvm-svn: 217345	2014-09-07 20:29:59 +00:00
Hal Finkel	d67e463901	Add an AlignmentFromAssumptions Pass This adds a ScalarEvolution-powered transformation that updates load, store and memory intrinsic pointer alignments based on invariant((a+q) & b == 0) expressions. Many of the simple cases we can get with ValueTracking, but we still need something like this for the more complicated cases (such as those with an offset) that require some algebra. Note that gcc's __builtin_assume_aligned's optional third argument provides exactly for this kind of 'misalignment' offset for which this kind of logic is necessary. The primary motivation is to fixup alignments for vector loads/stores after vectorization (and unrolling). This pass is added to the optimization pipeline just after the SLP vectorizer runs (which, admittedly, does not preserve SE, although I imagine it could). Regardless, I actually don't think that the preservation matters too much in this case: SE computes lazily, and this pass won't issue any SE queries unless there are any assume intrinsics, so there should be no real additional cost in the common case (SLP does preserve DT and LoopInfo). llvm-svn: 217344	2014-09-07 20:05:11 +00:00
Hal Finkel	15aeaaf24a	Add additional patterns for @llvm.assume in ValueTracking This builds on r217342, which added the infrastructure to compute known bits using assumptions (@llvm.assume calls). That original commit added only a few patterns (to catch common cases related to determining pointer alignment); this change adds several other patterns for simple cases. r217342 contained that, for assume(v & b = a), bits in the mask that are known to be one, we can propagate known bits from the a to v. It also had a known-bits transfer for assume(a = b). This patch adds: assume(~(v & b) = a) : For those bits in the mask that are known to be one, we can propagate inverted known bits from the a to v. assume(v \| b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. assume(~(v \| b) = a): For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. assume(v ^ b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. For those bits in b that are known to be one, we can propagate inverted known bits from the a to v. assume(~(v ^ b) = a) : For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. For those bits in b that are known to be one, we can propagate known bits from the a to v. assume(v << c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the right by c. assume(~(v << c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the right by c. assume(v >> c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the right by c. assume(~(v >> c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the right by c. assume(v >=_s c) where c is non-negative: The sign bit of v is zero assume(v >_s c) where c is at least -1: The sign bit of v is zero assume(v <=_s c) where c is negative: The sign bit of v is one assume(v <_s c) where c is non-positive: The sign bit of v is one assume(v <=_u c): Transfer the known high zero bits assume(v <_u c): Transfer the known high zero bits (if c is know to be a power of 2, transfer one more) A small addition to InstCombine was necessary for some of the test cases. The problem is that when InstCombine was simplifying and, or, etc. it would fail to check the 'do I know all of the bits' condition before checking less specific conditions and would not fully constant-fold the result. I'm not sure how to trigger this aside from using assumptions, so I've just included the change here. llvm-svn: 217343	2014-09-07 19:21:07 +00:00
Hal Finkel	60db05896a	Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.) This change, which allows @llvm.assume to be used from within computeKnownBits (and other associated functions in ValueTracking), adds some (optional) parameters to computeKnownBits and friends. These functions now (optionally) take a "context" instruction pointer, an AssumptionTracker pointer, and also a DomTree pointer, and most of the changes are just to pass this new information when it is easily available from InstSimplify, InstCombine, etc. As explained below, the significant conceptual change is that known properties of a value might depend on the control-flow location of the use (because we care that the @llvm.assume dominates the use because assumptions have control-flow dependencies). This means that, when we ask if bits are known in a value, we might get different answers for different uses. The significant changes are all in ValueTracking. Two main changes: First, as with the rest of the code, new parameters need to be passed around. To make this easier, I grouped them into a structure, and I made internal static versions of the relevant functions that take this structure as a parameter. The new code does as you might expect, it looks for @llvm.assume calls that make use of the value we're trying to learn something about (often indirectly), attempts to pattern match that expression, and uses the result if successful. By making use of the AssumptionTracker, the process of finding @llvm.assume calls is not expensive. Part of the structure being passed around inside ValueTracking is a set of already-considered @llvm.assume calls. This is to prevent a query using, for example, the assume(a == b), to recurse on itself. The context and DT params are used to find applicable assumptions. An assumption needs to dominate the context instruction, or come after it deterministically. In this latter case we only handle the specific case where both the assumption and the context instruction are in the same block, and we need to exclude assumptions from being used to simplify their own ephemeral values (those which contribute only to the assumption) because otherwise the assumption would prove its feeding comparison trivial and would be removed. This commit adds the plumbing and the logic for a simple masked-bit propagation (just enough to write a regression test). Future commits add more patterns (and, correspondingly, more regression tests). llvm-svn: 217342	2014-09-07 18:57:58 +00:00
Hal Finkel	57f03dda49	Add functions for finding ephemeral values This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or indirectly), and thus will be removed prior to code generation, implying that they should be considered free for certain purposes (like inlining). The inliner's cost analysis, and a few other passes, have been updated to account for ephemeral values using the provided functionality. This functionality is important for the usability of @llvm.assume, because it limits the "non-local" side-effects of adding llvm.assume on inlining, loop unrolling, etc. (these are hints, and do not generate code, so they should not directly contribute to estimates of execution cost). llvm-svn: 217335	2014-09-07 13:49:57 +00:00
David Majnemer	6fe6ea740c	InstCombine: Remove a special case pattern The special case did not work when run under -reassociate and can easily be expressed by a further generalization of an existing pattern. llvm-svn: 217227	2014-09-05 06:09:24 +00:00
David Majnemer	c6ab01ecca	IndVarSimplify: Don't let LFTR compare against a poison value LinearFunctionTestReplace tries to use the next indvar to compare against when possible. However, it may be the case that the calculation for the next indvar has NUW/NSW flags and that it may only be safely used inside the loop. Using it in a comparison to calculate the exit condition could result in observing poison. This fixes PR20680. Differential Revision: http://reviews.llvm.org/D5174 llvm-svn: 217102	2014-09-03 23:03:18 +00:00
Robin Morisset	a47cb411dc	Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass) Fixes two latent bugs: - There was no fence inserted before expanded seq_cst load (unsound on Power) - There was only a fence release before seq_cst stores (again unsound, in particular on Power) It is not even clear if this is correct on ARM swift processors (where release fences are DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift as it is not clear whether it is incorrect. I would love to get documentation stating whether it is correct or not. These two bugs were not triggered because Power is not (yet) using this pass, and these behaviours happen to be (mostly?) working on ARM (although they completely butchered the semantics of the llvm IR). See: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html for an example of the problems that can be caused by the second of these bugs. I couldn't see a way of fixing these in a completely target-independent way without adding lots of unnecessary fences on ARM, hence the target-dependent parts of this patch. This patch implements the new target-dependent parts only for ARM (the default of not doing anything is enough for AArch64), other architectures will use this infrastructure in later patches. llvm-svn: 217076	2014-09-03 21:01:03 +00:00
Sanjay Patel	9433a28845	Preserve IR flags (nsw, nuw, exact, fast-math) in SLP vectorizer (PR20802). The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar instructions into vectors. But this isn't a simple copy - we need to take the intersection (the logical 'and') of the sets of flags on the scalars. The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after: http://reviews.llvm.org/D4015 http://llvm.org/viewvc/llvm-project?view=revision&revision=211339 The vast majority of changed files are existing tests that were not propagating IR flags, but I've also added a new test file for focused testing of IR flag possibilities. Differential Revision: http://reviews.llvm.org/D5172 llvm-svn: 217051	2014-09-03 17:40:30 +00:00
Yi Jiang	77a609b556	Generate extract for in-tree uses if the use is scalar operand in vectorized instruction. radar://18144665 llvm-svn: 216946	2014-09-02 21:00:39 +00:00
Matt Arsenault	9d412ed41e	Fix crash when looking up the addrspace of GEPs with vector types Patch by Björn Steinbrink llvm-svn: 216930	2014-09-02 18:47:54 +00:00
Andrea Di Biagio	b9de900788	Revert: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'. This reverts revision 216913; the new test added at revision 216913 caused regression failures on a couple of buildbots. llvm-svn: 216914	2014-09-02 17:22:49 +00:00
Andrea Di Biagio	7676fe1878	[APFloat] Fixed a bug in method 'fusedMultiplyAdd'. When folding a fused multiply-add builtin call, make sure that we propagate the correct result in the case where the addend is zero, and the two other operands are finite non-zero. Example: define double @test() { %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0) ret double %1 } Before this patch, the instruction simplifier wrongly folded the builtin call in function @test to constant 'double 7.0'. With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and propagates the expected result (i.e. 56.0). Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence of NaN/Inf operands. This fixes PR20832. Differential Revision: http://reviews.llvm.org/D5152 llvm-svn: 216913	2014-09-02 16:44:56 +00:00
David Majnemer	49428105aa	LICM: Don't crash when an instruction is used by an unreachable BB Summary: BBs might contain non-LCSSA'd values after the LCSSA pass is run if they are unreachable from the entry block. Normally, the users of the instruction would be PHIs but the unreachable BBs have normal users; rewrite their uses to be undef values. An alternative fix could involve fixing this at LCSSA but that would require this invariant to hold after subsequent transforms. If a BB created an unreachable block, they would be in violation of this. This fixes PR19798. Differential Revision: http://reviews.llvm.org/D5146 llvm-svn: 216911	2014-09-02 16:22:00 +00:00
David Majnemer	d4cffcf073	SROA: Don't insert instructions before a PHI SROA may decide that it needs to insert a bitcast and would set it's insertion point before a PHI. This will create an invalid module right quick. Instead, choose the first insertion point in the basic block that holds our PHI. This fixes PR20822. Differential Revision: http://reviews.llvm.org/D5141 llvm-svn: 216891	2014-09-01 21:20:14 +00:00
David Majnemer	d2df50196f	Revert "Revert two GEP-related InstCombine commits" This reverts commit r216698 which reverted r216523 and r216598. We would attempt to perform the transformation even if the match() failed because, as a side effect, it would set V. This would trick us into believing that we correctly found a place to correctly apply the transform. An additional test case was added to getelementptr.ll so that we might not regress in the future. llvm-svn: 216890	2014-09-01 21:10:02 +00:00
Sanjay Patel	5ad239e15a	Add a convenience method to copy wrapping, exact, and fast-math flags (NFC). The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions. This patch adds a convenience method to make that operation easier because we need to do this in the loop vectorizer, SLP vectorizer, and possibly other places. Although this is a 'no functional change' patch, I've added a testcase to verify that the exact flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked in existing testcases. Differential Revision: http://reviews.llvm.org/D5138 llvm-svn: 216886	2014-09-01 18:44:57 +00:00
Chandler Carruth	18cee1defc	Fix a really bad miscompile introduced in r216865 - the else-if logic chain became completely broken here as all intrinsic users ended up being skipped, and the ones that seemed to be singled out were actually the exact wrong set. This is a great example of why long else-if chains can be easily confusing. Switch the entire code to use early exits and early continues to have simpler (and more importantly, correct) logic here, as well as fixing the reversed logic for detecting and continuing on lifetime intrinsics. I've also significantly cleaned up the test case and added another test case demonstrating an example where the optimization is not (trivially) safe to perform. llvm-svn: 216871	2014-09-01 10:09:18 +00:00
Renato Golin	86a6c3f269	Small refactor on VectorizerHint for deduplication Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. Re-applying again due to MSVC 2013 being minimum requirement, and this patch having C++11 that MSVC 2012 didn't support. Fixes PR20655. llvm-svn: 216870	2014-09-01 10:00:17 +00:00
Hal Finkel	0c083024f0	Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata This feeds AA through the IFI structure into the inliner so that AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which functions only access their arguments (instead of just hard-coding some knowledge of memory intrinsics). Most of the information is only available from BasicAA; this is important for preserving alias scoping information for target-specific intrinsics when doing the noalias parameter attribute to metadata conversion. llvm-svn: 216866	2014-09-01 09:01:39 +00:00
Nick Lewycky	fc243d54d2	Ignore lifetime intrinsics in use list for MemCpyOptimizer. Patch by Luqman Aden, review by Hal Finkel. llvm-svn: 216865	2014-09-01 06:03:11 +00:00
Hal Finkel	cbb85f249e	Fix AddAliasScopeMetadata again - alias.scope must be a complete description I thought that I had fixed this problem in r216818, but I did not do a very good job. The underlying issue is that when we add alias.scope metadata we are asserting that this metadata completely describes the aliasing relationships within the current aliasing scope domain, and so in the context of translating noalias argument attributes, the pointers must all be based on noalias arguments (as underlying objects) and have no other kind of underlying object. In r216818 excluding appropriate accesses from getting alias.scope metadata is done by looking for underlying objects that are not identified function-local objects -- but that's wrong because allocas, etc. are also function-local objects and we need to explicitly check that all underlying objects are the noalias arguments for which we're adding metadata aliasing scopes. This fixes the underlying-object check for adding alias.scope metadata, and does some refactoring of the related capture-checking eligibility logic (and adds more comments; hopefully making everything a bit clearer). Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the feature is still disabled by default). llvm-svn: 216863	2014-09-01 04:26:40 +00:00
Hal Finkel	a3708df41a	Fix AddAliasScopeMetadata to not add scopes when deriving from unknown pointers The previous implementation of AddAliasScopeMetadata, which adds noalias metadata to preserve noalias parameter attribute information when inlining had a flaw: it would add alias.scope metadata to accesses which might have been derived from pointers other than noalias function parameters. This was incorrect because even some access known not to alias with all noalias function parameters could easily alias with an access derived from some other pointer. Instead, when deriving from some unknown pointer, we cannot add alias.scope metadata at all. This fixes a miscompile of the test-suite's tramp3d-v4. Furthermore, we cannot add alias.scope to functions unless we know they access only argument-derived pointers (currently, we know this only for memory intrinsics). Also, we fix a theoretical problem with using the NoCapture attribute to skip the capture check. This is incorrect (as explained in the comment added), but would not matter in any code generated by Clang because we get only inferred nocapture attributes in Clang-generated IR. This functionality is not yet enabled by default. llvm-svn: 216818	2014-08-30 12:48:33 +00:00
David Majnemer	5e96f1b4c8	InstCombine: Try harder to combine icmp instructions consider: (and (icmp X, Y), (and Z, (icmp A, B))) It may be possible to combine (icmp X, Y) with (icmp A, B). If we successfully combine, create an 'and' instruction with Z. This fixes PR20814. N.B. There is room for improvement after this change but I'm not convinced it's worth chasing yet. llvm-svn: 216814	2014-08-30 06:18:20 +00:00
Robin Morisset	163ef0402a	Relax the constraint more in MemoryDependencyAnalysis.cpp Even loads/stores that have a stronger ordering than monotonic can be safe. The rule is no release-acquire pair on the path from the QueryInst, assuming that the QueryInst is not atomic itself. llvm-svn: 216771	2014-08-29 20:32:58 +00:00
Matt Arsenault	85cbc7e371	Make fabs safe to speculatively execute llvm-svn: 216736	2014-08-29 16:01:17 +00:00
David Majnemer	400e725bde	Revert two GEP-related InstCombine commits This reverts commit r216523 and r216598; people have reported regressions. llvm-svn: 216698	2014-08-29 00:06:43 +00:00
Reid Kleckner	febb279c9c	Don't promote byval pointer arguments when padding matters Don't promote byval pointer arguments when when their size in bits is not equal to their alloc size in bits. This can happen for x86_fp80, where the size in bits is 80 but the alloca size in bits in 128. Promoting these types can break passing unions of x86_fp80s and other types. Patch by Thomas Jablin! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D5057 llvm-svn: 216693	2014-08-28 22:42:00 +00:00
Erik Eckstein	8354cfaf95	Fix: SLPVectorizer tried to move an instruction which was replaced by a vector instruction. For a detailed description of the problem see the comment in the test file. The problematic moveBefore() calls are not required anymore because the new scheduling algorithm ensures a correct ordering anyway. llvm-svn: 216656	2014-08-28 07:04:02 +00:00
David Majnemer	76d06bc613	InstSimplify: Move a transform from InstCombine to InstSimplify Several combines involving icmp (shl C2, %X) C1 can be simplified without introducing any new instructions. Move them to InstSimplify; while we are at it, make them more powerful. llvm-svn: 216642	2014-08-28 03:34:28 +00:00
David Majnemer	22ccfc4484	InstCombine: Combine gep X, (Y-X) to Y We try to perform this transform in InstSimplify but we aren't always able to. Sometimes, we need to insert a bitcast if X and Y don't have the same time. llvm-svn: 216598	2014-08-27 20:08:37 +00:00
David Majnemer	11ca2971e8	InstSimplify: Don't simplify gep X, (Y-X) to Y if types differ It's incorrect to perform this simplification if the types differ. A bitcast would need to be inserted for this to work. This fixes PR20771. llvm-svn: 216597	2014-08-27 20:08:34 +00:00
Nico Weber	48c82400ed	Reland r216439 215441, majnemer has a real fix for PR20771. llvm-svn: 216586	2014-08-27 20:06:19 +00:00
Nico Weber	7b343e3cc6	Revert r216439 (and r216441, else the former doesn't revert cleanly). It caused PR 20771. I'll land a test on the clang side. llvm-svn: 216582	2014-08-27 20:00:13 +00:00
David Majnemer	d6d1671c1e	InstSimplify: Compute comparison ranges for left shift instructions 'shl nuw CI, x' produces [CI, CI << CLZ(CI)] 'shl nsw CI, x' produces [CI << CLO(CI)-1, CI] if CI is negative 'shl nsw CI, x' produces [CI, CI << CLZ(CI)-1] if CI is non-negative llvm-svn: 216570	2014-08-27 18:03:46 +00:00
Michael Zolotukhin	5dc466b863	[SLP] Re-enable vectorization of GEP expressions (re-apply r210342 with a fix). llvm-svn: 216549	2014-08-27 15:01:18 +00:00
David Majnemer	54e97d5dc0	InstCombine: Optimize GEP's involving ptrtoint better We supported transforming: (gep i8* X, -(ptrtoint Y)) to: (inttoptr (sub (ptrtoint X), (ptrtoint Y))) However, this only fired if 'X' had type i8*. Generalize this to support various types of different sizes. This results in much better CodeGen, especially for pointers to packed structs. llvm-svn: 216523	2014-08-27 05:16:04 +00:00
Joerg Sonnenberger	cb5674b9c2	Revert r210342 and r210343, add test case for the crasher. PR 20642. llvm-svn: 216475	2014-08-26 19:06:41 +00:00
David Majnemer	788d0ab8c8	InstSimplify: Fold gep X, (sub 0, ptrtoint(X)) to null Save InstCombine some work if we can perform this fold during InstSimplify. llvm-svn: 216441	2014-08-26 07:08:03 +00:00
David Majnemer	bc4981323f	InstSimplify: Simplify trivial pointer expressions like b + (e - b) consider: long long f(long long b, long long e) { return b + (e - b); } we would lower this to something like: define i64 @f(i64* %b, i64* %e) { %1 = ptrtoint i64* %e to i64 %2 = ptrtoint i64* %b to i64 %3 = sub i64 %1, %2 %4 = ashr exact i64 %3, 3 %5 = getelementptr inbounds i64* %b, i64 %4 ret i64* %5 } This should fold away to just 'e'. N.B. This adds m_SpecificInt as a convenient way to match against a particular 64-bit integer when using LLVM's match interface. llvm-svn: 216439	2014-08-26 05:55:16 +00:00
Reid Kleckner	3715461b48	musttail: Don't eliminate varargs packs if there is a forwarding call Also clean up and beef up this grep test for the feature. llvm-svn: 216425	2014-08-26 00:59:51 +00:00
Reid Kleckner	8349864dbd	Declare that musttail calls in variadic functions forward the ellipsis Summary: There is no functionality change here except in the way we assemble and dump musttail calls in variadic functions. There's really no need to separate out the bits for musttail and "is forwarding varargs" on call instructions. A musttail call by definition has to forward the ellipsis or it would fail verification. Reviewers: chandlerc, nlewycky Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4892 llvm-svn: 216423	2014-08-26 00:33:28 +00:00
Reid Kleckner	e6e88f99b3	ArgPromotion: Don't touch variadic functions Adding, removing, or changing non-pack parameters can change the ABI classification of pack parameters. Clang and other frontends encode the classification in the IR of the call site, but the callee side determines it dynamically based on the number of registers consumed so far. Changing the prototype affects the number of registers consumed would break such code. Dead argument elimination performs a similar task and already has a similar check to avoid this problem. Patch by Thomas Jablin! llvm-svn: 216421	2014-08-25 23:58:48 +00:00
Bruno Cardoso Lopes	e2a1fa35df	Remove dangling initializers in GlobalDCE GlobalDCE deletes global vars and updates their initializers to nullptr while leaving underlying constants to be cleaned up later by its uses. The clean up may never happen, fix this by forcing it every time it's safe to destroy constants. Final patch by Rafael Espindola http://reviews.llvm.org/D4931 <rdar://problem/17523868> llvm-svn: 216390	2014-08-25 17:51:14 +00:00
Karthik Bhat	7f33ff7dea	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
David Majnemer	0ffccf7fb5	InstCombine: Properly optimize or'ing bittests together CFE, with -03, would turn: bool f(unsigned x) { bool a = x & 1; bool b = x & 2; return a \| b; } into: %1 = lshr i32 %x, 1 %2 = or i32 %1, %x %3 = and i32 %2, 1 %4 = icmp ne i32 %3, 0 This sort of thing exposes a nasty pathology in GCC, ICC and LLVM. Instead, we would rather want: %1 = and i32 %x, 3 %2 = icmp ne i32 %1, 0 Things get a bit more interesting in the following case: %1 = lshr i32 %x, %y %2 = or i32 %1, %x %3 = and i32 %2, 1 %4 = icmp ne i32 %3, 0 Replacing it with the following sequence is better: %1 = shl nuw i32 1, %y %2 = or i32 %1, 1 %3 = and i32 %2, %x %4 = icmp ne i32 %3, 0 This sequence is preferable because %1 doesn't involve %x and could potentially be hoisted out of loops if it is invariant; only perform this transform in the non-constant case if we know we won't increase register pressure. llvm-svn: 216343	2014-08-24 09:10:57 +00:00
Yunzhong Gao	300bdb35d4	Add a test case for SROA where the store size is bigger than slice size. The test case was fixed in r216248. llvm-svn: 216303	2014-08-22 23:27:04 +00:00
Jingyue Wu	ec33fa9aca	[SROA] Fold a PHI node if all its incoming values are the same Summary: Fixes PR20425. During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes alloca's used by PHI nodes easier to promote. Test Plan: Added three more tests in phi-and-select.ll Reviewers: nlewycky, eliben, meheff, chandlerc Reviewed By: chandlerc Subscribers: zinovy.nis, hfinkel, baldrick, llvm-commits Differential Revision: http://reviews.llvm.org/D4659 llvm-svn: 216299	2014-08-22 22:45:57 +00:00
David Majnemer	49775e0173	InstCombine: Don't unconditionally preserve 'nuw' when shrinking constants Consider: %add = add nuw i32 %a, -16777216 %and = and i32 %add, 255 Regardless of whether or not we demand the sign bit of %add, we cannot replace -16777216 with 2130706432 without also removing 'nuw' from the instruction. llvm-svn: 216273	2014-08-22 17:11:04 +00:00
David Majnemer	0e6c986696	InstCombine: sub nsw %x, C -> add nsw %x, -C if C isn't INT_MIN We can preserve nsw during this transform if -C won't overflow. llvm-svn: 216269	2014-08-22 16:41:23 +00:00
David Majnemer	42b83a5e36	InstCombine: Don't unconditionally preserve 'nsw' when shrinking constants Consider: %add = add nsw i32 %a, -16777216 %and = and i32 %add, 255 Regardless of whether or not we demand the sign bit of %add, we cannot replace -16777216 with 2130706432 without also removing 'nsw' from the instruction. This fixes PR20377. llvm-svn: 216261	2014-08-22 07:56:32 +00:00
Erik Eckstein	b49d7abb7b	fix: SLPVectorizer crashes for unreachable blocks containing not schedulable instructions. In unreachable blocks it's legal to have instructions like "%x = op %x". Such instuctions are not schedulable. Therefore the SLPVectorizer has to check for unreachable blocks and ignore them. Fixes bug 20646. llvm-svn: 216256	2014-08-22 01:18:39 +00:00
David Majnemer	97ddca3224	ValueTracking: Figure out more bits when looking at add/sub Given something like X01XX + X01XX, we know that the result must look like X1XXX. Adapted from a patch by Richard Smith, test-case written by me. llvm-svn: 216250	2014-08-22 00:40:43 +00:00
Reid Kleckner	c36f48f08a	SROA: Handle a case of store size being smaller than allocation size In this case, we are creating an x86_fp80 slice for a union from C where the padding bytes may contain real data. An x86_fp80 alloca is 16 bytes, and that's just fine. We can't, however, use regular loads and stores to access the slice, because the store size is only 10 bytes / 80 bits. Instead, use memcpy and memset. Fixes PR18726. Reviewed By: chandlerc Differential Revision: http://reviews.llvm.org/D5012 llvm-svn: 216248	2014-08-22 00:09:56 +00:00
David Blaikie	2f3f76fdb1	Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks. Somewhat unnoticed in the original implementation of discriminators, but it could cause instructions to end up in new, small, DW_TAG_lexical_blocks due to the use of DILexicalBlock to track discriminator changes. Instead, use DILexicalBlockFile which we already use to track file changes without introducing new scopes, so it works well to track discriminator changes in the same way. llvm-svn: 216239	2014-08-21 22:45:21 +00:00
Robin Morisset	59c23cd946	Rename AtomicExpandLoadLinked into AtomicExpand AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of a group that aim at making it more target-independent. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html for details The command line option is "atomic-expand" llvm-svn: 216231	2014-08-21 21:50:01 +00:00
Erik Verbruggen	2b98bd2a80	Reassociate x + -0.1234 * y into x - 0.1234 * y This does not require -ffast-math, and it gives CSE/GVN more options to eliminate duplicate expressions in, e.g.: return ((x + 0.1234 * y) * (x - 0.1234 * y)); Differential Revision: http://reviews.llvm.org/D4904 llvm-svn: 216169	2014-08-21 10:45:30 +00:00
Zinovy Nis	0a36cba29d	[INDVARS] Extend using of widening of induction variables for the cases of "sub nsw" and "mul nsw" instructions. Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like: int N = 100; float *A; void foo(int x0, int x1) { float A_cur = &A[0][0]; float * A_next = &A[1][0]; for(int x = x0; x < x1; ++x). { // Currently only [x+N] case is widened. Others 2 cases lead to sext. // This patch fixes it, so all 3 cases do not need sext. const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N]; A_next[x] = div; } } ... > clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o - Differential Revision: http://reviews.llvm.org/D4695 llvm-svn: 216160	2014-08-21 08:25:45 +00:00
David Majnemer	5d1aeba2ea	InstCombine: Fold ((A \| B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1 Adapted from a patch by Richard Smith, test-case written by me. llvm-svn: 216157	2014-08-21 05:14:48 +00:00
Jiangning Liu	950844fadb	Fix a bug around truncating vector in const prop. In constant folding stage, "TRUNC" can't handle vector data type. llvm-svn: 216149	2014-08-21 02:12:35 +00:00
Yi Jiang	1a4e73d7bf	New InstCombine pattern: (icmp ult/ule (A + C1), C3) \| (icmp ult/ule (A + C2), C3) to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3) under certain condition llvm-svn: 216135	2014-08-20 22:55:40 +00:00
David Majnemer	42158f3eea	InstCombine: Annotate sub with nuw when we prove it's safe We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is negative and the right-hand side is non-negative. llvm-svn: 216045	2014-08-20 07:17:31 +00:00
David Majnemer	57d5bc8849	InstCombine: Annotate sub with nsw when we prove it's safe We can prove that a 'sub' can be a 'sub nsw' under certain conditions: - The sign bits of the operands is the same. - Both operands have more than 1 sign bit. The subtraction cannot be a signed overflow in either case. llvm-svn: 216037	2014-08-19 23:36:30 +00:00
Renato Golin	06d601fb3e	Revert "Small refactor on VectorizerHint for deduplication" This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness. llvm-svn: 215999	2014-08-19 18:08:50 +00:00
Renato Golin	dd6394d833	Small refactor on VectorizerHint for deduplication Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. llvm-svn: 215994	2014-08-19 17:30:43 +00:00
Mayur Pandey	960507beb4	InstCombine: ((A & ~B) ^ (~A & B)) to A ^ B Proof using CVC3 follows: $ cat t.cvc A, B : BITVECTOR(32); QUERY BVXOR((A & ~B),(~A & B)) = BVXOR(A,B); $ cvc3 t.cvc Valid. Differential Revision: http://reviews.llvm.org/D4898 llvm-svn: 215974	2014-08-19 08:19:19 +00:00
Robin Morisset	9e98e7f7fc	Answer to Philip Reames comments - add check for volatile (probably unneeded, but I agree that we should be conservative about it). - strengthen condition from isUnordered() to isSimple(), as I don't understand well enough Unordered semantics (and it also matches the comment better this way) to be confident in the previous behaviour (thanks for catching that one, I had missed the case Monotonic/Unordered). - separate a condition in two. - lengthen comment about aliasing and loads - add tests in GVN/atomic.ll llvm-svn: 215943	2014-08-18 22:18:14 +00:00
Robin Morisset	4ffe8aaa69	Weak relaxing of the constraints on atomics in MemoryDependencyAnalysis Monotonic accesses do not have to kill the analysis, as long as the QueryInstr is not itself atomic. llvm-svn: 215942	2014-08-18 22:18:11 +00:00
Owen Anderson	a4428aa484	Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set. While this might seem like an obvious canonicalization, there is one subtle problem with it. The result of the original expression is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN. This means that the new expression is strictly more defined than the original one. One unfortunate consequence of this is that the transform is not reversible! It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it. Thus, targets that prefer the original form of the expression cannot reverse the transform to recover it. Another way to think of it is that the transform has lost source-level information (the fast math flags), which is undesirable. llvm-svn: 215825	2014-08-17 03:51:29 +00:00
David Majnemer	f9a095d606	InstCombine: Combine mul with div. We can combne a mul with a div if one of the operands is a multiple of the other: %mul = mul nsw nuw %a, C1 %ret = udiv %mul, C2 => %ret = mul nsw %a, (C1 / C2) This can expose further optimization opportunities if we end up multiplying or dividing by a power of 2. Consider this small example: define i32 @f(i32 %a) { %mul = mul nuw i32 %a, 14 %div = udiv exact i32 %mul, 7 ret i32 %div } which gets CodeGen'd to: imull $14, %edi, %eax imulq $613566757, %rax, %rcx shrq $32, %rcx subl %ecx, %eax shrl %eax addl %ecx, %eax shrl $2, %eax retq We can now transform this into: define i32 @f(i32 %a) { %shl = shl nuw i32 %a, 1 ret i32 %shl } which gets CodeGen'd to: leal (%rdi,%rdi), %eax retq This fixes PR20681. llvm-svn: 215815	2014-08-16 08:55:06 +00:00
Hal Finkel	61c386126b	Copy noalias metadata from call sites to inlined instructions When a call site with noalias metadata is inlined, that metadata can be propagated directly to the inlined instructions (only those that might access memory because it is not useful on the others). Prior to inlining, the noalias metadata could express that a call would not alias with some other memory access, which implies that no instruction within that called function would alias. By propagating the metadata to the inlined instructions, we preserve that knowledge. This should complete the enhancements requested in PR20500. llvm-svn: 215676	2014-08-14 21:09:37 +00:00
Hal Finkel	d2dee16c27	Add noalias metadata for general calls (not just memory intrinsics) during inlining When preserving noalias function parameter attributes by adding noalias metadata in the inliner, we should do this for general function calls (not just memory intrinsics). The logic is very similar to what already existed (except that we want to add this metadata even for functions taking no relevant parameters). This metadata can be used by ModRef queries in the caller after inlining. This addresses the first part of PR20500. Adding noalias metadata during inlining is still turned off by default. llvm-svn: 215657	2014-08-14 16:44:03 +00:00
Chad Rosier	11ab941644	[Reassociation] Add support for reassociation with unsafe algebra. Vector instructions are (still) not supported for either integer or floating point. Hopefully, that work will be landed shortly. llvm-svn: 215647	2014-08-14 15:23:01 +00:00
David Majnemer	698dca0b95	InstCombine: ((A \| ~B) ^ (~A \| B)) to A ^ B Proof using CVC3 follows: $ cat t.cvc A, B : BITVECTOR(32); QUERY BVXOR((A \| ~B),(~A \|B)) = BVXOR(A,B); $ cvc3 t.cvc Valid. Patch by Mayur Pandey! Differential Revision: http://reviews.llvm.org/D4883 llvm-svn: 215621	2014-08-14 06:46:25 +00:00
David Majnemer	f1eda23514	Added InstCombine Transform for ((B \| C) & A) \| B -> B \| (A & C) Transform ((B \| C) & A) \| B --> B \| (A & C) Z3 Link: http://rise4fun.com/Z3/hP6p Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D4865 llvm-svn: 215619	2014-08-14 06:41:38 +00:00
Jan Vesely	0cd3ec6cfa	utils: Fix segfault in flattencfg v2: continue iterating through the rest of the bb use for loop v3: initialize FlattenCFG pass in ScalarOps add test v4: split off initializing flattencfg to a separate patch add comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215574	2014-08-13 20:31:53 +00:00
Chandler Carruth	0fb998110a	[optnone] Make the optnone attribute effective at suppressing function attribute and function argument attribute synthesizing and propagating. As with the other uses of this attribute, the goal remains a best-effort (no guarantees) attempt to not optimize the function or assume things about the function when optimizing. This is particularly useful for compiler testing, bisecting miscompiles, triaging things, etc. I was hitting specific issues using optnone to isolate test code from a test driver for my fuzz testing, and this is one step of fixing that. llvm-svn: 215538	2014-08-13 10:49:33 +00:00
Karthik Bhat	a4a4db91be	InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (add %a, %b) Correctness proof of the transform using CVC3- $ cat t.cvc A, B : BITVECTOR(32); QUERY BVXOR(A \| B, BVXOR(A,B) ) = A & B; $ cvc3 t.cvc Valid. llvm-svn: 215524	2014-08-13 05:13:14 +00:00
Matt Arsenault	4815f09bbe	Allwo bitcast + struct GEP transform to work with addrspacecast llvm-svn: 215467	2014-08-12 19:46:13 +00:00
David Majnemer	ab07f00c64	InstCombine: Combine (add (and %a, %b) (or %a, %b)) to (add %a, %b) What follows bellow is a correctness proof of the transform using CVC3. $ < t.cvc A, B : BITVECTOR(32); QUERY BVPLUS(32, A & B, A \| B) = BVPLUS(32, A, B); $ cvc3 < t.cvc Valid. llvm-svn: 215400	2014-08-11 22:32:02 +00:00
Jiangning Liu	40b04fd994	In LVI(Lazy Value Info), originally value on a BB can only be caculated once, and the lattice will be updated to be a state other than "undefined". This limiation could miss some opportunities of lowering "overdefined" to be an even accurate value. So this patch ask the algorithm to try to lower the lattice value again even if the value has been lowered to be "overdefined". llvm-svn: 215343	2014-08-11 05:02:04 +00:00
James Molloy	65b08f5e46	[LoopVectorizer] Enable support for floating-point subtraction reductions llvm-svn: 215200	2014-08-08 12:41:08 +00:00
David Majnemer	fe8c7540b0	GlobalOpt: Optimize in the face of insertvalue/extractvalue GlobalOpt didn't know how to simulate InsertValueInst or ExtractValueInst. Optimizing these is pretty straightforward. N.B. This came up when looking at clang's IRGen for MS ABI member pointers; they are represented as aggregates. llvm-svn: 215184	2014-08-08 05:50:43 +00:00
Arnold Schwaighofer	4fb3c47456	SLPVectorizer: Use the type of the value loaded/stored to get the ABI alignment We were using the pointer type which is incorrect. llvm-svn: 215162	2014-08-07 22:47:27 +00:00
Owen Anderson	6c19ab1b5d	Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In this case, the code path dealing with vector promotion was missing the explicit checks for lifetime intrinsics that were present on the corresponding integer promotion path. llvm-svn: 215148	2014-08-07 21:07:35 +00:00
Rui Ueyama	c487f7728e	Revert "r214897 - Remove dead zero store to calloc initialized memory" It broke msan. llvm-svn: 214989	2014-08-06 19:30:38 +00:00
Philip Reames	00c9b6461f	Remove dead zero store to calloc initialized memory Optimize the following IR: %1 = tail call noalias i8* @calloc(i64 1, i64 4) %2 = bitcast i8* %1 to i32* ; This store is dead and should be removed store i32 0, i32* %2, align 4 Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable. Reviewed By: nicholas Differential Revision: http://reviews.llvm.org/D3942 llvm-svn: 214897	2014-08-05 17:48:20 +00:00
James Molloy	2b8933c354	Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost. Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account. llvm-svn: 214859	2014-08-05 12:30:34 +00:00
Manman Ren	062f58d550	[SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion. When we have a covered lookup table, make sure we don't delete PHINodes that are cached in PHIs. rdar://17887153 llvm-svn: 214642	2014-08-02 23:41:54 +00:00
Erik Eckstein	26a1bf7d84	fix bug 20513 - Crash in SLP Vectorizer llvm-svn: 214638	2014-08-02 19:39:42 +00:00
Tyler Nowicki	064896bbc5	Add diagnostics to the vectorizer cost model. When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision. Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive. Reviewed by Arnold Schwaighofer llvm-svn: 214599	2014-08-02 00:14:03 +00:00
Peter Collingbourne	e52646cd80	PartiallyInlineLibCalls: Check sqrt result type before transforming it. Some configure scripts declare this with the wrong prototype, which can lead to an assertion failure. llvm-svn: 214593	2014-08-01 23:21:21 +00:00
Erik Eckstein	c80e1dc081	SLPVectorizer: improved scheduling algorithm. llvm-svn: 214494	2014-08-01 09:20:42 +00:00
Suyog Sarda	56c9a87035	This patch implements transform for pattern "(A & ~B) ^ (~A) -> ~(A & B)". Differential Revision: http://reviews.llvm.org/D4653 llvm-svn: 214479	2014-08-01 05:07:20 +00:00
Suyog Sarda	1c6c2f69f7	This patch implements transform for pattern "(A \| B) & ((~A) ^ B) -> (A & B)". Differential Revision: http://reviews.llvm.org/D4628 llvm-svn: 214478	2014-08-01 04:59:26 +00:00
Suyog Sarda	52324c82cc	This patch implements transform for pattern "( A & (~B)) \| (A ^ B) -> (A ^ B)" Differential Revision: http://reviews.llvm.org/D4652 llvm-svn: 214477	2014-08-01 04:50:31 +00:00
Suyog Sarda	16d646594e	This patch implements transform for pattern "(A & B) \| ((~A) ^ B) -> (~A ^ B)". Patch Credit to Ankit Jain ! Differential Revision: http://reviews.llvm.org/D4655 llvm-svn: 214476	2014-08-01 04:41:43 +00:00
Tyler Nowicki	b5a65395cc	Improve the remark generated for -Rpass-missed. The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss. Reviewed by Arnold Schwaighofer` llvm-svn: 214445	2014-07-31 21:22:22 +00:00
Tyler Nowicki	9fe497fcac	Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable. Reviewed by Arnold Schwaighofer llvm-svn: 214440	2014-07-31 21:02:40 +00:00
David Majnemer	a92687d636	InstCombine: Correctly propagate NSW/NUW for x-(-A) -> x+A We can only propagate the nsw bits if both subtraction instructions are marked with the appropriate bit. N.B. We only propagate the nsw bit in InstCombine because the nuw case is already handled in InstSimplify. This fixes PR20189. llvm-svn: 214385	2014-07-31 04:49:29 +00:00
David Majnemer	cd4fbcd1bb	InstSimplify: Simplify (X - (0 - Y)) if the second sub is NUW If the NUW bit is set for 0 - Y, we know that all values for Y other than 0 would produce a poison value. This allows us to replace (0 - Y) with 0 in the expression (X - (0 - Y)) which will ultimately leave us with X. This partially fixes PR20189. llvm-svn: 214384	2014-07-31 04:49:18 +00:00
Rafael Espindola	464fe024c5	Use "weak alias" instead of "alias weak" Before this patch we had @a = weak global ... but @b = alias weak ... The patch changes aliases to look more like global variables. Looking at some really old code suggests that the reason was that the old bison based parser had a reduction for alias linkages and another one for global variable linkages. Putting the alias first avoided the reduce/reduce conflict. The days of the old .ll parser are long gone. The new one parses just "linkage" and a later check is responsible for deciding if a linkage is valid in a given context. llvm-svn: 214355	2014-07-30 22:51:54 +00:00
David Majnemer	42af3601c2	InstCombine: Simplify (A ^ B) or/and (A ^ B ^ C) While we can already transform A \| (A ^ B) into A \| B, things get bad once we have (A ^ B) \| (A ^ B ^ Cst) because reassociation will morph this into (A ^ B) \| ((A ^ Cst) ^ B). Our existing patterns fail once this happens. To fix this, we add a new pattern which looks through the tree of xor binary operators to see that, in fact, there exists a redundant xor operation. What follows bellow is a correctness proof of the transform using CVC3. $ cat t.cvc A, B, C : BITVECTOR(64); QUERY BVXOR(A, B) \| BVXOR(BVXOR(B, C), A) = BVXOR(A, B) \| C; QUERY BVXOR(BVXOR(A, C), B) \| BVXOR(A, B) = BVXOR(A, B) \| C; QUERY BVXOR(A, B) & BVXOR(BVXOR(B, C), A) = BVXOR(A, B) & ~C; QUERY BVXOR(BVXOR(A, C), B) & BVXOR(A, B) = BVXOR(A, B) & ~C; $ cvc3 < t.cvc Valid. Valid. Valid. Valid. llvm-svn: 214342	2014-07-30 21:26:37 +00:00
Chad Rosier	78f41b3ca7	SLP Vectorizer: Canonicalize tree operands of commutitive binary operands. llvm-svn: 214338	2014-07-30 21:07:56 +00:00
Rafael Espindola	d07cf400ab	SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics. The lifetime intrinsics need some work in order to make it clear which optimizations are or are not valid. For now dropping this optimization avoids a miscompilation. Patch by Björn Steinbrink. llvm-svn: 214336	2014-07-30 21:04:00 +00:00
Tim Northover	e2239ff3eb	CodeGenPrep: fall back to MVT::Other if instruction's type isn't an EVT. The test being performed is just an approximation anyway, so it really shouldn't crash when things don't go entirely as expected. Should fix PR20474. llvm-svn: 214177	2014-07-29 10:20:22 +00:00
Hal Finkel	f5867a79c5	Canonicalization for @llvm.assume Adds simple logical canonicalization of assumption intrinsics to instcombine, currently: - invariant(a && b) -> invariant(a); invariant(b) - invariant(!(a \|\| b)) -> invariant(!a); invariant(!b) llvm-svn: 213977	2014-07-25 21:45:17 +00:00
Hal Finkel	930469107d	Add @llvm.assume, lowering, and some basic properties This is the first commit in a series that add an @llvm.assume intrinsic which can be used to provide the optimizer with a condition it may assume to be true (when the control flow would hit the intrinsic call). Some basic properties are added here: - llvm.invariant(true) is dead. - llvm.invariant(false) is unreachable (this directly corresponds to the documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef). The intrinsic is tagged as writing arbitrarily, in order to maintain control dependencies. BasicAA has been updated, however, to return NoModRef for any particular location-based query so that we don't unnecessarily block code motion. llvm-svn: 213973	2014-07-25 21:13:35 +00:00
Hal Finkel	ff0bcb60c9	Convert noalias parameter attributes into noalias metadata during inlining This functionality is currently turned off by default. Part of the motivation for introducing scoped-noalias metadata is to enable the preservation of noalias parameter attribute information after inlining. Sometimes this can be inferred from the code in the caller after inlining, but often we simply lose valuable information. The overall process if fairly simple: 1. Create a new unqiue scope domain. 2. For each (used) noalias parameter, create a new alias scope. 3. For each pointer, collect the underlying objects. Add a noalias scope for each noalias parameter from which we're not derived (and has not been captured prior to that point). 4. Add an alias.scope for each noalias parameter from which we might be derived (or has been captured before that point). Note that the capture checks apply only if one of the underlying objects is not an identified function-local object. llvm-svn: 213949	2014-07-25 15:50:08 +00:00
Mark Heffernan	8ec1474f7f	After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor hint) the loop unroller replaces the llvm.loop.unroll.count metadata with llvm.loop.unroll.disable metadata to prevent any subsequent unrolling passes from unrolling more than the hint indicates. This patch fixes an issue where loop unrolling could be disabled for other loops as well which share the same llvm.loop metadata. llvm-svn: 213900	2014-07-24 22:36:40 +00:00
Manman Ren	29a2005596	Try to fix the bots again by moving test to X86 directory. llvm-svn: 213884	2014-07-24 17:57:09 +00:00
Manman Ren	a8bc9a4c36	Try to fix the bots. If this does not work, I am going to move it to X86 directory. llvm-svn: 213880	2014-07-24 17:18:33 +00:00
Hal Finkel	9414665a3b	Add scoped-noalias metadata This commit adds scoped noalias metadata. The primary motivations for this feature are: 1. To preserve noalias function attribute information when inlining 2. To provide the ability to model block-scope C99 restrict pointers Neither of these two abilities are added here, only the necessary infrastructure. In fact, there should be no change to existing functionality, only the addition of new features. The logic that converts noalias function parameters into this metadata during inlining will come in a follow-up commit. What is added here is the ability to generally specify noalias memory-access sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA nodes: !scope0 = metadata !{ metadata !"scope of foo()" } !scope1 = metadata !{ metadata !"scope 1", metadata !scope0 } !scope2 = metadata !{ metadata !"scope 2", metadata !scope0 } !scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 } !scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 } Loads and stores can be tagged with an alias-analysis scope, and also, with a noalias tag for a specific scope: ... = load %ptr1, !alias.scope !{ !scope1 } ... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 } When evaluating an aliasing query, if one of the instructions is associated with an alias.scope id that is identical to the noalias scope associated with the other instruction, or is a descendant (in the scope hierarchy) of the noalias scope associated with the other instruction, then the two memory accesses are assumed not to alias. Note that is the first element of the scope metadata is a string, then it can be combined accross functions and translation units. The string can be replaced by a self-reference to create globally unqiue scope identifiers. [Note: This overview is slightly stylized, since the metadata nodes really need to just be numbers (!0 instead of !scope0), and the scope lists are also global unnamed metadata.] Existing noalias metadata in a callee is "cloned" for use by the inlined code. This is necessary because the aliasing scopes are unique to each call site (because of possible control dependencies on the aliasing properties). For example, consider a function: foo(noalias a, noalias b) { a = b; } that gets inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } -- now just because we know that a1 does not alias with b1 at the first call site, and a2 does not alias with b2 at the second call site, we cannot let inlining these functons have the metadata imply that a1 does not alias with b2. llvm-svn: 213864	2014-07-24 14:25:39 +00:00
Manman Ren	edc60376ed	SimplifyCFG: fix a bug in switch to table conversion We use gep to access the global array "switch.table", and the table index should be treated as unsigned. When the highest bit is 1, this commit zero-extends the index to an integer type with larger size. For a switch on i2, we used to generate: %switch.tableidx = sub i2 %0, -2 getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate %switch.tableidx = sub i2 %0, -2 %switch.tableidx.zext = zext i2 %switch.tableidx to i3 getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext rdar://17735071 llvm-svn: 213815	2014-07-23 23:13:23 +00:00
David Blaikie	8e9cfa5497	ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion. While the subprogram map cache used by Dead Argument Elimination works there, I made a mistake when reusing it for Argument Promotion in r212128 because ArgPromo may transform functions more than once whereas DAE transforms each function only once, removing all the dead arguments in one go. To address this, ensure that the map is updated after each argument promotion. In retrospect it might be a little wasteful to create a map of all subprograms when only handling a single CGSCC, but the alternative is walking the debug info for each function in the CGSCC that gets updated. It's not clear to me what the right tradeoff is there, but since the current tradeoff seems to be working OK (and the code to keep things updated is very cheap), let's stick with that for now. llvm-svn: 213805	2014-07-23 22:09:29 +00:00
David Blaikie	f997c6f90b	Test debug info in arg promotion with an actual promotion case, rather than a degenerate arg promotion that's actually DAE performed by ArgPromo Also the debug location I had here was bogus, describing the location of the call site as in the callee - and unnecessary, so just drop it. llvm-svn: 213803	2014-07-23 21:30:59 +00:00
Mark Heffernan	9e112443b6	Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full). llvm-svn: 213789	2014-07-23 20:05:44 +00:00
Mark Heffernan	e6b4ba1c41	In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full". llvm-svn: 213772	2014-07-23 17:31:37 +00:00
Nick Lewycky	aba900c252	We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405! llvm-svn: 213726	2014-07-23 06:24:49 +00:00
Suyog Sarda	3a8c2c1e6c	This patch implements optimization as mentioned in PR19753: Optimize comparisons with "ashr/lshr exact" of a constanst. It handles the errors which were seen in PR19958 where wrong code was being emitted due to earlier patch. Added code for lshr as well as non-exact right shifts. It implements : (icmp eq/ne (ashr/lshr const2, A), const1)" -> (icmp eq/ne A, Log2(const2/const1)) -> (icmp eq/ne A, Log2(const2) - Log2(const1)) Differential Revision: http://reviews.llvm.org/D4068 llvm-svn: 213678	2014-07-22 19:19:36 +00:00
Suyog Sarda	b60ec909ca	Added InstCombine transform for pattern "(A & B) ^ (A ^ B) -> (A \| B)" Patch idea by Ankit Jain ! Differential Revision: http://reviews.llvm.org/D4618 llvm-svn: 213677	2014-07-22 18:30:54 +00:00
Suyog Sarda	d64faf6cae	Added InstCombine Transform for patterns: "((~A & B) \| A) -> (A \| B)" and "((A & B) \| ~A) -> (~A \| B)" Original Patch credit to Ankit Jain !! Differential Revision: http://reviews.llvm.org/D4591 llvm-svn: 213676	2014-07-22 18:09:41 +00:00
Hal Finkel	ccc7090671	Make use of the align parameter attribute for all pointer arguments We previously supported the align attribute on all (pointer) parameters, but we only used it for byval parameters. However, it is completely consistent at the IR level to treat 'align n' on all pointer parameters as an alignment assumption on the pointer, and now we wll. Specifically, this causes computeKnownBits to use the align attribute on all pointer parameters, not just byval parameters. I've also added an explicit parameter attribute test for this to test/Bitcode/attributes.ll. And I've updated the LangRef to document the align parameter attribute (as it turns out, it was not documented at all previously, although the byval documentation mentioned that it could be used). There are (at least) two benefits to doing this: - It allows enhancing alignment based on the pointer alignment after inlining callees. - It allows simplification of pointer arithmetic. llvm-svn: 213670	2014-07-22 16:58:55 +00:00
Suyog Sarda	521237cad6	This patch implements transform for pattern "(A \| B) ^ (~A) -> (A \| ~B)". Patch Credit to Ankit Jain !! Differential Revision: http://reviews.llvm.org/D4588 llvm-svn: 213662	2014-07-22 15:37:39 +00:00
Mark Heffernan	9d20e42765	Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave. llvm-svn: 213588	2014-07-21 23:11:03 +00:00
Hal Finkel	7ae00a1282	[LoopVectorize] Use AA to partition potential dependency checks Prior to this change, the loop vectorizer did not make use of the alias analysis infrastructure. Instead, it performed memory dependence analysis using ScalarEvolution-based linear dependence checks within equivalence classes derived from the results of ValueTracking's GetUnderlyingObjects. Unfortunately, this meant that: 1. The loop vectorizer had logic that essentially duplicated that in BasicAA for aliasing based on identified objects. 2. The loop vectorizer could not partition the space of dependency checks based on information only easily available from within AA (TBAA metadata is currently the prime example). This means, for example, regardless of whether -fno-strict-aliasing was provided, the vectorizer would only vectorize this loop with a runtime memory-overlap check: void foo(int a, float b) { for (int i = 0; i < 1600; ++i) a[i] = b[i]; } This is suboptimal because the TBAA metadata already provides the information necessary to show that this check unnecessary. Of course, the vectorizer has a limit on the number of such checks it will insert, so in practice, ignoring TBAA means not vectorizing more-complicated loops that we should. This change causes the vectorizer to use an AliasSetTracker to keep track of the pointers in the loop. The resulting alias sets are then used to partition the space of dependency checks, and potential runtime checks; this results in more-efficient vectorizations. When pointer locations are added to the AliasSetTracker, two things are done: 1. The location size is set to UnknownSize (otherwise you'd not catch inter-iteration dependencies) 2. For instructions in blocks that would need to be predicated, TBAA is removed (because the metadata might have a control dependency on the condition being speculated). For non-predicated blocks, you can leave the TBAA metadata. This is safe because you can't have an iteration dependency on the TBAA metadata (if you did, and you unrolled sufficiently, you'd end up with the same pointer value used by two accesses that TBAA says should not alias, and that would yield undefined behavior). llvm-svn: 213486	2014-07-20 23:07:52 +00:00
Hal Finkel	4f7d55aac8	[LoopVectorize] Propagate known metadata to vectorized instructions There are some kinds of metadata that are safe to propagate from the scalar instructions to the vector instructions (fpmath and tbaa currently). Regarding TBAA, one might worry about propagating it on if-converted loads and stores, because the metadata might have had a control dependency on the condition, and thus actually aliased with some other non-speculated memory access when the condition was false. However, this would be caught by the runtime overlap checks. llvm-svn: 213452	2014-07-19 13:33:16 +00:00
Hal Finkel	9e440c08a9	Make Value::isDereferenceablePointer handle offsets to pointer types with dereferenceable attributes When we have a parameter (or call site return) with a dereferenceable attribute, it can specify the size of an array pointed to by that parameter. If we have a value for which we can accumulate a constant offset to such a parameter, then we can use that offset in a direct comparison with the size specified by the dereferenceable attribute. This enables us to handle cases like this: int foo(int a[static 3]) { return a[2]; /* this is always dereferenceable */ } llvm-svn: 213447	2014-07-19 03:25:16 +00:00
Mark Heffernan	053a68688a	Remove unroll pragma metadata after it is used. llvm-svn: 213412	2014-07-18 21:04:33 +00:00
Gerolf Hoflehner	f27ae6cdcf	MergedLoadStoreMotion pass Merges equivalent loads on both sides of a hammock/diamond and hoists into into the header. Merges equivalent stores on both sides of a hammock/diamond and sinks it to the footer. Can enable if conversion and tolerate better load misses and store operand latencies. llvm-svn: 213396	2014-07-18 19:13:09 +00:00
Hal Finkel	b0407ba071	Add a dereferenceable attribute This attribute indicates that the parameter or return pointer is dereferenceable. Practically speaking, loads from such a pointer within the associated byte range are safe to speculatively execute. Such pointer parameters are common in source languages (C++ references, for example). llvm-svn: 213385	2014-07-18 15:51:28 +00:00
Matt Arsenault	3dd43fc75d	R600: Implement TTI:getPopcntSupport The test is just copied from X86, and I don't know of a better way to test it. llvm-svn: 213351	2014-07-18 06:07:13 +00:00
Suyog Sarda	68862414b5	Move ashr optimization from InstCombineShift to InstSimplify. Refactor code, no functionality change, test case moved from instcombine to instsimplify. Differential Revision: http://reviews.llvm.org/D4102 llvm-svn: 213231	2014-07-17 06:28:15 +00:00
Hal Finkel	354e23b029	Improve BasicAA CS-CS queries (redux) This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems. Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error. Original commit message: BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries. Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries. This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement). llvm-svn: 213219	2014-07-17 01:28:25 +00:00
Jingyue Wu	0bdc027e31	Partially revert r210444 due to performance regression Summary: Converting outermost zext(a) to sext(a) causes worse code when the computation of zext(a) could be reused. For example, after converting ... = array[zext(a)] ... = array[zext(a) + 1] to ... = array[sext(a)] ... = array[zext(a) + 1], the program computes sext(a), which is actually unnecessary. I added one test in split-gep-and-gvn.ll to illustrate this scenario. Also, with r211281 and r211084, we annotate more "nuw" tags to computation involving CUDA intrinsics such as threadIdx.x. These annotations help with splitting GEP a lot, rendering the benefit we get from this reverted optimization only marginal. Test Plan: make check-all Reviewers: eliben, meheff Reviewed By: meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D4542 llvm-svn: 213209	2014-07-16 23:25:00 +00:00
Justin Holewinski	3e037d98e6	[NVPTX] Rename registers %fl -> %fd and %rl -> %rd This matches the internal behavior of NVIDIA tools like libnvvm. llvm-svn: 213168	2014-07-16 16:26:58 +00:00
Tyler Nowicki	641d8a06bd	Emit warnings if vectorization is forced and fails. This patch modifies the existing DiagnosticInfo system to create a generic base class that is inherited to produce diagnostic-based warnings. This is used by the loop vectorizer to trigger a warning when vectorization is forced and fails. Several tests have been added to verify this behavior. Reviewed by: Arnold Schwaighofer llvm-svn: 213110	2014-07-16 00:36:00 +00:00
Stepan Dyatkovskiy	dee612d4f6	MergeFunc patch from Björn Steinbrink. Phabricator ticket: D4246, Don't merge functions with different range metadata on call/invoke. Thanks! llvm-svn: 213060	2014-07-15 10:46:51 +00:00
Matt Arsenault	f1a7e62033	Teach computeKnownBits to look through addrspacecast. This fixes inferring alignment through an addrspacecast. llvm-svn: 213030	2014-07-15 01:55:03 +00:00
Matt Arsenault	70f4db88d5	Teach GetUnderlyingObject / BasicAA about addrspacecast llvm-svn: 213025	2014-07-15 00:56:40 +00:00
Matt Arsenault	ee54aaef0c	Convert test to FileCheck. Check the individual test functions for more useful failure errors. llvm-svn: 213021	2014-07-15 00:07:27 +00:00
Matt Arsenault	caa9c71f32	Look through addrspacecast in IsConstantOffsetFromGlobal llvm-svn: 213000	2014-07-14 22:39:26 +00:00
Matt Arsenault	fd78d0c934	Look through addrspacecast in GetPointerBaseWithConstantOffset llvm-svn: 212999	2014-07-14 22:39:22 +00:00
Matt Arsenault	eb9e5f41a6	Convert test to FileCheck llvm-svn: 212992	2014-07-14 21:59:26 +00:00
David Majnemer	b8f435ca70	Fix a test broken in r212981 @icmp_sdiv_neg1 should have referred to %a instead of %call, it was renamed at the last second. llvm-svn: 212983	2014-07-14 20:46:04 +00:00
David Majnemer	af9180fd04	InstSimplify: Correct sdiv x / -1 Determining the bounds of x/ -1 would start off with us dividing it by INT_MIN. Suffice to say, this would not work very well. Instead, handle it upfront by checking for -1 and mapping it to the range: [INT_MIN + 1, INT_MAX. This means that the result of our division can be any value other than INT_MIN. llvm-svn: 212981	2014-07-14 20:38:45 +00:00
David Majnemer	5ea4fc0b33	InstSimplify: The upper bound of X / C was missing a rounding step Summary: When calculating the upper bound of X / -8589934592, we would perform the following calculation: Floor[INT_MAX / 8589934592] However, flooring the result would make us wrongly come to the conclusion that 1073741824 was not in the set of possible values. Instead, use the ceiling of the result. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4502 llvm-svn: 212976	2014-07-14 19:49:57 +00:00
Matt Arsenault	199b39e063	Look through addrspacecast when checking isDereferenceablePointer llvm-svn: 212971	2014-07-14 18:54:12 +00:00
Nick Lewycky	703e488ed9	Don't eliminate memcpy's when the address of the pointer may itself be relevant. Fixes PR18304. Patch by David Wiberg! llvm-svn: 212970	2014-07-14 18:52:02 +00:00
Aditya Nandakumar	0b5a674243	When we sink an instruction, this can open up opportunity for the operands to be sunk - add them to the worklist llvm-svn: 212847	2014-07-11 21:49:39 +00:00
Marcello Maggioni	83442bb8b2	Added test for commit r212802 that was missing llvm-svn: 212803	2014-07-11 10:36:00 +00:00
Duncan P. N. Exon Smith	04934b0fec	InstCombine: Fix a crash in Descale for multiply-by-zero Fix a crash in `InstCombiner::Descale()` when a multiply-by-zero gets created as an argument to a GEP partway through an iteration, causing -instcombine to optimize the GEP before the multiply. rdar://problem/17615671 llvm-svn: 212742	2014-07-10 17:13:27 +00:00
Hal Finkel	a71fe078c8	A test case for not asserting in isDereferenceablePointer upon unsized types This is the test case for r212687. llvm-svn: 212688	2014-07-10 07:04:37 +00:00
Hal Finkel	2e42c34d05	Allow isDereferenceablePointer to look through some bitcasts isDereferenceablePointer should not give up upon encountering any bitcast. If we're casting from a pointer to a larger type to a pointer to a small type, we can continue by examining the bitcast's operand. This missing capability was noted in a comment in the function. In order for this to work, isDereferenceablePointer now takes an optional DataLayout pointer (essentially all callers already had such a pointer available). Most code uses isDereferenceablePointer though isSafeToSpeculativelyExecute (which already took an optional DataLayout pointer), and to enable the LICM test case, LICM needs to actually provide its DL pointer to isSafeToSpeculativelyExecute (which it was not doing previously). llvm-svn: 212686	2014-07-10 05:27:53 +00:00
Adam Nemet	2820a5b9e9	[X86] AVX512: Enable it in the Loop Vectorizer This lets us experiment with 512-bit vectorization without passing force-vector-width manually. The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(. llvm-svn: 212634	2014-07-09 18:22:33 +00:00
Sanjay Patel	7ae7a831b9	removed duplicate testcase llvm-svn: 212632	2014-07-09 17:49:58 +00:00
Sanjay Patel	58814445d4	Fix for PR20059 (instcombine reorders shufflevector after instruction that may trap) In PR20059 ( http://llvm.org/pr20059 ), instcombine eliminates shuffles that are necessary before performing an operation that can trap (srem). This patch calls isSafeToSpeculativelyExecute() and bails out of the optimization in SimplifyVectorOp() if needed. Differential Revision: http://reviews.llvm.org/D4424 llvm-svn: 212629	2014-07-09 16:34:54 +00:00
Pete Cooper	91e4ba2f88	Revert "GlobalDCE: Delete available_externally initializers if it allows removing the value the initializer is referring to." This reverts commit 5b55a47e94e28fbb56d0cd5d72c3db9105c15b4c. A test case was found to crash after this was applied. I'll file a bug to track fixing this with the test case needed. llvm-svn: 212550	2014-07-08 17:06:03 +00:00
Sanjay Patel	a932da8f35	Fix for PR17073 ( http://llvm.org/pr17073 ), simplifycfg illegally hoists an operation in a phi node that can trap. This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those. The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected. llvm-svn: 212490	2014-07-07 21:19:00 +00:00
Tim Northover	55beb64bd0	CodeGen: it turns out that NAND is not the same thing as BIC. At all. We've been performing the wrong operation on ARM for "atomicrmw nand" for years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b". This bled over into the generic expansion pass. So I assume no-one has ever actually tried to do an atomic nand in the real world. Oh well. llvm-svn: 212443	2014-07-07 09:06:35 +00:00
David Majnemer	d1bea693e2	IR: Fold away compares between GV GEPs and GVs A GEP of a non-weak global variable will not be equivalent to another non-weak global variable or a GEP of such a variable. Differential Revision: http://reviews.llvm.org/D4238 llvm-svn: 212360	2014-07-04 22:05:26 +00:00
Benjamin Kramer	3c5b126239	GlobalDCE: Delete available_externally initializers if it allows removing the value the initializer is referring to. This is useful for functions that are not actually available externally but referenced by a vtable of some kind. Clang emits functions like this for the MS ABI. PR20182. llvm-svn: 212337	2014-07-04 12:36:05 +00:00
Benjamin Kramer	a420df2999	InstCombine: Strength reduce sadd.with.overflow into a regular nsw add if we can prove that it cannot overflow. PR20194 llvm-svn: 212331	2014-07-04 10:22:21 +00:00
David Majnemer	651ed5e8fd	InstSimplify: Fix a bug when INT_MIN is in a sdiv When INT_MIN is the numerator in a sdiv, we would not properly handle overflow when calculating the bounds of possible values; abs(INT_MIN) is not a meaningful number. Instead, check and handle INT_MIN by reasoning that the largest value is INT_MIN/-2 and the smallest value is INT_MIN. This fixes PR20199. llvm-svn: 212307	2014-07-04 00:23:39 +00:00
Richard Trieu	f2a795241a	Add new lines to debugging information. Differential Revision: http://reviews.llvm.org/D4262 llvm-svn: 212250	2014-07-03 02:11:49 +00:00
David Majnemer	f28e2a4282	InstCombine: Optimize x/INT_MIN to x==INT_MIN The result of x/INT_MIN is either 0 or 1, we can just use an icmp instead. llvm-svn: 212167	2014-07-02 06:42:13 +00:00
David Majnemer	e18d302ef4	InstCombine: Add a vector variant test for PR20186 No functional change, just adding more test coverage that was meant to go in with r212164. llvm-svn: 212165	2014-07-02 06:14:13 +00:00
David Majnemer	bdeef602e9	InstCombine: Don't turn -(x/INT_MIN) -> x/INT_MIN It is not safe to negate the smallest signed integer, doing so yields the same number back. This fixes PR20186. llvm-svn: 212164	2014-07-02 06:07:09 +00:00
David Blaikie	e844cd5305	DebugInfo: Keep track of subprograms who's arguments have been promoted. Matching behavior with DeadArgumentElimination (and leveraging some now-common infrastructure), keep track of the function from debug info metadata if arguments are promoted. This may produce interesting debug info - since the arguments may be missing or of different types... but at least backtraces, inlining, etc, will be correct. llvm-svn: 212128	2014-07-01 21:13:37 +00:00
David Majnemer	5c92115972	GlobalOpt: Don't swap private for internal linkage There were transforms whose intent was to downgrade the linkage of external objects to have internal linkage. However, it fired on things with private linkage as well. llvm-svn: 212104	2014-07-01 15:26:50 +00:00
David Majnemer	9797abb0bf	GlobalOpt: FileCheck-ize test No functionality change. llvm-svn: 212103	2014-07-01 15:26:47 +00:00
David Majnemer	0e2cc2a519	GlobalOpt: Handle non-zero offsets for aliases An alias with an aliasee of a non-zero GEP is not trivially replacable with it's aliasee. llvm-svn: 212079	2014-07-01 00:30:56 +00:00
Gerolf Hoflehner	734f4c8984	Suppress inlining when the block address is taken Inlining functions with block addresses can cause many problem and requires a rich infrastructure to support including escape analysis. At this point the safest approach to address these problems is by blocking inlining from happening. Background: There have been reports on Ruby segmentation faults triggered by inlining functions with block addresses like //Ruby code snippet vm_exec_core() { finish_insn_seq_0 = &&INSN_LABEL_finish; INSN_LABEL_finish: ; } This kind of scenario can also happen when LLVM picks a subset of blocks for inlining, which is the case with the actual code in the Ruby environment. LLVM suppresses inlining for such functions when there is an indirect branch. The attached patch does so even when there is no indirect branch. Note that user code like above would not make much sense: using the global for jumping across function boundaries would be illegal. Why was there a segfault: In the snipped above the block with the label is recognized as dead So it is eliminated. Instead of a block address the cloner stores a constant (sic!) into the global resulting in the segfault (when the global is used in a goto). Why had it worked in the past then: By luck. In older versions vm_exec_core was also inlined but the label address used was the block label address in vm_exec_core. So the global jump ended up in the original function rather than in the caller which accidentally happened to work. Test case ./tools/clang/test/CodeGen/indirect-goto.c will fail as a result of this commit. rdar://17245966 llvm-svn: 212077	2014-07-01 00:19:34 +00:00
Reid Kleckner	cdb4e64a20	Convert some byval argpromotion grep tests to FileCheck Surprisingly, the i32* byval parameter is not transformed by argpromotion. llvm-svn: 212067	2014-06-30 20:44:28 +00:00
David Blaikie	644d2eee59	DebugInfo: Preserve debug location information when transforming a call into an invoke during inlining. This both improves basic debug info quality, but also fixes a larger hole whenever we inline a call/invoke without a location (debug info for the entire inlining is lost and other badness that the debug info emission code is currently working around but shouldn't have to). llvm-svn: 212065	2014-06-30 20:30:39 +00:00
David Blaikie	ba405c22b8	Remove unnecessary datalayout string from a test case. llvm-svn: 212063	2014-06-30 20:26:12 +00:00
Erik Eckstein	5b18a09748	test commit: add a comment line in GVN test file llvm-svn: 212019	2014-06-30 07:19:02 +00:00
Dinesh Dwivedi	adc07739a9	Added instruction combine to transform few more negative values addition to subtraction (Part 3) This patch enables transforms for (x + (~(y \| c) + 1) --> x - (y \| c) if c is odd Differential Revision: http://reviews.llvm.org/D4210 llvm-svn: 211881	2014-06-27 07:47:35 +00:00
David Majnemer	9930398c5c	GlobalOpt: Fix constantfold-initializers.ll test The test added in r211762 was sloppy, the correct initializer wasn't added to @llvm.global_ctors Spotted by Pasi Parviainen! llvm-svn: 211879	2014-06-27 07:36:26 +00:00
David Blaikie	b0cdf530c3	ArgumentPromotion: Propagate debug locations on calls for which arguments are promoted. llvm-svn: 211872	2014-06-27 05:32:09 +00:00
Arnold Schwaighofer	ed988fb97d	GVN: Preserve invariant.load metadata If both instructions to be replaced are marked invariant the resulting instruction is invariant. rdar://13358910 Fix by Erik Eckstein! llvm-svn: 211801	2014-06-26 19:51:19 +00:00
Dinesh Dwivedi	99281a0615	This patch removed duplicate code for matching patterns which are now handled in SimplifyUsingDistributiveLaws() (after r211261) Differential Revision: http://reviews.llvm.org/D4253 llvm-svn: 211768	2014-06-26 08:57:33 +00:00
Dinesh Dwivedi	a716173581	Added instruction combine to transform few more negative values addition to subtraction (Part 2) This patch enables transforms for (x + (~(y \| c) + 1) --> x - (y \| c) if c is even Differential Revision: http://reviews.llvm.org/D4209 llvm-svn: 211765	2014-06-26 05:40:22 +00:00
David Majnemer	6098b2f519	GlobalOpt: Don't optimize thread_local for initializers Folding a reference to a thread_local variable into another global variable's initializer is very problematic, there is no relocation that exists to represent such an access. llvm-svn: 211762	2014-06-26 03:02:19 +00:00
Hans Wennborg	b03ebfb77e	Don't build switch tables for dllimport and TLS variables in GEPs This is a follow-up to r211331, which failed to notice that we were returning early from ValidLookupTableConstant for GEPs. llvm-svn: 211753	2014-06-26 00:30:52 +00:00
Tyler Nowicki	4b07b00786	Add Rpass-missed and Rpass-analysis reports to the loop vectorizer. The remarks give the vector width of vectorized loops and a brief analysis of loops that fail to be vectorized. For example, an analysis will be generated for loops containing control flow that cannot be simplified to a select. The optimization remarks also give the debug location of expressions that cannot be vectorized, for example the location of an unvectorizable call. Reviewed by: Arnold Schwaighofer llvm-svn: 211721	2014-06-25 17:50:15 +00:00
Eli Bendersky	5d5e18da3e	Rename loop unrolling and loop vectorizer metadata to have a common prefix. [LLVM part] These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.' prefix. Metadata name changes: llvm.vectorizer.* => llvm.loop.vectorizer.* llvm.loopunroll.* => llvm.loop.unroll.* This was a suggestion from an earlier review (http://reviews.llvm.org/D4090) which added the loop unrolling metadata. Patch by Mark Heffernan. llvm-svn: 211710	2014-06-25 15:41:00 +00:00
Evgeniy Stepanov	10280dac1d	[LICM] Don't create more than one copy of an instruction per loop exit block when sinking. Fixes exponential compilation complexity in PR19835, caused by LICM::sink not handling the following pattern well: f = op g e = op f, g d = op e c = op d, e b = op c a = op b, c When an instruction with N uses is sunk, each of its operands gets N new uses (all of them - phi nodes). In the example above, if a had 1 use, c would have 2, e would have 4, and g would have 8. llvm-svn: 211673	2014-06-25 07:54:58 +00:00
Diego Novillo	56653fdada	Add new debug kind LocTrackingOnly. Summary: This new debug emission kind supports emitting line location information in all instructions, but stops code generation from emitting debug info to the final output. This mode is useful when the backend wants to track source locations during code generation, but it does not want to produce debug info. This is currently used by optimization remarks (-pass-remarks, -pass-remarks-missed and -pass-remarks-analysis). To prevent debug info emission, DIBuilder never inserts the annotation 'llvm.dbg.cu' when LocTrackingOnly is enabled. Reviewers: echristo, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4234 llvm-svn: 211609	2014-06-24 17:02:03 +00:00
Benjamin Kramer	c96a7f88b9	InstCombine: Disable umul.with.overflow recognition for vectors. It doesn't make a lot on most targets and the code isn't ready for it. PR20113. llvm-svn: 211583	2014-06-24 10:47:52 +00:00
Benjamin Kramer	6de786666a	InstCombine: Don't try to reorder shuffles where the mask is a ConstantExpr. We can't analyze the individual values of a vector expression. PR20114. llvm-svn: 211581	2014-06-24 10:38:10 +00:00
David Majnemer	23fc9afa4d	GlobalOpt: Don't optimize dllimport for initializers Referencing a dllimport variable requires actually instructions, not just a relocation. This fixes PR19955. Differential Revision: http://reviews.llvm.org/D4249 llvm-svn: 211571	2014-06-24 06:53:45 +00:00
Benjamin Kramer	f504ec2298	Add a description to the test from r211433 explaining why it's written that way. llvm-svn: 211465	2014-06-22 12:22:04 +00:00
Arnold Schwaighofer	c11107cb1e	LoopVectorizer: Fix a dominance issue The induction variables start value needs to be defined before we branch (overflow check) to the scalar preheader where we used it. llvm-svn: 211460	2014-06-22 03:38:59 +00:00
Benjamin Kramer	0bf086f80f	LoopUnrollRuntime: Check for overflow in the trip count calculation. Fixes PR19823. llvm-svn: 211436	2014-06-21 13:46:25 +00:00
Benjamin Kramer	8dd637aa04	SCEVExpander: Fold constant PHIs harder. The logic below only understands proper IVs. PR20093. llvm-svn: 211433	2014-06-21 11:47:18 +00:00
Stepan Dyatkovskiy	6baeb8805c	Commited patch from Björn Steinbrink: Summary: Different range metadata can lead to different optimizations in later passes, possibly breaking the semantics of the merged function. So range metadata must be taken into consideration when comparing Load instructions. Thanks! llvm-svn: 211391	2014-06-20 19:11:56 +00:00
Karthik Bhat	e03a25da70	Add Support to Recognize and Vectorize NON SIMD instructions in SLPVectorizer. This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and vectorizes them as vector shuffles if they are profitable. These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86. Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015 llvm-svn: 211339	2014-06-20 04:32:48 +00:00
Hans Wennborg	4dc895164a	Don't build switch lookup tables for dllimport or TLS variables We would previously put dllimport variables in switch lookup tables, which doesn't work because the address cannot be used in a constant initializer. This is basically the same problem that we have in PR19955. Putting TLS variables in switch tables also desn't work, because the address of such a variable is not constant. Differential Revision: http://reviews.llvm.org/D4220 llvm-svn: 211331	2014-06-20 00:38:12 +00:00
Jingyue Wu	37fcb5919d	[ValueTracking] Extend range metadata to call/invoke Summary: With this patch, range metadata can be added to call/invoke including IntrinsicInst. Previously, it could only be added to load. Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because range metadata is not only used by load. Update the language reference to reflect this change. Test Plan: Add several tests in range-2.ll to confirm the verifier is happy with having range metadata on call/invoke. Add two tests in AddOverFlow.ll to confirm annotating range metadata to call/invoke can benefit InstCombine. Reviewers: meheff, nlewycky, reames, hfinkel, eliben Reviewed By: eliben Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4187 llvm-svn: 211281	2014-06-19 16:50:16 +00:00
Dinesh Dwivedi	562fd7534c	Added instruction combine to transform few more negative values addition to subtraction (Part 1) This patch enables transforms for following patterns. (x + (~(y & c) + 1) --> x - (y & c) (x + (~((y >> z) & c) + 1) --> x - ((y>>z) & c) Differential Revision: http://reviews.llvm.org/D3733 llvm-svn: 211266	2014-06-19 10:36:52 +00:00
Dinesh Dwivedi	b62e52e1b5	Refactored and updated SimplifyUsingDistributiveLaws() to * Find factorization opportunities using identity values. * Find factorization opportunities by treating shl(X, C) as mul (X, shl(C)) * Keep NSW flag while simplifying instruction using factorization. This fixes PR19263. Differential Revision: http://reviews.llvm.org/D3799 llvm-svn: 211261	2014-06-19 08:29:18 +00:00
David Majnemer	6cf6c05322	InstCombine: Stop two transforms dueling InstCombineMulDivRem has: // Canonicalize (X+C1)CI -> XCI+C1CI. InstCombineAddSub has: // WX + YZ --> W (X+Z) iff W == Y These two transforms could fight with each other if C1CI would not fold away to something simpler than a ConstantExpr mul. The InstCombineMulDivRem transform only acted on ConstantInts until r199602 when it was changed to operate on all Constants in order to let it fire on ConstantVectors. To fix this, make this transform more careful by checking to see if we actually folded away C1CI. This fixes PR20079. llvm-svn: 211258	2014-06-19 07:14:33 +00:00
Nick Lewycky	8561a49c27	Move optimization of some cases of (A & C1)\|(B & C2) from instcombine to instsimplify. Patch by Rahul Jain, plus some last minute changes by me -- you can blame me for any bugs. llvm-svn: 211252	2014-06-19 03:51:46 +00:00
Nick Lewycky	c961030ac2	Make instsimplify's analysis of icmp eq/ne use computeKnownBits to determine whether the icmp is always true or false. Patch by Suyog Sarda! llvm-svn: 211251	2014-06-19 03:35:49 +00:00
Matt Arsenault	a0050b0961	R600/SI: Add intrinsics for various math instructions. These will be used for custom lowering and for library implementations of various math functions, so it's useful to expose these as builtins. llvm-svn: 211247	2014-06-19 01:19:19 +00:00
Dinesh Dwivedi	657105e582	Fixed jump threading going to infinite loop. This patch add code to remove unreachable blocks from function as they may cause jump threading to stuck in infinite loop. Differential Revision: http://reviews.llvm.org/D3991 llvm-svn: 211103	2014-06-17 14:34:19 +00:00
Jingyue Wu	33bd53df7f	[InstCombine] mark ADD with nuw if no unsigned overflow Summary: As a starting step, we only use one simple heuristic: if the sign bits of both a and b are zero, we can prove "add a, b" do not unsigned overflow, and thus convert it to "add nuw a, b". Updated all affected tests and added two new tests (@zero_sign_bit and @zero_sign_bit2) in AddOverflow.ll Test Plan: make check-all Reviewers: eliben, rafael, meheff, chandlerc Reviewed By: chandlerc Subscribers: chandlerc, llvm-commits Differential Revision: http://reviews.llvm.org/D4144 llvm-svn: 211084	2014-06-17 00:42:07 +00:00
Duncan P. N. Exon Smith	73686d305a	SROA: Only split loads on byte boundaries r199771 accidently broke the logic that makes sure that SROA only splits load on byte boundaries. If such a split happens, some bits get lost when reassembling loads of wider types, causing data corruption. Move the width check up to reject such splits early, avoiding the corruption. Fixes PR19250. Patch by: Björn Steinbrink <bsteinbr@gmail.com> llvm-svn: 211082	2014-06-17 00:19:35 +00:00
Eli Bendersky	ff90324599	Teach LoopUnrollPass to respect loop unrolling hints in metadata. [This is resubmitting r210721, which was reverted due to suspected breakage which turned out to be unrelated]. Some extra review comments were addressed. See D4090 and D4147 for more details. The Clang change that produces this metadata was committed in r210667 Patch by Mark Heffernan. llvm-svn: 211076	2014-06-16 23:53:02 +00:00
Jim Grosbach	fff5663d48	LowerSwitch: track bounding range for the condition tree. When LowerSwitch transforms a switch instruction into a tree of ifs it is actually performing a binary search into the various case ranges, to see if the current value falls into one cases range of values. So, if we have a program with something like this: switch (a) { case 0: do0(); break; case 1: do1(); break; case 2: do2(); break; default: break; } the code produced is something like this: if (a < 1) { if (a == 0) { do0(); } } else { if (a < 2) { if (a == 1) { do1(); } } else { if (a == 2) { do2(); } } } This code is inefficient because the check (a == 1) to execute do1() is not needed. The reason is that because we already checked that (a >= 1) initially by checking that also (a < 2) we basically already inferred that (a == 1) without the need of an extra basic block spawned to check if actually (a == 1). The patch addresses this problem by keeping track of already checked bounds in the LowerSwitch algorithm, so that when the time arrives to produce a Leaf Block that checks the equality with the case value / range the algorithm can decide if that block is really needed depending on the already checked bounds . For example, the above with "a = 1" would work like this: the bounds start as LB: NONE , UB: NONE as (a < 1) is emitted the bounds for the else path become LB: 1 UB: NONE. This happens because by failing the test (a < 1) we know that the value "a" cannot be smaller than 1 if we enter the else branch. After the emitting the check (a < 2) the bounds in the if branch become LB: 1 UB: 1. This is because by checking that "a" is smaller than 2 then the upper bound becomes 2 - 1 = 1. When it is time to emit the leaf block for "case 1:" we notice that 1 can be squeezed exactly in between the LB and UB, which means that if we arrived to that block there is no need to emit a block that checks if (a == 1). Patch by: Marcello Maggioni <hayarms@gmail.com> llvm-svn: 211038	2014-06-16 16:55:20 +00:00
Jingyue Wu	baabe5091c	Canonicalize addrspacecast ConstExpr between different pointer types As a follow-up to r210375 which canonicalizes addrspacecast instructions, this patch canonicalizes addrspacecast constant expressions. Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast cosntant expressions, this patch is also a step towards having the frontend emit canonicalized addrspacecasts. Piggyback a minor refactor in InstCombineCasts.cpp Update three affected tests in addrspacecast-alias.ll, access-non-generic.ll and constant-fold-gep.ll and added one new test in constant-fold-address-space-pointer.ll llvm-svn: 211004	2014-06-15 21:40:57 +00:00
Jiangning Liu	96e92c1d75	Move GlobalMerge from Transform to CodeGen. This patch is to move GlobalMerge pass from Transform/Scalar to CodeGen, because GlobalMerge depends on TargetMachine. In the mean time, the macro INITIALIZE_TM_PASS is also moved to CodeGen/Passes.h. With this fix we can avoid making libScalarOpts depend on libCodeGen. llvm-svn: 210951	2014-06-13 22:57:59 +00:00
Tim Northover	20b9f739eb	Atomics: make use of the "cmpxchg weak" instruction. This also simplifies the IR we create slightly: instead of working out where success & failure should go manually, it turns out we can just always jump to a success/failure block created for the purpose. Later phases will sort out the mess without much difficulty. llvm-svn: 210917	2014-06-13 16:45:52 +00:00
Tim Northover	d039abdeeb	Atomics: switch direction of cmpxchg comparison This has two benefits: it makes the result more suitable for direct insertaion into the struct to emulate the new cmpxchg, and it means the name we give the instruction matches its actual effect better. llvm-svn: 210916	2014-06-13 16:45:36 +00:00
Tim Northover	6bf04e4512	SCCP: update for cmpxchg returning { iN, i1 } now. I accidentally missed this one since its use looked OK locally. llvm-svn: 210909	2014-06-13 14:54:09 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Duncan P. N. Exon Smith	fd5c553f54	GVN: Enable value forwarding for calloc Enable value forwarding for loads from `calloc()` without an intervening store. This change extends GVN to handle the following case: %1 = tail call noalias i8* @calloc(i64 1, i64 4) %2 = bitcast i8* %1 to i32* ; This load is trivially constant zero %3 = load i32* %2, align 4 This is analogous to the handling for `malloc()` in the same places. `malloc()` returns `undef`; `calloc()` returns a zero value. Note that it is correct to return zero even for out of bounds GEPs since the result of such a GEP would be undefined. Patch by Philip Reames! llvm-svn: 210828	2014-06-12 21:16:19 +00:00
Eli Bendersky	dc6de2ce29	Revert r210721 as it causes breakage in internal builds (and possibly GDB). llvm-svn: 210807	2014-06-12 18:05:39 +00:00
Dinesh Dwivedi	95f0d51bd3	This removes TODO added in http://reviews.llvm.org/D3658 The patch transforms ABS(NABS(X)) -> ABS(X) NABS(ABS(X)) -> NABS(X) Differential Revision: http://reviews.llvm.org/D4040 llvm-svn: 210782	2014-06-12 14:06:00 +00:00
Eli Bendersky	899bef099f	Teach LoopUnrollPass to respect loop unrolling hints in metadata. See http://reviews.llvm.org/D4090 for more details. The Clang change that produces this metadata was committed in r210667 Patch by Mark Heffernan. llvm-svn: 210721	2014-06-11 23:15:35 +00:00
Chad Rosier	5ea14e09e9	[Reassociate] FileCheckize and cleanup a few testcases. No functional change intended. llvm-svn: 210685	2014-06-11 18:28:45 +00:00
Jiangning Liu	b2ae37fb67	Global merge for global symbols. This commit is to improve global merge pass and support global symbol merge. The global symbol merge is not enabled by default. For aarch64, we need some more back-end fix to make it really benifit ADRP CSE. llvm-svn: 210640	2014-06-11 06:44:53 +00:00
Jiangning Liu	3e5b855a51	Rename global-merge to enable-global-merge. llvm-svn: 210639	2014-06-11 06:35:26 +00:00
Juergen Ributzka	b2e4edb5c8	[ConstantHoisting][X86] Improve the cost model for small constants with large types (i64 and above). This improves the X86 cost model for small constants with large types. Before this commit we would even hoist trivial constants such as i96 2. This is related to <rdar://problem/17070936> llvm-svn: 210504	2014-06-10 00:32:29 +00:00
Alp Toker	d3d017cf00	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Matt Arsenault	44f60d0a60	Look through addrspacecasts when turning ptr comparisons into index comparisons. llvm-svn: 210488	2014-06-09 19:20:29 +00:00
Stepan Dyatkovskiy	297c826385	Added functions cross-reference test. Originally this similar was initiated by Björn Steinbrink here: http://reviews.llvm.org/D3437 Bug itself has been fixed by principal changes in MergeFunctions. Though special checks for functions merging are still actual. And the test has been accepted with slight modifications. llvm-svn: 210486	2014-06-09 19:03:02 +00:00
Jingyue Wu	5c7b1aed5d	[SeparateConstOffsetFromGEP] inbounds zext => sext for better splitting For each array index that is in the form of zext(a), convert it to sext(a) if we can prove zext(a) <= max signed value of typeof(a). The conversion helps to split zext(x + y) into sext(x) + sext(y). Reviewed in http://reviews.llvm.org/D4060 llvm-svn: 210444	2014-06-08 23:49:34 +00:00
Jingyue Wu	f2b85881d4	[SeparateConstOffsetFromGEP] make two tests more strict inbounds are not necessary in these two tests. zext(a +nuw b) = zext(a) + zext(b) should hold with or without inbounds. llvm-svn: 210437	2014-06-08 20:01:42 +00:00
Rafael Espindola	4ba22f0813	Revert 209903 and 210040. The messages were "PR19753: Optimize comparisons with "ashr exact" of a constanst." "Added support to optimize comparisons with "lshr exact" of a constant." They were not correctly handling signed/unsigned operation differences, causing pr19958. llvm-svn: 210393	2014-06-07 04:12:35 +00:00
Jingyue Wu	77145d9410	InstCombine: Canonicalize addrspacecast between different element types addrspacecast X addrspace(M)* to Y addrspace(N)* --> bitcast X addrspace(M)* to Y addrspace(M)* addrspacecast Y addrspace(M)* to Y addrspace(N)* Updat all affected tests and add several new tests in addrspacecast.ll. This patch is based on http://reviews.llvm.org/D2186 (authored by Matt Arsenault) with fixes and more tests. llvm-svn: 210375	2014-06-06 21:52:55 +00:00
Michael Zolotukhin	07ee478829	Fix typo in a test from r210342. llvm-svn: 210343	2014-06-06 15:49:47 +00:00
Michael Zolotukhin	1c51612d42	[SLP] Enable vectorization of GEP expressions. The use cases look like the following: x->a = y->a + 10 x->b = y->b + 12 llvm-svn: 210342	2014-06-06 15:34:24 +00:00
Dinesh Dwivedi	3217b6c661	Added select flavour for ABS and NEG(ABS) This patch can identify ABS(X) ==> (X >s 0) ? X : -X and (X >s -1) ? X : -X ABS(X) ==> (X <s 0) ? -X : X and (X <s 1) ? -X : X NABS(X) ==> (X >s 0) ? -X : X and (X >s -1) ? -X : X NABS(X) ==> (X <s 0) ? X : -X and (X <s 1) ? X : -X and can transform ABS(ABS(X)) -> ABS(X) NABS(NABS(X)) -> NABS(X) Differential Revision: http://reviews.llvm.org/D3658 llvm-svn: 210312	2014-06-06 06:54:45 +00:00
Karthik Bhat	bf56d44cab	Fix PR19657 (scalar loads not combined into vector load) If we have common uses on separate paths in the tree; process the one with greater common depth first. This makes sure that we do not assume we need to extract a load when it is actually going to be part of a vectorized tree. Review: http://reviews.llvm.org/D3800 llvm-svn: 210310	2014-06-06 06:20:08 +00:00
Rafael Espindola	42a4c9f9e0	Allow aliases to be unnamed_addr. Alias with unnamed_addr were in a strange state. It is stored in GlobalValue, the language reference talks about "unnamed_addr aliases" but the verifier was rejecting them. It seems natural to allow unnamed_addr in aliases: * It is a property of how it is accessed, not of the data itself. * It is perfectly possible to write code that depends on the address of an alias. This patch then makes unname_addr legal for aliases. One side effect is that the syntax changes for a corner case: In globals, unnamed_addr is now printed before the address space. llvm-svn: 210302	2014-06-06 01:20:28 +00:00
Jingyue Wu	84465473e7	Fixed several correctness issues in SeparateConstOffsetFromGEP Most issues are on mishandling s/zext. Fixes: 1. When rebuilding new indices, s/zext should be distributed to sub-expressions. e.g., sext(a +nsw (b +nsw 5)) = sext(a) + sext(b) + 5 but not sext(a + b) + 5. This also affects the logic of recursively looking for a constant offset, we need to include s/zext into the context of the searching. 2. Function find should return the bitwidth of the constant offset instead of always sign-extending it to i64. 3. Stop shortcutting zext'ed GEP indices. LLVM conceptually sign-extends GEP indices to pointer-size before computing the address. Therefore, gep base, zext(a + b) != gep base, a + b Improvements: 1. Add an optimization for splitting sext(a + b): if a + b is proven non-negative (e.g., used as an index of an inbound GEP) and one of a, b is non-negative, sext(a + b) = sext(a) + sext(b) 2. Function Distributable checks whether both sext and zext can be distributed to operands of a binary operator. This helps us split zext(sext(a + b)) to zext(sext(a) + zext(sext(b)) when a + b does not signed or unsigned overflow. Refactoring: Merge some common logic of handling add/sub/or in find. Testing: Add many tests in split-gep.ll and split-gep-and-gvn.ll to verify the changes we made. llvm-svn: 210291	2014-06-05 22:07:33 +00:00
Rafael Espindola	c286f4bf2a	Add a testcase where there is an overflow when combining two constants. I noticed that a proposed optimization would have prevented this. llvm-svn: 210287	2014-06-05 21:29:49 +00:00
Nick Lewycky	ff114dae5a	Fix coverage for files with global constructors again. Adds a testcase to the commit from r206671, as requested by David Blaikie. llvm-svn: 210239	2014-06-05 04:31:43 +00:00
Alexey Samsonov	ad81f0f419	Use AArch64 instead of now removed ARM64 in test configs llvm-svn: 210229	2014-06-05 00:25:30 +00:00
Rafael Espindola	04c2258624	InstCombine: Improvement to check if signed addition overflows. This patch implements two things: 1. If we know one number is positive and another is negative, we return true as signed addition of two opposite signed numbers will never overflow. 2. Implemented TODO : If one of the operands only has one non-zero bit, and if the other operand has a known-zero bit in a more significant place than it (not including the sign bit) the ripple may go up to and fill the zero, but won't change the sign. e.x - (x & ~4) + 1 We make sure that we are ignoring 0 at MSB. Patch by Suyog Sarda. llvm-svn: 210186	2014-06-04 15:39:14 +00:00
Nick Lewycky	5f53ddd0cc	Ignore line numbers on debug intrinsics. Add an assert to ensure that we aren't emitting line number zero, the .gcno format uses this to indicate that the next field is a filename. llvm-svn: 210068	2014-06-03 04:25:36 +00:00
Rafael Espindola	64c1e18033	Allow alias to point to an arbitrary ConstantExpr. This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is up to MC (or the system assembler) to decide if that expression is valid or not. This reduces our ability to diagnose invalid uses and how early we can spot them, but it also lets us do things like @test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32), i32 ptrtoint (i32* @bar to i32)) to i32) An important implication of this patch is that the notion of aliased global doesn't exist any more. The alias has to encode the information needed to access it in its metadata (linkage, visibility, type, etc). Another consequence to notice is that getSection has to return a "const char ". It could return a NullTerminatedStringRef if there was such a thing, but when that was proposed the decision was to just uses "const char*" for that. llvm-svn: 210062	2014-06-03 02:41:57 +00:00
Rafael Espindola	d1a2c2d905	Add back commit r210029. The code was actually correct. Sorry for the confusion. I have expanded the comment saying why the analysis is valid to avoid me misunderstaning it again in the future. llvm-svn: 210052	2014-06-02 22:01:04 +00:00
Rafael Espindola	80546be566	Convert test to FileCheck. llvm-svn: 210049	2014-06-02 21:23:54 +00:00
Rafael Espindola	582c890fbe	Revert "Add the nsw flag when we detect that an add will not signed overflow." This reverts commit r210029. It was not correctly handling cases where LHS and RHS had multiple but different sign bits. llvm-svn: 210048	2014-06-02 21:12:19 +00:00
Rafael Espindola	6b04ef785e	Added support to optimize comparisons with "lshr exact" of a constant. Patch by Rahul Jain. llvm-svn: 210040	2014-06-02 19:19:04 +00:00
Rafael Espindola	82899febf0	Add the nsw flag when we detect that an add will not signed overflow. We already had a function for checking this, we were just using it only in specialized cases. llvm-svn: 210029	2014-06-02 14:32:58 +00:00
Dinesh Dwivedi	ce5d35a9d0	Added inst combine tarnsform for (1 << X) & C pattrens where C is (some PowerOf2 - 1) This patch can handles following cases from http://nondot.org/sabre/LLVMNotes/InstCombine.txt "((1 << X) & 7) == 0" ==> "X > 2" "((1 << X) & 7) != 0" ==> "X < 3". Differential Revision: http://reviews.llvm.org/D3678 llvm-svn: 210007	2014-06-02 07:57:24 +00:00
Dinesh Dwivedi	43e127bded	Added inst combine transforms for single bit tests from Chris's note if ((x & C) == 0) x \|= C becomes x \|= C if ((x & C) != 0) x ^= C becomes x &= ~C if ((x & C) == 0) x ^= C becomes x \|= C if ((x & C) != 0) x &= ~C becomes x &= ~C if ((x & C) == 0) x &= ~C becomes nothing Differential Revision: http://reviews.llvm.org/D3777 llvm-svn: 210006	2014-06-02 07:24:36 +00:00
Benjamin Kramer	4968944376	[Reassociate] Similar to "X + -X" -> "0", added code to handle "X + ~X" -> "-1". Handle "X + ~X" -> "-1" in the function Value Reassociate::OptimizeAdd(Instruction I, SmallVectorImpl<ValueEntry> &Ops); This patch implements: TODO: We could handle "X + ~X" -> "-1" if we wanted, since "-X = ~X+1". Patch by Rahul Jain! Differential Revision: http://reviews.llvm.org/D3835 llvm-svn: 209973	2014-05-31 15:01:54 +00:00
Matt Arsenault	c8fc08c31b	Make bitcast, extractelement, and insertelement considered cheap for speculation. This helps more branches into selects. On R600, vectors are cheap and anything that helps remove branches is very good. llvm-svn: 209914	2014-05-30 18:34:43 +00:00
Rafael Espindola	c323952cb4	PR19753: Optimize comparisons with "ashr exact" of a constanst. Patch by suyog sarda. llvm-svn: 209903	2014-05-30 15:54:32 +00:00
Tim Northover	b4ddc0845a	ARM & AArch64: make use of common cmpxchg idioms after expansion The C and C++ semantics for compare_exchange require it to return a bool indicating success. This gets mapped to LLVM IR which follows each cmpxchg with an icmp of the value loaded against the desired value. When lowered to ldxr/stxr loops, this extra comparison is redundant: its results are implicit in the control-flow of the function. This commit makes two changes: it replaces that icmp with appropriate PHI nodes, and then makes sure earlyCSE is called after expansion to actually make use of the opportunities revealed. I've also added -{arm,aarch64}-enable-atomic-tidy options, so that existing fragile tests aren't perturbed too much by the change. Many of them either rely on undef/unreachable too pervasively to be restored to something well-defined (particularly while making sure they test the same obscure assert from many years ago), or depend on a particular CFG shape, which is disrupted by SimplifyCFG. rdar://problem/16227836 llvm-svn: 209883	2014-05-30 10:09:59 +00:00
Karthik Bhat	5ab7795649	Allow vectorization of intrinsics such as powi,cttz and ctlz in Loop and SLP Vectorizer. This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in Vectorizer. These intrinsics are different from other intrinsics as second argument to these function must be same in order to vectorize them and it should be represented as a scalar. Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857 llvm-svn: 209873	2014-05-30 04:31:24 +00:00
Nick Lewycky	59633cb478	When analyzing params/args for readnone/readonly, don't forget to consider that a pointer argument may be passed through a callsite to the return, and that we may need to analyze it. Fixes a bug reported on llvm-dev: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073098.html llvm-svn: 209870	2014-05-30 02:31:27 +00:00
Arnold Schwaighofer	e2067680a6	LoopVectorizer: Add a check that the backedge taken count + 1 does not overflow The loop vectorizer instantiates be-taken-count + 1 as the loop iteration count. If this expression overflows the generated code was invalid. In case of overflow the code now jumps to the scalar loop. Fixes PR17288. llvm-svn: 209854	2014-05-29 22:10:01 +00:00
Louis Gerbarg	c6b506a0ae	Add support for combining GEPs across PHI nodes Currently LLVM will generally merge GEPs. This allows backends to use more complex addressing modes. In some cases this is not happening because there is PHI inbetween the two GEPs: GEP1--\ \|-->PHI1-->GEP3 GEP2--/ This patch checks to see if GEP1 and GEP2 are similiar enough that they can be cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123): GEP1--\ --\ --\ \|-->PHI1-->GEP3 ==> \|-->PHI2->GEP12->GEP3 == > \|-->PHI2->GEP123 GEP2--/ --/ --/ This also breaks certain use chains that are preventing GEP->GEP merges that the the existing instcombine would merge otherwise. Tests included. llvm-svn: 209843	2014-05-29 20:29:47 +00:00
Rafael Espindola	a248f536b3	Revert "Revert "Revert "InstCombine: Improvement to check if signed addition overflows.""" This reverts commit r209776. It was miscompiling llvm::SelectionDAGISel::MorphNode. llvm-svn: 209817	2014-05-29 14:39:16 +00:00
Dinesh Dwivedi	d266cb1a0b	LCSSA should be performed on the outermost affected loop while unrolling loop. During loop-unroll, loop exits from the current loop may end up in in different outer loop. This requires to re-form LCSSA recursively for one level down from the outer most loop where loop exits are landed during unroll. This fixes PR18861. Differential Revision: http://reviews.llvm.org/D2976 llvm-svn: 209796	2014-05-29 06:47:23 +00:00
Michael J. Spencer	289067cc3d	Add LoadCombine pass. This pass is disabled by default. Use -combine-loads to enable in -O[1-3] Differential revision: http://reviews.llvm.org/D3580 llvm-svn: 209791	2014-05-29 01:55:07 +00:00
Rafael Espindola	6196b7430e	Revert "Revert "InstCombine: Improvement to check if signed addition overflows."" This reverts commit r209762, bringing back r209746. It was not responsible for the libc++ build failure llvm-svn: 209776	2014-05-28 21:43:52 +00:00
Rafael Espindola	910528a3eb	Revert "Add support for combining GEPs across PHI nodes" This reverts commit r209755. it was the real cause of the libc++ build failure. llvm-svn: 209775	2014-05-28 21:41:21 +00:00
Rafael Espindola	fb59b05ca4	Revert "InstCombine: Improvement to check if signed addition overflows." This reverts commit r209746. It looks it is causing a crash while building libcxx. I am trying to get a reduced testcase. llvm-svn: 209762	2014-05-28 18:48:10 +00:00
Louis Gerbarg	727f1cbb17	Add support for combining GEPs across PHI nodes Currently LLVM will generally merge GEPs. This allows backends to use more complex addressing modes. In some cases this is not happening because there is PHI inbetween the two GEPs: GEP1--\ \|-->PHI1-->GEP3 GEP2--/ This patch checks to see if GEP1 and GEP2 are similiar enough that they can be cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123): GEP1--\ --\ --\ \|-->PHI1-->GEP3 ==> \|-->PHI2->GEP12->GEP3 == > \|-->PHI2->GEP123 GEP2--/ --/ --/ This also breaks certain use chains that are preventing GEP->GEP merges that the the existing instcombine would merge otherwise. Tests included. llvm-svn: 209755	2014-05-28 17:38:31 +00:00
Rafael Espindola	085b57941f	InstCombine: Improvement to check if signed addition overflows. This patch implements two things: 1. If we know one number is positive and another is negative, we return true as signed addition of two opposite signed numbers will never overflow. 2. Implemented TODO : If one of the operands only has one non-zero bit, and if the other operand has a known-zero bit in a more significant place than it (not including the sign bit) the ripple may go up to and fill the zero, but won't change the sign. e.x - (x & ~4) + 1 We make sure that we are ignoring 0 at MSB. Patch by Suyog Sarda. llvm-svn: 209746	2014-05-28 15:30:40 +00:00
Arnaud A. de Grandmaison	de5ff26865	No need for those tests to go thru llvm-as and/or llvm-dis. opt can handle them by itself. llvm-svn: 209689	2014-05-27 22:03:28 +00:00
Jingyue Wu	76cbea6b6d	Fixed a test in r209670 The test was outdated with r209537. llvm-svn: 209671	2014-05-27 18:12:55 +00:00
Jingyue Wu	80a738dc62	Distribute sext/zext to the operands of and/or/xor This is an enhancement to SeparateConstOffsetFromGEP. With this patch, we can extract a constant offset from "s/zext and/or/xor A, B". Added a new test @ext_or to verify this enhancement. Refactoring the code, I also extracted some common logic to function Distributable. llvm-svn: 209670	2014-05-27 18:00:00 +00:00
Filipe Cabecinhas	e8d6a1e82f	Post-commit fixes for r209643 Detected by Daniel Jasper, Ilia Filippov, and Andrea Di Biagio Fixed the argument order to select (the mask semantics to blendv* are the inverse of select) and fixed the tests Added parenthesis to the assert condition Ran clang-format llvm-svn: 209667	2014-05-27 16:54:33 +00:00
Filipe Cabecinhas	82ac07c283	Convert some X86 blendv* intrinsics into IR. Summary: Implemented an InstCombine transformation that takes a blendv* intrinsic call and translates it into an IR select, if the mask is constant. This will eventually get lowered into blends with immediates if possible, or pblendvb (with an option to further optimize if we can transform the pblendvb into a blend+immediate instruction, depending on the selector). It will also enable optimizations by the IR passes, which give up on sight of the intrinsic. Both the transformation and the lowering of its result to asm got shiny new tests. The transformation is a bit convoluted because of blendvp[sd]'s definition: Its mask is a floating point value! This forces us to convert it and get the highest bit. I suppose this happened because the mask has type __m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin. I will send an email to llvm-dev to discuss if we want to change this or not. Reviewers: grosbach, delena, nadav Differential Revision: http://reviews.llvm.org/D3859 llvm-svn: 209643	2014-05-27 03:42:20 +00:00
Tim Northover	3b0846e8f7	AArch64/ARM64: move ARM64 into AArch64's place This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577	2014-05-24 12:50:23 +00:00
Tim Northover	cc08e1fe1b	AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64. I'm doing this in two phases for a better "git blame" record. This commit removes the previous AArch64 backend and redirects all functionality to ARM64. It also deduplicates test-lines and removes orphaned AArch64 tests. The next step will be "git mv ARM64 AArch64" and rewire most of the tests. Hopefully LLVM is still functional, though it would be even better if no-one ever had to care because the rename happens straight afterwards. llvm-svn: 209576	2014-05-24 12:42:26 +00:00
Michael Zolotukhin	d4c724625a	Implement sext(C1 + C2X) --> sext(C1) + sext(C2X) and sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar Evolution. That helps SLP-vectorizer to recognize consecutive loads/stores. <rdar://problem/14860614> llvm-svn: 209568	2014-05-24 08:09:57 +00:00
Nico Rieck	bd15945ab2	Fix broken FileCheck prefixes llvm-svn: 209538	2014-05-23 19:06:24 +00:00
Jingyue Wu	bbb6e4a885	Add the extracted constant offset using GEP Fixed a TODO in r207783. Add the extracted constant offset using GEP instead of ugly ptrtoint+add+inttoptr. Using GEP simplifies future optimizations and makes IR easier to understand. Updated all affected tests, and added a new test in split-gep.ll to cover a corner case where emitting uglygep is necessary. llvm-svn: 209537	2014-05-23 18:39:40 +00:00
Justin Bogner	cbb8438bb3	ScalarEvolution: Fix handling of AddRecs in isKnownPredicate ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison when both the LHS and RHS are SCEVAddRecExprs. This checks that both LHS and RHS are guarded in the case when both are SCEVAddRecExprs. The test case is against indvars because I could not find a way to directly test SCEV. Patch by Sanjay Patel! llvm-svn: 209487	2014-05-23 00:06:56 +00:00
Diego Novillo	7f8af8bf91	Add support for missed and analysis optimization remarks. Summary: This adds two new diagnostics: -pass-remarks-missed and -pass-remarks-analysis. They take the same values as -pass-remarks but are intended to be triggered in different contexts. -pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed, which passes call when they tried to apply a transformation but couldn't. -pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis, which passes call when they want to inform the user about analysis results. The patch also: 1- Adds support in the inliner for the two new remarks and a test case. 2- Moves emitOptimizationRemark* functions to the llvm namespace. 3- Adds an LLVMContext argument instead of making them member functions of LLVMContext. Reviewers: qcolombet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3682 llvm-svn: 209442	2014-05-22 14:19:46 +00:00
Eli Bendersky	f13a05607c	Similar to bitcast, treat addrspacecast as a foldable operand. Added a test sink-addrspacecast.ll to verify this change. Patch by Jingyue Wu. llvm-svn: 209343	2014-05-22 00:02:52 +00:00
Nick Lewycky	ec373545b8	Teach isKnownNonNull that a nonnull return is not null. Add a test for this case as well as the case of a nonnull attribute (already handled but not tested). llvm-svn: 209193	2014-05-20 05:13:21 +00:00
Juergen Ributzka	431761771c	[ConstantHoisting][X86] Change the cost model to never hoist constants for types larger than i128. Currently the X86 backend doesn't support types larger than i128 very well. For example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted. This fix changes the cost model to never hoist constants for types larger than i128. Once the codegen issues have been resolved, the cost model can be updated to allow also larger types. This is related to <rdar://problem/16954938> llvm-svn: 209162	2014-05-19 21:00:53 +00:00
Peter Collingbourne	68a889757d	Check the alwaysinline attribute on the call as well as on the caller. Differential Revision: http://reviews.llvm.org/D3815 llvm-svn: 209150	2014-05-19 18:25:54 +00:00
Benjamin Kramer	6dd790c617	Flip on vectorization of bswap intrinsics. The cost model conservatively assumes that it will always get scalarized and that's about as good as we can get with the generic TTI; reasoning whether a shuffle with an efficient lowering is available is hard. We can override that conservative estimate for some targets in the future. llvm-svn: 209125	2014-05-19 13:48:08 +00:00
Dinesh Dwivedi	f82f16e3e6	Added inst-combine for 'MIN(MIN(A, 97), 23)' and 'MAX(MAX(A, 23), 97)' This removes TODO added in r208849 [http://reviews.llvm.org/D3629] MIN(MIN(A, 97), 23) -> MIN(A, 23) MAX(MAX(A, 23), 97) -> MAX(A, 97) Differential Revision: http://reviews.llvm.org/D3785 llvm-svn: 209110	2014-05-19 07:08:32 +00:00
NAKAMURA Takumi	7ef81a4f98	Revert r209049 and r209065, "Add support for combining GEPs across PHI nodes" It broke clang selfhosting even after r209065. llvm-svn: 209067	2014-05-17 14:39:21 +00:00
Louis Gerbarg	8d2a43e9be	Add support for combining GEPs across PHI nodes Currently LLVM will generally merge GEPs. This allows backends to use more complex addressing modes. In some cases this is not happening because there is PHI inbetween the two GEPs: GEP1--\ \|-->PHI1-->GEP3 GEP2--/ This patch checks to see if GEP1 and GEP2 are similiar enough that they can be cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123): GEP1--\ --\ --\ \|-->PHI1-->GEP3 ==> \|-->PHI2->GEP12->GEP3 == > \|-->PHI2->GEP123 GEP2--/ --/ --/ This also breaks certain use chains that are preventing GEP->GEP merges that the the existing instcombine would merge otherwise. Tests included. rdar://15547484 llvm-svn: 209049	2014-05-16 23:47:24 +00:00
Reid Kleckner	fceb76f5f9	Add comdat key field to llvm.global_ctors and llvm.global_dtors This allows us to put dynamic initializers for weak data into the same comdat group as the data being initialized. This is necessary for MSVC ABI compatibility. Once we have comdats for guard variables, we can use the combination to help GlobalOpt fire more often for weak data with guarded initialization on other platforms. Reviewers: nlewycky Differential Revision: http://reviews.llvm.org/D3499 llvm-svn: 209015	2014-05-16 20:39:27 +00:00
Rafael Espindola	6b238633b7	Fix most of PR10367. This patch changes the design of GlobalAlias so that it doesn't take a ConstantExpr anymore. It now points directly to a GlobalObject, but its type is independent of the aliasee type. To avoid changing all alias related tests in this patches, I kept the common syntax @foo = alias i32* @bar to mean the same as now. The cases that used to use cast now use the more general syntax @foo = alias i16, i32* @bar. Note that GlobalAlias now behaves a bit more like GlobalVariable. We know that its type is always a pointer, so we omit the '*'. For the bitcode, a nice surprise is that we were writing both identical types already, so the format change is minimal. Auto upgrade is handled by looking through the casts and no new fields are needed for now. New bitcode will simply have different types for Alias and Aliasee. One last interesting point in the patch is that replaceAllUsesWith becomes smart enough to avoid putting a ConstantExpr in the aliasee. This seems better than checking and updating every caller. A followup patch will delete getAliasedGlobal now that it is redundant. Another patch will add support for an explicit offset. llvm-svn: 209007	2014-05-16 19:35:39 +00:00
David Majnemer	78910fc4da	InstSimplify: Improve handling of ashr/lshr Summary: Analyze the range of values produced by ashr/lshr cst, %V when it is being used in an icmp. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3774 llvm-svn: 209000	2014-05-16 17:14:03 +00:00
David Majnemer	ea8d5dbf24	InstSimplify: Optimize using dividend in sdiv Summary: The dividend in an sdiv tells us the largest and smallest possible results. Use this fact to optimize comparisons against an sdiv with a constant dividend. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3795 llvm-svn: 208999	2014-05-16 16:57:04 +00:00
Rafael Espindola	5a52b9f139	Revert "Implement global merge optimization for global variables." This reverts commit r208934. The patch depends on aliases to GEPs with non zero offsets. That is not supported and fairly broken. The good news is that GlobalAlias is being redesigned and will have support for offsets, so this patch should be a nice match for it. llvm-svn: 208978	2014-05-16 13:02:18 +00:00
Jiangning Liu	932e1c3924	Implement global merge optimization for global variables. This commit implements two command line switches -global-merge-on-external and -global-merge-aligned, and both of them are false by default, so this optimization is disabled by default for all targets. For ARM64, some back-end behaviors need to be tuned to get this optimization further enabled. llvm-svn: 208934	2014-05-15 23:45:42 +00:00
Reid Kleckner	900d46ff39	Don't insert lifetime.end markers between a musttail call and ret The allocas going out of scope are immediately killed by the return instruction. This is a resend of r208912, which was committed accidentally. Reviewers: chandlerc Differential Revision: http://reviews.llvm.org/D3792 llvm-svn: 208920	2014-05-15 21:10:46 +00:00
Reid Kleckner	b16564109b	Revert "Don't insert lifetime.end markers between a musttail call and ret" This reverts commit r208912. It was committed accidentally without review. llvm-svn: 208914	2014-05-15 20:41:05 +00:00
Reid Kleckner	26ab7ead45	Don't insert lifetime.end markers between a musttail call and ret The allocas going out of scope are immediately killed by the return instruction. Reviewers: chandlerc Differential Revision: http://reviews.llvm.org/D3630 llvm-svn: 208912	2014-05-15 20:39:13 +00:00
Reid Kleckner	f0915aa0e6	Teach the inliner how to preserve musttail invariants The interesting case is what happens when you inline a musttail call through a musttail call site. In this case, we can't break perfect forwarding or allow any stack growth. Instead of merging control flow from the inlined return instruction after a musttail call into the body of the caller, leave the inlined return instruction in the caller so that the musttail call stays in the tail position. More work is required in http://reviews.llvm.org/D3630 to handle the case where the inlined function has dynamic allocas or byval arguments. Reviewers: chandlerc Differential Revision: http://reviews.llvm.org/D3491 llvm-svn: 208910	2014-05-15 20:11:28 +00:00
Chandler Carruth	a0e5695ad9	Teach the constant folder to look through bitcast constant expressions much more effectively when trying to constant fold a load of a constant. Previously, we only handled bitcasts by trying to find a totally generic byte representation of the constant and use that. Now, we look through the bitcast to see what constant we might fold the load into, and then try to form a constant expression cast of the found value that would be equivalent to loading the value. You might wonder why on earth this actually matters. Well, turns out that the Itanium ABI causes us to create a single array for a vtable where the first elements are virtual base offsets, followed by the virtual function pointers. Because the array is homogenous the element type is consistently i8* and we inttoptr the virtual base offsets into the initial elements. Then constructors bitcast these pointers to i64 pointers prior to loading them. Boom, no more constant folding of virtual base offsets. This is the first fix to LLVM to address the insane performance Eric Niebler discovered with Clang on his range comprehensions[1]. There is more to come though, this doesn't really fix the problem fully. [1]: http://ericniebler.com/2014/04/27/range-comprehensions/ llvm-svn: 208856	2014-05-15 09:56:28 +00:00
Dinesh Dwivedi	83c11da849	Reverting r208848, reason: build failure: sanitizer-x86_64-linux-bootstrap/builds/3399 llvm-svn: 208852	2014-05-15 08:22:55 +00:00
Dinesh Dwivedi	f675f4201b	Added instcombine for 'MIN(MIN(A, 27), 93)' and 'MAX(MAX(A, 93), 27)' MIN(MIN(A, 23), 97) -> MIN(A, 23) MAX(MAX(A, 97), 23) -> MAX(A, 97) Differential Revision: http://reviews.llvm.org/D3629 llvm-svn: 208849	2014-05-15 06:13:40 +00:00
Dinesh Dwivedi	837c16097e	Added inst combine transforms for single bit tests from Chris's note if ((x & C) == 0) x \|= C becomes x \|= C if ((x & C) != 0) x ^= C becomes x &= ~C if ((x & C) == 0) x ^= C becomes x \|= C if ((x & C) != 0) x &= ~C becomes x &= ~C if ((x & C) == 0) x &= ~C becomes nothing Z3 Verifications code for above transform http://rise4fun.com/Z3/Pmsh Differential Revision: http://reviews.llvm.org/D3717 llvm-svn: 208848	2014-05-15 06:01:33 +00:00
David Majnemer	186c94244c	InstCombine: Optimize -x s< cst Summary: This gets rid of a sub instruction by moving the negation to the constant when valid. Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3773 llvm-svn: 208827	2014-05-15 00:02:20 +00:00
David Majnemer	2d6c023576	InstSimplify: Optimize signed icmp of -(zext V) Summary: We know that -(zext V) will always be <= zero, simplify signed icmps that have these. Uncovered using http://www.cs.utah.edu/~regehr/souper/ Reviewers: nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3754 llvm-svn: 208809	2014-05-14 20:16:28 +00:00
Serge Pavlov	e6de9e39a8	Fix the case when reordering shuffle and binop produces a constant. This resolves PR19737. llvm-svn: 208762	2014-05-14 09:05:09 +00:00
Nick Lewycky	f0cf8fa941	Optimize integral reciprocal (udiv 1, x and sdiv 1, x) to not use division. This fires exactly once in a clang bootstrap, but covers a few different results from http://www.cs.utah.edu/~regehr/souper/ llvm-svn: 208750	2014-05-14 03:03:05 +00:00
Serge Pavlov	b575ee8294	Fix type of shuffle resulted from shuffle merge. This fix resolves PR19730. llvm-svn: 208666	2014-05-13 06:07:21 +00:00
Rafael Espindola	cae6590f30	Convert test to FileCheck. llvm-svn: 208658	2014-05-13 00:31:31 +00:00
Rafael Espindola	fbfcb533ba	Convert test to FileCheck. llvm-svn: 208644	2014-05-13 00:07:46 +00:00
Adam Nemet	63e4b30f79	[Test] Trim unnecessary .c and .cpp from config.suffix in lit.local.cfg Tested by comparing make check VERBOSE=1 before and after to make sure no tests are missed. (VERBOSE=1 prints the list of tests.) Only one test :( remains where .cpp is required: tools/llvm-cov/range_based_for.cpp:// RUN: llvm-cov range_based_for.cpp \| FileCheck %s --check-prefix=STDOUT The topic was discussed in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140428/214905.html llvm-svn: 208621	2014-05-12 19:57:31 +00:00
Serge Pavlov	02ff620c7b	Fix type of shuffle obtained from reordering with binary operation In transformation: BinOp(shuffle(v1,undef), shuffle(v2,undef)) -> shuffle(BinOp(v1, v2),undef) type of the undef argument must be same as type of BinOp. llvm-svn: 208531	2014-05-12 10:11:27 +00:00
Serge Pavlov	0581109708	Fix reordering of shuffles and binary operations Do not apply transformation: BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2)) if operands v1 and v2 are of different size. This change fixes PR19717, which was caused by r208488. llvm-svn: 208518	2014-05-12 05:44:53 +00:00
Serge Pavlov	9ef66a8266	Reorder shuffle and binary operation. This patch enables transformations: BinOp(shuffle(v1), shuffle(v2)) -> shuffle(BinOp(v1, v2)) BinOp(shuffle(v1), const1) -> shuffle(BinOp, const2) They allow to eliminate extra shuffles in some cases. Differential Revision: http://reviews.llvm.org/D3525 llvm-svn: 208488	2014-05-11 08:46:12 +00:00
Benjamin Kramer	8722aa5754	SLPVectorizer: When sorting by domination for CSE don't assert on unreachable code. There is no total ordering if the CFG is disconnected. We don't care if we catch all CSE opportunities in dead code either so just exclude ignore them in the assert. PR19646 llvm-svn: 208461	2014-05-09 23:28:49 +00:00
Louis Gerbarg	1f54b82164	Add ExtractValue instruction to SimplifyCFG's ComputeSpeculationCost Since ExtractValue is not included in ComputeSpeculationCost CFGs containing ExtractValueInsts cannot be simplified. In particular this interacts with InstCombineCompare's tendency to insert add.with.overflow intrinsics for certain idiomatic math operations, preventing optimization. This patch adds ExtractValue to the ComputeSpeculationCost. Test case included rdar://14853450 llvm-svn: 208434	2014-05-09 17:02:46 +00:00
Michael Zolotukhin	292d3caa15	[InstCombine] Some cleanup in optimization of redundant insertvalue instructions. And one more test added. llvm-svn: 208355	2014-05-08 19:50:24 +00:00
Dario Domizioli	058a8fd36c	Revert test commit. Removed blank line. llvm-svn: 208308	2014-05-08 12:54:43 +00:00
Dario Domizioli	a2fa84edbf	Test commit. Added blank line. llvm-svn: 208298	2014-05-08 11:28:14 +00:00
Hal Finkel	6532c20faa	Move late partial-unrolling thresholds into the processor definitions The old method used by X86TTI to determine partial-unrolling thresholds was messy (because it worked by testing target features), and also would not correctly identify the target CPU if certain target features were disabled. After some discussions on IRC with Chandler et al., it was decided that the processor scheduling models were the right containers for this information (because it is often tied to special uop dispatch-buffer sizes). This does represent a small functionality change: - For generic x86-64 (which uses the SB model and, thus, will get some unrolling). - For AMD cores (because they still currently use the SB scheduling model) - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump the default threshold to 50; we're working on a test case for this). Otherwise, nothing has changed for any other targets. The logic, however, has been moved into BasicTTI, so other targets may now also opt-in to this functionality simply by setting LoopMicroOpBufferSize in their processor model definitions. llvm-svn: 208289	2014-05-08 09:14:44 +00:00
Duncan P. N. Exon Smith	b80de1012a	IR: Don't allow non-default visibility on local linkage Visibilities of `hidden` and `protected` are meaningless for symbols with local linkage. - Change the assembler to reject non-default visibility on symbols with local linkage. - Change the bitcode reader to auto-upgrade `hidden` and `protected` to `default` when the linkage is local. - Update LangRef. <rdar://problem/16141113> llvm-svn: 208263	2014-05-07 22:57:20 +00:00
Michael Zolotukhin	7d6293a0d3	[InstCombine] Add optimization of redundant insertvalue instructions. rdar://problem/11861387 llvm-svn: 208214	2014-05-07 14:30:18 +00:00
Nick Lewycky	5ef6bc8815	Improve 'tail' call marking in TRE. A bootstrap of clang goes from 375k calls marked tail in the IR to 470k, however this improvement does not carry into an improvement of the call/jmp ratio on x86. The most common pattern is a tail call + br to a block with nothing but a 'ret'. The number of tail call to loop conversions remains the same (1618 by my count). The new algorithm does a local scan over the use-def chains to identify local "alloca-derived" values, as well as points where the alloca could escape. Then, a visit over the CFG marks blocks as being before or after the allocas have escaped, and annotates the calls accordingly. llvm-svn: 208017	2014-05-05 23:59:03 +00:00
Michael Zolotukhin	e37f33c466	Move test from r207969 to another folder and rename it. llvm-svn: 207984	2014-05-05 18:10:15 +00:00
Yi Jiang	a4821fc9fb	Always set alignment of vectorized LD/ST in SLP-Vectorizer. <rdar://problem/16812145> llvm-svn: 207983	2014-05-05 17:59:14 +00:00
Duncan P. N. Exon Smith	1789fb6493	LTO: -internalize sets visibility to default Visibility is meaningless when the linkage is local. Change `-internalize` to reset the visibility to `default`. <rdar://problem/16141113> llvm-svn: 207979	2014-05-05 17:40:44 +00:00
Michael Zolotukhin	4e030e8fb4	Fix test from r207966 and add a comment there. llvm-svn: 207969	2014-05-05 14:46:53 +00:00
Michael Zolotukhin	0c380a30d8	Add regression test for r207692. llvm-svn: 207966	2014-05-05 14:05:25 +00:00
Benjamin Kramer	9130cb8547	LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling. Otherwise we use the same threshold as for complete unrolling, which is way too high. This made us unroll any loop smaller than 150 instructions by 8 times, but only if someone specified -march=core2 or better, which happens to be the default on darwin. llvm-svn: 207940	2014-05-04 19:12:38 +00:00
Arnold Schwaighofer	cd566c423a	SLPVectorizer: Bring back the insertelement patch (r205965) with fixes When can't assume a vectorized tree is rooted in an instruction. The IRBuilder could have constant folded it. When we rebuild the build_vector (the series of InsertElement instructions) use the last original InsertElement instruction. The vectorized tree root is guaranteed to be before it. Also, we can't assume that the n-th InsertElement inserts the n-th element into a vector. This reverts r207746 which reverted the revert of the revert of r205018 or so. Fixes the test case in PR19621. llvm-svn: 207939	2014-05-04 17:10:15 +00:00
Karthik Bhat	ddd0cb5ecf	Vectorize intrinsic math function calls in SLPVectorizer. This patch adds support to recognize and vectorize intrinsic math functions in SLPVectorizer. Review: http://reviews.llvm.org/D3560 and http://reviews.llvm.org/D3559 llvm-svn: 207901	2014-05-03 09:59:54 +00:00
Adam Nemet	6a56c37b95	[LSR] Add llc testcase for r207271/r207569. See PR19608 for the details but to summarize it was easy to modify the .ll file to get the desired def-use ordering. llvm-svn: 207887	2014-05-02 23:49:01 +00:00
Nico Weber	4b2acde21a	Teach GlobalDCE how to remove empty global_ctor entries. This moves most of GlobalOpt's constructor optimization code out of GlobalOpt into Transforms/Utils/CDtorUtils.{h,cpp}. The public interface is a single function OptimizeGlobalCtorsList() that takes a predicate returning which constructors to remove. GlobalOpt calls this with a function that statically evaluates all constructors, just like it did before. This part of the change is behavior-preserving. Also add a call to this from GlobalDCE with a filter that removes global constructors that contain a "ret" instruction and nothing else – this fixes PR19590. llvm-svn: 207856	2014-05-02 18:35:25 +00:00
Akira Hatanaka	f76388dd7e	[GVN] Pass the phi-translated address of a load instead of the untranslated address to AnalyzeLoadFromClobberingLoad. This fixes a bug in load-PRE where PRE is applied to a load that is not partially redundant. <rdar://problem/16638765>. llvm-svn: 207853	2014-05-02 17:59:17 +00:00
Nick Lewycky	718ada97bc	Fold strlen(expr ? "str1" : "str2") to x ? len1 : len2. This fires about 330 times in a bootstrap of clang. llvm-svn: 207828	2014-05-02 04:11:45 +00:00
Eli Bendersky	a108a65df2	Add an optimization that does CSE in a group of similar GEPs. This optimization merges the common part of a group of GEPs, so we can compute each pointer address by adding a simple offset to the common part. The optimization is currently only enabled for the NVPTX backend, where it has a large payoff on some benchmarks. Review: http://reviews.llvm.org/D3462 Patch by Jingyue Wu. llvm-svn: 207783	2014-05-01 18:38:36 +00:00
Chandler Carruth	18c2fbb143	Revert r205965, which essentially reverts r205018 for the second time. =[ Turns out that this was the root cause of PR19621. We found a crasher only recently (likely due to improvements elsewhere in the SLP vectorizer) but the reduced test case failed all the way back to here. I've confirmed that reverting this patch both fixes the reduced test case in PR19621 and the actual source file that led to it, so it seems to really be rooted here. I've replied to the commit thread with discussion of my (feeble) attempts to debug this. Didn't make it very far, so reverting now that we have a good test case so that things can get back to healthy while the debugging carries on. llvm-svn: 207746	2014-05-01 11:24:11 +00:00
Michael Zolotukhin	1f4a960ccf	[X86] Never hoist the shift value of a shift instruction. There is no need to check if we want to hoist the immediate value of an shift instruction. Simply return TCC_Free right away. This change is like r206101, but for X86. rdar://problem/16190769 llvm-svn: 207692	2014-04-30 19:17:32 +00:00
Carlo Kok	307625c974	[IPO/MergeFunctions] changes so it doesn't try to bitcast a struct return type but instead recreates it with insert/extract value. llvm-svn: 207679	2014-04-30 17:53:04 +00:00
David Majnemer	91db08bfe4	IR: Conservatively verify inalloca arguments Summary: Try to spot obvious mismatches with inalloca use. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3572 llvm-svn: 207676	2014-04-30 17:22:00 +00:00
Rafael Espindola	85f3610222	Also handle ConstantAggregateZero when optimizing vpermilvar*. llvm-svn: 207582	2014-04-29 22:20:40 +00:00
Rafael Espindola	eb7bdbd0ce	Two fixes to the vpermilvar optimization. The instcomine logic to handle vpermilvar's pd and 256 variants was incorrect. The _256 variants have indexes into the individual 128 bit lanes and in all cases it also has to mask out unused bits. llvm-svn: 207577	2014-04-29 20:41:54 +00:00
Diego Novillo	cd64780d18	Fix vectorization remarks. This patch changes the vectorization remarks to also inform when vectorization is possible but not beneficial. Added tests to exercise some loop remarks. llvm-svn: 207574	2014-04-29 20:06:10 +00:00
Yi Jiang	1a3f18b161	Continue slp vectorization even the BB already has vectorized store radar://16641956 llvm-svn: 207572	2014-04-29 19:37:20 +00:00
Zinovy Nis	d373fec199	[OPENMP][LV][D3423] Respect Hints.Force meta-data for loops in LoopVectorizer llvm-svn: 207512	2014-04-29 08:55:11 +00:00
Chandler Carruth	c71b2c3c7f	Revert r207271 for now. This commit introduced a test case that ran clang directly from the LLVM test suite! That doesn't work. I've followed up on the review thread to try and get a viable solution sorted out, but trying to get the tree clean here. llvm-svn: 207462	2014-04-28 23:07:49 +00:00
Hans Wennborg	e36e116826	InstCombine: don't drop 'inalloca' in PromoteCastOfAllocation (PR19569) llvm-svn: 207426	2014-04-28 17:40:03 +00:00
Chandler Carruth	e01fd5f63a	[inliner] Significantly improve the compile time in cases like PR19499 by avoiding inlining massive switches merely because they have no instructions in them. These switches still show up where we fail to form lookup tables, and in those cases they are actually going to cause a very significant code size hit anyways, so inlining them is not the right call. The right way to fix any performance regressions stemming from this is to enhance the switch-to-lookup-table logic to fire in more places. This makes PR19499 about 5x less bad. It uncovers a second compile time problem in that test case that is unrelated (surprisingly!). llvm-svn: 207403	2014-04-28 08:52:44 +00:00
Gerolf Hoflehner	af7a87d2e3	RecursivelyDeleteTriviallyDeadInstructions() could remove more than 1 instruction. The caller need to be aware of this and adjust instruction iterators accordingly. rdar://16679376 Repaired r207302. llvm-svn: 207309	2014-04-26 05:58:11 +00:00
Gerolf Hoflehner	c46e9b0423	Revert commit r207302 since build failures have been reported. llvm-svn: 207303	2014-04-26 02:03:17 +00:00
Gerolf Hoflehner	34210108b3	RecursivelyDeleteTriviallyDeadInstructions() could remove more than 1 instruction. The caller need to be aware of this and adjust instruction iterators accordingly. rdar://16679376 llvm-svn: 207302	2014-04-26 01:19:16 +00:00
Andrea Di Biagio	8cc9059ce8	[InstCombine][X86] Teach how to fold calls to SSE2/AVX2 packed logical shift right intrinsics. A packed logical shift right with a shift count bigger than or equal to the element size always produces a zero vector. In all other cases, it can be safely replaced by a 'lshr' instruction. llvm-svn: 207299	2014-04-26 01:03:22 +00:00
Adam Nemet	03d91c51e4	[LoopStrengthReduce] Don't trim formula that uses a subset of required registers Consider this use from the new testcase: LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32 reg({1000,+,-1}<nw><%for.body>) -3003 + reg({3,+,3}<nw><%for.body>) -1001 + reg({1,+,1}<nuw><nsw><%for.body>) -1000 + reg({0,+,1}<nw><%for.body>) -3000 + reg({0,+,3}<nuw><%for.body>) reg({-1000,+,1}<nw><%for.body>) reg({-3000,+,3}<nsw><%for.body>) This is the last use we consider for a solution in SolveRecurse, so CurRegs is a large set. (CurRegs is the set of registers that are needed by the previously visited uses in the in-progress solution.) ReqRegs is { {3,+,3}<nw><%for.body>, {1,+,1}<nuw><nsw><%for.body> } This is the intersection of the regs used by any of the formulas for the current use and CurRegs. Now, the code requires a formula to contain all these regs (the comment is simply wrong), otherwise the formula is immediately disqualified. Obviously, no formula for this use contains two regs so they will all get disqualified. The fix modifies the check to allow the formula in this case. The idea is that neither of these formulae is introducing any new registers which is the point of this early pruning as far as I understand. In terms of set arithmetic, we now allow formulas whose used regs are a subset of the required regs not just the other way around. There are few more loops in the test-suite that are now successfully LSRed. I have benchmarked those and found very minimal change. Fixes <rdar://problem/13965777> llvm-svn: 207271	2014-04-25 21:02:21 +00:00
Manman Ren	3c44067a30	[inline cold threshold] Command line argument for inline threshold will override the default cold threshold. When we use command line argument to set the inline threshold, the default cold threshold will not be used. This is in line with how we use OptSizeThreshold. When we want a higher threshold for all functions, we do not have to set both inline threshold and cold threshold. llvm-svn: 207245	2014-04-25 17:34:55 +00:00
Karthik Bhat	6a48f7d66e	Allow vectorization of bit intrinsics in BB Vectorizer. This patch adds support for vectorization of bit intrinsics such as bswap,ctpop,ctlz,cttz. llvm-svn: 207174	2014-04-25 03:33:48 +00:00
Zinovy Nis	27c486ffe1	[CLNUP] Test commit. Remove newline. llvm-svn: 207089	2014-04-24 08:42:58 +00:00
Karthik Bhat	81e6bf0a41	Allow vectorization of few missed llvm intrinsic calls in BBVectorizor by handling them in isVectorizableIntrinsic function. llvm-svn: 207085	2014-04-24 07:29:55 +00:00
Michael J. Spencer	dee4b2c379	[InstCombine][x86] Constant fold psll intrinsics. This excludes avx512 as I don't have hardware to verify. It excludes _dq variants because they are represented in the IR as <{2,4} x i64> when it's actually a byte shift of the entire i{128,265}. This also excludes _dq_bs as they aren't at all supported by the backend. There are also no corresponding instructions in the ISA. I have no idea why they exist... llvm-svn: 207058	2014-04-24 00:58:18 +00:00
Filipe Cabecinhas	1a80595a2b	Optimize some special cases for SSE4a insertqi Summary: Since the upper 64 bits of the destination register are undefined when performing this operation, we can substitute it and let the optimizer figure out that only a copy is needed. Also added range merging, if an instruction copies a range that can be merged with a previous copied range. Added test cases for both optimizations. Reviewers: grosbach, nadav CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3357 llvm-svn: 207055	2014-04-24 00:38:14 +00:00
Matt Arsenault	60728177fb	Handle addrspacecast when looking at memcpys from globals llvm-svn: 207054	2014-04-24 00:01:09 +00:00
Matt Arsenault	fed895c9c6	Convert test to FileCheck llvm-svn: 207015	2014-04-23 19:32:37 +00:00
Alexander Musman	f0785f4db4	[LV] Statistics numbers for LoopVectorize introduced: a number of analyzed loops & a number of vectorized loops. Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized. Patch by Zinovy Nis Differential Revision: http://reviews.llvm.org/D3438 llvm-svn: 206956	2014-04-23 08:40:37 +00:00
Juergen Ributzka	575bcb770a	[Constant Hoisting] Materialize the constant before the cloned cast instruction. In the case where the constant comes from a cloned cast instruction, the materialization code has to go before the cloned cast instruction. This commit fixes the method that finds the materialization insertion point by making it aware of this case. This fixes <rdar://problem/15532441> llvm-svn: 206913	2014-04-22 18:06:58 +00:00
Rafael Espindola	bad3f77703	Simplify a vpermil* with constant mask. With a constant mask a vpermil* is just a shufflevector. This patch implements that simplification. This allows us to produce denser code. It should also allow more folding down the line. llvm-svn: 206801	2014-04-21 22:06:04 +00:00
Reid Kleckner	9b2cc647eb	Fix PR7272 in -tailcallelim instead of the inliner The -tailcallelim pass should be checking if byval or inalloca args can be captured before marking calls as tail calls. This was the real root cause of PR7272. With a better fix in place, revert the inliner change from r105255. The test case it introduced still passes and has been moved to test/Transforms/Inline/byval-tail-call.ll. Reviewers: chandlerc Differential Revision: http://reviews.llvm.org/D3403 llvm-svn: 206789	2014-04-21 20:48:47 +00:00
Jiangning Liu	300a6b84f2	Add missing config file for newly added test case introduced by r206563. llvm-svn: 206567	2014-04-18 09:05:50 +00:00
Jiangning Liu	ad874fca28	This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64. A new test case is also added for ARM64. Patched by Z.Zheng llvm-svn: 206563	2014-04-18 07:57:54 +00:00
Diego Novillo	0915c047c2	Fix bug 19437 - Only add discriminators for DWARF 4 and above. Summary: This prevents the discriminator generation pass from triggering if the DWARF version being used in the module is prior to 4. Reviewers: echristo, dblaikie CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3413 llvm-svn: 206507	2014-04-17 22:33:50 +00:00
Gerolf Hoflehner	ecebc3730e	Reverse 206485. After some discussions the preferred semantics of the always_inline attribute is inline always when the compiler can determine that it it safe to do so. llvm-svn: 206487	2014-04-17 19:14:06 +00:00
Tim Northover	037f26f212	Atomics: promote ARM's IR-based atomics pass to CodeGen. Still only 32-bit ARM using it at this stage, but the promotion allows direct testing via opt and is a reasonably self-contained patch on the way to switching ARM64. At this point, other targets should be able to make use of it without too much difficulty if they want. (See ARM64 commit coming soon for an example). llvm-svn: 206485	2014-04-17 18:22:47 +00:00
Gerolf Hoflehner	5f6268a40e	Inline a function when the always_inline attribute is set even when it contains a indirect branch. The attribute overrules correctness concerns like the escape of a local block address. This is for rdar://16501761 llvm-svn: 206429	2014-04-17 00:21:52 +00:00
Julien Lerouge	be4fe32eb8	Add lifetime markers for allocas created to hold byval arguments, make them appear in the InlineFunctionInfo. llvm-svn: 206308	2014-04-15 18:06:46 +00:00
NAKAMURA Takumi	0ec1918675	vect.omp.persistence.ll REQUIRES asserts due to -debug-only. llvm-svn: 206271	2014-04-15 10:12:47 +00:00
Alexey Bataev	b97f9e8698	D3348 - [BUG] "Rotate Loop" pass kills "llvm.vectorizer.enable" metadata llvm-svn: 206266	2014-04-15 09:37:30 +00:00
Matt Arsenault	fed3dc8dc6	Revert "Revert r206045, "Fix shift by constants for vector."" Fix cases where the Value itself is used, and not the constant value. llvm-svn: 206214	2014-04-14 21:50:37 +00:00
NAKAMURA Takumi	58ad0c87f8	Whitespace. llvm-svn: 206154	2014-04-14 07:03:13 +00:00
NAKAMURA Takumi	26afa982ec	Revert r206045, "Fix shift by constants for vector." It broke some builders, at least, i686. llvm-svn: 206153	2014-04-14 07:02:57 +00:00
Hal Finkel	0192cbac66	[PowerPC] [Constant Hoisting] Enable constant hoisting on PPC Implements the various TTI functions to enable constant hoisting on PPC. The only significant test-suite change is this: MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup (which essentially reverses the slowdown from r206120). llvm-svn: 206141	2014-04-13 23:02:40 +00:00
Serge Pavlov	4bb54d51c8	Recognize test for overflow in integer multiplication. If multiplication involves zero-extended arguments and the result is compared as in the patterns: %mul32 = trunc i64 %mul64 to i32 %zext = zext i32 %mul32 to i64 %overflow = icmp ne i64 %mul64, %zext or %overflow = icmp ugt i64 %mul64 , 0xffffffff then the multiplication may be replaced by call to umul.with.overflow. This change fixes PR4917 and PR4918. Differential Revision: http://llvm-reviews.chandlerc.com/D2814 llvm-svn: 206137	2014-04-13 18:23:41 +00:00
Juergen Ributzka	cf03068d91	[ARM64] Never hoist the shift value of a shift instruction. There is no need to check if we want to hoist the immediate value of an shift instruction. Simply return TCC_Free right away. llvm-svn: 206101	2014-04-12 02:53:51 +00:00
Juergen Ributzka	6e17aa45a3	[ARM64] Fix the cost model for cheap large constants. Originally the cost model would give up for large constants and just return the maximum cost. This is not what we want for constant hoisting, because some of these constants are large in bitwidth, but are still cheap to materialize. This commit fixes the cost model to either return TCC_Free if the cost cannot be determined, or accurately calculate the cost even for large constants (bitwidth > 128). This fixes <rdar://problem/16591573>. llvm-svn: 206100	2014-04-12 02:36:28 +00:00
Hal Finkel	c3998306f4	Add the ability to use GEPs for address sinking in CGP The current memory-instruction optimization logic in CGP, which sinks parts of the address computation that can be adsorbed by the addressing mode, does this by explicitly converting the relevant part of the address computation into IR-level integer operations (making use of ptrtoint and inttoptr). For most targets this is currently not a problem, but for targets wishing to make use of IR-level aliasing analysis during CodeGen, the use of ptrtoint/inttoptr is a problem for two reasons: 1. BasicAA becomes less powerful in the face of the ptrtoint/inttoptr 2. In cases where type-punning was used, and BasicAA was used to override TBAA, BasicAA may no longer do so. (this had forced us to disable all use of TBAA in CodeGen; something which we can now enable again) This (use of GEPs instead of ptrtoint/inttoptr) is not currently enabled by default (except for those targets that use AA during CodeGen), and so aside from some PowerPC subtargets and SystemZ, there should be no change in behavior. We may be able to switch completely away from the ptrtoint/inttoptr sinking on all targets, but further testing is required. I've doubled-up on a number of existing tests that are sensitive to the address sinking behavior (including some store-merging tests that are sensitive to the order of the resulting ADD operations at the SDAG level). llvm-svn: 206092	2014-04-12 00:59:48 +00:00
Matt Arsenault	173a1e577c	Fix shift by constants for vector. ashr <N x iM>, <N x iM> M -> undef llvm-svn: 206045	2014-04-11 17:57:53 +00:00
Arnold Schwaighofer	b373e01d87	Reapply "SLPVectorizer: Ignore users that are insertelements we can reschedule them" This commit reapplies 205018. After 205855 we should correctly vectorize intrinsics. llvm-svn: 205965	2014-04-10 13:41:35 +00:00
Juergen Ributzka	48c8c07d0a	[ARM64] Fix immediate cost calculation for types larger than i64. The immediate cost calculation code was hitting an assertion in the included test case, because APInt was still internally 128-bits. Truncating it to 64-bits fixed the issue. Fixes <rdar://problem/16572521>. llvm-svn: 205947	2014-04-10 01:36:59 +00:00
Arnold Schwaighofer	fd0bf5d6e5	SLPVectorizer: Only vectorize intrinsics whose operands are widened equally The vectorizer only knows how to vectorize intrinics by widening all operands by the same factor. Patch by Tyler Nowicki! llvm-svn: 205855	2014-04-09 14:20:47 +00:00
Juergen Ributzka	c11e8b67bb	[Constant Hoisting][ARM64] Enable constant hoisting for ARM64. This implements the target-hooks for ARM64 to enable constant hoisting. This fixes <rdar://problem/14774662> and <rdar://problem/16381500>. llvm-svn: 205791	2014-04-08 20:39:59 +00:00
Eric Christopher	beb2cd6b7c	Handle vlas during inline cost computation if they'll be turned into a constant size alloca by inlining. Ran a run over the testsuite, no results out of the noise, fixes the testcase in the PR. PR19115. llvm-svn: 205710	2014-04-07 13:36:21 +00:00
Juergen Ributzka	9dff139025	Update the test to use FileCheck. llvm-svn: 205647	2014-04-04 19:57:01 +00:00
Saleem Abdulrasool	905b6d192c	ARM: yet another round of ARM test clean ups llvm-svn: 205586	2014-04-03 23:47:24 +00:00
Eli Bendersky	9966b26dac	Fix PR19270 - type mismatch caused by invalid optimization. Patch by Jingyue Wu. llvm-svn: 205547	2014-04-03 17:51:58 +00:00
Juergen Ributzka	b0abeb0984	Add test case for [Constant Hoisting] Erase dead cast instructions (r204538). llvm-svn: 205484	2014-04-02 23:06:22 +00:00
Adrian Prantl	fd43f6ba3b	typo llvm-svn: 205473	2014-04-02 22:17:30 +00:00
Juergen Ributzka	27435b3b8a	Add comments and test case for [X86TTI] Make constant base pointers for GetElementPtr opaque (r204739). llvm-svn: 205468	2014-04-02 21:45:36 +00:00
Juergen Ributzka	ab6f44efaf	Add test case for [Stackmaps][X86TTI] Fix think-o in getIntImmCost calculation (r204738). llvm-svn: 205464	2014-04-02 21:15:36 +00:00
Tim Northover	670df3d937	SLPVectorizer: compare entire intrinsic for SLP compatibility. Some Intrinsics are overloaded to the extent that return type equality (all that's been checked up to now) does not guarantee that the arguments are the same. In these cases SLP vectorizer should not recurse into the operands, which can be achieved by comparing them as "Function *" rather than simply the ID. llvm-svn: 205424	2014-04-02 14:39:02 +00:00
Hal Finkel	b0ebdc0f43	[LoopVectorizer] Count dependencies of consecutive pointers as uniforms For the purpose of calculating the cost of the loop at various vectorization factors, we need to count dependencies of consecutive pointers as uniforms (which means that the VF = 1 cost is used for all overall VF values). For example, the TSVC benchmark function s173 has: ... %3 = add nsw i64 %indvars.iv, 16000 %arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3 ... and we must realize that the add will be a scalar in order to correctly deduce it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all dependencies of a consecutive pointer must be a scalar (uniform), and so we simply need to add all consecutive pointers to the worklist that currently detects collects uniforms. Fixes PR19296. llvm-svn: 205387	2014-04-02 02:34:49 +00:00
Hal Finkel	2eed29f3c8	Implement X86TTI::getUnrollingPreferences This provides an initial implementation of getUnrollingPreferences for x86. getUnrollingPreferences is used by the generic (concatenation) unroller, which is distinct from the unrolling done by the loop vectorizer. Many modern x86 cores have some kind of uop cache and loop-stream detector (LSD) used to efficiently dispatch small loops, and taking full advantage of this requires unrolling small loops (small here means 10s of uops). These caches also have limits on the number of taken branches in the loop, and so we also cap the loop unrolling factor based on the maximum "depth" of the loop. This is currently calculated with a partial DFS traversal (partial because it will stop early if the path length grows too much). This is still an approximation, and one that is both conservative (because it does not account for branches eliminated via block placement) and optimistic (because it is only recording the maximum depth over minimum paths). Nevertheless, because the loops that fit in these uop caches are so small, it is not clear how much the details matter. The original set of patches posted for review produced the following test-suite performance results (from the TSVC benchmark) at that time: ControlLoops-dbl - 13% speedup ControlLoops-flt - 15% speedup Reductions-dbl - 7.5% speedup llvm-svn: 205348	2014-04-01 18:50:34 +00:00
Hal Finkel	86b3064f2b	Move partial/runtime unrolling late in the pipeline The generic (concatenation) loop unroller is currently placed early in the standard optimization pipeline. This is a good place to perform full unrolling, but not the right place to perform partial/runtime unrolling. However, most targets don't enable partial/runtime unrolling, so this never mattered. However, even some x86 cores benefit from partial/runtime unrolling of very small loops, and follow-up commits will enable this. First, we need to move partial/runtime unrolling late in the optimization pipeline (importantly, this is after SLP and loop vectorization, as vectorization can drastically change the size of a loop), while keeping the full unrolling where it is now. This change does just that. llvm-svn: 205264	2014-03-31 23:23:51 +00:00
Arnold Schwaighofer	15262e6703	Revert "SLPVectorizer: Ignore users that are insertelements we can reschedule them" This reverts commit r205018. Conflicts: lib/Transforms/Vectorize/SLPVectorizer.cpp test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll This is breaking libclc build. llvm-svn: 205260	2014-03-31 23:05:56 +00:00
Adam Nemet	10c4ce2584	[X86] Adjust cost of FP_TO_UINT v4f64->v4i32 as well Pretty obvious follow-on to r205159 to also handle conversion from double besides float. Fixes <rdar://problem/16373208> llvm-svn: 205253	2014-03-31 21:54:48 +00:00
Adam Nemet	6dafe97271	[X86] Adjust cost of FP_TO_UINT v8f32->v8i32 There is no direct AVX instruction to convert to unsigned. I have some ideas how we may be able to do this with three vector instructions but the current backend just bails on this to get it scalarized. See the comment why we need to adjust the cost returned by BasicTTI. The test is a bit roundabout (and checks assembly rather than bit code) because I'd like it to work even if at some point we could vectorize this conversion. Fixes <rdar://problem/16371920> llvm-svn: 205159	2014-03-30 18:07:13 +00:00
NAKAMURA Takumi	4cf1a3be82	llvm/test/Transforms/LoopStrengthReduce/ARM64/lsr-*.ll: Add explicit triple arm64-unknown for targeting pecoff. llvm-svn: 205125	2014-03-30 05:01:04 +00:00
Tim Northover	00ed9964c6	ARM64: initial backend import This adds a second implementation of the AArch64 architecture to LLVM, accessible in parallel via the "arm64" triple. The plan over the coming weeks & months is to merge the two into a single backend, during which time thorough code review should naturally occur. Everything will be easier with the target in-tree though, hence this commit. llvm-svn: 205090	2014-03-29 10:18:08 +00:00
Arnold Schwaighofer	c9d58e8d32	SLPVectorizer: Take credit for free extractelement instructions Extract element instructions that will be removed when vectorzing lower the cost. Patch by Arch D. Robison! llvm-svn: 205020	2014-03-28 17:21:32 +00:00
Arnold Schwaighofer	b190cb30c3	SLPVectorizer: Ignore users that are insertelements we can reschedule them Patch by Arch D. Robison! llvm-svn: 205018	2014-03-28 17:21:22 +00:00
Erik Verbruggen	5e1bac3a38	Revert "InstCombine: merge constants in both operands of icmp." This reverts commit r204912, and follow-up commit r204948. This introduced a performance regression, and the fix is not completely clear yet. llvm-svn: 205010	2014-03-28 14:50:57 +00:00
Erik Verbruggen	2074ebd8af	Revert "GVN: merge overflow intrinsics with non-overflow instructions." This reverts commit r203553, and follow-up commits r203558 and r203574. I will follow this up on the mailinglist to do it in a way that won't cause subtle PRE bugs. llvm-svn: 205009	2014-03-28 14:42:34 +00:00
Reid Kleckner	3bdf9bc48b	InstCombine: Don't combine constants on unsigned icmps Fixes a miscompile introduced in r204912. It would miscompile code like (unsigned)(a + -49) <= 5U. The transform would turn this into (unsigned)a < 55U, which would return true for values in [0, 49], when it should not. llvm-svn: 204948	2014-03-27 17:49:27 +00:00
Rafael Espindola	24a669d225	Prevent alias from pointing to weak aliases. This adds back r204781. Original message: Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204934	2014-03-27 15:26:56 +00:00
Erik Verbruggen	59a1219846	InstCombine: merge constants in both operands of icmp. Transform: icmp X+Cst2, Cst into: icmp X, Cst-Cst2 when Cst-Cst2 does not overflow, and the add has nsw. llvm-svn: 204912	2014-03-27 11:16:05 +00:00
Quentin Colombet	3914bf516b	[X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64 and v4i64->v4f64. The new costs match what we did for SSE2 and reflect the reality of our codegen. <rdar://problem/16381225> llvm-svn: 204884	2014-03-27 00:52:16 +00:00
Jim Grosbach	6373e70f81	add 'requires asserts' to test that needs it llvm-svn: 204882	2014-03-27 00:20:42 +00:00
Jim Grosbach	72fbde84b8	X86: Correct vectorization cost model for v8f32->v8i8. Fix the cost model to reflect the reality of our codegen. rdar://16370633 llvm-svn: 204880	2014-03-27 00:04:11 +00:00
Nick Lewycky	77d5fb40c8	Treat lifetime.start'd memory like we treat freshly alloca'd memory. Patch by Björn Steinbrink! llvm-svn: 204876	2014-03-26 23:45:15 +00:00
Rafael Espindola	65481d7b97	Revert "Prevent alias from pointing to weak aliases." This reverts commit r204781. I will follow up to with msan folks to see what is what they were trying to do with aliases to weak aliases. llvm-svn: 204784	2014-03-26 06:14:40 +00:00
Rafael Espindola	3b712a84a9	Prevent alias from pointing to weak aliases. Aliases are just another name for a position in a file. As such, the regular symbol resolutions are not applied. For example, given define void @my_func() { ret void } @my_alias = alias weak void ()* @my_func @my_alias2 = alias void ()* @my_alias We produce without this patch: .weak my_alias my_alias = my_func .globl my_alias2 my_alias2 = my_alias That is, in the resulting ELF file my_alias, my_func and my_alias are just 3 names pointing to offset 0 of .text. That is not the semantics of IR linking. For example, linking in a @my_alias = alias void ()* @other_func would require the strong my_alias to override the weak one and my_alias2 would end up pointing to other_func. There is no way to represent that with aliases being just another name, so the best solution seems to be to just disallow it, converting a miscompile into an error. llvm-svn: 204781	2014-03-26 04:48:47 +00:00
Richard Osborne	0af4aa9a19	[InstCombine] Don't fold bitcast into store if it would need addrspacecast Summary: Previously the code didn't check if the before and after types for the store were pointers to different address spaces. This resulted in instcombine using a bitcast to convert between pointers to different address spaces, causing an assertion due to the invalid cast. It is not be appropriate to use addrspacecast this case because it is not guaranteed to be a no-op cast. Instead bail out and do not do the transformation. CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D3117 llvm-svn: 204733	2014-03-25 17:21:41 +00:00
Karthik Bhat	195e9dd91b	Allow constant folding of ceil function whenever feasible llvm-svn: 204583	2014-03-24 04:36:06 +00:00
Lang Hames	459b5dc39e	Revert r204076 for now - it caused significant regressions in a number of benchmarks. <rdar://problem/16368461> llvm-svn: 204558	2014-03-23 04:22:31 +00:00
Juergen Ributzka	e802d507b0	[Constant Hoisting] Fix multiple entries for the same basic block in PHI nodes. A PHI node usually has only one value/basic block pair per incoming basic block. In the case of a switch statement it is possible that a following PHI node may have more than one such pair per incoming basic block. E.g.: %0 = phi i64 [ 123456, %case2 ], [ 654321, %Entry ], [ 654321, %Entry ] This is valid and the verfier doesn't complain, because both values are the same. Constant hoisting materializes the constant for each operand separately and the value is still the same, but the variable names have changed. As a result the verfier can't recognize anymore that they are the same value and complains. This fix adds special update code for PHI node in constant hoisting to prevent this corner case. This fixes <rdar://problem/16394449> llvm-svn: 204537	2014-03-22 01:49:27 +00:00
Tom Stellard	edfd81d965	Sink: Don't sink static allocas from the entry block CodeGen treats allocas outside the entry block as dynamically sized stack objects. llvm-svn: 204473	2014-03-21 15:51:51 +00:00
Juergen Ributzka	f0dff49ad0	[Constant Hoisting] Make the constant materialization cost operand dependent Extend the target hook to take also the operand index into account when calculating the cost of the constant materialization. Related to <rdar://problem/16381500> llvm-svn: 204435	2014-03-21 06:04:45 +00:00
Juergen Ributzka	5429c06b90	[Constant Hoisting] Change the algorithm to only track constants for instructions. Originally the algorithm would search for expensive constants and track their users, which could be instructions and constant expressions. This change only tracks the constants for instructions, but constant expressions are indirectly covered too. If an operand is an constant expression, then we look through the expression to find anny expensive constants. The algorithm keep now track of the instruction and the operand index where the constant is used. This allows more precise hoisting of constant materialization code for PHI instructions, because we only hoist to the basic block of the incoming operand. Before we had to find the idom of all PHI operands and hoist the materialization code there. This also makes updating of instructions easier. Before we had to keep track of the original constant, find it in the instructions, and then replace it. Now we can just simply update the operand. Related to <rdar://problem/16381500> llvm-svn: 204433	2014-03-21 06:04:36 +00:00
Juergen Ributzka	46357931ab	Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass." I will break this up into smaller pieces for review and recommit. llvm-svn: 204393	2014-03-20 20:17:13 +00:00
Juergen Ributzka	6dab520c70	[Constant Hoisting] Extend coverage of the constant hoisting pass. This commit extends the coverage of the constant hoisting pass, adds additonal debug output and updates the function names according to the style guide. Related to <rdar://problem/16381500> llvm-svn: 204389	2014-03-20 19:55:52 +00:00
Mark Seaborn	b6118c5b17	Remove LowerInvoke's obsolete "-enable-correct-eh-support" option This option caused LowerInvoke to generate code using SJLJ-based exception handling, but there is no code left that interprets the jmp_buf stack that the resulting code maintained (llvm.sjljeh.jblist). This option has been obsolete for a while, and replaced by SjLjEHPrepare. This leaves the default behaviour of LowerInvoke, which is to convert invokes to calls. Differential Revision: http://llvm-reviews.chandlerc.com/D3136 llvm-svn: 204388	2014-03-20 19:54:47 +00:00
Mark Seaborn	277fbe1bfe	Add a test for LowerInvoke that doesn't use "-enable-correct-eh-support" None of the existing tests for LowerInvoke check LowerInvoke's output, and all but one use "-enable-correct-eh-support", which is obsolete, so those tests will be removed when that option is removed. To make sure LowerInvoke will still have test coverage, this adds a test for its default mode which converts invokes to calls. Differential Revision: http://llvm-reviews.chandlerc.com/D3124 llvm-svn: 204344	2014-03-20 14:12:47 +00:00
Duncan P. N. Exon Smith	cb1c81afa0	Fix use_iterator crash in ObjCArc from r203364 The use_iterator redesign in r203364 introduced an increment past the end of a range in -objc-arc-contract. Added an explicit check for the end of the range. <rdar://problem/16333235> llvm-svn: 204195	2014-03-18 22:32:43 +00:00
Diego Novillo	213bb00245	Tolerate unmangled names in sample profiles. Summary: The compiler does not always generate linkage names. If a function has been inlined and its body elided, its linkage name may not be generated. When the binary executes, the profiler will use its unmangled name when attributing samples. This results in unmangled names in the input profile. We are currently failing hard when this happens. However, in this case all that happens is that we fail to attribute samples to the inlined function. While this means fewer optimization opportunities, it should not cause a compilation failure. This patch accepts all valid function names, regardless of whether they were mangled or not. Reviewers: chandlerc CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D3087 llvm-svn: 204142	2014-03-18 12:03:12 +00:00
Dan Gohman	172c5d3451	Use range metadata instead of introducing selects. When GlobalOpt has determined that a GlobalVariable only ever has two values, it would convert the GlobalVariable to a boolean, and introduce SelectInsts at every load, to choose between the two possible values. These SelectInsts introduce overhead and other unpleasantness. This patch makes GlobalOpt just add range metadata to loads from such GlobalVariables instead. This enables the same main optimization (as seen in test/Transforms/GlobalOpt/integer-bool.ll), without introducing selects. The main downside is that it doesn't get the memory savings of shrinking such GlobalVariables, but this is expected to be negligible. llvm-svn: 204076	2014-03-17 19:57:04 +00:00
NAKAMURA Takumi	64587433ce	llvm/test/Transforms/SampleProfile/syntax.ll: Suppress checking the message catalog in ENOENT. It is locale-dependent on Windows. llvm-svn: 203997	2014-03-15 02:32:21 +00:00
Diego Novillo	a32aa3251c	Use DiagnosticInfo facility. Summary: The sample profiler pass emits several error messages. Instead of just aborting the compiler with report_fatal_error, we can emit better messages using DiagnosticInfo. This adds a new sub-class of DiagnosticInfo to handle the sample profiler. Reviewers: chandlerc, qcolombet CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D3086 llvm-svn: 203976	2014-03-14 21:58:59 +00:00
Rafael Espindola	2fb5bc33a3	Remove the linker_private and linker_private_weak linkages. These linkages were introduced some time ago, but it was never very clear what exactly their semantics were or what they should be used for. Some investigation found these uses: * utf-16 strings in clang. * non-unnamed_addr strings produced by the sanitizers. It turns out they were just working around a more fundamental problem. For some sections a MachO linker needs a symbol in order to split the section into atoms, and llvm had no idea that was the case. I fixed that in r201700 and it is now safe to use the private linkage. When the object ends up in a section that requires symbols, llvm will use a 'l' prefix instead of a 'L' prefix and things just work. With that, these linkages were already dead, but there was a potential future user in the objc metadata information. I am still looking at CGObjcMac.cpp, but at this point I am convinced that linker_private and linker_private_weak are not what they need. The objc uses are currently split in * Regular symbols (no '\01' prefix). LLVM already directly provides whatever semantics they need. * Uses of a private name (start with "\01L" or "\01l") and private linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm agrees with clang on L being ok or not for a given section. I have two patches in code review for this. * Uses of private name and weak linkage. The last case is the one that one could think would fit one of these linkages. That is not the case. The semantics are * the linker will merge these symbol by name. * the linker will hide them in the final DSO. Given that the merging is done by name, any of the private (or internal) linkages would be a bad match. They allow llvm to rename the symbols, and that is really not what we want. From the llvm point of view, these objects should really be (linkonce\|weak)(_odr)?. For now, just keeping the "\01l" prefix is probably the best for these symbols. If we one day want to have a more direct support in llvm, IMHO what we should add is not a linkage, it is just a hidden_symbol attribute. It would be applicable to multiple linkages. For example, on weak it would produce the current behavior we have for objc metadata. On internal, it would be equivalent to private (and we should then remove private). llvm-svn: 203866	2014-03-13 23:18:37 +00:00
Owen Anderson	9b8f9c3d95	Fix a bug in InstCombine where we would incorrectly attempt to construct a bitcast between pointers of two different address spaces if they happened to have the same pointer size. llvm-svn: 203862	2014-03-13 22:51:43 +00:00
Manuel Jacob	a7c48f99ae	CodeGenPrep: sink extends of illegal types into use block. Summary: This helps the instruction selector to lower an i64 * i64 -> i128 multiplication into a single instruction on targets which support it. This is an update of D2973 which was reverted because of a bug reported as PR19084. Reviewers: t.p.northover, chapuni Reviewed By: t.p.northover CC: llvm-commits, alex, chapuni Differential Revision: http://llvm-reviews.chandlerc.com/D3021 llvm-svn: 203797	2014-03-13 13:36:25 +00:00
Karthik Bhat	294607e122	Fix PR18800. llvm intrinsic memcpy takes 5 arguments void @llvm.memcpy.p0i8.p0i8.i32(i8* <dest>, i8* <src>, i32 <len>, i32 <align>, i1 <isvolatile>).The test case incorrectly uses the old format resulting in isVolatile function in MemIntrinsic to crash during SROA transformation.Modified the test case to use correct signature of memcpy and memset. llvm-svn: 203750	2014-03-13 04:50:29 +00:00
Rafael Espindola	1cf777bc12	This test need the X86 backend, move it to the X86 sub directory. llvm-svn: 203725	2014-03-12 22:03:43 +00:00
Michael Zolotukhin	66806aef1e	PR17473: Don't normalize an expression during postinc transformation unless it's invertible. llvm-svn: 203719	2014-03-12 21:31:05 +00:00
Raul E. Silvera	62f0236d36	Resubmit "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..." This reverts commit 86cb795388643710dab34941ddcb5a9470ac39d8. The problems previously found have been resolved through other CLs. llvm-svn: 203707	2014-03-12 20:21:50 +00:00
Rafael Espindola	f3336bc1d5	Reject alias to undefined symbols in the verifier. On ELF and COFF an alias is just another name for a position in the file. There is no way to refer to a position in another file, so an alias to undefined is meaningless. MachO currently doesn't support aliases. The spec has a N_INDR, which when implemented will have a different set of restrictions. Adding support for it shouldn't be harder than any other IR extension. For now, having the IR represent what is actually possible with current tools makes it easier to fix the design of GlobalAlias. llvm-svn: 203705	2014-03-12 20:15:49 +00:00
Hans Wennborg	b73c0b041d	Allow switch-to-lookup table for tables with holes by adding bitmask check This allows us to generate table lookups for code such as: unsigned test(unsigned x) { switch (x) { case 100: return 0; case 101: return 1; case 103: return 2; case 105: return 3; case 107: return 4; case 109: return 5; case 110: return 6; default: return f(x); } } Since cases 102, 104, etc. are not constants, the lookup table has holes in those positions. We therefore guard the table lookup with a bitmask check. Patch by Jasper Neumann! llvm-svn: 203694	2014-03-12 18:35:40 +00:00
Evan Cheng	ad6efbfa0f	Revert r203488 and r203520. llvm-svn: 203687	2014-03-12 18:09:37 +00:00
Erik Verbruggen	3f5dcc97e0	Fix crash in PRE. After r203553 overflow intrinsics and their non-intrinsic (normal) instruction get hashed to the same value. This patch prevents PRE from moving an instruction into a predecessor block, and trying to add a phi node that gets two different types (the intrinsic result and the non-intrinsic result), resulting in a failing assert. llvm-svn: 203574	2014-03-11 15:07:32 +00:00
Tim Northover	e94a518a22	IR: add a second ordering operand to cmpxhg for failure The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559	2014-03-11 10:48:52 +00:00
Erik Verbruggen	e2d437148a	GVN: merge overflow intrinsics with non-overflow instructions. When an overflow intrinsic is followed by a non-overflow instruction, replace the latter with an extract. For example: %sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b) %sadd3 = add i32 %a, %b Here the add statement will be replaced by an extract. When an overflow intrinsic follows a non-overflow instruction, a clone of the intrinsic is inserted before the normal instruction, which makes it the same as the previous case. Subsequent runs of GVN can then clean up the duplicate instructions and insert the extract. This fixes PR8817. llvm-svn: 203553	2014-03-11 09:36:48 +00:00
Diego Novillo	92aa8c220a	Use discriminator information in sample profiles. Summary: When the sample profiles include discriminator information, use the discriminator values to distinguish instruction weights in different basic blocks. This modifies the BodySamples mapping to map <line, discriminator> pairs to weights. Instructions on the same line but different blocks, will use different discriminator values. This, in turn, means that the blocks may have different weights. Other changes in this patch: - Add tests for positive values of line offset, discriminator and samples. - Change data types from uint32_t to unsigned and int and do additional validation. Reviewers: chandlerc CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2857 llvm-svn: 203508	2014-03-10 22:41:28 +00:00
Benjamin Kramer	3ef5e46b6d	MemCpyOpt: When merging memsets also merge the trivial case of two memsets with the same destination. The testcase is from PR19092, but I think the bug described there is actually a clang issue. llvm-svn: 203489	2014-03-10 21:05:13 +00:00
Evan Cheng	0e8f4612a9	For functions with ARM target specific calling convention, when simplify-libcall optimize a call to a llvm intrinsic to something that invovles a call to a C library call, make sure it sets the right calling convention on the call. e.g. extern double pow(double, double); double t(double x) { return pow(10, x); } Compiles to something like this for AAPCS-VFP: define arm_aapcs_vfpcc double @t(double %x) #0 { entry: %0 = call double @llvm.pow.f64(double 1.000000e+01, double %x) ret double %0 } declare double @llvm.pow.f64(double, double) #1 Simplify libcall (part of instcombine) will turn the above into: define arm_aapcs_vfpcc double @t(double %x) #0 { entry: %__exp10 = call double @__exp10(double %x) #1 ret double %__exp10 } declare double @__exp10(double) The pre-instcombine code works because calls to LLVM builtins are special. Instruction selection will chose the right calling convention for the call. However, the code after instcombine is wrong. The call to __exp10 will use the C calling convention. I can think of 3 options to fix this. 1. Make "C" calling convention just work since the target should know what CC is being used. This doesn't work because each function can use different CC with the "pcs" attribute. 2. Have Clang add the right CC keyword on the calls to LLVM builtin. This will work but it doesn't match the LLVM IR specification which states these are "Standard C Library Intrinsics". 3. Fix simplify libcall so the resulting calls to the C routines will have the proper CC keyword. e.g. %__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1 This works and is the solution I implemented here. Both solutions #2 and #3 would work. After carefully considering the pros and cons, I decided to implement #3 for the following reasons. 1. It doesn't change the "spec" of the intrinsics. 2. It's a self-contained fix. There are a couple of potential downsides. 1. There could be other places in the optimizer that is broken in the same way that's not addressed by this. 2. There could be other calling conventions that need to be propagated by simplify-libcall that's not handled. But for now, this is the fix that I'm most comfortable with. llvm-svn: 203488	2014-03-10 20:49:45 +00:00
NAKAMURA Takumi	1783e1e984	Revert r203230, "CodeGenPrep: sink extends of illegal types into use block." It choked i686 stage2. llvm-svn: 203386	2014-03-09 11:01:07 +00:00
David Majnemer	c4ab61cb2f	IR: Change inalloca's grammar a bit The grammar for LLVM IR is not well specified in any document but seems to obey the following rules: - Attributes which have parenthesized arguments are never preceded by commas. This form of attribute is the only one which ever has optional arguments. However, not all of these attributes support optional arguments: 'thread_local' supports an optional argument but 'addrspace' does not. Interestingly, 'addrspace' is documented as being a "qualifier". What constitutes a qualifier? I cannot find a definition. - Some attributes use a space between the keyword and the value. Examples of this form are 'align' and 'section'. These are always preceded by a comma. - Otherwise, the attribute has no argument. These attributes do not have a preceding comma. Sometimes an attribute goes before the instruction, between the instruction and it's type, or after it's type. 'atomicrmw' has 'volatile' between the instruction and the type while 'call' has 'tail' preceding the instruction. With all this in mind, it seems most consistent for 'inalloca' on an 'inalloca' instruction to occur before between the instruction and the type. Unlike the current formulation, there would be no preceding comma. The combination 'alloca inalloca' doesn't look particularly appetizing, perhaps a better spelling of 'inalloca' is down the road. llvm-svn: 203376	2014-03-09 06:41:58 +00:00
Tim Northover	ad3d81d320	CodeGenPrep: sink extends of illegal types into use block. This helps the instruction selector to lower an i64 * i64 -> i128 multiplication into a single instruction on targets which support it. Patch by Manuel Jacob. llvm-svn: 203230	2014-03-07 11:04:30 +00:00
Tim Northover	fad2761ca0	InstCombine: form shuffles from wider range of insert/extractelements Sequences of insertelement/extractelements are sometimes used to build vectorsr; this code tries to put them back together into shuffles, but could only produce a completely uniform shuffle types (<N x T> from two <N x T> sources). This should allow shuffles with different numbers of elements on the input and output sides as well. llvm-svn: 203229	2014-03-07 10:24:44 +00:00
Karthik Bhat	b67688a87c	Allow constant folding of round function whenever feasible llvm-svn: 203198	2014-03-07 04:36:21 +00:00
Karthik Bhat	daa8cd10d9	Allow constant folding of copysign llvm-svn: 203076	2014-03-06 05:32:52 +00:00
Raul E. Silvera	b741b945c5	Change math intrinsic attributes from readonly to readnone. These are operations that do not access memory but may be sensitive to floating-point environment changes. LLVM does not attempt to model FP environment changes, so this was unnecessarily conservative and was getting on the way of some optimizations, in particular SLP vectorization. llvm-svn: 203037	2014-03-06 00:18:15 +00:00
Arnold Schwaighofer	ab12363c02	LoopVectorizer: Preserve fast-math flags Fixes PR19045. llvm-svn: 203008	2014-03-05 21:10:47 +00:00
Benjamin Kramer	061d147f74	ConstantFolding: Also fold the vector overloads of our math intrinsics. llvm-svn: 202997	2014-03-05 19:41:48 +00:00
Raul E. Silvera	18ebc7cd0a	Trivial test commit. llvm-svn: 202924	2014-03-05 02:09:51 +00:00
Matt Arsenault	8377858c55	Allow constant folding of fma and fmuladd llvm-svn: 202914	2014-03-05 00:02:00 +00:00
Diego Novillo	f5041ce558	Pass to emit DWARF path discriminators. DWARF discriminators are used to distinguish multiple control flow paths on the same source location. When this happens, instructions across basic block boundaries will share the same debug location. This pass detects this situation and creates a new lexical scope to one of the two instructions. This lexical scope is a child scope of the original and contains a new discriminator value. This discriminator is then picked up from MCObjectStreamer::EmitDwarfLocDirective to be written on the object file. This fixes http://llvm.org/bugs/show_bug.cgi?id=18270. llvm-svn: 202752	2014-03-03 20:06:11 +00:00
Eric Christopher	75d49db19b	Add a debug info code generation level to the compile unit metadata and update everything accordingly. This can be used to conditionalize the amount of output in the backend based on the amount of debug requested/metadata emission scheme by a front end (e.g. clang). Paired with a commit to clang. llvm-svn: 202332	2014-02-27 01:24:56 +00:00
Nico Rieck	0a0c674b7a	Fix broken FileCheck prefixes llvm-svn: 202308	2014-02-26 22:29:11 +00:00
Reid Kleckner	22869378d9	GlobalOpt: Apply fastcc to internal x86_thiscallcc functions We should apply fastcc whenever profitable. We can expand this list, but there are lots of conventions with performance implications that we don't want to change. Differential Revision: http://llvm-reviews.chandlerc.com/D2705 llvm-svn: 202293	2014-02-26 19:57:30 +00:00
Nico Rieck	5645b36306	Fix broken FileCheck prefix llvm-svn: 202291	2014-02-26 19:51:08 +00:00
Andrew Trick	429e9edd08	Fix PR18165: LSR must avoid scaling factors that exceed the limit on truncated use. Patch by Michael Zolotukhin! llvm-svn: 202273	2014-02-26 16:31:56 +00:00
Chandler Carruth	dfb2efd0da	[SROA] Use the correct index integer size in GEPs through non-default address spaces. This isn't really a correctness issue (the values are truncated) but its much cleaner. Patch by Matt Arsenault! llvm-svn: 202252	2014-02-26 10:08:16 +00:00
Chandler Carruth	286d87ed38	[SROA] Teach SROA how to handle pointers from address spaces other than the default. Based on the patch by Matt Arsenault, D1764! I switched one place to use the more direct pointer type to compute the desired address space, and I reworked the memcpy rewriting section to reflect significant refactorings that this patch helped inspire. Thanks to several of the folks who helped review and improve the patch as well. llvm-svn: 202247	2014-02-26 08:25:02 +00:00
Chandler Carruth	aa72b93ae7	[SROA] Split the alignment computation complete for the memcpy rewriting to work independently for the slice side and the other side. This allows us to only compute the minimum of the two when we actually rewrite to a memcpy that needs to take the minimum, and preserve higher alignment for one side or the other when rewriting to loads and stores. This fix was inspired by seeing the result of some refactoring that makes addrspace handling better. llvm-svn: 202242	2014-02-26 07:29:54 +00:00
Chandler Carruth	6aedc106ba	[SROA] Fix PR18615 with some long overdue simplifications to the bounds checking in SROA. The primary change is to just rely on uge for checking that the offset is within the allocation size. This removes the explicit checks against isNegative which were terribly error prone (including the reversed logic that led to PR18615) and prevented us from supporting stack allocations larger than half the address space.... Ok, so maybe the latter isn't common but it's a silly restriction to have. Also, we used to try to support a PHI node which loaded from before the start of the allocation if any of the loaded bytes were within the allocation. This doesn't make any sense, we have never really supported loading or storing before the allocation starts. The simplified logic just doesn't care. We continue to allow loading past the end of the allocation in part to support cases where there is a PHI and some loads are larger than others and the larger ones reach past the end of the allocation. We could solve this a different and more conservative way, but I'm still somewhat paranoid about this. llvm-svn: 202224	2014-02-26 03:14:14 +00:00
Chandler Carruth	3bf18ed5e3	[SROA] Fix another instability in SROA with respect to the slice ordering. The fundamental problem that we're hitting here is that the use-def chain ordering is itself not a stable thing to be relying on in the rewriting for SROA. Further, we use a non-stable sort over the slices to arrange them based on the section of the alloca they're operating on. With a debugging STL implementation (or different implementations in stage2 and stage3) this can cause stage2 != stage3. The specific aspect of this problem fixed in this commit deals with the rewriting and load-speculation around PHIs and Selects. This, like many other aspects of the use-rewriting in SROA, is really part of the "strong SSA-formation" that is doen by SROA where it works very hard to canonicalize loads and stores in just the right way to satisfy the needs of mem2reg[1]. When we have a select (or a PHI) with 2 uses of the same alloca, we test that loads downstream of the select are speculatable around it twice. If only one of the operands to the select needs to be rewritten, then if we get lucky we rewrite that one first and the select is immediately speculatable. This can cause the order of operand visitation, and thus the order of slices to be rewritten, to change an alloca from promotable to non-promotable and vice versa. The fix is to defer all of the speculation until after the rewrite phase is done. Once we've rewritten everything, we can accurately test for whether speculation will work (once, instead of twice!) and the order ceases to matter. This also happens to simplify the other subtlety of speculation -- we need to not speculate anything unless the result of speculating will make the alloca fully promotable by mem2reg. I had a previous attempt at simplifying this, but it was still pretty horrible. There is actually already a really nice test case for this in basictest.ll, but on multiple STL implementations and inputs, we just got "lucky". Fortunately, the test case is very small and we can essentially build it in exactly the opposite way to get reasonable coverage in both directions even from normal STL implementations. llvm-svn: 202092	2014-02-25 00:07:09 +00:00
Arnold Schwaighofer	9611d23d63	SLPVectorizer: Try vectorizing 'splat' stores Vectorize sequential stores of a broadcasted value. 5% on eon. radar://16124699 llvm-svn: 202067	2014-02-24 19:52:29 +00:00
Nick Lewycky	75080ff25d	Make sure that value handle users see the transformation of an indirect call to a direct call. This is important for the CallGraph iteration. Patch by Björn Steinbrink! llvm-svn: 201822	2014-02-20 23:00:15 +00:00
Tim Northover	d495642c54	X86: move test requiring X86TargetLowering info into its own directory If LLVM is built without X86 as a supported target then the test would mysteriously fail. llvm-svn: 201668	2014-02-19 12:24:19 +00:00
Tim Northover	d496355b53	Try addding datalayout in case that's what Hexagon doesn't like. Just a wild stab in the dark really, but in the absence of any ability to reproduce the problem... llvm-svn: 201658	2014-02-19 10:32:40 +00:00
Tim Northover	aeb8e06d4c	X86 CodeGenPrep: sink shufflevectors before shifts On x86, shifting a vector by a scalar is significantly cheaper than shifting a vector by another fully general vector. Unfortunately, because SelectionDAG operates on just one basic block at a time, the shufflevector instruction that reveals whether the right-hand side of a shift is really a scalar is often not visible to CodeGen when it's needed. This adds another handler to CodeGenPrepare, to sink any useful shufflevector instructions down to the basic block where they're used, predicated on a target hook (since on other architectures, doing so will often just introduce extra real work). rdar://problem/16063505 llvm-svn: 201655	2014-02-19 10:02:43 +00:00
Gerolf Hoflehner	7a463d0650	fix for null VectorizedValue assertion in the SLP Vectorizer (in function vectorizeTree()). radar://16064178 llvm-svn: 201501	2014-02-17 03:06:16 +00:00
Arnold Schwaighofer	26f567d8a4	SCEVExpander: Try hard not to create derived induction variables in other loops During LSR of one loop we can run into a situation where we have to expand the start of a recurrence of a loop induction variable in this loop. This start value is a value derived of the induction variable of a preceeding loop. SCEV has cannonicalized this value to a different recurrence than the recurrence of the preceeding loop's induction variable (the type and/or step direction) has changed). When we come to instantiate this SCEV we created a second induction variable in this preceeding loop. This patch tries to base such derived induction variables of the preceeding loop's induction variable. This helps twolf on arm and seems to help scimark2 on x86. Reapply with a fix for the case of a value derived from a pointer. radar://15970709 llvm-svn: 201496	2014-02-16 15:49:50 +00:00
Nico Rieck	7647178738	Fix broken CHECK lines llvm-svn: 201479	2014-02-16 07:31:05 +00:00
Arnold Schwaighofer	847d96142c	Revert "SCEVExpander: Try hard not to create derived induction variables in other loops" This reverts commit r201465. It broke an arm bot. llvm-svn: 201466	2014-02-15 18:16:56 +00:00
Arnold Schwaighofer	1e12f8563d	SCEVExpander: Try hard not to create derived induction variables in other loops During LSR of one loop we can run into a situation where we have to expand the start of a recurrence of a loop induction variable in this loop. This start value is a value derived of the induction variable of a preceeding loop. SCEV has cannonicalized this value to a different recurrence than the recurrence of the preceeding loop's induction variable (the type and/or step direction) has changed). When we come to instantiate this SCEV we created a second induction variable in this preceeding loop. This patch tries to base such derived induction variables of the preceeding loop's induction variable. This helps twolf on arm and seems to help scimark2 on x86. radar://15970709 llvm-svn: 201465	2014-02-15 17:11:56 +00:00
Matt Arsenault	aa689f5079	Do more addrspacecast transforms that happen for bitcast. Makes addrspacecast (gep) do addrspacecast (gep) instead. llvm-svn: 201376	2014-02-14 00:49:12 +00:00
Daniel Sanders	753e17629d	Re-commit: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Changes since review (and last commit attempt): - Fixed test failures that were missed due to configuration of local build. (fixes crash.ll and a couple others). - Fixed tests that happened to pass because the local build was on X86 (should fix 2007-12-17-InvokeAsm.ll) - mature-mc-support.ll's should no longer require all targets to be compiled. (should fix ARM and PPC buildbots) - Object output (-filetype=obj and similar) now forces the integrated assembler to be enabled regardless of default setting or -no-integrated-as. (should fix SystemZ buildbots) Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201333	2014-02-13 14:44:26 +00:00
Reid Kleckner	22b19da9fc	GlobalOpt: Aliases don't have sections, don't copy them when replacing As defined in LangRef, aliases do not have sections. However, LLVM's GlobalAlias class inherits from GlobalValue, which means we can read and set its section. We should probably ban that as a separate change, since it doesn't make much sense for an alias to have a section that differs from its aliasee. Fixes PR18757, where the section was being lost on the global in code from Clang like: extern "C" { __attribute__((used, section("CUSTOM"))) static int in_custom_section; } Reviewers: rafael.espindola Differential Revision: http://llvm-reviews.chandlerc.com/D2758 llvm-svn: 201286	2014-02-13 02:18:36 +00:00
Owen Anderson	883b5add8e	Remove a very old instcombine where we would turn sequences of selects into logical operations on the i1's driving them. This is a bad idea for every target I can think of (confirmed with micro tests on all of: x86-64, ARM, AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into a general purpose register, whereas consuming it directly into a select generally allows it to exist only transiently in a predicate or flags register. Chandler ran a set of performance tests with this change, and reported no measurable change on x86-64. llvm-svn: 201275	2014-02-12 23:54:07 +00:00
Daniel Sanders	abe212a3b8	Revert r201237+r201238: Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call It introduced multiple test failures in the buildbots. llvm-svn: 201241	2014-02-12 15:39:20 +00:00
Daniel Sanders	a7d504cf58	Demote EmitRawText call in AsmPrinter::EmitInlineAsm() and remove hasRawTextSupport() call Summary: AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output. The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as. All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler. Reviewers: rafael Reviewed By: rafael CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2686 llvm-svn: 201237	2014-02-12 14:44:54 +00:00
Benjamin Kramer	94fc18d040	InstCombine: Teach icmp merging about the equivalence of bit tests and UGE/ULT with a power of 2. This happens in bitfield code. While there reorganize the existing code a bit. llvm-svn: 201176	2014-02-11 21:09:03 +00:00
Chandler Carruth	fc25854b09	[LPM] Switch LICM to actively use LCSSA in addition to preserving it. Fixes PR18753 and PR18782. This is necessary for LICM to preserve LCSSA correctly and efficiently. There is still some active discussion about whether we should be using LCSSA, but we can't just immediately stop using it and we need LICM to preserve it while we are using it. We can restore the old SSAUpdater driven code if and when there is a serious effort to remove the reliance on LCSSA from all of the loop passes. However, this also serves as a great example of why LCSSA is very nice to have. This change significantly simplifies the process of sinking instructions for LICM, and makes it quite a bit less expensive. It wouldn't even be as complex as it is except that I had to start the process of removing the big recursive LCSSA formation hammer in order to switch even this much of the re-forming code to asserting that LCSSA was preserved. I'll fully remove that next just to tidy things up until the LCSSA debate settles one way or the other. llvm-svn: 201148	2014-02-11 12:52:27 +00:00
Arnold Schwaighofer	348e1b60be	LoopVectorizer: Keep track of conditional store basic blocks Before conditional store vectorization/unrolling we had only one vectorized/unrolled basic block. After adding support for conditional store vectorization this will not only be one block but multiple basic blocks. The last block would have the back-edge. I updated the code to use a vector of basic blocks instead of a single basic block and fixed the users to use the last entry in this vector. But, I forgot to add the basic blocks to this vector! Fixes PR18724. llvm-svn: 201028	2014-02-08 20:41:13 +00:00
Juergen Ributzka	9479b31f97	[Constant Hoisting] Fix insertion point for constant materialization. The bitcast instruction during constant materialization was not placed correcly in the presence of phi nodes. This commit fixes the insertion point to be in the idom instead. This fixes PR18768 llvm-svn: 201009	2014-02-08 00:20:49 +00:00
Nick Lewycky	993849490e	A memcpy out of an fresh alloca is a no-op, delete it. Patch by Patrick Walton! llvm-svn: 200907	2014-02-06 06:29:19 +00:00
Manman Ren	d461244972	Set default of inlinecold-threshold to 225. 225 is the default value of inline-threshold. This change will make sure we have the same inlining behavior as prior to r200886. As Chandler points out, even though we don't have code in our testing suite that uses cold attribute, there are larger applications that do use cold attribute. r200886 + this commit intend to keep the same behavior as prior to r200886. We can later on tune the inlinecold-threshold. The main purpose of r200886 is to help performance of instrumentation based PGO before we actually hook up inliner with analysis passes such as BPI and BFI. For instrumentation based PGO, we try to increase inlining of hot functions and reduce inlining of cold functions by setting inlinecold-threshold. Another option suggested by Chandler is to use a boolean flag that controls if we should use OptSizeThreshold for cold functions. The default value of the boolean flag should not change the current behavior. But it gives us less freedom in controlling inlining of cold functions. llvm-svn: 200898	2014-02-06 01:59:22 +00:00
Manman Ren	e8781b1a36	Inliner uses a smaller inline threshold for callees with cold attribute. Added command line option inlinecold-threshold to set threshold for inlining functions with cold attribute. Listen to the cold attribute when it would decrease the inline threshold. llvm-svn: 200886	2014-02-05 22:53:44 +00:00
Benjamin Kramer	34f460ed29	SimplifyLibCalls: Push TLI through the exp2->ldexp transform. For the odd case of platforms with exp2 available but not ldexp. llvm-svn: 200795	2014-02-04 20:27:23 +00:00
Tim Northover	103e648d30	OS X: the correct function is __sincospif_stret, not __sincospi_stretf rdar://problem/13729466 llvm-svn: 200771	2014-02-04 16:28:20 +00:00
Kai Nacke	a56bb78021	Add strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls and remove the ToDo comment. Reviewer: Duncan P.N. Exan Smith llvm-svn: 200736	2014-02-04 05:55:16 +00:00
Reid Kleckner	d47a59a4f8	inalloca: Don't remove dead arguments in the presence of inalloca args It disturbs the layout of the parameters in memory and registers, leading to problems in the backend. The plan for optimizing internal inalloca functions going forward is to essentially SROA the argument memory and demote any captured arguments (things that aren't trivially written by a load or store) to an indirect pointer to a static alloca. llvm-svn: 200717	2014-02-03 20:42:49 +00:00
Duncan P. N. Exon Smith	1ff08e389f	Lower llvm.expect intrinsic correctly for i1 LowerExpectIntrinsic previously only understood the idiom of an expect intrinsic followed by a comparison with zero. For llvm.expect.i1, the comparison would be stripped by the early-cse pass. Patch by Daniel Micay. llvm-svn: 200664	2014-02-02 22:43:55 +00:00
Arnold Schwaighofer	17455633c7	LoopVectorizer: Enable unrolling of conditional stores and the load/store unrolling heuristic per default Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed up some benchmarks while not causing any interesting regressions. llvm-svn: 200621	2014-02-02 03:12:34 +00:00
Arnold Schwaighofer	445f7fb064	ARMTTI: We don't have 16 allocatable scalar registers This caused an regression on libquantum after enabling the new loop vectorizer unroll heuristics. llvm-svn: 200616	2014-02-01 18:00:25 +00:00
Chandler Carruth	1665152cce	[LPM] Apply a really big hammer to fix PR18688 by recursively reforming LCSSA when we promote to SSA registers inside of LICM. Currently, this is actually necessary. The promotion logic in LICM uses SSAUpdater which doesn't understand how to place LCSSA PHI nodes. Teaching it to do so would be a very significant undertaking. It may be worthwhile and I've left a FIXME about this in the code as well as starting a thread on llvmdev to try to figure out the right long-term solution. For now, the PR needs to be fixed. Short of using the promition SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes, I don't see a cleaner or cheaper way of achieving this. Fortunately, LCSSA is relatively lazy and sparse -- it should only update instructions which need it. We can also skip the recursive variant when we don't promote to SSA values. llvm-svn: 200612	2014-02-01 13:35:14 +00:00
Chandler Carruth	6b4cc8b66a	[inliner] Skip debug intrinsics even earlier in computing the inline cost so that they don't impact the vector bonus. Fundamentally, counting unsimplified instructions is just wrong; it will continue to introduce instability as things which do not generate code bizarrely impact inlining. For example, sufficiently nested inlined functions could turn off the vector bonus with lifetime markers just like the debug intrinsics do. =/ This is a short-term tactical fix. Long term, I think we need to remove the vector bonus entirely. That's a separate patch and discussion though. The patch to fix this provided by Dario Domizioli. I've added some comments about the planned direction and used a heavily pruned form of debug info intrinsics for the test case. While this debug info doesn't work or "do" anything useful, it lets us easily test all manner of interference easily, and I suspect this will not be the last time we want to craft a pattern where debug info interferes with the inliner in a problematic way. llvm-svn: 200609	2014-02-01 10:38:17 +00:00
Reid Kleckner	a04504fe97	Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..." This reverts commit r200576. It broke 32-bit self-host builds by vectorizing two calls to @llvm.bswap.i64, which we then fail to expand. llvm-svn: 200602	2014-02-01 01:37:30 +00:00
Chandler Carruth	b3da389e30	[SLPV] Recognize vectorizable intrinsics during SLP vectorization and transform accordingly. Based on similar code from Loop vectorization. Subsequent commits will include vectorization of function calls to vector intrinsics and form function calls to vector library calls. Patch by Raul Silvera! (Much delayed due to my not running dcommit) llvm-svn: 200576	2014-01-31 21:14:40 +00:00
Chandler Carruth	c12224cb93	[vectorizer] Tweak the way we do small loop runtime unrolling in the loop vectorizer to not do so when runtime pointer checks are needed and share code with the new (not yet enabled) load/store saturation runtime unrolling. Also ensure that we only consider the runtime checks when the loop hasn't already been vectorized. If it has, the runtime check cost has already been paid. I've fleshed out a test case to cover the scalar unrolling as well as the vector unrolling and comment clearly why we are or aren't following the pattern. llvm-svn: 200530	2014-01-31 10:51:08 +00:00
Matt Arsenault	ee364ee729	Allow speculating llvm.sqrt, fma and fmuladd This doesn't set errno, so this should be OK. Also update the documentation to explicitly state that errno are not set. llvm-svn: 200501	2014-01-31 00:09:00 +00:00
Arnold Schwaighofer	85a26704e9	LoopVectorizer: Add a test case for unrolling of small loops that need a runtime check. llvm-svn: 200408	2014-01-29 18:55:44 +00:00
Chandler Carruth	d4be9dc02d	[LPM] Fix PR18643, another scary place where loop transforms failed to preserve loop simplify of enclosing loops. The problem here starts with LoopRotation which ends up cloning code out of the latch into the new preheader it is buidling. This can create a new edge from the preheader into the exit block of the loop which breaks LoopSimplify form. The code tries to fix this by splitting the critical edge between the latch and the exit block to get a new exit block that only the latch dominates. This sadly isn't sufficient. The exit block may be an exit block for multiple nested loops. When we clone an edge from the latch of the inner loop to the new preheader being built in the outer loop, we create an exiting edge from the outer loop to this exit block. Despite breaking the LoopSimplify form for the inner loop, this is fine for the outer loop. However, when we split the edge from the inner loop to the exit block, we create a new block which is in neither the inner nor outer loop as the new exit block. This is a predecessor to the old exit block, and so the split itself takes the outer loop out of LoopSimplify form. We need to split every edge entering the exit block from inside a loop nested more deeply than the exit block in order to preserve all of the loop simplify constraints. Once we try to do that, a problem with splitting critical edges surfaces. Previously, we tried a very brute force to update LoopSimplify form by re-computing it for all exit blocks. We don't need to do this, and doing this much will sometimes but not always overlap with the LoopRotate bug fix. Instead, the code needs to specifically handle the cases which can start to violate LoopSimplify -- they aren't that common. We need to see if the destination of the split edge was a loop exit block in simplified form for the loop of the source of the edge. For this to be true, all the predecessors need to be in the exact same loop as the source of the edge being split. If the dest block was originally in this form, we have to split all of the deges back into this loop to recover it. The old mechanism of doing this was conservatively correct because at least one of the exiting blocks it rewrote was the DestBB and so the DestBB's predecessors were fixed. But this is a much more targeted way of doing it. Making it targeted is important, because ballooning the set of edges touched prevents LoopRotate from being able to split edges it needs to split to preserve loop simplify in a coherent way -- the critical edge splitting would sometimes find the other edges in need of splitting but not others. Many, many thanks for help from Nick reducing these test cases mightily. And helping lots with the analysis here as this one was quite tricky to track down. llvm-svn: 200393	2014-01-29 13:16:53 +00:00

... 14 15 16 17 18 ...

5712 Commits