Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements.
llvm-svn: 268158
Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements.
llvm-svn: 268115
We neglected to transfer operand bundles for some transforms. These
were found via inspection, I'll try to come up with some test cases.
llvm-svn: 268010
This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:
1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)
2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input
Differential Revision: http://reviews.llvm.org/D17490
llvm-svn: 267356
This bug was introduced with:
http://reviews.llvm.org/rL262269
AVX masked loads are specified to set vector lanes to zero when the high bit of the mask
element for that lane is zero:
"If the mask is 0, the corresponding data element is set to zero in the load form of these
instructions, and unmodified in the store form." --Intel manual
Differential Revision: http://reviews.llvm.org/D19017
llvm-svn: 266148
`allocsize` is a function attribute that allows users to request that
LLVM treat arbitrary functions as allocation functions.
This patch makes LLVM accept the `allocsize` attribute, and makes
`@llvm.objectsize` recognize said attribute.
The review for this was split into two patches for ease of reviewing:
D18974 and D14933. As promised on the revisions, I'm landing both
patches as a single commit.
Differential Revision: http://reviews.llvm.org/D14933
llvm-svn: 266032
Two or more identical assumes are occasionally next to each other in a
basic block.
While our generic machinery will turn a redundant assume into a no-op,
it is not super cheap.
We can perform a simpler check to achieve the same result for this case.
llvm-svn: 265801
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Originally landed as r261544, then reverted in r261544 for (incidental)
build breakage. Re-landed here with no changes.
Reviewers: chandlerc, jingyue
Subscribers: llvm-commits, tra, jhen, hfinkel
Differential Revision: http://reviews.llvm.org/D17739
llvm-svn: 263481
This follows up on the related AVX instruction transforms, but this
one is too strange to do anything more with. Intel's behavioral
description of this instruction in its Software Developer's Manual
is tragi-comic.
llvm-svn: 263340
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:
__m128 mload_zeros(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0));
}
__m128 mload_fakeones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(1));
}
__m128 mload_ones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}
__m128 mload_oneset(float *f) {
return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}
...so none of the above will actually generate a masked load for optimized code.
This is the masked load counterpart to:
http://reviews.llvm.org/rL262064
llvm-svn: 262269
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the store mask is constant:
void mstore_zero_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}
void mstore_fake_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}
void mstore_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}
void mstore_one_set_elt_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}
...so none of the above will actually generate a masked store for optimized code.
Differential Revision: http://reviews.llvm.org/D17485
llvm-svn: 262064
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
llvm-svn: 261736
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Reviewers: chandlerc, jingyue
Subscribers: hfinkel, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17317
llvm-svn: 261544
We introduced gc.relocates of vector-of-pointer types a couple of weeks back. Somehow, I missed updating the InstCombine rule to account for this. If we hit this code path with a vector-of-pointers gc.relocate, we'd crash on a cast<PointerType>.
I also took the chance to do a bit of code style cleanup.
llvm-svn: 260279
A masked store with a zero mask means there's no store.
A masked store with an allOnes mask means it's a normal vector store.
This is a continuation of:
http://reviews.llvm.org/rL259369
llvm-svn: 259392
A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.
Differential Revision: http://reviews.llvm.org/D16691
llvm-svn: 259369
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.
llvm-svn: 258557
Summary:
This commit renames GCRelocateOperands to GCRelocateInst and makes it an
intrinsic wrapper, similar to e.g. MemCpyInst. Also, all users of
GCRelocateOperands were changed to use the new intrinsic wrapper instead.
Reviewers: sanjoy, reames
Subscribers: reames, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15762
llvm-svn: 256811
time.
The new overloaded function is used when an attribute is added to a
large number of slots of an AttributeSet (for example, to function
parameters). This is much faster than calling AttributeSet::addAttribute
once per slot, because AttributeSet::getImpl (which calls
FoldingSet::FIndNodeOrInsertPos) is called only once per function
instead of once per slot.
With this commit, clang compiles a file which used to take over 22
minutes in just 13 seconds.
rdar://problem/23581000
Differential Revision: http://reviews.llvm.org/D15085
llvm-svn: 254491