We would incorrectly emit the directive sections due to the missing overridden
methods. We now emit the expected "/DEFAULTLIB" rather than "-l" options for
requested linkage
llvm-svn: 273558
Add support for /Ob1 (and equivalent -finline-hint-functions), which enable
inlining only for functions marked inline, either explicitly (via inline
keyword, for example), or implicitly (function definition in class body,
for example).
This works by enabling inlining pass, and adding noinline attribute to
every function not marked inline.
Patch by Rudy Pons <rudy.pons@ilod.org>!
Differential Revision: http://reviews.llvm.org/D20647
llvm-svn: 273440
It currently only takes 2048 gotos to overflow the FixupDepth bitfield,
causing silent miscompilation. Apparently some parser generators run into
this (see PR).
I don't know that that data structure is terribly size sensitive anyway,
and since there's no room to widen the bitfield, let's just use a separate
word in EHCatchScope for it.
Differential Revision: http://reviews.llvm.org/D21566
llvm-svn: 273434
This new test tests that functions are capable of being imported, rather than
that the import pass is run. This new test is compatible with the approach
being developed in D20268 which runs the importer on its own rather than in
a pass.
Differential Revision: http://reviews.llvm.org/D21542
llvm-svn: 273347
Summary:
If the RenderScript LangOpt is set, either via '-x renderscript' or the '.rs'
file extension, set the DWARF language tag to be that of RenderScript.
Reviewers: rsmith
Subscribers: cfe-commits, srhines
Differential Revision: http://reviews.llvm.org/D21451
llvm-svn: 273321
This is a fix for PR28112:
https://llvm.org/bugs/show_bug.cgi?id=28112
The FP comparison intrinsics that take an immediate parameter (rather than specifying
a comparison predicate in the function name) were added with AVX; these are macros in
avxintrin.h. This patch makes clang behavior match gcc (error if a program tries to use
these without -mavx) and matches the Intel documentation, eg:
VCMPPS: m128 _mm_cmp_ps(m128 a, __m128 b, const int imm)
'V' means this is intended to only work with the AVX form of the instruction.
Differential Revision: http://reviews.llvm.org/D21306
llvm-svn: 273311
Summary: We need to call PruneEH pass before AutoFDO pass so that some EH-related calls can get inlined in Sample Profile pass.
Reviewers: davidxl, dnovillo
Subscribers: junbuml, llvm-commits
Differential Revision: http://reviews.llvm.org/D21197
llvm-svn: 273298
Given something like:
void *v = (void *)100;
We need to synthesize a ptrtoint operation from 100. During constant
emission, we choose i64 as the type for our constant because it
guaranteed not to drop any bits from our CharUnits representation of the
value. However, this is suboptimal for 32-bit targets: LLVM passes like
GlobalOpt will get confused by these sorts of casts resulting in
pessimization.
Instead, make sure the ptrtoint operand has a pointer-sized integer
type.
llvm-svn: 273020
Reapplying patch in r272777 which was reverted
because the llvm patch which added support
for generating the mcrr/mcrr2 instructions
from the intrinsic was causing an assertion
failure. This has now been fixed in llvm.
llvm-svn: 272983
This patch fixes a bug where we'd segfault (in some cases) if we saw a
variadic function with one or more pass_object_size arguments.
Differential Revision: http://reviews.llvm.org/D17462
llvm-svn: 272971
As noted in the code comment, a potential follow-on would be to remove
the builtins themselves. Other than ord/unord, this already works as
expected. Eg:
typedef float v4sf __attribute__((__vector_size__(16)));
v4sf fcmpgt(v4sf a, v4sf b) { return a > b; }
Differential Revision: http://reviews.llvm.org/D21268
llvm-svn: 272840
Patch adds intrinsics for mrrc/mrrc2. The
intrinsics for mrrc/mrrc2 return a single
uint64_t to represent two 32 bit values.
The mcrr/mcrr2 intrinsic was changed to
accept a single uint64_t instead of two
32 bit values as the input for consistency.
Differential Revision: http://reviews.llvm.org/D21179
llvm-svn: 272777
There is no need to use a target-specific intrinsic to implement
_bit_scan_forward or _bit_scan_reverse, reimplementing them using
generic intrinsics makes it more likely that the middle end will
understand what's going on.
llvm-svn: 272564
We can now use __builtin_nontemporal_store instead of target specific builtins for naturally aligned nontemporal stores which avoids the need for handling in CGBuiltin.cpp
The scalar integer nontemporal (unaligned) store builtins will have to wait as __builtin_nontemporal_store currently assumes natural alignment and doesn't accept the 'packed struct' trick that we use for normal unaligned load/stores.
The nontemporal loads require further backend support before we can safely convert them to __builtin_nontemporal_load
Differential Revision: http://reviews.llvm.org/D21272
llvm-svn: 272540
Summary:
Create a new Frontend LangOpt to specify the renderscript language. It
is enabled by the "-x renderscript" option from the driver.
Add a "kernel" function attribute only for RenderScript (an "ignored
attribute" warning is generated otherwise).
Make the NativeHalfType and NativeHalfArgsAndReturns LangOpts be implied
by the RenderScript LangOpt.
Reviewers: rsmith
Subscribers: cfe-commits, srhines
Differential Revision: http://reviews.llvm.org/D21198
llvm-svn: 272342
The 'cvtt' truncation (round to zero) conversions can be safely represented as generic __builtin_convertvector (fptosi) calls instead of x86 intrinsics. We already do this (implicitly) for the scalar equivalents.
Note: I looked at updating _mm_cvttpd_epi32 as well but this still requires a lot more backend work to correctly lower (both for debug and optimized builds).
Differential Revision: http://reviews.llvm.org/D20859
llvm-svn: 271436
The `isa' member was previously not given the correct DLL Storage. Ensure that
we give the `isa' constant `__CFConstantStringClassReference' the correct DLL
storage. Default to dllimport unless an explicit specification gives it a
dllexport storage.
llvm-svn: 271361
According to the gcc headers, intel intrinsics docs and msdn codegen the _mm_store1_pd (and its _mm_store_pd1 equivalent) should use an aligned pointer - the clang headers are the only implementation I can find that assume non-aligned stores (by storing with _mm_storeu_pd).
Additionally, according to the intel intrinsics docs and msdn codegen the _mm_store1_ps (_mm_store_ps1) requires a similarly aligned pointer.
This patch raises the alignment requirements to match the other implementations by calling _mm_store_ps/_mm_store_pd instead.
I've also added the missing _mm_store_pd1 intrinsic (which maps to _mm_store1_pd like _mm_store_ps1 does to _mm_store1_ps).
As a followup I'll update the llvm fast-isel tests to match this codegen.
Differential Revision: http://reviews.llvm.org/D20617
llvm-svn: 271218
Adjust the constant CFString emission to emit into more appropriate sections on
ELF and COFF targets. It would previously try to use MachO section names
irrespective of the file format.
llvm-svn: 271211
This extends the blocks support to support blocks with a dynamically linked
blocks runtime. The previous code generation would work only for static builds
of the blocks runtime. Mark the block "isa" pointers and functions as dllimport
if no explicit declaration marked with __declspec(dllexport) is found. This
additional check allows for the use of the functionality in the runtime library
if desired.
llvm-svn: 271138
The VPMOVSX and (V)PMOVZX sign/zero extension intrinsics can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics.
This patch removes the clang builtins and their use in the sse2/avx headers - a companion patch will remove/auto-upgrade the llvm intrinsics.
Note: We already did this for SSE41 PMOVSX sometime ago.
Differential Revision: http://reviews.llvm.org/D20684
llvm-svn: 271106
_InterlockedIncrement and _InterlockedDecrement have 'long' in their
prototypes. We assumed 'long' was the same size as an i32 which is
incorrect for other targets.
This fixes PR27892.
llvm-svn: 270953
If we have some function with dllimport attribute and then we have the function
definition in the same module but without dllimport attribute we should add
dllexport attribute to this function definition.
The same should be done for variables.
Example:
struct __declspec(dllimport) C3 {
~C3();
};
C3::~C3() {;} // we should export this definition.
Patch by Andrew V. Tischenko
Differential revision: http://reviews.llvm.org/D18953
llvm-svn: 270686
-finline-functions and /Ob2 are currently ignored by Clang. The only way to
enable inlining is to use the global O flags, which also enable other options,
or to emit LLVM bitcode using Clang, then running opt by hand with the inline
pass.
This patch allows to simply use the -finline-functions flag (same as GCC) or
/Ob2 in clang-cl mode to enable inlining without other optimizations.
This is the first patch of a serie to improve support for the /Ob flags.
Patch by Rudy Pons <rudy.pons@ilod.org>!
Differential Revision: http://reviews.llvm.org/D20576
llvm-svn: 270609
Underaligned atomic LValues require libcalls which MSVC doesn't have.
MSVC doesn't seem to consider such operations as requiring a barrier
anyway.
This fixes PR27843.
llvm-svn: 270576
Summary:
Following patch D19265 which enable software floating point support in the Sparc backend, this patch enables the option to be enabled in the front-end using the -msoft-float option.
The user should ensure a library (such as the builtins from Compiler-RT) that includes the software floating point routines is provided.
Reviewers: jyknight, lero_chris
Subscribers: jyknight, cfe-commits
Differential Revision: http://reviews.llvm.org/D20419
llvm-svn: 270538
Both the (V)CVTDQ2PD(Y) (i32 to f64) and (V)CVTPS2PD(Y) (f32 to f64) conversion instructions are lossless and can be safely represented as generic __builtin_convertvector calls instead of x86 intrinsics without affecting final codegen.
This patch removes the clang builtins and their use in the sse2/avx headers - a future patch will deal with removing the llvm intrinsics, but that will require a bit more work.
Differential Revision: http://reviews.llvm.org/D20528
llvm-svn: 270499
Ensure _mm256_extract_epi8 and _mm256_extract_epi16 zero extend their i8/i16 result to i32. This matches _mm_extract_epi8 and _mm_extract_epi16.
Fix for PR27594
Differential Revision: http://reviews.llvm.org/D20468
llvm-svn: 270330
This is matching what trunk gcc is accepting. Also adds a missing ssse3
case. PR27779. The amount of duplication here is annoying, maybe it
should be factored into a separate .def file?
llvm-svn: 270224
Summary:
Previously it was implemented as inline asm in the CUDA headers.
This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
Reviewers: tra, rsmith
Subscribers: jhen, cfe-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D19990
llvm-svn: 270150
- Fixed cdp intrinsic to only accept compile time
constant values previously you could pass in a
variable to the builtin which would result in
illegal llvm assembly output
Differential Revision: http://reviews.llvm.org/D20394
llvm-svn: 270058
Summary:
MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT
pair while adding a timer function, such that another termination of the MWAITX
instruction occurs when the timer expires. The presence of the MONITORX and
MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.
The MONITORX and MWAITX instructions are intercepted by the same bits that
intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be
monitored. MWAITX instruction causes the processor to stop instruction
execution and enter an implementation-dependent optimized state until
occurrence of a class of events.
Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is
"0F 01 FB". These opcode information is used in adding tests for the
disassembler.
These instructions are enabled for AMD's bdver4 architecture.
Patch by Ganesh Gopalasubramanian!
Reviewers: echristo, craig.topper
Subscribers: RKSimon, joker.eph, llvm-commits, cfe-commits
Differential Revision: http://reviews.llvm.org/D19796
llvm-svn: 269907
Visual Studio's C++ standard library headers include intrin.h, so the intrinsic
headers get included a lot more often in Microsoft mode than elsewhere. The
AVX512 intrinsics are a lot of code (0.7 MB, causing 30% compile time overhead
for small programs including e.g. <string> and 6% compile time overhead for
larger projects like e.g. v8). Since multiversioning can't be relied on in
Microsoft mode (cl.exe doesn't support it), having faster compiles seems like
the much better tradeoff until we have a better intrinsic story going forward
(which we'll need for e.g. PR19898).
Actually using intrinsics on Windows already requires the right /arch:
settings, so this patch should have no big behavior change.
See also thread "The intrinsics headers (especially avx512) are too big. What
to do about it?" on cfe-dev.
http://reviews.llvm.org/D20291
llvm-svn: 269675
Summary:
Clang does not detect `aapcs-vfp` for the EABIHF environment. The reason is that only GNUEABIHF is considered while choosing calling convention, EABIHF is ignored.
This causes clang to use `aapcs` for EABIHF and add the `arm_aapcscc` specifier to functions in generated IR.
The modified `arm-cc.c` test checks that no calling convention specifier is added to functions for EABIHF, which means the default one is used (`CallingConv::ARM_AAPCS_VFP`).
Reviewers: rengolin, compnerd, t.p.northover
Subscribers: aemerson, rengolin, asl, cfe-commits
Differential Revision: http://reviews.llvm.org/D20219
llvm-svn: 269419
Summary:
This option allows the user to control how much of the file name is
emitted by UBSan. Tuning this option allows one to save space in the
resulting binary, which is helpful for restricted execution
environments.
With a positive N, UBSan skips the first N path components.
With a negative N, UBSan only keeps the last N path components.
Reviewers: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D19666
llvm-svn: 269309
Currently, if clang::isBetterOverloadCandidate encounters an enable_if
attribute on either candidate that it's inspecting, it will ignore all
lower priority attributes (e.g. pass_object_size). This is problematic
in cases like:
```
void foo(char *c) __attribute__((enable_if(1, "")));
void foo(char *c __attribute__((pass_object_size(0))))
__attribute__((enable_if(1, "")));
```
...Because we would ignore the pass_object_size attribute in the second
`foo`, and consider any call to `foo` to be ambiguous.
This patch makes overload resolution consult further tiebreakers (e.g.
pass_object_size) if two candidates have equally good enable_if
attributes.
llvm-svn: 269005