llvm-project

Commit Graph

Author	SHA1	Message	Date
JF Bastien	ed92f608bb	[NFC] CodeGen: rename memset to bzero The optimization looks for opportunities to emit bzero, not memset. Rename the functions accordingly (and clang-format the diff) because I want to add a fallback optimization which actually tries to generate memset. bzero is still better and it would confuse the code to merge both. llvm-svn: 337636	2018-07-20 23:37:12 +00:00
Yaxun Liu	f99752b66b	[HIP] Register/unregister device fat binary only once HIP generates one fat binary for all devices after linking. However, for each compilation unit a ctor function is emitted which register the same fat binary. Measures need to be taken to make sure the fat binary is only registered once. Currently each ctor function calls __hipRegisterFatBinary and stores the returned value to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce so that they are shared between LLVM modules. Then this patch adds check of value of __hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code is equivalent to void *_gpubin_handle; void ctor() { if (__hip_gpubin_handle == 0) { __hip_gpubin_handle = __hipRegisterFatBinary(...); } // register kernels and variables. } The patch also does similar change to dtors so that __hipUnregisterFatBinary is called once. Differential Revision: https://reviews.llvm.org/D49083 llvm-svn: 337631	2018-07-20 22:45:24 +00:00
Reid Kleckner	891b2714bc	[codeview] Don't emit variable templates as class members MSVC doesn't, so neither should we. Fixes PR38004, which is a crash that happens when we try to emit debug info for a still-dependent partial variable template specialization. As a follow-up, we should review what we're doing for function and class member templates. It looks like we don't filter those out, but I can't seem to get clang to emit any. llvm-svn: 337616	2018-07-20 20:55:00 +00:00
Akira Hatanaka	dbfa453e41	[CodeGen][ObjC] Make copying and disposing of a non-escaping block no-ops. A non-escaping block on the stack will never be called after its lifetime ends, so it doesn't have to be copied to the heap. To prevent a non-escaping block from being copied to the heap, this patch sets field 'isa' of the block object to NSConcreteGlobalBlock and sets the BLOCK_IS_GLOBAL bit of field 'flags', which causes the runtime to treat the block as if it were a global block (calling _Block_copy on the block just returns the original block and calling _Block_release is a no-op). Also, a new flag bit 'BLOCK_IS_NOESCAPE' is added, which allows the runtime or tools to distinguish between true global blocks and non-escaping blocks. rdar://problem/39352313 Differential Revision: https://reviews.llvm.org/D49303 llvm-svn: 337580	2018-07-20 17:10:32 +00:00
Erich Keane	3efe00206f	Implement cpu_dispatch/cpu_specific Multiversioning As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552	2018-07-20 14:13:28 +00:00
Fangrui Song	99337e246c	Change \t to spaces llvm-svn: 337530	2018-07-20 08:19:20 +00:00
Richard Smith	4c6568869e	Fix typo causing assert in self-host. llvm-svn: 337508	2018-07-19 23:24:41 +00:00
Richard Smith	83497d9ead	When we choose to use zeroinitializer for a trailing portion of an array constant, don't convert the rest into a packed struct. If an array constant has a large non-zero portion and a large zero portion, we want to emit the first part as an array and the rest as a zeroinitializer if possible. This fixes a memory usage regression from r333141 when compiling PHP. llvm-svn: 337498	2018-07-19 21:38:56 +00:00
Nico Weber	f29044536d	fix typo in comment llvm-svn: 337480	2018-07-19 18:59:38 +00:00
Erich Keane	e69755a55f	Fix unused variable warning. llvm-svn: 337473	2018-07-19 17:19:16 +00:00
Alexey Bataev	b363813543	The patch adds support for the new map interface between clang and libomptarget. The changes in the interface are the following: device IDs are now 64-bit integers (as opposed to 32-bit) map flags are 64-bit long (used to be 32-bit) mappings for partially mapped structs are now calculated at compile time and members of partially mapped structs are flagged using the MEMBER_OF field Support for is_device_ptr on struct members was dropped - this functionality is not supported by the OpenMP standard and its implementation is technically infeasible (however, use_device_ptr on struct members works as a non-standard extension of the compiler) llvm-svn: 337468	2018-07-19 16:34:13 +00:00
Pavel Labath	45a8dfacf4	[CodeGen] Disable aggressive structor optimizations at -O0, take 3 The previous version of this patch (r332839) was reverted because it was causing "definition with same mangled name as another definition" errors in some module builds. This was caused by an unrelated bug in module importing which it exposed. The importing problem was fixed in r336240, so this recommits the original patch (r332839). Differential Revision: https://reviews.llvm.org/D46685 llvm-svn: 337456	2018-07-19 14:05:22 +00:00
Nemanja Ivanovic	2600b839d5	NFC: Remove extraneous semicolons as pointed out in the differential review The commit for https://reviews.llvm.org/D49424 missed the comment about the extraneous semicolons. Remove them. llvm-svn: 337451	2018-07-19 12:49:27 +00:00
Nemanja Ivanovic	1ac56bd33f	[PowerPC] Handle __builtin_xxpermdi the same way as GCC does The codegen for this builtin was initially implemented to match GCC. However, due to interest from users GCC changed behaviour to account for the big endian bias of the instruction and correct it. This patch brings the handling inline with GCC. Fixes https://bugs.llvm.org/show_bug.cgi?id=38192 Differential Revision: https://reviews.llvm.org/D49424 llvm-svn: 337449	2018-07-19 12:44:15 +00:00
Manoj Gupta	da08f6ac16	[clang]: Add support for "-fno-delete-null-pointer-checks" Summary: Support for this option is needed for building Linux kernel. This is a very frequently requested feature by kernel developers. More details : https://lkml.org/lkml/2018/4/4/601 GCC option description for -fdelete-null-pointer-checks: This Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. -fno-delete-null-pointer-checks is the inverse of this implying that null pointer dereferencing is not undefined. This feature is implemented in as the function attribute "null-pointer-is-valid"="true". This CL only adds the attribute on the function. It also strips "nonnull" attributes from function arguments but keeps the related warnings unchanged. Corresponding LLVM change rL336613 already updated the optimizations to not treat null pointer dereferencing as undefined if the attribute is present. Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv Reviewed By: jyknight Subscribers: drinkcat, xbolva00, cfe-commits Differential Revision: https://reviews.llvm.org/D47894 llvm-svn: 337433	2018-07-19 00:44:52 +00:00
Erich Keane	7963e8bebb	Add support for __declspec(code_seg("segname")) This patch uses CodeSegAttr to represent __declspec(code_seg) rather than building on the existing support for #pragma code_seg. The code_seg declspec is applied on functions and classes. This attribute enables the placement of code into separate named segments, including compiler- generated codes and template instantiations. For more information, please see the following: https://msdn.microsoft.com/en-us/library/dn636922.aspx This patch fixes the regression for the support for attribute ((section). `746b78de78` Patch by Soumi Manna (Manna) Differential Revision: https://reviews.llvm.org/D48841 llvm-svn: 337420	2018-07-18 20:04:48 +00:00
Peter Collingbourne	14b468bab6	Re-land r337333, "Teach Clang to emit address-significance tables.", which was reverted in r337336. The problem that required a revert was fixed in r337338. Also added a missing "REQUIRES: x86-registered-target" to one of the tests. Original commit message: > Teach Clang to emit address-significance tables. > > By default, we emit an address-significance table on all ELF > targets when the integrated assembler is enabled. The emission of an > address-significance table can be controlled with the -faddrsig and > -fno-addrsig flags. > > Differential Revision: https://reviews.llvm.org/D48155 llvm-svn: 337339	2018-07-18 00:27:07 +00:00
Peter Collingbourne	35c6996b68	Revert r337333, "Teach Clang to emit address-significance tables." Causing multiple failures on sanitizer bots due to TLS symbol errors, e.g. /usr/bin/ld: __msan_origin_tls: TLS definition in /home/buildbots/ppc64be-clang-test/clang-ppc64be/stage1/lib/clang/7.0.0/lib/linux/libclang_rt.msan-powerpc64.a(msan.cc.o) section .tbss.__msan_origin_tls mismatches non-TLS reference in /tmp/lit_tmp_0a71tA/mallinfo-3ca75e.o llvm-svn: 337336	2018-07-17 23:56:30 +00:00
Peter Collingbourne	27242c0402	Teach Clang to emit address-significance tables. By default, we emit an address-significance table on all ELF targets when the integrated assembler is enabled. The emission of an address-significance table can be controlled with the -faddrsig and -fno-addrsig flags. Differential Revision: https://reviews.llvm.org/D48155 llvm-svn: 337333	2018-07-17 23:17:16 +00:00
Richard Smith	7027ffa85f	Replace LLVM_ALIGNAS with just alignas. Various places in Clang and LLVM are already using alignas; it seems our minimum host configuration now requires it. llvm-svn: 337330	2018-07-17 22:24:11 +00:00
Mandeep Singh Grang	0054f48b44	[COFF] Add more missing MSVC ARM64 intrinsics Summary: Added the following intrinsics: _BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64 _InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64, _InterlockedExchangeAdd64, _InterlockedExchangeSub64, _InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64. Reviewers: compnerd, mstorsjo, rnk, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49445 llvm-svn: 337327	2018-07-17 22:03:24 +00:00
Alexey Bataev	c52f01d1d7	[OPENMP] Fix checks for declare target link entries. If the declare target link entries are created but not used, the compiler will produce an error message. Patch improves handling of such situations + improves checks for possibly lost declare target variables. llvm-svn: 337207	2018-07-16 20:05:25 +00:00
Alexey Bataev	7f01d20993	[OPENMP] Fix syntactic errors in error messages. Fixed spelling of the offloading error messages. llvm-svn: 337196	2018-07-16 18:12:18 +00:00
Alexey Bataev	3dd1f9d61d	[OPENMP, NVPTX] Globalize only captured variables. Sometimes we can try to globalize non-variable declarations, which may lead to compiler crash. llvm-svn: 337191	2018-07-16 16:49:20 +00:00
Teresa Johnson	b1d17f64e5	Restore "[ThinLTO] Ensure we always select the same function copy to import" This reverts commit r337082, restoring r337051, since the LLVM side patch has been restored. llvm-svn: 337185	2018-07-16 15:30:36 +00:00
Teresa Johnson	70993d37e8	Revert "[ThinLTO] Ensure we always select the same function copy to import" This reverts commit r337051. llvm-svn: 337082	2018-07-14 01:50:14 +00:00
Teresa Johnson	9fe8af7e00	[ThinLTO] Ensure we always select the same function copy to import Clang change to reflect the FunctionsToImportTy type change in the llvm changes for D48670. llvm-svn: 337051	2018-07-13 21:35:58 +00:00
JF Bastien	9aab85a6a0	CodeGen: specify alignment + inbounds for automatic variable initialization Summary: Automatic variable initialization was generating default-aligned stores (which are deprecated) instead of using the known alignment from the alloca. Further, they didn't specify inbounds. Subscribers: dexonsmith, cfe-commits Differential Revision: https://reviews.llvm.org/D49209 llvm-svn: 337041	2018-07-13 20:33:23 +00:00
Gheorghe-Teodor Bercea	ad4e579407	[OpenMP] Initialize data sharing stack for SPMD case Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit. Reviewers: ABataev, carlo.bertolli, caomhin Reviewed By: ABataev Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D49188 llvm-svn: 337015	2018-07-13 16:18:24 +00:00
Petr Pavlu	a934f9da41	Fix setting of empty implicit-section-name attribute Code in `CodeGenModule::SetFunctionAttributes()` could set an empty attribute `implicit-section-name` on a function that is affected by `#pragma clang text="section"`. This is incorrect because the attribute should contain a valid section name. If the function additionally also used `__attribute__((section("section")))` then this could result in emitting the function in a section with an empty name. The patch fixes the issue by removing the problematic code that sets empty `implicit-section-name` from `CodeGenModule::SetFunctionAttributes()` because it is sufficient to set this attribute only from a similar code in `setNonAliasAttributes()` when the function is emitted. Differential Revision: https://reviews.llvm.org/D48916 llvm-svn: 336842	2018-07-11 20:17:54 +00:00
JF Bastien	f014bdc199	[NFC] typo llvm-svn: 336840	2018-07-11 19:51:40 +00:00
Erich Keane	be65e874fe	[NFC] Switch CodeGenFunction to use value init instead of member init lists The member init list for the sole constructor for CodeGenFunction has gotten out of hand, so this patch moves the non-parameter-dependent initializations into the member value inits. Note: This is what was intended to be committed in r336726 llvm-svn: 336729	2018-07-10 21:07:50 +00:00
Erich Keane	9960b8f13a	Revert -r336726, which included more files than intended. llvm-svn: 336727	2018-07-10 20:51:41 +00:00
Erich Keane	7b8c12e7cc	[NFC] Switch CodeGenFunction to use value init instead of member init lists The member init list for the sole constructor for CodeGenFunction has gotten out of hand, so this patch moves the non-parameter-dependent initializations into the member value inits. llvm-svn: 336726	2018-07-10 20:46:46 +00:00
Bjorn Pettersson	404f414ee1	Patch to fix pragma metadata for do-while loops Summary: Make sure that loop metadata only is put on the backedge when expanding a do-while loop. Previously we added the loop metadata also on the branch in the pre-header. That could confuse optimization passes and result in the loop metadata being associated with the wrong loop. Fixes https://bugs.llvm.org/show_bug.cgi?id=38011 Committing on behalf of deepak2427 (Deepak Panickal) Reviewers: #clang, ABataev, hfinkel, aaron.ballman, bjope Reviewed By: bjope Subscribers: bjope, rsmith, shenhan, zzheng, xbolva00, lebedev.ri, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D48721 llvm-svn: 336717	2018-07-10 19:55:02 +00:00
Craig Topper	1faf953d75	[X86] Remove custom handling for __builtin_ia32_divss_round_mask and __builtin_ia32_divsd_round_mask. llvm-svn: 336628	2018-07-10 00:50:03 +00:00
Craig Topper	638426fc36	[X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics. This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling. I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle. llvm-svn: 336622	2018-07-10 00:37:25 +00:00
Craig Topper	74c10e3236	[Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter. Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible. To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future. To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin. To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute. To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins. There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch. Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue. Differential Revision: https://reviews.llvm.org/D48617 llvm-svn: 336583	2018-07-09 19:00:16 +00:00
Alexey Bataev	b99dcb5f31	[OPENMP, NVPTX] Do not globalize local variables in parallel regions. In generic data-sharing mode we are allowed to not globalize local variables that escape their declaration context iff they are declared inside of the parallel region. We can do this because L2 parallel regions are executed sequentially and, thus, we do not need to put shared local variables in the global memory. llvm-svn: 336567	2018-07-09 17:43:58 +00:00
Craig Topper	8a8d72794f	[X86] Add new scalar fma intrinsics with rounding mode that use f32/f64 types. This allows us to handle masking in a very similar way to the default rounding version that uses llvm.fma llvm-svn: 336507	2018-07-08 01:10:47 +00:00
Craig Topper	f89f62a680	[X86] When creating a select for scalar masked sqrt and div builtins make sure we optimize the all ones mask case. This case occurs in the intrinsic headers so we should avoid emitting the mask in those cases. Factor the code into a helper function to make this easy. llvm-svn: 336472	2018-07-06 22:46:52 +00:00
Craig Topper	be4c2933a2	[X86] Implement _builtin_ia32_vfmaddss and _builtin_ia32_vfmaddsd with native IR using llvm.fma intrinsic. This generates some extra zeroing currently, but we should be able to quickly address that with some isel patterns. llvm-svn: 336417	2018-07-06 07:14:47 +00:00
Craig Topper	284c5f342c	[X86] Use shufflevector instead of a select with a constant mask for fmaddsub/fmsubadd IR emission. Shufflevector is easier to generate and matches what the backend pattern matches without relying on constant selects being turned into shuffles. While I was there I also made the IR regular expressions a little stricter to ensure operand order on the shuffle. llvm-svn: 336388	2018-07-05 20:38:31 +00:00
Gabor Buella	9679eb6527	[X86] Fix some vector cmp builtins - TRUE/FALSE predicates This patch removes on optimization used with the TRUE/FALSE predicates, as was suggested in https://reviews.llvm.org/D45616 for r335339. The optimization was buggy, since r335339 used it also for *_mask builtins, without actually applying the mask -- the mask argument was just ignored. Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D48715 llvm-svn: 336355	2018-07-05 14:26:56 +00:00
Lei Huang	449252d2ad	[Power9] Update fp128 as a valid homogenous aggregate base type Update clang to treat fp128 as a valid base type for homogeneous aggregate passing and returning. Differential Revision: https://reviews.llvm.org/D48044 llvm-svn: 336308	2018-07-05 04:32:01 +00:00
Piotr Padlewski	0705829e40	[CodeGenCXX] Emit strip.invariant.group with -fstrict-vtable-pointers Summary: Emmiting new intrinsic that strips invariant.groups to make devirtulization sound, as described in RFC: Devirtualization v2. Reviewers: rjmccall, rsmith, amharc, kuhar Subscribers: llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D47299 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336137	2018-07-02 19:21:36 +00:00
Craig Topper	8bf793fb35	[X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. llvm-svn: 335945	2018-06-29 05:43:33 +00:00
Artem Belevich	5ce0a08cf6	[CUDA] Place all CUDA sections in __NV_CUDA segment on Mac. That's where CUDA binaries appear to put them. Differential Revision: https://reviews.llvm.org/D48615 llvm-svn: 335880	2018-06-28 17:15:52 +00:00
Jonas Devlieghere	9ef4965a0d	[DebugInfo] Follow-up commit to improve consistency. NFC Follow-up commit for r335757 to address some inconsistencies. llvm-svn: 335834	2018-06-28 10:56:40 +00:00
Artem Belevich	c66d254ded	[CUDA] Use atexit() to call module destructor. This matches the way NVCC does it. Doing module cleanup at global destructor phase used to work, but is, apparently, too late for the CUDA runtime in CUDA-9.2, which ends up crashing with double-free. Differential Revision: https://reviews.llvm.org/D48613 llvm-svn: 335763	2018-06-27 18:32:51 +00:00
Jonas Devlieghere	d8ba8ae875	[DebugInfo] Emit ObjC methods as part of interface As brought up during the discussion of the DWARF5 accelerator tables, there is currently no way to associate Objective-C methods with the interface they belong to, other than the .apple_objc accelerator table. After due consideration we came to the conclusion that it makes more sense to follow Pavel's suggestion of just emitting this information in the .debug_info section. One concern was that categories were emitted in the .apple_names as well, but it turns out that LLDB doesn't rely on the accelerator tables for this information. This patch changes the codegen behavior to emit subprograms for structure types, like we do for C++. This will result in the DW_TAG_subprogram being nested as a child under its DW_TAG_structure_type. This behavior is only enabled for DWARF5 and later, so we can have a unique code path in LLDB with regards to obtaining the class methods. This was tested on the LLDB side and doesn't lead to a regression. There's already code in place to deal with member functions in C++, which deals with this transparently. For more background please refer to the discussion on the mailing list: http://lists.llvm.org/pipermail/llvm-dev/2018-June/123986.html Differential revision: https://reviews.llvm.org/D48241 llvm-svn: 335757	2018-06-27 17:31:59 +00:00
Craig Topper	851f363691	[X86] Rename llvm.x86.avx512.mask.fpclass.p* to exclude 'mask.' from the name to match llvm. llvm-svn: 335745	2018-06-27 15:57:57 +00:00
Ivan A. Kosarev	a9f484ac4a	[NEON] Support vldNq intrinsics in AArch32 (Clang part) This patch reworks the support for dup NEON intrinsics as described in D48439. Differential Revision: https://reviews.llvm.org/D48440 llvm-svn: 335734	2018-06-27 13:58:43 +00:00
Richard Smith	bf5bcf2c15	Diagnose missing 'template' keywords in more cases. We track when we see a name-shaped expression followed by a '<' token and parse the '<' as a comparison. Then: * if we see a token sequence that cannot possibly be an expression but can be a template argument (in particular, a type-id) that follows either a ',' or the '<', diagnose that the '<' was supposed to start a template argument list, and * if we see '>()', diagnose that the '<' was supposed to start a template argument list. This only changes the diagnostic for error cases, and in practice appears to catch the most common cases where a missing 'template' keyword leads to parse errors within a template. Differential Revision: https://reviews.llvm.org/D48571 llvm-svn: 335687	2018-06-26 23:20:26 +00:00
Evgeniy Stepanov	c69e067668	Revert "[MS] Use mangled names and comdats for string merging with ASan" Depends on r334313, which has been reverted in r335681. llvm-svn: 335684	2018-06-26 23:10:48 +00:00
Peter Collingbourne	7a17a8ba1e	Compile CodeGenModule.cpp with /bigobj. Apparently we're now hitting an object file section limit on this file with expensive checks enabled. llvm-svn: 335636	2018-06-26 17:45:26 +00:00
Alexey Bataev	91433f6877	[OPENMP, NVPTX] Reduce the number of the globalized variables. Patch tries to make better analysis of the variables that should be globalized. From now, instead of all parallel directives it will check only distribute parallel .. directives and check only for firstprivte/lastprivate variables if they must be globalized. llvm-svn: 335632	2018-06-26 17:24:03 +00:00
Peter Collingbourne	e44acadf6a	Implement CFI for indirect calls via a member function pointer. Similarly to CFI on virtual and indirect calls, this implementation tries to use program type information to make the checks as precise as possible. The basic way that it works is as follows, where `C` is the name of the class being defined or the target of a call and the function type is assumed to be `void()`. For virtual calls: - Attach type metadata to the addresses of function pointers in vtables (not the functions themselves) of type `void (B::)()` for each `B` that is a recursive dynamic base class of `C`, including `C` itself. This type metadata has an annotation that the type is for virtual calls (to distinguish it from the non-virtual case). - At the call site, check that the computed address of the function pointer in the vtable has type `void (C::)()`. For non-virtual calls: - Attach type metadata to each non-virtual member function whose address can be taken with a member function pointer. The type of a function in class `C` of type `void()` is each of the types `void (B::)()` where `B` is a most-base class of `C`. A most-base class of `C` is defined as a recursive base class of `C`, including `C` itself, that does not have any bases. - At the call site, check that the function pointer has one of the types `void (B::)()` where `B` is a most-base class of `C`. Differential Revision: https://reviews.llvm.org/D47567 llvm-svn: 335569	2018-06-26 02:15:47 +00:00
Craig Topper	4ef61aecbd	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. Additional IR is emitted to convert between scalar and vXi1 type to match the expected software inferface for the builtin that clang exposes. llvm-svn: 335564	2018-06-26 00:44:02 +00:00
Sam Clegg	6fd7d680b0	[WebAssembly] Add no-prototype attribute to prototype-less C functions The WebAssembly backend in particular benefits from being able to distinguish between varargs functions (...) and prototype-less C functions. Differential Revision: https://reviews.llvm.org/D48443 llvm-svn: 335510	2018-06-25 18:47:32 +00:00
Alexey Bataev	96edb2e37e	[OPENMP] Do not consider address constant vars as possibly threadprivate. Do not delay emission of the address constant variables in OpenMP mode as they cannot be defined as threadprivate. llvm-svn: 335483	2018-06-25 15:32:05 +00:00
Igor Kudrin	eff8f9d178	[CodeGen] Provide source locations for UBSan type checks when emitting constructor calls. Differential Revision: https://reviews.llvm.org/D48531 llvm-svn: 335445	2018-06-25 05:48:04 +00:00
Brian Gesiak	12728474b3	[Coroutines] Less IR for noexcept await_resume Summary: In his review of https://reviews.llvm.org/D45860, @GorNishanov suggested avoiding generating additional exception-handling IR in the case that the resume function was marked as 'noexcept', and exceptions could not occur. This implements that suggestion. Test Plan: `check-clang` Reviewers: GorNishanov, EricWF Reviewed By: GorNishanov Subscribers: cfe-commits, GorNishanov Differential Revision: https://reviews.llvm.org/D47673 llvm-svn: 335422	2018-06-23 18:57:26 +00:00
Tobias Edler von Koch	7609cb83e6	Re-land "[LTO] Enable module summary emission by default for regular LTO" Since we are now producing a summary also for regular LTO builds, we need to run the NameAnonGlobals pass in those cases as well (the summary cannot handle anonymous globals). See https://reviews.llvm.org/D34156 for details on the original change. This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3. llvm-svn: 335385	2018-06-22 20:23:21 +00:00
Alexey Bataev	12c62908b5	[OPENMP, NVPTX] Fix reduction of the big data types/structures. If the shuffle is required for the reduced structures/big data type, current code may cause compiler crash because of the loading of the aggregate values. Patch fixes this problem. llvm-svn: 335377	2018-06-22 19:10:38 +00:00
Gabor Buella	716863c820	[X86] Lower _mm[256\|512]_cmp[.]_mask intrinsics to native llvm IR Summary: Lowering some vector comparision builtins to fcmp IR instructions. This ignores the signaling behaviour specified in the predicate argument of said builtins. Affected AVX512 builtins: __builtin_ia32_cmpps128_mask __builtin_ia32_cmpps256_mask __builtin_ia32_cmpps512_mask __builtin_ia32_cmppd128_mask __builtin_ia32_cmppd256_mask __builtin_ia32_cmppd512_mask Reviewers: craig.topper, uriel.k, RKSimon, andrew.w.kaylor, spatel, scanon, efriedma Reviewed By: craig.topper, spatel, efriedma Differential Revision: https://reviews.llvm.org/D45616 llvm-svn: 335339	2018-06-22 11:59:16 +00:00
Craig Topper	342b095689	[X86] Update handling in CGBuiltin to be tolerant of out of range immediates. D48464 contains changes that will loosen some of the range checks in SemaChecking to a DefaultError warning that can be disabled. This patch adds explicit masking to avoid using the upper bits of immediates to gracefully handle the warning being disabled. llvm-svn: 335308	2018-06-21 23:39:47 +00:00
Evgeniy Stepanov	fb762b27f2	Ignore blacklist when generating __cfi_check_fail. Summary: Fixes PR37898. Reviewers: pcc, vlad.tsyrklevich Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D48454 llvm-svn: 335305	2018-06-21 23:22:37 +00:00
Tobias Edler von Koch	e597a2cf81	Revert "[LTO] Enable module summary emission by default for regular LTO" This is breaking a couple of buildbots. We need to run the NameAnonGlobal pass for regular LTO now as well (since we're producing a summary). I'll post a separate patch for review to make this happen and then re-commit. This reverts commit c0759b7b1f4a81ff9021b952aa38a222d5fa4dfd. llvm-svn: 335291	2018-06-21 21:24:30 +00:00
Alexey Bataev	4065b9ae48	[OPENMP, NVPTX] Fix globalization of the variables passed to orphaned parallel region. If the current construct requires sharing of the local variable in the inner parallel region, this variable must be globalized to avoid runtime crash. llvm-svn: 335285	2018-06-21 20:26:33 +00:00
Tobias Edler von Koch	9a8be606f3	[LTO] Enable module summary emission by default for regular LTO Summary: With D33921, we gained the ability to have module summaries in regular LTO modules without triggering ThinLTO compilation. Module summaries in regular LTO allow garbage collection (dead stripping) before LTO compilation and thus open up additional optimization opportunities. This patch enables summary emission in regular LTO for all targets except ld64-based ones (which use the legacy LTO API). Reviewers: pcc, tejohnson, mehdi_amini Subscribers: inglorion, eraman, cfe-commits Differential Revision: https://reviews.llvm.org/D34156 llvm-svn: 335284	2018-06-21 20:20:41 +00:00
Anastasis Grammenos	dfe8fe503c	[DebugInfo] Inline for without DebugLocation Summary: This test is a strip down version of a function inside the amalgamated sqlite source. When converted to IR clang produces a phi instruction without debug location. This patch fixes the above issue. Differential Revision: https://reviews.llvm.org/D47720 llvm-svn: 335255	2018-06-21 16:53:48 +00:00
Leonard Chan	db01c3adc6	[Fixed Point Arithmetic] Fixed Point Precision Bits and Fixed Point Literals This diff includes the logic for setting the precision bits for each primary fixed point type in the target info and logic for initializing a fixed point literal. Fixed point literals are declared using the suffixes ``` hr: short _Fract uhr: unsigned short _Fract r: _Fract ur: unsigned _Fract lr: long _Fract ulr: unsigned long _Fract hk: short _Accum uhk: unsigned short _Accum k: _Accum uk: unsigned _Accum ``` Errors are also thrown for illegal literal values ``` unsigned short _Accum u_short_accum = 256.0uhk; // expected-error{{the integral part of this literal is too large for this unsigned _Accum type}} ``` Differential Revision: https://reviews.llvm.org/D46915 llvm-svn: 335148	2018-06-20 17:19:40 +00:00
Peter Collingbourne	d914fd2163	IRgen: Mark aliases of ctors and dtors as unnamed_addr. This is not only semantically correct but ensures that they will not be marked as address-significant once D48155 lands. Differential Revision: https://reviews.llvm.org/D48206 llvm-svn: 334982	2018-06-18 20:58:54 +00:00
Tomasz Krupa	83ba6fa98d	Fix a bug introduced by rL334850 Summary: All *_sqrt_round_s[s\|d] intrinsics should execute a square root on zeroth element from B (Ops[1]) and insert in to A (Ops[0]), not the other way around. Reviewers: itaraban, craig.topper Reviewed By: craig.topper Subscribers: craig.topper, cfe-commits Differential Revision: https://reviews.llvm.org/D48288 llvm-svn: 334964	2018-06-18 17:57:05 +00:00
Alexey Bataev	7b55d2d554	[OPENMP, NVPTX] Emit simple reduction if requested. If simple reduction is requested, use the simple reduction instead of the runtime functions calls. llvm-svn: 334962	2018-06-18 17:11:45 +00:00
Yaxun Liu	cbd80f49d5	Call CreateTempAllocaWithoutCast for ActiveFlag This is partial re-commit of r332982. llvm-svn: 334879	2018-06-16 01:20:52 +00:00
Tomasz Krupa	f1792bb3d6	[X86] Lowering sqrt intrinsics to native IR Reviewers: craig.topper, spatel, RKSimon, igorb, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, cfe-commits Differential Revision: https://reviews.llvm.org/D41168 llvm-svn: 334850	2018-06-15 18:05:59 +00:00
Yaxun Liu	aefdb8ed34	[NFC] Add CreateMemTempWithoutCast and CreateTempAllocaWithoutCast This is partial re-commit of r332982 llvm-svn: 334837	2018-06-15 15:33:22 +00:00
Luke Geeson	da2b2e8c26	[AArch64] Reverted rC334696 with Clang VCVTA test fix llvm-svn: 334820	2018-06-15 10:10:45 +00:00
Craig Topper	31730ae761	[X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8. This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic. Fixes the remaining issue from PR37795. llvm-svn: 334773	2018-06-14 22:02:35 +00:00
Tomasz Krupa	82aa42af49	[X86] Lowering Mask Scalar intrinsics to native IR (Clang part) Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR. Reviewers: craig.topper, RKSimon, spatel, sroland Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47979 llvm-svn: 334741	2018-06-14 17:36:23 +00:00
Leonard Chan	ab80f3c8b7	[Fixed Point Arithmetic] Addition of the remaining fixed point types and their saturated equivalents This diff includes changes for the remaining _Fract and _Sat fixed point types. ``` signed short _Fract s_short_fract; signed _Fract s_fract; signed long _Fract s_long_fract; unsigned short _Fract u_short_fract; unsigned _Fract u_fract; unsigned long _Fract u_long_fract; // Aliased fixed point types short _Accum short_accum; _Accum accum; long _Accum long_accum; short _Fract short_fract; _Fract fract; long _Fract long_fract; // Saturated fixed point types _Sat signed short _Accum sat_s_short_accum; _Sat signed _Accum sat_s_accum; _Sat signed long _Accum sat_s_long_accum; _Sat unsigned short _Accum sat_u_short_accum; _Sat unsigned _Accum sat_u_accum; _Sat unsigned long _Accum sat_u_long_accum; _Sat signed short _Fract sat_s_short_fract; _Sat signed _Fract sat_s_fract; _Sat signed long _Fract sat_s_long_fract; _Sat unsigned short _Fract sat_u_short_fract; _Sat unsigned _Fract sat_u_fract; _Sat unsigned long _Fract sat_u_long_fract; // Aliased saturated fixed point types _Sat short _Accum sat_short_accum; _Sat _Accum sat_accum; _Sat long _Accum sat_long_accum; _Sat short _Fract sat_short_fract; _Sat _Fract sat_fract; _Sat long _Fract sat_long_fract; ``` This diff only allows for declaration of these fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches. Differential Revision: https://reviews.llvm.org/D46911 llvm-svn: 334718	2018-06-14 14:53:51 +00:00
Luke Geeson	bb399f8013	[AArch64] reverting rC334693 due to build failures llvm-svn: 334696	2018-06-14 08:59:33 +00:00
Luke Geeson	010bbbf390	[AArch64] Added support for the vcvta_u16_f16 instrinsic for FP16 Armv8.2-A llvm-svn: 334693	2018-06-14 08:28:56 +00:00
Mandeep Singh Grang	2d28383097	[COFF] Add ARM64 intrinsics: __yield, __wfe, __wfi, __sev, __sevl Summary: These intrinsics result in hint instructions. They are provided here for MSVC ARM64 compatibility. Reviewers: mstorsjo, compnerd, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D48132 llvm-svn: 334639	2018-06-13 18:49:35 +00:00
Piotr Padlewski	e368de364e	Add -fforce-emit-vtables Summary: In many cases we can't devirtualize because definition of vtable is not present. Most of the time it is caused by inline virtual function not beeing emitted. Forcing emitting of vtable adds a reference of these inline virtual functions. Note that GCC was always doing it. Reviewers: rjmccall, rsmith, amharc, kuhar Subscribers: llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D47108 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 334600	2018-06-13 13:55:42 +00:00
Richard Smith	8fa638ae6f	Fix crash emitting transparent list initializer for a large aggregate. llvm-svn: 334565	2018-06-13 02:06:28 +00:00
Yaxun Liu	6c10a66ec7	[CUDA][HIP] Set kernel calling convention before arrange function Currently clang set kernel calling convention for CUDA/HIP after arranging function, which causes incorrect kernel function type since it depends on calling convention. This patch moves setting kernel convention before arranging function. Differential Revision: https://reviews.llvm.org/D47733 llvm-svn: 334457	2018-06-12 00:16:33 +00:00
Craig Topper	201b9dd334	[X86] Fix operand order in the shuffle created for blend builtins. This was broken when the builtin was added in r334249. llvm-svn: 334422	2018-06-11 17:06:01 +00:00
Reid Kleckner	3513fdcc0f	[MS] Use mangled names and comdats for string merging with ASan This should reduce the binary size penalty of ASan on Windows. After r334313, ASan will add red zones to globals in comdats, so we will still find OOB accesses to string literals. llvm-svn: 334417	2018-06-11 16:49:43 +00:00
Craig Topper	3cce6a7ed9	[X86] Use target independent masked expandload and compressstore intrinsics to implement expandload/compressstore builtins. Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them. Reviewers: RKSimon, delena, spatel, GBuella Reviewed By: RKSimon Subscribers: cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D47693 llvm-svn: 334366	2018-06-10 17:27:05 +00:00
Ivan A. Kosarev	73c76c35a5	[NEON] Support VST1xN intrinsics in AArch32 mode (Clang part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47446 llvm-svn: 334362	2018-06-10 09:28:10 +00:00
Craig Topper	7c89d046ea	Use SmallPtrSet instead of SmallSet in places where we iterate over the set. SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work. These places were found by hiding the begin/end methods in the SmallSet forwarding. llvm-svn: 334339	2018-06-09 00:30:45 +00:00
Craig Topper	88097d9355	[X86] Add back some masked vector truncate builtins. Custom IRgen a a few others. I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special buitlins we can emit a select without using the 128/256-bit select builtins. llvm-svn: 334331	2018-06-08 21:50:08 +00:00
Craig Topper	5f50f33806	[X86] Fold masking into subvector extract builtins. I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported. llvm-svn: 334330	2018-06-08 21:50:07 +00:00
Craig Topper	03f4f04b91	[X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. llvm-svn: 334311	2018-06-08 18:00:25 +00:00
Jonas Hahnfeld	3b9cbba9a8	[CUDA] Fix emission of constant strings in sections CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue to unnamed. When emitting the object file LLVM will mark the surrounding section as SHF_MERGE iff the string is nul-terminated and contains no other nuls (see IsNullTerminatedString). This results in problems when saving temporaries because LLVM doesn't set an EntrySize, so reading in the serialized assembly file fails. This never happened for the GPU binaries because they usually contain a nul-character somewhere. Instead this only affected the module ID when compiling relocatable device code. However, this points to a potentially larger problem: If we put a constant string into a named section, we really want the data to end up in that section in the object file. To avoid LLVM merging sections this patch unmarks the GlobalVariable's address as unnamed which also fixes the problem of invalid serialized assembly files when saving temporaries. Differential Revision: https://reviews.llvm.org/D47902 llvm-svn: 334281	2018-06-08 11:17:08 +00:00
Craig Topper	422a1bbb84	[X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking. llvm-svn: 334266	2018-06-08 07:18:33 +00:00
Craig Topper	03de166ccd	[X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. llvm-svn: 334265	2018-06-08 06:13:16 +00:00
Craig Topper	3428beeb2f	[X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc. llvm-svn: 334261	2018-06-08 03:24:47 +00:00
Craig Topper	acf5601961	[X86] Add builtins for vpermilps/pd instructions to enable target feature checking. llvm-svn: 334256	2018-06-08 00:59:27 +00:00
Shoaib Meenai	a5fc603379	[CodeGen] Always use MSVC personality for windows-msvc targets The windows-msvc target is meant to be ABI compatible with MSVC, including the exception handling. Ensure that a windows-msvc triple always equates to the MSVC personality being used. This mostly affects the GNUStep and ObjFW Obj-C runtimes. To the best of my knowledge, those are normally not used with windows-msvc triples. I believe WinObjC is based on GNUStep (or it at least uses libobjc2), but that also takes the approach of wrapping Obj-C exceptions in C++ exceptions, so the MSVC personality function is the right one to use there as well. Differential Revision: https://reviews.llvm.org/D47862 llvm-svn: 334253	2018-06-08 00:41:01 +00:00
Craig Topper	7d17d7278b	[X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range. llvm-svn: 334249	2018-06-08 00:00:21 +00:00
Craig Topper	9392136414	[X86] Add builtins for shuff32x4/shuff64x2/shufi32x4/shuff64x2 to enable target feature checking and immediate range checking. llvm-svn: 334244	2018-06-07 23:03:08 +00:00
Reid Kleckner	aa46ed9278	[MS] Re-add support for the ARM interlocked bittest intrinscs Adds support for these intrinsics, which are ARM and ARM64 only: _interlockedbittestandreset_acq _interlockedbittestandreset_rel _interlockedbittestandreset_nf _interlockedbittestandset_acq _interlockedbittestandset_rel _interlockedbittestandset_nf Refactor the bittest intrinsic handling to decompose each intrinsic into its action, its width, and its atomicity. llvm-svn: 334239	2018-06-07 21:39:04 +00:00
Craig Topper	e56819eb69	[X86] Add builtins for VALIGNQ/VALIGND to enable proper target feature checking. We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature. I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that. llvm-svn: 334237	2018-06-07 21:27:41 +00:00
Craig Topper	d3623155a2	[X86] Add back builtins for _mm_slli_si128/_mm_srli_si128 and similar intrinsics. We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits. This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously. llvm-svn: 334208	2018-06-07 17:28:03 +00:00
Gabor Buella	1a83d06768	[CodeGen] Improve diagnostics related to target attributes Summary: When requirement imposed by __target__ attributes on functions are not satisfied, prefer printing those requirements, which are explicitly mentioned in the attributes. This makes such messages more useful, e.g. printing avx512f instead of avx2 in the following scenario: ``` $ cat foo.c static inline void __attribute__((__always_inline__, __target__("avx512f"))) x(void) { } int main(void) { x(); } $ clang foo.c foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2' x(); ^ 1 error generated. ``` bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338 Reviewers: craig.topper, echristo, dblaikie Reviewed By: craig.topper, echristo Differential Revision: https://reviews.llvm.org/D46541 llvm-svn: 334174	2018-06-07 08:48:36 +00:00
Craig Topper	b92c77d176	[X86] Add back _mask, _maskz, and _mask3 builtins for some 512-bit fmadd/fmsub/fmaddsub/fmsubadd builtins. Summary: We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't. I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim. The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform. I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it. Reviewers: tkrupa, RKSimon, spatel, GBuella Reviewed By: tkrupa Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47724 llvm-svn: 334159	2018-06-07 02:46:02 +00:00
Reid Kleckner	11c99ed05f	[MS][ARM64]: Promote _setjmp to_setjmpex as there is no _setjmp in the ARM64 libvcruntime.lib Factor out the common setjmp call emission code. Based on a patch by Chris January Differential Revision: https://reviews.llvm.org/D47784 llvm-svn: 334112	2018-06-06 18:39:47 +00:00
Reid Kleckner	05df851327	Fix std::tuple errors llvm-svn: 334060	2018-06-06 01:44:10 +00:00
Reid Kleckner	368d52b7e0	Implement bittest intrinsics generically for non-x86 platforms I tested these locally on an x86 machine by disabling the inline asm codepath and confirming that it does the same bitflips as we do with the inline asm. Addresses code review feedback. llvm-svn: 334059	2018-06-06 01:35:08 +00:00
Craig Topper	f3914b74c1	[X86] Add builtins for vector element insert and extract for different 128 and 256 bit vector types. Use them to implement the extract and insert intrinsics. Previously we were just using extended vector operations in the header file. This unfortunately allowed non-constant indices to be used with the intrinsics. This is incompatible with gcc, icc, and MSVC. It also introduces a different performance characteristic because non-constant index gets lowered to a vector store and an element sized load. By adding the builtins we can check for the index to be a constant and ensure its in range of the vector element count. User code still has the option to use extended vector operations themselves if they need non-constant indexing. llvm-svn: 334057	2018-06-06 00:24:55 +00:00
Craig Topper	6b5b5ce06c	[X86] Implement __builtin_ia32_vec_ext_v2si correctly even though we only use it with an index of 0. This builtin takes an index as its second operand, but the codegen hardcodes an index of 0 and doesn't use the operand. The only use of the builtin in the header file passes 0 to the operand so this works for that usage. But its more correct to use the real operand. llvm-svn: 334054	2018-06-05 22:40:03 +00:00
Yaxun Liu	6328f9a988	[CUDA][HIP] Do not emit type info when compiling for device CUDA/HIP does not support RTTI on device side, therefore there is no point of emitting type info when compiling for device. Emitting type info for device not only clutters the IR with useless global variables, but also causes undefined symbol at linking since vtable for cxxabiv1::class_type_info has external linkage. Differential Revision: https://reviews.llvm.org/D47694 llvm-svn: 334021	2018-06-05 15:11:02 +00:00
Reid Kleckner	1d9c249db5	Reimplement the bittest intrinsic family as builtins with inline asm We need to implement _interlockedbittestandset as a builtin for windows.h, so we might as well do the whole family. It reduces code duplication anyway. Fixes PR33188, a long standing bug in our bittest implementation encountered by Chakra. llvm-svn: 333978	2018-06-05 01:33:40 +00:00
Reid Kleckner	89fbd55145	Revert r333791 "Cap "voluntary" vector alignment at 16 for all Darwin platforms." Adding __attribute__((aligned(32))) to __m256 breaks the implementation of _mm256_loadu_ps on Windows. On Windows, alignment attributes have higher precedence than packing attributes. We also might want to carefully consider the consequences of changing our vector typedefs, since many users copy them and invent their own new, non-Intel specific vector type names. llvm-svn: 333958	2018-06-04 21:39:20 +00:00
David Blaikie	181a61307b	Update for an LLVM header file move llvm-svn: 333955	2018-06-04 21:23:29 +00:00
Heejin Ahn	0083179f06	Remove llvm::Triple argument from get***Personality() functions. NFC. Summary: Because `llvm::Triple` can be derived from `TargetInfo`, it is simpler to take only `TargetInfo` argument. Reviewers: sbc100 Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47620 llvm-svn: 333938	2018-06-04 18:23:00 +00:00
Leonard Chan	f921d85422	This diff includes changes for supporting the following types. // Primary fixed point types signed short _Accum s_short_accum; signed _Accum s_accum; signed long _Accum s_long_accum; unsigned short _Accum u_short_accum; unsigned _Accum u_accum; unsigned long _Accum u_long_accum; // Aliased fixed point types short _Accum short_accum; _Accum accum; long _Accum long_accum; This diff only allows for declaration of the fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches. The saturated versions of these types and the equivalent _Fract types will also be added in future patches. The tests included are for asserting that we can declare these types. Fixed the test that was failing by not checking for dso_local on some targets. Differential Revision: https://reviews.llvm.org/D46084 llvm-svn: 333923	2018-06-04 16:07:52 +00:00
Craig Topper	6fb26f93ef	[X86] Replace __builtin_ia32_vbroadcastf128_pd256 and __builtin_ia32_vbroadcastf128_ps256 with an unaligned load intrinsics and a __builtin_shufflevector call. llvm-svn: 333853	2018-06-03 19:42:59 +00:00
Craig Topper	f886b44693	[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC llvm-svn: 333851	2018-06-03 19:02:57 +00:00
Craig Topper	8508c1db98	Revert r333848 "[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC" Looks like I missed some changes to make this work. llvm-svn: 333850	2018-06-03 18:41:22 +00:00
Craig Topper	d4a610f6f7	[X86] Pass ArrayRef instead of SmallVectorImpl& to the X86 builtin helper functions. NFC llvm-svn: 333848	2018-06-03 18:08:37 +00:00
Craig Topper	21f56f5b9c	[X86] When emitting masked loads/stores don't check for all ones mask. This seems like a premature optimization. It's unlikely a user would pass something the frontend can tell is all ones to the masked load/store intrinsics. We do this optimization for emitting select for masking because we have builtin calls in header files that pass an all ones mask in. Though at this point we may not longer have any builtins that emit some IR and a select. We may only have the select builtins so maybe we can remove that optimization too. llvm-svn: 333847	2018-06-03 18:08:36 +00:00
Ivan A. Kosarev	9c40c0ad0c	[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47121 llvm-svn: 333829	2018-06-02 17:42:59 +00:00
Leonard Chan	0d485dbb40	Revert "This diff includes changes for supporting the following types." This reverts commit r333814, which fails for a test checking the bit width on ubuntu. llvm-svn: 333815	2018-06-02 03:27:13 +00:00
Leonard Chan	db55d8331e	This diff includes changes for supporting the following types. ``` // Primary fixed point types signed short _Accum s_short_accum; signed _Accum s_accum; signed long _Accum s_long_accum; unsigned short _Accum u_short_accum; unsigned _Accum u_accum; unsigned long _Accum u_long_accum; // Aliased fixed point types short _Accum short_accum; _Accum accum; long _Accum long_accum; ``` This diff only allows for declaration of the fixed point types. Assignment and other operations done on fixed point types according to http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1169.pdf will be added in future patches. The saturated versions of these types and the equivalent `_Fract` types will also be added in future patches. The tests included are for asserting that we can declare these types. Differential Revision: https://reviews.llvm.org/D46084 llvm-svn: 333814	2018-06-02 02:58:51 +00:00
John McCall	280c656031	Cap "voluntary" vector alignment at 16 for all Darwin platforms. This fixes two major problems: - We were not capping vector alignment as desired on 32-bit ARM. - We were using different alignments based on the AVX settings on Intel, so we did not have a consistent ABI. This is an ABI break, but we think we can get away with it because vectors tend to be used mostly in inline code (which is why not having a consistent ABI has not proven disastrous on Intel). Intel's AVX types are specified as having 32-byte / 64-byte alignment, so align them explicitly instead of relying on the base ABI rule. Note that this sort of attribute is stripped from template arguments in template substitution, so there's a possibility that code templated over vectors will produce inadequately-aligned objects. The right long-term solution for this is for alignment attributes to be interpreted as true qualifiers and thus preserved in the canonical type. llvm-svn: 333791	2018-06-01 21:34:26 +00:00
Heejin Ahn	1eb074d76e	[WebAssembly] Hide new Wasm EH behind its feature flag Summary: clang's current wasm EH implementation is a non-MVP feature in progress. We had a `-mexception-handling` wasm feature but were not using it. This patch hides the non-MVP wasm EH behind a flag, so it does not affect other code for now. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D47614 llvm-svn: 333716	2018-06-01 01:01:37 +00:00
Vedant Kumar	d781d97ed4	[Coverage] End deferred regions before labels, fixes PR35867 A deferred region should end before the start of a label, and should not extend to the start of the label sub-statement. Fixes llvm.org/PR35867. llvm-svn: 333715	2018-06-01 00:37:13 +00:00
Dan Gohman	9f8ee03772	[WebAssembly] Update to the new names for the memory builtin functions. The WebAssembly committee has decided on the names `memory.size` and `memory.grow` for the memory intrinsics, so update the clang builtin functions to follow those names, keeping both sets of old names in place for compatibility. llvm-svn: 333712	2018-06-01 00:05:51 +00:00
Heejin Ahn	c647919933	[WebAssembly] Use Windows EH instructions for Wasm EH Summary: Because wasm control flow needs to be structured, using WinEH instructions to support wasm EH brings several benefits. This patch makes wasm EH uses Windows EH instructions, with some changes: 1. Because wasm uses a single catch block to catch all C++ exceptions, this merges all catch clauses into a single catchpad, within which we test the EH selector as in Itanium EH. 2. Generates a call to `__clang_call_terminate` in case a cleanup throws. Wasm does not have a runtime to handle this. 3. In case there is no catch-all clause, inserts a call to `__cxa_rethrow` at the end of a catchpad in order to unwind to an enclosing EH scope. Reviewers: majnemer, dschuff Subscribers: jfb, sbc100, jgravelle-google, sunfish, cfe-commits Differential Revision: https://reviews.llvm.org/D44931 llvm-svn: 333703	2018-05-31 22:18:13 +00:00
Reid Kleckner	26fc531dbc	Fix null MSInheritanceAttr deref in CXXRecordDecl::getMSInheritanceModel() Ensure latest MPT decl has a MSInheritanceAttr when instantiating templates, to avoid null MSInheritanceAttr deref in CXXRecordDecl::getMSInheritanceModel(). See PR#37399 for repo / details. Patch by Andrew Rogers! Differential Revision: https://reviews.llvm.org/D46664 llvm-svn: 333680	2018-05-31 18:42:29 +00:00
Peter Collingbourne	3aa30e8062	IRGen: Write .dwo files when -split-dwarf-file is used together with -fthinlto-index. Differential Revision: https://reviews.llvm.org/D47597 llvm-svn: 333677	2018-05-31 18:25:59 +00:00
Vedant Kumar	61763b65af	[Coverage] Discard the last uncompleted deferred region in a decl Discard the last uncompleted deferred region in a decl, if one exists. This prevents lines at the end of a function containing only whitespace or closing braces from being marked as uncovered, if they follow a region terminator (return/break/etc). The previous behavior was to heuristically complete deferred regions at the end of a decl. In practice this ended up being too brittle for too little gain. Users would complain that there was no way to reach full code coverage because whitespace at the end of a function would be marked uncovered. rdar://40238228 Differential Revision: https://reviews.llvm.org/D46918 llvm-svn: 333609	2018-05-30 23:35:44 +00:00
Peter Collingbourne	ac94ca54c5	IRGen: Rename bitsets -> type metadata. NFC. "Type metadata" is the term that we've been using for the CFI-related information on vtables for a while now. llvm-svn: 333602	2018-05-30 22:29:08 +00:00
Gabor Buella	70d8d51073	[X86] Lowering FMA intrinsics to native IR (Clang part) This patch replaces all packed (and scalar without rounding mode) fused intrinsics with fmadd/fmaddsub variations. Then fmadd/fmaddsub are lowered to native IR. Patch by tkrupa Reviewers: craig.topper, sroland, spatel, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D47444 llvm-svn: 333555	2018-05-30 15:27:49 +00:00
Simon Tatham	89e31fa7fc	Support __iso_volatile_load8 etc on aarch64-win32. These intrinsics are used by MSVC's header files on AArch64 Windows as well as AArch32, so we should support them for both targets. I've factored them out of CodeGenFunction::EmitARMBuiltinExpr into separate functions that EmitAArch64BuiltinExpr can call as well. Reviewers: javed.absar, mstorsjo Reviewed By: mstorsjo Subscribers: kristof.beyls, cfe-commits Differential Revision: https://reviews.llvm.org/D47476 llvm-svn: 333513	2018-05-30 07:54:05 +00:00
Richard Smith	b534510cd5	Make the mangled name collision diagnostic a bit more useful by listing the mangling. This helps especially when the collision is for a template specialization, where the template arguments are not available from anywhere else in the diagnostic, and are likely relevant to the problem. llvm-svn: 333489	2018-05-30 01:52:16 +00:00
Richard Smith	6ca999baf2	Revert r332839. This is causing miscompiles and "definition with same mangled name as another definition" errors. llvm-svn: 333482	2018-05-30 00:45:10 +00:00
Akira Hatanaka	1da9dbbc25	[CodeGen][Darwin] Set the calling-convention of thread-local variable initialization functions to 'cxx_fast_tlscc'. This fixes a bug where instructions calling initialization functions for thread-local static members of c++ template classes were using calling convention 'cxx_fast_tlscc' while the called functions weren't annotated with the calling convention. rdar://problem/40447463 Differential Revision: https://reviews.llvm.org/D47354 llvm-svn: 333447	2018-05-29 18:28:49 +00:00
Paul Robinson	76178632a2	Revert "[DebugInfo] Don't bother with MD5 checksums of preprocessed files." This reverts commit d734f2aa3f76fbf355ecd2bbe081d0c1f49867ab. Also known as r333311. A very small but nonzero number of bots fail. llvm-svn: 333319	2018-05-25 22:35:59 +00:00
Bob Wilson	fa84fc916c	Support Swift calling convention for PPC64 targets This adds basic support for the Swift calling convention with PPC64 targets. Patch provided by Atul Sowani in bug report #37223 llvm-svn: 333316	2018-05-25 21:26:03 +00:00
Paul Robinson	638d606f83	[DebugInfo] Don't bother with MD5 checksums of preprocessed files. The checksum will not reflect the real source, so there's no clear reason to include them in the debug info. Also this was causing a crash on the DWARF side. Differential Revision: https://reviews.llvm.org/D47260 llvm-svn: 333311	2018-05-25 20:59:29 +00:00
Alexey Bataev	0baba9e728	[OPENMP, NVPTX] Fixed codegen for orphaned parallel region. If orphaned parallel region is found, the next code must be emitted: ``` if(__kmpc_is_spmd_exec_mode() \|\| __kmpc_parallel_level(loc, gtid)) Serialized execution. else if (IsMasterThread()) Prepare and signal worker. else Outined function call. ``` llvm-svn: 333301	2018-05-25 20:16:03 +00:00
Richard Smith	3e268632cf	Use zeroinitializer for (trailing zero portion of) large array initializers more reliably. This re-commits r333044 with a fix for PR37560. llvm-svn: 333141	2018-05-23 23:41:38 +00:00
Hans Wennborg	156349fa10	Revert r333044 "Use zeroinitializer for (trailing zero portion of) large array initializers" It caused asserts, see PR37560. > Use zeroinitializer for (trailing zero portion of) large array initializers > more reliably. > > Clang has two different ways it emits array constants (from InitListExprs and > from APValues), and both had some ability to emit zeroinitializer, but neither > was able to catch all cases where we could use zeroinitializer reliably. In > particular, emitting from an APValue would fail to notice if all the explicit > array elements happened to be zero. In addition, for large arrays where only an > initial portion has an explicit initializer, we would emit the complete > initializer (which could be huge) rather than emitting only the non-zero > portion. With this change, when the element would have a suffix of more than 8 > zero elements, we emit the array constant as a packed struct of its initial > portion followed by a zeroinitializer constant for the trailing zero portion. > > In passing, I found a bug where SemaInit would sometimes walk the entire array > when checking an initializer that only covers the first few elements; that's > fixed here to unblock testing of the rest. > > Differential Revision: https://reviews.llvm.org/D47166 llvm-svn: 333067	2018-05-23 08:24:01 +00:00
Craig Topper	f2043b08b4	[X86] Remove mask argument from more builtins that are handled completely in CGBuiltin.cpp. Just wrap a select builtin around them in the header file instead. llvm-svn: 333061	2018-05-23 04:51:54 +00:00

1 2 3 4 5 ...

11830 Commits