llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	d7593ebaee	[NFC] Clean up users of AttributeList::hasAttribute() AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.	2021-08-13 11:59:18 -07:00
Arthur Eubanks	a9831cce1e	[NFC] Remove public uses of AttributeList::getAttributes() Use methods that better convey the intent.	2021-08-13 11:38:12 -07:00
Arthur Eubanks	80ea2bb574	[NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes() This is more consistent with similar methods.	2021-08-13 11:16:52 -07:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Arthur Eubanks	a0c42ca56c	[NFC] Remove AttributeList::hasParamAttribute() It's the same as AttributeList::hasParamAttr().	2021-08-13 10:58:21 -07:00
Amy Kwan	581a80304c	[PowerPC] Disable CTR Loop generate for fma with the PPC double double type. It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this intrinsic needs to remain as a call as it cannot be lowered to any instruction, which also means we need to disable CTR loop generation for fma involving the ppcf128 type. This patch accomplishes this behaviour. Differential Revision: https://reviews.llvm.org/D107914	2021-08-13 12:27:24 -05:00
Haowei Wu	571b0d84d2	[IFS] Fix the copy constructor warning in IFSStub.cpp This change fixes the gcc warning on copy constructor in IFSStub.cpp file. Differential Revision: https://reviews.llvm.org/D108000	2021-08-13 10:17:53 -07:00
Alfonso Gregory	17bc82dd3b	[AsmWriter][NFC] Simplify writeDIGenericSubrange Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107469	2021-08-13 09:31:13 -07:00
Jessica Paquette	ccfc079047	[AArch64][GlobalISel] Legalize scalar G_SSUBSAT + G_SADDSAT These are lowered, matching SDAG behaviour. (See llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll) These fall back ~159 times on a build of clang with GISel enabled. Differential Revision: https://reviews.llvm.org/D107777	2021-08-13 09:02:25 -07:00
Jamie Schmeiser	64f29e2dd1	Fix bad assert in print-changed code Summary: The assertion that both functions were not missing was incorrect and would fail when one of the functions was missing. Fixed it and moved the assertion earlier to check the input parameters to better capture first-failure. Added lit test. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D107989	2021-08-13 10:54:30 -04:00
Roman Lebedev	0dc6b597db	Revert "[SCEV] Remove premature assert. PR46786" Since then, the SCEV pointer handling as been improved, so the assertion should now hold. This reverts commit `b96114c1e1`, relanding the assertion from commit `141e845da5`.	2021-08-13 17:50:22 +03:00
Roman Lebedev	c46546bd52	Reland "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological"" The commit originally unearthed a problem, reported as https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff#1019890 Now that the problem has been fixed, and the assertion no longer fires, let's see if there are other cases it fires on. This reverts commit `5c8c24d2de`, relanding commit `f30a7dff8a`.	2021-08-13 15:45:03 +03:00
Roman Lebedev	2702fb1148	[SimplifyCFG] Restart if `removeUndefIntroducingPredecessor()` made changes It might changed the condition of a branch into a constant, so we should restart and constant-fold terminator, instead of continuing with the tautological "conditional" branch. This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff	2021-08-13 15:45:03 +03:00
Roman Lebedev	5c8c24d2de	Revert "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological" The assertion does not hold on a provided reproducer. Reverting until after fixing the problem. This reverts commit `f30a7dff8a`.	2021-08-13 13:16:22 +03:00
Dylan Fleming	4be7fb9762	[SVE] Add folds for truncation of vscale Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107453	2021-08-13 10:18:00 +01:00
Rosie Sumpter	46abd1fbe8	[LoopFlatten] Fix assertion failure in checkOverflow There is an assertion failure in computeOverflowForUnsignedMul (used in checkOverflow) due to the inner and outer trip counts having different types. This occurs when the IV has been widened, but the loop components are not successfully rediscovered. This is fixed by some refactoring of the code in findLoopComponents which identifies the trip count of the loop.	2021-08-13 10:07:49 +01:00
luxufan	ee65938357	[JITLink] Update ELF_x86_64 's edge kind to generic edge kind This patch uses a switch statement to map the ELF_x86_64's edge kind to generic edge kind, and merge the ELF_x86_64 's applyFixup function to the x86_64 's applyFixup function. Some edge kinds were not have corresponding generic edge kinds, so I added three generic edge kinds asa follows: 1. RequestGOTAndTransformToDelta64, which is similar to RequestGOTAndTransformToDelta32. 2. GOTDelta64. This generic kind is similar to Delta64, except the GOTDelta64 computes the delta relative to GOTSymbol 3. RequestGOTAndTransformToGOTDelta64. This edge kind was used to deal with ELF_x86_64's GOT64 edge kind, it request the fixGOTEdge function to change the target to GOT entry, and set the edge kind to generic edge kind GOTDelta64. These added generic edge kinds may named haphazardly, or can't express its meaning well. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D107967	2021-08-13 12:53:54 +08:00
Shivam Gupta	835ea22b37	[AVR] Enable machine verifier Reviewed By: mhjacobson, benshi001 Differential Revision: https://reviews.llvm.org/D107853	2021-08-13 12:11:22 +08:00
Michael Kruse	b1de32d6dd	[OMPIRBuilder] Clarify CanonicalLoopInfo. NFC. Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives. CanonicalLoopInfo are now invalidated after it does not describe canonical loop anymore and asserts when trying to use it afterwards. In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid that the impression that they insert something from scratch at that location where in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0 Also see discussion in D105706. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D107540	2021-08-12 21:02:19 -05:00
Giorgis Georgakoudis	60e643fe05	[OpenMP][Fix] Fix disable spmdization option Besides SPMDization, other analysis and optimization for original, frontend-generated SPMD regions uses information from the AAKernelInfoFunction attribute. This fix makes sure disabling SPMDization through the corresponding option applies only to generic mode regions, which should not be SPMDized, while it leaves unaffected the attribute state of original SPMD regions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108001	2021-08-12 17:59:14 -07:00
Ruiling Song	e1beebbac5	SplitKit: Don't further split subrange mask in buildCopy We may use several COPY instructions to copy the needed sub-registers during split. But the way we split the lanes during the COPYs may be different from the subranges of the old register. This would fail when we extend the subranges of the new register because the LaneMasks do not match exactly between subranges of new register and old register. Since we are bundling the COPYs, I think there is no need to further refine the subranges of the new register based on the set of LaneMasks of the inserted COPYs. I am not sure if there will be further breaking cases. But as the subranges of new register are created based on the LaneMasks of the subranges of old register, it will be highly possible we will always find an exact LaneMask match. We can think about how to make the extendPHIKillRanges() work for subrange mask mismatch case if we meet more such cases in the future. The test case was from D105065 by @arsenm. Differential Revision: https://reviews.llvm.org/D107829	2021-08-13 07:36:38 +08:00
Heejin Ahn	adb96d2e76	[WebAssembly] Fix leak in Emscripten SjLj For SjLj, we allocate a table to record setjmp buffer info in the entry of each setjmp-calling function by inserting a `malloc` call, and insert a `free` call to free the buffer before each `ret` instruction. But this is not sufficient; we have to free the buffer before we throw. In SjLj handling, normal functions that can possibly throw or longjmp are wrapped with an invoke and caught within the function so they don't end up escaping the function. But three functions throw and escape the function: - `__resumeException` (Emscripten library function used for Emscripten EH) - `emscripten_longjmp` (Emscripten library function used for Emscripten SjLj) - `__cxa_throw` (libc++abi function called when for C++ `throw` keyword) The first two functions are used to rethrow the current exception/longjmp when the caught exception/longjmp is not for the current function. `__cxa_throw` is used for exception, and because we consider that a function that cannot longjmp, it escapes the function right away, before which we should free the buffer. Currently `lsan.test_longjmp3` and `lsan.test_exceptions_longjmp3` fail in Emscripten; this CL fixes these. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D107852	2021-08-12 16:32:46 -07:00
Heejin Ahn	aca198cf74	[WebAssembly] Error out when Emscripten SjLj setjmp is used with Wasm EH Currently, when Wasm EH is used with Emscripten SjLj, Emscripten SjLj cannot handle `invoke` instructions - it assumes all `invoke`s have been lowered away with Emscripten EH. But in Wasm EH they are lowered in instruction selection, so they are still present in the IR stage. This happens when 1. Wasm EH and Emscripten SjLj are used together 2. A function that calls `setjmp` uses exceptions, i.e., has `invoke`s We were already erroring out with an assertion failure in this case, but this CL makes it error out more properly with a valid error message. Wasm EH + Wasm SjLj will not have this restrictions. (it will have another restriction though, e.g., `setjmp` cannot be called within `catch`. But why would anyone do that..) Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D107687	2021-08-12 16:19:04 -07:00
Lei Huang	8930af45c3	[PowerPC] Implement XL compatibility builtin __addex Add builtin and intrinsic for `__addex`. This patch is part of a series of patches to provide builtins for compatibility with the XL compiler. Reviewed By: stefanp, nemanjai, NeHuang Differential Revision: https://reviews.llvm.org/D107002	2021-08-12 16:38:21 -05:00
Heejin Ahn	78e87970af	[WebAssembly] Disable offset folding for function addresses Wasm does not support function addresses with offsets, but isel can generate folded SDValues in the form of (@func + offset) without this patch. Fixes https://bugs.llvm.org/show_bug.cgi?id=43133. Reviewed By: dschuff, sbc100 Differential Revision: https://reviews.llvm.org/D107940	2021-08-12 13:40:41 -07:00
Sanjay Patel	14eefa57f2	[InstCombine] factorize min/max intrinsic ops with common operand (2nd try) This is a re-try of `6de1dbbd09` which was reverted because it missed a null check. Extra test for that failure added. Original commit message: This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 16:32:07 -04:00
Amy Huang	427520a8fa	Revert "[InstCombine] factorize min/max intrinsic ops with common operand" This reverts commit `6de1dbbd09` because it causes a compiler crash.	2021-08-12 12:36:25 -07:00
Florian Hahn	f999312872	Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts the revert `28c04794df`. The failing MLIR test that caused the revert should be fixed in this version. Also includes a PPC test fix previously in `1f87c7c478`.	2021-08-12 18:31:57 +01:00
Craig Topper	79fbddbea0	[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. For unit-stride and strided load/stores we set the SEW operand of the pseudo instruction equal the EEW in the opcode. The LMUL of the pseudo instruction is the LMUL we want. These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use this to avoid changing vtype if the SEW/LMUL of the previous vtype matches the EEW/EMUL ratio we need for the instruction. Due to how the global analysis works, we can only do this optimization when the previous vsetvli was produced in the block containing the store. We need to know in the first phase if the vsetvli will be inserted so we can propagate information to the successors in the second phase correctly. This means we can't depend on predecessors. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D106601	2021-08-12 10:05:27 -07:00
Roman Lebedev	f30a7dff8a	[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological We really shouldn't deal with a conditional branch that can be trivially constant-folded into an unconditional branch. Indeed, barring failure to trigger BB reprocessing, that should be true, so let's assert as much, and hope the assertion never fires. If it does, we have a bug to fix.	2021-08-12 20:03:09 +03:00
Roman Lebedev	628f63d3d5	[SimplifyCFG] If FoldTwoEntryPHINode() changed things, restart Mainly, i want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()` doesn't get asked to deal with non-unconditional branches, and if i do that, then said assertion fires on existing tests, and this is what prevents it from firing.	2021-08-12 20:03:09 +03:00
Sanjay Patel	790c29ab86	[InstCombine] fold umax/umin intrinsics based on demanded bits This is a direct translation of the select folds added with D53033 / D53036 and another step towards canonicalization using the intrinsics (see D98152).	2021-08-12 12:37:45 -04:00
maekawatoshiki	dd3eea6566	[LICM] Support sinking in LNICM Currently, LNICM pass does not support sinking instructions out of loop nest. This patch enables LNICM to sink down as many instructions to the exit block of outermost loop as possible. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D107219	2021-08-13 00:56:26 +09:00
Sanjay Patel	cd44cc86e3	[InstCombine] remove unused function argument; NFC This was just added with `6de1dbbd09` , and I missed pulling the extra arg from the final revision.	2021-08-12 11:47:25 -04:00
Johannes Doerfert	4e7d7cae67	[Attributor][FIX] Do not try to rewrite functions with casted call sites If we cast a function at the call site it is hard(er) to get the rewrite correct, let's not attempt it for now. Fixes PR51448.	2021-08-12 10:39:53 -05:00
Johannes Doerfert	5f543919b2	[Attributor][FIX] Guard constant casts with type size checks	2021-08-12 10:39:53 -05:00
Johannes Doerfert	a420f80bf1	[Attributor] Do not delete volatile stores to null/undef See D106309. Differential Revision: https://reviews.llvm.org/D107906	2021-08-12 10:39:52 -05:00
David Green	ae9a346ef8	[ARM] Fix DAG combine loop in reduction distribution Given a constant operand, the MVE and DAGCombine combines could fight, each redistributing in the opposite order. Add a guard to the MVE vecreduce distribution to prevent that.	2021-08-12 16:37:39 +01:00
Sanjay Patel	be0698559b	[InstCombine] remove shl(neg x), y transform This diff was accidentally committed with: `1b5a195845`	2021-08-12 11:27:22 -04:00
Sanjay Patel	6de1dbbd09	[InstCombine] factorize min/max intrinsic ops with common operand This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 11:19:09 -04:00
Sanjay Patel	1b5a195845	[InstCombine] add tests for factorization of min/max intrinsics; NFC	2021-08-12 11:19:09 -04:00
Victor Huang	99e00663d4	[PowerPC] Fix return address computation for "__builtin_return_address" When depth > 0, callee frame address is used to compute the return address of callee producing improper return address. This patch adds the fix to use caller frame address to compute the return address of callee. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D107646	2021-08-12 09:44:49 -05:00
Liqiang Tao	422fc5603a	[llvm][Inline] Refactor out InlineOrder Move InlineOrder to separated file. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D107831	2021-08-12 22:19:53 +08:00
Mehdi Amini	28c04794df	Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts commit `a1ef81de35`. Broke the MLIR buildbot.	2021-08-12 11:57:19 +00:00
Florian Hahn	a1ef81de35	[Matrix] Overload stride arg in matrix.columnwise.load/store. This patch adjusts the intrinsics definition of llvm.matrix.column.major.load and llvm.matrix.column.major.store to allow overloading the type of the stride. The bitwidth of the stride is used to perform the offset computation. This fixes a crash when using __builtin_matrix_column_major_load or __builtin_matrix_column_major_store on 32 bit platforms. The stride argument of the builtins are defined as `size_t`, which is 32 bits wide on 32 bit platforms. Note that we still perform offset computations with 64 bit width on 32 bit platforms for accesses that do not take a user-specified stride. This can be fixed separately. Fixes PR51304. Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D107349	2021-08-12 10:45:25 +01:00
David Truby	9c47d6b48d	[llvm][sve] Lowering for VLS extending loads This patch enables extending loads for fixed length SVE code generation. There is a slight regression here in the mulh tests; since these tests load the parameter and then extend it these are treated as extending loads which are merged, preventing the mulh instruction from being generated. As this affects scalable SVE codegen as well this should be addressed in a separate patch. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D107057	2021-08-12 09:43:39 +00:00
Cullen Rhodes	419deccfd1	[AArch64] NFC: Remove register decoder tables in disassembler The register classes are generated by TableGen, use them instead of handwritten tables. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107763	2021-08-12 07:28:56 +00:00
Fangrui Song	67d4d7cf68	[Object] Add missing PPC_DYNAMIC_TAG macros	2021-08-12 00:05:04 -07:00
Christudasan Devadasan	5d940b71ae	Reapply "SROA: Enhance speculateSelectInstLoads" Originally committed as `ffc3fb665d` Reverted in `fcf2d5f402` due to an assertion failure. Original commit message: Allow the folding even if there is an intervening bitcast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106667	2021-08-11 22:58:54 -04:00
Amara Emerson	73056f239e	[AArch64][GlobalISel] Simplify/nuke the merge/unmerge legalizer rules. These rules were originally written when the new predicate based legalizer was introduced in an attempt to preserve existing behaviour. It wasn't properly kept up to date as things like vector support was split out into G_CONCAT_VECTORS, and frankly, even if it was, it was too complex. It's much easier to start from scratch with what we can actually support, which is just a few type combinations. Anything illegal we should either legalize, or should be eliminated as a side effect of artifact combination. Differential Revision: https://reviews.llvm.org/D107937	2021-08-11 16:45:23 -07:00
Usman Nadeem	9396c3ec7b	[AArch64][SVE] Remove assertion/range check for i16 values during immediate selection The assertion can fail in some cases when an i16 constant is promoted to i32. e.g. in the added test case the value `i16 -32768` is within the range of i16 but the assert fails when the constant is promoted to positive `i32 32768` by an earlier call to DAG.getConstant(). Differential Revision: https://reviews.llvm.org/D107880 Change-Id: I2f6179783cbc9630e6acab149a762b43c65664de	2021-08-11 14:50:20 -07:00
Amara Emerson	2c1789bc8c	[AArch64][GlobalISel] Add ptradd_immed_chain combine to post-legalizer combiner.	2021-08-11 13:59:23 -07:00
Akira Hatanaka	643ce61fb3	[ObjC][ARC] Don't form a StoreStrong call if it is unsafe to move the release call findSafeStoreForStoreStrongContraction checks whether it's safe to move the release call to the store by inspecting all instructions between the two, but was ignoring retain instructions. This was causing objects to be released and deallocated before they were retained. rdar://81668577	2021-08-11 13:50:19 -07:00
Usman Nadeem	a7c4e9b1f7	[InstSimplify] Eliminate vector reverse of a splat vector experimental.vector.reverse(splat(X)) -> splat(X) Differential Revision: https://reviews.llvm.org/D107793 Change-Id: Id29ba88fd669ff8686712e96b1bdc46dda5b853c	2021-08-11 11:27:58 -07:00
Rong Xu	4c5909ba83	[SampleFDO] Add two passes of MIRAddFSDiscriminatorsPass This patch adds Pass1 of MIRADDFSDiscriminatorsPass before register allocation, and Pass2 of MIRAddFSDiscriminatorsPass before Block-Placement. This is still under --enable-fs-discrmininator option (default false). This would reduce the turn-around time for FSAFDO transition. Differential Revision: https://reviews.llvm.org/D104579	2021-08-11 11:11:04 -07:00
Kazu Hirata	63c566b1fd	[DWARF] Remove extractFast (NFC) The last use was removed on Dec 13, 2016 in commit `c8c1032c0c`. This patch repurposes the function comment for the other variant of extractFast.	2021-08-11 09:55:00 -07:00
Sanjay Patel	a0a9c9e188	[InstCombine] avoid breaking up min/max (cmp+sel) idioms This is a quick fix for a motivating case that looks like this: https://godbolt.org/z/GeMqzMc38 As noted, we might be able to restore the min/max patterns with select folds, or we just wait for this to become easier with canonicalization to min/max intrinsics.	2021-08-11 12:48:11 -04:00
Yolanda Chen	8fa16cc628	[LTO][lld] Add lto-pgo-warn-mismatch option When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX) due to source changes (e.g. `#if` code runs for profile generation but not for profile use) To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option. In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option. Add "lto-pgo-warn-mismatch" option to lld COFF/ELF to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO. Differential Revision: https://reviews.llvm.org/D104431	2021-08-11 09:45:55 -07:00
Fraser Cormack	885be620f9	[LegalizeTypes][NFC] Remove else-after-return Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107890	2021-08-11 16:48:28 +01:00
David Green	8c50b5fbfe	[ARM] Add extra debug messages for validating live outs. NFC We are running into more and more cases where the liveouts of low overhead loops do not validate. Add some extra debug messages to make it clearer why.	2021-08-11 10:35:53 +01:00
Wang, Pengfei	6c4809825d	Revert "[lld] Add lto-pgo-warn-mismatch option" This reverts commit `0cfb00a1c9`.	2021-08-11 16:25:42 +08:00
Cullen Rhodes	1fe0e6a380	[AArch64][SME] Support ptrue(s) in streaming mode The ptrue and ptrues instructions are legal in streaming mode, missed in D106272. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D107807	2021-08-11 07:49:36 +00:00
Rainer Orth	7bbbf29561	[ELF] Don't emit SHF_GNU_RETAIN on Solaris The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris. Initially, as reported in Bug 49437, it caused dozens of testsuite failures on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but `SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag (in the range for OS-specific values) is `SHF_SUNW_ABSENT` with a completely different semantics, which confuses Solaris `ld` very much. Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris `ld` doesn't support, causing it to SEGV and break the build. The linker is currently being hardened to not accept non-native OS ABIs to avoid this. The need for linker support is already documented in `clang/include/clang/Basic/AttrDocs.td`, but not currently checked. This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D107747	2021-08-11 09:27:51 +02:00
madhur13490	61526b1262	[DAG] Reword comment for EnforceNodeIdInvariant and InvalidateNodeId. NFC. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D107845	2021-08-11 12:14:28 +05:30
Yolanda Chen	0cfb00a1c9	[lld] Add lto-pgo-warn-mismatch option When enable CSPGO for ThinLTO, there are profile cfg mismatch warnings that will cause lld-link errors (with /WX). To disable it we have to use an internal "/mllvm:-no-pgo-warn-mismatch" option. In contrast clang uses option ”-Wno-backend-plugin“ to avoid such warnings and gcc has an explicit "-Wno-coverage-mismatch" option. Add this "lto-pgo-warn-mismatch" option to lld to help turn on/off the profile mismatch warnings explicitly when build with ThinLTO and CSPGO. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104431	2021-08-11 14:43:26 +08:00
Petr Hosek	389dc94d4b	[InstrProfiling] Generate runtime hook for Fuchsia When none of the translation units in the binary have been instrumented we shouldn't need to link the profile runtime. However, because we pass -u__llvm_profile_runtime on Linux and Fuchsia, the runtime would still be pulled in and incur some overhead. On Fuchsia which uses runtime counter relocation, it also means that we cannot reference the bias variable unconditionally. This change modifies the InstrProfiling pass to pull in the profile runtime only when needed by declaring the __llvm_profile_runtime symbol in the translation unit only when needed. For now we restrict this only for Fuchsia, but this can be later expanded to other platforms. This approach was already used prior to `9a041a7522`, but we changed it to always generate the __llvm_profile_runtime due to a TAPI limitation, but that limitation may no longer apply, and it certainly doesn't apply on platforms like Fuchsia. Differential Revision: https://reviews.llvm.org/D98061	2021-08-10 23:21:15 -07:00
Petr Hosek	c0c1c3cf93	Revert "[InstrProfiling] Emit bias variable eagerly" This reverts commit `6660cec568` since it was superseded by https://reviews.llvm.org/D98061.	2021-08-10 23:21:15 -07:00
Johannes Doerfert	fc32a5c87d	[Attributor][NFC] Try to make the windows build bots happy Failed for some reason, potentially because of the inner type declaration in combination with the `using`. This might help. Failure: https://lab.llvm.org/buildbot/#/builders/127/builds/15432	2021-08-11 01:11:37 -05:00
Johannes Doerfert	e7e3585cde	[Attributor][FIX] Handle recurrences (PHIs) in AAPointerInfo explicitly PHI nodes are not pass through but change their value, we have to account for that to avoid missing stores. Follow up for D107798 to fix PR51249 for good. Differential Revision: https://reviews.llvm.org/D107808	2021-08-11 00:49:54 -05:00
Johannes Doerfert	96da6dd6ba	[Attributor][FIX] Only avoid visiting PHI uses multiple times (PR51249) AAPointerInfoFloating needs to visit all uses and some multiple times if we go through PHI nodes. Attributor::checkForAllUses keeps a visited set so we don't recurs endlessly. We now allow recursion for non-phi uses so we track all pointer offsets via PHI nodes properly without endless recursion. This replaces the first attempt D107579. Differential Revision: https://reviews.llvm.org/D107798	2021-08-11 00:49:54 -05:00
Johannes Doerfert	e0c5d83a92	[OpenMP][FIX] Disabled optimizations have to be made known To avoid simplification with wrong constants we need to make sure we know that we won't perform specific optimizations based on the users request. The non-SPMDzation and non-CustomStateMachine flags did only prevent the final transformation but allowed to value simplification to go ahead. Differential Revision: https://reviews.llvm.org/D107862	2021-08-11 00:49:53 -05:00
Craig Topper	a8ae41fb51	[SelectionDAGBuilder] Save iterator to avoid second DenseMap lookup. NFC We were calling find and then using operator[]. Instead keep the iterator from find and use it to get the value. Just happened to notice while investigating how we decide what extends to use between basic blocks.	2021-08-10 22:37:48 -07:00
Mircea Trofin	510402c2c8	[NFC][MLGO] 'Use' variable used for asserts	2021-08-10 19:55:17 -07:00
Christopher Di Bella	c874dd5362	[llvm][clang][NFC] updates inline licence info Some files still contained the old University of Illinois Open Source Licence header. This patch replaces that with the Apache 2 with LLVM Exception licence. Differential Revision: https://reviews.llvm.org/D107528	2021-08-11 02:48:53 +00:00
Hongtao Yu	68d6c3e448	[CSSPGO] Additional cleanup as a follow-up to D107838	2021-08-10 19:04:20 -07:00
Hongtao Yu	78523516bc	[CSSPGO] Do not use getCanonicalFnName in pseudo probe descriptor decoding Pseudo probe descriptors are created very early in the pipeline where function names just come from the front end and are not yet decorated. So calling getCanonicalFnName on the function names in probe desc is basically a no-op, which also addes a depenency from MC to ProfileData unnessesarily. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D107838	2021-08-10 18:24:39 -07:00
Amara Emerson	7ec4ce157b	[AArch64][GlobalISel] Relax oneuse restriction for PTR_ADD chain combining to check addressing legality. With contributions by Sebastian Neubauer Differential Revision: https://reviews.llvm.org/D105676	2021-08-10 16:41:18 -07:00
Sushma Unnibhavi	7bdce6bcbd	[M68k][GloballSel] RegBankSelect implementation Implementation of RegBankSelect for the M68k backend. Differential Revision: https://reviews.llvm.org/D107542	2021-08-10 15:24:43 -07:00
Adrian Prantl	a353edb8d6	Simplify coro::salvageDebugInfo() (NFC-ish) This patch removes the hand-rolled implementation of salvageDebugInfo for cast and GEPs and replaces it with a call into llvm::salvageDebugInfoImpl(). A side-effect of this is that additional redundant convert operations are introduced, but those don't have any negative effect on the resulting DWARF expression. rdar://80227769 Differential Revision: https://reviews.llvm.org/D107384	2021-08-10 15:21:18 -07:00
Adrian Prantl	d6b6880172	Streamline the API of salvageDebugInfoImpl (NFC) This patch refactors / simplifies salvageDebugInfoImpl(). The goal here is to simplify the implementation of coro::salvageDebugInfo() in a followup patch. 1. Change the return value to I.getOperand(0). Currently users of salvageDebugInfoImpl() assume that the first operand is I.getOperand(0). This patch makes this information explicit. A nice side-effect of this change is that it allows us to salvage expressions such as add i8 1, %a in the future. 2. Factor out the creation of a DIExpression and return an array of DIExpression operations instead. This change allows users that call salvageDebugInfoImpl() in a loop to avoid the costly creation of temporary DIExpressions and to defer the creation of a DIExpression until the end. This patch does not change any functionality. rdar://80227769 Differential Revision: https://reviews.llvm.org/D107383	2021-08-10 15:21:18 -07:00
Nikita Popov	17db125b48	[MemCpyOpt] Optimize MemoryDef insertion When converting a store into a memset, we currently insert the new MemoryDef after the store MemoryDef, which requires all uses to be renamed to the new def using a whole block scan. Instead, we can insert the new MemoryDef before the store and not rename uses, because we know that the location is immediately overwritten, so all uses should still refer to the old MemoryDef. Those uses will get renamed when the old MemoryDef is actually dropped, which is efficient. I expect something similar can be done for some of the other MSSA updates in MemCpyOpt. This is an alternative to D107513, at least for this particular case. Differential Revision: https://reviews.llvm.org/D107702	2021-08-10 21:28:29 +02:00
Thomas Johnson	b821086876	[ARC] Add codegen for count trailing zeros intrinsic for the ARC backend Differential Revision: https://reviews.llvm.org/D107828	2021-08-10 12:07:35 -07:00
Fangrui Song	76093b1739	[InlineAdvisor] Add single quotes around caller/callee names Clang diagnostics refer to identifier names in quotes. This patch makes inline remarks conform to the convention. New behavior: ``` % clang -O2 -Rpass=inline -Rpass-missed=inline -S a.c a.c:4:25: remark: 'foo' inlined into 'bar' with (cost=-30, threshold=337) at callsite bar:0:25; [-Rpass=inline] int bar(int a) { return foo(a); } ^ ``` Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D107791	2021-08-10 11:51:31 -07:00
Sanjay Patel	b267d3ce8d	[InstCombine] avoid infinite loops from min/max canonicalization The intrinsics have an extra chunk of known bits logic compared to the normal cmp+select idiom. That allows folding the icmp in each case to something better, but that then opposes the canonical form of min/max that we try to form for a select. I'm carving out a narrow exception to preserve all existing regression tests while avoiding the inf-loop. It seems unlikely that this is the only bug like this left, but this should fix: https://llvm.org/PR51419	2021-08-10 14:42:37 -04:00
Jinsong Ji	2cfd427626	[AIX] Don't crash on unimplemented lowerRelativeReference We may call lowerRelativeReference in MC to determine whether target supports this lowering. We should return nullptr instead of crashing when we haven't implemented the real lowering. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D107830	2021-08-10 17:43:06 +00:00
Matt Arsenault	d719f1c3cc	AMDGPU: Add alloc priority to global ranges The requested register class priorities weren't respected globally. Not sure why this is a target option, and not just the expected behavior (recently added in `1a6dc92be7`). This avoids an allocation failure when many wide tuple spills are introduced. I think this is a workaround since I would not expect the allocation priority to be required, and only a performance hint. The allocator should be smarter about when only a subregister needs to be spilled and restored. This does regress a couple of degenerate store stress lit tests which shouldn't be too important.	2021-08-10 13:12:34 -04:00
Matt Arsenault	1b41945da0	RegAllocGreedy: Add spaces between registers in debug message	2021-08-10 13:12:34 -04:00
Craig Topper	6f5edc3487	[RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y)) Similar for sub except sub isn't commutative. Modify the existing and/or/xor folds to also work on ISD::SELECT and not just RISCVISD::SELECT_CC. This is needed to make sure we do this transform before type legalization turns i32 add/sub into add/sub+sign_extend_inreg on RV64. If we don't do this before that, the sign_extend_inreg will still be after the select. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107603	2021-08-10 09:02:56 -07:00
Sanjay Patel	e260e10c4a	[InstSimplify] fold min/max with limit constant This is already done within InstCombine: https://alive2.llvm.org/ce/z/MiGE22 ...but leaving it out of analysis makes it harder to avoid infinite loops there.	2021-08-10 10:57:25 -04:00
Sanjay Patel	188832f419	Revert "[InstSimplify] fold min/max with limit constant; NFC" This reverts commit `f43859b437`. This is not NFC, so I'll try again without that mistake in the commit message.	2021-08-10 10:50:09 -04:00
Sanjay Patel	f43859b437	[InstSimplify] fold min/max with limit constant; NFC This is already done within InstCombine: https://alive2.llvm.org/ce/z/MiGE22 ...but leaving it out of analysis makes it harder to avoid infinite loops there.	2021-08-10 10:43:07 -04:00
David Green	013030a0b2	[AArch64] Correct sinking of shuffles to adds/subs This was checking extends as shuffles, where as we should be checking the operands. This helps sink the shuffles, creating more addl/subl instructions. Differential Revision: https://reviews.llvm.org/D107623	2021-08-10 13:25:42 +01:00
Tim Northover	5ad0860899	AArch64: support @llvm.va_copy in GISel	2021-08-10 13:11:03 +01:00
Konstantin Schwarz	64bef13f08	[GlobalISel] Look through truncs and extends in narrowScalarShift If a G_SHL is fed by a G_CONSTANT, the lower and upper bits of the source can be shifted individually by the constant shift amount. However in case the shift amount came from a G_TRUNC(G_CONSTANT), the generic shift legalization code was used, producing intermediate shifts that are potentially illegal on some targets. This change teaches narrowScalarShift to look through G_TRUNCs and G_*EXTs. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D89100	2021-08-10 13:49:22 +02:00
Carl Ritson	a1783b54e8	[SimpifyCFG] Remove recursion from FoldCondBranchOnPHI. NFCI. Avoid stack overflow errors on systems with small stack sizes by removing recursion in FoldCondBranchOnPHI. This is a simple change as the recursion was only iteratively calling the function again on the same arguments. Ideally this would be compiled to a tail call, but there is no guarantee. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107803	2021-08-10 19:14:31 +09:00
David Green	c140ff493e	[ARM] Change a couple of instances of LiveRegs.contains to !LiveRegs.available This changes a couple of calls to LiveRegs.contains to !LiveRegs.available, one in Thumb1FrameLoweringInfo (which modifies a test to look more correct to me, given r7 should be the frame pointer so is not available), and another in the ARMLoadStoreOptimizer, that I don't have a test for, it was just found by inspection. Differential Revision: https://reviews.llvm.org/D107454	2021-08-10 09:53:26 +01:00
Tony Tye	53eb469195	[AMDGPU] Support non-strictly stronger memory orderings in SIMemoryLegalizer C++20 no longer requires the failure memory ordering to be no stronger than the success memory ordering. Adjust assert in AMD GPU SIMemoryLegalizer, and merge instruction memory orderings Add common operation to merge memory orders that allows non strict memory orderings to be combined. Use it in SIMemoryLegalizer and MachineMemOperand::getMergedOrdering. Reviewed By: efriedma, rampitec Differential Revision: https://reviews.llvm.org/D106729	2021-08-10 08:43:03 +00:00
David Sherwood	ce394161cb	[InstCombine] Add more complex folds for extractelement + stepvector I have updated cheapToScalarize to also consider the case when extracting lanes from a stepvector intrinsic. This required removing the existing 'bool IsConstantExtractIndex' and passing in the actual index as a Value instead. We do this because we need to know if the index is <= known minimum number of elements returned by the stepvector intrinsic. Effectively, when extracting lane X from a stepvector we know the value returned is also X. New tests added here: Transforms/InstCombine/vscale_extractelement.ll Differential Revision: https://reviews.llvm.org/D106358	2021-08-10 09:17:21 +01:00
Cullen Rhodes	81f057c253	[AArch64][SVE] NFC: Remove unused p0-p7 with element size predicates Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D107752	2021-08-10 07:56:22 +00:00
David Sherwood	8439415333	[IR] Let ConstantVector::getSplat use poison instead of undef This patch updates ConstantVector::getSplat to use poison instead of undef when using insertelement/shufflevector to splat. This follows on from D93793. Differential Revision: https://reviews.llvm.org/D107751	2021-08-10 08:27:43 +01:00
Wang, Pengfei	6f7f5b54c8	[X86] AVX512FP16 instructions enabling 1/6 1. Enable FP16 type support and basic declarations used by following patches. 2. Enable new instructions VMOVW and VMOVSH. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105263	2021-08-10 12:46:01 +08:00
Usman Nadeem	5420fc4a27	[AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend Replace vector unpack operation with a scalar extend operation. unpack(splat(X)) --> splat(extend(X)) If we have both, unpkhi and unpklo, for the same vector then we may save a register in some cases, e.g: Hi = unpkhi (splat(X)) Lo = unpklo(splat(X)) --> Hi = Lo = splat(extend(X)) Differential Revision: https://reviews.llvm.org/D106929 Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b	2021-08-09 14:58:54 -07:00
Usman Nadeem	85bbc05154	[AArch64][SVE][InstCombine] Move last{a,b} before binop if one operand is a splat value Move the last{a,b} operation to the vector operand of the binary instruction if the binop's operand is a splat value. This essentially converts the binop to a scalar operation. Example: // If x and/or y is a splat value: lastX (binop (x, y)) --> binop(lastX(x), lastX(y)) Differential Revision: https://reviews.llvm.org/D106932 Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0	2021-08-09 14:48:41 -07:00
Eli Friedman	ac20e56911	[AArch64] Implement FCOPYSIGN for SVE. I was originally going to try to implement this in target-independent code, but it's actually sort of tricky to generate the correct sequence for vectors like nxv2f32. So just stick this in target-specific code, at least for now. Differential Revision: https://reviews.llvm.org/D107608	2021-08-09 12:06:48 -07:00
Arnold Schwaighofer	b987c283ae	[coro] Correct CurrentBlock tracking bug recently introduced We use the CurrentBlock to determine whether we have already processed a block. Don't reuse this variable for setting where we should insert the rematerialization. The rematerialization block is different to the current block when we rematerialize for coro suspend block users. Differential Revision: https://reviews.llvm.org/D107573	2021-08-09 10:41:41 -07:00
Bradley Smith	73ecb9987b	[AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter The patterns for fixed length gather/scatter with 32-bit offsets and 64-bit memory type are slightly different that the rest of the patterns, as such the lowering needs to be slightly different to ensure the correct types are used. Differential Revision: https://reviews.llvm.org/D107576	2021-08-09 14:05:22 +00:00
Jeremy Morse	d4ce9e463d	[DWARF] Revert sharing subprograms across CUs This patch is a revert of `e08f205f5c`. In that patch, DW_TAG_subprograms were permitted to be referenced across CU boundaries, to improve stack trace construction using call site information. Unfortunately, as documented in PR48790, the way that subprograms are "owned" by dwarf units is sufficiently complicated that subprograms end up in unexpected units, invalidating cross-unit references. There's no obvious way to easily fix this, and several attempts have failed. Revert this to ensure correct DWARF is always emitted. Three tests change in addition to the reversion, but they're all very light alterations. Differential Revision: https://reviews.llvm.org/D107076	2021-08-09 12:43:43 +01:00
Fraser Cormack	2b4a1d4b86	[RISCV] Improve codegen for shuffles with LHS/RHS splats Shuffles which are broken into separate halves reveal splats in which a half is accessed via one index; such operations can be optimized to use "vrgather.vi". This optimization could be achieved by adding extra patterns to match `vrgather_vv_vl` which uses a splat as an index operand, but this patch instead identifies splat earlier. This way, future optimizations can build on top of the data gathered here, e.g., to splat-gather dominant indices and insert any leftovers. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107449	2021-08-09 10:31:40 +01:00
Luo, Yuanke	53642d5b80	[NFC] Fix the formula for reciprocal calculation. Differential Revision: https://reviews.llvm.org/D107713	2021-08-09 16:03:56 +08:00
Min-Yih Hsu	7cbcde4aa3	[M68k] Use separate asm operand class for different widths of address This could help asm parser to pick the correct variant of instruction. This patch also migrated all the control instructions MC tests.	2021-08-09 00:07:19 -07:00
Min-Yih Hsu	cf277f0b31	[M68k][NFC] Coalesce render methods in different asm register op class And assign RegClass (i.e. operand class for all GPR) as the super class of ARegClass and DRegClass. Note that this is a NFC change because actually we already had XRDReg to model either address or data register operands (as well as test coverage for it). The new super class syntax added here is just making the relations between three RegClass-es more explicit.	2021-08-09 00:07:19 -07:00
Cullen Rhodes	1a18bb9270	[AArch64] NFC: Remove DecodeVectorRegisterClass from disassembler The decoder function and table are the same as FPR128, use that instead. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107644	2021-08-09 06:52:47 +00:00
Simon Atanasyan	990e8025b5	[MC][ELF] Do not error on parsing .debug_* section directive for MIPS MIPS .debug_* sections should have SHT_MIPS_DWARF section type to distinguish among sections contain DWARF and ECOFF debug formats, but in assembly files these sections have SHT_PROGBITS (@progbits) type. Now assembler shows 'changed section type for ...' error when parsing `.section .debug_*,"",@progbits` directive for MIPS targets. The same problem exists for x86-64 target and this patch extends workaround implemented in D76151. The patch adds one more case when assembler ignores section types mismatch after `SwitchSection()` call. Differential Revision: https://reviews.llvm.org/D107707	2021-08-09 08:54:56 +03:00
Christudasan Devadasan	fcf2d5f402	Revert "SROA: Enhance speculateSelectInstLoads" This reverts commit `ffc3fb665d`.	2021-08-09 01:13:39 -04:00
Michael Liao	b5e470aa2e	[LowerMemIntrinsics] Typo fix.	2021-08-08 22:38:58 -04:00
Craig Topper	2f3b738960	[RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64. This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.	2021-08-08 18:30:48 -07:00
Craig Topper	88bc29f5f2	[RISCV] Introduce a RISCV CondCode enum instead of using ISD:SET* in MIR. NFC Previously we converted ISD condition codes to integers and stored them directly in our MIR instructions. The ISD enum kind of belongs to SelectionDAG so that seems like incorrect layering. This patch instead uses a CondCode node on RISCV::SELECT_CC until isel and then converts it from ISD encoding to a RISCV specific value. This value can be converted to/from the RISCV branch opcodes in the RISCV namespace. My larger motivation is to possibly support a microarchitectural feature of some CPUs where a short forward branch over a single instruction can be predicated internally. This will require a new pseudo instruction for select that needs to carry a branch condition and live probably until RISCVExpandPseudos. At that point it can be expanded to control flow without other instructions ending up in the predicated basic block. Using an ISD encoding in RISCVExpandPseudos doesn't seem like correct layering. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107400	2021-08-08 17:25:37 -07:00
Craig Topper	20dfe051ab	[RISCV] Move the $rs operand of PseudoStore from outs to ins. NFC This is the data to be stored so it should be an input. To keep operand order similar between loads and stores, move the temp register to the first dest operand of floating point loads. Rework the assembler code accordingly. This doesn't have any functional effect because this Pseudo is only used by the assembler which doesn't use ins/outs. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107309	2021-08-08 15:58:24 -07:00
Kazu Hirata	d9c9d13365	[DWARF] Remove collectChildrenAddressRanges (NFC) The last use was removed on Dec 21, 2018 in commit `c3f30a7fc6`.	2021-08-08 08:57:32 -07:00
Dorit Nuzman	67278b8a90	[LV] Support Interleaved Store Group With Gaps Teach LV to use masked-store to support interleave-store-group with gaps (instead of scatters/scalarization). The symmetric case of using masked-load to support interleaved-load-group with gaps was introduced a while ago, by https://reviews.llvm.org/D53668; This patch completes the store-scenario leftover from D53668, and solves PR50566. Reviewed by: Ayal Zaks Differential Revision: https://reviews.llvm.org/D104750	2021-08-08 10:32:02 +03:00
Min-Yih Hsu	657bb7262d	[M68k] Separate ADDA from ADD and migrate rest of the arithmetic MC tests Previously ADD & ADDA (as well as SUB & SUBA) instructions are mixed together, which not only violated Motorola assembly's syntax but also made asm parsing more difficult. This patch separates these two kinds of instructions migrate rest of the tests from test/CodeGen/M68k/Encoding/Arithmetic to test/MC/M68k/Arithmetic. Note that we observed minor regressions on codegen quality: Sometimes isel uses ADD instead of ADDA even the latter can lead to shorter sequence of code. This issue implies that some isel patterns might need to be updated.	2021-08-07 17:19:12 -07:00
Craig Topper	d4ee84ceee	[RISCV] Support FP_TO_S/UINT_SAT for i32 and i64. The fcvt fp to integer instructions saturate if their input is infinity or out of range, but the instructions produce a maximum integer for nan instead of 0 required for the ISD opcodes. This means we can use the instructions to do the saturating conversion, but we'll need to fix up the nan case at the end. We can probably improve the i8 and i16 default codegen as well, but I'll leave that for a follow up. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107230	2021-08-07 16:06:00 -07:00
Nikita Popov	88003cea1c	[MemCpyOpt] Remove MemDepAnalysis-based implementation The MemorySSA-based implementation has been enabled for a few months (since D94376). This patch drops the old MDA-based implementation entirely. I've kept this to only the basic cleanup of dropping various conditions -- the code could be further cleaned up now that there is only one implementation. Differential Revision: https://reviews.llvm.org/D102113	2021-08-07 22:35:44 +02:00
Krishna	a9a176ca3b	[InstCombine] Remove nnan requirement for transformation to fabs from select In this patch, the "nnan" requirement is removed for the canonicalization of select with fcmp to fabs. (i) FSub logic: Remove check for nnan flag presence in fsub. Example: https://alive2.llvm.org/ce/z/751svg (fsub). (ii) FNeg logic: Remove check for the presence of nnan and nsz flag in fneg. Example: https://alive2.llvm.org/ce/z/a_fsdp (fneg). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D106872	2021-08-07 22:38:45 +05:30
Craig Topper	24dfba8d50	[X86] Teach shouldSinkOperands to recognize pmuldq/pmuludq patterns. The IR for pmuldq/pmuludq intrinsics uses a sext_inreg/zext_inreg pattern on the inputs. Ideally we pattern match these away during isel. It is possible for LICM or other middle end optimizations to separate the extend from the mul. This prevents SelectionDAG from removing it or depending on how the extend is lowered, we may not be able to generate an AssertSExt/AssertZExt in the mul basic block. This will prevent pmuldq/pmuludq from being formed at all. This patch teaches shouldSinkOperands to recognize this so that CodeGenPrepare will clone the extend into the same basic block as the mul. Fixes PR51371. Differential Revision: https://reviews.llvm.org/D107689	2021-08-07 08:45:56 -07:00
Roman Lebedev	0a241e90d4	[NFC][InstCombine] `vector_reduce_xor(?ext(<n x i1>))` --> `?ext(vector_reduce_add(<n x i1>))` Instead of expanding it ourselves, we can just forward to `?ext(vector_reduce_add(<n x i1>))`, as per alive2: https://alive2.llvm.org/ce/z/ymz7zE (self) https://alive2.llvm.org/ce/z/eKu2v2 (skipped zext) https://alive2.llvm.org/ce/z/c3BXgc (skipped sext)	2021-08-07 17:31:33 +03:00
Roman Lebedev	c6ff867f92	[NFC][InstCombine] Simplify emitted IR for `vector_reduce_xor(?ext(<n x i1>))` Now that we canonicalize low bit splatting to the form we were emitting here ourselves, emit simpler IR that will be canonicalized later. See `1e801439be` for proofs: https://alive2.llvm.org/ce/z/MjCm5W (self) https://alive2.llvm.org/ce/z/kgqF4M (skipped zext) https://alive2.llvm.org/ce/z/pgy3HP (skipped sext)	2021-08-07 17:31:24 +03:00
Roman Lebedev	e71870512f	[InstCombine] Prefer `-(x & 1)` as the low bit splatting pattern (PR51305) Both patterns are equivalent (https://alive2.llvm.org/ce/z/jfCViF), so we should have a preference. It seems like mask+negation is better than two shifts.	2021-08-07 17:25:28 +03:00
Christudasan Devadasan	ffc3fb665d	SROA: Enhance speculateSelectInstLoads Allow the folding even if there is an intervening bitcast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106667	2021-08-07 09:09:14 -04:00
Florian Hahn	a00aafc30d	[VPlan] Iterate over phi recipes to detect reductions to fix. After refactoring the phi recipes, we can now iterate over all header phis in a VPlan to detect reductions when it comes to fixing them up when tail folding. This reduces the coupling with the cost model & legal by using the information directly available in VPlan. It also removes a call to getOrAddVPValue, which references the original IR value which may become outdated after VPlan transformations. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100102	2021-08-07 14:06:50 +01:00
Amara Emerson	4c2e01232c	[GlobalISel] Fix a combine causing DBG_VALUE with dangling vregs. We should use MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() instead of eraseFromParent(). We should probably use that in other places too but fix this issue which affects clang bootstrap builds for now.	2021-08-07 01:41:02 -07:00
Nemanja Ivanovic	62fe3dcf98	Fix PPC buildbot break caused by `4c4093e6e3` This commit adds the isnan intrinsic and provides a default expansion for it in the SDAG. However, it makes the assumption that types it operates on are IEEE-compliant types. This is not always the case. An example of that is PPC "double double" which has a representation that - Does not need to conform to IEEE requirements for isnan as it is not an IEEE-compliant type - Does not have a representation that allows for straightforward reinterpreting as an integer and use of integer operations The result was that this commit broke __builtin_isnan for ppc_fp128 making many valid numeric values report a NaN. This patch simply changes the expansion to always expand to unordered comparison (regardless of whether FP exceptions are tracked). This is inline with previous semantics.	2021-08-06 22:10:20 -05:00
Steffen Larsen	1b4c85fc02	[NVPTX] Add NVPTX intrinsics for CUDA PTX 6.5 ldmatrix instructions Adds NVPTX intrinsics for the CUDA PTX `ldmatrix.sync.aligned` instructions added in PTX 6.5. PTX ISA description of `ldmatrix.sync.aligned`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Reviewed By: tra Differential Revision: https://reviews.llvm.org/D107046	2021-08-06 16:13:35 -07:00
Sanjay Patel	0369714b31	[InstCombine] reduce vector casting before icmp There may be some generalizations (see test comments) of these patterns, but this should handle the cases motivated by: https://llvm.org/PR51315 https://llvm.org/PR51259 The backend may want to transform differently, but at least for the x86 examples that I looked at, there does not appear to be any significant perf diff either way.	2021-08-06 17:09:38 -04:00
Michael Liao	05783e1cfe	[amdgpu] Revise the conversion from i64 to f32. - Replace 'cmp+sel' with 'umin' if possible. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107507	2021-08-06 17:01:47 -04:00
Amara Emerson	2b067e3335	Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG. DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.	2021-08-06 12:57:53 -07:00
Thomas Johnson	f8a4495149	[ARC] Add codegen for llvm.ctlz intrinsic for the ARC backend Differential Revision: https://reviews.llvm.org/D107611	2021-08-06 12:18:06 -07:00
Artem Belevich	6a9cf21f5a	[CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. Attempt to enable MemCpyOpt unconditionally in D104801 uncovered the fact that there are users that do not expect LLVM to materialize `memset` intrinsic. While other passes can do that, too, MemCpyOpt triggers it more frequently and breaks sanitizers and some downstream users. For now introduce a flag to force-enable the flag and opt-in only CUDA compilation with NVPTX back-end. Differential Revision: https://reviews.llvm.org/D106401	2021-08-06 11:13:52 -07:00
Zheng Chen	30b0c455b1	[LoopCacheAnalysis]: handle mismatch type for Numerator and CacheLineSize fix an assertion due to mismatch type for Numerator and CacheLineSize in loop cache analysis pass. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D107618	2021-08-06 16:51:09 +00:00
Jon Roelofs	eae4a44c1d	[GlobalISel][KnownBits] Implement G_CTPOP Implementation copied almost verbatim from ValueTracking. Differential revision: https://reviews.llvm.org/D107606	2021-08-06 09:48:39 -07:00
Michael Liao	d1cacd5928	[MemCpyOpt] Teach memcpyopt to handle loads from the constant memory. - Loads from the constant memory (either explicit one or as the source of memory transfer intrinsics) won't alias any stores. Reviewed By: asbirlea, efriedma Differential Revision: https://reviews.llvm.org/D107605	2021-08-06 12:43:52 -04:00
Craig Topper	b2ca4dc935	[LegalizeTypes] Add a simple expansion for SMULO when a libcall isn't available. This isn't optimal, but prevents crashing when the libcall isn't available. It just calculates the full product and makes sure the high bits match the sign of the low half. Each of the pieces should go through their own type legalization. This can make D107420 unnecessary. Needs tests, but I wanted to start discussion about D107420. Reviewed By: FreddyYe Differential Revision: https://reviews.llvm.org/D107581	2021-08-06 09:43:01 -07:00
David Green	77e8f4eeee	[ARM] Define ComplexPatternFuncMutatesDAG Some of the Arm complex pattern functions call canExtractShiftFromMul, which can modify the DAG in-place. For this to be valid and handled successfully we need to define ComplexPatternFuncMutatesDAG. Differential Revision: https://reviews.llvm.org/D107476	2021-08-06 17:35:11 +01:00
Kazu Hirata	276be84d0a	[CodeGen] Remove computeDefOperandLatency (NFC) The last use was removed on Oct 9, 2016 in commit `5c924d7117`.	2021-08-06 08:26:55 -07:00
Jay Foad	57b9107e3f	[GlobalISel] Improve widening of cttz/cttz_zero_undef Differential Revision: https://reviews.llvm.org/D107631	2021-08-06 14:25:56 +01:00
Mircea Trofin	ae1a2a09e4	[NFC][MLGO] Make logging more robust 1) add some self-diagnosis (when asserts are enabled) to check that all features have the same nr of entries 2) avoid storing pointers to mutable fields because the proto API contract doesn't actually guarantee those stay fixed even if no further mutation of the object occurs. Differential Revision: https://reviews.llvm.org/D107594	2021-08-06 04:44:52 -07:00
Reshabh Sharma	5173854f19	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-06 15:53:33 +05:30
Simon Pilgrim	dbce6a8d9d	[ARM] Fold insert_subvector to concat_vectors D107068 fixed the same problem on aarch64 but the arm variant wasn't exposed in existing test coverage. I've copied the arm64-neon-copy tests (and stripped the intrinsic test from it) for testing on arm neon builds as well.	2021-08-06 11:21:31 +01:00
Simon Pilgrim	18e6a03b1a	[X86][AVX] Extract SUBV_BROADCAST constant bits from just the lower subvector range (PR51281) As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width. The getTargetConstantBitsFromNode was assuming that the Constant would the same size as the subvector, resulting in the incorrect packing of the per-element bits data. This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest bits from Constant of that subvector bitsize. Differential Revision: https://reviews.llvm.org/D107158	2021-08-06 11:21:31 +01:00
Cullen Rhodes	08bc441174	[AArch64] NFC: drop unnecessary llvm:: namespace prefix on MCInst	2021-08-06 09:23:18 +00:00
David Sherwood	3fd96e1b2e	[LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform too. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intriniscs: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284	2021-08-06 10:13:15 +01:00
Chuanqi Xu	0fd03feb4b	[FuncSpec] Return changed if function is changed by tryToReplaceWithConstant The may get changed before specialization by RunSCCPSolver. In other words, the pass may change the function without specialization happens. Add test and comment to reveal this. And it may return No Changed if the function get changed by RunSCCPSolver before the specialization. It looks like a potential bug. Test Plan: check-all Reviewed By: https://reviews.llvm.org/D107622 Differential Revision: https://reviews.llvm.org/D107622	2021-08-06 17:00:17 +08:00
David Sherwood	43a5c750d1	Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors" This reverts commit `95800da914`.	2021-08-06 09:48:16 +01:00
Jay Foad	83610d4eb0	[AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107474	2021-08-06 09:40:48 +01:00
Jay Foad	24b67a9024	[AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef We can improve on the generic splitting by using ffbh/ffbl, which have a defined result when the input is zero. Differential Revision: https://reviews.llvm.org/D107442	2021-08-06 09:40:48 +01:00
Jay Foad	d77b43c385	[AMDGPU][GlobalISel] Add G_AMDGPU_FFBL_B32 This is the counterpart to G_AMDGPU_FFBH_U32 which already exists. These instructions have a defined result of -1 when the input is zero. Differential Revision: https://reviews.llvm.org/D107441	2021-08-06 09:40:48 +01:00
Jay Foad	cd2594e1c6	[GlobalISel] Improve legalization of narrow CTTZ Differential Revision: https://reviews.llvm.org/D107457	2021-08-06 09:40:48 +01:00
Chuanqi Xu	62fc3e0ad6	[NFC] [FuncSpec] Remove unused variables in isArgumentInteresting	2021-08-06 16:38:20 +08:00
Chuanqi Xu	cc3f40bb41	[FuncSpec] Move invariant computation for spec cost out of loop (NFC-ish) Noticed that the computation for function specialization cost of a function wouldn't change during the traversal of the arguments for the function. We could hoist the computation out of the traversal. I observed about ~1% improvement on compile time for spec2017. But I guess it may not be precise. This should be NFC and fine. Reviewed By: Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D107621	2021-08-06 15:43:05 +08:00
Serge Pavlov	4c4093e6e3	Introduce intrinsic llvm.isnan This is recommit of the patch `16ff91ebcc`, reverted in `0c28a7c990` because it had an error in call of getFastMathFlags (base type should be FPMathOperator but not Instruction). The original commit message is duplicated below: Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-06 14:32:27 +07:00
Florian Hahn	3e58dd19df	[LV] Move reduction PHI node fixup to VPlan::execute (NFC). All information to fix-up the reduction phi nodes in the vectorized loop is available in VPlan now. This patch moves the code to do so, to make this clearer. Fixing up the loop exit value still relies on other information and remains outside of VPlan for now. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100113	2021-08-06 08:29:20 +01:00
Chuanqi Xu	82ca845b47	[NFC] [FuncSpec] Update the Todo list for recursive functions Now the recursive functions may get specialized many times when `func-specialization-max-iters` increases. See discussion in https://reviews.llvm.org/D106426 for details.	2021-08-06 14:43:17 +08:00
Amara Emerson	4fee756c75	Delete copy-ctor of MachineFrameInfo. I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo() without declaring the variable as a reference. Deleting the copy-constructor also showed a place in the ARM backend which was doing the same thing, albeit it didn't impact correctness there from the looks of it.	2021-08-05 23:24:37 -07:00
Kai Luo	666ee849f0	[PowerPC] Fix shift amount of xxsldwi when performing vector int_to_double POC ``` // main.c #include <stdio.h> #include <altivec.h> extern vector double foo(vector int s); int main() { vector int s = {0, 1, 0, 4}; vector double vd; vd = foo(s); printf("%lf %lf\n", vd[0], vd[1]); return 0; } // poc.c vector double foo(vector int s) { int x1 = s[1]; int x3 = s[3]; double d1 = x1; double d3 = x3; vector double x = { d1, d3 }; return x; } ``` Compiled with `poc.c main.c -mcpu=pwr8 -O3` on BE machine. Current clang gives ``` 4.000000 1.000000 ``` while xlc gives ``` 1.000000 4.000000 ``` Xlc's output should be correct. Reviewed By: shchenz, #powerpc Differential Revision: https://reviews.llvm.org/D107428	2021-08-06 06:01:29 +00:00
Arthur Eubanks	a1b21ed3fb	[GCov] Emit memset instead of stores in __llvm_gcov_reset For a very large module, __llvm_gcov_reset can become very large. __llvm_gcov_reset previously emitted stores to a bunch of globals in one huge basic block. MemCpyOpt would turn many of these stores into memsets, and updating MemorySSA would be extremely slow. Verified that this makes the compile time of certain files go down drastically (20min -> 5min). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107538	2021-08-05 22:40:15 -07:00
Serge Bazanski	7ece20505f	[Lanai] fix lowering wide returns This implements LanaiTargetLowering::CanLowerReturn, thereby ensuring all return values conform to the RetCC and get sret-demoted as necessary. A regression test is also added that exercises this functionality. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D107086	2021-08-05 21:08:09 -07:00
Jinsong Ji	6f84d94b9c	[PowerPC] Fix copy/paste error in scalar_to_vector patterns https://reviews.llvm.org/D100478 refactoring added a copy/paste error for v8i16 patterns. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D107609	2021-08-06 02:59:01 +00:00
Jessica Paquette	e6a3944ea9	[AArch64][GlobalISel] Overhaul G_INSERT legalization Similar cleanup to G_EXTRACT (`51bd4e874f`). Also swap the order of clamp/widen to avoid unnecessary complex merges. Add a bunch of missing testcases to legalize-inserts while we're at it. Differential Revision: https://reviews.llvm.org/D107601	2021-08-05 18:28:22 -07:00
Jessica Paquette	562c8e14d9	[AArch64][GlobalISel] Widen G_IMPLICIT_DEF and G_FREEZE before clamping Similar to other cleanup commits which widen instructions before clamping during legalization. Purpose of this is to avoid weird type breakdowns. In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions. The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so this can help with emitting merges elsewhere. Differential Revision: https://reviews.llvm.org/D107604	2021-08-05 18:21:14 -07:00
Sean Fertile	23651c5ae0	[PowerPC][AIX] Create multiple constant sections. Fixes issue where late materialized constants can be more strictly aligned then their containing csect. Differential Revision: https://reviews.llvm.org/D103103	2021-08-05 21:19:16 -04:00
Jon Roelofs	5fc7b1a260	Revert "[GlobalISel][KnownBits] Implement G_CTPOP" This reverts commit `ce6eb4f15a`. It's broken on the windows bots: https://reviews.llvm.org/D107606#2930121	2021-08-05 17:47:47 -07:00
Jon Roelofs	ce6eb4f15a	[GlobalISel][KnownBits] Implement G_CTPOP Implementation copied almost verbatim from ValueTracking. Differential revision: https://reviews.llvm.org/D107606	2021-08-05 17:17:29 -07:00
Ryan Prichard	623cf3dfdf	Mark getc_unlocked as unavailable by default Before D45736, getc_unlocked was available by default, but turned off for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked functions, which were unavailable by default, but it also: * left getc_unlocked enabled by default, * removed the disabling line for Windows, and * added code to enable getc_unlocked for GNU, Android, and OSX. For consistency, make getc_unlocked unavailable by default. Maybe this was the intent of D45736 anyway. Reviewed By: MaskRay, efriedma Differential Revision: https://reviews.llvm.org/D107527	2021-08-05 16:35:02 -07:00
Jessica Paquette	8a557d8311	[AArch64][GlobalISel] Widen extloads before clamping during legalization Allows us to avoid awkward type breakdowns on types like s88, like the other commits. Differential Revision: https://reviews.llvm.org/D107587	2021-08-05 16:14:06 -07:00
Stanislav Mekhanoshin	d71924fbfe	[AMDGPU] Improve v2i32/v2f32 insertelt patterns Using REG_SEQUENCE produces better code than INSERT_SUBREG, we can omit one move instruction in many cases. Fixes: SWDEV-298028 Differential Revision: https://reviews.llvm.org/D107602	2021-08-05 16:13:39 -07:00
Heejin Ahn	41ba39dfcd	[WebAssembly] Don't do SjLj transformation when there's only setjmp When there is a `setjmp` call in a function, we transform every callsite of `setjmp` to record its information by calling `saveSetjmp` function, and we also transform every callsite of a function that can longjmp to to check if a longjmp occurred and if so jump to the corresponding post-setjmp BB. Currently we are doing this for every function that contains a call to `setjmp`, but if there is no other function call within that function that can longjmp, this transformation of `setjmp` callsite and all the preparation of `setjmpTable` in the entry of the function are not necessary. This checks if a setjmp-calling function has any other calls that can longjmp, and if not, skips the function for the SjLj transformation. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D107530	2021-08-05 15:28:02 -07:00
David Green	649cf4514d	[AArch64] Expand the SVE min/max reduction costs to NEON This takes the existing SVE costing for the various min/max reduction intrinsics and expands it to NEON, where I believe it applies equally well. In the process it changes the lowering to use min/max cost, as opposed to summing up the cost of ICmp+Select. Differential Revision: https://reviews.llvm.org/D106239	2021-08-05 23:23:24 +01:00
Jessica Paquette	36498374d4	[AArch64][GlobalISel] Widen G_BSWAP before clamping This allows us to avoid odd type breakdowns + allows us to legalize types like s88 in the first place. Add some testcases for known legal types + testcases for s4 and s88. Differential Revision: https://reviews.llvm.org/D107607	2021-08-05 15:16:00 -07:00
Jessica Paquette	51bd4e874f	[AArch64][GlobalISel] Overhaul G_EXTRACT legalization This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly changing this because it should make it easier to improve legalization for instructions which use G_EXTRACT as part of the legalization process. This also adds support for legalizing some weird types. Similar to other recent legalizer changes, this changes the order of widening/clamping. There was some dead code in our existing rules (e.g. the p0 case would never get hit), so this knocks those out and makes the types we want to handle explicit. This also removes some checks which, nowadays, are handled by the MachineVerifier. Differential Revision: https://reviews.llvm.org/D107505	2021-08-05 13:55:15 -07:00
Jon Roelofs	98f38c151b	[AArch64][GlobalISel] Legalize ctpop s128 This is re-landing the same patch again, but without the changes to LegalizerHelper that regressed the Mips test: test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll Differential revision: https://reviews.llvm.org/D106494	2021-08-05 11:54:53 -07:00
Chris Jackson	113a06f7a5	{DebugInfo][LSR] Don't cache dbg.value that are already undef The SCEV-based salvaging method caches dbg.value information pre-LSR so that salvaging may be attempted post-LSR. If the dbg.value are already undef pre-LSR then a salvage attempt would be fruitless, so avoid caching them. Reviewed By: StephenTozer Differential Revision: https://reviews.llvm.org/D107448	2021-08-05 19:16:43 +01:00
Roman Lebedev	c0586ff05d	[NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef` to it is confusing. Instead, just move the `Mask` vector higher up, and change the code that earlier had no access to it but now does to use `Mask` instead of `BaseMask`. This has no other intentional changes. This is a recommit of `35c0848b57`, that was reverted to simplify reversion of an earlier change.	2021-08-05 20:37:51 +03:00
Ramesh Peri	976bd23612	[llvm-ar] Fix for handling thin archive with SYM64 and a test case for it WHen thin archives are created which have symbol table of type SYM64 then all the tools will not work since they cannot read the files properly. One can reproduce the problem as follows: 1. Take a hello world program and create an archive out of it. The SYM64_THRESHOLD=0 will force the generation of SYM64 symbol table. clang -c hello.cpp SYM64_THRESHOLD=0 llvm-ar crsT mylib.a hello.o 2. Now try to use any of the tools on this mylib.a and it will fail. llvm-nm -M mylib.a THis fix will eliminate these failures. A regression test is created in llvm/test/Object/archive-symtab.test Reviewed By: MaskRay, Ramesh Differential Revision: https://reviews.llvm.org/D107322	2021-08-05 10:06:34 -07:00
Benjamin Kramer	bd17ced1db	Revert "[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats" This reverts commits `f819e4c7d0` and `35c0848b57`. It triggers an infinite loop during compilation. $ cat t.ll target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" define void @MaxPoolGradGrad_1.65() local_unnamed_addr #0 { entry: %wide.vec78 = load <64 x i32>, <64 x i32>* null, align 16 %strided.vec83 = shufflevector <64 x i32> %wide.vec78, <64 x i32> poison, <8 x i32> <i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60> %0 = lshr <8 x i32> %strided.vec83, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16> %1 = add <8 x i32> zeroinitializer, %0 %2 = shufflevector <8 x i32> %1, <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15> %3 = shufflevector <16 x i32> %2, <16 x i32> undef, <32 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31> %interleaved.vec = shufflevector <32 x i32> undef, <32 x i32> %3, <64 x i32> <i32 0, i32 8, i32 16, i32 24, i32 32, i32 40, i32 48, i32 56, i32 1, i32 9, i32 17, i32 25, i32 33, i32 41, i32 49, i32 57, i32 2, i32 10, i32 18, i32 26, i32 34, i32 42, i32 50, i32 58, i32 3, i32 11, i32 19, i32 27, i32 35, i32 43, i32 51, i32 59, i32 4, i32 12, i32 20, i32 28, i32 36, i32 44, i32 52, i32 60, i32 5, i32 13, i32 21, i32 29, i32 37, i32 45, i32 53, i32 61, i32 6, i32 14, i32 22, i32 30, i32 38, i32 46, i32 54, i32 62, i32 7, i32 15, i32 23, i32 31, i32 39, i32 47, i32 55, i32 63> store <64 x i32> %interleaved.vec, <64 x i32>* undef, align 16 unreachable } $ llc < t.ll -mcpu=skylake <hang>	2021-08-05 18:58:08 +02:00
Jessica Paquette	f3f3098afe	[AArch64][GlobalISel] Mark v16s8 <- v8s8, v8s8 G_CONCAT_VECTOR as legal G_CONCAT_VECTORS shows up from time to time when legalizing other instructions. We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it as legal gives us selection for free. Differential Revision: https://reviews.llvm.org/D107512	2021-08-05 09:40:46 -07:00
Kazu Hirata	72661f337a	[Transforms] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-08-05 08:53:17 -07:00
Alexey Bataev	e7c3eaa8ae	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form indentity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107494	2021-08-05 08:41:24 -07:00
Craig Topper	f7076cfd3a	[DAGCombiner][RISCV][AMDGPU] Call SimplifyDemandedBits at the end of visitMULHU to enable known bits contant folding. We don't have real demanded bits support for MULHU, but we can still use the known bits based constant folding support at the end of SimplifyDemandedBits to simplify a MULHU. This helps with cases where we know the LHS and RHS have enough leading zeros so that the high multiply result is always 0. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D106471	2021-08-05 08:31:26 -07:00
David Sherwood	e9177b0958	Fix build issues caused by `95800da914`	2021-08-05 16:26:34 +01:00
Sander de Smalen	3e47f009ff	[LV] Consider ExtractValue as uniform. Since all operands to ExtractValue must be loop-invariant when we deem the loop vectorizable, we can consider ExtractValue to be uniform. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107286	2021-08-05 16:20:50 +01:00
eopXD	fd7f6a3c81	[NFC][LoopIdiom] rename boolean variable NegStride to IsNegStride Rename variable for better code readability. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107570	2021-08-05 23:11:42 +08:00
Jay Foad	2b63933115	[AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107566	2021-08-05 15:57:40 +01:00
Jay Foad	e6c364a624	[AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107546	2021-08-05 15:57:40 +01:00
Momchil Velikov	f171149e0d	[SimpifyCFG] Speculate a store preceded by a local non-escaping load In SimplifyCFG we may simplify the CFG by speculatively executing certain stores, when they are preceded by a store to the same location. This patch allows such speculation also when the stores are similarly preceded by a load. In order for this transformation to be correct we need to ensure that the memory location is writable and the store in the new location does not introduce a data race. Local objects (created by an `alloca` instruction) are always writable, so once we are past a read from a location it is valid to also write to that same location. Seeing just a load does not guarantee absence of a data race (unlike if we see a store) - the load may still be part of a race, just not causing undefined behaviour (cf. https://llvm.org/docs/Atomics.html#optimization-outside-atomic). In the original program, a data race might have been prevented by the condition, but once we move the store outside the condition, we must be sure a data race wasn't possible anyway, no matter what the condition evaluates to. One way to be sure that a local object is never concurrently read/written is check that its address never escapes the function. Hence this transformation is restricted to local, non-escaping objects. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D107281	2021-08-05 15:54:42 +01:00
Simon Pilgrim	2cbf9fd402	[DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes. This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector. This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector. Fixes PR50053 Differential Revision: https://reviews.llvm.org/D107068	2021-08-05 15:40:48 +01:00
Florian Hahn	38b098be66	[VectorCombine] Limit scalarization known non-poison indices. We can only trust the range of the index if it is guaranteed non-poison. Fixes PR50949. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107364	2021-08-05 15:36:31 +01:00
Dawid Jurczak	06206a8cd1	[BuildLibCalls][NFC] Remove redundant attribute list from emitCalloc Additionally with this patch aligned DSE which is the only user of emitCalloc. Differential Revision: https://reviews.llvm.org/D103523	2021-08-05 16:18:38 +02:00
David Sherwood	95800da914	[LoopVectorize] Add support for replication of more intrinsics with scalable vectors This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform too. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intriniscs: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284	2021-08-05 15:17:27 +01:00
Dawid Jurczak	f8cdde7195	[SimplifyLibCalls][NFC] Clean up LibCallSimplifier from 'memset + malloc into calloc' transformation FoldMallocMemset can be safely removed because since https://reviews.llvm.org/D103009 such transformation is already performed in DSE. Differential Revision: https://reviews.llvm.org/D103451	2021-08-05 16:08:32 +02:00
Krzysztof Parzyszek	d0c3b61498	Delay initialization of OptBisect When LLVM is used in other projects, it may happen that global cons- tructors will execute before the call to ParseCommandLineOptions. Since OptBisect is initialized via a constructor, and has no ability to be updated at a later time, passing "-opt-bisect-limit" to the parse function may have no effect. To avoid this problem use a cl::cb (callback) to set the bisection limit when the option is actually processed. Differential Revision: https://reviews.llvm.org/D104551	2021-08-05 09:04:17 -05:00
Bardia Mahjour	0e08891ec1	[DA] control compile-time spent by MIV tests Function exploreDirections() in DependenceAnalysis implements a recursive algorithm for refining direction vectors. This algorithm has worst-case complexity of O(3^(n+1)) where n is the number of common loop levels. In this patch I'm adding a threshold to control the amount of time we spend in doing MIV tests (which most of the time end up resulting in over pessimistic direction vectors anyway). Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D107159	2021-08-05 09:50:11 -04:00
Sander de Smalen	8d08a84745	[LV] Remove a change that was added in D106164. This change wasn't strictly necessary for D106164 and could be removed. This patch addresses the post-commit comments from @fhahn on D106164, and also changes sve-widen-gep.ll to use the same IR test as shown in pointer-induction.ll. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106878	2021-08-05 14:44:53 +01:00
Paul Robinson	75aa3d520d	Add a DIExpression const-folder to prevent silly expressions. It's entirely possible (because it actually happened) for a bool variable to end up with a 256-bit DW_AT_const_value. This came about when a local bool variable was initialized from a bitfield in a 32-byte struct of bitfields, and after inlining and constant propagation, the variable did have a constant value. The sequence of optimizations had it carrying "i256" values around, but once the constant made it into the llvm.dbg.value, no further IR changes could affect it. Technically the llvm.dbg.value did have a DIExpression to reduce it back down to 8 bits, but the compiler is in no way ready to emit an oversized constant and a DWARF expression to manipulate it. Depending on the circumstances, we had either just the very fat bool value, or an expression with no starting value. The sequence of optimizations that led to this state did seem pretty reasonable, so the solution I came up with was to invent a DWARF constant expression folder. Currently it only does convert ops, but there's no reason it couldn't do other ops if that became useful. This broke three tests that depended on having convert ops survive into the DWARF, so I added an operator that would abort the folder to each of those tests. Differential Revision: https://reviews.llvm.org/D106915	2021-08-05 06:14:40 -07:00
Petar Avramovic	66de26b1f9	GlobalISel: Fix matchEqualDefs for instructions with multiple defs Instructions that produceSameValue produce same values for operands with same index. matchEqualDefs used to return true for any two values from different instructions that produce same values. Fix this by checking if values are defined by operands with the same index. Differential Revision: https://reviews.llvm.org/D107362	2021-08-05 15:05:45 +02:00
Simon Pilgrim	e78bf49a58	[X86] Rename Subtarget Tuning Feature Flag Prefix. NFC. As suggested on D107370, this patch renames the tuning feature flags to start with 'Tuning' instead of 'Feature'. Differential Revision: https://reviews.llvm.org/D107459	2021-08-05 13:09:23 +01:00
Dominik Montada	cc947e29ea	[GlobalISel] Combine shr(shl x, c1), c2 to G_SBFX/G_UBFX Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107330	2021-08-05 13:52:10 +02:00
Fraser Cormack	0b8471e91b	[SelectionDAG] Correctly determine the VECREDUCE_SEQ_FMUL action The LegalizeAction for this node should follow the logic for `VECREDUCE_SEQ_FADD` and be determined using the vector operand's type. here isn't an in-tree target that makes use of this, but I think it's safe to say this is how it should behave, should a target want to customize the action for this node. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107478	2021-08-05 09:42:33 +01:00
Jay Foad	e790b2b744	[AMDGPU] Make more use of getHiHalf64 and split64BitValue. NFCI.	2021-08-05 09:36:13 +01:00
Igor Kudrin	2c14798ead	[ARM][llvm-objdump] Annotate PC-relative memory operands of VLDR instructions This extends D105979 and adds support for VLDR instructions. Differential Revision: https://reviews.llvm.org/D105980	2021-08-05 14:11:11 +07:00
Igor Kudrin	ddbe812bcc	[ARM][llvm-objdump] Annotate PC-relative memory operands This implements `MCInstrAnalysis::evaluateMemoryOperandAddress()` for Arm so that the disassembler can print the target address of memory operands that use PC+immediate addressing. Differential Revision: https://reviews.llvm.org/D105979	2021-08-05 14:11:11 +07:00
eopXD	26aa1bbe97	[NFCI] [LoopIdiom] Let processLoopStridedStore take StoreSize as SCEV instead of unsigned Letting it take SCEV allows further modification on the function to optimize if the StoreSize / Stride is runtime determined. This is a preceeding of D107353. The big picture is to let LoopIdiom deal with runtime-determined sizes. Reviewed By: Whitney, lebedev.ri Differential Revision: https://reviews.llvm.org/D104595	2021-08-05 13:21:48 +08:00
Heejin Ahn	aa0b0fbbe6	[WebAssembly] Use `SDValue::getConstantOperandVal` (NFC) Reviewed By: tlively Differential Revision: https://reviews.llvm.org/D107499	2021-08-04 21:15:23 -07:00
Nathan Lanza	5848166369	Disable LibFuncs for stpcpy and stpncpy for Android < 21 These functions don't exist in android API levels < 21. A change in llvm-12 (rG6dbf0cfcf789) caused Oz builds to emit this symbol assuming it's available and thus is causing link errors. Simply disable it here. Differential Revision: https://reviews.llvm.org/D107509	2021-08-04 22:48:41 -04:00
Matt Jacobson	75abeb64ce	[AVR] emit 'MCSA_Global' references to '__do_global_ctors' and '__do_global_dtors' Emit references to '__do_global_ctors' and '__do_global_dtors' to allow constructor/destructor routines to run. Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D107133	2021-08-05 10:37:36 +08:00
modimo	041b525141	[CSSPGO] Remove used of PseudoProbeAttributes::Reserved D106861 added usage of PseudoProbeAttributes::Reserved as TailCall however this usage hasn't been committed/reviewed. Removing this usage. Testing ninja check-all Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D107514	2021-08-04 17:23:56 -07:00
Yonghong Song	e52946b9ab	BPF: avoid NE/EQ loop exit condition Kuniyuki Iwashima reported in [1] that llvm compiler may convert a loop exit condition with "i < bound" to "i != bound", where "i" is the loop index variable and "bound" is the upper bound. In case that "bound" is not a constant, verifier will always have "i != bound" true, which will cause verifier failure since to verifier this is an infinite loop. The fix is to avoid transforming "i < bound" to "i != bound". In llvm, the transformation is done by IndVarSimplify pass. The compiler checks loop condition cost (i = i + 1) and if the cost is lower, it may transform "i < bound" to "i != bound". This patch implemented getArithmeticInstrCost() in BPF TargetTransformInfo class to return a higher cost for such an operation, which will prevent the transformation for the test case added in this patch. [1] https://lore.kernel.org/netdev/1994df05-8f01-371f-3c3b-d33d7836878c@fb.com/ Differential Revision: https://reviews.llvm.org/D107483	2021-08-04 16:54:16 -07:00
Jessica Paquette	ca2e053652	[AArch64][GlobalISel] Legalize wide vector G_PHIs Clamp the max number of elements when legalizing G_PHI. This allows us to legalize some common fallbacks like 4 x s64. Here's an example: https://godbolt.org/z/6YocsEYTd Had to add -global-isel-abort=0 to legalize-phi.mir to account for the G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI. Differential Revision: https://reviews.llvm.org/D107508	2021-08-04 16:48:59 -07:00
Heejin Ahn	31a71a393f	[WebAssembly] Make result of 'catch' inst variadic `catch` instruction can have any number of result values depending on its tag, but so far we have only needed a single i32 return value for C++ exception so the instruction was specified that way. But using the instruction for SjLj handling requires multiple return values. This makes `catch` instruction's results variadic and moves selection of `throw` and `catch` instruction from ISelLowering to ISelDAGToDAG. Moving `catch` to ISelDAGToDAG is necessary because I am not aware of a good way to do instruction selection for variadic output instructions in TableGen. This also moves `throw` because 1. `throw` and `catch` share the same utility function and 2. there is really no reason we should do that in ISelLowering in the first place. What we do is mostly the same in both places, and moving them to ISelDAGToDAG allows us to remove unnecessary mid-level nodes for `throw` and `catch` in WebAssemblyISD.def and WebAssemblyInstrInfo.td. This also adds handling for new `catch` instruction to AsmTypeCheck. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D107423	2021-08-04 14:05:33 -07:00
Fangrui Song	9c19b36f1c	[X86] Remove -x86-experimental-pref-loop-alignment in favor of -align-loops	2021-08-04 13:23:57 -07:00
Fangrui Song	a194438615	[CodeGen] Add -align-loops to `lib/CodeGen/CommandFlags.cpp`. It can replace -x86-experimental-pref-loop-alignment=. The loop alignment is only used by MachineBlockPlacement. The implementation uses a new `llvm::TargetOptions` for now, as an IR function attribute/module flags metadata may be overkill. This is the llvm part of D106701.	2021-08-04 12:45:18 -07:00
Michael Liao	5edc886e90	[amdgpu] Add an enhanced conversion from i64 to f32. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D107187	2021-08-04 15:33:12 -04:00
Nikita Popov	bb15861e14	[MemCpyOpt] Relax libcall checks Rather than blocking the whole MemCpyOpt pass if the libcalls are not available, only disable creation of new memset/memcpy intrinsics where only load/stores were used previously. This only affects the store merging and load-store conversion optimization. Other optimizations are derived from existing intrinsics, which are well-defined in the absence of libcalls -- not having the libcalls just means that call simplification won't convert them to intrinsics. This is a weaker variation of D104801, which dropped these checks entirely. Ideally we would not couple emission of intrinsics to libcall availability at all, but as the intrinsics may be legalized to libcalls we need to be a bit careful right now. Differential Revision: https://reviews.llvm.org/D106769	2021-08-04 21:17:51 +02:00
Giorgis Georgakoudis	29a3e3dd7b	[OpenMPOpt] Expand SPMDization with guarding for target parallel regions This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, thus must be not be guarded. This check is implemented using the ParallelLevels AA. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D106892	2021-08-04 11:49:24 -07:00
Craig Topper	c23405174a	[DAGCombiner][AMDGPU] Canonicalize constants to the RHS of MULHU/MULHS. This allows special constants like to 0 to be recognized. It's also expected by isel patterns if a target had a mulh with immediate instructions. The commuting done by tablegen won't commute patterns with immediates since it expects DAGCombine to have done it. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D107486	2021-08-04 11:39:23 -07:00
Alexey Bataev	214f99b27c	Revert "[SLP]Do not emit extra shuffle for insertelements vectorization." This reverts commit `871ea69803` to fix the problem if the first vector is not just undef.	2021-08-04 11:28:59 -07:00
Reshabh Sharma	dce35ef104	Revert "[AMDGPU] Handle functions in llvm's global ctors and dtors list" This reverts commit `d42e70b3d3`.	2021-08-04 23:33:31 +05:30
Dawid Jurczak	238139be09	[DSE][NFC] Clean up DeadStoreElimination from unused variables Differential Revision: https://reviews.llvm.org/D106446	2021-08-04 19:44:40 +02:00
Craig Topper	643ce70a64	[RISCV] Remove the _COMMUTABLE and _TA versions of FMA and wide FMA vector instructions. Use a tail policy operand instead. Inspired by the work in D105092, but without the intrinsic interface changes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106512	2021-08-04 10:39:50 -07:00
Jessica Paquette	d9279843b1	[AArch64][GlobalISel] Widen G_PHI before clamping it during legalization This allows us to handle weird types like s88; we first widen to s128, then clamp back down to s64. https://godbolt.org/z/9xqbP46Mz Also this makes it possible for GISel to legalize the case in pr48188.ll. It now does the same thing as SDAG, although regalloc chooses different registers. Differential Revision: https://reviews.llvm.org/D107417	2021-08-04 10:25:14 -07:00
Jessica Paquette	7d97de60b3	[AArch64][GlobalISel] Widen G_FPTO*I before clamping Going through our legalization rules and doing some cleanup. Widening and then clamping is usually easier than clamping and then widening. This allows us to legalize some weird types like s88. Differential Revision: https://reviews.llvm.org/D107413	2021-08-04 10:19:26 -07:00
Petr Hosek	6660cec568	[InstrProfiling] Emit bias variable eagerly Rather than emitting the bias variable lazily as needed, emit it eagerly. This allows profile runtime to refer to this variable unconditionally without having to use the weak reference. The bias variable is in a COMDAT so there'll never be more than one instance, and if it's not needed, linker should be able to GC it, so the overhead should be minimal. Differential Revision: https://reviews.llvm.org/D107377	2021-08-04 10:17:08 -07:00
Andrea Di Biagio	7a1a35a1d1	[X86][SchedModel] Add missing ReadAdvance for some arithmetic ops (PR51318 and PR51322). This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in the RM/MR variants of ADC/SBB (PR51318) This also fixes the absence of ReadAdvance for the register operand of RMW arithmetic instructions (PR51322). Differential Revision: https://reviews.llvm.org/D107367	2021-08-04 17:50:22 +01:00
jamesluox	ee7d20e846	[CSSPGO] Migrate and refactor the decoder of Pseudo Probe Migrate pseudo probe decoding logic in llvm-profgen to MC, so other LLVM-base program could reuse existing codes. Redesign object layout of encoded and decoded pseudo probes. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D106861	2021-08-04 09:21:34 -07:00
Sander de Smalen	fe6ae81ef3	[InstCombine] Fix vscale zext/sext optimization when vscale_range is unbounded. According to the LangRef, a (vscale_range) value of 0 means unbounded. This patch additionally cleans up the test file vscale_sext_and_zext.ll.	2021-08-04 17:17:37 +01:00
Bradley Smith	d9cc5d84e4	[AArch64][SVE] Combine bitcasts of predicate types with vector inserts/extracts of loads/stores An insert subvector that is inserting the result of a vector predicate sized load into undef at index 0, whose result is casted to a predicate type, can be combined into a direct predicate load. Likewise the same applies to extract subvector but in reverse. The purpose of this optimization is to clean up cases that will be introduced in a later patch where casts to/from predicate types from i8 types will use insert subvector, rather than going through memory early. This optimization is done in SVEIntrinsicOpts rather than InstCombine to re-introduce scalable loads as late as possible, to give other optimizations the best chance possible to do a good job. Differential Revision: https://reviews.llvm.org/D106549	2021-08-04 15:51:14 +00:00
Simon Wallis	9269752671	[AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults Don't know how to custom expand this UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788 The fix is to provide missing expansions for: case ISD::STRICT_FP_TO_UINT: case ISD::STRICT_FP_TO_SINT: A test case is provided. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107452	2021-08-04 16:18:19 +01:00
Chris Jackson	21ee38e24f	[DebugInfo][LSR] Avoid crashes on large integer inputs SCEV-based salvaging in LSR translates SCEVs to DIExpressions. SCEVs may contain very large integers but the translation does not support integers greater than 64 bits. This patch adds checks to ensure conversions of these large integers is not attempted. A regression test is added to ensure no such translation is attempted. Reviewed by: StephenTozer PR: https://bugs.llvm.org/show_bug.cgi?id=51329 Differential Revision: https://reviews.llvm.org/D107438	2021-08-04 15:51:22 +01:00
Reshabh Sharma	d42e70b3d3	[AMDGPU] Handle functions in llvm's global ctors and dtors list This patch introduces a new code object metadata field, ".kind" which is used to add support for init and fini kernels. HSAStreamer will use function attributes, "device-init" and "device-fini" to distinguish between init and fini kernels from the regular kernels and will emit metadata with ".kind" set to "init" and "fini" respectively. To reduce the number of init and fini kernels, the ctors and dtors present in the llvm's global.ctors and global.dtors lists are called from a single init and fini kernel respectively. Reviewed by: yaxunl Differential Revision: https://reviews.llvm.org/D105682	2021-08-04 19:53:33 +05:30
Roman Lebedev	35c0848b57	[NFC][X86] combineX86ShuffleChain(): hoist Mask variable higher up Having `NewMask` outside of an if and rebinding `BaseMask` `ArrayRef` to it is confusing. Instead, just move the `Mask` vector higher up, and change the code that earlier had no access to it but now does to use `Mask` instead of `BaseMask`. This has no other intentional changes.	2021-08-04 17:15:12 +03:00
Roman Lebedev	916cdc3d4b	[NFC][X86] combineX86ShuffleChain(): rename inner Mask to avoid future shadowing I want to hoist `Mask` variable higher up, but then it would clash with this one. So let's rename this one first. There are no other intentional changes here other than said rename.	2021-08-04 17:15:12 +03:00
Tomas Matheson	40650f27b5	[ARM][atomicrmw] Fix CMP_SWAP_32 expand assert This assert is intended to ensure that the high registers are not selected when it is passed to one of the thumb UXT instructions. However it was triggering even for 32 bit where no UXT instruction is emitted. Fixes PR51313. Differential Revision: https://reviews.llvm.org/D107363	2021-08-04 15:02:02 +01:00
Roman Lebedev	f819e4c7d0	[X86] combineX86ShuffleChain(): canonicalize mask elts picking from splats Given a shuffle mask, if it is picking from an input that is splat given the current granularity of the shuffle, then adjust the mask to pick from the same lane of the input as the mask element is in. This may result in a shuffle being simplified into a blend. I believe this is correct given that the splat detection matches the one just above the new code, My basic thought is that we might be able to get less regressions by handling multiple insertions of the same value into a vector if we form broadcasts+blend here, as opposed to D105390, but i have not really thought this through, and did not try implementing it yet. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107009	2021-08-04 16:55:04 +03:00
David Green	eeddcba525	[RDA] Attempt to make RDA subreg aware This attempts to make more of RDA aware of potentially overlapping subregisters. Some of this was already in place, with it iterating through MCRegUnitIterators. This also replaces calls to LiveRegs.contains(..) with !LiveRegs.available(..), and updates the isValidRegUseOf and isValidRegDefOf to search subregs. Differential Revision: https://reviews.llvm.org/D107351	2021-08-04 14:21:32 +01:00
Simon Pilgrim	8cd40ece70	[X86] Rename X86 tuning feature flag FeatureHasFastGather -> FeatureFastGather Match the naming style used by the other 'FeatureFast/FeatureSlow' tuning flags.	2021-08-04 13:07:50 +01:00
Simon Pilgrim	17e8ac0703	[X86] Move FeatureFastBEXTR from bdver2 features to tuning Noticed while looking at the feature flag renaming suggested in D107370	2021-08-04 13:07:49 +01:00
Serge Pavlov	0c28a7c990	Revert "Introduce intrinsic llvm.isnan" This reverts commit `16ff91ebcc`. Several errors were reported mainly test-suite execution time. Reverted for investigation.	2021-08-04 17:18:15 +07:00
Simon Pilgrim	fc8dee1ebb	[X86] Split Subtarget ISA / Security / Tuning Feature Flags Definitions. NFC Our list of slow/fast tuning feature flags has become pretty extensive and is randomly interleaved with ISA and Security (Retpoline etc.) flags, not even based on when the ISAs/flags were introduced, making it tricky to locate them. Plus we started treating tuning flags separately some time ago, so this patch tries to group the flags to match. I've left them mostly in the same order within each group - I'm happy to rearrange them further if there are specific ISA or Tuning flags that you think should be kept closer together. Differential Revision: https://reviews.llvm.org/D107370	2021-08-04 11:16:36 +01:00
Tim Northover	d7b0e5525a	X86: fix frame offset calculation with mandatory tail calls If there's a region of the stack reserved for potential tail call arguments (only the case when we guarantee tail calls will be honoured), this is right next to the incoming stored return address, not necessarily next to the callee-saved area, so combining the two into a single figure leads to incorrect offsets in some edge cases.	2021-08-04 10:02:42 +01:00
Serge Pavlov	16ff91ebcc	Introduce intrinsic llvm.isnan Clang has builtin function '__builtin_isnan', which implements C library function 'isnan'. This function now is implemented entirely in clang codegen, which expands the function into set of IR operations. There are three mechanisms by which the expansion can be made. * The most common mechanism is using an unordered comparison made by instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It however is not suitable if floating point exceptions are tracked. Corresponding IEEE 754 operation and C function must never raise FP exception, even if the argument is a signaling NaN. Compare instructions usually does not have such property, they raise 'invalid' exception in such case. So this mechanism is unsuitable when exception behavior is strict. In particular it could result in unexpected trapping if argument is SNaN. * Another solution was implemented in https://reviews.llvm.org/D95948. It is used in the cases when raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers one solution for all targets, however some can do the check in more efficient way. * Solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target specific code into IR. Now only SystemZ implements this hook and it generates a call to target specific intrinsic function. Although these mechanisms allow to implement 'isnan' with enough efficiency, expanding 'isnan' in clang has drawbacks: * The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. It complicates analysis and can prevent some optimizations. * IR can be created by tools other than clang, in this case treatment of 'isnan' has to be duplicated in that tool. Another issue with the current implementation of 'isnan' comes from the use of options '-ffast-math' or '-fno-honor-nans'. If such option is specified, 'fcmp uno' may be optimized to 'false'. It is valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false': std::isnan(std::numeric_limits<float>::quiet_NaN()) The options '-ffast-math' and '-fno-honor-nans' imply that FP operation operands are never NaNs. This assumption however should not be applied to the functions that check FP number properties, including 'isnan'. If such function returns expected result instead of actually making checks, it becomes useless in many cases. The option '-ffast-math' is often used for performance critical code, as it can speed up execution by the expense of manual treatment of corner cases. If 'isnan' returns assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, like making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion, that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by IEEE-754 and C standards in target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where is lowered in target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model' so the resulting code satisfies requested semantics. Differential Revision: https://reviews.llvm.org/D104854	2021-08-04 15:27:49 +07:00
Sjoerd Meijer	30fbb06979	[FuncSpec] Support specialising recursive functions This adds support for specialising recursive functions. For example: int Global = 1; void recursiveFunc(int arg) { if (arg < 4) { print(arg); recursiveFunc(arg + 1); } } void main() { recursiveFunc(&Global); } After 3 iterations of function specialisation, followed by inlining of the specialised versions of recursiveFunc, the main function looks like this: void main() { print(1); print(2); print(3); } To support this, the following has been added: - Update the solver and state of the new specialised functions, - An optimisation to propagate constant stack values after each iteration of function specialisation, which is necessary for the next iteration to recognise the constant values and trigger. Specialising recursive functions is (at the moment) controlled by option -func-specialization-max-iters and is opt-in for compile-time reasons. I.e., the default is -func-specialization-max-iters=1, but for the example above we would need to use -func-specialization-max-iters=3. Future work is to see if we can increase the default, or improve the cost-model/heuristics to control compile-times. Differential Revision: https://reviews.llvm.org/D106426	2021-08-04 08:07:04 +01:00

... 3 4 5 6 7 ...

149978 Commits