llvm-project

Commit Graph

Author	SHA1	Message	Date
Jordan Rupprecht	c90f15d25a	[NFC] Fix unused var in release build	2020-09-01 13:05:56 -07:00
Florian Hahn	0d966ae4b2	[Loads] Add canReplacePointersIfEqual helper. This patch adds an initial, incomplete and unsound implementation of canReplacePointersIfEqual to check if a pointer value A can be replaced by another pointer value B, that are deemed to be equivalent through some means (e.g. information from conditions). Note that it is in general not sound to blindly replace pointers based on equality, for example if they are based on different underlying objects. LLVM's memory model is not completely settled as of now; see https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed discussion. The initial version of canReplacePointersIfEqual only rejects a very specific case: replacing a pointer with a constant expression that is not dereferenceable. Such a replacement is problematic and can be restricted relatively easily without impacting most code. Using it to limit replacements in GVN/SCCP/CVP only results in small differences in 7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto. This patch is supposed to be an initial step to improve the current situation and the helper should be made stricter in the future. But this will require careful analysis of the impact on performance. Reviewed By: aqjune Differential Revision: https://reviews.llvm.org/D85524	2020-09-01 20:57:41 +01:00
Aaron Liu	d7e16ca28f	[LV] Interleave to expose ILP for small loops with scalar reductions. Interleave small loops that have reductions inside, which breaks dependencies and exposes ILP. This gives very significant performance improvements for some benchmarks, because small loops can be in very hot functions in real applications. Differential Revision: https://reviews.llvm.org/D81416	2020-09-01 19:47:32 +00:00
Craig Topper	4783e2c9c6	[MachineCopyPropagation] In isNopCopy, check the destination registers match in addition to the source registers. Previously if the source matched we asserted that the destination matched. But GPR <-> mask register copies on X86 can violate this since we use the same K-registers for multiple sizes. Fixes this ISPC issue https://github.com/ispc/ispc/issues/1851 Differential Revision: https://reviews.llvm.org/D86507	2020-09-01 12:44:32 -07:00
Arthur Eubanks	96f0b57568	[Bindings] Add LLVMAddInstructionSimplifyPass Reviewed By: sroland Differential Revision: https://reviews.llvm.org/D86764	2020-09-01 12:38:49 -07:00
Owen Anderson	5987da8764	Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"" This reverts commit `bc9a29b9ee`. The reasoning that this patch was wrong was itself incorrect (see discussion on llvm-commits). This patch does seem to be exposing a latent SVE code generation bug on non-public tests, which should not block a correctness fix for public, non-SVE use cases.	2020-09-01 19:29:03 +00:00
Alina Sbirlea	c292fba46f	[MemorySSA] Update phi map with replacement value.	2020-09-01 11:56:40 -07:00
Sean Fertile	fecc27db11	[PowerPC][AIX] Update save/restore offset for frame and base pointers. General purpose registers 30 and 31 are handled differently when they are reserved as the base-pointer and frame-pointer respectively. This fixes the offset of their fixed-stack objects when there are fpr callee-saved registers. Differential Revision: https://reviews.llvm.org/D85850	2020-09-01 14:13:05 -04:00
Craig Topper	96ae43bad5	[Bitstream] Use alignTo to make code more readable. NFC I was recently debugging a similar issue to https://reviews.llvm.org/D86500 only with a large metadata section. Only after I finished debugging it did I discover it was fixed very recently. My version of the fix was going to use alignTo since that uses uint64_t and improves the readability of the code. So I thought I would go ahead and share it. Differential Revision: https://reviews.llvm.org/D86957	2020-09-01 11:06:45 -07:00
Amara Emerson	5ded444252	[AArch64][GlobalISel] Optimize away a Not feeding a brcond by using tbz instead of tbnz. Usually brconds are fed by compares, but not always, in which case we would miss this fold. Differential Revision: https://reviews.llvm.org/D86413	2020-09-01 11:06:06 -07:00
Amara Emerson	8ad8f484b6	[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _) This is needed for an upcoming change to how we translate conditional branches which might generate these. Differential Revision: https://reviews.llvm.org/D86383	2020-09-01 10:57:17 -07:00
Eric Astor	a57fdcdd40	x87 FPU state instructions do not use an f32 memory location These instructions actually use a 512-byte location, where bytes 464-511 are ignored. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86942	2020-09-01 13:50:07 -04:00
Matt Arsenault	32a8a10b42	GlobalISel: Implement computeNumSignBits for G_SELECT	2020-09-01 12:50:19 -04:00
Matt Arsenault	35c94d3f7e	GlobalISel: Port smarter known bits for umin/umax from DAG	2020-09-01 12:50:15 -04:00
Matt Arsenault	759482ddaa	GlobalISel: Implement computeKnownBits for G_BSWAP and G_BITREVERSE	2020-09-01 12:49:57 -04:00
Qiu Chaofan	29ae448595	[PowerPC] Handle STRICT_FSETCC(S) in more cases On -O0, i1 strict_fsetcc will be promoted to i32. We don't handle that in TD patterns. This patch fills logic in PPCISelDAGToDAG to handle more cases. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D86595	2020-09-02 00:33:21 +08:00
Sam Tebbs	15e880a04f	[DAGCombiner] Fold an AND of a masked load into a zext_masked_load This patch folds an AND of a masked load and build vector into a zero extended masked load. Differential Revision: https://reviews.llvm.org/D86789	2020-09-01 17:02:07 +01:00
Amy Kwan	ca2227c1b3	[PowerPC] Implement instruction definitions/MC Tests for xvcvspbf16 and xvcvbf16spn This patch adds the td instruction definitions of the xvcvspbf16 and xvcvbf16spn instructions, along with their respective MC tests. Differential Revision: https://reviews.llvm.org/D86794	2020-09-01 10:59:43 -05:00
Volkan Keles	061182b7ba	GlobalISel: Add combines for extend operations https://reviews.llvm.org/D86516	2020-09-01 08:50:06 -07:00
Matt Arsenault	9e7e1b2d4b	GlobalISel: Implement computeNumSignBits for G_SEXTLOAD/G_ZEXTLOAD	2020-09-01 11:20:02 -04:00
Matt Arsenault	92090e8bd8	GlobalISel: Implement computeKnownBits for G_UNMERGE_VALUES	2020-09-01 11:19:27 -04:00
Paul Walker	bc9a29b9ee	Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain" This reverts commit `e9d9a61208`. This patch was previously reverted by `04879086b4` with the reapplication being done after breaking the assert used to ensure SP is always 16-byte aligned, which is a requirement of the AAPCS. For extra context the latest patch caused runtime failures when building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".	2020-09-01 16:09:37 +01:00
Matt Arsenault	18bbd9f15e	GlobalISel: Artifact combine unmerge of unmerge Unmerges have the same fundamental problem as G_TRUNC, and G_TRUNC could be implemented in terms of G_UNMERGE_VALUES. Reducing the number of elements in unmerge results ends up producing the original unmerge type profile, so the artifact combiner needs to eliminate the intermediate illegal registers. This avoids infinite looping in the legalizer in a future change. Assuming an unmerge has each result unmerged the same way, this ends up producing a new unmerge of the source for every definition. I'm not sure if the artifact combiner should either insert temporary merges here and erase the original merge, or if the combiner should look at uses from defs rather than defs from uses for unmerges. In a few cases this regresses from using 16-bit shifts for 8-bit values to using 32-bit shifts, but I think these can be legalized later (the other legalization rules don't try very hard to use 16-bit shifts either).	2020-09-01 11:01:33 -04:00
Anh Tuyen Tran	68717acb24	[LoopIdiomRecognizePass] Options to disable part or the entire Loop Idiom Recognize Pass Loop Idiom Recognize Pass (LIRP) attempts to transform loops with subscripted arrays into memcpy/memset function calls. In some situations, this transformation has a negative impact. For example: https://bugs.llvm.org/show_bug.cgi?id=47300 This patch enables users to disable a particular part of the transformation while still benefiting from the rest of LIRP. The default behavior stays unchanged: no part of LIRP is disabled by default. Reviewed By: etiotto (Ettore Tiotto) Differential Revision: https://reviews.llvm.org/D86262	2020-09-01 13:59:24 +00:00
Raphael Isemann	5ffd940ac0	Reland [FileCheck] Move FileCheck implementation out of LLVMSupport into its own library This relands `e9a3d1a401` which was originally missing linking LLVMSupport into LLVMFileCheck which broke the SHARED_LIBS build. Original summary: The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a good reason for having FileCheck implemented there as it has a very specific use while LLVMSupport is a dependency of pretty much every LLVM tool there is. In fact, the only use of FileCheck I could find (outside the FileCheck tool and the FileCheck unit test) is a single call in GISelMITest.h. This moves the FileCheck logic to its own LLVMFileCheck library. This way only FileCheck and the GlobalISelTests now have a dependency on this code. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86344	2020-09-01 14:59:28 +02:00
Sourabh Singh Tomar	03812041d8	[NFCI] Removed an unused declaration that got accidentally introduced in `f91d18eaa9`	2020-09-01 15:59:04 +05:30
Raphael Isemann	7c80f2da81	Revert "[lldb] Add reproducer verifier" This reverts commit `297f69afac`. It broke the Fedora 33 x86-64 bot. See the review for more info.	2020-09-01 12:21:44 +02:00
David Sherwood	9fbb113247	[SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll I have fixed up a number of warnings resulting from TypeSize -> uint64_t casts and calling getVectorNumElements() on scalable vector types. I think most of the changes are fairly trivial except for those in DAGTypeLegalizer::SplitVecRes_MLOAD, where I've tried to ensure we create the MachineMemoryOperands in a sensible way for scalable vectors. I have added a CHECK line to the following test: CodeGen/AArch64/sve-split-load.ll that ensures no new warnings are added. Differential Revision: https://reviews.llvm.org/D86697	2020-09-01 07:47:59 +01:00
David Green	ffd0b31c7c	Revert "[ARM] Register pressure with -mthumb forces register reload before each call" Expensive checks are failing, complaining about additional MMO operands added to the branch.	2020-09-01 07:39:54 +01:00
Petr Hosek	3c7bfbd683	[CMake] Use find_library for ncurses Currently it is hard to avoid having LLVM link to the system install of ncurses, since it uses check_library_exists to find e.g. libtinfo and not find_library or find_package. With this change the ncurses lib is found with find_library, which also considers CMAKE_PREFIX_PATH. This solves an issue for the spack package manager, where we want to use the zlib installed by spack, and spack provides the CMAKE_PREFIX_PATH for it. This is a similar change as https://reviews.llvm.org/D79219, which just landed in master. Patch By: haampie Differential Revision: https://reviews.llvm.org/D85820	2020-08-31 20:06:21 -07:00
Alina Sbirlea	63844c116a	[MemorySSA] Clean up single value phis. MemoryPhis with a single value are correct, but can lead to errors when updating. Clean up single entry Phis newly added when cloning blocks. Resolves PR46574.	2020-08-31 19:26:08 -07:00
Xing GUO	428b2ffad4	[DWARFYAML] Make the debug_str section optional. This patch makes the debug_str section optional. When the debug_str section exists but doesn't contain anything, yaml2obj will emit a section header for it. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D86860	2020-09-01 10:02:09 +08:00
Hamilton Tobon Mosquera	1d3d9b9cd8	[OpenMPOpt][NFC] Moving constants as struct static attributes	2020-08-31 19:05:00 -05:00
Lang Hames	b79e19e6d6	[ORC] Remove an unused variable. The unused Main variable was accidentally left in an earlier commit.	2020-08-31 15:35:55 -07:00
Jonas Devlieghere	297f69afac	[lldb] Add reproducer verifier Add a reproducer verifier that catches: - Missing or invalid home directory - Missing or invalid working directory - Missing or invalid module/symbol paths - Missing files from the VFS The verifier is enabled by default during replay, but can be skipped by passing --reproducer-no-verify. Differential revision: https://reviews.llvm.org/D86497	2020-08-31 15:14:18 -07:00
Hamilton Tobon Mosquera	8931add617	[OpenMPOpt][HideMemTransfersLatency] Get values stored in offload arrays getValuesInOffloadArrays goes through the offload arrays in __tgt_target_data_begin_mapper getting the values stored in them before the call is issued. call void @__tgt_target_data_begin_mapper(arg0, arg1, i8 %offload_baseptrs, i8 %offload_ptrs, i64* %offload_sizes, ...) Differential Revision: https://reviews.llvm.org/D86300	2020-08-31 15:33:05 -05:00
Sanjay Patel	e25449ff57	[IR][GVN] allow intrinsics in Instruction's isCommutative query (2nd try) The 1st try was reverted because I missed an assert that needed softening. As discussed in D86798 / rG09652721 , we were potentially returning a different result for whether an Instruction is commutable depending on if we call the base class or derived class method. This requires relaxing asserts in GVN, but that pass seems to be working otherwise. NewGVN requires more work because it uses different code paths for numbering binops and calls.	2020-08-31 16:01:19 -04:00
Christopher Tetreault	640f20b0c7	[SVE] Remove calls to VectorType::getNumElements from InstCombine Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D82237	2020-08-31 12:59:10 -07:00
Roman Lebedev	c23aefd7c3	[NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647 this must be sufficient otherwise `EliminateDuplicatePHINodes()` would have hit issues with it already.	2020-08-31 22:29:39 +03:00
Prathamesh Kulkarni	85b4d286d7	[ARM] Register pressure with -mthumb forces register reload before each call This patch implements the foldMemoryOperand hook in Thumb1InstrInfo, allowing tBLXr and a spilled function address to be combined back into a tBL. This can help with codesize at Oz, especially in the tinycrypt library. Differential Revision: https://reviews.llvm.org/D79785	2020-08-31 20:00:30 +01:00
Qiu Chaofan	5475154865	[NFC] [DAGCombiner] Refactor bitcast folding within fabs/fneg fabs and fneg share a common transformation: (fneg (bitconvert x)) -> (bitconvert (xor x sign)) (fabs (bitconvert x)) -> (bitconvert (and x ~sign)) This patch separates the code into a single method. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86862	2020-09-01 00:48:12 +08:00
Qiu Chaofan	eb2a405c18	[NFC] [DAGCombiner] Remove unnecessary negation in visitFNEG In visitFNEG of DAGCombiner, the folding of (fneg (fsub c, x)) is redundant since getNegatedExpression already handles it.	2020-09-01 00:35:01 +08:00
Sanjay Patel	1c9a09f42e	[DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x), better I tried to fix this in: rG716e35a0cf53 ...but that patch depends on the order that we encounter the magic "x/sqrt(x)" expression in the combiner's worklist. This patch should improve that by waiting until we walk the user list to decide if there's a use to skip. The AArch64 test reveals another (existing) ordering problem though - we may try to create an estimate for plain sqrt(x) before we see that it is part of a 1/sqrt(x) expression.	2020-08-31 09:35:59 -04:00
Sourabh Singh Tomar	db464a2753	[NFCI] Silence a build warning due to an extra semi-colon	2020-08-31 17:49:31 +05:30
Raphael Isemann	ed89eb3571	Revert "[FileCheck] Move FileCheck implementation out of LLVMSupport into its own library" This reverts commit `e9a3d1a401`. Seems the new FileCheck library doesn't link on some bots. Reverting for now.	2020-08-31 11:38:40 +02:00
Raphael Isemann	e9a3d1a401	[FileCheck] Move FileCheck implementation out of LLVMSupport into its own library The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a good reason for having FileCheck implemented there as it has a very specific use while LLVMSupport is a dependency of pretty much every LLVM tool there is. In fact, the only use of FileCheck I could find (outside the FileCheck tool and the FileCheck unit test) is a single call in GISelMITest.h. This moves the FileCheck logic to its own LLVMFileCheck library. This way only FileCheck and the GlobalISelTests now have a dependency on this code. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86344	2020-08-31 11:24:41 +02:00
Fangrui Song	f2284e3405	[Sink] Optimize/simplify sink candidate finding with nearest common dominator For an instruction in the basic block BB, SinkingPass enumerates basic blocks dominated by BB and BB's successors. For each enumerated basic block, SinkingPass uses `AllUsesDominatedByBlock` to check whether the basic block dominates all of the instruction's users. This is inefficient. Use the nearest common dominator of all users to avoid enumerating the candidate. The nearest common dominator may be in a parent loop which is not beneficial. In that case, find the ancestors in the dominator tree. In the case that the instruction has no user, with this change we will not perform an unnecessary move. This causes some amdgpu test changes. A stage-2 x86-64 clang is byte identical with this change.	2020-08-30 22:51:00 -07:00
Sanjay Patel	badd7264e1	Revert "[IR][GVN] allow intrinsics in Instruction's isCommutative query" This reverts commit `25597f7783`. It is causing crashing on bots such as: http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10523/steps/ninja-build/logs/stdio	2020-08-30 17:02:51 -04:00
Florian Hahn	86d817d7cf	[DSE,MemorySSA] Skip defs without analyzable write locations. Similar to other checks above, if there is no write location for a def, it cannot be considered for elimination and can be skipped.	2020-08-30 21:56:25 +01:00
Sanjay Patel	25597f7783	[IR][GVN] allow intrinsics in Instruction's isCommutative query As discussed in D86798 / rG09652721 , we were potentially returning a different result for whether an Instruction is commutable depending on if we call the base class or derived class method. This requires relaxing an assert in GVN, but that pass seems to be working otherwise. NewGVN requires more work because it uses different code paths for numbering binops and calls.	2020-08-30 16:49:22 -04:00
Florian Hahn	42c57c294d	[DSE,MemorySSA] Simplify code, EarlierAccess must be a MemoryDef (NFC). After recent changes, we return early if Current is a MemoryPhi, so EarlierAccess can only be a MemoryDef.	2020-08-30 21:31:57 +01:00
Thomas Preud'homme	998709b7d5	[FileCheck] Add precision to format specifier Add printf-style precision specifier to pad numbers to a given number of digits when matching them if the value is smaller than the given precision. This works on both empty numeric expression (e.g. variable definition from input) and when matching a numeric expression. The syntax is as follows: [[#%.<precision><format specifier>, ...]] where <format specifier> is optional and ... can be a variable definition or not with an empty expression or not. In the absence of a precision specifier, a variable definition will accept leading zeros. Reviewed By: jhenderson, grimar Differential Revision: https://reviews.llvm.org/D81667	2020-08-30 19:40:57 +01:00
Florian Hahn	eb35ebb3a2	[LV] Update CFG before adding runtime checks. addRuntimeChecks uses SCEVExpander, which relies on the DT/LoopInfo to be up-to-date. Changing the CFG afterwards may invalidate some inserted instructions, especially LCSSA phis. Reorder the code to first update the CFG and then create the runtime checks. This should not have any impact on the generated code, as we adjust the CFG and generate runtime checks together. Fixes PR47343.	2020-08-30 18:21:44 +01:00
Sanjay Patel	2d3e12818e	[FastISel] update to use intrinsic's isCommutative(); NFC This requires adding a missing 'const' to the definition because the callers are using const args, but there should be no change in behavior. The intrinsic method was added with D86798 / rG096527214033	2020-08-30 11:36:41 -04:00
Sanjay Patel	716e35a0cf	[DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x) In general, we probably want to try the multi-use reciprocal transform before sqrt transforms, but x/sqrt(x) is a special-case because that will always reduce to plain sqrt(x) or an estimate. The AArch64 tests show that the transform is limited by TLI hook to patterns where there are 3 or more uses of the divisor. So this change can result in an extra division compared to what we had, but that's the intended behavior based on the current setting of that hook.	2020-08-30 10:55:45 -04:00
Sanjay Patel	af4581e8ab	[SLP] make commutative check apply only to binops; NFC As discussed in D86798, it's not clear if the caller code works with a more liberal definition of "commutative" that includes intrinsics like min/max. This makes the binop restriction (current functionality is unchanged) explicit until the code is audited/tested.	2020-08-30 10:55:44 -04:00
Krzysztof Parzyszek	69fac677bc	[Hexagon] Fix perfect shuffle generation for single vectors Perfect shuffle instructions (vdealvdd/vshuffvdd) work on vector pairs. When given a single input vector, half of it first needs to be transposed into the other vector before the generated shuffles can take effect. Also the first transpose needs to be undone at the end (this last step was missing).	2020-08-30 06:43:16 -05:00
David Green	543c5425f1	[LV] Add some const to RecurrenceDescriptor. NFC	2020-08-30 12:27:51 +01:00
sstefan1	5dfd7cc46c	Reland [OpenMPOpt] ICV tracking for calls The problem with module slice has been addressed in D86319 Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a function can have a unique ICV value after it is finished, and AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a call site. This enables us to check the value of a call and if it changes the ICV. This also changes the approach in `getReplacementValues()` to a worklist-based approach so we can explore all relevant BBs. Differential Revision: https://reviews.llvm.org/D85544	2020-08-30 11:27:48 +02:00
sstefan1	8d8ce85b23	[Attributor] Introduce module slice. Summary: The module slice describes which functions we can analyze and transform while working on an SCC as part of the Attributor-CGSCC pass. So far we simply restricted it to the SCC. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D86319	2020-08-30 10:30:44 +02:00
Shinji Okumura	a7ca9e09bd	[Attributor] Fix callsite check in AAUndefinedBehavior This is the next patch of D86842 When we check `noundef` attribute violation at callsites, we do not have to require `nonnull` in the following two cases. 1. An argument is known to be simplified to undef 2. An argument is known to be dead Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86845	2020-08-30 13:17:02 +09:00
Shinji Okumura	7082381735	[Attributor][NFC] Fix dependency type in AAUndefinedBehaviorImpl::updateImpl This patch fixes wrong dependency type in AAUB. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86842	2020-08-30 12:34:50 +09:00
Fangrui Song	6ae7b403c3	Set alignment of .llvmbc and .llvmcmd to 1 Otherwise their alignment is dependent on the size of the section. If the size is larger than 16, the alignment will be 16. 16 is a bad choice for both .llvmbc and .llvmcmd because the padding between two contributions from input sections is of a variable size. A bitstream is actually guaranteed to be 4-byte aligned, but consumers don't need this property.	2020-08-29 18:27:34 -07:00
Lang Hames	e1d5f7d003	[ORC] Add getDFSLinkOrder / getReverseDFSLinkOrder methods to JITDylib. DFS and Reverse-DFS linkage orders are used to order execution of deinitializers and initializers respectively. This patch replaces uses of special purpose DFS order functions in MachOPlatform and LLJIT with uses of the new methods.	2020-08-29 15:17:06 -07:00
Shinji Okumura	7a15dfd056	[Attributor] Fix AANoUndef identification Even though `noundef` IR attribute might be attached to non-void type values, AANoUndef is mistakenly identified for pointer type values only. This patch fixes that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86737	2020-08-30 05:39:25 +09:00
Nikita Popov	88b310f64b	[InstSimplify] Reduce code duplication in simplifySelectWithICmpCond (NFC) Canonicalize icmp ne to icmp eq and implement all the folds only once.	2020-08-29 22:38:49 +02:00
Nikita Popov	a5be86fde5	[InstSimplify] Protect against more poison in SimplifyWithOpReplaced (PR47322) Replace the check for poison-producing instructions in SimplifyWithOpReplaced() with the generic helper canCreatePoison() that properly handles poisonous shifts and thus avoids the problem from PR47322. This additionally fixes a bug in IIQ.UseInstrInfo=false mode, which previously could have caused this code to ignore poison flags. Setting UseInstrInfo=false should reduce the possible optimizations, not increase them. This is not a full solution to the problem, as poison could be introduced more indirectly. This is just a minimal, easy to backport fix. Differential Revision: https://reviews.llvm.org/D86834	2020-08-29 21:59:39 +02:00
Florian Hahn	5067f4b626	[LV] Check opt-for-size before expanding runtime checks. Move bail out when optimizing for size before runtime check generation. In that case, we do not use the result of the expansion, the expanded instruction will be dead and cleaned up later. By doing the check before expanding the runtime-checks, we can save a bit of unnecessary work.	2020-08-29 20:35:14 +01:00
Nikita Popov	a400a61721	[LVI] Remove unnecessary lambda capture (NFC)	2020-08-29 21:33:19 +02:00
Nikita Popov	6d88f6efd4	Reapply [LVI] Normalize pointer behavior This got reverted because a dependency was reverted. It has since been reapplied, so reapply this as well. ----- Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses. The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block. This patch changes the treatment of dereferenceability: Instead of including it inside the block value, we instead treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this more powerful than the previous implementation as a side-effect, and this does actually seem beneficial in practice. Of course, we do not want to recompute dereferenceability on each intersectAssume call, so we need a new cache for this. The dereferenceability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value. I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior. Differential Revision: https://reviews.llvm.org/D69914	2020-08-29 21:17:03 +02:00
Roman Lebedev	1dcb936cf6	[NFC][Local] EliminateDuplicatePHINodes(): add STATISTIC()	2020-08-29 22:03:18 +03:00
Roman Lebedev	961483a5ea	[NFCI][Local] Rewrite EliminateDuplicatePHINodes to optionally check hashing invariants EarlyCSE has a mode to verify the invariant that hash equality equals key equality, but EliminateDuplicatePHINodes() doesn't. I've verified that this would have caught the stage2-stage3 mismatches `5ec2b757cc` revert has fixed, that were introduced last time in `3e69871ab5`.	2020-08-29 22:03:10 +03:00
Shinji Okumura	1364d856f4	[Attributor][NFC] Do not manifest noundef for positions to be changed to undef This patch fixes AANoUndef manifestation. We should not manifest noundef for positions that will be changed to undef. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86835	2020-08-30 03:23:41 +09:00
Florian Hahn	31cdb29de4	[DSE,MemorySSA] Return early when hitting a MemoryPhi. A MemoryPhi can never be eliminated. If we hit one, return the Phi, so the caller can continue traversing the incoming accesses. This saves some unnecessary read clobber checks and improves compile-time http://llvm-compile-time-tracker.com/compare.php?from=1ffc58b6d098ce8fa71f3a80fe75b990f633f921&to=d0fa8d1982380b57d7b6067528104bc373dbe07a&stat=instructions	2020-08-29 18:28:26 +01:00
Benjamin Kramer	8e5b1557e5	[IR] Inline AttrBuilder::addAttribute. It just sets 1 bit. NFC.	2020-08-29 19:13:49 +02:00
Roman Lebedev	5ec2b757cc	[Instruction] Speculatively undo isIdenticalToWhenDefined() PHI handling changes The stage2-stage3 differences persist even without instcombine-based PHI CSE, so this is the only possible reason.	2020-08-29 19:38:57 +03:00
Sanjay Patel	0965272140	[EarlyCSE] fold commutable intrinsics Handling the new min/max intrinsics is the motivation, but it turns out that we have a bunch of other intrinsics with this missing bit of analysis too. The FP min/max tests show that we are intersecting FMF, so that part should be safe too. As noted in https://llvm.org/PR46897 , there is a commutative property specifier for intrinsics, but no corresponding function attribute, and so apparently no uses of that bit. We may want to remove that next. Follow-up patches should wire up the Instruction::isCommutative() to this IntrinsicInst specialization. That requires updating callers to be aware of the more general commutative property (not just binops). Differential Revision: https://reviews.llvm.org/D86798	2020-08-29 12:11:01 -04:00
Nikita Popov	51d34c0c53	[TargetLowering] Strip trailing whitespace (NFC)	2020-08-29 18:09:08 +02:00
Roman Lebedev	bf21ce7b90	[InstCombine] Take 3: Perform trivial PHI CSE The original take 1 was `6102310d81`, which taught InstSimplify to do that, which seemed better at the time, since we got EarlyCSE support for free. However, it was proven that we can not do that there, the simplified-to PHI would not be reachable from the original PHI, and that is not something InstSimplify is allowed to do, as noted in the commit `ed90f15efb` that reverted it: > It appears to cause compilation non-determinism and caused stage3 mismatches. Then there was take 2 `3e69871ab5`, which was InstCombine-specific, but it again showed stage2-stage3 differences, and was reverted in `bdaa3f86a0`. This is quite alarming. Here, let's try to change how we find an existing PHI candidate: due to the worklist order, and the way PHI nodes are inserted (it may be inserted as the first one, or maybe not), let's look at all PHI nodes in the block. Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |    0.00% |    0.00% |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than the ~24k that take 1 did, but more than what take 2 did (22228 times). It allows foldAggregateConstructionIntoAggregateReuse() to actually work after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG would have done this PHI CSE, of all places. Additionally, allows some more `invoke`->`call` folds to happen (+110, +2.56%). All in all, expectedly, this catches fewer things overall, but all the motivational cases are still caught, so all good.	2020-08-29 18:21:24 +03:00
Roman Lebedev	bdaa3f86a0	Revert "[InstCombine] Take 2: Perform trivial PHI CSE" While the original variant with doing this in InstSimplify (rightfully) caused questions and ultimately was detected to be a culprit of stage2-stage3 mismatch, it was expected that InstCombine-based implementation would be fine. But apparently it's not, as http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio suggests. Which suggests that somewhere in InstCombine there is a loop over nondeterministically sorted container, which causes different worklist ordering. This reverts commit `3e69871ab5`.	2020-08-29 16:05:02 +03:00
Nikita Popov	6093b14c2c	[InstCombine] Return replaceInstUsesWith() result (NFC) Follow the usual usage pattern for this function and return the result.	2020-08-29 14:49:57 +02:00
Martin Storsjö	5b86d130e2	[AArch64] Generate and parse SEH assembly directives This ensures that you get the same output regardless if generating code directly to an object file or if generating assembly and assembling that. Add implementations of the EmitARM64WinCFI*() methods in AArch64TargetAsmStreamer, and fill in one blank in MCAsmStreamer. Add corresponding directive handlers in AArch64AsmParser and COFFAsmParser. Some SEH directive names have been picked to match the prior art for SEH assembly directives for x86_64, e.g. the spelling of ".seh_startepilogue" matching the preexisting ".seh_endprologue". For the directives for saving registers, the exact spelling from the arm64 documentation is picked, e.g. ".seh_save_reg" (to follow that naming for all the other ones, e.g. ".seh_save_fregp_x"), while the corresponding one for x86_64 is plain ".seh_savereg" without the second underscore. Directives in the epilogues have the same names as in prologues, e.g. .seh_savereg, even though the registers are restored, not saved, at that point. Differential Revision: https://reviews.llvm.org/D86529	2020-08-29 15:15:22 +03:00
Martin Storsjö	20f7773bb4	[MC] [Win64EH] Fill in FuncletOrFuncEnd if missing This can happen e.g. for code that declares .seh_proc/.seh_endproc in assembly, or for code that uses .seh_handlerdata (which triggers the unwind info to be emitted before the end of the function). The TextSection field must be made non-const to be able to use it with Streamer.SwitchSection(). Differential Revision: https://reviews.llvm.org/D86528	2020-08-29 15:15:22 +03:00
Roman Lebedev	71ac9105cd	[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW We really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:14 +03:00
Roman Lebedev	e65f213178	[InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW We really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:14 +03:00
Roman Lebedev	bd12113f57	[NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith()	2020-08-29 15:10:14 +03:00
Roman Lebedev	65b3854e10	[NFC] Instruction::isIdenticalToWhenDefined(): s/nessesairly/necessarily/	2020-08-29 15:10:13 +03:00
Roman Lebedev	49d223274f	[NFC][InstCombine] Add STATISTIC() for how many iterations we did As we've established, if it takes more than two iterations (one to perform folding and one to ensure that no folding opportunities remain) per function, then there are worklist management issues. So it may be interesting to keep track of it.	2020-08-29 15:10:13 +03:00
Roman Lebedev	4f4eecf0ec	[InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW As noted in post-commit review, we really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:00 +03:00
Roman Lebedev	3e69871ab5	[InstCombine] Take 2: Perform trivial PHI CSE The original take was `6102310d81`, which taught InstSimplify to do that, which seemed better at time, since we got EarlyCSE support for free. However, it was proven that we can not do that there, the simplified-to PHI would not be reachable from the original PHI, and that is not something InstSimplify is allowed to do, as noted in the commit `ed90f15efb` that reverted it : > It appears to cause compilation non-determinism and caused stage3 mismatches. However InstCombine already does many different optimizations, so it should be a safe place to do it here. Note that we still can't just compare incoming values ranges, because there is no guarantee that these PHI's we'd simplify to were already re-visited and sorted. However coming up with a test is problematic. Effects on vanilla llvm test-suite + RawSpeed: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|-------:\|---------:\|---------:\| \| instcombine.NumPHICSEs \| 0 \| 22228 \| 22228 \| 0.00% \| 0.00% \| \| asm-printer.EmittedInsts \| 7942329 \| 7942456 \| 127 \| 0.00% \| 0.00% \| \| assembler.ObjectBytes \| 254295632 \| 254313792 \| 18160 \| 0.01% \| 0.01% \| \| early-cse.NumCSE \| 2183283 \| 2183272 \| -11 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 550105 \| 541842 \| -8263 \| -1.50% \| 1.50% \| \| instcombine.NumAggregateReconstructionsSimplified \| 73 \| 4506 \| 4433 \| 6072.60% \| 6072.60% \| \| instcombine.NumCombined \| 3640311 \| 3666911 \| 26600 \| 0.73% \| 0.73% \| \| instcombine.NumDeadInst \| 1778204 \| 1783318 \| 5114 \| 0.29% \| 0.29% \| \| instcount.NumCallInst \| 1758395 \| 1758804 \| 409 \| 0.02% \| 0.02% \| \| instcount.NumInvokeInst \| 59478 \| 59502 \| 24 \| 0.04% \| 0.04% \| \| instcount.NumPHIInst \| 330557 \| 330549 \| -8 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1077138 \| 1077221 \| 83 \| 0.01% \| 0.01% 
\| \| instcount.TotalFuncs \| 101442 \| 101441 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8831946 \| 8832611 \| 665 \| 0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 4300 \| 4410 \| 110 \| 2.56% \| 2.56% \| \| simplifycfg.NumSimpl \| 1019813 \| 999740 \| -20073 \| -1.97% \| 1.97% \| ``` So it fires ~22k times, which is less than ~24k the take 1 did. It allows foldAggregateConstructionIntoAggregateReuse() to actually work after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG would have done this PHI CSE, of all places. Additionally, allows some more `invoke`->`call` folds to happen (+110, +2.56%). All in all, expectedly, this catches less things overall, but all the motivational cases are still caught, so all good.	2020-08-29 13:13:06 +03:00
Nikita Popov	57a26bb7b4	[InstCombine] Fix typo in comment (NFC) As pointed out in post-commit review of D63060.	2020-08-29 10:17:17 +02:00
Rainer Orth	672d7836bb	[Target][AArch64] Allow for char as int8_t in AArch64AsmParser.cpp A couple of AArch64 tests were failing on Solaris, both sparc and x86: LLVM :: MC/AArch64/SVE/add-diagnostics.s LLVM :: MC/AArch64/SVE/cpy-diagnostics.s LLVM :: MC/AArch64/SVE/cpy.s LLVM :: MC/AArch64/SVE/dup-diagnostics.s LLVM :: MC/AArch64/SVE/dup.s LLVM :: MC/AArch64/SVE/mov-diagnostics.s LLVM :: MC/AArch64/SVE/mov.s LLVM :: MC/AArch64/SVE/sqadd-diagnostics.s LLVM :: MC/AArch64/SVE/sqsub-diagnostics.s LLVM :: MC/AArch64/SVE/sub-diagnostics.s LLVM :: MC/AArch64/SVE/subr-diagnostics.s LLVM :: MC/AArch64/SVE/uqadd-diagnostics.s LLVM :: MC/AArch64/SVE/uqsub-diagnostics.s For example, reduced from `MC/AArch64/SVE/add-diagnostics.s`: add z0.b, z0.b, #0, lsl #8 missed the expected diagnostics $ ./bin/llvm-mc -triple=aarch64 -show-encoding -mattr=+sve add.s add.s:1:21: error: immediate must be an integer in range [0, 255] with a shift amount of 0 add z0.b, z0.b, #0, lsl #8 ^ The message is `Match_InvalidSVEAddSubImm8`, emitted in the generated `lib/Target/AArch64/AArch64GenAsmMatcher.inc` for `MCK_SVEAddSubImm8`. When comparing the call to `::AArch64Operand::isSVEAddSubImm<char>` on both Linux/x86_64 and Solaris, I find 875 bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value; is `false` on Solaris, unlike Linux. The problem boils down to the fact that `int8_t` is plain `char` on Solaris: both the sparc and i386 psABIs have `char` as signed. However, with 9887 DiagnosticPredicate DP(Operand.isSVEAddSubImm<int8_t>()); in `lib/Target/AArch64/AArch64GenAsmMatcher.inc`, `std::make_signed_t<int8_t>` above yields `signed char`, so `std::is_same<int8_t, signed char>` is `false`. This can easily be fixed by also allowing for `int8_t` here and in a few similar places. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D85225	2020-08-29 10:01:04 +02:00
Craig Topper	6dcd9f517e	[Attributes] Merge calls to getFnAttribute/hasFnAttribute using Attribute::isValid. NFC Rather than calling hasFnAttribute and then calling getFnAttribute if the attribute exists, it's better to just call getFnAttribute and then check if we got a valid attribute back.	2020-08-29 00:23:13 -07:00
Roman Lebedev	c1b3e32118	[NFC][InstructionSimplify] Add a warning about not simplifying to not def-reachable See https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html and https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824967.html InstSimplify is not allowed to perform simplifications to instructions that are not def-reachable from the original instruction.	2020-08-29 09:58:08 +03:00
Xing GUO	12e832cbcb	[DWARFYAML] Make the debug_abbrev_offset field optional. This patch helps make the debug_abbrev_offset field optional. We don't need to calculate the value of this field in the future. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86614	2020-08-29 14:54:52 +08:00
Kai Luo	b904324788	[DAGCombiner] Enhance (zext(setcc)) Currently, `v:t = zext(setcc x,y,cc)` is transformed to `select x, y, 1:t, 0:t, cc`. This misses some opportunities if the size of x's type is less than `t`'s size. This patch enhances the above transformation. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86687	2020-08-29 03:37:41 +00:00
Akira Hatanaka	0231a4e5bd	[ObjC][ARC] In HandlePotentialAlterRefCount, check whether an instruction can decrement the reference count, not whether it can alter it This prevents the state transition from S_Use to S_CanRelease when doing a bottom-up traversal and the transition from S_Retain to S_CanRelease when doing a top-down traversal when the visited instruction can increment the ref count but cannot decrement it. This allows the ARC optimizer to remove retain/release pairs which were previously not removed. rdar://problem/21793154	2020-08-28 17:45:14 -07:00
Owen Anderson	ed90f15efb	Revert "[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block" This reverts commit `6102310d81`. It appears to cause compilation non-determinism and caused stage3 mismatches.	2020-08-28 23:43:42 +00:00
Fangrui Song	b5ef137c11	[gcov] Increment counters with atomicrmw if -fsanitize=thread Without this patch, `clang --coverage -fsanitize=thread` may fail spuriously because non-atomic counter increments can be detected as data races.	2020-08-28 16:32:35 -07:00
Matt Arsenault	1b201914b5	GlobalISel: Combine out redundant sext_inreg The scalar tests don't work yet, since computeNumSignBits apparently doesn't handle sextload yet, and sext folds into the load first.	2020-08-28 17:57:31 -04:00
Jon Roelofs	b15f2bd3ad	[early-ifcvt] Add OptRemarks	2020-08-28 15:51:18 -06:00
Matt Arsenault	9145d75226	AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples The implicit def of the super register would appear to kill any live uses of components before the spill, and would be deleted by MachineCopyPropagation. We need to add implicit uses of the super register, similarly to what copyPhysReg does. VGPR tuples appear to be correctly handled already. I need to double check the SGPR->memory path.	2020-08-28 17:50:37 -04:00
Craig Topper	aab90384a3	[Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None) There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None. So if we just want to check for None it's sufficient to just check that pImpl is null, which can even be done inline. This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations. Differential Revision: https://reviews.llvm.org/D86744	2020-08-28 13:23:45 -07:00
Arthur Eubanks	cfde93e5d6	[ObjCARCOpt] Port objc-arc to NPM Since doInitialization() in the legacy pass modifies the module, the NPM pass is a Module pass. Reviewed By: ahatanak, ychen Differential Revision: https://reviews.llvm.org/D86178	2020-08-28 12:59:33 -07:00
Tyker	6d3657417e	[SROA] Improve handling of assume bundles by SROA This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e and gives SROA the ability to remove assumes if doing so allows promoting an alloca to a register, without removing assumes when it can't promote to a register. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86570	2020-08-28 21:55:45 +02:00
Nikita Popov	ffe05dd125	[InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178) Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding usub.sat(a, b) + b to umax(a, b). The backend will expand umax back to usubsat if that is profitable. We may also want to handle uadd.sat(a, b) - b in the future. Differential Revision: https://reviews.llvm.org/D63060	2020-08-28 21:52:29 +02:00
serge-sans-paille	2296182181	Skip analysis re-computation when no changes are reported This is a follow-up to https://reviews.llvm.org/D80707, generalized to CallGraphSCC, Loop and Region Differential Revision: https://reviews.llvm.org/D86442	2020-08-28 21:41:01 +02:00
Sjoerd Meijer	5f1cad4d29	[ARM] Skip combining base updates for vld1x NEON intrinsics Skip this for now, to avoid a backend crash in: UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412 This should fix PR45824. Differential Revision: https://reviews.llvm.org/D86784	2020-08-28 20:29:15 +01:00
Benjamin Kramer	8782c72765	Strength-reduce SmallVectors to arrays. NFCI.	2020-08-28 21:14:20 +02:00
Benjamin Kramer	52cc97a0db	[CodeGenPrepare] Zap the argument of llvm.assume when deleting it We know that the argument is most likely dead, so we can purge it early. Otherwise it would make it to codegen, and can block further optimizations.	2020-08-28 20:52:22 +02:00
Snehasish Kumar	94faadaca4	[llvm][CodeGen] Machine Function Splitter We introduce a codegen optimization pass which splits functions into hot and cold parts. This pass leverages the basic block sections feature recently introduced in LLVM from the Propeller project. The pass targets functions with profile coverage, identifies cold blocks and moves them to a separate section. The linker groups all cold blocks across functions together, decreasing fragmentation and improving icache and itlb utilization. We evaluated the Machine Function Splitter pass on clang bootstrap and SPECInt 2017. For clang bootstrap we observe a mean 2.33% runtime improvement with a ~32% reduction in itlb and stlb misses. Additionally, L1 icache misses reduced by 9.5% while L2 instruction misses reduced by 20%. For SPECInt we report the change in IntRate for the C/C++ benchmarks. All benchmarks apart from mcf and x264 improve, on average by 0.6% with the max for deepsjeng at 1.6%. Benchmark (% change): 500.perlbench_r 0.78; 502.gcc_r 0.82; 505.mcf_r -0.30; 520.omnetpp_r 0.18; 523.xalancbmk_r 0.37; 525.x264_r -0.46; 531.deepsjeng_r 1.61; 541.leela_r 0.83; 557.xz_r 0.15. Differential Revision: https://reviews.llvm.org/D85368	2020-08-28 11:10:14 -07:00
Anna Welker	064981f0ce	[ARM][MVE] Enable MVE gathers and scatters by default Enable MVE gather/scatters by default, which requires some minor adaptations in some tests. Differential revision: https://reviews.llvm.org/D86776	2020-08-28 19:05:29 +01:00
David Green	4ca60915bc	[ARM] Correct predicate operand for offset gather/scatter These arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated have some extra parameters meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask. Differential Revision: https://reviews.llvm.org/D86791	2020-08-28 17:48:15 +01:00
Albion Fung	331dcc43ea	[PowerPC] Implemented Vector Load with Zero and Signed Extend Builtins This patch implements the builtins for Vector Load with Zero and Signed Extend (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions introduced with ISA 3.1. Differential Revision: https://reviews.llvm.org/D82502#inline-797941	2020-08-28 11:28:58 -05:00
Denis Antrushin	fabd4c1ae1	[Statepoint] Always spill base pointer. There is a subtle problem with the new statepoint lowering scheme when the base and derived pointers are the same (see PR46917 for more context): %1 = STATEPOINT ... %0, %0(tied-def 0)... If, for some reason, the register allocator decides to put two instances of %0 into two different objects (registers or spill slots), we may end up with $reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)... and nothing will prevent later passes from sinking uses of $reg2 below the statepoint, which is incorrect. As a short-term solution, always put base pointers on the stack during lowering. A longer-term solution may be to rework the MIR statepoint format to avoid GC pointer duplication in the statepoint argument list. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D86712	2020-08-28 23:22:07 +07:00
Yonghong Song	443d352a1c	[GlobalISel] fix a compilation error with gcc 6.3.0 With gcc 6.3.0, I hit the following compilation error: ../lib/CodeGen/GlobalISel/Combiner.cpp: In member function ‘bool llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&, llvm::GISelCSEInfo*)’: ../lib/CodeGen/GlobalISel/Combiner.cpp:156:54: error: suggest parentheses around ‘&&’ within ‘\|\|’ [-Werror=parentheses] assert(!CSEInfo \|\| !errorToBool(CSEInfo->verify()) && ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ "CSEInfo is not consistent. Likely missing calls to " ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "observer on mutations"); Fix the code as suggested by the compiler.	2020-08-28 09:16:52 -07:00
QingShan Zhang	deb4b25807	[DAGCombine] Don't delete the node if it has uses immediately This is the follow-up patch for https://reviews.llvm.org/D86183, as we must not delete the node if NegX == NegY, since it has a use after we create the node. ``` if (NegX && (CostX <= CostY)) { Cost = std::min(CostX, CostZ); RemoveDeadNode(NegY); return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); #<-- NegY is used here if NegY == NegX. } ``` Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86689	2020-08-28 16:13:43 +00:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Xing GUO	f20e6c7253	[DWARFYAML] Abbrev codes in a new abbrev table should start from 1 (by default). The abbrev codes in a new abbrev table should start from 1 (by default), rather than inherit the value from the code in the previous table. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86545	2020-08-28 21:18:11 +08:00
Denis Antrushin	248a67f144	[Statepoint] Turn assert into check in foldPatchpoint. The original D81646 had a check for tied regs in foldPatchpoint(). Due to unfortunate miscommunication with review comments and addressing some comments post-commit, it turned into an assertion. We had an offline talk and agreed that with the current implementation this path is possible, so I'm changing it back to a check. Note that this is a workaround until the issues described in PR46917 are resolved.	2020-08-28 20:00:23 +07:00
Sam Parker	b30adfb529	[ARM][LowOverheadLoops] Liveouts and reductions Remove the code that tried to look for reduction patterns, since the vectorizer and isel can now produce predicated arithmetic instructions within the loop body. This has required some reorganisation and fixes around live-out and predication checks, as well as looking for cases where an input/output is initialised to zero. Differential Revision: https://reviews.llvm.org/D86613	2020-08-28 13:56:16 +01:00
Benjamin Kramer	3524c23ff2	[SCCP] Use bulk-remove API to bulk-remove attributes. NFCI.	2020-08-28 14:44:14 +02:00
Benjamin Kramer	dce72dc870	[FunctionAttrs] Bulk remove attributes. NFC.	2020-08-28 12:56:19 +02:00
Ties Stuij	d678e14c55	[AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported Previously in addTypeForNeon, we would set the operations for bfloat vectors like other generic types. But as bfloat is a storage-only type a number of operations shouldn't be set. This patch fixes that. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85101	2020-08-28 11:44:37 +01:00
Florian Hahn	43aa7227df	[DSE,MemorySSA] Check if Current is valid for elimination first. This changes getDomMemoryDef to check if a Current is a valid candidate for elimination before checking for reads. Before the change, we were spending a lot of compile-time in checking for read accesses for Current that might not even be removable. This patch flips the logic, so we skip Current if they cannot be removed before checking all their uses. This is much more efficient in practice. It also adds a more aggressive limit for checking partially overlapping stores. The main problem with overlapping stores is that we do not know if they will lead to elimination until seeing all of them. This patch adds a new limit for overlapping store candidates, which keeps the number of modified overlapping stores roughly the same. This is another substantial compile-time improvement (while also increasing the number of stores eliminated). Geomean -O3 -0.67%, ReleaseThinLTO -0.97%. http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86487	2020-08-28 11:19:04 +01:00
Florian Hahn	fd6ebea50d	[MemLoc] Support memcmp in MemoryLocation::getForArgument. This patch adds support for memcmp in MemoryLocation::getForArgument. memcmp reads from the first 2 arguments up to the number of bytes of the third argument. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86725	2020-08-28 10:19:54 +01:00
Florian Hahn	20e989e9de	[BuildLibCalls] Add argmemonly to more lib calls. strspn, strncmp, strcspn, strcasecmp, strncasecmp, memcmp, memchr, memrchr, memcpy, memmove, memcpy, mempcpy, strchr, strrchr, bcmp should all only access memory through their arguments. I broke out strcoll, strcasecmp, strncasecmp because the result depends on the locale, which might get accessed through memory. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86724	2020-08-28 09:50:38 +01:00
Martin Storsjö	db1ec04963	[ValueTracking] Remove a stray semicolon. NFC. This silences warnings when built with GCC at least.	2020-08-28 09:24:10 +03:00
Martin Storsjö	37ef743cbf	[MC] [Win64EH] Avoid producing malformed xdata records If there are no unwinding opcodes, omit writing the xdata/pdata records. Previously, this generated truncated xdata records, and llvm-readobj would error out when trying to print them. If writing of an xdata record is forced via the .seh_handlerdata directive, skip it if there's no info to make a sensible unwind info structure out of, and clearly error out if such info appeared later in the process. Differential Revision: https://reviews.llvm.org/D86527	2020-08-28 09:05:36 +03:00
serge-sans-paille	b1f4e5979b	(Expensive) Check for Loop, SCC and Region pass return status This generalizes the logic introduced in https://reviews.llvm.org/D80916 to other passes. It's needed by https://reviews.llvm.org/D86442 to assert passes correctly report their status. Differential Revision: https://reviews.llvm.org/D86589	2020-08-28 07:56:35 +02:00
Kai Luo	cbea17568f	[PowerPC] PPCBoolRetToInt: Don't translate Constant's operands When collecting `i1` values via `findAllDefs`, ignore Constant's operands, since Constant's operands might not be `i1`. Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE ``` llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant *llvm::ConstantExpr::getZExt(llvm::Constant *, llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed. ``` Differential Revision: https://reviews.llvm.org/D85007	2020-08-28 01:56:12 +00:00
Alina Sbirlea	d370836c20	[MemorySSA] Assert defining access is not a MemoryUse.	2020-08-27 18:21:10 -07:00
Harmen Stoppels	cdcb9ab10e	Revert "Use find_library for ncurses" The introduction of find_library for ncurses caused more issues than it solved problems. The current open issue is it makes the static build of LLVM fail. It is better to revert for now, and get back to it later. Revert "[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows" This reverts commit `1ed1e16ab8`. Revert "Fix msan build" This reverts commit `34fe9613dd`. Revert "[CMake] Always mark terminfo as unavailable on Windows" This reverts commit `76bf26236f`. Revert "[CMake] Fix OCaml build failure because of absolute path in system libs" This reverts commit `8e4acb82f7`. Revert "[CMake] Don't look for terminfo libs when LLVM_ENABLE_TERMINFO=OFF" This reverts commit `495f91fd33`. Revert "Use find_library for ncurses" This reverts commit `a52173a3e5`. Differential revision: https://reviews.llvm.org/D86521	2020-08-27 17:57:26 -07:00
Matt Arsenault	5feca7c9c3	GlobalISel: Implement computeNumSignBits for G_SEXT_INREG	2020-08-27 19:44:37 -04:00
Matt Arsenault	af1c1e20f4	AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize	2020-08-27 19:39:44 -04:00
Matt Arsenault	9d3dc276a6	AMDGPU: Fix broken switch braces	2020-08-27 19:39:39 -04:00
Matt Arsenault	f08bbde83f	Correctly revert "GlobalISel: Use & operator on KnownBits" I mis-resolved the revert through moving the code to another function.	2020-08-27 19:08:31 -04:00
Matt Arsenault	6cf4f25670	Revert "GlobalISel: Use & operator on KnownBits" This reverts commit `e53b799779`. Confusingly, this does not simply AND the two sets of known bits, but implements known bits for the `and` operator.	2020-08-27 18:52:34 -04:00
Vitaly Buka	23524fdece	[ValueTracking] Replace recursion with Worklist Now findAllocaForValue can handle nontrivial phi cycles.	2020-08-27 14:44:49 -07:00
Brad Smith	d870e36326	[SSP] Restore setting the visibility of __guard_local to hidden for better code generation. Patch by: Philip Guenther	2020-08-27 17:17:38 -04:00
Shinji Okumura	50ebd1afa9	[Attributor] Do not manifest noundef for dead positions Even if noundef is deduced for a position, we should not manifest it when the position is dead. This is because the associated values with dead positions are replaced with undef values by AAIsDead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86565	2020-08-28 05:58:18 +09:00
Matt Arsenault	abc99ab572	GlobalISel: Implement known bits for min/max	2020-08-27 16:56:17 -04:00
Matt Arsenault	ee679638d7	MIR: Infer not-SSA for subregister defs It's possible to have a single virtual register def with a subreg index that would pass the previous check, but it's not possible to have a subregister def in SSA. This is in preparation for adding stricter checks for SSA MIR.	2020-08-27 16:56:16 -04:00
Vitaly Buka	a40660551e	[StackSafety] Ignore allocas with partial lifetime markers Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D86672	2020-08-27 13:54:41 -07:00
Vitaly Buka	a6927c8621	[NFC][ValueTracking] Add OffsetZero into findAllocaForValue For StackLifetime, after finding the alloca we need to check that the value points to the beginning of the alloca. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D86692	2020-08-27 13:46:22 -07:00
Matt Arsenault	a1bc37c9e5	AMDGPU: Use caller subtarget, not intrinsic declaration Intrinsic declarations use the default subtarget, but this should be using the subtarget for the calling function. I haven't been able to come up with a case where it matters though.	2020-08-27 16:42:09 -04:00
Krzysztof Parzyszek	4ef9275b9b	[Hexagon] Emit better 32-bit multiplication sequence for HVXv62+	2020-08-27 15:24:32 -05:00
Eli Friedman	8d21985a75	[RegisterScavenging] Delete dead function unprocess().	2020-08-27 13:19:32 -07:00
Roman Lebedev	b85f91fdce	[InstSimplify] SimplifyPHINode(): check that instruction is in basic block first As pointed out in post-commit review, this can legally be called on instructions that are not inserted into basic blocks, so don't blindly assume that there is a basic block.	2020-08-27 22:32:03 +03:00
Christopher Tetreault	035833ae42	[SVE] Remove bad call to VectorType::getNumElements() from HeapProfiler Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D86727	2020-08-27 12:16:00 -07:00
Shinji Okumura	c5e6872ec6	[Attributor] Guarantee getAAFor not to update AA in the manifestation stage If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated. This patch makes use of the phase flag in Attributor, and handle `getAAFor` behavior according to the flag. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86635	2020-08-28 04:07:42 +09:00
Christopher Tetreault	5e63083435	[SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D82056	2020-08-27 12:02:20 -07:00
Christopher Tetreault	5a55e2781c	[SVE] Remove calls to VectorType::getNumElements from IR Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D81500	2020-08-27 11:16:10 -07:00
Matt Arsenault	e53b799779	GlobalISel: Use & operator on KnownBits Avoid repeating for zero and one	2020-08-27 14:07:18 -04:00
Matt Arsenault	531f7063ba	GlobalISel: Implement known bits for G_MERGE_VALUES	2020-08-27 14:07:18 -04:00
Mikhail Maltsev	ae1396c7d4	[ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics This patch adjusts the following ARM/AArch64 LLVM IR intrinsics: - neon_bfmmla - neon_bfmlalb - neon_bfmlalt so that they take and return bf16 and float types. Previously these intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from implementation lacking bf16 IR type). The neon_vbfdot[q] intrinsics are adjusted similarly. This change required some additional selection patterns for vbfdot itself and also for vector shuffles (in a previous patch) because of SelectionDAG transformations kicking in and mangling the original code. This patch makes the generated IR cleaner (less useless bitcasts are produced), but it does not affect the final assembly. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86146	2020-08-27 18:43:16 +01:00
Craig Topper	ba852e1e19	[X86] Don't call hasFnAttribute and getFnAttribute for 'prefer-vector-width' and 'min-legal-vector-width' in getSubtargetImpl We only need to call getFnAttribute and then check if the Attribute is None or not.	2020-08-27 10:40:20 -07:00
Owen Anderson	e9d9a61208	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan Differential Revision: D70800	2020-08-27 17:29:41 +00:00
Teresa Johnson	5b9d462b7d	[HeapProf] Fix bot failures from instrumentation pass Fix bot failure from 7ed8124d46f94601d5f1364becee9cee8538265e: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/8533 Since we are always using dynamic shadow, insertDynamicShadowAtFunctionEntry should always return true for modifying the function.	2020-08-27 10:21:19 -07:00
Aditya Nandakumar	db464a3dbf	[GISel] Add new GISel combiners for G_SELECT https://reviews.llvm.org/D83833 Patch adds two new GICombinerRules for G_SELECT. The rules include: combining selects with undef comparisons into their first selectee value, and to combine away selects with constant comparisons. Patch additionally adds a new combiner test for the AArch64 target to test these new G_SELECT combiner rules and the existing select_same_val combiner rule. Patch by mkitzan	2020-08-27 09:40:15 -07:00
Simon Moll	c48b06c44f	[sda][nfc] clang-formatting	2020-08-27 18:27:44 +02:00
Shinji Okumura	7a68f0f1e0	[Attributor] Add a phase flag to Attributor Add a new flag that indicates which stage in the process we are in. This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86678	2020-08-28 01:16:38 +09:00
Aditya Nandakumar	5c2db1655b	[GISel]: Fix one more CSE Non determinism https://reviews.llvm.org/D86676 Sometimes we can have the following code x:gpr(s32) = G_OP Say we build G_OP2 to the same x and then delete the previous instruction. Using something like Register X = ...; auto NewMIB = CSEBuilder.buildOp2(X, ... args); Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (i.e. it doesn't consider register bank/register class along with type). Unify the profiling by refactoring and calling the common method. This was found by turning on CSEInfo::verify at the end of each of our GISel passes, which turns inconsistent state/non determinism in CSEing into crashes; these usually indicate missing calls to Observer on mutations (the most common case). Here non determinism usually means not CSEing sometimes, and almost never means producing incorrect code. Also this patch adds this verification at the end of the combiners as well.	2020-08-27 09:06:21 -07:00
Lucas Prates	3d943bcd22	[CodeGen] Properly propagating Calling Convention information when lowering vector arguments When joining the legal parts of vector arguments into their original value during the lowering of Formal Arguments in SelectionDAGBuilder, the Calling Convention information was not being propagated for the handling of each individual part. The same did not happen when lowering calls, causing a mismatch. This patch fixes the issue by properly propagating the Calling Convention details. This fixes Bugzilla #47001. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86715	2020-08-27 17:01:10 +01:00
Teresa Johnson	7ed8124d46	[HeapProf] Clang and LLVM support for heap profiling instrumentation See RFC for background: http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html Note that the runtime changes will be sent separately (hopefully this week, need to add some tests). This patch includes the LLVM pass to instrument memory accesses with either inline sequences to increment the access count in the shadow location, or alternatively to call into the runtime. It also changes calls to memset/memcpy/memmove to the equivalent runtime version. The pass is modeled on the address sanitizer pass. The clang changes add the driver option to invoke the new pass, and to link with the upcoming heap profiling runtime libraries. Currently there is no attempt to optimize the instrumentation, e.g. to aggregate updates to the same memory allocation. That will be implemented as follow on work. Differential Revision: https://reviews.llvm.org/D85948	2020-08-27 08:50:35 -07:00
Roman Lebedev	6102310d81	[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify, nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places.. While i could teach EarlyCSE how to hash PHI nodes, we can't really do much (anything?) even if we find two identical PHI nodes in different basic blocks, same-BB case is the interesting one, and if we teach InstSimplify about it (which is what i wanted originally, https://reviews.llvm.org/D86530), we get EarlyCSE support for free. So i would think this is pretty uncontroversial. On vanilla llvm test-suite + RawSpeed, this has the following effects: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \\|%\\| \| \|----------------------------------------------------\|-----------\|-----------\|-------:\|---------:\|---------:\| \| instsimplify.NumPHICSE \| 0 \| 23779 \| 23779 \| 0.00% \| 0.00% \| \| asm-printer.EmittedInsts \| 7942328 \| 7942392 \| 64 \| 0.00% \| 0.00% \| \| assembler.ObjectBytes \| 273069192 \| 273084704 \| 15512 \| 0.01% \| 0.01% \| \| correlated-value-propagation.NumPhis \| 18412 \| 18539 \| 127 \| 0.69% \| 0.69% \| \| early-cse.NumCSE \| 2183283 \| 2183227 \| -56 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 550105 \| 542090 \| -8015 \| -1.46% \| 1.46% \| \| instcombine.NumAggregateReconstructionsSimplified \| 73 \| 4506 \| 4433 \| 6072.60% \| 6072.60% \| \| instcombine.NumCombined \| 3640264 \| 3664769 \| 24505 \| 0.67% \| 0.67% \| \| instcombine.NumDeadInst \| 1778193 \| 1783183 \| 4990 \| 0.28% \| 0.28% \| \| instcount.NumCallInst \| 1758401 \| 1758799 \| 398 \| 0.02% \| 0.02% \| \| instcount.NumInvokeInst \| 59478 \| 59502 \| 24 \| 0.04% \| 0.04% \| \| instcount.NumPHIInst \| 330557 \| 330533 \| -24 \| -0.01% \| 0.01% \| \| instcount.TotalInsts \| 8831952 \| 8832286 \| 334 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 4300 \| 4410 \| 110 \| 2.56% \| 2.56% \| \| simplifycfg.NumSimpl \| 1019808 \| 
999607 \| -20201 \| -1.98% \| 1.98% \| ``` I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call` transforms, and counter-intuitively results in more instructions total. That being said, the PHI count doesn't decrease that much, and looking at some examples, it seems at least some of them were previously getting PHI CSE'd in SimplifyCFG of all places.. I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time. As a comment in `InstCombinerImpl::visitPHINode()` already stated, there are no guarantees on the ordering of the operands of a PHI node, so if we just naively compare them, we may false-negatively say that the nodes are not equal when the only difference is operand order, which is especially important since the fold is in InstSimplify, so we can't rely on InstCombine sorting them beforehand. Fixing this for the general case is costly (geomean +0.02%), and does not appear to catch anything in test-suite, but for the same-BB case, it's trivial, so let's fix at least that. As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions this appears to cause geomean +0.03% compile time increase (regression), but geomean -0.01%..-0.04% code size decrease (improvement).	2020-08-27 18:47:04 +03:00
Alexandre Ganea	a6a37a2fcd	[Support] On Windows, add optional support for {rpmalloc\|snmalloc\|mimalloc} This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree. To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc. When enabled, the allocator will be embedded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future. When enabled, this changes the memory stack from: new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc to: new/delete -> {rpmalloc\|snmalloc\|mimalloc} -> VirtualAlloc The goal of this patch is to bypass the application's global heap - which is thread-safe thus inducing locking - and instead take advantage of a modern lock-free, thread cache, allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range 5-10% depending on the configuration. Differential Revision: https://reviews.llvm.org/D71786	2020-08-27 11:09:46 -04:00
diggerlin	6923b0a76e	Revert "[AIX][XCOFF] emit symbol visibility for xcoff object file." This reverts commit `a081868921`. Based on Hubert Tong's comment: https://reviews.llvm.org/D84265#inline-799085	2020-08-27 11:07:58 -04:00
Benjamin Kramer	b5924a8e27	[Hexagon] Fold another layer of single-use variable into assert. NFCI.	2020-08-27 16:52:34 +02:00
Benjamin Kramer	2b7df2707f	[Hexagon] Fold single-use variable into assert. NFCI.	2020-08-27 16:44:22 +02:00
Matt Arsenault	6c770a09be	AMDGPU: Hoist subtarget lookup	2020-08-27 10:27:56 -04:00
Krzysztof Parzyszek	154daf1f94	[Hexagon] Widen short vector stores to HVX vectors using masked stores Also invent a flag -hexagon-hvx-widen=N to set the minimum threshold for widening short vectors to HVX vectors.	2020-08-27 09:25:08 -05:00
Florian Hahn	419c6948df	[SimplifyLibCalls] Remove over-eager early return in strlen optzns. Currently we bail out early for strlen calls with a GEP operand, if none of the GEP specific optimizations fire. But there could be later optimizations that still apply, which we currently miss out on. An example is that we do not apply the following optimization strlen(x) == 0 --> *x == 0 Unless I am missing something, there seems to be no reason for bailing out early there. Fixes PR47149. Reviewed By: lebedev.ri, xbolva00 Differential Revision: https://reviews.llvm.org/D85886	2020-08-27 15:19:45 +01:00
Pavel Labath	9cb222e749	[cmake] Make gtest include directories a part of the library interface This applies the same fix that D84748 did for macro definitions. Appropriate include path is now automatically set for all libraries which link against gtest targets, which avoids the need to set include_directories in various parts of the project. Differential Revision: https://reviews.llvm.org/D86616	2020-08-27 15:35:57 +02:00
serge-sans-paille	4e29d25669	Fix OpenMP deduplicateRuntimeCalls return status Differential Revision: https://reviews.llvm.org/D86705	2020-08-27 15:01:04 +02:00
serge-sans-paille	5621571fc7	Fix Attributor return status Differential Revision: https://reviews.llvm.org/D86703	2020-08-27 15:01:04 +02:00
Jay Foad	45eeb8c2a9	[AMDGPU] Remove unused variable introduced in r251860	2020-08-27 13:28:32 +01:00
Drew Wock	0ec098e22b	[FPEnv] Allow fneg + strict_fadd -> strict_fsub in DAGCombiner This is the first of a set of DAGCombiner changes enabling strictfp optimizations. I want to test the waters with this to make sure changes like these are acceptable for the strictfp case; this particular change should preserve exception ordering and result precision perfectly, and many other possible changes appear to be able to as well. Copied from regular fadd combines but modified to preserve ordering via the chain, this change allows strict_fadd x, (fneg y) to become strict_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x. Differential Revision: https://reviews.llvm.org/D85548	2020-08-27 08:17:01 -04:00
Florian Hahn	bb024c3c4e	[DSE,MemorySSA] Remove short-cut to check if all paths are covered. The post-order number early continue does not work in some cases, e.g. if a path from EarlierAccess to an exit includes a node that dominates EarlierAccess in a cycle. The short-cut only has very minor impact on compile-time, so it seems straight-forward to remove it for now: http://llvm-compile-time-tracker.com/compare.php?from=062412e79fcfedf2cf004433e42036b0333e3f83&to=d7386016a77ce1387bdbbf360f1de157faea9d31&stat=instructions Fixes PR47285.	2020-08-27 12:42:40 +01:00
OCHyams	b6cca0ec05	Revert "[DWARF] Add cuttoff guarding quadratic validThroughout behaviour" This reverts commit `b9d977b0ca`. This cutoff is no longer required. The commit 34ffa7fc501 (D86153) introduces a performance improvement which was tested against the motivating case for this patch. Discussed in differential revision: https://reviews.llvm.org/D86153	2020-08-27 11:52:30 +01:00
OCHyams	57d8acac64	[DwarfDebug] Improve validThroughout performance (4/4) Almost NFC (see end). The backwards scan in validThroughout significantly contributed to compile time for a pathological case, causing the 'X86 Assembly Printer' pass to account for roughly 70% of the run time. This patch guards the loop against running unnecessarily, bringing the pass contribution down to 4%. Almost NFC: There is a hack in validThroughout which promotes single constant value DBG_VALUEs in the prologue to be live throughout the function. We're more likely to hit this code path with this patch applied. Similarly to the parent patches there is a small coverage change reported in the order of 10s of bytes. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86153	2020-08-27 11:52:30 +01:00
OCHyams	3c491881d2	[DwarfDebug] Improve multi-BB single location detection in validThroughout (3/4) With the changes introduced in D86151 we can now check for single locations which span multiple blocks for inlined scopes and blocks. D86151 introduced the InstructionOrdering parameter, replacing a scan through MBB instructions. The functionality to compare instruction positions across blocks was added there, and this patch just removes the exit checks that were previously (but no longer) required. CTMark shows a geomean binary size reduction of 2.2% for RelWithDebInfo builds. llvm-locstats (using D85636) shows a very small variable location coverage change in 5 of 10 binaries, but just like in D86151 it is only in the order of 10s of bytes. Reviewed By: djtodoro Differential Revision: https://reviews.llvm.org/D86152	2020-08-27 11:52:29 +01:00
OCHyams	0b5a8050ea	[DwarfDebug] Improve single location detection in validThroughout (2/4) With this patch we're now accounting for two more cases which should be considered 'valid throughout': First, where RangeEnd is ScopeEnd. Second, where RangeEnd comes before ScopeEnd when including meta instructions, but are both preceded by the same non-meta instruction. CTMark shows a geomean binary size reduction of 1.5% for RelWithDebInfo builds. `llvm-locstats` (using D85636) shows a very small variable location coverage change in 2 of 10 binaries, but it is in the order of 10s of bytes which lines up with my expectations. I've added a test which checks both of these new cases. The first check in the test isn't strictly necessary for this patch. But I'm not sure that it is explicitly tested anywhere else, and is useful for the final patch in the series. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86151	2020-08-27 11:52:29 +01:00
OCHyams	e048ea7b1a	[NFC][DebugInfo] Create InstructionOrdering helper class (1/4) Group the map and methods used to query instruction ordering for trimVarLocs (D82129) into a class. This will make it easier to reuse the functionality in upcoming patches. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86150	2020-08-27 11:52:29 +01:00
Mikhail Maltsev	23d5e93f34	[AArch64] Optimize instruction selection for certain vector shuffles This patch adds code to recognize vector shuffles which can be represented as VDUP (splat) of a vector lane of a different (wider) type than the original vector lane type. For example: shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1> is essentially: shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0> Such patterns are generated by the SelectionDAG machinery in some cases (see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double bitcasts from shuffles" part). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D86225	2020-08-27 11:06:49 +01:00
Paul Walker	81337c915f	[SVE] Fallback to default expansion when lowering SIGN_EXTEND_INREG from non-byte based source. Differential Revision: https://reviews.llvm.org/D86394	2020-08-27 10:57:37 +01:00
Sander de Smalen	4e9b66de3f	[AArch64][SVE] Add missing debug info for ACLE types. This patch adds type information for SVE ACLE vector types, by describing them as vectors, with a lower bound of 0, and an upper bound described by a DWARF expression using the AArch64 Vector Granule register (VG), which contains the runtime multiple of 64bit granules in an SVE vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86101	2020-08-27 10:56:42 +01:00
Alex Richardson	5ba4d0365b	[RISC-V] fmv.s/fmv.d should be as cheap as a move Since the canonical floating-point move is fsgnj rd, rs, rs, we should handle this case in RISCVInstrInfo::isAsCheapAsAMove(). Reviewed By: lenary Differential Revision: https://reviews.llvm.org/D86518	2020-08-27 10:32:23 +01:00
Alex Richardson	a11eeb4d4a	[RISC-V] Mark C_MV as a move instruction Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D86517	2020-08-27 10:32:23 +01:00
Alex Richardson	2259ce8c91	[RISC-V] ADDI/ORI/XORI x, 0 should be as cheap as a move The isTriviallyRematerializable hook is only called for instructions that are tagged as isAsCheapAsAMove. Since ADDI 0 is used for "mv" it should definitely be marked with "isAsCheapAsAMove". This change avoids one stack spill in most of the atomic-rmw.ll tests functions. It also avoids stack spills in two of our out-of-tree CHERI tests. ORI/XORI with zero may or may not be the same as a move micro-architecturally, but since we are already doing it for register == x0, we might as well do the same if the immediate is zero. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D86480	2020-08-27 10:32:22 +01:00
Vitaly Buka	469debe027	[ValueTracking] Support select in findAllocaForValue	2020-08-27 02:13:52 -07:00
Florian Hahn	e717fdb0f1	[DSE,MemorySSA] Traverse use-def chain without MemSSA Walker. For DSE with MemorySSA it is beneficial to manually traverse the defining access, instead of using a MemorySSA walker, so we can better control the number of steps together with other limits and also weed out invalid/unprofitable paths early on. This patch requires a follow-up patch to be most effective, which I will share soon after putting this patch up. This temporarily XFAIL's the limit tests, because we now explore more MemoryDefs that may not alias/clobber the killing def. This will be improved/fixed by the follow-up patch. This patch also renames some `Dom` variables to `Earlier`, because the dominance relation is not really used/important here and potentially confusing. This patch allows us to aggressively cut down compile time, geomean -O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores removed. Subsequent patches will increase the number of removed stores again, while keeping compile-time in check. http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86486	2020-08-27 10:02:02 +01:00
Sjoerd Meijer	1d8af682ef	Revert "[Verifier] Additional check for intrinsic get.active.lane.mask" This reverts commit `8d5f64c4ed`. Thanks to Eli Friedman for pointing out that this check is not appropriate here; this check will be moved to the Lint pass.	2020-08-27 09:27:05 +01:00
Piotr Sobczak	4e9d207117	[AMDGPU] Preserve vcc_lo when shrinking V_CNDMASK There is no justification for changing vcc_lo to vcc when shrinking V_CNDMASK, and such a change could later confuse live variable analysis. Make sure the original register is preserved. Differential Revision: https://reviews.llvm.org/D86541	2020-08-27 10:22:50 +02:00
Shinji Okumura	6c25eca614	[Attributor] Add flag for undef value to the state of AAPotentialValues Currently, an undef value is reduced to 0 when it is added to a set of potential values. This patch introduces a flag for undef values. With this, for example, we can merge two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85592	2020-08-27 16:30:29 +09:00
Sam Parker	03141aa04a	[ARM] Enable outliner at -Oz for M-class Enable default outlining when the function has the minsize attribute and we're targeting an m-class core. Differential Revision: https://reviews.llvm.org/D82951	2020-08-27 08:02:56 +01:00
Martin Storsjö	04879086b4	Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain" This reverts commit `9936455204`. That commit caused failed assertions e.g. like this: $ cat alloca.c a; b() { float c; d(); a = __builtin_alloca(d); c = e(); f(a); return c; } $ clang -target aarch64-linux-gnu -c alloca.c -O2 clang: ../lib/Target/AArch64/AArch64InstrInfo.cpp:3446: void llvm::emitFrameOffset(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::iterator, const llvm::DebugLoc&, unsigned int, unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo, llvm::MachineInstr::MIFlag, bool, bool, bool): Assertion `(DestReg != AArch64::SP \|\| Bytes % 16 == 0) && "SP increment/decrement not 16-byte aligned"' failed.	2020-08-27 09:39:56 +03:00
luxufan	888c02deee	[RISCV] add the MC layer support of riscv vector Zvamo extension Implements assembler and disassembler support for the RISCV Vector extension zvamo instructions, based on the 0.9 spec version. Reviewed by HsiangKai Differential Revision: https://reviews.llvm.org/D85069	2020-08-27 14:11:38 +08:00
Sam Parker	a3e41d4581	[ARM] Make MachineVerifier more strict about terminators Fix the ARM backend's analyzeBranch so it doesn't ignore predicated return instructions, and make the MachineVerifier rule more strict. Differential Revision: https://reviews.llvm.org/D40061	2020-08-27 07:10:20 +01:00
Amy Kwan	76b0f99ea8	[PowerPC] Implement Vector Multiply High/Divide Extended Builtins in LLVM/Clang This patch implements the function prototypes vec_mulh and vec_dive in order to utilize the vector multiply high (vmulh[s\|u][w\|d]) and vector divide extended (vdive[s\|u][w\|d]) instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82609	2020-08-26 23:14:34 -05:00
Matt Arsenault	5207545a86	GlobalISel: IRTranslate minimum of pointer sizes on memcpy I forgot to squash this with `0b7f6cc71a`	2020-08-26 20:10:00 -04:00
Matt Arsenault	0b7f6cc71a	GlobalISel: Add generic instructions for memory intrinsics AArch64, X86 and Mips currently directly consume these and use custom lowering to produce a libcall, but really these should follow the normal legalization process through the libcall/lower action.	2020-08-26 20:08:45 -04:00
Lang Hames	605df8112c	[ORC][JITLink] Switch to unique ownership for EHFrameRegistrars. This will make stateful registrars (e.g. a future TargetProcessControl based registrar) easier to deal with.	2020-08-26 16:59:45 -07:00
Arthur Eubanks	486ed88533	[ConstProp] Remove ConstantPropagation As discussed in http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html. Currently no users outside of unit tests. Replace all instances in tests of -constprop with -instsimplify. Notable changes in tests: * vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead * InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D85159	2020-08-26 15:51:30 -07:00
Craig Topper	92d3e70df3	[X86] Change pentium4 tuning settings and scheduler model back to their values before D83913. Clang now defaults to -march=pentium4 -mtune=generic so we don't need modern tune settings on pentium4.	2020-08-26 15:38:12 -07:00
Alina Sbirlea	0b34226304	Use properlyDominates in RDFLiveness when sorting on dominance. Summary: When looking for all reaching definitions, we sort basic blocks on dominance. When sorting, using properlyDominates() handles the case A == B. Authored by: pranavb Differential Revision: https://reviews.llvm.org/D86661	2020-08-26 15:16:40 -07:00
Ahmed Bougacha	383f7c8858	[AArch64] Use CCAssignFnForReturn helper in more spots. NFC. It was added for GISel, but SDAG could use it too!	2020-08-26 14:39:11 -07:00
Nikita Popov	d7c119d89c	[InstSimplify] Fold min/max intrinsic based on icmp of operands This is a reboot of D84655, now performing the inner icmp simplification query without undef folds. It should be possible to handle the current foldMinMaxSharedOp() fold based on this, by moving the logic into icmp of min/max instead, making it more general. We can't drop the folds for constant operands, because those also allow undef, which we exclude here. The tests use assumes for exhaustive coverage, and have a few more examples of misc folds we get based on icmp simplification. Differential Revision: https://reviews.llvm.org/D85929	2020-08-26 22:02:57 +02:00
Muhammad Asif Manzoor	fd536eeed9	[AArch64][SVE] Add lowering for llvm fceil Add the functionality to lower fceil for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84548	2020-08-26 15:59:44 -04:00
Owen Anderson	9936455204	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan	2020-08-26 19:38:38 +00:00
Sanjay Patel	54a5dd485c	[DAGCombiner] allow store merging non-i8 truncated ops We have a gap in our store merging capabilities for shift+truncate patterns as discussed in: https://llvm.org/PR46662 I generalized the code/comments for this function in earlier commits, so we only need to ease the type restriction and adjust the address/endian checking to make this work. AArch64 lets us switch endian to make sure that patterns are matched either way. Differential Revision: https://reviews.llvm.org/D86420	2020-08-26 15:23:08 -04:00
Aleksandr Platonov	ceffd6993c	[Support][Windows] Fix incorrect GetFinalPathNameByHandleW() return value check in realPathFromHandle() `GetFinalPathNameByHandleW(,,N,)` returns: - `< N` on success (this value does not include the size of the terminating null character) - `>= N` if buffer is too small (this value includes the size of the terminating null character) So, when `N == Buffer.capacity() - 1`, we need to resize buffer if return value is > `Buffer.capacity() - 2`. Also, we can set `N` to `Buffer.capacity()`. Thus, without this patch `realPathFromHandle()` returns unfilled buffer when length of the final path of the file is equal to `Buffer.capacity()` or `Buffer.capacity() - 1`. Reviewed By: andrewng, amccarth Differential Revision: https://reviews.llvm.org/D86564	2020-08-26 22:11:44 +03:00
Arthur Eubanks	098d3f9827	[InstSimplify] Simplify to vector constants when possible InstSimplify should do all transformations that ConstProp does, but one thing that ConstProp does that InstSimplify wouldn't is inline vector instructions that are constants, e.g. into a ret. Previously vector instructions wouldn't be inlined in InstSimplify because llvm::Simplify*Instruction() would return nullptr for specific instructions, such as vector instructions that were actually constants, if it couldn't simplify them. This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and SimplifyShuffleVectorInst to return a vector constant when possible. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85946	2020-08-26 11:40:36 -07:00
Francesco Petrogalli	61dfa00957	[MC][SVE] Fix data operand for instruction alias of `st1d`. The version of `st1d` that operates with vector plus immediate addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was generating `<Zn>.s` instead of `<Zn>.d`. Differential Revision: https://reviews.llvm.org/D86633	2020-08-26 18:22:17 +00:00
Steven Wu	476ca33089	[LTO] Don't apply LTOPostLink module flag during writeMergedModule `ld64`, which uses the legacy LTOCodeGenerator, relies on writeMergedModule to perform `ld -r` (which generates a linked object file). If all the inputs to `ld -r` are fullLTO bitcode, `ld64` will link the bitcode module, internalize all the symbols and write out another fullLTO bitcode object file. This bitcode file doesn't have all the bitcode inputs and it should not have the LTOPostLink module flag. The flag will also cause an error when this bitcode object file is linked with other LTO object files. Fix the issue by not applying the LTOPostLink flag during the writeMergedModule function. The flag should only be added when all the bitcode is linked and ready to be optimized. rdar://problem/58462798 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D84789	2020-08-26 11:17:45 -07:00
Krzysztof Parzyszek	e15143d31b	[Hexagon] Implement llvm.masked.load and llvm.masked.store for HVX	2020-08-26 13:10:22 -05:00
Matt Arsenault	f78687df9b	AMDGPU: Don't assert on misaligned DS read2/write2 offsets This would assert with unaligned DS access enabled. The offset may not be aligned. Theoretically the pattern predicate should check the memory alignment, although it is possible to have the memory be aligned but not the immediate offset. In this case I would expect it to use ds_{read\|write}_b64 with unaligned access, but am not clear if there's a reason it doesn't.	2020-08-26 14:08:05 -04:00
Wei Mi	c67ccf5faf	[SampleFDO] Enhance profile remapping support for searching inline instance and indirect call promotion candidate. Profile remapping is a feature to match a function in the module with its profile in sample profile if the function name and the name in profile look different but are equivalent using given remapping rules. This is a useful feature to keep the performance stable by specifying some remapping rules when sampleFDO targets are going through some large scale function signature change. However, currently profile remapping support is only valid for outline function profile in SampleFDO. It cannot match a callee with an inline instance profile if they have different but equivalent names. We found that without the support for inline instance profile, remapping is less effective for some large scale change. To add that support, before any remapping lookup happens, all the names in the profile will be inserted into remapper and the Key to the name mapping will be recorded in a map called NameMap in the remapper. During name lookup, a Key will be returned for the given name and it will be used to extract an equivalent name in the profile from NameMap. So with the help of the NameMap, we can translate any given name to an equivalent name in the profile if it exists. Whenever we try to match a name in the module to a name in the profile, we will try the match with the original name first, and if it doesn't match, we will use the equivalent name got from remapper to try the match for another time. In this way, the patch can enhance the profile remapping support for searching inline instance and searching indirect call promotion candidate. In a planned large scale change of int64 type (long long) to int64_t (long), we found the performance of a google internal benchmark degraded by 2% if nothing was done. If existing profile remapping was enabled, the performance degradation dropped to 1.2%. 
If the profile remapping with the current patch was enabled, the performance degradation further dropped to 0.14% (Note the experiment was done before searching indirect call promotion candidate was added. We hope with the remapping support of searching indirect call promotion candidate, the degradation can drop to 0% in the end. It will be evaluated post commit). Differential Revision: https://reviews.llvm.org/D86332	2020-08-26 11:07:35 -07:00
Juneyoung Lee	684b43c0cf	[IR] Add NoUndef attribute to Intrinsics.td This patch adds NoUndef to Intrinsics.td. The attribute is attached to llvm.assume's operand, because llvm.assume(undef) is UB. It is attached to pointer operands of several memory accessing intrinsics as well. This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check unnecessary, so it is removed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86576	2020-08-27 02:54:48 +09:00
Craig Topper	09288bcbf5	[X86] Add assembler support for .d32 and .d8 mnemonic suffixes to control displacement size. This is an older syntax than the {disp32} and {disp8} pseudo prefixes that were added a few weeks ago. We can reuse most of the support for that to support .d32 and .d8 as well.	2020-08-26 10:45:50 -07:00
Roman Lebedev	95848ea101	[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks As FIXME said, they really should be checking for a single user, not use, so let's do that. It is not that unusual to have the same value as incoming value in a PHI node, not unlike how a PHI may have the same incoming basic block more than once. There isn't a nice way to do that, Value::users() isn't uniquified, and Value only tracks its uses, not Users, so the check is potentially costly since it may involve traversing the entire use list of a value.	2020-08-26 20:20:41 +03:00
Owen Anderson	9061eb8245	Revert "Fix frame pointer layout on AArch64 Linux." This broke stage2 of clang-cmake-aarch64-full. This reverts commit `a0aed80b22`.	2020-08-26 17:17:14 +00:00
aartbik	72305a08ff	[llvm] [DAG] Fix bug in llvm.get.active.lane.mask lowering This intrinsic only accepted proper machine vector lengths. Fixed by this change. With unit tests. https://bugs.llvm.org/show_bug.cgi?id=47299 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D86585	2020-08-26 10:16:31 -07:00
Steven Wu	34b289b6db	[ThinLTO][Legacy] Compute PreservedGUID based on IRName in Symtab Instead of computing the GUID based on some assumption about the symbol mangling rule from IRName to symbol name, look up the IRName in the symtabs of all the input files to see if any matching symbol entry provides the IRName for GUID computation. rdar://65853754 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D84803	2020-08-26 10:15:00 -07:00
jasonliu	413054400d	[XCOFF][AIX] Support relocation generation for large code model Summary: Support TOCU and TOCL relocation type for object file generation. Reviewed by: DiggerLin Differential Revision: https://reviews.llvm.org/D84549	2020-08-26 17:12:28 +00:00
Craig Topper	28bd47fc47	[LegalizeTypes] Remove WidenVecRes_Shift and just use WidenVecRes_Binary This function seems to allow for the shift amount to have a different type than the result, but I don't think we do that anywhere else for vector shifts. We also don't have any support for legalizing the shift amount alone if the result is legal and the shift amount type isn't. The code coverage report here shows this code as uncovered http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp.html Differential Revision: https://reviews.llvm.org/D86475	2020-08-26 09:57:41 -07:00
Kai Nacke	ed07e1fe0f	[SystemZ/ZOS] Add header file to encapsulate use of <sysexits.h> The non-standard header file `<sysexits.h>` provides some return values. `EX_IOERR` is used as a special value to signal a broken pipe to the clang driver. On z/OS Unix System Services, this header file does not exist. This patch - adds a check for `<sysexits.h>`, removing the dependency on `LLVM_ON_UNIX` - adds a new header file `llvm/Support/ExitCodes`, which either includes `<sysexits.h>` or defines `EX_IOERR` - updates the users of `EX_IOERR` to include the new header file Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D83472	2020-08-26 12:44:30 -04:00
Owen Anderson	a0aed80b22	Fix frame pointer layout on AArch64 Linux. When floating point callee-saved registers were used, the frame pointer would incorrectly point to the bottom of the CSR space (containing saved floating-point registers), rather than to the frame record. While all frame offsets were calculated consistently, resulting in working code, this prevented stack walkers from being able to traverse the frame list.	2020-08-26 16:09:49 +00:00
Sjoerd Meijer	bda8fbe2d2	[LV] Fallback strategies if tail-folding fails This implements 2 different vectorisation fallback strategies if tail-folding fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This can be controlled with option -prefer-predicate-over-epilogue, that has been changed to take a numeric value corresponding to the tail-folding preference and preferred fallback. Patch by: Pierre van Houtryve, Sjoerd Meijer. Differential Revision: https://reviews.llvm.org/D79783	2020-08-26 16:55:25 +01:00
Jay Foad	a75e67b3b4	[AMDGPU] Make more use of Subtarget reference in SIInstrInfo	2020-08-26 15:04:00 +01:00
Jay Foad	75d159f924	[LegalizeTypes] Add ROTL/ROTR to ScalarizeVectorResult. We can scalarize these just like any other binary operation. Fixes https://bugs.llvm.org/show_bug.cgi?id=47303 caused by D77152. Differential Revision: https://reviews.llvm.org/D86601	2020-08-26 14:42:57 +01:00
Dibya Ranjan Mishra	a7da7e421c	[Support] Allow printing the stack trace only for a given depth Differential Revision: https://reviews.llvm.org/D85458	2020-08-26 09:27:42 -04:00
Matt Arsenault	ff34116cf0	AMDGPU: Use Subtarget reference in SIInstrInfo	2020-08-26 09:18:41 -04:00
Matt Arsenault	21ccedc24f	AMDGPU/GlobalISel: Tolerate negated control flow intrinsic outputs If the condition output is negated, swap the branch targets. This is similar to what SelectionDAG does for when SelectionDAGBuilder decides to invert the condition and swap the branches. This is leaving behind a dead constant def for some reason.	2020-08-26 08:58:54 -04:00
Matt Arsenault	eb074088c9	GlobalISel: Combine G_ADD of G_PTRTOINT to G_PTR_ADD This produces less work for addressing mode matching. I think this is safe since I don't think machine IR is supposed to give the same aliasing properties as getelementptr in the IR.	2020-08-26 08:57:15 -04:00
Jay Foad	831457c6d5	[AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. This is the same optimization that was implemented for SelectionDAG in D31731. Differential Revision: https://reviews.llvm.org/D86609	2020-08-26 13:47:51 +01:00
Xing GUO	8daa3264a3	[DWARFYAML] Make the unit_length and header_length fields optional. This patch makes the unit_length and header_length fields of line tables optional. yaml2obj is able to infer them for us. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86590	2020-08-26 20:35:10 +08:00
QingShan Zhang	ebf3b188c6	[Scheduling] Implement a new way to cluster loads/stores Before calling the target hook to determine if two loads/stores are clusterable, we put them into different groups to avoid fake clusters due to dependency. For now, we put the loads/stores into the same group if they have the same predecessor. We assume that, if two loads/stores have the same predecessor, it is likely that they do not depend on each other. However, one SUnit might have several predecessors, and for now we just pick the first predecessor that has a non-data/non-artificial dependency, which is too arbitrary and has been a struggle to fix. So, I am proposing a better implementation. 1. Collect all the loads/stores that have memory info first, to reduce the complexity. 2. Sort these loads/stores so that we can stop the search as early as possible. 3. For each load/store, seek the first non-dependent instruction in the sorted order, and check whether they can be clustered. Reviewed By: Jay Foad Differential Revision: https://reviews.llvm.org/D85517	2020-08-26 12:33:59 +00:00
David Green	677c1590c0	[ARM] Increase MVE gather/scatter cost by MVECostFactor. MVE gather/scatter code generation is looking a lot better than it used to, but still has some issues. We currently model the instructions as 1 cycle per element, which is a bit low for some cases. Increasing the cost by the MVECostFactor brings them in line with our other instruction costs. This will have the effect of only generating them when the extra benefit is more likely to overcome some of the issues, notably running out of registers and vectorizing loops that could otherwise be SLP vectorized. In the short term, whilst we look at other ways of dealing with those more directly, we can increase the costs of gathers to make them more likely to be beneficial when created. Differential Revision: https://reviews.llvm.org/D86444	2020-08-26 13:03:46 +01:00
Sam Tebbs	85dd852a0d	[RDA] Don't visit the BB of the instruction in getReachingUniqueMIDef If the basic block of the instruction passed to getUniqueReachingMIDef is a transitive predecessor of itself and has a definition of the register, the function will return that definition even if it is after the instruction given to the function. This patch stops the function from scanning the instruction's basic block to prevent this. Differential Revision: https://reviews.llvm.org/D86607	2020-08-26 12:40:39 +01:00
Pierre Gousseau	cda6b09242	[X86] Make sure we do not clobber RBX with mwaitx when used as a base pointer. mwaitx uses EBX as one of its arguments. Using this instruction clobbers RBX as it is defined to hold one of the inputs. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. This patch is adapted from @qcolombet patch for cmpxchg at r263325. This fixes PR43528. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D73475	2020-08-26 11:20:31 +01:00
Cullen Rhodes	1f44dfb640	[AArch64][AsmParser] Fix bug in operand printer The switch in AArch64Operand::print was changed in D45688 so the shift can be printed after printing the register. This is implemented with LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between the register and shift operands. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D86535	2020-08-26 09:31:36 +00:00
Sander de Smalen	5f47d4456d	[AArch64][SVE] Fix calculation restore point for SVE callee saves. This fixes an issue where the restore point of callee-saves in the function epilogues was incorrectly calculated when the basic block consisted of only a RET instruction. This caused dealloc instructions to be inserted in between the block of callee-save restore instructions, rather than before it. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86099	2020-08-26 10:02:31 +01:00
Jan Kratochvil	b20a4e293c	[Support] Speedup llvm-dwarfdump 3.9x Currently `strace llvm-dwarfdump x.debug >/tmp/file`: ioctl(1, TCGETS, 0x7ffd64d7f340) = -1 ENOTTY (Inappropriate ioctl for device) write(1, " DW_AT_decl_line\t(89)\n"..., 4096) = 4096 ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7ffd64d7f410) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device) After this patch: write(1, "0000000000001102 \"strlen\")\n "..., 4096) = 4096 write(1, "site\n DW_AT_low"..., 4096) = 4096 write(1, "d53)\n\n0x000e4d4d: DW_TAG_G"..., 4096) = 4096 The same speedup can be achieved by `--color=0` but that is not very convenient. This implementation has been suggested by Joerg Sonnenberger. Differential Revision: https://reviews.llvm.org/D86406	2020-08-26 10:29:46 +02:00
Jay Foad	b7e3599a22	[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT Differential Revision: https://reviews.llvm.org/D86449	2020-08-26 09:20:46 +01:00
Shinji Okumura	3050713798	[Attributor] Provide an edge-based interface in AAIsDead This patch produces an edge-based interface in AAIsDead. By this, we can query a set of basic blocks that are directly reachable from a given basic block. This is specifically useful for implementation of AAReachability. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85547	2020-08-26 16:57:52 +09:00
Roman Lebedev	1f90d45b9e	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do it's sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as it can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| 
-3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes. 
We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We maybe should do most trivial, same-BB PHI CSE in InstCombine 2. I need to check what other patterns remain, and how they can be resolved. (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) This is a reland of the original commit `fcb51d8c24`, because I originally forgot to ensure that the base aggregate types match. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:57:50 +03:00
Roman Lebedev	c295c6f2c0	Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad" This reverts commit `fcb51d8c24`. As buildbots report, there's apparently some missing check to ensure that the types of incoming values match the type of PHI. Let's revert for a moment.	2020-08-26 09:23:22 +03:00
Roman Lebedev	fcb51d8c24	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad While since D86306 we do it's sibling fold for `insertvalue`, we should also do this for `extractvalue`'s. And unlike that one, the results here are, quite honestly, shocking, as it can be observed here on vanilla llvm test-suite + RawSpeed results: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| \|%\| \| \|----------------------------------------------------\|-----------\|-----------\|--------:\|--------:\|-------:\| \| asm-printer.EmittedInsts \| 7945095 \| 7942507 \| -2588 \| -0.03% \| 0.03% \| \| assembler.ObjectBytes \| 273209920 \| 273069800 \| -140120 \| -0.05% \| 0.05% \| \| early-cse.NumCSE \| 2183363 \| 2183398 \| 35 \| 0.00% \| 0.00% \| \| early-cse.NumSimplify \| 541847 \| 550017 \| 8170 \| 1.51% \| 1.51% \| \| instcombine.NumAggregateReconstructionsSimplified \| 2139 \| 108 \| -2031 \| -94.95% \| 94.95% \| \| instcombine.NumCombined \| 3601364 \| 3635448 \| 34084 \| 0.95% \| 0.95% \| \| instcombine.NumConstProp \| 27153 \| 27157 \| 4 \| 0.01% \| 0.01% \| \| instcombine.NumDeadInst \| 1694521 \| 1765022 \| 70501 \| 4.16% \| 4.16% \| \| instcombine.NumPHIsOfExtractValues \| 0 \| 37546 \| 37546 \| 0.00% \| 0.00% \| \| instcombine.NumSunkInst \| 63158 \| 63686 \| 528 \| 0.84% \| 0.84% \| \| instcount.NumBrInst \| 874304 \| 871857 \| -2447 \| -0.28% \| 0.28% \| \| instcount.NumCallInst \| 1757657 \| 1758402 \| 745 \| 0.04% \| 0.04% \| \| instcount.NumExtractValueInst \| 45623 \| 11483 \| -34140 \| -74.83% \| 74.83% \| \| instcount.NumInsertValueInst \| 4983 \| 580 \| -4403 \| -88.36% \| 88.36% \| \| instcount.NumInvokeInst \| 61018 \| 59478 \| -1540 \| -2.52% \| 2.52% \| \| instcount.NumLandingPadInst \| 35334 \| 34215 \| -1119 \| -3.17% \| 3.17% \| \| instcount.NumPHIInst \| 344428 \| 331116 \| -13312 \| -3.86% \| 3.86% \| \| instcount.NumRetInst \| 100773 \| 100772 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalBlocks \| 1081154 \| 1077166 \| 
-3988 \| -0.37% \| 0.37% \| \| instcount.TotalFuncs \| 101443 \| 101442 \| -1 \| 0.00% \| 0.00% \| \| instcount.TotalInsts \| 8890201 \| 8833747 \| -56454 \| -0.64% \| 0.64% \| \| instsimplify.NumSimplified \| 75822 \| 75707 \| -115 \| -0.15% \| 0.15% \| \| simplifycfg.NumHoistCommonCode \| 24203 \| 24197 \| -6 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 48201 \| 48195 \| -6 \| -0.01% \| 0.01% \| \| simplifycfg.NumInvokes \| 2785 \| 4298 \| 1513 \| 54.33% \| 54.33% \| \| simplifycfg.NumSimpl \| 997332 \| 1018189 \| 20857 \| 2.09% \| 2.09% \| \| simplifycfg.NumSinkCommonCode \| 7088 \| 6464 \| -624 \| -8.80% \| 8.80% \| \| simplifycfg.NumSinkCommonInstrs \| 15117 \| 14021 \| -1096 \| -7.25% \| 7.25% \| ``` ... which tells us that this new fold fires whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less) and `extractvalue`'s (-74.83%, i.e. four times less). This causes geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why: 1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes. 
We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun. 2. But then, EarlyCSE not only manages to fold such redundant PHI's, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate. The take-aways are 1. We maybe should do most trivial, same-BB PHI CSE in InstCombine 2. I need to check what other patterns remain, and how they can be resolved. (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:08:24 +03:00
Jianzhou Zhao	4784987027	Fix a 32-bit overflow issue when reading LTO-generated bitcode files whose strtab are of size > 2^29 This happens when using -flto and -Wl,--plugin-opt=emit-llvm to create a linked LTO bitcode file, and the bitcode file has a strtab with size > 2^29. All the issues relate to a pattern like this: `size_t x64 = y64 + z32 * C`. When z32 is >= (2^32)/C, z32 * C overflows. Reviewed-by: MaskRay Differential Revision: https://reviews.llvm.org/D86500	2020-08-26 05:47:22 +00:00
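
Commit 4784987027 describes a pattern where a 32-bit product overflows before it is added to a 64-bit value. The following is a minimal sketch of that pattern, not the code from the patch: the helper names `offsetNarrow` and `offsetWide` are hypothetical, C = 8 is an assumption chosen so that the threshold (2^32)/C matches the 2^29 in the commit title, and a 32-bit `unsigned int` is assumed.

```cpp
#include <cstdint>

// Hypothetical helper names, for illustration only (not from the LLVM patch).

// Buggy shape: both factors are 32-bit, so the product wraps modulo 2^32
// before it is widened and added to the 64-bit base.
constexpr std::uint64_t offsetNarrow(std::uint64_t Y64, std::uint32_t Z32,
                                     std::uint32_t C) {
  return Y64 + Z32 * C;
}

// Fixed shape: widen one factor first so the multiply happens in 64 bits.
constexpr std::uint64_t offsetWide(std::uint64_t Y64, std::uint32_t Z32,
                                   std::uint32_t C) {
  return Y64 + static_cast<std::uint64_t>(Z32) * C;
}

// With C = 8 (assumed), Z32 = 2^29 is the first value whose product wraps.
static_assert(offsetNarrow(16, 1u << 29, 8) == 16, "32-bit product wrapped to 0");
static_assert(offsetWide(16, 1u << 29, 8) == 16 + (std::uint64_t{1} << 32),
              "64-bit product is preserved");
```

Widening one operand before the multiply is one common way to avoid this class of overflow; whether the actual patch uses a cast or changes the variable types is not stated in the commit message.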

138742 Commits