llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	5207545a86	GlobalISel: IRTranslate minimum of pointer sizes on memcpy I forgot to squash this with `0b7f6cc71a`	2020-08-26 20:10:00 -04:00
Matt Arsenault	0b7f6cc71a	GlobalISel: Add generic instructions for memory intrinsics AArch64, X86 and Mips currently consume these directly and use custom lowering to produce a libcall, but really these should follow the normal legalization process through the libcall/lower action.	2020-08-26 20:08:45 -04:00
Lang Hames	605df8112c	[ORC][JITLink] Switch to unique ownership for EHFrameRegistrars. This will make stateful registrars (e.g. a future TargetProcessControl based registrar) easier to deal with.	2020-08-26 16:59:45 -07:00
Arthur Eubanks	486ed88533	[ConstProp] Remove ConstantPropagation As discussed in http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html. Currently no users outside of unit tests. Replace all instances in tests of -constprop with -instsimplify. Notable changes in tests: * vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead * InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp Reviewed By: lattner, nikic Differential Revision: https://reviews.llvm.org/D85159	2020-08-26 15:51:30 -07:00
Craig Topper	92d3e70df3	[X86] Change pentium4 tuning settings and scheduler model back to their values before D83913. Clang now defaults to -march=pentium4 -mtune=generic so we don't need modern tune settings on pentium4.	2020-08-26 15:38:12 -07:00
Alina Sbirlea	0b34226304	Use properlyDominates in RDFLiveness when sorting on dominance. Summary: When looking for all reaching definitions, we sort basic blocks on dominance. When sorting, using properlyDominates() handles the case A == B. Authored by: pranavb Differential Revision: https://reviews.llvm.org/D86661	2020-08-26 15:16:40 -07:00
Ahmed Bougacha	383f7c8858	[AArch64] Use CCAssignFnForReturn helper in more spots. NFC. It was added for GISel, but SDAG could use it too!	2020-08-26 14:39:11 -07:00
Nikita Popov	d7c119d89c	[InstSimplify] Fold min/max intrinsic based on icmp of operands This is a reboot of D84655, now performing the inner icmp simplification query without undef folds. It should be possible to handle the current foldMinMaxSharedOp() fold based on this, by moving the logic into icmp of min/max instead, making it more general. We can't drop the folds for constant operands, because those also allow undef, which we exclude here. The tests use assumes for exhaustive coverage, and have a few more examples of misc folds we get based on icmp simplification. Differential Revision: https://reviews.llvm.org/D85929	2020-08-26 22:02:57 +02:00
Muhammad Asif Manzoor	fd536eeed9	[AArch64][SVE] Add lowering for llvm fceil Add the functionality to lower fceil for passthru variant Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84548	2020-08-26 15:59:44 -04:00
Owen Anderson	9936455204	Reapply D70800: Fix AArch64 AAPCS frame record chain Original Commit Message: After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register) may be placed in the middle of a stack frame if a function has both callee-saved general-purpose registers and floating point registers. This will break the stack unwinders that simply walk through the frame records (based on the guarantee from AAPCS64 "The Frame Pointer" section). This commit fixes the problem by adding the frame record offset. Patch By: logan	2020-08-26 19:38:38 +00:00
Sanjay Patel	54a5dd485c	[DAGCombiner] allow store merging non-i8 truncated ops We have a gap in our store merging capabilities for shift+truncate patterns as discussed in: https://llvm.org/PR46662 I generalized the code/comments for this function in earlier commits, so we only need to ease the type restriction and adjust the address/endian checking to make this work. AArch64 lets us switch endian to make sure that patterns are matched either way. Differential Revision: https://reviews.llvm.org/D86420	2020-08-26 15:23:08 -04:00
Aleksandr Platonov	ceffd6993c	[Support][Windows] Fix incorrect GetFinalPathNameByHandleW() return value check in realPathFromHandle() `GetFinalPathNameByHandleW(,,N,)` returns: - `< N` on success (this value does not include the size of the terminating null character) - `>= N` if buffer is too small (this value includes the size of the terminating null character) So, when `N == Buffer.capacity() - 1`, we need to resize buffer if return value is > `Buffer.capacity() - 2`. Also, we can set `N` to `Buffer.capacity()`. Thus, without this patch `realPathFromHandle()` returns unfilled buffer when length of the final path of the file is equal to `Buffer.capacity()` or `Buffer.capacity() - 1`. Reviewed By: andrewng, amccarth Differential Revision: https://reviews.llvm.org/D86564	2020-08-26 22:11:44 +03:00
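The return-value contract described in `ceffd6993c` can be sketched in portable C++. This is a minimal illustration, not the code from the patch: `mockFinalPath` is a hypothetical stand-in that only mimics the contract quoted in the commit message (a result `< N` is the length without the terminating null; otherwise the result is the required size including the terminating null), and `realPathSketch` is an assumed name for a caller that retries once with a larger buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for GetFinalPathNameByHandleW(,,N,). It mimics only
// the return-value contract from the commit message, not the real Windows API.
std::size_t mockFinalPath(const std::wstring &path, wchar_t *buf, std::size_t n) {
  if (path.size() < n) {
    // Success: copy the path plus terminating null, return length without it.
    std::copy(path.begin(), path.end(), buf);
    buf[path.size()] = L'\0';
    return path.size();
  }
  // Buffer too small: return the required size, including the terminating null.
  return path.size() + 1;
}

// Assumed caller: any result >= the size passed in means "resize and retry".
std::wstring realPathSketch(const std::wstring &path, std::size_t initialCapacity) {
  std::vector<wchar_t> buffer(initialCapacity);
  std::size_t len = mockFinalPath(path, buffer.data(), buffer.size());
  if (len >= buffer.size()) {
    buffer.resize(len); // len is the required size, including the null
    len = mockFinalPath(path, buffer.data(), buffer.size());
  }
  return std::wstring(buffer.data(), len);
}
```

The boundary cases are the ones the commit message calls out: a path whose length equals the buffer size, or is one less than it. With a 5-character path, `realPathSketch(path, 5)` takes the retry branch and `realPathSketch(path, 6)` succeeds on the first call.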
Arthur Eubanks	098d3f9827	[InstSimplify] Simplify to vector constants when possible InstSimplify should do all transformations that ConstProp does, but one thing that ConstProp does that InstSimplify wouldn't is inline vector instructions that are constants, e.g. into a ret. Previously vector instructions wouldn't be inlined in InstSimplify because llvm::Simplify*Instruction() would return nullptr for specific instructions, such as vector instructions that were actually constants, if it couldn't simplify them. This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and SimplifyShuffleVectorInst to return a vector constant when possible. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85946	2020-08-26 11:40:36 -07:00
Francesco Petrogalli	61dfa00957	[MC][SVE] Fix data operand for instruction alias of `st1d`. The version of `st1d` that operates with vector plus immediate addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was generating `<Zn>.s` instead of `<Zn>.d`. Differential Revision: https://reviews.llvm.org/D86633	2020-08-26 18:22:17 +00:00
Steven Wu	476ca33089	[LTO] Don't apply LTOPostLink module flag during writeMergedModule `ld64`, which uses the legacy LTOCodeGenerator, relies on writeMergedModule to perform `ld -r` (which generates a linked object file). If all the inputs to `ld -r` are fullLTO bitcode, `ld64` will link the bitcode modules, internalize all the symbols and write out another fullLTO bitcode object file. This bitcode file doesn't have all the bitcode inputs and it should not have the LTOPostLink module flag. The flag will also cause an error when this bitcode object file is linked with other LTO object files. Fix the issue by not applying the LTOPostLink flag during the writeMergedModule function. The flag should only be added when all the bitcode is linked and ready to be optimized. rdar://problem/58462798 Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D84789	2020-08-26 11:17:45 -07:00
Krzysztof Parzyszek	e15143d31b	[Hexagon] Implement llvm.masked.load and llvm.masked.store for HVX	2020-08-26 13:10:22 -05:00
Matt Arsenault	f78687df9b	AMDGPU: Don't assert on misaligned DS read2/write2 offsets This would assert with unaligned DS access enabled. The offset may not be aligned. Theoretically the pattern predicate should check the memory alignment, although it is possible to have the memory be aligned but not the immediate offset. In this case I would expect it to use ds_{read|write}_b64 with unaligned access, but am not clear if there's a reason it doesn't.	2020-08-26 14:08:05 -04:00
Wei Mi	c67ccf5faf	[SampleFDO] Enhance profile remapping support for searching inline instance and indirect call promotion candidate. Profile remapping is a feature to match a function in the module with its profile in the sample profile if the function name and the name in the profile look different but are equivalent under the given remapping rules. This is a useful feature to keep the performance stable by specifying some remapping rules when SampleFDO targets are going through some large scale function signature change. However, currently profile remapping support is only valid for outline function profiles in SampleFDO. It cannot match a callee with an inline instance profile if they have different but equivalent names. We found that without the support for inline instance profiles, remapping is less effective for some large scale changes. To add that support, before any remapping lookup happens, all the names in the profile will be inserted into the remapper and the Key to the name mapping will be recorded in a map called NameMap in the remapper. During name lookup, a Key will be returned for the given name and it will be used to extract an equivalent name in the profile from NameMap. So with the help of the NameMap, we can translate any given name to an equivalent name in the profile if it exists. Whenever we try to match a name in the module to a name in the profile, we will try the match with the original name first, and if it doesn't match, we will use the equivalent name obtained from the remapper to try the match again. In this way, the patch can enhance the profile remapping support for searching inline instance and searching indirect call promotion candidate. In a planned large scale change of int64 type (long long) to int64_t (long), we found the performance of a Google-internal benchmark degraded by 2% if nothing was done. If existing profile remapping was enabled, the performance degradation dropped to 1.2%. If the profile remapping with the current patch was enabled, the performance degradation further dropped to 0.14% (Note the experiment was done before searching indirect call promotion candidate was added. We hope with the remapping support of searching indirect call promotion candidate, the degradation can drop to 0% in the end. It will be evaluated post commit). Differential Revision: https://reviews.llvm.org/D86332	2020-08-26 11:07:35 -07:00
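The NameMap lookup described in `c67ccf5faf` can be illustrated with a toy model. This is only a sketch under loud assumptions: `toyKey` and `ToyProfile` are hypothetical names, and the "remapping rule" is a deliberately crude one that treats the Itanium mangling codes `x` (long long) and `l` (long) as equivalent by rewriting every `x` to `l`. The real remapper is not implemented this way; the sketch only shows the two-step match (original name first, then the equivalent name found through the key).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy remapping rule (an assumption for this sketch): rewrite every 'x' to 'l'
// so that names differing only in long long vs. long share one key.
std::string toyKey(std::string name) {
  for (char &ch : name)
    if (ch == 'x')
      ch = 'l';
  return name;
}

class ToyProfile {
public:
  // Insert every profile name up front and record key -> profile name.
  explicit ToyProfile(const std::vector<std::string> &profileNames) {
    for (const std::string &n : profileNames) {
      names.insert(n);
      nameMap.emplace(toyKey(n), n);
    }
  }

  // Try the original name first; if that fails, look up the equivalent
  // profile name through the key. Returns an empty string when nothing matches.
  std::string match(const std::string &moduleName) const {
    if (names.count(moduleName))
      return moduleName;
    auto it = nameMap.find(toyKey(moduleName));
    return it == nameMap.end() ? std::string() : it->second;
  }

private:
  std::set<std::string> names;
  std::map<std::string, std::string> nameMap;
};
```

For example, a profile containing `_Z3foox` (`foo(long long)`) would match a module function named `_Z3fool` (`foo(long)`) through the shared key, while an unrelated name such as `_Z3bari` yields no match.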
Juneyoung Lee	684b43c0cf	[IR] Add NoUndef attribute to Intrinsics.td This patch adds NoUndef to Intrinsics.td. The attribute is attached to llvm.assume's operand, because llvm.assume(undef) is UB. It is attached to pointer operands of several memory accessing intrinsics as well. This change makes ValueTracking::getGuaranteedNonPoisonOps' intrinsic check unnecessary, so it is removed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86576	2020-08-27 02:54:48 +09:00
Craig Topper	09288bcbf5	[X86] Add assembler support for .d32 and .d8 mnemonic suffixes to control displacement size. This is an older syntax than the {disp32} and {disp8} pseudo prefixes that were added a few weeks ago. We can reuse most of the support for that to support .d32 and .d8 as well.	2020-08-26 10:45:50 -07:00
Roman Lebedev	95848ea101	[Value][InstCombine] Fix one-use checks in PHI-of-op -> Op-of-PHI[s] transforms to be one-user checks As FIXME said, they really should be checking for a single user, not use, so let's do that. It is not that unusual to have the same value as incoming value in a PHI node, not unlike how a PHI may have the same incoming basic block more than once. There isn't a nice way to do that, Value::users() isn't uniquified, and Value only tracks its uses, not Users, so the check is potentially costly since it does indeed potentially involve traversing the entire use list of a value.	2020-08-26 20:20:41 +03:00
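The difference between a one-use check and a one-user check in `95848ea101` can be shown with a small model. This is a sketch, not LLVM code: `ValueSketch` is a hypothetical type whose use list stores one user id per use, so the same user can appear more than once, as with a PHI node that takes the same value from several incoming edges.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of a value's use list: one entry per use, holding the
// id of the user that owns that use. Entries are not deduplicated.
struct ValueSketch {
  std::vector<int> useOwners;
};

// One-use check: exactly one entry in the use list.
bool hasOneUse(const ValueSketch &v) { return v.useOwners.size() == 1; }

// One-user check: at least one use, and every use belongs to the same user.
// This walks the whole use list, which is why the check can be costly.
bool hasOneUser(const ValueSketch &v) {
  if (v.useOwners.empty())
    return false;
  const int first = v.useOwners.front();
  return std::all_of(v.useOwners.begin(), v.useOwners.end(),
                     [first](int owner) { return owner == first; });
}
```

A value used twice by the same PHI (`useOwners = {7, 7}`) fails the one-use check but passes the one-user check, which is the case the commit message describes.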
Owen Anderson	9061eb8245	Revert "Fix frame pointer layout on AArch64 Linux." This broke stage2 of clang-cmake-aarch64-full. This reverts commit `a0aed80b22`.	2020-08-26 17:17:14 +00:00
aartbik	72305a08ff	[llvm] [DAG] Fix bug in llvm.get.active.lane.mask lowering This intrinsic only accepted proper machine vector lengths. Fixed by this change. With unit tests. https://bugs.llvm.org/show_bug.cgi?id=47299 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D86585	2020-08-26 10:16:31 -07:00
Steven Wu	34b289b6db	[ThinLTO][Legacy] Compute PreservedGUID based on IRName in Symtab Instead of computing the GUID based on some assumption about the symbol mangling rule from IRName to symbol name, look up the IRName in the symtabs of all the input files to see if any matching symbol entry provides the IRName for GUID computation. rdar://65853754 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D84803	2020-08-26 10:15:00 -07:00
jasonliu	413054400d	[XCOFF][AIX] Support relocation generation for large code model Summary: Support TOCU and TOCL relocation type for object file generation. Reviewed by: DiggerLin Differential Revision: https://reviews.llvm.org/D84549	2020-08-26 17:12:28 +00:00
Craig Topper	28bd47fc47	[LegalizeTypes] Remove WidenVecRes_Shift and just use WidenVecRes_Binary This function seems to allow for the shift amount to have a different type than the result, but I don't think we do that anywhere else for vector shifts. We also don't have any support for legalizing the shift amount alone if the result is legal and the shift amount type isn't. The code coverage report here shows this code as uncovered http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp.html Differential Revision: https://reviews.llvm.org/D86475	2020-08-26 09:57:41 -07:00
Kai Nacke	ed07e1fe0f	[SystemZ/ZOS] Add header file to encapsulate use of <sysexits.h> The non-standard header file `<sysexits.h>` provides some return values. `EX_IOERR` is used as a special value to signal a broken pipe to the clang driver. On z/OS Unix System Services, this header file does not exist. This patch - adds a check for `<sysexits.h>`, removing the dependency on `LLVM_ON_UNIX` - adds a new header file `llvm/Support/ExitCodes`, which either includes `<sysexits.h>` or defines `EX_IOERR` - updates the users of `EX_IOERR` to include the new header file Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D83472	2020-08-26 12:44:30 -04:00
Owen Anderson	a0aed80b22	Fix frame pointer layout on AArch64 Linux. When floating point callee-saved registers were used, the frame pointer would incorrectly point to the bottom of the CSR space (containing saved floating-point registers), rather than to the frame record. While all frame offsets were calculated consistently, resulting in working code, this prevented stack walkers from being able to traverse the frame list.	2020-08-26 16:09:49 +00:00
Sjoerd Meijer	bda8fbe2d2	[LV] Fallback strategies if tail-folding fails This implements 2 different vectorisation fallback strategies if tail-folding fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This can be controlled with option -prefer-predicate-over-epilogue, that has been changed to take a numeric value corresponding to the tail-folding preference and preferred fallback. Patch by: Pierre van Houtryve, Sjoerd Meijer. Differential Revision: https://reviews.llvm.org/D79783	2020-08-26 16:55:25 +01:00
Jay Foad	a75e67b3b4	[AMDGPU] Make more use of Subtarget reference in SIInstrInfo	2020-08-26 15:04:00 +01:00
Jay Foad	75d159f924	[LegalizeTypes] Add ROTL/ROTR to ScalarizeVectorResult. We can scalarize these just like any other binary operation. Fixes https://bugs.llvm.org/show_bug.cgi?id=47303 caused by D77152. Differential Revision: https://reviews.llvm.org/D86601	2020-08-26 14:42:57 +01:00
Dibya Ranjan Mishra	a7da7e421c	[Support] Allow printing the stack trace only for a given depth Differential Revision: https://reviews.llvm.org/D85458	2020-08-26 09:27:42 -04:00
Matt Arsenault	ff34116cf0	AMDGPU: Use Subtarget reference in SIInstrInfo	2020-08-26 09:18:41 -04:00
Matt Arsenault	21ccedc24f	AMDGPU/GlobalISel: Tolerate negated control flow intrinsic outputs If the condition output is negated, swap the branch targets. This is similar to what SelectionDAG does for when SelectionDAGBuilder decides to invert the condition and swap the branches. This is leaving behind a dead constant def for some reason.	2020-08-26 08:58:54 -04:00
Matt Arsenault	eb074088c9	GlobalISel: Combine G_ADD of G_PTRTOINT to G_PTR_ADD This produces less work for addressing mode matching. I think this is safe since I don't think machine IR is supposed to give the same aliasing properties as getelementptr in the IR.	2020-08-26 08:57:15 -04:00
Jay Foad	831457c6d5	[AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size If a workgroup size is known to be not greater than wavefront size the s_barrier instruction is not needed since all threads are guaranteed to come to the same point at the same time. This is the same optimization that was implemented for SelectionDAG in D31731. Differential Revision: https://reviews.llvm.org/D86609	2020-08-26 13:47:51 +01:00
Xing GUO	8daa3264a3	[DWARFYAML] Make the unit_length and header_length fields optional. This patch makes the unit_length and header_length fields of line tables optional. yaml2obj is able to infer them for us. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86590	2020-08-26 20:35:10 +08:00
QingShan Zhang	ebf3b188c6	[Scheduling] Implement a new way to cluster loads/stores Before calling the target hook to determine if two loads/stores are clusterable, we put them into different groups to avoid fake clusters due to dependencies. For now, we put the loads/stores into the same group if they have the same predecessor. We assume that if two loads/stores have the same predecessor, they likely do not depend on each other. However, one SUnit might have several predecessors and for now, we just pick the first predecessor that has a non-data/non-artificial dependency, which is too arbitrary, and we are struggling to fix it. So, I am proposing a better implementation. 1. Collect all the loads/stores that have memory info first to reduce the complexity. 2. Sort these loads/stores so that we can stop the search as early as possible. 3. For each load/store, search in sorted order for the first non-dependent instruction, and check whether they can cluster. Reviewed By: Jay Foad Differential Revision: https://reviews.llvm.org/D85517	2020-08-26 12:33:59 +00:00
David Green	677c1590c0	[ARM] Increase MVE gather/scatter cost by MVECostFactor. MVE gather/scatter code generation is looking a lot better than it used to, but still has some issues. We currently model the instructions as 1 cycle per element, which is a bit low for some cases. Increasing the cost by the MVECostFactor brings them in line with our other instruction costs. This will have the effect of only generating them when the extra benefit is more likely to overcome some of the issues, notably running out of registers and vectorizing loops that could otherwise be SLP vectorized. In the short term, whilst we look at other ways of dealing with those more directly, we can increase the costs of gathers to make them more likely to be beneficial when created. Differential Revision: https://reviews.llvm.org/D86444	2020-08-26 13:03:46 +01:00
Sam Tebbs	85dd852a0d	[RDA] Don't visit the BB of the instruction in getReachingUniqueMIDef If the basic block of the instruction passed to getUniqueReachingMIDef is a transitive predecessor of itself and has a definition of the register, the function will return that definition even if it is after the instruction given to the function. This patch stops the function from scanning the instruction's basic block to prevent this. Differential Revision: https://reviews.llvm.org/D86607	2020-08-26 12:40:39 +01:00
Pierre Gousseau	cda6b09242	[X86] Make sure we do not clobber RBX with mwaitx when used as a base pointer. mwaitx uses EBX as one of its arguments. Using this instruction clobbers RBX as it is defined to hold one of the inputs. When the backend uses a dynamically allocated stack, RBX is used as a reserved register for the base pointer. This patch is adapted from @qcolombet's patch for cmpxchg at r263325. This fixes PR43528. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D73475	2020-08-26 11:20:31 +01:00
Cullen Rhodes	1f44dfb640	[AArch64][AsmParser] Fix bug in operand printer The switch in AArch64Operand::print was changed in D45688 so the shift can be printed after printing the register. This is implemented with LLVM_FALLTHROUGH and was broken in D52485 when BTIHint was put between the register and shift operands. Reviewed By: ostannard Differential Revision: https://reviews.llvm.org/D86535	2020-08-26 09:31:36 +00:00
Sander de Smalen	5f47d4456d	[AArch64][SVE] Fix calculation restore point for SVE callee saves. This fixes an issue where the restore point of callee-saves in the function epilogues was incorrectly calculated when the basic block consisted of only a RET instruction. This caused dealloc instructions to be inserted in between the block of callee-save restore instructions, rather than before it. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86099	2020-08-26 10:02:31 +01:00
Jan Kratochvil	b20a4e293c	[Support] Speedup llvm-dwarfdump 3.9x Currently `strace llvm-dwarfdump x.debug >/tmp/file`: ioctl(1, TCGETS, 0x7ffd64d7f340) = -1 ENOTTY (Inappropriate ioctl for device) write(1, " DW_AT_decl_line\t(89)\n"..., 4096) = 4096 ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7ffd64d7f410) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7ffd64d7f400) = -1 ENOTTY (Inappropriate ioctl for device) After this patch: write(1, "0000000000001102 \"strlen\")\n "..., 4096) = 4096 write(1, "site\n DW_AT_low"..., 4096) = 4096 write(1, "d53)\n\n0x000e4d4d: DW_TAG_G"..., 4096) = 4096 The same speedup can be achieved by `--color=0` but that is not much convenient. This implementation has been suggested by Joerg Sonnenberger. Differential Revision: https://reviews.llvm.org/D86406	2020-08-26 10:29:46 +02:00
Jay Foad	b7e3599a22	[SelectionDAG] Handle non-power-of-2 bitwidths in expandROT Differential Revision: https://reviews.llvm.org/D86449	2020-08-26 09:20:46 +01:00
Shinji Okumura	3050713798	[Attributor] Provide an edge-based interface in AAIsDead This patch produces an edge-based interface in AAIsDead. By this, we can query a set of basic blocks that are directly reachable from a given basic block. This is specifically useful for implementation of AAReachability. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85547	2020-08-26 16:57:52 +09:00
Roman Lebedev	1f90d45b9e	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do its sibling fold for `insertvalue`, we should also do this for `extractvalue`s. And unlike that one, the results here are, quite honestly, shocking, as can be observed on vanilla llvm test-suite + RawSpeed results:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|---|---|---|---:|---:|---:|
| asm-printer.EmittedInsts | 7945095 | 7942507 | -2588 | -0.03% | 0.03% |
| assembler.ObjectBytes | 273209920 | 273069800 | -140120 | -0.05% | 0.05% |
| early-cse.NumCSE | 2183363 | 2183398 | 35 | 0.00% | 0.00% |
| early-cse.NumSimplify | 541847 | 550017 | 8170 | 1.51% | 1.51% |
| instcombine.NumAggregateReconstructionsSimplified | 2139 | 108 | -2031 | -94.95% | 94.95% |
| instcombine.NumCombined | 3601364 | 3635448 | 34084 | 0.95% | 0.95% |
| instcombine.NumConstProp | 27153 | 27157 | 4 | 0.01% | 0.01% |
| instcombine.NumDeadInst | 1694521 | 1765022 | 70501 | 4.16% | 4.16% |
| instcombine.NumPHIsOfExtractValues | 0 | 37546 | 37546 | 0.00% | 0.00% |
| instcombine.NumSunkInst | 63158 | 63686 | 528 | 0.84% | 0.84% |
| instcount.NumBrInst | 874304 | 871857 | -2447 | -0.28% | 0.28% |
| instcount.NumCallInst | 1757657 | 1758402 | 745 | 0.04% | 0.04% |
| instcount.NumExtractValueInst | 45623 | 11483 | -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst | 4983 | 580 | -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst | 61018 | 59478 | -1540 | -2.52% | 2.52% |
| instcount.NumLandingPadInst | 35334 | 34215 | -1119 | -3.17% | 3.17% |
| instcount.NumPHIInst | 344428 | 331116 | -13312 | -3.86% | 3.86% |
| instcount.NumRetInst | 100773 | 100772 | -1 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1081154 | 1077166 | -3988 | -0.37% | 0.37% |
| instcount.TotalFuncs | 101443 | 101442 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8890201 | 8833747 | -56454 | -0.64% | 0.64% |
| instsimplify.NumSimplified | 75822 | 75707 | -115 | -0.15% | 0.15% |
| simplifycfg.NumHoistCommonCode | 24203 | 24197 | -6 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 48201 | 48195 | -6 | -0.01% | 0.01% |
| simplifycfg.NumInvokes | 2785 | 4298 | 1513 | 54.33% | 54.33% |
| simplifycfg.NumSimpl | 997332 | 1018189 | 20857 | 2.09% | 2.09% |
| simplifycfg.NumSinkCommonCode | 7088 | 6464 | -624 | -8.80% | 8.80% |
| simplifycfg.NumSinkCommonInstrs | 15117 | 14021 | -1096 | -7.25% | 7.25% |
```
... which tells us that this new fold fires a whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing the count of `insertvalue`s (-88.36%, i.e. 9 times less) and `extractvalue`s (-74.83%, i.e. four times less).
This causes a geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions
The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform, the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why:
1. After this fold, as can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHIs, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate.
The take-aways are:
1. We maybe should do the most trivial, same-BB PHI CSE in InstCombine.
2. I need to check what other patterns remain, and how they can be resolved (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away).
This is a reland of the original commit `fcb51d8c24`, because originally I forgot to ensure that the base aggregate types match. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:57:50 +03:00
Roman Lebedev	c295c6f2c0	Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad" This reverts commit `fcb51d8c24`. As buildbots report, there's apparently some missing check to ensure that the types of incoming values match the type of PHI. Let's revert for a moment.	2020-08-26 09:23:22 +03:00
Roman Lebedev	fcb51d8c24	[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do its sibling fold for `insertvalue`, we should also do this for `extractvalue`s. And unlike that one, the results here are, quite honestly, shocking, as can be observed on vanilla llvm test-suite + RawSpeed results:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|---|---|---|---:|---:|---:|
| asm-printer.EmittedInsts | 7945095 | 7942507 | -2588 | -0.03% | 0.03% |
| assembler.ObjectBytes | 273209920 | 273069800 | -140120 | -0.05% | 0.05% |
| early-cse.NumCSE | 2183363 | 2183398 | 35 | 0.00% | 0.00% |
| early-cse.NumSimplify | 541847 | 550017 | 8170 | 1.51% | 1.51% |
| instcombine.NumAggregateReconstructionsSimplified | 2139 | 108 | -2031 | -94.95% | 94.95% |
| instcombine.NumCombined | 3601364 | 3635448 | 34084 | 0.95% | 0.95% |
| instcombine.NumConstProp | 27153 | 27157 | 4 | 0.01% | 0.01% |
| instcombine.NumDeadInst | 1694521 | 1765022 | 70501 | 4.16% | 4.16% |
| instcombine.NumPHIsOfExtractValues | 0 | 37546 | 37546 | 0.00% | 0.00% |
| instcombine.NumSunkInst | 63158 | 63686 | 528 | 0.84% | 0.84% |
| instcount.NumBrInst | 874304 | 871857 | -2447 | -0.28% | 0.28% |
| instcount.NumCallInst | 1757657 | 1758402 | 745 | 0.04% | 0.04% |
| instcount.NumExtractValueInst | 45623 | 11483 | -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst | 4983 | 580 | -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst | 61018 | 59478 | -1540 | -2.52% | 2.52% |
| instcount.NumLandingPadInst | 35334 | 34215 | -1119 | -3.17% | 3.17% |
| instcount.NumPHIInst | 344428 | 331116 | -13312 | -3.86% | 3.86% |
| instcount.NumRetInst | 100773 | 100772 | -1 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1081154 | 1077166 | -3988 | -0.37% | 0.37% |
| instcount.TotalFuncs | 101443 | 101442 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8890201 | 8833747 | -56454 | -0.64% | 0.64% |
| instsimplify.NumSimplified | 75822 | 75707 | -115 | -0.15% | 0.15% |
| simplifycfg.NumHoistCommonCode | 24203 | 24197 | -6 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 48201 | 48195 | -6 | -0.01% | 0.01% |
| simplifycfg.NumInvokes | 2785 | 4298 | 1513 | 54.33% | 54.33% |
| simplifycfg.NumSimpl | 997332 | 1018189 | 20857 | 2.09% | 2.09% |
| simplifycfg.NumSinkCommonCode | 7088 | 6464 | -624 | -8.80% | 8.80% |
| simplifycfg.NumSinkCommonInstrs | 15117 | 14021 | -1096 | -7.25% | 7.25% |
```
... which tells us that this new fold fires a whopping 38k times, increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time), decreasing total instruction count by -0.64% (-56454), and sharply decreasing the count of `insertvalue`s (-88.36%, i.e. 9 times less) and `extractvalue`s (-74.83%, i.e. four times less).
This causes a geomean -0.01% binary size decrease http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions
The other thing this tells us is that, while this is a massive win for the `invoke`->`call` transform, the `InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold, which is supposed to be dealing with such aggregate reconstructions, fires a lot less now. There are two reasons why:
1. After this fold, as can be seen in tests, we may (will) end up with trivially redundant PHI nodes. We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHIs, it also sees that the extract-insert chain recreates the original aggregate, and replaces it with the original aggregate.
The take-aways are:
1. We maybe should do the most trivial, same-BB PHI CSE in InstCombine.
2. I need to check what other patterns remain, and how they can be resolved (i.e. I wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away).
Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86530	2020-08-26 09:08:24 +03:00
Jianzhou Zhao	4784987027	Fix a 32-bit overflow issue when reading LTO-generated bitcode files whose strtab are of size > 2^29 This happens when using -flto and -Wl,--plugin-opt=emit-llvm to create a linked LTO bitcode file, and the bitcode file has a strtab with size > 2^29. All the issues relate to a pattern like this size_t x64 = y64 + z32 * C When z32 is >= (2^32)/C, z32 * C overflows. Reviewed-by: MaskRay Differential Revision: https://reviews.llvm.org/D86500	2020-08-26 05:47:22 +00:00
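The overflow pattern named in the commit above (`size_t x64 = y64 + z32 * C`) can be illustrated with a minimal sketch. The function names and the scale factor of 8 are hypothetical choices for illustration, not taken from the patch; the sketch only shows why the product has to be widened before the multiply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the buggy pattern: both multiplicands are
// 32-bit, so the product wraps modulo 2^32 before it is widened and added.
inline uint64_t offsetNarrow(uint64_t Base, uint32_t Index, uint32_t Scale) {
  return Base + Index * Scale; // 32-bit multiply, may wrap
}

// Hypothetical fixed form: widen one operand first so the multiply is
// carried out in 64 bits.
inline uint64_t offsetWide(uint64_t Base, uint32_t Index, uint32_t Scale) {
  return Base + static_cast<uint64_t>(Index) * Scale;
}
```

With an index of 2^29 and a scale of 8, the narrow form yields 0 while the widened form yields 2^32.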
Xing GUO	75e0b58668	[DWARFYAML] Use writeDWARFOffset() to write the prologue_length field. NFC. Use writeDWARFOffset() to simplify the logic. NFC.	2020-08-26 12:34:02 +08:00
Adrien Guinet	c6f7ac0071	[llvm-lipo] Add support for bitcode files A Mach-O universal binary may contain bitcode as a slice. This diff adds proper handling of such binaries to llvm-lipo. Test plan: make check-all Differential revision: https://reviews.llvm.org/D85740	2020-08-25 21:11:18 -07:00
Mikhail R. Gadelha	30967e51da	Add Z3 to system libraries list if enabled Without this trying to link static LLVM libraries (built with Z3 enabled) fails because `llvm-config` doesn't print `-lz3`. We are already using this patch at MSYS2: https://github.com/msys2/MINGW-packages/blob/master/mingw-w64-clang/0013-Add-Z3-to-system-libraries-list-if-enabled.patch Reviewed By: mikhail.ramalho Differential Revision: https://reviews.llvm.org/D85195	2020-08-25 22:32:36 -04:00
Craig Topper	1d1515a9e2	[X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid an extra EXTRACT_SUBREG Since we can only copy to GR32 we had to EXTRACT from GR32, but we would first go to GR16 and then the truncate would extract again to GR8. This adds a special case to go directly from GR32 to GR8. This would eventually get cleaned up, but I thought maybe we should avoid doing it in the first place. Our k-register handling is weird and we could probably stand to have some more special ISD nodes for the conversions so the i32 type would be explicit.	2020-08-25 18:20:43 -07:00
Craig Topper	b8ec8f5776	[X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine. The IsExtractedElement already called getOperand(0) so Extract here is the source vector. We shouldn't call getOperand(0). This worked for the original test cases because the result was a bitcast so the getOperand(0) accidentally peeked through the bitcast which is what we wanted. In the failing case here, the operand turns out to be undef so the getOperand(0) asserts because undef has no operands. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184 Differential Revision: https://reviews.llvm.org/D86428	2020-08-25 16:16:54 -07:00
Craig Topper	ba319ac47e	[X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output pattern. KMOVWkr produces VK16, there's no reason to copy it to VK16 again. Test changes are presumably because we were scheduling based on the COPY that is no longer there.	2020-08-25 15:19:27 -07:00
Mircea Trofin	7cfcecece0	[MLInliner] Simplify TFUTILS_SUPPORTED_TYPES We only need the C++ type and the corresponding TF Enum. The other parameter was used for the output spec json file, but we can just standardize on the C++ type name there. Differential Revision: https://reviews.llvm.org/D86549	2020-08-25 14:19:39 -07:00
Stanislav Mekhanoshin	b7760c3e5d	[AMDGPU] Remove unsound dependency on ISA version in waitcnt Differential Revision: https://reviews.llvm.org/D86566	2020-08-25 14:01:42 -07:00
Fangrui Song	82d0749749	[TargetLoweringObjectFileImpl] Make .llvmbc and .llvmcmd non-SHF_ALLOC There are two ways .llvmbc can be produced: * clang -c -fembed-bitcode=all (which also produces .llvmcmd) * LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode .llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by --gc-sections. This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This is conceptually correct: the two sections are not part of the process image, so SHF_ALLOC is not appropriate. `test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to `llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D86374	2020-08-25 13:37:29 -07:00
Stanislav Mekhanoshin	817c831f02	[AMDGPU] Switch to named simm16 in vscnt insertion Differential Revision: https://reviews.llvm.org/D86568	2020-08-25 13:05:27 -07:00
Ankit Aggarwal	2da1eefb58	[Hexagon] Check if EVT is simple type in HVX lowering	2020-08-25 15:02:44 -05:00
Juneyoung Lee	f753f5b050	[ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands. Instead of special-casing llvm.assume, I think it is also a viable option to add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86477	2020-08-26 04:40:21 +09:00
Nikita Popov	3a54b6a4b7	[MemDep] Use BatchAA when computing pointer dependencies We're not changing IR while running a single MemDep query, so it's safe to cache alias analysis results using BatchAA. This adds BatchAA usage to getSimplePointerDependencyFrom(), which is non-intrusive -- covering larger parts (like a whole processNonLocalLoad query) is also possible, but requires threading BatchAA through a bunch of APIs. For the ThinLTO configuration, this is a 1% geomean improvement on CTMark. Differential Revision: https://reviews.llvm.org/D85583	2020-08-25 21:34:34 +02:00
Wei Wang	ae90df8e5a	[FIX] Avoid creating BFI when emitting remarks for dead functions A dead function has its body stripped away, which can cause various analyses to panic. Also, it does not make sense to apply analyses to such a function. Reviewed By: xazax.hun, MaskRay, wenlei, hoy Differential Revision: https://reviews.llvm.org/D84715	2020-08-25 11:12:38 -07:00
Krzysztof Parzyszek	dcef5e0c37	[Hexagon] Remove (redundant) HexagonISelLowering::isHvxOperation(SDValue) Use isHvxOperation(SDNode*) instead.	2020-08-25 11:45:08 -05:00
Ta-Wei Tu	abbd652dd6	[LoopNest] False negative of `arePerfectlyNested` with LCSSA loops Summary: The LCSSA pass (required for all loop passes) sometimes adds additional blocks containing LCSSA variables, and checkLoopsStructure may return false even when the loops are perfectly nested in this case. This is because the successor of the exit block of the inner loop now points to the LCSSA block instead of the latch block of the outer loop. Examples are shown in the test nests-with-lcssa.ll. To fix the issue, the successor of the exit block of the inner loop can now point to a block in which all instructions are LCSSA phi nodes (except the terminator), and the sole successor of that block should point to the latch block of the outer loop. Reviewed By: Whitney, etiotto Differential Revision: https://reviews.llvm.org/D86133	2020-08-25 16:20:52 +00:00
Sanjay Patel	c4f0a0896f	[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try) The 1st attempt (rG557b890) was reverted because it caused miscompiles. That bug is avoided here by changing the order of folds and as verified in the new tests. Original commit message: InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-25 11:19:36 -04:00
Sjoerd Meijer	8d5f64c4ed	[Verifier] Additional check for intrinsic get.active.lane.mask This adapts the verifier checks for intrinsic get.active.lane.mask to the new semantics of it as described in D86147. I.e., the second argument %n, which corresponds to the loop tripcount, must be greater than 0 if it is a constant, so check that. Differential Revision: https://reviews.llvm.org/D86301	2020-08-25 15:44:33 +01:00
Xing GUO	1dc57ada0c	[DWARFYAML] Make the 'Attributes' field optional. This patch makes the 'Attributes' field optional. We don't need to explicitly specify the 'Attributes' field in the future. Reviewed By: jhenderson, grimar Differential Revision: https://reviews.llvm.org/D86537	2020-08-25 22:37:43 +08:00
Sjoerd Meijer	39522b1e10	[SelectionDAG] Legalize intrinsic get.active.lane.mask This adapts legalization of intrinsic get.active.lane.mask to the new semantics as described in D86147. Because the second argument is now the loop tripcount, we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the backedge-taken count. Differential Revision: https://reviews.llvm.org/D86302	2020-08-25 15:00:10 +01:00
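The switch from an unsigned less-or-equal compare against the backedge-taken count (BTC) to an unsigned less-than compare against the trip count, described in the get.active.lane.mask commits here, can be sketched with a scalar model. This is an illustrative model only: the function names are hypothetical, 32-bit counters are an assumption, and the lane index is added in 64 bits to sidestep wraparound of `Base + I`, which the real intrinsic specifies in its own terms.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// New semantics (as described in the commits): lane I is active when
// Base + I < TripCount, compared as unsigned values.
inline std::vector<bool> laneMaskFromTripCount(uint32_t Base,
                                               uint32_t TripCount,
                                               unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = static_cast<uint64_t>(Base) + I < TripCount;
  return Mask;
}

// Old semantics: lane I is active when Base + I <= BTC.
inline std::vector<bool> laneMaskFromBTC(uint32_t Base, uint32_t BTC,
                                         unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = static_cast<uint64_t>(Base) + I <= BTC;
  return Mask;
}

// Materializing BTC + 1 in the counter's own width wraps to 0 when BTC is
// the maximum value; passing the trip count directly avoids this step.
inline uint32_t tripCountFromBTC32(uint32_t BTC) { return BTC + 1u; }
```

For a loop with 10 iterations (BTC of 9), both forms produce the same mask for the final vector iteration starting at element 8 with 4 lanes: the first two lanes are active and the last two are not.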
Jeremy Morse	121a49d839	[LiveDebugValues] Add switches for using instr-ref variable locations This patch adds the -Xclang option "-fexperimental-debug-variable-locations" and same LLVM CodeGen option, to pick which variable location tracking solution to use. Right now all the switch does is pick which LiveDebugValues implementation to use, the normal VarLoc one or the instruction referencing one in rGae6f78824031. Over time, the aim is to add fragments of support in aid of the value-tracking RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html also controlled by this command line switch. That will slowly move variable locations to be defined by an instruction calculating a value, and a DBG_INSTR_REF instruction referring to that value. Thus, this is going to grow into a "use the new kind of variable locations" switch, rather than just "use the new LiveDebugValues implementation". Differential Revision: https://reviews.llvm.org/D83048	2020-08-25 14:58:48 +01:00
Matt Arsenault	0d2fe90063	AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge Most notably, we were incorrectly reporting <3 x s16> as a legal type for these. Make sure these aren't legal to help make progress on fixing the artifact combiner and vector legalizer rules. Unfortunately, this means spreading the -global-isel-abort=0 hack, although this doesn't change the legalizer result in any situation.	2020-08-25 09:40:20 -04:00
Sjoerd Meijer	c352e7fbda	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Matt Arsenault	ef8f3b5a78	AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors The selection patterns will currently fail on these.	2020-08-25 09:37:41 -04:00
Sjoerd Meijer	ae366479e8	[LV] get.active.lane.mask consuming tripcount instead of backedge-taken count This adapts LV to the new semantics of get.active.lane.mask as discussed in D86147, which means that the LV now emits intrinsic get.active.lane.mask with the loop tripcount instead of the backedge-taken count as its second argument. The motivation for this is described in D86147. Differential Revision: https://reviews.llvm.org/D86304	2020-08-25 13:49:19 +01:00
David Green	5b7e27a4db	[ARM][CGP] Fix scalar condition selects for MVE The arm backend does not handle select/select_cc on vectors with scalar conditions, preferring to expand them in codegenprepare instead. This usually works except when optimizing for size, where the optsize check would end up overruling the backend isSelectSupported check. We could handle the selects in ISel too, but this seems like smaller code than trying to splat the condition to all lanes. Differential Revision: https://reviews.llvm.org/D86433	2020-08-25 12:09:06 +01:00
Mikael Holmen	59e1fbe557	[PowerPC] Fix gcc warning [NFC] Without the fix gcc 7.4 warns with ../lib/Target/PowerPC/PPCAsmPrinter.cpp: In member function 'void {anonymous}::PPCAsmPrinter::EmitTlsCall(const llvm::MachineInstr*, llvm::MCSymbolRefExpr::VariantKind)': ../lib/Target/PowerPC/PPCAsmPrinter.cpp:525:53: warning: enumeral and non-enumeral type in conditional expression [-Wextra] MCInstBuilder(Subtarget->isPPC64() ? Opcode : PPC::BL_TLS) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~	2020-08-25 12:58:38 +02:00
Shinji Okumura	05390440a2	[Attributor][NFC] Clang format	2020-08-25 19:32:58 +09:00
Paul Walker	73ac3c0ede	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Benjamin Kramer	c6fb72de4f	Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract" This reverts commit `557b890ff4`. Causing miscompiles, test case is on llvm-commits.	2020-08-25 11:31:31 +02:00
Hans Wennborg	6da4f1199e	Revert "[CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU" It broke Chromium's llvm build: CMake Error at lib/Support/CMakeLists.txt:13 (string): string sub-command REGEX, mode REPLACE: regex "^()" matched an empty string. Call Stack (most recent call first): lib/Support/CMakeLists.txt:223 (get_system_libname) This reverts commit `2b3807d822` / https://reviews.llvm.org/D86434	2020-08-25 11:22:50 +02:00
David Sherwood	7b64765cd1	[SVE] Fix TypeSize related warnings with IR truncates of scalable vectors In getCastInstrCost when the instruction is a truncate we were relying upon the implicit TypeSize -> uint64_t cast when asking if a given type has the same size as a legal integer. I've changed the code to only ask the question if the type is fixed length. I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail out for now if the type is a scalable vector. I've added the following new tests: Analysis/CostModel/AArch64/sve-trunc.ll Transforms/InstCombine/AArch64/sve-trunc.ll for both of these fixes. Differential revision: https://reviews.llvm.org/D86432	2020-08-25 09:17:56 +01:00
Florian Hahn	e19ef1aab5	[DSE,MemorySSA] Cache accesses with/without reachable read-clobbers. Currently we repeatedly check the same uses for read clobbers in some cases. We can avoid unnecessary checks by keeping track of the memory accesses we already found read clobbers for. To do so, we just add memory accesses causing read-clobbers to a set. Note that marking all visited accesses as read-clobbers would be too pessimistic, as that might include accesses not on any path to the actual read clobber. If we do not find any read-clobbers, we can add all visited instructions to another set and use that to skip the same accesses in the next call. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D75025	2020-08-25 08:48:46 +01:00
Roman Lebedev	cdd339c568	[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's According to the statistics, this happens exceedingly rarely, but I have seen it in exactly the situations the PHI-aware aggregate reconstruction would have handled, eventually, and allowed the invoke -> call fold later on. So while this might be something that other folds will have to learn about, I believe we should be doing this transform in general. Here, we are okay with adding two PHI's to get both the base aggregate and the inserted value. I'm not sure it makes much sense to restrict it to a single PHI (to just the inserted value?), because originally we'd be receiving the final aggregate already. llvm test-suite + RawSpeed:
```
| statistic name                    | baseline | proposed |   Δ |      % | \|%\| |
|-----------------------------------|----------|----------|----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues |        0 |       12 |  12 |  0.00% | 0.00% |
| asm-printer.EmittedInsts          |  8926643 |  8926595 | -48 |  0.00% | 0.00% |
| instcombine.NumCombined           |  3846614 |  3846640 |  26 |  0.00% | 0.00% |
| instcombine.NumConstProp          |    24302 |    24293 |  -9 | -0.04% | 0.04% |
| instcombine.NumDeadInst           |  1620140 |  1620112 | -28 |  0.00% | 0.00% |
| instcount.NumBrInst               |   898466 |   898464 |  -2 |  0.00% | 0.00% |
| instcount.NumCallInst             |  1760819 |  1760875 |  56 |  0.00% | 0.00% |
| instcount.NumExtractValueInst     |    45659 |    45649 | -10 | -0.02% | 0.02% |
| instcount.NumInsertValueInst      |     4991 |     4981 | -10 | -0.20% | 0.20% |
| instcount.NumIntToPtrInst         |    27084 |    27087 |   3 |  0.01% | 0.01% |
| instcount.NumPHIInst              |   371435 |   371429 |  -6 |  0.00% | 0.00% |
| instcount.NumStoreInst            |   906011 |   906019 |   8 |  0.00% | 0.00% |
| instcount.TotalBlocks             |  1105520 |  1105518 |  -2 |  0.00% | 0.00% |
| instcount.TotalInsts              |  9795737 |  9795776 |  39 |  0.00% | 0.00% |
| simplifycfg.NumInvokes            |     2784 |     2786 |   2 |  0.07% | 0.07% |
| simplifycfg.NumSimpl              |  1001840 |  1001850 |  10 |  0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs   |    15174 |    15170 |  -4 | -0.03% | 0.03% |
```
Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86306	2020-08-25 10:38:11 +03:00
Sam Parker	85a5c65f69	[NFC][RDA] Add explicit def check Explicitly check that there is a local def prior to the given instruction in getReachingLocalMIDef instead of just relying on a nullptr return from getInstFromId.	2020-08-25 08:37:45 +01:00
Freddy Ye	e02d081f2b	[X86] Support -march=sapphirerapids Support -march=sapphirerapids for x86. Compared with Icelake Server, it includes 14 more new features: amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote, enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86503	2020-08-25 14:21:21 +08:00
Petr Hosek	2b3807d822	[CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU For the Windows GNU platform, CMAKE_FIND_LIBRARY_PREFIXES is a list containing an empty string, which ended up in a regex capturing group, which is invalid in CMake's regex engine. With this change, we get the following: set(CMAKE_FIND_LIBRARY_PREFIXES "lib" "") set(CMAKE_FIND_LIBRARY_SUFFIXES ".dll.a" ".a" ".lib") get_system_libname(path/to/libz.dll.a zlib) message("${zlib}") outputs z, as expected. Patch By: haampie Differential Revision: https://reviews.llvm.org/D86434	2020-08-24 23:00:54 -07:00
Mircea Trofin	8c63df2416	[MLInliner] Support training that doesn't require partial rewards If we use training algorithms that don't need partial rewards, we don't need to worry about an ir2native model. In that case, training logs won't contain a 'delta_size' feature either (since that's the partial reward). Differential Revision: https://reviews.llvm.org/D86481	2020-08-24 17:36:29 -07:00
Venkataramanan Kumar	62e91bf563	[DAGCombine]: Fold X/Sqrt(X) to Sqrt(X) With FMF ( "nsz" and " reassoc") fold X/Sqrt(X) to Sqrt(X). This is done after targets have the chance to produce a reciprocal sqrt estimate sequence because that expansion is probably more efficient than an expansion of a non-reciprocal sqrt. That is also why we deferred doing this transform in IR (D85709). Differential Revision: https://reviews.llvm.org/D86403	2020-08-24 18:16:13 -04:00
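The algebra behind the fold above is x / sqrt(x) = sqrt(x) for x > 0. A minimal numeric sketch, with hypothetical helper names and an arbitrarily chosen tolerance, shows that the two forms agree up to rounding for positive inputs; it says nothing about how the DAG combine itself is implemented.

```cpp
#include <cassert>
#include <cmath>

// Unfolded form: a divide plus a square root.
inline double divBySqrt(double X) { return X / std::sqrt(X); }

// Folded form: just the square root.
inline double justSqrt(double X) { return std::sqrt(X); }

// Relative comparison; the 1e-12 tolerance is an assumption for this sketch.
inline bool closeEnough(double A, double B) {
  return std::fabs(A - B) <= 1e-12 * std::fabs(B);
}
```

The results are not guaranteed to be bit-identical, and at x = 0 the unfolded form evaluates 0/0 under IEEE-754 arithmetic, which is consistent with the fold being gated on fast-math flags.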
Matt Arsenault	77e5a195f8	AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands. We would still need to waterfall if the value were somehow an AGPR, and also need to explicitly copy to a VGPR.	2020-08-24 17:54:34 -04:00
Nemanja Ivanovic	075a92dea1	[PowerPC] Do not use FISel for calls and TOC-based accesses with PC-Rel PC-Relative addressing introduces a fair bit of complexity for correctly eliminating TOC accesses. FastISel does not include any of that handling so we miscompile code with -mcpu=pwr10 -O0 if it includes an external call that FastISel does not handle followed by any of the following: floating point constant materialization; materialization of a GlobalValue; a call that FastISel does handle. This patch switches to SDISel for any of the above. Differential revision: https://reviews.llvm.org/D86343	2020-08-24 16:51:44 -05:00
Craig Topper	f7c87b7e37	[X86] Copy the tuning features and scheduler model from pentium4/x86-64 to generic This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64". To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586. I updated one llvm-mca test to check a different CPU since generic has a scheduler model now Differential Revision: https://reviews.llvm.org/D86312	2020-08-24 14:47:10 -07:00
Nemanja Ivanovic	c485343c83	[PowerPC] Handle SUBFIC in reg+reg -> reg+imm transformation We initially missed the subtract-immediate in this transformation. This patch just adds that. Differential revision: https://reviews.llvm.org/D84659	2020-08-24 16:22:59 -05:00
Sanjay Patel	557b890ff4	[InstCombine] improve demanded element analysis for vector insert-of-extract InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-24 17:00:16 -04:00
Bjorn Pettersson	fce44ff5da	[Scalarizer] Avoid updating the name of globals The "takeName" logic at the end of ScalarizerVisitor::finish could end up renaming global variables when having simplified an extractelement instruction to simply pick a single vector element. If the input vector to the extractelement instruction held pointers to global variables we ended up renaming the global variable. The patch makes sure we only take the name of the replaced Op when we have added new instructions that might need a useful name. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D86472	2020-08-24 21:55:03 +02:00
Roman Lebedev	56c529300e	[NFC][InstCombine] Adjust naming for some methods to match coding standards Requested as preparatory cleanup in https://reviews.llvm.org/D86306#inline-799065	2020-08-24 22:39:34 +03:00
Roland Froese	b6d7ed469f	[PowerPC] Extend custom lower of vector truncate to handle wider input Current custom lowering of truncate vector handles a source of up to 128 bits, but that only uses one of the two shuffle vector operands. Extend it to use both operands to handle 256 bit sources. Differential Revision: https://reviews.llvm.org/D68035	2020-08-24 15:33:43 -04:00
Fangrui Song	44ee9d070a	Revert D85812 "[coroutine] should disable inline before calling coro split" This reverts commit `2e43acfed8`. LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the library which contains SampleProfile.cpp). It is inappropriate for SampleProfile.cpp to depend on Coroutines.h (circular dependency). The test inverted dependencies as well: llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.	2020-08-24 11:41:05 -07:00
Matt Arsenault	75e6f0b3d4	AMDGPU: Add flag to disable promotion of uniform i16 ops This interferes with GlobalISel's much better handling of the situation. This should really be disabled for GlobalISel. However, the fallback only re-runs the selection passes, and doesn't go back and rerun any codegen IR passes. I haven't come up with a good solution to this problem.	2020-08-24 14:39:27 -04:00
Craig Topper	43465a4375	[LegalizeTypes][X86] Add ROTL/ROTR to WidenVectorResult. We can widen these just like any other binary operation. Added test cases for v2i32 for X86 for coverage. Fixes failures seen after D77152.	2020-08-24 10:10:20 -07:00
Jay Foad	a522067692	[SDAG] Convert FSHL <--> FSHR if the target only supports one of them D77152 tried to do this but got it wrong in the shift-by-zero case. D86430 reverted the wrong code. Reimplement the optimization with different code depending on whether the shift amount is known to be non-zero (modulo bitwidth). This improves code quality for fshl tests on AMDGPU, which only has an fshr instruction. Differential Revision: https://reviews.llvm.org/D86438	2020-08-24 17:47:10 +01:00
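The shift-by-zero pitfall mentioned in the commit above can be checked on a small model. The sketch below uses 8-bit funnel shifts with hypothetical helper names. The non-zero rewrite is the straightforward one; the any-amount rewrite is one known identity that stays correct at zero, shown here as an assumption for illustration rather than as the code in the patch.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned BW = 8; // bit width of the model

// fshl: concatenate X:Y, shift left by Z mod BW, keep the high BW bits.
inline uint8_t fshl8(uint8_t X, uint8_t Y, unsigned Z) {
  unsigned S = Z % BW;
  if (S == 0)
    return X;
  return static_cast<uint8_t>((X << S) | (Y >> (BW - S)));
}

// fshr: concatenate X:Y, shift right by Z mod BW, keep the low BW bits.
inline uint8_t fshr8(uint8_t X, uint8_t Y, unsigned Z) {
  unsigned S = Z % BW;
  if (S == 0)
    return Y;
  return static_cast<uint8_t>((X << (BW - S)) | (Y >> S));
}

// Valid only when Z mod BW != 0: at zero it would return Y instead of X.
inline uint8_t fshlViaFshrNonZero(uint8_t X, uint8_t Y, unsigned Z) {
  return fshr8(X, Y, BW - (Z % BW));
}

// Valid for any amount: pre-shift the operands by one, then use ~Z.
inline uint8_t fshlViaFshrAnyAmount(uint8_t X, uint8_t Y, unsigned Z) {
  uint8_t Hi = static_cast<uint8_t>(X >> 1);
  uint8_t Lo = fshr8(X, Y, 1);
  return fshr8(Hi, Lo, ~Z);
}

// Exhaustively compare both rewrites against fshl8.
inline bool identitiesHoldExhaustively() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      for (unsigned Z = 0; Z < 2 * BW; ++Z) {
        uint8_t A = static_cast<uint8_t>(X), B = static_cast<uint8_t>(Y);
        uint8_t Want = fshl8(A, B, Z);
        if (fshlViaFshrAnyAmount(A, B, Z) != Want)
          return false;
        if (Z % BW != 0 && fshlViaFshrNonZero(A, B, Z) != Want)
          return false;
      }
  return true;
}
```

The cheaper rewrite is usable only when the amount is known to be non-zero modulo the bit width, which matches the two-case structure the commit message describes.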
Florian Hahn	d1a1cce5b1	[DSE,MemorySSA] Do not use callCapturesBefore in isReadClobber. Using callCapturesBefore potentially improves the precision and the number of stores we can remove. But in practice, it seems to have very little impact in terms of stores removed. For example, for SPEC2000/SPEC2006/MultiSource with -O3 -flto, ~50 more stores are removed (out of ~26900 stores removed). But in terms of compile-time, it is very expensive and the patch gives substantial compile-time improvements: Geomean O3 -0.24%, ReleaseThinLTO -0.47%, ReleaseLTO-g -0.39%. http://llvm-compile-time-tracker.com/compare.php?from=612a0bff88ed906c83b82f079d4c49e5fecfb9d0&to=e6c86b96d20d97dd88e903a409bd8d39b6114312&stat=instructions	2020-08-24 16:19:42 +01:00
Matt Arsenault	62d1fb828f	AMDGPU/GlobalISel: Use unmerge instead of extract in addrspace queries This is a bit more consistent with regular operation legalization.	2020-08-24 11:07:51 -04:00
Thomas Preud'homme	2c9131665d	Test all CHECK-NOT in a block even if one fails This commit makes FileCheck print all CHECK-NOT directive failures in a CHECK-NOT block even if one fails. Prior to that, it would stop trying to match CHECK-NOT directives as soon as one in the block failed. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D86315	2020-08-24 15:45:05 +01:00
Baptiste Saleil	512e256c0d	[PowerPC] Add clang options to control MMA support This patch adds frontend and backend options to enable and disable the PowerPC MMA operations added in ISA 3.1. Instructions using these options will be added in subsequent patches. Differential Revision: https://reviews.llvm.org/D81442	2020-08-24 09:35:55 -05:00
dongAxis	2e43acfed8	[coroutine] should disable inline before calling coro split summary: When a callee coroutine function is inlined into a caller coroutine function before the coro-split pass, llvm emits "coroutine should have exactly one defining @llvm.coro.begin". It seems that the coro-early pass cannot handle this quite well. So we believe that an unsplit coroutine function should not be inlined. This patch fixes the issue by not inlining a function if it has the attribute "coroutine.presplit" (which means the function has not been split yet). TestPlan: check-llvm Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D85812	2020-08-24 22:22:08 +08:00
Matt Arsenault	517caca359	GlobalISel: Improve dead instruction debug printing This was printing the "Is dead" on a separate line from the instruction, which was harder to follow.	2020-08-24 10:12:00 -04:00
Francesco Petrogalli	5a34b3ab95	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() \|\| VF.isScalar()`. * Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with `llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an ordering function for the `ElementCount` class. * Bits and pieces around printing the `ElementCount` to string streams. To guarantee that this change is a NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types. In these cases, an assert(!VF.Scalable && "<msg>") has been added to make sure we don't enter codepaths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operations _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. 
The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` would require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 6. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that might require changes at command line. Note that in some places the idiom ``` unsigned VF = ... auto VTy = FixedVectorType::get(ScalarTy, VF) ``` has been replaced with ``` ElementCount VF = ... assert(!VF.Scalable && ...); auto VTy = VectorType::get(ScalarTy, VF) ``` The assertion guarantees that the new code is (at least in debug mode) functionally equivalent to the old version. Notice that this change was possible because none of the methods that are specific to `FixedVectorType` were used after the instantiation of `VTy`. Reviewed By: rengolin, ctetreau Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:54:03 +00:00
Matt Arsenault	70cd9f5b77	AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr Handle workitem intrinsics. There isn't really a way to adequately test this right now, since none of the known bits users are fine grained enough to test the edge conditions. This triggers a number of instances of the new 64-bit to 32-bit shift combine in the existing tests.	2020-08-24 09:53:27 -04:00
Francesco Petrogalli	bad7d6b373	Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]" Reverting because the commit message doesn't reflect the one agreed on phabricator at https://reviews.llvm.org/D85794. This reverts commit `c8d2b065b9`.	2020-08-24 13:50:55 +00:00
Matt Arsenault	e1644a3779	GlobalISel: Reduce G_SHL width if source is extension shl ([sza]ext x, y) => zext (shl x, y). Turns expensive 64 bit shifts into 32 bit if it does not overflow the source type: This is a port of an AMDGPU DAG combine added in `5fa289f0d8`. InstCombine does this already, but we need to do it again here to apply it to shifts introduced for lowered getelementptrs. This will help matching addressing modes that use 32-bit offsets in a future patch. TableGen annoyingly assumes only a single match data operand, so introduce a reusable struct. However, this still requires defining a separate GIMatchData for every combine which is still annoying. Adds a morally equivalent function to the existing getShiftAmountTy. Without this, we would have to do try to repeatedly query the legalizer info and guess at what type to use for the shift.	2020-08-24 09:42:40 -04:00
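The condition "if it does not overflow the source type" in the G_SHL commit above can be sketched for the zero-extension case. The helper names are hypothetical and the check uses a plain shift instead of known-bits analysis; this is a simplified model of the idea, not the combine's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// The narrow shift is safe when the top Amt bits of the 32-bit source are
// zero, so no set bit is shifted out of the narrow type.
inline bool canNarrowShl(uint32_t X, unsigned Amt) {
  if (Amt >= 32)
    return false;
  return Amt == 0 || (X >> (32 - Amt)) == 0;
}

// Original form: shl (zext x), y, carried out in 64 bits.
inline uint64_t wideShl(uint32_t X, unsigned Amt) {
  return static_cast<uint64_t>(X) << Amt;
}

// Rewritten form: zext (shl x, y), with the shift carried out in 32 bits.
inline uint64_t narrowShl(uint32_t X, unsigned Amt) {
  return static_cast<uint64_t>(static_cast<uint32_t>(X << Amt));
}
```

When `canNarrowShl` holds, the two forms give the same 64-bit value; when it does not, the narrow form loses the bits shifted past bit 31.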
Francesco Petrogalli	c8d2b065b9	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`. * Add `<` operator to `ElementCount` to be able to use `llvm::SmallSetVector<ElementCount, ...>`. * Bits and pieces around printing the ElementCount to string streams. * Added a static method to `ElementCount` to represent a scalar. To guarantee that this change is a NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types. In these cases, an assert(!VF.Scalable && "<msg>") has been added to make sure we don't enter codepaths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operations _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. 
The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` would require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that might require changes at command line. Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:39:42 +00:00
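The predicate replacements listed in `c8d2b065b9` can be pictured with a small stand-in type. This is a simplified sketch and not the real `llvm::ElementCount`: the struct name `ElementCountSketch`, the helper `fixedWidthLanes`, and the exact definitions of `isScalar()` and `isVector()` for scalable counts are assumptions made for illustration; only the member names `Min` and `Scalable` and the predicate names come from the commit message.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for llvm::ElementCount.
struct ElementCountSketch {
  unsigned Min;   // minimum number of elements
  bool Scalable;  // true if the real count is a runtime multiple of Min

  // Static helper representing a scalar, i.e. the old `VF == 1`.
  static ElementCountSketch getScalar() { return {1, false}; }

  bool isZero() const { return Min == 0; }
  // Assumed meaning: exactly one element and not scalable.
  bool isScalar() const { return !Scalable && Min == 1; }
  // Assumed meaning: more than one element, or any non-zero scalable count.
  bool isVector() const { return (Scalable && Min != 0) || Min > 1; }
};

// Mirrors the guarded idiom from the commit message: code that only makes
// sense for fixed widths asserts that VF is not scalable before reading Min.
inline unsigned fixedWidthLanes(const ElementCountSketch &VF) {
  assert(!VF.Scalable && "scalable vectors not supported on this path");
  return VF.Min;
}
```

With these definitions, the old `VF <= 1` test corresponds to `VF.isZero() || VF.isScalar()` for fixed-width counts, and the assertion documents every place that still assumes a fixed width.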
Florian Hahn	b99a5eb659	[DSE,MemorySSA] Delay PointerMayBeCaptured calls until actually needed. Avoid computing InvisibleToCallerBefore/AfterRet up front. In most cases, this information is not really needed. Instead, introduce helper functions to compute and cache the result on demand. Notably, this also does not use PointerMayBeCapturedBefore for isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as starting instruction, making the caching ineffective. But it appears the use of PointerMayBeCapturedBefore has very limited benefits in practice (e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with -O3 -flto). Refrain from using it for now, to limit compile-time. This gives some nice compile-time improvements: http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions	2020-08-24 14:05:44 +01:00
Anna Welker	8048068c3e	[ARM][MVE] Allow tail predication for strides !=1 with gather/scatters If gather/scatters are enabled, ARMTargetTransformInfo now allows tail predication for loops with a much wider range of strides, up to anything that is loop invariant. Differential Revision: https://reviews.llvm.org/D85410	2020-08-24 13:54:47 +01:00
Jonas Paulsson	8ac70694b9	[SystemZ] Preserve the MachineMemOperand in emitCondStore() in all cases. Review: Ulrich Weigand	2020-08-24 14:07:30 +02:00
Florian Hahn	2431b143ae	[DSE,MemorySSA] Limit elimination at end of function to single UO. Limit elimination of stores at the end of a function to MemoryDefs with a single underlying object, to save compile time. In practice, the case with multiple underlying objects does not seem very important. For -O3 -flto on MultiSource/SPEC2000/SPEC2006 this results in a total of 2 more stores being eliminated. We can always re-visit that in the future.	2020-08-24 13:00:17 +01:00
Sanjay Patel	6a44edb8da	[InstCombine] fold abs of select with negated op (PR39474) Similar to the existing transform - peek through a select to match a value and its negation. https://alive2.llvm.org/ce/z/MXi5KG define i8 @src(i1 %b, i8 %x) { %0: %neg = sub i8 0, %x %sel = select i1 %b, i8 %x, i8 %neg %abs = abs i8 %sel, 1 ret i8 %abs } => define i8 @tgt(i1 %b, i8 %x) { %0: %abs = abs i8 %x, 1 ret i8 %abs } Transformation seems to be correct!	2020-08-24 07:37:55 -04:00
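The fold in `6a44edb8da` rests on `abs(b ? x : -x)` equalling `abs(x)`. A minimal brute-force sketch over i8 inputs follows; the function names are hypothetical, the arithmetic is done in `int` to sidestep 8-bit wraparound, and -128 is excluded on the assumption that the `1` flag on `abs` in the quoted IR makes that input poison.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Source pattern: abs(select b, x, (0 - x)).
inline int absOfSelect(bool B, int8_t X) {
  int Neg = -static_cast<int>(X);
  int Sel = B ? static_cast<int>(X) : Neg;
  return std::abs(Sel);
}

// Target pattern: abs(x).
inline int absDirect(int8_t X) { return std::abs(static_cast<int>(X)); }

// Exhaustively compares both patterns for every i8 input except -128.
inline bool foldHoldsForAllInputs() {
  for (int V = -127; V <= 127; ++V)
    for (int B = 0; B <= 1; ++B)
      if (absOfSelect(B != 0, static_cast<int8_t>(V)) !=
          absDirect(static_cast<int8_t>(V)))
        return false;
  return true;
}
```

This is only a sanity check of the arithmetic identity; the Alive2 link in the commit message is the actual proof for the IR transform.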
Sam Parker	2e194fe73b	[SCEV] Still trying to fix windows buildbots	2020-08-24 10:26:48 +01:00
Julien Etienne	0f0be3fb8d	Add support for AVR attiny441 and attiny841 Reviewed By: dylanmckay Differential Revision: https://reviews.llvm.org/D85589 Patch by Julien Etienne	2020-08-24 20:28:32 +12:00
Sam Parker	8ce450da32	[NFCI][SimplifyCFG] Combine select costs and checks Combine the cost modelling and validity checks for the phi to select conversion in SpeculativelyExecuteBB, extracting the logic out into a function.	2020-08-24 09:16:11 +01:00
Bjorn Pettersson	7a4e26adc8	[SelectionDAG] Fix miscompile bug in expandFunnelShift This is a fixup of commit `0819a6416f` (D77152) which could result in miscompiles. The miscompile could only happen for targets where isOperationLegalOrCustom could return different values for FSHL and FSHR. The commit mentioned above added logic in expandFunnelShift to convert between FSHL and FSHR by swapping direction of the funnel shift. However, that transform is only legal if we know that the shift count (modulo bitwidth) isn't zero. Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if Z modulo bitwidth, could be zero. ``` $ ./alive-tv /tmp/test.ll ---------------------------------------- define i32 @src(i32 %x, i32 %y, i32 %z) { %0: %t0 = fshl i32 %x, i32 %y, i32 %z ret i32 %t0 } => define i32 @tgt(i32 %x, i32 %y, i32 %z) { %0: %t0 = sub i32 32, %z %t1 = fshr i32 %x, i32 %y, i32 %t0 ret i32 %t1 } Transformation doesn't verify! ERROR: Value mismatch Example: i32 %x = #x00000000 (0) i32 %y = #x00000400 (1024) i32 %z = #x00000000 (0) Source: i32 %t0 = #x00000000 (0) Target: i32 %t0 = #x00000020 (32) i32 %t1 = #x00000400 (1024) Source value: #x00000000 (0) Target value: #x00000400 (1024) ``` It could be possible to add back the transform, given that logic is added to check that (Z % BW) can't be zero. Since there were no test cases proving that such a transform actually would be useful I decided to simply remove the faulty code in this patch. Reviewed By: foad, lebedev.ri Differential Revision: https://reviews.llvm.org/D86430	2020-08-24 09:52:11 +02:00
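The counterexample reported by alive-tv for `7a4e26adc8` can be reproduced with plain integer arithmetic. The sketch below uses hypothetical helper names (`fshl32`, `fshr32`, `fshlViaFshr`) and models the funnel shifts as operating on the 64-bit concatenation of `X` (high half) and `Y` (low half), with the shift amount taken modulo 32.

```cpp
#include <cassert>
#include <cstdint>

// fshl: shift X:Y left by Z % 32 and keep the high 32 bits.
inline uint32_t fshl32(uint32_t X, uint32_t Y, uint32_t Z) {
  unsigned S = Z % 32;
  return S == 0 ? X : (X << S) | (Y >> (32 - S));
}

// fshr: shift X:Y right by Z % 32 and keep the low 32 bits.
inline uint32_t fshr32(uint32_t X, uint32_t Y, uint32_t Z) {
  unsigned S = Z % 32;
  return S == 0 ? Y : (X << (32 - S)) | (Y >> S);
}

// The faulty rewrite from the alive-tv listing: fshl expressed as fshr
// with the shift amount replaced by 32 - Z.
inline uint32_t fshlViaFshr(uint32_t X, uint32_t Y, uint32_t Z) {
  return fshr32(X, Y, 32u - Z);
}
```

For a non-zero amount such as `Z = 8` the two forms agree, but for `Z = 0` `fshl32` returns `X` while `fshlViaFshr` returns `Y`, which matches the `0` versus `1024` mismatch in the listing.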
Fangrui Song	fd485673da	[LiveDebugVariables] Internalize class DbgVariableValue. NFC	2020-08-23 22:53:46 -07:00
Qiu Chaofan	1bc45b2fd8	[PowerPC] Support lowering int-to-fp on ppc_fp128 D70867 introduced support for expanding most ppc_fp128 operations. But sitofp/uitofp is missing. This patch adds that after D81669. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D81918	2020-08-24 11:18:16 +08:00
Qiu Chaofan	fed6107dcb	[PowerPC] Allow constrained FP intrinsics in mightUseCTR We may meet Invalid CTR loop crash when there's constrained ops inside. This patch adds constrained FP intrinsics to the list so that CTR loop verification doesn't complain about it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D81924	2020-08-24 11:09:58 +08:00
QingShan Zhang	960cbc53ca	[DAGCombine] Remove dead node when it is created by getNegatedExpression We hit the compile-time issue reported in https://bugs.llvm.org/show_bug.cgi?id=46877 and the reason is the same as D77319. So we need to remove the dead node we created to avoid increasing the problem size of DAGCombiner. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D86183	2020-08-24 02:50:58 +00:00
Qiu Chaofan	41ba9d7723	[PowerPC] Support constrained vector fp/int conversion This patch makes these operations legal, and add necessary codegen patterns. There's still some issue similar to D77033 for conversion from v1i128 type. But normal type tests synced in vector-constrained-fp-intrinsic are passed successfully. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D83654	2020-08-24 10:10:27 +08:00
Roman Lebedev	f6decfa36d	[InstCombine] Negator: freeze is freely negatible if its operand is negatible	2020-08-23 23:28:19 +03:00
Fangrui Song	bef684154d	[X86][FastISel] Support materializing floating-point constants for large code model & PIC The following program miscompiles because rL216012 added static relocation model support but not for PIC. ``` // clang -fpic -mcmodel=large -O0 a.cc double foo() { return 42.0; } ``` This patch adds PIC support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86024	2020-08-23 08:36:18 -07:00
Florian Hahn	2843c9fe0a	[DSE,MemorySSA] Keep single DL instance in DSEState (NFC). Small cleanup, also removes one instance of getting DataLayout without using it later.	2020-08-23 15:56:38 +01:00
Sanjay Patel	1d0fa79824	[DAGCombiner] restrict store merge of truncs to early combining The pattern matching does not account for truncating stores, so it is unlikely to work at later stages. So we are likely wasting compile-time with no hope of improvement by running this later.	2020-08-23 10:44:23 -04:00
Sanjay Patel	79cb289a95	[DAGCombiner] add early exit for store merging of truncs This should be NFC in terms of output because the endian check further down would bail out too, but we are wasting time by waiting to that point to give up. If we generalize that function to deal with more than i8 types, we should not have to deal with the degenerate case.	2020-08-22 16:25:16 -04:00
Jeremy Morse	93af37043b	Follow-up build fix for rGae6f78824031 One of the bots objects to brace-initializing a tuple: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/43595/steps/build%20stage%201/logs/stdio As the tuple constructor is apparently explicit. Fall back to the (not as pretty) explicit construction of a tuple. I'd thought this was permitted behaviour; will investigate why this fails later.	2020-08-22 19:09:30 +01:00
Fangrui Song	60bcec4eea	[LiveDebugValues] Delete unneeded copy constructor after D83047 It will suppress the implicitly-declared copy assignment operator in C++20.	2020-08-22 10:55:28 -07:00
Jeremy Morse	ae6f788240	[LiveDebugValues] Add instruction-referencing LDV implementation This patch imports the instruction-referencing implementation of LiveDebugValues proposed here: http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html The new implementation is unreachable in this patch, it's the next patch that enables it behind a command line switch. Briefly, rather than tracking variable locations by just their location as the 'VarLoc' implementation does, this implementation does it by value: * Each value defined in a function is numbered, and propagated through dataflow, * Each DBG_VALUE reads a machine value number from a machine location, * Variable _values_ are propagated through dataflow, * Variable values are translated back into locations, DBG_VALUEs inserted to specify where those locations are. The ultimate aim of this is to enable referring to variable values throughout post-isel code, rather than locations. Those patches will build on top of this new LiveDebugValues implementation in later patches -- it can't be done with the VarLoc implementation as we don't have value information, only locations. Differential Revision: https://reviews.llvm.org/D83047	2020-08-22 18:31:08 +01:00
Matt Arsenault	901e3317fe	GlobalISel: Merge FewerElements for G_BUILD_VECTOR/G_CONCAT_VECTORS This switches from using G_EXTRACT in odd cases to widen with undef and unmerge.	2020-08-22 10:25:53 -04:00
Jeremy Morse	2d9be9e318	Fix some builds after `20bb9fe565` -Wsuggest-override indicates this VarLocBasedLDV method needs the override keyword.	2020-08-22 15:20:42 +01:00
Jeremy Morse	20bb9fe565	[LiveDebugValues] Install an implementation-picking LiveDebugValues pass This patch renames the current LiveDebugValues class to "VarLocBasedLDV" and removes the pass-registration code from it. It creates a separate LiveDebugValues class that deals with pass registration and management, that calls through to VarLocBasedLDV::ExtendRanges when runOnMachineFunction is called. This is done through the "LDVImpl" abstract class, so that a future patch can install the new instruction-referencing LiveDebugValues implementation and have it picked at runtime. No functional change is intended, just shuffling responsibilities. Differential Revision: https://reviews.llvm.org/D83046	2020-08-22 14:50:22 +01:00
Sanjay Patel	ec06b38130	[InstCombine] canonicalize 'not' ops before logical shifts This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen. The SCEV motivating case is discussed in: http://bugs.llvm.org/PR47136 There's an analysis motivating case at: http://bugs.llvm.org/PR38781 I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps. Alive proofs: https://rise4fun.com/Alive/BBV Name: shift right of 'not' Pre: C2 == (-1 u>> C1) %a = lshr i8 %x, C1 %r = xor i8 %a, C2 => %n = xor i8 %x, -1 %r = lshr i8 %n, C1 Name: shift left of 'not' Pre: C2 == (-1 << C1) %a = shl i8 %x, C1 %r = xor i8 %a, C2 => %n = xor i8 %x, -1 %r = shl i8 %n, C1 Name: ashr of 'not' %a = ashr i8 %x, C1 %r = xor i8 %a, -1 => %n = xor i8 %x, -1 %r = ashr i8 %n, C1 Differential Revision: https://reviews.llvm.org/D86243	2020-08-22 09:38:13 -04:00
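The two logical-shift identities quoted in `ec06b38130` are small enough to check exhaustively for i8. The following sketch is not taken from the patch; the function name `notBeforeShiftHolds` is hypothetical, and it simply compares the old shift-then-xor form against the new not-then-shift form for every value and every shift amount below 8.

```cpp
#include <cassert>
#include <cstdint>

// Checks, for every i8 value X and shift amount C1 in [0, 8):
//   (X lshr C1) xor (-1 lshr C1) == (not X) lshr C1
//   (X shl  C1) xor (-1 shl  C1) == (not X) shl  C1
inline bool notBeforeShiftHolds() {
  for (unsigned C1 = 0; C1 < 8; ++C1) {
    // C2 values from the preconditions in the Alive proofs.
    uint8_t LshrMask = static_cast<uint8_t>(0xFFu >> C1);
    uint8_t ShlMask = static_cast<uint8_t>(0xFFu << C1);
    for (unsigned V = 0; V < 256; ++V) {
      uint8_t X = static_cast<uint8_t>(V);
      uint8_t NotX = static_cast<uint8_t>(~X);
      uint8_t LshrOld = static_cast<uint8_t>((X >> C1) ^ LshrMask);
      uint8_t LshrNew = static_cast<uint8_t>(NotX >> C1);
      uint8_t ShlOld = static_cast<uint8_t>((X << C1) ^ ShlMask);
      uint8_t ShlNew = static_cast<uint8_t>(NotX << C1);
      if (LshrOld != LshrNew || ShlOld != ShlNew)
        return false;
    }
  }
  return true;
}
```

The `ashr` variant is left out on purpose, since the commit message describes that case as questionable and not part of this change.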
Sanjay Patel	2fc7c85201	[DAGCombiner] clean up merge of truncated stores; NFC This code handles the special-case of i8 stores, but it could be generalized to deal with other types.	2020-08-22 09:23:32 -04:00
Jeremy Morse	fba06e3c85	[LiveDebugValues][NFC] Move LiveDebugValues source for refactor This is a pure file move of LiveDebugValues.cpp ahead of the pass being refactored, with an experimental new implementation to follow. The motivation for these changes can be found here: http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html And the other related changes can be found in the phabricator stack for this revision: Differential Revision: https://reviews.llvm.org/D83304	2020-08-22 12:58:30 +01:00
Florian Hahn	5e7e2162d4	[DSE,MemorySSA] Use BatchAA for AA queries. We can use BatchAA to avoid some repeated AA queries. We only remove stores, so I think we will get away with using a single BatchAA instance for the complete run. The changes in AliasAnalysis.h mirror the changes in D85583. The change improves compile-time by roughly 1%. http://llvm-compile-time-tracker.com/compare.php?from=67ad786353dfcc7633c65de11601d7823746378e&to=10529e5b43809808e8c198f88fffd8f756554e45&stat=instructions This is part of the patches to bring down compile-time to the level referenced in http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86275	2020-08-22 08:36:35 +01:00
Sourabh Singh Tomar	f91d18eaa9	[DebugInfo][flang]Added support for representing Fortran assumed length strings This patch adds support for representing Fortran `character(n)`. The patch is primarily based on D54114 with appropriate modifications. Test case IR is generated using our downstream classic-flang. We're in the process of upstreaming flang PRs but classic-flang has dependencies on llvm, so this has to get in first. Patch includes functional test case for both IR and corresponding dwarf, furthermore it has been manually tested as well using GDB. Source snippet: ``` program assumedLength call sub('Hello') call sub('Goodbye') contains subroutine sub(string) implicit none character(len=*), intent(in) :: string print *, string end subroutine sub end program assumedLength ``` GDB: ``` (gdb) ptype string type = character (5) (gdb) p string $1 = 'Hello' ``` Reviewed By: aprantl, schweitz Differential Revision: https://reviews.llvm.org/D86305	2020-08-22 10:13:40 +05:30
Alina Sbirlea	f55ad3973d	[DomTree] Extend update API to allow a post CFG view. Extend the `applyUpdates` in DominatorTree to allow a post CFG view, different from the current CFG. This patch implements the functionality of updating an already up to date DT, to the desired PostCFGView. Combining a set of updates towards an up to date DT and a PostCFGView is not yet supported. Differential Revision: https://reviews.llvm.org/D85472	2020-08-21 17:23:08 -07:00
Paul C. Anagnostopoulos	196e6f9f18	Replace TableGen range piece punctuator with '...' The TableGen range piece punctuator is currently '-' (e.g., {0-9}), which interacts oddly with the fact that an integer literal's sign is part of the literal. This patch replaces the '-' with the new punctuator '...'. The '-' punctuator is deprecated. Differential Revision: https://reviews.llvm.org/D85585 Change-Id: I3d53d14e23f878b142d8f84590dd465a0fb6c09c	2020-08-21 23:33:57 +02:00
Roman Lebedev	503deec218	Temporarily revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline" As discussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code patterns in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPUs (I personally can't reproduce). So until the behaviour is better understood, and a path forward is mapped, let's back this out for now. This reverts commit `1d51dc38d8`.	2020-08-22 00:33:22 +03:00
Nicolai Hähnle	17cd34409a	Fix two bugs in TGParser::ParseValue TGParser::ParseValue contains two recursive calls, one to parse the RHS of a list paste operator and one to parse the RHS of a paste operator in a class/def name. Both of these calls neglect to check the return value to see if it is null (because of some error). This causes a crash in the next line of code, which uses the return value. The code now checks for null returns. Differential Revision: https://reviews.llvm.org/D85852	2020-08-21 23:19:36 +02:00
Arthur Eubanks	b79889c2b1	[opt][NewPM] Add basic-aa in legacy PM compatibility mode The legacy PM alias analysis pipeline by default includes basic-aa. When running `opt -foo-pass` under the NPM and -disable-basic-aa is not specified, use basic-aa. This decreases the number of check-llvm failures under NPM from 913 to 752. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86167	2020-08-21 14:05:07 -07:00
Nicolai Hähnle	b37db11d95	MachineSSAUpdater: Allow initialization with just a register class The register class is required for inserting PHIs, but the "current virtual register" isn't actually used for anything, so let's remove it while we're at it. Differential Revision: https://reviews.llvm.org/D85602 Change-Id: I1e647f31570ef21a7ea8e20db3454178e98a6a8b	2020-08-21 23:04:35 +02:00
kuterd	65fcc0ee31	[Attributor] Function seed allow list - Adds a command line option to seed only selected functions. - Makes seed allow listing exclusive to assertions enabled builds. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D86129	2020-08-21 23:55:26 +03:00
Shinji Okumura	e21a22a7a8	[Attributor] fix AANoUndef initialization Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position. This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86361	2020-08-22 05:06:14 +09:00
Amy Huang	5e3fd471ac	[Cloning] Fix to cloning DISubprograms. When trying to enable -debug-info-kind=constructor there was an assert that occurs during debug info cloning ("mismatched subprogram between llvm.dbg.value variable and !dbg attachment"). It appears that during llvm::CloneFunctionInto, a DISubprogram could be duplicated when MapMetadata is called, and then added to the MD map again when DIFinder gets a list of subprograms. This results in two different versions of the DISubprogram. This patch switches the order so that the DIFinder subprograms are added before MapMetadata is called. Fixes https://bugs.llvm.org/show_bug.cgi?id=46784 Differential Revision: https://reviews.llvm.org/D86185	2020-08-21 11:54:56 -07:00
Stanislav Mekhanoshin	9a9a092e61	[AMDGPU] Avoid sorting stalls in regbank-reassign This is the slowest operation in the already slow pass. Instead of sorting just put a stall list into an ordered map. Differential Revision: https://reviews.llvm.org/D86253	2020-08-21 11:49:41 -07:00
Serguei Katkov	9e362bb0eb	[InstCombine] Remove unused entries in gc-live bundle of statepoint If some of the gc-live values are not used in gc.relocate we can remove them from the gc-live bundle of the statepoint instruction. The CL also removes duplicated Values in the gc-live bundle. Reviewers: reames, dantrushin Reviewed By: dantrushin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D85959	2020-08-22 01:36:22 +07:00
Fangrui Song	06cad825cd	PrintStackTrace: don't symbolize if LLVM_DISABLE_SYMBOLIZATION is set See http://lists.llvm.org/pipermail/llvm-dev/2017-June/113975.html for a related previous discussion. Many tools install signal handlers to print stack traces and optionally symbolize the addresses with an external program 'llvm-symbolizer' (when searching for 'llvm-symbolizer', the directory containing the executable is preferred over PATH). 'llvm-symbolizer' can be slow if the executable is large and/or if 'llvm-symbolizer' itself is under-optimized. For example, my 'llvm-lto2' from a -DCMAKE_BUILD_TYPE=Debug build is 443MiB. The 'llvm-symbolizer' from the same build takes ~2s to symbolize it. (An optimized 'llvm-symbolizer' takes 0.34s). A crashed clang may take more than 5s to symbolize a stack trace. If a test file has several `not --crash` RUN lines, it can be very slow in a Debug build. This patch makes `not --crash` set an environment variable to suppress symbolization. This is similar to D33804 which uses a command line option. I pick 'symbolization' instead of 'symbolication' because the former is used much more commonly and its stem matches 'llvm-symbolizer'. Also set LLVM_DISABLE_CRASH_REPORT=1, which is currently only applicable on `__APPLE__`. Reviewed By: dblaikie, aganea Differential Revision: https://reviews.llvm.org/D86170	2020-08-21 11:27:13 -07:00
Qiu Chaofan	a5b7b8cce0	[PowerPC] Support constrained scalar sitofp/uitofp This patch adds support for constrained scalar int to fp operations on PowerPC. Besides, this also fixes the FP exception bit of FCFID* instructions. Reviewed By: steven.zhang, uweigand Differential Revision: https://reviews.llvm.org/D81669	2020-08-22 02:10:29 +08:00
Serguei Katkov	63d9d56a55	[InstCombine] Move handling of gc.relocate in a gc.statepoint The only def for gc.relocate is a gc.statepoint. But the real dependency is not described by the def-use chain. Instead this dependency is encoded by indices of operands in the gc-live bundle of the statepoint as integer constants in gc.relocate. InstCombine operates by def-use chain. As a result, when a value in the gc-live bundle is simplified the gc.statepoint itself is not simplified but it might simplify dependent gc.relocates. To trigger the optimization of gc.relocate we now unconditionally trigger a check of all dependent gc.relocates by adding them to the worklist. This CL handles gc.relocates as part of gc.statepoint optimization, treating the gc.statepoint and its related gc.relocates as a whole entity. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D85954	2020-08-21 23:44:23 +07:00
Florian Hahn	bc72a3ab94	[Constants] Handle FNeg in getWithOperands. Currently ConstantExpr::getWithOperands does not handle FNeg and subsequently treats FNeg as binary operator, leading to an assertion failure or segmentation fault if built without assertions. Originally I reproduced this with llvm-dis on a bitcode file, which I unfortunately cannot share and also cannot really reduce. But PR45426 describes the same issue and has a reproducer with Clang, so I'll go with that. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D86274	2020-08-21 16:50:56 +01:00
Kamau Bridgeman	365f861c45	[PowerPC][PCRelative] Thread Local Storage Support for Initial Exec This patch is the initial support for the Initial Exec Thread Local Storage model to produce code sequence and relocations correct to the ABI for the model when using PC relative memory operations. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D81947	2020-08-21 10:13:11 -05:00
diggerlin	a081868921	[AIX][XCOFF] emit symbol visibility for xcoff object file. SUMMARY: Reviewers: Jason liu Differential Revision: https://reviews.llvm.org/D84265	2020-08-21 11:00:56 -04:00
Florian Hahn	8eded24bf4	Recommit "[SCEVExpander] Add helper to clean up instrs inserted while expanding." Recommit the patch after fixing an issue reported caused by the fact that re-used values are also added to InsertedValues. Additional tests have been added in `88818491b9` This reverts the revert commit `38884641f2`.	2020-08-21 15:04:17 +01:00
Cameron McInally	36dbb8fc97	[SVE] Lower fixed length UDIV to scalable Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement. Differential Revision: https://reviews.llvm.org/D86316	2020-08-21 09:01:25 -05:00
Sam Parker	bfc6d8b59b	[NFC][SimplifyCFG] Formatting and variable rename	2020-08-21 13:11:17 +01:00
Xing GUO	f5643dc3dc	Recommit: [DWARFYAML] Add support for referencing different abbrev tables. The original commit (7ff0ace96db9164dcde232c36cab6519ea4fce8) was causing build failure and was reverted in `6d242a7326` ==================== Original Commit Message ==================== This patch adds support for referencing different abbrev tables. We use 'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly assign an abbrev table to compilation units. The syntax is: ``` debug_abbrev: - ID: 0 Table: ... - ID: 1 Table: ... debug_info: - ... AbbrevTableID: 1 ## Reference the second abbrev table. - ... AbbrevTableID: 0 ## Reference the first abbrev table. ``` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D83116	2020-08-21 19:02:10 +08:00
lewis-revill	9e6c09c0d9	[RISCV] Fix inaccurate annotations on PseudoBRIND PseudoBRIND had seemingly inherited incorrect annotations denoting it as a call instruction and that it defines X1/ra. This caused excess save/restore code to be emitted for ra. Differential Revision: https://reviews.llvm.org/D86286	2020-08-21 11:38:42 +01:00
Mirko Brkusanin	0654ff703d	[AMDGPU] Use ds_read/write_b96/b128 when possible for SDag Do not break down local loads and stores so ds_read/write_b96/b128 in ISelLowering can be selected on subtargets that support them and if align requirements allow them. Differential Revision: https://reviews.llvm.org/D84403	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	d17ea67b92	[AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores Fix local ds_read/write_b96/b128 so they can be selected if the alignment allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break them down. Differential Revision: https://reviews.llvm.org/D81638	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	5bd1febe21	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similarly to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
Florian Hahn	9f7350672e	[DSE,MemorySSA] Handle atomicrmw/cmpxchg conservatively. This adds conservative handling of AtomicRMW/AtomicCmpXChg to isDSEBarrier, similar to atomic loads and stores.	2020-08-21 10:42:42 +01:00
Roman Lebedev	5d7c5a5e99	[NFC] Port InstCount pass to new pass manager	2020-08-21 12:39:42 +03:00
Jay Foad	0819a6416f	[SelectionDAG] Better legalization for FSHL and FSHR In SelectionDAGBuilder always translate the fshl and fshr intrinsics to FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and ORs. Improve the legalization of FSHL and FSHR to avoid code quality regressions. Differential Revision: https://reviews.llvm.org/D77152	2020-08-21 10:32:49 +01:00
Jay Foad	98de0d22f5	[AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy	2020-08-21 10:14:35 +01:00
Yevgeny Rouban	18bc400f97	[NewPM][PassInstrumentation] Add PreservedAnalyses parameter to AfterPass* callbacks Both AfterPass and AfterPassInvalidated pass instrumentation callbacks get additional parameter of type PreservedAnalyses. This patch was created by @fedor.sergeev. I have just slightly changed it. Reviewers: fedor.sergeev Differential Revision: https://reviews.llvm.org/D81555	2020-08-21 16:10:42 +07:00
Sam Parker	47251582f5	[SimplifyCFG] Cost required selects Before we speculatively execute a basic block, query the cost of inserting the necessary select instructions against the phi folding threshold. For non-trivial insertions, a more accurate decision can probably be made during machine if-conversion. With minsize we query the CodeSize cost, otherwise we use SizeAndLatency. Differential Revision: https://reviews.llvm.org/D82438	2020-08-21 09:52:52 +01:00
Florian Hahn	a0e92ffd0d	[DSE,MemorySSA] Split off partial tracking from isOverwrite. When traversing memory uses to look for aliasing reads/writes, we only care about complete overwrites. This patch splits off the partial overwrite tracking from isOverwrite, so that isOverwrite skips the partial overwrite tracking. This avoids some unnecessary work when checking for read/write clobbers with MemorySSA-DSE. This gives a relatively small improvement: http://llvm-compile-time-tracker.com/compare.php?from=ef2a2f77f87553a0a4a39f518eb9ac86b756bda6&to=658f3905dd96d3415f3782adc712c79fa59a4665&stat=instructions This is part of the patches to bring down compile-time to the level referenced in http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86280	2020-08-21 09:13:59 +01:00
Sam Parker	acf0bb41e4	[ARM][CostModel] Select instruction costs. Modify the ARM getCmpSelInstrCost implementation for the code size costs of selects. Now consider the legalization cost and increase the cost of i1 because those values wouldn't live in a general purpose register. We also make selects +1 more expensive to account for the IT instruction. Differential Revision: https://reviews.llvm.org/D82091	2020-08-21 08:49:56 +01:00
David Green	2b69efded0	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
Mehdi Amini	927da43ade	Allow multiple calls to InitLLVM() (NFC) In `e99dee82b0`, the "out_of_memory_new_handler" was changed to be explicitly initialized instead of relying on a global static constructor. However, before this change, install_out_of_memory_new_handler could be called multiple times, while it asserts right now. We can be more tolerant of calling InitLLVM multiple times without reintroducing a global constructor for this handler. Differential Revision: https://reviews.llvm.org/D86330	2020-08-21 06:13:00 +00:00
Xing GUO	6d242a7326	Revert "[DWARFYAML] Add support for referencing different abbrev tables." This reverts commit `f7ff0ace96`. This change is causing build failure. http://lab.llvm.org:8011/builders/clang-cmake-armv7-global-isel/builds/10400	2020-08-21 12:15:54 +08:00
Xing GUO	f7ff0ace96	[DWARFYAML] Add support for referencing different abbrev tables. This patch adds support for referencing different abbrev tables. We use 'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly assign an abbrev table to compilation units. The syntax is: ``` debug_abbrev: - ID: 0 Table: ... - ID: 1 Table: ... debug_info: - ... AbbrevTableID: 1 ## Reference the second abbrev table. - ... AbbrevTableID: 0 ## Reference the first abbrev table. ``` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D83116	2020-08-21 11:44:25 +08:00
Xing GUO	290e399f96	[DWARFYAML] Add support for emitting multiple abbrev tables. This patch adds support for emitting multiple abbrev tables. Currently, compilation units will always reference the first abbrev table. Reviewed By: jhenderson, labath Differential Revision: https://reviews.llvm.org/D86194	2020-08-21 10:12:08 +08:00
Michael Liao	5257a60ee0	[amdgpu] Add codegen support for HIP dynamic shared memory. Summary: - HIP uses an unsized extern array `extern __shared__ T s[]` to declare the dynamic shared memory, whose size is not known at compile time. Reviewers: arsenm, yaxunl, kpyzhov, b-sumner Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82496	2020-08-20 21:29:18 -04:00
Kang Zhang	95e18b2d9d	[PowerPC] Fix a typo for InstAlias of mfsprg D77531 has a typo for mfsprg; it should be mtsprg. This patch fixes this typo.	2020-08-21 01:10:52 +00:00
Justin Bogner	1283dca007	[GISel] Correct the known bits of G_ANYEXT Known bits for G_ANYEXT was incorrectly using KnownBits::zext, causing us to treat the high bits as zero even though they're (by definition) unknown. Differential Revision: https://reviews.llvm.org/D86323	2020-08-20 17:17:04 -07:00
Jon Roelofs	74ca5275e9	Fix a couple of typos. NFC	2020-08-20 14:56:57 -06:00
Matt Arsenault	79ce9bb380	CodeGen: Don't drop AA metadata when splitting MachineMemOperands Assuming this is used to split a memory access into smaller pieces, the new access should still have the same aliasing properties as the original memory access. As far as I can tell, this wasn't intentionally dropped. It may be necessary to drop this if you are moving the operand outside of the bounds of the original object in such a way that it may alias another IR object, but I don't think any of the existing users are doing this. Some of the uses widen into unused alignment padding, which I think is OK.	2020-08-20 16:17:30 -04:00
Matt Arsenault	18b218007d	AMDGPU/GlobalISel: Legalize odd sized loads with widening Custom lower and widen odd sized loads up to the alignment. The default set of legalization actions doesn't have a way to represent this. This fixes naturally aligned <3 x s8> and <3 x s16> loads. This also starts moving towards eliminating the buggy and overcomplicated legalization rules for narrowing. All the memory size changes should be done in the lower or custom action, not NarrowScalar / FewerElements. These currently have redundant and ambiguous code with the lower action.	2020-08-20 16:15:53 -04:00
vnalamot	54d8ded4b1	allSGPRSpillsAreDead() should use actual FP/BP frame indices The SGPR spills happen in SILowerSGPRSpills(), and allSGPRSpillsAreDead() makes sure there are no SGPR spills pending during PEI. But the FP/BP spills happen during PEI and are exceptions. Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to accommodate the exceptions. Differential Revision: https://reviews.llvm.org/D86291	2020-08-20 16:15:53 -04:00
Kamau Bridgeman	b74b80bb2d	[PowerPC][PCRelative] Thread Local Storage Support for General Dynamic This patch is the initial support for the General Dynamic Thread Local Storage model to produce the code sequence and relocations that are correct per the ABI for the model when using PC relative memory operations. Patch by: NeHuang Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D82315	2020-08-20 15:08:13 -05:00
Cameron McInally	ac63959460	[SVE] Lower fixed length vXi8/vXi16 SDIV to scalable There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32. Differential Revision: https://reviews.llvm.org/D86114	2020-08-20 13:47:01 -05:00
Jessica Clarke	3149ec07c0	[RISCV] Enable MCCodeEmitter instruction predicate verifier This ensures that we never encode an instruction which is unavailable, such as if we explicitly insert a forbidden instruction when lowering. This is particularly important on RISC-V given its high degree of modularity, and will become increasingly important as new standard extensions appear. Reviewed By: asb, lenary Differential Revision: https://reviews.llvm.org/D85015	2020-08-20 18:36:54 +01:00
Jay Foad	3497860203	[AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister ... in favour of the isPhysical/isVirtual methods.	2020-08-20 17:59:11 +01:00
Mircea Trofin	364cd768a2	[NFC] Expose the -Oz module optimization pipeline to opt This exposes the module optimization pipeline as a pass that can be applied stand-alone when using 'opt'. This helps ml inliner training scenarios, where we start with IR captured right before inlining, perform the inlining (-scc-oz-module-inliner) and then want to continue and observe the final IR (where this patch comes into play). We can then apply llc on the resulting IR to continue compilation down to native. Differential Revision: https://reviews.llvm.org/D86224	2020-08-20 09:28:58 -07:00
Jay Foad	4aaf772542	[PeepholeOptimizer] Remove dead code At this point we have already ruled out all def operands, so we can't possibly see a dead implicit def operand.	2020-08-20 16:48:57 +01:00
David Green	816097e4e5	[LV] Allow tail folded reduction selects to remain in the loop The normal scheme for tail folding reductions is to use: loop: p = phi(0, a) mask = ... x = masked_load(..., mask) a = add(x, p) s = select(mask, a, p) This means we need to keep the registers p and a alive out of the loop, plus the mask. On a target with predicated operations we can instead generate the phi as p = phi(0, s). This ensures the select stays in the loop, and we can fold select(m, add(a, b), c) to something like a vaddt c, a, b using the m predicate. This in turn allows us to tail predicate the entire loop. Differential Revision: https://reviews.llvm.org/D84741	2020-08-20 14:31:14 +01:00
Bjorn Pettersson	ff107eed15	[AArch64] Update a code comment incorrectly referring to zero_reg. NFC The getSrcFromCopy helper nowadays returns a MachineOperand pointer, so talking about zero_reg was incorrect, as it nowadays returns a nullptr when it does not find a copy-like instruction.	2020-08-20 14:36:59 +02:00
Shinji Okumura	835cfa5def	[Attributor] Handle CallBase case in AAValueConstantRange::initialize Currently, although we handle the `CallBase` case in updateImpl, we give up in initialize in that case. That is problematic when we propagate a range from a call site returned position to a floating position. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D86196	2020-08-20 20:15:19 +09:00
Paul Walker	0015b8db8e	[SVE] Add ISEL patterns for predicated shifts by an immediate. For scalable vector shifts the predicate is typically all active, which gets selected to an unpredicated shift by immediate. When code generating for fixed length vectors the predicate is based on the vector length and so additional patterns are required to make use of SVE's predicated shift by immediate instructions. Differential Revision: https://reviews.llvm.org/D86204	2020-08-20 11:47:20 +01:00
David Stenberg	8206257cb8	[GlobalOpt] Fix an incorrect Modified status When removing a non-constant store to a global in CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return false. This was caught using the check introduced by D80916. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86149	2020-08-20 11:52:09 +02:00
David Stenberg	7a1029fd1e	Reland "[LoopUnswitch] Fix incorrect Modified status" Relanded since the buildbot issue was unrelated to this commit. When hoisting simple values out from a loop, and an optsize attribute, a convergent call, or an invoke instruction hindered the pass from unswitching the loop, the pass would return an incorrect Modified status. This was caught using the check introduced by D80916. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86085	2020-08-20 11:52:09 +02:00
Bjorn Pettersson	b43235a76c	[DebugInfo] Fix DwarfExpression::addConstantFP for float on big-endian The byte swapping, when dealing with 4 byte (float) FP constants in DwarfExpression::addConstantFP, added in commit `ef8992b9f0` was not correct. It always performed byte swapping using an uint64_t value. When dealing with 4 byte values the 4 interesting bytes ended up in the big end of the uint64_t, but later we emitted the 4 bytes at the little end. So we ended up with zeroes being emitted and faulty debug information. This patch simplifies things a bit, IMHO. Using the APInt representation throughout the function, instead of looking at the internal representation using getRawBytes and without using reinterpret_cast etc. And using API.byteSwap() should result in correct byte swapping independent of APInt being 4 or 8 bytes. Differential Revision: https://reviews.llvm.org/D86272	2020-08-20 11:48:05 +02:00
David Stenberg	ca688ae497	Revert "[LoopUnswitch] Fix incorrect Modified status" This reverts commit `dfd447c220`. After I pushed this commit, llvm-sphinx-docs started failing, due to: Warning, treated as error: extension 'recommonmark' has no setup() function; is it really a Sphinx extension module? I don't see how this commit may have caused that, but I'm still reverting it since I don't know how to proceed with that troubleshooting.	2020-08-20 11:14:23 +02:00
Evgeny Leviant	d5b701b972	[ThinLTO] Import globals recursively Differential revision: https://reviews.llvm.org/D73698	2020-08-20 12:13:43 +03:00
Sebastian Neubauer	b8d1994778	[AMDGPU] Add A16/G16 to InstCombine When sampling from images with coordinates that only have 16 bit accuracy, convert the image intrinsic call to use a16 or g16. This does only happen if the target hardware supports it. An alternative would be to always apply this combination, independent of the target hardware and extend 16 bit arguments to 32 bit arguments during legalization. To me, this sounds like an unnecessary roundtrip that could prevent some further InstCombine optimizations. Differential Revision: https://reviews.llvm.org/D85887	2020-08-20 10:51:49 +02:00
Konstantin Schwarz	7497b861f4	[GlobalISel][IRTranslator] Support PHI instructions in landingpad blocks The check for the landingpad instructions was overly restrictive. In optimized builds PHI nodes can appear before the landingpad instructions, resulting in a fallback to SelectionDAG. This change relaxes the check to allow PHI nodes. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86141	2020-08-20 10:49:31 +02:00
Georgii Rymar	a6436b0b3a	[yaml2obj] - Make the 'Machine' key optional. Currently we have to set 'Machine' to something in our YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets and 'EM_386' for 32-bit targets. At the same time, in most cases our tests do not need a machine type and we can use 'EM_NONE'. This is cleaner, because it avoids the need to use a particular machine. In this patch I've made the 'Machine' key optional (the default value, when it is not specified, is `EM_NONE`) and removed it (where possible) from yaml2obj, obj2yaml and llvm-readobj tests. There are a few tests left where I decided not to remove it, because I didn't want to touch CHECK lines or do anything more complex than removing a "Machine: *" line and formatting the lines around it. Differential revision: https://reviews.llvm.org/D86202	2020-08-20 11:40:51 +03:00
Bevin Hansson	1a995a0af3	[ADT] Move FixedPoint.h from Clang to LLVM. This patch moves FixedPointSemantics and APFixedPoint from Clang to LLVM ADT. This will make it easier to use the fixed-point classes in LLVM for constructing an IR builder for fixed-point and for reusing the APFixedPoint class for constant evaluation purposes. RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html Reviewed By: leonardchan, rjmccall Differential Revision: https://reviews.llvm.org/D85312	2020-08-20 10:29:45 +02:00
dfukalov	33e2f69a24	[AMDGPU][LoopUnroll] Increase BB size to analyze for complete unroll. The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no information about a loop body BB cost. In some cases, e.g. for a simple loop ``` for(int i=0; i<32; ++i){ D = Arr2[i*8 + C1]; Arr1[i*64 + C2] += C3 * D; Arr1[i*64 + C2 + 2048] += C4 * D; } ``` the current default parameter value is not enough to run a deeper cost analysis, so the loop is not completely unrolled. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D86248	2020-08-20 10:41:47 +03:00
Yvan Roux	0459f29e8b	[ARM][MachineOutliner] Add default mode. Use the stack to save and restore the link register when there is no available register to do it. Differential Revision: https://reviews.llvm.org/D76069	2020-08-20 09:25:33 +02:00
David Stenberg	dfd447c220	[LoopUnswitch] Fix incorrect Modified status When hoisting simple values out from a loop, and an optsize attribute, a convergent call, or an invoke instruction hindered the pass from unswitching the loop, the pass would return an incorrect Modified status. This was caught using the check introduced by D80916. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D86085	2020-08-20 09:04:16 +02:00
Johannes Doerfert	012819f301	[Attributor][FIX] Update the call graph properly when internalizing functions The internal version is now part of the SCC, make sure to perform this update.	2020-08-20 01:44:58 -05:00
Johannes Doerfert	3edea15f9a	[Attributor] Simplify comparison against constant null pointer Comparison against null is a common pattern that usually is followed by error handling code and the likes. We now use AANonNull to simplify these comparisons optimistically in order to make more code dead early on. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D86145	2020-08-20 01:44:58 -05:00
Johannes Doerfert	d01ad217ba	[Attributor][FIX] Do not use cyclic arguments for `nonnull` `AADereferenceable::getAssumedDereferenceableBytes()` is actually deducing `dereferenceable_or_null`. We should not use that information to deduce `nonnull`, since it doesn't imply `nonnull`.	2020-08-20 01:44:58 -05:00
Johannes Doerfert	a49dae0e38	[Attributor][AAIsDead][NFC] Skip uninteresting instructions early	2020-08-20 01:44:58 -05:00
Johannes Doerfert	08f33756e6	[Attributor][NFC] Extract functionality into own member	2020-08-20 01:44:58 -05:00
Qiu Chaofan	131b3b9ed4	[PowerPC] Support constrained scalar fptosi/fptoui This patch adds support for constrained scalar fp to int operations on PowerPC. Besides, this fixes the FP exception bit of quad-precision convert & truncate instructions. Reviewed By: steven.zhang, uweigand Differential Revision: https://reviews.llvm.org/D81537	2020-08-20 13:29:43 +08:00
Johannes Doerfert	1de70a724e	Revert "[OpenMPOpt] ICV tracking for calls" This commit breaks certain OpenMP codes (on power) because it expanded the Attributor scope without telling the Attributor about the SCC extend. See: https://reviews.llvm.org/D85544#2227611 This reverts commit `b0b32e6490`.	2020-08-20 00:00:35 -05:00
Craig Topper	8750d54cea	[X86][AutoUpgrade] Simplify string management in UpgradeDataLayoutString a bit. NFCI We don't need a std::string for a literal string, we can use a StringRef. The addition of StringRefs produces a Twine that we can just call str() without converting to a SmallString ourselves. Twine will do that internally.	2020-08-19 17:48:11 -07:00
Matt Arsenault	31adc28d24	GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.	2020-08-19 18:53:24 -04:00
Petr Hosek	1ed1e16ab8	[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows Fixes https://bugs.chromium.org/p/chromium/issues/detail?id=1119478 Patch By: haampie Differential Revision: https://reviews.llvm.org/D86245	2020-08-19 14:33:52 -07:00
Kyungwoo Lee	7a028fe702	Force Remove Attribute -force-attribute adds an attribute to function via command-line. However, there was no counter-part to remove an attribute. This patch adds -force-remove-attribute that removes an attribute from function. Differential Revision: https://reviews.llvm.org/D85586	2020-08-19 17:30:13 -04:00
Sanjay Patel	6f3511a01a	[ValueTracking] define/use max recursion depth in header There's a potential motivating case to increase this limit in PR47191: http://bugs.llvm.org/PR47191 But first we should make it less hacky. The limit in InstCombine is directly tied to this value because an increase there can cause asserts in the underlying value tracking calls if not changed together. The usage in VectorUtils is independent, but the comment suggests that we should use the same value unless there's a known reason to diverge. There are similar limits in codegen analysis, but I think we should leave those independent in case we intentionally want the optimization power/cost to be different there. Differential Revision: https://reviews.llvm.org/D86113	2020-08-19 16:56:59 -04:00
Hiroshi Yamauchi	28ccc52c40	[X86] Add feature for Fast Short REP MOV (FSRM) for Icelake or newer. Differential Revision: https://reviews.llvm.org/D85989	2020-08-19 13:39:42 -07:00
Sourabh Singh Tomar	ef8992b9f0	Re-apply "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants" This patch was reverted in `7c182663a8` due to some failures observed on PPC based machines. Failures were due to an Endianness issue and long double representation issues. The patch is revised to address the Endianness issue. Furthermore, support for emission of `DW_OP_implicit_value` for `long double` has been removed (since it was unclean at the moment). Planning to handle this in a clean way soon! For more context, please refer to the following review link. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D83560	2020-08-20 01:39:42 +05:30
Sourabh Singh Tomar	9937872c02	Revert "[DebugInfo] Emit DW_OP_implicit_value for Floating point constants" This reverts commit `15801f1619`. arc's land messed up! It removed the new commit message and took it from revision.	2020-08-20 01:28:03 +05:30
Raul Tambre	e887d0e89b	[AArch64][GlobalISel] Handle rtcGPR64RegClassID in AArch64RegisterBankInfo::getRegBankFromRegClass() TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16 and X17, as it's the last matching class. This in turn gets passed to AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable. It seems sensible to handle this case, so copies from X16 and X17 work. Copying from X17 is used in inline assembly in libunwind for pointer authentication. Differential Revision: https://reviews.llvm.org/D85720	2020-08-19 12:52:30 -07:00
Sourabh Singh Tomar	15801f1619	[DebugInfo] Emit DW_OP_implicit_value for Floating point constants llvm is missing support for the DW_OP_implicit_value operation. The DW_OP_implicit_value op is indispensable for cases such as optimized out long double variables. For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions Consider the following example: ``` int main() { long double ld = 3.14; printf("dummy\n"); ld *= ld; return 0; } ``` When compiled with trunk `clang` as `clang test.c -g -O1`, it produces the following location description of variable `ld`: ``` DW_AT_location (0x00000000: [0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value) DW_AT_name ("ld") ``` Here one may notice that this representation is incorrect (DWARF4 stack could only hold integers (and only up to the size of an address)). Here the variable size itself is `128` bit. GDB and LLDB confirm this: ``` (gdb) p ld $1 = <invalid float value> (lldb) frame variable ld (long double) ld = <extracting data from value failed> ``` GCC represents/uses DW_OP_implicit_value in these sorts of situations. Based on the discussion with Jakub Jelinek regarding GCC's motivation for using this, I concluded that DW_OP_implicit_value is most appropriate in this case. Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html GDB seems happy after this patch (LLDB doesn't have support for DW_OP_implicit_value): ``` (gdb) p ld p ld $1 = 3.14000000000000012434 ``` Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D83560	2020-08-20 01:20:40 +05:30
Hiroshi Yamauchi	ab401a8c8a	[PGO][PGSO][LV] Fix loop not vectorized issue under profile guided size opts. D81345 appears to accidentally disable vectorization when explicitly enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to the vectorization with versioning-for-unit-stride for PGSO. Differential Revision: https://reviews.llvm.org/D85784	2020-08-19 12:13:34 -07:00
Matt Arsenault	adbcc8e733	GlobalISel: Add TargetLowering member to LegalizerHelper	2020-08-19 14:50:35 -04:00
Florian Hahn	c0cbe6453a	[DSE] Remove dead argument from removePartiallyOverlappedStores (NFC). The argument is unused and can be removed.	2020-08-19 19:33:52 +01:00
Matt Arsenault	d64ad3f051	GlobalISel: Don't check for verifier enforced constraint Loads are always required to have a single memory operand.	2020-08-19 14:15:38 -04:00
Matt Arsenault	9e8d59a9b8	AMDGPU/GlobalISel: Remove hack for combines forming illegal extloads Previously we weren't adding the LegalizerInfo to the post-legalizer combiner. Since that's fixed, we don't need to try to filter out the one case that was breaking.	2020-08-19 14:15:38 -04:00
Matt Arsenault	e95c08432a	GlobalISel: Use Register	2020-08-19 13:45:31 -04:00
Petr Hosek	8e4acb82f7	[CMake] Fix OCaml build failure because of absolute path in system libs D85820 introduced a full path in the LLVM_SYSTEM_LIBS property of the LLVMSupport target, which made the OCaml bindings fail to build, since they use -l [system_lib] flags for every lib in LLVM_SYSTEM_LIBS, which cannot work with absolute paths. This patch solves the issue in a similar vein as ZLIB does it: it adds the full library path to imported_libs, and adds a stripped down version without directories, lib prefix and lib suffix to system_libs. In the future we should probably make some changes to LLVM_SYSTEM_LIBS, since both zlib and ncurses do not necessarily have to be system libs anymore due to the find_package / find_library bits introduced in D85820 and D79219. Patch By: haampie Differential Revision: https://reviews.llvm.org/D86134	2020-08-19 10:33:03 -07:00
Mehdi Amini	a407ec9b6d	Revert "Revert "[NFC][llvm] Make the contructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken, these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4fc56d70aa	Revert "[NFC][llvm] Make the contructors of `ElementCount` private." This reverts commit `264afb9e6a`. (and dependent `6b742cc48` and `fc53bd610f`) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Jessica Paquette	d25b12bdc3	[GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x If we have a mask, and a value x, where (x & mask) == x, we can drop the AND and just use x. This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64. In AArch64, this is most useful post-legalization. Patterns like this often show up when legalizing s1s, which must be extended to larger types. e.g. ``` %cmp:_(s32) = G_ICMP ... %and:_(s32) = G_AND %cmp, 1 ``` Since G_ICMP only produces a single bit, there's no reason to mask it with the G_AND. Differential Revision: https://reviews.llvm.org/D85463	2020-08-19 10:20:57 -07:00
Hamilton Tobon Mosquera	bd2fa1819b	[OpenMPOpt][HideMemTransfersLatency] Moving the 'wait' counterpart of __tgt_target_data_begin_mapper canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null if the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and, instead of moving the "wait", creates it at that point. Differential Revision: https://reviews.llvm.org/D86155	2020-08-19 11:42:22 -05:00
Francesco Petrogalli	264afb9e6a	[NFC][llvm] Make the contructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Sanjay Patel	c8d711adae	[InstCombine] reduce code duplication; NFC	2020-08-19 12:05:12 -04:00
Benjamin Kramer	b98e25b6d7	Make helpers static. NFC.	2020-08-19 16:00:03 +02:00
Roman Lebedev	3d76a133c7	Revert "[InstCombine] Lower infinite combine loop detection thresholds" And as being reported by Florian Hahn, there's a hit in MultiSource/Benchmarks/mafft from the test-suite on X86 with -O3 -flto, so reverting until addressed. This reverts commit `71e0b82c9f`.	2020-08-19 16:53:30 +03:00
Simon Pilgrim	057bdd63a4	[X86][AVX] lowerShuffleWithVPMOV - minor refactor to more closely match lowerShuffleAsVTRUNC Replace isBuildVectorAllZeros check by using the Zeroable bitmask instead.	2020-08-19 14:34:32 +01:00
Simon Pilgrim	9fee2bad6d	[X86] lowerShuffleWithVPMOV - remove unnecessary shuffle commutation. NFCI. canonicalizeShuffleMaskWithCommute should have already ensured the lower elements are from V1, we do have test coverage for this already.	2020-08-19 13:28:59 +01:00
Simon Pilgrim	b61cef3a92	[X86][AVX] getAVX512TruncNode - don't truncate from illegal vector widths. Thanks to @fhahn for the test case.	2020-08-19 13:00:26 +01:00
Roman Lebedev	71e0b82c9f	[InstCombine] Lower infinite combine loop detection thresholds It's been a month since `2f3862eb9f`, and no new bug reports about the threshold were filed, so let's bump it again and wait again.	2020-08-19 14:37:57 +03:00
Simon Pilgrim	80a0dc59b7	[X86][AVX] computeKnownBitsForTargetNode - add VTRUNC/VTRUNCS/VTRUNCUS known zero upper elements handling. Like many of the AVX512 conversion ops, the VTRUNC ops guarantee the upper destination elements are zero.	2020-08-19 11:39:27 +01:00
Simon Pilgrim	46fc9a0dfc	[X86][AVX] Fold store(extract_element(vtrunc)) to truncated store Add handling for storing the extracted lower (truncated bits) element from a X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly. Differential Revision: https://reviews.llvm.org/D86158	2020-08-19 11:10:20 +01:00
sstefan1	b0b32e6490	[OpenMPOpt] ICV tracking for calls Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a function can have a unique ICV value after it is finished, and AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a call site. This enables us to check the value of a call and if it changes the ICV. This also changes the approach in `getReplacementValues()` to a worklist-based approach so we can explore all relevant BBs. Differential Revision: https://reviews.llvm.org/D85544	2020-08-19 11:43:12 +02:00
Meera Nakrani	545de56f87	[ARM] Enabled VMLAV and Add instructions to use VMLAVA Used InstCombine to enable VMLAV and Add instructions to generate VMLAVA instead with tests.	2020-08-19 08:36:49 +00:00
luxufan	6c5039a10f	[RISCV] add the assemble and disassemble support of Zvlsseg instructions This implements the assemble and disassemble support of RISCV Vector extension Zvlsseg instructions, based on the 0.9 spec version. Reviewed by HsiangKai Differential Revision: https://reviews.llvm.org/D84416	2020-08-19 16:22:25 +08:00
Florian Hahn	1a55fbceaa	[DSE,MemorySSA] Use NumRedundantStores instead of NumNoopStores. Legacy DSE uses NumRedundantStores, while MemorySSA DSE uses NumNoopStores. We should just use the same counter.	2020-08-19 08:50:33 +01:00
Ronak Chauhan	fdf71d486c	Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors" This reverts commit `cacfb02d28`. Reverting due to buildbot failures.	2020-08-19 13:12:29 +05:30
David Sherwood	3f36561f69	[SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only ever deals with fixed width types, hence the offsets for each individual store never take 'vscale' into account. I've changed the code in that function to use TypeSize instead of unsigned for tracking the remaining load amount. In addition, I've changed the load loop to use the new IncrementPointer helper function for updating the addresses in each iteration, since this handles scalable vector types. Also, I've added report_fatal_errors in GenWidenVectorExtLoads, TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores, since these functions currently use a sequence of element-by-element scalar loads/stores. In a similar vein, I've also added a fatal error report in FindMemType for the case when we decide to return the element type for a scalable vector type. I've added new tests in CodeGen/AArch64/sve-split-load.ll CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll for the changes in GenWidenVectorLoads. Differential Revision: https://reviews.llvm.org/D85909	2020-08-19 07:54:32 +01:00
Yaxun (Sam) Liu	7546b29e76	[HIP] Support target id by --offload-arch This patch introduces support of target id by -offload-arch. Differential Revision: https://reviews.llvm.org/D60620	2020-08-18 23:43:53 -04:00
Ronak Chauhan	cacfb02d28	[AMDGPU] Support disassembly for AMDGPU kernel descriptors Decode AMDGPU Kernel descriptors as assembler directives. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D80713	2020-08-19 08:49:07 +05:30
Changpeng Fang	e7081d117a	AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc Summary: When the resource descriptor is of vgpr, we need a waterfall loop to read into a sgpr. In this patch, we generalize the implementation to work for any register class size, and extend the work to MIMG instructions. Fixes: SWDEV-223405 Reviewers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D82603	2020-08-18 16:27:36 -07:00
Roman Lebedev	2f01785857	[NFC][InstCombine] Aggregate reconstruction: use plain map Now that we no longer require for this map to have stable iteration order, we no longer need to pay for keeping the iteration order stable, so switch from `SmallMapVector` to `SmallDenseMap`.	2020-08-19 01:09:25 +03:00
Roman Lebedev	78bd4231bf	[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors While it may seem like we can just "deduplicate" the case where some basic block happens to be a predecessor more than once, which happens for e.g. switches, that is not the correct thing to do. We must actually add a PHI operand for each predecessor. This was initially reported to me by David Major as a clang crash during gecko build for android.	2020-08-19 01:00:42 +03:00
Amara Emerson	ed35344524	Use std::make_tuple instead of initializer lists to make a bot happy: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux	2020-08-18 14:55:52 -07:00
Craig Topper	9028c03ce6	[X86] Fix the Predicates on MMX_PSHUFWri/PSHUFWmi to include SSE1 in addition to MMX. These instructions weren't in the initial version of MMX, but were added when SSE1 was introduced. We already have the intrinsic named correctly to include sse and the frontend header enforces sse. We have one place in the backend where we DAG combine to this intrinsic, but that's also qualified. So don't know of anything currently broken unless someone writes their own IR and doesn't set the sse feature.	2020-08-18 14:28:26 -07:00
David Blaikie	1870b52f0c	Recommit "PR44685: DebugInfo: Handle address-use-invalid type units referencing non-type units" Originally committed as `be3ef93bf5`. Reverted by `b4bffdbadf` due to bot failures: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/17380/testReport/junit/LLVM/DebugInfo_X86/addr_tu_to_non_tu_ll/ http://45.33.8.238/win/22216/step_11.txt MacOS failure due to testing Split DWARF which isn't compatible with MachO. Windows failure due to testing type units which aren't enabled on Windows. Fix both of these by applying an explicit x86 linux triple to the test.	2020-08-18 13:43:28 -07:00
Eli Friedman	be944c85f3	[AArch64][SVE] Add patterns for integer mla/mls. We probably want to introduce pseudo-instructions at some point, like we have for binary operations, but this seems okay for now. One thing I'm not sure about is whether we should be doing this as a DAGCombine instead of directly pattern-matching it. I don't see any big downside to doing it this way, though. Differential Revision: https://reviews.llvm.org/D85681	2020-08-18 12:51:16 -07:00
Eli Friedman	bb18532399	[AArch64][SVE] Allow llvm.aarch64.sve.st2/3/4 with vectors of pointers. This isn't necessary for ACLE, but could be useful in other situations. And the change is simple. Differential Revision: https://reviews.llvm.org/D85251	2020-08-18 12:51:16 -07:00
Jessica Paquette	bf36e90295	[GlobalISel][CallLowering] NFC: Unify flag-setting from CallBase + AttributeList It's annoying to have to maintain multiple, nearly identical chains of if statements which all set the same attributes. Add a helper function, `addFlagsUsingAttrFn` which performs the attribute setting. Then, use wrappers for that function in `lowerCall` and `setArgFlags`. (Note that the flag-setting code in `setArgFlags` was missing the returned attribute. There's no selection for this yet, so no test. It's an example of the kind of thing this lets us avoid, though.) Differential Revision: https://reviews.llvm.org/D86159	2020-08-18 11:07:33 -07:00
Jessica Paquette	f29e6277ad	[GlobalISel][CallLowering] Don't tail call with non-forwarded explicit sret Similar to this commit: `faf8065a99` Testcase is pretty much the same as test/CodeGen/AArch64/tailcall-explicit-sret.ll Except it uses i64 (since we don't handle the i1024 return values yet), and doesn't have indirect tail call testcases (because we can't translate those yet). Differential Revision: https://reviews.llvm.org/D86148	2020-08-18 11:06:57 -07:00
Matt Arsenault	5a15f6628e	GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT Add unit tests since AMDGPU will only trigger this for gigantic vectors, and won't use the annoying odd sized breakdown case.	2020-08-18 13:51:19 -04:00
David Blaikie	f7a49d2aa6	[WIP][DebugInfo] Lazily parse debug_loclist offsets Parsing DWARFv5 debug_loclist offsets when a CU is parsed is weighing down memory usage of symbolizers that don't need to parse this data at all. There's not much benefit to caching these anyway - since they are O(1) lookup and reading once you know where the offset list starts (and can do bounds checking with the offset list size too). In general, I think it might be time to start paying down some of the technical debt of loc/loclist/range/rnglist parsing to try to unify it a bit more. eg: * Currently DWARFUnit has: RangeSection, RangeSectionBase, LocSection, LocSectionBase, LocTable, RngListTable, LoclistTableHeader (be nice if these were all wrapped up in two variables - one for loclists, one for rnglists) * rnglists and loclists are handled differently (see: LoclistTableHeader, but no RnglistTableHeader) * maybe all these types could be less stateful - lazily parse what they need to, even reparsing rather than caching because it doesn't seem too expensive, for instance. (though admittedly so long as it's constant cost/overhead per compilation that's probably adequate) * Maybe implementing and using a DWARFDataExtractor that can be sub-ranged (so we could slice it up to just the single contribution) - though maybe that's not so useful because loc/ranges need to refer to it by absolute, not contribution-relative mechanisms Differential Revision: https://reviews.llvm.org/D86110	2020-08-18 10:49:39 -07:00
Amara Emerson	04a6ea5d77	[GlobalISel] Add a combine for sext_inreg(load x), c --> sextload x This is restricted to single use loads, which if we fold to sextloads we can find more optimal addressing modes on AArch64. This also fixes an overload of the MachineFunction::getMachineMemOperand() method which was incorrectly using the MF alignment instead of the MMO alignment. Differential Revision: https://reviews.llvm.org/D85966	2020-08-18 10:42:15 -07:00
Amara Emerson	40e269ea6d	[GlobalISel] Add a combine for ashr(shl x, c), c --> sext_inreg x, c' By detecting this sign extend pattern early, we can uncover opportunities for more optimizations. Differential Revision: https://reviews.llvm.org/D85965	2020-08-18 10:42:15 -07:00
Simon Pilgrim	11ff5176c4	[X86][AVX] lowerShuffleWithVPMOV - add non-VLX support. We can efficiently handle non-VLX cases now that we have the getAVX512TruncNode helper.	2020-08-18 17:51:14 +01:00
Fangrui Song	c466c5fa7e	[ARM] Fix build after D86087	2020-08-18 09:20:32 -07:00
David Green	3471520b1f	[ARM] Allow tail predication of VLDn VLD2/4 instructions cannot be predicated, so we cannot tail predicate them from autovec. From intrinsics though, they should be valid as they will just end up loading extra values into off vector lanes, not affecting the on lanes. The same is true for loads in general where so long as we are not using the other vector lanes, an unpredicated load can be converted to a predicated one. This marks VLD2 and VLD4 instructions as validForTailPredication and allows any unpredicated load in a tail predication loop, which seems to be valid given the other checks we have. Differential Revision: https://reviews.llvm.org/D86022	2020-08-18 17:15:45 +01:00
Sam Tebbs	31f02ac60a	[ARM] Use mov operand if the mov cannot be moved while tail predicating There are some cases where the instruction that sets up the iteration count for a tail predicated loop cannot be moved before the dlstp, stopping tail predication entirely. This patch checks if the mov operand can be used and if so, uses that instead. Differential Revision: https://reviews.llvm.org/D86087	2020-08-18 17:10:29 +01:00
Jamie Schmeiser	645c6856a6	[NFC] Add raw_ostream parameter to printIR routines This is a non-functional-change to generalize the printIR routines so that the output can be saved and manipulated rather than being directly output to dbgs(). This is a prerequisite change for many upcoming changes that allow new ways of examining changes made to the IR in the new pass manager. Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D85999	2020-08-18 16:05:27 +00:00
Jessica Paquette	224a8c639e	[GlobalISel][CallLowering] Look through call parameters for flags We weren't looking through the parameters on calls at all. E.g., say you had ``` declare i32 @zext(i32 zeroext %x) ... %y = call i32 @zext(i32 %something) ... ``` At the point of the call, we wouldn't know that the %something should have the zeroext attribute. This sets flags in about the same way as TargetLoweringBase::ArgListEntry::setAttributes. Differential Revision: https://reviews.llvm.org/D86125	2020-08-18 08:48:56 -07:00
jasonliu	f48eced390	[XCOFF] emit .rename for .lcomm when necessary Summary: This is a follow up for D82481. For the .lcomm directive, although it's not necessary to have .rename emitted, it's still desirable to do it so that we do not see the internal 'Rename..' printed out in the symbol table. We also get consistent naming between the TC entry and .lcomm, and between the IR and the final object file. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D86075	2020-08-18 15:32:45 +00:00
Simon Pilgrim	abd33bf5ef	[X86][AVX] lowerShuffleWithPERMV - pad 128/256-bit shuffles on non-VLX targets Allow non-VLX targets to use 512-bit VPERMV/VPERMV3 for 128/256-bit shuffles. TBH I'm not sure these targets actually exist in the wild, but we're testing for them and it's good test coverage for shuffle lowering/combines across different subvector widths.	2020-08-18 15:46:02 +01:00
Simon Pilgrim	011bf4fd96	[X86][AVX] lowerShuffleWithVTRUNC - extend to support v16i16/v32i8 binary shuffles. This requires a few additional SrcVT vs DstVT padding cases in getAVX512TruncNode.	2020-08-18 15:30:02 +01:00
Simon Pilgrim	d5621b83a5	[X86][AVX] lowerShuffleWithVTRUNC - pull out TRUNCATE/VTRUNC creation into helper code. NFCI. Prep work toward adding v16i16/v32i8 support for lowerShuffleWithVTRUNC and improving lowerShuffleWithVPMOV.	2020-08-18 14:52:42 +01:00
Matt Arsenault	2f5f5febf3	AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize Previously, it would successfully select and assert if not HSA or PAL when expanding the pseudoinstruction. We don't need the pseudoinstruction anymore since we know the total size after legalization.	2020-08-18 09:28:01 -04:00
Matt Arsenault	3ba7777b94	AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT The code to determine the value size was overcomplicated and only correct in the case where the result register already had a register class assigned. We can always take the size directly from the register's type.	2020-08-18 09:28:01 -04:00
Sanjay Patel	139da9c4d7	[InstCombine] fold fabs of select with negated operand This is the FP example shown in: https://bugs.llvm.org/PR39474	2020-08-18 09:23:07 -04:00
Georgii Rymar	bd7daf5ceb	[yaml2obj] - Don't crash when `FileHeader` declares an empty `Flags` key in specific situations. We currently call the `llvm_unreachable` for the following YAML: ``` --- !ELF FileHeader: Class: ELFCLASS32 Data: ELFDATA2LSB Type: ET_REL Machine: EM_NONE Flags: [ ] ``` it happens because the `Flags` key is present, though `EM_NONE` is a machine type that has no known `EF_*` values and we call `llvm_unreachable` by mistake. Differential revision: https://reviews.llvm.org/D86138	2020-08-18 16:09:28 +03:00
Simon Pilgrim	7db5124736	[X86][AVX] lowerShuffleWithVTRUNC - avoid unnecessary division in element counts. NFCI. (256 / SrcEltBits) == ((2 * EltSizeInBits * NumElts) / (EltSizeInBits * Scale)) == (2 * (NumElts / Scale)) == NumSrcElts	2020-08-18 13:48:22 +01:00
Nico Weber	b4bffdbadf	Revert "PR44685: DebugInfo: Handle address-use-invalid type units referencing non-type units" This reverts commit `be3ef93bf5`. Test fails on macOS and Windows, e.g. http://45.33.8.238/win/22216/step_11.txt	2020-08-18 08:40:36 -04:00
Ronak Chauhan	e760e85680	[llvm-objdump][AMDGPU] Detect CPU string AMDGPU ISA isn't backwards compatible and hence -mcpu must always be specified during disassembly. However, the AMDGPU target CPU is stored in e_flags in the ELF object. This patch allows targets to implement CPU string detection, and also implements it for AMDGPU by looking at e_flags. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D84519	2020-08-18 17:43:16 +05:30
Paul Walker	9f63dc3265	[SVE] Fix shift-by-imm patterns used by asr, lsl & lsr intrinsics. Right shift patterns will no longer incorrectly accept a shift amount of zero. At the same time they will allow larger shift amounts that are now saturated to their upper bound. Patterns have been extended to enable immediate forms for shifts taking an arbitrary predicate. This patch also unifies the code path for immediate parsing so the i64 based shifts are no longer treated specially. Differential Revision: https://reviews.llvm.org/D86084	2020-08-18 11:41:26 +01:00
Paul Walker	cb5cc47a65	[SVE] Lower fixed length vector ISD::SPLAT_VECTOR operations. Also strengthens the CHECK lines for scalable vector splat tests. Differential Revision: https://reviews.llvm.org/D86070	2020-08-18 11:19:43 +01:00
Simon Pilgrim	d2057a8015	[X86][AVX] Lower v16i8/v8i16 binary shuffles using VTRUNC/TRUNCATE This patch adds lowerShuffleWithVTRUNC to handle basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or a X86ISD::VTRUNC (with undef/zero values in the remaining upper elements). We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat. For non-AVX512VL cases we have to canonicalize VTRUNC cases to use a 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate and then (possibly) extract the 128-bit result. This should address the last regressions in D66004 Differential Revision: https://reviews.llvm.org/D86093	2020-08-18 11:11:58 +01:00
Shinji Okumura	5e361e2aa4	[Attributor] Deduce noundef attribute This patch introduces a new abstract attribute `AANoUndef` which corresponds to `noundef` IR attribute and deduce them. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85184	2020-08-18 18:05:54 +09:00
David Blaikie	be3ef93bf5	PR44685: DebugInfo: Handle address-use-invalid type units referencing non-type units Theory was that we should never reach a non-type unit (eg: type in an anonymous namespace) when we're already in the invalid "encountered an address-use, so stop emitting types for now, until we throw out the whole type tree to restart emitting in non-type unit" state. But that's not the case (prior commit cleaned up one reason this wasn't exposed sooner - but also makes it easier to test/demonstrate this issue)	2020-08-17 21:42:00 -07:00
David Blaikie	24c3dabef4	DebugInfo: Emit class template parameters first, before members This reads more like what you'd expect the DWARF to look like (from the lexical order of C++ - template parameters come before members, etc), and also happens to make it easier to tickle (& thus test) a bug related to type units and Split DWARF I'm about to fix.	2020-08-17 21:42:00 -07:00
Johannes Doerfert	8abd69aa9e	[Attributor] Bail early if AAMemoryLocation cannot derive anything Before this change we looked through all memory operations in a function even if the first was an unknown call that could do anything. This did cost a lot of time but there is little use to do so. We also avoid creating AAs for things that we would have looked at in case no other AA will; that is the reason for the test changes. Running only the attributor-cgscc pass on a IR version of `llvm-test-suite/MultiSource/Applications/SPASS/clause.c` reduced the time we spend in `AAMemoryLocation::update` from 4% total to 0.9% (disclaimer: no accurate measurements).	2020-08-17 23:36:36 -05:00
Johannes Doerfert	1d99c3d707	[Attributor] We (should) keep the CG updated so we can mark it as preserved	2020-08-17 23:36:36 -05:00
Johannes Doerfert	858c75f7d1	[Attributor][NFC] Directly return proper type to avoid casts	2020-08-17 23:36:36 -05:00
Johannes Doerfert	b27bdf955a	[Attributor][FIX] Handle function pointers properly in AANonNull Before, we tried to create a dominator tree for a declaration when we wanted to determine if the function pointer is `nonnull`. We now avoid looking at global values if `Value::getPointerDereferenceableBytes` has not already determined `nonnull`.	2020-08-17 23:36:35 -05:00
Harmen Stoppels	a52173a3e5	Use find_library for ncurses Currently it is hard to avoid having LLVM link to the system install of ncurses, since it uses check_library_exists to find e.g. libtinfo and not find_library or find_package. With this change the ncurses lib is found with find_library, which also considers CMAKE_PREFIX_PATH. This solves an issue for the spack package manager, where we want to use the zlib installed by spack, and spack provides the CMAKE_PREFIX_PATH for it. This is a similar change as https://reviews.llvm.org/D79219, which just landed in master. Differential revision: https://reviews.llvm.org/D85820	2020-08-17 19:52:52 -07:00
Amy Kwan	c7ec3a7e33	[PowerPC] Implement Vector Extract Mask builtins in LLVM/Clang This patch implements the vec_extractm function prototypes in altivec.h in order to utilize the vector extract with mask instructions introduced in Power10. Differential Revision: https://reviews.llvm.org/D82675	2020-08-17 21:14:17 -05:00
Hamilton Tobon Mosquera	496f8e5b36	[OpenMPOpt][HideMemTransfersLatency] Split __tgt_target_data_begin_mapper into its "issue" and "wait" counterparts. WIP that tries to hide the latency of runtime calls that involve host to device memory transfers by splitting them into their "issue" and "wait" versions. The "issue" is moved upwards as much as possible. The "wait" is moved downwards as much as possible. The "issue" issues the memory transfer asynchronously, returning a handle. The "wait" waits on the returned handle for the memory transfer to finish. We still lack the movement.	2020-08-17 20:56:10 -05:00
Aditya Kumar	370330f084	NFC: [GVNHoist] Outline functions from the class Reviewers: sebpop Reviewed By: hiraditya Differential Revision: https://reviews.llvm.org/D86032	2020-08-17 17:40:04 -07:00
Craig Topper	b673dfbb9a	[X86] When manually creating intrinsic nodes in X86ISelLowering, make sure we use getTargetConstant and pointer type for the intrinsic ID. Doesn't really matter in practice but that's how the nodes are normally created by SelectionDAGBuilder. So we should match. Found by temporarily hacking type checks into isel table.	2020-08-17 17:25:53 -07:00
Craig Topper	2ffa5d218f	[X86] Rename INTR_TYPE_4OP to INTR_TYPE_4OP_IMM8 and truncate immediates to MVT::i8 This makes sure VPTERNLOG is generated with MVT::i8 immediate as its SDNode declaration in X86InstrFragmentsSIMD.td declares.	2020-08-17 17:25:52 -07:00
Craig Topper	bc244f08cf	[X86] Truncate immediate to i8 for INTR_TYPE_3OP_IMM8 This is used for DBPSADBW which has a i32 immediate for its intrinsic and an i8 immediate in tablegen isel patterns.	2020-08-17 17:25:51 -07:00
Craig Topper	ab7151f1cf	[X86] Make PreprocessISelDAG create X86ISD::VRNDSCALE nodes with i32 constants instead of i8. This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern matching doesn't check so it doesn't matter in practice. Maybe for SelectionDAG CSE it would matter.	2020-08-17 17:25:51 -07:00
Mircea Trofin	62fc44ca3c	[MLInliner] In development mode, obtain the output specs from a file Different training algorithms may produce models that, besides the main policy output (i.e. inline/don't inline), produce additional outputs that are necessary for the next training stage. To facilitate this, in development mode, we require the training policy infrastructure produce a description of the outputs that are interesting to it, in the form of a JSON file. We special-case the first entry in the JSON file as the inlining decision - we care about its value, so we can guide inlining during training - but treat the rest as opaque data that we just copy over to the training log. Differential Revision: https://reviews.llvm.org/D85674	2020-08-17 16:56:47 -07:00
Hongtao Yu	819b2d9c79	[llvm-objdump] Symbolize binary addresses for low-noise asm diff. When diffing the disassembly dumps of two binaries, I see a lot of noise from mismatched jump target addresses and global data references, which unnecessarily causes diffs on every function, making it impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise. In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from a code quality checking point of view, since on most targets an instruction can have at most one memory operand. So far only the X86 disassemblers are supported.
Test Plan:
llvm-objdump -d --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
  push rax
  mov dword ptr [rsp + 4], 0
  mov dword ptr [rsp], 0
  mov eax, dword ptr [rsp]
  cmp eax, dword ptr [rip + 4112] # 202182 <g>
  jge 0x20117e <_start+0x25>
  call 0x201158 <foo>
  inc dword ptr [rsp]
  jmp 0x201169 <_start+0x10>
  xor eax, eax
  pop rcx
  ret
```
llvm-objdump -d --symbolize-operands --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:
<_start>:
  push rax
  mov dword ptr [rsp + 4], 0
  mov dword ptr [rsp], 0
<L1>:
  mov eax, dword ptr [rsp]
  cmp eax, dword ptr <g>
  jge <L0>
  call <foo>
  inc dword ptr [rsp]
  jmp <L1>
<L0>:
  xor eax, eax
  pop rcx
  ret
```
Note that, without this work, jump instructions like `jge 0x20117e <_start+0x25>` are printed with a real target address and an offset from the leading symbol.
With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operands`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassembly of PC-relative global variable references is also prone to instruction insertion/deletion. Reviewed By: jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D84191	2020-08-17 16:55:12 -07:00
Johannes Doerfert	19bd4ef157	[Attributor] Properly use the call site argument position	2020-08-17 18:21:09 -05:00
Johannes Doerfert	5dfc207c53	[Attributor][FIX] Do not request an AANonNull for non-pointer types	2020-08-17 18:21:08 -05:00
Kazushi (Jam) Marukawa	68cb29eff1	[VE] Modify ISelLowering following clang-tidy Modify case style of function names following clang-tidy. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D86076	2020-08-18 07:43:19 +09:00
Roman Lebedev	03127f795b	[InstCombine] PHI-aware aggregate reconstruction: correctly detect "use" basic block While the original implementation added in D85787 / `ae7f08812e` is not incorrect, it is known to be suboptimal. In particular, while it is not incorrect to use the basic block in which the original `insertvalue` instruction is located as the merge point, that is not necessarily optimal, as `@test6` shows. We should look at all the AggElts, and, if they are all defined in the same basic block, then that is the basic block we should use. On the RawSpeed library, this catches +4% (+50) more cases. On the vanilla LLVM test-suite, this catches +12% (+92) more cases.	2020-08-18 00:45:18 +03:00
Roman Lebedev	f4f673e0e3	[NFC][InstCombine] PHI-aware aggregate reconstruction: don't capture UseBB in lambdas, take it as argument In a following patch, UseBB will be detected later, so capturing it is potentially error-prone (capture by ref vs by val). Also, parametrized UseBB will likely be needed for multiple levels of PHI indirections later on anyways.	2020-08-18 00:45:18 +03:00
Roman Lebedev	4973ca3eac	[NFC][InstCombine] PHI-aware aggregate reconstruction: insert PHI node manually This is NFC at the moment, because right now we always insert the PHI into the same basic block in which the original `insertvalue` instruction is, but that will change. Also, fixes addition of the suffix to the value names.	2020-08-18 00:45:17 +03:00
Matt Arsenault	a128292b90	GlobalISel: Make type for lower action more consistently optional Some of the lower implementations were relying on this, however the type was not set depending on which .lower* helper form you were using. For instance, if you used an unconditional lower(), the type was never set. Most of the lower actions do not benefit from a type parameter, and just expand in terms of the original operation's types. However, some lowerings could benefit from an additional type hint to combine a promotion and an expansion. An example of this is for add/sub sat. The DAG integer legalization tries to use smarter expansions directly when promoting the integer type, and doesn't always produce the same instruction with a wider type. Treat this as an optional hint argument, that only means something for specific lower actions. It may be useful to generalize this mechanism to pass a full list of type indexes and desired types, but I haven't run into a case like that yet.	2020-08-17 16:24:55 -04:00
diggerlin	2f0d755d81	[AIX][XCOFF][Patch1] Provide decoding trace back table information API for xcoff object file for llvm-objdump -d SUMMARY: 1. This patch provides an API for decoding the traceback table info, and unit tests for this API. 2. Other patches will do the following: 2.1 add a new option --traceback-table to decode the traceback table information for an xcoff object file when using llvm-objdump to disassemble it. 2.2 print out the traceback table information for llvm-objdump. Reviewers: Jason liu, Hubert Tong, James Henderson Differential Revision: https://reviews.llvm.org/D81585	2020-08-17 16:23:47 -04:00
Florian Hahn	4cc20aa743	[DSE,MemorySSA] Skip access already dominated by a killing def. If we already found a killing def (= a def that completely overwrites the location) that dominates an access, we can skip processing it further. This does not help with compile-time, but increases the number of memory accesses we can process with the same scan budget, leading to more stores being eliminated. Improvements with this change
Same hash: 203 (filtered out)
Remaining: 34
Metric: dse.NumFastStores
Program                                        base      dom       diff
test-suite...rolangs-C++/family/family.test    2.00      4.00      100.0%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test    172.00    229.00    33.1%
test-suite...ks/Prolangs-C/agrep/agrep.test    10.00     12.00     20.0%
test-suite...oxyApps-C++/miniFE/miniFE.test    44.00     51.00     15.9%
test-suite...marks/7zip/7zip-benchmark.test    1285.00   1474.00   14.7%
test-suite...006/450.soplex/450.soplex.test    254.00    289.00    13.8%
test-suite...006/447.dealII/447.dealII.test    2466.00   2798.00   13.5%
test-suite...000/197.parser/197.parser.test    9.00      10.00     11.1%
test-suite.../Benchmarks/nbench/nbench.test    85.00     91.00     7.1%
test-suite...ce/Applications/siod/siod.test    68.00     72.00     5.9%
test-suite...ications/JM/lencod/lencod.test    786.00    824.00    4.8%
test-suite...6/464.h264ref/464.h264ref.test    765.00    798.00    4.3%
test-suite.../Benchmarks/Ptrdist/bc/bc.test    105.00    109.00    3.8%
test-suite...lications/obsequi/Obsequi.test    29.00     28.00     -3.4%
test-suite...3.xalancbmk/483.xalancbmk.test    1322.00   1367.00   3.4%
test-suite...chmarks/MallocBench/gs/gs.test    118.00    122.00    3.4%
test-suite...T2006/401.bzip2/401.bzip2.test    60.00     62.00     3.3%
test-suite...6/482.sphinx3/482.sphinx3.test    30.00     31.00     3.3%
test-suite...rks/tramp3d-v4/tramp3d-v4.test    862.00    887.00    2.9%
test-suite...telecomm-gsm/telecomm-gsm.test    78.00     80.00     2.6%
test-suite...ediabench/gsm/toast/toast.test    78.00     80.00     2.6%
test-suite.../Applications/SPASS/SPASS.test    163.00    167.00    2.5%
test-suite...lications/ClamAV/clamscan.test    240.00    245.00    2.1%
test-suite...006/453.povray/453.povray.test    1392.00   1419.00   1.9%
test-suite...000/255.vortex/255.vortex.test    211.00    215.00    1.9%
test-suite...:: External/Povray/povray.test    1295.00   1317.00   1.7%
test-suite...lications/sqlite3/sqlite3.test    175.00    177.00    1.1%
test-suite...T2000/256.bzip2/256.bzip2.test    99.00     100.00    1.0%
test-suite...0/253.perlbmk/253.perlbmk.test    629.00    635.00    1.0%
test-suite.../CINT2006/403.gcc/403.gcc.test    1183.00   1194.00   0.9%
test-suite.../CINT2000/176.gcc/176.gcc.test    647.00    653.00    0.9%
test-suite...ications/JM/ldecod/ldecod.test    512.00    516.00    0.8%
test-suite...0.perlbench/400.perlbench.test    1026.00   1034.00   0.8%
test-suite...-typeset/consumer-typeset.test    1876.00   1877.00   0.1%
Geomean difference                                                 7.3%	2020-08-17 20:54:48 +01:00
Alexandre Ganea	98e01f56b0	Revert "Re-Re-land: [CodeView] Add full repro to LF_BUILDINFO record" This reverts commit `a3036b3863`. As requested in: https://reviews.llvm.org/D80833#2221866 Bug report: https://crbug.com/1117026	2020-08-17 15:49:18 -04:00
Matt Arsenault	a9ee0589a8	AMDGPU/GlobalISel: Match global saddr addressing mode	2020-08-17 15:48:06 -04:00
Sanjay Patel	f925fd3304	[DAGCombiner] give magic number a name in getStoreMergeCandidates; NFC	2020-08-17 15:37:55 -04:00
Sanjay Patel	046b4a550a	[DAGCombiner] reduce code duplication in getStoreMergeCandidates; NFC	2020-08-17 15:37:55 -04:00
Sanjay Patel	20c85fd1ab	[DAGCombiner] simplify bool return in getStoreMergeCandidates; NFC	2020-08-17 15:37:55 -04:00
Sanjay Patel	52cd8f1ecb	[DAGCombiner] clean up getStoreMergeCandidates(); NFC 1. Move bailouts and local var declarations. 2. Convert if-chain to switch on StoreSource with unreachable default.	2020-08-17 15:37:54 -04:00
Sanjay Patel	27708db3e3	[DAGCombiner] convert StoreSource if-chain to switch; NFC The "isa" checks were less constrained because they allow target constants, but the later matching code would bail out on those anyway, so this should be slightly more efficient.	2020-08-17 15:37:54 -04:00
Tyker	a79e604462	[AssumeBundles] Fix Bug in Assume Queries This bug was causing a miscompile. Now clang can properly selfhost with -mllvm --enable-knowledge-retention Reviewed By: jdoerfert, lebedev.ri Differential Revision: https://reviews.llvm.org/D83507	2020-08-17 21:36:53 +02:00
Matt Arsenault	e1a2f4713c	AMDGPU: Match global saddr addressing mode The previous implementation was incorrect, and based off incorrect instruction definitions. Unfortunately we can't match natural addressing in a lot of cases due to the shift/scale applied in getelementptrs. This relies on reducing the 64-bit shift to 32-bits.	2020-08-17 15:28:14 -04:00
Stanislav Mekhanoshin	24182f14b6	[AMDGPU] Define spill opcodes for all AGPR sizes Since we have defined all these sizes I believe we shall be able to spill these as well. Differential Revision: https://reviews.llvm.org/D86098	2020-08-17 12:17:23 -07:00
Dávid Bolvanský	0f14b2e6cb	Revert "[BPI] Improve static heuristics for integer comparisons" This reverts commit `50c743fa71`. Patch will be split to smaller ones.	2020-08-17 20:44:33 +02:00
Jonas Devlieghere	295eb54deb	[llvm] Don't create the directory hierarchy in the FileCollector... ... if the collected file doesn't exist. This fixes the situation where LLDB can't create a file when capturing a reproducer because the parent path doesn't exist, but can during replay because the file collector created the directory hierarchy even though the file doesn't exist. This is covered by the lldb reproducer test suite.	2020-08-17 11:21:39 -07:00
Florian Hahn	df4756ec6c	[DSE,MemorySSA] Check for underlying objects first. isWriteAtEndOfFunction needs to check all memory uses of Def, which is much more expensive than getting the underlying objects in practice. Switch the call order, as recommended by the TODO, which was added as per an earlier review. This shaves off a bit of compile-time.	2020-08-17 18:52:18 +01:00
Matt Arsenault	a275acc4a9	GlobalISel: Early continue to reduce loop indentation	2020-08-17 13:51:08 -04:00
Florian Hahn	139810449b	[DSE,MemorySSA] Account for ScanLimit == 0 on entry. Currently the code does not account for the fact that getDomMemoryDef can be called with ScanLimit == 0, if we reached the limit while processing an earlier access. Also tighten the check a bit more and bump the scan limit now that it is handled properly. In some cases, this brings a 2x speedup in terms of compile-time.	2020-08-17 17:55:14 +01:00
Aditya Kumar	cb6e6936db	NFC: [GVNHoist] Hoist loop invariant code and rename variables for readability Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D86031	2020-08-17 09:43:34 -07:00
Matt Arsenault	c8a9872259	AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset We may have an SGPR->VGPR copy if a totally uniform pointer calculation is used for a VGPR pointer operand. Also hack around a bug in MUBUF matching which would incorrectly use MUBUF for global when flat was requested. This should really be a predicate on the parent pattern, but the DAG always checked this manually inside the complex pattern.	2020-08-17 12:31:38 -04:00
Steven Perron	eed6476a87	Reset PAL metadata when AMDGPU target stream finishes If the same stream object is used for multiple compiles, the PAL metadata from earlier compilations will leak into later ones. See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC. No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side. Let me know if there is a good way to test this. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D85667	2020-08-17 10:56:11 -04:00
Matt Arsenault	5b53b17cd3	DAG: Add missing comment for transform	2020-08-17 10:01:12 -04:00
Matt Arsenault	c7b9cd31bf	AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping	2020-08-17 09:53:26 -04:00
Matt Arsenault	af162ac785	AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics This should use the default mapping and insert a copy to the vcc bank, and not try to insert a readfirstlane.	2020-08-17 09:53:25 -04:00
Matt Arsenault	da3f357de6	AMDGPU: Don't look at dbg users for foldable operands These would have always failed to fold, so checking them or adding them to the fold candidates is useless.	2020-08-17 09:53:25 -04:00
Matt Arsenault	924f31bc3c	GlobalISel: Remove unnecessary check for copy type COPY isn't allowed to change the type, but can mix no type with type.	2020-08-17 09:19:25 -04:00
Matt Arsenault	66ffa0e91f	AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo	2020-08-17 09:19:22 -04:00
Matt Arsenault	e0375dbcb3	AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics Global instructions have the signed offsets.	2020-08-17 09:19:15 -04:00
Alex Zinenko	874aef875d	[llvm] support graceful failure of DataLayout parsing Existing implementation always aborts on syntax errors in a DataLayout description. While this is meaningful for consuming textual IR modules, it is inconvenient for users that may need fine-grained control over the layout from, e.g., command-line options. Propagate errors through the parsing functions and only abort in the top-level parsing function instead. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D85650	2020-08-17 15:10:37 +02:00
Kai Nacke	c2ae7934c8	[SystemZ/ZOS]__(de)register_frame are not available on z/OS. The functions `__register_frame`/`__deregister_frame` are not available on z/OS, so add a guard to not use them. Reviewed By: lhames, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D84787	2020-08-17 09:00:09 -04:00
Sam Elliott	3f7068ad98	[RISCV] Enable the use of the old mucounteren name The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which has the same numeric CSR value as `mucounteren` from 1.09.1. This patch enables the use of the old `mucounteren` name. Patch by Yuichi Sugiyama. Reviewed By: lenary, jrtc27, pzheng Differential Revision: https://reviews.llvm.org/D85067	2020-08-17 13:11:49 +01:00
Sam Elliott	5f9ecc5d85	[RISCV] Indirect branch generation in position independent code This fixes the "Unable to insert indirect branch" fatal error sometimes seen when generating position-independent code. Patch by msizanoen1 Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D84833	2020-08-17 13:09:26 +01:00
Sanjay Patel	e6b6787d01	[InstCombine] fold abs(X)/X to cmp+select The backend can convert the select-of-constants to bit-hack shift+logic if desirable. https://alive2.llvm.org/ce/z/pgJT6E define i8 @src(i8 %x) { %0: %a = abs i8 %x, 1 %d = sdiv i8 %x, %a ret i8 %d } => define i8 @tgt(i8 %x) { %0: %cond = icmp sgt i8 %x, 255 %r = select i1 %cond, i8 1, i8 255 ret i8 %r } Transformation seems to be correct!	2020-08-17 08:01:28 -04:00
Sanjay Patel	6cd4a6f6b2	[InstCombine] reduce code duplication; NFC	2020-08-17 08:01:27 -04:00
Simon Pilgrim	c1f6ce0c73	[DemandedBits] Improve accuracy of Add propagator The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits. I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback. This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read. Known A: 0_______0_______0_______0_______ Known B: 0_______0_______0_______0_______ AOut: 00000000001000000000000000000000 AB, current: 00000000001111111111111111111111 AB, patch: 00000000001111111000000000000000 Committed on behalf of: @rrika (Erika) Differential Revision: https://reviews.llvm.org/D72423	2020-08-17 12:54:09 +01:00
Simon Pilgrim	1d2ede87ea	[X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported. We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004	2020-08-17 11:58:51 +01:00
Cullen Rhodes	2ccde3c96b	[InlineCost] Fix scalable vectors in visitAlloca Discovered as part of the VLS type work (see D85128). Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85848	2020-08-17 10:34:27 +00:00
Vitaly Buka	3b348d9102	[NFC][StackSafety] Move out sort from the loop	2020-08-17 03:30:14 -07:00
Kazushi (Jam) Marukawa	40f1e7e804	[VE] Support f128 Support f128 using VE instructions. Update regression tests. I've noticed there is no load or store i128 test, so I add them too. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D86035	2020-08-17 17:26:52 +09:00
Craig Topper	a206f85091	[X86] Reject dirflag in inline asm constraints other than clobber. Fixes the crash from PR47195.	2020-08-16 23:33:45 -07:00
Chen Zheng	4d52ebb9b9	[PowerPC] Make StartMI ignore COPY like instructions. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D85659	2020-08-17 02:12:30 -04:00
Yonghong Song	aa61e43040	[InstCombine] Fix a compilation bug With gcc 6.3.0, I hit the following compilation bug. ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic] }; ^ cc1plus: all warnings being treated as errors The error is introduced by Commit `ae7f08812e` ("[InstCombine] Aggregate reconstruction simplification (PR47060)")	2020-08-16 21:56:42 -07:00
Vitaly Buka	e10e7829bf	[StackSafety] Skip ambiguous lifetime analysis If we can't identify the alloca used in a lifetime marker, we need to assume the worst-case scenario. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D84630	2020-08-16 18:05:52 -07:00
Roman Lebedev	0ec1f0f332	[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically	2020-08-16 23:37:36 +03:00
Roman Lebedev	ae7f08812e	[InstCombine] Aggregate reconstruction simplification (PR47060) This pattern happens in clang C++ exception lowering code, on the unwind branch. We end up having a `landingpad` block after each `invoke`, where RAII cleanup is performed, and the elements of an aggregate `{i8*, i32}` holding exception info are `extractvalue`'d, and we then branch to a common block that takes the extracted `i8*` and `i32` elements (via `phi` nodes), forms a new aggregate, and finally `resume`'s the exception. The problem is that, if the cleanup block is effectively empty, it shouldn't be there, there shouldn't be that `landingpad` and `resume`, and said `invoke` should be a `call`. Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`. But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft, while it is pointless, does not look like an "empty cleanup block". So `SimplifyCFGOpt::simplifyResume()` fails, and the exception has a higher cost than it could have on the unwind branch :S This doesn't happen that often, but it will basically happen once per C++ function with a complex CFG that calls more than one other function that isn't known to be `nounwind`. I think this is a missing fold in InstCombine, so I've implemented it. I think the algorithm/implementation is rather self-explanatory: 1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate. 2. For each element, try to find from which aggregate it was extracted. If it was extracted from the aggregate with identical type, from identical element index, great. 3. If all elements were found to have been extracted from the same aggregate, then we can just use said original source aggregate directly, instead of re-creating it. 4. If we fail to find said aggregate when looking only in the current block, we need to be PHI-aware - we might have a different source aggregate when coming from each predecessor. 
I'm not sure if this already handles everything, and there are some FIXME's, i'll deal with all that later in followups. I'd be fine with going with post-commit review here code-wise, but just in case there are thoughts, i'm posting this. On RawSpeed, for example, this has the following effect: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| abs(%) \| \|---------------------------------------------------\|---------:\|---------:\|------:\|--------:\|-------:\| \| instcombine.NumAggregateReconstructionsSimplified \| 0 \| 1253 \| 1253 \| 0.00% \| 0.00% \| \| simplifycfg.NumInvokes \| 948 \| 1355 \| 407 \| 42.93% \| 42.93% \| \| instcount.NumInsertValueInst \| 4382 \| 3210 \| -1172 \| -26.75% \| 26.75% \| \| simplifycfg.NumSinkCommonCode \| 574 \| 458 \| -116 \| -20.21% \| 20.21% \| \| simplifycfg.NumSinkCommonInstrs \| 1154 \| 921 \| -233 \| -20.19% \| 20.19% \| \| instcount.NumExtractValueInst \| 29017 \| 26397 \| -2620 \| -9.03% \| 9.03% \| \| instcombine.NumDeadInst \| 166618 \| 174705 \| 8087 \| 4.85% \| 4.85% \| \| instcount.NumPHIInst \| 51526 \| 50678 \| -848 \| -1.65% \| 1.65% \| \| instcount.NumLandingPadInst \| 20865 \| 20609 \| -256 \| -1.23% \| 1.23% \| \| instcount.NumInvokeInst \| 34023 \| 33675 \| -348 \| -1.02% \| 1.02% \| \| simplifycfg.NumSimpl \| 113634 \| 114708 \| 1074 \| 0.95% \| 0.95% \| \| instcombine.NumSunkInst \| 15030 \| 14930 \| -100 \| -0.67% \| 0.67% \| \| instcount.TotalBlocks \| 219544 \| 219024 \| -520 \| -0.24% \| 0.24% \| \| instcombine.NumCombined \| 644562 \| 645805 \| 1243 \| 0.19% \| 0.19% \| \| instcount.TotalInsts \| 2139506 \| 2135377 \| -4129 \| -0.19% \| 0.19% \| \| instcount.NumBrInst \| 156988 \| 156821 \| -167 \| -0.11% \| 0.11% \| \| instcount.NumCallInst \| 1206144 \| 1207076 \| 932 \| 0.08% \| 0.08% \| \| instcount.NumResumeInst \| 5193 \| 5190 \| -3 \| -0.06% \| 0.06% \| \| asm-printer.EmittedInsts \| 948580 \| 948299 \| -281 \| -0.03% \| 0.03% \| \| instcount.TotalFuncs \| 11509 \| 11507 \| -2 \| -0.02% 
\| 0.02% \| \| inline.NumDeleted \| 97595 \| 97597 \| 2 \| 0.00% \| 0.00% \| \| inline.NumInlined \| 210514 \| 210522 \| 8 \| 0.00% \| 0.00% \| ``` So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half, and there is a very apparent decrease in instruction and basic block count. On vanilla llvm-test-suite: ``` \| statistic name \| baseline \| proposed \| Δ \| % \| abs(%) \| \|---------------------------------------------------\|---------:\|---------:\|------:\|--------:\|-------:\| \| instcombine.NumAggregateReconstructionsSimplified \| 0 \| 744 \| 744 \| 0.00% \| 0.00% \| \| instcount.NumInsertValueInst \| 2705 \| 2053 \| -652 \| -24.10% \| 24.10% \| \| simplifycfg.NumInvokes \| 1212 \| 1424 \| 212 \| 17.49% \| 17.49% \| \| instcount.NumExtractValueInst \| 21681 \| 20139 \| -1542 \| -7.11% \| 7.11% \| \| simplifycfg.NumSinkCommonInstrs \| 14575 \| 14361 \| -214 \| -1.47% \| 1.47% \| \| simplifycfg.NumSinkCommonCode \| 6815 \| 6743 \| -72 \| -1.06% \| 1.06% \| \| instcount.NumLandingPadInst \| 14851 \| 14712 \| -139 \| -0.94% \| 0.94% \| \| instcount.NumInvokeInst \| 27510 \| 27332 \| -178 \| -0.65% \| 0.65% \| \| instcombine.NumDeadInst \| 1438173 \| 1443371 \| 5198 \| 0.36% \| 0.36% \| \| instcount.NumResumeInst \| 2880 \| 2872 \| -8 \| -0.28% \| 0.28% \| \| instcombine.NumSunkInst \| 55187 \| 55076 \| -111 \| -0.20% \| 0.20% \| \| instcount.NumPHIInst \| 321366 \| 320916 \| -450 \| -0.14% \| 0.14% \| \| instcount.TotalBlocks \| 886816 \| 886493 \| -323 \| -0.04% \| 0.04% \| \| instcount.TotalInsts \| 7663845 \| 7661108 \| -2737 \| -0.04% \| 0.04% \| \| simplifycfg.NumSimpl \| 886791 \| 887171 \| 380 \| 0.04% \| 0.04% \| \| instcount.NumCallInst \| 553552 \| 553733 \| 181 \| 0.03% \| 0.03% \| \| instcombine.NumCombined \| 3200512 \| 3201202 \| 690 \| 0.02% \| 0.02% \| \| instcount.NumBrInst \| 741794 \| 741656 \| -138 \| -0.02% \| 0.02% \| \| simplifycfg.NumHoistCommonInstrs \| 14443 \| 14445 \| 2 \| 0.01% \| 0.01% 
\| \| asm-printer.EmittedInsts \| 7978085 \| 7977916 \| -169 \| 0.00% \| 0.00% \| \| inline.NumDeleted \| 73188 \| 73189 \| 1 \| 0.00% \| 0.00% \| \| inline.NumInlined \| 291959 \| 291968 \| 9 \| 0.00% \| 0.00% \| ``` Roughly similar effect, fewer instructions and blocks total. See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e. Compile-time wise, this appears to be roughly geomean-neutral: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions And this is a win size-wise in general: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text See https://bugs.llvm.org/show_bug.cgi?id=47060 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85787	2020-08-16 23:27:56 +03:00
Simon Pilgrim	f25d47b7ed	[X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types We can now enable this for AVX1 targets, as canonicalizeShuffleMaskWithHorizOp can now assist with cleanup. There are still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.	2020-08-16 15:00:41 +01:00
Sanjay Patel	3ffb751f3d	[InstCombine] fold copysign with fabs/fneg operand We already get this in the backend, but we need to do it in IR too to consistently get yet more copysign transforms.	2020-08-16 08:53:47 -04:00
Sanjay Patel	3fed67b7e6	[InstCombine] reduce code duplication; NFC	2020-08-16 08:53:47 -04:00
Vitaly Buka	47552a614a	[StackSafety] Change how callee searched in index Handle other than local linkage types.	2020-08-16 04:37:19 -07:00
Simon Pilgrim	dca7eb7d60	[X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.	2020-08-16 12:26:27 +01:00
Simon Pilgrim	c27baa54b7	[X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC. Split the isRepeatedTargetShuffleMask into a wrapper variant that takes a MVT describing the mask width, and an internal version that just needs the raw mask element bit size. This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.	2020-08-16 11:51:44 +01:00
Fady Ghanim	aaa93a681b	[OpenMP][OMPBuilder] Adding support for `omp single` This adds support for generating `omp single`, and necessary calls for `copyprivate` clause. Differential Revision: https://reviews.llvm.org/D85617	2020-08-16 01:15:16 -04:00
Wenlei He	577e58bcc7	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks This change adds a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like the alwaysinline attribute are per-function, not per-callsite. A per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decisions can be specialized for each call context, hence we should be able to replay inlining accurately. However, with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call contexts. Apart from that limitation, the new inline advisor can still be used by the regular CGSCC inliner later if needed for tuning purposes. This is a resubmit of https://reviews.llvm.org/D83743	2020-08-15 20:17:21 -07:00
Lang Hames	a49b05bb61	[JITLink][MachO] Use correct symbol scope when N_PEXT is set and N_EXT unset. MachOLinkGraphBuilder has been treating these as hidden, but they should be treated as local. Symbols with N_PEXT set and N_EXT unset are produced when hidden symbols are run through 'ld -r' without passing -keep_private_externs. They will show up under 'nm -m' as "was private extern", hence the name of the test cases. Testcase committed as relocatable object to ensure that the test suite doesn't depend on having 'ld -r' available.	2020-08-15 15:53:33 -07:00
Amara Emerson	7006bb69ef	[GlobalISel] Enable copy-propagation in post-legalizer combiner. This cleans up copies that the legalizer or other combines leave around. They can occasionally end up escaping as moves. Differential Revision: https://reviews.llvm.org/D85964	2020-08-15 13:44:30 -07:00
Matt Arsenault	04a288f0f0	GlobalISel: Remove unnecessary llvm::	2020-08-15 12:12:50 -04:00
Matt Arsenault	f0af434b79	AMDGPU: Remove register class params from flat memory patterns	2020-08-15 12:12:33 -04:00
Matt Arsenault	a7455652c0	AMDGPU: Fix global atomic saddr operand class	2020-08-15 12:12:28 -04:00
Matt Arsenault	625db2fe5b	AMDGPU: Remove slc from flat offset complex patterns This was always set to 0. Use a default value of 0 in this context to satisfy the instruction definition patterns. We can't unconditionally use SLC with a default value of 0 due to limitations in TableGen's handling of defaulted operands when followed by non-default operands.	2020-08-15 12:12:24 -04:00
Matt Arsenault	e5077b5c2a	AMDGPU: Fix matching wrong offsets for global atomic loads These used signed offsets with a different size.	2020-08-15 12:12:17 -04:00
Matt Arsenault	8cb022982a	AMDGPU: Remove redundant FLAT complex patterns These were identical to the non-atomic cases. I'm not sure why these were ever separated.	2020-08-15 12:12:01 -04:00
Matt Arsenault	47af1ac69a	AMDGPU: Correct definitions for global saddr instructions The VGPR component is a 32-bit offset, not 64-bits. I'm not sure what the correct syntax is for this. This maintains the vaddr position and leaves saddr in the end "off" position. This is particularly terrible for stores, since the operand order is now <vgpr offset>, <data>, <sgpr base>, splitting the pointer operands. I suppose this is a logical consequence from the mistake of not putting the data operand first. I'm not sure what sp3 does.	2020-08-15 12:11:57 -04:00
Matt Arsenault	79298a5067	AMDGPU: Remove SIFixupVectorISel pass This was only used for matching the saddr addressing mode of global instructions, but this was not implemented correctly. The instruction definitions aren't even correct, and are defined as using a 64-bit VGPR component. Eliminate this pass to enable correcting the instruction definitions. A new matching implementation can work in GlobalISel or relying on DAG divergence information for the base address.	2020-08-15 12:11:51 -04:00
Luofan Chen	266949b2bc	[Attributor][NFC] Format code	2020-08-16 00:00:45 +08:00
Luofan Chen	b7448a348b	[Attributor][NFC] Use indexes instead of iterator When adding elements while iterating, the iterator will become invalid, which could cause errors. This fixes the issue by using indexes instead of an iterator.	2020-08-15 23:09:46 +08:00
Cyndy Ishida	85d381eb02	[TextAPI] update DriverKit string value String value differed from downstream, where upstream doesn't depend on casing difference. <rdar://problem/67106257>	2020-08-15 06:44:30 -07:00
Xing GUO	030df8242f	[MachOYAML] Move EmitFunc to an inner scope. NFC.	2020-08-15 21:10:03 +08:00
Luofan Chen	87a85f3d57	[Attributor] Use internalized version of non-exact functions This patch internalizes non-exact functions and replaces their uses with the internalized version. Doing this enables the analysis of non-exact functions. We can do this because some non-exact functions with the same name whose linkage is `linkonce_odr` or `weak_odr` should have the same semantics, so we can safely internalize and replace uses of them (the result of the other version of this function should be the same). Note that not all functions can be internalized, e.g., functions with `linkonce` or `weak` linkage. For now, when specified on the command line, we internalize all functions that meet the requirements without calculating the cost of such internalization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84167	2020-08-15 20:23:38 +08:00
Xing GUO	4a0b95dc5e	[DWARFYAML] Simplify isEmpty(). NFC.	2020-08-15 20:10:29 +08:00
Dávid Bolvanský	f134fc4f1b	Reland "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)"	2020-08-15 12:14:57 +02:00
Martin Storsjö	3e7403a134	Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)" This reverts commit `6dbf0cfcf7`. That commit caused failed assertions, e.g. like this: $ cat sprintf-strcpy.c char *ptr; void func(void) { ptr += sprintf(ptr, "%s", ""); } $ clang -c sprintf-strcpy.c -O2 -target x86_64-linux-gnu clang: ../lib/IR/Value.cpp:473: void llvm::Value::doRAUW(llvm::Value *, llvm::Value::ReplaceMetadataUses): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.	2020-08-15 09:35:11 +03:00
Philip Reames	6b2105456a	[Statepoint] Remove code related to inline operand bundles This code becomes dead for valid IR after `48f4312` and `a96fc46`. The reason for the test change is that the verifier reports the first verification error encountered, in some non-specified visit order. By removing the verification code in gc.relocates for a statepoint with inline gc operands, I change the error the verifier reports. And in one case, the checked for error is no longer possible with the bundle representation, so I simply delete the file.	2020-08-14 20:29:41 -07:00
Philip Reames	48f4312d4e	Remove inline gc arguments from statepoints The "gc-live" operand bundles were recently added, and all tests have been updated to use that format. A migration period was provided, though it's worth noting these intrinsics are experimental, so formally there is no compatibility requirement. This is an extension to `a96fc46`. "gc-live" hadn't been implemented at the point that patch was initially posted.	2020-08-14 19:44:24 -07:00
Stanislav Mekhanoshin	43a38dc251	[AMDGPU] Fix MAI ld/st hazard handling It did not process hazard for ds_permute because it does not load or store even though it is DS. Differential Revision: https://reviews.llvm.org/D86003	2020-08-14 17:07:37 -07:00
Dávid Bolvanský	f62de7c9c7	[SLC] Transform strncpy(dst, "text", C) to memcpy(dst, "text\0\0\0", C) for C <= 128 only Transformation creates big strings for big C values, so bail out for C > 128. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86004	2020-08-15 01:53:32 +02:00
Gui Andrade	05e3ab41e4	[MSAN] Avoid dangling ActualFnStart when replacing instruction This would be a problem if the entire instrumented function was a call to e.g. memcpy. Use FnPrologueEnd Instruction* instead of ActualFnStart BB*. Differential Revision: https://reviews.llvm.org/D86001	2020-08-14 23:50:38 +00:00
Cameron McInally	92593f9e77	[SVE] Lower fixed length vXi32/vXi64 SDIV to scalable vectors. Differential Revision: https://reviews.llvm.org/D85982	2020-08-14 18:47:22 -05:00
Christopher Tetreault	416a6a85b1	[SVE] Remove calls to VectorType::getNumElements from AggressiveInstCombine Reviewed By: fpetrogalli Differential Revision: https://reviews.llvm.org/D82218	2020-08-14 16:40:34 -07:00
Philip Reames	a96fc4638b	Remove deopt and gc transition arguments from gc.statepoint intrinsic (Forgot to land this a couple of weeks back.) In a recent series of changes, I've introduced support for using the respective operand bundle kinds on the statepoint. At the moment, code supports either/or, but there's no need to keep the old support around. For the moment, I am simply changing the specification and verifier to require zero length argument sets in the intrinsic. The intrinsic itself is experimental. Given that, there's no forward serialization needed. The in tree uses and generation have already been updated to use the new operand bundle based forms, the only folks broken by the change will be those with frontends generating statepoints directly and the updates should be easy. Why not go ahead and just remove the arguments entirely? Well, I plan to. But while working on this I've found that almost all of the arguments to the statepoint can be expressed via operand bundles or attributes. Given that, I'm planning a radical simplification of the arguments and figured I'd do one update not several small ones. Differential Revision: https://reviews.llvm.org/D80892	2020-08-14 16:07:40 -07:00
Fangrui Song	58f5966d5b	Fix TargetSubtargetInfo derivatives after D85165	2020-08-14 15:50:53 -07:00
Craig Topper	c7a0b2684f	[X86][MC][Target] Initial backend support a tune CPU to support -mtune This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present, X86 will use the resolved CPU from the target-cpu attribute or command line. This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all targets as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned. One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU. I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning. Differential Revision: https://reviews.llvm.org/D85165	2020-08-14 15:31:50 -07:00
Jordan Rupprecht	38884641f2	Temporarily revert "[SCEVExpander] Add helper to clean up instrs inserted while expanding." This reverts commit `7829c33084`. The assertion is triggering on some internal code. A reduced test case is in progress.	2020-08-14 14:52:37 -07:00
Dávid Bolvanský	6dbf0cfcf7	[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str) Transform sprintf(dst, "%s", str) -> strcpy(dst, str) if result is unused Avoid sprintf(dest, "%s", str) -> llvm.memcpy(align 1 dest, align 1 str, strlen(str)+1) if optimizing for size. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85963	2020-08-14 23:48:53 +02:00
Gui Andrade	36ebabc153	[MSAN] Convert ActualFnStart to be a particular Instruction *, not BB This allows us to add additional instrumentation before the function start, without splitting the first BB. Differential Revision: https://reviews.llvm.org/D85985	2020-08-14 21:43:56 +00:00
Gui Andrade	97de0188dd	[MSAN] Reintroduce libatomic load/store instrumentation Have the front-end use the `nounwind` attribute on atomic libcalls. This prevents us from seeing `invoke __atomic_load` in MSAN, which is problematic as it has no successor for instrumentation to be added.	2020-08-14 20:31:10 +00:00
Xiangling Liao	f759b4e43b	[AIX] Generate unique module id based on Pid and timestamp The module id, which is part of the sinit and sterm function names, must be unique. However, `getUniqueModuleId` will fail if there is no strong external symbol within a module. We fall back to using Pid and timestamp when this happens. Differential Revision: https://reviews.llvm.org/D85527	2020-08-14 16:22:50 -04:00
Vitaly Buka	fc4fd89852	[StackSafety] Use ValueInfo in ParamAccess::Call This avoids GUID lookup in Index.findSummaryInModule. Follow up for D81242. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D85269	2020-08-14 12:42:44 -07:00
Greg McGary	eef41efe00	[MachO] Add skeletal support for DriverKit platform Define the platform ID = 10, and simple mappings between platform ID & name. Reviewed By: MaskRay, cishida Differential Revision: https://reviews.llvm.org/D85594	2020-08-14 12:36:43 -07:00
Haowei Wu	ee5d07e6ce	Remove unnecessary HEADER_DIRS in lib/InterfaceStub/CMakeLists.txt This change removes unnecessary HEADER_DIRS from the //llvm/lib/InterfaceStub/CMakeLists.txt file. Differential Revision: https://reviews.llvm.org/D85936	2020-08-14 11:22:50 -07:00
Matt Arsenault	5c5e6d951e	TableGen/GlobalISel: Partially handle immAllOnesV/immAllZerosV These should really match either G_BUILD_VECTOR or G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing mechanism for matching alternative opcodes. There is GIM_SwitchOpcode, but it seems to assume it's only used for matcher optimization. I could also omit any opcode check and rely on the matcher directly checking the opcode, but the table optimizer currently assumes there has to be an opcode check. Also doesn't try to handle undef elements like the DAG version.	2020-08-14 13:55:30 -04:00
Simon Pilgrim	e9eb2dc332	[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y)) This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining Another step towards PR41813 Recommit of rG9bd97d036398 with fixed offset adjustments	2020-08-14 18:43:19 +01:00
Matt Arsenault	40a142fa57	AMDGPU/GlobalISel: Match andn2/orn2 for more types Unfortunately this ends up not working as expected on targets with 16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform 16-bit ops to i32. The vector case annoyingly requires switching the checked opcode, since constants for vectors aren't directly handled. I also need to think more carefully about whether this is valid for i1.	2020-08-14 13:18:03 -04:00
Jordan Rupprecht	fd9187f746	[NFC] Silence variables unused in release builds	2020-08-14 08:35:58 -07:00
Denis Antrushin	1c80a6ce5f	[Statepoints] FixupStatepoint: properly set isKill on spilled register. When spilling statepoint meta arg register it is incorrect to blindly mark it as killed - it may be used in non-meta args (e.g., as call parameter).	2020-08-14 22:19:20 +07:00
Matt Morehouse	891b2be85d	Revert "[NFC][StackSafety] Move out sort from the loop" This reverts commit `0426e28419` due to ASan buildbot failure.	2020-08-14 08:17:35 -07:00
Johannes Doerfert	9240e48a58	[OpenMP][OMPIRBuilder] Use the source (=directory + filename) for locations Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D85938	2020-08-14 08:59:25 -05:00
Denis Antrushin	5f6bee77fa	[Statepoints] Spill GC Ptr regs in FixupStatepoints. Extend the FixupStatepointCallerSaved pass with the ability to spill statepoint GC pointer arguments (optionally allowing them on CSRs). Special handling is required for invoke statepoints, because at the MI level a single landing pad may be shared by multiple statepoints, so we must ensure we spill the landing pad's live-ins into the same stack slots. Full statepoint refactoring change set is available at D81603. Reviewed By: skatkov Differential Revision: https://reviews.llvm.org/D81647	2020-08-14 20:21:19 +07:00
Kazushi (Jam) Marukawa	2f01af764b	[VE] Remove obsolete I8/I16 register classes Remove the I8/I16 register classes, which were previously prepared to implement the VE ABI. However, it is possible to implement the VE ABI correctly without them, so remove them now. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D85905	2020-08-14 21:52:22 +09:00
Shinji Okumura	5f55a8193c	[Attributor] Implement AAPotentialValues This patch provides an implementation of `AAPotentialValues`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85632	2020-08-14 20:51:14 +09:00
Vitaly Buka	4c30d4b4e5	[NFC][StackSafety] Change map key comparison	2020-08-14 04:23:15 -07:00
Vitaly Buka	0426e28419	[NFC][StackSafety] Move out sort from the loop	2020-08-14 04:19:10 -07:00
Stefan Gränitz	9a47bcae7c	[ORC][NFC] Refactor loop to determine name of init symbol in IRMaterializationUnit This loop caused me a little headache once, because I didn't see the assigned variable is a member. The refactored version appears more readable to me. Differential Revision: https://reviews.llvm.org/D85922	2020-08-14 11:34:44 +02:00
Sam Parker	eb82d58f83	[NFC][ARM] Port MaybeCall into ARMTTImpl method Renamed to maybeLoweredToCall.	2020-08-14 10:23:20 +01:00
Vitaly Buka	798eb71c3a	[NFC][StackSafety] Dedup callees	2020-08-14 01:14:52 -07:00
Sebastian Neubauer	9aa0ff77bd	[AMDGPU] Enable .rodata for amdpal os PAL recently got support for multiple ELF sections and relocations, therefore we can now use .rodata sections instead of forcing constants into .text. Differential Revision: https://reviews.llvm.org/D85895	2020-08-14 09:05:48 +02:00
David Sherwood	6c7957c990	[SVE] Fix bug in SVEIntrinsicOpts::optimizePTest The code wasn't taking into account that the two operands passed to ptest could be identical and was trying to erase them twice. Differential Revision: https://reviews.llvm.org/D85892	2020-08-14 07:57:21 +01:00
Sam Parker	725400f993	[NFCI][SimpleLoopUnswitch] Adjust CostKind query When getUserCost was transitioned to use an explicit CostKind, TCK_CodeSize was used even though the original kind was implicitly SizeAndLatency, so restore this behaviour. We now only query for CodeSize when optimising for minsize. I expect this not to change anything since, I think, all targets currently return the same value for CodeSize and SizeAndLatency. Indeed I see no changes in the test suite for Arm, AArch64 and X86. Differential Revision: https://reviews.llvm.org/D85829	2020-08-14 07:54:20 +01:00
Igor Kudrin	95fad44e34	[DebugInfo] Avoid an infinite loop with a truncated pre-v5 .debug_str_offsets.dwo. dumpStringOffsetsSection() expects the size of a contribution to be correctly aligned. The patch adds the corresponding verifications for pre-v5 cases. Differential Revision: https://reviews.llvm.org/D85739	2020-08-14 13:11:37 +07:00
Arthur Eubanks	48cd5b72b1	Revert "[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str)" This reverts commit `ab9fc8bae8`. Incorrect transformation if the result is used. Causes breakages, e.g. http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/8193/	2020-08-13 21:05:03 -07:00
Peter Collingbourne	c201f27225	hwasan: Emit the globals note even when globals are uninstrumented. This lets us support the scenario where a binary is linked from a mix of object files with both instrumented and non-instrumented globals. This is likely to occur on Android where the decision of whether to use instrumented globals is based on the API level, which is user-facing. Previously, in this scenario, it was possible for the comdat from one of the object files with non-instrumented globals to be selected, and since this comdat did not contain the note it would mean that the note would be missing in the linked binary and the globals' shadow memory would be left uninitialized, leading to a tag mismatch failure at runtime when accessing one of the instrumented globals. It is harmless to include the note when targeting a runtime that does not support instrumenting globals because it will just be ignored. Differential Revision: https://reviews.llvm.org/D85871	2020-08-13 16:33:22 -07:00
Yuanfang Chen	a5ed20b549	[NewPM][CodeGen] Add machine code verification callback D83608 needs this. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D85916	2020-08-13 16:13:01 -07:00
Ben Dunbobbin	4cb016cd2d	[X86][ELF] Prefer lowering MC_GlobalAddress operands to .Lfoo$local for STV_DEFAULT only This patch restricts the behaviour of referencing via .Lfoo$local local aliases, introduced in https://reviews.llvm.org/D73230, to STV_DEFAULT globals only. Hidden symbols via -fvisibility=hidden (https://gcc.gnu.org/wiki/Visibility) are an important scenario. Benefits: - Improves the size of object files by using fewer STT_SECTION symbols. - The code reads a bit better (it was not obvious to me without going back to the code reviews why the canBenefitFromLocalAlias function currently doesn't consider visibility). - There is also a side benefit in restoring the effectiveness of the --wrap linker option and making the behavior of --wrap consistent between LTO and normal builds for references within a translation-unit. Note: this --wrap behavior (which is specific to LLD) should not be considered reliable. See comments on https://reviews.llvm.org/D73230 for more. Differential Revision: https://reviews.llvm.org/D85782	2020-08-14 00:09:15 +01:00
Arthur Eubanks	41f49736a9	[ConstProp] Handle insertelement constants Previously ConstantFoldExtractElementInstruction() would only work with insertelement instructions, not constants. This properly handles insertelement constants as well. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D85865	2020-08-13 15:59:17 -07:00
Dávid Bolvanský	ab9fc8bae8	[SLC] sprintf(dst, "%s", str) -> strcpy(dst, str) Solves 46489	2020-08-14 00:05:55 +02:00
David Green	0c390c22a5	Revert "[ARM] Fix IT block generation after Thumb2SizeReduce with -Oz" This reverts commit `18279a54b5` as it is causing some chromium android test problems.	2020-08-13 22:40:36 +01:00
Austin Kerbow	7d1cb187fb	[AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded If we need a scratch register for the spill, don't use the same scratch register that is being used for the MUBUF offset. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D85772	2020-08-13 14:12:00 -07:00
Thomas Lively	d53d952810	[WebAssembly] Allow inlining functions with different features Allow inlining only when the Callee has a subset of the Caller's features. In principle, we should be able to inline regardless of any features because WebAssembly supports features at module granularity, not function granularity, but without this restriction it would be possible for a module to "forget" about features if all the functions that used them were inlined. Requested in PR46812. Differential Revision: https://reviews.llvm.org/D85494	2020-08-13 13:57:43 -07:00
Dávid Bolvanský	5ef2287d36	[SLC] Optimize strncpy(a, "a", C) to memcpy(a, "a\0\0\0", C) Solves PR47154	2020-08-13 22:22:51 +02:00
Cameron McInally	21810b0e14	[SVE] Lower fixed length vector integer UMIN/UMAX Differential Revision: https://reviews.llvm.org/D85926	2020-08-13 14:48:36 -05:00
Stefan Gränitz	5bcd32b744	[ORC][NFC] Fix typo in comment	2020-08-13 21:14:20 +02:00
Stefan Gränitz	f12db8cf75	[ORC] cloneToNewContext() can work with a const-ref to ThreadSafeModule	2020-08-13 21:01:21 +02:00
Haowei Wu	d650cbc349	[elfabi] Move llvm-elfabi related code to InterfaceStub library This change moves elfabi related code to llvm/InterfaceStub library so it can be shared by multiple llvm tools without causing cyclic dependencies. Differential Revision: https://reviews.llvm.org/D85678	2020-08-13 11:51:44 -07:00
Stanislav Mekhanoshin	0462aef5f3	[AMDGPU] Inhibit SDWA if target instruction has FI Differential Revision: https://reviews.llvm.org/D85918	2020-08-13 11:34:28 -07:00
Stanislav Mekhanoshin	d25cb5a8a2	[AMDGPU] Fix misleading SDWA verifier error. NFC. The old error from GFX9 shall be updated to GFX9+.	2020-08-13 11:32:17 -07:00
Aditya Kumar	1a8c9cd1d9	Fix PR45442: Bail out when MemorySSA information is not available Reviewers: sebpop, uabelho, fhahn Reviewed by: fhahn Differential Revision: https://reviews.llvm.org/D85881	2020-08-13 11:25:58 -07:00
Lang Hames	adaadbfeac	[JITLink][MachO] Return an error when MachO TLV relocations are encountered. MachO TLV relocations aren't supported yet. Error out rather than falling through to llvm_unreachable.	2020-08-13 11:19:35 -07:00
Sameer Arora	8d58eb11f9	[llvm-libtool-darwin] Refactor ArchiveWriter Refactoring function `writeArchive` in ArchiveWriter. Added a new function `writeArchiveBuffer` that returns the archive in a memory buffer instead of writing it out to the disk. This refactor is necessary so as to allow `llvm-libtool-darwin` to write universal files containing archives. Reviewed by jhenderson, MaskRay, smeenai Differential Revision: https://reviews.llvm.org/D84858	2020-08-13 10:56:30 -07:00
Dávid Bolvanský	50c743fa71	[BPI] Improve static heuristics for integer comparisons Similarly as for pointers, even for integers a == b is usually false. GCC also uses this heuristic. Reviewed By: ebrevnov Differential Revision: https://reviews.llvm.org/D85781	2020-08-13 19:54:27 +02:00
David Green	2632c625ed	[ARM] Mark VMINNMA/VMAXNMA as commutative These operations take Qda and Rn register operands, which are commutative so long as the instruction is not predicated. Differential Revision: https://reviews.llvm.org/D85813	2020-08-13 18:01:11 +01:00
Cameron McInally	e1a87f0a9b	[SVE] Lower fixed length vector integer SMIN/SMAX Differential Revision: https://reviews.llvm.org/D85855	2020-08-13 11:41:20 -05:00
Aditya Kumar	44716856db	Fix PR45442: Bail out when MemorySSA information is not available	2020-08-13 09:31:18 -07:00
Bjorn Pettersson	11446b02c7	[VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load This is a fixup to commit `43bdac2906`, to make sure the address space from the original load pointer is retained in the vector pointer. Resolves problem with Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. due to address space mismatch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85912	2020-08-13 18:25:32 +02:00
Serguei Katkov	98ba0a5ffe	[InstCombine] Handle gc.relocate(null) in one iteration InstCombine adds users of a transformed instruction to the worklist to process them on the same iteration. However, gc.relocate may have a hidden user (the next gc.relocate) which is connected through the gc.statepoint intrinsic, with no direct def-use chain between them. In this case, if the next gc.relocate has already been processed, it will not be added to the worklist and cannot be processed on the same iteration. Consider the following case: A = gc.relocate(null) B = statepoint(A) C = gc.relocate(B, hidden(A)) If C has already been considered, then after replacing A with null, the statepoint instruction B will be added to the queue but C will not. C can be processed only on the next iteration. If the chain of relocations is long, many iterations may be required. This change reduces the number of iterations to accommodate the latest changes that lower the infinite loop threshold. This is a quick (not best) fix. In follow-up patches I plan to move gc relocation handling into the statepoint handler. This should also help to remove unused gc live entries in the statepoint bundle. Reviewers: reames, dantrushin Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D75598	2020-08-13 23:16:27 +07:00
Fangrui Song	7f8c49b016	[llvm-objdump] Change symbol name/PLT decoding errors to warnings If the referenced symbol of a J[U]MP_SLOT is invalid (e.g. symbol index 0), llvm-objdump -d will bail out: ``` error: 'a': st_name (0x326600) is past the end of the string table of size 0x7 ``` where 0x326600 is the st_name field of the first entry past the end of .symtab Change it to a warning to continue dumping. `X86/plt.test` uses a prebuilt executable, so I pick `ELF/AArch64/plt.test` which has a YAML input and can be easily modified. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D85623	2020-08-13 08:13:42 -07:00
Simon Pilgrim	63863451d1	Fix unused variable warning. NFC. Reduce the dyn_cast<> to an isa<> as that's all non-assert builds require, and move the cast<> inside the assert.	2020-08-13 15:43:20 +01:00
Simon Pilgrim	cd3b850a4c	rG9bd97d0363987b582 - Revert "[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))" This reverts commit `9bd97d0363`. Seeing some codegen issues in internal testing.	2020-08-13 15:21:15 +01:00
Matt Arsenault	c7191e3185	DAG: Don't pass 0 alignment value to allowsMisalignedMemoryAccesses I think not unconditionally passing getDstAlign is broken, but leave that for another change.	2020-08-13 09:33:17 -04:00

138742 Commits