llvm-project

Commit Graph

Author	SHA1	Message	Date
Lang Hames	4b37462aab	[ORC] Fix SimpleRemoteEPC data races. Adds a 'start' method to SimpleRemoteEPCTransport to defer transport startup until the client has been configured. This avoids races on client members if the first message arrives while the client is being configured. Also fixes races on the file descriptors in FDSimpleRemoteEPCTransport.	2021-09-26 18:11:48 -07:00
Nikita Popov	7a855596c3	[BasicAA] Don't check whether GEP is sized (NFC) GEPs are required to have sized source element type, so we can just assert that here.	2021-09-26 21:21:54 +02:00
Simon Pilgrim	c0eff50fc5	[X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.	2021-09-26 19:37:23 +01:00
Simon Pilgrim	ed3e4917b3	[X86] Fold PACK(_EXTEND_VECTOR_INREG, UNDEF) -> _EXTEND_VECTOR_INREG For 128-bit vectors, we can remove a PACK of an EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.	2021-09-26 19:37:22 +01:00
Lang Hames	6498b0e991	Reintroduce "[ORC] Introduce EPCGenericRTDyldMemoryManager." This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager." (`bef55a2b47`) and "[lli] Add ChildTarget dependence on OrcTargetProcess library." (`7a219d801b`) which were reverted in `99951a5684` due to bot failures. The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized variable." (`0371049277`) and "[ORC] Wait for handleDisconnect to complete in SimpleRemoteEPC::disconnect." (`320832cc9b`).	2021-09-27 03:24:33 +10:00
Simon Pilgrim	3fe9767204	[X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W)) Merge addition of VPMADDWD nodes if each element pair doesn't use the upper element in each pair (i.e. it's zero) - we can generalize this to either element in the pair if we one day create VPMADDWD with zero lower elements. There are still a number of issues with extending/shuffling with 256/512-bit VPMADDWD nodes so this initially only works for v2i32/v4i32 cases - I'm working on removing all these limitations but there's still a bit of yak shaving to go.....	2021-09-26 18:08:29 +01:00
Lang Hames	175c1a39e8	[ORC][llvm-jitlink] Add debugging output to SimpleRemoteEPC (and Server). Also adds an optional 'debug' argument to the llvm-jitlink-executor tool to enable debug-logging.	2021-09-26 10:00:29 -07:00
Kazu Hirata	c4ae4a745d	[RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC) Note that RISCVMnemonicSpellCheck is defined in RISCVGenAsmMatcher.inc, which RISCVAsmParser.cpp includes. Identified with readability-redundant-declaration.	2021-09-26 09:26:57 -07:00
Roman Lebedev	d9413f46b3	[X86][Costmodel] Load/store i16 VF=2 interleaving costs The only sched models for CPUs that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/M8vEKs5jY - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0` So pick cost of `2`. For store we have: https://godbolt.org/z/Kx1nKz7je - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5` So pick cost of `1`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103144	2021-09-26 19:13:23 +03:00
Nikita Popov	14a49f5840	[DSE] Don't check getUnderlyingObject() return value (NFC) getUnderlyingObject() never returns null. It will simply return something that is not the "root" underlying object. Also drop a stale comment.	2021-09-26 18:01:26 +02:00
Nikita Popov	f3c74b72f4	[DSE] Make DSEState non-copyable (NFC) As it contains a self-reference, the default copy/move ctors would not be safe. Move the DSEState::get() method into the ctor to make sure no move occurs here even without NRVO. This is a speculative fix for test failures on llvm-clang-x86_64-expensive-checks-win.	2021-09-26 17:54:38 +02:00
Sanjay Patel	6063e6b499	[InstCombine] move add after min/max intrinsic This is another regression noted with the proposal to canonicalize to the min/max intrinsics in D98152. Here are Alive2 attempts to show correctness without specifying exact constants: https://alive2.llvm.org/ce/z/bvfCwh (smax) https://alive2.llvm.org/ce/z/of7eqy (smin) https://alive2.llvm.org/ce/z/2Xtxoh (umax) https://alive2.llvm.org/ce/z/Rm4Ad8 (umin) (if you comment out the assume and/or no-wrap, you should see failures) The different output for the umin test is due to a fold added with `c4fc2cb5b2` : // umin(x, 1) == zext(x != 0) We probably want to adjust that, so it applies more generally (umax --> sext or patterns where we can fold to select-of-constants). Some folds that were ok when starting with cmp+select may increase instruction count for the equivalent intrinsic, so we have to decide if it's worth altering a min/max. Differential Revision: https://reviews.llvm.org/D110038	2021-09-26 09:49:10 -04:00
Simon Pilgrim	3538ee763d	[CostModel][X86] Improve AVX1/AVX2 v16i32->v16i16/v16i8 truncation costs (PR51972) Based off worst case btver2 (AVX1) and haswell (AVX2) llvm-mca reports	2021-09-26 13:43:46 +01:00
Lang Hames	320832cc9b	[ORC] Wait for handleDisconnect to complete in SimpleRemoteEPC::disconnect. Disconnect should block until handleDisconnect completes, otherwise we might destroy the SimpleRemoteEPC instance while it's still in use. Thanks to Dave Blaikie for helping me track this down.	2021-09-26 10:19:26 +10:00
Lang Hames	0371049277	[ORC] Fix uninitialized variable. Spotted by Dave Blaikie. Thanks Dave!	2021-09-26 10:19:25 +10:00
Nikita Popov	ba664d9066	[AA] Move earliest escape tracking from DSE to AA This is a followup to D109844 (and alternative to D109907), which integrates the new "earliest escape" tracking into AliasAnalysis. This is done by replacing the pre-existing context-free capture cache in AAQueryInfo with a replaceable (virtual) object with two implementations: The SimpleCaptureInfo implements the previous behavior (check whether object is captured at all), while EarliestEscapeInfo implements the new behavior from DSE. This combines the "earliest escape" analysis with the full power of BasicAA: It subsumes the call handling from D109907, considers a wider range of escape sources, and works with AA recursion. The compile-time cost is slightly higher than with D109907. Differential Revision: https://reviews.llvm.org/D110368	2021-09-25 22:40:41 +02:00
Nikita Popov	327bbbb10b	[DSE] Make capture check more precise It is sufficient that the object has not been captured before the load that produces the pointer we're loading. A capture after that can not affect the already loaded pointer. This is small part of D110368 applied separately.	2021-09-25 22:23:19 +02:00
Nikita Popov	1c3859f31d	[BasicAA] Don't consider Argument as escape source (NFCI) The case of an Argument and an identified function local is already handled earlier, because we don't care about captures in that case. As such, we don't need to additionally consider the combination of an Argument with a non-escaping identified function local. This ensures that isEscapeSource() only returns true for instructions, which is necessary for D110368.	2021-09-25 22:08:15 +02:00
Lang Hames	99951a5684	Revert "[ORC] Introduce EPCGenericRTDyldMemoryManager." This reverts commit `bef55a2b47` while I investigate failures on some bots. Also reverts "[lli] Add ChildTarget dependence on OrcTargetProcess library." (`7a219d801b`) which was a follow-up to `bef55a2b47`.	2021-09-25 11:19:14 -07:00
Lang Hames	bef55a2b47	[ORC] Introduce EPCGenericRTDyldMemoryManager. EPCGenericRTDyldMemoryManager is an EPC-based implementation of the RuntimeDyld::MemoryManager interface. It enables remote-JITing via EPC (backed by a SimpleExecutorMemoryManager instance on the executor side) for RuntimeDyld clients. The lli and lli-child-target tools are updated to use SimpleRemoteEPC and SimpleRemoteEPCServer (rather than OrcRemoteTargetClient/Server), and EPCGenericRTDyldMemoryManager for MCJIT tests. By enabling remote-JITing for MCJIT and RuntimeDyld-based ORC clients, EPCGenericRTDyldMemoryManager allows us to deprecate older remote-JITing support, including OrcTargetClient/Server, OrcRPCExecutorProcessControl, and the Orc RPC system itself. These will be removed in future patches.	2021-09-25 10:42:10 -07:00
Simon Pilgrim	18c8ed5416	[DAG] ReduceLoadOpStoreWidth - replace getABITypeAlign with allowsMemoryAccess (PR45116) One of the cases identified in PR45116 - we don't need to limit store narrowing to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory access by checking allowsMisalignedMemoryAccesses as a fallback.	2021-09-25 18:35:57 +01:00
Simon Pilgrim	8c83bd3bd4	[CostModel][X86] Adjust vXi32 multiply costs if it can be performed using PMADDWD Update the costs to match the codegen from combineMulToPMADDWD - not only can we use PMADDWD if it's zero-extended, but also if it's a constant or sign-extended from a vXi16 (which can be replaced with a zero-extension).	2021-09-25 16:28:48 +01:00
Simon Pilgrim	eb7c78c2c5	[X86][SSE] combineMulToPMADDWD - mask off upper bits of sign-extended vXi32 constants If we are multiplying by a sign-extended vXi32 constant, then we can mask off the upper 16 bits to allow folding to PMADDWD and make use of its implicit sign-extension from i16	2021-09-25 15:50:45 +01:00
Simon Pilgrim	2a4fa0c27c	[X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on sub-128 bit vectors	2021-09-25 15:50:45 +01:00
Kazu Hirata	44c401bdc3	[Mips] Remove redundant declarations (NFC) Note that identical declarations immediately precede what's being removed in this patch. Identified with readability-redundant-declaration.	2021-09-25 07:41:11 -07:00
Simon Pilgrim	f5a26ccae2	[X86][SSE] combineMulToPMADDWD - enable sext(v8i16) -> zext(v8i16) fold on pre-SSE41 targets We already do this on SSE41 targets where we have sext/zext instructions, now that combineShiftToPMULH handles SSE2 targets, we can enable this here as well.	2021-09-25 14:35:31 +01:00
Simon Pilgrim	4c72b10f0a	[X86] X86FastISel::fastMaterializeConstant - break if-else chain to fix llvm-else-after-return warning. NFCI All previous if-else cases return	2021-09-25 14:31:14 +01:00
Simon Pilgrim	a25f25c3b7	[X86] combineShiftToPMULH - relax ISA from SSE41 to SSE2 With improved shuffle combines (in particular canonicalizeShuffleWithBinOps), we can now usefully perform this on any SSE2+ target. We should be able to remove this entirely and just use DAGCombiner's combineShiftToMULH if we can someday get it to support illegal (pre-widened) types.	2021-09-25 14:08:03 +01:00
Simon Pilgrim	5a14edd8ed	[InstCombine] Ensure shifts are in range for (X << C1) / C2 -> X fold. We can get here before out of range shift amounts have been handled - limit to BW-2 for sdiv and BW-1 for udiv Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38078	2021-09-25 12:57:43 +01:00
Simon Pilgrim	ee267b1c7c	[IR] DIBuilder::createEnumerator - pass APSInt by const reference Avoid unnecessary copy by value.	2021-09-25 11:58:06 +01:00
Simon Pilgrim	6bd5b1b1ce	[DAG] combineShiftToMULH - move getValueType() inside assert. NFCI. Avoids an unnecessary (void).	2021-09-25 11:56:35 +01:00
David Green	883758ed48	[ARM] Fix Arm block placement creating branches after jump tables. Given: - A jump table - Which jumps to the next block - The next block ends in a WLS - Where the WLS conditionally jumps to block earlier in the program. The Arm block placement pass would attempt to move the block containing the WLS earlier, as the WLS instruction can only branch forward. In doing so it would add a branch from the jumptable block to the WLS block, thinking it previously fell-through. This in itself would be fine, if a little inefficient, but the constant island pass expects all instructions after a jump-table branch to have been removed by analyzeBranch. So it gets confused and can assign the same labels to multiple jump table blocks. I've changed the condition to the same as used in analyzeBranch.	2021-09-25 11:32:25 +01:00
Jim Lin	ed687c0211	[RISCV] Fix incorrect operand type of inst alias for InstR4 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110381	2021-09-25 11:25:12 +08:00
David Blaikie	5cb210862b	DebugInfo: Use the signedness of the underlying enum when encoding enum non-type-template-parameters This improves the accuracy of the debug info and improves round tripping through -gsimple-template-names.	2021-09-24 17:02:55 -07:00
Craig Topper	715cf6ffb9	[RISCV] Add another isel optimization for (and (shl X, c2), c1). Where c1 is a shifted mask with 32-c2 leading zeros and c3 trailing zeros and c3>c2. We can select it as (slli (srliw X, c3-c2), c3).	2021-09-24 15:10:25 -07:00
David Blaikie	9911af4b91	WIP: Verify -gsimple-template-names=mangled values Clang will encode names that should be able to be simplified as "_STNname|<template, args>" (eg: "_STNt1|<int>") - this verification mode will detect these names, decode them, create the original name ("t1<int>") and the simple name ("t1") - letting the simple name run through the usual rebuilding logic - then compare the two sources of the full name - the rebuilt and the _STN encoding. This helps ensure that -gsimple-template-names is lossless.	2021-09-24 14:28:18 -07:00
Jonas Devlieghere	62d6ff5e9e	[dsymutil] Track incompleteness across unions When determining the incompleteness of a DIE based on its children, make sure we propagate it across union types. See test case for an example. Without this patch we never emit the definition of Container_ivars. Differential revision: https://reviews.llvm.org/D110443	2021-09-24 14:26:37 -07:00
Stanislav Mekhanoshin	cf74ef134c	[AMDGPU] Limit promote alloca max size in functions Non-entry functions have 32 caller saved VGPRs available. If we promote alloca to consume more registers we will have to spill CSRs. There is no reason to eliminate scratch access to get another scratch access instead. Differential Revision: https://reviews.llvm.org/D110372	2021-09-24 13:38:39 -07:00
Anirudh Prasad	a9ae2436fc	[SystemZ][z/OS] Introduce the GOFFMCAsmInfo Interface for z/OS - This patch adds in the GOFFMCAsmInfo interfaces for the z/OS target. - This patch decouples the previously existing SystemZMCAsmInfo interface for the ELF target and the z/OS target. - This patch also removes a small test in the SystemZAsmLexerTest.cpp. The reason for this is because, the test is set up for the s390x-ibm-linux (SystemZ ELF triple), and the test checks a function which is overridden only for the z/OS target. The reason we can't change the test to use a z/OS triple outright is because there is still missing support which prevents the successful running of a test (assert in AsmParser.cpp due to missing GOFFAsmParser support) Reviewed By: uweigand, abhina.sreeskantharajan Differential Revision: https://reviews.llvm.org/D110077	2021-09-24 16:25:41 -04:00
Nikita Popov	5969e5743a	[IR] Handle large element size when calculating GEP indices This is a fix for the issue reported at https://reviews.llvm.org/D110043#3019942: The ElementSize is a uint64_t and as such may be larger than the index space, or be negative in the index space. This is UB, but shouldn't cause assertion failures. We address this by detecting whether the size is too large and use a zero index in that case (which is always conservatively correct). Differential Revision: https://reviews.llvm.org/D110437	2021-09-24 22:20:20 +02:00
Sanjay Patel	a47c8e40c7	[InstCombine] fold lshr(trunc(lshr X, C1)) C2 Only the multi-use cases are changing here because there's another fold that catches the simpler patterns. But that other fold is the source of infinite loops when we try to add D110170, so removing that is planned as a follow-up. Attempt to show the general proof in Alive2: https://alive2.llvm.org/ce/z/Ns1uS2 Note that the overshift fold-to-zero tests are not currently handled by instsimplify. If they were, we could assert that the shift amount sum is less than the source bitwidth.	2021-09-24 15:44:07 -04:00
Sanjay Patel	29c09c7653	[InstCombine] match variable names and code comments; NFC	2021-09-24 15:44:07 -04:00
Teresa Johnson	96cb97c453	[ThinLTO] Update combined index for SamplePGO indirect calls to locals In ThinLTO for locals we normally compute the GUID from the name after prepending the source path to get a unique global id. SamplePGO indirect call profiles contain the target GUID without this uniquification, however (unless compiling with -funique-internal-linkage-names). In order to correctly handle the call edges added to the combined index for these indirect calls, during importing and bitcode writing we consult a map of original to full GUID to identify the actual callee. However, for a large application this was consuming a lot of compile time as we need to do this repeatedly (especially during importing where we may traverse call edges multiple times). To fix this implement a suggestion in one of the FIXME comments, and actually modify the call edges during a single traversal after the index is built to perform the fixups once. I combined this fixup with the dead code analysis performed on the index in order to avoid adding an additional walk of the index. The dead code analysis is the first analysis performed on the index. This reduced the time required for a large thin link with SamplePGO by about 20%. No new test added, but I confirmed that there are existing tests that will fail when no fixup is performed. Differential Revision: https://reviews.llvm.org/D110374	2021-09-24 12:29:49 -07:00
Anirudh Prasad	ebe06910ce	[NFC] Replace hard-coded usages of SystemZ::R15D with SpecialRegisters API This patch changes hard-coded usages of SystemZ::R15D with calls to the getStackPointerRegister function. Uses in the LowerCall function are avoided to avoid merge conflicts with an expected upcoming patch. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D109702	2021-09-24 15:20:57 -04:00
Paul Robinson	6185ad03f1	[TargetLibraryInfo] Correctly handle sqrt*_finite Other <math>_finite calls are marked as unavailable except on GNU/Linux; it looks like the sqrt set was just overlooked. Differential Revision: https://reviews.llvm.org/D110418	2021-09-24 11:57:38 -07:00
Jay Foad	ac51ad24a7	[LiveIntervals] Fix asan debug build failures Call RemoveMachineInstrFromMaps before erasing instrs. repairIntervalsInRange will do this for you after erasing the instruction, but it's not safe to rely on it because assertions in SlotIndexes::removeMachineInstrFromMaps refer to fields in the erased instruction. This fixes asan buildbot failures caused by D110328.	2021-09-24 19:14:57 +01:00
Anirudh Prasad	e09a1dc475	[SystemZ][z/OS] Add GOFF Support to the DataLayout - This patch adds in the GOFF mangling support to the LLVM data layout string. A corresponding additional line has been added into the data layout section in the language reference documentation. - Furthermore, this patch also sets the right data layout string for the z/OS target in the SystemZ backend. Reviewed By: uweigand, Kai, abhina.sreeskantharajan, MaskRay Differential Revision: https://reviews.llvm.org/D109362	2021-09-24 14:09:01 -04:00
Stanislav Mekhanoshin	08d7eec06e	Revert "Allow rematerialization of virtual reg uses" Reverted due to two distinct performance regression reports. This reverts commit `92c1fd19ab`.	2021-09-24 10:26:11 -07:00
Simon Pilgrim	bdee805b32	[ConstantFold] ConstantFoldGetElementPtr - use APInt::isNegative() instead of getSExtValue() to support big ints Fixes fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=39197	2021-09-24 18:18:53 +01:00
Victor Huang	6e1aaf18af	[PowerPC] Mark splat immediate instructions as rematerializable This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as rematerializable to prevent MachineLICM from moving them out of loops. Reviewed By: lei, amy Differential revision: https://reviews.llvm.org/D108823	2021-09-24 12:03:34 -05:00
Hans Wennborg	1e9afab875	Re-apply "[JumpThreading] Ignore free instructions" It seems the crashes we saw weren't caused by this (see comments on the review). > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `4604695d7c`.	2021-09-24 18:52:30 +02:00
Stanislav Mekhanoshin	082e22f3d7	[AMDGPU] Always reserve flat scratch SGPR for architected flat scratch With architected flat scratch it becomes readonly. We must always reserve SGPR pair for it even if we do not use scratch at all since an attempt to write to SGPRs mapped to FLAT_SCRATCH results in memory violation. This is not needed since GFX10 with architected flat scratch though since special SGPRs are not carving space from normal SGPRs. Differential Revision: https://reviews.llvm.org/D110376	2021-09-24 09:46:31 -07:00
Florian Hahn	6f28fb7081	Recommit "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts the revert commit `df56fc6ebb`. This version of the patch adjusts the location where the EarliestEscapes cache is cleared when an instruction gets removed. The earliest escaping instruction does not have to be a memory instruction. It could be a ptrtoint instruction like in the added test @earliest_escape_ptrtoint, which subsequently gets removed. We need to invalidate the EarliestEscape entry referring to the ptrtoint when deleting it. This fixes the crash mentioned in https://bugs.chromium.org/p/chromium/issues/detail?id=1252762#c6	2021-09-24 17:13:27 +01:00
Simon Pilgrim	d8fc9f8727	[X86][SSE] combineMulToPMADDWD - replace sext(v8i16) -> zext(v8i16) As suggested on D108522, if we're sign extending a v4i16 source before multiplying as a v4i32, then we can replace that with a zero extension and rely on the implicit sign-extension of PMADDWD.	2021-09-24 16:42:01 +01:00
Sanjay Patel	09e71c367a	[x86] convert logic-of-FP-compares to FP logic-of-vector-compares This is motivated by the examples and discussion in: https://llvm.org/PR51245 ...and related bugs. By using vector compares and vector logic, we can convert 2 'set' instructions into 1 'movd' or 'movmsk' and generally improve throughput/reduce instructions. Unfortunately, we don't have a complete vector compare ISA before AVX, so I left SSE-only out of this patch. Ie, we'd need extra logic ops to simulate the missing predicates for SSE 'cmpp*', so it's not as clearly a win. Differential Revision: https://reviews.llvm.org/D110342	2021-09-24 11:38:19 -04:00
Paul Robinson	1376ae9094	[TargetLibraryInfo][AMDGPU] Minor cleanup, NFC	2021-09-24 07:52:44 -07:00
Sanjay Patel	3c5500907b	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try)" This reverts commit `bb9333c350`. This exposes another existing bug that causes an infinite loop as shown in D110170 ...so reverting while I look at another fix.	2021-09-24 10:47:35 -04:00
Hans Wennborg	4604695d7c	Revert "[JumpThreading] Ignore free instructions" It caused compiler crashes, see comment on the code review for repro. > This is basically D108837 but for jump threading. Free instructions > should be ignored for the threading decision. JumpThreading already > skips some free instructions (like pointer bitcasts), but does not > skip various free intrinsics -- in fact, it currently gives them a > fairly large cost of 2. > > Differential Revision: https://reviews.llvm.org/D110290 This reverts commit `1e3c6fc7cb`.	2021-09-24 16:14:53 +02:00
Nico Weber	df56fc6ebb	Revert "[DSE] Track earliest escape, use for loads in isReadClobber." This reverts commit `5ce89279c0`. Makes clang crash, see comments on https://reviews.llvm.org/D109844	2021-09-24 09:57:59 -04:00
David Sherwood	8e4f7b749c	[Analysis] Fix another issue when querying vscale attributes on functions There are several places in the code that are currently broken where we assume an Instruction is always a member of a BasicBlock that lives in a Function. This is a problem specifically when attempting to get the vscale_range attribute. This patch adds checks that an Instruction's parent also has a parent! I've added a test for a function-less @llvm.vscale intrinsic call here: unittests/Analysis/ValueTrackingTest.cpp	2021-09-24 13:37:23 +01:00
Jay Foad	e4e95f14f1	[LiveIntervals] Repair live intervals that gain subranges In repairIntervalsInRange, if the new instructions refer to subregs but the old instructions did not, make sure any existing live interval for the superreg is updated to have subranges. Also skip repairing any range that we have recalculated from scratch, partly for efficiency but also to avoid some cases that repairOldRegInRange can't handle. The existing test/CodeGen/AMDGPU/twoaddr-regsequence.mir provides some test coverage for this change: when TwoAddressInstructionPass converts REG_SEQUENCE into subreg copies, the live intervals will now get subranges and MachineVerifier will verify that the subranges are correct. Unfortunately MachineVerifier does not complain if the subranges are not present, so the test also passed before this patch. This patch also fixes ~800 of the ~1500 failures in the whole CodeGen lit test suite when -early-live-intervals is forced on. Differential Revision: https://reviews.llvm.org/D110328	2021-09-24 11:58:08 +01:00
Jay Foad	7863cc6c1c	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238 Unrevert with some changes to the tests: - Add -verify-machineinstrs to check for remaining problems in live interval support in TwoAddressInstructionPass. - Drop test/CodeGen/AMDGPU/extract-load-i1.ll since it suffers from some of those remaining problems.	2021-09-24 11:44:49 +01:00
Congzhe Cao	751be2a064	[CodeMoverUtils] Enhance isSafeToMoveBefore() when moving BBs When moving an entire basic block BB before InsertPoint, we currently check for all instructions whether the operands dominate InsertPoint. However, this can be improved: even if an operand does not dominate InsertPoint, as long as it appears as a previous instruction in the same BB, it is safe to move. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110378	2021-09-24 05:48:15 -04:00
Hsiangkai Wang	7d39a8a921	[RISCV] (1/2) Add the tail policy argument to builtins/intrinsics. Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list. For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below. In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them. * Use dest argument to control tail policy vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument) vfmerge.vfm (add _t builtins with additional dest argument) vmv.v.v (add _t builtins with additional dest argument) vmv.v.x (add _t builtins with additional dest argument) vmv.v.i (add _t builtins with additional dest argument) vfmv.v.f (add _t builtins with additional dest argument) vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument) vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument) * Always has tail argument for masked/unmasked intrinsics Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Reduction Operations (add _t and _mt builtins) Vector Slideup Instructions (add _t and _mt builtins) Vector Slidedown Instructions (add _t and _mt builtins) Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101 Differential Revision: https://reviews.llvm.org/D105092	2021-09-24 17:09:50 +08:00
Simon Pilgrim	dade83c02a	[X86][SLM] Fix ADDQ/SUBQ/CMPEQQ throughput to account for running on either port. Testing on a SLM box suggests these can run on either port, but the throughput is 4cy on either (inc MMX versions). Confirmed with Intel AoM / Agner / InstLatX64.	2021-09-24 10:06:14 +01:00
David Sherwood	c2634fc6ab	[Analysis] Fix issues when querying vscale attributes on functions There are several places in the code that are currently broken as they assume an Instruction always has a parent Function when attempting to get the vscale_range attribute. This patch adds checks that an Instruction has a parent. I've added a test for a parentless @llvm.vscale intrinsic call here: unittests/Analysis/ValueTrackingTest.cpp Differential Revision: https://reviews.llvm.org/D110158	2021-09-24 09:58:10 +01:00
Jonas Paulsson	ea92283449	[SystemZ] Implement ISD::BITCAST for fp128 -> i128. The type legalizer has by default no method of doing this bitcast other than storing and reloading the value from stack. This patch implements a custom lowering of this operation using extractions of subregs (z13 and earlier using FP128 register pairs), or of vector elements (with 'vector enhancements 1' using VR128 FP registers). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D110346	2021-09-24 10:26:45 +02:00
Amara Emerson	9f773b17c2	[GlobalISel][IRTranslator] Fix crash during bit-test switch optimization with odd types. Odd switch case types cause a crash in the conversion to MVT. Instead use a pointer sized scalar type which is what SDAG does in these cases.	2021-09-24 00:19:27 -07:00
Amara Emerson	661ab70314	[AArch64][GlobalISel] Fix crash in the extend(extract_vector_elt) optimization. It was assuming that GPR extends could only have destination sizes of 32 or 64 bits, but for AArch64 we allow < 32 bits even without matching size physregs.	2021-09-23 23:07:16 -07:00
Lang Hames	ef391df2b6	[ORC] Rename ExecutorAddress to ExecutorAddr. Removing the 'ess' suffix improves the ergonomics without sacrificing clarity. Since this class is likely to be used more frequently in the future it's worth some short term pain to fix this now.	2021-09-23 20:35:17 -07:00
Lang Hames	a2c1cf09df	[ORC] Introduce EPCGenericDylibManager / SimpleExecutorDylibManager. EPCGenericDylibManager provides an interface for loading dylibs and looking up symbols in the executor, implemented using EPC-calls to functions in the executor. SimpleExecutorDylibManager is an executor-side service that provides the functions used by EPCGenericDylibManager. SimpleRemoteEPC is updated to use an EPCGenericDylibManager instance to implement the ExecutorProcessControl loadDylib and lookup methods. In a future commit these methods will be removed, and clients updated to use EPCGenericDylibManagers directly.	2021-09-23 19:59:35 -07:00
Christudasan Devadasan	7a62a5b56d	[AMDGPU] Legalize initialized LDS variables We don't allow an initializer for LDS variables and there is an early abort during instruction selection. This patch legalizes them by ignoring the init values. During assembly emission, proper error reporting already exists for such instances. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109901	2021-09-23 22:53:20 -04:00
Teresa Johnson	2c1defeee4	[ThinLTO] Don't emit original GUID for locals to distributed indexes In ThinLTO for locals we normally compute the GUID from the name after prepending the source path to get a unique global id. SamplePGO indirect call profiles contain the target GUID without this uniquification, however (unless compiling with -funique-internal-linkage-names). Therefore, the index contains the original GUID of the local symbols (without module path prepended to uniquify), in order to correctly handle the call edges added for these indirect call profile targets with SamplePGO. We were emitting these to the combined index when writing it out as bitcode, which is unnecessary and causes overhead when writing out the indexes for distributed backends. The only use of the original GUID name is in the thin link. Suppress it in that case. This reduced the thin link time for a large distributed build by about 7%, and the aggregate size of the serialized indexes by over 2%. Continue to print it when writing out the full index, since that is just used for debugging and testing. Update a distributed thinlto index test to contain a local and ensure that we don't get a COMBINED_ORIGINAL_NAME record. Differential Revision: https://reviews.llvm.org/D110296	2021-09-23 17:35:47 -07:00
Lang Hames	c965fde7c2	[ORC] Shut down services in SimpleRemoteEPCServer. This should have been included with ExecutorBootstrapService in `78b083dbb7`, but was accidentally left out. It gives services a chance to release any resources that they have acquired.	2021-09-23 16:27:28 -07:00
Craig Topper	40b230f685	[RISCV] Limit transformAddImmMulImm to prevent an infinite loop. This fixes an issue reported in D108607.	2021-09-23 15:53:11 -07:00
Lang Hames	2ce73f13c9	[ORC] Fix file header.	2021-09-23 15:48:08 -07:00
Vang Thao	1443ba6163	[AMDGPU] Propagate defining src reg for AGPR to AGPR Copys On targets that do not support AGPR to AGPR copying directly, try to find the defining accvgpr_write and propagate its source vgpr register to the copies before register allocation so the source vgpr register does not get clobbered. The postrapseudos pass also attempts to propagate the defining accvgpr_write, but if the register to propagate is clobbered, it will give up and create new temporary vgpr registers instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108830	2021-09-23 15:17:53 -07:00
Matt Arsenault	2875d3d484	RegAllocGreedy: Remove an unhelpful auto, and don't use a reference	2021-09-23 17:25:25 -04:00
Craig Topper	70f50114f3	[RISCV] Add another isel optimization for (and (shl x, c2), c1) Turn (and (shl x, c2), c1) -> (slli (srli x, c3-c2), c3) if c1 is a shifted mask with no leading zeros and c3 trailing zeros where c3 is greater than c2.	2021-09-23 14:18:07 -07:00
Nico Weber	3fa43da7a3	[llvm] Fix a copy-pasto We should use IMAGE_REL_I386_SECREL in the i386 section of this file. IMAGE_REL_I386_SECREL and IMAGE_REL_AMD64_SECREL have the same numeric value 0xB, so this doesn't change behavior.	2021-09-23 15:34:01 -04:00
Stefan Gränitz	767b328e50	[ORC] Minor renaming and typo fixes (NFC) Two typos, one unused include and some leftovers from the TargetProcessControl -> ExecutorProcessControl renaming Reviewed By: xgupta Differential Revision: https://reviews.llvm.org/D110260	2021-09-23 21:33:34 +02:00
Fangrui Song	0bb767e7db	[InlineAdvisor] Use one single quote	2021-09-23 12:16:15 -07:00
Craig Topper	4a69551d66	[RISCV] Add more isel optimizations for (and (shr x, c2), c1). Turn (and (shr x, c2), c1) -> (slli (srli x, c2+c3), c3) if c1 is a shifted mask with c2 leading zeros and c3 trailing zeros. When the leading zeros is C2+32 we can use SRLIW in place of SRLI.	2021-09-23 11:29:04 -07:00
Sanjay Patel	74ba4b769a	[x86] move combiner state check into convertIntLogicToFPLogic(); NFC This function can be adapted to solve bugs like PR51245, but it could require differentiating the combiner timing between the existing and new transforms.	2021-09-23 14:28:22 -04:00
Thomas Lively	2f519825ba	[WebAssembly] Add prototype relaxed SIMD fma/fms instructions Add experimental clang builtins, LLVM intrinsics, and backend definitions for the new {f32x4,f64x2}.{fma,fms} instructions in the relaxed SIMD proposal: https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md. Do not allow these instructions to be selected without explicit user opt-in. Differential Revision: https://reviews.llvm.org/D110295	2021-09-23 11:01:36 -07:00
Louis Dionne	e6126faba0	[libc++] Remove unused macro in __config That macro was being defined but not used anywhere in libc++, so it must be safe to remove it. As a fly-by fix, also remove mentions of this macro in other places in LLVM, to make sure they were not depending on the value defined in libc++. Differential Revision: https://reviews.llvm.org/D110289	2021-09-23 13:09:32 -04:00
Jay Foad	deb2ca566a	Revert "[LiveIntervals] Fix repairOldRegInRange for simple def cases" This reverts commit `8229cb7412`. It was failing on buildbots with expensive checks enabled.	2021-09-23 17:55:05 +01:00
Nikita Popov	1e3c6fc7cb	[JumpThreading] Ignore free instructions This is basically D108837 but for jump threading. Free instructions should be ignored for the threading decision. JumpThreading already skips some free instructions (like pointer bitcasts), but does not skip various free intrinsics -- in fact, it currently gives them a fairly large cost of 2. Differential Revision: https://reviews.llvm.org/D110290	2021-09-23 18:28:36 +02:00
Fangrui Song	1a6e1ee42a	Resolve {GlobalValue,GlobalIndirectSymbol}::getBaseObject confusion While both GlobalAlias and GlobalIFunc are GlobalIndirectSymbol, their `getIndirectSymbol()` usage is quite different (GlobalIFunc's resolver is an entity different from GlobalIFunc itself). As discussed on https://lists.llvm.org/pipermail/llvm-dev/2020-September/144904.html ("[IR] Modelling of GlobalIFunc"), the name `getBaseObject` is confusing when used with GlobalIFunc. To resolve the confusion: * Move GlobalIndirectSymbol::getBaseObject to GlobalAlias:: (GlobalIFunc should use `getResolver` instead) * Change GlobalValue::getBaseObject not to inspect GlobalIFunc. Note: the function has 7 references. * Add GlobalIFunc::getResolverFunction to peel off potential ConstantExpr indirection (`strlen` in `test/LTO/Resolution/X86/ifunc.ll`) Note: GlobalIFunc::getResolver (like GlobalAlias::getAliasee which does not peel off ConstantExpr indirection) is kept to be used by ValueEnumerator. Reviewed By: ibookstein Differential Revision: https://reviews.llvm.org/D109792	2021-09-23 09:23:35 -07:00
Jay Foad	8229cb7412	[LiveIntervals] Fix repairOldRegInRange for simple def cases The fix applied in D23303 "LiveIntervalAnalysis: fix a crash in repairOldRegInRange" was over-zealous. It would bail out when the end of the range to be repaired was in the middle of the first segment of the live range of Reg, which was always the case when the range contained a single def of Reg. This patch fixes it as suggested by Matthias Braun in post-commit review on the original patch, and tests it by adding -early-live-intervals to a selection of existing lit tests that now pass. (Note that D23303 was originally applied to fix a crash in SILoadStoreOptimizer, but that is now moot since D23814 updated SILoadStoreOptimizer to run before scheduling so it no longer has to update live intervals.) Differential Revision: https://reviews.llvm.org/D110238	2021-09-23 17:16:14 +01:00
Craig Topper	d5c67bba62	[RegAlloc] Cast uint8_t to unsigned before printing it. raw_ostream interprets uint8_t as wanting to print a character with that ASCII value. In this case the uint8_t is an integer that we want to print.	2021-09-23 08:49:44 -07:00
Simon Pilgrim	5f2c53bdf4	Pass some DataLayout arguments by const-ref Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 15:50:31 +01:00
Piotr Sobczak	2ac53fffae	[AMDGPU] Avoid processing functions in amdgpu-propagate-attributes pass for shaders The pass amdgpu-propagate-attributes ("Early/Late propagate attributes from kernels to functions") is currently run also for shaders, where it does nothing. Modify the check so the pass only processes functions for kernels. Differential Revision: https://reviews.llvm.org/D109961	2021-09-23 16:46:56 +02:00
Simon Pilgrim	c931d35216	[CostModel][X86] Increase i64 mul cost from 1 to 2 Only the most recent cpus support really 1cy 64-bit multiplies, and the X64 cost table represents a realistic worst case. The 1cy value was also discouraging vectorization when most vXi64 PMULDQ expansions aren't actually slower than scalarization. Noticed while investigating PR51436.	2021-09-23 14:48:21 +01:00
Sanjay Patel	bb9333c350	[InstCombine] fold cast of right-shift if high bits are not demanded (2nd try) The 1st try at this was reverted because it caused an infinite loop in instcombine. That should be fixed after: `1cd6b44f26` (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-23 09:41:37 -04:00
Florian Hahn	5ce89279c0	[DSE] Track earliest escape, use for loads in isReadClobber. At the moment, DSE only considers whether a pointer may be captured at all in a function. This leads to cases where we fail to remove stores to local objects because we do not check if they escape before potential read-clobbers or after. Doing context-sensitive escape queries in isReadClobber was removed a while ago in `d1a1cce5b1` to save compile-time. See PR50220 for more context. This patch introduces a new capture tracker, which keeps track of the 'earliest' capture. An instruction A is considered earlier than instruction B, if A dominates B. If 2 escapes do not dominate each other, the terminator of the common dominator is chosen. If not all uses can be analyzed, the earliest escape is set to the first instruction in the function entry block. If the query instruction dominates the earliest escape and is not in a cycle, then the pointer does not escape before the query instruction. This patch uses this information when checking if a load of a loaded underlying object may alias a write to a stack object. If the stack object does not escape before the load, they do not alias. I will share a follow-up patch to also use the information for call instructions to fix PR50220. In terms of compile-time, the impact is low in general, NewPM-O3: +0.05% NewPM-ReleaseThinLTO: +0.05% NewPM-ReleaseLTO-g: +0.03 with the largest change being tramp3d-v4 (+0.30%) http://llvm-compile-time-tracker.com/compare.php?from=1a3b3301d7aa9ab25a8bdf045c77298b087e3930&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Compared to always computing the capture information on demand, we get the following benefits from the caching: NewPM-O3: -0.03% NewPM-ReleaseThinLTO: -0.08% NewPM-ReleaseLTO-g: -0.04% The biggest speedup is tramp3d-v4 (-0.21%). http://llvm-compile-time-tracker.com/compare.php?from=0b0c99177d1511469c633282ef67f20c851f58b1&to=bc6c6899cae757c3480f4ad4874a76fc1eafb0be&stat=instructions Overall there is a small, but noticeable benefit from caching. I am not entirely sure if the speedups warrant the extra complexity of caching. The way the caching works also means that we might miss a few cases, as it is less precise. Also, there may be a better way to cache things. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109844	2021-09-23 12:45:05 +01:00
Jim Lin	fbacf5ad38	[RISCV] Add missing op type OPERAND_UIMM2, OPERAND_UIMM3 and OPERAND_UIMM7 for verifyInstruction Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110307	2021-09-23 19:30:46 +08:00
Simon Pilgrim	2a5936faf0	[CodeGen] ProcessSDDbgValues - use const-ref value in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 12:23:46 +01:00
Simon Pilgrim	5cabe4d9d3	[CodeGen] RegisterCoalescer::buildVRegToDbgValueMap - use const-ref value in for-range loop. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 12:23:45 +01:00
Bjorn Pettersson	85a586501b	[BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor The NFC commit `e5692a564a` changed the logic for DomTreeUpdates to use the range [succ_begin, succ_begin) when looking for SuccsOfPredBB rather than using [succ_begin, succ_end). As the commit was NFC this is identified as a typo (it has been discussed briefly in phabricator). The typo was found when inspecting the code, so I've got no idea if changing back to the old range has any significant impact (such as solving any PR:s or causing some new problems). But at least this restores the code to the originally intended behavior.	2021-09-23 13:03:26 +02:00
Fraser Cormack	e7c879a69d	[RISCV][VP] Add support for VP_REDUCE_* operations This patch adds codegen support for lowering the vector-predicated reduction intrinsics to RVV instructions. The process is similar to that of the other reduction intrinsics, save for the fact that every VP reduction has a start value. We reuse the existing custom "VL" nodes, adding extra patterns where required to handle non-true masks. To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been given an explicit "merge" operand. This is to facilitate the VP reductions, where we must be careful to ensure that even if no operation is performed (when VL=0) we still produce the start value. The RVV reductions don't update the destination register under these conditions, so we tie the splatted start value to the output register. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107657	2021-09-23 11:11:05 +01:00
Alex Richardson	05663dc146	[InstSimplify] Don't lose inbounds when simplifying a GEP I noticed this while working on a (ptrtoint (gep null, x)) -> x fold. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D110168	2021-09-23 09:25:06 +01:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
Bjorn Pettersson	c3ae8ecb52	[DAGCombiner] Rename isAlias as mayAlias. NFC Differential Revision: https://reviews.llvm.org/D110062	2021-09-23 09:54:42 +02:00
Bjorn Pettersson	c5e0313e44	[ModuleInlinerWrapperPass] Do some naive printing of wrapped pipeline with -print-pipeline-passes Bisecting and reducing opt pipelines that includes the ModuleInlinerWrapperPass has turned out to be a bit problematic. This is far from perfect (it still lacks information about inline advisor params etc.), but it should give some kind of hint to what the wrapped pipeline looks like when using -print-pipeline-passes. Reviewed By: aeubanks, mtrofin Differential Revision: https://reviews.llvm.org/D109878	2021-09-23 09:54:42 +02:00
Liu, Chen3	76656ec8ec	[X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A) This patch is to support transform something like _mm512_add_ph(acc, _mm512_fmadd_pch(a, b, _mm512_setzero_ph())) to _mm512_fmadd_pch(a, b, acc). Differential Revision: https://reviews.llvm.org/D109953	2021-09-23 15:37:08 +08:00
Mikael Holmen	e7b169a8ae	[AMDGPU] Fix gcc warnings about unused variables [NFC]	2021-09-23 08:08:00 +02:00
Johannes Doerfert	c6457dcae8	[OpenMP][FIX] Be more deliberate about invalidating the AAKernelInfo state This patch fixes a problem when the AAKernelInfo state was invalidated, e.g., due to `optnone` for a kernel, but not all parts indicated the invalidation properly. We further eliminate most full state invalidations as they should never be necessary. Differential Revision: https://reviews.llvm.org/D109468	2021-09-23 00:04:30 -05:00
Johannes Doerfert	0a16c56010	[OpenMP][NFC] Improve debug output	2021-09-23 00:04:29 -05:00
Usman Nadeem	3b12282b0e	[AArch64][SVE][InstCombine] Eliminate redundant chains of tuple get/set Differential Revision: https://reviews.llvm.org/D109667 Change-Id: I06a3c28e3658ecda109a3a1b73265828274ab2ea	2021-09-22 20:59:46 -07:00
Wang, Pengfei	ebec077e07	[X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109658	2021-09-23 11:02:48 +08:00
Freddy Ye	13207a21a6	[NFC] Remove redundant setOperationAction. [FROUND,FROUNDEVEN][f32, f64, f128] are set Expand twice. Differential Revision: https://reviews.llvm.org/D110302	2021-09-23 10:28:21 +08:00
hyeongyu kim	10a5632550	[NFC][InstCombine] Fix inconsistent comments	2021-09-23 09:31:39 +09:00
Zhi An Ng	1552179ac0	[WebAssembly] Add relaxed-simd feature This currently only defines a constant, but in the future it will be used to gate builtins for experimenting with and prototyping the relaxed-simd proposal (https://github.com/WebAssembly/relaxed-simd/). Differential Revision: https://reviews.llvm.org/D110111	2021-09-22 14:52:50 -07:00
Craig Topper	f0a422f935	[RISCV] Add fcvt.s.w(u)/fcvt.d.w(u)/fcvt.h.w(u) to hasAllNBitUsers These instructions only read the lower 32 bits of their input.	2021-09-22 14:24:26 -07:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patch makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Sanjay Patel	1cd6b44f26	[InstCombine] add one-use check to shift-shift transform We don't want to create extra instructions, and this could infinite loop with the proposed transform in D110170.	2021-09-22 16:31:12 -04:00
Sanjay Patel	a85d7a56c7	[ValueTracking] fix isOnlyUsedInZeroEqualityComparison with no users This is another problem exposed by: https://bugs.llvm.org/PR50836	2021-09-22 15:01:53 -04:00
Sanjay Patel	b05804ab4c	[Analysis] reduce code for isOnlyUsedInZeroEqualityComparison; NFC There's a bug here noted by the FIXME and visible in variations of PR50836.	2021-09-22 14:57:53 -04:00
David Green	c49611f909	Mark CFG as preserved in TypePromotion and InterleaveAccess passes Neither of these passes modify the CFG, allowing us to preserve DomTree and LoopInfo across them by using setPreservesCFG. Differential Revision: https://reviews.llvm.org/D110161	2021-09-22 18:58:00 +01:00
Sanjay Patel	c240169ff2	[Analysis] improve function matching for strlen libcall The return type of strlen is size_t, not just any integer. This is a partial fix for an example based on: https://llvm.org/PR50836 There's another bug here because we can still crash processing a real strlen or something that looks like it.	2021-09-22 13:50:12 -04:00
Daniil Fukalov	1a7b7d7ba2	[NFCI][CodeGen, AArch64] Fix inconsistent TargetCostKind types. The pass uses different cost kinds to estimate "old" and "interleaved" costs: the default cost kind for all target overrides of `getInterleavedMemoryOpCost()` is `TCK_SizeAndLatency`. Although at the moment estimated `TCK_Latency` costs are equal to `TCK_SizeAndLatency` (so the change is NFC), this may change in the future. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110100	2021-09-22 20:15:17 +03:00
Arthur Eubanks	e7249e4acf	[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837	2021-09-22 09:52:37 -07:00
Craig Topper	b33a1cc05b	[RISCV] Optimize vp.store with an all ones mask to avoid a vmset. We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here is based on similar code for handling masked.scatter and vp.scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110206	2021-09-22 09:12:47 -07:00
Shilei Tian	b205b3300b	[NFC] clang-format -i llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp	2021-09-22 12:10:20 -04:00
Hongtao Yu	d9b511d8e8	[CSSPGO] Set PseudoProbeInserter as a default pass. Currently PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e., through the linker or llc), where the user always has to pass -pseudo-probe-for-profiling explicitly. I'm making the pass a default pass that requires no command line arg to trigger, but will actually be run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110209	2021-09-22 09:09:48 -07:00
Kazu Hirata	3c557cd7f9	[CodeGen] Remove redundant declaration MIRCanonicalizerID (NFC) Note that MIRCanonicalizerID is declared in llvm/include/llvm/CodeGen/Passes.h, which MIRCanonicalizerPass.cpp includes. Identified with readability-redundant-declaration.	2021-09-22 08:58:27 -07:00
Simon Pilgrim	8a44281f47	[SLP] getReductionCost - use explicit TTI::TCK_RecipThroughput CostKind. NFCI. Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.	2021-09-22 16:52:22 +01:00
hyeongyu kim	98e96663f6	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (3/3) This patch is for fixing potential shufflevector-related bugs like D93818. As in D93818, this patch changes shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineVectorOps. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110230	2021-09-23 00:48:24 +09:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to a bitset-based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
hyeongyu kim	ec8311444a	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (2/3) This patch is for fixing potential shufflevector-related bugs like D93818. As in D93818, this patch changes shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCompares and InstructionCombining. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110227	2021-09-23 00:14:50 +09:00
Simon Pilgrim	b1f38a27f0	[Target][CodeGen] Remove default CostKind arguments on inner/impl TTI overrides Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible. This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up. Differential Revision: https://reviews.llvm.org/D110242	2021-09-22 15:28:08 +01:00
hyeongyu kim	e5aaf03326	[InstCombine] Update InstCombine to use poison instead of undef for shufflevector's placeholder (1/3) This patch is for fixing potential shufflevector-related bugs like D93818. As in D93818, this patch changes shufflevector's default placeholder to poison. To reduce risk, it was divided into several patches, and this patch is for InstCombineCasts. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110226	2021-09-22 23:18:51 +09:00
Joseph Huber	1cf86df883	[OpenMP] Make sure the Thread ID function is not removed Summary: The thread ID function was reintroduced in D110195, but could potentially be removed by the optimizer. Make the function noinline to preserve the call sites and add it to the externalization RAII so its definition is not removed by the attributor.	2021-09-22 10:13:18 -04:00
Sander de Smalen	6375ca4059	[AArch64][SVE] Add extract_subvector patterns for unpacked fp16 and bfloat types. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110163	2021-09-22 14:25:17 +01:00
Sander de Smalen	3e8d2008f7	[SelectionDAG] Remove PromoteIntOp_EXTRACT_SUBVECTOR. This code seems untested and is likely obsolete, because this case should already be handled by the code that legalizes the result type of EXTRACT_SUBVECTOR. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110061	2021-09-22 14:23:35 +01:00
Tim Northover	3a00e58c2f	AArch64: use indivisible cmpxchg for 128-bit atomic loads at O0 Like normal atomicrmw operations, at -O0 the simple register-allocator can insert spills into the LL/SC loop if it's expanded and visible when regalloc runs. This can cause the operation to never succeed by repeatedly clearing the monitor. Instead expand to a cmpxchg, which has a pseudo-instruction for -O0.	2021-09-22 14:20:43 +01:00
Sander de Smalen	d5681f1d68	[SelectionDAG] Add PromoteIntOp_INSERT_SUBVECTOR. This is required to codegen something like: <vscale x 8 x i16> @llvm.experimental.vector.insert(<vscale x 8 x i16> %vec, <vscale x 2 x i16> %subvec, i64 %idx) where the output vector is legal, but the input vector needs promoting. It implements this by performing the whole operation on the promoted type, and then truncating the result. Reviewed By: david-arm, craig.topper Differential Revision: https://reviews.llvm.org/D110059	2021-09-22 13:32:36 +01:00
Florian Hahn	a7c6471a85	[Passes] Run vector-combine early with -fenable-matrix. IR with matrix intrinsics is likely to also contain large vector operations, which can benefit from early simplifications. This is the last step in a series of changes to improve code-gen for code using matrix subscript operators with the C/C++ matrix extension in Clang, like using matrix_t = double __attribute__((matrix_type(15, 15))); void foo(unsigned i, matrix_t &A, matrix_t &B) { for (unsigned j = 0; j < 4; ++j) for (unsigned k = 0; k < i; k++) B[k][j] -= A[k][j] * B[i][j]; } https://clang.godbolt.org/z/6dKxK1Ed7 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D102496	2021-09-22 12:48:32 +01:00
Sanjay Patel	c6013f71a4	Revert "[InstCombine] fold cast of right-shift if high bits are not demanded" This reverts commit `2f6b07316f`. This caused several bots to hit an infinite loop at stage 2, so it needs to be reverted while figuring out how to fix that.	2021-09-22 07:45:21 -04:00
David Green	02cd8a6b91	[ARM] Allow smaller VMOVL in tail predicated loops This allows VMOVL in tail predicated loops so long as the vector size the VMOVL is extending into is less than or equal to the size of the VCTP in the tail predicated loop. These cases represent a sign-extend-inreg (or zero-extend-inreg), which needn't block tail predication as in https://godbolt.org/z/hdTsEbx8Y. For this a vecsize has been added to the TSFlag bits of MVE instructions, which stores the size of the elements that the MVE instruction operates on. In the case of multiple sizes (such as an MVE_VMOVLs8bh that extends from i8 to i16), the largest size is chosen. The sizes are encoded as 00 = i8, 01 = i16, 10 = i32 and 11 = i64, which often (but not always) comes from the instruction encoding directly. A unit test was added, and although only a subset of the vecsizes are currently used, the rest should be useful for other cases. Differential Revision: https://reviews.llvm.org/D109706	2021-09-22 12:07:52 +01:00
Yi Kong	d0746f2e9b	Don't fold (select C, (gep Ptr, Idx), Ptr) if C is vector but Idx is scalar The folding rule (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) creates a malformed SELECT IR if C is a vector while Idx is scalar. SELECT VecC, ScalarIdx, 0 We could splat Idx to a vector but it defeats the purpose of optimisation. Don't apply the folding rule in this case. This fixes a regression from commit `d561b6fbdb`.	2021-09-22 18:11:33 +08:00
Florian Mayer	36daf074d9	[hwasan] also omit safe mem[cpy|mov|set]. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D109816	2021-09-22 11:08:27 +01:00
Sander de Smalen	4ca1fbe361	[SelectionDAG] Make WidenVecRes_Convert work for scalable vectors. Most of the code wasn't yet scalable safe, although most of the code conceptually just works for scalable vectors. This change makes the algorithm work on ElementCount, where appropriate, and leaves the fixed-width only code to use `getFixedNumElements`. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D110058	2021-09-22 10:58:38 +01:00
Florian Hahn	300870a95c	[VectorCombine] Switch to using a worklist. This patch updates VectorCombine to use a worklist to allow iterative simplifications where a combine enables other combines. Suggested in D100302. The main use case at the moment is foldSingleElementStore and scalarizeLoadExtract working together to improve scalarization. Note that we now also do not run SimplifyInstructionsInBlock on the whole function if there have been changes. This means we fail to remove/simplify instructions not related to any of the vector combines. IMO this is fine, as simplifying the whole function seems more like a workaround for not tracking the changed instructions. Compile-time impact looks neutral: NewPM-O3: +0.02% NewPM-ReleaseThinLTO: -0.00% NewPM-ReleaseLTO-g: -0.02% http://llvm-compile-time-tracker.com/compare.php?from=52832cd917af00e2b9c6a9d1476ba79754dcabff&to=e66520a4637290550a945d528e3e59573485dd40&stat=instructions Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D110171	2021-09-22 09:54:58 +01:00
Sander de Smalen	ab3607c0ed	[AArch64][SVE] Add missing load/store patterns for unpacked bfloat vectors. Reviewed By: c-rhodes Differential Revision: https://reviews.llvm.org/D110063	2021-09-22 09:45:33 +01:00
Jay Foad	0205806d0f	[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3- only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156	2021-09-22 09:36:34 +01:00
Jay Foad	3828ea6181	[AMDGPU] Divergence-driven instruction selection for mul i32 Differential Revision: https://reviews.llvm.org/D109881	2021-09-22 09:36:34 +01:00
Florian Hahn	e08a5dc86f	[InstCombine] Move InstCombineWorklist to Utils to allow reuse (NFC). InstCombine's worklist can be re-used by other passes like VectorCombine. Move it to llvm/Transform/Utils and rename it to InstructionWorklist. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110181	2021-09-22 08:47:21 +01:00
Matt Arsenault	ec55dcedce	AMDGPU: Refactor getWavesPerEU to separate flat workgroup size query Add an overload to pass the flat workgroup range in separately. This will allow the attributor to use the assumed value for amdgpu-flat-workgroup-sizes when inferring amdgpu-waves-per-eu.	2021-09-21 22:57:17 -04:00
Chen Zheng	ffa9fa9ed2	[PowerPC] prepare for update form with non-const increment. This is a follow-up of D105872. Now we are able to prepare for update form with non-const increment. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106032	2021-09-22 02:54:28 +00:00
Wenlei He	5f187f0afa	[SamplePGO] Add switch to honor zero count on block level as accurate Add a new LLVM switch `-profile-sample-block-accurate` to trust zero block counts for branches. Currently we leave out such zero counts when annotating branch weight metadata, which would lead to weights being considered as unknown. Differential Revision: https://reviews.llvm.org/D110117	2021-09-21 17:06:37 -07:00
Usman Nadeem	645b8f5365	[AArch64][SVE] Add patterns to generate ADR instruction Differential Revision: https://reviews.llvm.org/D109665 Change-Id: I9d2928688b80b804a16f52928e2057749ec2c0b2	2021-09-21 15:50:49 -07:00
Arthur Eubanks	e42234383e	Make DiagnosticInfoResourceLimit's limit param required And always print it. This makes some LLVM diagnostics match up better with Clang's diagnostics. Updated some AMDGPU uses of DiagnosticInfoResourceLimit and now we print better diagnostics for those. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D110204	2021-09-21 15:27:58 -07:00
Kirill Stoimenov	2649999579	[asan] Fixed a bug causing a crash when redzone optimization kicked in on X86 with -asan-optimize-callbacks flag on. This change adds the ASan intrinsic to the list which are setting hasCopyImplyingStackAdjustment. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D110012	2021-09-21 22:26:03 +00:00
Craig Topper	b81e26c7f4	Recommit "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering." This time with the right bug number. When we rewrite the setcc we replace the old setcc output register with the new CondReg. But since CondReg can be shared by other replacements, we don't know if the kill flags for the old register are valid for CondReg. So be conservative and remove them. The test case has a SETCCr and a SETCCm on the same condition so they end up sharing the same CondReg. The SETCCr had one use with a kill flag. This kill flag isn't valid after the replacement because CondReg needs a live range extending to the later SETCCm replacement. Fixes PR51903.	2021-09-21 14:59:25 -07:00
Xu Mingjie	32ab405717	[LTO] Emit DebugLoc for dead function in optimization remarks Currently, the dead functions information getting from optimizations remarks does not contain debug location, but knowing where these dead functions locate could be useful for debugging or for detecting dead code. Cause in `LTO::addRegularLTO()` we use `BitcodeModule::getLazyModule()` to read the bitcode module, when we pass Function F to `ore::NV()`, F is not materialized, so `F->getSubprogram()` returns nullptr, and there is no debug location information of dead functions in optimizations remarks. This patch call `F->materialize()` before we pass Function F to `ore::NV()`, then debug location information will be emitted for dead functions in optimization remarks. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D109737	2021-09-21 14:50:21 -07:00
Craig Topper	51a82e051e	Revert "[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering." This reverts commit `7550f146ff`. I botched the bug number.	2021-09-21 14:33:44 -07:00
Craig Topper	7550f146ff	[X86] Clear kill flags when rewriting SETCC uses in flag copy lowering. When we rewrite the setcc we replace the old setcc output register with the new CondReg. But since CondReg can be shared by other replacements, we don't know if the kill flags for the old register are valid for CondReg. So be conservative and remove them. The test case has a SETCCr and a SETCCm on the same condition so they end up sharing the same CondReg. The SETCCr had one use with a kill flag. This kill flag isn't valid after the replacement because CondReg needs a live range extending to the later SETCCm replacement. Fixes PR51908. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110046	2021-09-21 14:29:46 -07:00
George Burgess IV	cd5f582c3d	MemoryBuiltins: update comment; NFC This comment references behavior that was removed in `ccae43a247`, which is a commit from 5 years ago. It seems safe to assume that that behavior won't be coming back soon. If it does, we can readd this part of the comment :)	2021-09-21 13:47:26 -07:00
Sanjay Patel	2f6b07316f	[InstCombine] fold cast of right-shift if high bits are not demanded (masked) trunc (lshr X, C) --> (masked) lshr (trunc X), C Narrowing the shift should be better for analysis and can lead to follow-on transforms as shown. Attempt at a general proof in Alive2: https://alive2.llvm.org/ce/z/tRnnSF Here are a couple of the specific tests: https://alive2.llvm.org/ce/z/bCnTp- https://alive2.llvm.org/ce/z/TfaHnb Differential Revision: https://reviews.llvm.org/D110170	2021-09-21 16:09:08 -04:00
Antonio Frighetto	43d6991c2a	[IR] Look through bitcast in hasFnAttribute() A logic incompleteness may lead MemorySSA to be too conservative in its results. Specifically, when dealing with a call of kind `call i32 bitcast (i1 (i1)* @test to i32 (i32)*)(i32 %1)`, where the function `test` is declared with readonly attribute, the bitcast is not looked through, obscuring function attributes. Hence, some methods of CallBase (e.g., doesNotReadMemory) could provide suboptimal results. Differential Revision: https://reviews.llvm.org/D109888	2021-09-21 21:57:02 +02:00
Nikita Popov	e4a1af3724	[MergeICmps] Remove unused NumMerged variable	2021-09-21 21:43:25 +02:00
Nikita Popov	f2fa6ad047	[MergeICmps] Don't reorder unmerged comparisons MergeICmps will currently sort (by offset) all comparisons in a chain, including those that do not get merged. This is problematic in two ways: * We may end up moving the original first block into the middle of the chain, in which case the "extra work" instructions will also be in the middle of the chain, resulting in invalid IR (reported in https://reviews.llvm.org/D108782#3005583). * Reordering branches is generally not legal, because it may introduce branch on poison, which is UB (PR51845). The merging done by MergeICmps is legal as long as we assume that memcmp() works on frozen memory, but the reordering of unmerged comparisons is definitely incorrect (without inserting freeze instructions), so we should avoid it. There are easier ways to fix the first issue, but I figured it was worthwhile to do this properly to also fix the second one. What we now do is to restore the original relative order of (potentially merged) comparisons. I took the liberty of dropping the MERGEICMPS_DOT_ON functionality, because it would be more awkward to implement now (as the before and after representation is different) and it doesn't seem terribly useful nowadays. Differential Revision: https://reviews.llvm.org/D110024	2021-09-21 21:22:12 +02:00
David Blaikie	49c519a848	DebugInfo: Rebuild decltype(nullptr) as 'std::nullptr_t' Now that Clang's been changed to render nullptr types/template parameters as 'std::nullptr_t' do the same thing down here. (Clang commit: `131e878664` )	2021-09-21 11:37:30 -07:00
Michael Liao	2d1ffad010	[IR] Re-group AAMDNodes relevant interfaces. NFC.	2021-09-21 14:29:33 -04:00
alex-t	1a33294652	[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa. VALU operations only process the active lanes of the VGPR and ignore inactive ones. Active lanes correspond to 1 bit in the EXEC mask register. SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively. SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit. To return back to the SALU context we need to do the opposite. To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes. Reviewed By: piotr Differential Revision: https://reviews.llvm.org/D109900	2021-09-21 21:19:31 +03:00
Owen Anderson	b5fbbdd202	Teach InstCombine to eliminate malloc-realloc-free triplets. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D109988	2021-09-21 18:07:49 +00:00
Brendon Cahoon	cbdf624bb8	[AMDGPU] Correctly merge alias.scope and noalias metadata for memops When adding alias.scope and noalias metadata to a memcpy function, the alias.scope and noalias metadata from the operands are merged. The rule for merging alias.scope is to take the intersection of the domains and the union of the scopes within those domains. The rule for merging noalias is to take the intersection. The bug is that AMDGPULowerModuleLDS was using concatenation for both alias.scope and noalias. For example, when f1 and f2 are added to the LDS structure and there is a memcpy(f2, f1, sizeof(f1)), concatenation creates noalias metadata for the memcpy that includes both {f1, f2}. That means that the memcpy is assumed not to alias a prior load of f2, which enables the optimizer to remove a load of f2 that occurs after the memcpy. The function MDNode::getMostGenericAliasScope defines the semantics for alias.scope. There is a function, combineMetadata in Local.cpp, that uses intersect for noalias. Differential Revision: https://reviews.llvm.org/D110049	2021-09-21 13:02:01 -05:00
Craig Topper	7c975665b4	[RISCV] Make some arrays of constants 'static const'. NFC This helps the compiler generate better code.	2021-09-21 10:52:47 -07:00
Danila Malyutin	78b51c7a2c	[LSR] Make sure that Factor fits into Base type Fixes pr42770 Differential Revision: https://reviews.llvm.org/D108772	2021-09-21 20:50:50 +03:00
Amy Kwan	2af57b6099	[PowerPC] Add prefix load pattern for fpext to v2f64 This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we are dealing with a value with an offset that fits into a 34-bit signed immediate. A reduced test case is also added to patch that tests the pattern, in which the pattern is tested in the big endian CHECKs of the newly added test. Differential Revision: https://reviews.llvm.org/D109887	2021-09-21 12:45:24 -05:00
Ayal Zaks	ab6a69dfea	[LV] Fix crash for reverse interleaved loads with gap under fold-tail. This patch fixes the crash found by PR51614: whenever doing tail folding, interleave groups must be considered under mask. Another fix D108900 follows for targets that support masked loads and stores: when deciding to vectorize with masked interleave groups, check if the access is reverse - which is currently not supported; rather than (only) asserting when computing cost and generating code. Differential Revision: https://reviews.llvm.org/D108891	2021-09-21 20:13:32 +03:00
Craig Topper	aeb63d464f	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor. This requires a minor change to CodeGenPrepare to ensure that shouldSinkOperands will be called for And. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110106	2021-09-21 10:07:29 -07:00
Dávid Bolvanský	c0fdfc9af2	[InstCombine] powi(x, y) * powi(x, z) -> powi(x, y + z) We already have pow(x, y) * pow(x, z) -> pow(x, y + z) transformation, but we are missing same transformation for powi (power is integer). Requires reassoc. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109954	2021-09-21 18:20:46 +02:00
Florian Hahn	5131037ea9	[ValueTracking,VectorCombine] Allow passing DT to computeConstantRange. isValidAssumeForContext can provide better results with access to the dominator tree in some cases. This patch adjusts computeConstantRange to allow passing through a dominator tree. The use VectorCombine is updated to pass through the DT to enable additional scalarization. Note that similar APIs like computeKnownBits already accept optional dominator tree arguments. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D110175	2021-09-21 16:54:47 +01:00
Michael Liao	5fb3ae525f	[SelectionDAG] Re-calculate scoped AA metadata when merging stores. Reviewed By: jeroen.dobbelaere Differential Revision: https://reviews.llvm.org/D102821	2021-09-21 11:41:17 -04:00
Aleksandr Bezzubikov	624e4d087e	[GlobalISel] Support ConstantAsMetadata in IRTranslator When using instructions which have a MetadataAsValue argument (e.g. some target-specific intrinsics) MD canonicalization strips internal MDNodes with a single ConstantAsMetadata child. That prevented IRTranslator from properly translating such calls.	2021-09-21 11:24:56 -04:00
Dmitry Preobrazhensky	3500e7d2b0	[AMDGPU][MC][GFX7][GFX10] Corrected image_atomic_fcmpswap Differential Revision: https://reviews.llvm.org/D109616	2021-09-21 18:06:02 +03:00
Ben Shi	b3052013b4	[RISCV] Optimize (add (mul x, c0), c1) Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0), if c1/c0 and c1%c0 are simm12, while c1 is not. Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0), if c1%c0 is zero, and c1/c0 is simm12 while c1 is not. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108607	2021-09-21 14:13:14 +00:00
Anna Thomas	69921f6f45	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also added an API for retrieving a unique undroppable user. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-21 10:04:04 -04:00
Dmitry Preobrazhensky	b8e7f53208	[AMDGPU][MC][GFX10] Enabled dlc for FLAT and GLOBAL atomics Differential Revision: https://reviews.llvm.org/D109614	2021-09-21 16:23:20 +03:00
hyeongyu kim	043733d677	[IR] Add the constructor of ShuffleVector for one-input-vector. One of the two inputs of the Shufflevector is often a placeholder. Previously, there were cases where the placeholder was undef, and there were cases where it was poison. I added these constructors to create a placeholder consistently. Changing to use the newly added constructor will be written in a separate patch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D110146	2021-09-21 22:06:07 +09:00
Jonas Paulsson	a48b43f981	[SystemZ] Emit EXRL target instructions before text section is ended. SystemZ adds the EXRL target instructions in the end of each file. This must be done before debug info emission since that may end the text section, and therefore this is now done in emitConstantPools() (instead of in emitEndOfAsmFile). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109513	2021-09-21 14:32:28 +02:00
Nicholas Guy	9e4d72675f	[AArch64] Improve schedule modelling on the Cortex-A55 Enables the FuseAddress feature in the Cortex-A55 scheduling model Differential Revision: https://reviews.llvm.org/D109323	2021-09-21 13:03:34 +01:00
Simon Pilgrim	fc8f1e4419	[InstCombine] foldConstantInsEltIntoShuffle - bail if we fail to find constant element (PR51824) If getAggregateElement() returns null for any element, early out as otherwise we will assert when creating a new constant vector Fixes PR51824 + ; OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=38057	2021-09-21 13:01:09 +01:00
Simon Pilgrim	20b58855e0	[CodeGen] SelectionDAGBuilder - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	f5d23d36de	RewriteStatepointsForGC - Use const-ref iterator in for-range loops. NFCI. Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Simon Pilgrim	0f83456cf5	[CodeGen] SDDbgValue::getSDNodes() - use const-ref to avoid unnecessary copies. NFCI. Reported by MSVC static analyzer.	2021-09-21 13:01:08 +01:00
Jay Foad	598bebeaa6	[AMDGPU] Prefer fmac over fma when selecting FMA_W_CHAIN FMA_W_CHAIN is used when lowering fdiv f32. Prefer to select it to fmac if there are no source modifiers, just like we do for other mad/mac and fma/fmac cases. Differential Revision: https://reviews.llvm.org/D110074	2021-09-21 11:57:45 +01:00
Jay Foad	86dcb59206	[AMDGPU] Prefer v_fmac over v_fma only when no source modifiers are used v_fmac with source modifiers forces VOP3 encoding, but it is strictly better to use the VOP3-only v_fma instead, because $dst and $src2 are not tied so it gives the register allocator more freedom and avoids a copy in some cases. This is the same strategy we already use for v_mad vs v_mac and v_fma_legacy vs v_fmac_legacy. Differential Revision: https://reviews.llvm.org/D110070	2021-09-21 11:57:45 +01:00
Max Kazantsev	cd166fb2ef	[SCEV] Use isAvailableAtLoopEntry in the asserts This is what is supposed to be there.	2021-09-21 17:11:15 +07:00
Petar Avramovic	8bc7185668	GlobalISel/Utils: Refactor constant splat match functions Add generic helper function that matches constant splat. It has option to match constant splat with undef (some elements can be undef but not all). Add util function and matcher for G_FCONSTANT splat. Differential Revision: https://reviews.llvm.org/D104410	2021-09-21 12:09:35 +02:00
Max Kazantsev	4d5d725428	[SCEV] Add some asserts on availability of arguments of isLoopEntryGuardedByCond The logic in howManyLessThans is fishy. It first checks invariance of RHS, and then uses OrigRHS as argument for isLoopEntryGuardedByCond, which is, strictly speaking, a different thing. We are seeing a very rare intermittent failure of availability checks, and it looks like this precondition is sometimes broken. Before we can figure out what's going on, adding asserts that all involved values that may possibly go to isLoopEntryGuardedByCond are available at loop entry. If either of these asserts fails (OrigRHS is the most likely suspect), it means that the logic here is flawed.	2021-09-21 17:08:52 +07:00
David Stenberg	7b4cc09b14	[LowerConstantIntrinsics] Fix heap-use-after-free bug in worklist This fixes PR51730, a heap-use-after-free bug in replaceConditionalBranchesOnConstant(). With the attached reproducer we were left with a function looking something like this after replaceAndRecursivelySimplify(): [...] cont2.i: br i1 %.not1.i, label %handler.type_mismatch3.i, label %cont4.i handler.type_mismatch3.i: %3 = phi i1 [ %2, %cont2.thread.i ], [ false, %cont2.i ] unreachable cont4.i: unreachable [...] with both the branch instruction and PHI node being in the worklist. As a result of replacing the branch instruction with an unconditional branch, the PHI node in %handler.type_mismatch3.i would be removed. This then resulted in a heap-use-after-free bug due to accessing that removed PHI node in the next worklist iteration. This is solved by using a value handle worklist. I am unsure if this is the most idiomatic solution. Another solution could have been to produce a worklist just containing the interesting branch instructions, but I thought that it perhaps was a bit cleaner to keep all worklist filtering in the loop that does the rewrites. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D109221	2021-09-21 11:33:07 +02:00
Cullen Rhodes	b23d22f7d5	[PowerPC] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D109715	2021-09-21 08:24:16 +00:00
Evgeniy Brevnov	129cf33604	[DSE][NFC] Rename Later->Killing, Earlier->Dead First (and biggest) change is to use "Killing/Dead" in place of "Later/Earlier" base for names in DSE. For example, [Maybe]DeadLoc - is a location killed by KillingI instruction. I believe such names are more descriptive and easy to understand than current ones. Second, there are inconsistencies in naming where different names are used for the same thing. Fixed that too. Third, reordered parameters of isPartialOverwrite, tryToMergePartialOverlappingStores, isOverwrite to make them consistent between each other. This greatly reduces potential mistakes. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106947	2021-09-21 13:44:12 +07:00
Amara Emerson	7091a7f781	[GlobalISel][Legalizer] Don't use eraseFromParentAndMarkDBGValuesForRemoval() for some artifacts. For artifacts excluding G_TRUNC/G_SEXT, which have IR counterparts, we don't seem to have debug users of defs. However, in the legalizer we're always calling MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval() which is expensive. In some rare cases, this contributes significantly to unreasonably long compile times when we have lots of artifact combiner activity. To verify this, I added asserts to that function when it actually replaced a debug use operand with undef for these artifacts. On CTMark with both -O0 and -Os and debug info enabled, I didn't see a single case where it triggered. In my measurements I saw around a 0.5% geomean compile-time improvement on -g -O0 for AArch64 with this change. Differential Revision: https://reviews.llvm.org/D109750	2021-09-20 23:34:42 -07:00
Max Kazantsev	2c7d5fbc9e	[SCEV] Generalize implication when signedness of FoundPred doesn't matter The implication logic for two values that are both negative or non-negative says that it doesn't matter whether their predicate is signed and unsigned, but only flips unsigned into signed for further inference. This patch adds support for flipping a signed predicate into unsigned as well. Differential Revision: https://reviews.llvm.org/D109959 Reviewed By: nikic	2021-09-21 11:17:56 +07:00
Yonghong Song	ea72b0319d	BPF: make 32bit register spill with 64bit alignment In llvm, for non-alu32 mode, the stack alignment is 64bit so only one 64bit spill per 64bit slot. For alu32 mode, the stack alignment is 32bit, so it is possible to have two 32bit spills per 64bit slot. Currently, bpf kernel verifier does not preserve register states for 32bit spills. That is, one 32bit register may hold a constant value or a bounded range before spill. After reload from the stack, the information is lost and sometimes this may cause verifier failure. For 64bit register spill, the verifier indeed tries to preserve the register state for reloading. The current verifier can be modestly changed to handle one 32bit spill per 64bit stack slot with state-preserving reload. Handling two 32bit spills per 64bit stack slot will require substantial changes. This patch changes stack alignment for alu32 to be 64bit. This way, for any 64bit slot in alu32 mode, only one 32bit or 64bit register value can be saved. Together with the previously mentioned verifier enhancement, 32bit spill can be handled with state preserving. Note that llvm stack slot coalescing seems to only do adjacent packing, which may leave some holes in the stack. For example, stack slot 8 <== 8 bytes stack slot 4 <== 8 bytes with 4 byte hole stack slot 8 <== 8 bytes stack slot 4 <== 4 bytes Differential Revision: https://reviews.llvm.org/D109073	2021-09-20 21:00:25 -07:00

151193 Commits