llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	6fc50e531d	[CostModel][X86] Remove old FIXME comments for AVX512F vector splitting Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bits vectors for arithmetic operations are typically hidden due to different used ports etc.	2021-11-01 11:11:11 +00:00
Simon Pilgrim	fd485d8cda	[X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances. This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source. There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses? Noticed while investigating the quality of interleaved load/store codegen. Differential Revision: https://reviews.llvm.org/D111960	2021-11-01 10:45:50 +00:00
David Sherwood	87a294d5eb	[LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion We never expect the runtime VF to be negative so we should use the uitofp instruction instead of sitofp. Differential revision: https://reviews.llvm.org/D112610	2021-11-01 09:58:14 +00:00
Roman Lebedev	b554e41e2d	[CVP] Canonicalize signed relational comparisons of scalar integers to unsigned comparison predicates Now that the reasoning was added to ConstantRange in D90924, this replicates IndVars variant of this transform (D111836) in a pass that uses value range reasoning for the transform. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112895	2021-11-01 12:16:05 +03:00
Jun Ma	1f9fa54984	[Taildup] Don't tail-duplicate loop header with multiple successors as its latches when Taildup hit loop with multiple latches like: // 1 -> 2 <-> 3 \| // \ <-> 4 \| // \ <-> 5 \| // \---> rest \| it may transform this loop into multiple loops by duplicate loop header. However, this change may has little benefit while makes cfg much complex. In some uncommon cases, it causes large compile time regression (offered by @alexfh in D106056). This patch disable tail-duplicate of such cases. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D110613	2021-11-01 15:32:00 +08:00
Jun Ma	c93f93b2e3	Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""" This reverts commit `3a998c06a8`.	2021-11-01 15:31:59 +08:00
Kazu Hirata	476e1ee3da	[AArch64] Remove unused declaration hasSwiftExtendedFrame (NFC)	2021-10-31 22:58:56 -07:00
Max Kazantsev	e512c5b166	[SCEV][NFC] Factor out common API for getting unique operands of a SCEV This function is used at least in 2 places, to it makes sense to make it separate. Differential Revision: https://reviews.llvm.org/D112516 Reviewed By: reames	2021-11-01 11:36:47 +07:00
Chen Zheng	eeed1545b2	[PowerPC] turn off chain commoning by default.	2021-11-01 04:11:10 +00:00
Itay Bookstein	848812a55e	[Verifier] Add verification logic for GlobalIFuncs Verify that the resolver exists, that it is a defined Function, and that its return type matches the ifunc's type. Add corresponding check to BitcodeReader, change clang to emit the correct type, and fix tests to comply. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112349	2021-10-31 20:00:57 -07:00
Zi Xuan Wu	cf78715cae	[CSKY] First patch to construct codegen infra and generate first add instruction Ooops. It constructs codegen infra and provide only basic code to generate first add instruction successfully. Differential Revision: https://reviews.llvm.org/D112206	2021-11-01 10:06:56 +08:00
Shoaib Meenai	0cf624cad7	[TimeProfiler] Reset variable to nullptr Otherwise we'll hit a spurious assert failure when we reset and then reinitialize TimeProfiler on the same thread, as can happen when e.g. using LLD as a library and running it multiple times in the same process. Makes `lld/test/MachO/time-trace.s` pass with `LLD_IN_TEST=2`, which runs the linker twice in the same process and exposed the issue. Reviewed By: MaskRay, mehdi_amini Differential Revision: https://reviews.llvm.org/D112880	2021-10-31 16:14:30 -07:00
Roman Lebedev	03a4f1f3b8	[ConstantRange] Sign-flipping of signedness-invariant comparisons For certain combination of LHS and RHS constant ranges, the signedness of the relational comparison predicate is irrelevant. This implements complete and precise model for all predicates, as confirmed by the brute-force tests. I'm not sure if there are some more cases that we can handle here. In a follow-up, CVP will make use of this. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D90924	2021-10-31 22:53:17 +03:00
Lang Hames	ff846fcb64	[ORC][ORC-RT] Switch MachO EH/TLV registration from EPC-calls to alloc actions. MachOPlatform used to make an EPC-call (registerObjectSections) to register the eh-frame and thread-data sections for each linked object with the ORC runtime. Now that JITLinkMemoryManager supports allocation actions we can use these instead of an EPC call. This saves us one EPC-call per object linked, and manages registration/deregistration in the executor, rather than the controller process. In the future we may use this to allow JIT'd code in the executor to outlive the controller object while still being able to be cleanly destroyed. Since the code for allocation actions must be available when the actions are run, and since the eh-frame registration code lives in the ORC runtime itself, this change required that MachO eh-frame support be split out of macho_platform.cpp and into its own macho_ehframe_registration.cpp file that has no other dependencies. During bootstrap we start by forcing emission of macho_ehframe_registration.cpp so that eh-frame registration is guaranteed to be available for the rest of the bootstrap process. Then we load the rest of the MachO-platform runtime support, erroring out if there is any attempt to use TLVs. Once the bootstrap process is complete all subsequent code can use all features.	2021-10-31 10:27:40 -07:00
Lang Hames	b77c6db959	[JITLink] Fix alloc action call signature in InProcessMemoryManager. Alloc actions should return a CWrapperFunctionResult. JITLink does not have access to this type yet, due to library layering issues, so add a cut-down version with a fixme.	2021-10-31 10:27:40 -07:00
Craig Topper	ada5458521	[RISCV] Expand scalable vector bswap. Fix crash for bitreverse. Fix LegalizeVectorOps to not try shuffle or unrolling expansions for scalable vectors. Differential Revision: https://reviews.llvm.org/D112236	2021-10-31 10:01:27 -07:00
Kazu Hirata	1a605f395f	[CodeGen] Use make_early_inc_range (NFC)	2021-10-31 07:57:36 -07:00
Kazu Hirata	72710af233	[CodeGen, Target] Use MachineBasicBlock::terminators (NFC)	2021-10-31 07:57:34 -07:00
Kazu Hirata	c714da2ceb	[Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC)	2021-10-31 07:57:32 -07:00
Kazu Hirata	4cc7c4724f	[MachineCSE] Use make_early_inc_range (NFC)	2021-10-30 19:00:23 -07:00
Kazu Hirata	c8b1ed5fb2	[clang, llvm] Use Optional::getValueOr (NFC)	2021-10-30 19:00:21 -07:00
Lang Hames	213666f804	[ORC] Move CWrapperFunctionResult out of the detail:: namespace. This type has been moved up into the llvm::orc::shared namespace. This type was originally put in the detail:: namespace on the assumption that few (if any) LLVM source files would need to use it. In practice it has been needed in many places, and will continue to be needed until/unless OrcTargetProcess is fully merged into the ORC runtime.	2021-10-30 16:12:45 -07:00
Kazu Hirata	5970249439	[Hexagon] Remove chksetELFHeaderEFlags (NFC) The function was introduced without any use on Nov 9, 2015 in commit `7cd0892729`.	2021-10-30 08:43:43 -07:00
Kazu Hirata	c3d63a0697	[Hexagon] Remove ValidArch (NFC) This function seems to be unused for at least one year.	2021-10-30 08:43:41 -07:00
Kazu Hirata	c5cd371cc9	[Hexagon] Remove unused struct InstTy (NFC)	2021-10-30 08:43:39 -07:00
Roman Lebedev	25043c8276	[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate As noted in https://reviews.llvm.org/D90924#inline-1076197 apparently this is a pretty common pattern, let's not repeat it yet again, but have it in a common place. There may be some more places where it could be used, but these are the most obvious ones.	2021-10-30 17:50:06 +03:00
David Green	2c4a9e830c	[ValueTracking] Teach computeConstantRange that the maximum value of a half is 65504 The maximal value of a half is 0x7bff, which is 65504 when converted to an integer. This patch teaches that to computeConstantRange to compute a constant range with the correct maximum value. https://alive2.llvm.org/ce/z/BV_Spb https://alive2.llvm.org/ce/z/Nwuqvb The maximum value for a float converted in the same way is 3.4e38, which requires 129bits of data. I have not added that here as integer types so larger are rare, compared to integers types larger than 17 bits require for half floats. The MVE tests change because instsimplify happens to be run as a part of the backend, where it doesn't tend to for other backends. Differential Revision: https://reviews.llvm.org/D112694	2021-10-30 14:27:38 +01:00
Christudasan Devadasan	aa2d3b59ce	GlobalISel/Utils: Use incoming regbank while constraining the superclasses Register operands with superclasses can possibly have multiple regBanks if they have different register types. The regBank ambiguity resolved during regbankselect should be used to constrain the operand regclass instead of obtaining one from the MCInstrDesc. This is a prerequisite patch for D109300 that introduces allocatable AV_* Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to restrain the regclass to either A or V based on the incoming regbank. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112323	2021-10-30 07:20:45 -04:00
David Green	66281baea1	[InstCombine] Fix type of constant in canonicalizeClampLike As a followup to D108049, one of the constants could now be generated with an incorrect type, now that the input could be truncated.	2021-10-30 09:06:21 +01:00
Kazu Hirata	972d4133e9	Use {DenseSet,SmallPtrSet}::contains (NFC)	2021-10-29 20:26:07 -07:00
Lang Hames	afeb1e4ac7	[ORC] Move all pass config into MachOPlatformPlugin::modifyPassConfig. NFC, this just makes it easier to see and reason about pass ordering.	2021-10-29 20:07:45 -07:00
Duncan P. N. Exon Smith	0d5b6423ba	Support: Reduce stats in fs::copy_file on Darwin fs::copy_file() on Darwin has a nice optimization to clone the file when possible. Change the implementation to use clonefile() directly, instead of the higher-level copyfile(). The latter does the wrong thing for symlinks, which requires calling `stat` first... With that out of the way, optimistically call clonefile() all the time, and then for any error that's recoverable try again with copyfile() (without the COPYFILE_CLONE flag, as before). Differential Revision: https://reviews.llvm.org/D112250	2021-10-29 16:48:35 -07:00
Stanislav Mekhanoshin	e5340ed30c	[AMDGPU] Fix global isel for kernels using agprs on gfx90a With Global ISel getReservedRegs() is called before function is regbank selected for the first time. Defer caching of usesAGPRs() in this case. Differential Revision: https://reviews.llvm.org/D112644	2021-10-29 14:23:14 -07:00
Florian Hahn	274a9b0f0b	[DSE] Support redundant stores eliminated by memset. This patch adds support to remove stores that write the same value as earlier memesets. It uses isOverwrite to check that a memset completely overwrites a later store. The candidate store must store the same bytewise value as the byte stored by the memset. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112321	2021-10-29 22:19:53 +01:00
Nikita Popov	cdf45f98ca	[BasicAA] Extract linear expression multiplication (NFC) Extract a common method for multiplying a linear expression by a factor.	2021-10-29 22:41:40 +02:00
Sam Clegg	3b039c68f2	Revert "[WebAssembly] Fix debug locations for ExplicitLocals pass" This reverts commit `a66451ebbe`. This caused a failure when integrated with emscripten: https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8832019855439718609/overview	2021-10-29 13:34:18 -07:00
Nikita Popov	7cf7378a9d	[BasicAA] Don't treat non-inbounds GEP as nsw The scale multiplication is only guaranteed to be nsw if the GEP is inbounds (or the multiplication is trivial). Previously we were only considering explicit muls in GEP indices.	2021-10-29 22:30:44 +02:00
Nick Desaulniers	39e5dd113f	[SparcISelLowering] avoid emitting libcalls to __muloti4 and __mulodi4 These compiler-rt-only symbols aren't available in libgcc. Similar to D108842, D108844, and D108926. Fixes: pr/52043 Reviewed By: craig.topper, rengolin Differential Revision: https://reviews.llvm.org/D112750	2021-10-29 13:14:09 -07:00
Sanjay Patel	285b8abce4	[x86] limit vector increment fold to allow load folding The tests are based on the example from: https://llvm.org/PR52032 I suspect that it looks worse than it actually is. :) That is, llvm-mca says there's no uop/timing difference with the load folding and pcmpeq vs. broadcast on Haswell (and probably other targets). The load-folding definitely makes the code smaller, so it's good for that at least. So this requires carving a narrow hole in the transform to get just this case without changing others that look good as-is (in other words, the transform still seems good for most examples). Differential Revision: https://reviews.llvm.org/D112464	2021-10-29 15:48:35 -04:00
Sanjay Patel	837518d6a0	[x86] make mayFold* helpers visible to more files; NFC The first function is needed for D112464, but we might as well keep these together in case the others can be used someday.	2021-10-29 15:48:35 -04:00
Sanjay Patel	8f786b4618	[InstCombine] fix comments to match code; NFC	2021-10-29 15:48:35 -04:00
modimo	5caad9b5d3	[InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner Adds the following switches: 1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are: 1. Original: defers to original advisor 2. AlwaysInline: inline all sites not in replay 3. NeverInline: inline no sites not in replay 2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are: 1. Line 2. LineColumn 3. LineDiscriminator 4. LineColumnDiscriminator Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks. All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using: 1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function 2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system 3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another. Updated various tests to cover the new switches and negative remarks Testing: ninja check-all Reviewed By: wenlei, mtrofin Differential Revision: https://reviews.llvm.org/D112040	2021-10-29 12:32:03 -07:00
Duncan P. N. Exon Smith	9902362701	Support: Use sys::path::is_style_{posix,windows}() in a few places Use the new sys::path::is_style_posix() and is_style_windows() in a few places that need to detect the system's native path style. In llvm/lib/Support/Path.cpp, this patch removes most uses of the private `real_style()`, where is_style_posix() and is_style_windows() are just a little tidier. Elsewhere, this removes `_WIN32` macro checks. Added a FIXME to a FileManagerTest that seemed fishy, but maintained the existing behaviour. Differential Revision: https://reviews.llvm.org/D112289	2021-10-29 12:09:41 -07:00
modimo	51ce567b38	[SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect Replay in sample profiling needs to be asked on candidates that may not have counts or below the threshold. If replay is in effect for a function make sure these are captured and also imported during thinLTO. Testing: ninja check-all Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D112033	2021-10-29 12:04:52 -07:00
Roman Lebedev	0ae7bf124a	[NFC][LoopDeletion] Count the number of broken backedges Those don't contribute to the number of deleted loops.	2021-10-29 21:58:16 +03:00
Amara Emerson	5dd9e019dd	[AArch64][GlobalISel] Fix an crash in RBS due to a new regclass being added. rdar://84674985	2021-10-29 11:47:00 -07:00
Duncan P. N. Exon Smith	4e4883e1f3	Support: Expose sys::path::is_style_{posix,windows,native}() Expose three helpers in namespace llvm::sys::path to detect the path rules followed by sys::path::Style. - is_style_posix() - is_style_windows() - is_style_native() This are constexpr functions that that will allow a bunch of path-related code to stop checking `_WIN32`. Originally I looked at adding system_style(), analogous to sys::endian::system_endianness(), but future patches (from others) will add more Windows style variants for slash preferences. These helpers should be resilient to that change, allowing callers to detect basic path rules. Differential Revision: https://reviews.llvm.org/D112288	2021-10-29 11:46:44 -07:00
Sanjay Patel	d0e9879d96	[InstCombine] allow vector splat matching for bitwise logic folds These transforms are also likely missing a one-use check, but that's another patch.	2021-10-29 14:22:50 -04:00
Stanislav Mekhanoshin	a905c54b76	[InstCombine] Fold `(~(a \| b) & c) \| ~(a \| c)` into `~((b & c) \| a)` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %or1 = or i4 %b, %a %not1 = xor i4 %or1, -1 %or2 = or i4 %a, %c %not2 = xor i4 %or2, -1 %and = and i4 %not2, %b %or3 = or i4 %and, %not1 ret i4 %or3 } define i4 @tgt(i4 %a, i4 %b, i4 %c) { %and = and i4 %c, %b %or = or i4 %and, %a %or3 = xor i4 %or, -1 ret i4 %or3 } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D112338	2021-10-29 10:58:09 -07:00
Matt Morehouse	33cc0cfd46	[X86] Don't affect jump tables under +tagged-globals. `classifyLocalReference(nullptr)` is called to get the appropriate relocation type for jump tables. We should not use @GOTPCREL for this case. The new test cases trigger assertions without this patch. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D112832	2021-10-29 10:37:43 -07:00
Fraser Cormack	8314a04ede	[SelectionDAG] Allow FindMemType to fail when widening loads & stores This patch removes an internal failure found in FindMemType and "bubbles it up" to the users of that method: GenWidenVectorLoads and GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns an optional value, returning None if no such type is found. Each of the aforementioned users now pre-calculates the list of types it will use to widen the memory access. If the type breakdown is not possible they will signal a failure, at which point the compiler will crash as it does currently. This patch is preparing the ground for alternative legalization strategies for vector loads and stores, such as using vector-predication versions of loads or stores. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112000	2021-10-29 18:27:31 +01:00
Craig Topper	aefcd59895	[RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better. If the VL operand of a mask register instruction comes from an explicit vsetvli with a different VTYPE, we can still avoid needing a vsetvli as long as the SEW/LMUL ratio is the same and policy bits match. Differential Revision: https://reviews.llvm.org/D112762	2021-10-29 09:49:36 -07:00
Simon Pilgrim	6102e5d56b	[CostModel][X86] Remove old TODO comment BMI (TZCNT) scalar handling was added at rGa2db388dce77c2f23f2009d7363a0b63bb54523c	2021-10-29 17:28:45 +01:00
Mircea Trofin	d6790a0a3c	[NFC] ProfileSummary: const most of the fields. This simplifies readability / maintainability.	2021-10-29 08:36:08 -07:00
Bradley Smith	86972f1114	[AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes Add support for generating TargetFrameIndex in complex patterns for indexed addressing modes in SVE. Additionally, add missing load/stores to getMemOpInfo and getLoadStoreImmIdx. Differential Revision: https://reviews.llvm.org/D112617	2021-10-29 14:44:16 +00:00
Jay Foad	56f03d25b4	[IR] Remove createReplacementInstr. NFC. It is unused since D112791. Differential Revision: https://reviews.llvm.org/D112795	2021-10-29 15:03:19 +01:00
Jay Foad	1b758925ad	[IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction createReplacementInstr was a trivial wrapper around ConstantExpr::getAsInstruction, which also inserted the new instruction into a basic block. Implement this directly in getAsInstruction by adding an InsertBefore parameter and change all callers to use it. NFC. A follow-up patch will remove createReplacementInstr. Differential Revision: https://reviews.llvm.org/D112791	2021-10-29 15:02:58 +01:00
Jay Foad	21a1d4cf71	[AMDGPU] Change numBitsSigned for simplicity and document it. NFC. Change numBitsSigned to return the minimum size of a signed integer that can hold the value. This is different by one from the previous result but is more consistent with numBitsUnsigned. Update all callers. All callers are now more consistent between the signed and unsigned cases, and some callers get simpler, especially the ones that deal with quantities like numBitsSigned(LHS) + numBitsSigned(RHS). Differential Revision: https://reviews.llvm.org/D112813	2021-10-29 14:22:06 +01:00
Chen Zheng	7591d21032	[PowerPC] fix a miscompile for Solaris build	2021-10-29 12:06:25 +00:00
Bradley Smith	bf72a469ba	[AArch64][SVE] Fix build failure introduced in `13faa5f440`	2021-10-29 11:57:02 +00:00
David Green	11630dbbc3	[InstCombine] Fold BW/2+1 tops bits are same pattern Match "icmp eq (trunc (lsr A, BW), (ashr (trunc A), BW-1))", which checks the top BW/2 + 1 bits are all the same. Create "A >=s INT_MIN && A <=s INT_MAX", which we generate as "icmp ult (add A, 2^BW-1), 2^BW" to skip a few steps of instcombining. https://alive2.llvm.org/ce/z/NjH6Ty https://alive2.llvm.org/ce/z/_fEQ9P Differential Revision: https://reviews.llvm.org/D109155	2021-10-29 12:30:20 +01:00
Simon Pilgrim	154c036ebb	[X86] combineX86GatherScatter - only fold scale if the index isn't extended As mentioned on D108539, when the gather indices are smaller than the pointer size, they are sign-extended BEFORE scale is applied, making the general fold unsafe. If the index have sufficient sign-bits then folding the scale could be safe - I'll investigate this.	2021-10-29 11:48:05 +01:00
David Green	9020e22a87	[InstCombine] Convert xor (ashr X, BW-1), C -> select(X >=s 0, C, ~C) The sequence of instructions `xor (ashr X, BW-1), C` (or with a truncation `xor (trunc (ashr X, BW-1)), C)` takes a value, produces all zeros or all ones and with it optionally inverts a constant depending on whether the original input was positive or negative. This is the same as checking if the value is positive, and selecting between the constant and ~constant. https://alive2.llvm.org/ce/z/NJ85qY This is a fairly general version of a fold that helps pull saturating arithmetic into a canonical form. Differential Revision: https://reviews.llvm.org/D109151	2021-10-29 11:19:20 +01:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Cullen Rhodes	8686626244	[Sparc] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109712	2021-10-29 09:16:15 +00:00
Neubauer, Sebastian	c78640ee6a	[TailDuplicator] Fix merging block with terminator The TailDuplicator merged two blocks, even if the first one ended with a terminator, resulting in invalid MIR, where a terminator is in the middle of a block. Abort merging if the first block ends with a terminator. Differential Revision: https://reviews.llvm.org/D112226	2021-10-29 10:52:46 +02:00
Vang Thao	52b43d1549	[AMDGPU] Fix cvt_f32_ubyte combine with shl Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112733	2021-10-28 21:43:06 -07:00
Chuanqi Xu	bb16e83932	[NFC] [Coroutines] Use llvm::make_scope_exit to replace self-defined RTTIHelper	2021-10-29 12:14:20 +08:00
Kazu Hirata	01b4789b62	[AMDGPU] Remove hasDefinedInitializer (NFC) The last use was removed on Sep 16, 2021 in commit `7a62a5b56d`.	2021-10-28 20:33:34 -07:00
Kazu Hirata	dd5d46b009	[AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC) This field seems to be unused for at least one year.	2021-10-28 20:33:32 -07:00
Kazu Hirata	309357c01a	[AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC)	2021-10-28 20:33:30 -07:00
Abinav Puthan Purayil	db8d7b6e2d	[DAGCombine][NFC] s/it's/its in the comment of hasNoInfs().	2021-10-29 07:36:38 +05:30
Lang Hames	12b2cc2294	[ORC] Rename SupportFunctionCall to WrapperFunctionCall. The new name better suits the type. This patch also changes the signature of the run method (it now returns a WrapperFunctionResult), and adds runWithSPSRet methods that deserialize the function result using SPS. Together these chages bring this type into close alignment with its ORC runtime counterpart.	2021-10-28 17:48:54 -07:00
Lang Hames	999c6a235e	Reapply `e32b1eee6a` "[ORC] Change SPSExecutorAddr serialization,..." with fixes. This re-applies `e32b1eee6a`, which was reverted in `20675d8f7d` due to broken unit tests. This patch includes fixes for the tests.	2021-10-28 16:40:25 -07:00
Thomas Lively	fb67f3d969	[WebAssembly] Add prototype relaxed float to int trunc instructions Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u, i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112186	2021-10-28 14:01:53 -07:00
Daniel Kiss	d8075e8781	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 21:45:09 +02:00
Wouter van Oortmerssen	a66451ebbe	[WebAssembly] Fix debug locations for ExplicitLocals pass Differential Revision: https://reviews.llvm.org/D112487	2021-10-28 12:35:46 -07:00
Stanislav Mekhanoshin	f7f430c913	[InstCombine] Fixed non-determinisctic order of new instructions Fixes non-determinisctic order of XOR instructions created after `5a7a458306`. The order of call argument evaluation is not defined, so create one Value before the call.	2021-10-28 12:14:02 -07:00
Stanislav Mekhanoshin	5a7a458306	[InstCombine] Fold `(c & ~(a \| b)) \| (b & ~(a \| c))` to `~a & (b ^ c)` ``` ---------------------------------------- define i4 @src(i4 %a, i4 %b, i4 %c) { %0: %or1 = or i4 %a, %b %not1 = xor i4 %or1, 15 %and1 = and i4 %not1, %c %or2 = or i4 %a, %c %not2 = xor i4 %or2, 15 %and2 = and i4 %not2, %b %or3 = or i4 %and1, %and2 ret i4 %or3 } => define i4 @tgt(i4 %a, i4 %b, i4 %c) { %0: %xor = xor i4 %b, %c %not = xor i4 %a, 15 %or3 = and i4 %xor, %not ret i4 %or3 } Transformation seems to be correct! ``` Differential Revision: https://reviews.llvm.org/D112276	2021-10-28 11:54:30 -07:00
Ahmed Bougacha	bef777206e	[AArch64] Rename some timm predicates for consistency. NFC. timm isn't the common case, and TImmLeafs should make it clear what they are. We're adding a plain ImmLeaf for 0_65535, so rename i64_imm0_65535 to timm64_0_65535, and imm32_0_7 to timm32_0_7.	2021-10-28 11:41:29 -07:00
Yuanfang Chen	ac02bcad56	[IRSymTab] Mark __stack_chk_guard used `__stack_chk_guard` is a global variable that has no uses before the LLVM code generation phase (how it is defined is platform-dependent). LTO needs to preserve this symbol for that reason. Currently, legacy LTO API preserves it by hardcoding the logic in Internalizer, but this symbol is not preserved by regular LTO API in thinlink phase. This patch marks `__stack_chk_guard` used during IR symbol table writing since this is how builtin functions are preserved by thinlink by using `RuntimeLibcalls.def`. Reviewed By: MaskRay, tejohnson Differential Revision: https://reviews.llvm.org/D112595	2021-10-28 11:22:26 -07:00
Yuanfang Chen	c18ed69873	[Internalize] Preserve __stack_chk_fail in Internalizer correctly Move the section collecting `AlwaysPreserved` up before any `maybeInternalize` is called. Otherwise, functions in `AlwaysPreserved` (in this case, `__stack_chk_fail`) are not preserved. Reviewed By: MaskRay, tejohnson Differential Revision: https://reviews.llvm.org/D112684	2021-10-28 11:22:26 -07:00
Guozhi Wei	1e46dcb77b	[TwoAddressInstructionPass] Put all new instructions into DistanceMap In function convertInstTo3Addr, after converting a two address instruction into three address instruction, only the last new instruction is inserted into DistanceMap. This is wrong, DistanceMap should track all instructions from the beginning of current MBB to the working instruction. When a two address instruction is converted to three address instruction, multiple instructions may be generated (usually an extra COPY is generated), all of them should be inserted into DistanceMap. Similarly when unfolding memory operand in function tryInstructionTransform DistanceMap is not maintained correctly. Differential Revision: https://reviews.llvm.org/D111857	2021-10-28 11:11:59 -07:00
Matthias Braun	e2c7ee0743	X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr This extends `optimizeCompareInstr` to re-use previous comparison results if the previous comparison was with an immediate that was 1 bigger or smaller. Example: CMP x, 13 ... CMP x, 12 ; can be removed if we change the SETg SETg ... ; x > 12 changed to `SETge` (x >= 13) removing CMP Motivation: This often happens because SelectionDAG canonicalization tends to add/subtract 1 often when optimizing for fallthrough blocks. Example for `x > C` the fallthrough optimization switches true/false blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into `x < C + 1`. Differential Revision: https://reviews.llvm.org/D110867	2021-10-28 10:33:56 -07:00
Matthias Braun	97a1570d8c	X86InstrInfo: Optimize more combinations of SUB+CMP `X86InstrInfo::optimizeCompareInstr` would only optimize a `SUB` followed by a `CMP` in `isRedundantFlagInstr`. This extends the code to also look for other combinations like `CMP`+`CMP`, `TEST`+`TEST`, `SUB x,0`+`TEST`. - Change `isRedundantFlagInstr` to run `analyzeCompareInstr` on the candidate instruction and compare the results. This normalizes things and gives consistent results for various comparisons (`CMP x, y`, `SUB x, y`) and immediate cases (`TEST x, x`, `SUB x, 0`, `CMP x, 0`...). - Turn `isRedundantFlagInstr` into a member function so it can call `analyzeCompare`. - We now also check `isRedundantFlagInstr` even if `IsCmpZero` is true, since we now have cases like `TEST`+`TEST`. Differential Revision: https://reviews.llvm.org/D110865	2021-10-28 10:33:56 -07:00
Florian Hahn	c45045bfd0	[VPlan] Keep induction recipes in header. This patch updates recipe creation to ensure all VPWidenIntOrFpInductionRecipes are in the header block. At the moment, new induction recipes can be created in different blocks when trying to optimize casts and induction variables. Having all induction recipes in the header makes it easier to analyze/transform them in VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111300	2021-10-28 18:22:05 +01:00
Nicolai Hähnle	b437aaa672	MachineDominators: Define MachineDomTree type alias This is a (very) small move towards making the machine dominators more aligned with the IR dominators: * DominatorTree / MachineDomTree is the class holding the dominator tree * DominatorTreeWrapperPass / MachineDominatorTree is the corresponding (machine) function pass This alignment will be used by analyses that are designed as templates that work with LLVM IR as well as Machine IR. Reviewed By: critson Differential Revision: https://reviews.llvm.org/D112690	2021-10-28 22:30:35 +05:30
Leonard Grey	793b481f54	[CGProfile] Don't emit call graph profile edges with zero weight With D112160 and D112164, on a Chrome Mac build this reduces the total size of CGProfile sections by 78% (around 25% eliminated entirely) and total size of object files by 0.14%. Differential Revision: https://reviews.llvm.org/D112655	2021-10-28 11:32:49 -04:00
Daniel Kiss	66e03db814	Revert "Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."" This reverts commit `b6420e575f`.	2021-10-28 17:24:53 +02:00
Daniel Kiss	b6420e575f	Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This is relanding commit `da1d1a0869` . This patch additionally addresses failures found in buildbots & post review comments. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-28 16:49:19 +02:00
David Green	9358384fd6	[InstCombine] Extend canonicalizeClampLike to handle truncated inputs This extends the canonicalizeClampLike function to allow cases where the input is truncated, but still matching on the types of the ICmps. For example %t = trunc i32 %X to i8 %a = add i32 %X, 128 %cmp = icmp ult i32 %a, 256 %c = icmp sgt i32 %X, -1 %f = select i1 %c, i8 High, i8 Low %r = select i1 %cmp, i8 %t, i8 %f becomes %c1 = icmp slt i32 %X, -128 %c2 = icmp sge i32 %X, 128 %s1 = select i1 %c1, i32 sext(Low), i32 %X %s2 = select i1 %c2, i32 sext(High), i32 %s1 %t = trunc i32 %s2 to i8 https://alive2.llvm.org/ce/z/vPzfxH We limit the transform to constant High and Low values, where we know the sext are free. Differential Revision: https://reviews.llvm.org/D108049	2021-10-28 15:46:58 +01:00
Dawid Jurczak	f87e0c68d7	[DSE] Eliminates redundant store of an exisiting value (PR16520) That's https://reviews.llvm.org/D90328 follow-up. This change eliminates writes to variables where the value that is being written is already stored in the variable. This achieves the goal by looping through all memory definitions in the current state and getting defining access from each of them. When there is defining access where the write instruction is identical to the original instruction it will remove this redundant write. For example: void f() { x = 1; if foo() { x = 1; g(); } else { h(); } } void g(); void h(); The second x=1 will be eliminated since it is rewriting 1 to x. This pass will produce this: void f() { x = 1; if foo() { g(); } else { h(); } } void g(); void h(); Differential Revision: https://reviews.llvm.org/D111727	2021-10-28 16:20:09 +02:00
David Green	79011c705b	[InstCombine] Fix rare condition violation in canonicalizeClampLike With a "ult x, 0", the fold in canonicalizeClampLike does not validate with undef inputs. This condition will usually have been simplified away, but we should ensure the code is correct in case. https://alive2.llvm.org/ce/z/S8HQ6H vs https://alive2.llvm.org/ce/z/h2XBJ_ See: https://reviews.llvm.org/D108049	2021-10-28 15:03:07 +01:00
Simon Pilgrim	d29ccbecd0	[X86][AVX] Attempt to fold a scaled index into a gather/scatter scale immediate (PR13310) If the index operand for a gather/scatter intrinsic is being scaled (self-addition or a shl-by-immediate) then we may be able to fold that scaling into the intrinsic scale immediate value instead. Fixes PR13310. Differential Revision: https://reviews.llvm.org/D108539	2021-10-28 14:07:17 +01:00
Alexey Bataev	07ef9f513f	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-28 05:45:09 -07:00
Sanjay Patel	e8535fa784	[InstCombine] allow Negator to fold multi-use select with constant arms The motivating test is reduced from: https://llvm.org/PR52261 Note that the more general problem of folding any binop into a multi-use select of constants is still there. We need to ease the restriction in InstCombinerImpl::FoldOpIntoSelect() to catch those. But these examples never reach that code because Negator exclusively handles negation patterns within visitSub(). Differential Revision: https://reviews.llvm.org/D112657	2021-10-28 08:35:58 -04:00
Peter Waller	98f08752f7	[InstCombine][ConstantFolding] Make ConstantFoldLoadThroughBitcast TypeSize-aware The newly added test previously caused the compiler to fail an assertion. It looks like a strightforward TypeSize upgrade. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112142	2021-10-28 12:15:15 +00:00
Abinav Puthan Purayil	2da6ef3664	[AMDGPU] Add 24-bit mulhi intrinsics in INTRINSIC_WO_CHAIN combine. mul24 intrinsic's operands are simplified by AMDGPUTargetLowering::performIntrinsicWOChainCombine(). This change adds the mul24hi intrinsics in the combine since its operands can be simplified like that of the mul24 intrinsics. Differential Revision: https://reviews.llvm.org/D112702	2021-10-28 16:57:48 +05:30
Sebastian Neubauer	fd1cfc9094	[AMDGPU][GlobalISel] Fix waterfall loops - Move the `s_and exec` to its correct position before the content of the waterfall loop - Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit from optimizations - Add support for indirect function calls To support indirect calls, add a G_SI_CALL instruction without register class restrictions and insert a waterfall loop when applying register banks. Differential Revision: https://reviews.llvm.org/D109052	2021-10-28 10:30:55 +02:00
Neubauer, Sebastian	50d8d963e3	[GlobalISel] Simplify RegBankSelect Save the instruction list of a block before selecting banks. This allows to cope with moved instructions, even if they are reordered or splitted into multiple basic blocks. Differential Revision: https://reviews.llvm.org/D111223	2021-10-28 10:30:55 +02:00
Caroline Concatto	2186b011e9	[Driver][AArch64]Add driver support for neoverse-512tvb target The support for neoverse-512tvb mirrors the same option available in GCC[1]. There is no functional effect for this option yet. This patch ensures the driver accepts "-mcpu=neoverse-512tvb", and enough plumbing is in place to allow the new option to be used in the future. [1]https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html Differential Revision: https://reviews.llvm.org/D112406	2021-10-28 09:08:40 +01:00
Martin Storsjö	177176f75c	[Support] [Windows] Manually clean up temp files if not setting delete disposition Since D81803 / `79657e2339`, temp files created on network shares don't set "Disposition.DeleteFile = true". This flag normally takes care of removing the temp file both if the process exits abnormally (either crashing or killed externally), and when the file is closed cleanly. For network shares, we voluntarily choose to not set the flag, and if the operation to inspect the file handle (as a prerequisite to setting the flag since `79657e2339`) fails we also error out. In both of these cases, we can at least make sure to remove the temp files when they are closed cleanly. Adjust the semantics of "OF_Delete" to not set the delete disposition, but only set the access mode for allowing deletion. Move the call to setDeleteDisposition into TempFile::create, where we can check if it failed, and if it did, set a flag noting that the file should be removed manually at the end. This does leak files on crash, but at least doesn't leak files in regular successful runs. (Technically, the alternative codepath could use the RemoveFileOnSignal function, but that might complicate the TempFile implementation further.) This fixes https://github.com/mstorsjo/llvm-mingw/issues/233 and https://bugs.llvm.org/show_bug.cgi?id=52080. Differential Revision: https://reviews.llvm.org/D111875	2021-10-28 10:33:37 +03:00
Hongtao Yu	259e4c5658	[CSSPGO] Trim cold base profiles for the CS preinliner. Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D112489	2021-10-27 22:50:27 -07:00
Hsiangkai Wang	7051f73d69	[RISCV] Sync Zvlsseg register order as the same as vector registers. Sync the order of Zvlsseg registers with vector registers to avoid unnecessary register copies between vector instructions and zvlsseg instructions. Differential Revision: https://reviews.llvm.org/D110250	2021-10-28 13:34:53 +08:00
Kazu Hirata	cee3419d65	[AMDGPU] Remove unused declaration findNumUsedRegistersSI (NFC)	2021-10-27 21:24:02 -07:00
Phoebe Wang	2bc28c6f82	[X86] Add a dependency breaking xor before any gathers with an undef passthru value. In the instruction encoding, the passthru register is always tied to the destination register. The CPU scheduler has to wait for the last writer of this register to finish executing before the gather can start. This is true even if the initial mask is all ones so that the passthru will never be used. By explicitly zeroing the register we can break the false dependency. The zero idiom is executed completing by the register renamer and so is immedately considered ready. Authored by Craig. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D112505	2021-10-28 11:44:52 +08:00
Hsiangkai Wang	0a9b82960c	[RISCV] Use vmv.v.[v\|i] if we know COPY is under the same vl and vtype. If we know the source operand of COPY is defined by a vector instruction with tail agnostic and the same LMUL and there is no vsetvli between COPY and the define instruction to change the vl and vtype, we could use vmv.v.v or vmv.v.i to copy vector registers to get better performance than the whole vector register move instructions. If the source of COPY is from vmv.v.i, we could use vmv.v.i for the COPY. This patch only considers all these instructions within one basic block. Case 1: ``` bb.0: ... VSETVLI # The first VSETVLI before COPY and VOP. ... # Use this VSETVLI to check LMUL and tail agnostic. ... vy = VOP va, vb # Define vy. ... # There is no vsetvli between VOP and COPY. vx = COPY vy ``` Case 2: ``` bb.0: ... VSETVLI # The first VSETVLI before VOP. ... # Use this VSETVLI to check LMUL and tail agnostic. ... vy = VOP va, vb # Define vy. ... # There is no vsetvli to change vl between VOP and COPY. ... VSETVLI # The first VSETVLI before COPY. ... # This VSETVLI does not change vl and vtype. ... vx = COPY vy ``` Co-Authored-by: Zakk Chen <zakk.chen@sifive.com> Co-Authored-by: Kito Cheng <kito.cheng@sifive.com> Differential Revision: https://reviews.llvm.org/D103510	2021-10-28 11:39:04 +08:00
Max Kazantsev	513914e1f3	[SCEV] Invalidate user SCEVs along with operand SCEVs to avoid cache corruption Following discussion in D110390, it seems that we are suffering from unability to traverse users of a SCEV being invalidated. The result of that is that ScalarEvolution's inner caches may store obsolete data about SCEVs even if their operands are forgotten. It creates problems when we try to verify the contents of those caches. It's also a frequent situation when messing with cache causes very sneaky and hard-to-analyze bugs related to corruption of memory when dealing with cached data. They are lurking there because ScalarEvolution's veirfication is not powerful enough and misses many problematic cases. I plan to make SCEV's verification much stricter in follow-ups, and this requires dangling-pointers-free caches. This patch makes sure that, whenever we forget cached information for a SCEV, we also forget it for all SCEVs that (transitively) use it. This may have negative compile time impact. It's a sacrifice we are more than willing to make to enforce correctness. We can also save some time by reworking invokers of forgetMemoizedResults (maybe we can forget multiple SCEVs with single query). Differential Revision: https://reviews.llvm.org/D111533 Reviewed By: reames	2021-10-28 09:39:24 +07:00
Craig Topper	1387483e72	[RISCV] Replace most uses of RISCVSubtarget::hasStdExtV. NFCI Add new hasVInstructions() which is currently equivalent. Replace vector uses of hasStdExtZfh/F/D with new vector specific versions. The vector spec no longer requires that the vectors implement the same types as scalar. It only requires that the scalar type is the maximum size the vectors can support. This is currently implemented using the scalar rule we were using before. Add new hasVInstructionsI64() begin using to qualify code that requires i64 vector elements. This is all NFC for now, but we can start using this to better implement D112408 which introduces the Zve extensions. Reviewed By: frasercrmck, eopXD Differential Revision: https://reviews.llvm.org/D112496	2021-10-27 19:33:48 -07:00
Johannes Doerfert	acf3093117	[Attributor][FIX] Do not ignore memory writes in AAMemoryBehavior Even if we look for `nocapture` we need to bail on escaping pointers. The crucial thing is that we might not look at a big enough scope when we derive the memory behavior. Thus, it might be `nocapture` in a larger context while it is "captured" in a smaller context.	2021-10-27 21:04:32 -05:00
Johannes Doerfert	734f91441d	[Attributor][NFC] Improve debug messages	2021-10-27 21:04:31 -05:00
Ard Biesheuvel	d7e089f2d6	[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register. Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead. This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp,. Reviewed By: nickdesaulniers, peter.smith Differential Revision: https://reviews.llvm.org/D112600	2021-10-27 16:42:11 -07:00
Lang Hames	20675d8f7d	Revert "[ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct." This reverts commit `e32b1eee6a`. Reverting while I fix some broken unit tests.	2021-10-27 16:39:56 -07:00
Johannes Doerfert	8a4551b893	[Attributor][FIX] Use right address space to avoid assertion When we strip and accumulate constant offsets we need to pick the right address space such that the offset APInt has the right bit width. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112544	2021-10-27 18:22:37 -05:00
Lang Hames	e32b1eee6a	[ORC] Change SPSExecutorAddr serialization, SupportFunctionCall struct. SPSExecutorAddr will now be serializable to/from ExecutorAddr, rather than uint64_t. This improves type safety when working with serialized addresses. Also updates the SupportFunctionCall to use an ExecutorAddrRange (rather than a separate ExecutorAddr addr and uint64_t size field), and updates the tpctypes::*Write data structures to use ExecutorAddr rather than JITTargetAddress.	2021-10-27 16:20:46 -07:00
Roman Lebedev	b291597112	Revert rest of `IRBuilderBase`'s short-circuiting folds Upon further investigation and discussion, this is actually the opposite direction from what we should be taking, and this direction wouldn't solve the motivational problem anyway. Additionally, some more (polly) tests have escaped being updated. So, let's just take a step back here. This reverts commit `f3190dedee`. This reverts commit `749581d21f`. This reverts commit `f3df87d57e`. This reverts commit `ab1dbcecd6`.	2021-10-28 02:15:14 +03:00
Michael Liao	e6a4ba3aa6	[amdgpu] Handle the case where there is no scavenged register. - When an unconditional branch is expanded into an indirect branch, if there is no scavenged register, an SGPR pair needs spilling to enable the destination PC calculation. In addition, before jumping into the destination, that clobbered SGPR pair need restoring. - As SGPR cannot be spilled to or restored from memory directly, the spilling/restoring of that SGPR pair reuses the regular SGPR spilling support but without spilling it into memory. As that spilling and restoring points are fully controlled, we only need to spill that SGPR into the temporary VGPR, which needs spilling into its emergency slot. - The target-specific hook is revised to take additional restore block, where the restoring code is filled. After that, the relaxation will place that restore block directly before the destination block and insert an unconditional branch in any fall-through block into the destination block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106449	2021-10-27 18:37:27 -04:00
Ben Langmuir	3d13ee2891	[ORC][ORC-RT] Enable the MachO platform for arm64 Enables the arm64 MachO platform, adds basic tests, and implements the missing TLV relocations and runtime wrapper function. The TLV relocations are just handled as GOT accesses. rdar://84671534 Differential Revision: https://reviews.llvm.org/D112656	2021-10-27 13:36:03 -07:00
Nikita Popov	ea7be26045	[ConstantRange] Optimize smul_sat() (NFC) Base the implementation on the APInt smul_sat() implementation, which is much more efficient than performing calculations in double the bitwidth.	2021-10-27 21:01:09 +02:00
Nikita Popov	665060ea45	[BasicAA] Remove misleading overflow check GEP decomposition currently checks whether the multiplication of the linear expression offset and GEP scale overflows. However, if everything else works correctly, this overflow check is both unnecessary and dangerously misleading. While it will avoid an overflow in Scale * Offset in particular, other parts of the calculation (including those on dynamic values) may still overflow. The code working on the decomposed GEPs is responsible for ensuring that it remains correct in the presence of overflow. D112611 fixes the last issue of that kind that I'm aware of (in fact, the overflow check was originally introduced to work around precisely that issue). Differential Revision: https://reviews.llvm.org/D112618	2021-10-27 20:56:03 +02:00
Nick Desaulniers	3ccd041af9	[LowerTypeTests] Emit cfi_jt aliases regardless of function export A constant complaint we get is that the __typeid__ symbols in the CFI jump tables causes confusing stack traces in applications. Emit the more readable cfi_jt aliases regardless of function export (LTO vs Thin LTO). Reviewed By: pcc, tejohnson Differential Revision: https://reviews.llvm.org/D107934	2021-10-27 11:36:26 -07:00
Philip Reames	425cbbc602	[Operator] Add hasPoisonGeneratingFlags [mostly NFC] This method parallels the dropPoisonGeneratingFlags on Instruction, but is hoisted to operator to handle constant expressions as well. This is mostly code movement, but I did go ahead and add the inrange constexpr gep case. This had been discussed previously, but apparently never followed up o.	2021-10-27 11:25:40 -07:00
Alexey Bataev	f06e332982	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit `64d1617d18` to fix test non-stability.	2021-10-27 11:16:58 -07:00
Roman Lebedev	156f10c840	[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ While generally i don't like such hacks, we have a very good reason to do this: here we are expanding a run-time correctness check for the vectorization, and said `umul_with_overflow` will not be optimized out before we query the cost of the checks we've generated. Which means, the cost of run-time checks would be artificially inflated, and after https://reviews.llvm.org/D109368 that will affect the minimal trip count for which these checks are even evaluated. And if they aren't even evaluated, then the vectorized code certainly won't be run. We could consider doing this in IRBuilder, but then we'd need to also teach `CreateExtractValue()` to look into chain of `insertvalue`'s, and i'm not sure there's precedent for that. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 19:45:55 +03:00
Kazu Hirata	593451bd3c	[X86] Remove getSETOpc (NFC) This function seems to be unused for at least one year.	2021-10-27 09:22:31 -07:00
Kazu Hirata	e6b6190ead	[X86] Remove NeedsRetpoline in X86AsmPrinter (NFC) This field seems to be unused for at least one year.	2021-10-27 09:22:29 -07:00
Kazu Hirata	cc73310a81	[X86] Remove CallOperand in X86Operand (NFC) This field seems to be unused for at least one year.	2021-10-27 09:22:27 -07:00
Alexey Bataev	64d1617d18	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 08:49:13 -07:00
David Sherwood	5d9318638e	[NFC][LoopVectorize] Change getStepVector to take a Value* for the StartIdx This patch changes the definition of getStepVector from: Value getStepVector(Value Val, int StartIdx, Value Step, ... to Value getStepVector(Value Val, Value StartIdx, Value Step, ... because: 1. it seems inconsistent to pass some values as Value and some as integer, and 2. future work will require the StartIdx to be an expression made up of runtime calculations of the VF. In widenIntOrFpInduction I've changed the code to pass in the value returned from getRuntimeVF, but the presence of the assert: assert(!VF.isScalable() && "scalable vectors not yet supported."); means that currently this code path is only exercised for fixed-width VFs and so the patch is still NFC. Differential revision: https://reviews.llvm.org/D111882	2021-10-27 16:12:38 +01:00
Roman Lebedev	ab1dbcecd6	[IR] `IRBuilderBase::CreateSelect()`: if cond is a constant i1, short-circuit While we could emit such a tautological `select`, it will stick around until the next instsimplify invocation, which may happen after we count the cost of this redundant `select`. Which is precisely what happens with loop vectorization legality checks, and that artificially increases the cost of said checks, which is bad. There is prior art for this in `IRBuilderBase::CreateAnd()`/`IRBuilderBase::CreateOr()`. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 18:01:05 +03:00
Alexey Bataev	9b12975cbf	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit `f719b794bc` to fix instability in tests.	2021-10-27 07:31:36 -07:00
Kerry McLaughlin	f01fafdcd4	[SVE][CodeGen] Fix incorrect legalisation of zero-extended masked loads PromoteIntRes_MLOAD always sets the extension type to `EXTLOAD`, which results in a sign-extended load. If the type returned by getExtensionType() for the load being promoted is something other than `NON_EXTLOAD`, we should instead pass this to getMaskedLoad() as the extension type. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D112320	2021-10-27 14:15:41 +01:00
Alexey Bataev	f719b794bc	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 06:08:40 -07:00
Caroline Concatto	1137b7207d	[SelectionDAG] Widening the result of INSERT_SUBVECTOR. Widens the result and first input vector because they have the same size. The subvector to be inserted is widened in the operand widen function. Differential Revision: https://reviews.llvm.org/D112187	2021-10-27 13:52:25 +01:00
Nikita Popov	fbc0c308d5	[BasicAA] Handle known bits as ranges BasicAA currently tries to determine that the offset is positive by checking whether all variable indices are positive based on known bits, multiplied by a positive scale. However, this is incorrect if the scale multiplication might overflow. In the modified test case the original value is positive, but may be negative after a left shift. Fix this by converting known bits into a constant range and reusing the range-based logic, which handles overflow correctly. Differential Revision: https://reviews.llvm.org/D112611	2021-10-27 14:41:31 +02:00
Daniel Kiss	894ddba1c9	Revert "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume." This reverts commit `da1d1a0869`.	2021-10-27 14:29:35 +02:00
Sanjay Patel	6c0a2c2804	[x86] enhance mayFoldLoad to check alignment As noted in D112464, a pre-AVX target may not be able to fold an under-aligned vector load into another op, so we shouldn't report that as a load folding candidate. I only found one caller where this would make a difference -- combineCommutableSHUFP() -- so that's where I added a test to show the (minor) regression. Differential Revision: https://reviews.llvm.org/D112545	2021-10-27 07:54:25 -04:00
Matt	fc28a2f8ce	[AArch64][SVE] Combine predicated FMUL/FADD into FMA Combine FADD and FMUL intrinsics into FMA when the result of the FMUL is an FADD operand with one only use and both use the same predicate. Differential Revision: https://reviews.llvm.org/D111638	2021-10-27 11:41:23 +00:00
Alexandros Lamprineas	8689f5e6e7	[AArch64] Add support for the 'R' architecture profile. This change introduces subtarget features to predicate certain instructions and system registers that are available only on 'A' profile targets. Those features are not present when targeting a generic CPU, which is the default processor. In other words the generic CPU now means the intersection of 'A' and 'R' profiles. To maintain backwards compatibility we enable the features that correspond to -march=armv8-a when the architecture is not explicitly specified on the command line. References: https://developer.arm.com/documentation/ddi0600/latest Differential Revision: https://reviews.llvm.org/D110065	2021-10-27 12:32:30 +01:00
Alexey Bataev	cb4feae7bd	[SLP]Fix logical and/or reductions. Need to emit select(cmp) instructions for poison-safe forms of select ops. Currently alive reports that `Target is more poisonous than source` for operations we generating for such instructions. https://alive2.llvm.org/ce/z/FiNiAA Differential Revision: https://reviews.llvm.org/D112562	2021-10-27 04:25:20 -07:00
Nikita Popov	9bc7e543b4	[BasicAA] Make range check more precise Make the range check more precise by calculating the range of potentially accessed bytes for both accesses and checking whether their intersection is empty. In that case there can be no overlap between the accesses and the result is NoAlias. This is more powerful than the previous approach, because it can deal with sign-wrapped ranges. In the test case the original range is [-1, INT_MAX] but becomes [0, INT_MIN] after applying the offset. This is a wrapping range, so getSignedMin/getSignedMax will treat it as a full range. However, the range excludes the elements [INT_MIN+1, -1], which is enough to prove NoAlias with an access at offset -1. Differential Revision: https://reviews.llvm.org/D112486	2021-10-27 12:40:58 +02:00
Jay Foad	b9e3af124b	[LiveInterval] Add RemoveDeadValNo argument to removeSegment(iterator) Add an optional bool RemoveDeadValNo argument to the removeSegment(iterator) overload, for consistency with the other overloads. This gives clients a way to remove dead valnos while also getting an updated iterator returned (in the manner of vector::erase). Use this to clean up some inefficient code in LiveIntervals::repairOldRegInRange. NFC. Differential Revision: https://reviews.llvm.org/D110560	2021-10-27 09:43:32 +01:00
Daniel Kiss	da1d1a0869	[ARM] __cxa_end_cleanup should be called instead of _UnwindResume. ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup. It will call the UnwindResume. __cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called. This will trigger a termination when a foreign exception is processed while UnwindResume is called because the global state will be wrong due to the missing __cxa_end_cleanup call. Additional test here: D109856 [1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions Reviewed By: logan Differential Revision: https://reviews.llvm.org/D111703	2021-10-27 10:40:00 +02:00
David Sherwood	3d706c20f8	[NFC][LoopVectorize] Remove setBestPlan in favour of getBestPlanFor I have removed LoopVectorizationPlanner::setBestPlan, since this function is quite aggressive because it deletes all other plans except the one containing the <VF,UF> pair required. The code is currently written to assume that all <VF,UF> pairs will live in the same vplan. This is overly restrictive, since scalable VFs live in different plans to fixed-width VFS. When we add support for vectorising epilogue loops when the main loop uses scalable vectors then we will the vplan for the main loop will be different to the epilogue. Instead I have added a new function called LoopVectorizationPlanner::getBestPlanFor that returns the best vplan for the <VF,UF> pair requested and leaves all the vplans untouched. We then pass this best vplan to LoopVectorizationPlanner::executePlan which now takes an additional VPlanPtr argument. Differential revision: https://reviews.llvm.org/D111125	2021-10-27 09:38:27 +01:00
Arthur Eubanks	ae27c57b18	[InferAddressSpaces] Make pass work with opaque pointers Avoid getPointerElementType().	2021-10-26 23:53:20 -07:00
Kazu Hirata	6af3e87d2d	[Hexagon] Remove set-but-unused variables (NFC)	2021-10-26 23:38:15 -07:00
Phoebe Wang	eb55c1f153	[X86][NFC] Add the missed `break;` for `79f9dfef0d`	2021-10-27 13:58:31 +08:00
Craig Topper	2783a5cfaf	[RISCV] Add ICmp and FCmp to shouldSinkOperands.	2021-10-26 22:23:54 -07:00
Lang Hames	91434d4469	[JITLink] Fix element-present check in MachOLinkGraphParser. Not all symbols are added to the index-to-symbol map, so we shouldn't use the size of the map as a proxy for the highest valid index.	2021-10-26 20:48:40 -07:00
Lang Hames	bfb40e83ee	[ORC] Don't try to perform empty deallocations.	2021-10-26 20:48:40 -07:00
Max Kazantsev	5961f0308f	[SCEV][NFC] Verify intergity of SCEVUsers Make sure that, for every living SCEV, we have all its direct operand tracking it as their user. Differential Revision: https://reviews.llvm.org/D112402 Reviewed By: reames	2021-10-27 09:54:49 +07:00
Ben Shi	97e52e1c35	[RISCV] Optimize immediate materialisation with SLLI.UW in the Zba extension Simplify "LUI+SLLI+ADDI+SLLI" and "LUI+ADDIW+SLLI+ADDI+SLLI" to "LUI+ADDIW+SLLIUW" to reduce total instruction amount. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111933	2021-10-27 02:48:38 +00:00
David Blaikie	3ac709b6ce	llvm-dwarfdump --verify: Exit non-zero on simplified template name rebuilding failures	2021-10-26 15:57:16 -07:00
Austin Kerbow	02e60f2e77	[AMDGPU] Use max waves for scheduler's initial occupancy target The scheduler should set critical/excess register usage thresholds that are guided by the maximum possible occupancy for the function. This change is focused on setting proper lower bounds on register usage which we would typically only see when a specific number of maximum waves is requested with the "waves-per-eu" attribute, or by setting "amdgpu-num-vgpr\|sgpr" directly. This was broken previously. I have a follow-on patch that will address issues with the scheduler not targeting correct upper bounds on register usage which is typical with launch bounds and min "waves-per-eu". Changes by this patch: Set the initial critical register usage thresholds to minimum values that are determined by the maximum possible occupancy for the function, or the number of allocatable registers, whichever is lower. Avoid unisgned overflow if register limits are lower than the register tracking "ErrorMargin", I.e. when using stress-regalloc=2. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D112373	2021-10-26 15:30:26 -07:00
Yuanfang Chen	7c3fa52785	[DebugInfo] Skip ODRUniquing for mismatched tags Otherwise, ODRUniquing would map some member method/variable MDNodes to have enum type DIScope, resulting in invalid debug info and bad DWARF. - Add a Verifier check that when a 'scope:' operand is an ODR type that is not an enum. - Makes ODRUniquing apply to only ODR types with the same tag so that the debuginfo/DWARF is well-formed. Reviewed By: probinson, aprantl Differential Revision: https://reviews.llvm.org/D111770	2021-10-26 15:28:25 -07:00
Sanjay Patel	acabad9ff6	[InstCombine] try to canonicalize icmp with trunc op into mask and cmp The motivating test is based on: https://llvm.org/PR52260 We have better analysis for X == 0, so try harder to form that.	2021-10-26 17:43:28 -04:00
Fangrui Song	226465efe3	[ARC] Fix `undefined symbol: llvm::MachineFunction::dump() const`	2021-10-26 11:44:18 -07:00
Usman Nadeem	da1318ccca	[NFC][Instcombine] Cleanup some obsolete matches in visitSelectInstr These are now redundant after https://reviews.llvm.org/D106872 Change-Id: I82edfedf1d45cac4e3368d77ce3a48c78e342c19	2021-10-26 10:07:08 -07:00
Rosie Sumpter	b716d0aa94	[LoopVectorize] Clean up VPReductionRecipe::execute. NFC Use RdxDesc->getOpcode instead of getUnderlingInstr()->getOpcode. Move the code which finds Kind and IsOrdered to be outside the for loop since neither of these change with the vector part. Differential Revision: https://reviews.llvm.org/D112547	2021-10-26 17:18:25 +01:00
Kazu Hirata	c3e698e2f5	[CodeGen, Hexagon] Use MachineBasicBlock::phis (NFC)	2021-10-26 09:01:29 -07:00
Jonas Paulsson	bb506938be	[SystemZ] Improvement of emitMemMemWrapper() It was discovered that an extra register COPY remained when expanding a (variable length) memory operation with a loop and there was another use of the involved address register(s) afterwards. A simple fix for this is to COPY the address registers before the loop and use that new vreg instead. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D112065	2021-10-26 17:03:01 +02:00
Alexey Bataev	ce14d1b690	[SLP]Do not reorder reduction nodes. The final reduction nodes should not be reordered, the order does not matter for reductions. Also, it might be profitable to vectorize smaller reduction trees, reduction cost may compensate small tree cost. Part of D111574 Differential Revision: https://reviews.llvm.org/D112467	2021-10-26 07:41:24 -07:00
zhijian	158083f0de	[AIX][XCOFF] parsing xcoff object file auxiliary header Summary: The patch supports parsing the xcoff object file auxiliary header with llvm-readobj with option "auxiliary-headers" the format of auxiliary header as https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/filesreference/XCOFF.html#XCOFF__fyovh386shar Reviewers: James Henderson, Jason Liu, Hubert Tong, Esme yi, Sean Fertile. Differential Revision: https://reviews.llvm.org/D82549	2021-10-26 10:40:25 -04:00
Neubauer, Sebastian	eb16570ab0	[AMDGPU] Remove unused CSR defs CSR_AMDGPU_VGPRs_24_255 and CSR_AMDGPU_VGPRs_32_255 are not used anywhere, so remove them. Differential Revision: https://reviews.llvm.org/D112535	2021-10-26 16:01:49 +02:00
Abinav Puthan Purayil	61e3b9fefe	[AMDGPU] Add constrained shift pattern matches. The motivation for this is due to clang's conformance to https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#operators-shift which makes clang emit (<shift> a, (and b, <width> - 1)) for `a <shift> b` in OpenCL where a is an int of bit width <width>. Differential revision: https://reviews.llvm.org/D110231	2021-10-26 19:07:19 +05:30
Chen Zheng	631f44f338	[PowerPC] use right extend type for SCEV Fix an issue caused by D108750 Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D112502	2021-10-26 13:32:03 +00:00
Abinav Puthan Purayil	781dd39b7b	[AMDGPU] Enable 48-bit mul in AMDGPUCodeGenPrepare. We were bailing out of creating 24-bit muls for results wider than 32 bits in AMDGPUCodeGenPrepare. With the 24-bit mulhi intrinsic, this change teaches AMDGPUCodeGenPrepare to generate the 48-bit mul correctly. Differential Revision: https://reviews.llvm.org/D112395	2021-10-26 18:53:07 +05:30
Abinav Puthan Purayil	9bd5cfeb1f	[AMDGPU] Implement llvm.amdgcn.mulhi.[i,u]24 intrinsics. These intrinsics maps to the 24-bit v_mul_hi instructions. This change also fixes an incorrect assumption on the associativity of 24-bit mulhi in its SDNode record in tblgen. Differential Revision: https://reviews.llvm.org/D112394	2021-10-26 18:53:07 +05:30
Sanjay Patel	2ab0148c14	[x86] use cast instead of dyn_cast for unchecked usage; NFC This was noted as an independent clean-up in D112464.	2021-10-26 08:20:19 -04:00
Neubauer, Sebastian	487f15603e	[AMDGPU] Fix setcc combine for i128 The combine asserted if constants could not be represented as uint64_t. Use APInts to fix this. Differential Revision: https://reviews.llvm.org/D112416	2021-10-26 13:39:50 +02:00
Jay Foad	c8e5aef1a0	[AMDGPU] Use standard MachineBasicBlock::getFallThrough method. NFCI. Differential Revision: https://reviews.llvm.org/D101825	2021-10-26 12:07:54 +01:00
Jonas Paulsson	9f8872779a	[SystemZ] Provide size values for PATCHPOINT, STACKMAP and FENTRY_CALL. All instructions must have a correct size value close to emission when SystemZLongBranch runs, or a necessary branch relaxation may be missed. This patch also adds an assert for instruction sizes in SystemZLongBranch. Review: Ulrich Weigand	2021-10-26 12:07:22 +02:00
Nikita Popov	11a8423dab	[SCEV] Use reverse() (NFC)	2021-10-26 11:08:58 +02:00
Max Kazantsev	9bbfe0f72c	[NFC] Remove obsolete simplifyOnceImpl function The function simplifyOnce only calls simplifyOnceImpl and does nothing else. Having this separate helper makes no sense. Removing it. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D112517 Reviewed By: mkazantsev	2021-10-26 13:51:42 +07:00
Max Kazantsev	d4c74cd4e8	[NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` are loop exits, and each of them unconditionally branches to the same `unreachable` terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. Previously we stored exits' IDoms in a map before peeling a loop and then, after peeling off one iteration, we changed their IDoms. Now we use the same logic not only for exits but for all non-loop blocks dominated by the loop. So when we add logic to support peeling loops with exits which branch, for example, to an unreachable-terminated block, we would update the IDoms not only for exits, but for their successors. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev, nikic	2021-10-26 13:09:07 +07:00
Phoebe Wang	79f9dfef0d	[X86] Move splat addends from the gather/scatter index operand to the base address This can avoid a vector add and a constant pool load. Or an explicit broadcast in case of non-constant. Also reverse the transform any time we encounter a constant index addend that can't be moved to base. In that case pull the constant from base into the index. This reduces code size needed for the displacement since we needed the index add anyway. Limit this to scale of 1 to avoid divisibility and wrap issues. Authored by Craig. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111595	2021-10-26 12:35:57 +08:00
Duncan P. N. Exon Smith	b12a864c29	Bitcode: Use Expected<T>::takeError() and moveInto() more, NFC Avoid naming some Expected<T> values in the Bitcode reader by using takeError() and moveInto() more often. This follows the smaller set of changes included in `2410fb4616`.	2021-10-25 16:03:40 -07:00
Zarko Todorovski	e9163660b1	[PPC][LLVM] Inclusive terms: remove references to sanity check in lib/Target/PowerPC Removed references to `sanity check` in `PPCBranchCoalescing.cpp` code comments. No word substitution made in this case, as the comments and code following illustrated are sufficient IMO. Reviewed By: quinnp Differential Revision: https://reviews.llvm.org/D112452	2021-10-25 18:13:54 -04:00
Craig Topper	d51e3a2139	[LegalizeTypes][TargetLowering] Merge getShiftAmountTyForConstant into TargetLowering::getShiftAmountTy. getShiftAmountTyForConstant is a special helper that changes the shift amount to i32 if the type chosen by TargetLowering::getShiftAmountTy can't represent all possible values. This is needed to satisfy an assert in SelectionDAG::getNode. It requires additional consideration to know when this helper should be used. I'm not sure that we are always using it when we should. This patch merges the getShiftAmountTyForConstant handling into TargetLowering::getShiftAmountTy so we don't need to think about it anymore. Technically this may slightly increase compile times since the majority of callers of getShiftAmountTy won't need this. Hopefully, this isn't an issue in practice. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112469	2021-10-25 14:06:53 -07:00
Nikita Popov	3a995c918e	[SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander Always insert values into ExprValueMap, and instead skip using them in SCEVExpander if poison-generating flags have been lost. This ensures that all values that are in ValueExprMap are also in ExprValueMap, so we can use the latter to invalidate the former. This change is probably not entirely NFC for the case where originally the SCEV had no nowrap flags but they were inferred later, in which case that would now allow reusing the existing value for expansion. Differential Revision: https://reviews.llvm.org/D112389	2021-10-25 22:37:20 +02:00
Arthur Eubanks	4a9db7367d	[AlwaysInliner] Invalidate analyses when we delete functions Fixes PR52292. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D112473	2021-10-25 13:36:32 -07:00
Zarko Todorovski	9769e97c35	[LLVM] Inclusive terms: remove/replace references to sanity in RewriteStatepointsForGC.cpp and test Part of work to have the LLVM backend to use more inclusive terms. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D112461	2021-10-25 16:17:41 -04:00
Stefan Gränitz	cdb335ffaf	[JITLink] Fix warning 'shift count exceeds width' in AArch64 backend	2021-10-25 20:44:07 +02:00
Jeremy Morse	4136897bd4	[DebugInfo][InstrRef][NFC] Switch to using DenseMaps and similar There are a few STL containers hanging around that can become DenseMaps, SmallVectors and similar. This recovers a modest amount of compile time performance. While I'm here, adjust the bit layout of ValueIDNum: this was always supposed to act like a value type, however it seems that clang doesn't compile the comparison functions to act that way. Add a uint64_t to a union that explicitly aliases the bitfields, so that we can compare the whole value as a single integer. Differential Revision: https://reviews.llvm.org/D112333	2021-10-25 18:07:17 +01:00
Wouter van Oortmerssen	5694dbccc3	[WebAssembly] support Memory64 in target_features section Differential Revision: https://reviews.llvm.org/D112266	2021-10-25 09:31:45 -07:00
Jeremy Morse	97ddf49e43	[DebugInfo][InstrRef] Recover stack-slot tracking performance This patch is like D111627 -- instead of calculating IDF for every location on the stack, only do it for the smallest units of interference, and copy the PHIs for those units to any aliases. The test added runs placeMLocPHIs directly, and tests that: * A def of the lower 8 bits of a stack slot causes all aliasing regs to have PHIs placed, * It doesn't cause the equivalent location to x86's $ah, which isn't aliased, to have a PHI placed. Differential Revision: https://reviews.llvm.org/D112324	2021-10-25 17:31:09 +01:00
Philip Reames	f82cf6187f	[indvars] Fix pr52276 (missing one use check) The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. exit test). Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit. Restrict ourselves to case where we have a single use.	2021-10-25 09:26:55 -07:00
Craig Topper	e2b7aabb57	[RISCV] Reduce the number of RISCV vector builtins by an order of magnitude. All but 2 of the vector builtins are only used by clang_builtin_alias. When using clang_builtin_alias, the type string of the builtin is never checked. Only the types in the function definition used for the alias are checked. This patch takes advantage of this to share a single builtin for many different types. We already used type overloads on the IR intrinsic so the codegen for the builtins that are being merge were already the same. This extends the type overloading to the builtins. I had to make a few tweaks to make this work. -Floating point vector-vector vmerge now uses the vmerge intrinsic instead of the vfmerge intrinsic. New isel patterns and tests are added to support this. -The SemaChecking for the immediate of vset_v/vget_v has been removed. Determining the valid range is harder now. I've added masking to ManualCodegen to ensure valid IR for invalid input. This reduces the number of builtins from ~25000 to ~1100. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D112102	2021-10-25 09:03:59 -07:00
Craig Topper	210b586a85	[RISCV] Add vcsr CSR name for V extension. Reviewed By: frasercrmck, kito-cheng Differential Revision: https://reviews.llvm.org/D112342	2021-10-25 08:56:25 -07:00
Danila Malyutin	7b102fcc91	[CodeGen] Fix dependence breaking for tied operands Differential Revision: https://reviews.llvm.org/D107582	2021-10-25 18:52:27 +03:00
Jeremy Morse	ee3eee71e4	[DebugInfo][InstrRef] Track values fused into stack spills During register allocation, some instructions can have stack spills fused into them. It means that when vregs are allocated on the stack we can convert: SETCCr %0 DBG_VALUE %0 to SETCCm %stack.0 DBG_VALUE %stack.0 Unfortunately instruction referencing finds this harder: a store to the stack doesn't have a specific operand number, therefore we don't substitute the old operand for a new operand, and the location is dropped. This patch implements a solution: just recognise the memory operand attached to an instruction with a Special Number (TM), and record a substitution between the old value and the new one. This patch adds substitution code to InlineSpiller to record such fused spills, and tracking in InstrRefBasedLDV to recognise such values, and produce the value numbers for them. Everything to do with the movement of stack-defined values is already handled in InstrRefBasedLDV. Differential Revision: https://reviews.llvm.org/D111317	2021-10-25 15:14:53 +01:00
Danila Malyutin	2d9ee590b6	[AArch64] Handle ST1iN instructions in isAArch64FrameOffsetLegal Before the code would crash with "unhandled opcode in isAArch64FrameOffsetLegal" when there was a spill from extractelement. Fixes pr52249 Differential Revision: https://reviews.llvm.org/D112311	2021-10-25 17:05:12 +03:00
Nikita Popov	0d20ebf686	[BasicAA] Use ranges for more than one index D109746 made BasicAA use range information to determine the minimum/maximum GEP offset. However, it was limited to the case of a single variable index. This patch extends support to multiple indices by adding all the ranges together. Differential Revision: https://reviews.llvm.org/D112378	2021-10-25 15:30:50 +02:00
Alexey Bataev	eb9b75dd4d	[SLP]Change the order of the reduction/binops args pair vectorization attempts. Need to change the order of the reduction/binops args pair vectorization attempts. Need to try to find the reduction at first and postpone vectorization of binops args. This may help to find more reduction patterns and vectorize them. Part of D111574. Differential Revision: https://reviews.llvm.org/D112224	2021-10-25 06:27:14 -07:00
Jeremy Morse	2eb96e1711	[DebugInfo][NFC] Avoid a use-after-free This patch swaps two lines -- the CurSucc reference can be invalidated by the call to DFS.push_back, therefore that should happen last. The usual hat-tip to asan for catching this. This patch also swaps an ealier call to ToAdd.insert and DFS.push_back, where a stable iterator (from successors()) is being used. This isn't strictly necessary, but is good for consistency and avoiding readers asking themselves why the two code portions have a different order.	2021-10-25 14:16:30 +01:00
Sanjay Patel	6e46b66e2a	[DAGCombiner] make matching bit-hack form of usubsat more flexible (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 As suggested in D112085, we can substitute 'xor' with 'add' in this pattern, and it is logically equivalent: https://alive2.llvm.org/ce/z/eJtWWC We canonicalize to 'xor' in IR, but SDAG does not do that (and it probably should not - https://llvm.org/PR52267 ), so it is possible to see either pattern in codegen. Note that 'sub' is a another potential pattern, but that is canonicalized to 'add' in DAGCombiner, so we don't need to worry about that variation. Differential Revision: https://reviews.llvm.org/D112377	2021-10-25 09:01:52 -04:00
Tim Northover	f9089accba	CodeGenPrep: remove all copies of GEP from list if there are duplicates. Unfortunately ToT has changed enough from the revision where this actually caused problems that the test no longer triggers an assertion failure.	2021-10-25 14:00:02 +01:00
Kerry McLaughlin	1f49b71fe5	[SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt This patch enables the use of reciprocal estimates for SVE when both the -Ofast and -mrecip flags are used. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D111657	2021-10-25 11:30:44 +01:00
Max Kazantsev	a9b0776a81	[SimplifyCFG] Sanity assert in iterativelySimplifyCFG We observe a hang within iterativelySimplifyCFG due to infinite loop execution. Currently, there is no limit to this loop, so in case of bug it just works forever. This patch adds an assert that will break it after 1000 iterations if it didn't converge.	2021-10-25 17:10:17 +07:00
Nikita Popov	75384ecdf8	[InstSimplify] Refactor invariant.group load folding Currently strip.invariant/launder.invariant are handled by constructing constant expressions with the intrinsics skipped. This takes an alternative approach of accumulating the offset using stripAndAccumulateConstantOffsets(), with a flag to look through invariant.group intrinsics. Differential Revision: https://reviews.llvm.org/D112382	2021-10-25 10:56:25 +02:00
Florian Hahn	a6c4969f5f	[VPlan] Do not create dummy entry block (NFC). At the moment a dummy entry block is created at the beginning of VPlan construction. This dummy block is later removed again. This means it is not easy to identify the VPlan header block in a general fashion, because during recipe creation it is the single successor of the entry block, while later it is the entry block. To make getting the header easier, just skip creating the dummy block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D111299	2021-10-25 09:52:58 +01:00
Jingu Kang	a502436259	[AArch64] Remove redundant ORRWrs which is generated by zero-extend %3:gpr32 = ORRWrs $wzr, %2, 0 %4:gpr64 = SUBREG_TO_REG 0, %3, %subreg.sub_32 If AArch64's 32-bit form of instruction defines the source operand of ORRWrs, we can remove the ORRWrs because the upper 32 bits of the source operand are set to zero. Differential Revision: https://reviews.llvm.org/D110841	2021-10-25 09:47:07 +01:00
Nikita Popov	477551fd09	[SCEVExpander] Minor cleanup in value reuse (NFC) Use dyn_cast_or_null and convert one of the checks into an assertion. SCEV is a per-function analysis.	2021-10-25 10:32:17 +02:00
Kazu Hirata	3729a5abf4	[SCEV] Fix a warning on an unused lambda capture This patch fixes: llvm/lib/Analysis/ScalarEvolution.cpp:12770:37: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]	2021-10-25 00:45:18 -07:00
Max Kazantsev	f8623b0783	[SCEV][NFC] Win some compile time from mass forgetMemoizedResults Mass forgetMemoizedResults can be done more efficiently than bunch of individual invocations of helper because we can traverse maps being updated just once, rather than doing this for each invidivual SCEV. Should be NFC and supposedly improves compile time. Differential Revision: https://reviews.llvm.org/D112294 Reviewed By: reames	2021-10-25 14:09:41 +07:00
Max Kazantsev	dbab339ea4	[SCEV][NFC] Apply mass forgetMemoizedResults queries where possible When forgetting multiple SCEVs, rather than doing this one by one, we can instead use mass updates. We plan to make them more efficient than they are now, potentially improving compile time. Differential Revision: https://reviews.llvm.org/D111602 Reviewed By: reames	2021-10-25 13:50:49 +07:00
Max Kazantsev	a6096b7f9e	[SCEV][NFC] Introduce API for mass forgetMemoizedResults query This patch changes signature of forgetMemoizedResults to be able to work with multiple SCEVs. Usage will come in follow-ups. We also plan to optimize it in the future to work faster than individual invalidation updates. Should not change behavior in any sense. Split-off from D111602. Differential Revision: https://reviews.llvm.org/D112293 Reviewed By: reames	2021-10-25 13:49:31 +07:00
Max Kazantsev	1c18ebb2cc	[NFC][SCEV] Do not track users of SCEVConstants Follow-up from D112295, suggested by Nikita: we can avoid tracking users of SCEVConstants because dropping their cached info is unlikely to give any new prospects for fact inference, and it should not introduce any correctness problems.	2021-10-25 12:30:46 +07:00
Max Kazantsev	fea4a48c0b	[SCEV][NFC] API for tracking of SCEV users This patch introduces API that keeps track of SCEVs users of another SCEVs, required to handle invalidations of users along with operands that comes in follow-up patches. Differential Revision: https://reviews.llvm.org/D112295 Reviewed By: reames	2021-10-25 12:14:18 +07:00
Chen Zheng	80e6aff6bb	[PowerPC] common chains to reuse offsets to reduce register pressure. Add a new preparation pattern in PPCLoopInstFormPrep pass to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D108750	2021-10-25 03:27:16 +00:00
Kazu Hirata	9800731367	[Target, Transforms] Use predecessors instead of pred_begin and pred_end (NFC)	2021-10-24 17:35:35 -07:00
Kazu Hirata	4bd46501c3	Use llvm::any_of and llvm::none_of (NFC)	2021-10-24 17:35:33 -07:00
Matthias Braun	4b75d674f8	X86InstrInfo: Look across basic blocks in optimizeCompareInstr This extends `optimizeCompareInstr` to continue the backwards search when it reached the beginning of a basic block. If there is a single predecessor block then we can just continue the search in that block and mark the EFLAGS register as live-in. Differential Revision: https://reviews.llvm.org/D110862	2021-10-24 16:22:45 -07:00
Matthias Braun	683994c863	X86InstrInfo: Refactor and cleanup optimizeCompareInstr This changes the first part of `optimizeCompareInstr` being split into a loop with a forward scan for cases that re-use zero flags from a producer in case of compare with zero and a backward scan for finding an instruction equivalent to a compare. The code now uses a single backward scan searching for the next instructions that reads or writes EFLAGS. Also: - Add comments giving examples for the 3 cases handled. - Check `MI` which contains the result of the zero-compare cases, instead of re-checking `IsCmpZero`. - Tweak coding style in some loops. - Add new MIR based tests that test the optimization in isolation. This also removes a check for flag readers in situations like this: ``` = SUB32rr %0, %1, implicit-def $eflags ... we no longer stop when there are $eflag users here CMP32rr %0, %1 ; will be removed ... ``` Differential Revision: https://reviews.llvm.org/D110857	2021-10-24 16:22:45 -07:00
Philip Reames	a461fa64bb	Treat branch on poison as immediate UB (under an off by default flag) The LangRef clearly states that branching on a undef or poison value is immediate undefined behavior, but historically, we have not been consistent about implementing that interpretation in the optimizer. Historically, we used (in some cases) a more relaxed model which essentially looked for provable UB along both paths which was control dependent on the condition. However, we've never been 100% consistent here. For instance SCEV uses the strong model for increments which form AddRecs (and only addrecs). At the moment, the last big blocker for finally making this switch is enabling the fix landed in D106041. Loop unswitching (in it's classic form) is incorrect as it creates many "branch on poisons" when unswitching conditions originally unreachable within the loop. This change adds a flag to value tracking which allows to easily test the optimization potential of treating branch on poison as immediate UB. It's intended to help ease work on getting us finally through this transition and avoid multiple independent rediscovers of the same issues. Differential Revision: https://reviews.llvm.org/D112026	2021-10-24 14:42:03 -07:00
Philip Reames	3c06ecaa1e	[instcombine] Fix oss-fuzz 39934 (mul matcher can match non-instruction) Fixes a crash observed by oss-fuzz in 39934. Issue at hand is that code expects a pattern match on m_Mul to imply the operand is a mul instruction, however mul constexprs are also valid here.	2021-10-24 14:42:03 -07:00
Fangrui Song	54405a49d8	[ARC] Fix -Wunused-variable. NFC	2021-10-24 10:31:44 -07:00
Kazu Hirata	1c35973c77	[llvm] Call *(Set\|Map)::erase directly (NFC) We can erase an item in a set or map without checking its membership first.	2021-10-24 09:32:59 -07:00
Simon Pilgrim	b09f2ee57c	[X86] findEltLoadSrc - fix shift amount variable name. NFCI. Fix the copy + paste, renaming shift amt from Idx to Amt	2021-10-23 21:24:37 +01:00
Nikita Popov	0c7f85d786	[InstSimplify] Simplify fetching of index size (NFC) Directly fetch the size instead of going through the index type first.	2021-10-23 22:08:15 +02:00
Nikita Popov	710596a1e1	[ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI) As this API is now internally offset-based, we can accept a starting offset and remove the need to create a temporary bitcast+gep sequence to perform an offset load. The API now mirrors the ConstantFoldLoadFromConst() API.	2021-10-23 17:59:39 +02:00
Kazu Hirata	d8e4170b0a	Ensure newlines at the end of files (NFC)	2021-10-23 08:45:29 -07:00
Kazu Hirata	d14d7068b6	[llvm] Use StringRef::contains (NFC)	2021-10-23 08:45:27 -07:00
Nikita Popov	c5b5b7f621	[ConstantFolding] Remove ConstantFoldLoadThroughGEPIndices() API (NFC) The last user of this API went away in `4f5e9a2bb2`.	2021-10-23 16:59:29 +02:00
Nikita Popov	4f5e9a2bb2	[SCEV] Remove computeLoadConstantCompareExitLimit() (NFCI) The functionality of this method is already covered by computeExitCountExhaustively() in a more general fashion. It was added at a time when exhaustive exit count calculation did not support constant folding loads yet. I double checked that dropping this code causes no binary changes in test-suite. Differential Revision: https://reviews.llvm.org/D112343	2021-10-23 15:34:25 +02:00
Jessica Clarke	2d8c18fbbd	[X86] Don't add implicit REP prefix to VIA PadLock xstore Commit `8fa3e8fa14` added an implicit REP prefix to all VIA PadLock instructions, but GNU as doesn't add one to xstore, only all the others. This resulted in a kernel panic regression in FreeBSD upon updating to LLVM 11 (https://bugs.freebsd.org/259218) which includes the commit in question. This partially reverts that commit. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112355	2021-10-23 01:57:17 +01:00
Nikita Popov	61cfdf636d	[BasicAA] Model implicit trunc of GEP indices GEP indices larger than the GEP index size are implicitly truncated to the index size. BasicAA currently doesn't model this, resulting in incorrect alias analysis results. Fix this by explicitly modelling truncation in CastedValue in the same way we do zext and sext. Additionally we need to disable a number of optimizations for truncated values, in particular "non-zero" and "non-equal" may no longer hold after truncation. I believe the constant offset heuristic is also not necessarily correct for truncated values, but wasn't able to come up with a test for that one. A possible followup here would be to use the new mechanism to model explicit trunc as well (which should be much more common, as it is the canonical form). This is straightforward, but omitted here to separate the correctness fix from the analysis improvement. (Side note: While I say "index size" above, BasicAA currently uses the pointer size instead. Something for another day...) Differential Revision: https://reviews.llvm.org/D110977	2021-10-22 23:47:02 +02:00
Matt Arsenault	ec57b37551	AMDGPU: Use attributor to propagate amdgpu-flat-work-group-size This can merge the acceptable ranges based on the call graph, rather than the simple application of the attribute. Remove the handling from the old pass.	2021-10-22 16:23:50 -04:00
Matt Arsenault	8d4b74ac3f	AMDGPU: Don't consider whether amdgpu-flat-work-group-size was set It should be semantically identical if it was set to the same value as the default. Also improve the documentation.	2021-10-22 16:23:50 -04:00
Craig Topper	cd824f9e39	[X86] Fix bad formatting. NFC	2021-10-22 13:16:35 -07:00
Duncan P. N. Exon Smith	2410fb4616	Support: Use Expected<T>::moveInto() in a few places These are some usage examples for `Expected<T>::moveInto()`. Differential Revision: https://reviews.llvm.org/D112280	2021-10-22 12:40:10 -07:00
Jay Foad	58e7ec471c	[AMDGPU] Run SIShrinkInstructions before post-RA scheduling Run post-RA SIShrinkInstructions just before post-RA scheduling, instead of afterwards. After the fixes in D112305 and D112317 this seems to make no difference, but it paves the way for scheduler tweaks that are sensitive to the e32 vs e64 encoding of VALU instructions. Differential Revision: https://reviews.llvm.org/D112341	2021-10-22 20:24:03 +01:00
Jay Foad	3f34f75a68	[AMDGPU] Fix latency for implicit vcc_lo operands on GFX10 wave32 As described in the comment, the way we change vcc to vcc_lo in these operands confuses addPhysRegDataDeps into treating them as implicit pseudo operands. Fix this by setting the correct latency from the SchedModel after addPhysRegDataDeps wrongly set it to 0. Differential Revision: https://reviews.llvm.org/D112317	2021-10-22 20:03:29 +01:00
Jay Foad	2915889d74	[ScheduleDAGInstrs] Call adjustSchedDependency in more cases This removes a condition and the corresponding FIXME comment, because the Hexagon assertion it refers to has apparently been fixed, probably by D76134. NFCI. This just gives targets the opportunity to adjust latencies that were set to 0 by the generic code because they involve "implicit pseudo" operands. Differential Revision: https://reviews.llvm.org/D112306	2021-10-22 20:03:29 +01:00
Jeremy Morse	e7084ceab3	[DebugInfo][Instr] Track subregisters across stack spills/restores Sometimes we generate code that writes to a subregister, then spills / restores a super-register to the stack, for example: $eax = MOV32ri 0 MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg This patch takes a different approach: it adds another index to MLocTracker that identifies a size/offset within a stack slot. A location on the stack is then a pari of {FrameIndex, SlotNum}. Spilling and restoring now involves pairing up the src/dest register numbers, and the dest/src stack position to be transferred to/from. Location coverage improves as a result, compile-time performance decreases, alas. One limitation is that if a PHI occurs inside a stack slot: DBG_PHI %stack.0, 1 We don't know how large the resulting value is, and so might have difficulty picking which value to use. DBG_PHI might need to be augmented in the future with such a size. Unit tests added ensure that spills and restores correctly transfer to positions in the Location => Value map, and that different register classes written to the stack will correctly clobber all other positions in the stack slot. Differential Revision: https://reviews.llvm.org/D112133	2021-10-22 19:20:55 +01:00
Craig Topper	93139a3c32	[LegalizeTypes] Only expand CTLZ/CTTZ/CTPOP during type promotion if the new type is legal. We might be promoting a large non-power of 2 type and the new type may need to be split. Once we split it we may have a ctlz/cttz/ctpop instruction for the split type. I'm also concerned that we may create large shifts with shift amounts that are too small.	2021-10-22 11:02:35 -07:00
Simon Pilgrim	a5f56342b0	[DAG] narrowExtractedVectorLoad - EXTRACT_SUBVECTOR indices are always constant EXTRACT_SUBVECTOR indices are always constant, we don't need to check for ConstantSDNode, we should just use getConstantOperandVal which will assert for the constant.	2021-10-22 18:32:14 +01:00
Philip Reames	412eb07edd	[indvars] Use fact loop must exit to canonicalize to unsigned conditions The logic in this patch is that if we find a comparison which would be unsigned except for when the loop is infinite, and we can prove that an infinite loop must be ill defined, we can still make the predicate unsigned. The eventual goal (combined with a follow on patch) is to use the fact the loop exits to remove the zext (see tests) entirely. A couple of points worth noting: * We loose the ability to prove the loop unreachable by committing to the must exit interpretation. If instead, we later proved that rhs was definitely outside the range required for finiteness, we could have killed the loop entirely. (We don't currently implement this transform, but could in theory, do so.) * simplifyAndExtend has a very limited list of users it walks. In particular, in the examples is stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts). D112170 explores fixing that, but - at least so far - appears to be too expensive compile time wise. Differential Revision: https://reviews.llvm.org/D111836	2021-10-22 10:31:36 -07:00
Jeremy Morse	d9eebe3cd7	[DebugInfo][InstrRef] Add unit tests for transfer-function building This patch adds some unit tests for the machine-location transfer-function building parts of InstrRefBasedLDV: i.e., test that if we feed some MIR into the transfer-function building code, does it create the correct transfer function. There are a number of minor defects that get corrected in the process: * The unit test was selecting the x86 (i.e. 32 bit) backend rather than x86_64's 64 bit backend, * COPY instructions weren't actually having their subregister values correctly represented in the transfer function. Subregisters were being defined by the COPY, rather than taking the value in the source register. * SP aliases were at risk of being clobbered, if an SP subregister was clobbered. Differential Revision: https://reviews.llvm.org/D112006	2021-10-22 18:29:03 +01:00
Craig Topper	04c184bba7	[TargetLowering] Simplify the interface of expandABS. NFC Instead of returning a bool to indicate success and a separate SDValue, return the SDValue and have the callers check if it is null. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112331	2021-10-22 10:22:23 -07:00
Nikita Popov	3a10fe2d89	[Loads] Use more powerful constant folding API This follows up on D111023 by exporting the generic "load value from constant at given offset as given type" and using it in the store to load forwarding code. We now need to make sure that the load size is smaller than the store size, previously this was implicitly ensured by ConstantFoldLoadThroughBitcast(). Differential Revision: https://reviews.llvm.org/D112260	2021-10-22 18:33:03 +02:00
Nikita Popov	5bb7562962	[Attributor] Generalize GEP construction Make use of the getGEPIndicesForOffset() helper for creating GEPs. This handles arrays as well, uses correct GEP index types and reduces code duplication. Differential Revision: https://reviews.llvm.org/D112263	2021-10-22 18:30:43 +02:00
Craig Topper	0766aef3f3	[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later. Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive. This is similar to what has already been done to BSWAP and BITREVERSE. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112268	2021-10-22 09:10:01 -07:00
Kazu Hirata	6fe949c4ed	[Target, Transforms] Use StringRef::contains (NFC)	2021-10-22 08:52:33 -07:00
Jonas Paulsson	12b44bf5ee	[SystemZ] Give the EXRL_Pseudo a size value of 6 bytes. This pseudo is expanded very late (AsmPrinter) and therefore has to have a correct size value, or the branch relaxation pass may make a wrong decision. Review: Ulrich Weigand	2021-10-22 17:38:51 +02:00
Bradley Smith	cfe22cd4ef	[AArch64][SVE] Add new ld<n> intrinsics that return a struct of vscale types This will allow us to reuse existing interleaved load logic in lowerInterleavedLoad that exists for neon types, but for SVE fixed types. The goal eventually will be to replace the existing ld<n> intriniscs with these, once a migration path has been sorted out. Differential Revision: https://reviews.llvm.org/D112078	2021-10-22 14:13:17 +00:00
Zarko Todorovski	0bd6a9f2d1	[clang/llvm] Inclusive language: replace segregate with separate	2021-10-22 09:59:35 -04:00
Roman Lebedev	8fac9e95ad	[X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by the fraction of live members By definition, interleaving load of stride N means: load NVF elements, and shuffle them into N VF-sized vectors, with 0'th vector containing elements `[0, VF)stride + 0`, and 1'th vector containing elements `[0, VF)stride + 1`. Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6) Now, not fully interleaved load, is when not all of these vectors is demanded. So at worst, we could just pretend that everything is demanded, and discard the non-demanded vectors. What this means is that the cost for not-fully-interleaved group should be not greater than the cost for the same fully-interleaved group, but perhaps somewhat less. Examples: https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4) https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2) https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1) Right now, for such not-fully-interleaved loads we just use the costs for fully-interleaved loads. But at least in general, that is obviously overly pessimistic, because in general*, not all the shuffles needed to perform the full interleaving will end up being live. So what this does, is naively scales the interleaving cost by the fraction of the live members. I believe this should still result in the right ballpark cost estimate, although it may be over/under -estimate. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112307	2021-10-22 16:33:58 +03:00
Jay Foad	74cd4dee20	[AMDGPU] Preserve deadness of vcc when shrinking instructions This doesn't have any effect on codegen now, but it might do in the future if we shrink instructions before post-RA scheduling, which is sensitive to live vs dead defs. Differential Revision: https://reviews.llvm.org/D112305	2021-10-22 14:22:24 +01:00
Simon Pilgrim	a750332d77	AMDGPULibCalls - constify some FuncInfo& arguments. NFCI.	2021-10-22 12:10:58 +01:00
Simon Pilgrim	99a64cc9da	AMDGPULibCalls::parseFunctionName - use reference instead of pointer. NFCI. parseFunctionName allowed a default null pointer, despite it being dereferenced immediately to be used as a reference and that all callers were taking the address of an existing reference. Fixes static analyzer warning about potential dereferenced nulls	2021-10-22 11:45:25 +01:00
Michał Górny	66e06cc8cb	[llvm] [ADT] Update llvm::Split() per Pavel Labath's suggestions Optimize the iterator comparison logic to compare Current.data() pointers. Use std::tie for assignments from std::pair. Replace the custom class with a function returning iterator_range. Differential Revision: https://reviews.llvm.org/D110535	2021-10-22 12:27:46 +02:00
Florian Hahn	d465315679	[LLVM-C]Add LLVMAddMetadataToInst, deprecated LLVMSetInstDebugLocation. IRBuilder has been updated to support preserving metdata in a more general manner. This patch adds `LLVMAddMetadataToInst` and deprecates `LLVMSetInstDebugLocation` in favor of the more general function. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D93454	2021-10-22 11:21:28 +01:00
Fraser Cormack	74c6895b39	[RISCV] Fix missing cross-block VSETVLI insertion This patch fixes a codegen bug, the test for which was introduced in D112223. When merging VSETVLIInfo across blocks, if the 'exit' VSETVLIInfo produced by a block is found to be compatible with the VSETVLIInfo computed as the intersection of the 'exit' VSETVLIInfo produced by the block's predecessors, that blocks' 'exit' info is discarded and the intersected value is taken in its place. However, we have one authority on what constitutes VSETVLIInfo compatibility and we are using it in two different contexts. Compatibility is used in one context to elide VSETVLIs between straight-line vector instructions. But compatibility when evaluated between two blocks' exit infos ignores any info produced inside each respective block before the exit points. As such it does not guarantee that a block will not produce a VSETVLI which is incompatible with the 'previous' block. As such, we must ensure that any merging of VSETVLIInfo is performed using some notion of "strict" compatibility. I've defined this as a full vtype match, but this is perhaps too pessimistic. Given that test coverage in this regard is lacking -- the only change is in the failing test -- I think this is a good starting point. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D112228	2021-10-22 10:45:10 +01:00
Chen Zheng	86a5c32616	[PowerPC] iterate on the SmallSet directly; NFC	2021-10-22 06:18:07 +00:00
Chen Zheng	13755436bb	[PowerPC] return early if there is no preparing candidate in the loop; NFC This is to improve compiling time. Differential Revision: https://reviews.llvm.org/D112196 Reviewed By: jsji	2021-10-22 05:39:51 +00:00
Chuanqi Xu	ddbf196194	[Coroutines] Ignore partial lifetime markers refer of an alloca When I playing with Coroutines, I found that it is possible to generate following IR: ``` %struct = alloca ... %sub.element = getelementptr %struct, i64 0, i64 index ; index is not %zero lifetime.marker.start(%sub.element) % use of %sub.element lifetime.marker.end(%sub.element) store %struct to xxx ; %struct is escaping! <suspend points> ``` Then the AllocaUseVisitor would collect the lifetime marker for sub.element and treat it as the lifetime markers of the alloca! So it judges that the alloca could be put on the stack instead of the frame by judging the lifetime markers only. The root cause for the bug is that AllocaUseVisitor collects wrong lifetime markers. This patch fixes this. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D112216	2021-10-22 09:49:50 +08:00
Vitaly Buka	b7ea298dfd	[msan] Don't use TLS slots of noundef args Transformations may strip the attribute from the argument, e.g. for unused, which will result in shadow offsets mismatch between caller and callee. Stripping noundef for used arguments can be a problem, as TLS is not going to be set by caller. However this is not the goal of the patch and I am not aware if that's even possible. Differential Revision: https://reviews.llvm.org/D112197	2021-10-21 18:35:12 -07:00
Stanislav Mekhanoshin	ca0c92d6a1	[AMDGPU] Allow to use a whole register file on gfx90a for VGPRs In a kernel which does not have calls or AGPR usage we can allocate the whole vector register budget for VGPRs and have no AGPRs as long as VGPRs stay addressable (i.e. below 256). Differential Revision: https://reviews.llvm.org/D111764	2021-10-21 18:24:34 -07:00
Luís Ferreira	2e97236aac	[Demangle] Rename OutputStream to OutputString This patch is a refactor to implement prepend afterwards. Since this changes a lot of files and to conform with guidelines, I will separate this from the implementation of prepend. Related to the discussion in https://reviews.llvm.org/D111414 , so please read it for more context. Reviewed By: #libc_abi, dblaikie, ldionne Differential Revision: https://reviews.llvm.org/D111947	2021-10-21 17:34:57 -07:00
Jack Anderson	d7733f8422	[DebugInfo] Expand ability to load 2-byte addresses in dwarf sections Some dwarf loaders in LLVM are hard-coded to only accept 4-byte and 8-byte address sizes. This patch generalizes acceptance into `DWARFContext::isAddressSizeSupported` and provides a common way to generate rejection errors. The MSP430 target has been given new tests to cover dwarf loading cases that previously failed due to 2-byte addresses. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D111953	2021-10-21 17:31:00 -07:00
Craig Topper	996123e5e8	[TargetLowering] Simplify the interface for expandCTPOP/expandCTLZ/expandCTTZ. There is no need to return a bool and have an SDValue output parameter. Just return the SDValue and let the caller check if it is null. I have another patch to add more callers of these so I thought I'd clean up the interface first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112267	2021-10-21 15:35:28 -07:00
Craig Topper	ff37b1105d	[LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG. By expanding early it allows the shifts to be custom lowered in LegalizeVectorOps. Then a DAG combine is able to run on them before LegalizeDAG handles the BUILD_VECTORS for the masks used. v16Xi8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before SHL ops and after SRL ops. This was done to share the same mask constant for both shifts. It looks like this patch allows DAG combine to remove the AND mask added after v16i8 SHL by X86 lowering. This maintains the mask sharing that BITREVERSE was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead which required an extra constant pool or a PANDN to invert it. This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112254	2021-10-21 15:23:23 -07:00
Craig Topper	458ed5fcc3	[TargetLowering][RISCV] Prevent scalarization of fixed vector bswap. It's better to do the ands, shifts, ors in the vector domain than to scalarize it and do those operations on each element. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112248	2021-10-21 14:34:01 -07:00
Jessica Paquette	5dc339d982	[AArch64][GlobalISel] Fold 64-bit cmps with 64-bit adds G_ICMP is selected to an arithmetic overflow op (ADDS/SUBS/etc) with a dead destination + a CSINC instruction. We have a fold which allows us to combine 32-bit adds with G_ICMP. The problem with G_ICMP is that we model it as always having a 32-bit destination even though it can be a 64-bit operation. So, we were missing some opportunities for 64-bit folds. This patch teaches the fold to recognize 64-bit G_ICMPs + refactors some of the code surrounding CSINC accordingly. (Later down the line, I think we should probably change the way we handle G_ICMP in general.) Differential Revision: https://reviews.llvm.org/D111088	2021-10-21 13:51:44 -07:00
Yonghong Song	0472e83ffc	BPF: emit BTF_KIND_DECL_TAG for typedef types If a typedef type has __attribute__((btf_decl_tag("str"))) with bpf target, emit BTF_KIND_DECL_TAG for that type in the BTF. Differential Revision: https://reviews.llvm.org/D112259	2021-10-21 12:09:42 -07:00
Nikita Popov	1848525842	[CodeMetrics] Don't require speculatability for ephemeral values As discussed in D112016, our current requirement of speculatability for ephemeral is overly strict: What we really care about is that the instruction will be DCEd once the assume is dropped. For that it is sufficient that the instruction is side-effect free and not a terminator. In particular, this allows non-dereferenceable loads to be ephemeral values. Differential Revision: https://reviews.llvm.org/D112179	2021-10-21 20:30:01 +02:00
Craig Topper	d55be79d75	[RISCV] Expand scalable vector CTTZ/CTLZ/CTPOP. Differential Revision: https://reviews.llvm.org/D112233	2021-10-21 10:50:04 -07:00
Arthur Eubanks	3781a46c3c	Revert "[IPT] Restructure cache to allow lazy update following invalidation [NFC]" This reverts commit `baea663a6e`. Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/77/builds/10715.	2021-10-21 10:48:41 -07:00
Sanjay Patel	66d22b4da4	[VectorCombine] fold shuffle-of-binops with common operand shuf (bo X, Y), (bo X, W) --> bo (shuf X), (shuf Y, W) This is motivated by an example in D111800 (although that patch avoids the problem for that particular example). The pattern is shown in reduced form with: https://llvm.org/PR52178 https://alive2.llvm.org/ce/z/d8zB4D There is no difference on the PhaseOrdering test from D111800 because the aarch64 cost model says that the shuffle cost is 3 while the fadd cost is 2. Differential Revision: https://reviews.llvm.org/D111901	2021-10-21 12:37:54 -04:00
Philip Reames	baea663a6e	[IPT] Restructure cache to allow lazy update following invalidation [NFC] This change restructures the cache used in IPT to point not to the first special instruction, but to the first instruction which could be special. That is, the cached reference is always equal to the first special, or comes before it in the block. This avoids expensive block scans when we are removing special instructions from the beginning of the block. At the moment, this case is not heavily used, though it does trigger in GVN when doing CSE of calls. The main motivation was a change I'm no longer planning to move forward with, but the cache optimization seemed worthwhile as a minor perf win at low cost. Differential Revision: https://reviews.llvm.org/D111768	2021-10-21 09:16:21 -07:00
Yonghong Song	f6811cec84	[DebugInfo] Support typedef with btf_decl_tag attributes Clang patch ([1]) added support for btf_decl_tag attributes with typedef types. This patch added llvm support including dwarf generation. For example, for typedef typedef unsigned * __u __attribute__((btf_decl_tag("tag1"))); __u u; the following shows llvm-dwarfdump result: 0x00000033: DW_TAG_typedef DW_AT_type (0x00000048 "unsigned int *") DW_AT_name ("__u") DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c") DW_AT_decl_line (1) 0x0000003e: DW_TAG_LLVM_annotation DW_AT_name ("btf_decl_tag") DW_AT_const_value ("tag1") 0x00000047: NULL [1] https://reviews.llvm.org/D110127 Differential Revision: https://reviews.llvm.org/D110129	2021-10-21 08:42:58 -07:00
Sanjay Patel	3888de9507	[InstCombine] generalize reassociated Demorgan folds This updates the recent D112108 / `b92412fb28` to handle the flipped logic ('or') sibling: https://alive2.llvm.org/ce/z/Y2L6Ch	2021-10-21 10:39:29 -04:00
Anirudh Prasad	aa3519f178	[SystemZ][z/OS] Initial implementation for lowerCall on z/OS - This patch provides the initial implementation for lowering a call on z/OS according to the XPLINK64 calling convention - A series of changes have been made to SystemZCallingConv.td to account for these additional XPLINK64 changes including adding a new helper function to shadow the stack along with allocation of a register wherever appropriate - For the cases of copying a f64 to a gr64 and a f128 / 128-bit vector type to a gr64, a `CCBitConvertToType` has been added and has been bitcasted appropriately in the lowering phase - Support for the ADA register (R5) will be provided in a later patch. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D111662	2021-10-21 09:48:59 -04:00
Sanjay Patel	d2198771e9	[DAGCombiner] fold bit-hack form of usubsat (i8 X ^ 128) & (i8 X s>> 7) --> usubsat X, 128 I haven't found a generalization of this identity: https://alive2.llvm.org/ce/z/_sriEQ Note: I was actually looking at the first form of the pattern in that link, but that's part of a long chain of potential missed transforms in codegen and IR....that I hope ends here! The predicates for when this is profitable are a bit tricky. This version of the patch excludes multi-use but includes custom lowering (as opposed to legal only). On x86 for example, we have custom lowering for some vector types, and that uses umax and sub. So to enable that fold, we need add use checks to avoid regressions. Even with legal-only lowering, we could see code with extra reg move instructions for extra uses, so that constraint would have to be eased very carefully to avoid penalties. Differential Revision: https://reviews.llvm.org/D112085	2021-10-21 09:47:19 -04:00
Alexey Bataev	3ea7877c8b	[SLP]Unify vectorization of PHI and store nodes with improved tiny tree vectorization. Vectorization of PHIs and stores very similar, it might be beneficial to try to revectorize stores (like PHIs) if the total number of stores with the same/alternate opcode is less than the vector size but number of stores with the same type is larger than the vector size. Differential Revision: https://reviews.llvm.org/D109831	2021-10-21 06:25:32 -07:00
Kerry McLaughlin	0d153df69e	[SVE] Fix selection failure when splitting extended masked loads When splitting a masked load, `GetDependentSplitDestVTs` is used to get the MemVTs of the high and low parts. If the masked load is extended, this may return VTs with different element types which are used to create the high & low masked load instructions. This patch changes `GetDependentSplitDestVTs` to ensure we return VTs with the same element type. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D111996	2021-10-21 13:04:38 +01:00
YunQiang Su	302a165e18	[MIPS] Fix switching between 32/64-bit variants of r6 target triples If clang driver gets 64-bit r6 target triple like `mipsisa64r6` and additional option forces switching to generation of 32-bit code, it loses r6 abi and generates 32-bit r2-r5 abi code. ``` $ clang -target mipsisa64r6-linux-gnu -mabi=32 ``` This patch fixes the problem. - Add optional `SubArchType` argument to the `Triple::setArch()` method. - Implement generation of mips r6 target triples in the `Triple::getArchName()` method. Differential Revision: https://reviews.llvm.org/D110514.diff	2021-10-21 15:04:07 +03:00
Dawid Jurczak	9ba5bb4309	[NFC][LoopIdiom] Make for loops more readable Patch simplifies for loops in LIR following LLVM guidelines: https://llvm.org/docs/CodingStandards.html#use-range-based-for-loops-wherever-possible. Differential Revision: https://reviews.llvm.org/D112077	2021-10-21 12:17:44 +02:00
Evgeniy Brevnov	1a8ec24efb	[NARY-REASSOCIATE][NFC] Simplify min/max handling In order to explore different variants of reassociation current implementation uses "swap in a loop" approach. Unfortunately, the implementation is more complicated than it could be. This is an attempt to streamline the code. New approach is to extract core functionality into a helper function and call it explicitly as many times as required. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112128	2021-10-21 15:45:53 +07:00
David Sherwood	9448cdc900	[SVE][Analysis] Tune the cost model according to the tune-cpu attribute This patch introduces a new function: AArch64Subtarget::getVScaleForTuning that returns a value for vscale that can be used for tuning the cost model when using scalable vectors. The VScaleForTuning option in AArch64Subtarget is initialised according to the following rules: 1. If the user has specified the CPU to tune for we use that, else 2. If the target CPU was specified we use that, else 3. The tuning is set to "generic". For CPUs of type "generic" I have assumed that vscale=2. New tests added here: Analysis/CostModel/AArch64/sve-gather.ll Analysis/CostModel/AArch64/sve-scatter.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D110259	2021-10-21 09:33:50 +01:00
Vitaly Buka	6742c8a2d8	[NFC][msan] Break the loop when done We have nothing to do after the Argument is found.	2021-10-20 21:08:12 -07:00
Arthur Eubanks	6ea7437ca5	[SelectionDAG] Bail out of mergeTruncStores when not optimizing With unoptimized code, we may see lots of stores and spend too much time in mergeTruncStores. Fixes PR51827. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D111596	2021-10-20 16:58:22 -07:00
Sanjay Patel	40163f1df8	[x86] add special-case lowering for usubsat for AVX512 This is a small extension of D112095 to avoid another regression seen with D112085. In this case, we allow the same conversion from usubsat to ALU ops if the target supports vpternlog. That pattern will get converted later in X86DAGToDAGISel::tryVPTERNLOG(). This seems better than putting a magic immediate constant directly in this code to create the exact vpternlog that we need. It's possible that there are other special-cases along these lines, so we should try to keep all of the vpternlog magic in one place. Differential Revision: https://reviews.llvm.org/D112138	2021-10-20 16:41:13 -04:00
Stanislav Mekhanoshin	b92412fb28	[InstCombine] Fold `(a & ~b) & ~c` to `a & ~(b \| c)` %not1 = xor i32 %b, -1 %not2 = xor i32 %c, -1 %and1 = and i32 %a, %not1 %and2 = and i32 %and1, %not2 => %i1 = or i32 %b, %c %i2 = xor i32 %1, -1 %and2 = and i32 %i2, %a Differential Revision: https://reviews.llvm.org/D112108	2021-10-20 13:05:46 -07:00
Florian Hahn	8977bd5806	[IndVars] Invalidate SCEV when IR is changed in rewriteLoopExitValue. At the moment, rewriteLoopExitValue forgets the current phi node in the loop that collects phis to rewrite. A few lines after the value is forgotten, SCEV is used again to analyze incoming values and potentially expand SCEV expression. This means that another SCEV is created for PN, before the IR is actually updated in the next loop. This leads to accessing invalid cached expression in combination with D71539. PN should only be changed once the actual incoming exit value is set in the next loop. Moving invalidation there should ensure that PN is invalidated in all relevant cases. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D111495	2021-10-20 20:48:33 +01:00
Jon Roelofs	b046eb19b8	[AArch64][GlobalISel] combine (and (or x, c1), c2) => (and x, c2) iff c1 & c2 == 0 https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111856	2021-10-20 12:11:52 -07:00
Stanislav Mekhanoshin	c80d8a8cea	[AMDGPU] MachineLICM cannot hoist VALU MachineLoop::isLoopInvariant() returns false for all VALU because of the exec use. Check TII::isIgnorableUse() to allow hoisting. That unfortunately results in higher register consumption since MachineLICM does not adequately estimate pressure. Therefor I think it shall only be enabled after D107677 even though it does not depend on it. Differential Revision: https://reviews.llvm.org/D107859	2021-10-20 11:47:24 -07:00
Stanislav Mekhanoshin	6185835656	[AMDGPU] Allow rematerialization of SOP with virtual registers D106408 was doing this for all targets although it was reverted due to couple performance regressions on some targets. The difference for AMDGPU is the ability to rematerialize SOP instructions with virtual register uses like we already do for VOP. Differential Revision: https://reviews.llvm.org/D110743	2021-10-20 11:46:50 -07:00
Leonard Grey	5d57578a4e	[MC] Recursively calculate symbol offset This is speculative since I'm not sure if there's some implicit contract that a variable symbol must not have another variable symbol in its evaluation tree. Downstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=471146#c23. Test is based on alias.s (removed checks since we just need to know it didn't crash). Differential Revision: https://reviews.llvm.org/D109109	2021-10-20 14:29:43 -04:00
Sanjay Patel	80ab06c599	[InstCombine] fold fake vector insert to bit-logic bitcast (inselt (bitcast X), Y, 0) --> or (and X, MaskC), (zext Y) https://alive2.llvm.org/ce/z/Ux-662 Similar to D111082 / `db231ebdb0` : We want to avoid relatively opaque vector ops on types that are likely supported by the backend as scalar integers. The bitwise logic ops are more likely to allow further combining. We probably want to generalize this to allow a shift too, but that would oppose instcombine's general rule of not creating extra instructions, so that's left as a potential follow-up. Alternatively, we could do that transform in VectorCombine with the help of the TTI cost model. This is part of solving: https://llvm.org/PR52057	2021-10-20 14:21:40 -04:00
Arthur Eubanks	00500d5bad	[NFC] De-template LazyCallGraph::visitReferences() and move into .cpp file This makes changing it and recompiling it much faster.	2021-10-20 10:50:00 -07:00
Itay Bookstein	08ed216000	[IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol As discussed in: * https://reviews.llvm.org/D94166 * https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else. In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases. This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR. I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D108872	2021-10-20 10:29:47 -07:00
Zhi An Ng	e1fb13401e	[WebAssembly] Add prototype relaxed float min max instructions Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112146	2021-10-20 09:41:51 -07:00
Fraser Cormack	eabf11f9ea	[CodeGenPrepare] Avoid a scalable-vector crash in ctlz/cttz This patch fixes a crash when despeculating ctlz/cttz intrinsics with scalable-vector types. It is not safe to speculatively get the size of the vector type in bits in case the vector type is not a fixed-length type. As it happens this isn't required as vector types are skipped anyway. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112141	2021-10-20 16:45:55 +01:00
Craig Topper	fe1f0de003	[RISCV][WebAssembly][TargetLowering] Allow expandCTLZ/expandCTTZ to rely on CTPOP expansion for vectors. Our fallback expansion for CTLZ/CTTZ relies on CTPOP. If CTPOP isn't legal or custom for a vector type we would scalarize the CTLZ/CTTZ. This is different than CTPOP itself which would use a vector expansion. This patch teaches expandCTLZ/CTTZ to rely on the vector CTPOP expansion instead of scalarizing. To do this I had to add additional checks to make sure the operations used by CTPOP expansions are all supported. Some of the operations were already needed for the CTLZ/CTTZ expansion. This is a huge improvement to the RISCV which doesn't have a scalar ctlz or cttz in the base ISA. For WebAssembly, I've added Custom lowering to keep the scalarizing behavior. I've also extended the scalarizing to CTPOP. Differential Revision: https://reviews.llvm.org/D111919	2021-10-20 07:46:41 -07:00
Sanjay Patel	3efd2a0bec	[x86] make helper for useVPTERNLOG; NFC See D112085 for another use case.	2021-10-20 10:26:53 -04:00
Jeremy Morse	89950ade21	[DebugInfo][InstrRef] Track a single variable at a time Here's another performance patch for InstrRefBasedLDV: rather than processing all variable values in a scope at a time, instead, process one variable at a time. The benefits are twofold: * It's easier to reason about one variable at a time in your mind, * It improves performance, apparently from increased locality. The downside is that the value-propagation code gets indented one level further, plus there's some churn in the unit tests. Differential Revision: https://reviews.llvm.org/D111799	2021-10-20 15:03:52 +01:00
Sander de Smalen	be6c8dc765	[SelectionDAG] Fix getVectorSubVecPointer for scalable subvectors. When inserting a scalable subvector into a scalable vector through the stack, the index to store to needs to be scaled by vscale. Before this patch, that didn't yet happen, so it would generate the wrong offset, thus storing a subvector to the incorrect address and overwriting the wrong lanes. For some insert: nxv8f16 insert_subvector(nxv8f16 %vec, nxv2f16 %subvec, i64 2) The offset was not scaled by vscale: orr x8, x8, #0x4 st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8] ld1h { z0.h }, p0/z, [sp] And is changed to: mov x8, sp st1h { z0.h }, p0, [sp] st1h { z1.d }, p1, [x8, #1, mul vl] ld1h { z0.h }, p0/z, [sp] Differential Revision: https://reviews.llvm.org/D111633	2021-10-20 13:55:24 +01:00
Simon Pilgrim	5b395bd633	[CostModel][X86] Add costs for multiply-by-pow2 constants These are folded to left shifts in the backend. We should be able to extend this for multiply-by-negpow2 after D111968 has landed to resolve PR51436	2021-10-20 13:11:21 +01:00
Simon Pilgrim	9fc523d114	[X86] Remove X86ProcFamilyEnum::IntelSLM Replace X86ProcFamilyEnum::IntelSLM enum with a TuningUseSLMArithCosts flag instead, matching what we already do for Goldmont. This just leaves X86ProcFamilyEnum::IntelAtom to replace with general Tuning/Feature flags and we can finally get rid of the old X86ProcFamilyEnum enum. Differential Revision: https://reviews.llvm.org/D112079	2021-10-20 11:58:39 +01:00
Daniel Kiss	f903c85055	[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions. autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions therefore let's emit .cfi_negate_ra_state for these too. In case of Armv8.3 instruction set the retaa/retbb will do the return and authentication in one step here we can't emit the . cfi_negate_ra_state because that would be point after the ret* instruction. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D111780	2021-10-20 11:03:52 +02:00
Joerg Sonnenberger	ec428f7b78	[SPARC] Recognize the prefetch instruction Reviewed By: LemonBoy Differential Revision: https://reviews.llvm.org/D96311	2021-10-20 10:59:01 +02:00
Paulo Matos	6d0c7bc17d	[WebAssembly] Implementation of table.get/set for reftypes in LLVM IR This change implements new DAG nodes TABLE_GET/TABLE_SET, and lowering methods for load and stores of reference types from IR arrays. These global LLVM IR arrays represent tables at the Wasm level. Differential Revision: https://reviews.llvm.org/D111154	2021-10-20 10:31:31 +02:00
Zi Xuan Wu	de10a02fc0	[CSKY] Complete to add basic integer instruction set Complete the basic integer instruction set and add related predictor in CSKY.td. And it includes the instruction definition and asm parser support. Differential Revision: https://reviews.llvm.org/D111701	2021-10-20 15:50:44 +08:00
Evgeniy Brevnov	269f563a2b	[NARY-REASSOCIATE] Fix infinite recursion optimizing min\max To guarantee convergence of the algorithm each optimization step should decrease number of instructions when IR is modified. This property is not held in this test case. The problem is that SCEV Expander may do "unexpected" reassociation what results in creation of new min/max chains and introduction of extra instructions. As a result on each step we indefinitely optimize back and forth. The solution is to restrict SCEV Expander to perform uncontrolled reassociations by means of "Unknown" expressions. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112060	2021-10-20 14:23:03 +07:00
Zhi An Ng	2542bfa43a	[WebAssembly] Add prototype relaxed swizzle instructions Add i8x16 relaxed_swizzle instructions. These are only exposed as builtins, and require user opt-in. Differential Revision: https://reviews.llvm.org/D112022	2021-10-19 17:53:04 -07:00
Yonghong Song	cd40b5a712	BPF: set .BTF and .BTF.ext section alignment to 4 Currently, .BTF and .BTF.ext has default alignment of 1. For example, $ cat t.c int foo() { return 0; } $ clang -target bpf -O2 -c -g t.c $ llvm-readelf -S t.o ... Section Headers: [Nr] Name Type Address Off Size ES Flg Lk Inf Al ... [ 7] .BTF PROGBITS 0000000000000000 000167 00008b 00 0 0 1 [ 8] .BTF.ext PROGBITS 0000000000000000 0001f2 000050 00 0 0 1 But to have no misaligned data access, .BTF and .BTF.ext actually requires alignment of 4. Misalignment is not an issue for architecture like x64/arm64 as it can handle it well. But some architectures like mips may incur a trap if .BTF/.BTF.ext is not properly aligned. This patch explicitly forced .BTF and .BTF.ext alignment to be 4. For the above example, we will have [ 7] .BTF PROGBITS 0000000000000000 000168 00008b 00 0 0 4 [ 8] .BTF.ext PROGBITS 0000000000000000 0001f4 000050 00 0 0 4 Differential Revision: https://reviews.llvm.org/D112106	2021-10-19 16:26:01 -07:00
Artem Belevich	b6b7fe60a4	[NVPTX] Add a late SROA pass which allows optimizing away more allocas. Fixes performance regression https://bugs.llvm.org/show_bug.cgi?id=52037 Differential Revision: https://reviews.llvm.org/D111471	2021-10-19 16:18:28 -07:00
Yuta Saito	1813fde9cc	[WebAssembly] Emit clangast in custom section aligned by 4 bytes Emit __clangast in custom section instead of named data segment to find it while iterating sections. This could be avoided if all data segements (the wasm sense) were represented as their own sections (in the llvm sense). This can be resolved by https://github.com/WebAssembly/tool-conventions/issues/138 And the on-disk hashtable in clangast needs to be aligned by 4 bytes, so add paddings in name length field in custom section header. The length of clangast section name can be represented in 1 byte by leb128, and possible maximum pads are 3 bytes, so the section name length won't be invalid in theory. Fixes https://bugs.llvm.org/show_bug.cgi?id=35928 Differential Revision: https://reviews.llvm.org/D74531	2021-10-19 15:50:08 -07:00
Sanjay Patel	92a0389b04	[x86] add special-case lowering for usubsat for pre-SSE4 usubsat X, SMIN --> (X ^ SMIN) & (X s>> BW-1) This would be a regression with D112085 where we combine to usubsat more aggressively, so avoid that by matching the special-case where we are subtracting SMIN (signmask): https://alive2.llvm.org/ce/z/4_3gBD Differential Revision: https://reviews.llvm.org/D112095	2021-10-19 17:13:16 -04:00
Bjorn Pettersson	9c44a0996c	[SCEV] Fix formatting error introduced by D112080 Accidentally pushed D112080 without this clang-format cleanup.	2021-10-19 21:44:07 +02:00
Bjorn Pettersson	08619006a0	[SCEV] Avoid compile time explosion in ScalarEvolution::isImpliedCond As seen in PR51869 the ScalarEvolution::isImpliedCond function might end up spending lots of time when doing the isKnownPredicate checks. Calling isKnownPredicate for example result in isKnownViaInduction being called, which might result in isLoopBackedgeGuardedByCond being called, and then we might get one or more new calls to isImpliedCond. Even if the scenario described here isn't an infinite loop, using some random generated C programs as input indicates that those isKnownPredicate checks quite often returns true. On the other hand, the third condition that needs to be fulfilled in order to "prove implications via truncation", i.e. the isImpliedCondBalancedTypes check, is rarely fulfilled. I also made some similar experiments to look at how often we would get the same result when using isKnownViaNonRecursiveReasoning instead of isKnownPredicate. So far I haven't seen a single case when codegen is negatively impacted by using isKnownViaNonRecursiveReasoning. On the other hand, it seems like we get rid of the compile time explosion seen in PR51869 that way. Hence this patch. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D112080	2021-10-19 21:37:57 +02:00
Philip Reames	0836a1059d	Extend transform introduced in D111896 to multiple exits This is trivial. It was left out of the original review only because we had multiple copies of the same code in review at the same time, and keeping them in sync was easiest if the structure was kept in sync.	2021-10-19 12:12:19 -07:00
Philip Reames	fca0218875	[indvars] Canonicalize exit conditions to unsigned using range info This patch duplicates a bit of logic we apply to comparisons encountered during the IV users walk to conditions which feed exit conditions. Why? simplifyAndExtend has a very limited list of users it walks. In particular, in the examples is stops at the zext and never visits the icmp. (Because we can't fold the zext to an addrec yet in SCEV.) Being willing to visit when we haven't simplified regresses multiple tests (seemingly because of less optimal results when computing trip counts). Note that this can be trivially extended to multiple exiting blocks. I'm leaving that to a future patch (solely to cut down on the number of versions of the same code in review at once.) Differential Revision: https://reviews.llvm.org/D111896	2021-10-19 11:49:12 -07:00
Anna Thomas	9403514e76	[LoopPredication] Calculate profitability without BPI Using BPI within loop predication is non-trivial because BPI is only preserved lossily in loop pass manager (one fix exposed by lossy preservation is up for review at D111448). However, since loop predication is only used in downstream pipelines, it is hard to keep BPI from breaking for incomplete state with upstream changes in BPI. Also, correctly preserving BPI for all loop passes is a non-trivial undertaking (D110438 does this lossily), while the benefit of using it in loop predication isn't clear. In this patch, we rely on profile metadata to get almost similar benefit as BPI, without actually using the complete heuristics provided by BPI. This avoids the compile time explosion we tried to fix with D110438 and also avoids fragile bugs because BPI can be lossy in loop passes (D111448). Reviewed-By: asbirlea, apilipenko Differential Revision: https://reviews.llvm.org/D111668	2021-10-19 14:24:04 -04:00
Arthur Eubanks	ac0561ebb7	[Verifier] Add context for assume operand bundles verifier errors And fix a typo.	2021-10-19 09:52:04 -07:00
Jamie Schmeiser	3af474c0a1	Changes to print-changed classes in preparation for DotCfg change printer Summary: Break out non-functional changes to the print-changed classes that are needed for reuse with the DotCfg change printer in https://reviews.llvm.org/D87202. Various changes to the change printers to facilitate reuse with the upcoming DotCfg change printer. This includes changing several of the classes and their support classes to being templates. Also, some template parameter names were simplified to avoid confusion with planned identifiers in the DotCfg change printer to come. A virtual function in the class for comparing functions was changed to a lambda. The virtual function same was replaced with calls to operator==. The only intentional functional change was to add the exe name as the first parameter to llvm::sys::ExecuteAndWait Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D110737	2021-10-19 10:58:40 -04:00
David Sherwood	5ea35791e6	[AArch64] Split out processor/tuning features Following on from an earlier patch that introduced support for -mtune for AArch64 backends, this patch splits out the tuning features from the processor features. This gives us the ability to enable architectural feature set A for a given processor with "-mcpu=A" and define the set of tuning features B with "-mtune=B". It's quite difficult to write a test that proves we select the right features according to the tuning attribute because most of these relate to scheduling. I have created a test here: CodeGen/AArch64/misched-fusion-addr-tune.ll that demonstrates the different scheduling choices based upon the tuning. Differential Revision: https://reviews.llvm.org/D111551	2021-10-19 15:18:55 +01:00
David Sherwood	607fb1bb8c	[AArch64] Always add -tune-cpu argument to -cc1 driver This patch ensures that we always tune for a given CPU on AArch64 targets when the user specifies the "-mtune=xyz" flag. In the AArch64Subtarget if the tune flag is unset we use the CPU value instead. I've updated the release notes here: llvm/docs/ReleaseNotes.rst and added tests here: clang/test/Driver/aarch64-mtune.c Differential Revision: https://reviews.llvm.org/D110258	2021-10-19 14:57:51 +01:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos..... Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Jeremy Morse	849b17949f	[DebugInfo][InstrRef] Avoid un-necessary densemap copies and comparisons This is purely a performance patch: InstrRefBasedLDV used to use three DenseMaps to store variable values, two for long term storage and one as a working set. This patch eliminates the working set, and updates the long term storage in place, thus avoiding two DenseMap comparisons and two DenseMap assignments, which can be expensive. Differential Revision: https://reviews.llvm.org/D111716	2021-10-19 11:10:14 +01:00
Lasse Folger	134e1817f6	[lldb] change name demangling to be consistent between windows and linx When printing names in lldb on windows these names contain the full type information while on linux only the name is contained. This change introduces a flag in the Microsoft demangler to control if the type information should be included. With the flag enabled demangled name contains only the qualified name, e.g: without flag -> with flag int (array2d)[10] -> array2d int (abc::array2d)[10] -> abc::array2d const int *x -> x For globals there is a second inconsistency which is not yet addressed by this change. On linux globals (in global namespace) are prefixed with :: while on windows they are not. Reviewed By: teemperor, rnk Differential Revision: https://reviews.llvm.org/D111715	2021-10-19 12:04:37 +02:00
Jeremy Morse	cf033bb2d3	[DebugInfo][NFC] Zero-initialize a class field This field gets assigned when the relevant object starts being used; but it remains uninitialized beforehand. This risks introducing hard-to-detect bugs if something changes, so zero-initialize the field.	2021-10-19 10:24:12 +01:00
Lang Hames	cc3115cd1d	[JITLink][x86-64] Lift GOT, PLT table managers into x86_64.h; reuse for MachO. This lifts the global offset table and procedure linkage table builders out of ELF_x86_64.h and into x86_64.h, renaming them with generic names x86_64::GOTTableBuilder and x86_64::PLTTableBuilder. MachO_x86_64.cpp is updated to use these classes instead of the older PerGraphGOTAndStubsBuilder tool.	2021-10-18 21:47:24 -07:00
Noah Shutty	e678c51177	[Support][ThinLTO] Move ThinLTO caching to LLVM Support library We would like to move ThinLTO’s battle-tested file caching mechanism to the LLVM Support library so that we can use it elsewhere in LLVM. Patch By: noajshu Differential Revision: https://reviews.llvm.org/D111371	2021-10-18 18:57:25 -07:00
Hsiangkai Wang	facff468b6	[RISCV] Reorder the vector register allocation order. GPR uses argument registers as the first group of registers to allocate. This patch uses vector argument registers, v8 to v23, as the first group to allocate. Differential Revision: https://reviews.llvm.org/D111304	2021-10-19 09:30:13 +08:00
Lang Hames	bc03a9c066	Simplify the TableManager class and move it into a public header. Moves visitEdge into the TableManager derivatives, replacing the fixEdgeKind methods in those classes. The visitEdge method takes on responsibility for updating the edge target, as well as its kind.	2021-10-18 18:20:33 -07:00
Anshil Gandhi	0567f03331	[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols By default clang emits complete contructors as alias of base constructors if they are the same. The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols. @yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true` and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had to be extended to support aliases to functions. inline-calls.ll was corrected appropriately. Reviewed By: yaxunl, #amdgpu Differential Revision: https://reviews.llvm.org/D109707	2021-10-18 16:53:15 -06:00
Simon Pilgrim	a83384498b	[X86] combineMulToPMADDWD - replace ASHR(X,16) -> LSHR(X,16) If we're using an ashr to sign-extend the entire upper 16 bits of the i32 element, then we can replace with a lshr. The sign bit will be correctly shifted for PMADDWD's implicit sign-extension and the upper 16 bits are zero so the upper i16 sext-multiply is guaranteed to be zero. The lshr also has a better chance of folding with shuffles etc.	2021-10-18 22:12:56 +01:00
Arthur Eubanks	ecd25edfc5	[InlineCost] Add empty line between call sites when printing inline costs	2021-10-18 13:56:48 -07:00
Craig Topper	1053e0b27c	[RISCV] Use a lambda to avoid having the Support library depend on Option library. RISCVISAInfo::toFeatures needs to allocate strings using ArgList::MakeArgString, but toFeatures lives in Support and MakeArgString lives in Option. toFeature only has one caller, so the simple fix is to have that caller pass a lamdba that wraps MakeArgString to break the dependency. Differential Revision: https://reviews.llvm.org/D112032	2021-10-18 13:39:37 -07:00
Matt Morehouse	431a5d8411	[x86] Implement a tagged-globals backend feature. The feature tells the backend to allow tags in the upper bits of global variable addresses. These tags will be ignored by upcoming CPUs with the Intel LAM feature but may be used in instrumentation passes (e.g., HWASan). This patch implements the feature by using @GOTPCREL relocations instead of direct references to the locally defined global. Thus the full tagged address can be loaded by a single instruction: movq global@GOTPCREL(%rip), %rax Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D111343	2021-10-18 13:31:10 -07:00
Alexandros Lamprineas	04dc68710a	[DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals When compiling for the RWPI relocation model the debug information is wrong: * the debug location is described as { DW_OP_addr Var } instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus } * the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32 Differential Revision: https://reviews.llvm.org/D111404	2021-10-18 21:29:46 +01:00
Arthur Eubanks	b8ce97372d	[NewPM] Add PipelineTuningOption to eagerly invalidate analyses This trades off more compile time for less peak memory usage. Right now it invalidates all function analyses after a module->function or cgscc->function adaptor. https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=instructions https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=max-rss For now this is just experimental. See comments on why this may affect optimizations. Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D111575	2021-10-18 13:20:35 -07:00
Alexey Bataev	b9cfa016da	[SLP]Fix emission of the shrink shuffles. Need to follow the order of the reused scalars from the ReuseShuffleIndices mask rather than rely on the natural order. Differential Revision: https://reviews.llvm.org/D111898	2021-10-18 13:13:12 -07:00
modimo	313c657fce	[InlineAdvisor] Add -inline-replay-scope=<Function\|Module> to control replay scope The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks. This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself. Testing: ninja check-all Reviewed By: wenlei, mtrofin Differential Revision: https://reviews.llvm.org/D110658	2021-10-18 13:08:39 -07:00
Nikita Popov	54d868991a	[ExpandMemCmp] Update CFG before DTU The applyUpdates() API requires that the CFG is already updated, so make sure to insert the new terminator first.	2021-10-18 21:49:47 +02:00
Petr Hosek	8e46e34d24	Revert "[Support][ThinLTO] Move ThinLTO caching to LLVM Support library" This reverts commit `92b8cc52bb` since it broke the gold plugin.	2021-10-18 12:24:05 -07:00
Noah Shutty	92b8cc52bb	[Support][ThinLTO] Move ThinLTO caching to LLVM Support library We would like to move ThinLTO’s battle-tested file caching mechanism to the LLVM Support library so that we can use it elsewhere in LLVM. Patch By: noajshu Differential Revision: https://reviews.llvm.org/D111371	2021-10-18 12:08:49 -07:00
Yonghong Song	f4a8526cc4	[NFC][BPF] fix comments and rename functions related to BTF_KIND_DECL_TAG There are no functionality change. Fix some comments and rename processAnnotations() to processDeclAnnotations() to avoid confusion when later BTF_KIND_TYPE_TAG is introduced (https://reviews.llvm.org/D111199).	2021-10-18 10:43:45 -07:00
Jon Roelofs	1300677f97	[AArch64][GlobalISel] combine and + [la]sr => ubfx https://godbolt.org/z/h8ejrG4hb rdar://83597585 Differential Revision: https://reviews.llvm.org/D111839	2021-10-18 10:33:01 -07:00
Yonghong Song	e9e4fc0fd3	BPF: fix a bug in IRPeephole pass Commit `009f3a89d8` ("BPF: remove intrindics @llvm.stacksave() and @llvm.stackrestore()") implemented IRPeephole pass to remove llvm.stacksave()/stackrestore() instrinsics. Buildbot reported a failure: UNREACHABLE executed at ../lib/IR/LegacyPassManager.cpp:1445! which is: llvm_unreachable("Pass modifies its input and doesn't report it"); The code has changed but the implementation didn't return true for changing. This patch fixed this problem.	2021-10-18 10:18:24 -07:00
Florian Hahn	e844f05397	[LoopUtils] Simplify addRuntimeCheck to return a single value. This simplifies the return value of addRuntimeCheck from a pair of instructions to a single `Value `. The existing users of addRuntimeChecks were ignoring the first element of the pair, hence there is not reason to track FirstInst and return it. Additionally all users of addRuntimeChecks use the second returned `Instruction ` just as `Value `, so there is no need to return an `Instruction `. Therefore there is no need to create a redundant dummy `and X, true` instruction any longer. Effectively this change should not impact the generated code because the redundant AND will be folded by later optimizations. But it is easy to avoid creating it in the first place and it allows more accurately estimating the cost of the runtime checks.	2021-10-18 18:03:09 +01:00
Craig Topper	84d9bc51a3	[RISCV] Rewrite forwardCopyWillClobberTuple to not assume that there are exactly 32 registers. NFC This function was copied from ARM where register pairs/triples/quads can wrap around the 32 encoding space. So register 31 can pair with register 0. This is not true for RISCV vectors. The spec specifically mentions the possibility of a future encoding that has more than 32 registers. This patch removes the modulo from the code and directly checks that destination register is in the source register range and not the beginning of the range. Though I don't expect an identity copy will occur. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D111467	2021-10-18 09:57:38 -07:00
Yonghong Song	009f3a89d8	BPF: remove intrindics @llvm.stacksave() and @llvm.stackrestore() Paul Chaignon reported a bpf verifier failure ([1]) due to using non-ABI register R11. For the test case, llvm11 is okay while llvm12 and later generates verifier unfriendly code. The failure is related to variable length array size. The following mimics the variable length array definition in the test case: struct t { char a[20]; }; void foo(void *); int test() { const int a = 8; char tmp[AA + sizeof(struct t) + a]; foo(tmp); ... } Paul helped bisect that the following llvm commit is responsible: `552c6c2328` ("PR44406: Follow behavior of array bound constant folding in more recent versions of GCC.") Basically, before the above commit, clang frontend did constant folding for array size "AA + sizeof(struct t) + a" to be 68, so used alloca for stack allocation. After the above commit, clang frontend didn't do constant folding for array size any more, which results in a VLA and llvm.stacksave/llvm.stackrestore is generated. BPF architecture API does not support stack pointer (sp) register. The LLVM internally used R11 to indicate sp register but it should not be in the final code. Otherwise, kernel verifier will reject it. The early patch ([2]) tried to fix the issue in clang frontend. But the upstream discussion considered frontend fix is really a hack and the backend should properly undo llvm.stacksave/llvm.stackrestore. This patch implemented a bpf IR phase to remove these intrinsics unconditionally. If eventually the alloca can be resolved with constant size, r11 will not be generated. If alloca cannot be resolved with constant size, SelectionDag will complain, the same as without this patch. [1] https://lore.kernel.org/bpf/20210809151202.GB1012999@Mem/ [2] https://reviews.llvm.org/D107882 Differential Revision: https://reviews.llvm.org/D111897	2021-10-18 09:51:19 -07:00
Kazu Hirata	8568ca789e	Use llvm::erase_if (NFC)	2021-10-18 09:33:42 -07:00
Kirill Stoimenov	62627c7217	[Sanitizers] Replaced getMaxPointerSizeInBits with getPointerSizeInBits, which was causing failures for 32bit x86. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D111829	2021-10-18 09:31:14 -07:00
Mircea Trofin	8612b47a8e	[NFC] ProfileSummary: const a bunch of members and fields. It helps readability and maintainability (don't need to chase down writes to a field I see is const, for example)	2021-10-18 08:55:06 -07:00
Gil Rapaport	1156bd4fc3	[LV] Record memory widening decisions (NFCI) Record widening decisions for memory operations within the planned recipes and use the recorded decisions in code-gen rather than querying the cost model. Differential Revision: https://reviews.llvm.org/D110479	2021-10-18 18:03:35 +03:00
Jessica Clarke	f5755c0849	[Mips] Add glue between CopyFromReg, CopyToReg and RDHWR nodes for TLS The MIPS ABI requires the thread pointer be accessed via rdhwr $3, $r29. This is currently represented by (CopyToReg $3, (RDHWR $29)) followed by a (CopyFromReg $3). However, there is no glue between these, meaning scheduling can break those apart. In particular, PR51691 is a report where PseudoSELECT_I was moved to between the CopyToReg and CopyFromReg, and since its expansion uses branches, it split the def and use of the physical register between two basic blocks, resulting in the def being eliminated and the use having no def. It also seems possible that a similar situation could arise splitting up the CopyToReg from the RDHWR, causing the RDHWR to use a destination register other than $3, violating the ABI requirement. Thus, add glue between all three nodes to ensure they aren't split up during instruction selection. No regression test is added since any test would be implictly relying on specific scheduling behaviour, so whilst it might be testing that glue is preventing reordering today, changes to scheduling behaviour could result in the test no longer being able to catch a regression here, as the reordering might no longer happen for other unrelated reasons. Fixes PR51691. Reviewed By: atanasyan, dim Differential Revision: https://reviews.llvm.org/D111967	2021-10-18 15:10:20 +01:00
Kerry McLaughlin	ac4e01ea0e	[SVE][CodeGen] Fix predicate for add/sub + element count patterns The patterns added in D111441 should use the HasSVEorStreamingSVE predicate. This changes one incorrect use of HasSVE with the new patterns.	2021-10-18 14:42:29 +01:00
Andrew Wei	f5056c8c16	[AArch64] Improve shuffle vector by using wider types Try to widen element type to get a new mask value for a better permutation sequence, so that we can use NEON shuffle instructions, such as zip1/2, UZP1/2, TRN1/2, REV, INS, etc. For example: shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3> is equivalent to: shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1> Finally, we can get: mov v0.d[0], v1.d[1] Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D111619	2021-10-18 21:24:45 +08:00
Sanjay Patel	2a3cc4d461	[Analysis] add utility function for unary shuffle mask creation This is NFC-intended for the callers. Posting in case there are other potential users that I missed. I would also use this from VectorCombine in a patch for: https://llvm.org/PR52178 ( D111901 ) Differential Revision: https://reviews.llvm.org/D111891	2021-10-18 09:00:39 -04:00
Simon Pilgrim	f041338153	[X86][Costmodel] Add SSE2 sub-128bit vXi32/f32 stride 2 interleaved store costs Differential Revision: https://reviews.llvm.org/D111941	2021-10-18 13:46:10 +01:00
Simon Pilgrim	c850d5c5c8	[X86][Costmodel] Add SSE2 sub-128bit vXi8/16 stride 2 interleaved store costs Differential Revision: https://reviews.llvm.org/D111941	2021-10-18 13:15:14 +01:00
Stephen Tozer	b9ca73e1a8	[DebugInfo] Correctly handle arrays with 0-width elements in GEP salvaging Fixes an issue where GEP salvaging did not properly account for GEP instructions which stepped over array elements of width 0 (effectively a no-op). This unnecessarily produced long expressions by appending `... + (x * 0)` and potentially extended the number of SSA values used in the dbg.value. This also erroneously triggered an assert in the salvage function that the element width would be strictly positive. These issues are resolved by simply ignoring these useless operands. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D111809	2021-10-18 12:01:12 +01:00
Peter Waller	c4603a8a43	[InstCombine][DebugInfo] Remove superflous assertion, add test When this code was added, an unnecessary assertion slipped in which we now hit in real code. Add a test to defend against it firing again.	2021-10-18 11:00:01 +00:00
Jay Foad	d55db4b033	[AMDGPU] Remove unused VirtRegMap analysis. NFC.	2021-10-18 11:55:40 +01:00
Max Kazantsev	baad10c09e	Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits" This reverts commit `fa16329ae0`. See comments in discussion. Merged by mistake, not entirely getting what the problem was.	2021-10-18 17:14:11 +07:00
Jay Foad	a129932b0d	[AMDGPU] Add link to bug	2021-10-18 10:33:42 +01:00
Jeremy Morse	ea970661dc	Fix signed/unsigned comparison after `b5426ced71` gcc11 warns that this counter causes a signed/unsigned comaprison when it's later compared with a SmallVector::difference_type. gcc appears to be correct, clang does not warn one way or the other.	2021-10-18 10:28:52 +01:00
Jay Foad	012248b0bc	Remove the verifyAfter mechanism that was replaced by D111397 Differential Revision: https://reviews.llvm.org/D111872	2021-10-18 10:26:46 +01:00
Jay Foad	36deb9a670	Add new MachineFunction property FailsVerification TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets you skip machine verification after a particular pass. Unfortunately this is used in generic code in TargetPassConfig itself to skip verification after a generic pass, only because some previous target- specific pass damaged the MIR on that specific target. This is bad because problems in one target cause lack of verification for all targets. This patch replaces that mechanism with a new MachineFunction property called "FailsVerification" which can be set by (usually target-specific) passes that are known to introduce problems. Later passes can reset it again if they are known to clean up the previous problems. Differential Revision: https://reviews.llvm.org/D111397	2021-10-18 10:26:46 +01:00
Piotr Sobczak	d869921004	[AMDGPU] Add patterns for i8/i16 local atomic load/store Add patterns for i8/i16 local atomic load/store. Added tests for new patterns. Copied atomic_[store/load]_local.ll to GlobalISel directory. Differential Revision: https://reviews.llvm.org/D111869	2021-10-18 11:23:10 +02:00
Fraser Cormack	3d850d03ae	[SelectionDAG] Fix illegal widening of scalable-vector loads The process of widening simple vector loads attempts to use a load of a wider vector type if the original load is sufficiently aligned to avoid memory faults. However this optimization is only legal when performed on fixed-length vector types. For scalable vector types this is invalid (unless vscale happens to be 1). This patch does increase the likelihood of compiler crashes (from `FindMemType` failing to find a suitable type) but this now better matches how widening non-simple loads, insufficiently-aligned loads, and scalable-vector stores are handled. Patches will be introduced later by which loads and stores can be widened on targets with support for masked or predicated operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D111885	2021-10-18 10:00:00 +01:00
Luo, Yuanke	942536ac08	[X86] Prefer VEX encoding in X86 assembler. This patch is to order the AVX instructions ahead of AVX512 instructions in the matching table so that the AVX instructions can be matched first. Thanks Craig and Shengchen for the idea. Differential Revision: https://reviews.llvm.org/D111538	2021-10-18 16:54:11 +08:00
Stanislav Mekhanoshin	7cdb1df8c7	[AMDGPU] Divergence driven selection for fused bitlogic The change adds divergence predicates for fused logical operations. The problem with selecting a scalar fused op such as S_NOR_B32 is that it does not have a VALU counterpart and will be split in moveToVALU. At the same time it prevents selection of a better opcode on the VALU side (such as V_OR3_B32) which does not have a counterpart on SALU side. XNOR opcodes are left as is and selected as scalar to get advantage of the SIInstrInfo::lowerScalarXnor() code which can commute operations to keep one of two opcodes on SALU if possible. See xnor.ll test for this. Differential Revision: https://reviews.llvm.org/D111907	2021-10-18 01:44:25 -07:00
Raphael Isemann	de4d2f80b7	Fix cyclic header dependency between Support<->Option due to RISCVISAInfo This was introduced in D105168 which added RISCVISAInfo.h.	2021-10-18 10:06:11 +02:00
Jingu Kang	3f0b178de2	[AArch64] Fixed a bug on AArch64MIPeepholeOpt Create new virtual register for the definition of new AND instruction and replace old register by the new one to keep SSA form. Differential Revision: https://reviews.llvm.org/D109963	2021-10-18 08:55:42 +01:00
Bing1 Yu	f383c53311	[MachineSink] Compile time improvement for large testcases which has many kill flags We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags. Before: Number of BasicBlocks:33357 Number of Instructions:162067 Number of Cleared Kill Flags:32869 Time of handling kill flags(ms):1.607509e+05 After: Number of BasicBlocks:33357 Number of Instructions:162067 Number of Cleared Kill Flags:32869 Time of handling kill flags:3.987371e+03 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D111688	2021-10-18 15:44:07 +08:00
Qiu Chaofan	67c64d8337	[PowerPC] Implement scheduling model for Power10 Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D110855	2021-10-18 15:27:49 +08:00
Max Kazantsev	fa16329ae0	[NFC] [LoopPeel] Change the way DT is updated for loop exits When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` are loop exits, and each of them unconditionally branches to the same `unreachable` terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. With this NFC we just insert edges from cloned exiting blocks to their exits after peeling each iteration (we accumulate the insertion updates and then after peeling apply the updates to DT). This patch was a part of D110922. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev	2021-10-18 10:23:05 +07:00
Simon Pilgrim	0bb32b1b21	[X86][SLM] Fix BitTest+Set uops + port usage Both ports are required for BitTest ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures and what Intel AoM / Agner reports as well.	2021-10-17 18:13:15 +01:00
Simon Pilgrim	5ed5df4802	[X86][SLM] Fix uops for PCMPISTR/PCMPISTR instructions Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.	2021-10-17 18:13:14 +01:00
Simon Pilgrim	680afaaa5d	[X86][SLM] Fix uops for PCLMULQDQ Based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.	2021-10-17 18:13:14 +01:00
Simon Pilgrim	498c7236bc	[X86][SLM] +1uop for PSHUFBrm xmm Extra 1uop for folded pshufb ops, based off a recent llvm-exegesis capture and what Intel AoM / Agner reports as well.	2021-10-17 18:13:14 +01:00
Nikita Popov	274b2439f8	[ConstantRange] Add fast signed multiply The multiply() implementation is very slow -- it performs six multiplications in double the bitwidth, which means that it will typically work on allocated APInts and bypass fast-path implementations. Add an additional implementation that doesn't try to produce anything better than a full range if overflow is possible. At least for the BasicAA use-case, we really don't care about more precise modeling of overflow behavior. The current use of multiply() is fine while the implementation is limited to a single index, but extending it to the multiple-index case makes the compile-time impact untenable.	2021-10-17 16:41:49 +02:00
Roman Lebedev	91373bf12e	[X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs A few more tuples are being queried after D111546. Might be good to model them, They all require a lot of manual assembly surgery. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/9bnKrefcG - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0` So could pick cost of `40` For store we have: https://godbolt.org/z/5s3s14dEY - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0` So we could pick cost of `40`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111945	2021-10-17 17:28:10 +03:00
Roman Lebedev	3274ce3a28	[X86][Costmodel] Load/store i64 Stride=2 VF=32 interleaving costs A few more tuples are being queried after D111546. Might be good to model them, They all require a lot of manual assembly surgery. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/MTaKboejM - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0` So could pick cost of `32` For store we have: https://godbolt.org/z/v7xPj3Wd4 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=32.0` So we could pick cost of `32`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111944	2021-10-17 17:28:10 +03:00
Roman Lebedev	3a6a9f74d3	[X86][Costmodel] Load/store i32 Stride=4 VF=32 interleaving costs A few more tuples are being queried after D111546. Might be good to model them, They all require a lot of manual assembly surgery. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/11rcvdreP - for intels `Block RThroughput: <=68.0`; for ryzens, `Block RThroughput: <=48.0` So could pick cost of `68` For store we have: https://godbolt.org/z/6aM11fWcP - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0` So we could pick cost of `64`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111943	2021-10-17 17:28:09 +03:00
Roman Lebedev	4b76a74b42	[X86][Costmodel] Load/store i32 Stride=3 VF=32 interleaving costs A few more tuples are being queried after D111546. Might be good to model them, They all require a lot of manual assembly surgery. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/s5b6E6jsP - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0` So could pick cost of `32` For store we have: https://godbolt.org/z/efh99d93b - for intels `Block RThroughput: <=48.0`; for ryzens, `Block RThroughput: <=32.0` So we could pick cost of `48`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111942	2021-10-17 17:28:09 +03:00
Roman Lebedev	887acf6842	[X86][Costmodel] Load/store i16 Stride=6 VF=32 interleaving costs A few more tuples are being queried after D111546. Might be good to model them, They all require a lot of manual assembly surgery. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/YTeT9M7fW - for intels `Block RThroughput: <=212.0`; for ryzens, `Block RThroughput: <=64.0` So could pick cost of `212` For store we have: https://godbolt.org/z/vc954KEGP - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=24.0` So we could pick cost of `90`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111940	2021-10-17 17:28:09 +03:00
David Truby	2e0fb007d6	[llvm][AArch64][SVE] Fold literals into math instructions SVE has predicated literal forms of some instructions for specific literals, which currently are generated correctly when using ACLE but not when those instructions are generated directly. This adds the patterns to generate those instructions when generating from standard LLVM IR instructions. Differential Revision: https://reviews.llvm.org/D99074	2021-10-17 10:57:04 +00:00
Kito Cheng	ff13189c5d	[RISCV] Unify the arch string parsing logic to to RISCVISAInfo. How many place you need to modify when implementing a new extension for RISC-V? At least 7 places as I know: - Add new SubtargetFeature at RISCV.td - -march parser in RISCV.cpp - RISCVTargetInfo::initFeatureMap@RISCV.cpp for handling feature vector. - RISCVTargetInfo::getTargetDefines@RISCV.cpp for pre-define marco. - Arch string parser for ELF attribute in RISCVAsmParser.cpp - ELF attribute emittion in RISCVAsmParser.cpp, and make sure it's in canonical order... - ELF attribute emittion in RISCVTargetStreamer.cpp, and again, must in canonical order... And now, this patch provide an unified infrastructure for handling (almost) everything of RISC-V arch string. After this patch, you only need to update 2 places for implement an extension for RISC-V: - Add new SubtargetFeature at RISCV.td, hmmm, it's hard to avoid. - Add new entry to RISCVSupportedExtension@RISCVISAInfo.cpp or SupportedExperimentalExtensions@RISCVISAInfo.cpp . Most codes are come from existing -march parser, but with few new feature/bug fixes: - Accept version for -march, e.g. -march=rv32i2p0. - Reject version info with `p` but without minor version number like `rv32i2p`. Differential Revision: https://reviews.llvm.org/D105168	2021-10-17 16:25:23 +08:00
Ben Shi	d0dbc991c0	Revert "[AArch64] Optimize add/sub with immediate" This reverts commit `9bf6bef995`.	2021-10-16 22:17:18 +00:00
Fangrui Song	8e1d532707	[Object] Simplify RELR decoding	2021-10-16 15:03:14 -07:00
Craig Topper	beb7862db5	[X86] Add DAG combine for negation of CMOV absolute value pattern. This patch detects the absolute value pattern on the RHS of a subtract. If we find it we swap the CMOV true/false values and replace the subtract with an ADD. There may be a more generic way to do this, but I'm not sure. Targets that don't have legal or custom ISD::ABS use a generic expand in DAG combiner already when it sees (neg (abs(x))). I haven't checked what happens if the neg is a more general subtract. Fixes PR50991 for X86. Reviewed By: RKSimon, spatel Differential Revision: https://reviews.llvm.org/D111858	2021-10-16 13:31:43 -07:00
Nikita Popov	492a4a428f	[APInt] Fix 1-bit edge case in smul_ov() The sdiv used to check for overflow can itself overflow if the LHS is signed min and the RHS is -1. The code tried to account for this by also checking the commuted version. However, for 1-bit values, signed min and -1 are the same value, so both divisions overflow. As such, the overflow for -1 * -1 was not detected (which results in -1 rather than 1 for 1-bit values). Fix this by explicitly checking for this case instead. Noticed while adding exhaustive test coverage for smul_ov(), which is also part of this commit.	2021-10-16 20:31:04 +02:00
Simon Pilgrim	85b87179f4	[TTI][X86] Add v8i16 -> 2 x v4i16 stride 2 interleaved load costs Split SSE2 and SSSE3 costs to correctly handle PSHUFB lowering - as was noted on D111938	2021-10-16 17:28:07 +01:00
Simon Pilgrim	6ec644e215	[TTI][X86] Add SSE2 sub-128bit vXi16/32 and v2i64 stride 2 interleaved load costs These cases use the same codegen as AVX2 (pshuflw/pshufd) for the sub-128bit vector deinterleaving, and unpcklqdq for v2i64. It's going to take a while to add full interleaved cost coverage, but since these are the same for SSE2 -> AVX2 it should be an easy win. Fixes PR47437 Differential Revision: https://reviews.llvm.org/D111938	2021-10-16 16:21:45 +01:00
Martin Storsjö	6c96ceabaf	[Support] Add more Windows error codes to mapWindowsError Also sort ERROR_BAD_NETPATH correctly. Compared with the similar error code mapping in libcxx/src/filesystem/operations.cpp, I'm leaving out mappings for ERROR_NOT_SAME_DEVICE and ERROR_OPERATION_ABORTED. They map nicely to std::errc::cross_device_link and std::errc::operation_canceled, but those aren't available in llvm::errc, as they aren't available across all platforms. Also, the libcxx version maps ERROR_INVALID_NAME to no_such_file_or_directory instead of invalid_argument. Differential Revision: https://reviews.llvm.org/D111874	2021-10-16 16:14:49 +03:00
Tomasz Miąsko	48ce523a26	[Symbolize] Demangle Rust symbols Add support for demangling Rust v0 symbols to LLVM symbolizer by reusing nonMicrosoftDemangle which supports both Itanium and Rust mangling. Reviewed By: dblaikie, jhenderson Part of https://reviews.llvm.org/D110664	2021-10-16 13:32:17 +02:00
Tomasz Miąsko	41a6fc8438	[Demangle] Extract nonMicrosoftDemangle from llvm::demangle Introduce a new demangling function that supports symbols using Itanium mangling and Rust v0 mangling, and is expected in the near future to include support for D mangling as well. Unlike llvm::demangle, the function does not accept extra underscore decoration. The callers generally know exactly when symbols should include the extra decoration and so they should be responsible for stripping it. Functionally the only intended change is to allow demangling Rust symbols with an extra underscore decoration through llvm::demangle, which matches the existing behaviour for Itanium symbols. Reviewed By: dblaikie, jhenderson Part of https://reviews.llvm.org/D110664	2021-10-16 13:32:16 +02:00
Simon Pilgrim	d464a9d476	[Analysis] Replace assert(isa)/dyn_cast with cast. NFC. cast<> will perform the assertion for us. Removes a static analysis null dereference warning.	2021-10-16 11:40:19 +01:00
Simon Pilgrim	a1b43d2bc9	[LazyValueInfo] getPredicateAt - remove unnecessary null pointer check. NFC. We already dereference the CxtI pointer several times before reaching the "if(CxtI)", we have no need to check it again. Fixes a coverity warning.	2021-10-16 11:20:19 +01:00
Simon Pilgrim	c288241795	[ConstantFolding] ConstantFoldScalarCall2 - early-out if getLibFunc fails. NFC.	2021-10-16 11:20:19 +01:00
Simon Pilgrim	c18cf10a04	[ConstantFolding] Use getValueAPF const ref value where possible. NFC. Don't copy the value if we can avoid it.	2021-10-16 11:20:19 +01:00
Simon Pilgrim	76ca0d67ab	[ConstantFolding] ConstantFoldScalarCall1 - early-out if getLibFunc fails. NFC.	2021-10-16 11:20:18 +01:00
Roman Lebedev	d137f1288e	[X86][LV] X86 does not prefer vectorized addressing And another attempt to start untangling this ball of threads around gather. There's `TTI::prefersVectorizedAddressing()`hoop, which confusingly defaults to `true`, which tells LV to try to vectorize the addresses that lead to loads, but X86 generally can not deal with vectors of addresses, the only instructions that support that are GATHER/SCATTER, but even those aren't available until AVX2, and aren't really usable until AVX512. This specializes the hook for X86, to return true only if we have AVX512 or AVX2 w/ fast gather. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D111546	2021-10-16 12:32:18 +03:00

... 6 7 8 9 10 ...

152407 Commits