llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	e918ba6958	[LICM] Drop -licm-n2-threshold option This was a diagnostic option used to demonstrate a weakness in the AST-based LICM implementation. This problem does not exist in the MSSA-based LICM implementation, which has been enabled for a long time now. As such, this option is no longer relevant.	2021-08-17 22:41:31 +02:00
Nikita Popov	f58a642da1	[PassBuilder] Use loop-mssa for licm Currently specifying -licm or -passes=licm will implicitly create -passes=loop(licm). This does not match the intended default (used by the legacy PM and by the default pipeline) of using the MemorySSA-based LICM implementation. As I plan to drop the non-MSSA implementation, this will stop working entirely... This special-cases licm to create a loop-mssa manager instead. At this point it's still possible to use -passes='loop(licm)' to opt into the AST-based implementation. Differential Revision: https://reviews.llvm.org/D108155	2021-08-17 21:23:11 +02:00
Sanjay Patel	50c1138796	[InstCombine] add TODO about another min/max fold; NFC Suggested in post-commit for `d0975b7cb0`	2021-08-17 14:14:25 -04:00
Craig Topper	8f6cea43e7	[RISCV] Use RISCV::RVVBitsPerBlock for RGK_ScalableVector in getRegisterBitWidth. I might be wrong, but I think this is should be width of the known min size we use for scalable vectors. It shouldn't scale with minimum vlen. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107945	2021-08-17 11:13:15 -07:00
Simon Pilgrim	d7f288502f	SelectionDAGBuilder::visitInlineAsm - don't dereference dyn_cast<> results. dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warning.	2021-08-17 18:40:59 +01:00
Simon Pilgrim	caff2acae1	[AArch64] AArch64DAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results. dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warnings.	2021-08-17 18:40:59 +01:00
Simon Pilgrim	1e770f0388	[ARM] ARMDAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results. dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warnings.	2021-08-17 18:40:59 +01:00
Simon Pilgrim	fb81271e8b	[AMDGPU] Fix lowering of AMDGPU::G_CTTZ_ZERO_UNDEF to AMDGPU::G_AMDGPU_FFBL_B32 As mentioned on D107474, there was a copy+paste typo repeating G_CTLZ_ZERO_UNDEF that coverity reported as dead code. Differential Revision: https://reviews.llvm.org/D108210	2021-08-17 18:09:57 +01:00
Fraser Cormack	f3e9047249	[VP] Add vector-predicated reduction intrinsics This patch adds vector-predicated ("VP") reduction intrinsics corresponding to each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the unpredicated reductions, all VP reductions have a start value. This start value is returned when the no vector element is active. Support for expansion on targets without native vector-predication support is included. This patch is based on the ["reduction slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch (https://reviews.llvm.org/D57504). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104308	2021-08-17 17:56:35 +01:00
Joseph Huber	339aa76526	[OpenMP][NFC] Add option to print module after OpenMPOpt for debugging This patch adds an extra option to print the module after running one of the OpenMPOpt passes if debugging is enabled. This makes it much easier to inspect the effects of this pass when doing debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108146	2021-08-17 12:46:10 -04:00
Philip Reames	982da7a20c	[SCEVExpander] Stop hoisting IR when reusing phis his is a fix for PR43678, and is an alternate patch to D105723. The basic issue we're running into is that LSR + SCEVExpander are moving the very instruction whose operand we're in the process of expanding. This breaks the subtle and ill-documented invariant which let LSR work. (Full story can be found here: https://reviews.llvm.org/D105723#2878473) Rather than attempting a fix, this change just removes the optimization entirely. The code is entirely untested, and removing it appears to have no impact I can find. This code was added back in 2014 by `1e12f8563d` with a single test which does not seem to actually test the hoisting logic. From a philosophical standpoint, it also seems very strange to have the expander implementing optimizations which should live in a dedicated transform pass. Differential Revision: https://reviews.llvm.org/D106178	2021-08-17 09:38:32 -07:00
Fangrui Song	78cb1adc5c	[Object] Move llvm-nm's symbol version utility to ELFObjectFile::readDynsymVersions The utility can be reused by llvm-objdump -T. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D108096	2021-08-17 09:06:39 -07:00
Roman Lebedev	2078c4ecfd	[X86] Lower insertions into upper half of an 256-bit vector as broadcast+blend (PR50971) Broadcast is not worse than extract+insert of subvector. https://godbolt.org/z/aPq98G6Yh Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D105390	2021-08-17 18:45:10 +03:00
Tozer	5c6f748cbc	[MCParser] Correctly handle CRLF line ends when consuming line comments Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983 The AsmLexer currently has an issue with lexing line comments in files with CRLF line endings, in which it reads the carriage return as being part of the line comment. This causes an error for certain valid comment layouts; this patch fixes this by excluding the carriage return from the line comment. Differential Revision: https://reviews.llvm.org/D90234	2021-08-17 15:52:51 +01:00
Kazu Hirata	a14920c002	[Bitcode] Remove unused declaration writeBitcodeHeader (NFC) The corresponding definition was removed on Nov 29, 2016 in commit `5a0a2e648c`.	2021-08-17 07:10:51 -07:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
David Green	52e0cf9d61	[ARM] Enable subreg liveness This enables subreg liveness in the arm backend when MVE is present, which allows the register allocator to detect when subregister are alive/dead, compared to only acting on full registers. This can helps produce better code on MVE with the way MQPR registers are made up of SPR registers, but is especially helpful for MQQPR and MQQQQPR registers, where there are very few "registers" available and being able to split them up into subregs can help produce much better code. Differential Revision: https://reviews.llvm.org/D107642	2021-08-17 14:10:33 +01:00
David Green	62e892fa2d	[ARM] Add MQQPR and MQQQQPR spill and reload pseudo instructions As a part of D107642, this adds pseudo instructions for MQQPR and MQQQQPR register classes, that can spill and reloads entire registers whilst keeping them combined, not splitting them into multiple D subregs that a VLDMIA/VSTMIA would use. This can help certain analyses, and helps to prevent verifier issues with subreg liveness.	2021-08-17 13:51:34 +01:00
Sanjay Patel	e73f4e1123	[InstCombine] remove unused function argument; NFC	2021-08-17 08:10:42 -04:00
Sanjay Patel	d0975b7cb0	[InstCombine] fold signed min/max intrinsics with negated operands If both operands are negated, we can invert the min/max and do the negation after: smax (neg nsw X), (neg nsw Y) --> neg nsw (smin X, Y) smin (neg nsw X), (neg nsw Y) --> neg nsw (smax X, Y) This is visible as a remaining regression in D98152. I don't see a way to generalize this for 'unsigned' or adapt Negator to handle it. This only appears to be safe with 'nsw': https://alive2.llvm.org/ce/z/GUy1zJ Differential Revision: https://reviews.llvm.org/D108165	2021-08-17 08:10:42 -04:00
Sebastian Neubauer	fbae34635d	[GlobalISel] Add combine for PTR_ADD with regbanks Combine two G_PTR_ADDs, but keep the register bank of the constant. That way, the combine can be used in post-regbank-select combines. Introduce two helper methods in CombinerHelper, getRegBank and setRegBank that get and set an optional register bank to a register. That way, they can be used before and after register bank selection. Differential Revision: https://reviews.llvm.org/D103326	2021-08-17 13:58:16 +02:00
Tiehu Zhang	9cfa9b44a5	[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a block In current implementation, the instruction to be sunk will be inserted before the target instruction without considering the def-use tree, which may case Instruction does not dominate all uses error. We need to choose a suitable location to insert according to the use chain Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107262	2021-08-17 18:58:15 +08:00
Jeremy Morse	708cbda577	[DebugInfo][InstrRef] Honour too-much-debug-info cutouts This reapplies `54a61c94f9`, its follow up in `547b712500`, which were reverted `95fe61e639`. Original commit message: VarLoc based LiveDebugValues will abandon variable location propagation if there are too many blocks and variable assignments in the function. If it didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end up with 1 million DBG_VALUEs just at the start of blocks. Instruction-referencing LiveDebugValues should honour this limitation too (because the same limitation applies to it). Hoist the relevant command line options into LiveDebugValues.cpp and pass it down into the implementation classes as an argument to ExtendRanges. I've duplicated all the run-lines in live-debug-values-cutoffs.mir to have an instruction-referencing flavour. Differential Revision: https://reviews.llvm.org/D107823	2021-08-17 11:34:49 +01:00
Simon Pilgrim	895ed64009	[AArch64] LowerCONCAT_VECTORS - merge getNumOperands() calls. NFCI. Improves on the unused variable fix from rG4357562067003e25ab343a2d67a60bd89cd66dbf	2021-08-17 11:23:03 +01:00
Anton Afanasyev	1f3e35b6d1	[AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG Add `shl` instruction to the DAG post-dominated by `trunc`, allowing TruncInstCombine to reduce bitwidth of expressions containing left shifts. The only thing we need to check is that the target bitwidth must be wider than the maximal shift amount: https://alive2.llvm.org/ce/z/AwArqu Part of https://reviews.llvm.org/D107766 Differential Revision: https://reviews.llvm.org/D108091	2021-08-17 12:44:37 +03:00
Bing1 Yu	bcec4ccd04	[X86] [AMX] Replace bitcast with specific AMX intrinsics with X86 specific cast. There is some discussion on the bitcast for vector and x86_amx at https://reviews.llvm.org/D99152. This patch is to introduce a x86 specific cast for vector and x86_amx, so that it can avoid some unnecessary optimization by middle-end. On the other way, we have to optimize the x86 specific cast by ourselves. This patch also optimize the cast operation to eliminate redundant code. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D107544	2021-08-17 17:04:26 +08:00
David Stuttard	ebdb0d09a4	AMDGPU: During img instruction ret value construction cater for non int values Make sure return type is int type. Differential Revision: https://reviews.llvm.org/D108131 Change-Id: Ic02f07d1234cd51b6ed78c3fecd2cb1d6acd5644	2021-08-17 09:08:24 +01:00
Kazu Hirata	8f5e9d65d6	[AsmParser] Remove MDConstant (NFC) The last use was removed on Sep 22, 2016 in commit `fcee2d8001`.	2021-08-16 21:21:11 -07:00
Whitney Tsang	a41c95c0e3	[LNICM] Fix infinite loop There is a bug introduced by https://reviews.llvm.org/D107219 which causes an infinite loop, when there are more than 2 levels PHINode chain. Reviewed By: uint256_t Differential Revision: https://reviews.llvm.org/D108166	2021-08-17 12:55:22 +09:00
Christudasan Devadasan	686607676f	[AMDGPU] Skip pseudo MIs in hazard recognizer Instructions like WAVE_BARRIER and SI_MASKED_UNREACHABLE are only placeholders to prevent certain unwanted transformations and will get discarded during assembly emission. They should not be counted during nop insertion. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108022	2021-08-16 23:11:14 -04:00
Carl Ritson	99c790dc21	[AMDGPU] Make BVH isel consistent with other MIMG opcodes Suffix opcodes with _gfx10. Remove direct references to architecture specific opcodes. Add a BVH flag and apply this to diassembly. Fix a number of disassembly errors on gfx90a target caused by previous incorrect BVH detection code. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D108117	2021-08-17 10:42:22 +09:00
Hongtao Yu	f27fee623d	[SamplePGO][NFC] Dump function profiles in order Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D108147	2021-08-16 17:22:30 -07:00
Arthur Eubanks	0d822da2bd	[NFC] Remove/replace some confusing attribute getters on Function	2021-08-16 16:12:37 -07:00
Min-Yih Hsu	eec3495a9d	[M68k] Do not pass llvm::Function& to M68kCCState Previously we're passing `llvm::Function&` into `M68kCCState` to lower arguments in fastcc. However, that reference might not be available if it's a library call and we only need its argument types. Therefore, now we're simply passing a list of argument llvm::Type-s. This fixes PR-50752. Differential Revision: https://reviews.llvm.org/D108101	2021-08-16 15:33:08 -07:00
Afanasyev Ivan	913b5d2f7a	[AsmPrinter] fix nullptr dereference for MBBs with hasAddressTaken property without BB Basic block pointer is dereferenced unconditionally for MBBs with hasAddressTaken property. MBBs might have hasAddressTaken property without reference to BB. Backend developers must assign fake BB to MBB to workaround this issue and it should be fixed. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D108092	2021-08-16 15:32:09 -07:00
David Green	9236dea255	[ARM] Create MQQPR and MQQQQPR register classes Similar to the MQPR register class as the MVE equivalent to QPR, this adds MQQPR and MQQQQPR register classes for the MVE equivalents of QQPR and QQQQPR registers. The MVE MQPR seemed have worked out quite well, and adding MQQPR and MQQQQPR allows us to a little more accurately specify the number of registers, calculating register pressure limits a little better. Differential Revision: https://reviews.llvm.org/D107463	2021-08-16 22:58:12 +01:00
Anshil Gandhi	f22ba51873	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-16 14:56:01 -06:00
Stanislav Mekhanoshin	877572cc19	Allow rematerialization of virtual reg uses Currently isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that is not a trivial rematerialization and that we do not want to extend liveranges. It appears that LRE logic does not attempt to extend a liverange of a source register for rematerialization so that is not an issue. That is checked in the LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs which normally represent a read-modify-write operation and not rematerializable. The test for a tied-def situation already exists in the /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408	2021-08-16 12:42:42 -07:00
Rong Xu	9b8425e42c	Reapply commit `b7425e956` The commit b7425e956: [NFC] fix typos is harmless but was reverted by accident. Reapply.	2021-08-16 12:18:40 -07:00
Stanislav Mekhanoshin	b9e433b02a	Prevent machine licm if remattable with a vreg use Check if a remateralizable nstruction does not have any virtual register uses. Even though rematerializable RA might not actually rematerialize it in this scenario. In that case we do not want to hoist such instruction out of the loop in a believe RA will sink it back if needed. This already has impact on AMDGPU target which does not check for this condition in its isTriviallyReMaterializable implementation and have instructions with virtual register uses enabled. The other targets are not impacted at this point although will be when D106408 lands. Differential Revision: https://reviews.llvm.org/D107677	2021-08-16 12:09:00 -07:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Nikita Popov	570c9beb8e	[MemorySSA] Remove unnecessary MSSA dependencies LoopLoadElimination, LoopVersioning and LoopVectorize currently fetch MemorySSA when construction LoopAccessAnalysis. However, LoopAccessAnalysis does not actually use MemorySSA and we can pass nullptr instead. This saves one MemorySSA calculation in the default pipeline, and thus improves compile-time. Differential Revision: https://reviews.llvm.org/D108074	2021-08-16 20:40:55 +02:00
Nikita Popov	0a031449b2	[PassBuilder] Don't use MemorySSA for standalone LoopRotate passes Two standalone LoopRotate passes scheduled using createFunctionToLoopPassAdaptor() currently enable MemorySSA. However, while LoopRotate can preserve MemorySSA, it does not use it, so requiring MemorySSA is unnecessary. This change doesn't have a practical compile-time impact by itself, because subsequent passes still request MemorySSA. Differential Revision: https://reviews.llvm.org/D108073	2021-08-16 20:34:18 +02:00
Kostya Kortchinsky	80ed75e7fb	Revert "[NFC] Fix typos" This reverts commit `b7425e956b`.	2021-08-16 11:13:05 -07:00
Rong Xu	b7425e956b	[NFC] Fix typos s/senstive/senstive/g	2021-08-16 10:15:30 -07:00
Jordan Rupprecht	4357562067	[NFC][AArch64] Fix unused var in release build	2021-08-16 10:04:32 -07:00
Paul Robinson	94b4598d77	[PS4] stp[n]cpy not available on PS4	2021-08-16 09:06:52 -07:00
Craig Topper	92abb1cf90	[TypePromotion] Don't mutate the result type of SwitchInst. SwitchInst should have a void result type. Add a check to the verifier to catch this error. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D108084	2021-08-16 08:54:34 -07:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Jeremy Morse	95fe61e639	Revert `54a61c94f9` and its follow up in `547b712500` These were part of D107823, however asan has found something excitingly wrong happening: https://lab.llvm.org/buildbot/#/builders/5/builds/10543/steps/13/logs/stdio	2021-08-16 15:48:56 +01:00
Sanjay Patel	de285eacb0	[InstCombine] allow for constant-folding in GEP transform This would crash the reduced test or as described in https://llvm.org/PR51485 ...because we can't mark a constant (-expression) with 'inbounds'.	2021-08-16 10:36:56 -04:00
Jeremy Morse	547b712500	Suppress signedness-comparison warning This is a follow-up to `54a61c94f9`.	2021-08-16 15:29:43 +01:00
Jeremy Morse	54a61c94f9	[DebugInfo][InstrRef] Honour too-much-debug-info cutouts VarLoc based LiveDebugValues will abandon variable location propagation if there are too many blocks and variable assignments in the function. If it didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end up with 1 million DBG_VALUEs just at the start of blocks. Instruction-referencing LiveDebugValues should honour this limitation too (because the same limitation applies to it). Hoist the relevant command line options into LiveDebugValues.cpp and pass it down into the implementation classes as an argument to ExtendRanges. I've duplicated all the run-lines in live-debug-values-cutoffs.mir to have an instruction-referencing flavour. Differential Revision: https://reviews.llvm.org/D107823	2021-08-16 15:06:40 +01:00
Roman Lebedev	febcedf18c	Revert "[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer" https://bugs.llvm.org/show_bug.cgi?id=51490 was filed. This reverts commit `35a8bdc775`.	2021-08-16 14:30:29 +03:00
David Sherwood	9b19b77883	[NFC] Remove unused code in llvm::createSimpleTargetReduction	2021-08-16 09:50:45 +01:00
Roman Lebedev	2eb554a9fe	Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" This is still wrong, as failing bots suggest. This reverts commit `3d9beefc7d`.	2021-08-16 11:07:42 +03:00
Cullen Rhodes	09507b5325	[AArch64][SME] Disable NEON in streaming mode In streaming mode most of the NEON instruction set is illegal, disable NEON when compiling with `+streaming-sve`, unless NEON is explictly requested. Subsequent patches will add support for the small subset of NEON instructions that are legal in streaming mode. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D107902	2021-08-16 07:56:48 +00:00
Christian Sigg	93c55d5ea2	Reset all options in cl::ResetCommandLineParser() Reset cl::Positional, cl::Sink and cl::ConsumeAfter options as well in cl::ResetCommandLineParser(). Reviewed By: rriddle, sammccall Differential Revision: https://reviews.llvm.org/D103356	2021-08-16 09:56:22 +02:00
Craig Topper	b82ce77b2b	[X86] Support avx512fp16 compare instructions in the IntelInstPrinter. This enables printing of the mnemonics that contain the predicate in the Intel printer. This requires accounting for the memory size that is explicitly printed in Intel syntax. Those changes have been synced to the ATT printer as well. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108093	2021-08-16 12:31:36 +08:00
Sanjay Patel	ca637014f1	[Analysis][SimplifyLibCalls] improve function signature check for memcmp This would assert/crash as shown in: https://llvm.org/PR50850 The matching for bcmp/bcopy should probably also be updated, but that's another patch.	2021-08-15 16:11:26 -04:00
Craig Topper	ff95d2524a	[X86] Prevent accidentally accepting cmpeqsh as a valid mnemonic. We should only accept as vcmpeqsh. Same for all the other 31 comparison values.	2021-08-15 12:00:56 -07:00
Craig Topper	819818f7d5	[X86] Modify the commuted load isel pattern for VCMPSHZrm to match VCMPSSZrm/VCMPSDZrm. This allows commuting any immediate value. The previous code only commuted equality immediates. This was inherited from an earlier version of VCMPSSZrm/VCMPSDZrm.	2021-08-15 11:43:56 -07:00
David Blaikie	62a4c2c10e	DWARFVerifier: Check section-relative references at the end of the section This ensures that debug_types references aren't looked for in debug_info section. Behavior is still going to be questionable in an unlinked object file - since cross-cu references could refer to symbols in another .debug_info (or, in theory, .debug_types) chunk - but if a producer only uses ref_addr to refer to things within the same .debug_info chunk in an object file (eg: whole program optimization/LTO - producing two CUs into a single .debug_info section in an object file - the ref_addrs there could be resolved relative to that .debug_info chunk, not needing to consider comdat (DWARFv5 type units or other creatures) chunks of .debug_info, etc)	2021-08-15 11:40:24 -07:00
Craig Topper	786b8fcc9b	[X86] Add vcmpsh/vcmpph to X86InstrInfo::commuteInstructionImpl. They were already added to findCommuteOpIndices, but they also need to be in X86InstrInfo::commuteInstructionImpl in order to adjust the immediate control.	2021-08-15 11:36:13 -07:00
Paul Walker	cd0e196413	[DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation. visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when removing "redundant" INSERT_SUBVECTOR operations. This patch adds an extra check to ensure such combines only occur after operation legalisation if any resulting BITBAST is itself legal. Differential Revision: https://reviews.llvm.org/D108086	2021-08-15 18:25:49 +01:00
Kazu Hirata	e6e687f2d9	[AsmParser] Remove MDSignedOrUnsignedField (NFC) The last use was removed on Apr 18, 2020 in commit `aad3d578da`.	2021-08-15 09:31:39 -07:00
David Green	c6b7db015f	[InstCombine] Add call to matchSAddSubSat from min/max This adds a call to matchSAddSubSat from smin/smax instrinsics, allowing the same patterns to match if the canonical form of a min/max is an intrinsics, not a icmp/select. Differential Revision: https://reviews.llvm.org/D108077	2021-08-15 17:25:16 +01:00
Roman Lebedev	3d9beefc7d	Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) ... with test change this time. LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR, and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What i believe we need to do, is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses. ``` $ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll ---------------------------------------- @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: br label %L %L: %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 br i1 %iszero, label %exit, label %L2 %L2: store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp eq i32 %ld, 4294967295 br i1 %cmp, label %L, label %exit %exit: %r = phi i32 [ %ld, %L2 ], [ %ld, %L ] ret i32 %r } => @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: %ld.old = load i32, * @global_pr51125, align 4 %iszero.old = icmp eq i32 %ld.old, 0 br i1 %iszero.old, label %exit, label %L2 %L2: %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ] store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp ne i32 %ld2, 4294967295 %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 %or.cond = select i1 %cmp, i1 1, i1 %iszero br i1 %or.cond, label %exit, label %L2 %exit: %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ] %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ] ret i32 %r } Transformation seems to be correct! ``` Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:16:04 +03:00
Roman Lebedev	60dd0121c9	Revert "[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" Forgot to stage the test change. This reverts commit `78af5cb213`.	2021-08-15 19:15:09 +03:00
Roman Lebedev	78af5cb213	[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR, and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What i believe we need to do, is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses. ``` $ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll ---------------------------------------- @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: br label %L %L: %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 br i1 %iszero, label %exit, label %L2 %L2: store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp eq i32 %ld, 4294967295 br i1 %cmp, label %L, label %exit %exit: %r = phi i32 [ %ld, %L2 ], [ %ld, %L ] ret i32 %r } => @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: %ld.old = load i32, * @global_pr51125, align 4 %iszero.old = icmp eq i32 %ld.old, 0 br i1 %iszero.old, label %exit, label %L2 %L2: %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ] store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp ne i32 %ld2, 4294967295 %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 %or.cond = select i1 %cmp, i1 1, i1 %iszero br i1 %or.cond, label %exit, label %L2 %exit: %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ] %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ] ret i32 %r } Transformation seems to be correct! ``` Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:02:34 +03:00
Roman Lebedev	35a8bdc775	[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer Currently/previously, while SCEV guaranteed that it produces the same value, the way it was produced may be illegal IR, so we have an ugly check that the replacement is valid. But now that the SCEV strictness wrt the pointer/integer types has been improved, i believe this invariant is already upheld by the SCEV itself, natively. I think we should add an assertion, wait for a week, and then, if all is good, rip out all this checking. Or we could just do the latter directly i guess. This reverts commit rL127839. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D108043	2021-08-15 18:59:32 +03:00
Nikita Popov	944dfa4975	[IndVars] Don't check for pointer exit count (NFC) After recent changes, exit counts and BE taken counts are always integers, so convert these to assertions. While here, also convert the loop invariance checks to asserts. Exit counts are always loop invariant.	2021-08-15 16:49:30 +02:00
Qiu Chaofan	a240b29f21	[NFC] Simply update a FIXME comment X86 overrided LowerOperationWrapper was moved to common implementation in `a7eae62`.	2021-08-15 22:43:46 +08:00
Nikita Popov	3c503ba06a	[FunctionImport] Fix build with old mingw (NFC) std::errc::operation_not_supported is not universally supported. Make use of LLVM's errc interoperability header, which lists known-good errc values.	2021-08-15 15:47:59 +02:00
Harald van Dijk	957334382c	[ExecutionEngine] Check for libunwind before calling __register_frame libgcc and libunwind have different flavours of __register_frame. Both flavours are already correctly handled, except that the code to handle the libunwind flavour is guarded by __APPLE__. This change uses the presence of __unw_add_dynamic_fde in libunwind instead to detect whether libunwind is used, rather than hardcoding it as Apple vs. non-Apple. Fixes PR44074. Thanks to Albert Jin <albert.jin@gmail.com> and Chris Schafmeister <chris.schaf@verizon.net> for identifying the problem. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D106129	2021-08-15 13:35:53 +01:00
Paul Walker	f7a831daa6	[LoopVectorize] Don't emit remarks about lack of scalable vectors unless they're specifically requested. Previously we emitted a "does not support scalable vectors" remark for all targets whenever vectorisation is attempted. This pollutes the output for architectures that don't support scalable vectors and is likely confusing to the user. Instead this patch introduces a debug message that reports when scalable vectorisation is allowed by the target and only issues the previous remark when scalable vectorisation is specifically requested, for example: #pragma clang loop vectorize_width(2, scalable) Differential Revision: https://reviews.llvm.org/D108028	2021-08-15 12:15:52 +01:00
Nikita Popov	81b106584f	[AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476) This is a non-intrusive fix for https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport to the 13.x release branch. It expands on the current hack by distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have the obvious meaning and 2 means "anything else". The new optimization from D98564 should only be performed for CmpValue of 0 or 1. For main, I think we should switch the analyzeCompare() and optimizeCompare() APIs to use int64_t instead of int, which is in line with MachineOperand's notion of an immediate, and avoids this problem altogether. Differential Revision: https://reviews.llvm.org/D108076	2021-08-15 12:35:52 +02:00
Dávid Bolvanský	49de6070a2	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `435785214f`. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.	2021-08-15 11:44:13 +02:00
Anshil Gandhi	435785214f	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-14 23:37:23 -06:00
Itay Bookstein	530aa7e4da	[Linker] Import GlobalIFunc when importing symbols from another module Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107988	2021-08-14 22:01:11 -07:00
Wang, Pengfei	f1de9d6dae	[X86] AVX512FP16 instructions enabling 2/6 Enable FP16 binary operator instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105264	2021-08-15 08:56:33 +08:00
luxufan	4ec32375bc	[JITLink] Unify x86-64 MachO and ELF 's optimize GOT/Stub function This patch unify optimizeELF_x86_64_GOTAndStubs and optimizeMachO_x86_64_GOTAndStubs into a pure optimize_x86_64_GOTAndStubs Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D108025	2021-08-15 00:33:09 +08:00
Kazu Hirata	915cc69259	[Aarch64] Remove redundant c_str (NFC) Identified with readability-redundant-string-cstr.	2021-08-14 08:49:40 -07:00
eopXD	012173680f	[LoopIdiom] let the pass deal with runtime memset size The current LIR does not deal with runtime-determined memset-size. This patch utilizes SCEV and check if the PointerStrideSCEV and the MemsetSizeSCEV are equal. Before comparison the pass would try to fold the expression that is already protected by the loop guard. Testcase file `memset-runtime.ll`, `memset-runtime-debug.ll` added. This patch deals with proper loop-idiom. Proceeding patch wants to deal with SCEV-s that are inequal after folding with the loop guards. Reviewed By: lebedev.ri, Whitney Differential Revision: https://reviews.llvm.org/D107353	2021-08-14 19:22:06 +08:00
Dawid Jurczak	107401002e	[NFC][DSE] Clean up KnownNoReads and MemorySSAScanLimit in DSE Another simple cleanups set in DSE. CheckCache is removed since `1f1145006b` and in consequence KnownNoReads is useless. Also update description of MemorySSAScanLimit which default value is 150 instead 100. Differential Revision: https://reviews.llvm.org/D107812	2021-08-14 11:26:57 +02:00
Lang Hames	27ea3f1607	[JITLink][x86-64] Rename Relaxable edges to REXRelaxable. The existing relaxable edges all assume a REX prefix. ELF includes non-REX relaxations, so rename these edges to make room for the new kinds.	2021-08-14 18:28:49 +10:00
Lang Hames	632135acae	[JITLink][x86-64] Rename BranchPCRel32ToPtrJumpStub(Relaxable -> Bypassable). ELF allows for branch optimizations other than bypass, so rename this edge kind to avoid any confusion.	2021-08-14 17:49:31 +10:00
Anshil Gandhi	29e11a1aa3	Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop" This reverts commit `c4e5425aa5`.	2021-08-13 23:58:04 -06:00
Anshil Gandhi	c4e5425aa5	[Remarks] Emit optimization remarks for atomics generating CAS loop Implements ORE in AtomicExpandPass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891	2021-08-13 22:44:08 -06:00
Jessica Paquette	50efbf9cbe	[GlobalISel] Narrow binops feeding into G_AND with a mask This is a fairly common pattern: ``` %mask = G_CONSTANT iN <mask val> %add = G_ADD %lhs, %rhs %and = G_AND %add, %mask ``` We have combines to eliminate G_AND with a mask that does nothing. If we combined the above to this: ``` %mask = G_CONSTANT iN <mask val> %narrow_lhs = G_TRUNC %lhs %narrow_rhs = G_TRUNC %rhs %narrow_add = G_ADD %narrow_lhs, %narrow_rhs %ext = G_ZEXT %narrow_add %and = G_AND %ext, %mask ``` We'd be able to take advantage of those combines using the trunc + zext. For this to work (or be beneficial in the best case) - The operation we want to narrow then widen must only be used by the G_AND - The G_TRUNC + G_ZEXT must be free - Performing the operation at a narrower width must not produce a different value than performing it at the original width after masking. Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign. Differential Revision: https://reviews.llvm.org/D107929	2021-08-13 18:31:13 -07:00
Matt Arsenault	cc56152f83	GlobalISel: Add helper function for getting EVT from LLT This can only give an imperfect approximation, but is enough to avoid crashing in places where we call into EVT functions starting from LLTs.	2021-08-13 21:10:13 -04:00
Craig Topper	d63f117210	[RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode.	2021-08-13 18:00:09 -07:00
Matt Arsenault	a77ae4aa6a	AMDGPU: Stop attributor adding attributes to intrinsic declarations	2021-08-13 20:51:48 -04:00
Matt Arsenault	5beb9a0e6a	AMDGPU: Respect compute ABI attributes with unknown OS Unfortunately Mesa is still using amdgcn-- as the triple for OpenGL, so we still have the awkward unknown OS case to deal with. Previously if the HSA ABI intrinsics appeared, we we would not add the ABI registers to the function. We would emit an error later, but we still need to produce some compile result. Start adding the registers to any compute function, regardless of the OS. This keeps the internal state more consistent, and will help avoid numerous test crashes in a future patch which starts assuming the ABI inputs are present on functions by default.	2021-08-13 20:44:46 -04:00
Arthur Eubanks	16e8134e7c	[NFC] One more AttributeList::getAttribute(FunctionIndex) -> getFnAttr()	2021-08-13 16:56:42 -07:00
Arthur Eubanks	c19d7f8af0	[CallPromotion] Check for inalloca/byval mismatch Previously we would allow promotion even if the byval/inalloca attributes on the call and the callee didn't match. It's ok if the byval/inalloca types aren't the same. For example, LTO importing may rename types. Fixes PR51397. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107998	2021-08-13 16:52:04 -07:00
Arthur Eubanks	d5ff5ef65e	[NFC] One more AttributeList::getAttribute(FunctionIndex) -> getFnAttr()	2021-08-13 16:49:05 -07:00
Arthur Eubanks	dc41c558dd	[NFC] Make AttributeList::hasAttribute(AttributeList::ReturnIndex) its own method AttributeList::hasAttribute() is confusing. In an attempt to change the name to something that suggests using other methods, fix up some existing uses.	2021-08-13 16:27:11 -07:00
Arthur Eubanks	f80ae58068	[NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex) getAttribute() is confusing, use a clearer method.	2021-08-13 16:27:11 -07:00
Arthur Eubanks	8e9ffa1dc6	[NFC] Cleanup callers of AttributeList::hasAttributes() AttributeList::hasAttributes() is confusing, use clearer methods like hasFnAttrs().	2021-08-13 12:16:52 -07:00
Arthur Eubanks	d7593ebaee	[NFC] Clean up users of AttributeList::hasAttribute() AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.	2021-08-13 11:59:18 -07:00
Arthur Eubanks	a9831cce1e	[NFC] Remove public uses of AttributeList::getAttributes() Use methods that better convey the intent.	2021-08-13 11:38:12 -07:00
Arthur Eubanks	80ea2bb574	[NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes() This is more consistent with similar methods.	2021-08-13 11:16:52 -07:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Arthur Eubanks	a0c42ca56c	[NFC] Remove AttributeList::hasParamAttribute() It's the same as AttributeList::hasParamAttr().	2021-08-13 10:58:21 -07:00
Amy Kwan	581a80304c	[PowerPC] Disable CTR Loop generate for fma with the PPC double double type. It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this intrinsic needs to remain as a call as it cannot be lowered to any instruction, which also means we need to disable CTR loop generation for fma involving the ppcf128 type. This patch accomplishes this behaviour. Differential Revision: https://reviews.llvm.org/D107914	2021-08-13 12:27:24 -05:00
Haowei Wu	571b0d84d2	[IFS] Fix the copy constructor warning in IFSStub.cpp This change fixes the gcc warning on copy constructor in IFSStub.cpp file. Differential Revision: https://reviews.llvm.org/D108000	2021-08-13 10:17:53 -07:00
Alfonso Gregory	17bc82dd3b	[AsmWriter][NFC] Simplify writeDIGenericSubrange Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D107469	2021-08-13 09:31:13 -07:00
Jessica Paquette	ccfc079047	[AArch64][GlobalISel] Legalize scalar G_SSUBSAT + G_SADDSAT These are lowered, matching SDAG behaviour. (See llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll) These fall back ~159 times on a build of clang with GISel enabled. Differential Revision: https://reviews.llvm.org/D107777	2021-08-13 09:02:25 -07:00
Jamie Schmeiser	64f29e2dd1	Fix bad assert in print-changed code Summary: The assertion that both functions were not missing was incorrect and would fail when one of the functions was missing. Fixed it and moved the assertion earlier to check the input parameters to better capture first-failure. Added lit test. Author: Jamie Schmeiser <schmeise@ca.ibm.com> Reviewed By: aeubanks (Arthur Eubanks) Differential Revision: https://reviews.llvm.org/D107989	2021-08-13 10:54:30 -04:00
Roman Lebedev	0dc6b597db	Revert "[SCEV] Remove premature assert. PR46786" Since then, the SCEV pointer handling as been improved, so the assertion should now hold. This reverts commit `b96114c1e1`, relanding the assertion from commit `141e845da5`.	2021-08-13 17:50:22 +03:00
Roman Lebedev	c46546bd52	Reland "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological"" The commit originally unearthed a problem, reported as https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff#1019890 Now that the problem has been fixed, and the assertion no longer fires, let's see if there are other cases it fires on. This reverts commit `5c8c24d2de`, relanding commit `f30a7dff8a`.	2021-08-13 15:45:03 +03:00
Roman Lebedev	2702fb1148	[SimplifyCFG] Restart if `removeUndefIntroducingPredecessor()` made changes It might changed the condition of a branch into a constant, so we should restart and constant-fold terminator, instead of continuing with the tautological "conditional" branch. This fixes the issue reported at https://reviews.llvm.org/rGf30a7dff8a5b32919951dcbf92e4a9d56c4679ff	2021-08-13 15:45:03 +03:00
Roman Lebedev	5c8c24d2de	Revert "[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological" The assertion does not hold on a provided reproducer. Reverting until after fixing the problem. This reverts commit `f30a7dff8a`.	2021-08-13 13:16:22 +03:00
Dylan Fleming	4be7fb9762	[SVE] Add folds for truncation of vscale Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107453	2021-08-13 10:18:00 +01:00
Rosie Sumpter	46abd1fbe8	[LoopFlatten] Fix assertion failure in checkOverflow There is an assertion failure in computeOverflowForUnsignedMul (used in checkOverflow) due to the inner and outer trip counts having different types. This occurs when the IV has been widened, but the loop components are not successfully rediscovered. This is fixed by some refactoring of the code in findLoopComponents which identifies the trip count of the loop.	2021-08-13 10:07:49 +01:00
luxufan	ee65938357	[JITLink] Update ELF_x86_64 's edge kind to generic edge kind This patch uses a switch statement to map the ELF_x86_64's edge kind to generic edge kind, and merge the ELF_x86_64 's applyFixup function to the x86_64 's applyFixup function. Some edge kinds were not have corresponding generic edge kinds, so I added three generic edge kinds asa follows: 1. RequestGOTAndTransformToDelta64, which is similar to RequestGOTAndTransformToDelta32. 2. GOTDelta64. This generic kind is similar to Delta64, except the GOTDelta64 computes the delta relative to GOTSymbol 3. RequestGOTAndTransformToGOTDelta64. This edge kind was used to deal with ELF_x86_64's GOT64 edge kind, it request the fixGOTEdge function to change the target to GOT entry, and set the edge kind to generic edge kind GOTDelta64. These added generic edge kinds may named haphazardly, or can't express its meaning well. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D107967	2021-08-13 12:53:54 +08:00
Shivam Gupta	835ea22b37	[AVR] Enable machine verifier Reviewed By: mhjacobson, benshi001 Differential Revision: https://reviews.llvm.org/D107853	2021-08-13 12:11:22 +08:00
Michael Kruse	b1de32d6dd	[OMPIRBuilder] Clarify CanonicalLoopInfo. NFC. Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives. CanonicalLoopInfo are now invalidated after it does not describe canonical loop anymore and asserts when trying to use it afterwards. In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid that the impression that they insert something from scratch at that location where in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0 Also see discussion in D105706. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D107540	2021-08-12 21:02:19 -05:00
Giorgis Georgakoudis	60e643fe05	[OpenMP][Fix] Fix disable spmdization option Besides SPMDization, other analysis and optimization for original, frontend-generated SPMD regions uses information from the AAKernelInfoFunction attribute. This fix makes sure disabling SPMDization through the corresponding option applies only to generic mode regions, which should not be SPMDized, while it leaves unaffected the attribute state of original SPMD regions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108001	2021-08-12 17:59:14 -07:00
Ruiling Song	e1beebbac5	SplitKit: Don't further split subrange mask in buildCopy We may use several COPY instructions to copy the needed sub-registers during split. But the way we split the lanes during the COPYs may be different from the subranges of the old register. This would fail when we extend the subranges of the new register because the LaneMasks do not match exactly between subranges of new register and old register. Since we are bundling the COPYs, I think there is no need to further refine the subranges of the new register based on the set of LaneMasks of the inserted COPYs. I am not sure if there will be further breaking cases. But as the subranges of new register are created based on the LaneMasks of the subranges of old register, it will be highly possible we will always find an exact LaneMask match. We can think about how to make the extendPHIKillRanges() work for subrange mask mismatch case if we meet more such cases in the future. The test case was from D105065 by @arsenm. Differential Revision: https://reviews.llvm.org/D107829	2021-08-13 07:36:38 +08:00
Heejin Ahn	adb96d2e76	[WebAssembly] Fix leak in Emscripten SjLj For SjLj, we allocate a table to record setjmp buffer info in the entry of each setjmp-calling function by inserting a `malloc` call, and insert a `free` call to free the buffer before each `ret` instruction. But this is not sufficient; we have to free the buffer before we throw. In SjLj handling, normal functions that can possibly throw or longjmp are wrapped with an invoke and caught within the function so they don't end up escaping the function. But three functions throw and escape the function: - `__resumeException` (Emscripten library function used for Emscripten EH) - `emscripten_longjmp` (Emscripten library function used for Emscripten SjLj) - `__cxa_throw` (libc++abi function called when for C++ `throw` keyword) The first two functions are used to rethrow the current exception/longjmp when the caught exception/longjmp is not for the current function. `__cxa_throw` is used for exception, and because we consider that a function that cannot longjmp, it escapes the function right away, before which we should free the buffer. Currently `lsan.test_longjmp3` and `lsan.test_exceptions_longjmp3` fail in Emscripten; this CL fixes these. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D107852	2021-08-12 16:32:46 -07:00
Heejin Ahn	aca198cf74	[WebAssembly] Error out when Emscripten SjLj setjmp is used with Wasm EH Currently, when Wasm EH is used with Emscripten SjLj, Emscripten SjLj cannot handle `invoke` instructions - it assumes all `invoke`s have been lowered away with Emscripten EH. But in Wasm EH they are lowered in instruction selection, so they are still present in the IR stage. This happens when 1. Wasm EH and Emscripten SjLj are used together 2. A function that calls `setjmp` uses exceptions, i.e., has `invoke`s We were already erroring out with an assertion failure in this case, but this CL makes it error out more properly with a valid error message. Wasm EH + Wasm SjLj will not have this restrictions. (it will have another restriction though, e.g., `setjmp` cannot be called within `catch`. But why would anyone do that..) Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D107687	2021-08-12 16:19:04 -07:00
Lei Huang	8930af45c3	[PowerPC] Implement XL compatibility builtin __addex Add builtin and intrinsic for `__addex`. This patch is part of a series of patches to provide builtins for compatibility with the XL compiler. Reviewed By: stefanp, nemanjai, NeHuang Differential Revision: https://reviews.llvm.org/D107002	2021-08-12 16:38:21 -05:00
Heejin Ahn	78e87970af	[WebAssembly] Disable offset folding for function addresses Wasm does not support function addresses with offsets, but isel can generate folded SDValues in the form of (@func + offset) without this patch. Fixes https://bugs.llvm.org/show_bug.cgi?id=43133. Reviewed By: dschuff, sbc100 Differential Revision: https://reviews.llvm.org/D107940	2021-08-12 13:40:41 -07:00
Sanjay Patel	14eefa57f2	[InstCombine] factorize min/max intrinsic ops with common operand (2nd try) This is a re-try of `6de1dbbd09` which was reverted because it missed a null check. Extra test for that failure added. Original commit message: This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 16:32:07 -04:00
Amy Huang	427520a8fa	Revert "[InstCombine] factorize min/max intrinsic ops with common operand" This reverts commit `6de1dbbd09` because it causes a compiler crash.	2021-08-12 12:36:25 -07:00
Florian Hahn	f999312872	Recommit "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts the revert `28c04794df`. The failing MLIR test that caused the revert should be fixed in this version. Also includes a PPC test fix previously in `1f87c7c478`.	2021-08-12 18:31:57 +01:00
Craig Topper	79fbddbea0	[RISCV] Teach vsetvli insertion pass that it doesn't need to insert vsetvli for unit-stride or strided loads/stores in some cases. For unit-stride and strided load/stores we set the SEW operand of the pseudo instruction equal the EEW in the opcode. The LMUL of the pseudo instruction is the LMUL we want. These instructions calculate EMUL=(EEW/SEW) * LMUL. We can use this to avoid changing vtype if the SEW/LMUL of the previous vtype matches the EEW/EMUL ratio we need for the instruction. Due to how the global analysis works, we can only do this optimization when the previous vsetvli was produced in the block containing the store. We need to know in the first phase if the vsetvli will be inserted so we can propagate information to the successors in the second phase correctly. This means we can't depend on predecessors. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D106601	2021-08-12 10:05:27 -07:00
Roman Lebedev	f30a7dff8a	[NFCI][SimplifyCFG] simplifyCondBranch(): assert that branch is non-tautological We really shouldn't deal with a conditional branch that can be trivially constant-folded into an unconditional branch. Indeed, barring failure to trigger BB reprocessing, that should be true, so let's assert as much, and hope the assertion never fires. If it does, we have a bug to fix.	2021-08-12 20:03:09 +03:00
Roman Lebedev	628f63d3d5	[SimplifyCFG] If FoldTwoEntryPHINode() changed things, restart Mainly, i want to add an assertion that `SimplifyCFGOpt::simplifyCondBranch()` doesn't get asked to deal with non-unconditional branches, and if i do that, then said assertion fires on existing tests, and this is what prevents it from firing.	2021-08-12 20:03:09 +03:00
Sanjay Patel	790c29ab86	[InstCombine] fold umax/umin intrinsics based on demanded bits This is a direct translation of the select folds added with D53033 / D53036 and another step towards canonicalization using the intrinsics (see D98152).	2021-08-12 12:37:45 -04:00
maekawatoshiki	dd3eea6566	[LICM] Support sinking in LNICM Currently, LNICM pass does not support sinking instructions out of loop nest. This patch enables LNICM to sink down as many instructions to the exit block of outermost loop as possible. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D107219	2021-08-13 00:56:26 +09:00
Sanjay Patel	cd44cc86e3	[InstCombine] remove unused function argument; NFC This was just added with `6de1dbbd09` , and I missed pulling the extra arg from the final revision.	2021-08-12 11:47:25 -04:00
Johannes Doerfert	4e7d7cae67	[Attributor][FIX] Do not try to rewrite functions with casted call sites If we cast a function at the call site it is hard(er) to get the rewrite correct, let's not attempt it for now. Fixes PR51448.	2021-08-12 10:39:53 -05:00
Johannes Doerfert	5f543919b2	[Attributor][FIX] Guard constant casts with type size checks	2021-08-12 10:39:53 -05:00
Johannes Doerfert	a420f80bf1	[Attributor] Do not delete volatile stores to null/undef See D106309. Differential Revision: https://reviews.llvm.org/D107906	2021-08-12 10:39:52 -05:00
David Green	ae9a346ef8	[ARM] Fix DAG combine loop in reduction distribution Given a constant operand, the MVE and DAGCombine combines could fight, each redistributing in the opposite order. Add a guard to the MVE vecreduce distribution to prevent that.	2021-08-12 16:37:39 +01:00
Sanjay Patel	be0698559b	[InstCombine] remove shl(neg x), y transform This diff was accidentally committed with: `1b5a195845`	2021-08-12 11:27:22 -04:00
Sanjay Patel	6de1dbbd09	[InstCombine] factorize min/max intrinsic ops with common operand This is an adaptation of D41603 and another step on the way to canonicalizing to the intrinsic forms of min/max. See D98152 for status.	2021-08-12 11:19:09 -04:00
Sanjay Patel	1b5a195845	[InstCombine] add tests for factorization of min/max intrinsics; NFC	2021-08-12 11:19:09 -04:00
Victor Huang	99e00663d4	[PowerPC] Fix return address computation for "__builtin_return_address" When depth > 0, callee frame address is used to compute the return address of callee producing improper return address. This patch adds the fix to use caller frame address to compute the return address of callee. Reviewed By: nemanjai, #powerpc Differential revision: https://reviews.llvm.org/D107646	2021-08-12 09:44:49 -05:00
Liqiang Tao	422fc5603a	[llvm][Inline] Refactor out InlineOrder Move InlineOrder to separated file. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D107831	2021-08-12 22:19:53 +08:00
Mehdi Amini	28c04794df	Revert "[Matrix] Overload stride arg in matrix.columnwise.load/store." This reverts commit `a1ef81de35`. Broke the MLIR buildbot.	2021-08-12 11:57:19 +00:00
Florian Hahn	a1ef81de35	[Matrix] Overload stride arg in matrix.columnwise.load/store. This patch adjusts the intrinsics definition of llvm.matrix.column.major.load and llvm.matrix.column.major.store to allow overloading the type of the stride. The bitwidth of the stride is used to perform the offset computation. This fixes a crash when using __builtin_matrix_column_major_load or __builtin_matrix_column_major_store on 32 bit platforms. The stride argument of the builtins are defined as `size_t`, which is 32 bits wide on 32 bit platforms. Note that we still perform offset computations with 64 bit width on 32 bit platforms for accesses that do not take a user-specified stride. This can be fixed separately. Fixes PR51304. Reviewed By: erichkeane Differential Revision: https://reviews.llvm.org/D107349	2021-08-12 10:45:25 +01:00
David Truby	9c47d6b48d	[llvm][sve] Lowering for VLS extending loads This patch enables extending loads for fixed length SVE code generation. There is a slight regression here in the mulh tests; since these tests load the parameter and then extend it these are treated as extending loads which are merged, preventing the mulh instruction from being generated. As this affects scalable SVE codegen as well this should be addressed in a separate patch. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D107057	2021-08-12 09:43:39 +00:00
Cullen Rhodes	419deccfd1	[AArch64] NFC: Remove register decoder tables in disassembler The register classes are generated by TableGen, use them instead of handwritten tables. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107763	2021-08-12 07:28:56 +00:00
Fangrui Song	67d4d7cf68	[Object] Add missing PPC_DYNAMIC_TAG macros	2021-08-12 00:05:04 -07:00
Christudasan Devadasan	5d940b71ae	Reapply "SROA: Enhance speculateSelectInstLoads" Originally committed as `ffc3fb665d` Reverted in `fcf2d5f402` due to an assertion failure. Original commit message: Allow the folding even if there is an intervening bitcast. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D106667	2021-08-11 22:58:54 -04:00
Amara Emerson	73056f239e	[AArch64][GlobalISel] Simplify/nuke the merge/unmerge legalizer rules. These rules were originally written when the new predicate based legalizer was introduced in an attempt to preserve existing behaviour. It wasn't properly kept up to date as things like vector support was split out into G_CONCAT_VECTORS, and frankly, even if it was, it was too complex. It's much easier to start from scratch with what we can actually support, which is just a few type combinations. Anything illegal we should either legalize, or should be eliminated as a side effect of artifact combination. Differential Revision: https://reviews.llvm.org/D107937	2021-08-11 16:45:23 -07:00

1 2 3 4 5 ...

149978 Commits