llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	128a49727a	[AMDGPU] Fix upcoming TableGen warnings on unused template arguments. NFC. The warning is implemented by D109359 which is still in review. Differential Revision: https://reviews.llvm.org/D109826	2021-09-16 09:07:18 +01:00
Sam Parker	c98a8a09b5	[HardwareLoops] Loop guard intrinsic to recognise zext If a loop count was initially represented by a 32b unsigned int in C then the hardware-loop pass can recognise the loop guard and insert the llvm.test.set.loop.iterations intrinsic. If this was instead an unsigned short/char then clang inserts a zext instruction to expand the loop count to an i32. This patch adds the necessary pattern matching to enable the use of llvm.test.set.loop.iterations in those cases. Patch by: sherwin-dc Differential Revision: https://reviews.llvm.org/D109631	2021-09-16 08:33:16 +01:00
Alok Kumar Sharma	a5b72abc9e	[DebugInfo] Enhance DIImportedEntity to accept children entities New field `elements` is added to '!DIImportedEntity', representing list of aliased entities. This is needed to dump optimized debugging information where all names in a module are imported, but a few names are imported with overriding aliases. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D109343	2021-09-16 10:41:55 +05:30
Kazu Hirata	24c8eaec94	[Transforms] Use make_early_inc_range (NFC)	2021-09-15 19:55:24 -07:00
Jessica Paquette	c8b3d7d6d6	[AArch64][GlobalISel] Ensure atomic loads always get assigned GPR destinations The default register bank selection code for G_LOAD assumes that we ought to use a FPR when the load is casted to a float/double. For atomics, this isn't true; we should always use GPRs. Without this patch, we crash in the following example: https://godbolt.org/z/MThjas441 Also make the code a little more stylistically consistent while we're here. Also test some other weird cast combinations as well. Differential Revision: https://reviews.llvm.org/D109771	2021-09-15 17:05:09 -07:00
Ahmed Bougacha	e159d3cbfc	[AArch64][GlobalISel] Use MI::getIntrinsicID in more spots. NFC. There's technically a difference in the logic used by these findIntrinsicID and MachineInstr::getIntrinsicID, but it shouldn't be a meaningful difference here, with G_INTRINSIC instructions. getIntrinsicID's "first non-def" logic should be correct for those.	2021-09-15 16:45:34 -07:00
Ahmed Bougacha	94a2f9cdb6	[GlobalISel] Fix CombinerHelper::isPredecessor for same def/use MI. The doc comment for isPredecessor says: Returns true if \p DefMI precedes \p UseMI or they are the same instruction. And dominates relies on that behavior for its own: Returns true if \p DefMI dominates \p UseMI. By definition an instruction dominates itself. Make both statements correct by fixing isPredecessor. Found by inspection.	2021-09-15 16:45:27 -07:00
Arthur Eubanks	c3ddc13d7d	[NFC] Split up PassBuilder.cpp PassBuilder.cpp is the slowest file to compile in LLVM. When trying to test changes to pipelines, it takes a long time to recompile. This doesn't actually speed up building PassBuilder.cpp itself since most of the time is spent in other large/duplicated functions caused by PassRegistry.def. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109798	2021-09-15 15:30:39 -07:00
Owen Anderson	68079ef0eb	Teach SimplifyCFG to fold switches into lookup tables in more cases. In particular, it couldn't handle cases where lookup table constant expressions involved bitcasts. This does not seem to come up frequently in C++, but comes up reasonably often in Rust via `#[derive(Debug)]`. Originally reported by pcwalton. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109565	2021-09-15 22:07:08 +00:00
Anna Thomas	f9e4aebe4a	Revert "[InstCombine] Improve TryToSinkInstruction with multiple uses" This reverts commit `4ac4e52189`. There are a couple of test failures, which need test case updates. Doing a clean revert and will recommit the change along with fixed testcases.	2021-09-15 18:03:11 -04:00
David Blaikie	065bb08bb8	NFC: DWARFTypePrinter: Remove "type" from member function names to reduce redundancy	2021-09-15 14:46:28 -07:00
Anna Thomas	b6cb03e6b9	Revert use of getUniqueUndroppableUser in AssumeBundleBuilder Fix build bot failure in rG4ac4e521 caused due to assumeBundleBuilder using new API (getUniqueUndroppableUser). We now continue using the existing API for AssumeBundleBuilder (getSingleUndroppableUser). Sorry for the noise here. Tests-Run: failing testcase passes.	2021-09-15 17:45:09 -04:00
Matt Arsenault	87c00878d3	SplitKit: Remove decade old live interval hack This was trying to fixup broken live intervals coming out of the coalescer. The verifier is more complete now and no tests seem to fail without this.	2021-09-15 17:35:59 -04:00
Anna Thomas	3273430406	Re-add getSingleUndroppableUse API The API was removed in `4ac4e52189` in favor of getUniqueUndroppableUser. However, this caused a buildbot failure in AbstractCallSiteTest.cpp, which uses the API and the AbstractCallSite class requires a "use" rather than a user. Retain the API so that the unittest compiles and passes.	2021-09-15 17:06:20 -04:00
Anna Thomas	4ac4e52189	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also, the API for retrieving undroppable user has been updated accordingly since in both usecases (Attributor and InstCombine), we seem to care about the user, rather than the use. Reviewed-By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-15 20:39:38 +00:00
Kazu Hirata	385f380e80	[MemorySSA] Fix "set but not used" warnings	2021-09-15 11:41:41 -07:00
Sanjay Patel	e5a32d720e	[InstCombine] move extend after insertelement if both operands are extended I was wondering how instcombine does on the examples in D109236, and we're missing a basic transform: inselt (ext X), (ext Y), Index --> ext (inselt X, Y, Index) https://alive2.llvm.org/ce/z/z2aBu9 Note that there are several possible extensions of this fold (see TODO comments). Differential Revision: https://reviews.llvm.org/D109537	2021-09-15 14:38:03 -04:00
Philip Reames	9bdb19cca2	[SCEV] (udiv X, Y) * Y is always NUW Motivated by the removal done in D109782. This implements the correct flag part generically. Differential Revision: https://reviews.llvm.org/D109786	2021-09-15 11:34:50 -07:00
Alina Sbirlea	b759381b75	[MemorySSA] Add verification levels to MemorySSA. [NFC] Add two levels of verification for MemorySSA: Fast and Full. The defaults are kept the same. Full verification always occurs under EXPENSIVE_CHECKS, but now it can also be requested in a specific pass for debugging purposes.	2021-09-15 11:09:54 -07:00
Filipp Zhinkin	f5d8952356	[InstCombine] Transform X == 0 ? 0 : X * Y --> X * freeze(Y) Enabled mul folding optimization that was previously disabled by being incorrect. To preserve correctness, mul's operand that is not compared with zero in select's condition is now frozen. Related bug: https://bugs.llvm.org/show_bug.cgi?id=51286 Correctness: https://alive2.llvm.org/ce/z/bHef7J https://alive2.llvm.org/ce/z/QcR7sf https://alive2.llvm.org/ce/z/vvBLzt https://alive2.llvm.org/ce/z/jGDXgq https://alive2.llvm.org/ce/z/3Pe8Z4 https://alive2.llvm.org/ce/z/LGga8M https://alive2.llvm.org/ce/z/CTG5fs Differential Revision: https://reviews.llvm.org/D108408	2021-09-15 09:04:06 -04:00
Simon Pilgrim	0767e43d87	[CostModel][X86] Adjust bitreverse/ctpop/ctlz/cttz AVX2+ costs based on llvm-mca reports Based off the worst case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (based off instruction counts instead of actual throughput).	2021-09-15 13:04:40 +01:00
Martin Storsjö	b33a43e57c	[ARM] Move fetching of ARMSubtarget into the scopes that need it. NFC. This was requested in D38253, but missed back then. Differential Revision: https://reviews.llvm.org/D109046	2021-09-15 15:03:20 +03:00
David Green	a2332d5332	[ARM] Prevent continuous folding of SUBC In some situations under Thumb1, we could be stuck in an infinite loop recombining the same instruction. This puts a limit on that, not combining SUBC with SUBE repeatedly.	2021-09-15 11:23:32 +01:00
David Green	61cc873a8e	[LV] Recognize intrinsic min/max reductions This extends the reduction logic in the vectorizer to handle intrinsic versions of min and max, both the floating point variants already created by instcombine under fastmath and the integer variants from D98152. As a bonus this allows us to match a chain of min or max operations into a single reduction, similar to how add/mul/etc work. Differential Revision: https://reviews.llvm.org/D109645	2021-09-15 10:45:50 +01:00
Simon Pilgrim	dcba994184	[X86] combineX86ShuffleChain - ensure we only peek through bitcasts to vectors (PR51858) When searching for hidden identity shuffles (added at rG41146bfe82aecc79961c3de898cda02998172e4b), only peek through bitcasts to the source operand if it is a vector type as well.	2021-09-15 10:21:05 +01:00
Simon Atanasyan	533471ff2f	[MIPS] Remove unused tblgen template args. NFC Identified in D109359.	2021-09-15 12:16:07 +03:00
Cullen Rhodes	18655140d6	[NVPTX] NFC: Remove unused imm type intrinsic arg Identified in D109359. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109755	2021-09-15 08:56:51 +00:00
Florian Hahn	e90d55e1c9	[VPlan] Support sinking recipes with uniform users outside sink target. This is a first step towards addressing the last remaining limitation of the VPlan version of sinkScalarOperands: the legacy version can partially sink operands. For example, if a GEP has uniform users outside the sink target block, then the legacy version will sink all scalar GEPs, other than the one for lane 0. This patch works towards addressing this case in the VPlan version by detecting such cases and duplicating the sink candidate. All users outside of the sink target will be updated to use the uniform clone. Note that this highlights an issue with VPValue naming. If we duplicate a replicate recipe, they will share the same underlying IR value and both VPValues will have the same name ir<%gep>. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104254	2021-09-15 09:21:39 +01:00
Xiang1 Zhang	1f1c71aeac	[X86][InlineAsm] Use mem size information (*word ptr) for "global variable + registers" memory expression in inline asm. Differential Revision: https://reviews.llvm.org/D109739	2021-09-15 16:11:14 +08:00
Amara Emerson	5ec1845cad	[AArch64][GlobalISel] Add a new reassociation for G_PTR_ADDs. (G_PTR_ADD (G_PTR_ADD X, C), Y) -> (G_PTR_ADD (G_PTR_ADD X, Y), C) Improves CTMark -Os on AArch64 (program: before, after, diff): sqlite3: 286932, 287024, 0.0%; kc: 432512, 432508, -0.0%; SPASS: 412788, 412764, -0.0%; pairlocalalign: 249460, 249416, -0.0%; bullet: 475740, 475512, -0.0%; 7zip-benchmark: 568864, 568356, -0.1%; consumer-typeset: 419088, 418648, -0.1%; tramp3d-v4: 367628, 367224, -0.1%; clamscan: 383184, 382732, -0.1%; lencod: 430028, 429284, -0.2%; Geomean difference: -0.1%. Differential Revision: https://reviews.llvm.org/D109528	2021-09-14 23:57:41 -07:00
Markus Lavin	1ac209ed76	[NPM] Added -print-pipeline-passes print params for a few passes. Added '-print-pipeline-passes' printing of parameters for those passes declared with _WITH_PARAMS macro in PassRegistry.def. Note that it only prints the parameters declared inside _WITH_PARAMS as in a few cases there appear to be additional parameters not parsable. The following passes are now covered (i.e. all of those with *_WITH_PARAMS in PassRegistry.def): LoopExtractorPass - loop-extract; HWAddressSanitizerPass - hwsan; EarlyCSEPass - early-cse; EntryExitInstrumenterPass - ee-instrument; LowerMatrixIntrinsicsPass - lower-matrix-intrinsics; LoopUnrollPass - loop-unroll; AddressSanitizerPass - asan; MemorySanitizerPass - msan; SimplifyCFGPass - simplifycfg; LoopVectorizePass - loop-vectorize; MergedLoadStoreMotionPass - mldst-motion; GVN - gvn; StackLifetimePrinterPass - print<stack-lifetime>; SimpleLoopUnswitchPass - simple-loop-unswitch. Differential Revision: https://reviews.llvm.org/D109310	2021-09-15 08:34:04 +02:00
Matt Arsenault	54d755a034	DAG: Fix incorrect folding of fmul -1 to fneg The fmul is a canonicalizing operation, and fneg is not so this would break denormals that need flushing and also would not quiet signaling nans. Fold to fsub instead, which is also canonicalizing.	2021-09-14 21:25:02 -04:00
Hongtao Yu	299b5d420d	[CSSPGO] Enable pseudo probe instrumentation in O0 mode. Pseudo probe instrumentation was missing from O0 build. It is needed in cases where some source files are built in O0 while the others are built in optimize mode. Reviewed By: wenlei, wlei, wmi Differential Revision: https://reviews.llvm.org/D109531	2021-09-14 18:13:29 -07:00
Matt Arsenault	4a36e96c3f	RegAllocGreedy: Account for reserved registers in num regs heuristic This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading. There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly. The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.	2021-09-14 21:00:29 -04:00
Matt Arsenault	88146230e1	SeparateConstOffsetFromGEP: Fix stack overflow in unreachable code ConstantOffsetExtractor::Find was infinitely recursing on the add referencing itself.	2021-09-14 19:49:38 -04:00
Matt Arsenault	fdd9761dd1	Attributor: Fix crash on undef in !callees	2021-09-14 19:49:34 -04:00
Matt Arsenault	f12174204c	AMDGPU: Rename attributor class for uniform-work-group-size This isn't really an AMDGPU specific attribute and could be moved to generic code. It's also important to include the word uniform in the name.	2021-09-14 19:49:08 -04:00
David Blaikie	4cabaf594a	NFC: DebugInfo: refactor pretty printing into a utility class Laying more foundation for full template name rebuilding - more complex type printing benefits from an object to carry some state rather than passing it around as parameters to every function.	2021-09-14 15:54:29 -07:00
Philip Reames	0dd755f027	[SCEV] Stop applying contextual flags in applyLoopGuards This fixes a violation of the wrap flag rules introduced in `c4048d8f`. As noted in the original review, the NUW is legal to infer from the structure of the replacee, but a) there's no test coverage, and b) this should be done generically for all multiplies. Differential Revision: https://reviews.llvm.org/D109782	2021-09-14 14:14:52 -07:00
David Tenty	26b8031774	[CMake][AIX] Disable visibility options in build Visibility options currently have limited support on AIX and may cause warnings or errors depending on the build compiler used. Reviewed By: ZarkoCA Differential Revision: https://reviews.llvm.org/D108467	2021-09-14 16:05:12 -04:00
Heejin Ahn	468c4409f6	Revert "[WebAssembly] Rethrow longjmp in EH handling if EmSjLj is enabled" This reverts commit `b7b4ebbcfa`. Reason: This breaks several code-size tests in Emscripten test suite because this exports `emscripten_longjmp` for programs that didn't do it before.	2021-09-14 12:59:42 -07:00
Joe Nash	3ce1b9631a	[AMDGPU] Switch PostRA sched to MachineSched Use GCNHazardRecognizer in postra sched. Updated tests for the new schedules. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109536 Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde	2021-09-14 15:11:27 -04:00
Florian Hahn	7359450e6a	[VPlan] Queue (block, operand) pairs together (NFC). Instead of discovering the sink-to block for each operand in the main loop, the sink-to block can instead be directly queued with the operands. This simplifies processing in the main loop and is a NFC change split off from D104254 as suggested there.	2021-09-14 20:02:51 +01:00
Bjorn Pettersson	cd2bff1ef1	[StackColoring] Fix a debug invariance problem Ignore dbg instructions when collecting stack slot markers. This is to make sure the coloring is invariant regarding presence of dbg instructions (even in cases when the dbg instructions might be badly placed in the input). Differential Revision: https://reviews.llvm.org/D109758	2021-09-14 19:21:56 +02:00
Kazu Hirata	d9e46beace	[IPO] Use make_early_inc_range (NFC)	2021-09-14 08:59:36 -07:00
Sam Clegg	6ee55f9ab5	Fix test failure created by `ef8c9135ef` Followup to https://reviews.llvm.org/D108877 to fix test failure.	2021-09-14 07:35:05 -07:00
Sam Clegg	ef8c9135ef	[WebAssembly] Allow import and export of TLS symbols between DSOs We previously had a limitation that TLS variables could not be exported (and therefore could also not be imported). This change removed that limitation. Differential Revision: https://reviews.llvm.org/D108877	2021-09-14 06:47:37 -07:00
Amy Kwan	5041a485b9	[PowerPC] Exploit Prefixed Load/Stores using the refactored Load/Store Implementation This patch exploits the prefixed load and store instructions utilizing the refactored load/store implementation introduced in D93370. Prefixed load and store instructions are emitted whenever we are loading or storing a value with an offset that fits into a 34-bit signed immediate. Patterns for the prefixed load and stores are added in this patch, as well as the implementation that detects when we are loading and storing a value with an offset that fits in 34-bits. Differential Revision: https://reviews.llvm.org/D96075	2021-09-14 08:39:49 -05:00
Florian Hahn	e248d69036	Recommit "[LAA] Support pointer phis in loop by analyzing each incoming pointer." SCEV does not look through non-header PHIs inside the loop. Such phis can be analyzed by adding separate accesses for each incoming pointer value. This results in 2 more loops vectorized in SPEC2000/186.crafty and avoids regressions when sinking instructions before vectorizing. Fixes PR50296, PR50288. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D102266	2021-09-14 11:19:12 +01:00
David Green	5a6dfbb8cd	[ARM] Teach DemandedVectorElts about VMOVN lanes The class of instructions that write to narrow top/bottom lanes only demand the even or odd elements of the input lanes, which means that a pair of VMOVNT; VMOVNB demands no lanes from the original input. This teaches that to instcombine from the target hooks available through ARMTTIImpl. Differential Revision: https://reviews.llvm.org/D109325	2021-09-14 11:05:31 +01:00
Tim Northover	f287405419	AArch64: fix indentation of ProcAppleA14. NFC.	2021-09-14 10:04:15 +01:00
Cullen Rhodes	6fbc167c0a	[WebAssembly] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D109689	2021-09-14 08:26:15 +00:00
Cullen Rhodes	742cf3996e	[AArch64] NFC: Use 'asm' in SIMDScalarCPY Fixes a warning identified in D109359. The mnemonic is also mov, not cpy. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D109573	2021-09-14 08:26:15 +00:00
Martin Storsjö	ac3edc4c97	[Win64EH] Write .pdata symbol relocations relative to the temporary begin symbol Previously the relocations pointed at the public user facing, possibly external symbol. When the function itself is weak, that symbol may be overridden at link time, pointing at another strong implementation of the same function instead. In that case, there are two conflicting pdata entries pointing at the same address, and the wrong unwind info might end up used. Both GCC/binutils and MSVC produce pdata pointing at internal static symbols. (GCC/binutils point at the .text section just as LLVM does after this change, MSVC points at special label type symbols with the type IMAGE_SYM_CLASS_LABEL and names like '$LN4'.) This fixes unwinding through an overridden "operator new" with a statically linked C++ library in MinGW mode. (Building libc++ with -ffunction-sections and linking with --gc-sections might avoid the issue too.) This makes the produced object files a little less user friendly to debug, but with other recent improvements for llvm-readobj, the unwind info debugging experience should be pretty much the same. Differential Revision: https://reviews.llvm.org/D109651	2021-09-14 11:05:37 +03:00
Heejin Ahn	e85ed44373	[WebAssembly] Fix a typo in comments	2021-09-14 00:45:02 -07:00
Esme-Yi	b98c3e957f	[yaml2obj][XCOFF] add the SectionIndex field for symbol. Summary: Add the SectionIndex field for symbol. 1: a symbol can reference a section by SectionName or SectionIndex. 2: a symbol can reference a section by both SectionName and SectionIndex. 3: if both Section and SectionIndex are specified, but the two values refer to different sections, an error will be reported. 4: an invalid SectionIndex is allowed. 5: if a symbol references a non-existent section by SectionName, an error will be reported. Reviewed By: jhenderson, Higuoxing Differential Revision: https://reviews.llvm.org/D109566	2021-09-14 06:18:03 +00:00
Chris Lattner	8b4afc5aef	[APInt] Add a concat method, use LLVM_UNLIKELY to help optimizer. Three unrelated changes: 1) Add a concat method as a convenience to help write bitvector use cases in a nicer way. 2) Use LLVM_UNLIKELY as suggested by @xbolva00 in a previous patch. 3) Fix casing of some "slow" methods to follow naming standards. Differential Revision: https://reviews.llvm.org/D109620	2021-09-13 22:02:54 -07:00
Chen Zheng	946e69d253	[PowerPC] prepare more loop load/store instructions PPCLoopInstrFormPrep pass now can prepare for load store instructions in a loop whose increment is not a constant integer. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D105872	2021-09-14 05:00:48 +00:00
Matt Arsenault	c305513cc2	AMDGPU: Fix assert with indirect call with known required inputs The attributor can determine that some indirect calls do not require special inputs. The special inputs will still be present in the ABI, so we need to allocate the registers and pass undefs.	2021-09-13 22:54:11 -04:00
Lang Hames	2c8e784915	[ORC] Add Shared/OrcRTBridge, and TargetProcess/OrcRTBootstrap. This is a small first step towards reorganization of the ORC libraries: Declarations for types and function names (as strings) to be found in the "ORC runtime bootstrap" set are moved into OrcRTBridge.h / OrcRTBridge.cpp. The current implementation of the "ORC runtime bootstrap" functions is moved into OrcRTBootstrap.h and OrcRTBootstrap.cpp. It is likely that this code will eventually be moved into ORC-RT proper (in compiler RT). The immediate goal of this change is to make these bootstrap functions usable for clients other than SimpleRemoteEPC/SimpleRemoteEPCServer. The first planned client is a new RuntimeDyld::MemoryManager that will run over EPC, which will allow us to remove the old OrcRemoteTarget code.	2021-09-14 10:19:45 +10:00
Brendon Cahoon	42dace9c5b	[Hexagon] Use getTypeAllocSize to compute difference between objects The code was using getTypeStoreSize to calculate the difference between consecutive objects. The calculation was incorrect due to padding that is added between consecutive objects. The getTypeAllocSize includes the padding amount. For example, if the type is [19 x i8], the difference between consecutive objects is 32 bytes, not 19 bytes. A second case for getTypeAllocSize is needed when computing the pointer values for the vector accesses. The calculation needs to account for the padding as well. Differential Revision: https://reviews.llvm.org/D109403	2021-09-13 19:04:59 -05:00
Ankit Aggarwal	a72763af67	[Hexagon] Handle bitcast of i64/i128 -> v64i1/v128i1	2021-09-13 18:52:30 -05:00
Kuba Mracek	e80ee4cbd9	[GlobalDCE] In VFE support for relative pointers, allow GEP references to the base symbol This is for Swift VFE support. In some vtable forms that Swift emits, the "base" of a relative pointer is not the global symbol itself directly, but a GEP into it -- so the pointer is relative to a particular field in the global. So getPointerAtOffset() needs to be able to see through the GEP and allow it in a SUB expression, to correctly recognize the offset as a vtable slot. Differential Revision: https://reviews.llvm.org/D109169	2021-09-13 15:22:11 -07:00
Heejin Ahn	c55b6c593b	[WebAssembly] Handle _setjmp and _longjmp in SjLj In some platforms `_setjmp` and `_longjmp` are used instead of `setjmp` and `longjmp`. This CL adds support for them. Fixes https://github.com/emscripten-core/emscripten/issues/14999. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D109669	2021-09-13 14:20:04 -07:00
Heejin Ahn	b7b4ebbcfa	[WebAssembly] Rethrow longjmp in EH handling if EmSjLj is enabled This is a fix on top of D106525's Case 2. In D106525, in `runEHOnFunction` which handles Emscripten EH, we rethrow `longjmp` only when the module has any usage of `setjmp` or `longjmp`. But now that Wasm object files are linked using wasm-ld, the module this pass sees is not the whole program, and even if this module does not contain any `longjmp`, another file can contain it and can be linked with the current module. This enables the rethrowing of longjmp whenever Emscripten SjLj is enabled, regardless of whether it is used in this module or not. Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D109670	2021-09-13 14:15:25 -07:00
Florian Mayer	0a22510f3e	[value-tracking] see through returned attribute. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109675	2021-09-13 20:52:26 +01:00
Florian Mayer	5b5d774f5d	[hwasan] Respect returns attribute when tracking values. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109233	2021-09-13 20:52:24 +01:00
Jon Chesterfield	71052ea1e3	[openmp] Apply code change from D109500	2021-09-13 18:33:53 +01:00
Jon Chesterfield	bfcf979978	Revert "[openmp] Fix 51647, corrupt bitcode on amdgpu" This reverts commit `d5c049a3f6`. Going to re-commit it in pieces for easier application to 13	2021-09-13 18:25:07 +01:00
Philip Reames	6fec6552f5	Revert "[IndVars] Replace PHIs if loop exits on 1st iteration" This reverts commit `5a6dfb27ca`. See original review for why.	2021-09-13 10:11:18 -07:00
Philip Reames	5746c76f3f	Revert "[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration" This reverts commit `d9ca444835`. See review for why.	2021-09-13 10:10:49 -07:00
vnalamot	726b5d3416	[RegScavenger][NFC] Refer to the already initialized local variable for spill slot index Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109501	2021-09-13 21:55:33 +05:30
Kazu Hirata	abca4c012f	[Utils] Use make_early_inc_range (NFC)	2021-09-13 08:57:23 -07:00
Simon Pilgrim	9db20822f7	[APInt] Add APIntOps::ScaleBitMask helper APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions. When traversing through dst/src operands, we have a number of places where these masks need to be widened/narrowed to translate through bitcasts, reductions etc. to a different type. This patch adds an APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the helper instead of their own implementation. This came up on D109065 where we currently have to add yet another implementation of the same code. Differential Revision: https://reviews.llvm.org/D109683	2021-09-13 16:27:12 +01:00
vnalamot	0fc3ebb70a	[SelectionDAG][NFC] Fix typo in VerifyDAGDiverence() function name Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D109674	2021-09-13 20:48:04 +05:30
dpalermo	d5c049a3f6	[openmp] Fix 51647, corrupt bitcode on amdgpu Patch by @dpalermo The corrupt bitcode reported in https://bugs.llvm.org/show_bug.cgi?id=51647 seems to be a result of a later pass changing the workfn variable to addrspace(5) (thread private, on the stack). That seems reasonable for an alloca without an address space so it's an open question why that can crash the bitcode reader. This change puts it in the thread private address space to begin with which means whatever misfired further down the pipeline does not break it. That matches the codegen from clang where stack variables are always annotated (5) and then addrspace cast prior to following use. This therefore patches around whatever unsuccessfully moved the alloca variable to addrspace(5). That solves the problem of openmp opt producing code that crashes the bitcode reader. It should be possible to create a minimal repro for the underlying bug based on some handwritten IR that uses an alloca in a generic address space. Reviewed By: ronlieb, jdoerfert, dpalermo-phab Differential Revision: https://reviews.llvm.org/D109500	2021-09-13 15:24:48 +01:00
Anna Thomas	b4e787d8f4	[InstCombining] Refactor checks for TryToSinkInstruction. NFC Moved out the checks for profitability of TryToSinkInstruction into a lambda function. This will also allow us to easily add checks for bailing out if the transform is not profitable. Tests-Run: instCombine tests.	2021-09-13 09:04:34 -04:00
Stefan Gränitz	9691851582	[JITLink] Factor out forEachRelocation() function from addRelocations() in ELF Aarch64 backend (NFC) First step in reducing redundancy in `addRelocations()` implementations across ELF JITLink backends. The patch factors out common logic for ELF relocation traversal into the new helper function `forEachRelocation()` in the `ELFLinkGraphBuilder` base class. For now, this is applied to the Aarch64 implementation. Others may follow soon. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D109516	2021-09-13 14:59:38 +02:00
Tim Northover	5d070c8259	SwiftAsync: use runtime-provided flag for extended frame if back-deploying When back-deploying Swift async code we can't always toggle the flag showing an extended frame is present because it will confuse unwinders on systems released before this feature. So in cases where the code might run there, we `or` in a mask provided by the runtime (as an absolute symbol) telling us whether the unwinders can cope. When deploying only for newer OSs, we can still hard-code the bit-set for greater efficiency.	2021-09-13 13:54:46 +01:00
Cullen Rhodes	1d771e19fd	[AArch64] NFC: Remove unused template args Identified in D109359. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D109491	2021-09-13 10:39:33 +00:00
Florian Hahn	c24fc37e47	[VectorCombine] Support AND/UREM indices that require freezing. `38b098be66` limited scalarization to indices that are known non-poison. For certain patterns that restrict the range of an index, we can insert a freeze of the original value, to prevent propagation of poison. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107580	2021-09-13 11:21:45 +01:00
David Truby	915e9e76bf	[llvm][sve] Lowering for VLS masked extending loads This extends the custom lowering for extending loads on fixed length vectors in SVE to support masked extending loads. The existing tests for correct behaviour of masked extending loads exhibit bad code generation due to the legalisation of i1 vectors. They have been left as-is and new tests have been added that do not exhibit this behaviour. Differential Revision: https://reviews.llvm.org/D108200	2021-09-13 11:13:25 +01:00
Cullen Rhodes	97a6d76694	[Hexagon] NFC: Remove unused tblgen template args Identified in D109359. Reviewed By: kparzysz Differential Revision: https://reviews.llvm.org/D109604	2021-09-13 10:09:08 +00:00
Cullen Rhodes	9e435c96de	[Lanai] NFC: Remove unused tblgen template arg 'OpNode' Identified in D109359. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D109606	2021-09-13 10:09:08 +00:00
Jingu Kang	2a26d47a2d	[LoopBoundSplit] Check the start value of split cond AddRec After transformation, we assume the split condition of the pre-loop is always true. In order to guarantee it, we need to check the start value of the split cond AddRec satisfies the split condition. Differential Revision: https://reviews.llvm.org/D109354	2021-09-13 10:32:35 +01:00
Jay Foad	477b9bc9f7	[AMDGPU] Minor cleanup after D109483. NFC.	2021-09-13 10:27:15 +01:00
Esme-Yi	909f3d7380	[yaml2obj][XCOFF] customize the string table Summary: The patch adds support for yaml2obj customizing the string table. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D107421	2021-09-13 09:24:38 +00:00
David Sherwood	bbada9ff45	[NFC] Replace unsigned VF with ElementCount in EpilogueLoopVectorizationInfo This patch simply replaces any unsigned VFs with ElementCounts. It's still NFC because at the moment epilogue vectorisation is disabled when the main vector loop uses scalable vectors. Differential Revision: https://reviews.llvm.org/D109364	2021-09-13 10:18:30 +01:00
Jim Lin	f29336104d	[RISCV] Rename prefix `FeatureExt` to `FeatureStdExt` for all sub-extension Rename prefix `FeatureExt` to `FeatureStdExt` for all sub-extension for consistency Reviewed By: HsiangKai, asb Differential Revision: https://reviews.llvm.org/D108187	2021-09-13 16:24:15 +08:00
Esme-Yi	ea81898d0f	[XCOFF] Fix the program abortion issue in XCOFFObjectFile::getSectionContents. Summary: Use std::move(E) to avoid `Program aborted due to an unhandled Error` Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D109567	2021-09-13 07:54:33 +00:00
Simon Pilgrim	65ad09da0e	[X86][SLM] Fix DIVPD/DIVPS/RCPPS/RSQRTPS/SQRTPD/SQRTPS/DPPD/DPPS uops, latency and throughput The packed variants of the instructions had been modelled as the same as the scalar variants. Reported during a run of llvm-exegesis on a cheap SLM box and matches what Agner / InstLatX64 report as well.	2021-09-13 08:36:43 +01:00
luxufan	ff6069b891	[JITLink] Add initial native TLS support to ELFNix platform This patch uses the same approach as https://reviews.llvm.org/rGfe1fa43f16beac1506a2e73a9f7b3c81179744eb to handle thread-local variables. It allocates 2 * pointerSize space in the GOT to represent the thread key and data address. Instead of using the _tls_get_addr function, I added a custom function, __orc_rt_elfnix_tls_get_addr, to get the address of a thread-local variable. Currently, this is a WIP patch: only one TLS relocation, R_X86_64_TLSGD, is supported, and I need to add the corresponding test cases. To allocate the TLS descriptor in the GOT, I need to get the edge kind information in PerGraphGOTAndPLTStubBuilder, so I added an `Edge::Kind K` argument to some functions in PerGraphGOTAndPLTStubBuilder.h. If it is not suitable, I can think further to solve this problem. Differential Revision: https://reviews.llvm.org/D109293	2021-09-13 14:35:49 +08:00
Arthur Eubanks	6a92ab07cb	[NFC][CoroSplit] Directly use Function::getFunctionType()	2021-09-12 21:34:19 -07:00
Max Kazantsev	d9ca444835	[IndVars] Break backedge and replace PHIs if loop exits on 1st iteration Implement TODO in optimizeLoopExits. Now if we have proved that some loop exit is taken on 1st iteration, we make all branches in the following exiting blocks always branch out of the loop and their conditions simplified away. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D108910 Reviewed By: lebedev.ri	2021-09-13 11:30:55 +07:00
Max Kazantsev	5a6dfb27ca	[IndVars] Replace PHIs if loop exits on 1st iteration This is a part of D108910. We replace all loop PHIs with values coming from the loop preheader if we proved that backedge is never taken. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D109596 Reviewed By: lebedev.ri	2021-09-13 10:50:33 +07:00
Arthur Eubanks	d48a3f9f75	[NFC] Directly use OpenMPIRBuilder::Ident instead of IdentPtr->getPointerElementType()	2021-09-12 20:45:44 -07:00
Kuter Dinel	9a193bdc81	[Attributor][FIX] AACallEdges, fix propagation error. This patch fixes an error made in `2cc6f7c8e1`. That patch added a call site position but there was a small error with the way the presence of an unknown call edge was being propagated from call site to function. This patch fixes that error. This error was affecting some AMDGPU tests.	2021-09-13 03:45:26 +03:00
Arthur Eubanks	f94a118a6e	[NFC] Avoid using pointee types in PPCISelLowering A cmpxchg's new value type is the same as the pointer operand's pointee type.	2021-09-12 17:37:35 -07:00
Craig Topper	283879793d	[RISCV] Initial support .insn directive for the assembler. This allows for a custom encoding to be emitted. It can also be used with inline assembly to allow the custom instruction to be register allocated like other instructions. I initially started from SystemZ's implementation, but some of the formats allow operands to be specified in multiple ways so I had to add support for matching different operand class lists for the same format. That implementation is a simplified version of what is emitted by tablegen for regular instructions. I've left out the compressed formats. And I haven't supported the named opcodes like LUI or OP_IMM_32. Those can be added in future patches. Documentation can be found here https://sourceware.org/binutils/docs-2.37/as/RISC_002dV_002dFormats.html Reviewed By: jrtc27, MaskRay Differential Revision: https://reviews.llvm.org/D108602	2021-09-12 15:56:12 -07:00
Kuter Dinel	66a0b3464c	[Attributor] AAFunctionReachability, Handle CallBase Reachability. This patch makes it possible to query callbase reachability (Can a callbase reach a function Fn transitively). The patch moves the reachability query handling logic to a member class, this class will have more users within the AA once we add other function reachability queries. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106402	2021-09-13 01:35:44 +03:00
Kuter Dinel	2cc6f7c8e1	[Attributor] Create a call site position for AACallEdges This patch adds a call site position for AACallEdges, this allows us to ask questions about which functions a specific `CallBase` might call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106208	2021-09-13 01:17:05 +03:00
Florian Hahn	368af7558e	[VPlan] Fix crash caused by not updating all users properly. Users of VPValues are managed in a vector, so we need to be more careful when iterating over users while updating them. For now, just copy them. Fixes 51798.	2021-09-12 18:10:53 +01:00
Nikita Popov	4189e5fe12	[CGP] Support opaque pointers in address mode fold Rather than inspecting the pointer element type, use the access type of the load/store/atomicrmw/cmpxchg. In the process of doing this, simplify the logic by storing the address + type in MemoryUses, rather than an Instruction + Operand pair (which was then used to fetch the address).	2021-09-12 17:43:37 +02:00
Kazu Hirata	8e86c0e4f4	[Scalar] Use make_early_inc_range (NFC)	2021-09-12 08:17:18 -07:00
Sanjay Patel	3a126134d3	[InstCombine] remove casts from splat-a-bit pattern https://alive2.llvm.org/ce/z/_AivbM This case seems clear since we can reduce instruction count and avoid an intermediate type change, but we might want to use mask-and-compare for other sequences. Currently, we can generate more instructions on some related patterns by trying to use bit-hacks instead of mask+cmp, so something is not behaving as expected.	2021-09-12 09:18:14 -04:00
Sam Clegg	b78c85a44a	[WebAssembly] Convert to new "dylink.0" section format This format is based on sub-sections (like the "linking" and "name" sections) and is therefore easier to extend going forward. spec change: https://github.com/WebAssembly/tool-conventions/pull/170 binaryen change: https://github.com/WebAssembly/binaryen/pull/4141 wabt change: https://github.com/WebAssembly/wabt/pull/1707 emscripten change: https://github.com/emscripten-core/emscripten/pull/15019 Differential Revision: https://reviews.llvm.org/D109595	2021-09-12 05:30:38 -07:00
Lang Hames	b64fc0af9a	[ORC] Add bootstrap symbols to ExecutorProcessControl. Bootstrap symbols are symbols whose addresses may be required to bootstrap the rest of the JIT. The bootstrap symbols map generalizes the existing JITDispatchInfo class to provide an arbitrary map of symbol names to addresses. The JITDispatchInfo class will be replaced by bootstrap symbols with reserved names in upcoming commits.	2021-09-12 18:49:43 +10:00
Lang Hames	e339303776	[ORC] Add OrcTargetProcess dependency on LLVM_PTHREAD_LIB	2021-09-12 18:17:06 +10:00
Lang Hames	698a598cf7	[ORC] Add OrcShared dependency on LLVM_PTHREAD_LIB	2021-09-12 16:02:02 +10:00
Lang Hames	d193d23795	[ORC] Fix missing std::move	2021-09-12 15:27:19 +10:00
Lang Hames	d11a0c5d91	[ORC] Fix out-of-range comparison errors.	2021-09-12 14:48:05 +10:00
Lang Hames	bb72f07380	Re-apply `bb27e45643` and `5629afea91` with fixes. This reapplies `bb27e45643` (SimpleRemoteEPC support) and `5629afea91` (#include <mutex> fix) with further fixes to support building with LLVM_ENABLE_THREADS=Off.	2021-09-12 14:23:22 +10:00
Kazu Hirata	15e9575fb5	[Vectorize] Fix "unused variable" warnings	2021-09-11 12:06:43 -07:00
Nikita Popov	45c467346a	[LAA] Pass access type to getPtrStride() Pass the access type to getPtrStride(), so it is not determined from the pointer element type. Many cases still fetch the element type at a higher level though, so this only partially addresses the issue.	2021-09-11 19:16:49 +02:00
Sanjay Patel	75e8eb2b10	[InstCombine] update code/test comments; NFC Follow-up for post-commit suggestion on: `28afaed691` The comments were partly copied from the original code, but not updated to match the new code.	2021-09-11 10:53:53 -04:00
Nikita Popov	f5806830e0	[ARM] Support neon.vld auto-upgrade with opaque pointers This code manually constructs the intrinsic name, so we need to use p0 instead of p0i8 in opaque pointer mode.	2021-09-11 16:34:32 +02:00
Kazu Hirata	e030d31fda	[GlobalOpt] Use make_early_inc_range (NFC)	2021-09-11 07:23:22 -07:00
Sanjay Patel	28afaed691	[InstCombine] fold sub of min/max intrinsics with invertible ops This is a translation of the existing code to handle the intrinsics and another step towards D98152. https://alive2.llvm.org/ce/z/jA7eBC This pattern is already handled by underlying folds if there are fewer uses, so the minimal tests in this case have extra uses. The larger cmyk tests show the motivation - when combined with other folds, we invert a larger sequence and eliminate 'not' ops.	2021-09-11 09:18:46 -04:00
guopeilin	749ddd25e9	[BitcodeReader] Delay select until all constants resolved Like the shuffle, we should treat the select delayed so that all constants can be resolved. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D109053	2021-09-11 18:51:35 +08:00
Simon Pilgrim	df975e4590	[X86][SLM] Fix PSAD/MPSAD uops, latency and throughput Noticed while trying to improve generic reduction costs via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.	2021-09-11 11:44:09 +01:00
Simon Pilgrim	484944ac3b	[X86][SLM] Fix HADD/HSUB uops, latency and throughput Noticed while trying to improve generic reduction costs via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.	2021-09-11 11:44:09 +01:00
Simon Pilgrim	51d04e2268	[X86][SLM] Swap LoadLat and LoadUOps in the SLMWriteResPair<> helper. NFC. We set the LoadUOps argument a lot more frequently than LoadLat; by swapping them we can simplify a number of declarations.	2021-09-11 11:44:09 +01:00
Lang Hames	2269a941a4	Revert `5629afea91` and `bb27e45643` while I look into bot failures. This reverts commit `5629afea91` ("[ORC] Add missing include."), and `bb27e45643` ("[ORC] Add SimpleRemoteEPC: ExecutorProcessControl over SPS + abstract transport."). The SimpleRemoteEPC patch currently assumes availability of threads, and needs to be rewritten with LLVM_ENABLE_THREADS guards.	2021-09-11 19:02:11 +10:00
Lang Hames	bb27e45643	[ORC] Add SimpleRemoteEPC: ExecutorProcessControl over SPS + abstract transport. SimpleRemoteEPC is an ExecutorProcessControl implementation (with corresponding new server class) that uses ORC SimplePackedSerialization (SPS) to serialize and deserialize EPC-messages to/from byte-buffers. The byte-buffers are sent and received via a new SimpleRemoteEPCTransport interface that can be implemented to run SimpleRemoteEPC over whatever underlying transport system (IPC, RPC, network sockets, etc.) best suits your use case. The SimpleRemoteEPCServer class provides executor-side support. It uses a customizable SimpleRemoteEPCServer::Dispatcher object to dispatch wrapper function calls to prevent the RPC thread from being blocked (a problem in some earlier remote-JIT server implementations). Almost all functionality (beyond the bare basics needed to bootstrap) is implemented as wrapper functions to keep the implementation simple and uniform. Compared to previous remote JIT utilities (OrcRemoteTarget, OrcRPCExecutorProcessControl), more consideration has been given to disconnection and error handling behavior: Graceful disconnection is now always initiated by the ORC side of the connection, and failure at either end (or in the transport) will result in Errors being delivered to both ends to enable controlled tear-down of the JIT and Executor (in the Executor's case this means "as controlled as the JIT'd code allows"). The introduction of SimpleRemoteEPC will allow us to remove other remote-JIT support from ORC (including the legacy OrcRemoteTarget code used by lli, and the OrcRPCExecutorProcessControl and OrcRPCEPCServer classes), and then remove ORC RPC itself. The llvm-jitlink and llvm-jitlink-executor tools have been updated to use SimpleRemoteEPC over file descriptors. Future commits will move lli and other tools and example code to this system, and remove ORC RPC.	2021-09-11 18:16:38 +10:00
Jessica Paquette	4e408aae2c	[AArch64][GlobalISel] Select full-fp16 s16 G_FCONSTANT as a constant pool load When we have full-fp16 support, we should (manually select) s16 G_FCONSTANT to a constant pool load. Add support for that to `emitLoadFromConstantPool` + the existing constant selection code. Also tidy up the constant selection code a little. There were some out-of-date comments + some dead code. Differential Revision: https://reviews.llvm.org/D108957	2021-09-10 19:36:34 -07:00
Lang Hames	6c56b13331	[JITLink] Working memory shouldn't be subject to alignment constraints. Refactors copyBlockContentToWorkingMemory to use offsets rather than direct pointers to working memory. This simplifies the problem of maintaining alignments between blocks in working memory, without requiring the working memory itself to be aligned.	2021-09-11 11:26:38 +10:00
Lang Hames	3828ab086a	[ORC] Fix missing newline in debugging output.	2021-09-11 11:24:01 +10:00
Lang Hames	22641f5853	[ORC] Use EPC for EPCGeneric MemoryAccess / JITLinkMemoryManager construction. This allows these classes to be created during EPC construction, before an ExecutionSession is available.	2021-09-11 11:24:00 +10:00
Usman Nadeem	ab111e982f	Revert "Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation"" This reverts commit `eee7d225de`. Effectively relanding `98c37247d8` after fixing the failing tests. Change-Id: I5d7461aeb820a2d5f1895457d824a8de4d316ee5	2021-09-10 18:11:24 -07:00
Eric Christopher	2d26a72f82	nullptr initialize variables, spotted on msan bots.	2021-09-10 18:10:53 -07:00
Joseph Huber	29b44ca896	[OpenMP] Add flag for setting debug in the offloading device This patch introduces the flags `-fopenmp-target-debug` and `-fopenmp-target-debug=` to set the value of a global in the device. This will be used to enable or disable debugging features statically in the device runtime library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109544	2021-09-10 18:19:19 -04:00
Joseph Huber	7eb899cbcd	[OpenMP] Add more verbose remarks for runtime folding We perform runtime folding, but do not currently emit remarks when it is performed. This is because it comes from the runtime library and is beyond the user's control. However, people may still wish to view this and similar information easily, so we can enable this behaviour using a special flag to enable verbose remarks. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109627	2021-09-10 17:36:06 -04:00
Johannes Doerfert	99ea8ac9f1	Reapply "[OpenMP] Group side-effects to improve guarding efficiency" This reapplies `ca134c3963`, effectively reverting commit `d2f206e0af`. Minor test changes to make the test pass.	2021-09-10 15:22:57 -05:00
Johannes Doerfert	c09fbbdcfb	Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reapplies commit `7dbba3376f`, or, put differently, this reverts commit `d9a8d20827`. The test now requires the amdgpu and nvptx backends explicitly as it won't work properly without them.	2021-09-10 15:22:56 -05:00
Mark Schimmel	7c82db3634	[ARC] Improve code generated for i32 ADDC/ADDE and SUBC/SUBE This change improves the code generated for long long addition and subtraction Differential Revision: https://reviews.llvm.org/D109615	2021-09-10 13:04:08 -07:00
Usman Nadeem	eee7d225de	Revert "[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation" This reverts commit `98c37247d8`.	2021-09-10 13:01:48 -07:00
Usman Nadeem	98c37247d8	[AArch64][SVE][InstCombine] Canonicalize aarch64_sve_dup_x intrinsic to IR splat operation Differential Revision: https://reviews.llvm.org/D109118 Change-Id: I47adc1984a54bea02bf5a0a767b765afe7e16aa3	2021-09-10 12:52:14 -07:00
Joseph Huber	9e2fc0ba37	[OpenMP] Check OpenMP assumptions on call-sites as well This patch adds functionality to check assumption attributes on call sites as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109376	2021-09-10 14:52:47 -04:00
Florian Mayer	09391e7e50	[hwasan] Do not instrument accesses to uninteresting allocas. This leads to a statistically significant improvement when using -hwasan-instrument-stack=0: https://bit.ly/3AZUIKI. When enabling stack instrumentation, the data appears to get better but not statistically significantly so. This is consistent with the very moderate improvements I have seen for stack safety otherwise, so I expect it to improve when the underlying issue of that is resolved. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D108457	2021-09-10 19:28:28 +01:00
Florian Mayer	57335b6e2e	[stack-safety] Allow to determine safe accesses. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109503	2021-09-10 19:23:54 +01:00
Kazu Hirata	c9fca53af1	[CodeGen, Target] Use pred_empty and succ_empty (NFC)	2021-09-10 11:11:31 -07:00
Huihui Zhang	da4a2fd832	[AArch64ISelLowering] Fix null pointer access in performSVEAndCombine. When combining 'and' of an unsigned unpack and shuffle instruction, bail early if shuffle is not constructed from a constant integer. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D109556	2021-09-10 10:36:43 -07:00
Anton Afanasyev	54d8ebbbfd	[AggressiveInstCombine] Add `udiv` and `urem` instrs to TruncInstCombine DAG Add `udiv` and `urem` instructions to the DAG post-dominated by `trunc`, allowing TruncInstCombine to reduce bitwidth of expressions containing these instructions. It is sufficient to require that all truncated bits of both operands are zeros: https://alive2.llvm.org/ce/z/yiithn (`urem` case is identical). Differential Revision: https://reviews.llvm.org/D109515	2021-09-10 20:29:08 +03:00
Johannes Doerfert	d2f206e0af	Revert "[OpenMP] Group side-effects to improve guarding efficiency" This reverts commit `ca134c3963`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:24:00 -05:00
Johannes Doerfert	d9a8d20827	Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reverts commit `7dbba3376f`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:23:08 -05:00
Johannes Doerfert	7dbba3376f	[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals Not all address spaces support initializers for globals and we can therefore not set them without checking if they are allowed. This patch adds a hook into TTI to check if an AS allows non-undef initializers. We disable it for all but address space 0 by default, NVPTX and AMDGPU targets allow all but address space 3. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109337	2021-09-10 12:08:50 -05:00
Johannes Doerfert	ca134c3963	[OpenMP] Group side-effects to improve guarding efficiency When we guard side-effects as part of SPMDzation we do it for consecutive instructions that need guarding. This patch will try to reorder guarded side-effects in a block to decrease the number of guarded regions we need. It does not use any smarts, e.g., alias analysis, to move side-effects over non-interfering reads. Instead, it only moves side-effects downwards to the next guarded side-effect if there was nothing in between that could possibly have been affected. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D109070	2021-09-10 12:08:48 -05:00
David Green	deefeffb5d	[ARM] Remove unused tblgen arguments. NFC As per D109359, this removes or makes use of some of the existing unused NEON and base ARM tblgen arguments.	2021-09-10 18:03:54 +01:00
Nikita Popov	14afbe9448	[CallLowering] Support opaque pointers Always use the byval/inalloca/preallocated type (which is required nowadays), don't fall back on the pointer element type. This requires adding Function::getParamPreallocatedType() to mirror the CallBase API, so that the templated code can work with both.	2021-09-10 18:32:12 +02:00
Nikita Popov	d34d2bbe5d	[IR] Remove unused parameter (NFC)	2021-09-10 18:16:22 +02:00
Craig Topper	1b736bda3b	[RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr LICM may have pulled out a splat, but with .vx instructions we can fold it into an operation. This patch enables CGP to reverse the LICM transform and move the splat back into the loop. I've started with the commutable integer operations and shifts, but we can extend this with more operations in future patches. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109394	2021-09-10 09:04:01 -07:00
Craig Topper	6c7cadb8c1	[RISCV] Teach vsetvli insertion that stores don't use the policy bits in vtype. This can avoid a vsetvl after a tail undisturbed operation. Differential Revision: https://reviews.llvm.org/D109549	2021-09-10 09:03:20 -07:00
David Green	6b7cdb40da	[ARM] Remove unused tblgen arguments. NFCI As per D109359, this removes or makes use of some of the existing unused MVE tblgen arguments.	2021-09-10 15:06:31 +01:00
Sam Clegg	e4b2f3054a	[WebAssembly][libObject] Avoid re-use of Section object during parsing The re-use of this struct across iterations of the loop was causing fields (specifically Name) to be incorrectly shared between multiple sections. Differential Revision: https://reviews.llvm.org/D108984	2021-09-10 09:30:50 -04:00
Nikita Popov	90ec6dff86	[OpaquePtr] Forbid mixing typed and opaque pointers Currently, opaque pointers are supported in two forms: The -force-opaque-pointers mode, where all pointers are opaque and typed pointers do not exist. And as a simple ptr type that can coexist with typed pointers. This patch removes support for the mixed mode. You either get typed pointers, or you get opaque pointers, but not both. In the (current) default mode, using ptr is forbidden. In -opaque-pointers mode, all pointers are opaque. The motivation here is that the mixed mode introduces additional issues that don't exist in fully opaque mode. D105155 is an example of a design problem. Looking at D109259, it would probably need additional work to support mixed mode (e.g. to generate GEPs for typed base but opaque result). Mixed mode will also end up inserting many casts between i8* and ptr, which would require significant additional work to consistently avoid. I don't think the mixed mode is particularly valuable, as it doesn't align with our end goal. The only thing I've found it to be moderately useful for is adding some opaque pointer tests in between typed pointer tests, but I think we can live without that. Differential Revision: https://reviews.llvm.org/D109290	2021-09-10 15:18:23 +02:00
Sander de Smalen	ec7d8d5069	[SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors (widening). This patch implements legalization of EXTRACT_SUBVECTOR for the case where the result needs promoting, and the input type requires widening. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109509	2021-09-10 13:29:26 +01:00
Sander de Smalen	801a745dd2	[SelectionDAG] PromoteIntRes_EXTRACT_SUBVECTOR for scalable vectors. This patch implements legalization of EXTRACT_SUBVECTOR for the case where the result needs promoting, and the input type is either legal or requires splitting. The idea is that the operation is broken down into simpler steps, by first extracting a smaller subvector until the input vector becomes legal or requires promotion. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D109313	2021-09-10 13:29:26 +01:00
Sjoerd Meijer	6a076fa953	[LoopFlatten] Make the analysis more robust after IV widening LoopFlatten wasn't triggering on this motivating case after IV widening: void foo(int *A, int N, int M) { for (int i = 0; i < N; ++i) for (int j = 0; j < M; ++j) f(A[i*M+j]); } The reason was that the old induction phi nodes were getting in the way. These narrow and dead induction phis are not always trivially dead, and having both the narrow and wide IVs confused the analysis and caused it to bail. This adds some extra bookkeeping for these old phis, so we can filter them out when checks on phi nodes are performed. Other cleanup passes will get rid of these old phis and increment instructions. As this was one of the motivating examples from the beginning, it was surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG are just slightly different. Differential Revision: https://reviews.llvm.org/D109309	2021-09-10 12:34:04 +01:00
Rosie Sumpter	9d1bea9c88	[SVE][LoopVectorize] Optimise code generated by widenPHIInstruction For SVE, when scalarising the PHI instruction the whole vector part is generated as opposed to creating instructions for each lane for fixed-width vectors. However, in some cases the lane values may be needed later (e.g. for a load instruction) so we still need to calculate these values to avoid extractelement being called on the vector part. Differential Revision: https://reviews.llvm.org/D109445	2021-09-10 11:58:04 +01:00
Serge Bazanski	788e7b3b8c	[Lanai] implement wide immediate support This fixes LanaiTTIImpl::getIntImmCost to return valid costs for i128 (and wider) values. Previously any immediate wider than 64 bits would cause Lanai llc to crash. A regression test is also added that exercises this functionality. Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D107091	2021-09-10 10:54:43 +00:00
Serge Bazanski	231bfaab31	[Lanai] fix MC / objdump D78776 removed is{Call,Branch,UnconditionalBranch} guards in objdump before calling MCInstrAnalysis::evaluateBranch. This is fine for other architectures as they gracefully handle evaluateBranch being called on non-branches. However, the Lanai MCInstrAnalysis implementation didn't and that change caused it to crash. This inserts the same guards back into Lanai's evaluateBranch implementation and adds a smoke test that exercises `llc | objdump` so this kind of regression is hopefully caught next time. Reviewed By: jpienaar, MaskRay Differential Revision: https://reviews.llvm.org/D107593	2021-09-10 10:46:13 +00:00
Florian Hahn	5d1a6d0d1a	[ARM] Remove unnecessary use of replaceSymbolicStrideSCEV (NFC). When passing an empty strides map, there's nothing to replace for replaceSymbolicStrideSCEV and it just returns the SCEV for Ptr. There should be no need to call the function. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D109462	2021-09-10 10:47:26 +02:00
Sjoerd Meijer	4f9217c519	[FuncSpec] Don't specialise call sites that have the MinSize attribute set The MinSize attribute can be attached to both the callee and the caller in the callsite. Function specialisation was already skipped for function declarations (callees) with MinSize. This also skips specialisations for the callsite when it has MinSize set. Differential Revision: https://reviews.llvm.org/D109441	2021-09-10 09:01:45 +01:00
Alexey Lapshin	3493540830	[DebugInfo][NFC] Erase capacity in DWARFUnit::clearDIEs(). DWARFUnit::clearDIEs() uses std::vector::shrink_to_fit() to make the capacity of DieArray match its size(). However, shrink_to_fit() is a non-binding request to make capacity match size(). Thus the memory could still be reserved after DWARFUnit::clearDIEs() is called. This patch erases capacity when DWARFUnit::clearDIEs() is requested, so the memory occupied by DIEs is freed. Differential Revision: https://reviews.llvm.org/D109499	2021-09-10 10:07:28 +03:00
Chris Lattner	704a395693	[APInt] Enable APInt to support zero bit integers. Motivation: APInt not supporting zero bit values leads to a lot of special cases in various bits of code, particularly when using APInt as a bit vector (where you want to start with zero bits and then concat on more). This is particularly challenging in the CIRCT project, where the absence of zero-bit ConstantOp forces duplication of ops and makes instcombine-like logic far more complicated. Approach: zero bit integers are weird. There are two reasonable approaches: either make it illegal to do general arithmetic on them (e.g. sign extends), or treat them as implicitly having a zero value. This patch takes the conservative approach, which enables their use in bitvector applications. Differential Revision: https://reviews.llvm.org/D109555	2021-09-09 22:43:54 -07:00
hsmahesha	0c28814015	Revert "[AMDGPU] Split entry basic block after alloca instructions." This reverts commit `98f4713122`. Without any (theoretical/practical) guarantee that all the allocas within entry basic block are clustered together at the beginning of the block, this patch is doomed to fail. Hence reverting it.	2021-09-10 10:23:51 +05:30
Yonghong Song	e52617c31d	BPF: change BTF_KIND_TAG format Previously we had the following binary representation: struct bpf_type { name, info, type } struct btf_tag { __u32 component_idx; } If the tag points to a struct/union/var/func type, we will have kflag = 1, component_idx = 0; if the tag points to a struct/union member or func argument, we will have kflag = 0, component_idx = 0, ..., vlen - 1. This makes the interface rather complex, since both kflag and component_idx are needed to determine legality and index. This patch simplifies the interface by removing kflag involvement: component_idx = (u32)-1 : tag pointing to a type; component_idx = 0 ... vlen - 1 : tag pointing to a member or argument; and kflag is always 0 and there is no need to check it. Differential Revision: https://reviews.llvm.org/D109560	2021-09-09 19:03:57 -07:00
Zequan Wu	12f80c0bbd	[DebugInfo] Emit DW_AT_inline under -g1/-gmlt Differential Revision: https://reviews.llvm.org/D109554	2021-09-09 18:59:50 -07:00
Matt Arsenault	0197cd0bd4	AMDGPU: Optimize amdgpu-no-* attributes This allows clobbering a few extra registers in the fixed ABI, and avoids some workitem ID packing instructions.	2021-09-09 18:24:28 -04:00
Matt Arsenault	db4963d080	AMDGPU: Use attributor to propagate uniform-work-group-size Drop the legacy version in AMDGPUAnnotateKernelFeatures. This has the side effect of now respecting the linkage, and not changing externally visible functions.	2021-09-09 18:24:28 -04:00
Matt Arsenault	722b8e0e5a	AMDGPU: Invert ABI attribute handling Previously we assumed all callable functions did not need any implicitly passed inputs, and added attributes to functions to indicate when they were necessary. Requiring attributes for correctness is pretty ugly, and it makes supporting indirect and external calls more complicated. This inverts the direction of the attributes, so an undecorated function is assumed to need all implicit inputs. This enables AMDGPUAttributor by default to mark when functions are proven to not need a given input. This strips the equivalent functionality from the legacy AMDGPUAnnotateKernelFeatures pass. However, AMDGPUAnnotateKernelFeatures is not fully removed at this point although it should be in the future. It is still necessary for the two hacky amdgpu-calls and amdgpu-stack-objects attributes, which would be better served by a trivial analysis on the IR during selection. Additionally, AMDGPUAnnotateKernelFeatures still redundantly handles the uniform-work-group-size attribute to be removed in a future commit. At this point when not using -amdgpu-fixed-function-abi, we are still modifying the ABI based on these newly negated attributes. In the future, this option will be removed and the locations for implicit inputs will always be fixed. We will then use the new attributes to avoid passing the values when unnecessary.	2021-09-09 18:24:28 -04:00
Philip Reames	bfa2a81e92	[ScalarEvolution] Add an additional bailout to avoid NOT of pointer. It's possible in some cases for the LHS to be a pointer where the RHS is not. This isn't directly possible for an icmp, but the analysis mixes up operands of different icmp expressions in some cases. This does not include a test case as the smallest reduced case we've managed is extremely fragile and unlikely to test anything meaningful in the long term. Also add an assertion to getNotSCEV() to make tracking down this sort of issue a bit easier in the future. Fixes https://bugs.llvm.org/show_bug.cgi?id=51787 . Differential Revision: https://reviews.llvm.org/D109546	2021-09-09 15:19:36 -07:00
Philip Reames	eede4846a9	[SCEV] Allow negative steps for LT exit count computation for unsigned comparisons This bit of code is incredibly suspicious. It allows fully unknown (but potentially negative) steps, but not steps known to be negative. The comment about scev flag inference is worrying, but also not correct to my knowledge. At best, this might be covering up some related miscompile. However, there's no test in tree for it, the review history doesn't include obvious motivation, and the C++ example doesn't appear to give wrong results when hand translated to IR. I think it's time to remove this and see what falls out. During review, there were concerns raised about the correctness of the corresponding signed case. This change was deliberately narrowed to the unsigned case which has been audited and appears correct for negative values. We need to get back to the known-negative signed case, but that'll be a future patch if nothing falls out from this one. Differential Revision: https://reviews.llvm.org/D104140	2021-09-09 14:09:29 -07:00
Amy Kwan	351a0d8a90	[PowerPC] Update PC-Relative Load/Store Patterns to use the refactored Load/Store Implementation This patch updates the PC-Relative load and store patterns to utilize the refactored load/store implementation introduced in D93370. PC-Relative implementation has been added to PPCISelLowering.cpp, and also the patterns in PPCInstrPrefix.td have been updated and no longer require AddedComplexity. All existing test cases pass with this update. Differential Revision: https://reviews.llvm.org/D95116	2021-09-09 15:38:42 -05:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Nikita Popov	af382b9383	[IR] Handle constant expressions in containsUndefinedElement() If the constant is a constant expression, then getAggregateElement() will return null. Guard against this before calling HasFn().	2021-09-09 22:04:12 +02:00
Eli Friedman	8f792707c4	[ScalarEvolution] Fix pointer/int confusion in howManyLessThans. In general, howManyLessThans doesn't really want to work with pointers at all; the result is an integer, and the operands of the icmp are effectively integers. However, isLoopEntryGuardedByCond doesn't like extra ptrtoint casts, so the arguments to isLoopEntryGuardedByCond need to be computed without those casts. Somehow, the values got mixed up with the recent howManyLessThans improvements; fix the confused values, and add a better comment to explain what's happening. Differential Revision: https://reviews.llvm.org/D109465	2021-09-09 12:38:33 -07:00
Jameson Nash	e20f69f612	[Aarch64] Correct register class for pseudo instructions This constrains the Mov* and similar pseudo instructions to take GPR64common register classes rather than GPR64. GPR64 includes XZR which is invalid here, because these pseudo instructions expand into an adrp/add pair sharing a destination register. XZR is invalid on add and attempting to encode it will instead increment the stack pointer causing crashes (downstream report at [1]). The test case there reproduces on LLVM11, but I do not have a test case that reaches this code path on main, since it is being masked by improved dead code elimination introduced in D91513. Nevertheless, this seems like a good thing to fix in case there are other cases that dead code elimination doesn't clean up (e.g. if `optnone` is used and the optimization is skipped). I think it would be worth auditing uses of GPR64 in pseudo instructions to see if there are any similar issues, but I do not have a high enough view of the backend or knowledge of the Aarch64 architecture to do this quickly. [1] https://github.com/JuliaLang/julia/issues/39818 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D97435	2021-09-09 14:31:49 -04:00
Artem Belevich	d99a83b4e5	[NVPTX] Simplify and generalize constant printer. This allows handling i128 values and fixes https://bugs.llvm.org/show_bug.cgi?id=51789. Differential Revision: https://reviews.llvm.org/D109458	2021-09-09 11:30:19 -07:00
Craig Topper	517728fe1e	[SelectionDAG] Use DAG.getNOT to further simplify some code. NFC Followup to D109483	2021-09-09 10:53:39 -07:00
Nick Desaulniers	e69d402088	[NFC] rename member of BitTestBlock and JumpTableHeader Follow up to suggestions in D109103 via hans: I think UnreachableDefault (or UnreachableFallthrough) would be a better name now, since it doesn't just omit the range check, it also omits the last bit test. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D109455	2021-09-09 10:43:00 -07:00
Chris Lattner	d51da74889	[CodeGen] Use DAG.getAllOnesConstant where possible to simplify code. NFC.	2021-09-09 10:22:51 -07:00
Craig Topper	124bcc1a13	[X86] Disable muloti4 libcalls for x86-64. This library function only exists in compiler-rt not libgcc. So this would fail to link unless we were linking with compiler-rt. This is consistent with the recent removal of calls to mulodi4 on 32-bit targets like D108928. I suppose maybe we could keep the libcalls for platforms like Darwin that use compiler-rt exclusively? Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D109385	2021-09-09 10:03:15 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Neumann Hon	0782e55c26	[SystemZ] [NFC] Add SystemZELFFrameLowering and SystemZXPLINKFrameLowering classes. This patch adds class SystemZFrameLowering which is a SystemZ-specific class detailing special registers used by calling conventions on the target. SystemZELFFrameLowering and SystemZXPLINKFrameLowering implement this class for ELF and XPLINK64 respectively. Previous functionality in SystemZFrameLowering is moved to SystemZELFFrameLowering. SystemZXPLINKFrameLowering can then be implemented in future patches. Reviewed By: uweigand, Kai Differential Revision: https://reviews.llvm.org/D108777	2021-09-09 12:23:40 -04:00
Kazu Hirata	92c9ff6d5f	[IR, Transforms] Use arg_empty (NFC)	2021-09-09 08:50:10 -07:00
Sam Clegg	44177e5fb2	[WebAssembly] Add explicit TLS symbol flag As before we maintain backwards compat with older object files by also inferring the TLS flag based on the name of the segment. This change was split out from https://reviews.llvm.org/D108877. Differential Revision: https://reviews.llvm.org/D109426	2021-09-09 10:03:30 -04:00
Sanjay Patel	97a4e7b7ff	[InstCombine] remove a buggy set of zext-icmp transforms The motivating case is an infinite loop shown with a reduced test from: https://llvm.org/PR51762 To solve this, I'm proposing we delete the most obviously broken part of this code. The bug example shows a fundamental problem: we ask computeKnownBits if a transform will be profitable, alter the code by creating new instructions, then rely on computeKnownBits to return the same answer to actually eliminate instructions. But there's no guarantee that the results will be the same between the 1st and 2nd calls. In the infinite loop example, we get different answers, so we add instructions that conflict with some other transform, and we're stuck. There's at least one other problem visible in the test diff for `@zext_or_masked_bit_test_uses`: the code doesn't check uses properly, so we can end up with extra instructions created. Last, it's not clear if this set of transforms actually improves analysis or codegen. I spot-checked a few targets and don't see a clear win: https://godbolt.org/z/x87EWovso If we do see a regression from this change, codegen seems like the right place to add a cmp -> bit-hack fold. If this is too big of a step, we could limit the computeKnownBits calls by not passing a context instruction and/or limiting the recursion. I checked that those would stop the infinite loop for PR51762, but that won't guarantee that some other example does not fall into the same loop. Differential Revision: https://reviews.llvm.org/D109440	2021-09-09 08:49:39 -04:00
Florian Mayer	6e12c73316	[NFC] [stack-safety] add placeholder addRange. This is in preparation of D108457.	2021-09-09 13:13:18 +01:00
Florian Mayer	d261d4cf55	[stack-safety] [NFC] do not terminate print with blank line.	2021-09-09 12:31:09 +01:00
Florian Mayer	08b4dd8b24	[NFC] [stack-safety] remove unused return value.	2021-09-09 12:19:47 +01:00
Simon Pilgrim	c31a202233	[X86][AVX] Add missing X86ISD::VBROADCAST(v2f64 -> v4f64) isel pattern for AVX1 targets As discussed on the ticket, I'm intending to add additional 128->256 patterns when we have test coverage, but this addresses a known crash. Differential Revision: https://reviews.llvm.org/D109434	2021-09-09 12:16:23 +01:00
Bradley Smith	8089f9ed5a	[AArch64][SVE] Add missing patterns for unpredicated subr intrinsics Differential Revision: https://reviews.llvm.org/D109369	2021-09-09 10:28:37 +00:00
Alfonso Sánchez-Beato	b33fd31772	[yaml2obj][COFF] Allow variable number of directories Allow variable number of directories, as allowed by the specification. NumberOfRvaAndSize will default to 16 if not specified, as in the past. Reviewed by: jhenderson Differential Revision: https://reviews.llvm.org/D108825	2021-09-09 11:16:56 +01:00
Sjoerd Meijer	ecff9e3da5	[FuncSpec] Fixed minor formatting issues. NFC.	2021-09-09 10:36:54 +01:00
Roman Lebedev	909cba9699	[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125) I can't seem to wrap my head around the proper fix here, we should be fine without this requirement, iff we can form this form, but the naive attempt (https://reviews.llvm.org/D106317) has failed. So just to unblock the release, put up a restriction. Fixes https://bugs.llvm.org/show_bug.cgi?id=51125	2021-09-09 12:28:09 +03:00
Jun Ma	8ba2adcf9e	Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."" Differential Revision: https://reviews.llvm.org/D106056	2021-09-09 16:53:33 +08:00
Cullen Rhodes	d42f76fd36	[AArch64][SVE] NFC: Remove unused template args For sve_fp_3op_p_zds_zx we have zero patterns downstream but the intrinsic args can be added again if/when the patterns are implemented. Identified in D109359. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109429	2021-09-09 07:10:57 +00:00
Cullen Rhodes	5b848a35d2	[AArch64][SVE] NFC: Use stepvector directly in index multiclasses Also fixes a couple of warnings identified in D109359: SVEInstrFormats.td:5099:59: warning: unused template argument: sve_int_index_ri::step_vector SVEInstrFormats.td:5133:59: warning: unused template argument: sve_int_index_rr::step_vector Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D109422	2021-09-09 07:10:57 +00:00
Alexander Pivovarov	4bc8dbe0ca	[RISCV] Add SiFive cores E and S series Add SiFive cores E20, E21, E24, E34, S21, S54 and S76 Differential Revision: https://reviews.llvm.org/D109260	2021-09-08 23:59:04 -07:00
Yvan Roux	261cbe98c3	[RISCV] Fix Machine Outliner jump table handling. Don't outline machine instructions which are using jump table indexes since they are materialized as local labels (like the already handled case of constant pools). Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D109436	2021-09-09 07:32:30 +02:00
Chris Lattner	9e46dd965a	[APInt.h] Reduce the APInt header file interface a bit. NFC This moves one mid-size function out of line, inlines the trivial tcAnd/tcOr/tcXor/tcComplement methods into their only caller, and moves the magic/umagic functions into SelectionDAG since they are implementation details of its algorithm. This also removes the unit tests for magic, but these are already tested in the divide lowering logic for various targets. This also upgrades some C style comments to C++. Differential Revision: https://reviews.llvm.org/D109476	2021-09-08 18:17:07 -07:00
Jessica Paquette	22a64d4a14	[MachineOutliner][AArch64] Ensure LR is live-in when inserting reg-save calls Similar to other code which handles creating the function frame. If LR isn't live-in to the block that we're inserting the call into, we'll get a MachineVerifier error.	2021-09-08 17:44:27 -07:00
Amara Emerson	eae44c8a86	[GlobalISel] Implement merging of stores of truncates. This is a port of a combine which matches a pattern where a wide type scalar value is stored by several narrow stores. It folds it into a single store or a BSWAP and a store if the target supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; => ((i32)p) = val; On CTMark AArch64 -Os this results in a good amount of savings: Program before after diff SPASS 412792 412788 -0.0% kc 432528 432512 -0.0% lencod 430112 430096 -0.0% consumer-typeset 419156 419128 -0.0% bullet 475840 475752 -0.0% tramp3d-v4 367760 367628 -0.0% clamscan 383388 383204 -0.0% pairlocalalign 249764 249476 -0.1% 7zip-benchmark 570100 568860 -0.2% sqlite3 287628 286920 -0.2% Geomean difference -0.1% Differential Revision: https://reviews.llvm.org/D109419	2021-09-08 17:06:33 -07:00
Philip Reames	e741fabc22	[SCEV] Move getIndexExpressionsFromGEP to delinearize [NFC]	2021-09-08 16:56:49 -07:00
Philip Reames	4b5e260b1d	[SCEV] Simplify findExistingSCEVInCache interface [NFC] We were returning a tuple when all but one caller only cared about one piece of the return value. That one caller can inline the complexity, and we can simplify all other uses.	2021-09-08 15:26:07 -07:00
Andrew Litteken	144cd22bae	[CodeExtractor] Creating exit stubs based off original order branch instructions. Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on the order of out-of-region blocks after splitting any phi nodes, and collecting the blocks to be outlined. This could cause differences in order if there was a difference of exit block phi nodes between the two regions. This patch moves the collection of the output target blocks to be before this occurs, so that the assignment of target block to output value will be the same, regardless of the contents of the output block. Reviewers: paquette, roelofs Differential Revision: https://reviews.llvm.org/D108657	2021-09-08 15:15:15 -07:00
Arthur Eubanks	fe15347a1e	Port the cost model printer to New PM Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109284	2021-09-08 14:47:05 -07:00
Craig Topper	a574f0e0c3	[RISCV] Disable use of i128 shift libcalls on RV32. Since i128 isn't a legal C type on RV32, I don't believe libgcc implements these functions for RV32. compiler-rt does implement them because i128 support is enabled in order to handle long double. This is consistent with 32-bit X86 and ARM. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D109383	2021-09-08 14:26:07 -07:00
Michael Kruse	088577a38e	[Delinearization] Require byte offset to be zero. Users of delinearization assume that the offset into the array element is zero. In most cases it will indeed be zero, but if it is not, the delinearization has to fail since it violates that assumption without the API even allowing to signal to the caller that the byte offset is non-zero. This bug caused Polly to miscompile blender (526.blender_r from SPEC CPU 2017) in -polly-process-unprofitable mode. The SCEV expression incorrectly delinearized has been reduced in the test case byte_offset.ll. The dropped offset into the array element of size 4 (a float) is ((sext i32 %mul7.i4534 to i64) + {(sext i32 %i1 to i64),+,((sext i32 (1 + ((1 + %shl.i.i) * (1 + %shl.i.i)) + %shl.i.i) to i64) * (sext i32 %i1 to i64))}<%for.body703>). This significant component was just dropped, and the wrong pointer was computed when regenerating code from the remaining delinearized subscripts. This occurred during blender's subsurface scattering implementation. As a result, blender's rendering diverged from the reference image. Patch D108885 would also fix the API. Reviewed By: bmahjour Differential Revision: https://reviews.llvm.org/D109133	2021-09-08 16:02:37 -05:00
Greg Clayton	14850a0628	Log to the right stream in DwarfTransformer::handleDie(). Since we might end up using multiple threads when logging information in the DWARFTransformer, the handleDie() method must use the supplied stream named "OS" when logging warnings and errors. When we use multiple threads, we log to a thread specific stream buffer and then use a mutex to ensure our output doesn't overlap when we emit warnings and errors after a thread is done. Differential Revision: https://reviews.llvm.org/D109401	2021-09-08 14:00:19 -07:00
Florian Hahn	f4726e7238	[LAA] Remove unused OrigPtr from replaceSymbolicStrideSCEV (NFC). The OrigPtr argument is not used in tree.	2021-09-08 22:35:36 +02:00
Nikita Popov	6dfdc6bfd2	[SROA] Support opaque pointers Make the following changes in order to support opaque pointers in SROA: * Generate i8 GEPs for opaque pointers. * Explicitly enforce that promotable allocas only have stores of the alloca type -- previously this was implicitly enforced. * Replace a check for pointer element type with load/store type. Differential Revision: https://reviews.llvm.org/D109259	2021-09-08 22:25:44 +02:00
Arthur Eubanks	b493124ae2	[MemorySSA] Support invariant.group metadata The implementation is mostly copied from MemDepAnalysis. We want to look at all loads and stores to the same pointer operand. Bitcasts and zero GEPs of a pointer are considered the same pointer value. We choose the most dominating instruction. Since updating MemorySSA with invariant.group is non-trivial, for now handling of invariant.group is not cached in any way, so it's part of the walker. The number of loads/stores with invariant.group is small for now anyway. We can revisit if this actually noticeably affects compile times. To avoid invariant.group affecting optimized uses, we need to have optimizeUsesInBlock() not use invariant.group in any way. Co-authored-by: Piotr Padlewski <prazek@google.com> Reviewed By: asbirlea, nikic, Prazek Differential Revision: https://reviews.llvm.org/D109134	2021-09-08 13:06:12 -07:00
Philip Reames	585c594d74	Move delinearization logic out of SCEV [NFC] None of this logic has anything to do with SCEV's internals, it just uses the existing public APIs. As a result, we can move the code from ScalarEvolution.cpp/hpp to Delinearization.cpp/hpp with only minor changes. This was discussed in advance on today's loop opt call. It turned out to be easy as hoped.	2021-09-08 12:28:35 -07:00
Nikita Popov	3e54de4df2	[ConstantHoisting] Support opaque pointers Directly use i8 for GEP, rather than fetching element type of i8*.	2021-09-08 21:23:10 +02:00
Akira Hatanaka	dea6f71af0	[ObjC][ARC] Use the addresses of the ARC runtime functions instead of integer 0/1 for the operand of bundle "clang.arc.attachedcall" https://reviews.llvm.org/D102996 changes the operand of bundle "clang.arc.attachedcall". This patch makes changes to llvm that are needed to handle the new IR. This should make it easier to understand what the IR is doing and also simplify some of the passes as they no longer have to translate the integer values to the runtime functions. Differential Revision: https://reviews.llvm.org/D103000	2021-09-08 11:58:03 -07:00
Andrew Litteken	0087bb4a9a	[IROutliner] Using canonical values to find corresponding values. (NFC) D104143 introduced canonical value numbering between regions, which allows for the easy identification of items across a region, eliminating the need in the outliner to create parallel lists of instructions for each region, and replace output values in a less convoluted way. Additionally, in a future commit, the output values will not necessarily be recorded values from the region itself, it could be a combination value where the actual value being output is a PHINode instead. This new method allows us to handle the replacement of the output value to the stored value with the corresponding item in the same place for both normal output values, and PHINode outputs instead of handling the different types of outputs in different locations. Reviewers: paquette, roelofs Differential Revision: https://reviews.llvm.org/D108656	2021-09-08 11:36:05 -07:00
Joseph Huber	6b9a3ec3a2	[OpenMP] Do not SPMDize generic regions with no parallel This patch changes SPMDization to not trigger for regions with no parallelism. Otherwise, this will introduce unnecessary barriers that will slow the single-threaded region down. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109438	2021-09-08 14:33:15 -04:00
Nick Desaulniers	4331f19d8b	[ISEL][BitTestBlock] omit additional bit test when default destination is unreachable Otherwise we end up with an extra conditional jump, followed by an unconditional jump off the end of a function. i.e. bb.0: BT32rr .. JCC_1 %bb.4 ... bb.1: BT32rr .. JCC_1 %bb.2 ... JMP_1 %bb.3 bb.2: ... bb.3.unreachable: bb.4: ... Should be equivalent to: bb.0: BT32rr .. JCC_1 %bb.4 ... JMP_1 %bb.2 bb.1: bb.2: ... bb.3.unreachable: bb.4: ... This can occur since at the higher level IR (Instruction) SwitchInsts are required to have BBs for default destinations, even when it can be deduced that such BBs are unreachable. For most programs, this isn't an issue, just wasted instructions since the unreachable has been statically proven. The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to boot though once D106056 is re-applied. D106056 makes it more likely that correlation-propagation (CVP) can deduce that the default case of SwitchInsts are unreachable. The x86_64 kernel uses a binary post processor called objtool, which emits this warning: vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b I haven't debugged precisely why this causes a failure at boot time, but fixing this very obvious jump off the end of the function fixes the warning and boot problem. Link: https://bugs.llvm.org/show_bug.cgi?id=50080 Fixes: https://github.com/ClangBuiltLinux/linux/issues/679 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D109103	2021-09-08 11:03:47 -07:00
Kirill Stoimenov	3f875134a7	[asan] Fixed the jump to use the 4 byte offset version. This should have been the 4 byte version in the first place. Unfortunately there is no easy way to add a test as both the 1 byte and 4 byte version are printed as 'jmp' in the assembly code. Reviewed By: kda Differential Revision: https://reviews.llvm.org/D109453	2021-09-08 17:58:12 +00:00
Wouter van Oortmerssen	a99fb86c65	[WebAssembly] Change WebAssemblyMCLowerPrePass to ModulePass It was a FunctionPass before, which subverted its purpose to collect ALL symbols before MCLowering, depending on how LLVM schedules function passes. Fixes https://bugs.llvm.org/show_bug.cgi?id=51555 Differential Revision: https://reviews.llvm.org/D109202	2021-09-08 10:47:43 -07:00
Craig Topper	aca14c8cf1	[RISCV] Remove unused tablegen template parameters. NFC Identified in D109359	2021-09-08 10:01:42 -07:00
Craig Topper	b04c09c07c	[RISCV] Use V0 instead of VMV0: for mask vectors in isel patterns. This is consistent with the RVV intrinsic patterns. This has been shown to prevent some "ran out of registers" errors in our internal testing. Unfortunately, there are some regressions on LMUL=8 tests in here. I think the lack of registers with LMUL=8 just makes it very hard to schedule correctly. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109245	2021-09-08 09:46:21 -07:00
Benjamin Kramer	373b7622c1	[IROutliner] Remove unused variable. NFC.	2021-09-08 18:33:41 +02:00
Roman Lebedev	0852f8706b	[X86] X86DAGToDAGISel::matchBitExtract(): support 'num high bits to clear' pattern Currently, we only deal with the case where we can match the number of low bits to be kept, i.e.: ``` x & ((1 << y) - 1) ``` will extract low `y` bits of `x`. But what will ``` x & (-1 >> y) ``` do? Logically, it will extract `bitwidth(x) - y` low bits, i.e.: ``` x & ~(-1 << (bitwidth(x)-y)) ``` ... except we can't do such a transformation in IR in general, because if we wanted to extract all the bits `(-1 >> 0)` is fine, but `-1 << bitwidth(x)` would be `poison`: https://alive2.llvm.org/ce/z/BKJZfw, Yet, here with BMI's BEXTR and BMI2's BZHI we don't have any such problems with edge-cases. So what we can do is: https://alive2.llvm.org/ce/z/gm5M2B As briefly discussed with @craig.topper, this appears to be not worse than what we'd end up with currently (a pair of shifts): * https://godbolt.org/z/nsPb8bejs (direct data dependency, sequential execution) * https://godbolt.org/z/7bj3zeh1d (no direct data dependency, parallel execution) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107923	2021-09-08 19:27:08 +03:00
Craig Topper	1f16191906	[RISCV] Add a GPR def to the Zvlseg SPILL/RELOAD pseudos The expansion of these pseudos creates ADD instructions. Those ADDs modify a GPR so that it no longer contains the same value as the input base pointer. Therefore, I believe we should have a GPR as a Def on these instructions and expansion should get the destination register for the ADDs from that operand. At least in our tests here this works out so that register scavenging picks the same register as the base pointer. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109405	2021-09-08 09:23:33 -07:00
Andrew Litteken	c172f1ad39	[IROutliner] Adding support for multiple exits When we start outlining across branches, there is the possibility that we will have two different blocks with different output locations, or a single branch that goes to two blocks outside of the region that is being outlined. While the CodeExtractor provides most of the mechanisms by using the return value of the extracted function as the input to a switch statement to correctly branch to the correct location, we need special handling for different output schemas to each location. This is done by repeating the existing storing scheme for each different exit block. We have a map from the return values used, to the basic block that is used to store the outputs for that particular exit block within the outlined function. Then if needed, we create a switch statement for each return block to branch to the correct set of stored outputs. Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106993	2021-09-08 08:58:07 -07:00
Kazu Hirata	bcfbb3f9ec	[IR] Construct SmallVector with iterator ranges (NFC) Note that arg_operands has been deprecated in favor of args.	2021-09-08 08:54:15 -07:00
Peter Smith	b026ce9c8a	[MC] Add Subtarget for MAsmParser call to emitCodeAlignment The call to emitCodeAlignment was missing a STI which is required after D45962. emitCodeAlignment has a default parameter of 0 for MaxBytesToEmit. Explicitly passing 0 here was interpreted as a nullptr for the STI. This could possibly be avoided by taking STI as a const reference in emitCodeAlignment. Differential Revision: https://reviews.llvm.org/D109425	2021-09-08 13:28:24 +01:00
Sjoerd Meijer	88a2031207	[FuncSpec] Fix typo in option description. NFC.	2021-09-08 12:58:46 +01:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC `79845ed6df` folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
Evgeny Leviant	93b09a2a5d	[LiveDebugValues] Handle spills of indirect debug values correctly When handling register spill for indirect debug value LiveDebugValues pass doesn't add DW_OP_deref operator which may in some cases cause debugger to return value address, instead of value while machine register holding that address is spilled. Differential revision: https://reviews.llvm.org/D109142	2021-09-08 14:06:08 +03:00
Fraser Cormack	7fb66d4035	[MemCpyOpt] Fix a variety of scalable-type crashes This patch fixes a variety of crashes resulting from the `MemCpyOptPass` casting `TypeSize` to a constant integer, whether implicitly or explicitly. Since the `MemsetRanges` requires a constant size to work, all but one of the fixes in this patch simply involve skipping the various optimizations for scalable types as cleanly as possible. The optimization of `byval` parameters, however, has been updated to work on scalable types in theory. In practice, this optimization is only valid when the length of the `memcpy` is known to be larger than the scalable type size, which is currently never the case. This could perhaps be done in the future using the `vscale_range` attribute. Some implicit casts have been left as they were, under the knowledge they are only called on aggregate types. These should never be scalably-sized. Reviewed By: nikic, tra Differential Revision: https://reviews.llvm.org/D109329	2021-09-08 11:21:36 +01:00
Fraser Cormack	2c5568a6a9	[LegalizeTypes][VP] Add promotion support for binary VP ops This patch extends the preliminary support for vector-predicated (VP) operation legalization to include promotion of illegal integer vector types. Integer promotion of binary VP operations is relatively simple and piggy-backs on the non-VP logic, but passing the two extra mask and VP operands through to the promoted operation. Tests have been added to the RISC-V target to cover the basic scenarios for integer promotion for both fixed- and scalable-vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108288	2021-09-08 10:22:57 +01:00
Cullen Rhodes	89786c2b99	[AArch64][SME] Fix imm bug in mov vector to tile aliases Also fixes a warning mentioned in D109359. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109363	2021-09-08 07:42:16 +00:00
Sander de Smalen	981f7d563a	[AArch64] Implement extract_subvector for predicates. This patch implements extract_subvector for predicate types when the input type is more than twice the size of the subvector that is being extracted. Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D109314	2021-09-08 08:18:34 +01:00
Max Kazantsev	29d054bf12	[SimplifyCFG] Preserve knowledge about guarding condition by adding assume This improvement adds "assume" after removal of a branch based on UB in a successor block. Consider the following example: ``` pred: x = ... cond = x > 10 br cond, bb, other.succ bb: phi [nullptr, pred], ... // other possible preds load(phi) // UB if we came from pred other.succ: // here we know that x <= 10, but this knowledge is lost // after the branch is turned to unconditional unless we // preserve it with assume. ``` If we remove the branch based on knowledge about UB in a successor block, then the fact that x <= 10 in other.succ might be lost if this condition is not inferrable from any dominating condition. To preserve this knowledge, we can add an assume intrinsic with the (possibly inverted) branch condition. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109054 Reviewed By: lebedev.ri	2021-09-08 14:05:17 +07:00
Justin Latimer	b0d4d969e2	[AVR] Add support for the tinyAVR 0-series and tinyAVR 1-series Reviewed By: Dylan McKay, Ben Shi Differential Revision: https://reviews.llvm.org/D103136	2021-09-08 02:35:26 +00:00
Ben Shi	f0460fa4eb	[AArch64] Improve target hook function to decide folding (mul (add x, c1), c2) Prevent the folding if it leads to worse code. Reviewed By: dmgreen, kda Differential Revision: https://reviews.llvm.org/D108871	2021-09-08 01:51:26 +00:00
Wang, Pengfei	9d7d34c769	[X86][MS] Fix the alignment mismatch of vector variable arguments on Win32 The alignment of vector variable arguments on the callee side is 4, which matches MSVC. But the caller aligns them to the size of the vector arguments, which results in runtime failures. This patch fixes the problem by trimming the alignment to 4 bytes for variable arguments on Win32. Fixed vector arguments are passed by pointer on Win32, so they don't have the problem. I couldn't find documentation in MSDN for this calling convention, so I ran several experiments here: https://godbolt.org/z/n1zn1Gx1z Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D108887	2021-09-08 09:26:44 +08:00
Philip Reames	6cdca906c7	[SCEV] Use no-self-wrap flags inferred from exit structure to compute trip count The basic problem being solved is that we largely give up when encountering a trip count involving an IV which is not an addrec. We will fall back to the brute force constant eval, but that doesn't have the information about the fact that we can't cycle back through the same set of values. There's a high level design question of whether this is the right place to handle this, and if not, where that place is. The major alternative here would be to return a conservative upper bound, and then rely on two invocations of indvars to add the facts to the narrow IV, and then reconstruct SCEV. (I have not implemented the alternative and am not 100% sure this would work out.) That's arguably more in line with existing code, but I find this substantially easier to reason about. During review, no one expressed a strong opinion, so we went with this one. Differential Revision: D108651	2021-09-07 17:00:02 -07:00
Heejin Ahn	a1d522939c	[WebAssembly] Error out on indirect uses of setjmp Both Wasm & Emscripten SjLj handling has a restriction that `setjmp` cannot be called indirectly. I thought we have been erroring out on indirect uses of `setjmp`, but some recent CL disrupted the logic and we are not erroring out anymore. We currently 1. Collect functions that contain `setjmp` calls in `SetjmpUsers`. This only counts direct calls: `8f77dc459e/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L869-L878)` 2. Run `runSjLjOnFunction` only on those `SetjmpUsers`. Within `runSjLjOnFunction`, if we see an indirect use of `setjmp`, we error out: `8f77dc459e/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L1218-L1221)` So if there are only indirect setjmp calls within the module, `SetjmpUsers` will be empty, and `runSjLjOnFunction` is not even entered once. And the indirect `setjmp` call will error out at link time. So in this CL we check for the indirect uses of `setjmp` upfront before we enter `runSjLjOnFunction`. Also this currently errors out on `invoke @setjmp`, which can only occur when using Wasm EH + Wasm SjLj within a function. We recently added Wasm SjLj support but we don't support using Wasm EH + Wasm SjLj in the same function yet. We plan to add this support very soon, so I don't think it's worth creating another test file just for this. (This is an error test so it needs its own file) Reviewed By: dschuff Differential Revision: https://reviews.llvm.org/D109375	2021-09-07 15:52:58 -07:00
Arthur Eubanks	39e2e3bddb	[NFC][C API] Make LLVMSetInstrParamAlignment's index param type LLVMAttributeIndex It's the same as unsigned, but clearer in intent.	2021-09-07 15:13:45 -07:00
Rainer Orth	08ba87fa4b	[Support] Implement getMainExecutable on Solaris Many `flang` tests currently `FAIL` on Solaris because the module files aren't found. I could trace this to `sys::fs::getMainExecutable` not being implemented. This patch does this and fixes all affected `flang` tests. Tested on `amd64-pc-solaris2.11`. Differential Revision: https://reviews.llvm.org/D109374	2021-09-07 22:56:10 +02:00
Philip Reames	9659069978	[SCEV] Further clarify comments regarding UB and zero stride Follow on to D109029. I realized we had no mention of mustprogress in the comment (as it preexisted mustprogress in the codebase). In the process of adding it, I tweaked the preconditions into something I think is more clear. Note that mustprogress is checked in the code. Differential Revision: https://reviews.llvm.org/D109091	2021-09-07 13:53:56 -07:00
Sanjay Patel	a3c1669b17	[InstCombine] fold icmp equality with 'or' mask ops This could go either direction since the instruction count is the same either way, but there are a few reasons to prefer this: 1. We already do the related transform with 'and' (see just above the new code). 2. We try (too hard) to compensate for not having this and possibly other folds in transformZExtICmp(), and that leads to bugs like https://llvm.org/PR51762 . 3. Codegen looks better across a variety of targets. https://alive2.llvm.org/ce/z/uEgn4P	2021-09-07 16:34:00 -04:00
Irina Dobrescu	7023cefe61	[AArch64][Global ISel] Add sext/zext of vector extract improvements This patch adds improvements for sext/zext of a vector extract in Global ISel. For example, this piece of code: define i64 @si64(<4 x i32> %0, i32 %1) { %3 = extractelement <4 x i32> %0, i64 1 %s = sext i32 %3 to i64 ret i64 %s } Used to have this lowering: si64: mov s0, v0.s[1] fmov w8, s0 sxtw x0, w8 ret Whereas this patch makes it lower to this: si64: smov x0, v0.h[0] ret Differential Revision: https://reviews.llvm.org/D108137	2021-09-07 21:17:51 +01:00
Arthur Eubanks	4b05341681	Don't check if the result of hasAttrSomewhere is non-zero in CallBase::getReturnedArgOperand() Index is 0 when the return value has the returned attribute. But the return value cannot have the returned attribute, so the check is pointless. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D109334	2021-09-07 12:05:56 -07:00
Elliot Saba	ae8507b0df	[X86] Don't clobber EBX in stackprobes On X86, the stackprobe emission code chooses the `R11D` register, which is illegal on i686. This ends up wrapping around to `EBX`, which does not get properly callee-saved within the stack probing prologue, clobbering the register for the callers. We fix this by explicitly using `EAX` as the stack probe register. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D109203	2021-09-07 15:00:44 -04:00
Nikita Popov	f5832eaaad	[UseListOrder] Fix use list order for function operands Functions can have a personality function, as well as prefix and prologue data as additional operands. Unused operands are assigned a dummy value of i1* null. This patch addresses multiple issues in use-list order preservation for these: * Fix verify-uselistorder to also enumerate the dummy values. This means that now use-list order values of these values are shuffled even if there is no other mention of i1* null in the module. This results in failures of Assembler/call-arg-is-callee.ll, Assembler/opaque-ptr.ll and Bitcode/use-list-order2.ll. * The use-list order prediction in ValueEnumerator does not take into account the fact that a global may use a value more than once and leaves uses in the same global effectively unordered. We should be comparing the operand number here, as we do for the more general case. * While we enumerate all operands of a function together (which seems sensible to me), the bitcode reader would first resolve prefix data for all functions, then prologue data for all functions, then personality functions for all functions. Change this to resolve all operands for a given function together instead. Differential Revision: https://reviews.llvm.org/D109282	2021-09-07 20:59:12 +02:00
Arthur Eubanks	7f54009a1f	Add missing overloads for Function::addRetAttr(s)	2021-09-07 11:52:22 -07:00
Nikita Popov	58db5f6e95	[ConstFold] Support opaque pointers in constexpr GEPs Support opaque pointers in SymbolicallyEvaluateGEP() by using the value type of a GlobalValue base or falling back to i8 if there isn't one. We don't unconditionally generate i8 GEPs here because that would lose inrange attributes, and because some optimizations on globals currently rely on GEP types (e.g. the globals SROA mentioned in the comment). Differential Revision: https://reviews.llvm.org/D109297	2021-09-07 20:50:29 +02:00
Andy Kaylor	34528c32d2	Copy Elementtype Attribute to IR at Link step Copying IR during linking causes a type mismatch due to the field being missing in IRMover/Valuemapper. Adds the full range of typed attributes including elementtype attribute in the copy functions. Patch by Chenyang Liu Differential Revision: https://reviews.llvm.org/D108796	2021-09-07 11:41:43 -07:00
Arthur Eubanks	b81fc14f2d	[NFC][InstCombine] Make check for sret in a vararg function clearer We're trying to get the parameter index of sret and see if it's part of a function's varargs. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D109335	2021-09-07 11:19:27 -07:00
Roman Lebedev	35fa7b8ad8	Reland "[InstCombine] Recognize `((x * y) s/ x) !=/== y` as an signed multiplication overflow check (PR48769)" This reverts commit `91f7a4fff7`, relanding commit `13ec913bdf`. The original commit was reverted because of (essentially) https://bugs.llvm.org/show_bug.cgi?id=35922 which has now been addressed by `d0eeb64be5`.	2021-09-07 21:03:52 +03:00
Nick Desaulniers	d0eeb64be5	[X86ISelLowering] avoid emitting libcalls to __mulodi4() Similar to D108842, D108844, and D108926. __has_builtin(__builtin_mul_overflow) returns true for 32b x86 targets, but Clang is deferring to compiler RT when encountering long long types. This breaks ARCH=i386 + CONFIG_BLK_DEV_NBD=y builds of the Linux kernel that are using __builtin_mul_overflow with these types for these targets. If the semantics of __has_builtin mean "the compiler resolves these, always" then we shouldn't conditionally emit a libcall. This will still need to be worked around in the Linux kernel in order to continue to support these builds of the Linux kernel for this target with older releases of clang. Link: https://bugs.llvm.org/show_bug.cgi?id=28629 Link: https://bugs.llvm.org/show_bug.cgi?id=35922 Link: https://github.com/ClangBuiltLinux/linux/issues/1438 Reviewed By: lebedev.ri, RKSimon Differential Revision: https://reviews.llvm.org/D108928	2021-09-07 10:44:54 -07:00
Simon Pilgrim	9eda472112	[X86] X86InstrAVX512.td - remove unused template parameters. NFC. Identified in D109359	2021-09-07 17:38:20 +01:00
Kazu Hirata	5648f7170e	[Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC)	2021-09-07 09:19:33 -07:00
Kazu Hirata	5c6338de16	[RISCV] Fix "set but not used" warnings	2021-09-07 09:19:31 -07:00
Dávid Bolvanský	3b5f318f5d	[InstCombine] ror/rol(X, RotAmt) == C --> X == rol/ror(C, RotAmt) (PR51567) ``` ---------------------------------------- define i1 @src(i32 %0) { %1: %2 = fshl i32 %0, i32 %0, i32 25 %3 = icmp eq i32 %2, 5 ret i1 %3 } => define i1 @tgt(i32 %0) { %1: %2 = icmp eq i32 %0, 640 ret i1 %2 } Transformation seems to be correct! ``` https://alive2.llvm.org/ce/z/GdY8Jm Solves PR51567 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109283	2021-09-07 18:04:58 +02:00
Andrew Litteken	81d3ac0cf2	[IROutliner] Adding outlining for single entry/single exit multiblock regions Using the similarity found from the IRSimilarity Identifier, we take regions with structural similarity, and deduplicate them into a separate function. The Code Extractor is able to provide most of this functionality. For simplicity, we start by only outlining regions with a single entry and single exit branch, this reduces the complexity in handling phi nodes outside the region, and handling many sets of outputs for each of the different exit blocks. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D106990	2021-09-07 08:51:54 -07:00
Victor Huang	4a226529e2	[PowerPC] Fixed the crash due to early if conversion with fixed CR fields This patch adds a fix to do early if conversion to select when conditional branch not using physical register to prevent the crash when expanding ISEL instruction. Reviewed By: lei, kamaub, PowerPC Differential revision: https://reviews.llvm.org/D108302	2021-09-07 10:51:03 -05:00
Simon Pilgrim	f8d2cd1428	[X86] Add missing domain to avx512_ord_cmp_sae comis sae patterns It doesn't appear to be possible to generate this from tests atm, but it matches what we do in sse12_ord_cmp Fixes unused template arg identified in D109359	2021-09-07 16:20:21 +01:00
Jinsong Ji	042a6564d3	[PowerPC] Guard XSRSP in P8 for FastISel This is exposed by enabling FastIsel on 64bit AIX. We are generating XSRSP regardless of the arch, which may be wrong when -mcpu=pwr7. The fix is to guard the generation in P8 only. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D109365	2021-09-07 15:17:51 +00:00
Sander de Smalen	bd576e5ac0	[AArch64][SVE] Improve extract_subvector for predicates. Using PUNPKLO/HI instead of ZIP1/ZIP2, because that avoids having to generate a predicate with all lanes inactive (PFALSE). Reviewed By: CarolineConcatto Differential Revision: https://reviews.llvm.org/D109312	2021-09-07 15:49:29 +01:00
Peter Smith	e63455d5e0	[MC] Use local MCSubtargetInfo in writeNops On some architectures such as Arm and X86 the encoding for a nop may change depending on the subtarget in operation at the time of encoding. This change replaces the per module MCSubtargetInfo retained by the target's AsmBackend in favour of passing through the local MCSubtargetInfo in operation at the time. On Arm using the architectural NOP instruction can have a performance benefit on some implementations. For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to limit the chances of this causing problems in the future. I've not done this for other targets such as X86 as there is more frequent use of the MCSubtargetInfo and it looks to be for stable properties that we would not expect to vary per function. This change required threading STI through MCNopsFragment and MCBoundaryAlignFragment. I've attempted to take into account the in tree experimental backends. Differential Revision: https://reviews.llvm.org/D45962	2021-09-07 15:46:19 +01:00
Peter Smith	5e71839f77	[MC] Add MCSubtargetInfo to MCAlignFragment In preparation for passing the MCSubtargetInfo (STI) through to writeNops so that it can use the STI in operation at the time, we need to record the STI in operation when a MCAlignFragment may write nops as padding. The STI is currently unused, a further patch will pass it through to writeNops. There are many places that can create an MCAlignFragment, in most cases we can find out the STI in operation at the time. In a few places this isn't possible as we are in initialisation or finalisation, or are emitting constant pools. When possible I've tried to find the most appropriate existing fragment to obtain the STI from, when none is available use the per module STI. For constant pools we don't actually need to use EmitCodeAlign as the constant pools are data anyway so falling through into it via an executable NOP is no better than falling through into data padding. This is a prerequisite for D45962 which uses the STI to emit the appropriate NOP for the STI. Which can differ per fragment. Note that involves an interface change to InitSections. It is now called initSections and requires a SubtargetInfo as a parameter. Differential Revision: https://reviews.llvm.org/D45961	2021-09-07 15:46:19 +01:00
Michael Liao	640beb38e7	[amdgpu] Enable selection of `s_cselect_b64`. Differential Revision: https://reviews.llvm.org/D109159	2021-09-07 10:45:07 -04:00
Mirko Brkusanin	6c4b634da6	[AMDGPU][GlobalISel] Legalize G_MUL for non-standard types Legalizing G_MUL for non-standard types (like i33) generated an error. Putting minScalar and maxScalar instead of clampScalar. Also using new rule, instead of widening to the next power of 2, widen to the next multiple of the passed argument (32 in this case), so instead of widening i65 to i128, we widen it to i96. Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D109228	2021-09-07 16:33:24 +02:00
Mirko Brkusanin	5263bf583a	[AMDGPU][GlobalISel] Legalization of G_ROTL and G_ROTR Add implementation for the legalization of G_ROTL and G_ROTR machine instructions. They are very similar to funnel shift instructions, the only difference is funnel shifts have 3 operands, whereas rotate instructions have two operands, the first being the register that is being rotated and the second being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering them into funnel shift instructions if they are legal. Patch by: Mateja Marjanovic Differential Revision: https://reviews.llvm.org/D105347	2021-09-07 16:33:24 +02:00
Simon Pilgrim	0d48ee2774	[X86] X86InstrSSE.td - remove unused template parameters. NFC. Identified in D109359	2021-09-07 15:13:05 +01:00
Simon Pilgrim	b50a60c234	[X86] X86InstrVecCompiler.td - remove unused template parameters. NFC. Identified in D109359	2021-09-07 14:46:08 +01:00
Simon Pilgrim	fb38795062	[X86] X86InstrFMA.td - remove unused template parameters. NFC. Identified in D109359	2021-09-07 14:46:07 +01:00
Anton Afanasyev	d1f9b21677	[AggressiveInstCombine] Add `AssumptionCache` to aggressive instcombine Add support for @llvm.assume() to TruncInstCombine allowing optimizations based on these intrinsics while computing known bits.	2021-09-07 16:45:00 +03:00
Anton Afanasyev	8c0a1940c1	[AggressiveInstCombine] Add wrapper calls for `KnownBits` computing Precommit before `AssumptionCache` adding: reviews.llvm.org/D109141 Differential Revision: https://reviews.llvm.org/D109288	2021-09-07 16:45:00 +03:00
Sander de Smalen	448d47f743	[AArch64][SVE] Implement all-inactive predicate with PFALSE. Instead of using a WHILE XZR, XZR instruction, just emit a PFALSE. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D109311	2021-09-07 14:29:02 +01:00
Simon Pilgrim	0a07ae6ebf	[KnownBits] Add support for X*X self-multiplication Add KnownBits handling and unit tests for X*X self-multiplication cases which guarantee that bit1 of their results will be zero - see PR48683. https://alive2.llvm.org/ce/z/NN_eaR The next step will be to add suitable test coverage so this can be enabled in ValueTracking/DAG/GlobalISel - currently only a single Analysis/ScalarEvolution test is affected. Differential Revision: https://reviews.llvm.org/D108992	2021-09-07 11:43:45 +01:00
Mirko Brkusanin	36527cbe02	[AMDGPU][GlobalISel] Legalize memcpy family of intrinsics Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE. Corresponding intrinsics are replaced by a loop that uses loads/stores in AMDGPULowerIntrinsics pass unless their length is a constant lower than MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that reaches legalizer should have a const length argument and should be expanded into appropriate number of loads + stores. Differential Revision: https://reviews.llvm.org/D108357	2021-09-07 12:24:07 +02:00
Fraser Cormack	a823bdf3ab	[RISCV][VP] Custom lower VP_STORE and VP_LOAD This patch adds support for the vector-predicated `VP_STORE` and `VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and `MLOAD`: to regular load/store instructions via intrinsics. One necessary change was made to `SelectionDAGLegalize` so that `VP_STORE` nodes' operation actions are taken from the stored "value" operands, in the same vein as `STORE` or `MSTORE`. Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108999	2021-09-07 10:53:25 +01:00
Fraser Cormack	f4dee8cb82	[RISCV][VP] Custom lower VP_SCATTER and VP_GATHER This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by lowering them to RVV's `vsox`/`vlux` instructions, respectively. This process is almost identical to the existing `MSCATTER`/`MGATHER` support. One extra change was made to `SelectionDAGLegalize` so that `VP_SCATTER`'s operation action is derived from its stored "value" operand rather than its return type (which is always the chain). Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108987	2021-09-07 10:43:07 +01:00
Andrew Wei	da9ed3dc71	[AArch64] Avoid adding duplicate implicit operands when expanding pseudo insts. When expanding pseudo insts, in order to create a new machine instr, we use BuildMI, which will add implicit operands by default. And transferImpOps will also copy implicit operands from old ones. Finally, duplicate implicit operands are added to the same inst. Sometimes this can cause correctness issues. Like below inst, renamable $w18 = nsw SUBSWrr renamable $w30, renamable $w14, implicit-def dead $nzcv After expanding, it will become $w18 = SUBSWrs renamable $w13, renamable $w14, 0, implicit-def $nzcv, implicit-def dead $nzcv A redundant implicit-def $nzcv is added, but the dead flag is missing. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D109069	2021-09-07 17:11:58 +08:00
luxufan	ffcaa80f7e	[RuntimeDyld] Don't use bitwise operation on SymbolRef::Type Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D109292	2021-09-07 16:58:35 +08:00
Ben Shi	63ca9371c7	[ARM] Implement target hook function to decide folding (mul (add x, c1), c2) Prevent the folding in DAGCombine if it leads to worse code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D109124	2021-09-07 15:42:43 +08:00
Craig Topper	da3ef8b756	[X86] Handle inverted inputs when matching VPTERNLOG from 2 binary ops. This is a more general version of D109273. Though it doesn't peek through bitcasts or rearrange broadcasts. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D109295	2021-09-06 17:44:52 -07:00
Fangrui Song	76529b4468	[X86] Simplify condition guarding emitCalleeSavedFrameMoves. NFC	2021-09-06 15:54:02 -07:00
Fangrui Song	4f1e410a1b	[X86] Simplify two hasFP(F). NFC	2021-09-06 15:47:40 -07:00
Nikita Popov	8d54c8a0c3	[SCEV] Fix applyLoopGuards() with range check idiom (PR51760) Due to a typo, this replaced %x with umax(C1, umin(C2, %x + C3)) rather than umax(C1, umin(C2, %x)). This didn't make a difference for the existing tests, because the result is only used for range calculation, and %x will usually have an unknown starting range, and the additional offset keeps it unknown. However, if %x already has a known range, we may compute a result range that is too small.	2021-09-06 22:22:41 +02:00
Sanjay Patel	e1e4bf174b	[DAGCombine] Prevent the transform of combine for multi-use operand The test is based on a miscompile example in: https://llvm.org/PR51321 Differential Revision: https://reviews.llvm.org/D107692	2021-09-06 15:30:32 -04:00
Andrew Litteken	bd4b1b5f6d	[IRSim] Adding support for recognizing branch similarity The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them. This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar. We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter. Reviewers: paquette, yroux Differential Revision: https://reviews.llvm.org/D106989	2021-09-06 11:55:38 -07:00
Kazu Hirata	3322354bfc	[Support] Qualify auto (NFC) Identified with readability-qualified-auto.	2021-09-06 09:10:07 -07:00
Jonas Paulsson	118997d8e9	[SelectionDAGBuilder] Bugfix in visitInlineAsm() In case of a virtual register tied to a phys-def, the register class needs to be computed. Make sure that this works generally also with fast regalloc by using TLI.getRegClassFor() whenever possible, and make only the case of 'Untyped' use getMinimalPhysRegClass(). Fixes https://bugs.llvm.org/show_bug.cgi?id=51699. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D109291	2021-09-06 17:46:31 +02:00
Sanjay Patel	0d83e72034	[InstCombine] fix infinite loop from shift transform I'm not sure if there is a better way or another bug still here, but this is enough to avoid the loop from: https://llvm.org/PR51657 The test requires multiple blocks and datalayout to trigger the problem path.	2021-09-06 11:13:39 -04:00
Sanjay Patel	c85f450619	[InstCombine] refactor to reduce indent; NFC This transform should be updated to use better variable names and code comments. It could also create the shift-of-shift directly instead of relying on another combine for that.	2021-09-06 11:13:39 -04:00
Sanjay Patel	fbb78668f2	[InstCombine] fix one-use condition for shift transform This transform is written in a confusing style, and I suspect it is at fault for a more serious bug noted in PR51567. But it's been around forever, so I'm making the minimal change to fix another bug - it could increase instructions because it was not checking uses.	2021-09-06 11:13:39 -04:00
Sanjay Patel	982a15cb3f	[InstCombine] early exit to reduce indentation; NFC	2021-09-06 11:13:38 -04:00
Victor Campos	79f9c79aaf	[AArch64][MC] Merge FeaturePMU into FeaturePerfMon FeaturePMU was created in AArch64 to accommodate one missing system register, PMMIR_EL1, in commit `ffcd7698ae`. However, the Performance Monitors extension already had a target feature, which is called FeaturePerfMon. Therefore, FeaturePMU is redundant. This patch removes FeaturePMU and merges its contents into FeaturePerfMon. Reviewed By: dnsampaio Differential Revision: https://reviews.llvm.org/D109246	2021-09-06 14:56:49 +01:00
David Truby	b297531ece	[AArch64][sve] Prevent incorrect function call on fixed width vector The isEssentiallyExtractHighSubvector function currently calls getVectorNumElements on a type that in specific cases might be scalable. Since this function only has correct behaviour at the moment on scalable types anyway, the function can just return false when given a fixed type. Differential Revision: https://reviews.llvm.org/D109163	2021-09-06 14:25:03 +01:00
Sander de Smalen	96f6785bc9	[VectorUtils] Teach findScalarElement to return splat value. If the vector is a splat of some scalar value, findScalarElement() can simply return the scalar value if it knows the requested lane is in the vector. This is only needed for scalable vectors, because the InsertElement/ShuffleVector case is already handled explicitly for the fixed-width case. This helps to recognize an InstCombine fold like: extractelt(bitcast(splat(%v))) -> bitcast(%v) Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D107254	2021-09-06 10:56:06 +01:00
Tianqing Wang	12fa608af4	[X86] Add CRC32 feature. `d8faf03807` implemented general-regs-only for X86 by disabling all features with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this instruction and allows it to be used with general-regs-only. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D105462	2021-09-06 17:24:30 +08:00
Moritz Sichert	a0a5964499	[RuntimeDyld] Implemented relocation of TLS symbols in ELF Differential Revision: https://reviews.llvm.org/D105466	2021-09-06 10:27:43 +02:00
Moritz Sichert	f687378603	[RuntimeDyld] Implemented relocation for ELF::R_X86_64_GOTPC32 Differential Revision: https://reviews.llvm.org/D95512	2021-09-06 10:26:37 +02:00
Fangrui Song	0e03450ae4	[AArch64] Remove an unneeded !NeedsWinCFI check. NFC	2021-09-05 21:02:56 -07:00
guopeilin	5f48c144c5	[AArch64][GlobalISel] Use ZExtValue for zext(xor) when invert tb(n)z Currently, we use SExtValue to decide whether to invert tbz or tbnz. However, for the case zext (xor x, c), we should use ZExt rather than SExt otherwise we will generate totally opposite branches. Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D108755	2021-09-06 11:12:07 +08:00
David Green	1b83aaaefa	[DAG] Remove oneuse check in select_cc setgt X, -1, C, ~C fold This appears to produce better code, even if the condition may need to be replicated.	2021-09-05 16:18:31 +01:00
Simon Pilgrim	f114ef3731	[CostModel][X86] Add generic costs for vXi32 MUL -> v2Xi16 PMADDWD folds Based off the improved fold in D108522 This should eventually allow us to replace the SLM only cost patterns with generic versions.	2021-09-05 16:08:11 +01:00
David Green	8523fb96a6	[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Given a select_cc producing a constant and an inversion of the constant for a comparison more than zero, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clears all bits depending on whether the value is negative. This is then xor'd with the constant to optionally negate the value. https://alive2.llvm.org/ce/z/DTFaBZ This includes a OneUseCheck on the Cmp, which seems to make things a little worse and will be removed in a followup. Differential Revision: https://reviews.llvm.org/D109149	2021-09-05 16:04:01 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
Dávid Bolvanský	9c476172b9	[InstCombine] stpcpy(d,s) -> strcpy(d,s) if the result is not used	2021-09-05 12:12:07 +02:00
Michael Kruse	650bbc5620	[OpenMP][OpenMPIRBuilder] Implement loop unrolling. Recommit of `707ce34b06`. Don't introduce a dependency to the LLVMPasses component, instead register the required passes individually. Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are: * `unrollLoopFull` * `unrollLoopPartial` * `unrollLoopHeuristic` `unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility. With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism. Reviewed By: jdoerfert, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107764	2021-09-04 19:18:58 -05:00
Fangrui Song	e03c8d309a	[AsmPrinter] Remove unneeded MCSubtargetInfo temporary after D14346. NFC The temporary object was used as a workaround when the target parser may change STI. D14346 made the MCSubtargetInfo argument to createMCAsmParser const, so we no longer need the temporary object.	2021-09-04 10:50:10 -07:00
Dávid Bolvanský	3a696f6092	[InstCombine] rotate(X,Z) eq/ne rotate(Y,Z) ---> X eq/ne Y (PR51565) ``` ---------------------------------------- define i1 @src(i8 %x, i8 %y, i8 %z) { %0: %f = fshl i8 %x, i8 %x, i8 %z %f2 = fshl i8 %y, i8 %y, i8 %z %r = icmp eq i8 %f, %f2 ret i1 %r } => define i1 @tgt(i8 %x, i8 %y, i8 %z) { %0: %r = icmp eq i8 %x, %y ret i1 %r } Transformation seems to be correct! ``` https://alive2.llvm.org/ce/z/qAZp8f Solves PR51565 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D109271	2021-09-04 18:58:44 +02:00
Bjorn Pettersson	0f0344dd1e	[SimpleLoopUnswitch] Inform pass manager when child loops are deleted As part of the nontrivial unswitching we could end up removing child loops. This patch adds a notification to the pass manager when that happens (using the markLoopAsDeleted callback). Without this there could be stale LoopAccessAnalysis results cached in the analysis manager. Those analysis results are cached based on a Loop* as key. Since the BumpPtrAllocator used to allocate Loop objects could be reset between different runs of for example the loop-distribute pass (running on different functions), a new Loop object could be created using the same Loop pointer. And then when requiring the LoopAccessAnalysis for the loop we got the stale (corrupt) result from the destroyed loop. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D109257	2021-09-04 17:54:39 +02:00
Shivam Gupta	5449d2da65	[NFC] Run clang-format on llvm/lib/Target/AVR/ The current inconsistency confuses contributors about which coding guidelines to follow. It would be better to have it consistent using the clang-format tool. Reviewed By: mhjacobson Differential Revision: https://reviews.llvm.org/D109270	2021-09-04 20:05:15 +05:30
Simon Pilgrim	cb8d96e72f	Fix Wdocumentation unknown parameter warning. NFCI.	2021-09-04 15:06:53 +01:00
Simon Pilgrim	2005ae15a6	[X86][SLM] WriteVecIMul instructions only take 1uop (REAPPLIED) The xmm variants have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop. I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD. But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.	2021-09-04 15:03:56 +01:00
Simon Pilgrim	ac51d69208	Revert rG994da657076900f5ad7fe593c3b5e5f89ab3d53d "[X86][SLM] WriteVecIMul instructions only take 1uop" This changed some codegen tests that I forgot about in my rebase, I'll recommit shortly with a fix.	2021-09-04 13:39:10 +01:00
Simon Pilgrim	994da65707	[X86][SLM] WriteVecIMul instructions only take 1uop The xmm variants have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop. I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD. But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.	2021-09-04 13:21:34 +01:00
Simon Pilgrim	c6371020a8	[X86][SLM] RMW instructions don't require an extra uop For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM: "The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and store instructions go through addresses generation phase in program order to avoid on-the-fly memory ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions." Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis - RMW arithmetic is always 1uop - and matches what Agner / InstLatX64 report as well.	2021-09-04 13:21:34 +01:00
Simon Pilgrim	da965a77d5	[X86][SLM] Fix MUL uops, latency and throughput These were all set to the same best case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with). Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.	2021-09-04 13:21:34 +01:00
Simon Pilgrim	7d062d2c47	[X86][Atom] MUL/DIV instructions require both ports, not either. Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM.	2021-09-04 11:58:09 +01:00
Simon Pilgrim	0d0f39b0f3	[X86][Atom] Add missing UOps override to AtomWriteResPair multiclass Make it easier to describe microcoded instructions.	2021-09-04 11:58:09 +01:00
Nikita Popov	66a54af967	[WebAssembly] Support opaque pointers in AddMissingPrototypes The change here is basically the same as in D108880: Rather than looking at bitcasts, look at calls and their function type. We still need to look through bitcasts to find those calls. The change in llvm/test/CodeGen/WebAssembly/add-prototypes-conflict.ll is due to different visitation order. add-prototypes-opaque-ptrs.ll is a copy of add-prototypes.ll with -force-opaque-pointers. Differential Revision: https://reviews.llvm.org/D109256	2021-09-04 11:25:42 +02:00
Kazu Hirata	bb51f76fb1	[ForceFunctionAttrs] Add const (NFC)	2021-09-03 22:29:58 -07:00
Kevin Athey	c7f50a445e	Revert "[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)" This reverts commit `095bea23d0`. Broke buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/11411	2021-09-03 18:08:58 -07:00
Ben Shi	095bea23d0	[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2) Prevent the folding if it leads to worse code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D108871	2021-09-04 07:24:23 +08:00
David Blaikie	bc066e26c9	DebugInfo: Fix a few bot failures for type dumping fixes	2021-09-03 14:08:58 -07:00
David Blaikie	40f1593558	DebugInfo: Correct/improve type formatting (pointers to function types especially) This does add some extra superfluous whitespace (eg: "int *") intended to make the Simplified Template Names work easier - this makes the DIE-based names match more exactly the clang-generated names, so it's easier to identify cases that don't generate matching names. (arguably we could change clang to skip that whitespace or add some fuzzy matching to accommodate differences in certain whitespace - but this seemed easier and fairly low-impact)	2021-09-03 12:22:28 -07:00
Sanjay Patel	fd807601a7	[InstCombine] fold (rotate X) eq/ne (0/-1) This generalizes the examples shown in: https://llvm.org/PR51566 https://alive2.llvm.org/ce/z/V-sEy9	2021-09-03 14:51:35 -04:00
Sanjay Patel	d1458903eb	[InstCombine] reduce code duplication; NFC	2021-09-03 14:51:35 -04:00
Stanislav Mekhanoshin	d0c064715c	[AMDGPU] Small cleanup in optimizeCompareInstr. NFC.	2021-09-03 11:31:40 -07:00
David Green	adfd12e6d1	[ARM] Add patterns for store(fptosisat(..)) As an extension to D107866, this adds store(fptosisat(..)) patterns, similar to the existing fptosi patterns, to prevent unnecessarily moving into gpr regs where we can use fp stores directly. Differential Revision: https://reviews.llvm.org/D108378	2021-09-03 19:22:11 +01:00
David Green	f37e132263	[ARM] Add VFP lowering for fptosi.sat This extends D107865 to the VFP instructions, lowering llvm.fptosi.sat and llvm.fptoui.sat to VCVT instructions that inherently perform the saturate. Differential Revision: https://reviews.llvm.org/D107866	2021-09-03 18:11:08 +01:00
Craig Topper	75620fadf5	[RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0. This patch changes the register class to avoid accidentally setting the AVL operand to X0 through MachineIR optimizations. There are cases where we really want to use X0, but we can't get that past the MachineVerifier with the register class as GPRNoX0. So I've used a 64-bit -1 as a sentinel for X0. All other immediate values should be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI insertion pass to avoid touching the rest of the algorithm. In SelectionDAG lowering I'm using a -1 TargetConstant to hide it from instruction selection and treat it differently than if the user used -1. A user -1 should be selected to a register since it doesn't fit in uimm5. This is the rest of the changes started in D109110. As mentioned there, I don't have a failing test from MachineIR optimizations anymore. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109116	2021-09-03 09:19:25 -07:00
David Spickett	02b4620348	[ORC] Static cast more uint64_t to size_t These instances don't have an obvious way to fail nicely so I've just asserted they are within range. Fixes the Arm 32 bit builds.	2021-09-03 12:30:56 +00:00
Max Kazantsev	718157283c	[LoopDeletion] Move ICmpInst handling to getValueOnFirstIteration() As noticed in https://reviews.llvm.org/D105688, it would be great to move handling of ICmpInst which was in canProveExitOnFirstIteration() to getValueOnFirstIteration(). Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D108978 Reviewed By: reames	2021-09-03 18:36:19 +07:00
Konstantin Schwarz	90d5298759	[GlobalISel] Add convenience constructors to MemDesc This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D109161	2021-09-03 12:52:18 +02:00
Simon Pilgrim	6ba0b9f68a	[X86][SLM] Fix PBLENDVB uops and throughput SLM PBLENDVB is just as bad as BLENDVPD/PS - so model it as such, fixing the rr vs rm uops diff as well. The Intel AoM appears to have a copy+paste typo with PBLENDW, it doesn't match Agner or InstLatX64. Noticed while investigating some of the weird discrepancies reported by the D103695 helper script (SLM had much better vector shift throughputs than it should).	2021-09-03 11:31:29 +01:00
gbreynoo	e28cd75a50	[OptTable] Reapply Improve error message output for grouped short options This reapplies `71d7fed3bc` which was reverted by `3e2bd82f02`. This change includes the fix for breaking the sanitizer bots. As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current implementation for parsing grouped short options can return unclear error messages. This change fixes the example given in the ticket in which a flag is incorrectly given an argument. Also when parsing a group we now keep reading past the first incorrect option and output errors for all incorrect options in the group. Differential Revision: https://reviews.llvm.org/D108770	2021-09-03 11:13:52 +01:00
Florian Mayer	abf8ed8a82	[hwasan] Support more complicated lifetimes. This is important as with exceptions enabled, non-POD allocas often have two lifetime ends: the exception handler, and the normal one. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D108365	2021-09-03 10:29:50 +01:00
Stefan Gränitz	2ed91da0f1	[JITLink] Add initial AArch64 support Set up basic infrastructure for 64-bit ARM architecture support in JITLink. It allows for loading a minimal object file and resolving a single relocation. Advanced features like GOT and PLT handling or relaxations were intentionally left out for the moment. This patch follows the idea to keep implementations for ARM (32-bit) and AArch64 (64-bit) separate, because: * it might be easier to share code with the MachO "arm64" JITLink backend * LLVM has individual targets for ARM and AArch64 as well Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D108986	2021-09-03 10:48:06 +02:00
Jingu Kang	562521e2d1	[LoopBoundSplit] Update phi node in exit block It fixes https://bugs.llvm.org/show_bug.cgi?id=51700 Differential Revision:	2021-09-03 09:10:50 +01:00
Cullen Rhodes	dc5dd77ac7	[AArch64][SME] Support NEON vector to GPR integer moves in streaming mode A small subset of the NEON instruction set is legal in streaming mode. This patch adds support for the following vector to integer move instructions: 0x00 1110 0000 0001 0010 11xx xxxx xxxx # SMOV W\|Xd,Vn.B[0] 0x00 1110 0000 0010 0010 11xx xxxx xxxx # SMOV W\|Xd,Vn.H[0] 0100 1110 0000 0100 0010 11xx xxxx xxxx # SMOV Xd,Vn.S[0] 0000 1110 0000 0001 0011 11xx xxxx xxxx # UMOV Wd,Vn.B[0] 0000 1110 0000 0010 0011 11xx xxxx xxxx # UMOV Wd,Vn.H[0] 0000 1110 0000 0100 0011 11xx xxxx xxxx # UMOV Wd,Vn.S[0] 0100 1110 0000 1000 0011 11xx xxxx xxxx # UMOV Xd,Vn.D[0] Only the zero index variants are legal, all other indexes are illegal. To support this, new instructions are defined specifically for zero index which is hardcoded, along with an implicit 'VectorIndex0' operand. Since the index operand is implicit and takes no bits in the encoding, custom decoding is required to add the operand. I'm not sure if this is the best approach but the predicate constraint on a subset of an operand is unusual. Would be interested to hear some alternatives. The instructions are predicated on 'HasNEONorStreamingSVE', i.e. they're enabled by either +neon or +streaming-sve. This follows on from the work in D106272 to support the subset of SVE(2) instructions that are legal in streaming mode. Depends on D107902. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D107903	2021-09-03 07:59:17 +00:00
Cullen Rhodes	1dcd900d1d	[AArch64][ISel] NFC: DAG.getMachineFunction() -> MF Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109135	2021-09-03 07:59:17 +00:00
Amara Emerson	6d9505b8e0	[AArch64][GlobalISel] Support for folding G_ROTR as shifted operands. This allows selection like: eor w0, w1, w2, ror #8 Saves 500 bytes on ClamAV -Os, which is 0.1%. Differential Revision: https://reviews.llvm.org/D109206	2021-09-02 21:37:24 -07:00
Qiu Chaofan	d0f9553ef5	[PowerPC] Enable fast-isel on AIX 64 subtarget This patch basically enables fast-isel for AIX 64-bit subtarget (previously enabled only for ELF 64). The initial motivation is to introduce branch folding to AIX generated code for correct debug behavior. I also saw some compiling time improvement in a few LLVM test-suite benchmarks. (toast, dbms, cjpeg, burg, etc.) Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D98844	2021-09-03 11:33:45 +08:00
Chen Zheng	34badc409c	Revert "[HardwareLoops] Change order of SCEV expression construction for InitLoopCount." This causes https://bugs.llvm.org/show_bug.cgi?id=51714 and is not a right patch according to comments in D91724 This reverts commit `42eaf4fe0a`.	2021-09-03 02:55:43 +00:00
Matt Arsenault	79bcd4a7db	AMDGPU: Remove FeatureLocalMemorySize0 There's no reason to make this an explicit feature, since it's implied by the lack of a feature with a size.	2021-09-02 22:43:01 -04:00
Alexander Pivovarov	6cd4b508a8	[RISCV] Add SiFive core S51 Add SiFive core s51 as rv64imac RocketModel Reviewed-By: MaskRay, evandro Differential Revision: https://reviews.llvm.org/D108886	2021-09-02 18:45:25 -07:00
PeixinQiao	a42380ce83	[OMPIRBuilder] Add ordered directive to OMPBuilder Add support for ordered directive in the OpenMPIRBuilder. This patch also modifies clang to use the ordered directive when the option -fopenmp-enable-irbuilder is enabled. Also fix one ICE when parsing one canonical for loop with the relational operator LE or GE in openmp region by replacing unary increment operation of the expression of the variable "Expr A" minus the variable "Expr B" (++(Expr A - Expr B)) with binary addition operation of the expression of the variable "Expr A" minus the variable "Expr B" and the expression with constant value "1" (Expr A - Expr B + "1"). Reviewed By: Meinersbur, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107430	2021-09-03 09:37:58 +08:00
Anna Thomas	f661ce209f	[LoopPredication] Fix MemorySSA crash in predicateLoopExits The attached testcase crashes without the patch (Not the same accesses in the same order). When we move instructions before another instruction, we also need to update the memory accesses corresponding to it. Reviewed-By: asbirlea Differential Revision: https://reviews.llvm.org/D109197	2021-09-02 21:26:07 -04:00
Alexander Pivovarov	1104e3258b	Fix typo in RISCVMatInt.cpp comments	2021-09-02 18:11:09 -07:00
Stanislav Mekhanoshin	78fbd1aa3d	[AMDGPU] Process any power of 2 in optimizeCompareInstr Differential Revision: https://reviews.llvm.org/D109201	2021-09-02 17:39:17 -07:00
Xun Li	2cf30c4769	[Coroutines] Only run verifyFunction in debug mode verifyFunction can be really slow on large functions. This can significantly slow down compilation in production. Given that coroutine passes are fairly stable now, we should only run it in debug mode. Differential Revision: https://reviews.llvm.org/D109198	2021-09-02 17:35:01 -07:00
Wenlei He	054487c5b2	[CSSPGO] Honor preinliner decision for ThinLTO importing When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context. Differential Revision: https://reviews.llvm.org/D109088	2021-09-02 17:29:26 -07:00
Stanislav Mekhanoshin	2cfda6a691	[AMDGPU] Fold immediates in the optimizeCompareInstr Peephole works before the first SIFoldOperands so most of the immediates are in registers. Differential Revision: https://reviews.llvm.org/D109186	2021-09-02 17:23:26 -07:00
Sam Clegg	c32884c482	[WebAssembly] Rename WrapperPIC -> WrapperREL. NFC This ISD node/wrapper represents an address which is relative to a base address and therefore lowers to `i32.const` rather than `global.get`. Use this wrapper type for TLS-relative addresses, paving the way for the non-REL wrapper to be used for external TLS addresses once those are supported. Differential Revision: https://reviews.llvm.org/D109179	2021-09-02 20:04:34 -04:00
Philip Reames	fa82a3d016	[runtimeunroll] Support epilogue unrolling with a parent loop This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop. When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop. Differential Revision: https://reviews.llvm.org/D108476	2021-09-02 16:29:20 -07:00
Philip Reames	45c672e20d	[runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info Requested in review comment on D108476	2021-09-02 16:28:46 -07:00
Lang Hames	dad60f8071	[ORC] Add EPCGenericJITLinkMemoryManager: memory management via EPC calls. All ExecutorProcessControl subclasses must provide a JITLinkMemoryManager object that can be used to allocate memory in the executor process. The EPCGenericJITLinkMemoryManager class provides an off-the-shelf JITLinkMemoryManager implementation for JITs that do not need (or cannot provide) a specialized JITLinkMemoryManager implementation. This simplifies the process of creating new ExecutorProcessControl implementations.	2021-09-03 08:28:29 +10:00
Jessica Paquette	844d8e0337	[GlobalISel] Combine icmp eq/ne x, 0/1 -> x when x == 0 or 1 This adds the following combines: ``` x = ... 0 or 1 c = icmp eq x, 1 -> c = x ``` and ``` x = ... 0 or 1 c = icmp ne x, 0 -> c = x ``` When the target's true value for the relevant types is 1. This showed up in the following situation: https://godbolt.org/z/M5jKexWTW SDAG currently supports the `ne` case, but not the `eq` case. This can probably be further generalized, but I don't feel like thinking that hard right now. This gives some minor code size improvements across the board on CTMark at -Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.) Differential Revision: https://reviews.llvm.org/D109130	2021-09-02 15:05:31 -07:00
Kirill Stoimenov	cf53c6c971	[asan] Fixed link error by setting jump symbol to R_X86_64_PLT32. Fixing this link error: ld: error: relocation R_X86_64_PC32 cannot be used against symbol __asan_report_load...; recompile with -fPIC Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D109183	2021-09-02 21:50:56 +00:00
Kevin Athey	04ed6e7afc	Revert "[CSSPGO] Honor preinliner decision for ThinLTO importing" This reverts commit `a2768b4732`. Breaks sanitizer-x86_64-linux-fast buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/11334 Log snippet: Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80 FAIL: LLVM :: Transforms/SampleProfile/early-inline.ll (65549 of 78729) ****************** TEST 'LLVM :: Transforms/SampleProfile/early-inline.ll' FAILED ****************** Script: -- : 'RUN: at line 1'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -instcombine -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/einline.prof -S \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -- Exit Code: 2 Command Output (stderr): -- /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples' #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase, 8u>) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21 #2 0x5a6cda6 in inlineHotFunctions /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13 #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16 #4 0x5a5fcbe in runOnFunction 
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12 #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>, llvm::ProfileSummaryInfo, llvm::CallGraph) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15 #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21 #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17 #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21 #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine, llvm::TargetLibraryInfoImpl, llvm::ToolOutputFile, llvm::ToolOutputFile, llvm::ToolOutputFile, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7 #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12 #11 0x7fbf4de4009a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519) SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in FileCheck error: '<stdin>' is 
empty. FileCheck command line: /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/early-inline.ll -- ****************** Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80 FAIL: LLVM :: Transforms/SampleProfile/inline-cold.ll (65643 of 78729) **************** TEST 'LLVM :: Transforms/SampleProfile/inline-cold.ll' FAILED ****************** Script: -- : 'RUN: at line 4'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll : 'RUN: at line 5'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -S \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll : 'RUN: at line 8'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -S \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE 
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll : 'RUN: at line 11'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=9999999 -S \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll : 'RUN: at line 14'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -passes=sample-profile -sample-profile-file=/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/Inputs/inline-cold.prof -sample-profile-inline-size -sample-profile-cold-inline-threshold=-500 -S \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=NOTINLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -- Exit Code: 2 Command Output (stderr): -- /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53: runtime error: member call on null pointer of type 'llvm::sampleprof::FunctionSamples' #0 0x5a730f8 in shouldInlineCandidate /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 #1 0x5a730f8 in (anonymous namespace)::SampleProfileLoader::tryInlineCandidate((anonymous namespace)::InlineCandidate&, llvm::SmallVector<llvm::CallBase, 8u>) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1178:21 #2 0x5a6cda6 in inlineHotFunctions 
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1105:13 #3 0x5a6cda6 in (anonymous namespace)::SampleProfileLoader::emitAnnotations(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1633:16 #4 0x5a5fcbe in runOnFunction /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2008:12 #5 0x5a5fcbe in (anonymous namespace)::SampleProfileLoader::runOnModule(llvm::Module&, llvm::AnalysisManager<llvm::Module>, llvm::ProfileSummaryInfo, llvm::CallGraph) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1922:15 #6 0x5a5de55 in llvm::SampleProfileLoaderPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:2038:21 #7 0x6552a01 in llvm::detail::PassModel<llvm::Module, llvm::SampleProfileLoaderPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:88:17 #8 0x57f807c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/PassManager.h:526:21 #9 0x37c8522 in llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine, llvm::TargetLibraryInfoImpl, llvm::ToolOutputFile, llvm::ToolOutputFile, llvm::ToolOutputFile, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/NewPMDriver.cpp:489:7 #10 0x37e7c11 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/opt/opt.cpp:830:12 #11 0x7fcd534a209a in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x2409a) #12 0x379e519 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/opt+0x379e519) SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/IPO/SampleProfile.cpp:1309:53 in FileCheck error: '<stdin>' is empty. FileCheck command line: /b/sanitizer-x86_64-linux-fast/build/llvm_build_ubsan/bin/FileCheck -check-prefix=INLINE /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SampleProfile/inline-cold.ll -- ****************** Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. ****************** Failed Tests (2): LLVM :: Transforms/SampleProfile/early-inline.ll LLVM :: Transforms/SampleProfile/inline-cold.ll	2021-09-02 14:48:31 -07:00
Arthur Eubanks	813a7f1ad7	[MemorySSA] Properly handle liveOnEntry in the walker printer Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109177	2021-09-02 12:51:27 -07:00
Arthur Eubanks	92b94a6d0c	[Verifier] Only allow invariant.group metadata on stores and loads As specified by https://llvm.org/docs/LangRef.html#invariant-group-metadata. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109182	2021-09-02 12:49:04 -07:00
Sam Clegg	4664590d53	[WebAssemlby] Remove redundant SDTypeProfile. NFC I added this back in https://reviews.llvm.org/D54647 but it wasn't actually needed. Differential Revision: https://reviews.llvm.org/D109176	2021-09-02 15:21:22 -04:00
Wenlei He	f7fff46acc	[CSSPGO] Allow inlining recursive call for preinliner When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining. However, it turns out that we use InlineCost::Never for both illegal inlining and some of the "not-so-beneficial" inlining. The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner. Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change. This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used. With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks) Differential Revision: https://reviews.llvm.org/D109104	2021-09-02 11:24:27 -07:00
Nikita Popov	c86e1ce73b	[SCEVExpander] Simplify pointer overflow check This is a followup to D104662 to generate slightly nicer code for pointer overflow checks. Bypass expandAddToGEP and instead explicitly generate i8 GEPs. This saves some bitcasts and negates the value in a more obvious way. In particular, this prevents SCEV from looking through the umul.with.overflow, same as in the integer case. The wrapping-pointer-ni.ll test deserves a comment: Previously, this generated a typed GEP which used the umulo argument rather than the multiplication result. This results in more compact IR in that case, but effectively does the multiplication twice, the second one is just hidden in the GEP. Reusing the umulo result seems pretty reasonable to me. Differential Revision: https://reviews.llvm.org/D109093	2021-09-02 20:15:59 +02:00
Sam Clegg	ad2f94f398	[WebAssembly] Fix names of WebAssemblyWrapper SDNodes. NFC Other platforms all use CamelCase as normal for these wrapper nodes. Differential Revision: https://reviews.llvm.org/D109172	2021-09-02 13:54:44 -04:00
Heejin Ahn	28780e59f6	[WebAssembly] Add Wasm SjLj support This adds support for SjLj using Wasm exception handling instructions: https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md This does not yet support the mixed use of EH and SjLj within a function. It will be added in a follow-up CL. This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s, except for the below: - `test_longjmp_standalone`: Uses Node - `test_dlfcn_longjmp`: Uses NodeRAWFS - `test_longjmp_throw`: Mixes EH and SjLj - `test_exceptions_longjmp1`: Mixes EH and SjLj - `test_exceptions_longjmp2`: Mixes EH and SjLj - `test_exceptions_longjmp3`: Mixes EH and SjLj Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D108960	2021-09-02 10:51:02 -07:00
Nick Desaulniers	6860b136b9	[MipsISelLowering] avoid emitting libcalls to __multi3 Similar to D108842 and D108844. __has_builtin(builtin_mul_overflow) returns true for 32b MIPS targets, but Clang is deferring to compiler RT when encountering long long types. This breaks MIPS malta_defconfig builds of the Linux kernel that are using __builtin_mul_overflow with these types for these targets. If the semantics of __has_builtin mean "the compiler resolves these, always" then we shouldn't conditionally emit a libcall. This will still need to be worked around in the Linux kernel in order to continue to support malta_defconfig builds of the Linux kernel for this target with older releases of clang. Link: https://bugs.llvm.org/show_bug.cgi?id=28629 Link: https://github.com/ClangBuiltLinux/linux/issues/1438 Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D108926	2021-09-02 10:41:37 -07:00
Daniil Suchkov	5c97507e2b	[InlineCost] Introduce attributes to override InlineCost for inliner testing This patch introduces four new string attributes: function-inline-cost, function-inline-threshold, call-inline-cost and call-threshold-bonus. These attributes allow you to selectively override some aspects of InlineCost analysis. That would allow us to test inliner separately from the InlineCost analysis. That could be useful when you're trying to write tests for inliner and you need to test some very specific situation, like "the inline cost has to be this high", or "the threshold has to be this low". Right now every time someone does that, they have to get creative to come up with a way to make the InlineCost give them the number they need (like adding ~30 load/add pairs for a trivial test). This process can be somewhat tedious which can discourage some people from writing enough tests for their changes. Also, that results in tests that are fragile and can be easily broken without anyone noticing it because the test writer can't explicitly control what input the inliner will get from the inline cost analysis. These new attributes will alleviate those problems to an extent. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D109033	2021-09-02 17:35:06 +00:00
Craig Topper	3e89cc5cda	[X86] Remove isel predicates for xgetbv/xsetbv instructions so they can work on Windows. https://reviews.llvm.org/D56686 was supposed to allow these to work on Windows without needing to enable the xsave feature to match MSVC. It seems this didn't work because the backend isel patterns would still block it. This patch removes the predicates from the isel patterns. Fixes PR51706. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D109097	2021-09-02 10:25:02 -07:00
Stanislav Mekhanoshin	832c87b4fb	[AMDGPU] Use S_BITCMP0_* to replace AND in optimizeCompareInstr These can be used for reversed conditions if result of the AND is unused except in the compare: s_cmp_eq_u32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0 s_cmp_eq_i32 (s_and_b32 $src, 1), 0 => s_bitcmp0_b32 $src, 0 s_cmp_eq_u64 (s_and_b64 $src, 1), 0 => s_bitcmp0_b64 $src, 0 s_cmp_lg_u32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0 s_cmp_lg_i32 (s_and_b32 $src, 1), 1 => s_bitcmp0_b32 $src, 0 s_cmp_lg_u64 (s_and_b64 $src, 1), 1 => s_bitcmp0_b64 $src, 0 Differential Revision: https://reviews.llvm.org/D109099	2021-09-02 09:38:01 -07:00
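The rewrite in 832c87b4fb rests on a simple identity: testing `(src & 1) == 0`, or equivalently `(src & 1) != 1`, is the same as asking whether bit 0 of `src` is clear. The sketch below is a simplified Python model of the scalar compare semantics, not AMDGPU code; the function names mirror the mnemonics only for readability, and each function returns the value the real instruction would write to SCC.

```python
MASK32 = 0xFFFFFFFF


def s_and_b32(src, imm):
    """Model of a 32-bit scalar AND."""
    return (src & imm) & MASK32


def s_cmp_eq_u32(a, b):
    """Model of the SCC result of an equality compare."""
    return a == b


def s_cmp_lg_u32(a, b):
    """Model of the SCC result of a not-equal compare."""
    return a != b


def s_bitcmp0_b32(src, bit):
    """Model of the SCC result of a bit test: true when the selected bit is 0."""
    return ((src >> bit) & 1) == 0


def patterns_agree(samples):
    """Check both AND+compare forms from the commit against the bit test."""
    for src in samples:
        expected = s_bitcmp0_b32(src, 0)
        if s_cmp_eq_u32(s_and_b32(src, 1), 0) != expected:
            return False
        if s_cmp_lg_u32(s_and_b32(src, 1), 1) != expected:
            return False
    return True
```

Calling `patterns_agree(range(1024))` exercises both forms over a small range of inputs. As the commit message notes, the fold is only valid when the AND result has no other users, which this model does not capture.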
Simon Pilgrim	d66d520fe1	[X86][SSE] combineMulToPMADDWD - improve recognition of sign/zero extended upper bits PMADDWD(v8i16 x, v8i16 y) == (v4i32) { (int)x[0]y[0] + (int)x[1]y[1], ..., (int)x[6]y[6] + (int)x[7]y[7] } Currently combineMulToPMADDWD only folds cases where the upper 17 bits of both vXi32 inputs are known zero (i.e. the first half is positive and the second half of the pair is zero in each 2xi16 pair), this can be relaxed to only require one zero-extended input if the other input has at least 17 sign bits. That way the sign of the result is still preserved, and the second half is still zero. Noticed while investigating PR47437. Differential Revision: https://reviews.llvm.org/D108522	2021-09-02 17:36:22 +01:00
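To see why one sign-extended input is enough for the fold in d66d520fe1, it helps to model a single i32 result lane of PMADDWD. The following is an illustrative Python sketch, not the DAG combine itself; the helper names are hypothetical. An i32 with at least 17 sign bits has a high i16 half of 0 or -1, and an i32 with 17 known-zero upper bits has a high half of 0, so the `hi * hi` term vanishes and the lane equals the ordinary 32-bit product.

```python
def to_i16(v):
    """Interpret the low 16 bits of v as a signed 16-bit integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def to_i32(v):
    """Interpret the low 32 bits of v as a signed 32-bit integer."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v


def split_i32(v):
    """Return the (low, high) signed 16-bit halves of a 32-bit value."""
    return to_i16(v), to_i16(v >> 16)


def pmaddwd_lane(x, y):
    """One i32 lane of PMADDWD, viewing each i32 input as a 2 x i16 pair."""
    x_lo, x_hi = split_i32(x)
    y_lo, y_hi = split_i32(y)
    return to_i32(x_lo * y_lo + x_hi * y_hi)


def mul_i32(x, y):
    """Plain wrapping 32-bit multiply."""
    return to_i32(x * y)
```

With `x = -5` (sign-extended from i16) and `y = 7` (zero-extended), `pmaddwd_lane(x, y)` and `mul_i32(x, y)` agree. With both inputs negative, for example `x = -5` and `y = -3`, the high halves are both -1 and contribute an extra `+1`, which is why at least one input must have its upper 17 bits known zero.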
Kazu Hirata	e1bb54b593	[clangd, llvm] Remove redundant calls to c_str() (NFC) Identified with readability-redundant-string-cstr.	2021-09-02 09:07:13 -07:00
Wenlei He	a2768b4732	[CSSPGO] Honor preinliner decision for ThinLTO importing When pre-inliner decision is used for CSSPGO, we should take that into account for ThinLTO importing as well, so post-link sample loader inliner can favor that decision. This is handled by a small tweak in this patch. It also includes a change to transfer preinliner decision when merging context. Differential Revision: https://reviews.llvm.org/D109088	2021-09-02 08:24:06 -07:00
Bradley Smith	14e1a4a6ee	[AArch64][SVE] Workaround incorrect types when lowering fixed length gather/scatter When lowering a fixed length gather/scatter the index type is assumed to be the same as the memory type, this is incorrect in cases where the extension of the index has been folded into the addressing mode. For now add a temporary workaround to fix the codegen faults caused by this by preventing the removal of this extension. At a later date the lowering for SVE gather/scatters will be redesigned to improve the way addressing modes are handled. As a short term side effect of this change, the addressing modes generated for fixed length gather/scatters will not be optimal. Differential Revision: https://reviews.llvm.org/D109145	2021-09-02 15:07:24 +00:00
Craig Topper	b5fd6b46f5	[RISCV] Teach instruction selection to elide sext.w in some cases. If a sext_inreg is up for isel, and all its users are W instructions, we can skip emitting the sext_inreg. This is helpful if the producing instruction can't become a W instruction. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108966	2021-09-02 07:54:34 -07:00
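The reasoning behind b5fd6b46f5 is that a W instruction reads only the low 32 bits of its operands, so sign-extending an operand first cannot change its result. The sketch below models that with `ADDW` as the representative user; it is a hedged Python illustration of the arithmetic, with hypothetical helper names, and says nothing about how the selector actually checks the users.

```python
MASK64 = (1 << 64) - 1


def sext32(v):
    """Sign-extend the low 32 bits of v to 64 bits (the effect of sext.w)."""
    v &= 0xFFFFFFFF
    return v | 0xFFFFFFFF00000000 if v & 0x80000000 else v


def addw(a, b):
    """Model of RV64 ADDW: add the low 32 bits and sign-extend the 32-bit sum."""
    return sext32(a + b)


def add(a, b):
    """Model of the full-width RV64 ADD."""
    return (a + b) & MASK64
```

For a value such as `x = 0x180000000`, `addw(sext32(x), 1)` equals `addw(x, 1)`, so the extension is redundant when the only user is `ADDW`. The full-width `add` does observe the difference, which is why the elision is limited to cases where every user is a W instruction.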
Evandro Menezes	5ebdb07e7e	[RISCV] Enable shrink wrap by default Differential Revision: https://reviews.llvm.org/D109037	2021-09-02 09:47:58 -05:00
Craig Topper	e4e69ba4d1	[RISCV] Split PseudoVSETVLI into 2 instructions to allow different register classes for rs1. X0 has special meaning for vsetvli, we need to make sure we never create a vsetvli that uses it by accident. This could happen if the register coalescer coalesces a copy from X0 into this instruction. This patch splits the instruction so that we can have GPRNoX0 register class to use for the cases where we don't want the source to be X0. The verifier won't let us explicitly use X0 on a GPRNoX0 operand so we need a separate pseudo for those cases. I don't currently have a failing example for this. There was a failure in D107957, but the coalescable copy from that example should have been optimized away much earlier so I've fixed that. This is not a complete fix. We still need to prevent the same possible issue on the AVL operand of all of the vector instruction pseudos. I don't want to make two versions of all of those so we need to find a different solution for those. I have an idea I'm going to try. Differential Revision: https://reviews.llvm.org/D109110	2021-09-02 07:45:31 -07:00
Piotr Sobczak	30d6c39bca	[AMDGPU] Add merging into S_BUFFER_LOAD_DWORDX8_IMM Extend SILoadStoreOptimizer to merge into DWORDX8 variant of S_BUFFER_LOAD. Merging into DWORDX2 and DWORDX4 variants is handled already. Differential Revision: https://reviews.llvm.org/D108909	2021-09-02 16:26:25 +02:00
David Green	9cb8f4d1ad	[ARM] Add a tail-predication loop predicate register The semantics of tail predication loops means that the value of LR as an instruction is executed determines the predicate. In other words: mov r3, #3 DLSTP lr, r3 // Start tail predication, lr==3 VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0. mov lr, #1 VADD.s32 q0, q1, q2 // Only first lane is updated. This means that the value of lr cannot be spilled and re-used in tail predication regions without potentially altering the behaviour of the program. More lanes than required could be stored, for example, and in the case of a gather those lanes might not have been setup, leading to alignment exceptions. This patch adds a new lr predicate operand to MVE instructions in order to keep a reference to the lr that they use as a tail predicate. It will usually hold the zeroreg meaning not predicated, being set to the LR phi value in the MVETPAndVPTOptimisationsPass. This will prevent it from being spilled anywhere that it needs to be used. A lot of tests needed updating. Differential Revision: https://reviews.llvm.org/D107638	2021-09-02 13:42:58 +01:00
Roman Lebedev	3f1f08f0ed	Revert @llvm.isnan intrinsic patchset. Please refer to https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html (and that whole thread.) TLDR: the original patch had no prior RFC, yet it had some changes that really need a proper RFC discussion. It won't be productive to discuss such an RFC, once it's actually posted, while said patch is already committed, because that introduces bias towards already-committed stuff, and the tree is potentially in broken state meanwhile. While the end result of discussion may lead back to the current design, it may also not lead to the current design. Therefore I take it upon myself to revert the tree back to last known good state. This reverts commit `4c4093e6e3`. This reverts commit `0a2b1ba33a`. This reverts commit `d9873711cb`. This reverts commit `791006fb8c`. This reverts commit `c22b64ef66`. This reverts commit `72ebcd3198`. This reverts commit `5fa6039a5f`. This reverts commit `9efda541bf`. This reverts commit `94d3ff09cf`.	2021-09-02 13:53:56 +03:00
Simon Pilgrim	b0acd6c369	[X86] Fold PMADD(x,0) or PMADD(0,x) -> 0 Pulled out of D108522 - handle zero-operand cases for PMADDWD/VPMADDUBSW ops	2021-09-02 10:48:50 +01:00
Roman Lebedev	50634deaa5	Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling." Breaks build with -DBUILD_SHARED_LIBS=ON ``` CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle): "LLVMFrontendOpenMP" of type SHARED_LIBRARY depends on "LLVMPasses" (weak) "LLVMipo" of type SHARED_LIBRARY depends on "LLVMFrontendOpenMP" (weak) "LLVMCoroutines" of type SHARED_LIBRARY depends on "LLVMipo" (weak) "LLVMPasses" of type SHARED_LIBRARY depends on "LLVMCoroutines" (weak) depends on "LLVMipo" (weak) At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries. CMake Generate step failed. Build files cannot be regenerated correctly. ``` This reverts commit `707ce34b06`.	2021-09-02 12:42:23 +03:00
Fraser Cormack	ef78f2106c	[LegalizeTypes][VP] Add splitting support for binary VP ops This patch extends D107904's introduction of vector-predicated (VP) operation legalization to include vector splitting. When the result of a binary VP operation needs splitting, all of its operands are split in kind. The two operands and the mask are split as usual, and the vector-length parameter EVL is "split" such that the low and high halves each execute the correct number of elements. Tests have been added to the RISC-V target to show splitting several scenarios for fixed- and scalable-vector types. Without support for `umax` (e.g. in the `B` extension) the generated code starts to branch. Ideally a cost model would prevent their insertion in the first place. Through these tests many opportunities for better codegen can be seen: combining known-undef VP operations and for constant-folding operations on `ISD::VSCALE`, to name but a few. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107957	2021-09-02 10:15:53 +01:00
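The EVL "split" described in ef78f2106c can be pictured with a few lines of arithmetic. The sketch below is an assumption about one way to compute the two halves, chosen to be consistent with the commit's mention of `umax`; it is not copied from the patch, and `split_evl` is a hypothetical name. The low half runs at most `half` elements, and the high half runs whatever remains beyond `half`.

```python
def split_evl(evl, num_elts):
    """Split an explicit vector length across the low and high halves of a vector.

    Assumes num_elts is even and 0 <= evl <= num_elts.
    """
    half = num_elts // 2
    evl_lo = min(evl, half)         # low half never processes more than `half` lanes
    evl_hi = max(evl, half) - half  # high half gets the lanes past `half`, or none
    return evl_lo, evl_hi
```

For an 8-element operation with `evl = 5`, this gives 4 active lanes in the low half and 1 in the high half; with `evl = 3` the high half is entirely inactive. The two results always sum to the original `evl`.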
Simon Moll	ea2cdbf5e6	[VP] Declaration and docs for vp.select intrinsic llvm.vp.select extends the regular select instruction with an explicit vector length (%evl). All lanes with indexes at and above %evl are undefined. Lanes below %evl are taken from the first input where the mask is true and from the second input otherwise. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D105351	2021-09-02 11:17:14 +02:00
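The lane-wise behaviour described for `llvm.vp.select` in ea2cdbf5e6 can be modelled in a few lines. This is a minimal Python sketch of the semantics as stated in the commit message, not LLVM code; representing an undefined lane as `None` is a modelling choice made here, and `vp_select` is a hypothetical helper name.

```python
UNDEF = None  # stand-in for a lane whose value is undefined


def vp_select(mask, on_true, on_false, evl):
    """Model vp.select: pick per lane below evl, leave lanes at or above evl undefined."""
    result = []
    for i, (m, t, f) in enumerate(zip(mask, on_true, on_false)):
        if i >= evl:
            result.append(UNDEF)
        else:
            result.append(t if m else f)
    return result
```

For example, `vp_select([True, False, True, False], [1, 2, 3, 4], [10, 20, 30, 40], 3)` selects among the first three lanes and leaves the fourth undefined.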
David Sherwood	d581d94385	[SVE] Fix the FP arithmetic instruction costs for SVE Several FP instructions (fadd, fsub, etc.) were incorrectly assigned a higher cost for SVE because they have custom lowering, however we know they are legal. This patch explicitly assigns a cost of 2 to these opcodes. Tests added here: Analysis/CostModel/AArch64/arith-fp-sve.ll Differential Revision: https://reviews.llvm.org/D108993	2021-09-02 09:55:13 +01:00
Fangrui Song	dfb7518df1	[MC] Set SHF_INFO_LINK on SHT_REL/SHT_RELA sections sh_info links to a section, therefore SHF_INFO_LINK should be set as GNU as does. The issue has been benign because linkers kindly combine relocation sections w/ and w/o the flag.	2021-09-02 01:00:51 -07:00
Michael Kruse	707ce34b06	[OpenMP][OpenMPIRBuilder] Implement loop unrolling. Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are: * `unrollLoopFull` * `unrollLoopPartial` * `unrollLoopHeuristic` `unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility. With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism. Reviewed By: jdoerfert, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107764	2021-09-02 02:37:25 -05:00
Wenlei He	c000b8bd5c	[CSSPGO] Use preinliner decision by default when available For CSSPGO, turn on `sample-profile-use-preinliner` by default. This simplifies the use of llvm-profgen preinliner as it's now simply driven by ContextShouldBeInlined flag for each context profile without needing extra compiler switch. Note that llvm-profgen's preinliner is still off by default, under switch `csspgo-preinliner`. Differential Revision: https://reviews.llvm.org/D109111	2021-09-01 23:45:38 -07:00
Markus Lavin	304f2bd21d	[NPM] Added opt option -print-pipeline-passes. Added opt option -print-pipeline-passes to print a -passes compatible string describing the built pass pipeline. As an example: $ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass At the moment this is best-effort only and there are some known limitations: - Not all passes accepting parameters will print their parameters (currently only implemented for simplifycfg). - Some ClassName to pass-name mappings are not unique. - Some ClassName to pass-name mappings are missing (e.g. BitcodeWriterPass). Differential Revision: https://reviews.llvm.org/D108298	2021-09-02 08:23:33 +02:00
Markus Lavin	645af79e8e	Revert "[NPM] Added opt option -print-pipeline-passes." This reverts commit `c71869ed4c`.	2021-09-02 08:22:17 +02:00
Markus Lavin	c71869ed4c	[NPM] Added opt option -print-pipeline-passes. Added opt option -print-pipeline-passes to print a -passes compatible string describing the built pass pipeline. As an example: $ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass At the moment this is best-effort only and there are some known limitations: - Not all passes accepting parameters will print their parameters (currently only implemented for simplifycfg). - Some ClassName to pass-name mappings are not unique. - Some ClassName to pass-name mappings are missing (e.g. BitcodeWriterPass).	2021-09-02 08:16:51 +02:00
Abinav Puthan Purayil	0baace5379	[DAGCombine] Add node level checks for fp-contract and fp-ninf in visitFMULForFMADistributiveCombine(). Differential Revision: https://reviews.llvm.org/D107551	2021-09-02 11:33:14 +05:30
Jinsong Ji	8671191d26	[NFC][PowerPC] Small code refactor in LoopInstrFormPrep Avoid some duplicate code. Reviewed By: #powerpc, shchenz Differential Revision: https://reviews.llvm.org/D109083	2021-09-02 03:16:01 +00:00
Arthur Eubanks	7b08d9da55	Reland [MemorySSA] Add pass to print results of MemorySSA walker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D109028	2021-09-01 18:58:57 -07:00
Arthur Eubanks	0f63496ea4	Revert "[MemorySSA] Add pass to print results of MemorySSA walker" This reverts commit `8f98477c2d`. Breaks bots	2021-09-01 18:45:19 -07:00
Chen Zheng	2596120199	[PowerPC] small code format refactor ; NFC address the code review comments in patch https://reviews.llvm.org/D105872	2021-09-02 01:39:32 +00:00


151087 Commits