llvm-project

Commit Graph

Author	SHA1	Message	Date
Lang Hames	337e131ca7	[RuntimeDyld][COFF] Build stubs for COFF dllimport symbols. Summary: Enables JIT-linking by RuntimeDyld of COFF objects that contain references to dllimport symbols. This is done by recognizing symbols that start with the reserved "__imp_" prefix and building a pointer entry to the target symbol in the stubs area of the section. References to the "__imp_" symbol are updated to point to this pointer. Work in progress: The generic code is in place, but only RuntimeDyldCOFFX86_64 and RuntimeDyldCOFFI386 have been updated to look for and update references to dllimport symbols. Reviewers: compnerd Subscribers: hiraditya, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75884	2020-03-10 16:08:40 -07:00
Matt Arsenault	edd0dfca0d	AMDGPU/GlobalISel: Refine G_TRUNC legality rules Scalarize most truncates. Avoid touching cases that could end up in unresolvable infinite loops.	2020-03-10 15:32:22 -07:00
Matt Arsenault	ce8a1f7294	GlobalISel: Implement fewerElementsVector for G_TRUNC Extend fewerElementsVectorBasic to handle operands with different element types.	2020-03-10 15:17:20 -07:00
Matt Arsenault	200b20639a	AMDGPU: Use V_MAC_F32 for fmad.ftz This avoids regressions in a future patch. I'm confused by the gfx9 usage of legacy_mad. Was this a pointless instruction rename, or does it use fmul_legacy handling? Why is regular mac available in that case?	2020-03-10 14:41:06 -07:00
Fangrui Song	a0c0389ffb	[SimplifyLibcalls] Don't replace locked IO (fgetc/fgets/fputc/fputs/fread/fwrite) with unlocked IO (_unlocked) This essentially reverts some of the SimplifyLibcalls part changes of D45736 [SimplifyLibcalls] Replace locked IO with unlocked IO. C11 7.21.5.2 The fflush function > If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above. i.e. fopen'ed FILE is inherently captured. POSIX.1-2017 getc_unlocked, getchar_unlocked, putc_unlocked, putchar_unlocked - stdio with explicit client locking > These functions can safely be used in a multi-threaded program if and only if they are called while the invoking thread owns the (FILE *) object, as is the case after a successful call to the flockfile() or ftrylockfile() functions. After a thread fopen'ed a FILE, when it is calling foobar() which is now replaced by foobar_unlocked(), if another thread is concurrently calling fflush(0), the behavior is undefined. C11 7.22.4.4 The exit function > Next, all open streams with unwritten buffered data are flushed, all open streams are closed, and all files created by the tmpfile function are removed. The replacement is only feasible if the program is single threaded, or exit or fflush(0) is never called. See also http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180528/556615.html for how the replacement makes libc interceptors difficult to implement. dalias: in a worst case, it's unbounded data corruption because of concurrent access to pointers without synchronization. f->wpos or rpos could get outside of the buffer, thread A could do f->wpos += j after knowing j is in bounds, while thread B also changes it concurrently. This can produce exploitable conditions depending on libc internals. Revert the SimplifyLibcalls part change because the cons obviously outweigh the pros.
Even when the replacement is feasible, the benefit is indemonstrable, more so in an application instead of an artificial glibc benchmark. Theoretically the replacement could be beneficial when calling getc_unlocked/putc_unlocked in a loop, but then it is better using a blocked IO operation and the user is likely aware of that. The function attribute inference is still useful and thus kept. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D75933	2020-03-10 11:11:58 -07:00
Matt Arsenault	c4de8935a5	ARM: Fixup some tests using denormal-fp-math attribute Don't use the deprecated, single mode form in tests. Also make sure to parse the attribute, in case of the deprecated form.	2020-03-10 14:02:06 -04:00
Tyker	a4cde9ad7b	Fixed [AssumeBundles] Move to IR so it can be used by Analysis This is a recommit of `57c964aaa7` after fixing modules build.	2020-03-10 18:02:39 +01:00
Kazushi (Jam) Marukawa	3dabad1af3	[VE] Target-specific bit size for sjljehprepare Summary: This patch extends the TargetMachine to let targets specify the integer size used by the sjljehprepare pass. This is 64bit for the VE target and otherwise defaults to 32bit for all targets, which was hard-wired before. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D71337	2020-03-10 17:51:16 +01:00
Simon Moll	d871ef4e6a	[instcombine] remove fsub to fneg hacks; only emit fneg Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp negation. This also extends the scalarization cost in instcombine for unary operators to result in the same IR rewrites for fneg as for the idiom. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D75467	2020-03-10 16:57:02 +01:00
Simon Pilgrim	c8ede5e485	[X86][SSE] getFauxShuffleMask - add support for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.	2020-03-10 15:42:37 +00:00
Simon Pilgrim	e6a7e3b5e3	[X86][SSE] matchShuffleWithSHUFPD - add support for unary shuffles. This causes one minor test change but is mainly necessary for an upcoming patch.	2020-03-10 15:42:36 +00:00
Simon Pilgrim	417fe39be5	[X86][SSE] Add some extract+insert shuffle tests Shows failure to avoid xmm<->gpr transfers by using insertps/blendps	2020-03-10 15:42:36 +00:00
Matt Arsenault	67cfbec746	AMDGPU/GlobalISel: Insert readfirstlane on SGPR returns In case the source value ends up in a VGPR, insert a readfirstlane to avoid producing an illegal copy later. If it turns out to be unnecessary, it can be folded out.	2020-03-10 11:18:48 -04:00
Jonas Paulsson	62ff9960d3	[SystemZ] Improve foldMemoryOperandImpl(). Swap the compare operands if LHS is spilled while updating the CCMask:s of the CC users. This is relatively straightforward since the live-in lists for the CC register can be assumed to be correct during register allocation (thanks to `659efa2`). Also fold a spilled operand of an LOCR/SELR into an LOC(G). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D67437	2020-03-10 15:54:47 +01:00
Florian Hahn	c8c14d979a	[InstCombine] Support vectors in SimplifyAddWithRemainder. SimplifyAddWithRemainder currently also matches for vector types, but tries to create an integer constant, which causes a crash. By using Constant::getIntegerValue() we can support both the scalar and vector cases. The 2 added test cases crash without the fix. Reviewers: spatel, lebedev.ri Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D75906	2020-03-10 14:29:40 +00:00
Jonas Paulsson	c2dafe12dc	[SimplifyCFG] Skip merging return blocks if it would break a CallBr. SimplifyCFG should not merge empty return blocks and leave a CallBr behind with a duplicated destination since the verifier will then trigger an assert. This patch checks for this case and avoids the transformation. CodeGenPrepare has a similar check which also has a FIXME comment about why this is needed. It seems perhaps better if these two passes would eventually update the CallBr instruction instead of just checking and avoiding. This fixes https://bugs.llvm.org/show_bug.cgi?id=45062. Review: Craig Topper Differential Revision: https://reviews.llvm.org/D75620	2020-03-10 14:59:13 +01:00
Sanjay Patel	6e60e1025f	[InstCombine] regenerate test checks; NFC tmp -> t because 'tmp' tends to cause problems for the auto-generation script.	2020-03-10 09:57:41 -04:00
Simon Pilgrim	e71fb46a8f	[TargetLowering] SimplifyDemandedVectorElts - add DemandedElts mask to ISD::BITCAST SimplifyDemandedBits call. This fixes most of the regressions introduced in the rG4bc6f6332028 bugfix. The vector-trunc.ll issue should be fixed by D66004.	2020-03-10 13:39:10 +00:00
Sanjay Patel	467eec0910	[InstCombine] fold gep-of-select-of-constants (PR45084) As shown in: https://bugs.llvm.org/show_bug.cgi?id=45084 ...we failed to combine a gep with constant indexes with a pointer operand that is a select of constants. Differential Revision: https://reviews.llvm.org/D75807	2020-03-10 09:25:13 -04:00
Sanjay Patel	5b465ad290	[InstCombine] add/adjust tests for select-gep; NFC Goes with D75807	2020-03-10 09:25:13 -04:00
Florian Hahn	2d6ecf4648	[SLP] Support vectorizing functions provided by vector libs. It seems like the SLPVectorizer is currently not aware of vector versions of functions provided by libraries like Accelerate [1]. This patch updates SLPVectorizer to use the same infrastructure the LoopVectorizer uses to detect vectorizable library functions. For calls, it computes the cost of an intrinsic call (existing behavior) and the cost of a vector function library call, if available. Like LoopVectorizer, it assumes the cost of the vector function is simply the cost of a call to a vector function. [1] https://developer.apple.com/documentation/accelerate Reviewers: ABataev, RKSimon, spatel Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D75878	2020-03-10 13:10:50 +00:00
Simon Pilgrim	5cbddf7cbc	[X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll	2020-03-10 11:59:40 +00:00
Simon Pilgrim	9b05596eff	[SLPVectorizer][X86] Add fmaxnum/fminnum tests	2020-03-10 11:18:28 +00:00
Simon Pilgrim	0b1dc6016f	[CostModel][X86] Add fmaxnum/fminnum costs tests	2020-03-10 11:18:27 +00:00
Simon Pilgrim	b9b96adcf5	[X86][SSE] Add SSE41 coverage for fmaxnum/fminnum tests	2020-03-10 11:18:27 +00:00
alex-t	39e1a90784	[AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block Summary: When SI_INDIRECT_DST_V* pseudos have indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop. To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result. %95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32 results in s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_mov_b64 exec, s[6:7] The bug appeared in case this expansion occurs in the ELSE block of the CF. Originally %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32, %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec expanded to ****************** <== here exec has "THEN" context s_mov_b64 s[6:7], exec BB0_16: v_readfirstlane_b32 s8, v28 v_cmp_eq_u32_e32 vcc, s8, v28 s_and_saveexec_b64 vcc, vcc s_set_gpr_idx_on s8, gpr_idx(DST) v_mov_b32_e32 v6, v25 s_set_gpr_idx_off s_xor_b64 exec, exec, vcc s_cbranch_execnz BB0_16 ; %bb.17: s_or_saveexec_b64 s[4:5], s[4:5] <-- exec mask is restored for "ELSE" but immediately overwritten. s_mov_b64 exec, s[6:7] The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask" SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec. Proposed fix: The SI_INDIRECT_DST_V* procedure should split the remainder block to create landing pad for the EXEC restoration.
Reviewers: rampitec, vpykhtin, nhaehnle Reviewed By: vpykhtin Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75472	2020-03-10 14:04:22 +03:00
Kerry McLaughlin	0bba37a320	[AArch64][SVE] Add SVE intrinsics for address calculations Summary: Adds the @llvm.aarch64.sve.adr[b|h|w|d] intrinsics Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75858	2020-03-10 10:53:37 +00:00
James Greenhalgh	f0de8d0940	[Arm] Do not lower vmax/vmin to Neon instructions On some Arm cores there is a performance penalty when forwarding from an S register to a D register. Calculating VMAX in a D register creates false forwarding hazards, so don't do that unless we're on a core which specifically asks for it. Patch by James Greenhalgh Differential Revision: https://reviews.llvm.org/D75248	2020-03-10 10:48:48 +00:00
Simon Pilgrim	18c19441d1	[X86][AVX] combineX86ShuffleChain - combine binary shuffles to X86ISD::VPERM2X128 For pre-AVX512 targets, combine binary shuffles to X86ISD::VPERM2X128 if possible. This mainly helps optimize the blend(extract_subvector(x,1),y) pattern. At some point soon we're going to have to make a decision about when to combine AVX512 shuffles more aggressively - we bail out if there is any change in element size (to protect predicate mask merging) which means we miss out on a lot of optimizations.	2020-03-10 10:44:28 +00:00
Clement Courbet	30477197b3	[ExpandMemCmp][NFC] Add more tests.	2020-03-10 11:34:19 +01:00
Florian Hahn	b53907bfed	[SLP] Precommit vector library test for D75878.	2020-03-10 10:17:34 +00:00
Sam Parker	ff9ac33e1e	[ARM][MVE] Validate tail predication values Iterate through the loop and check that the observable values produced are the same whether tail predication happens or not. We want to find out if the tail-predicated version of this loop will produce the same values as the loop in its original form. For this to be true, the newly inserted implicit predication must not change the (observable) results. We're doing this because many instructions in the loop will not be predicated and so the conversion from VPT predication to tail predication can result in different values being produced, because of falsely predicated lanes not being updated in the converted form. A masked load, whether through VPT or tail predication, will write zeros to any of the falsely predicated bytes. So, from the loads, we know that the false lanes are zeroed and here we're trying to track that those false lanes remain zero, or where they change, the differences are masked away by their user(s). All MVE loads and stores have to be predicated, so we know that any load operands, or stored results are equivalent already. Other explicitly predicated instructions will perform the same operation in the original loop and the tail-predicated form too. Because of this, we can insert loads, stores and other predicated instructions into our KnownFalseZeros set and build from there. Differential Revision: https://reviews.llvm.org/D75452	2020-03-10 09:59:01 +00:00
Djordje Todorovic	5aa5c943f7	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-03-10 09:15:06 +01:00
Puyan Lotfi	4b8af31f63	[llvm][MIRVRegNamer] Avoid collisions across constant pool indices. When hashing on MachineOperand::MO_ConstantPoolIndex, now MIR-Canon and MIRVRegNamer will no longer result in a hash collision. Differential Revision: https://reviews.llvm.org/D74449	2020-03-10 01:13:20 -04:00
Matt Arsenault	627bb31a28	AMDGPU/GlobalISel: Avoid illegal vector exts for add/sub/mul When expanding scalar packed operations, we should not introduce illegal vector casts LegalizerHelper introduces. We're not in a legalizer context, and there's no RegBankSelect apply or legalize worklist.	2020-03-09 23:42:17 -04:00
Matt Arsenault	ed72bcae34	AMDGPU/GlobalISel: Fix mishandling SGPR v2s16 add/sub/mul We weren't considering the packed case correctly, and this was passing through to the selector. The selector only checked the size, so this would incorrectly compile to a single 32-bit scalar add. As usual, the LegalizerHelper is somewhat awkward to use from applyMappingImpl. I think this is the first place we've needed multi-step legalization here though.	2020-03-09 22:51:54 -04:00
Philip Reames	56a32fb648	[tests] Add long nop test coverage for intel platforms	2020-03-09 15:29:04 -07:00
ahatanak	1f5b471b8b	[ObjC][ARC] Don't remove autoreleaseRV/retainRV pairs if the call isn't a tail call Previously, the ARC optimizer removed the autoreleaseRV/retainRV pair in the following code, which caused the object returned by @something to be placed in the autorelease pool because the call to @something isn't a tail call: ``` %call = call i8* @something(...) %2 = call i8* @objc_retainAutoreleasedReturnValue(i8* %call) %3 = call i8* @objc_autoreleaseReturnValue(i8* %2) ret i8* %3 ``` Fix the bug by checking whether @something is a tail call. rdar://problem/59275894	2020-03-09 13:21:38 -07:00
Matt Arsenault	eb41627799	AMDGPU/GlobalISel: Improve handling of illegal return types Most importantly, this fixes ret i8. Also make sure to handle signext/zeroext for odd types > i32. Some of the corresponding argument passing fixes also need to be handled.	2020-03-09 13:11:30 -07:00
Matt Arsenault	156a1b59df	AMDGPU: Make signext/zeroext behave more sensibly over > i32 Interpret these as extending to the next multiple of 32-bits. This had no effect with i48 for example, which is really split into {i32, i16}, which should extend the high part.	2020-03-09 12:56:10 -07:00
Matt Arsenault	209094eeb6	AMDGPU/GlobalISel: Start matching s_lshlN_add_u32 instructions Use a hack to only enable this for GlobalISel. Technically this also works with SelectionDAG, but the divergence selection isn't reliable enough and a few cases fail, but I have no desire to spend time writing the manual expansion code for it. The DAG actually does a better job since it catches using v_add_lshl_u32 in the mixed SGPR/VGPR cases.	2020-03-09 12:36:51 -07:00
Cameron McInally	2ab8065df6	[AArch64][SVE] Add missing fp16 DestructiveInstType tests These tests should have been added with `a5b22b768f` in D73711. Differential Revision: https://reviews.llvm.org/D75767	2020-03-09 14:09:23 -05:00
Simon Pilgrim	4b130b883d	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - reduce vector width of X86ISD::BLENDI If we don't need the upper subvector elements of the BLENDI node then use a smaller vector size. This causes a couple of minor regressions in insertelement-ones.ll which are more examples of PR26018; given how cheap allones generation is I don't consider that a showstopper, just an annoyance (and there's plenty of other poor codegen cases in that file).	2020-03-09 18:29:28 +00:00
Craig Topper	3dcc0db15e	[X86] Teach combineToExtendBoolVectorInReg to create opportunities for using broadcast load instructions. If we're inserting a scalar that is smaller than the element size of the final VT, the value of the extra bits doesn't matter. Previously we any_extended in the scalar domain before inserting. This patch changes this to use a broadcast of the original scalar type and then a bitcast to the final type. This might enable the use of a broadcast load. This recovers regressions from `07d68c24aa` and `9fcd212e2f` without relying on alignment of the load. Differential Revision: https://reviews.llvm.org/D75835	2020-03-09 11:26:12 -07:00
JF Bastien	8fc9eea43a	Test that volatile load type isn't changed Summary: As discussed in D75505, it's not particularly useful to change the type of a load to/from floating-point/integer because it's followed by a bitcast, and it might lead to surprising code generation. Check that this doesn't generally happen. Reviewers: lebedev.ri Subscribers: jkorous, dexonsmith, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75644	2020-03-09 11:19:23 -07:00
Nikita Popov	45555c3819	[InstSimplify] Simplify calls with "returned" attribute If a call argument has the "returned" attribute, we can simplify the call to the value of that argument. The "-inst-simplify" pass already handled this for the constant integer argument case via known bits, which is invoked in SimplifyInstruction. However, non-constant (or non-int) arguments are not handled at all right now. This addresses one of the regressions from D75801. Differential Revision: https://reviews.llvm.org/D75815	2020-03-09 18:53:47 +01:00
Nikita Popov	c3ca6876ed	[InstCombine] Don't simplify calls without uses When simplifying a call without uses, replaceInstUsesWith() is going to do nothing, but we'll skip all following folds. We can only run into this problem with calls that both simplify and are not trivially dead if unused, which currently seems to happen only with calls to undef, as the test diff shows. When extending SimplifyCall() to handle "returned" attributes, this becomes a much bigger problem, so I'm fixing this first. Differential Revision: https://reviews.llvm.org/D75814	2020-03-09 18:47:46 +01:00
Nikita Popov	829d377a98	[InstSimplify] Don't simplify musttail calls As pointed out by jdoerfert on D75815, we must be careful when simplifying musttail calls: We can only replace the return value if we can eliminate the call entirely. As we can't make this guarantee for all consumers of InstSimplify, this patch disables simplification of musttail calls. Without this patch, musttail simplification currently results in module verification errors. Differential Revision: https://reviews.llvm.org/D75824	2020-03-09 18:46:56 +01:00
Jonas Devlieghere	882f589e20	Revert "[AssumeBundles] Move to IR so it can be used by Analysis" This breaks the modules build: http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/ http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/ This reverts commit `57c964aaa7`.	2020-03-09 09:02:47 -07:00
Fangrui Song	0d673be13a	[llvm-objdump] Rename --disassemble-functions to --disassemble-symbols https://bugs.llvm.org/show_bug.cgi?id=41910 The feature can disassemble data and the new option name reflects its more generic usage. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D75816	2020-03-09 08:25:45 -07:00
Fangrui Song	6d026c89dc	[llvm-objdump][test] Move binary format specific tests under COFF/ ELF/ MachO/ XCOFF/ wasm/ Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D75798	2020-03-09 08:04:48 -07:00
Krzysztof Parzyszek	44205891ed	[Hexagon] Fix match pattern in a testcase	2020-03-09 09:09:58 -05:00
evgeny	6d2032e259	[WPD] Provide a way to prevent functions from being devirtualized Differential revision: https://reviews.llvm.org/D75617	2020-03-09 14:05:15 +03:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
KAWASHIMA Takahiro	c8cd1a994d	[AArch64] Add support for Fujitsu A64FX A64FX is an Armv8.2-A CPU used in FUJITSU Supercomputer PRIMEHPC FX1000, PRIMEHPC FX700, and supercomputer Fugaku. https://www.fujitsu.com/global/products/computing/servers/supercomputer/specifications/ Differential Revision: https://reviews.llvm.org/D75594	2020-03-09 19:15:09 +09:00
Clement Courbet	6518b72f93	[ExpandMemCmp] Properly constant-fold all compares. Summary: This gets rid of duplicated code and diverging behaviour w.r.t. constants. Fixes PR45086. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75519	2020-03-09 10:40:52 +01:00
Clement Courbet	f7e6f5f8e3	[ExpandMemCmp] Properly constant-fold all compares. Summary: This gets rid of duplicated code and diverging behaviour w.r.t. constants. Fixes PR45086. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75519	2020-03-09 09:10:34 +01:00
Hideto Ueno	bdcbdb4848	[Attributor] Deduction based on path exploration This patch introduces the propagation of known information based on path exploration. For example, ``` int u(int c, int *p){ if(c) { return *p; } else { return *p + 1; } } ``` An argument `p` is dereferenced whatever c's value is. For an instruction `CtxI`, we accumulate branch instructions in the must-be-executed-context of `CtxI` and then, we take the conjunction of the successors' known state. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D65593	2020-03-09 14:29:26 +09:00
Craig Topper	07d68c24aa	[X86] Remove isel patterns that matched vXi16 X86VBroadcast with i8->i16 aextload input. This was selecting VBROADCASTW which turned the 8-bit load into a 16-bit load if it happened to be 2 byte aligned. I have a plan to fix the regression with a follow up patch which I'll post shortly.	2020-03-08 19:16:24 -07:00
David Green	be5435e032	[ARM] MVE VMULL tests. NFC	2020-03-08 14:39:08 +00:00
Sanjay Patel	a69158c12a	[VectorCombine] fold extract-extract-op with different extraction indexes opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1 The first part of this patch generalizes the cost calculation to accept different extraction indexes. The second part creates a shuffle+extract before feeding into the existing code to create a vector op+extract. The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc" rather than "TargetTransformInfo::SK_Broadcast" (splat specifically from element 0) because we do not have a more general "SK_Splat" currently. That does not affect any of the current regression tests, but we might be able to find some cost model target specialization where that comes into play. I suspect that we can expose some missing x86 horizontal op codegen with this transform, so I'm speculatively adding a debug flag to disable the binop variant of this transform to allow easier testing. The test changes show that we're sensitive to cost model diffs (as we should be), so that means that patches like D74976 should have better coverage. Differential Revision: https://reviews.llvm.org/D75689	2020-03-08 09:57:55 -04:00
Sanjay Patel	b827a95b87	[VectorCombine] add tests for wider vectors; NFC	2020-03-08 09:33:07 -04:00
Tyker	57c964aaa7	[AssumeBundles] Move to IR so it can be used by Analysis Summary: Assume bundles need to be usable by Analysis, and Transforms/Utils isn't, so this commit moves the utilities for dealing with assume bundles to IR. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75618	2020-03-08 12:21:50 +01:00
Craig Topper	70e4fb8a53	[X86] Add DAG combine to turn (vzext_movl (vbroadcast_load)) -> vzext_load. If we're zeroing the other elements then we don't need the broadcast.	2020-03-08 00:35:40 -08:00
Craig Topper	d81d451442	[X86] Add DAG combine to replace vXi64 vzext_movl+scalar_to_vector with vYi32 vzext_movl+scalar_to_vector if the upper 32 bits of the scalar are zero. We can just use a 32-bit copy and zero in the SSE domain when we zero the upper bits. Remove an isel pattern that becomes dead with this.	2020-03-07 16:14:26 -08:00
Craig Topper	d41ea65ee8	[X86] Add DAG combines to enable removing of movddup/vbroadcast + simple_load isel patterns.	2020-03-07 15:22:02 -08:00
Nikita Popov	51a466a61f	[InstCombine] Fix known bits handling in SimplifyDemandedUseBits Fixes a regression from D75801. SimplifyDemandedUseBits() is also supposed to compute the known bits (of the demanded subset) of the instruction. For unknown instructions it does so by directly calling computeKnownBits(). For known instructions it will compute known bits itself. However, for instructions where only some cases are handled directly (e.g. a constant shift amount) the known bits invocation for the unhandled case is sometimes missing. This patch adds the missing calls and thus removes the main discrepancy with ExpensiveCombines mode. Differential Revision: https://reviews.llvm.org/D75804	2020-03-07 18:16:41 +01:00
Matt Arsenault	a4e71f01c0	Assume ieee behavior without denormal-fp-math attribute	2020-03-07 12:10:56 -05:00
Nikita Popov	f2419adc48	[InstCombine] Regenerate test checks; NFC	2020-03-07 17:58:33 +01:00
Nikita Popov	d2dab92f01	[InstSimplify] Add tests for "returned" attribute; NFC	2020-03-07 17:17:21 +01:00
Nikita Popov	2904a332fe	[InstCombine] Add additional known bits folding tests; NFC	2020-03-07 17:17:03 +01:00
Nikita Popov	4cfb4afb70	[InstCombine] Highlight tests using expensive combines; NFC	2020-03-07 17:16:47 +01:00
Sanjay Patel	89fdee87f7	[InstCombine] regenerate complete test checks; NFC	2020-03-07 10:20:38 -05:00
Sanjay Patel	564f5eed1a	[InstCombine] add test for gep (select),... (PR45084); NFC	2020-03-07 10:00:31 -05:00
Stefanos Baziotis	01c48d7d11	[Attributor] Fold terminators before changing instructions to unreachable It is possible that an instruction to be changed to unreachable is in the same block with a terminator that can be constant-folded. In this case, as of now, the instruction will be changed to unreachable before the terminator is folded. But, then the whole BB becomes invalidated and so when we go ahead to fold the terminator, we trap. Change the order of these two. Differential Revision: https://reviews.llvm.org/D75780	2020-03-07 12:38:44 +02:00
Amara Emerson	c1a97e992d	Revert "Revert "[GlobalISel][Localizer] Enable intra-block localization of already-local uses."" This reverts commit `5583c2f2fb`. The lldb bot failure was a test that was fragile and sensitive to irrelevant changes in instruction ordering. Re-committing this as the test should have been skipped for AArch64 now. Differential Revision: https://reviews.llvm.org/D75555	2020-03-06 21:35:08 -08:00
Andrew Monshizadeh	c5a06019d2	Extend TimeTrace to LLVM's new pass manager With the addition of the LLD time tracing it made sense to include coverage for LLVM's various passes. Doing so ensures that ThinLTO is also covered with a time trace. Before: {F11333974} After: {F11333928} Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D74516	2020-03-06 14:45:19 -08:00
Fangrui Song	c3de1d0b1f	[gold][test] Fix tests after D75713 and D74749	2020-03-06 13:38:04 -08:00
Reid Kleckner	65b21282c7	Avoid emitting unreachable SP adjustments after `throw` In `172eee9c`, we tried to avoid these by modelling the callee as internally resetting the stack pointer. However, for the majority of functions with reserved stack frames, this would lead LLVM to emit extra SP adjustments to undo the callee's internal adjustment. This led us to fix the problem further on down the pipeline in eliminateCallFramePseudoInstr. In `5b79e603d3`, I added a heuristic to try to detect when the adjustment would be unreachable. This heuristic is imperfect, and when exception handling is involved, it fails to fire. The new test is an example of this. Simply throwing an exception with an active cleanup emits dead SP adjustments after the throw. Not only are they dead, but if they were executed, they would be incorrect, so they are confusing. This change essentially reverts `172eee9c` and makes the `5b79e603d3` heuristic responsible for preventing unreachable stack adjustments. This means we may emit unreachable stack adjustments for functions using EH with unreserved call frames, but that is not very many these days. Back in 2016 when this change was added, we were focused on 32-bit, which we observed to have fewer reserved frames. Fixes PR45064 Reviewed By: hans Differential Revision: https://reviews.llvm.org/D75712	2020-03-06 13:33:45 -08:00
Anna Thomas	59029b9eef	[RS4GC] Handle uses of extractelement for conversion from vector to scalar base As mentioned in the comments, extractelement is special since we actually want a scalar base for that element we extracted from the vector (i.e. not a vector base). This same logic should apply to uses of the extractelement such as phis and selects which have the same BDV as the extractelement. However, for these uses we conservatively mark the BDV state as conflict, since setting the EE's new base BDV does not always dominate these uses. Added testcase showcases the problem where the BDV identification chokes on the incorrect cast from vector to scalar for the phi use of extractelement. Tests-Run: make check, internal fuzzer testing Reviewers: reames, skatkov, dantrushin Reviewed-By: dantrushin Differential Revision: https://reviews.llvm.org/D75704	2020-03-06 16:28:49 -05:00
Roman Lebedev	1badf7c33a	[InstCombine] Forgo one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant Summary: This is potentially more friendly for further optimizations, analyses, e.g.: https://godbolt.org/z/G24anE This resolves a phase-ordering bug that was introduced in D75145 for https://godbolt.org/z/2gBwF2 https://godbolt.org/z/XvgSua Reviewers: spatel, nikic, dmgreen, xbolva00 Reviewed By: nikic, xbolva00 Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75757	2020-03-06 21:39:07 +03:00
Simon Pilgrim	fb8149cac8	[X86] Add CMOV to i686 BMI/TBM tests As mentioned on D75748, there is no such target that has BMI/TBM support but not the much older CMOV.	2020-03-06 17:26:20 +00:00
Simon Pilgrim	7a2ab876fd	[Hexagon] Fix fshl/fshr -> combine() bug identified in D75114	2020-03-06 17:23:10 +00:00
Simon Pilgrim	f78b9a3398	[Hexagon] Add fshl/fshr -> combine() tests identified in D75114 Added tests showing that the fshl/fshr -> combine() is working the wrong way around	2020-03-06 17:23:10 +00:00
Jin Lin	fc6fda90f7	Fix incorrect logic in maintaining the side-effect of compiler generated outliner functions Summary: Fix incorrect logic in maintaining the side-effect of compiler generated outliner functions by adding the up-exposed uses. Reviewers: paquette, tellenbach Reviewed By: paquette Subscribers: aemerson, lebedev.ri, hiraditya, llvm-commits, jinlin Tags: #llvm Differential Revision: https://reviews.llvm.org/D71217	2020-03-06 09:13:20 -08:00
Jay Foad	596446623b	[AMDGPU][ConstantFolding] Fold llvm.amdgcn.cube* intrinsics Summary: This folds the following family of intrinsics: llvm.amdgcn.cubeid (face id) llvm.amdgcn.cubema (major axis) llvm.amdgcn.cubesc (S coordinate) llvm.amdgcn.cubetc (T coordinate) Reviewers: nhaehnle, arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75187	2020-03-06 16:42:53 +00:00
Roman Lebedev	69ec84f8e7	[NFC][InstCombine] Add 'x - (x & y)' tests with multi-use 'and' If %y is constant, we could still perform the fold	2020-03-06 19:41:19 +03:00
Lucas Prates	0ba553d153	[MC] Allowing the use of $-prefixed integer as asm identifiers Summary: Dollar signed prefixed integers were not allowed by the AsmParser to be used as Identifiers, differing from the GNU assembler behavior. This patch updates the parsing of Identifiers to consider such cases as valid, where the identifier string includes the $ prefix itself. As the Lexer currently splits these occurrences into separate tokens, those need to be combined by the AsmParser itself. Reviewers: efriedma, chill Reviewed By: efriedma Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75111	2020-03-06 16:27:51 +00:00
Lucas Prates	af1c2e561e	[ARM] Fix dropped dollar sign from symbols in branch targets Summary: ARMAsmParser was incorrectly dropping a leading dollar sign character from symbol names in targets of branch instructions. This was caused by an incorrect assumption that the contents following the dollar sign token should be handled as a constant immediate, similarly to the # token. This patch avoids the operand parsing from consuming the dollar sign token when it is followed by an identifier, making sure it is properly parsed as part of the expression. Reviewers: efriedma Reviewed By: efriedma Subscribers: danielkiss, chill, carwil, vhscampos, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73176	2020-03-06 16:25:08 +00:00
Igor Kudrin	3a1bc41a89	[DebugInfo] Print the actual value of an unknown section identifier. This is a follow-up for D75609. As @dblaikie suggested, it prints the actual number for an unknown section identifier when dumping unit index sections. Differential Revision: https://reviews.llvm.org/D75668	2020-03-06 21:46:04 +07:00
Krzysztof Parzyszek	37a604c296	[Hexagon] Recognize undefined registers in expandPostRAPseudo	2020-03-06 08:27:42 -06:00
Xiangling Liao	362456bc53	[AIX] Handle LinkOnceODRLinkage and AppendingLinkage for static init global arrays Handle LinkOnceODRLinkage; Handle AppendingLinkage type for llvm.global_ctors/dtors static init global arrays; Differential Revision: https://reviews.llvm.org/D75305	2020-03-06 09:26:55 -05:00
Simon Pilgrim	7202d9cde9	[DAG] Combine fshl/fshr(load1,load0,c) if we have consecutive loads As noted on D75114, if both arguments of a funnel shift are consecutive loads we are missing the opportunity to combine them into a single load. Differential Revision: https://reviews.llvm.org/D75624	2020-03-06 11:36:18 +00:00
Georgii Rymar	e4ceb8f421	[lib/ObjectYAML] - Make `ELFYAML::Relocation::Offset` optional. Currently `yaml2obj` requires the `Offset` field in a relocation description. There are many cases when `Offset` is insignificant in the context of a test case. Making `Offset` optional allows us to simplify our test cases. This is what this patch does. Also, with this patch `obj2yaml` does not dump a zero offset of a relocation. Differential revision: https://reviews.llvm.org/D75608	2020-03-06 13:59:58 +03:00
Georgii Rymar	7391885d5c	[yaml2obj][obj2yaml][Object][test] - Improve testing of relocation types. The intention was to remove the `Object/X86/yaml-elf-x86-rel-broken.yaml` test. This test is in the wrong place. `yaml-elf-x86-rel-broken.yaml` was introduced in rG892c6c86ea25dc97668ff1f1b7bf1108e85fa5ec to check that yaml2obj can use an arbitrary `Hex32` value as a relocation type. We have tests that check similar functionality. I've improved them and removed `yaml-elf-x86-rel-broken.yaml`. Differential revision: https://reviews.llvm.org/D75679	2020-03-06 13:38:01 +03:00
Fangrui Song	952ee0df9e	ThinLTOBitcodeWriter: drop dso_local when a GlobalVariable is converted to a declaration If we infer the dso_local flag for -fpic, dso_local should be dropped when we convert a GlobalVariable to a declaration. dso_local causes the generation of direct access (e.g. R_X86_64_PC32). Such relocations referencing STB_GLOBAL STV_DEFAULT objects are not allowed in a -shared link. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D74749	2020-03-05 18:09:33 -08:00
Fangrui Song	71e2ca6e32	[llvm-objdump] -d: print `00000000 <foo>:` instead of `00000000 foo:` The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier. `.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive. Without `-LABEL`, the CHECK line can match the `Disassembly of section` line and causes the next `CHECK-NEXT:` to fail. ``` Disassembly of section .foo: 0000000000001634 .foo: ``` Bdragon: <> has metalinguistic connotation. it just "feels right" Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D75713	2020-03-05 18:05:28 -08:00
Zhongduo Lin	eae228a292	[IndVarSimplify] Extend previous special case for load use instruction to any narrow type loop variant to avoid extra trunc instruction Summary: The widenIVUse avoids generating trunc by evaluating the use as AddRec, this will not work when: 1) SCEV traces back to an instruction inside the loop that SCEV can not expand, eg. add %indvar, (load %addr) 2) SCEV finds a loop variant, eg. add %indvar, %loopvariant While SCEV fails to avoid trunc, we can still try to use instruction combining approach to prove trunc is not required. This can be further extended with other instruction combining checks, but for now we handle the following case (sub can be "add" and "mul", "nsw + sext" can be "nuw + zext") ``` Src: %c = sub nsw %b, %indvar %d = sext %c to i64 Dst: %indvar.ext1 = sext %indvar to i64 %m = sext %b to i64 %d = sub nsw i64 %m, %indvar.ext1 ``` Therefore, as long as the result of add/sub/mul is extended to wide type with right extension and overflow wrap combination, no trunc is required regardless of how %b is generated. This pattern is common when calculating address in 64 bit architecture. Note that this patch reuses almost all the code from D49151 by @az: https://reviews.llvm.org/D49151 It extends it by providing proof of why trunc is unnecessary in more general case, it should also resolve some of the concerns from the following discussion with @reames. http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20180910/585945.html Reviewers: sanjoy, efriedma, sebpop, reames, az, javed.absar, amehsan Reviewed By: az, amehsan Subscribers: hiraditya, llvm-commits, amehsan, reames, az Tags: #llvm Differential Revision: https://reviews.llvm.org/D73059	2020-03-05 16:27:59 -05:00
Jessica Paquette	ef4282e0ee	[AArch64][GlobalISel] Avoid copies to target register bank for subregister copies Previously for any copy from a register bigger than the destination: Copied to a same-sized register in the destination register bank. Subregister copy of that to the destination. This fails for copies from 128-bit FPRs to GPRs because the GPR register bank can't accommodate 128-bit values. Instead of special-casing such copies to perform the truncation beforehand in the source register bank, generalize this: a) Perform a subregister copy straight from source register whenever possible. This results in shorter MIR and fixes the above problem. b) Perform a full copy to target bank and then do a subregister copy only if source bank can't support target's size. E.g. GPR to 8-bit FPR copy. Patch by Raul Tambre (tambre)! Differential Revision: https://reviews.llvm.org/D75421	2020-03-05 11:13:02 -08:00
Fangrui Song	f9a0056016	[llvm-objdump] --syms: make flags closer to GNU objdump This fixes several issues. The behavior changes are: A SHN_COMMON symbol does not have the 'g' flag. An undefined symbol does not have 'g' or 'l' flag. A STB_GLOBAL SymbolRef::ST_Unknown symbol has the 'g' flag. A STB_LOCAL SymbolRef::ST_Unknown symbol has the 'l' flag. Reviewed By: rupprecht Differential Revision: https://reviews.llvm.org/D75659	2020-03-05 09:59:53 -08:00
Jordan Rupprecht	c140810ea1	[llvm-readobj] Include section name of notes. This changes the output of `llvm-readelf -n` from: ``` Displaying notes found at file offset 0x<...> with length 0x<...>: ``` to: ``` Displaying notes found in: .note.foo ``` And similarly, adds a `Name:` field to the `llvm-readobj -n` output for notes. This change not only increases GNU compatibility, it also makes it much easier to read notes. Note that we still fall back to printing the file offset/length in cases where we don't have a section name, such as when printing notes in program headers or printing notes in a partially stripped file (GNU readelf does the same). Fixes llvm.org/PR41339. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D75647	2020-03-05 09:53:14 -08:00
Rodrigo Dominguez	4313543de1	AMDGPU: Add/Fix tests for image atomic intrinsic. Summary: Add tests for 64-bit image atomic swap and cmpswap. Fix tests for 32-bit image atomic add. Change-Id: Ibb7619749c1ad504b24aa1c5f3185417a3013f3c Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75295	2020-03-05 12:18:15 -05:00
David Stuttard	a74b33f612	AMDGPU: Fix SMRD test in trivially disjoint mem access code Summary: This seems like an obvious error - cut and paste issue? The change does make a change to one of the lit tests - it stops s_buffer_load re-ordering past an MUBUF instruction (which is not surprising). Change-Id: I80be99de5b62af4f42e91af2591b76a52ac9efa6 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75686	2020-03-05 17:14:01 +00:00
Chris Bowler	c7b6fa8f4b	[AIX] Extend int arguments to register width when passed in stack memory. This is a follow up to the previous patch: [AIX] Implement caller arguments passed in stack memory. This corrects a defect in AIX 64-bit where an i32 is written to the stack with stw (4 bytes) rather than the expected std (8 bytes.) Integer arguments pass on the stack as images of their register representation. I also took the opportunity to tidy up some of the calling convention AIX tests I added in my last commit. This patch adds the missed assembly expected output for the stack arg int case, which would have caught this problem. Differential Revision: https://reviews.llvm.org/D75126	2020-03-05 11:49:16 -05:00
Juneyoung Lee	d7267ee194	[ValueTracking] Let isGuaranteedNotToBeUndefOrPoison look into branch conditions of dominating blocks' terminators Summary: ``` br i1 c, BB1, BB2: BB1: use1(c) BB2: use2(c) ``` In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB. This is a resubmission of `952ad47` with crash fix of llvm/test/Transforms/LoopRotate/freeze-crash.ll. Checked with Alive2 Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy Reviewed By: reames Subscribers: jdoerfert, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75401	2020-03-06 01:08:35 +09:00
Sanjay Patel	85ae5aa6ff	[VectorCombine] add tests for different extract indexes; NFC	2020-03-05 10:33:21 -05:00
Igor Kudrin	6e9c10f694	Fix typos in comment marks.	2020-03-05 20:01:45 +07:00
Sanjay Patel	59196f8452	[VectorCombine] add x86 AVX run to test for better coverage; NFC	2020-03-05 07:54:31 -05:00
Igor Kudrin	cada5b881b	[DebugInfo] Do not truncate 64-bit values when dumping CIEs and FDEs. This fixes printing long values that might reside in CIE and FDE, including offsets, lengths, and addresses. Differential Revision: https://reviews.llvm.org/D73887	2020-03-05 17:37:28 +07:00
Igor Kudrin	1a837569db	[DebugInfo] Refine the condition to detect CIEs. The condition was not accurate enough and could interpret some FDEs in .eh_frame or 64-bit DWARF .debug_frame sections as CIEs. Even though such FDEs are unlikely in a normal situation, the wrong interpretation could hide an issue in a buggy generator. Differential Revision: https://reviews.llvm.org/D73886	2020-03-05 17:37:09 +07:00
Daniil Suchkov	f35a898f5f	[Test] Add a regression test for failure introduced by `952ad4701c`	2020-03-05 16:32:37 +07:00
Daniil Suchkov	3db48f9324	Revert "[ValueTracking] Let isGuaranteedNotToBeUndefOrPoison look into branch conditions of dominating blocks' terminators" That commit causes SIGSEGV on some simple tests. This reverts commit `952ad4701c`.	2020-03-05 16:32:36 +07:00
Jun Ma	b10deb9487	[Coroutines] Optimized coroutine elision based on reachability Differential Revision: https://reviews.llvm.org/D75440	2020-03-05 14:43:50 +08:00
Sameer Sahasrabuddhe	42febbab91	StructurizeCFG: simplify phi nodes when possible After structurization, some phi nodes can have a single incoming edge and can be simplified away. This change runs a simplify query on all phis that are either modified or added by the structurizer. This also moves some phis closer to their use as a side benefit. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D75500	2020-03-05 10:33:15 +05:30
Michael Trent	df058699d3	Fix dyld opcode *_ADD_ADDR_IMM_SCALED error detection. Summary: Move the check for malformed REBASE_OPCODE_ADD_ADDR_IMM_SCALED and BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED opcodes after the immediate has been applied to the SegmentOffset. This fixes specious errors where SegmentOffset is pointing between two sections when trying to correct the SegmentOffset value. Update the regression tests to verify the proper error message. Reviewers: pete, ab, lhames, steven_wu, jhenderson Reviewed By: pete Subscribers: hiraditya, dexonsmith, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75629	2020-03-04 19:57:45 -08:00
Igor Kudrin	cc61283bf6	[DebugInfo] Avoid crashing on an invalid section identifier. A DWARFSectionKind is read from input. It is not validated on parsing, so an unexpected value may result in reaching llvm_unreachable() in DWARFUnitIndex::getColumnHeader() when dumping the index section. Differential Revision: https://reviews.llvm.org/D75609	2020-03-05 10:54:43 +07:00
QingShan Zhang	3906ae387f	[DAGCombine] Check the uses of negated floating constant and remove the hack PowerPC hits an assertion for much the same reason as https://reviews.llvm.org/D70975. Although some hacks are already in place, it still fails in some cases: when operand 0 is NOT a const fp but another fma with a const fp, and that const fp is negated, which results in multiple uses. A better fix is to check the uses of the negated const fp. If its negated value is already used, we benefit because no extra node is added. Differential revision: https://reviews.llvm.org/D75501	2020-03-05 03:42:50 +00:00
Greg Clayton	4050b01ba9	Fix GSYM tests to run the yaml files and fix test failures on some machines. YAML files were not being run during lit testing as there was no lit.local.cfg file. Once this was fixed, some buildbots would fail due to a StringRef that pointed to a std::string inside of a temporary llvm::Triple object. These issues are fixed here by making a local triple object that stays around long enough so the StringRef points to valid data. Fixed memory sanitizer bot bugs as well. Differential Revision: https://reviews.llvm.org/D75390	2020-03-04 19:14:08 -08:00
hsmahesha	3fda1fde8f	AMDGPU/GlobalISel: Support llvm.trap and llvm.debugtrap intrinsics Summary: Lower trap and debugtrap intrinsics to AMDGPU machine instruction(s). Reviewers: arsenm, nhaehnle, kerbowa, cdevadas, t-tye, kzhuravl Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, dstuttard, tpr, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74688	2020-03-05 08:16:57 +05:30
Philip Reames	f708c823f0	[X86] Relax existing instructions to reduce the number of nops needed for alignment purposes If we have an explicit align directive, we currently default to emitting nops to fill the space. As discussed in the context of the prefix padding work for branch alignment (D72225), we're allowed to play other tricks such as extending the size of previous instructions instead. This patch will convert near jumps to far jumps if doing so decreases the number of bytes of nops needed for a following align. It does so as a post-pass after relaxation is complete. It intentionally works without moving any labels or doing anything which might require another round of relaxation. The point of this patch is mainly to mock out the approach. The optimization implemented is real, and possibly useful, but the main point is to demonstrate an approach for implementing such "pad previous instruction" approaches. The key notion in this patch is to treat padding previous instructions as an optional optimization, not as a core part of relaxation. The benefit to this is that we avoid the potential concern about increasing the distance between two labels and thus causing further potentially non-local code growth due to relaxation. The downside is that we may miss some opportunities to avoid nops. For the moment, this patch only implements a small set of existing relaxations. Assuming the approach is satisfactory, I plan to extend this to a broader set of instructions where there are obvious "relaxations" which are roughly performance equivalent. Note that this patch doesn't change which instructions are relaxable. We may wish to explore that separately to increase optimization opportunity, but I figured that deserved its own separate discussion. There are possible downsides to this optimization (and all "pad previous instruction" variants). The major two are potentially increasing instruction fetch and perturbing uop caching. (i.e. the usual alignment risks) Specifically: * If we pad an instruction such that it crosses a fetch window (16 bytes on modern X86-64), we may cause the decoder to have to trigger a fetch it wouldn't have otherwise. This can affect both decode speed, and icache pressure. * Intel's uop caching has particular restrictions on instruction combinations which can fit in a particular way. By moving around instructions, we can both cause misses and change misses into hits. Many of the most painful cases are around branch density, so I don't expect this to be too bad on the whole. On the whole, I expect to see small swings (i.e. the typical alignment change problem), but nothing major or systematic in either direction. Differential Revision: https://reviews.llvm.org/D75203	2020-03-04 16:52:35 -08:00
Matt Arsenault	7459781bd9	X86: Generate mir checks in sqrt test	2020-03-04 18:46:46 -05:00
Craig Topper	eadea7868f	[X86] Convert vXi1 vectors to xmm/ymm/zmm types via getRegisterTypeForCallingConv rather than using CCPromoteToType in the td file Previously we tried to promote these to xmm/ymm/zmm by promoting in the X86CallingConv.td file. But this breaks when we run out of xmm/ymm/zmm registers and need to fall back to memory. We end up trying to create a nonsensical scalar to vector. This led to an assertion. The new tests in avx512-calling-conv.ll all trigger this assertion. Since we really want to treat these types like we do on avx2, it seems better to promote them before the calling convention code gets involved. Except when the calling convention is one that passes the vXi1 type in a k register. The changes in avx512-regcall-Mask.ll are because we indicated that xmm/ymm/zmm types should be passed indirectly for the Win64 ABI before we go to the common lines that promoted the vXi1 types. This caused the promoted types to be picked up by the default calling convention code. Now we promote them earlier so they get passed indirectly as though they were xmm/ymm/zmm. Differential Revision: https://reviews.llvm.org/D75154	2020-03-04 15:02:32 -08:00
shafik	37549464c1	[dsymutil] Fix template stripping in getDIENames(...) to account for overloaded operators Currently dsymutil when generating accelerator tables will attempt to strip the template parameters from names for subroutines. For some overload operators which contain < in their names e.g. operator< the current method ends up stripping the operator name as well, we just end up with the name operator in the table for each case. Differential Revision: https://reviews.llvm.org/D75545	2020-03-04 14:54:31 -08:00
Craig Topper	6ca96765c7	[X86] Disable commuting for the first source operand of zero masked scalar fma intrinsic instructions. I believe this is the correct fix for D75506 rather than disabling all commuting. We can still commute the remaining two sources. Differential Revision: https://reviews.llvm.org/D75526	2020-03-04 14:35:53 -08:00
Nikita Popov	c6ff3c9bad	[InstSimplify] Constant fold icmp of gep InstSimplify can fold icmps of gep where the base pointers are the same and the offsets are constant. It does so by constructing a constant expression icmp and assumes that it gets folded -- but this doesn't actually happen, because GEP expressions can usually only be folded by the target-dependent constant folding layer. As such, we need to explicitly invoke it here. Differential Revision: https://reviews.llvm.org/D75407	2020-03-04 23:16:52 +01:00
Muhammad Omair Javaid	5583c2f2fb	Revert "[GlobalISel][Localizer] Enable intra-block localization of already-local uses." This reverts commit `e91e1df6ab`.	2020-03-05 03:12:28 +05:00
Matt Arsenault	9e1d2afc13	AMDGPU/GlobalISel: Don't use vector G_EXTRACT in arg lowering Create a wider source vector, and unmerge with dead defs like the legalizer. The legalization handling for G_EXTRACT is incomplete, and it's preferable to keep everything in 32-bit pieces. We should probably start moving these functions into utils, since we have a growing number of places that do almost the same thing.	2020-03-04 16:49:01 -05:00
Matt Arsenault	f70e7dc17d	AMDGPU/GlobalISel: Switch target in argument test Since this is still largely relying on the DAG argument type lowering code, this has inherited the problem where i16 vectors have a different ABI on targets with and without legal i16. Switch to using a target with legal i16, so the i16 vector argument tests are more useful.	2020-03-04 16:40:06 -05:00
Matt Arsenault	fb0c35fa34	GlobalISel: Set alignment on function argument stack load/store	2020-03-04 16:38:46 -05:00
Fangrui Song	9e1319df7e	[llvm-readelf] Make --all output order closer to GNU readelf https://bugs.llvm.org/show_bug.cgi?id=43403 The new order makes it easy to compare the two tools' --all. Reviewed By: grimar, rupprecht Differential Revision: https://reviews.llvm.org/D75592	2020-03-04 12:22:12 -08:00
Fangrui Song	c72d60d42f	[llvm-objdump] --syms: print st_size as "%016" PRIx64 instead of "%08" PRIx64 for 64-bit objects This is GNU objdump's behavior and it is reasonable to match. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D75588	2020-03-04 12:09:27 -08:00
Wei Mi	3c96d01d2e	Generate Callee Saved Register (CSR) related cfi directives like .cfi_restore. https://reviews.llvm.org/D42848 only handled CFA related cfi directives but didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses the framework created in D42848. For each basicblock, the patch tracks which CSR set has been saved at its CFG predecessors' exits, and compares the CSR set with the set at its previous basicblock's exit (The previous block is the block laid before the current block). If the saved CSR set at its previous basicblock's exit is larger, .cfi_restore will be inserted. The patch also generates proper .cfi_restore in epilogue to make sure the saved CSR set is consistent for the incoming edges of each block. Differential Revision: https://reviews.llvm.org/D74303	2020-03-04 11:18:37 -08:00
Guozhi Wei	ee9a3eba76	[CodeGenPrepare] Handle ExtractValueInst in dupRetToEnableTailCallOpts As the test case shows if there is an ExtractValueInst in the Ret block, function dupRetToEnableTailCallOpts can't duplicate it into the block containing call. So later no tail call is generated in CodeGen. This patch adds the ExtractValueInst handling code in function dupRetToEnableTailCallOpts and FoldReturnIntoUncondBranch, and later tail call can be generated for this case. Differential Revision: https://reviews.llvm.org/D74242	2020-03-04 11:10:32 -08:00
David Green	38e532278e	[LSR] Add masked load and store handling This teaches Loop Strength Reduction the details about masked load and store address operands, so that it can have a better time optimising them as it would for normal loads and stores. Differential Revision: https://reviews.llvm.org/D75371	2020-03-04 18:36:10 +00:00
Mitch Phillips	58079aa91b	Revert "Fix GSYM tests to run the yaml files and fix test failures on some machines." This reverts commit `8d41f1a023`. This change broke the MSan buildbots - see comments in https://reviews.llvm.org/D75390 for more information.	2020-03-04 10:21:54 -08:00
Nikita Popov	9b5de84e27	[InstCombine] Use IRBuilder to create bitcast This makes sure that the constant expression bitcast goes through target-dependent constant folding, and thus avoids an additional iteration of InstCombine.	2020-03-04 18:28:38 +01:00
Nikita Popov	17be8e4a6f	[ConstProp] Add test for bitcast to gep fold; NFC	2020-03-04 18:27:20 +01:00
Nikita Popov	a99b97b818	[InstSimplify] Add additional icmp of gep folding test; NFC	2020-03-04 18:27:01 +01:00
Nikita Popov	0940c32385	[InstSimplify] Regenerate compare.ll checks; NFC	2020-03-04 18:26:42 +01:00
Simon Pilgrim	f24d90c0a6	[X86] Add tests showing failure to combine consecutive loads + FSHR into a single load Similar to some of the regressions seen in D75114	2020-03-04 17:07:03 +00:00
Simon Pilgrim	4c411d2419	[X86] Add tests showing failure to combine consecutive loads + FSHL into a single load Similar to some of the regressions seen in D75114	2020-03-04 17:07:02 +00:00
Sanjay Patel	71a316883d	[PassManager] adjust VectorCombine placement The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs: https://bugs.llvm.org/show_bug.cgi?id=45015 https://bugs.llvm.org/show_bug.cgi?id=42022 This patch contains a few independent changes: 1. Move the pass up in the pipeline, so it happens just after loop-vectorization. This is only to keep vectorization passes together in the pipeline at the moment. I don't have evidence of interaction between these yet. 2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was partly proposed as far back as rL219644 (which is why it's effectively being moved in the old PM code). This is important because the subsequent -instcombine doesn't work as well without EarlyCSE. With the CSE, -instcombine is able to squash shuffles together in 1 of the tests (because those are simple "select" shuffles). 3. Remove the -vector-combine pass that was running after SLP. We may want to do that eventually, but I don't have a test case to support it yet. Differential Revision: https://reviews.llvm.org/D75145	2020-03-04 11:10:49 -05:00
Sanjay Patel	29a2b20ab3	[SDAG] simplify FP binops to undef As discussed in the commit thread for rGa253a2a and D73978, we can do more undef folding for FP ops. The nnan and ninf fast-math-flags specify that if an operand is the disallowed value, the result is poison, so we can produce an undef result. But this doesn't work as expected (the undef operand cases remain) because of a Flags propagation problem in SelectionDAGBuilder. I've added DAGCombiner calls to enable these for the other cases because we've shown in other patches that (because of the limited way that SDAG iterates), it is possible to miss simplifications like this if they are done only at node creation time. Several potential follow-ups to expand on this patch are possible. Differential Revision: https://reviews.llvm.org/D75576	2020-03-04 10:42:16 -05:00
David Green	587feec07e	[ARM] Change all tests from "thumbv8.1-m.main" to "thumbv8.1m.main". NFC	2020-03-04 13:47:35 +00:00
Evgeniy Brevnov	e60c28746b	Lost regression test from commit `5a63813dc7`.	2020-03-04 19:52:42 +07:00
Pavel Labath	38385630ad	Use DWARFDataExtractor::getInitialLength in DWARFDebugAddr Reviewers: ikudrin, jhenderson, probinson Subscribers: hiraditya, dblaikie, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75532	2020-03-04 13:00:15 +01:00
Kerry McLaughlin	f5502c7035	[AArch64][SVE] Add SVE2 intrinsic for xar Summary: Implements the @llvm.aarch64.sve.xar intrinsic Reviewers: andwar, c-rhodes, dancgr, efriedma, rengolin Reviewed By: andwar Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75160	2020-03-04 11:44:32 +00:00
Simon Pilgrim	e2f0093800	[AMDGPU] performCvtF32UByteNCombine - revisit node after src operand simplification. If SimplifyDemandedBits succeeds in simplifying the byte src, add the CVT_F32_UBYTE node back to the worklist as we might be able to simplify further. Yet another step towards removing SelectionDAG::GetDemandedBits.	2020-03-04 11:25:50 +00:00
gbreynoo	5e0f9d5d3c	[llvm-ar][test] Add to llvm-ar test coverage - Added handling of thin archives to symtab.test. - Added handling of newlines to response.test. - `62fa3332c9` exposed behaviour regarding the use of -- on the command line. Added double-hyphen.test to cover this. Differential Revision: https://reviews.llvm.org/D73333	2020-03-04 10:56:48 +00:00
Simon Tatham	068b2f313c	[ARM,MVE] Add the `vshlcq` intrinsics. Summary: The VSHLC instruction performs a left shift of a whole vector register by an immediate shift count up to 32, shifting in new bits at the low end from a GPR and delivering the shifted-out bits from the high end back into the same GPR. Since the instruction produces two outputs (the shifted vector register and the output GPR of shifted-out bits), it has to be instruction-selected in C++ rather than Tablegen. Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75445	2020-03-04 08:49:27 +00:00
Simon Tatham	810127f6ab	[ARM,MVE] Add the `vsbciq` intrinsics. Summary: These are exactly parallel to the existing `vadciq` intrinsics, which we implemented last year as part of the original MVE intrinsics framework setup. Just like VADC/VADCI, the MVE VSBC/VSBCI instructions deliver two outputs, both of which the intrinsic exposes: a modified vector register and a carry flag. So they have to be instruction-selected in C++ rather than Tablegen. However, in this case, that's trivial: the same C++ isel routine we already have for VADC works unchanged, and all we have to do is to pass it a different instruction id. Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75444	2020-03-04 08:49:27 +00:00
Juneyoung Lee	952ad4701c	[ValueTracking] Let isGuaranteedNotToBeUndefOrPoison look into branch conditions of dominating blocks' terminators Summary: ``` br i1 c, BB1, BB2: BB1: use1(c) BB2: use2(c) ``` In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB. Checked with Alive2 Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy Reviewed By: reames Subscribers: jdoerfert, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75401	2020-03-04 11:43:31 +09:00
Amara Emerson	e91e1df6ab	[GlobalISel][Localizer] Enable intra-block localization of already-local uses. This changes the localizer to attempt intra-block localization of instructions that have local uses. This is useful because sometimes the entry block itself has many uses of constant-like instructions, which would benefit from shortening live ranges. Previously if an inst had no non-local uses, we wouldn't add it to the list of instructions to attempt further intra-block localization. This gives a 0.7% geomean code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D75555	2020-03-03 18:14:57 -08:00
Fangrui Song	7af4374ff8	[MC][test] Improve some llvm-objdump -t tests Delete two redundant tests.	2020-03-03 17:27:06 -08:00
Lang Hames	14ac84e5c5	[JITLink] Add a -slab-address option to llvm-jitlink. This option can be used to force JITLink to link as-if the target memory slab were allocated at a specific start address. This can be used to both verify that cross-address space linking is working correctly, and to ensure that certain address-sensitive optimizations (e.g. GOT and stub elimination) either do or do not fire, depending on the requirements of the test case. This argument is only valid for testing in conjunction with -noexec -slab-alloc, and will produce an error if used without those arguments.	2020-03-03 14:25:51 -08:00
Matt Arsenault	88aced1e45	AMDGPU: Fix computation for getOccupancyWithLocalMemSize The computation here didn't really make sense to me, and reported wildly different results depending on the flat work group size attribute. I think this should really report a range derived from the possible work group size bounds, and only allow an occupancy that is a multiple of the group size.	2020-03-03 17:15:57 -05:00
Brian Gesiak	aa85b437a9	[Coroutines] Use dbg.declare for frame variables Summary: https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f is an example of a small C++ program that uses C++20 coroutines that is difficult to debug, due to the loss of debug info for variables that "spill" across coroutine suspension boundaries. This patch addresses that issue by inserting 'llvm.dbg.declare' intrinsics that point the debugger to the variables' location at an offset to the coroutine frame. With this patch, I confirmed that running the 'frame variable' commands in https://gist.github.com/modocache/ed7c62f6e570766c0f39b35dad675c2f at the specified breakpoints results in the correct values being printed for coroutine frame variables 'i' and 'j' when using an lldb built from trunk, as well as with gdb 8.3 (lldb 9.0.1, however, could not print the values). The added test case also verifies this improved behavior. The existing coro-debug.ll test case is also modified to reflect the locations at which Clang actually places calls to 'dbg.declare', and additional checks are added to ensure this patch works as intended in that example as well. Reviewers: vsk, jmorse, GorNishanov, lewissbaker, wenlei Subscribers: EricWF, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75338	2020-03-03 17:13:46 -05:00
Amy Huang	5b3b21f025	[DebugInfo] Fix for adding "returns cxx udt" option to functions in CodeView. Summary: This change checks for the return type in the frontend and adds a flag to the DISubroutineType to indicate that the option should be added in CodeViewDebug. Previously function types sometimes appeared twice in the PDB: once with "returns cxx udt" and once without. See https://bugs.llvm.org/show_bug.cgi?id=44785. Reviewers: rnk, asmith Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D75215	2020-03-03 14:00:08 -08:00
Sanjay Patel	f95095e9f6	[AArch64] add tests for nnan/ninf/undef FP simplifications; NFC	2020-03-03 16:38:58 -05:00
Vedant Kumar	b5b21812dc	test: Adjust no-dbg-value-after-terminator.mir to use `not --crash`	2020-03-03 13:30:31 -08:00
Sanjay Patel	5f5fce06b9	[PowerPC] adjust test to avoid getting zapped completely; NFC div-by-0 -> Inf The math ops are 'fast' so 'ninf' applies and the whole thing is undef.	2020-03-03 16:15:48 -05:00
Vedant Kumar	f002ee55c7	[MachineVerifier] Remove placement rule exception for debug entry values There should not be an exception allowing debug entry values to be placed after a terminator. Differential Revision: https://reviews.llvm.org/D75559	2020-03-03 13:02:18 -08:00
Vedant Kumar	2bf496620c	[LiveDebugValues] Do not insert DBG_VALUEs after a MBB terminator This fixes a miscompile that happened because a DBG_VALUE interfered with the MachineOutliner's liveness analysis. Inserting a DBG_VALUE after a terminator breaks predicates on MBB such as isReturnBlock(). And the resulting DBG_VALUE cannot be "live". I plan to introduce a MachineVerifier check for this situation in a follow up. rdar://59859175 Testing: check-llvm, LNT build with a stage2 compiler & entry values enabled Differential Revision: https://reviews.llvm.org/D75548	2020-03-03 13:00:52 -08:00
Craig Topper	02f03a6fd4	[X86] Match vpmullq latency to uops.info. Correct port usage for 512-bit memory form uops.info says these should be 15 cycle instructions. Uops.info also shows the 512-bit form uses port 0 and 5 for both register and memory. We had memory using 0 and 1. Differential Revision: https://reviews.llvm.org/D75549	2020-03-03 12:16:03 -08:00
Craig Topper	3c4e635593	[X86] Always emit an integer vbroadcast_load from lowerBuildVectorAsBroadcast regardless of AVX vs AVX2 If we go with D75412, we no longer depend on the scalar type directly. So we don't need to avoid using i64. We already have AVX1 fallback patterns with i32 and i64 scalar types so we don't need to avoid using integer types on AVX1. Differential Revision: https://reviews.llvm.org/D75413	2020-03-03 10:39:11 -08:00
Whitney Tsang	c84532a70a	[LoopNest]: Analysis to discover properties of a loop nest. Summary: This patch adds an analysis pass to collect loop nests and summarize properties of the nest (e.g the nest depth, whether the nest is perfect, what's the innermost loop, etc...). The motivation for this patch was discussed at the latest meeting of the LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we discussed the unimodular loop transformation framework ( “A Loop Transformation Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework provides a convenient way to unify legality checking and code generation for several loop nest transformations (e.g. loop reversal, loop interchange, loop skewing) and their compositions. Given that the unimodular framework is applicable to perfect loop nests this is one property of interest we expose in this analysis. Several other utility functions are also provided. In the future other properties of interest can be added in a centralized place. Authored By: etiotto Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn, reames, hfinkel, jdoerfert, ppc-slack Reviewed By: Meinersbur Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D68789	2020-03-03 18:25:19 +00:00
Craig Topper	a1611b3737	[X86] Connect accidentally dead code in an avx512 fmadd intrinsic test case.	2020-03-03 09:54:23 -08:00
Justin Bogner	831fe8dc4c	Restore `REQUIRES: default_triple` to a test This was accidentally removed as part of `7683a084de` "Remove lit feature object-emission"	2020-03-03 09:43:36 -08:00
Lang Hames	ff4fd8dead	[ORC] Make sure we add initializers to the SymbolFlags map for objects.	2020-03-03 09:33:37 -08:00
Fangrui Song	55a56041d1	[MCDwarf] Generate DWARF v5 .debug_rnglists for assembly files ``` // clang -c -gdwarf-5 a.s -o a.o .section .init; ret .text; ret ``` .debug_info contains DW_AT_ranges and llvm-dwarfdump will report a verification error because .debug_rnglists does not exist (not implemented). This patch generates .debug_rnglists for assembly files. emitListsTableHeaderStart() in DwarfDebug.cpp can be shared with MCDwarf.cpp. Because CodeGen depends on MC, I move the function to MCDwarf.cpp Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D75375	2020-03-03 09:03:34 -08:00
Craig Topper	d8ad7cc088	[DAGCombiner][X86] Improve narrowExtractedVectorLoad to handle cases where the element size isn't byte sized but the subvector is. Summary: Follow up from D75377. If the subvector is byte sized and the index is aligned to the subvector size, we can shrink the load. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: dbabokin, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75434	2020-03-03 08:41:31 -08:00
Craig Topper	68aeaab888	[X86] Don't count the chain uses when forming broadcast loads in lowerBuildVectorAsBroadcast. The build_vector needs to be the only user of the data, but the chain will likely have another use. So we can't make sure the build_vector is the only user of the node.	2020-03-03 08:41:31 -08:00
Chris Bowler	383e3ec1b2	[PowerPC][NFC] Add missing expected output for AIX int stack arg test. The expected output is erroneous and will be corrected alongside a fix to ensure stack arguments are widened to register width before writing to the parameter save area.	2020-03-03 11:25:41 -05:00
Chris Bowler	65dd63fb33	[PowerPC][NFC] Lexically order expected output for AIX stack arg test.	2020-03-03 11:14:17 -05:00
Jonas Paulsson	ae4d39c9e4	[SystemZ] Copy Access registers and CC with the correct register class. On SystemZ there are a set of "access registers" that can be copied in and out of 32-bit GPRs with special instructions. These instructions can only perform the copy using low 32-bit parts of the 64-bit GPRs. However, the default register class for 32-bit integers is GRX32, which also contains the high 32-bit part registers. In order to never end up with a case of such a COPY into a high reg, this patch adds a new simple pre-RA pass that selects such COPYs into target instructions. This pass also handles COPYs from CC (Condition Code register), and COPYs to CC can now also be emitted from a high reg in copyPhysReg(). Fixes: https://bugs.llvm.org/show_bug.cgi?id=44254 Review: Ulrich Weigand. Differential Revision: https://reviews.llvm.org/D75014	2020-03-03 16:41:09 +01:00
Sam Parker	5618e9be37	[RDA][ARM] collectKilledOperands across multiple blocks Use MIOperand in collectLocalKilledOperands to make the search global, as we already have to search for global uses too. This allows us to delete more dead code when tail predicating. Differential Revision: https://reviews.llvm.org/D75167	2020-03-03 15:23:05 +00:00
diggerlin	f9896435c9	[AIX][XCOFF] Fix XCOFFObjectWriter assertion failure with alignment-related gap and improve text section output testing SUMMARY: 1. If there is a gap between the end virtual address of one section and the beginning virtual address of the next section, XCOFFObjectWriter.cpp hits an assert. 2. As discussed in the patch https://reviews.llvm.org/D66969, now that the function description is implemented, we can output the raw object data for a function. We need to create a test for the raw text section content and the section header of an XCOFF object file. Reviewer: daltenty,hubert.reinterpretcast,jasonliu Differential Revision: https://reviews.llvm.org/D71845	2020-03-03 10:02:40 -05:00
Clement Courbet	075c281859	[ExpandMemCmp][NFC] Regenerate tests.	2020-03-03 15:09:55 +01:00
Whitney Tsang	613f791131	Revert "[LoopNest]: Analysis to discover properties of a loop nest." This reverts commit `3a063d68e3`. Broke the build with modules enabled: http://green.lab.llvm.org/green/job/lldb-cmake/10655/console .	2020-03-03 14:07:49 +00:00
Jonas Paulsson	237625757a	[SystemZ] Bugfix for backchain with packed-stack The incoming back chain slot was implicitly allocated whenever a GPR was saved in SystemZFrameLowering::getRegSpillOffset(), but in cases where no GPRs were saved/restored this did not take effect. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D75367	2020-03-03 15:03:01 +01:00
Clement Courbet	c68d35d78c	[ExpandMemCmp] Add more tests to show missing constant folding.	2020-03-03 14:57:11 +01:00
gbreynoo	62fa3332c9	[llvm-ar] Fix llvm-ar response file reading on Windows Response files were not being correctly read on Windows. This change fixes the issue and adds some tests. Differential Revision: https://reviews.llvm.org/D69665	2020-03-03 13:42:57 +00:00
Jonas Paulsson	cdcce3cabf	[SystemZ] Also accept ISD::USUBO in shouldFormOverflowOp(). Forming subtract with overflow is beneficial on SystemZ, just like additions. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D75290	2020-03-03 14:38:57 +01:00
Whitney Tsang	3a063d68e3	[LoopNest]: Analysis to discover properties of a loop nest. Summary: This patch adds an analysis pass to collect loop nests and summarize properties of the nest (e.g the nest depth, whether the nest is perfect, what's the innermost loop, etc...). The motivation for this patch was discussed at the latest meeting of the LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we discussed the unimodular loop transformation framework ( “A Loop Transformation Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework provides a convenient way to unify legality checking and code generation for several loop nest transformations (e.g. loop reversal, loop interchange, loop skewing) and their compositions. Given that the unimodular framework is applicable to perfect loop nests this is one property of interest we expose in this analysis. Several other utility functions are also provided. In the future other properties of interest can be added in a centralized place. Authored By: etiotto Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn, reames, hfinkel, jdoerfert, ppc-slack Reviewed By: Meinersbur Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D68789	2020-03-03 13:25:28 +00:00
David Green	be0736511b	[ARM] Add some postinc LSR tests. NFC	2020-03-03 12:10:12 +00:00
David Green	ec7e4a9a80	[LoopVectorizer] Add reduction tests for inloop reductions. NFC Also adds a force-reduction-intrinsics option for testing, for forcing the generation of reduction intrinsics even when the backend is not requesting them.	2020-03-03 10:54:00 +00:00
Hans Wennborg	916be8fd6a	Revert `abb00753` "build: reduce CMake handling for zlib" (PR44780) and follow-ups: `a2ca1c2d` "build: disable zlib by default on Windows" `2181bf40` "[CMake] Link against ZLIB::ZLIB" `1079c68a` "Attempt to fix ZLIB CMake logic on Windows" This changed the output of llvm-config --system-libs, and more importantly it broke stand-alone builds. Instead of piling on more fix attempts, let's revert this to reduce the risk of more breakages.	2020-03-03 11:03:09 +01:00
Jim Lin	4e3b037665	[AVR] Fix incorrect register state for LDRdPtr Summary: LDRdPtr expanded from LDWRdPtr shouldn't define its second operand(SrcReg). The second operand is its source register. Add -verify-machineinstrs into command line of testcases can trigger this error. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75437	2020-03-03 17:34:54 +08:00
Georgii Rymar	d58e383f23	[obj2yaml] - Dump allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM sections. Sometimes we need to dump an object and build it again from the YAML description produced. The problem is that obj2yaml does not dump some sections, like string tables and symbol tables. Because of that, yaml2obj implicitly creates them, and the sections it creates are not placed at their original locations. They are added to the end of the section list. That makes preparing test cases harder than it needs to be. This patch teaches obj2yaml to dump parts of allocatable SHT_STRTAB, SHT_SYMTAB and SHT_DYNSYM sections to print placeholders for them. This also allows preserving useful parameters, like the virtual address. Differential revision: https://reviews.llvm.org/D74955	2020-03-03 11:32:49 +03:00
Roman Lebedev	9e1443e6f6	[NFC][InstCombine] Add test with non-CSE'd casts of load in @t0 we can still change type of load and get rid of casts.	2020-03-03 11:27:27 +03:00
Sameer Sahasrabuddhe	534d8866a1	[AMDGPU] add generated checks for some LIT tests This is in preparation for further changes that affect these tests. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D75403	2020-03-03 11:47:05 +05:30
Alok Kumar Sharma	6f029dadf6	[DebugInfo] Avoid generating duplicate llvm.dbg.value Summary: This is to avoid generating a duplicate llvm.dbg.value intrinsic if it already exists after the Instruction. Before inserting an llvm.dbg.value instruction, LLVM checks if the same instruction is already present before the instruction to avoid duplicates. Currently it fails to check if it already exists after the instruction. flang generates IR like this. %4 = load i32, i32* %i1_311, align 4, !dbg !42 call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33 When this IR is processed in llvm, it ends up inserting duplicates. %4 = load i32, i32* %i1_311, align 4, !dbg !42 call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33 call void @llvm.dbg.value(metadata i32 %4, metadata !35, metadata !DIExpression()), !dbg !33 We have now updated LdStHasDebugValue to include the cases when instruction is already followed by same dbg.value instruction we intend to insert. Now, Definition and usage of function LdStHasDebugValue are deleted. RemoveRedundantDbgInstrs is called for the cleanup of duplicate dbg.value's Testing: Added unit test for validation check-llvm check-debuginfo (the debug info integration tests) Reviewers: aprantl, probinson, dblaikie, jmorse, jini.susan.george SouraVX, awpandey, dstenb, vsk Reviewed By: aprantl, jmorse, dstenb, vsk Differential Revision: https://reviews.llvm.org/D74030	2020-03-03 09:56:45 +05:30
David Blaikie	4ce3e5074b	DebugInfo: Separate different debug_macinfo contributions & print the offset of a contribution	2020-03-02 19:30:30 -08:00
Juneyoung Lee	9f1f244d3c	[LICM] Allow freeze to hoist/sink out of a loop Summary: This patch allows LICM to hoist/sink freeze instructions out of a loop. Reviewers: reames, fhahn, efriedma Reviewed By: reames Subscribers: jfb, lebedev.ri, hiraditya, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75400	2020-03-03 12:29:39 +09:00
Huihui Zhang	44fa47c9e7	[ARM][ConstantIslands] Fix stack mis-alignment caused by undoLRSpillRestore. Summary: It is not safe for ARMConstantIslands to undoLRSpillRestore. PrologEpilogInserter is the one to ensure stack alignment, taking into consideration LR is spilled or not. For noreturn function with StackAlignment 8 (function contains call/alloc), undoLRSpillRestore cause stack be mis-aligned. Fixing stack alignment in ARMConstantIslands doesn't give us much benefit, as undo LR spill/restore only occur in large function with near branches only, also doesn't have callee-saved LR spill. Reviewers: t.p.northover, rengolin, efriedma, apazos, samparker, ostannard Reviewed By: ostannard Subscribers: dmgreen, ostannard, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75288	2020-03-02 16:28:57 -08:00
Greg Clayton	8d41f1a023	Fix GSYM tests to run the yaml files and fix test failures on some machines. YAML files were not being run during lit testing as there was no lit.local.cfg file. Once this was fixed, some buildbots would fail due to a StringRef that pointed to a std::string inside of a temporary llvm::Triple object. These issues are fixed here by making a local triple object that stays around long enough so the StringRef points to valid data. Also fixed an issue where strings for files in the file table could be added in opposite order due to parameters to function calls not having a strong ordering, which caused tests to fail. Added new arch specific directories so when targets are not enabled, we continue to function just fine. Differential Revision: https://reviews.llvm.org/D75390	2020-03-02 15:40:11 -08:00
Philip Reames	7049cf6496	[BranchAlign] Fix bug w/nop padding for SS manipulation X86 has several instructions which are documented as enabling interrupts exactly one instruction after the one which changes the SS segment register. Inserting a nop between these two instructions allows an interrupt to arrive before the execution of the following instruction which changes semantic behaviour. The list of instructions is documented in "Table 24-3. Format of Interruptibility State" in Volume 3c of the Intel manual. They basically all come down to different ways to write to the SS register. Differential Revision: https://reviews.llvm.org/D75359	2020-03-02 14:40:25 -08:00
Sumanth Gundapaneni	9897daa6bf	Update LSR's logic that identifies a post-increment SCEV value. One of the checks has been removed as it seems invalid. The LoopStep size is almost always 32-bit. Differential Revision: https://reviews.llvm.org/D75079	2020-03-02 16:34:18 -06:00
Teresa Johnson	80bf137fa1	Revert "Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP"" This reverts commit `80d0a137a5`, and the follow on fix in `873c0d0786`. It is causing test failures after a multi-stage clang bootstrap. See discussion on D73242 and D75201.	2020-03-02 14:02:13 -08:00
Greg Clayton	e3afe5952d	Revert "Fix GSYM tests to run the yaml files and fix test failures on some machines." This reverts commit `57688350ad`. Need to conditionalize for ARM targets, this is failing on machines that don't have ARM targets.	2020-03-02 13:07:58 -08:00


69653 Commits