llvm-project

Commit Graph

Author	SHA1	Message	Date
Kevin P. Neal	1c3d19c82d	[FPEnv] Add constrained intrinsics for lrint and lround Earlier in the year intrinsics for lrint, llrint, lround and llround were added to llvm. The constrained versions are now implemented here. Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally Approved by: craig.topper Differential Revision: https://reviews.llvm.org/D64746 llvm-svn: 373900	2019-10-07 13:20:00 +00:00
Simon Pilgrim	b4ba3cbda0	[X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217) If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element. This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted. Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper. Fixes PR43217 Differential Revision: https://reviews.llvm.org/D68544 llvm-svn: 373871	2019-10-06 21:11:45 +00:00
Craig Topper	842dde6be4	[LegalizeTypes][X86] When splitting a vselect for type legalization, don't split a setcc condition if the setcc input is legal and vXi1 conditions are supported Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare. Reviewers: RKSimon, spatel, efriedma Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68359 llvm-svn: 373863	2019-10-06 18:43:03 +00:00
Sanjay Patel	f643fabb52	Revert [DAGCombine] Match more patterns for half word bswap This reverts r373850 (git commit `25ba49824d`) This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680 llvm-svn: 373853	2019-10-06 15:27:34 +00:00
Amaury Sechet	25ba49824d	[DAGCombine] Match more patterns for half word bswap Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 373850	2019-10-06 14:14:55 +00:00
Matt Arsenault	a5b9c75674	GlobalISel: Partially implement lower for G_EXTRACT Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838	2019-10-06 01:37:35 +00:00
Craig Topper	2decdf42b9	[FastISel] Copy the inline assembly dialect to the INLINEASM instruction. Fixes PR43575. llvm-svn: 373836	2019-10-05 23:21:17 +00:00
Simon Pilgrim	f609c0a303	BranchFolding - IsBetterFallthrough - assert non-null pointers. NFCI. Silences static analyzer null dereference warnings. llvm-svn: 373823	2019-10-05 13:20:30 +00:00
Philip Reames	d5a4dad206	Fix a nasty miscompile in experimental unordered atomic lowering This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block. Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well. Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant". llvm-svn: 373814	2019-10-05 00:32:10 +00:00
Reid Kleckner	67cfa79c01	Revert [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks This reverts r371177 (git commit `f879c68755`) It caused PR43566 by removing empty, address-taken MachineBasicBlocks. Such blocks may have references from blockaddress or other operands, and need more consideration to be removed. See the PR for a test case to use when relanding. llvm-svn: 373805	2019-10-04 22:24:21 +00:00
Jessica Paquette	784892c964	[MachineOutliner] Disable outlining from noreturn functions Outlining from noreturn functions doesn't do the correct thing right now. The outliner should respect that the caller is marked noreturn. In the event that we have a noreturn function, and the outlined code is in tail position, the outliner will not see that the outlined function should be tail called. As a result, you end up with a regular call containing a return. Fixing this requires that we check that all candidates live inside noreturn functions. So, for the sake of correctness, don't outline from noreturn functions right now. Add machine-outliner-noreturn.mir to test this. llvm-svn: 373791	2019-10-04 21:24:12 +00:00
Eli Friedman	23ae13d51f	[ScheduleDAG] When a node is cloned, add an edge between the nodes. InstrEmitter's virtual register handling assumes that clones are emitted after the cloned node. Make sure this assumption actually holds. Fixes a "Node emitted out of order - early" assertion on the testcase. This is probably a very rare case to actually hit in practice; even without the explicit edge, the scheduler will usually end up scheduling the nodes in the expected order due to other constraints. Differential Revision: https://reviews.llvm.org/D68068 llvm-svn: 373782	2019-10-04 19:51:40 +00:00
James Molloy	9baac83a2e	[ModuloSchedule] Do not remap terminators This is a trivial point fix. Terminator instructions aren't scheduled, so we shouldn't expect to be able to remap them. This doesn't affect Hexagon and PPC because their terminators are always hardware loop backbranches that have no register operands. llvm-svn: 373762	2019-10-04 17:15:25 +00:00
Simon Pilgrim	84f5cd75b3	Fix MSVC "not all control paths return a value" warning. NFCI. llvm-svn: 373741	2019-10-04 12:45:27 +00:00
Jeremy Morse	61800a75b7	[DebugInfo] LiveDebugValues: move DBG_VALUE creation into VarLoc class Rather than having a mixture of location-state shared between DBG_VALUEs and VarLoc objects in LiveDebugValues, this patch makes VarLoc the master record of variable locations. The refactoring means that the transfer of locations from one place to another is always a performed by an operation on an existing VarLoc, that produces another transferred VarLoc. DBG_VALUEs are only created at the end of LiveDebugValues, once all locations are known. As a plus, there is now only one method where DBG_VALUEs can be created. The test case added covers a circumstance that is now impossible to express in LiveDebugValues: if an already-indirect DBG_VALUE is spilt, previously it would have been restored-from-spill as a direct DBG_VALUE. We now don't lose this information along the way, as VarLocs always refer back to the "original" non-transfer DBG_VALUE, and we can always work out whether a location was "originally" indirect. Differential Revision: https://reviews.llvm.org/D67398 llvm-svn: 373727	2019-10-04 10:53:47 +00:00
Jeremy Morse	0ca48de26c	[DebugInfo] LiveDebugValues: defer DBG_VALUE creation during analysis When transfering variable locations from one place to another, LiveDebugValues immediately creates a DBG_VALUE representing that transfer. This causes trouble if the variable location should subsequently be invalidated by a loop back-edge, such as in the added test case: the transfer DBG_VALUE from a now-invalid location is used as proof that the variable location is correct. This is effectively a self-fulfilling prophesy. To avoid this, defer the insertion of transfer DBG_VALUEs until after analysis has completed. Some of those transfers are still sketchy, but we don't propagate them into other blocks now. Differential Revision: https://reviews.llvm.org/D67393 llvm-svn: 373720	2019-10-04 09:38:05 +00:00
Sanjay Patel	288079aafd	[DAGCombiner] add operation legality checks before creating shift ops (PR43542) As discussed on llvm-dev and: https://bugs.llvm.org/show_bug.cgi?id=43542 ...we have transforms that assume shift operations are legal and transforms to use them are profitable, but that may not hold for simple targets. In this case, the MSP430 target custom lowers shifts by repeating (many) simpler/fixed ops. That can be avoided by keeping this code as setcc/select. Differential Revision: https://reviews.llvm.org/D68397 llvm-svn: 373666	2019-10-03 21:34:04 +00:00
David Blaikie	2ac586c58f	DebugInfo: Generalize rnglist emission as a precursor to reusing it for loclist emission llvm-svn: 373663	2019-10-03 20:56:23 +00:00
James Molloy	9972c992eb	[ModuloSchedule] removeBranch() before creating the trip count condition The Hexagon code assumes there's no existing terminator when inserting its trip count condition check. This causes swp-stages5.ll to break. The generated code looks good to me, it is likely a permutation. I have disabled the new codegen path to keep everything green and will investigate along with the other 3-4 tests that have different codegen. Fixes expensive-checks build. llvm-svn: 373629	2019-10-03 17:10:32 +00:00
Guillaume Chatelet	d400d45150	[Alignment][NFC] Remove StoreInst::setAlignment(unsigned) Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, bollu, jdoerfert Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68268 llvm-svn: 373595	2019-10-03 13:17:21 +00:00
David Blaikie	11e0bcf8a2	DebugInfo: Rename DebugLocStream::Entry::Begin/EndSym to just Begin/End Brings this struct in line with the RangeSpan class so they might eventually be used by common template code for generating range/loc lists with less duplicate code. llvm-svn: 373540	2019-10-02 22:58:02 +00:00
Craig Topper	2772b970e3	[LegalizeTypes] Check for already split condition before calilng SplitVecRes_SETCC in SplitRes_SELECT. No point in manually splitting the SETCC if it was already done. llvm-svn: 373535	2019-10-02 22:34:49 +00:00
David Blaikie	b677cb8dc7	DebugInfo: Simplify RangeSpan to be a plain struct This is an effort to make RangeSpan and DebugLocStream::Entry more similar to share code for their emission (to reuse the more complicated code for using (& choosing when to use) base address selection entries, etc). It didn't seem like this struct was worth the complexity of encapsulation - when the members could be initialized by the ctor to any value (no validation) and the type is assignable (so there's no mutability or other constraint being implemented by its interface). llvm-svn: 373533	2019-10-02 22:27:24 +00:00
Simon Pilgrim	49c2390877	[CodeGen] Remove unused MachineMemOperand::print wrappers (PR41772) As noted on PR41772, the static analyzer reports that the MachineMemOperand::print partial wrappers set a number of args to null pointers that were then dereferenced in the actual implementation. It turns out that these wrappers are not being used at all (hence why we're not seeing any crashes), so I'd like to propose we just get rid of them. Differential Revision: https://reviews.llvm.org/D68208 llvm-svn: 373484	2019-10-02 16:20:28 +00:00
Hans Wennborg	9330005a54	Reapply r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)" This was reverted in r373454 due to breaking the expensive-checks bot. This version addresses that by omitting the addSuccessorWithProb() call when omitting the range check. > Switch lowering: omit range check for bit tests when default is unreachable (PR43129) > > This is modeled after the same functionality for jump tables, which was > added in r357067. > > Differential revision: https://reviews.llvm.org/D68131 llvm-svn: 373477	2019-10-02 14:35:06 +00:00
Simon Pilgrim	369d16a1c6	AsmPrinter - emitGlobalConstantFP - silence static analyzer null dereference warning. NFCI. All the calls to emitGlobalConstantFP should provide a nonnull Type for the float. llvm-svn: 373464	2019-10-02 13:08:46 +00:00
James Molloy	9026518e73	[ModuloSchedule] Peel out prologs and epilogs, generate actual code Summary: This extends the PeelingModuloScheduleExpander to generate prolog and epilog code, and correctly stitch uses through the prolog, kernel, epilog DAG. The key concept in this patch is to ensure that all transforms are local; only a function of a block and its immediate predecessor and successor. By defining the problem in this way we can inductively rewrite the entire DAG using only local knowledge that is easy to reason about. For example, we assume that all prologs and epilogs are near-perfect clones of the steady-state kernel. This means that if a block has an instruction that is predicated out, we can redirect all users of that instruction to that equivalent instruction in our immediate predecessor. As all blocks are clones, every instruction must have an equivalent in every other block. Similarly we can make the assumption by construction that if a value defined in a block is used outside that block, the only possible user is its immediate successors. We maintain this even for values that are used outside the loop by creating a limited form of LCSSA. This code isn't small, but it isn't complex. Enabled a bunch of testing from Hexagon. There are a couple of tests not enabled yet; I'm about 80% sure there isn't buggy codegen but the tests are checking for patterns that we don't produce. Those still need a bit more investigation. In the meantime we (Google) are happy with the code produced by this on our downstream SMS implementation, and believe it generates correct code. Subscribers: mgorny, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68205 llvm-svn: 373462	2019-10-02 12:46:44 +00:00
Hans Wennborg	372aece777	Revert r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)" This broke http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/19967 > Switch lowering: omit range check for bit tests when default is unreachable (PR43129) > > This is modeled after the same functionality for jump tables, which was > added in r357067. > > Differential revision: https://reviews.llvm.org/D68131 llvm-svn: 373454	2019-10-02 12:08:44 +00:00
Simon Pilgrim	c9129cea27	WinException::emitExceptHandlerTable - silence static analyzer dyn_cast<Function> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<Function> directly and if not assert will fire for us. llvm-svn: 373449	2019-10-02 11:48:32 +00:00
Hans Wennborg	cbefc36fcc	Switch lowering: omit range check for bit tests when default is unreachable (PR43129) This is modeled after the same functionality for jump tables, which was added in r357067. Differential revision: https://reviews.llvm.org/D68131 llvm-svn: 373431	2019-10-02 08:32:15 +00:00
David Blaikie	bfc68885d9	DebugInfo: Update support for detecting C++ language variants in debug info emission llvm-svn: 373420	2019-10-02 01:39:48 +00:00
Jakub Kuderski	856c1cd852	[Dominators][CodeGen] Don't mark MachineDominatorTree as preserved in MachineLICM llvm-svn: 373378	2019-10-01 18:27:44 +00:00
Jakub Kuderski	5be08ee902	[Dominators][CodeGen] Fix MachineDominatorTree preservation in PHIElimination Summary: PHIElimination modifies CFG and marks MachineDominatorTree as preserved. Therefore, it the CFG changes it should also update the MDT, when available. This patch teaches PHIElimination to recalculate MDT when necessary. This fixes the `tailmerging_in_mbp.ll` test failure discovered after switching to generic DomTree verification algorithm in MachineDominators in D67976. Reviewers: arsenm, hliao, alex-t, rampitec, vpykhtin, grosser Reviewed By: rampitec Subscribers: MatzeB, wdng, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68154 llvm-svn: 373377	2019-10-01 18:27:17 +00:00
Jakub Kuderski	925c285f43	Reapply [Dominators][CodeGen] Clean up MachineDominators This reverts r373117 (git commit `159ef37735`) Phabricator review: https://reviews.llvm.org/D67976. llvm-svn: 373376	2019-10-01 18:27:14 +00:00
Jay Foad	e536800022	[AMDGPU] Add VerifyScheduling support. Summary: This is cut and pasted from the corresponding GenericScheduler functions. Reviewers: arsenm, atrick, tstellar, vpykhtin Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68264 llvm-svn: 373346	2019-10-01 15:45:47 +00:00
Simon Pilgrim	3c912c4abe	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.). Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 373343	2019-10-01 15:32:04 +00:00
Jakub Kuderski	56b52a207f	[Dominators][CodeGen] Add MachinePostDominatorTree verification Summary: This patch implements Machine PostDominator Tree verification and ensures that the verification doesn't fail the in-tree tests. MPDT verification can be enabled using `verify-machine-dom-info` -- the same flag used by Machine Dominator Tree verification. Flipping the flag revealed that MachineSink falsely claimed to preserve CFG and MDT/MPDT. This patch fixes that. Reviewers: arsenm, hliao, rampitec, vpykhtin, grosser Reviewed By: hliao Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68235 llvm-svn: 373341	2019-10-01 15:23:27 +00:00
Dmitri Gribenko	827a7fab78	Revert "GlobalISel: Handle llvm.read_register" This reverts commit r373294. It broke Clang's CodeGen/arm64-microsoft-status-reg.cpp: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18483 llvm-svn: 373310	2019-10-01 08:24:01 +00:00
Matt Arsenault	bdcc6d3d26	GlobalISel: Handle llvm.read_register SelectionDAG has a bunch of machinery to defer this to selection time for some reason. Just directly emit a copy during IRTranslator. The x86 usage does somewhat questionably check hasFP, which could depend on the whole function being at minimum translated. This does lose the convergent bit if the callsite had it, which may be a problem. We also lose that in general for intrinsics, which may also be a problem. llvm-svn: 373294	2019-10-01 02:07:16 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Matt Arsenault	ed85b0cee6	GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287	2019-10-01 01:06:48 +00:00
David Blaikie	38456776b3	DebugInfo: Simplify section label caching/usage llvm-svn: 373273	2019-09-30 23:19:10 +00:00
Amaury Sechet	e6f98c0073	[DAGCombiner] Clang format MatchRotate. NFC llvm-svn: 373269	2019-09-30 21:41:52 +00:00
Daniel Sanders	cbe13a1461	[globalisel][knownbits] Allow targets to call GISelKnownBits::computeKnownBitsImpl() Summary: It seems we missed that the target hook can't query the known-bits for the inputs to a target instruction. Fix that oversight Reviewers: aditya_nandakumar Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67380 llvm-svn: 373264	2019-09-30 20:55:53 +00:00
Amaury Sechet	496c0564f1	[DAGCombiner] Update MatchRotate so that it returns an SDValue. NFC llvm-svn: 373260	2019-09-30 20:47:23 +00:00
Yuanfang Chen	cc382cf727	[NewPM] Port MachineModuleInfo to the new pass manager. Existing clients are converted to use MachineModuleInfoWrapperPass. The new interface is for defining a new pass manager API in CodeGen. Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm Reviewed By: arsenm, fedor.sergeev Differential Revision: https://reviews.llvm.org/D64183 llvm-svn: 373240	2019-09-30 17:54:50 +00:00
Jessica Paquette	b1c1095fdc	[AArch64][GlobalISel] Support lowering variadic musttail calls This adds support for lowering variadic musttail calls. To do this, we have to... - Detect a musttail call in a variadic function before attempting to lower the call's formal arguments. This is done in the IRTranslator. - Compute forwarded registers in `lowerFormalArguments`, and add copies for those registers. - Restore the forwarded registers in `lowerTailCall`. Because there doesn't seem to be any nice way to wrap these up into the outgoing argument handler, the restore code in `lowerTailCall` is done separately. Also, irritatingly, you have to make sure that the registers don't overlap with any passed parameters. Otherwise, the scheduler doesn't know what to do with the extra copies and asserts. Add call-translator-variadic-musttail.ll to test this. This is pretty much the same as the X86 musttail-varargs.ll test. We didn't have as nice of a test to base this off of, but the idea is the same. Differential Revision: https://reviews.llvm.org/D68043 llvm-svn: 373226	2019-09-30 16:49:13 +00:00
Paul Robinson	ed1f3f36ae	[SSP] [3/3] cmpxchg and addrspacecast instructions can now trigger stack protectors. Fixes PR42238. Add test coverage for llvm.memset, as proxy for all llvm.mem* intrinsics. There are two issues here: (1) they could be lowered to a libc call, which could be intercepted, and do Bad Stuff; (2) with a non-constant size, they could overwrite the current stack frame. The test was mostly written by Matt Arsenault in r363169, which was later reverted; I tweaked what he had and added the llvm.memset part. Differential Revision: https://reviews.llvm.org/D67845 llvm-svn: 373220	2019-09-30 15:11:23 +00:00
Paul Robinson	527815f5b0	[SSP] [2/3] Refactor an if/dyn_cast chain to switch on opcode. NFC Differential Revision: https://reviews.llvm.org/D67844 llvm-svn: 373219	2019-09-30 15:08:38 +00:00
Paul Robinson	14945186c2	[SSP] [1/3] Revert "StackProtector: Use PointerMayBeCaptured" "Captured" and "relevant to Stack Protector" are not the same thing. This reverts commit `f29366b1f5`. aka r363169. Differential Revision: https://reviews.llvm.org/D67842 llvm-svn: 373216	2019-09-30 15:01:35 +00:00
Tamas Berghammer	421a186fb4	Support MemoryLocation::UnknownSize in TargetLowering::IntrinsicInfo Summary: Previously IntrinsicInfo::size was an unsigned what can't represent the 64 bit value used by MemoryLocation::UnknownSize. Reviewers: jmolloy Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68219 llvm-svn: 373214	2019-09-30 14:44:24 +00:00
Guillaume Chatelet	ab11b9188d	[Alignment][NFC] Remove AllocaInst::setAlignment(unsigned) Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68141 llvm-svn: 373207	2019-09-30 13:34:44 +00:00
Guillaume Chatelet	17380227e8	[Alignment][NFC] Remove LoadInst::setAlignment(unsigned) Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D68142 llvm-svn: 373195	2019-09-30 09:37:05 +00:00
Hans Wennborg	dc7dbb1a88	NFC changes to SelectionDAGBuilder::visitBitTestHeader(), preparing for PR43129 llvm-svn: 373191	2019-09-30 08:47:53 +00:00
Roger Ferrer Ibanez	5a2a14db0b	[TargetLowering] Simplify expansion of S{ADD,SUB}O ISD::SADDO uses the suggested sequence described in the section §2.4 of the RISCV Spec v2.2. ISD::SSUBO uses the dual approach but checking for (non-zero) positive. Differential Revision: https://reviews.llvm.org/D47927 llvm-svn: 373187	2019-09-30 07:58:50 +00:00
Amara Emerson	509a4947c9	Add an operand to memory intrinsics to denote the "tail" marker. We need to propagate this information from the IR in order to be able to safely do tail call optimizations on the intrinsics during legalization. Assuming it's safe to do tail call opt without checking for the marker isn't safe because the mem libcall may use allocas from the caller. This adds an extra immediate operand to the end of the intrinsics and fixes the legalizer to handle it. Differential Revision: https://reviews.llvm.org/D68151 llvm-svn: 373140	2019-09-28 05:33:21 +00:00
Jakub Kuderski	159ef37735	Revert [Dominators][CodeGen] Clean up MachineDominators This reverts r373101 (git commit `72c57ec3e6`) llvm-svn: 373117	2019-09-27 19:33:39 +00:00
Jakub Kuderski	72c57ec3e6	[Dominators][CodeGen] Clean up MachineDominators Summary: This is a cleanup patch for MachineDominatorTree. It would be an NFC, except for replacing custom DomTree verification with the generic one. Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao Reviewed By: arsenm Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67976 llvm-svn: 373101	2019-09-27 17:25:39 +00:00
Djordje Todorovic	eb4c98ca3d	[DebugInfo] Exclude memory location values as parameter entry values Abandon describing of loaded values due to safety concerns. Loaded values are described as derefed memory location at caller point. At callee we can unintentionally change that memory location which would lead to different entry being printed value before and after the memory location clobbering. This problem is described in llvm.org/PR43343. Patch by Nikola Prica Differential Revision: https://reviews.llvm.org/D67717 llvm-svn: 373089	2019-09-27 13:52:43 +00:00
Jesper Antonsson	39b81f1cbc	[CodeGenPrepare] Mend "avoid crashing from replacing a phi twice" fix. Summary: An erroneously negated if-statement by an earlier (March 2019) bugfix left phi replacement/simplification under optimizeMemoryInst() in CodeGenPrepare largely inactivated. The error was found when csmith found that the same assert as in the original bug report could still be triggered in a different way. This patch fixes the bugfix. The original bug was: https://bugs.llvm.org/show_bug.cgi?id=41052 ... and the previous fix was D59358. Reviewers: aprantl, skatkov Reviewed By: skatkov Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67838 llvm-svn: 373084	2019-09-27 13:01:37 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Hans Wennborg	3740ae3b8a	Revert r372893 "[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets" This caused severe compile-time regressions, see PR43455. > Modern processors predict the targets of an indirect branch regardless of > the size of any jump table used to glean its target address. Moreover, > branch predictors typically use resources limited by the number of actual > targets that occur at run time. > > This patch changes the semantics of the option `-max-jump-table-size` to limit > the number of different targets instead of the number of entries in a jump > table. Thus, it is now renamed to `-max-jump-table-targets`. > > Before, when `-max-jump-table-size` was specified, it could happen that > cluster jump tables could have targets used repeatedly, but each one was > counted and typically resulted in tables with the same number of entries. > With this patch, when specifying `-max-jump-table-targets`, tables may have > different lengths, since the number of unique targets is counted towards the > limit, but the number of unique targets in tables is the same, but for the > last one containing the balance of targets. > > Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 373060	2019-09-27 09:54:26 +00:00
Changpeng Fang	f5524f0451	Remove the AliasAnalysis argument in function areMemAccessesTriviallyDisjoint Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D58360 llvm-svn: 373024	2019-09-26 22:53:44 +00:00
Xiangling Liao	3b808fb330	[AIX]Emit function descriptor csect in assembly This patch emits the function descriptor csect for functions with definitions under both 32-bit/64-bit mode on AIX. Differential Revision: https://reviews.llvm.org/D66724 llvm-svn: 373009	2019-09-26 19:38:32 +00:00
Mikael Holmen	957e090ac9	[IfConversion] Disallow TBB == FBB for valid triangles Summary: Previously the case EBB \| \_ \| \| \| TBB \| / FBB was treated as a valid triangle also when TBB and FBB was the same basic block. This could then lead to an invalid CFG when we removed the edge from EBB to TBB, since that meant we would also remove the edge from EBB to FBB. Since TBB == FBB is quite a degenerated case of a triangle, we now don't treat it as a valid triangle anymore, and thus we will avoid the trouble with updating the CFG. Reviewers: efriedma, dmgreen, kparzysz Reviewed By: efriedma Subscribers: bjope, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67832 llvm-svn: 372943	2019-09-26 06:35:55 +00:00
Thomas Raoux	3c8c667235	[TargetLowering] Make allowsMemoryAccess methode virtual. Rename old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess call the function related to alignment by default and can be overridden by target to inform whether the memory access is legal or not. Differential Revision: https://reviews.llvm.org/D67121 llvm-svn: 372935	2019-09-26 00:16:01 +00:00
Jessica Paquette	8535a8672e	[AArch64][GlobalISel] Choose CCAssignFns per-argument for tail call lowering When checking for tail call eligibility, we should use the correct CCAssignFn for each argument, rather than just checking if the caller/callee is varargs or not. This is important for tail call lowering with varargs. If we don't check it, then basically any varargs callee with parameters cannot be tail called on Darwin, for one thing. If the parameters are all guaranteed to be in registers, this should be entirely safe. On top of that, not checking for this could potentially make it so that we have the wrong stack offsets when checking for tail call eligibility. Also refactor some of the stuff for CCAssignFnForCall and pull it out into a helper function. Update call-translator-tail-call.ll to show that we can now correctly tail call on Darwin. Also add two extra tail call checks. The first verifies that we still respect the caller's stack size, and the second verifies that we still don't tail call when a varargs function has a memory argument. Differential Revision: https://reviews.llvm.org/D67939 llvm-svn: 372897	2019-09-25 16:45:35 +00:00
Evandro Menezes	3bd8ba156b	[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets Modern processors predict the targets of an indirect branch regardless of the size of any jump table used to glean its target address. Moreover, branch predictors typically use resources limited by the number of actual targets that occur at run time. This patch changes the semantics of the option `-max-jump-table-size` to limit the number of different targets instead of the number of entries in a jump table. Thus, it is now renamed to `-max-jump-table-targets`. Before, when `-max-jump-table-size` was specified, it could happen that cluster jump tables could have targets used repeatedly, but each one was counted and typically resulted in tables with the same number of entries. With this patch, when specifying `-max-jump-table-targets`, tables may have different lengths, since the number of unique targets is counted towards the limit, but the number of unique targets in tables is the same, but for the last one containing the balance of targets. Differential revision: https://reviews.llvm.org/D60295 llvm-svn: 372893	2019-09-25 16:10:20 +00:00
Sanjay Patel	831a7e7068	[DAGCombiner] add one-use restriction to vector transform with cheap extract We might be able to do better on the example in the test, but in general, we should not scalarize a splatted vector binop if there are other uses of the binop. Otherwise, we can end up with code as we had - a scalar op that is redundant with a vector op. llvm-svn: 372886	2019-09-25 15:08:33 +00:00
Simon Pilgrim	5f2d8b2618	[TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr& Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in and there is no reason why they would in the future. Committed on behalf of @hvdijk (Harald van Dijk) Differential Revision: https://reviews.llvm.org/D66138 llvm-svn: 372882	2019-09-25 14:55:57 +00:00
Jakub Kuderski	269bd15c68	[Dominators][AMDGPU] Don't use virtual exit node in findNearestCommonDominator. Cleanup MachinePostDominators. Summary: This patch fixes a bug that originated from passing a virtual exit block (nullptr) to `MachinePostDominatorTee::findNearestCommonDominator` and resulted in assertion failures inside its callee. It also applies a small cleanup to the class. The patch introduces a new function in PDT that given a list of `MachineBasicBlock`s finds their NCD. The new overload of `findNearestCommonDominator` handles virtual root correctly. Note that similar handling of virtual root nodes is not necessary in (forward) `DominatorTree`s, as right now they don't use virtual roots. Reviewers: tstellar, tpr, nhaehnle, arsenm, NutshellySima, grosser, hliao Reviewed By: hliao Subscribers: hliao, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #amdgpu, #llvm Differential Revision: https://reviews.llvm.org/D67974 llvm-svn: 372874	2019-09-25 14:04:36 +00:00
Sanjay Patel	2cec4b58f5	Revert [IR] allow fast-math-flags on phi of FP values This reverts r372866 (git commit `dec03223a9`) llvm-svn: 372868	2019-09-25 13:29:09 +00:00
Sanjay Patel	dec03223a9	[IR] allow fast-math-flags on phi of FP values The changes here are based on the corresponding diffs for allowing FMF on 'select': D61917 As discussed there, we want to have fast-math-flags be a property of an FP value because the alternative (having them on things like fcmp) leads to logical inconsistency such as: https://bugs.llvm.org/show_bug.cgi?id=38086 The earlier patch for select made almost no practical difference because most unoptimized conditional code begins life as a phi (based on what I see in clang). Similarly, I don't expect this patch to do much on its own either because SimplifyCFG promptly drops the flags when converting to select on a minimal example like: https://bugs.llvm.org/show_bug.cgi?id=39535 But once we have this plumbing in place, we should be able to wire up the FMF propagation and start solving cases like that. The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a regression in a LoopVectorize test. We are intersecting the FMF of any FPMathOperator there, so if a phi is not properly annotated, new math instructions may not be either. Once we fix the propagation in SimplifyCFG, it may be safe to remove that hack. Differential Revision: https://reviews.llvm.org/D67564 llvm-svn: 372866	2019-09-25 13:14:12 +00:00
Simon Pilgrim	20f4afc5a7	[DAG] Pull out minimum shift value calc into a helper function. NFCI. llvm-svn: 372856	2019-09-25 12:28:56 +00:00
Simon Pilgrim	be9beef5da	AggressiveAntiDepBreaker - silence static analyzer null dereference warning. NFCI. Assert that we've found the critical path. llvm-svn: 372759	2019-09-24 13:57:51 +00:00
Ilya Biryukov	60e5e0b667	Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756	2019-09-24 13:48:02 +00:00
Simon Pilgrim	9942c07745	[ModuloSchedule] KernelRewriter::rewrite - silence static analyzer dyn_cast<> null dereference warning. NFCI. Assert that we've found the start of the MI schedule list. llvm-svn: 372723	2019-09-24 10:58:42 +00:00
Simon Pilgrim	c81f8e4ce1	lowerObjCCall - silence static analyzer dyn_cast<CallInst> null dereference warnings. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<CallInst> directly and if not assert will fire for us. llvm-svn: 372720	2019-09-24 10:46:30 +00:00
Pavel Labath	aaff1a631a	MCRegisterInfo: Merge getLLVMRegNum and getLLVMRegNumFromEH Summary: The functions different in two ways: - getLLVMRegNum could return both "eh" and "other" dwarf register numbers, while getLLVMRegNumFromEH only returned the "eh" number. - getLLVMRegNum asserted if the register was not found, while the second function returned -1. The second distinction was pretty important, but it was very hard to infer that from the function name. Aditionally, for the use case of dumping dwarf expressions, we needed a function which can work with both kinds of number, but does not assert. This patch solves both of these issues by merging the two functions into one, returning an Optional<unsigned> value. While the same thing could be achieved by adding an "IsEH" argument to the (renamed) getLLVMRegNumFromEH function, it seemed better to avoid the confusion of two functions and put the choice of asserting into the hands of the caller -- if he checks the Optional value, he can safely process "untrusted" input, and if he blindly dereferences the Optional, he gets the assertion. I've updated all call sites to the new API, choosing between the two options according to the function they were calling originally, except that I've updated the usage in DWARFExpression.cpp to use the "safe" method instead, and added a test case which would have previously triggered an assertion failure when processing (incorrect?) dwarf expressions. Reviewers: dsanders, arsenm, JDevlieghere Subscribers: wdng, aprantl, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67154 llvm-svn: 372710	2019-09-24 09:31:02 +00:00
Amara Emerson	adec1209e6	[GlobalISel][IRTranslator] Fix switch table lowering to use signed LE not unsigned. We were miscompiling switch value comparisons with the wrong signedness, which shows up when we have things like switch case values with i1 types, which end up being legalized incorrectly. Fixes PR43383 llvm-svn: 372675	2019-09-24 00:09:23 +00:00
Sanjay Patel	7414151929	[BreakFalseDeps] ignore function with minsize attribute This came up in the x86-specific: https://bugs.llvm.org/show_bug.cgi?id=43239 ...but it is a general problem for the BreakFalseDeps pass. Dependencies may be broken by adding some other instruction, so that should be avoided if the overall goal is to minimize size. Differential Revision: https://reviews.llvm.org/D67363 llvm-svn: 372628	2019-09-23 17:01:01 +00:00
Guillaume Chatelet	1ae7905fc8	[Alignment][NFC] DataLayout migration to llvm::Align Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jholewinski, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67914 llvm-svn: 372596	2019-09-23 12:41:36 +00:00
Simon Pilgrim	f6f6c6ca3b	Localizer - fix "variable used but never read" analyzer warning. NFCI. Simplify the code by separating the modification of the Changed variable from returning it. llvm-svn: 372583	2019-09-23 11:38:10 +00:00
Simon Pilgrim	0d6684d7e5	TargetInstrInfo::getStackSlotRange - fix "variable used but never read" analyzer warning. NFCI. We don't need to divide the BitSize local variable at all. llvm-svn: 372582	2019-09-23 11:36:24 +00:00
Simon Pilgrim	0b184b8526	CriticalAntiDepBreaker - Assert that we've found the bottom of the critical path. NFCI. Silences static analyzer null dereference warnings. llvm-svn: 372577	2019-09-23 10:42:47 +00:00
Craig Topper	a533e87792	[X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend. This intrinsics should be shift by immediate, but gcc allows any i32 scalar and clang needs to match that. So we try to detect the non-constant case and move the data from an integer register to an MMX register. Previously this was done by creating a v2i32 build_vector and bitcast in SelectionDAGBuilder. This had to be done early since v2i32 isn't a legal type. The bitcast+build_vector would be DAG combined to X86ISD::MMX_MOVW2D which isel will turn into a GPR->MMX MOVD. This commit just moves the whole thing to lowering and emits the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The test changes just seem to be due to nodes being linearized in a different order. llvm-svn: 372535	2019-09-23 01:05:33 +00:00
Simon Pilgrim	c8a9ae4ce2	[SelectionDAG] computeKnownBits/ComputeNumSignBits - cleanup demanded/unknown paths. NFCI. Merge the calls, just adjust the demandedelts if we have a valid extract_subvector constant index, else demand all elts. llvm-svn: 372521	2019-09-22 18:47:12 +00:00
James Molloy	8a74eca398	[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount Recommit: fix asan errors. The way MachinePipeliner uses these target hooks is stateful - we reduce trip count by one per call to reduceLoopCount. It's a little overfit for hardware loops, where we don't have to worry about stitching a loop induction variable across prologs and epilogs (the induction variable is implicit). This patch introduces a new API: /// Analyze loop L, which must be a single-basic-block loop, and if the /// conditions can be understood enough produce a PipelinerLoopInfo object. virtual std::unique_ptr<PipelinerLoopInfo> analyzeLoopForPipelining(MachineBasicBlock LoopBB) const; The return value is expected to be an implementation of the abstract class: /// Object returned by analyzeLoopForPipelining. Allows software pipelining /// implementations to query attributes of the loop being pipelined. class PipelinerLoopInfo { public: virtual ~PipelinerLoopInfo(); /// Return true if the given instruction should not be pipelined and should /// be ignored. An example could be a loop comparison, or induction variable /// update with no users being pipelined. virtual bool shouldIgnoreForPipelining(const MachineInstr MI) const = 0; /// Create a condition to determine if the trip count of the loop is greater /// than TC. /// /// If the trip count is statically known to be greater than TC, return /// true. If the trip count is statically known to be not greater than TC, /// return false. Otherwise return nullopt and fill out Cond with the test /// condition. virtual Optional<bool> createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB, SmallVectorImpl<MachineOperand> &Cond) = 0; /// Modify the loop such that the trip count is /// OriginalTC + TripCountAdjust. virtual void adjustTripCount(int TripCountAdjust) = 0; /// Called when the loop's preheader has been modified to NewPreheader. virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0; /// Called when the loop is being removed. virtual void disposed() = 0; }; The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while allowing the target to hold its own state across all calls. This API, in particular the disjunction of creating a trip count check condition and adjusting the loop, improves the code quality in ModuloSchedule.cpp. llvm-svn: 372463	2019-09-21 08:19:41 +00:00
Amara Emerson	7ac1039957	[GlobalISel] Defer setting HasCalls on MachineFrameInfo to selection time. We currently always set the HasCalls on MFI during translation and legalization if we're handling a call or legalizing to a libcall. However, if that call is later optimized to a tail call then we don't need the flag. The flag being set to true causes frame lowering to always save and restore FP/LR, which adds unnecessary code. This change does the same thing as SelectionDAG and ports over some code that scans instructions after selection, using TargetInstrInfo to determine if target opcodes are known calls. Code size geomean improvements on CTMark: -O0 : 0.1% -Os : 0.3% Differential Revision: https://reviews.llvm.org/D67868 llvm-svn: 372443	2019-09-20 23:52:07 +00:00
Mitch Phillips	72a3d8597d	Revert "[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount" This commit broke the ASan buildbot. See comments in rL372376 for more information. This reverts commit `15e27b0b6d`. llvm-svn: 372425	2019-09-20 20:25:16 +00:00
Craig Topper	1b7b4b467f	[SelectionDAG][Mips][Sparc] Don't allow SimplifyDemandedBits to constant fold TargetConstant nodes to a Constant. Summary: After the switch in SimplifyDemandedBits, it tries to create a constant when possible. If the original node is a TargetConstant the default in the switch will call computeKnownBits on the TargetConstant which will succeed. This results in the TargetConstant becoming a Constant. But TargetConstant exists to avoid being changed. I've fixed the two cases that relied on this in tree by explicitly making the nodes constant instead of target constant. The Sparc case is an old bug. The Mips case was recently introduced now that ImmArg on intrinsics gets turned into a TargetConstant when the SelectionDAG is created. I've removed the ImmArg since it lowers to generic code. Reviewers: arsenm, RKSimon, spatel Subscribers: jyknight, sdardis, wdng, arichardson, hiraditya, fedor.sergeev, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67802 llvm-svn: 372409	2019-09-20 16:49:51 +00:00
Krzysztof Parzyszek	2b5d7e93dd	[MVT] Add v256i1 to MachineValueType This type can show up when lowering some HVX vector code on Hexagon. llvm-svn: 372403	2019-09-20 15:19:20 +00:00
David Stenberg	b71d8d465a	Add a missing space in a MIR parser error message llvm-svn: 372398	2019-09-20 14:41:41 +00:00
David Tellenbach	2a47c77e72	[FastISel] Fix insertion of unconditional branches during FastISel The insertion of an unconditional branch during FastISel can differ depending on building with or without debug information. This happens because FastISel::fastEmitBranch emits an unconditional branch depending on the size of the current basic block without distinguishing between debug and non-debug instructions. This patch fixes this issue by ignoring debug instructions when getting the size of the basic block. Reviewers: aprantl Reviewed By: aprantl Subscribers: ormris, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67703 llvm-svn: 372389	2019-09-20 13:22:59 +00:00
James Molloy	15e27b0b6d	[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount The way MachinePipeliner uses these target hooks is stateful - we reduce trip count by one per call to reduceLoopCount. It's a little overfit for hardware loops, where we don't have to worry about stitching a loop induction variable across prologs and epilogs (the induction variable is implicit). This patch introduces a new API: /// Analyze loop L, which must be a single-basic-block loop, and if the /// conditions can be understood enough produce a PipelinerLoopInfo object. virtual std::unique_ptr<PipelinerLoopInfo> analyzeLoopForPipelining(MachineBasicBlock LoopBB) const; The return value is expected to be an implementation of the abstract class: /// Object returned by analyzeLoopForPipelining. Allows software pipelining /// implementations to query attributes of the loop being pipelined. class PipelinerLoopInfo { public: virtual ~PipelinerLoopInfo(); /// Return true if the given instruction should not be pipelined and should /// be ignored. An example could be a loop comparison, or induction variable /// update with no users being pipelined. virtual bool shouldIgnoreForPipelining(const MachineInstr MI) const = 0; /// Create a condition to determine if the trip count of the loop is greater /// than TC. /// /// If the trip count is statically known to be greater than TC, return /// true. If the trip count is statically known to be not greater than TC, /// return false. Otherwise return nullopt and fill out Cond with the test /// condition. virtual Optional<bool> createTripCountGreaterCondition(int TC, MachineBasicBlock &MBB, SmallVectorImpl<MachineOperand> &Cond) = 0; /// Modify the loop such that the trip count is /// OriginalTC + TripCountAdjust. virtual void adjustTripCount(int TripCountAdjust) = 0; /// Called when the loop's preheader has been modified to NewPreheader. virtual void setPreheader(MachineBasicBlock *NewPreheader) = 0; /// Called when the loop is being removed. virtual void disposed() = 0; }; The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while allowing the target to hold its own state across all calls. This API, in particular the disjunction of creating a trip count check condition and adjusting the loop, improves the code quality in ModuloSchedule.cpp. llvm-svn: 372376	2019-09-20 08:57:46 +00:00
Matt Arsenault	dd74f4839b	MachineScheduler: Fix missing dependency with multiple subreg defs If an instruction had multiple subregister defs, and one of them was undef, this would improperly conclude all other lanes are killed. There could still be other defs of those read-undef lanes in other operands. This would improperly remove register uses from CurrentVRegUses, so the visitation of later operands would not find the necessary register dependency. This would also mean this would fail or not depending on how different subregister def operands were ordered. On an undef subregister def, scan the instruction for other subregister defs and avoid killing those. This possibly should be deferring removing anything from CurrentVRegUses until the entire instruction has been processed instead. llvm-svn: 372362	2019-09-20 00:09:15 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Simon Pilgrim	af6043557d	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations has to duplicate some checks (recursion depth etc.). I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negatation propagation). We can build on this in future patches. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 372333	2019-09-19 15:02:47 +00:00
Amaury Sechet	9e94ef42ba	[DAGCombiner] Add node to the worklist in topological order in scalarizeExtractedVectorLoad Summary: As per title. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66661 llvm-svn: 372327	2019-09-19 14:22:11 +00:00
Simon Pilgrim	c65dd89804	[DAG] Add SelectionDAG::MaxRecursionDepth constant As commented on D67557 we have a lot of uses of depth checks all using magic numbers. This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly. Differential Revision: https://reviews.llvm.org/D67711 llvm-svn: 372315	2019-09-19 12:58:43 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
Matt Arsenault	c189f023ac	MachineScheduler: Fix assert from not checking subregs The assert would fail if there was a dead def of a subregister if there was a previous use of a different subregister. llvm-svn: 372287	2019-09-19 02:14:12 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Adrian Prantl	0779dffbd4	Remove the obsolete BlockByRefStruct flag from LLVM IR DIFlagBlockByRefStruct is an unused DIFlag that originally was used by clang to express (Objective-)C block captures in debug info. For the last year Clang has been emitting complex DIExpressions to describe block captures instead, which makes all the code supporting this flag redundant. This patch removes the flag and all supporting "dead" code, so we can reuse the bit for something else in the future. Since this only affects debug info generated by Clang with the block extension this mostly affects Apple platforms and I don't have any bitcode compatibility concerns for removing this. The Verifier will reject debug info that uses the bit and thus degrade gracefully when LTO'ing older bitcode with a newer compiler. rdar://problem/44304813 Differential Revision: https://reviews.llvm.org/D67453 llvm-svn: 372272	2019-09-18 22:38:56 +00:00
Roman Lebedev	c00f318224	[DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold Summary: `DAGCombiner::visitADDLikeCommutative()` already has a sibling fold: `(add X, Carry) -> (addcarry X, 0, Carry)` This fold, as suggested by @efriedma, helps recover from //some// of the regressions of D62266 Reviewers: efriedma, deadalnix Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D62392 llvm-svn: 372259	2019-09-18 20:48:27 +00:00
Guillaume Chatelet	97a18dc704	[Alignment][NFC] Align(1) to Align::None() conversions Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67715 llvm-svn: 372234	2019-09-18 16:19:40 +00:00
Guillaume Chatelet	d4c4671aa7	[Alignment][NFC] Remove LogAlignment functions Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67620 llvm-svn: 372231	2019-09-18 15:49:49 +00:00
Guillaume Chatelet	35b4b403b4	[Alignment][NFC] Use Align::None instead of 1 Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67704 llvm-svn: 372230	2019-09-18 15:40:20 +00:00
Krasimir Georgiev	2f1bba7fd0	Revert "[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize" Summary: This reverts commit r372204. This change causes build bot failures under msan: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/35236/steps/check-llvm%20msan/logs/stdio: ``` FAIL: LLVM :: DebugInfo/AArch64/asan-stack-vars.mir (19531 of 33579) ****************** TEST 'LLVM :: DebugInfo/AArch64/asan-stack-vars.mir' FAILED **************** Script: -- : 'RUN: at line 1'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc -O0 -start-before=livedebugvalues -filetype=obj -o - /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llvm-dwarfdump -v - \| /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir -- Exit Code: 2 Command Output (stderr): -- ==62894==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0xdfcafb in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 #1 0xdfae8a in resolveFrameIndexReference /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1580:10 #2 0xdfae8a in llvm::AArch64FrameLowering::getFrameIndexReference(llvm::MachineFunction const&, int, unsigned int&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1536 #3 0x46642c1 in (anonymous namespace)::LiveDebugValues::extractSpillBaseRegAndOffset(llvm::MachineInstr const&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:582:21 #4 0x4647cb3 in transferSpillOrRestoreInst /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:883:11 #5 0x4647cb3 in process /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1079 #6 0x4647cb3 in (anonymous namespace)::LiveDebugValues::ExtendRanges(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1361 #7 0x463ac0e in (anonymous namespace)::LiveDebugValues::runOnMachineFunction(llvm::MachineFunction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp:1415:18 #8 0x4854ef0 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73:13 #9 0x53b0b01 in llvm::FPPassManager::runOnFunction(llvm::Function&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1648:27 #10 0x53b15f6 in llvm::FPPassManager::runOnModule(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1685:16 #11 0x53b298d in runOnModule /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1750:27 #12 0x53b298d in llvm::legacy::PassManagerImpl::run(llvm::Module&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1863 #13 0x905f21 in compileModule(char, llvm::LLVMContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:601:8 #14 0x8fdc4e in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/tools/llc/llc.cpp:355:22 #15 0x7f67673632e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #16 0x882369 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/llc+0x882369) MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1658:3 in llvm::AArch64FrameLowering::resolveFrameOffsetReference(llvm::MachineFunction const&, int, bool, unsigned int&, bool, bool) const Exiting error: -: The file was not recognized as a valid object file FileCheck error: '-' is empty. FileCheck command line: /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/DebugInfo/AArch64/asan-stack-vars.mir ``` Reviewers: bkramer Reviewed By: bkramer Subscribers: sdardis, aprantl, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67710 llvm-svn: 372228	2019-09-18 14:42:09 +00:00
Sander de Smalen	dc2a7f5b39	[AArch64][DebugInfo] Do not recompute CalleeSavedStackSize This patch fixes a bug exposed by D65653 where a subsequent invocation of `determineCalleeSaves` ends up with a different size for the callee save area, leading to different frame-offsets in debug information. In the invocation by PEI, `determineCalleeSaves` tries to determine whether it needs to spill an extra callee-saved register to get an emergency spill slot. To do this, it calls 'estimateStackSize' and manually adds the size of the callee-saves to this. PEI then allocates the spill objects for the callee saves and the remaining frame layout is calculated accordingly. A second invocation in LiveDebugValues causes estimateStackSize to return the size of the stack frame including the callee-saves. Given that the size of the callee-saves is added to this, these callee-saves are counted twice, which leads `determineCalleeSaves` to believe the stack has become big enough to require spilling an extra callee-save as emergency spillslot. It then updates CalleeSavedStackSize with a larger value. Since CalleeSavedStackSize is used in the calculation of the frame offset in getFrameIndexReference, this leads to incorrect offsets for variables/locals when this information is recalculated after PEI. Reviewers: omjavaid, eli.friedman, thegameg, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D66935 llvm-svn: 372204	2019-09-18 09:02:44 +00:00
Francis Visoiu Mistrih	ba2e752c52	[Remarks] Allow the RemarkStreamer to be used directly with a stream The filename in the RemarkStreamer should be optional to allow clients to stream remarks to memory or to existing streams. This introduces a new overload of `setupOptimizationRemarks`, and avoids enforcing the presence of a filename at different places. llvm-svn: 372195	2019-09-18 01:04:45 +00:00
Craig Topper	b5ffbd0b14	[SimplifyDemandedBits] Use APInt::intersects to instead of ANDing and comparing to 0 separately. NFC llvm-svn: 372158	2019-09-17 18:19:02 +00:00
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Alexander Timofeev	6524a7a2b9	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed Defferential Revision: https://reviews.llvm.org/D67101 Reviewers: rampitec, vpykhtin llvm-svn: 372086	2019-09-17 09:08:58 +00:00
Amara Emerson	9d64721ca5	[GlobalISel] Partially revert r371901. r371901 was overeager and widenScalarDst() and the like in the legalizer attempt to increment the insert point given in order to add new instructions after the currently legalizing inst. In cases where the insertion point is not exactly the current instruction, then callers need to de-compensate for the behaviour by decrementing the insertion iterator before calling them. It's not a nice state of affairs, for now just undo the problematic parts of the change. llvm-svn: 372050	2019-09-16 23:46:03 +00:00
Simon Pilgrim	a8a4953fdf	[GlobalISel] findGISelOptimalMemOpLowering - remove dead initalization. NFCI. Fixes static analyzer warning that "Value stored to 'NewTySize' during its initialization is never read". llvm-svn: 371937	2019-09-15 16:56:06 +00:00
Simon Pilgrim	2b4ace3f29	InterleavedLoadCombine - merge isa<> and dyn_cast<> duplicates. NFCI. Silence static analyzer null dereference warning of *dyn_cast<BinaryOperator> by merging with the isa<BinaryOperator> above. llvm-svn: 371935	2019-09-15 16:20:12 +00:00
Simon Pilgrim	b743e94cdc	[TargetLowering] SimplifyDemandedBits - add EXTRACT_SUBVECTOR support. Call SimplifyDemandedBits on the source vector. llvm-svn: 371923	2019-09-14 16:38:26 +00:00
Mingjie Xing	4b191770f4	[ScheduleDAGMILive] Fix typo in comment. Differential Revision: https://reviews.llvm.org/D67478 llvm-svn: 371916	2019-09-14 03:27:38 +00:00
Amara Emerson	02bcc86b08	[GlobalISel] Fix insertion point of new instructions to be after PHIs. For some reason we sometimes insert new instructions one instruction before the first non-PHI when legalizing. This can result in having non-PHI instructions before PHIs, which mean that PHI elimination doesn't catch them. Differential Revision: https://reviews.llvm.org/D67570 llvm-svn: 371901	2019-09-13 21:49:24 +00:00
Jessica Paquette	727328ab63	[AArch64][GlobalISel] Tail call memory intrinsics Because memory intrinsics are handled differently than other calls, we need to check them for tail call eligiblity in the legalizer. This allows us to still inline them when it's beneficial to do so, but also tail call when possible. This adds simple tail calling support for when the intrinsic is followed by a return. It ports the attribute checks from `TargetLowering::isInTailCallPosition` into a similarly-named function in LegalizerHelper.cpp. The target-specific `isUsedByReturnOnly` hook is not ported here. Update tailcall-mem-intrinsics.ll to show that GlobalISel can now tail call memory intrinsics. Update legalize-memcpy-et-al.mir to have a case where we don't tail call. Differential Revision: https://reviews.llvm.org/D67566 llvm-svn: 371893	2019-09-13 20:25:58 +00:00
Alexander Timofeev	9ff70132bf	Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. llvm-svn: 371873	2019-09-13 17:37:30 +00:00
Craig Topper	4d1df2aa23	[TargetRegisterInfo] Remove SVT argument from getCommonSubClass. This was added to support fp128 on x86-64, but appears to be unneeded now. This may be because the FR128 register class added back then was merged with the VR128 register class later. llvm-svn: 371815	2019-09-13 05:24:37 +00:00
Tim Shen	a31c521f5e	Temporarily revert r371640 "LiveIntervals: Split live intervals on multiple dead defs". It reveals a miscompile on Hexagon. See PR43302 for details. llvm-svn: 371802	2019-09-13 01:34:25 +00:00
Matt Arsenault	4d33918034	AMDGPU/GlobalISel: Legalize G_FMAD Unlike SelectionDAG, treat this as a normally legalizable operation. In SelectionDAG this is supposed to only ever formed if it's legal, but I've found that to be restricting. For AMDGPU this is contextually legal depending on whether denormal flushing is allowed in the use function. Technically we currently treat the denormal mode as a subtarget feature, so custom lowering could be avoided. However I consider this to be a defect, and this should be contextually dependent on the controllable rounding mode of the parent function. llvm-svn: 371800	2019-09-13 00:44:35 +00:00
Matt Arsenault	b85c8c4bbd	LiveIntervals: Remove assertion This testcase is invalid, and caught by the verifier. For the verifier to catch it, the live interval computation needs to complete. Remove the assert so the verifier catches this, which is less confusing. In this testcase there is an undefined use of a subregister, and lanes which aren't used or defined. An equivalent testcase with the super-register shrunk to have no untouched lanes already hit this verifier error. llvm-svn: 371792	2019-09-12 23:46:51 +00:00
Philip Reames	079e210463	[SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context. Differential Revision: https://reviews.llvm.org/D66318 llvm-svn: 371786	2019-09-12 22:49:17 +00:00
Jessica Paquette	a42070a6aa	[AArch64][GlobalISel] Support sibling calls with outgoing arguments This adds support for lowering sibling calls with outgoing arguments. e.g ``` define void @foo(i32 %a) ``` Support is ported from AArch64ISelLowering's `isEligibleForTailCallOptimization`. The only thing that is missing is a full port of `TargetLowering::parametersInCSRMatch`. So, if we're using swiftself, we'll never tail call. - Rename `analyzeCallResult` to `analyzeArgInfo`, since the function is now used for both outgoing and incoming arguments - Teach `OutgoingArgHandler` about tail calls. Tail calls use frame indices for stack arguments. - Teach `lowerFormalArguments` to set the bytes in the caller's stack argument area. This is used later to check if the tail call's parameters will fit on the caller's stack. - Add `areCalleeOutgoingArgsTailCallable` to perform the eligibility check on the callee's outgoing arguments. For testing: - Update call-translator-tail-call to verify that we can now tail call with outgoing arguments, use G_FRAME_INDEX for stack arguments, and respect the size of the caller's stack - Remove GISel-specific check lines from speculation-hardening.ll, since GISel now tail calls like the other selectors - Add a GISel test line to tailcall-string-rvo.ll since we can tail call in that test now - Add a GISel test line to tailcall_misched_graph.ll since we tail call there now. Add specific check lines for GISel, since the debug output from the machine-scheduler differs with GlobalISel. The dependency still holds, but the output comes out in a different order. Differential Revision: https://reviews.llvm.org/D67471 llvm-svn: 371780	2019-09-12 22:10:36 +00:00
Craig Topper	efe6724b9f	[DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares. The X86 decision assumes the compare will produce a result in an XMM register, but that can't happen for an fp128 compare since those go to a libcall the returns an i32. Pass the VT so X86 can check the type. llvm-svn: 371775	2019-09-12 21:30:18 +00:00
Craig Topper	344c398e2a	[SelectionDAGBuilder] Simplify loop in visitSelect back to how it was before r255558. This code was changed to accomodate fp128 being softened to itself during type legalization on x86-64. This was done in order to create libcalls while having fp128 as a legal type. We're now doing the libcall creation during LegalizeDAG and the type legalization changes to enable the old behavior have been removed. So this change to SelectionDAGBuilder is no longer needed. llvm-svn: 371771	2019-09-12 21:00:32 +00:00
David Green	a6e944b173	[CGP] Ensure sinking multiple instructions does not invalidate dominance checks In MVE, as of rL371218, we are attempting to sink chains of instructions such as: %l1 = insertelement <8 x i8> undef, i8 %l0, i32 0 %broadcast.splat26 = shufflevector <8 x i8> %l1, <8 x i8> undef, <8 x i32> zeroinitializer In certain situations though, we can end up breaking the dominance relations of instructions. This happens when we sink the instruction into a loop, but cannot remove the originals. The Use is updated, which might in fact be a Use from the second instruction to the first. This attempts to fix that by reversing the order of instruction that are sunk, and ensuring that we update the uses on new instructions if they have already been sunk, not the old ones. Differential Revision: https://reviews.llvm.org/D67366 llvm-svn: 371743	2019-09-12 16:00:07 +00:00
Guillaume Chatelet	af11cc7eb5	[Alignment] Move OffsetToAlignment to Alignment.h Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, JDevlieghere, alexshap, rupprecht, jhenderson Subscribers: sdardis, nemanjai, hiraditya, kbarton, jakehehrlich, jrtc27, MaskRay, atanasyan, jsji, seiya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67499 llvm-svn: 371742	2019-09-12 15:20:36 +00:00
Simon Pilgrim	da59a6bf7d	[DAGCombine] visitFDIV - Use isCheaperToUseNegatedFPOps helper for (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y). NFCI. Minor cleanup to use equivalent helper code. llvm-svn: 371724	2019-09-12 11:03:09 +00:00
Tim Northover	f1c2892912	AArch64: support arm64_32, an ILP32 slice for watchOS. This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM. FastISel is mostly disabled for now since it would generate incorrect code for ILP32. llvm-svn: 371722	2019-09-12 10:22:23 +00:00
Tim Northover	98534843fb	CodeGenPrep: add separate hook say when GEPs should be used for sinking. NFCI. Up to now, we've decided whether to sink address calculations using GEPs or normal arithmetic based on the useAA hook, but there are other reasons GEPs might be preferred. So this patch splits the two questions, with a default implementation falling back to useAA. llvm-svn: 371721	2019-09-12 10:21:00 +00:00
Qiu Chaofan	b7fb5d0f6f	[DAGCombiner] Improve division estimation of floating points. Current implementation of estimating divisions loses precision since it estimates reciprocal first and does multiplication. This patch is to re-order arithmetic operations in the last iteration in DAGCombiner to improve the accuracy. Reviewed By: Sanjay Patel, Jinsong Ji Differential Revision: https://reviews.llvm.org/D66050 llvm-svn: 371713	2019-09-12 07:51:24 +00:00
Craig Topper	b8dd075275	[LegalizeTypes] Remove code for softening a float type to itself. This was previously used to turn fp128 operations into libcalls on X86. This is now done through op legalization after r371672. This restores much of this code to before r254653. llvm-svn: 371709	2019-09-12 05:55:14 +00:00
Amara Emerson	55d86f04c7	[AArch64][GlobalISel] Fall back on attempts to allocate split types on the stack. First we were asserting that the ValNo of a VA was the wrong value. It doesn't actually make a difference for us in CallLowering but fix that anyway to silence the assert. The bigger issue was that after fixing the assert we were generating invalid MIR because the merging/unmerging of values split across multiple registers wasn't also implemented for memory locs. This happens when we run out of registers and have to pass the split types like i128 -> i64 x 2 on the stack. This is do-able, but for now just fall back. llvm-svn: 371693	2019-09-11 23:53:23 +00:00
Vedant Kumar	0b91333d59	[DWARF] Emit call site parameter info when tuning for lldb Emit debug entry values using standard DWARF5 opcodes when the debugger tuning is set to lldb. Differential Revision: https://reviews.llvm.org/D67410 llvm-svn: 371666	2019-09-11 21:23:39 +00:00
Matt Arsenault	81196a595c	LiveIntervals: Split live intervals on multiple dead defs If there are multiple dead defs of the same virtual register, these are required to be split into multiple virtual registers with separate live intervals to avoid a verifier error. llvm-svn: 371640	2019-09-11 17:59:21 +00:00
Guillaume Chatelet	97264366fb	[Alignment][NFC] use llvm::Align for AsmPrinter::EmitAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: dschuff, sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67443 llvm-svn: 371616	2019-09-11 13:37:35 +00:00
Guillaume Chatelet	48904e9452	[Alignment] Use llvm::Align in MachineFunction and TargetLowering - fixes mir parsing Summary: This catches malformed mir files which specify alignment as log2 instead of pow2. See https://reviews.llvm.org/D65945 for reference, This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67433 llvm-svn: 371608	2019-09-11 11:16:48 +00:00
Jessica Paquette	469d42fcf6	[GlobalISel] When a tail call is emitted in a block, stop translating it This fixes a crash in tail call translation caused by assume and lifetime_end intrinsics. It's possible to have instructions other than a return after a tail call which will still have `Analysis::isInTailCallPosition` return true. (Namely, lifetime_end and assume intrinsics.) If we emit a tail call, we should stop translating instructions in the block. Otherwise, we can end up emitting an extra return, or dead instructions in general. This makes the verifier unhappy, and is generally unfortunate for codegen. This also removes the code from AArch64CallLowering that checks if we have a tail call when lowering a return. This is covered by the new code now. Also update call-translator-tail-call.ll to show that we now properly tail call in the presence of lifetime_end and assume. Differential Revision: https://reviews.llvm.org/D67415 llvm-svn: 371572	2019-09-10 23:34:45 +00:00
Jessica Paquette	2af5b193d5	[AArch64][GlobalISel] Support sibling calls with mismatched calling conventions Add support for sibcalling calls whose calling convention differs from the caller's. - Port over `CCState::resultsCombatible` from CallingConvLower.cpp into CallLowering. This is used to verify that the way the caller and callee CC handle incoming arguments matches up. - Add `CallLowering::analyzeCallResult`. This is basically a port of `CCState::AnalyzeCallResult`, but using `ArgInfo` rather than `ISD::InputArg`. - Add `AArch64CallLowering::doCallerAndCalleePassArgsTheSameWay`. This checks that the calling conventions are compatible, and that the caller and callee preserve the same registers. For testing: - Update call-translator-tail-call.ll to show that we can now handle this. - Add a GISel line to tailcall-ccmismatch.ll to show that we will not tail call when the regmasks don't line up. Differential Revision: https://reviews.llvm.org/D67361 llvm-svn: 371570	2019-09-10 23:25:12 +00:00
Sanjay Patel	df6a958dcb	[BreakFalseDeps] fix typos/grammar in documentation comment; NFC llvm-svn: 371516	2019-09-10 13:00:31 +00:00
Guillaume Chatelet	3729b17cff	[Alignment][NFC] Use llvm::Align for TargetLowering::getPrefLoopAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Reviewed By: courbet Subscribers: wuzish, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67386 llvm-svn: 371511	2019-09-10 12:00:43 +00:00
Alexander Timofeev	c2d292f839	[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Reviewers: rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D67101 llvm-svn: 371508	2019-09-10 10:58:57 +00:00
Dmitri Gribenko	2bf8d77453	Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."" This reverts commit r371502, it broke tests (clang/test/CodeGenCXX/auto-var-init.cpp). llvm-svn: 371507	2019-09-10 10:39:09 +00:00
Clement Courbet	612c260ec3	Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." With a fix for sanitizer breakage (see explanation in D60318). llvm-svn: 371502	2019-09-10 09:18:00 +00:00
Guillaume Chatelet	b6722af068	[Alignment] Use Align for TargetLowering::MinStackArgumentAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67288 llvm-svn: 371498	2019-09-10 09:01:18 +00:00
Craig Topper	e8b432fa0e	[LegalizeTypes] Teach SoftenFloatOp_SELECT_CC to handle operand 2 or 3 being softened. This can only happen on X86 when fp128 is a legal type, but we go through softening to generate libcalls. This causes fp128 to be softened to fp128 instead of an integer type. This can be removed if D67128 lands. llvm-svn: 371493	2019-09-10 07:56:02 +00:00
Aditya Nandakumar	5112b71126	[GlobalISel]: Fix a bug where we could dereference None getConstantVRegVal returns None when dealing with constants > 64 bits. Don't assume we always have a value in GISelKnownBits. llvm-svn: 371465	2019-09-09 22:51:41 +00:00
Philip Reames	20aafa3156	Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, there handled w/instances of AtomicSDNodes. The result of which is that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding an transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context. Note that this patch is NFC unless the experimental flag is set. The basic strategy I plan on taking is: introduce infrastructure and a flag for testing (this patch) Audit uses of isVolatile, and apply isAtomic conservatively* piecemeal conservative* update generic code and x86 backedge code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection flip the flag at the end (with minimal diffs) Work through todo list identified in (2) and (3) exposing performance ops (*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there. We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL). Differential Revision: https://reviews.llvm.org/D66309 llvm-svn: 371441	2019-09-09 19:23:22 +00:00
Eli Friedman	79f0d3a6e5	[IfConversion] Correctly handle cases where analyzeBranch fails. If analyzeBranch fails, on some targets, the out parameters point to some blocks in the function. But we can't use that information, so make sure to clear it out. (In some places in IfConversion, we assume that any block with a TrueBB is analyzable.) The change to the testcase makes it trigger a bug on builds without this fix: IfConvertDiamond tries to perform a followup "merge" operation, which isn't legal, and we somehow end up with a branch to a deleted MBB. I'm not sure how this doesn't crash the compiler. Differential Revision: https://reviews.llvm.org/D67306 llvm-svn: 371434	2019-09-09 18:29:27 +00:00
Craig Topper	5ebd0a6e88	[SelectionDAG] Remove ISD::FP_ROUND_INREG I don't think anything in tree creates this node. So all of this code appears to be dead. Code coverage agrees http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html Differential Revision: https://reviews.llvm.org/D67312 llvm-svn: 371431	2019-09-09 17:54:44 +00:00
Dmitri Gribenko	d9c4060bd5	Revert "[MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp" This reverts commit 371359. I'm suspecting a miscompile, I posted a reproducer to https://reviews.llvm.org/D65267. llvm-svn: 371421	2019-09-09 16:46:45 +00:00
James Molloy	b6c7fce67a	[DFAPacketizer] Reapply: Track resources for packetized instructions Reapply with fix to reduce resources required by the compiler - use unsigned[2] instead of std::pair. This causes clang and gcc to compile the generated file multiple times faster, and hopefully will reduce the resource requirements on Visual Studio also. This fix is a little ugly but it's clearly the same issue the previous author of DFAPacketizer faced (the previous tables use unsigned[2] rather uglily too). This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions. This is particularly important for targets that do their own bundle packing - it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding. This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies. The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer like MachinePipeliner don't care and don't need the extra maintained state). Differential Revision: https://reviews.llvm.org/D66936 llvm-svn: 371399	2019-09-09 13:17:55 +00:00
Simon Pilgrim	462e3d8050	Revert rL371198 from llvm/trunk: [DFAPacketizer] Track resources for packetized instructions This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions. This is particularly important for targets that do their own bundle packing - it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding. This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies. The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer like MachinePipeliner don't care and don't need the extra maintained state). Differential Revision: https://reviews.llvm.org/D66936 ........ Reverted as this is causing "compiler out of heap space" errors on MSVC 2017/19 NDEBUG builds llvm-svn: 371393	2019-09-09 12:33:22 +00:00
Tim Northover	06d93e0a25	GlobalISel: fix unused warnings in release builds. llvm-svn: 371385	2019-09-09 10:36:58 +00:00
Tim Northover	36147adc0b	GlobalISel: add combiner to form indexed loads. Loosely based on DAGCombiner version, but this part is slightly simpler in GlobalIsel because all address calculation is performed by G_GEP. That makes the inc/dec distinction moot so there's just pre/post to think about. No targets can handle it yet so testing is via a special flag that overrides target hooks. llvm-svn: 371384	2019-09-09 10:04:23 +00:00
Kai Luo	9115c477bb	[MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp Summary: After tailduplication, we have redundant copies. We can remove these copies in machine-cp if it's safe to, i.e. ``` $reg0 = OP ... ... <<< No read or clobber of $reg0 and $reg1 $reg1 = COPY $reg0 <<< $reg0 is killed ... <RET> ``` will be transformed to ``` $reg1 = OP ... ... <RET> ``` Differential Revision: https://reviews.llvm.org/D65267 llvm-svn: 371359	2019-09-09 02:32:42 +00:00
Craig Topper	dac34f52d3	[DAGCombiner][X86][ARM] Teach visitMULO to fold multiplies with 0 to 0 and no carry. I modified the ARM test to use two inputs instead of 0 so the test hopefully still tests what was intended. llvm-svn: 371344	2019-09-08 19:24:39 +00:00
David Stenberg	5a583665f4	[DebugInfo][X86] Describe call site values for zero-valued imms Summary: Add zero-materializing XORs to X86's describeLoadedValue() hook in order to produce call site values. I have had to change the defs logic in collectCallSiteParameters() a bit to be able to describe the XORs. The XORs implicitly define $eflags, which would cause them to never be considered, due to a guard condition that I->getNumDefs() is one. I have changed that condition so that we now only consider instructions where a forwarded register overlaps with the instruction's single explicit define. We still need to collect the implicit defines of other forwarded registers to remove them from the work list. I'm not sure how to move towards supporting instructions with multiple explicit defines, cases where forwarded register are implicitly defined, and/or cases where an instruction produces values for multiple forwarded registers. Perhaps the describeLoadedValue() hook should take a register argument, and we then leave it up to the hook to describe the loaded value in that register? I have not yet encountered a situation where that would be necessary though. Reviewers: aprantl, vsk, djtodoro, NikolaPrica Reviewed By: vsk Subscribers: ychen, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D67225 llvm-svn: 371333	2019-09-08 14:22:06 +00:00
David Stenberg	8b70139e95	[NFC] Make the describeLoadedValue() hook return machine operand objects Summary: This changes the ParamLoadedValue pair which the describeLoadedValue() hook returns so that MachineOperand objects are returned instead of pointers. When describing call site values we may need to describe operands which are not part of the instruction. One such example is zero-materializing XORs on x86, which I have implemented support for in a child revision. Instead of having to return a pointer to an operand stored somewhere outside the instruction, start returning objects directly instead, as that simplifies the code. The MachineOperand class only holds POD members, and on x86-64 it is 32 bytes large. That combined with copy elision means that the overhead of returning a machine operand object from the hook does not become very large. I benchmarked this on a 8-thread i7-8650U machine with 32 GB RAM. The benchmark consisted of building a clang 8.0 binary configured with: -DCMAKE_BUILD_TYPE=RelWithDebInfo \ -DLLVM_TARGETS_TO_BUILD=X86 \ -DLLVM_USE_SANITIZER=Address \ -DCMAKE_CXX_FLAGS="-Xclang -femit-debug-entry-values -stdlib=libc++" The average wall clock time increased by 4 seconds, from 62:05 to 62:09, which is an 0.1% increase. Reviewers: aprantl, vsk, djtodoro, NikolaPrica Reviewed By: vsk Subscribers: hiraditya, ychen, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D67261 llvm-svn: 371332	2019-09-08 14:05:10 +00:00
Bjorn Pettersson	d065c81164	[CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul Summary: Normally TargetLowering::expandFixedPointMul would handle SMULFIXSAT with scale zero by using an SMULO to compute the product and determine if saturation is needed (if overflow happened). But if SMULO isn't custom/legal it falls through and uses the same technique, using MULHS/SMUL_LOHI, as used for non-zero scales. Problem was that when checking for overflow (handling saturation) when not using MULO we did not expect to find a zero scale. So we ended up in an assertion when doing APInt::getLowBitsSet(VTSize, Scale - 1) This patch fixes the problem by adding a new special case for how saturation is computed when scale is zero. Reviewers: RKSimon, bevinh, leonardchan, spatel Reviewed By: RKSimon Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67071 llvm-svn: 371309	2019-09-07 12:16:23 +00:00
Bjorn Pettersson	5e331e4ce8	[Intrinsic] Add the llvm.umul.fix.sat intrinsic Summary: Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Patch by: leonardchan, bjope Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel Reviewed By: leonardchan Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57836 llvm-svn: 371308	2019-09-07 12:16:14 +00:00
Bjorn Pettersson	2b698a13a1	[DwarfExpression] Disallow some rewrites to avoid undefined behavior Summary: The value operand in DW_OP_plus_uconst/DW_OP_constu value can be large (it uses uint64_t as representation internally in LLVM). This means that in the uint64_t to int conversions, previously done by DwarfExpression::addMachineRegExpression, could lose information. Also, the negation done in "-Offset" was undefined behavior in case Offset was exactly INT_MIN. To avoid the above problems, we now avoid transformation like [Reg, DW_OP_plus_uconst, Offset] --> [DW_OP_breg, Offset] and [Reg, DW_OP_constu, Offset, DW_OP_plus] --> [DW_OP_breg, Offset] when Offset > INT_MAX. And we avoid to transform [Reg, DW_OP_constu, Offset, DW_OP_minus] --> [DW_OP_breg,-Offset] when Offset > INT_MAX+1. The patch also adjusts DwarfCompileUnit::constructVariableDIEImpl to make sure that "DW_OP_constu, Offset, DW_OP_minus" is used instead of "DW_OP_plus_uconst, Offset" when creating DIExpressions with negative frame index offsets. Notice that this might just be the tip of the iceberg. There are lots of fishy handling related to these constants. I think both DIExpression::appendOffset and DIExpression::extractIfOffset may trigger undefined behavior for certain values. Reviewers: sdesmalen, rnk, JDevlieghere Reviewed By: JDevlieghere Subscribers: jholewinski, aprantl, hiraditya, ychen, uabelho, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D67263 llvm-svn: 371304	2019-09-07 11:40:10 +00:00
Teresa Johnson	9c27b59cec	Change TargetLibraryInfo analysis passes to always require Function Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example. This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration. Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works. There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome. Reviewers: chandlerc, hfinkel Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66428 llvm-svn: 371284	2019-09-07 03:09:36 +00:00
Matt Arsenault	cf10372119	GlobalISel: Add G_FMAD instruction llvm-svn: 371254	2019-09-06 20:49:10 +00:00
James Molloy	db2fa06722	[DFAPacketizer] Track resources for packetized instructions This patch allows the DFAPacketizer to be queried after a packet is formed to work out which resources were allocated to the packetized instructions. This is particularly important for targets that do their own bundle packing - it's not sufficient to know simply that instructions can share a packet; which slots are used is also required for encoding. This extends the emitter to emit a side-table containing resource usage diffs for each state transition. The packetizer maintains a set of all possible resource states in its current state. After packetization is complete, all remaining resource states are possible packetization strategies. The sidetable is only ~500K for Hexagon, but the extra tracking is disabled by default (most uses of the packetizer like MachinePipeliner don't care and don't need the extra maintained state). Differential Revision: https://reviews.llvm.org/D66936 llvm-svn: 371198	2019-09-06 12:20:08 +00:00
Jeremy Morse	5d9cd3b4ca	[DebugInfo] LiveDebugValues: explicitly terminate overwritten stack locations If a stack spill location is overwritten by another spill instruction, any variable locations pointing at that slot should be terminated. We cannot rely on spills always being restored to registers or variable locations being moved by a DBG_VALUE: the register allocator is entitled to spill a value and then forget about it when it goes out of liveness. To address this, scan for memory writes to spill locations, even those we don't consider to be normal "spills". isSpillInstruction and isLocationSpill distinguish the two now. After identifying spill overwrites, terminate the open range, and insert a $noreg DBG_VALUE for that variable. Differential Revision: https://reviews.llvm.org/D66941 llvm-svn: 371193	2019-09-06 10:08:22 +00:00
Kang Zhang	f879c68755	[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: Fix a bug of not update the jump table and recommit it again. In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. But the `early-ret` pass is before `block-placement`, we don't want to run it again. This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 371177	2019-09-06 08:16:18 +00:00
Puyan Lotfi	dc97ca9f25	[MIR] MIRNamer pass for improving MIR test authoring experience. This patch reuses the MIR vreg renamer from the MIRCanonicalizerPass to cleanup names of vregs in a MIR file for MIR test authors. I found it useful when writing a regression test for a globalisel failure I encountered recently and thought it might be useful for other folks as well. Differential Revision: https://reviews.llvm.org/D67209 llvm-svn: 371121	2019-09-05 20:44:33 +00:00
Daniel Sanders	f803237926	[globalisel][knownbits] Account for missing type constraints Now that we look through copies, it's possible to visit registers that have a register class constraint but not a type constraint. Avoid looking through copies when this occurs as the SrcReg won't be able to determine it's bit width or any known bits. Along the same lines, if the initial query is on a register that doesn't have a type constraint then the result is a default-constructed KnownBits, that is, a 1-bit fully-unknown value. llvm-svn: 371116	2019-09-05 20:26:02 +00:00
Jessica Paquette	20e8667098	Recommit "[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls" Recommit basic sibling call lowering (https://reviews.llvm.org/D67189) The issue was that if you have a return type other than void, call lowering will emit COPYs to get the return value after the call. Disallow sibling calls other than ones that return void for now. Also proactively disable swifterror tail calls for now, since there's a similar issue with COPYs there. Update call-translator-tail-call.ll to include test cases for each of these things. llvm-svn: 371114	2019-09-05 20:18:34 +00:00
Eli Friedman	cae1e47f6e	[IfConversion] Fix diamond conversion with unanalyzable branches. The code was incorrectly counting the number of identical instructions, and therefore tried to predicate an instruction which should not have been predicated. This could have various effects: a compiler crash, an assembler failure, a miscompile, or just generating an extra, unnecessary instruction. Instead of depending on TargetInstrInfo::removeBranch, which only works on analyzable branches, just remove all branch instructions. Fixes https://bugs.llvm.org/show_bug.cgi?id=43121 and https://bugs.llvm.org/show_bug.cgi?id=41121 . Differential Revision: https://reviews.llvm.org/D67203 llvm-svn: 371111	2019-09-05 20:02:38 +00:00
Guillaume Chatelet	f9f31ce6a9	[Alignment][NFC] Change internal representation of TargetLowering.h Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67226 llvm-svn: 371082	2019-09-05 15:44:33 +00:00
Simon Pilgrim	071287c5a9	Revert rL370996 from llvm/trunk: [AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 ........ This fails on EXPENSIVE_CHECKS builds due to a -verify-machineinstrs test failure in CodeGen/AArch64/dllimport.ll llvm-svn: 371051	2019-09-05 10:38:39 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
Jessica Paquette	b78324fc40	[AArch64][GlobalISel] Teach AArch64CallLowering to handle basic sibling calls This adds support for basic sibling call lowering in AArch64. The intent here is to only handle tail calls which do not change the ABI (hence, sibling calls.) At this point, it is very restricted. It does not handle - Vararg calls. - Calls with outgoing arguments. - Calls whose calling conventions differ from the caller's calling convention. - Tail/sibling calls with BTI enabled. This patch adds - `AArch64CallLowering::isEligibleForTailCallOptimization`, which is equivalent to the same function in AArch64ISelLowering.cpp (albeit with the restrictions above.) - `mayTailCallThisCC` and `canGuaranteeTCO`, which are identical to those in AArch64ISelLowering.cpp. - `getCallOpcode`, which is exactly what it sounds like. Tail/sibling calls are lowered by checking if they pass target-independent tail call positioning checks, and checking if they satisfy `isEligibleForTailCallOptimization`. If they do, then a tail call instruction is emitted instead of a normal call. If we have a sibling call (which is always the case in this patch), then we do not emit any stack adjustment operations. When we go to lower a return, we check if we've already emitted a tail call. If so, then we skip the return lowering. For testing, this patch - Adds call-translator-tail-call.ll to test which tail calls we currently lower, which ones we don't, and which ones we shouldn't. - Updates branch-target-enforcement-indirect-calls.ll to show that we fall back as expected. Differential Revision: https://reviews.llvm.org/D67189 llvm-svn: 370996	2019-09-04 22:54:52 +00:00
Puyan Lotfi	028061d4eb	[mir-canon][NFC] Move MIR vreg renaming code to separate file for better reuse. Moving MIRCanonicalizerPass vreg renaming code to MIRVRegNamerUtils so that it can be reused in another pass (ie planing to write a standalone mir-namer pass). I'm going to write a mir-namer pass so that next time someone has to author a test in MIR, they can use it to cleanup the naming and make it more readable by having the numbered vregs swapped out with named vregs. Differential Revision: https://reviews.llvm.org/D67114 llvm-svn: 370985	2019-09-04 21:29:10 +00:00
Matt Arsenault	5ff310e298	GlobalISel: Add basic legalization for G_BITREVERSE llvm-svn: 370979	2019-09-04 20:46:15 +00:00
Daniel Sanders	b276a9a51e	[globalisel] Support trivial COPY in GISelKnownBits Summary: Allow GISelKnownBits to look through the trivial case of TargetOpcode::COPY Reviewers: aditya_nandakumar Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67131 llvm-svn: 370955	2019-09-04 18:59:43 +00:00
Philip Reames	3a49ca331f	Update CodeGen to use hasMetadata as appropriate [NFC] My intial grepping for rL370933 missed a directory worth of cases. llvm-svn: 370942	2019-09-04 17:46:55 +00:00
Matt Arsenault	70becc20fa	GlobalISel: Add G_BITREVERSE This is the first failing pattern for AMDGPU and is trivial to handle. llvm-svn: 370927	2019-09-04 17:06:53 +00:00
James Molloy	11f0f7f583	[ModuloSchedule] Fix no-asserts build Apologies, due to a git SNAFU this fix (dump doesn't exist and silence unused variables) stayed in my index rather than applying to rL370893. llvm-svn: 370894	2019-09-04 12:57:23 +00:00
James Molloy	fef9f59055	[ModuloSchedule] Introduce PeelingModuloScheduleExpander This is the beginnings of a reimplementation of ModuloScheduleExpander. It works by generating a single-block correct pipelined kernel and then peeling out the prolog and epilogs. This patch implements kernel generation as well as a validator that will confirm the number of phis added is the same as the ModuloScheduleExpander. Prolog and epilog peeling will come in a different patch. Differential Revision: https://reviews.llvm.org/D67081 llvm-svn: 370893	2019-09-04 12:54:24 +00:00
Jeremy Morse	337a7cb55e	[DebugInfo] LiveDebugValues: locations with different exprs should not be merged When comparing variable locations, LiveDebugValues currently considers only the machine location, ignoring any DIExpression applied to it. This is a problem because that DIExpression can do pretty much anything to the machine location, for example dereferencing it. This patch adds DIExpressions to that comparison; now variables based on the same register/memory-location but with different expressions will compare differently, and be dropped if we attempt to merge them between blocks. This reduces variable coverage-range a little, but only because we were producing broken locations. Differential Revision: https://reviews.llvm.org/D66942 llvm-svn: 370877	2019-09-04 11:09:05 +00:00
Jeremy Morse	c8c5f2a84e	[LiveDebugValues][NFC] Silence an unused variable warning On release builds, 'MI' isn't used by anything (it's already inserted into a block by BuildMI), while on non-release builds it's used by a LLVM_DEBUG statement. Mark as explicitly used to avoid the warning. llvm-svn: 370870	2019-09-04 10:18:03 +00:00
Amara Emerson	5d5150f0b4	[GlobalISel] Fix G_SEXT narrowScalar to bail out of unsupported type combination. Similar to the issue with G_ZEXT that was fixed earlier, this is a quick to fall back if the source type is not exactly half of the dest type. Fixes the clang-cmake-aarch64-lld bot build. llvm-svn: 370847	2019-09-04 07:58:45 +00:00
Amara Emerson	2a2c25ba48	[AArch64][GlobalISel] Legalize 128 bit divisions to libcalls. Now that we have the infrastructure to support s128 types as parameters we can expand these to libcalls. Differential Revision: https://reviews.llvm.org/D66185 llvm-svn: 370823	2019-09-03 21:42:32 +00:00
Amara Emerson	fbaf425b79	[GlobalISel][CallLowering] Add support for splitting types according to calling conventions. On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting for return types not yet implemented. Differential Revision: https://reviews.llvm.org/D66180 llvm-svn: 370822	2019-09-03 21:42:28 +00:00
Bjorn Pettersson	b0eb394417	[CodeGen] Use FSHR in DAGTypeLegalizer::ExpandIntRes_MULFIX Summary: Simplify the right shift of the intermediate result (given in four parts) by using funnel shift. There are some impact on lit tests, but that seems to be related to register allocation differences due to how FSHR is expanded on X86 (giving a slightly different operand order for the OR operations compared to the old code). Reviewers: leonardchan, RKSimon, spatel, lebedev.ri Reviewed By: RKSimon Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, pzheng, bevinh, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67036 llvm-svn: 370813	2019-09-03 19:35:07 +00:00
James Molloy	935499579c	[MachinePipeliner] Add a way to unit-test the schedule emitter Emitting a schedule is really hard. There are lots of corner cases to take care of; in fact, of the 60+ SWP-specific testcases in the Hexagon backend most of those are testing codegen rather than the schedule creation itself. One issue is that to test an emission corner case we must craft an input such that the generated schedule uses that corner case; sometimes this is very hard and convolutes testcases. Other times it is impossible but we want to test it anyway. This patch adds a simple test pass that will consume a module containing a loop and generate pipelined code from it. We use post-instr-symbols as a way to annotate instructions with the stage and cycle that we want to schedule them at. We also provide a flag that causes the MachinePipeliner to generate these annotations instead of actually emitting code; this allows us to generate an input testcase with: llc < %s -stop-after=pipeliner -pipeliner-annotate-for-testing -o test.mir And run the emission in isolation with: llc < test.mir -run-pass=modulo-schedule-test llvm-svn: 370705	2019-09-03 08:20:31 +00:00
Craig Topper	9c74c77404	[LegalizeDAG] Pass DAG to two calls to SDNode::dump in debug prints so that they will print target specific nodes correctly. The dump methods can only print target node names correctly if they can get access to the TLI object. llvm-svn: 370694	2019-09-03 02:51:14 +00:00
Robert Lougher	13190c4225	[TargetLowering][PS4] Add sincos(f) lib functions when target is PS4 PS4 supports sincosf and sincos. Adding the library functions enables the sin(f)+cos(f) -> sincos(f) optimization. Differential Revision: https://reviews.llvm.org/D67009 llvm-svn: 370675	2019-09-02 16:53:32 +00:00
Sanjay Patel	4e54cf3e0e	[DAGCombiner] try to form test+set out of shift+mask patterns The motivating bugs are: https://bugs.llvm.org/show_bug.cgi?id=41340 https://bugs.llvm.org/show_bug.cgi?id=42697 As discussed there, we could view this as a failure of IR canonicalization, but then we would need to implement a backend fixup with target overrides to get this right in all cases. Instead, we can just view this as a codegen opportunity. It's not even clear for x86 exactly when we should favor test+set; some CPUs have better theoretical throughput for the ALU ops than bt/test. This patch is made more complicated than I expected because there's an early DAGCombine for 'and' that can change types of the intermediate ops via trunc+anyext. Differential Revision: https://reviews.llvm.org/D66687 llvm-svn: 370668	2019-09-02 14:52:09 +00:00
Jeremy Morse	22493f66f1	[DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations The missing line added by this patch ensures that only spilt variable locations are candidates for being restored from the stack. Otherwise, register or constant-value information can be interpreted as a spill location, through a union. The added regression test replicates a scenario where this occurs: the stack load from [rsp] causes the register-location DBG_VALUE to be "restored" to rsi, when it should be left alone. See PR43058 for details. Un x-fail a test that was suffering from this from a previous patch. Differential Revision: https://reviews.llvm.org/D66895 llvm-svn: 370648	2019-09-02 12:28:36 +00:00
Amara Emerson	453ef4e376	[AArch64][GlobalISel] Fix zext narrowScalar to use the right type when creating the merges. Fixes PR43171. llvm-svn: 370627	2019-09-02 08:18:55 +00:00
Sanjay Patel	c882208367	[DAGCombiner] improve throughput of shift+logic+shift The motivating case for this is a long way from here: https://bugs.llvm.org/show_bug.cgi?id=43146 ...but I think this is where we have to start. We need to canonicalize/optimize sequences of shift and logic to ease pattern matching for things like bswap and improve perf in general. But without the artificial limit of '!LegalTypes' (early combining), there are a lot of test diffs, and not all are good. In the minimal tests added for this proposal, x86 should have better throughput in all cases. AArch64 is neutral for scalar tests because it can fold shifts into bitwise logic ops. There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns: https://rise4fun.com/Alive/VlI https://rise4fun.com/Alive/n1m https://rise4fun.com/Alive/1Vn Differential Revision: https://reviews.llvm.org/D67021 llvm-svn: 370617	2019-09-01 18:38:15 +00:00
Shiva Chen	adfdcb9c26	[TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs Summary: This fixes the bugzilla id 43183 which triggerd by the following commit: [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall llvm-svn: 370604	2019-09-01 04:52:54 +00:00
Sanjay Patel	9e57b49392	[DAGCombiner] clean up code in visitShiftByConstant() This is not quite NFC because the SDLoc propagation is changed, but there are no regression test diffs from that. llvm-svn: 370587	2019-08-31 15:08:58 +00:00
Amaury Sechet	82825ab882	[DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate. Summary: The combiner transforms (shl X, 1) into (add X, X). Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66882 llvm-svn: 370578	2019-08-31 11:40:02 +00:00
James Molloy	e62c509cd4	[DAGCombiner] Don't create illegal narrow stores Narrowing stores when the target doesn't support the narrow version forces the target to expand into a load-modify-store sequence, which is highly suboptimal. The information narrowing throws away (legality of the inverse transform) is hard to re-analyze. If the target doesn't support a store of the narrow type, don't narrow even in pre-legalize mode. No test as this is DAGCombiner and depends on target bits. llvm-svn: 370576	2019-08-31 10:46:16 +00:00
Bjorn Pettersson	e27c74abb6	[CodeGen] Refactor DAGTypeLegalizer::ExpandIntRes_MULFIX. NFC Restructured the code a little bit in preparation for adding UMULFIXSAT. I think it will be easier to understand the code if not interleaving the codegen for signed/unsigned/saturated cases that much. llvm-svn: 370569	2019-08-31 09:28:50 +00:00
James Molloy	790a779f06	[MachinePipeliner] Separate schedule emission, NFC This is the first stage in refactoring the pipeliner and making it more accessible for backends to override and control. This separates the logic and state required to emit a scheudule from the logic that computes and validates a schedule. This will enable (a) new schedule emitters and (b) new modulo scheduling implementations to coexist. NFC. Differential Revision: https://reviews.llvm.org/D67006 llvm-svn: 370500	2019-08-30 18:49:50 +00:00
Simon Pilgrim	3be7081aa1	[DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI. SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent. llvm-svn: 370498	2019-08-30 18:19:02 +00:00
Simon Pilgrim	2d1e0899e9	[TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node. Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case. Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests llvm-svn: 370497	2019-08-30 17:58:55 +00:00
Matt Arsenault	466ec2d552	GlobalISel: Fix missing pass dependency llvm-svn: 370496	2019-08-30 17:41:58 +00:00
Craig Topper	30ddd2ab6c	[ValueTypes] Add v16f16 and v32f16 to EVT::getEVTString and Tablegen's getEnumName Missed these when I hadded the enum entries llvm-svn: 370494	2019-08-30 17:34:29 +00:00
Simon Pilgrim	ab8cb1a3c5	[DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI. llvm-svn: 370489	2019-08-30 17:21:20 +00:00
Simon Pilgrim	c2fed1dc8a	[DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI. llvm-svn: 370478	2019-08-30 15:17:37 +00:00
Simon Pilgrim	3367669668	[DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts. llvm-svn: 370471	2019-08-30 13:30:37 +00:00
Simon Pilgrim	8e1989e79a	[DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types. This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total. llvm-svn: 370468	2019-08-30 12:22:06 +00:00
Bjorn Pettersson	227145924a	[CodeGen] Introduce MachineBasicBlock::replacePhiUsesWith helper and use it. NFC Summary: Found a couple of places in the code where all the PHI nodes of a MBB is updated, replacing references to one MBB by reference to another MBB instead. This patch simply refactors the code to use a common helper (MachineBasicBlock::replacePhiUsesWith) for such PHI node updates. Reviewers: t.p.northover, arsenm, uabelho Subscribers: wdng, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66750 llvm-svn: 370463	2019-08-30 11:23:10 +00:00
Simon Pilgrim	7cbf823f93	[DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros Return a proper zero vector, just in case some elements are undef. Noticed by inspection after dealing with a similar issue in PR43159. llvm-svn: 370460	2019-08-30 10:42:14 +00:00
David Stenberg	b35d4699d0	[LiveDebugValues] Insert entry values after bundles Summary: Change LiveDebugValues so that it inserts entry values after the bundle which contains the clobbering instruction. Previously it would insert the debug value after the bundle head using insertAfter(), breaking the bundle. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D66888 llvm-svn: 370448	2019-08-30 09:06:50 +00:00
Petar Avramovic	6412b56513	[MIPS GlobalISel] Lower fptoui Add lower for G_FPTOUI. Algorithm is similar to the SDAG version in TargetLowering::expandFP_TO_UINT. Lower G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D66929 llvm-svn: 370431	2019-08-30 05:44:02 +00:00
Dan Gohman	8cfeeaf9de	[CodeGen] Fix lowering for returning the result of an extractvalue When the number of return values exceeds the number of registers available, SelectionDAGBuilder::visitRet transforms a function's return to use a pointer to a buffer to hold return values. When the returned value is an operator such as extractvalue, the value may have a non-zero result number. Add that number to the indexing when obtaining the values to store. This fixes https://bugs.llvm.org/show_bug.cgi?id=43132. Differential Revision: https://reviews.llvm.org/D66978 llvm-svn: 370430	2019-08-30 04:33:22 +00:00
Jordan Rupprecht	f9f81289e6	Revert [MBP] Disable aggressive loop rotate in plain mode This reverts r369664 (git commit `51f48295cb`) It causes many benchmark regressions, internally and in llvm's benchmark suite. llvm-svn: 370398	2019-08-29 19:03:58 +00:00
Matt Arsenault	093ebf9275	GlobalISel: Don't compute known bits for non-integral GEP llvm-svn: 370392	2019-08-29 17:55:05 +00:00
Matt Arsenault	b2b9a23758	GlobalISel: Add maskedValueIsZero and signBitIsZero to known bits I dropped the DemandedElts since it seems to be missing from some of the new interfaces, but not others. llvm-svn: 370389	2019-08-29 17:24:36 +00:00
Matt Arsenault	caff0a88dd	GlobalISel: Add known bits to InstructionSelector AMDGPU uses this for some addressing mode selection patterns. The analysis run itself doesn't do anything so it seems easier to just always require this than adding a way to opt in. llvm-svn: 370388	2019-08-29 17:24:32 +00:00
Simon Pilgrim	ea67741899	[DAGCombine] Fix shadow variable warnings. NFCI. llvm-svn: 370365	2019-08-29 14:34:07 +00:00
Jeremy Morse	ca0e4b3689	[DebugInfo] LiveDebugValues: correctly discriminate kinds of variable locations The missing line added by this patch ensures that only spilt variable locations are candidates for being restored from the stack. Otherwise, register or constant-value information can be interpreted as a spill location, through a union. The added regression test replicates a scenario where this occurs: the stack load from [rsp] causes the register-location DBG_VALUE to be "restored" to rsi, when it should be left alone. See PR43058 for details. Un x-fail a test that was suffering from this from a previous patch. Differential Revision: https://reviews.llvm.org/D66895 llvm-svn: 370334	2019-08-29 11:20:54 +00:00
Simon Pilgrim	6c2fc64edc	Fix signed/unsigned comparison warning. NFCI. llvm-svn: 370333	2019-08-29 11:18:53 +00:00
Simon Pilgrim	27f43e6b1a	Fix shadow variable warning. NFCI. llvm-svn: 370332	2019-08-29 11:16:32 +00:00
Jeremy Morse	313d2ce999	[DebugInfo] LiveDebugValues should always revisit backedges if it skips them The "join" method in LiveDebugValues does not attempt to join unseen predecessor blocks if their out-locations aren't yet initialized, instead the block should be re-visited later to see if any locations have changed validity. However, because the set of blocks were all being "process"'d once before "join" saw them, that logic in "join" was actually ignoring legitimate out-locations on the first pass through. This meant that some invalidated locations were not removed from the head of loops, allowing illegal locations to persist. Fix this by removing the run of "process" before the main join/process loop in ExtendRanges. Now the unseen predecessors that "join" skips truly are uninitialized, and we come back to the block at a later time to re-run "join", see the @baz function added. This also fixes another fault where stack/register transfers in the entry block (or any other before-any-loop-block) had their tranfers initially ignored, and were then never revisited. The MIR test added tests for this behaviour. XFail a test that exposes another bug; a fix for this is coming in D66895. Differential Revision: https://reviews.llvm.org/D66663 llvm-svn: 370328	2019-08-29 10:53:29 +00:00
Amaury Sechet	8365e42010	[DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y) Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66718 llvm-svn: 370326	2019-08-29 10:35:51 +00:00
Simon Pilgrim	dfb2a19ac2	LegalizeSetCCCondCode - Reduce scope of NeedSwap to fix cppcheck warning. NFCI. No need for this to be defined outside the only switch case its used in. llvm-svn: 370320	2019-08-29 10:11:34 +00:00
Craig Topper	1aadf6f39f	[X86] Make inline assembly 'x' and 'v' constraints work for f128. Including a type legalizer fix to make bitcast operand promotion work correctly when getSoftenedFloat returns f128 instead of i128. Fixes PR43157 llvm-svn: 370293	2019-08-29 05:13:56 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Kevin P. Neal	ddf13c00ed	[FPEnv] Add fptosi and fptoui constrained intrinsics. This implements constrained floating point intrinsics for FP to signed and unsigned integers. Quoting from D32319: The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed. Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton Approved by: Craig Topper Differential Revision: http://reviews.llvm.org/D63782 llvm-svn: 370228	2019-08-28 16:33:36 +00:00
Jessica Paquette	af0bd41e06	[AArch64][GlobalISel] Fall back when translating musttail calls These are currently translated as normal functions calls in AArch64. Until we have proper tail call lowering, we shouldn't translate these. Differential Revision: https://reviews.llvm.org/D66842 llvm-svn: 370225	2019-08-28 16:19:01 +00:00
Ryan Taylor	3b1459ed7c	[AMDGPU] Adjust number of SGPRs available in Calling Convention This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved available for the calling convention. Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1 llvm-svn: 370215	2019-08-28 15:00:45 +00:00
Simon Pilgrim	14e07d7f4b	[DAGCombine] Fix cppcheck shadow variable warning. NFCI. We already have an outer Ops variable. llvm-svn: 370197	2019-08-28 12:48:41 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
Simon Pilgrim	c5b38e2869	[DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI. These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors. llvm-svn: 370189	2019-08-28 11:50:36 +00:00
Amara Emerson	e20b91c265	[GlobalISel] Replace hard coded dynamic alloca handling with G_DYN_STACKALLOC. This change moves the actual stack pointer manipulation into the legalizer, available to targets via lower(). The codegen is slightly different because we're using explicit masks instead of G_PTRMASK, and using G_SUB rather than adding a negative amount via G_GEP. Differential Revision: https://reviews.llvm.org/D66678 llvm-svn: 370104	2019-08-27 19:54:27 +00:00
Matt Arsenault	2910184936	DAG: computeNumSignBits for MUL Copied directly from the IR version. Most of the testcases I've added for this are somewhat problematic because they really end up testing the yet to be implemented version for MUL_I24/MUL_U24. llvm-svn: 370099	2019-08-27 19:05:33 +00:00
Sanjay Patel	b516f1afdd	[DAGCombiner] cancel fnegs from multiplied operands of FMA (-X) * (-Y) + Z --> X * Y + Z This is a missing optimization that shows up as a potential regression in D66050, so we should solve it first. We appear to be partly missing this fold in IR as well. We do handle the simpler case already: (-X) * (-Y) --> X * Y And it might be beneficial to make the constraint less conservative (eg, if both operands are cheap, but not necessarily cheaper), but that causes infinite looping for the existing fmul transform. Differential Revision: https://reviews.llvm.org/D66755 llvm-svn: 370071	2019-08-27 15:17:46 +00:00
Jinsong Ji	7f536bcf22	Revert "[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks" This reverts commit `b3d258fc44`. @skatkov is reporting crash in D63972#1646303 Contacted @ZhangKang, and revert the commit on behalf of him. llvm-svn: 370069	2019-08-27 14:59:08 +00:00
Petar Avramovic	a393238422	[GlobalISel] Factor narrowScalar for G_ASHR and G_LSHR. NFC Main difference is in the way Hi for Long shift (HiL) is made. G_LSHR fills HiL with zeros, while G_ASHR fills HiL with sign bit value. Differential Revision: https://reviews.llvm.org/D66589 llvm-svn: 370064	2019-08-27 14:33:05 +00:00
Petar Avramovic	d568ed40e0	[GlobalISel] Fix narrowScalar for shifts to match algorithm from SDAG Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS to match names of surrounding variables. Differential Revision: https://reviews.llvm.org/D66587 llvm-svn: 370062	2019-08-27 14:22:32 +00:00
Amaury Sechet	f28dee2cff	[DAGCombiner] Add node to the worklist in topological order in parallelizeChainedStores Summary: As per title. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66659 llvm-svn: 370056	2019-08-27 13:27:57 +00:00
Amaury Sechet	a1e5ef3fd4	[DAGCombiner] Add node to the worklist in topological order after relegalization. Summary: As per title. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66702 llvm-svn: 370040	2019-08-27 11:06:09 +00:00
Craig Topper	243ede9970	[SelectionDAGBuilder] Hide existence of ConstantDataVector vector from visitGetElementPtr. ConstantDataVector is a specialized verison of ConstantVector that stores data in a packed array of bits instead of as individual pointers to other Constants. But we really shouldn't expose that if we can void it. And we should handle regular ConstantVector equally well. This removes a dyn_cast to ConstantDataVector and just calls getSplatValue directly on a Constant* if the type is a vector. llvm-svn: 370018	2019-08-27 06:39:50 +00:00
Craig Topper	4a3f62f9fd	[SelectionDAGBuilder] Fix typo in comment. NFC llvm-svn: 370017	2019-08-27 06:38:51 +00:00
Richard Trieu	58e67b8aa3	Revert r369927 - [DAGCombiner] Remove a bunch of redundant AddToWorklist calls. This change causes instrumented builds of Clang to have a fatal error in the backend. https://reviews.llvm.org/D66537 has the details. llvm-svn: 370006	2019-08-27 02:04:11 +00:00
Shafik Yaghmour	90e00bd8f3	Debug Info: Support for DW_AT_export_symbols for anonymous structs This implements the DWARF 5 feature described in: http://dwarfstd.org/ShowIssue.php?issue=141212.1 To support recognizing anonymous structs: struct A { struct { // Anonymous struct int y; }; } a This patch adds support for the new flag in constructTypeDIE(...) and test to verify this change. Differential Revision: https://reviews.llvm.org/D66605 llvm-svn: 369969	2019-08-26 20:59:44 +00:00
Vedant Kumar	58a0714885	[DWARF] Rename getDwarf5OrGNUCallSite{Attr,Tag}, NFC llvm-svn: 369967	2019-08-26 20:53:34 +00:00
Vedant Kumar	533dd0214c	[DWARF] Pick the DWARF5 OP_entry_value opcode on Darwin Use the GNU extension for OP_entry_value consistently (i.e. whenever GNU extensions are used for TAG_call_site). llvm-svn: 369966	2019-08-26 20:53:12 +00:00
Craig Topper	846429de74	[DAGCombiner][X86] Teach SimplifyVBinOp to fold VBinOp (concat X, undef/constant), (concat Y, undef/constant) -> concat (VBinOp X, Y), VecC This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it. Differential Revision: https://reviews.llvm.org/D66680 llvm-svn: 369937	2019-08-26 17:59:11 +00:00
Amaury Sechet	b7075e40f3	[DAGCombiner] Remove a bunch of redundant AddToWorklist calls. Summary: This comes as a first step toward processing the DAG nodes in topological orders. Doing so ensure that arguments of a node are combined before the node itself is combined, which exposes ore opportunities for optimization and/or reduce the amount of patterns a node has to match for. DAGCombiner adding nodes to the worklist is various places causes the nodes to be in a different order from what is expected. In addition, this is reduant because these nodes end up being added to the worklist anyways due to the machinery at line 1621. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66537 llvm-svn: 369927	2019-08-26 17:02:12 +00:00
Craig Topper	b8b90ac1c5	[X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef Summary: Concat_vectors is more canonical during early DAG combine. For example, its what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner than can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not. I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66504 llvm-svn: 369872	2019-08-25 17:59:49 +00:00
Xing Xue	ef039a3ccd	[PowerPC][AIX] Adds support for writing the .data section in assembly files Summary: Adds support for generating the .data section in assembly files for global variables with a non-zero initialization. The support for writing the .data section in XCOFF object files will be added in a follow-on patch. Any relocations are not included in this patch. Reviewers: hubert.reinterpretcast, sfertile, jasonliu, daltenty, Xiangling_L Reviewed by: hubert.reinterpretcast Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, wuzish, shchenz, DiggerLin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66154 llvm-svn: 369869	2019-08-25 15:17:25 +00:00
Nikita Popov	aa71c977ba	[SDAG] Fold umul_lohi with 0 or 1 multiplicand These can turn up during multiplication legalization. In principle these should also apply to smul_lohi, but I wasn't able to figure out how to produce those with the necessary operands. Differential Revision: https://reviews.llvm.org/D66380 llvm-svn: 369864	2019-08-25 08:04:22 +00:00
Nilanjana Basu	7da6f432d8	Removing block comments from CodeView records in assembly files & related code cleanup llvm-svn: 369860	2019-08-25 01:09:11 +00:00
Amara Emerson	3f6dd0c588	[GlobalISel] Introduce a G_DYN_STACKALLOC opcode to represent dynamic allocas. This just adds the opcode and verifier, it will be used to replace existing dynamic alloca handling in a subsequent patch. Differential Revision: https://reviews.llvm.org/D66677 llvm-svn: 369833	2019-08-24 02:25:56 +00:00
Guillaume Chatelet	b7be5b9095	[LLVM][NFC] remove unused fields Summary: Here is the commit introducing the fields https://github.com/llvm/llvm-project/commit/cf6749e4c091 It dates back from 2006 and was used by AArch64 backend. There is no more reference to these fields in the whole codebase so I think it's fine. Reviewers: courbet Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66683 llvm-svn: 369810	2019-08-23 20:49:06 +00:00
Volkan Keles	277631e3b8	[GlobalISel] Legalizer: Retry combining illegal artifacts as long as there new artifacts Summary: Currently, Legalizer aborts if it’s unable to legalize artifacts. However, it’s possible to combine them after processing the rest of the instruction because the legalization is likely to generate more artifacts that allow ArtifactCombiner to combine away them. Instead, move illegal artifacts to another list called RetryList and wait until all of the instruction in InstList are legalized. After that, check if there is any new artifacts and try to combine them again if that’s the case. If not, abort. The idea is similar to D59339, but the approach is a bit different. This patch fixes the issue described above, but the legalizer still may be unable to handle some cases depending on when to legalize artifacts. So, in the long run, we probably need a different legalization strategy that handles this dependency in a better way. Reviewers: dsanders, aditya_nandakumar, qcolombet, arsenm, aemerson, paquette Reviewed By: dsanders Subscribers: jvesely, wdng, nhaehnle, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65894 llvm-svn: 369805	2019-08-23 20:30:35 +00:00
Benjamin Kramer	dc5f805d31	Do a sweep of symbol internalization. NFC. llvm-svn: 369803	2019-08-23 19:59:23 +00:00
Matt Arsenault	2fd1afe8ef	RegScavenger: Use Register llvm-svn: 369794	2019-08-23 18:25:34 +00:00
Craig Topper	e7211bb567	[SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression Patch showing the effect of enabling bool vector oversimplification. Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify: insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i Preventing the removal of the AND to clear the upper bits of result Differential Revision: https://reviews.llvm.org/D53022 llvm-svn: 369780	2019-08-23 17:14:58 +00:00
Jeremy Morse	0ae5498146	[DebugInfo] Remove invalidated locations during LiveDebugValues LiveDebugValues gives variable locations to blocks, but it should also take away. There are various circumstances where a variable location is known until a loop backedge with a different location is detected. In those circumstances, where there's no agreement on the variable location, it should be undef / removed, otherwise we end up picking a location that's valid on some loop iterations but not others. However, LiveDebugValues doesn't currently do this, see the new testcase attached. Without this patch, the location of !3 is assumed to be %bar through the loop. Once it's added to the In-Locations list, it's never removed, even though the later dbg.value(0... of !3 makes the location un-knowable. This patch checks during block-location-joining to see whether any previously-present locations have been removed in a predecessor. If they have, the live-ins have changed, and the block needs reprocessing. Similarly, in transferTerminator, assign rather than \|= the Out-Locations after processing a block, as we may have deleted some previously valid locations. This will mean that LiveDebugValues performs more propagation -- but that's necessary for it being correct. Differential Revision: https://reviews.llvm.org/D66599 llvm-svn: 369778	2019-08-23 16:33:42 +00:00
Simon Pilgrim	04906ef1f2	[DAGCombine] GetNegatedExpression - add FMA\FMAD support If the accumulator and either of the multiply operands are negatable then we can we negate the entire expression. Differential Revision: https://reviews.llvm.org/D63141 llvm-svn: 369746	2019-08-23 10:49:46 +00:00
Peter Collingbourne	2452d7030b	IR. Change strip* family of functions to not look through aliases. I noticed another instance of the issue where references to aliases were being replaced with aliasees, this time in InstCombine. In the instance that I saw it turned out to be only a QoI issue (a symbol ended up being missing from the symbol table due to the last reference to the alias being removed, preventing HWASAN from symbolizing a global reference), but it could easily have manifested as incorrect behaviour. Since this is the third such issue encountered (previously: D65118, D65314) it seems to be time to address this common error/QoI issue once and for all and make the strip* family of functions not look through aliases. Includes a test for the specific issue that I saw, but no doubt there are other similar bugs fixed here. As with D65118 this has been tested to make sure that the optimization isn't load bearing. I built Clang, Chromium for Linux, Android and Windows as well as the test-suite and there were no size regressions. Differential Revision: https://reviews.llvm.org/D66606 llvm-svn: 369697	2019-08-22 19:56:14 +00:00
Matt Arsenault	fba82858f2	GlobalISel: Don't create G_UADDE with constant false carry in The x86 tests are now broken (in paticular add-scalar.ll now hits the DAG fallback) due to not handling G_UADDO. The DAG x86 backend has a custom lowering for this, so that will need to be implemented. llvm-svn: 369673	2019-08-22 17:29:17 +00:00
Francis Visoiu Mistrih	5b5ee61b5f	[MachO][TLOF] Use hasLocalLinkage to determine if indirect symbol is local Local symbols in the indirect symbol table contain the value `INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must contain the address of the target. In r349060, I added support for local symbols in the indirect symbol table, which was checking if the symbol `isDefined` && `!isExternal` to determine if the symbol is local or not. It turns out that `isDefined` will return false if the user of the symbol comes before its definition, and we'll again generate .long 0 which will be the symbol at the adress 0x0. Instead of doing that, use GlobalValue::hasLocalLinkage() to check if the symbol is local. Differential Revision: https://reviews.llvm.org/D66563 llvm-svn: 369671	2019-08-22 16:59:00 +00:00
Guozhi Wei	51f48295cb	[MBP] Disable aggressive loop rotate in plain mode Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 369664	2019-08-22 16:21:32 +00:00
Amaury Sechet	95cf66de7c	[DAGCombiner] Remove explicit call to AddToWorklist in sqrt and reciprocal computations Summary: These nodes end up being processed regardless due to DAGCombiner ensuring arguments are processed. This changes the order in which nodes are processed, which fixes an issue on PowerPC. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri, mcberg2017, stefanp, hfinkel Subscribers: nemanjai, MaskRay, jsji, steven.zhang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66548 llvm-svn: 369662	2019-08-22 15:35:45 +00:00
Jinsong Ji	545e993b8b	[SlotIndexes] Add print-slotindexes to disable printing slotindexes Summary: When we print the IR with --print-after/before-*, SlotIndexes will be printed whenever available (We haven't freed it). This introduces some noises when we try to compare the IR among different optimizations. eg: -print-before=machine-cp will print SlotIndexes for 1st machine-cp pass, but NOT for 2nd machine-cp; -print-after=machine-cp will NOT print SlotIndexes for both machine-cp passes. So SlotIndexes in 1st pass introduce noises when differing these IRs. This patch introduces an option to hide indexes. Reviewers: stoklund, thegameg, qcolombet Reviewed By: thegameg Subscribers: hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66500 llvm-svn: 369650	2019-08-22 13:44:47 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Fangrui Song	246750c2a9	[COFF] Fix section name for constants larger than 64 bits on Windows APIntToHexString returns wrong value ("0000000000000000ffffffffffffffff") for integer larger than 64 bits, and thus TargetLoweringObjectFileCOFF::getSectionForConstant returns same section name for all numbers larger than 64 bits. This patch tries to fix it. Differential Revision: https://reviews.llvm.org/D66458 Patch by Senran Zhang llvm-svn: 369610	2019-08-22 01:48:34 +00:00
Craig Topper	3f59bfd5be	[MVT] Add v16f16 and v32f16 vectors. I might look at improving PR43065 which will require being able to mark a 256 and 512 bit vector of f16 as Legal. Differential Revision: https://reviews.llvm.org/D66515 llvm-svn: 369565	2019-08-21 19:14:48 +00:00
Amaury Sechet	c0f190a048	[DAGCombiner] Remove mostly redundant calls to AddToWorklist Summary: These calls change the order in which some nodes are processed and so have an effect on codegen. The change in fixup-bw-copy.ll is due to (and (load anyext)) gets transformed into (load zext) while previously the and was removed by SimplifyDemandedBits, so the (load anyext) remained. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66543 llvm-svn: 369561	2019-08-21 18:51:08 +00:00
Matt Arsenault	954a012b4c	GlobalISel: Implement moreElementsVector for G_UNMERGE_VALUES sources This is necessary for handling <3 x s16> on AMDGPU, assuming this should be handled as 2 separate legalization actions. The alternative would be for fewerElementsVector to handle 3->2. llvm-svn: 369547	2019-08-21 16:59:10 +00:00
Nilanjana Basu	ac3851c434	Improving CodeView debug info type record's inline comments llvm-svn: 369533	2019-08-21 15:19:58 +00:00
Alexander Timofeev	78347c979e	[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions Differential Revision: https://reviews.llvm.org/D63731 Reviewers: qcolombet, rampitec llvm-svn: 369532	2019-08-21 15:15:04 +00:00
Guillaume Chatelet	1c18a9cb9e	[LLVM][Alignment] Introduce Alignment In MachineFrameInfo Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: jfb Subscribers: hiraditya, dexonsmith, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65800 llvm-svn: 369531	2019-08-21 14:29:30 +00:00
Amaury Sechet	045f33aec9	[DAGCombiner] Various nits. NFC llvm-svn: 369520	2019-08-21 12:01:37 +00:00
Petar Avramovic	5b4c5c2c54	[MIPS GlobalISel] NarrowScalar G_TRUNC Add NarrowScalar for G_TRUNC when NarrowTy is half the size of source. NarrowScalar G_TRUNC to s32 for MIPS32. Differential Revision: https://reviews.llvm.org/D66202 llvm-svn: 369509	2019-08-21 09:26:39 +00:00
Jeremy Morse	67443c3c6e	[DebugInfo] Avoid dropping location info across block boundaries LiveDebugValues propagates variable locations between blocks by creating new DBG_VALUE insts in the successors, then interpreting them when it passes back through the block at a later time. However, this flushes out any extra information about the location that LiveDebugValues holds: for example, connections between variable locations such as discussed in D65368. And as reported in PR42772 this causes us to lose track of the fact that a spill-location is actually a spill, not a register location. This patch fixes that by deferring the creation of propagated DBG_VALUEs until after propagation has completed: instead location propagation occurs only by sharing location ID numbers between blocks. Differential Revision: https://reviews.llvm.org/D66412 llvm-svn: 369508	2019-08-21 09:22:31 +00:00
Amara Emerson	56606a4db3	[AArch64][GlobalISel] Add support for narrowScalar of G_ZEXT We do this by merging the source with the high bits set to 0. Differential Revision: https://reviews.llvm.org/D66181 llvm-svn: 369480	2019-08-21 00:12:37 +00:00
Craig Topper	ba375263e8	[DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef) I also had to add a new combine to X86's combineExtractSubvector to prevent a regression. This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation. I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that. Differential Revision: https://reviews.llvm.org/D66456 llvm-svn: 369459	2019-08-20 22:12:50 +00:00
Sean Fertile	1e46d4cec5	Adds support for writing the .bss section for XCOFF object files. Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target object writer. Also adds a class to represent the top-level sections, which we materialize in the ObjectWriter. executePostLayoutBinding will map all csects into the appropriate container depending on its storage mapping class, and map all symbols into their containing csect. Once all symbols have been processed we - Assign addresses and symbol table indices. - Calaculte section sizes. - Build the section header table. - Assign the sections raw-pointer value for non-virtual sections. Since the .bss section is virtual, writing the header table is enough to add support. Writing of a sections raw data, or of any relocations is not included in this patch. Testing is done by dumping the section header table, but it needs to be extended to include dumping the symbol table once readobj support for dumping auxiallary entries lands. Differential Revision: https://reviews.llvm.org/D65159 llvm-svn: 369454	2019-08-20 22:03:18 +00:00
Aditya Nandakumar	08bd080872	[GlobalISel] Handle multiple registers in dbg.value intrinsic https://reviews.llvm.org/D66077 The value passed into dbg.value may relate to multiple registers, each of which need a DBG_VALUE. This fix calls MIRBuilder.buildDirectDbgValue for each register. Without this, IR passed in from flang-compiler/flang may fail an assertion in getOrCreateVReg. Patch by : peterwaller-arm. llvm-svn: 369403	2019-08-20 16:28:37 +00:00
Thomas Raoux	be699bf389	[CodeGen] Add a pass to do block predication on SSA machine IR. For targets requiring aggressive scheduling and/or software pipeline we need to apply predication before preRA scheduling. This adds a pass re-using the early if-cvt infrastructure but generating predicated instructions instead of speculatively executing instructions. It allows doing if conversion on blocks containing instructions with side-effects. The pass re-use the target hook from postRA if-conversion to let the target decide on the heuristic to apply. Differential Revision: https://reviews.llvm.org/D66190 llvm-svn: 369395	2019-08-20 15:54:59 +00:00
Karl-Johan Karlsson	40da6be2bd	[AsmPrinter] Remove const qualifier from EmitBasicBlockStart. Overriders may want to modify state in it. AMDGPU wants to, but has to make its members mutable in order to do so. Besides, EmitBasicBlockEnd is not const, so why should Start be? Patch by Bevin Hansson. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D66341 llvm-svn: 369325	2019-08-20 05:13:57 +00:00
Vyacheslav Zakharin	f7229ac7d8	Fixed placement of llvm.global_dtors on Windows. Differential revision: https://reviews.llvm.org/D66373 llvm-svn: 369299	2019-08-19 21:07:03 +00:00
Craig Topper	93c2787193	[CGP] Remove ModifiedDT from the makeBitReverse loop I don't think anything in this loop modifies the control flow and we don't restart any iteration after setting the flag. This code was added in http://reviews.llvm.org/D16893 but looking at the test case added there the code that caused the dominator tree to change was merging blocks with their predecessor not the bitreverse optimization. Differential Revision: https://reviews.llvm.org/D66366 llvm-svn: 369283	2019-08-19 18:02:24 +00:00
Roman Lebedev	edfaee0811	[TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling Summary: The general fold is only valid for positive divisors. Which effectively means, it is invalid for `INT_MIN` divisors, and we currently bailout if we see them. But that is too strict, we can just fix-up the results. For that, let's do a second computation 'in parallel': ``` Name: srem -> and Pre: isPowerOf2(C) %o = srem i8 %X, C %r = icmp eq %o, 0 => %n = and i8 %X, C-1 %r = icmp eq %n, 0 ``` https://rise4fun.com/Alive/Sup And then just blend results: if the divisor was `INT_MIN`, pick the value we got via bit-test, else pick the value from general fold. There's interesting observation - `ISD::ROTR` is set to `LegalizeAction::Expand` before AVX512, so we should not treat `INT_MIN` divisor as even; and as it can be seen while `@test_srem_odd_even_one` improves on all run-lines, `@test_srem_odd_even_INT_MIN` only improves for AVX512. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66300 llvm-svn: 369268	2019-08-19 15:01:42 +00:00
Jinsong Ji	0776da5236	[PeepholeOptimizer] Don't assume bitcast def always has input Summary: If we have a MI marked with bitcast bits, but without input operands, PeepholeOptimizer might crash with assert. eg: If we apply the changes in PPCInstrVSX.td as in this patch: [(set v4i32:$XT, (bitconvert (v16i8 immAllOnesV)))]>; We will get assert in PeepholeOptimizer. ``` llvm-lit llvm-project/llvm/test/CodeGen/PowerPC/build-vector-tests.ll -v llvm-project/llvm/include/llvm/CodeGen/MachineInstr.h:417: const llvm::MachineOperand &llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i < getNumOperands() && "getOperand() out of range!"' failed. ``` The fix is to abort if we found out of bound access. Reviewers: qcolombet, MatzeB, hfinkel, arsenm Reviewed By: qcolombet Subscribers: wdng, arsenm, steven.zhang, wuzish, nemanjai, hiraditya, kbarton, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65542 llvm-svn: 369261	2019-08-19 14:19:04 +00:00
David Stenberg	88df53e6ea	[DebugInfo] Allow bundled calls in the MIR's call site info Summary: Extend the MIR parser and writer so that the call site information can refer to calls that are bundled. Reviewers: aprantl, asowda, NikolaPrica, djtodoro, ivanbaev, vsk Reviewed By: aprantl Subscribers: arsenm, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D66145 llvm-svn: 369256	2019-08-19 12:41:22 +00:00
Jeremy Morse	176bbd5cde	[DebugInfo] Make postra sinking of DBG_VALUEs subregister-safe Currently the machine instruction sinker identifies DBG_VALUE insts that also need to sink by comparing register numbers. Unfortunately this isn't safe, because (after register allocation) a DBG_VALUE may read a register that aliases what's being sunk. To fix this, identify the DBG_VALUEs that need to sink by recording & examining their register units. Register units gives us the following guarantee: "Two registers overlap if and only if they have a common register unit" [MCRegisterInfo.h] Thus we can always identify aliasing DBG_VALUEs if the set of register units read by the DBG_VALUE, and the register units of the instruction being sunk, intersect. (MachineSink already uses classes like "LiveRegUnits" for determining sinking validity anyway). The test added checks for super and subregister DBG_VALUE reads of a sunk copy being sunk as well. Differential Revision: https://reviews.llvm.org/D58191 llvm-svn: 369247	2019-08-19 09:53:07 +00:00
Craig Topper	74168ded03	[TargetLowering] Teach computeRegisterProperties to only widen v3i16/v3f16 vectors to the next power of 2 type if that's legal. These were recently made simple types. This restores their behavior back to something like their EVT legalization. We might be able to fix the code in type legalization where the assert was failing, but I didn't investigate too much as I had already looked at the computeRegisterProperties code during the review for v3i16/v3f16. Most of the test changes restore the X86 codegen back to what it looked like before the recent change. The test case in vec_setcc.ll and is a reduced version of the reproducer from the fuzzer. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16490 llvm-svn: 369205	2019-08-18 06:28:06 +00:00
Craig Topper	f43106e341	[SelectionDAG] Add a node creation debug message to getMachineNode. llvm-svn: 369204	2019-08-18 06:28:00 +00:00
Kang Zhang	b3d258fc44	[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: Fix a bug of preducessors. In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. But the `early-ret` pass is before `block-placement`, we don't want to run it again. This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 369191	2019-08-17 14:37:05 +00:00
Sanjay Patel	acceedb15f	[CodeGenPrepare] Fix use-after-free If OptimizeExtractBits() encountered a shift instruction with no operands at all, it would erase the instruction, but still return false. This previously didn’t matter because its caller would always return after processing the instruction, but https://reviews.llvm.org/D63233 changed the function’s caller to fall through if it returned false, which would then cause a use-after-free detectable by ASAN. This change makes OptimizeExtractBits return true if it removes a shift instruction with no users, terminating processing of the instruction. Patch by: @brentdax (Brent Royal-Gordon) Differential Revision: https://reviews.llvm.org/D66330 llvm-svn: 369168	2019-08-16 23:10:34 +00:00
Evgeniy Stepanov	187c63f145	Escape % in printf format string. Fixes branch-relax-block-size.mir on the ASan builder. llvm-svn: 369138	2019-08-16 18:23:54 +00:00
Amara Emerson	c809230a69	[AArch64][GlobalISel] Lower G_SHUFFLE_VECTOR with 1 elt src and 1 elt mask. Again, it's weird that these are allowed. Since lowering support was added in r368709 we started crashing on compiling the neon intrinsics test in the test suite. This fixes the lowering to fold the 1 elt src/mask case into copies. llvm-svn: 369135	2019-08-16 18:06:53 +00:00
Guozhi Wei	e03f6a1631	[CodeGen/Analysis] Intrinsic llvm.assume should not block tail call optimization In function Analysis.cpp:isInTailCallPosition, instructions between call and ret are checked to see if they block tail call optimization. If an instruction is an intrinsic call, only llvm.lifetime_end is allowed and other intrinsic functions block tail call. When compiling tcmalloc, we found llvm.assume between a hot function call and ret, it blocks the optimization. But llvm.assume doesn't generate instructions, it should not block tail call. Differential Revision: https://reviews.llvm.org/D66096 llvm-svn: 369125	2019-08-16 16:26:12 +00:00
Florian Hahn	403e85cbc5	Revert [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks This reverts r368997 (git commit `2a903c0b67`) It looks like this commit adds invalid predecessors to MBBs. The example below fails the verifier after MachineBlockPlacement (run llc -verify-machineinstrs): @global.4 = external constant i8* declare i32 @zot(...) define i16* @snork.67() personality i8* bitcast (i32 (...)* @zot to i8) { bb: invoke void undef() to label %bb5 unwind label %bb4 bb4: ; preds = %bb %tmp = landingpad { i8, i32 } catch i8* null unreachable bb5: ; preds = %bb %tmp6 = load i32, i32* null, align 4 %tmp7 = icmp eq i32 %tmp6, 0 br i1 %tmp7, label %bb14, label %bb8 bb8: ; preds = %bb11, %bb5 invoke void undef() to label %bb9 unwind label %bb11 bb9: ; preds = %bb8 %tmp10 = invoke i16* undef() to label %bb14 unwind label %bb11 bb11: ; preds = %bb9, %bb8 %tmp12 = landingpad { i8, i32 } cleanup catch i8 bitcast (i8** @global.4 to i8) %tmp13 = icmp ult i64 undef, undef br i1 %tmp13, label %bb8, label %bb14 bb14: ; preds = %bb11, %bb9, %bb5 %tmp15 = phi i16 [ null, %bb5 ], [ null, %bb11 ], [ %tmp10, %bb9 ] ret i16* %tmp15 } llvm-svn: 369104	2019-08-16 13:19:29 +00:00
Bjorn Pettersson	9dddd26e31	[DAGCombiner] Add simple folds for SMULFIX/UMULFIX/SMULFIXSAT Summary: Add the following DAGCombiner folds for mulfix being one of SMULFIX/UMULFIX/SMULFIXSAT: (mulfix x, undef, scale) -> 0 (mulfix x, 0, scale) -> 0 Also added canonicalization of constants to RHS. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66052 llvm-svn: 369103	2019-08-16 13:16:48 +00:00
Jeremy Morse	8b593480d3	[DebugInfo] Handle complex expressions with spills in LiveDebugValues In r369026 we disabled spill-recognition in LiveDebugValues for anything that has a complex expression. This is because it's hard to recover the complex expression once the spill location is baked into it. This patch re-enables spill-recognition and slightly adjusts the DBG_VALUE insts that LiveDebugValues tracks: instead of tracking the last DBG_VALUE for a variable, it tracks the last _unspilt_ DBG_VALUE. The spill-restore code is then able to access and copy the original complex expression; but the rest of LiveDebugValues has to be aware of the slight semantic shift, and produce a new spilt location if a spilt location is propagated between blocks. The test added produces an incorrect variable location (see FIXME), which will be the subject of future work. Differential Revision: https://reviews.llvm.org/D65368 llvm-svn: 369092	2019-08-16 10:04:17 +00:00
Volkan Keles	0ae6006bee	[GlobalISel] CSEMIRBuilder: Add support for G_GEP Summary: This patch adds G_GEP to `shouldCSEOpc` so that it can be CSEd. It also refactors `translateGetElementPtr` by replacing `createGenericVirtualRegister` calls with types. Reviewers: aditya_nandakumar, arsenm, dsanders, paquette, aemerson Reviewed By: aditya_nandakumar Subscribers: wdng, rovka, javed.absar, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66316 llvm-svn: 369070	2019-08-15 23:45:45 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
Matt Arsenault	1f2b727298	MVT: Add v3i16/v3f16 vectors AMDGPU has some buffer intrinsics which theoretically could use this. Some of the generated tables include the 3 and 4 element vector versions of these rounded to 64-bits, which is ambiguous. Add these to help the table disambiguate these. Assertion change is for the path odd sized vectors now take for R600. v3i16 is widened to v4i16, which then needs to be promoted to v4i32. llvm-svn: 369038	2019-08-15 18:58:25 +00:00
Philip Reames	d202899431	[NFC] Add a couple of dump routines for RegisterPressure helper classes llvm-svn: 369037	2019-08-15 18:49:39 +00:00
Jeremy Morse	c476124bc8	[DebugInfo] Avoid crash from dropped fragments in LiveDebugValues This patch avoids a crash caused by DW_OP_LLVM_fragments being dropped from DIExpressions by LiveDebugValues spill-restore code. The appearance of a previously unseen fragment configuration confuses LDV, as documented in PR42773, and reproduced by the test function this patch adds (Crashes on a x86_64 debug build). To avoid this, on spill restore, we now use fragment information from the spilt-location-expression. In addition, when spilling, we now don't spill any DBG_VALUE with a complex expression, as it can't be safely restored and will definitely lead to an incorrect variable location. The discussion of this is in D65368. Differential Revision: https://reviews.llvm.org/D66284 llvm-svn: 369026	2019-08-15 17:49:46 +00:00
Jonas Devlieghere	0eaee545ee	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
Simon Pilgrim	d4df81f463	Remove SmallBitVector.h include. NFCI. SmallBitVector/BitVector types aren't used at all in the cpp file. llvm-svn: 369008	2019-08-15 14:40:37 +00:00
Simon Pilgrim	983e9118a2	Remove BitVector.h include. NFCI. BitVector type isn't used at all in the cpp file. llvm-svn: 369007	2019-08-15 14:39:28 +00:00
Simon Pilgrim	ed804dad1e	[DAGCombine] MergeConsecutiveStores - fix cppcheck/MSVC extension warning. NFCI. Set the StartIdx type to size_t so that it matches the StoreNodes SmallVector size() and index types. Silences the MSVC analyzer warning that unsigned increment might overflow before exceeding size_t on 64-bit targets - this isn't likely to happen but it means we use consistent types and reduces the warning "noise" a little. llvm-svn: 368998	2019-08-15 13:07:14 +00:00
Kang Zhang	2a903c0b67	[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: This patch has trigger a bug of r368339, and the r368339 has been reverted, So upstream this patch again. In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. But the `early-ret` pass is before `block-placement`, we don't want to run it again. This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 368997	2019-08-15 13:05:16 +00:00
Sanjay Patel	57d459309d	[SDAG][x86] check for relaxed math when matching an FP reduction If the last step in an FP add reduction allows reassociation and doesn't care about -0.0, then we are free to recognize that computation as a reduction that may reorder the intermediate steps. This is requested directly by PR42705: https://bugs.llvm.org/show_bug.cgi?id=42705 and solves PR42947 (if horizontal math instructions are actually faster than the alternative): https://bugs.llvm.org/show_bug.cgi?id=42947 Differential Revision: https://reviews.llvm.org/D66236 llvm-svn: 368995	2019-08-15 12:43:15 +00:00
Florian Hahn	de1d6c8220	Add ptrmask intrinsic This patch adds a ptrmask intrinsic which allows masking out bits of a pointer that must be zero when accessing it, because of ABI alignment requirements or a restriction of the meaningful bits of a pointer through the data layout. This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged pointers) and allows us to not lose information about the underlying object. Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune Reviewed by: sanjoy, jdoerfert Differential Revision: https://reviews.llvm.org/D59065 llvm-svn: 368986	2019-08-15 10:12:26 +00:00
Craig Topper	e7ea06b7d2	[SelectionDAGBuilder] Teach gather/scatter getUniformBase to look through vector zeroinitializer indices in addition to scalar zeroes. llvm-svn: 368926	2019-08-14 21:38:56 +00:00
Sanjay Patel	ecccf29e6c	[SDAG] move variable closer to use; NFC llvm-svn: 368905	2019-08-14 19:46:15 +00:00
Taewook Oh	df7022825c	[DebugInfo] Consider debug label scope has an extra lexical block file Summary: There are places where a case that debug label scope has an extra lexical block file is not considered properly. The modified test won't pass without this patch. Reviewers: aprantl, HsiangKai Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66187 llvm-svn: 368891	2019-08-14 17:58:45 +00:00
Jeremy Morse	90c2794bfc	[DebugInfo] MCP: collect and update DBG_VALUEs encountered in local block MCP currently uses changeDebugValuesDefReg / collectDebugValues to find debug users of a register, however those functions assume that all DBG_VALUEs immediately follow the specified instruction, which isn't necessarily true. This is going to become very often untrue when we turn off CodeGenPrepare::placeDbgValues. Instead of calling changeDebugValuesDefReg on an instruction to change its debug users, in this patch we instead collect DBG_VALUEs of copies as we iterate over insns, and update the debug users of copies that are made dead. This isn't a non-functional change, because MCP will now update DBG_VALUEs that aren't immediately after a copy, but refer to the same register. I've hijacked the regression test for PR38773 to test for this new behaviour, an entirely new test seemed overkill. Differential Revision: https://reviews.llvm.org/D56265 llvm-svn: 368835	2019-08-14 12:20:02 +00:00
Fangrui Song	8caa0aaa4d	[AsmPrinter] Delete redundant .type foo, @function when emitting an ifunc In MCAsmStreamer: .type foo,@function # <--- this is redundant .type foo,@gnu_indirect_function In MCELFStreamer, the latter STT_GNU_IFUNC overrides STT_FUNC. llvm-svn: 368823	2019-08-14 10:30:27 +00:00
Aditya Nandakumar	c65ac865c3	[GlobalISel]: Fix lowering of G_Shuffle_vector where we pick up the wrong source index https://reviews.llvm.org/D66182 llvm-svn: 368781	2019-08-14 01:23:33 +00:00
Aditya Nandakumar	615eee6402	[GlobalISel]: Fix lowering of G_SHUFFLE_VECTOR with scalar sources https://reviews.llvm.org/D66171 llvm-svn: 368753	2019-08-13 21:49:11 +00:00
Matt Arsenault	28215caa60	GlobalISel: Partially implement fewerElementsVector G_UNMERGE_VALUES Odd sized vectors aren't handled yet. llvm-svn: 368713	2019-08-13 16:26:28 +00:00
Matt Arsenault	690645bda0	GlobalISel: Implement lower for G_SHUFFLE_VECTOR llvm-svn: 368709	2019-08-13 16:09:07 +00:00
Matt Arsenault	0a04a06250	GlobalISel: Add more verifier checks for G_SHUFFLE_VECTOR llvm-svn: 368705	2019-08-13 15:52:21 +00:00
Matt Arsenault	5af9cf042f	GlobalISel: Change representation of shuffle masks Currently shufflemasks get emitted as any other constant, and you end up with a bunch of virtual registers of G_CONSTANT with a G_BUILD_VECTOR. The AArch64 selector then asserts on anything that doesn't fit this pattern. This isn't an ideal representation, and should avoid legalization and have fewer opportunities for a representational error. Rather than invent a new shuffle mask operand type, similar to what ShuffleVectorSDNode does, just track the original IR Constant mask operand. I don't completely like the idea of adding another link to the IR, but MIR is already quite dependent on IR constants already, and this will allow sharing the shuffle mask utility functions with the IR. llvm-svn: 368704	2019-08-13 15:34:38 +00:00
Roman Lebedev	676594305a	[CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case) Summary: This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. One huge caveat: this signed case is only valid for positive divisors. While we can freely negate negative divisors, we can't negate `INT_MIN`, so for now if `INT_MIN` is encountered, we bailout. As a follow-up, it should be possible to handle that more gracefully via extra `and`+`setcc`+`select`. This passes llvm's test-suite, and from cursory(!) cross-examination the folds (the assembly) match those of GCC, and manual checking via alive did not reveal any issues (other than the `INT_MIN` case) Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65366 llvm-svn: 368702	2019-08-13 14:57:37 +00:00
Roman Lebedev	f4de7eda4a	[TargetLowering][NFC] prepareUREMEqFold(): fixup comment The comment initially matched the code, but the code was incorrect and was fixed after the initial revert back back when it was introduced, but the comment was never updated. llvm-svn: 368701	2019-08-13 14:57:08 +00:00
Hans Wennborg	5390d25f2b	Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" This introduced a false positive MemorySanitizer warning about use of uninitialized memory in a vectorized crc function in Chromium. That suggests maybe something is not right with this transformation. See https://crbug.com/992853#c7 for a reproducer. This also reverts the follow-up commits r368307 and r368308 which depended on this. > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. > > In particular this helps remove some unnecessary scalar->vector->scalar patterns. > > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. > > Differential Revision: https://reviews.llvm.org/D65887 llvm-svn: 368660	2019-08-13 09:33:25 +00:00
Amara Emerson	e14c91b71a	[GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained. Currently we can't keep any state in the selector object that we get from subtarget. As a result we have to plumb through all our variables through multiple functions. This change makes it non-const and adds a virtual init() method to allow further state to be captured for each target. AArch64 makes use of this in this patch to cache a call to hasFnAttribute() which is expensive to call, and is used on each selection of G_BRCOND. Differential Revision: https://reviews.llvm.org/D65984 llvm-svn: 368652	2019-08-13 06:26:59 +00:00
Aditya Nandakumar	70fdfed45f	[GlobalISel]: Add KnownBits for G_XOR https://reviews.llvm.org/D66119 llvm-svn: 368648	2019-08-13 04:32:33 +00:00
Daniel Sanders	a58a27513b	Eliminate implicit Register->unsigned conversions in VirtRegMap. NFC Summary: This was mostly an experiment to assess the feasibility of completely eliminating a problematic implicit conversion case in D61321 in advance of landing that* but it also happens to align with the goal of propagating the use of Register/MCRegister instead of unsigned so I believe it makes sense to commit it. The overall process for eliminating the implicit conversions from Register/MCRegister -> unsigned was to: 1. Add an explicit conversion to support genuinely required conversions to unsigned. For example, using them as an index for IndexedMap. Sadly it's not possible to have an explicit and implicit conversion to the same type and only deprecate the implicit one so I called the explicit conversion get(). 2. Temporarily annotate the implicit conversion to unsigned with LLVM_ATTRIBUTE_DEPRECATED to make them visible 3. Eliminate implicit conversions by propagating Register/MCRegister/ explicit-conversions appropriately 4. Remove the deprecation added in 2. * My conclusion is that it isn't feasible as there's too much code to update in one go. Depends on D65678 Reviewers: arsenm Subscribers: MatzeB, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65685 llvm-svn: 368643	2019-08-13 00:55:24 +00:00
Aditya Nandakumar	55371e697c	[GISel]: Fix a bug in KnownBits where we should have been using SizeInBits https://reviews.llvm.org/D66039 We were using getIndexSize instead of getIndexSizeInBits(). Added test case for G_PTRTOINT and G_INTTOPTR. llvm-svn: 368618	2019-08-12 21:28:12 +00:00
Hans Wennborg	a45f301f7a	Revert r368339 "[MBP] Disable aggressive loop rotate in plain mode" It caused assertions to fire when building Chromium: lib/CodeGen/LiveDebugValues.cpp:331: bool {anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion `Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed. See https://crbug.com/992871#c3 for how to reproduce. > Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. > > To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. > > Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368579	2019-08-12 14:23:13 +00:00
Kang Zhang	489efc68a5	Revert r368565: [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks llvm-svn: 368574	2019-08-12 14:00:31 +00:00
David Stenberg	9b29ec58b7	[DebugInfo] Remove call sites when eliminating unreachable blocks Summary: When eliminating an unreachable block we must remove any call site information for calls residing in the block. This was originally found on a downstream target, and the attached x86 test case was produced by hand-modifying some MIR. Reviewers: aprantl, asowda, NikolaPrica, djtodoro, ivanbaev, vsk Reviewed By: NikolaPrica, vsk Subscribers: vsk, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D64500 llvm-svn: 368566	2019-08-12 13:22:29 +00:00
Kang Zhang	342fb0db6d	[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. But the `early-ret` pass is before `block-placement`, we don't want to run it again. This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 368565	2019-08-12 13:15:31 +00:00
Hans Wennborg	5b96d4655c	Revert r368509 "[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks" > In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. > But the `early-ret` pass is before `block-placement`, we don't want to run it again. > This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. > > Reviewed By: efriedma > > Differential Revision: https://reviews.llvm.org/D63972 This also revertes follow-ups r368514 and r368532. llvm-svn: 368560	2019-08-12 12:43:51 +00:00
Simon Pilgrim	05e8209e33	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE llvm-svn: 368553	2019-08-12 10:56:05 +00:00
Bjorn Pettersson	27038a3780	[SelectionDAG] Widen vector results of SMULFIX/UMULFIX/SMULFIXSAT Summary: After the commits that changed x86 backend to widen vectors instead of using promotion some of our downstream tests started to fail. It was noticed that WidenVectorResult has been missing support for SMULFIX/UMULFIX/SMULFIXSAT. This patch adds the missing functionality. Reviewers: craig.topper, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66051 llvm-svn: 368540	2019-08-11 19:27:06 +00:00
Kang Zhang	b1a62d168f	[NFC][CodeGen] Use while loop instead for loop in MachineBlockPlacement::optimizeBranches() This will pass EXPENSIVE check. llvm-svn: 368532	2019-08-11 12:58:50 +00:00
Kang Zhang	555f7495df	[NFC][CodeGen] Modify the PI++ to ++PI in MachineBlockPlacement::optimizeBranches() llvm-svn: 368514	2019-08-10 16:23:17 +00:00
Kang Zhang	36cd84bdd9	[CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. But the `early-ret` pass is before `block-placement`, we don't want to run it again. This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 368509	2019-08-10 09:58:52 +00:00
Sanjay Patel	26b2c11451	[DAGCombiner] exclude x*2.0 from normal negation profitability rules This is the codegen part of fixing: https://bugs.llvm.org/show_bug.cgi?id=32939 Even with the optimal/canonical IR that is ideally created by D65954, we would reverse that transform in DAGCombiner and end up with the same asm on AArch64 or x86. I see 2 options for trying to correct this: 1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch). 2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul() that matches the transform done by DAGCombiner. This seems like the less intrusive patch, but if there's some other reason to prefer 1 option over the other, we can change to the other option. Differential Revision: https://reviews.llvm.org/D66016 llvm-svn: 368490	2019-08-09 21:37:32 +00:00
Daniel Sanders	e9a57c2b23	[globalisel] Add G_SEXT_INREG Summary: Targets often have instructions that can sign-extend certain cases faster than the equivalent shift-left/arithmetic-shift-right. Such cases can be identified by matching a shift-left/shift-right pair but there are some issues with this in the context of combines. For example, suppose you can sign-extend 8-bit up to 32-bit with a target extend instruction. %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity) %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_ASHR %2:_(s32), i32 1 would reasonably combine to: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 25 which no longer matches the special case. If your shifts and extend are equal cost, this would break even as a pair of shifts but if your shift is more expensive than the extend then it's cheaper as: %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8 %3:_(s32) = G_ASHR %2:_(s32), i32 1 It's possible to match the shift-pair in ISel and emit an extend and ashr. However, this is far from the only way to break this shift pair and make it hard to match the extends. Another example is that with the right known-zeros, this: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 24 %3:_(s32) = G_MUL %2:_(s32), i32 2 can become: %1:_(s32) = G_SHL %0:_(s32), i32 24 %2:_(s32) = G_ASHR %1:_(s32), i32 23 All upstream targets have been configured to lower it to the current G_SHL,G_ASHR pair but will likely want to make it legal in some cases to handle their faster cases. To follow-up: Provide a way to legalize based on the constant. At the moment, I'm thinking that the best way to achieve this is to provide the MI in LegalityQuery but that opens the door to breaking core principles of the legalizer (legality is not context sensitive). That said, it's worth noting that looking at other instructions and acting on that information doesn't violate this principle in itself. It's only a violation if, at the end of legalization, a pass that checks legality without being able to see the context would say an instruction might not be legal. That's a fairly subtle distinction so to give a concrete example, saying %2 in: %1 = G_CONSTANT 16 %2 = G_SEXT_INREG %0, %1 is legal is in violation of that principle if the legality of %2 depends on %1 being constant and/or being 16. However, legalizing to either: %2 = G_SEXT_INREG %0, 16 or: %1 = G_CONSTANT 16 %2:_(s32) = G_SHL %0, %1 %3:_(s32) = G_ASHR %2, %1 depending on whether %1 is constant and 16 does not violate that principle since both outputs are genuinely legal. Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61289 llvm-svn: 368487	2019-08-09 21:11:20 +00:00
Bill Wendling	79176a2542	[CodeGen] Require a name for a block addr target Summary: A block address may be used in inline assembly. In which case it requires a name so that the asm parser has something to parse. Creating a name for every block address is a large hammer, but is necessary because at the point when a temp symbol is created we don't necessarily know if it's used in inline asm. This ensures that it exists regardless. Reviewers: nickdesaulniers, craig.topper Subscribers: nathanchance, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65352 llvm-svn: 368478	2019-08-09 20:18:30 +00:00
Bill Wendling	1b10438875	[MC] Don't recreate a label if it's already used Summary: This patch keeps track of MCSymbols created for blocks that were referenced in inline asm. It prevents creating a new symbol which doesn't refer to the block. Inline asm may have a reference to a label. The asm parser however doesn't recognize it as a label and tries to create a new symbol. The result being that instead of the original symbol (e.g. ".Ltmp0") the parser replaces it in the inline asm with the new one (e.g. ".Ltmp00") without updating it in the symbol table. So the machine basic block retains the "old" symbol (".Ltmp0"), but the inline asm uses the new one (".Ltmp00"). Reviewers: nickdesaulniers, craig.topper Subscribers: nathanchance, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65304 llvm-svn: 368477	2019-08-09 20:16:31 +00:00
Sanjay Patel	0b4ae34c2f	[DAGCombiner] remove redundant fold for X*1.0; NFC This is handled at node creation time (similar to X/1.0) after: rL357029 (no fast-math-flags needed) llvm-svn: 368443	2019-08-09 14:30:59 +00:00
Jinsong Ji	6349ce5ca5	[MachinePipeliner] Avoid indeterminate order in FuncUnitSorter Summary: This is exposed by adding a new testcase in PowerPC in https://reviews.llvm.org/rL367732 The testcase got different output on different platform, hence breaking buildbots. The problem is that we get differnt FuncUnitOrder when calculateResMII. The root cause is: 1. Two MachineInstr might get SAME priority(MFUsx) from minFuncUnits. 2. Current comparison operator() will return `MFUs1 > MFUs2`. 3. We use iterators for MachineInstr, so the input to FuncUnitSorter might be different on differnt platform due to the iterator nature. So for two MI with same MFU, their order is actually depends on the iterator order, which is platform (implemtation) dependent. This is risky, and may cause cross-compiling problems. The fix is to check make sure we assign a determine order when they are equal. Reviewers: bcahoon, hfinkel, jmolloy Subscribers: nemanjai, hiraditya, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65992 llvm-svn: 368441	2019-08-09 14:10:57 +00:00
Tim Northover	e1a5f668b3	GlobalISel: pack various parameters for lowerCall into a struct. I've now needed to add an extra parameter to this call twice recently. Not only is the signature getting extremely unwieldy, but just updating all of the callsites and implementations is a pain. Putting the parameters in a struct sidesteps both issues. llvm-svn: 368408	2019-08-09 08:26:38 +00:00
Craig Topper	9158e54270	[SelectionDAG][X86] Move setcc mask splitting for mload/mstore/mgather/mscatter from DAGCombiner to the type legalizer. We may be able to look to how VSELECT is handled to further improve this, but this appears to be neutral or an improvement on the test cases we have. llvm-svn: 368344	2019-08-08 21:14:08 +00:00
Craig Topper	bce4d79f37	[LegalizeTypes] Remove SplitVSETCC helper and just call SplitVecRes_SETCC. llvm-svn: 368343	2019-08-08 21:13:58 +00:00
Guozhi Wei	80347c3acc	[MBP] Disable aggressive loop rotate in plain mode Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse. To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true. Differential Revision: https://reviews.llvm.org/D65673 llvm-svn: 368339	2019-08-08 20:25:23 +00:00
Brian Cain	6dbbd0f343	[llvm-mc] Add reportWarning() to MCContext Adding reportWarning() to MCContext, so that it can be used from the Hexagon assembler backend. llvm-svn: 368327	2019-08-08 19:13:23 +00:00
David Tenty	8558aac82c	Enable assembly output of local commons for AIX Summary: This patch enable assembly output of local commons for AIX using .lcomm directives. Adds a EmitXCOFFLocalCommonSymbol to MCStreamer so we can emit the AIX version of .lcomm assembly directives which include a csect name. Handle the case of BSS locals in PPCAIXAsmPrinter by using EmitXCOFFLocalCommonSymbol. Adds a test for generating .lcomm on AIX Targets. Reviewers: cebowleratibm, hubert.reinterpretcast, Xiangling_L, jasonliu, sfertile Reviewed By: sfertile Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64825 llvm-svn: 368306	2019-08-08 15:40:35 +00:00
Simon Pilgrim	e2e366797e	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. Differential Revision: https://reviews.llvm.org/D65887 llvm-svn: 368276	2019-08-08 10:37:03 +00:00
Amy Huang	0b870b969f	Recommit "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" with a fix to clear the SDNode map when SelectionDAG is cleared. llvm-svn: 368230	2019-08-07 22:49:40 +00:00
Bob Haarman	885fa02da9	Revert r367501 "Create unique, but identically-named ELF sections..." This reverts commit `fbc563e2cb` "Create unique, but identically-named ELF sections for explicitly-sectioned functions and globals when using -function-sections and -data-sections." Reason for revert: sections are created with potentially wrong attributes. llvm-svn: 368204	2019-08-07 20:45:23 +00:00
Tim Northover	3c10f346dc	GlobalISel: factor common code from translateCall and translateInvoke. NFC. llvm-svn: 368166	2019-08-07 12:43:53 +00:00
Simon Pilgrim	0eafe011ca	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway llvm-svn: 368155	2019-08-07 11:43:13 +00:00
Kai Luo	02b8056cc1	[MachineCSE][NFC] Use 'profitable' rather than 'beneficial' to name method. llvm-svn: 368124	2019-08-07 05:40:21 +00:00
Aditya Nandakumar	6bbfde5c48	[GISel]: Fix trivial build breakage llvm-svn: 368067	2019-08-06 17:53:04 +00:00
Aditya Nandakumar	c8ac029d0a	[GISel]: Add GISelKnownBits analysis https://reviews.llvm.org/D65698 This adds a KnownBits analysis pass for GISel. This was done as a pass (compared to static functions) so that we can add other features such as caching queries(within a pass and across passes) in the future. This patch only adds the basic pass boiler plate, and implements a lazy non caching knownbits implementation (ported from SelectionDAG). I've also hooked up the AArch64PreLegalizerCombiner pass to use this - there should be no compile time regression as the analysis is lazy. llvm-svn: 368065	2019-08-06 17:18:29 +00:00
Simon Pilgrim	dae5ddad9d	[TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops If we demand no bits/elts from an Op, just return UNDEF llvm-svn: 368043	2019-08-06 14:30:42 +00:00
Igor Kudrin	f26a70a5e7	Switch LLVM to use 64-bit offsets (2/5) This updates all libraries and tools in LLVM Core to use 64-bit offsets which directly or indirectly come to DataExtractor. Differential Revision: https://reviews.llvm.org/D65638 llvm-svn: 368014	2019-08-06 10:49:40 +00:00
Ulrich Weigand	7b24dd741c	[Strict FP] Allow custom operation actions This patch changes the DAG legalizer to respect the operation actions set by the target for strict floating-point operations. (Currently, the legalizer will usually fall back to mutate to the non-strict action (which is assumed to be legal), and only skip mutation if the strict operation is marked legal.) With this patch, if whenever a strict operation is marked as Legal or Custom, it is passed to the target as usual. Only if it is marked as Expand will the legalizer attempt to mutate to the non-strict operation. Note that this will now fail if the non-strict operation is itself marked as Custom -- the target will have to provide a Custom definition for the strict operation then as well. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D65226 llvm-svn: 368012	2019-08-06 10:43:13 +00:00
Cullen Rhodes	ced419f4d7	[SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER Summary: Before this patch MGATHER/MSCATTER is capable of representing all common addressing modes, but only when illegal types are used. This patch adds an IndexType property so more representations are available when using legal types only. Original modes: vector of bases base + vector of signed scaled offsets New modes: base + vector of signed unscaled offsets base + vector of unsigned scaled offsets base + vector of unsigned unscaled offsets The current behaviour of addressing modes for gather/scatter remains unchanged. Patch by Paul Walker. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D65636 llvm-svn: 368008	2019-08-06 09:46:13 +00:00
Matt Arsenault	f4d3113a5f	CodeGen: Migration to using Register llvm-svn: 367974	2019-08-06 03:59:31 +00:00
Amara Emerson	bc1172df14	[GlobalISel][CallLowering] Rename isArgumentHandler() -> isIncomingArgumentHandler() Previous name and comment incorrectly implied it was just for formal arg handlers, which is not true. llvm-svn: 367945	2019-08-05 23:05:28 +00:00
Daniel Sanders	eac86ec25f	Revert Register/MCRegister: Add conversion operators to avoid use of implicit convert to unsigned. NFC MSVC finds ambiguity where clang doesn't and it looks like it's not going to be an easy fix Reverting while I figure out how to fix it This reverts r367916 (git commit `aa15ec3c23`) This reverts r367920 (git commit `5d14efe279`) llvm-svn: 367932	2019-08-05 21:34:45 +00:00
Daniel Sanders	5d14efe279	Fix MSVC error after r367916 It seems that MSVC sees ambiguity between the operator==()'s where clang doesn't llvm-svn: 367920	2019-08-05 20:03:43 +00:00
Amara Emerson	85e5e28ab4	[AArch64][GlobalISel] Inline tiny memcpy et al at -O0. FastISel already does this since the initial arm64 port was upstreamed, so it seems there are no issues with doing this at -O0 for very small memcpys. Gives a 0.2% geomean code size improvement on CTMark. Differential Revision: https://reviews.llvm.org/D65758 llvm-svn: 367919	2019-08-05 20:02:52 +00:00
Matt Arsenault	3922392969	AMDGPU: Correct behavior of f16 buffer loads Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879	2019-08-05 15:59:07 +00:00
Nilanjana Basu	da60fc813c	Changing representation of .cv_def_range directives in Codeview debug info assembly format for better readability llvm-svn: 367867	2019-08-05 14:16:58 +00:00
Nilanjana Basu	b5e4d7de17	Revert "Changing representation of .cv_def_range directives in Codeview debug info assembly format for better readability" This reverts commit `a885afa9fa`. llvm-svn: 367861	2019-08-05 13:55:21 +00:00
Nilanjana Basu	a885afa9fa	Changing representation of .cv_def_range directives in Codeview debug info assembly format for better readability llvm-svn: 367850	2019-08-05 13:11:51 +00:00
Sanjay Patel	eaf13044bd	[DAGCombiner][x86] prevent infinite loop from truncate/extend transforms The test case is based on the example from the post-commit thread for: https://reviews.llvm.org/rGc9171bd0a955 This replaces the x86-specific simple-type check from: rL367766 with a check in the DAGCombiner. Adding the check isn't strictly necessary after the fix from: rL367768 ...but it seems likely that we're heading for trouble if we are creating weird types in this transform. I combined the earlier legality check into the initial clause to simplify the code. So we should only try the trunc/sext transform at the earliest combine stage, but we limit the transform to simple types anyway because the TLI hook is probably too lax about what it considers a free truncate. llvm-svn: 367834	2019-08-05 11:27:07 +00:00
Graham Hunter	208d63ea90	[MVT][SVE] Map between scalable vector IR Type and VTs Adds a two way mapping between the scalable vector IR type and corresponding SelectionDAG ValueTypes. Reviewers: craig.topper, jeroen.dobbelaere, fhahn, rengolin, greened, rovka Reviewed By: greened Differential Revision: https://reviews.llvm.org/D47770 llvm-svn: 367832	2019-08-05 11:18:19 +00:00
Guillaume Chatelet	c97a3d15d2	[LLVM][Alignment] Introduce Alignment Type Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828	2019-08-05 11:02:05 +00:00
Guillaume Chatelet	6c5fb61f8b	[LLVM][Alignment] Introduce Alignment In CallingConv Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Subscribers: hiraditya, llvm-commits, courbet, jfb Tags: #llvm Differential Revision: https://reviews.llvm.org/D65659 llvm-svn: 367822	2019-08-05 09:49:09 +00:00
Oliver Stannard	8ed8353fc4	Reland: Fix and test inter-procedural register allocation for ARM Add an explicit construction of the ArrayRef, gcc 5 and earlier don't seem to select the ArrayRef constructor which takes a C array when the construction is implicit. Original commit message: - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367819	2019-08-05 09:04:10 +00:00
Guillaume Chatelet	65e4b47aad	[LLVM][Alignment] Introduce Alignment Type in DataLayout Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65521 Make getFunctionPtrAlign() return MaybeAlign llvm-svn: 367817	2019-08-05 09:00:43 +00:00
Fangrui Song	d9b948b6eb	Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC F_{None,Text,Append} are kept for compatibility since r334221. llvm-svn: 367800	2019-08-05 05:43:48 +00:00
Craig Topper	5a4989e2ac	[TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users. Summary: The SimplifyDemandedVectorElts function can replace with undef when no elements are demanded, but due to how it interacts with TargetLoweringOpts, it can only do this when the node has no other users. Remove a now unneeded DAG combine from the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65713 llvm-svn: 367788	2019-08-04 17:30:41 +00:00
Craig Topper	76f0f2e0f0	[SelectionDAG] Add node creation debug message to getMemIntrinsicNode. llvm-svn: 367771	2019-08-04 02:32:06 +00:00
Craig Topper	2edeb8a11a	[DAGCombiner] Prevent the combine added in r367710 from creating illegal types after type legalization. This is further fix for PR42880. Sanjay already disabled the X86 TLI hook for non-simple types, but we should really call isTypeLegal here if we're after type legalization. llvm-svn: 367768	2019-08-03 23:09:13 +00:00
Bill Wendling	41a2847a9a	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750	2019-08-03 05:52:47 +00:00
Amara Emerson	c835164a47	Re-commit "[GlobalISel] Add legalization support for non-power-2 loads and stores"" This is an old commit that exposed a bug in the GISel importer, which caused non-truncating stores to be selected for truncating store patterns. Now that's been fixed in r367737 this can go back in. llvm-svn: 367739	2019-08-02 23:44:24 +00:00
Craig Topper	b1cfcd1a56	[ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches for expandload/compressstore. Same as what was done for gather/scatter/load/store in r367489. Expandload/compressstore were delayed due to lack of constant masking handling that has since been fixed. llvm-svn: 367738	2019-08-02 23:43:53 +00:00
Douglas Yung	42618b270d	Revert Fix and test inter-procedural register allocation for ARM This reverts r367669 (git commit `f6b00c279a`) This was breaking a build bot http://lab.llvm.org:8011/builders/netbsd-amd64/builds/21233 llvm-svn: 367731	2019-08-02 22:11:49 +00:00
Simon Pilgrim	794f7591ec	[TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple. Noticed by inspection - this was copied from the X86 target equivalent where we can assume its legal/simple. llvm-svn: 367721	2019-08-02 21:07:07 +00:00
Daniel Sanders	e7694f34ab	Use MCRegister in MCRegisterInfo's interfaces Summary: As part of this, define DenseMapInfo for MCRegister (and Register while I'm at it) Depends on D65599 Reviewers: arsenm Subscribers: MatzeB, qcolombet, jvesely, wdng, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65605 llvm-svn: 367719	2019-08-02 20:23:00 +00:00
Philip Reames	511be2a158	[Statepoints] Fix overalignment of loads in no-realign-stack functions This really should have been part of 366765. For some reason, I forgot to handle the corresponding load side, and the readable test cases (using deopt vs statepoints) turned out to be overly reduced. Oops. As seen in the test change, the problem was that we were using a load with alignment expectations rather than the unaligned variant when the stack alignment was less than that prefered type alignment. llvm-svn: 367718	2019-08-02 20:17:37 +00:00
Craig Topper	de9b1d7912	[ScalarizeMaskedMemIntrin] Add constant mask support to expandload and compressstore scalarization This adds support for generating all the loads or stores for a constant mask into a single basic block with no conditionals. Differential Revision: https://reviews.llvm.org/D65613 llvm-svn: 367715	2019-08-02 20:04:34 +00:00
Sanjay Patel	68264558f9	[DAGCombiner] try to convert opposing shifts to casts This reverses a questionable IR canonicalization when a truncate is free: sra (add (shl X, N1C), AddC), N1C --> sext (add (trunc X to (width - N1C)), AddC') https://rise4fun.com/Alive/slRC More details in PR42644: https://bugs.llvm.org/show_bug.cgi?id=42644 I limited this to pre-legalization for code simplicity because that should be enough to reverse the IR patterns. I don't have any evidence (no regression test diffs) that we need to try this later. Differential Revision: https://reviews.llvm.org/D65607 llvm-svn: 367710	2019-08-02 19:33:46 +00:00
Eric Christopher	5fb56b1966	Temporarily Revert "Changing representation of cv_def_range directives in Codeview debug info assembly format for better readability" This is breaking bots and the author asked me to revert. This reverts commit 367704. llvm-svn: 367707	2019-08-02 19:10:37 +00:00
Nilanjana Basu	1c67521591	Changing representation of cv_def_range directives in Codeview debug info assembly format for better readability llvm-svn: 367704	2019-08-02 18:44:39 +00:00
Peter Collingbourne	4dcf8800e2	CodeGen: Don't follow aliases when extracting type info. This fixes a crash in the case where the type info object is an alias pointing to a non-zero offset within a global or is otherwise unanalyzable by the stripPointerCasts() function. Looking through the alias is not the right thing to do anyway for similar reasons as D65118. Differential Revision: https://reviews.llvm.org/D65314 llvm-svn: 367696	2019-08-02 17:43:45 +00:00
Tim Northover	522fb7eedc	GlobalISel: support swiftself attribute llvm-svn: 367683	2019-08-02 14:09:49 +00:00
Oliver Stannard	4b7239ebac	[IPRA][ARM] Disable no-CSR optimisation for ARM This optimisation isn't generally profitable for ARM, because we can save/restore many registers in the prologue and epilogue using the PUSH and POP instructions, but mostly use individual LDR/STR instructions for other spills. Differential revision: https://reviews.llvm.org/D64910 llvm-svn: 367670	2019-08-02 10:23:17 +00:00
Oliver Stannard	f6b00c279a	Fix and test inter-procedural register allocation for ARM - Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves with a null RegScavenger. Simply not updating the register scavenger is fine because IPRA only cares about the SavedRegs vector, the acutal code of the function has already been generated at this point. - Add a new hook to TargetRegisterInfo to get the set of registers which can be clobbered inside a call, even if the compiler can see both sides, by linker-generated code. Differential revision: https://reviews.llvm.org/D64908 llvm-svn: 367669	2019-08-02 10:23:05 +00:00
Kang Zhang	038dd43782	[NFC][CodeGen] Modify the type element of TailCalls to simplify the dupRetToEnableTailCallOpts() Summary: The old code can be simplified to define the element type of TailCalls as `BasicBlock` not `CallInst`. Also I use the for-range loop instead the for loop. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D64905 llvm-svn: 367644	2019-08-02 03:09:07 +00:00
Eric Christopher	5a00b0772a	Temporarily revert "Changes to improve CodeView debug info type record inline comments" due to a sanitizer failure. This reverts commit 367623. llvm-svn: 367640	2019-08-02 01:05:47 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
Nilanjana Basu	ac7e5788ca	Changes to improve CodeView debug info type record inline comments Signed-off-by: Nilanjana Basu <nilanjana.basu87@gmail.com> llvm-svn: 367623	2019-08-01 22:05:14 +00:00
Matt Arsenault	d9d30a408e	GlobalISel: Lower scalarizing unmerge of a vector to shifts AMDGPU sometimes has legal s16 and <2 x s16> operations, but all registers are really 32-bit. An unmerge destination really should ben widened to a 32-bit register. If widening a scalarizing vector with a target size that matches the vector size, bitcast to integer and extract the relevant bits with shifts. I'm not sure if this is the right place for this. This could arguably be part of widenScalar for the result. I also have a growing feeling that we're missing a bitcast legalize action. llvm-svn: 367604	2019-08-01 19:10:05 +00:00
Craig Topper	a9ed5436bd	[X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, its not any better to decompose it. This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false illegal types and letting them get handled after type legalization, but then we can't recognize and i64 constant splat on 32-bit targets since will be destroyed by type legalization. We could special case vectors of i64 to avoid that... Differential Revision: https://reviews.llvm.org/D65533 llvm-svn: 367601	2019-08-01 18:49:07 +00:00
Matt Arsenault	e56a2ad85e	CodeGen: Allow virtual registers in bundles The note in the documentation suggests this restriction is a compile time optimization for architectures that make heavy use of bundling. Allowing virtual registers in a bundle is useful for some (non-R600) AMDGPU use cases and are infrequent enough to matter. A more common AMDGPU use case has already been using virtual registers in bundles since r333691, although never calling finalizeBundle on them and manually creating the use/def list on the BUNDLE instruction. This is also relatively infrequent, and only happens for consecutive sequences of some load/store types. llvm-svn: 367597	2019-08-01 18:41:28 +00:00
Matt Arsenault	5faa533e47	GlobalISel: Fix widenScalar for G_MERGE_VALUES to pointer AMDGPU testcase isn't broken now, but will be in a future patch without this. llvm-svn: 367591	2019-08-01 18:13:16 +00:00
Simon Pilgrim	1d183b407a	[TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling Allow us to peek through vector insertions to avoid dependencies on entire insertion chains. llvm-svn: 367588	2019-08-01 17:46:44 +00:00
Craig Topper	388df2ea19	[SelectionDAG] Use APInt::isSubsetOf/intersects to simplify some code. Also use KnownBits::isNegative/isNonNegative to further simplify. llvm-svn: 367518	2019-08-01 06:06:21 +00:00
Matt Arsenault	7bedceb5b2	GlobalISel: moreElementsVector for G_LOAD/G_STORE AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503	2019-08-01 01:44:22 +00:00
Peter Collingbourne	fbc563e2cb	Create unique, but identically-named ELF sections for explicitly-sectioned functions and globals when using -function-sections and -data-sections. This allows functions and globals to to be reordered later in the linking phase (using the -symbol-ordering-file) even though reordering will be limited to the scope of the explicit section. Patch by Rahman Lavaee! Differential Revision: https://reviews.llvm.org/D65478 llvm-svn: 367501	2019-08-01 01:38:53 +00:00
Amy Huang	153f20057c	Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and and partial fix. Causes windows buildbot errors. This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and `53da7ca943`. llvm-svn: 367496	2019-07-31 23:59:31 +00:00
Craig Topper	b70026c43c	[ScalarizeMaskedMemIntrin] Bitcast the mask to the scalar domain and use scalar bit tests for the branches. X86 at least is able to use movmsk or kmov to move the mask to the scalar domain. Then we can just use test instructions to test individual bits. This is more efficient than extracting each mask element individually. I special cased v1i1 to use the previous behavior. This avoids poor type legalization of bitcast of v1i1 to i1. I've skipped expandload/compressstore as I think we need to handle constant masks for those better first. Many tests end up with duplicate test instructions due to tail duplication in the branch folding pass. But the same thing happens when constructing similar code in C. So its not unique to the scalarization. Not sure if this lowering code will also be good for other targets, but we're only testing X86 today. Differential Revision: https://reviews.llvm.org/D65319 llvm-svn: 367489	2019-07-31 22:58:15 +00:00
Michael Berg	005d705d43	Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context. Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper Reviewed By: spatel Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji Differential Revision: https://reviews.llvm.org/D65170 llvm-svn: 367486	2019-07-31 21:57:28 +00:00
Amy Huang	27a73dd02c	Fix to r367374 "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" after windows buildbot failure. Added a check that the MachineInstr exists and is a call before trying to add symbols around it. llvm-svn: 367483	2019-07-31 21:03:38 +00:00
Eric Christopher	36fb93982f	Fix unused variable warning for non-assert builds. llvm-svn: 367482	2019-07-31 21:02:03 +00:00
Mark Lacey	7b8d3eb9e2	[GISel] Pass MD_callees metadata down in call lowering. Summary: This will make it possible to improve IPRA by taking into account register usage in indirect calls. NFC yet; this is just laying the groundwork to start building up patches to take advantage of the information for improved register allocation. Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65488 llvm-svn: 367476	2019-07-31 20:34:02 +00:00
Peter Collingbourne	33773d5cfc	SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned. This makes the field wider than MachineOperand::SubReg_TargetFlags so that we don't end up silently truncating any higher bits. We should still catch any bits truncated from the MachineOperand field as a consequence of the assertion in MachineOperand::setTargetFlags(). Differential Revision: https://reviews.llvm.org/D65465 llvm-svn: 367474	2019-07-31 20:14:09 +00:00
Wei Mi	f49c107f06	[DAGCombine] Limit the number of times for the same store and root nodes to bail out in store merging dependence check. We run into a case where dependence check in store merging bail out many times for the same store and root nodes in a huge basicblock. That increases compile time by almost 100x. The patch add a map to track how many times the bailing out happen for the same store and root, and if it is over a limit, stop considering the store with the same root as a merging candidate. Differential Revision: https://reviews.llvm.org/D65174 llvm-svn: 367472	2019-07-31 19:59:24 +00:00
Djordje Todorovic	b9973f87c6	Reland "[DwarfDebug] Dump call site debug info" The build failure found after the rL365467 has been resolved. Differential Revision: https://reviews.llvm.org/D60716 llvm-svn: 367446	2019-07-31 16:51:28 +00:00
Amy Huang	53da7ca943	[MS] Emit S_HEAPALLOCSITE debug info in SelectionDAG Summary: This emits labels around heapallocsite calls in SelectionDAG. Reviewers: rnk Subscribers: MatzeB, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61105 llvm-svn: 367374	2019-07-31 00:16:13 +00:00
Matt Arsenault	9cf980d4a7	GlobalISel: Add G_ATOMICRMW_{FADD\|FSUB} llvm-svn: 367369	2019-07-30 23:56:30 +00:00
Wei Mi	888efda280	[DAGCombiner] Add an option to control whether or not to enable store merging. Add an option to control whether or not to enable store merging in dag combiner so we can workaround some bugs more easily. Differential Revision: https://reviews.llvm.org/D65482 llvm-svn: 367365	2019-07-30 23:14:56 +00:00
Austin Kerbow	c99f62e313	[AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization. Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64966 llvm-svn: 367344	2019-07-30 18:49:16 +00:00
Sean Fertile	39f3503814	Address post commit review comments on revision 366727. Addresses number of comment made on D64652 after commiting: - Reorders function decls in the TargetLoweringObjectFileXCOFF class. - Fix comment in MCSectionXCOFF to include description of external reference csects. - Convert several llvm_unreachables to report_fatal_error - Convert several dyn_casts to casts as they are expected not to fail. - Avoid copying DataLayout object. llvm-svn: 367324	2019-07-30 15:37:01 +00:00
Simon Pilgrim	f8a7e9de06	[DAGCombine] narrowInsertExtractVectorBinOp - early out for binops that change value type. NFCI. This is implicit in the value type checks in getSubVectorSrc - this just makes it upfront and obvious. llvm-svn: 367220	2019-07-29 11:34:45 +00:00
Simon Pilgrim	76f2f04d9d	[DAGCombine] narrowInsertExtractVectorBinOp - early out for illegal op. NFCI. If the subvector binop is illegal then early-out and avoid the subvector searches. llvm-svn: 367181	2019-07-27 19:42:58 +00:00
Simon Pilgrim	603f94aa2a	[TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied) This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back. This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value. Fixes PR42777 llvm-svn: 367174	2019-07-27 14:11:59 +00:00
Simon Pilgrim	8a52671782	[SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit. If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit. This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches. llvm-svn: 367171	2019-07-27 12:48:46 +00:00
Simon Pilgrim	3ff6126487	[TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit. llvm-svn: 367169	2019-07-27 12:23:36 +00:00
Amara Emerson	7bc4fad0fb	[AArch64][GlobalISel] Implement narrowing of G_SEXT. We need this to narrow a sext to s128. Differential Revision: https://reviews.llvm.org/D65357 llvm-svn: 367164	2019-07-26 23:46:38 +00:00
Sean Fertile	9df6177d38	[PowerPC][AIX]Add lowering of MCSymbol MachineOperand. Adds machine operand lowering for MCSymbolSDNodes to the PowerPC backend. This is needed to produce call instructions in assembly for AIX because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet. Differential Revision: https://reviews.llvm.org/D63738 llvm-svn: 367133	2019-07-26 17:25:27 +00:00
Nico Weber	13f337c4cb	Revert r367091, it caused PR42777. llvm-svn: 367118	2019-07-26 14:58:42 +00:00
Simon Pilgrim	a424a1f351	[SelectionDAG] GetDemandedBits - update SIGN_EXTEND_INREG op to just call SimplifyMultipleUseDemandedBits. llvm-svn: 367098	2019-07-26 10:03:07 +00:00
Simon Pilgrim	9758407bf1	[TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support. llvm-svn: 367096	2019-07-26 09:41:08 +00:00
Simon Pilgrim	d0164fc525	[SelectionDAG] GetDemandedBits - update OR/XOR ops to just call SimplifyMultipleUseDemandedBits. Eventually all of these will be moved over, but we create nodes in GetDemandedBits recursion at the moment which causes regressions when we try to remove them all. llvm-svn: 367092	2019-07-26 09:13:29 +00:00
Simon Pilgrim	b32ceb79b0	[TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support. This allows us to peek through BITCASTs and attempt simplify the source operand, and then bitcast back. llvm-svn: 367091	2019-07-26 08:38:39 +00:00
Kang Zhang	4e794a8bae	Some case eror for: detected memory leaks llvm-svn: 367083	2019-07-26 03:25:58 +00:00
Kang Zhang	5c61015455	[PowerPC] Do the Simple Early Return in block-placement pass to optimize the blocks Summary: In `block-placement` pass, it will create some patterns for unconditional we can do the simple early retrun. But the `early-ret` pass is before `block-placement`, we don't want to run it again. This patch is to do the simple early return to optimize the blocks at the last of `block-placement`. Below is an example ``` BB: \| BB: XOR 3, 3, 4 \| XOR 3, 3, 4 B TBB \| B ChainBB ... \| ... ChainBB: \| ChainBB: B TBB \| ADD 3, 3, 4 ... \| BLR TBB: \| ADD 3, 3, 4 \| BLR \| ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D63972 llvm-svn: 367080	2019-07-26 01:58:53 +00:00
Francis Visoiu Mistrih	2d8fdcae96	Reland: [Remarks] Add support for serializing metadata for every remark streamer This allows every serializer format to implement metaSerializer() and return the corresponding meta serializer. Original llvm-svn: 366946 Reverted llvm-svn: 367004 This fixes the unit tests on Windows bots. llvm-svn: 367078	2019-07-26 01:33:30 +00:00
Francis Visoiu Mistrih	0503add6da	[CodeGen] Don't resolve the stack protector frame accesses until PEI Currently, stack protector loads and stores are resolved during LocalStackSlotAllocation (if the pass needs to run). When this is the case, the base register assigned to the frame access is going to be one of the vregs created during LocalStackSlotAllocation. This means that we are keeping a pointer to the stack protector slot, and we're using this pointer to load and store to it. In case register pressure goes up, we may end up spilling this pointer to the stack, which can be a security concern. Instead, leave it to PEI to resolve the frame accesses. In order to do that, we make all stack protector accesses go through frame index operands, then PEI will resolve this using an offset from sp/fp/bp. Differential Revision: https://reviews.llvm.org/D64759 llvm-svn: 367068	2019-07-25 22:23:48 +00:00
Simon Pilgrim	55fd57ba95	Revert rL366946 : [Remarks] Add support for serializing metadata for every remark streamer This allows every serializer format to implement metaSerializer() and return the corresponding meta serializer. ........ Fix windows build bots http://lab.llvm.org:8011/builders/llvm-clang-x86_64-win-fast http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win llvm-svn: 367004	2019-07-25 10:20:39 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
Amara Emerson	13af1ed8e3	[GlobalISel] Support for inlining memcpy, memset and memmove calls. This introduces a new family of combiner helper routines that re-use the target specific cost model from SelectionDAG, and generate inline implementations of the memcpy family of intrinsics. The combines are only enabled at optimization levels higher than -O0, and give very substantial performance improvements. Differential Revision: https://reviews.llvm.org/D65167 llvm-svn: 366951	2019-07-24 22:17:31 +00:00
Francis Visoiu Mistrih	62388e3846	[Remarks] Add support for serializing metadata for every remark streamer This allows every serializer format to implement metaSerializer() and return the corresponding meta serializer. llvm-svn: 366946	2019-07-24 21:29:44 +00:00
Amara Emerson	a1997ce2e5	[AArch64][GlobalISel] Fix a crash during s128 G_ICMP legalization due to r366317. r366317 added a legalization for s128 G_ICMP narrow scalar which tried to hard code the result type of the new legalized G_SELECT. Change this to instead use type of the original G_ICMP result and allow the target to legalize it if necessary later. llvm-svn: 366943	2019-07-24 20:46:42 +00:00
Francis Visoiu Mistrih	ff4b515a77	[Remarks][NFC] Rename remarks::Serializer to remarks::RemarkSerializer llvm-svn: 366939	2019-07-24 19:47:57 +00:00
Simon Pilgrim	2bf871be4c	Fix signed/unsigned comparison warning. NFCI. llvm-svn: 366935	2019-07-24 17:44:22 +00:00
Simon Pilgrim	7d318b2bb1	[DAGCombine] matchBinOpReduction - add partial reduction matching This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector: e.g. <8 x i32> reduction pattern in a <16 x i32> vector: <4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> <2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> <1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction. I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction. Fixes the x86 partial reduction sum cases in PR33758 and PR42023. Differential Revision: https://reviews.llvm.org/D65047 llvm-svn: 366933	2019-07-24 17:29:56 +00:00
Simon Pilgrim	3f01c7197f	[SelectionDAG] makeEquivalentMemoryOrdering - early out for equal chains (PR42727) If we are already using the same chain for the old/new memory ops then just return. Fixes PR42727 which had getLoad() reusing an existing node. llvm-svn: 366922	2019-07-24 16:53:14 +00:00
Sanjay Patel	10dad95a75	[SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC We canonicalize to the add form, so create that directly for efficiency. llvm-svn: 366914	2019-07-24 15:43:50 +00:00
Simon Pilgrim	0e8359aec1	[TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support. If all the demanded elts are from one operand and are inline, then we can use the operand directly. The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements. llvm-svn: 366817	2019-07-23 15:35:55 +00:00
Simon Pilgrim	743d45ee25	[TargetLowering] Add SimplifyMultipleUseDemandedBits This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured. The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use. We do see a couple of regressions that need to be addressed: AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better). X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG. The code owners have confirmed its ok for these cases to fixed up in future patches. Differential Revision: https://reviews.llvm.org/D63281 llvm-svn: 366799	2019-07-23 12:39:08 +00:00
Craig Topper	a658cb0b12	[DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI The function was calling getNode() on an SDValue to return and the caller turned the result back into a SDValue. So just return the original SDValue to avoid this. llvm-svn: 366779	2019-07-23 05:13:39 +00:00
Craig Topper	f5247244f2	[DAGCombiner] Use SDNode::isOperandOf to simplify some code. NFCI llvm-svn: 366778	2019-07-23 05:13:35 +00:00
Richard Trieu	81a5045cd6	Move variable out from debug only section. MFI is no longer just needed for an assert. Move it out of the debug only section to allow non-assert builds to be able to find it. llvm-svn: 366773	2019-07-23 02:59:15 +00:00
Philip Reames	2f5543aa72	[Statepoints] Fix a bug in statepoint lowering for functions w/no-realign-stack We were silently using the ABI alignment for all of the stores generated for deopt and gc values. We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to a unaligned location. The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help. So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO. Note that there's a separate performance possibility here. Even if we can realign stacks, we probably don't want to if all of the stores are in slowpaths. But that's a later patch, if at all. :) llvm-svn: 366765	2019-07-22 23:33:18 +00:00
Sean Fertile	942537d9fa	Stubs out TLOF for AIX and add support for common vars in assembly output. Stubs out a TargetLoweringObjectFileXCOFF class, implementing only SelectSectionForGlobal for common symbols. Also adds an override of EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors and adds support for emitting common globals. llvm-svn: 366727	2019-07-22 19:15:29 +00:00
Matt Arsenault	542720b2bc	TableGen: Support physical register inputs > 255 This was truncating register value that didn't fit in unsigned char. Switch AMDGPU sendmsg intrinsics to using a tablegen pattern. llvm-svn: 366695	2019-07-22 15:02:34 +00:00
Christudasan Devadasan	006cf8c03d	Added address-space mangling for stack related intrinsics Modified the following 3 intrinsics: int_addressofreturnaddress, int_frameaddress & int_sponentry. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D64561 llvm-svn: 366679	2019-07-22 12:42:48 +00:00
Oliver Stannard	6771a89fa0	[IPRA][ARM] Make use of the "returned" parameter attribute ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation. Differential revision: https://reviews.llvm.org/D64986 llvm-svn: 366669	2019-07-22 08:44:36 +00:00
Aditya Nandakumar	d7504a1569	[GISel]: Attach missing range metadata while translating G_LOADs https://reviews.llvm.org/D65048 Attach range information to G_LOAD when only defining one register. reviewed by: arsenm llvm-svn: 366656	2019-07-21 14:07:54 +00:00
Roman Lebedev	cd9b19484b	[Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements Summary: Four things here: 1. Generalize the fold to handle non-splat divisors. Reasonably trivial. 2. Unban power-of-two divisors. I don't see any reason why they should be illegal. * There is no ban in Hacker's Delight * I think the ban came from the same bug that caused the miscompile in the base patch - in `floor((2^W - 1) / D)` we were dividing by `D0` instead of `D`, and we were ensuring that `D0` is not `1`, which made sense. 3. Unban `1` divisors. I no longer believe Hacker's Delight actually says that the fold is invalid for `D = 0`. Further considerations: * We know that * `(X u% 1) == 0` can be constant-folded to `1`, * `(X u% 1) != 0` can be constant-folded to `0`, * Also, we know that * `X u<= -1` can be constant-folded to `1`, * `X u> -1` can be constant-folded to `0`, * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p * We know will end up with the following: `(setule/setugt (rotr (mul N, P), K), Q)` * Therefore, for given new DAG nodes and comparison predicates (`ule`/`ugt`), we will still produce the correct answer if: `Q` is a all-ones constant; and both `P` and `K` are anything other than `undef`. * The fold will indeed produce `Q = all-ones`. 4. Try to re-splat the `P` and `K` vectors - we don't care about their values for the lanes where divisor was `1`. Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00 Reviewed By: RKSimon Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63963 llvm-svn: 366637	2019-07-20 16:33:15 +00:00
Matt Arsenault	c14334e959	LiveIntervals: Fix handleMove asserting on BUNDLE The top-level BUNDLE instruction should behave as an ordinary instruction. It is supposed to have all relevant registers as implicit operands. Moving it should work as any other instruction. I believe the assert intended to avoid moving instructions inside bundles. llvm-svn: 366605	2019-07-19 19:32:00 +00:00
Nick Desaulniers	4e9196ebcb	Revert "Use the MachineBasicBlock symbol for a callbr target" This reverts commit r366523/ccbffefccaff42b0d094c9ef0f49fc3e8c8456ea. Two regressions were immediately reported: - https://github.com/ClangBuiltLinux/linux/issues/614 - https://github.com/ClangBuiltLinux/linux/issues/615 Reported-by: nathanchance llvm-svn: 366600	2019-07-19 18:18:02 +00:00
Matt Arsenault	5905aae169	DAG: Handle dbg_value for arguments split into multiple subregs This was handled previously for arguments split due to not fitting in an MVT. This was dropping the register for argument registers split due to TLI::getRegisterTypeForCallingConv. llvm-svn: 366574	2019-07-19 13:36:46 +00:00
Kai Luo	dec624682e	[MachineCSE][MachinePRE] Avoid hoisting code from code regions into hot BBs. Summary: Current PRE hoists common computations into CMBB = DT->findNearestCommonDominator(MBB, MBB1). However, if CMBB is in a hot loop body, we might get performance degradation. Differential Revision: https://reviews.llvm.org/D64394 llvm-svn: 366570	2019-07-19 12:58:16 +00:00
Oliver Stannard	0ed7732671	[IPRA] Don't rely on non-exact function definitions If a function definition is not exact, then the linker could select a differently-compiled version of it, which could use different registers. https://reviews.llvm.org/D64909 llvm-svn: 366557	2019-07-19 09:59:26 +00:00
Bill Wendling	ccbffefcca	Use the MachineBasicBlock symbol for a callbr target Summary: Inline asm doesn't use labels when compiled as an object file. Therefore, we shouldn't create one for the (potential) callbr destination. Instead, use the symbol for the MachineBasicBlock. Reviewers: nickdesaulniers, craig.topper Reviewed By: nickdesaulniers Subscribers: xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64888 llvm-svn: 366523	2019-07-19 01:10:28 +00:00
Amara Emerson	cf12c7815f	[GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later. I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't do that unless we delay lowering to actual function calls. This patch changes the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and then have each target specify that using the new custom legalizer for intrinsics hook that they want it expanded it a libcall. Differential Revision: https://reviews.llvm.org/D64895 llvm-svn: 366516	2019-07-19 00:24:45 +00:00
Peter Collingbourne	50057f3288	CodeGen: Allow !associated metadata to point to aliases. This is a small extension of !associated, mostly useful for the implementation convenience of instrumentation passes that RAUW globals with aliases, such as LowerTypeTests. Differential Revision: https://reviews.llvm.org/D64951 llvm-svn: 366502	2019-07-18 21:37:16 +00:00
Amy Huang	f332fe642c	[COFF] Change a variable type to be const in the HeapAllocSite map. llvm-svn: 366479	2019-07-18 18:22:52 +00:00
Simon Pilgrim	8b525e357f	[DAGCombine] Pull getSubVectorSrc helper out of narrowInsertExtractVectorBinOp. NFCI. NFC step towards reusing this in other EXTRACT_SUBVECTOR combines. llvm-svn: 366435	2019-07-18 13:45:53 +00:00
Nilanjana Basu	4e22770219	Changes to display code view debug info type records in hex format llvm-svn: 366390	2019-07-17 23:43:58 +00:00
Nilanjana Basu	6e4076699c	Adding inline comments to code view type record directives for better readability llvm-svn: 366372	2019-07-17 21:01:12 +00:00
Francis Visoiu Mistrih	9f2b290add	[PEI] Don't re-allocate a pre-allocated stack protector slot The LocalStackSlotPass pre-allocates a stack protector and makes sure that it comes before the local variables on the stack. We need to make sure that later during PEI we don't re-allocate a new stack protector slot. If that happens, the new stack protector slot will end up being after the local variables that it should be protecting. Therefore, we would have two slots assigned for two different stack protectors, one at the top of the stack, and one at the bottom. Since PEI will overwrite the assigned slot for the stack protector, the load that is used to compare the value of the stack protector will use the slot assigned by PEI, which is wrong. For this, we need to check if the object is pre-allocated, and re-use that pre-allocated slot. Differential Revision: https://reviews.llvm.org/D64757 llvm-svn: 366371	2019-07-17 20:46:19 +00:00
Francis Visoiu Mistrih	90ba54bf67	[CodeGen][NFC] Simplify checks for stack protector index checking Use `hasStackProtectorIndex()` instead of `getStackProtectorIndex() >= 0`. llvm-svn: 366369	2019-07-17 20:46:09 +00:00
Matt Arsenault	0966dd0d69	GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sources Extract the sources to the GCD of the original size and target size, padding with implicit_def as necessary. Also fix the case where the requested source type is wider than the original result type. This was ignoring the type, and just using the destination. Do the operation in the requested type and truncate back. llvm-svn: 366367	2019-07-17 20:22:44 +00:00
Matt Arsenault	914a59cad8	GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUES Use an anyext to the requested type for the leftover operand to produce a slightly wider type, and then truncate the final merge. I have another implementation almost ready which handles arbitrary widens, but I think it produces worse code in this example (which I think is 90% due to not folding redundant copies or folding out implicit_def users), so I wanted to add this as a baseline first. llvm-svn: 366366	2019-07-17 20:22:38 +00:00
Evgeniy Stepanov	d752f5e953	Basic codegen for MTE stack tagging. Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now. Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are used to implement a tagged stack frame pointer in a virtual register. Differential Revision: https://reviews.llvm.org/D64172 llvm-svn: 366360	2019-07-17 19:24:02 +00:00
Alex Bradbury	ab009a602e	[AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V The original behavior was to always emit the offsets to each call site in the call site table as uleb128 values, however on some architectures (eg RISCV) these uleb128 offsets into the code cannot always be resolved until link time (because relaxation will invalidate any calculated offsets), and there are no appropriate relocations for uleb128 values. As a consequence it needs to be possible to specify an alternative. This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in .gcc_except_table Differential Revision: https://reviews.llvm.org/D63415 Patch by Edward Jones. llvm-svn: 366329	2019-07-17 14:00:35 +00:00
Alex Bradbury	b94c233d06	[RISCV] Set correct encodings for DWARF exception handling This patch sets correct encodings for DWARF exception handling for RISC-V (other than call site encoding, which must be udata4 rather than uleb128 and is handled by D63415). This has the same intend as D63409, except this version matches GCC/binutils behaviour which uses the same encodings regardless of PIC/non-PIC and medlow/medany code model. llvm-svn: 366327	2019-07-17 13:54:38 +00:00
Petar Avramovic	1e62635d05	[MIPS GlobalISel] ClampScalar and select pointer G_ICMP Add narrowScalar to half of original size for G_ICMP. ClampScalar G_ICMP's operands 2 and 3 to to s32. Select G_ICMP for pointers for MIPS32. Pointer compare is same as for integers, it is enough to declare them as legal type. Differential Revision: https://reviews.llvm.org/D64856 llvm-svn: 366317	2019-07-17 12:08:01 +00:00
Matt Arsenault	1c3f4ec7fc	GlobalISel: Add overload of handleAssignments with CCState AMDGPU needs to allocate special argument registers separately from the user function argument list, so needs direct control over the CCState. The ArgLocs argument is only really necessary because CCState doesn't allow access to it. llvm-svn: 366279	2019-07-16 22:41:34 +00:00
David Blaikie	40580d36c4	DWARF: Skip zero column for inline call sites D64033 <https://reviews.llvm.org/D64033> added DW_AT_call_column for inline sites. However, that change wasn't aware of "-gno-column-info". To avoid adding column info when "-gno-column-info" is used, now DW_AT_call_column is only added when we have non-zero column (when "-gno-column-info" is used, column will be zero). Patch by Wenlei He! Differential Revision: https://reviews.llvm.org/D64784 llvm-svn: 366264	2019-07-16 21:15:19 +00:00
Ulrich Weigand	450c62e33e	[Strict FP] Allow more relaxed scheduling Reimplement scheduling constraints for strict FP instructions in ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed scheduling. Specifially, allow one strict FP instruction to be scheduled across another, as long as it is not moved across any global barrier. Differential Revision: https://reviews.llvm.org/D64412 Reviewed By: cameron.mcinally llvm-svn: 366222	2019-07-16 15:55:45 +00:00
Francis Visoiu Mistrih	cc909812a3	[Remarks][NFC] Combine ParserFormat and SerializerFormat It's useless to have both. llvm-svn: 366216	2019-07-16 15:24:59 +00:00
Amaury Sechet	f34a69c2e2	[DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and flip carry. Summary: As per title. DAGCombiner only mathes the special case where b = 0, this patches extends the pattern to match any value of b. Depends on D57302 Reviewers: hfinkel, RKSimon, craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59208 llvm-svn: 366214	2019-07-16 15:17:00 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib//.{cpp,h} ../clang/lib/*/.{cpp,h} ../lld/*/.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
Heejin Ahn	9f96a58ccc	[WebAssembly] Rename except_ref type to exnref Summary: We agreed to rename `except_ref` to `exnref` for consistency with other reference types in https://github.com/WebAssembly/exception-handling/issues/79. This also renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order to use the file for other reference types in future. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64703 llvm-svn: 366145	2019-07-15 22:49:25 +00:00
Matt Arsenault	434d664095	GlobalISel: Implement narrowScalar for vector extract/insert indexes llvm-svn: 366113	2019-07-15 19:37:34 +00:00
Fangrui Song	335f955dc4	[PowerPC] Support fp128 libcalls On PowerPC, IEEE 754 quadruple-precision libcall names use "kf" instead of "tf". In libgcc, libgcc/config/rs6000/float128-sed converts TF names to KF names. This patch implements its 24 substitution rules. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D64282 llvm-svn: 366039	2019-07-15 05:02:32 +00:00
Jonas Devlieghere	83264b3580	[DebugInfo] Add column info for inline sites The column field is missing for all inline sites, currently it's always zero. This changes populates DW_AT_call_column field for inline sites. Test case modified to cover this change. Patch by: Wenlei He Differential revision: https://reviews.llvm.org/D64033 llvm-svn: 365945	2019-07-12 19:25:45 +00:00
Fangrui Song	b251cc0d91	Delete dead stores llvm-svn: 365903	2019-07-12 14:58:15 +00:00
Simon Pilgrim	701e2c0d71	[DAGCombine] narrowExtractedVectorBinOp - wrap subvector extraction in helper. NFCI. First step towards supporting 'free' subvector extractions other than concat_vectors. llvm-svn: 365896	2019-07-12 13:00:35 +00:00
Djordje Todorovic	0739ccd3b5	Revert "[DwarfDebug] Dump call site debug info" A build failure was found on the SystemZ platform. This reverts commit 9e7e73578e54cd22b3c7af4b54274d743b6607cc. llvm-svn: 365886	2019-07-12 09:45:12 +00:00
Jinsong Ji	9577086628	[MachinePipeliner] Fix order for nodes with Anti dependence in same cycle Summary: Problem exposed in PowerPC functional testing. We did not consider Anti dependence for nodes in same cycle, so we may end up generating bad machine code. eg: the reduced test won't verify. * Bad machine code: Using an undefined physical register * - function: lame_encode_buffer_interleaved - basic block: %bb.4 (0x4bde4e12928) - instruction: %29:gprc = ADDZE %27:gprc, implicit-def dead $carry, implicit $carry - operand 3: implicit $carry Reviewers: bcahoon, kparzysz, hfinkel Subscribers: MaskRay, wuzish, nemanjai, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64192 llvm-svn: 365859	2019-07-12 01:59:42 +00:00
Simon Pilgrim	d0307f93a7	[DAGCombine] narrowInsertExtractVectorBinOp - add CONCAT_VECTORS support We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y). This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well. In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops. The fast-isel-store.ll load folding regression is annoying but I don't think is that critical. Differential Revision: https://reviews.llvm.org/D63653 llvm-svn: 365785	2019-07-11 14:45:03 +00:00
Matt Arsenault	6eb8ae8f17	RegUsageInfoCollector: Skip calling conventions I missed before llvm-svn: 365784	2019-07-11 14:41:40 +00:00
Matt Arsenault	7e71902b79	GlobalISel: Use Register llvm-svn: 365780	2019-07-11 14:18:19 +00:00
Tim Northover	67828edbbd	OpaquePtr: switch to GlobalValue::getValueType in a few places. NFC. llvm-svn: 365770	2019-07-11 13:13:02 +00:00
Tim Northover	f2d6597653	OpaquePtr: use byval accessor instead of inspecting pointer type. NFC. The accessor can deal with both "byval(ty)" and "ty* byval" forms seamlessly. llvm-svn: 365769	2019-07-11 13:12:38 +00:00
Sanjay Patel	138328e45c	[SDAG] commute setcc operands to match a subtract If we have: R = sub X, Y P = cmp Y, X ...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags. Motivated by diffs in D58875 - not sure if this changes anything there, but this seems like a good thing independent of that. There's a more involved version of this transform already in IR (in instcombine although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()". Differential Revision: https://reviews.llvm.org/D63958 llvm-svn: 365711	2019-07-10 23:23:54 +00:00
Amara Emerson	7a4d2df04a	[AArch64][GlobalISel] Optimize compare and branch cases with G_INTTOPTR and unknown values. Since we have distinct types for pointers and scalars, G_INTTOPTRs can sometimes obstruct attempts to find constant source values. These usually come about when try to do some kind of null pointer check. Teaching getConstantVRegValWithLookThrough about this operation allows the CBZ/CBNZ optimization to catch more cases. This change also improves the case where we can't find a constant source at all. Previously we would emit a cmp, cset and tbnz for that. Now we try to just emit a cmp and conditional branch, saving an instruction. The cumulative code size improvement of this change plus D64354 is 5.5% geomean on arm64 CTMark -O0. Differential Revision: https://reviews.llvm.org/D64377 llvm-svn: 365690	2019-07-10 19:21:43 +00:00
Michael Berg	f4572249d7	Move three folds for FADD, FSUB and FMUL in the DAG combiner away from Unsafe to more aligned checks that reflect context Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang. I have migrated the fold guards to reflect the expectations of handing nan and zero contexts directly (NoNan, NSZ) and some tests with it. Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context. Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm Reviewed By: arsenm Subscribers: michele.scandale, wdng, javed.absar Differential Revision: https://reviews.llvm.org/D64450 llvm-svn: 365679	2019-07-10 18:23:26 +00:00
Nick Desaulniers	8728e45706	[TargetLowering] support BlockAddress as "i" inline asm constraint Summary: This allows passing address of labels to inline assembly "i" input constraints. Fixes pr/42502. Reviewers: ostannard Reviewed By: ostannard Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D64167 llvm-svn: 365664	2019-07-10 17:08:25 +00:00
Matt Arsenault	6ce1b4fec5	GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM llvm-svn: 365658	2019-07-10 16:31:19 +00:00
Matt Arsenault	e595a2c964	GlobalISel: Define the full family of FP min/max instructions llvm-svn: 365657	2019-07-10 16:31:15 +00:00
Simon Pilgrim	94c84aca5d	[DAGCombine] visitINSERT_SUBVECTOR - use uint64_t subvector index. NFCI. Keep the uint64_t type from getZExtValue() to stop truncation/extension overflow warnings in MSVC in subvector index math. llvm-svn: 365621	2019-07-10 12:21:35 +00:00
Simon Pilgrim	bb1167a3a1	Fix const/non-const lambda return type warning. NFCI. llvm-svn: 365613	2019-07-10 10:45:09 +00:00
Matt Arsenault	b1843e130a	GlobalISel: Implement lower for G_FCOPYSIGN In SelectionDAG AMDGPU treated these as legal, but this was mostly because the bitcasts required for FP types were painful. Theoretically the bitpattern should eventually match to bfi, so don't bother trying to get the patterns to import. llvm-svn: 365583	2019-07-09 23:34:29 +00:00
Matt Arsenault	14a4495155	GlobalISel: Combine unmerge of merge with intermediate cast This eliminates some illegal intermediate vectors when operations are scalarized. llvm-svn: 365566	2019-07-09 22:19:13 +00:00
Craig Topper	84a1f07363	[X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549	2019-07-09 19:55:28 +00:00
Jinsong Ji	06fef0b359	Revert "[HardwareLoops] NFC - move hardware loop checking code to isHardwareLoopProfitable()" This reverts commit `d955573065`. llvm-svn: 365520	2019-07-09 17:53:09 +00:00
Amara Emerson	6616e269a6	[AArch64][GlobalISel] Optimize conditional branches followed by unconditional branches If we have an icmp->brcond->br sequence where the brcond just branches to the next block jumping over the br, while the br takes the false edge, then we can modify the conditional branch to jump to the br's target while inverting the condition of the incoming icmp. This means we can eliminate the br as an unconditional branch to the fallthrough block. Differential Revision: https://reviews.llvm.org/D64354 llvm-svn: 365510	2019-07-09 16:05:59 +00:00
Simon Pilgrim	57603cbde8	[DAGCombine] LoadedSlice - keep getOffsetFromBase() uint64_t offset. NFCI. Keep the uint64_t type from getOffsetFromBase() to stop truncation/extension overflow warnings in MSVC in alignment math. llvm-svn: 365504	2019-07-09 15:28:57 +00:00
Chen Zheng	d955573065	[HardwareLoops] NFC - move hardware loop checking code to isHardwareLoopProfitable() Differential Revision: https://reviews.llvm.org/D64197 llvm-svn: 365497	2019-07-09 14:56:17 +00:00
Petar Avramovic	be20e36107	[MIPS GlobalISel] Register bank select for G_PHI. Select i64 phi Select gprb or fprb when def/use register operand of G_PHI is used/defined by either: copy to/from physical register or instruction with only one mapping available for that use/def operand. Integer s64 phi is handled with narrowScalar when mapping is applied, produced artifacts are combined away. Manually set gprb to all register operands of instructions created during narrowScalar. Differential Revision: https://reviews.llvm.org/D64351 llvm-svn: 365494	2019-07-09 14:36:17 +00:00
Simon Pilgrim	480e8ad217	[CodeGen] AccelTable - remove non-constexpr (MSVC) Atom defs Now that we've dropped VS2015 support (D64326) we can enable the constexpr variables on MSVC builds as VS2017+ correctly handles them llvm-svn: 365477	2019-07-09 13:07:48 +00:00
Djordje Todorovic	c1e0ea9765	[NFC][AsmPrinter] Fix the formatting for the rL365467 In addition, fix the build failure for the 'unused' variable. The variable was used inside the 'LLVM_DEBUG()'. llvm-svn: 365469	2019-07-09 12:06:21 +00:00
Tim Northover	60afa49abe	OpaquePtr: add Type parameter to Loads analysis API. This makes the functions in Loads.h require a type to be specified independently of the pointer Value so that when pointers have no structure other than address-space, it can still do its job. Most callers had an obvious memory operation handy to provide this type, but a SROA and ArgumentPromotion were doing more complicated analysis. They get updated to merge the properties of the various instructions they were considering. llvm-svn: 365468	2019-07-09 11:35:35 +00:00
Djordje Todorovic	01eaae6dd1	[DwarfDebug] Dump call site debug info Dump the DWARF information about call sites and call site parameters into debug info sections. The patch also provides an interface for the interpretation of instructions that could load values of a call site parameters in order to generate DWARF about the call site parameters. ([13/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60716 llvm-svn: 365467	2019-07-09 11:33:56 +00:00
Bjorn Pettersson	051a6a1c33	[SelectionDAG] Simplify some calls to getSetCCResultType. NFC DAGTypeLegalizer and SelectionDAGLegalize has helper functions wrapping the call to TLI.getSetCCResultType(...). Use those helpers in more places. llvm-svn: 365456	2019-07-09 10:27:51 +00:00
Bjorn Pettersson	59029017a6	[LegalizeTypes] Fix saturation bug for smul.fix.sat Summary: Make sure we use SETGE instead of SETGT when checking if the sign bit is zero at SMULFIXSAT expansion. The faulty expansion occured when doing "expand" of SMULFIXSAT and the scale was exactly matching the size of the smaller type. For example doing i64 Z = SMULFIXSAT X, Y, 32 and expanding X/Y/Z into using two i32 values. The problem was that we sometimes did not saturate to min when overflowing. Here is an example using Q3.4 numbers: Consider that we are multiplying X and Y. X = 0x80 (-8.0 as Q3.4) Y = 0x20 (2.0 as Q3.4) To avoid loss of precision we do a widening multiplication, getting a 16 bit result Z = 0xF000 (-16.0 as Q7.8) To detect negative overflow we should check if the five most significant bits in Z are less than -1. Assume that we name the 4 most significant bits as HH and the next 4 bits as HL. Then we can do the check by examining if (HH < -1) or (HH == -1 && "sign bit in HL is zero"). The fault was that we have been doing the check as (HH < -1) or (HH == -1 && HL > 0) instead of (HH < -1) or (HH == -1 && HL >= 0). In our example HH is -1 and HL is 0, so the old code did not trigger saturation and simply truncated the result to 0x00 (0.0). With the bugfix we instead detect that we should saturate to min, and the result will be set to 0x80 (-8.0). Reviewers: leonardchan, bevinh Reviewed By: leonardchan Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64331 llvm-svn: 365455	2019-07-09 10:24:50 +00:00
Guillaume Chatelet	336f3e1601	Fixing @llvm.memcpy not honoring volatile. This is explicitly not addressing target-specific code, or calls to memcpy. Summary: https://bugs.llvm.org/show_bug.cgi?id=42254 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63215 llvm-svn: 365449	2019-07-09 09:53:36 +00:00
Jeremy Morse	9bebc65d79	Revert r364515 and r364524 Jordan reports on llvm-commits a performance regression with r364515, backing the patch out while it's investigated. llvm-svn: 365448	2019-07-09 09:38:03 +00:00
Djordje Todorovic	12aca5de02	Reland "[LiveDebugValues] Emit the debug entry values" Emit replacements for clobbered parameters location if the parameter has unmodified value throughout the funciton. This is basic scenario where we can use the debug entry values. ([12/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D58042 llvm-svn: 365444	2019-07-09 08:36:34 +00:00
Jinsong Ji	cbd64f7648	[MachinePipeliner] Fix Phi refers to Phi in same stage in 1st epilogue Summary: This is exposed by functional testing on PowerPC. In some pipelined loops, Phi refer to phi did not get value defined by the Phi, hence getting wrong value later. As the comment mentioned, we should "use the value defined by the Phi, unless we're generating the firstepilog and the Phi refers to a Phi in a different stage.", so Phi refering to same stage Phi should use the value defined by the Phi here. Reviewers: bcahoon, hfinkel Reviewed By: hfinkel Subscribers: MaskRay, wuzish, nemanjai, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64035 llvm-svn: 365428	2019-07-09 02:27:35 +00:00
Nilanjana Basu	faed8516e4	Changing CodeView debug info type record representation in assembly files to make it more human-readable & editable & fixing bug introduced in r364987 llvm-svn: 365417	2019-07-09 01:11:02 +00:00
Reid Kleckner	2f07c2e9d9	Standardize on MSVC behavior for triples with no environment Summary: This makes it so that IR files using triples without an environment work out of the box, without normalizing them. Typically, the MSVC behavior is more desirable. For example, it tends to enable things like constant merging, use of associative comdats, etc. Addresses PR42491 Reviewers: compnerd Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64109 llvm-svn: 365387	2019-07-08 21:05:20 +00:00
Matt Arsenault	5630e3a1c7	RegUsageInfoCollector: Don't iterate all regs for every reg class This is extremly slow on AMDGPU, which has a lot of physical register and a lot of register classes. determineCalleeSaves, via MachineRegisterInfo::isPhysRegUsed already added all of the super registers to the saved set. llvm-svn: 365370	2019-07-08 18:48:42 +00:00
Matt Arsenault	079f77b590	GlobalISel: Convert some build functions to using SrcOp/DstOp llvm-svn: 365343	2019-07-08 16:27:47 +00:00
Matt Arsenault	bd791b57f8	GlobalISel: widenScalar for G_BUILD_VECTOR llvm-svn: 365320	2019-07-08 13:48:06 +00:00
Simon Pilgrim	9285bf0fb9	[TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases. Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well). A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts. llvm-svn: 365309	2019-07-08 11:00:39 +00:00
David Majnemer	617df204b5	[CodeGen] Add larger vector types for i32 and f32 Some out of tree backend require larger vector type. Since maintaining the changes out of tree is difficult due to the many manual changes needed when adding a new type we are adding it even if no backend currently use it. Differential Revision: https://reviews.llvm.org/D64141 Patch by Thomas Raoux! llvm-svn: 365274	2019-07-07 04:47:37 +00:00
Simon Pilgrim	9c68aa33e3	[DAGCombine] convertBuildVecZextToZext - remove duplicate getOpcode() call. NFCI. llvm-svn: 365269	2019-07-06 18:32:15 +00:00
Quentin Colombet	0ffe0db6fa	[RegisterCoalescer] Fix an overzealous assert Although removeCopyByCommutingDef deals with full copies, it is still possible to copy undef lanes and thus, we wouldn't have any a value number for these lanes. This fixes PR40215. llvm-svn: 365256	2019-07-06 00:34:54 +00:00
Matt Arsenault	705e46f449	RegUsageInfoCollector: Skip AMDGPU entry point functions I'm not sure if it's worth it or not to add a hook to disable the pass for an arbitrary function. This pass is taking up to 5% of compile time in tiny programs by iterating through all of the physical registers in every register class. This pass should be rewritten in terms of regunits. For now, skip doing anything for entry point functions. The vast majority of functions in the real world aren't callable, so just not running this will give the majority of the benefit. llvm-svn: 365255	2019-07-05 23:33:43 +00:00
Michael Liao	8d6ea2d48c	[CodeGen] Enhance `MachineInstrSpan` to allow the end of MBB to be used. Summary: - Explicitly specify the parent MBB to allow the end iterator to be used. Reviewers: aprantl, MatzeB, craig.topper, qcolombet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64261 llvm-svn: 365240	2019-07-05 20:23:59 +00:00
Matt Arsenault	27a6985d90	ScheduleDAG: Fix incorrectly killing registers in bundles When looking for uses/defs to add kill flags, the iterator was double incremented, skipping the first instruction in the bundle. The use register in the first bundle instruction was then incorrectly killed. The "First" instruction should be the BUNDLE itself as the proper reverse iterator endpoint. llvm-svn: 365216	2019-07-05 15:32:28 +00:00
Robert Lougher	2478b62098	Revert r365198 as this accidentally commited something that should not have been added. llvm-svn: 365199	2019-07-05 12:30:45 +00:00
Robert Lougher	3bea2b15f5	This reverts r365061 and r365062 (test update) Revision r365061 changed a skip of debug instructions for a skip of meta instructions. This is not safe, as IMPLICIT_DEF is classed as a meta instruction. llvm-svn: 365198	2019-07-05 12:20:21 +00:00
Craig Topper	e9aed963ce	[DAGCombiner] Don't combine (addcarry (uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) if the Carry comes from the uaddo. Summary: The uaddo won't be removed and the addcarry will still be dependent on the uaddo. So we'll just increase the use count of X and Y and potentially require a COPY. Reviewers: spatel, RKSimon, deadalnix Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64190 llvm-svn: 365149	2019-07-04 18:18:46 +00:00
Matt Arsenault	43cbca50e4	GlobalISel: Fix widenScalar for pointer typed G_MERGE_VALUES llvm-svn: 365093	2019-07-03 23:08:06 +00:00
Francis Visoiu Mistrih	83bbe2f418	[CodeGen] Make branch funnels pass the machine verifier We previously marked all the tests with branch funnels as `-verify-machineinstrs=0`. This is an attempt to fix it. 1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList = (outs)` 2) After that we hit an assert: ``` Assertion failed: (Op.getValueType() != MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue operands should occur at end of operand list!"), function AddOperand, file /Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp, line 461. ``` The chain operand was added at the beginning of the operand list. Move that to the end. 3) After that we hit another verifier issue in the pseudo expansion where the registers used in the cmps and jmps are not added to the livein lists. Add the `EFLAGS` to all the new MBBs that we create. PR39436 Differential Review: https://reviews.llvm.org/D54155 llvm-svn: 365058	2019-07-03 17:16:45 +00:00
Amaury Sechet	57dfacb32d	Use getAllOnesConstants instead of -1 in DAGCombiner. NFC llvm-svn: 365054	2019-07-03 16:34:36 +00:00
Amaury Sechet	bddb8c3597	[DAGCombine] More diamong carry pattern optimization. Summary: This diff improve the capability of DAGCOmbine to generate linear carries propagation in presence of a diamond pattern. It is now able to match a large variety of different patterns rather than some hardcoded one. Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation. Reviewers: hfinkel, RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57302 llvm-svn: 365051	2019-07-03 16:15:59 +00:00
James Molloy	fa4aac7335	[SelectionDAG] Propagate alias metadata to target intrinsic nodes When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available. Differential revision: https://reviews.llvm.org/D64131 llvm-svn: 365043	2019-07-03 14:33:29 +00:00
Oliver Stannard	830b20344b	[ARM] Thumb2: favor R4-R7 over R12/LR in allocation order when opt for minsize For Thumb2, we prefer low regs (costPerUse = 0) to allow narrow encoding. However, current allocation order is like: R0-R3, R12, LR, R4-R11 As a result, a lot of instructs that use R12/LR will be wide instrs. This patch changes the allocation order to: R0-R7, R12, LR, R8-R11 for thumb2 and -Osize. In most cases, there is no extra push/pop instrs as they will be folded into existing ones. There might be slight performance impact due to more stack usage, so we only enable it when opt for min size. https://reviews.llvm.org/D30324 llvm-svn: 365014	2019-07-03 09:58:52 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Nilanjana Basu	c0b557744a	Revert Changing CodeView debug info type record representation in assembly files to make it more human-readable & editable This reverts r364982 (git commit `2082bf28eb`) llvm-svn: 364987	2019-07-03 00:51:49 +00:00
Nilanjana Basu	2082bf28eb	Changing CodeView debug info type record representation in assembly files to make it more human-readable & editable llvm-svn: 364982	2019-07-03 00:26:23 +00:00
Teresa Johnson	e6768d613a	[RA] Fix spelling of Greedy register allocator internal option The internal option added with r323870 has a typo. It isn't being used by any tests, but I decided to fix the spelling and leave it in for use in debugging the changes added in that patch. llvm-svn: 364958	2019-07-02 18:54:03 +00:00
Matt Arsenault	ce690544a6	GlobalISel: Add G_FENCE The pattern importer is for some reason emitting checks for G_CONSTANT for the immediate operands. llvm-svn: 364926	2019-07-02 14:16:39 +00:00
Roman Lebedev	7c8ee375d8	[NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' from D63963 llvm-svn: 364921	2019-07-02 13:21:23 +00:00
Amara Emerson	000ef2c2ae	[TailDuplicator] Fix copy instruction emitting into the wrong block. The code for duplicating instructions could sometimes try to emit copies intended to deal with unconstrainable register classes to the tail block of the original instruction, rather than before the newly cloned instruction in the predecessor block. This was exposed by GlobalISel on arm64. Differential Revision: https://reviews.llvm.org/D64049 llvm-svn: 364888	2019-07-02 06:04:46 +00:00
Zi Xuan Wu	7ae536a1ce	[DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function For a given floating point load / store pair, if the load value isn't used by any other operations, then consider transforming the pair to integer load / store operations if the target deems the transformation profitable. And we can exploiting much more when there are other operation nodes with chain operand between the load/store pair so long as we keep the chain ordering original. We only replace the register used to load/store from float to integer. I only add testcase in ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in ARM target. Differential Revision: https://reviews.llvm.org/D60601 llvm-svn: 364883	2019-07-02 02:54:52 +00:00
Nilanjana Basu	8b7a0baa20	Testing commit access through minor formatting change llvm-svn: 364843	2019-07-01 20:27:37 +00:00
Matt Arsenault	c9f14f29f5	GlobalISel: Try to widen merges with other merges If the requested source type an be used as a merge source type, create a merge of merges. This avoids creating large, illegal extensions and bit-ops directly to the result type. llvm-svn: 364841	2019-07-01 19:36:10 +00:00
Matt Arsenault	03ca176ab3	GlobalISel: Verify G_MERGE_VALUES operand sizes llvm-svn: 364822	2019-07-01 18:01:35 +00:00
Aditya Nandakumar	1023a2eca3	[GlobalISel]: Allow backends to custom legalize Intrinsics https://reviews.llvm.org/D31359 Add a hook "legalizeInstrinsic" to allow backends to override this and custom lower/legalize intrinsics. llvm-svn: 364821	2019-07-01 17:53:50 +00:00
Matt Arsenault	6f74f55750	GlobalISel: Implement lower for min/max llvm-svn: 364816	2019-07-01 17:18:03 +00:00
Diana Picus	2ba16011c1	Fixup r364512 Fix stack-use-after-scope errors from r364512. One instance was already fixed in r364611 - this patch simplifies that fix and addresses one more instance of similar code. Discussed in: https://reviews.llvm.org/D63905 llvm-svn: 364778	2019-07-01 15:07:38 +00:00
Benjamin Kramer	ed13fef477	[SelectionDAG] Do minnum->minimum at legalization time instead of building time The SDAGBuilder behavior stems from the days when we didn't have fast math flags available in SDAG. We do now and doing the transformation in the legalizer has the advantage that it also works for vector types. llvm-svn: 364743	2019-07-01 11:00:23 +00:00
Jeremy Morse	d2b6665e33	[DebugInfo] Avoid adding too much indirection to pointer-valued variables This patch addresses PR41675, where a stack-pointer variable is dereferenced too many times by its location expression, presenting a value on the stack as the pointer to the stack. The difference between a stack pointer DBG_VALUE and one that refers to a value on the stack, is currently the indirect flag. However the DWARF backend will also try to guess whether something is a memory location or not, based on whether there is any computation in the location expression. By simply prepending the stack offset to existing expressions, we can accidentally convert a register location into a memory location, which introduces a suprise (and unintended) dereference. The solution is to add DW_OP_stack_value whenever we add a DIExpression computation to a stack pointer. It's an implicit location computed on the expression stack, thus needs to be flagged as a stack_value. For the edge case where the offset is zero and the location could be a register location, DIExpression::prepend will still generate opcodes, and thus DW_OP_stack_value must still be added. Differential Revision: https://reviews.llvm.org/D63429 llvm-svn: 364736	2019-07-01 09:38:23 +00:00
Sam Parker	98722691b0	[ARM] WLS/LE Code Generation Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics that guard the loop entry, as well as setting the loop trip count. 2) Lower the BRCOND that uses said intrinsic to an Arm specific node: ARMWLS. 3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart. 4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 364733	2019-07-01 08:21:28 +00:00
Fangrui Song	78ee2fbf98	Cleanup: llvm::bsearch -> llvm::partition_point after r364719 llvm-svn: 364720	2019-06-30 11:19:56 +00:00
Craig Topper	4d0feb28ec	[SelectionDAG] Use the memory VT instead of result VT for FoldingSet profiling in getMaskedLoad/getMaskedStore. This matches what is done by the Profile function. Otherwise CSE won't work properly. llvm-svn: 364717	2019-06-30 06:46:33 +00:00
Sam Parker	9a92be1b35	[HardwareLoops] Loop counter guard intrinsic Introduce llvm.test.set.loop.iterations which sets the loop counter and also produces an i1 after testing that the count is not zero. Differential Revision: https://reviews.llvm.org/D63809 llvm-svn: 364628	2019-06-28 07:38:16 +00:00
Matt Arsenault	3018d1845b	GlobalISel: Use Register llvm-svn: 364618	2019-06-28 01:47:44 +00:00
Matt Arsenault	5e66db6b8c	GlobalISel: Convert rest of MachineIRBuilder to using Register llvm-svn: 364615	2019-06-28 01:16:41 +00:00
Amara Emerson	ecb7ac35f9	[GlobalISel][IRTranslator] Fix some PHI bugs related to jump tables when optimizations are used. The new switch lowering code that tries to generate jump tables and range checks were tested at -O0 on arm64, but on -O3 the generic switch lowering code goes to town on trying to generate optimized lowerings, e.g. multiple jump tables, range checks etc. This exposed bugs in the way PHI nodes are handled because the CFG looks even stranger after all of this is done. llvm-svn: 364613	2019-06-27 23:56:34 +00:00
Rumeet Dhindsa	ddc2804e1a	Fix ASAN error caused by commit r364512. This patch intends to fix ASAN stack-use-after-scope error. This is at least a short-term fix to unbreak LLVM's mainline. Differential Revision: https://reviews.llvm.org/D63905 llvm-svn: 364611	2019-06-27 23:37:04 +00:00
Roman Lebedev	29d05c005f	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. This is a recommit, the original commit rL364563 was reverted in rL364568 because test-suite detected miscompile - the new comparison constant 'Q' was being computed incorrectly (we divided by `D0` instead of `D`). Original patch D50222 by @hermord (Dmytro Shynkevych) Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364600	2019-06-27 21:52:10 +00:00
Djordje Todorovic	774eabd097	Revert "[LiveDebugValues] Emit the debug entry values" Appears that the 'test/DebugInfo/MIR/X86/dbginfo-entryvals.mir' does not pass on Windows. This reverts commit rL364553. llvm-svn: 364571	2019-06-27 18:12:04 +00:00
Roman Lebedev	0a2b7b79fa	Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)" Appears to break test-suite on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790 FAIL: burg.execution_time FAIL: spiff.execution_time FAIL: employ.execution_time FAIL: llu.execution_time FAIL: gramschmidt.execution_time FAIL: fdtd-apml.execution_time This reverts commit r364563. llvm-svn: 364568	2019-06-27 17:22:31 +00:00
Roman Lebedev	0627b09863	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... Original patch D50222 by @hermord (Dmytro Shynkevych) This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Original patch author: @hermord (Dmytro Shynkevych)! Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 llvm-svn: 364563	2019-06-27 16:45:42 +00:00
Djordje Todorovic	d6a46aff59	[LiveDebugValues] Emit the debug entry values Emit replacements for clobbered parameters location if the parameter has unmodified value throughout the funciton. This is basic scenario where we can use the debug entry values. ([12/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D58042 llvm-svn: 364553	2019-06-27 15:35:48 +00:00
Djordje Todorovic	7a9ca67fd5	[LiveRangeEdit] Fix build failure caused by the rL364536 llvm-svn: 364549	2019-06-27 14:31:52 +00:00
Simon Pilgrim	83e1a1e79b	[TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support. llvm-svn: 364548	2019-06-27 14:25:54 +00:00
Djordje Todorovic	a0d45058eb	[DWARF] Handle the DW_OP_entry_value operand Add the IR and the AsmPrinter parts for handling of the DW_OP_entry_values DWARF operation. ([11/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60866 llvm-svn: 364542	2019-06-27 13:52:34 +00:00
Simon Pilgrim	c692a8dc51	[TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts llvm-svn: 364541	2019-06-27 13:48:43 +00:00
Djordje Todorovic	71d3869f60	[Backend] Keep call site info valid through the backend Handle call instruction replacements and deletions in order to preserve valid state of the call site info of the MachineFunction. NOTE: If the call site info is enabled for a new target, the assertion from the MachineFunction::DeleteMachineInstr() should help to locate places where the updateCallSiteInfo() should be called in order to preserve valid state of the call site info. ([10/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D61062 llvm-svn: 364536	2019-06-27 13:10:29 +00:00
Djordje Todorovic	7eeeb5947e	[ISEL][X86] Tracking of registers that forward call arguments While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters. ([9/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60715 llvm-svn: 364516	2019-06-27 10:51:15 +00:00
Jeremy Morse	d528bcd965	[DebugInfo] Avoid register coalesing unsoundly changing DBG_VALUE locations Once MIR code leaves SSA form and the liveness of a vreg is considered, DBG_VALUE insts are able to refer to non-live vregs, because their debug-uses do not contribute to liveness. This non-liveness becomes problematic for optimizations like register coalescing, as they can't ``see'' the debug uses in the liveness analyses. As a result registers get coalesced regardless of debug uses, and that can lead to invalid variable locations containing unexpected values. In the added test case, the first vreg operand of ADD32rr is merged with various copies of the vreg (great for performance), but a DBG_VALUE of the unmodified operand is blindly updated to the modified operand. This changes what value the variable will appear to have in a debugger. Fix this by changing any DBG_VALUE whose operand will be resurrected by register coalescing to be a $noreg DBG_VALUE, i.e. give the variable no location. This is an overapproximation as some coalesced locations are safe (others are not) -- an extra domination analysis would be required to work out which, and it would be better if we just don't generate non-live DBG_VALUEs. This fixes PR40010. Differential Revision: https://reviews.llvm.org/D56151 llvm-svn: 364515	2019-06-27 10:20:27 +00:00
Diana Picus	74a50a723b	[GlobalISel] Remove [un]packRegs from IRTranslator Remove the last use of packRegs from IRTranslator and delete pack/unpackRegs. This introduces a fallback to DAGISel for intrinsics with aggregate arguments, since we don't have a testcase for them so it's hard to tell how we'd want to handle them. Discussed in https://reviews.llvm.org/D63551 llvm-svn: 364514	2019-06-27 09:49:07 +00:00
Diana Picus	43fb5ae50c	[GlobalISel] Accept multiple vregs for lowerCall's args Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63551 llvm-svn: 364512	2019-06-27 09:18:03 +00:00
Diana Picus	8138996128	[GlobalISel] Accept multiple vregs for lowerCall's result Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63550 llvm-svn: 364511	2019-06-27 09:15:53 +00:00
Diana Picus	c3dbe23977	[GlobalISel] Accept multiple vregs in lowerFormalArgs Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510	2019-06-27 08:54:17 +00:00
Diana Picus	69ce1c1319	[GlobalISel] Allow multiple VRegs in ArgInfo. NFC Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches. Differential Revision: https://reviews.llvm.org/D63548 llvm-svn: 364509	2019-06-27 08:50:53 +00:00
Djordje Todorovic	a7cde103c1	[MachineFunction] Base support for call site info tracking Add an attribute into the MachineFunction that tracks call site info. ([8/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D61061 llvm-svn: 364506	2019-06-27 07:48:06 +00:00
Djordje Todorovic	59b39faa18	[IR] Add DISuprogram and DIE for a func decl A unique DISubprogram may be attached to a function declaration used for call site debug info. ([6/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60713 llvm-svn: 364500	2019-06-27 06:07:41 +00:00
Matt Arsenault	47345534aa	PEI: Add default handling of spills to registers llvm-svn: 364472	2019-06-26 20:56:15 +00:00
Evandro Menezes	42e13c8328	[CodeGen] Improve formatting of jump tables (NFC) Split jump tables into individual lines and fix spacing. llvm-svn: 364436	2019-06-26 15:11:31 +00:00
Roman Lebedev	b0ecc1cc6b	[X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness Summary: (Not so) boringly identical to pattern a (D62786) Not yet sure how do deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418	2019-06-26 12:19:39 +00:00
Clement Courbet	2851248fa1	Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." Breaks sanitizers: libFuzzer :: cxxstring.test libFuzzer :: memcmp.test libFuzzer :: recommended-dictionary.test libFuzzer :: strcmp.test libFuzzer :: value-profile-mem.test libFuzzer :: value-profile-strcmp.test llvm-svn: 364416	2019-06-26 12:13:13 +00:00
Chen Zheng	aa99952896	[HardwareLoops] NFC - move loop with irreducible control flow checking logic to HarewareLoopInfo. llvm-svn: 364415	2019-06-26 12:02:43 +00:00
Clement Courbet	7b3a5f0e6d	[ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline. This allows later passes (in particular InstCombine) to optimize more cases. One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0. llvm-svn: 364412	2019-06-26 11:50:18 +00:00
Simon Pilgrim	a6319e5f83	[DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for extract_subvector(bitcast()) support We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8))) llvm-svn: 364405	2019-06-26 11:17:38 +00:00
Chen Zheng	46ce9e4fff	[HardwareLoops] NFC - move loop with irreducible control flow checking logic to isHardwareLoopProfitable() llvm-svn: 364397	2019-06-26 09:12:52 +00:00
QingShan Zhang	e0e7d4c366	Teach the DAGCombine to fold this pattern(c1 and c2 is constant). // fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) // fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2) // fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) Sign extend the operands if it is any_extend, to keep the signess of the operands that, the other combine rule would apply. The any_extend is handled as zero extend for constants. i.e. t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0> t2: i64 = any_extend t1 --> t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0> --> t4: i64 = sign_extend_inreg t3 Differential Revision: https://reviews.llvm.org/D63318 llvm-svn: 364382	2019-06-26 05:12:53 +00:00
Jinsong Ji	fee855b5bc	[MachinePipeliner] Fix risky iterator usage R++, --R When we calculate MII, we use two loops, one with iterator R++ to check whether we can reserve the resource, then --R to move back the iterator to do reservation. This is risky, as R++, --R may not point to the same element at all. The can cause wrong MII. Differential Revision: https://reviews.llvm.org/D63536 llvm-svn: 364353	2019-06-25 21:50:56 +00:00
Philip Reames	be0dedb2e1	[Peephole] Allow folding loads into instructions w/multiple uses (such as test64rr) Peephole opt has a one use limitation which appears to be accidental. The function being used was incorrectly documented as returning whether the def had one user, but instead returned true only when there was one use. Add a corresponding hasOneNonDbgUser helper, and adjust peephole-opt to use the appropriate one. All of the actual folding code handles multiple uses within a single instruction. That codepath is well exercised through instruction selection. Differential Revision: https://reviews.llvm.org/D63656 llvm-svn: 364336	2019-06-25 17:29:18 +00:00
Simon Pilgrim	9762b26032	[DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ...... llvm-svn: 364326	2019-06-25 16:00:16 +00:00
Sanjay Patel	685c5cbc65	[SDAG] expand ctpop != 1 Change the generic ctpop expansion to more efficiently handle a check for not-a-power-of-two value: (ctpop x) != 1 --> (x == 0) \|\| ((x & x-1) != 0) This is the inverted predicate sibling pattern that was added with: D63004 This should have been done before I changed IR canonicalization to favor this form with: rL364246 ...so if this requires revert/changing, the earlier commit may also need to modified. llvm-svn: 364319	2019-06-25 14:46:52 +00:00
Simon Pilgrim	1a18bb6f25	[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases. Reapplies rL363856. llvm-svn: 364311	2019-06-25 13:25:57 +00:00
Simon Pilgrim	36953ce769	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. Reapplies rL363850 but now with legality checks added at rL364290 llvm-svn: 364303	2019-06-25 12:57:43 +00:00
Sanjay Patel	e4ef62291b	[SDAG] improve expansion of ctpop+setcc This should not cause any visible change in output, but it's more efficient because we were producing non-canonical 'sub x, 1' and 'setcc ugt x, 0'. As mentioned in the TODO, we should also be handling the inverse predicate. llvm-svn: 364302	2019-06-25 12:49:35 +00:00
Simon Pilgrim	69fc111184	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. Reapplies rL363802 but now with legality checks added at rL364290 llvm-svn: 364299	2019-06-25 12:19:12 +00:00
Simon Pilgrim	b23c942ce4	[VectorLegalizer] ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this. This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this. Part of the yak shaving to solve the crashes from rL364264 and rL364272 llvm-svn: 364295	2019-06-25 11:31:37 +00:00
Simon Pilgrim	49b3778e32	[TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations \|\| isOperationLegal cases. We'll improve X86 legality in future commits. llvm-svn: 364290	2019-06-25 10:51:15 +00:00
Roman Lebedev	cdd43eac4f	[Codegen] TargetLowering::SimplifySetCC(): omit urem when possible Summary: This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll` The missing fold, at least partially, looks trivial: https://rise4fun.com/Alive/Zsln i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two, and the `urem` is of something that may at most have a single bit set (or no bits set at all), the `urem` is not needed. Reviewers: RKSimon, craig.topper, xbolva00, spatel Reviewed By: xbolva00, spatel Subscribers: xbolva00, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63390 llvm-svn: 364286	2019-06-25 10:01:42 +00:00
Clement Courbet	3bc5ad551a	[ExpandMemCmp] Move all options to TargetTransformInfo. Split off from D60318. llvm-svn: 364281	2019-06-25 08:04:13 +00:00
Craig Topper	079924b0b7	Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..." This reverts the following patches. "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support" We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit result type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same. The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg. llvm-svn: 364264	2019-06-25 01:32:42 +00:00
Roland Froese	ea08248b2b	[CodeGen] Add missing vector type legalization for ctlz_zero_undef Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as ctlz and cttz. Differential Revision: https://reviews.llvm.org/D63463 llvm-svn: 364221	2019-06-24 19:27:07 +00:00
Matt Arsenault	faeaedf8e9	GlobalISel: Remove unsigned variant of SrcOp Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194	2019-06-24 16:16:12 +00:00
Matt Arsenault	e3a676e9ad	CodeGen: Introduce a class for registers Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191	2019-06-24 15:50:29 +00:00
Simon Pilgrim	69144a925e	[DAGCombine] visitMUL - allow shift by zero in MulByConstant. This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). Its better to let the shift by zero to occur and perform any cleanup afterward. Fixes OSS Fuzz #15429 llvm-svn: 364179	2019-06-24 12:47:17 +00:00
Fangrui Song	f955d5f623	SlotIndexes: delete unused functions llvm-svn: 364154	2019-06-23 16:05:29 +00:00
Fangrui Song	6620e3b2f6	SlotIndexes: simplify IdxMBBPair operators llvm-svn: 364152	2019-06-23 13:16:03 +00:00
Craig Topper	6ddc7912b0	[SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store. The code divides the alignment by 2 if the original alignment is equal to the original VT size. But this wouldn't be correct if the alignment was larger than the VT size. The memory operand object already takes care of calling MinAlign on the base alignment and the memory pointer offset. So we don't need any special code at all. llvm-svn: 364151	2019-06-23 07:00:46 +00:00
Fangrui Song	43e14390b0	Make GlobalISel depend on SelectionDAG after D63169 GlobalISel/IRTranslator.cpp now references SelectionDAG/FunctionLoweringInfo.cpp. This fixes a link error in -DBUILD_SHARED_LIBS=on builds: ld.lld: error: undefined symbol: llvm::FunctionLoweringInfo::clear() >>> referenced by IRTranslator.cpp:2198 (../lib/CodeGen/GlobalISel/IRTranslator.cpp:2198) >>> lib/CodeGen/GlobalISel/CMakeFiles/LLVMGlobalISel.dir/IRTranslator.cpp.o:(llvm::IRTranslator::finalizeFunction()) llvm-svn: 364124	2019-06-22 01:30:17 +00:00
Amara Emerson	fe4625fb24	[GlobalISel][IRTranslator] Change switch table translation to generate jump tables and range checks. This change makes use of the newly refactored SwitchLoweringUtils code from SelectionDAG to in order to generate jump tables and range checks where appropriate. Much of this code is ported from SDAG with some modifications. We generate G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means that targets which previously relied on the naive one MBB per case stmt translation will now start falling back until they add support for the new opcodes. For range checks, we don't generate any previously unused operations. This just recognizes contiguous ranges of case values and generates a single block per range. Single case value blocks are just a special case of ranges so we get that support almost for free. There are still some optimizations missing that I haven't ported over, and bit-tests are also unimplemented. This patch series is already complex enough. Actual arm64 support for selection of jump tables is coming in a later patch. Differential Revision: https://reviews.llvm.org/D63169 llvm-svn: 364085	2019-06-21 18:10:38 +00:00
Simon Pilgrim	0da13ed1f6	[DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI. llvm-svn: 364076	2019-06-21 16:44:51 +00:00
Simon Pilgrim	ca9933c22d	[DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code. Move the "extract from insert detection code" into a lambda helper function. llvm-svn: 364059	2019-06-21 14:46:21 +00:00
Fangrui Song	dc8de6037c	Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC llvm-svn: 364006	2019-06-21 05:40:31 +00:00
Amara Emerson	bc0d08e0ee	[GlobalISel][Localizer] Allow localization of G_INTTOPTR and chains of instructions. G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's essentially a side effect free cast instruction we can remat both instructions. This patch changes the localizer to enable localization of the chains by iterating over the entry block instructions in reverse order. That way, uses will localized first, and then the defs are free to be localized as well. This also changes the previous SmallPtrSet of localized instructions to use a SetVector instead. We're dealing with pointers and need deterministic iteration order. Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean. Differential Revision: https://reviews.llvm.org/D63630 llvm-svn: 364001	2019-06-21 00:36:19 +00:00
Simon Pilgrim	801c0f12b0	[DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible. Better handling of out-of-i64-range values due to large integer types or from fuzz tests. llvm-svn: 363955	2019-06-20 17:36:23 +00:00
Jordan Rupprecht	02508decf4	[DAGCombiner][NFC] Remove unused var llvm-svn: 363954	2019-06-20 17:30:01 +00:00
Amy Huang	7fac5c8d94	Store a pointer to the return value in a static alloca and let the debugger use that as the variable address for NRVO variables. Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D63361 llvm-svn: 363952	2019-06-20 17:15:21 +00:00
Evandro Menezes	aa10f05044	[CodeGen] Fix formatting and comments (NFC) llvm-svn: 363947	2019-06-20 16:34:00 +00:00
Simon Pilgrim	1d8093249f	[DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363929	2019-06-20 14:42:27 +00:00
Simon Pilgrim	98a0ac5c0f	[DAGCombine] Add TODOs for some combines that should support non-uniform vectors We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc. llvm-svn: 363924	2019-06-20 12:48:49 +00:00
Simon Pilgrim	a487628270	[DAGCombine] Reduce scope of ShAmtVal variable. NFCI. Fixes cppcheck warning. Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here. llvm-svn: 363921	2019-06-20 10:56:37 +00:00
Petar Avramovic	153bd24eda	[MIPS GlobalISel] Select integer to floating point conversions Select G_SITOFP and G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D63542 llvm-svn: 363912	2019-06-20 09:05:02 +00:00
Petar Avramovic	4b4dae1c76	[MIPS GlobalISel] Select floating point to integer conversions Select G_FPTOSI and G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D63541 llvm-svn: 363911	2019-06-20 08:52:53 +00:00
Simon Pilgrim	046d49a8dc	[DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue(). Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something..... llvm-svn: 363887	2019-06-19 22:14:24 +00:00
Simon Pilgrim	f05369768c	[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case. llvm-svn: 363856	2019-06-19 18:34:58 +00:00
Simon Pilgrim	6016fb726c	[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. llvm-svn: 363850	2019-06-19 18:00:24 +00:00
Simon Pilgrim	c3994f77cb	[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero. Matches what we already do for SIGN_EXTEND. llvm-svn: 363802	2019-06-19 13:58:02 +00:00
Simon Pilgrim	9eed5d2f78	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363793	2019-06-19 12:41:37 +00:00
Simon Pilgrim	8c49366c9b	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths. llvm-svn: 363792	2019-06-19 12:25:29 +00:00
Simon Pilgrim	bb6b856183	[DAGCombiner] visitSHL - pull out repeated shift amount VT. NFCI. llvm-svn: 363789	2019-06-19 11:31:26 +00:00
Simon Pilgrim	d954a53633	[DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI. We pre-extend, not post. llvm-svn: 363787	2019-06-19 11:17:48 +00:00
Chen Zheng	c5b918de58	[NFC] move some hardware loop checking code to a common place for other using. Differential Revision: https://reviews.llvm.org/D63478 llvm-svn: 363758	2019-06-19 01:26:31 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Amara Emerson	d11ea2c8c5	[GlobalISel][Localizer] Remove redundant set lookup. After changing the algorithm to only process the entry block we never revisit a processed instruction. llvm-svn: 363745	2019-06-18 22:08:40 +00:00
Jinsong Ji	ba43840bfe	[MachinePipeliner][NFC] Do resource tracking log only when requested. In most cases we don't need to do resource tracking debug, so leave them off by default. llvm-svn: 363733	2019-06-18 20:24:49 +00:00
Simon Pilgrim	5bef886cd8	[TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch. llvm-svn: 363716	2019-06-18 18:22:30 +00:00
Simon Pilgrim	032b54f8e8	[TargetLowering] SimplifyDemandedBits - Merge ZERO_EXTEND+ZERO_EXTEND_VECTOR_INREG handling Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches. llvm-svn: 363713	2019-06-18 18:08:30 +00:00
Simon Pilgrim	b6e7108dcd	[TargetLowering] SimplifyDemandedBits - Merge SIGN_EXTEND+SIGN_EXTEND_VECTOR_INREG handling Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches. llvm-svn: 363710	2019-06-18 17:57:53 +00:00
Simon Pilgrim	9aa25be149	[TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element. Fixes temporary regression introduced in rL363693. llvm-svn: 363694	2019-06-18 15:49:35 +00:00
Simon Pilgrim	83bacd8d72	[SelectionDAG] Legalize vaargs that require vector splitting This adds vector splitting for vaarg instructions during type legalization Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D60762 llvm-svn: 363671	2019-06-18 12:24:02 +00:00
Tom Stellard	1f7f64665c	GlobalISel: Remove redundant pass initialization Summary: All the GlobalISel passes are initialized when the target calls initializeGlobalISel(), so we don't need to call the initializers from the pass constructors. Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar Reviewed By: aemerson Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63235 llvm-svn: 363642	2019-06-18 02:05:06 +00:00
Matt Arsenault	5a321b899e	GlobalISel: Use the original flags when lowering fneg to fsub This was ignoring the flag on fneg, and using the source instruction's flags. Also fixes tests missing from r358702. Note the expansion itself isn't correct without nnan, but that should be fixed separately. llvm-svn: 363637	2019-06-17 23:48:43 +00:00
Peter Collingbourne	fb9ce100d1	hwasan: Add a tag_offset DWARF attribute to instrumented stack variables. The goal is to improve hwasan's error reporting for stack use-after-return by recording enough information to allow the specific variable that was accessed to be identified based on the pointer's tag. Currently we record the PC and lower bits of SP for each stack frame we create (which will eventually be enough to derive the base tag used by the stack frame) but that's not enough to determine the specific tag for each variable, which is the stack frame's base tag XOR a value (the "tag offset") that is unique for each variable in a function. In IR, the tag offset is most naturally represented as part of a location expression on the llvm.dbg.declare instruction. However, the presence of the tag offset in the variable's actual location expression is likely to confuse debuggers which won't know about tag offsets, and moreover the tag offset is not required for a debugger to determine the location of the variable on the stack, so at the DWARF level it is represented as an attribute so that it will be ignored by debuggers that don't know about it. Differential Revision: https://reviews.llvm.org/D63119 llvm-svn: 363635	2019-06-17 23:39:41 +00:00
Amara Emerson	146882242f	[GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 `1324200` -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632	2019-06-17 23:20:29 +00:00
Michael Berg	f9bff2a55e	Propagate fmf in IRTranslate for fneg Summary: This case is related to D63405 in that we need to be propagating FMF on negates. Reviewers: volkan, spatel, arsenm Reviewed By: arsenm Subscribers: wdng, javed.absar Differential Revision: https://reviews.llvm.org/D63458 llvm-svn: 363631	2019-06-17 23:19:40 +00:00
Daniel Sanders	184c8ee920	[globalisel] Fix iterator invalidation in the extload combines Summary: Change the way we deal with iterator invalidation in the extload combines as it was still possible to neglect to visit a use. Even worse, it happened in the in-tree test cases and the checks weren't good enough to detect it. We now take a cheap copy of the use list before iterating over it. This prevents iterator invalidation from occurring and has the nice side effect of making the existing schedule-for-erase/schedule-for-insert mechanism moot. Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61813 llvm-svn: 363616	2019-06-17 20:56:31 +00:00
Matt Arsenault	3e140066bc	GlobalISel: Ignore callsite attributes when picking intrinsic type A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. I fixed the same bug in SelectionDAG in r287593. llvm-svn: 363580	2019-06-17 17:01:35 +00:00
Matt Arsenault	a7f09f3c9e	GlobalISel: Verify intrinsics I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579	2019-06-17 17:01:32 +00:00
Whitney Tsang	15b7f5b72d	PHINode: introduce setIncomingValueForBlock() function, and use it. Summary: There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue() but no function to replace incoming value for a specified BasicBlock* predecessor. Clearly, there are a lot of places that could use that functionality. Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn Reviewed By: Meinersbur, fhahn Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D63338 llvm-svn: 363566	2019-06-17 14:38:56 +00:00
Sam Parker	1bd3d00e7e	[CodeGen] Check for HardwareLoop Latch ExitBlock The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556	2019-06-17 13:39:28 +00:00
Luis Marques	2e46312ffd	[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544	2019-06-17 10:54:12 +00:00
Simon Pilgrim	ef78e55205	[SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source. llvm-svn: 363542	2019-06-17 10:14:52 +00:00
Sander de Smalen	5d6ee76c16	Describe stack-id as an enum This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533	2019-06-17 09:13:29 +00:00
Sanjay Patel	c8d88ad1a9	[CodeGenPrepare][x86] shift both sides of a vector select when profitable This is based on the example/discussion in PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that shift expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen. This extends CGP functionality introduced with: rL201655 Differential Revision: https://reviews.llvm.org/D63233 llvm-svn: 363511	2019-06-16 15:29:03 +00:00
Michael Berg	ad6bb86b2d	adding more fmf propagation for selects plus updated tests llvm-svn: 363484	2019-06-15 04:53:51 +00:00
Fangrui Song	968b5f84af	Revert "adding more fmf propagation for selects plus tests" This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482	2019-06-15 03:51:08 +00:00
Matt Arsenault	9487278010	Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478	2019-06-15 00:33:26 +00:00
Mitch Phillips	0d44f129bb	Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit `c2864c0de0`. This reverts rL363410. llvm-svn: 363476	2019-06-14 23:45:34 +00:00
Michael Berg	69394bedc5	adding more fmf propagation for selects plus tests llvm-svn: 363474	2019-06-14 23:30:52 +00:00
Guozhi Wei	d2210af332	[MBP] Move a latch block with conditional exit and multi predecessors to top of loop Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is: * a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessors If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch. Differential Revision: https://reviews.llvm.org/D43256 llvm-svn: 363471	2019-06-14 23:08:59 +00:00
Amara Emerson	f79d3bc724	[GlobalISel] Add a G_BRJT opcode. This is a branch opcode that takes a jump table pointer, jump table index and an index into the table to do an indirect branch. We pass both the table pointer and JTI to allow targets like ARM64 to more easily use the existing jump table compression optimization without having to walk up the block to find a paired G_JUMP_TABLE. Differential Revision: https://reviews.llvm.org/D63159 llvm-svn: 363434	2019-06-14 17:55:48 +00:00
Matt Arsenault	c2864c0de0	GlobalISel: Avoid producing Illegal copies in RegBankSelect Avoid producing illegal register bank copies for reg_sequence and phi. The default implementation assumes it is possible to pick any operand's bank and use that for the result, introducing a copy for operands with a different bank. This does not check for illegal copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR operand requires the result to be a VGPR. The changes in getInstrMappingImpl aren't strictly necessary, since AMDGPU now just bypasses this for reg_sequence/phi. This could be replaced with an assert in case other targets run into this. It is currently responsible for producing the error for unsatisfiable copies, but this will be better served with a verifier check. For phis, for now assume any undetermined operands must be VGPRs. Eventually, this needs to be able to defer mapping these operations. This also does not yet have a way to check for whether the block is in a divergent region. llvm-svn: 363410	2019-06-14 15:22:25 +00:00
Sanjay Patel	7ea378b940	[CodeGenPrepare] propagate debuginfo when copying a shuffle llvm-svn: 363409	2019-06-14 15:05:35 +00:00
Matt Arsenault	731a81598e	RegBankSelect: Remove checks for invalid mappings Avoid a check for valid and a set of redundant asserts. The place InstructionMapping is constructed asserts all of the default fields are passed anyway for an invalid mapping, so don't overcomplicate this. llvm-svn: 363391	2019-06-14 13:42:40 +00:00
Matt Arsenault	3062e87a1e	Fix not calling TargetCustom PSVs printer If the enum value was greater than the starting target custom value, the custom printer wasn't called. llvm-svn: 363386	2019-06-14 13:26:34 +00:00
David Blaikie	4129e3e0f8	DebugInfo: Include enumerators in pubnames This is consistent with GCC's behavior (which is the defacto standard for pubnames). Though I find the presence of enumerators from enum classes to be a bit confusing, possibly a bug on GCC's end (since they can't be named unqualified, unlike the other names - and names nested in classes don't go in pubnames, for instance - presumably because one must name the class first & that's enough to limit the scope of the search) llvm-svn: 363349	2019-06-14 01:58:56 +00:00
Amy Huang	49275272e3	Use fully qualified name when printing S_CONSTANT records Summary: Before it was using the fully qualified name only for static data members. Now it does for all variable names to match MSVC. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63012 llvm-svn: 363335	2019-06-13 22:53:43 +00:00
Amara Emerson	fb0a40f064	[GlobalISel][IRTranslator] Add debug loc with line 0 to constants emitted into the entry block. Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which lets us use the vreg def assuming it dominates all other users. However, it can cause jumpy debug behaviour since the DebugLoc attached to these MIs are from a user instruction that could be in a different block. Fixes PR40887. Differential Revision: https://reviews.llvm.org/D63286 llvm-svn: 363331	2019-06-13 22:15:35 +00:00
Jinsong Ji	1c88445840	[MachinePiepliner] Don't check boundary node in checkValidNodeOrder This was exposed by PowerPC target enablement. In ScheduleDAG, if we haven't seen any uses in this scheduling region, we will create a dependence edge to ExitSU to model the live-out latency. This is required for vreg defs with no in-region use, and prefetches with no vreg def. When we build NodeOrder in Scheduler, we ignore these boundary nodes. However, when we check Succs in checkValidNodeOrder, we did not skip them, so we still assume all the nodes have been sorted and in order in Indices array. So when we call lower_bound() for ExitSU, it will return Indices.end(), causing memory issues in following Node access. Differential Revision: https://reviews.llvm.org/D63282 llvm-svn: 363329	2019-06-13 21:51:12 +00:00
David Bolvansky	896ece41e4	[Codegen] Merge tail blocks with no successors after block placement Summary: I found the following case having tail blocks with no successors merging opportunities after block placement. Before block placement: bb0: ... bne a0, 0, bb2: bb1: mv a0, 1 ret bb2: ... bb3: mv a0, 1 ret bb4: mv a0, -1 ret The conditional branch bne in bb0 is opposite to beq. After block placement: bb0: ... beq a0, 0, bb1 bb2: ... bb4: mv a0, -1 ret bb1: mv a0, 1 ret bb3: mv a0, 1 ret After block placement, that appears new tail merging opportunity, bb1 and bb3 can be merged as one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size. Author of original patch: Jim Lin Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen Reviewed By: Jim, dmgreen Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54411 llvm-svn: 363284	2019-06-13 18:11:32 +00:00
David Stenberg	1278a19282	Remove ';' after namespace's closing bracket [NFC] llvm-svn: 363267	2019-06-13 14:02:55 +00:00
Diogo N. Sampaio	0be2d25ecc	[FIX] Forces shrink wrapping to consider any memory access as aliasing with the stack Summary: Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472 The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed. Taking an exception can cause the stack to be corrupted. As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access the stack. Reviewers: dmgreen, qcolombet Reviewed By: qcolombet Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg Tags: #llvm Differential Revision: https://reviews.llvm.org/D63152 llvm-svn: 363265	2019-06-13 13:56:19 +00:00
Jeremy Morse	d2cd9c23b4	[NFC] Sink a function call into LiveDebugValues::process This was requested in D62904, which I successfully missed. This is just a refactor and shouldn't change any behaviour. llvm-svn: 363259	2019-06-13 13:11:57 +00:00
Simon Pilgrim	6b56ad164c	[CodeGen] Add getMachineMemOperand + MachineMemOperand::Flags allocator helper wrapper. NFCI. Pre-commit for D62726 on behalf of @luke (Luke Lau) llvm-svn: 363257	2019-06-13 12:58:55 +00:00
Jeremy Morse	bf2b2f08b0	[DebugInfo] Honour variable fragments in LiveDebugValues This patch makes the LiveDebugValues pass consider fragments when propagating DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable location is added to the open-ranges key, which allows distinct fragments to be tracked separately. To handle overlapping fragments things become slightly funkier. To avoid excessive searching for overlaps in the data-flow part of LiveDebugValues, this patch: * Pre-computes pairings of fragments that overlap, for each DILocalVariable * During data-flow, whenever something happens that causes an open range to be terminated (via erase), any fragments pre-determined to overlap are also terminated. The effect of which is that when encountering a DBG_VALUE fragment that overlaps others, the overlapped fragments do not get propagated to other blocks. We still rely on later location-list building to correctly handle overlapping fragments within blocks. It's unclear whether a mixture of DBG_VALUEs with and without fragmented expressions are legitimate. To avoid suprises, this patch interprets a DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment. Differential Revision: https://reviews.llvm.org/D62904 llvm-svn: 363256	2019-06-13 12:51:57 +00:00
Nikola Prica	076ae0d2e2	[DebugInfo] Move Value struct out of DebugLocEntry as DbgValueLoc (NFC) Since the DebugLocEntry::Value is used as part of DwarfDebug and DebugLocEntry make it as the separate class. Reviewers: aprantl, dstenb Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D63213 llvm-svn: 363246	2019-06-13 10:23:26 +00:00
Jeremy Morse	181bf0cefb	[DebugInfo] Use FrameDestroy to extend stack locations to end-of-function We aim to ignore changes in variable locations during the prologue and epilogue of functions, to avoid using space documenting location changes that aren't visible. However in D61940 / r362951 this got ripped out as the previous implementation was unsound. Instead, use the FrameDestroy flag to identify when we're in the epilogue of a function, and ignore variable location changes accordingly. This fits in with existing code that examines the FrameSetup flag. Some variable locations get shuffled in modified tests as they now cover greater ranges, which is what would be expected. Some additional single-location variables are generated too. Two tests are un-xfailed, they were only xfailed due to r362951 deleting functionality they depended on. Apparently some out-of-tree backends don't accurately maintain FrameDestroy flags -- if you're an out-of-tree maintainer and see changes in variable locations disappear due to a faulty FrameDestroy flag, it's safe to back this change out. The impact is just slightly more debug info than necessary. Differential Revision: https://reviews.llvm.org/D62314 llvm-svn: 363245	2019-06-13 10:03:17 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Matt Arsenault	f29366b1f5	StackProtector: Use PointerMayBeCaptured This was using its own, outdated list of possible captures. This was at minimum not catching cmpxchg and addrspacecast captures. One change is now any volatile access is treated as capturing. The test coverage for this pass is quite inadequate, but this required removing volatile in the lifetime capture test. Also fixes some infrastructure issues to allow running just the IR pass. Fixes bug 42238. llvm-svn: 363169	2019-06-12 14:23:33 +00:00
Anton Afanasyev	339b39b773	[MIR] Skip hoisting to basic block which may throw exception or return Summary: Fix hoisting to basic block which are not legal for hoisting cause it can be terminated by exception or it is return block. Reviewers: john.brawn, RKSimon, MatzeB Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63148 llvm-svn: 363164	2019-06-12 13:51:44 +00:00
Hsiangkai Wang	93be25b580	[NFC] Correct comments in RegisterCoalescer. Differential Revision: https://reviews.llvm.org/D63124 llvm-svn: 363119	2019-06-12 02:58:04 +00:00
Amara Emerson	d133c15925	[GlobalISel] Add a G_JUMP_TABLE opcode. This opcode generates a pointer to the address of the jump table specified by the source operand, which is a jump table index. It will be used in conjunction with an upcoming G_BRJT opcode to support jump table codegen with GlobalISel. Differential Revision: https://reviews.llvm.org/D63111 llvm-svn: 363096	2019-06-11 19:58:06 +00:00
Jinsong Ji	ef2d6d99c0	[PowerPC] Enable MachinePipeliner for P9 with -ppc-enable-pipeliner Implement necessary target hooks to enable MachinePipeliner for P9 only. The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9. Differential Revision: https://reviews.llvm.org/D62164 llvm-svn: 363085	2019-06-11 17:40:39 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Simon Pilgrim	287e78c82b	[DAGCombine] GetNegatedExpression - constant float vector support (PR42105) Add support for negation of constant build vectors. Differential Revision: https://reviews.llvm.org/D62963 llvm-svn: 363040	2019-06-11 09:44:33 +00:00
Sander de Smalen	cbeb563cfb	Change semantics of fadd/fmul vector reductions. This patch changes how LLVM handles the accumulator/start value in the reduction, by never ignoring it regardless of the presence of fast-math flags on callsites. This change introduces the following new intrinsics to replace the existing ones: llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul and adds functionality to auto-upgrade existing LLVM IR and bitcode. Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D60261 llvm-svn: 363035	2019-06-11 08:22:10 +00:00
Matt Arsenault	c5830f5f05	AtomicExpand: Don't crash on non-0 alloca This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022	2019-06-11 01:35:07 +00:00
Matt Arsenault	383e72fcfe	AMDGPU: Expand < 32-bit atomics Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021	2019-06-11 01:35:00 +00:00
Puyan Lotfi	4d89462a1c	[MIR-Canon] Fixing non-determinism that was breaking bots (NFC). An earlier fix of a subtle iterator invalidation bug had uncovered a nondeterminism that was present in the MultiUsers bag. Problem was that MultiUsers was being looked up using pointers. This patch is an NFC change that numbers each multiuser and processes each in numbered order. This fixes the test failure on netbsd and will likely fix the green-dragon bot too. llvm-svn: 363012	2019-06-11 00:00:25 +00:00
Jessica Paquette	b22954384e	[GlobalISel] Translate memset/memmove/memcpy from undef ptrs into nops If the source is undef, then just don't do anything. This matches SelectionDAG's behaviour in SelectionDAG.cpp. Also add a test showing that we do the right thing here. (irtranslator-memfunc-undef.ll) Differential Revision: https://reviews.llvm.org/D63095 llvm-svn: 362989	2019-06-10 21:53:56 +00:00
Francis Visoiu Mistrih	a438432acc	[FastISel] Skip creating unnecessary vregs for arguments This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it. FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done. In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins. The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg). A few tests are affected by this: * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore. * llvm/test/CodeGen//swiftself.ll: we tail-call now. llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used. * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy * llvm/test/CodeGen/Mips/: get rid of spills and copies llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64 * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed Differential Revision: https://reviews.llvm.org/D62361 llvm-svn: 362963	2019-06-10 16:53:37 +00:00
Jeremy Morse	bcff417292	[DebugInfo] Terminate all location-lists at end of block This commit reapplies r359426 (which was reverted in r360301 due to performance problems) and rolls in D61940 to address the performance problem. I've combined the two to avoid creating a span of slow-performance, and to ease reverting if more problems crop up. The summary of D61940: This patch removes the "ChangingRegs" facility in DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect variable locations. An unchanging register doesn't mean a variable doesn't change its location. The patch kills off everything that calculates the ChangingRegs vector. Previously ChangingRegs spotted epilogues and marked registers as unchanging if they weren't modified outside the epilogue, increasing the chance that we can emit a single-location variable record. Without this feature, debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've split this into two steps as FrameDestroy isn't necessarily supported by all backends. The logic for terminating variable locations at the end of a basic block now becomes much more enjoyably simple: we just terminate them all. Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term. The current algorithm for detecting that a variable has a single-location doesn't work in this scenario (inlined function in multiple blocks), only other bugs were making this test work. fission-ranges.ll gets slightly refreshed too, as the location of "p" is now correctly determined to be a single location. Differential Revision: https://reviews.llvm.org/D61940 llvm-svn: 362951	2019-06-10 15:23:46 +00:00
Nikola Prica	abc1dff7e4	[DebugInfo] More strict debug range for stack variables Variable's stack location can stretch longer than it should. If a variable is placed at the stack in a some nested basic block its range can be calculated to be up to the next occurrence of the variable's DBG_VALUE, or up to the end of the function, thus covering a basic blocks that should not be included in the variable’s location range. This happens because the DbgEntityHistoryCalculator ends register locations at the end of a basic block only if the variable’s location register has been changed throughout the function, which is not the case for the register used to reference stack objects. This patch also tries to produce a single value location if the location list builder managed to merge all the locations into one. Reviewers: aprantl, dstenb, jmorse Reviewed By: aprantl, dstenb, jmorse Subscribers: djtodoro, ivanbaev, asowda Tags: #debug-info Differential Revision: https://reviews.llvm.org/D61600 llvm-svn: 362923	2019-06-10 08:41:06 +00:00
QingShan Zhang	ab846da7e8	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921	2019-06-10 05:40:21 +00:00
David Bolvansky	dcf5e6abdf	[TargetLowering] Simplify (ctpop x) == 1 Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912	2019-06-09 18:18:57 +00:00
Anton Afanasyev	623d9ba068	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 This is fixed recommit of r361356 after PowerPC64 multistage build failure. llvm-svn: 362901	2019-06-09 12:15:47 +00:00
Simon Pilgrim	5f337149fa	Use for-range loop. NFCI. llvm-svn: 362897	2019-06-09 09:07:30 +00:00
Simon Pilgrim	6bae6d5a5d	[DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI. Same codegen, only differ by the oneuse limit for the sextload case. llvm-svn: 362880	2019-06-08 17:02:00 +00:00
Jonas Paulsson	fdc4ea34e3	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection. This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868	2019-06-08 06:19:15 +00:00
Amara Emerson	829037a914	Factor out SelectionDAG's switch analysis and lowering into a separate component. In order for GlobalISel to re-use the significant amount of analysis and optimization code in SDAG's switch lowering, we first have to extract it and create an interface to be used by both frameworks. No test changes as it's NFC. Differential Revision: https://reviews.llvm.org/D62745 llvm-svn: 362857	2019-06-08 00:05:17 +00:00
Volkan Keles	97204a6788	[GlobalISel] IRTranslator: Translate the intrinsics ignored by CodeGen Summary: Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing as they have no effect on CodeGen. Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm Reviewed By: arsenm Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63022 llvm-svn: 362834	2019-06-07 20:19:27 +00:00
Simon Pilgrim	f0240ee76d	[DAGCombine] visitAND - fix local shadow variable warnings. NFCI. llvm-svn: 362825	2019-06-07 18:36:43 +00:00
Simon Pilgrim	4c9db2045a	[DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI. llvm-svn: 362820	2019-06-07 18:07:06 +00:00
Jinsong Ji	7aafdef627	[MachineScheduler] checkResourceLimit boundary condition update When we call checkResourceLimit in bumpCycle or bumpNode, and we know the resource count has just reached the limit (the equations are equal). We should return true to mark that we are resource limited for next schedule, or else we might continue to schedule in favor of latency for 1 more schedule and create a schedule that actually overbook the resource. When we call checkResourceLimit to estimate the resource limite before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule for it . Differential Revision: https://reviews.llvm.org/D62345 llvm-svn: 362805	2019-06-07 14:54:47 +00:00
Matt Arsenault	94a609e343	TailDuplicator: Remove no-op analyzeBranch call This could fail, which looked concerning. However nothing was actually using the results of this. I assume this was intended to use the anti-feature of analyzeBranch of removing instructions, but wasn't actually calling it with AllowModify = true. Fixes bug 42162. llvm-svn: 362800	2019-06-07 13:33:34 +00:00
Sam Parker	67f9dc60b8	Fix for lld buildbot Removed unused (in non-debug builds) variable. llvm-svn: 362775	2019-06-07 08:04:18 +00:00
Sam Parker	c5ef502ee8	[CodeGen] Generic Hardware Loop Support Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774	2019-06-07 07:35:30 +00:00
Alexey Lapshin	b9f1e7b16e	[DebugInfo] Incorrect debug info record generated for loop counter. Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage. Range for Debug Variable("i") computed according to current state of instructions inside of basic block. But Register Allocator creates new instructions which were not taken into account when Live Debug Variables computed. In the result DBG_VALUE instruction for the "i" variable was put after these newly inserted instructions. This is incorrect. Debug Value for the loop counter should be inserted before any loop instruction. Differential Revision: https://reviews.llvm.org/D62650 llvm-svn: 362750	2019-06-06 21:19:39 +00:00
Jason Liu	60ec248148	[AIX] Implement function descriptor on SDAG Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735	2019-06-06 19:13:36 +00:00
Simon Pilgrim	842c7792aa	[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123) This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores: 1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another. 2 - The merged load\store node must retain the non-temporal flag. Differential Revision: https://reviews.llvm.org/D62910 llvm-svn: 362723	2019-06-06 17:04:13 +00:00
Simon Pilgrim	da993d08c8	[DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI. Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s llvm-svn: 362695	2019-06-06 10:21:18 +00:00
Petar Avramovic	faaa2b5d21	[MIPS GlobalISel] Select floor and ceil Select G_FFLOOR and G_FCEIL for MIPS32. Differential Revision: https://reviews.llvm.org/D62901 llvm-svn: 362688	2019-06-06 09:02:24 +00:00
Ulrich Weigand	6c5d5ce551	Allow target to handle STRICT floating-point nodes The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is both marked as mayRaiseFPException and FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663	2019-06-05 22:33:10 +00:00
Tim Northover	607c8a9d14	IR: make getParamByValType Just Work. NFC. Most parts of LLVM don't care whether the byval type is derived from an explicit Attribute or from the parameter's pointee type, so it makes sense for the main access function to just return the right value. The very few users who do care (only BitcodeReader so far) can find out how it's specified by accessing the Attribute directly. llvm-svn: 362642	2019-06-05 20:37:47 +00:00
Simon Pilgrim	77d6adc491	Fix shadow local variable warning. NFCI. llvm-svn: 362622	2019-06-05 17:26:29 +00:00
Sanjay Patel	ad62a3a299	[LoopUtils][SLPVectorizer] clean up management of fast-math-flags Instead of passing around fast-math-flags as a parameter, we can set those using an IRBuilder guard object. This is no-functional-change-intended. The motivation is to eventually fix the vectorizers to use and set the correct fast-math-flags for reductions. Examples of that not behaving as expected are: https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast') https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0) D61802 (should be able to reduce with IR-level FMF) Differential Revision: https://reviews.llvm.org/D62272 llvm-svn: 362612	2019-06-05 14:58:04 +00:00
Simon Pilgrim	5a81af547c	[TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI. Will be used more in an upcoming patch. llvm-svn: 362595	2019-06-05 10:59:04 +00:00
Johannes Doerfert	6b432dca5d	[SelectionDAG][FIX] Allow "returned" arguments to be bit-casted Summary: An argument that is return by a function but bit-casted before can still be annotated as "returned". Make sure we do not crash for this case. Reviewers: sunfish, stephenwlin, niravd, arsenm Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59917 llvm-svn: 362546	2019-06-04 20:34:43 +00:00
Nemanja Ivanovic	aed7227b71	Revert r362472 as it is breaking PPC build bots The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots. Reverting it to bring the bots back to green. llvm-svn: 362539	2019-06-04 18:48:43 +00:00
Craig Topper	09a4415803	[DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1) This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial. Fixes PR42118 Differential Revision: https://reviews.llvm.org/D62828 llvm-svn: 362533	2019-06-04 17:44:18 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Roman Lebedev	3dce0326fe	[DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952) Summary: This might be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet. As far as i can tell, there are no regressions here (ignoring x86-32), all changes are either good or neutral. This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`) `@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/vMd3 Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma Reviewed By: RKSimon Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62774 llvm-svn: 362488	2019-06-04 11:06:21 +00:00
Roman Lebedev	be6ce7b3f2	[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold Summary: All changes except ARM look great. https://rise4fun.com/Alive/R2M The regression `test/CodeGen/ARM/addsubcarry-promotion.ll` is recovered fully by D62392 + D62450. Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma Reviewed By: efriedma Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62266 llvm-svn: 362487	2019-06-04 11:06:08 +00:00
Simon Pilgrim	ad298f86b7	[SelectionDAG] ComputeNumSignBits - support constant pool values from target As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits. The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select. It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup. Differential Revision: https://reviews.llvm.org/D62777 llvm-svn: 362486	2019-06-04 10:49:06 +00:00
Simon Pilgrim	3178546a27	[SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD comments. NFCI. Pre-commit requested for D62777. llvm-svn: 362485	2019-06-04 10:17:56 +00:00
Simon Pilgrim	3018d505a3	[SelectionDAG] Add fpto[us]i(undef) --> undef constant fold Follow up to D62807. Differential Revision: https://reviews.llvm.org/D62811 llvm-svn: 362483	2019-06-04 10:04:55 +00:00
QingShan Zhang	11de0e71b0	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D61843 llvm-svn: 362472	2019-06-04 08:53:53 +00:00
Michael Berg	6ff978ee05	Propagate fmf for setcc in SDAG for select folds llvm-svn: 362448	2019-06-03 21:53:26 +00:00
Michael Berg	0b7f98da65	Propagate fmf for setcc/select folds Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG. Reviewers: qcolombet, spatel Reviewed By: qcolombet Subscribers: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D62552 llvm-svn: 362439	2019-06-03 19:12:15 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Simon Pilgrim	cb7e4e8193	[SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205) We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction Differential Revision: https://reviews.llvm.org/D62807 llvm-svn: 362397	2019-06-03 13:02:07 +00:00
Nikola Prica	2d0106a110	[LiveDebugValues] Close range for previous variable's location when adding newly deduced location When LiveDebugValues deduces new variable's location from spill, restore or register copy instruction it should close old variable's location. Otherwise we can have multiple block output locations for same variable. That could lead to inserting two DBG_VALUEs for same variable to the beginning of the successor block which results to ignoring of first DBG_VALUE. Reviewers: aprantl, jmorse, wolfgangp, dstenb Reviewed By: aprantl Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro Tags: #debug-info Differential Revision: https://reviews.llvm.org/D62196 llvm-svn: 362373	2019-06-03 09:48:29 +00:00
Florian Hahn	e71963c850	Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor. If we hit the limit, we do expand the outstanding tokenfactors. Otherwise, we might drop nodes with users in the unexpanded tokenfactors. This fixes the crashes reported by Jordan Rupprecht. Reviewers: niravd, spatel, craig.topper, rupprecht Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D62633 llvm-svn: 362350	2019-06-03 01:30:19 +00:00
Craig Topper	50b35caf30	[DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask. Similar to what was done for masked load and gather. llvm-svn: 362342	2019-06-02 22:52:38 +00:00
Craig Topper	5f79d74946	[X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin Need to cast only to Constant instead of ConstantVector to allow ConstantAggregateZero. llvm-svn: 362341	2019-06-02 22:52:34 +00:00
Craig Topper	a7bc31ebc6	[DAGCombiner] Replace masked loads with a zero mask with the passthru value Similar to what was recently done for gathers in r362015. llvm-svn: 362337	2019-06-02 18:58:46 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Simon Pilgrim	ffb4d2bff7	[DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020) Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot. PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe. Differential Revision: https://reviews.llvm.org/D62783 llvm-svn: 362323	2019-06-02 11:56:39 +00:00
Simon Pilgrim	88522ce388	[TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis. These might have been replaced in multiple use cases. llvm-svn: 362322	2019-06-02 10:12:55 +00:00
Simon Pilgrim	30a6caa3e7	[TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI. Helps with debugging as we recurse between them. llvm-svn: 362321	2019-06-02 10:03:56 +00:00
Craig Topper	f58ef87bb7	[DAGCombiner] Replace two unchecked dyn_casts with casts. The results of the dyn_casts were immediately dereferenced on the next line so they had better not be null. I don't think there's any way for these dyn_casts to fail, so use a cast of adding null check. llvm-svn: 362315	2019-06-02 03:31:01 +00:00
Craig Topper	78c794a70b	[X86] Fix several places that weren't passing what they though they were to MachineInstr::print Over a year ago, MachineInstr gained a fourth boolean parameter that occurs before the TII pointer. When this happened, several places started accidentally passing TII into this boolean parameter instead of the TII parameter. llvm-svn: 362312	2019-06-02 01:36:48 +00:00
Eli Friedman	d8e8722791	[CodeGen] Fix hashing for MO_ExternalSymbol MachineOperands. We were hashing the string pointer, not the string, so two instructions could be identical (isIdenticalTo), but have different hash codes. This showed up as a very rare, non-deterministic assertion failure rehashing a DenseMap constructed by MachineOutliner. So there's no "real" testcase, just a unittest which checks that the hash function behaves correctly. I'm a little scared fixing this is going to cause a regression in outlining or MachineCSE, but hopefully we won't run into any issues. Differential Revision: https://reviews.llvm.org/D61975 llvm-svn: 362281	2019-06-01 00:08:54 +00:00
Craig Topper	bc9e04d0c3	[SelectionDAG] Make the code in mutateStrictFPToFP less aware of how many operands each node has. NFCI Just copy all of the operands except the chain and call MorphNode on that. This removes the IsUnary and IsTernary flags. Also always get the result type from the result type of the original nodes. Previously we got it from the operand except for two nodes where that didn't work. llvm-svn: 362269	2019-05-31 22:18:45 +00:00
Nick Desaulniers	103bd108a7	[RegisterCoalescer] fix potential use of undef value. NFC Summary: Fixes a warning produced from scan-build (llvm.org/reports/scan-build/), further warnings found by annotation isMoveInstr [[nodiscard]]. isMoveInstr potentially does not assign to its parameters, so if they were uninitialized, they will potentially stay uninitialized. It seems most call sites pass references to uninitialized values, then use them without checking the return value. Reviewers: wmi Reviewed By: wmi Subscribers: MatzeB, qcolombet, hiraditya, tpr, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62109 llvm-svn: 362265	2019-05-31 21:20:13 +00:00
Puyan Lotfi	3ea6b24f41	[MIR-Canon] Don't do vreg skip for independent instructions if there are none. We don't want to create vregs if there is nothing to use them for. That causes verifier errors. Differential Revision: https://reviews.llvm.org/D62740 llvm-svn: 362247	2019-05-31 17:34:25 +00:00
Kevin P. Neal	ac79007205	Revert revert of r362112 with minor SystemZ test file corrections. [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: https://reviews.llvm.org/D62546 llvm-svn: 362241	2019-05-31 16:32:12 +00:00
Jinsong Ji	18e7bf5c4d	[MachinePipeliner][NFC] Add some debug log and statistics This is to add some log and statistics for debugging Differential Revision: https://reviews.llvm.org/D62165 llvm-svn: 362233	2019-05-31 15:35:19 +00:00
Puyan Lotfi	0d63cef180	[MIR-Canon] Skip the first N vreg names lazily. This consolidates the vreg skip code into one function (SkipVRegs()). SkipVRegs() now knows if it should skip as if it is the first initialization or subsequent skips. The first skip is also done the first time createVirtualRegister is called by the cursor instead of by the cursor's constructor. This prevents verifier errors on machine functions that have no vregs (where the verifier will complain that there are vregs when the function uses none). Differential Revision: https://reviews.llvm.org/D62717 llvm-svn: 362195	2019-05-31 06:02:38 +00:00
Puyan Lotfi	2a901401fe	[MIR-Canon] Hardening propagateLocalCopies. This is am almost NFC, it does the following: - If there is no register class for a COPY's src or dst, bail. - Fixes uses iterator invalidation bug. Differential Revision: https://reviews.llvm.org/D62713 llvm-svn: 362191	2019-05-31 04:49:58 +00:00
Matt Arsenault	18659f84b2	MISched: Fix -misched-regpressure=0 if subreg liveness enabled Test is waiting on fixing several more crashes in the AMDGPU scheduler implementation with this. llvm-svn: 362174	2019-05-30 23:31:36 +00:00
Francis Visoiu Mistrih	6ada11f134	[Remarks][NFC] Move the serialization to lib/Remarks Separate the remark serialization to YAML from the LLVM Diagnostics. This adds a new serialization abstraction: remarks::Serializer. It's completely independent from lib/IR and it provides an easy way to replace YAML by providing a new remarks::Serializer. Differential Revision: https://reviews.llvm.org/D62632 llvm-svn: 362160	2019-05-30 21:45:59 +00:00
Puyan Lotfi	daaecf98c9	[MIR-Canon] Fixing case where MachineFunction is empty. In cases where the machine function is empty: bail on the RPO traversal. Differential Revision: https://reviews.llvm.org/D62617 llvm-svn: 362158	2019-05-30 21:37:25 +00:00
Roman Lebedev	46511d75b5	[DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts I don't have a test case for these, but there is a test case for D62266 where, even after all the constant-folding patches, we still end up with endless combine loop. Which makes sense, since we don't constant fold for opaque constants. llvm-svn: 362156	2019-05-30 21:10:37 +00:00
Roman Lebedev	a4e3b50e26	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2 Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 362146	2019-05-30 20:37:49 +00:00
Roman Lebedev	57aa36ff91	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 362145	2019-05-30 20:37:39 +00:00
Roman Lebedev	63b4741534	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 362144	2019-05-30 20:37:29 +00:00
Roman Lebedev	05ad5fd213	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 362143	2019-05-30 20:37:18 +00:00
Roman Lebedev	1d9ec7a81b	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 362142	2019-05-30 20:36:54 +00:00
Roman Lebedev	7eb8b5b5dd	[DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold Summary: https://rise4fun.com/Alive/B0A Reviewers: t.p.northover, RKSimon, spatel, craig.topper Reviewed By: RKSimon Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62691 llvm-svn: 362135	2019-05-30 19:27:51 +00:00
Roman Lebedev	691b5e2ecc	[DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold Summary: https://rise4fun.com/Alive/Mb1M Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62689 llvm-svn: 362134	2019-05-30 19:27:42 +00:00
Roman Lebedev	0a3dbbcdfb	[DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold Summary: Direct sibling of D62662, the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62664 llvm-svn: 362133	2019-05-30 19:27:32 +00:00
Roman Lebedev	9ff3159b4a	[DAGCombine] Use FoldConstantArithmetic() to perform C2-(A+C1) -> (C2-C1)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62663 llvm-svn: 362132	2019-05-30 19:27:26 +00:00
Roman Lebedev	cc9a9cf237	[DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold Summary: This was the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, spatel, craig.topper, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62662 llvm-svn: 362131	2019-05-30 19:27:19 +00:00
Roman Lebedev	ef95679741	[DAGCombine] Use FoldConstantArithmetic() to perform ((c1-A)+c2) -> (c1+c2)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62661 llvm-svn: 362130	2019-05-30 19:27:10 +00:00
Tim Northover	b7141207a4	Reapply: IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. The original commit did not remap byval types when linking modules, which broke LTO. This version fixes that. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362128	2019-05-30 18:48:23 +00:00
Puyan Lotfi	0f4446b270	[MIR-Canon] Add support for rewriting VRegs that are typed but don't have an RC. There were crashes (addrspace-memoperands.mir was only one of them) in MIR that had operands that came from before register classes were set. With these operands, creating a replacement vreg (for MIR-Canon's renaming) needs to use the vreg type rather than the RegisterClass which is not present. Differential Revision: https://reviews.llvm.org/D62543 llvm-svn: 362122	2019-05-30 18:06:28 +00:00
Kevin P. Neal	51ce0b196a	Correct error in revert of r362112. Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362118	2019-05-30 17:21:45 +00:00
Kevin P. Neal	d3db7b40b0	Revert r362112, it broke the bots with the message "Unsupported vector argument or return type" Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362117	2019-05-30 17:10:21 +00:00
Kevin P. Neal	2e1807678d	[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: http://reviews.llvm.org/D62546 llvm-svn: 362112	2019-05-30 16:44:47 +00:00
Roman Lebedev	019d270e43	[DAGCombine] Revert of recommit of "binop-with-const hoisting" patches I was looking into an endless combine loop the uncommitted follow-up patch was causing, and it appears even these patches can exibit such an endless loop. The root cause is that we try to hoist one binop (add/sub) with constant operand, and if we get two such binops both of which are eligible for this hoisting, we get stuck. Some cases may highlight missing constant-folds. Reverts r361871,r361872,r361873,r361874. llvm-svn: 362109	2019-05-30 16:07:11 +00:00
Amy Huang	325003be02	CodeView - add static data members to global variable debug info. Summary: Add static data members to IR debug info's list of global variables so that they are emitted as S_CONSTANT records. Related to https://bugs.llvm.org/show_bug.cgi?id=41615. Reviewers: rnk Subscribers: aprantl, cfe-commits, llvm-commits, thakis Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62167 llvm-svn: 362038	2019-05-29 21:45:34 +00:00
Tim Northover	71ee3d0237	Revert "IR: add optional type to 'byval' function parameters" The IRLinker doesn't delve into the new byval attribute when mapping types, and this breaks LTO. llvm-svn: 362029	2019-05-29 20:46:38 +00:00
Benjamin Kramer	107f8d9873	[DAGCombiner] Replace gathers with a zero mask with the passthru value These can be created by the legalizer when splitting a larger gather. See https://llvm.org/PR42055 for a motivating example. Differential Revision: https://reviews.llvm.org/D62613 llvm-svn: 362015	2019-05-29 19:24:19 +00:00
Tim Northover	6e07f16fae	IR: add optional type to 'byval' function parameters When we switch to opaque pointer types we will need some way to describe how many bytes a 'byval' parameter should occupy on the stack. This adds a (for now) optional extra type parameter. If present, the type must match the pointee type of the argument. Note to front-end maintainers: if this causes test failures, it's probably because the "byval" attribute is printed after attributes without any parameter after this change. llvm-svn: 362012	2019-05-29 19:12:48 +00:00
Sam McCall	4f09d9fcfa	Qualify use of llvm::empty that's ambiguous with std::empty llvm-svn: 361968	2019-05-29 15:02:16 +00:00
Richard Trieu	c77aff7e17	Inline a variable into debug section to fix unused variable warning. llvm-svn: 361927	2019-05-29 04:09:32 +00:00
Richard Trieu	e8698ead9d	Inline value into debug statement to avoid unused variable warning. llvm-svn: 361924	2019-05-29 03:43:01 +00:00
Peter Collingbourne	31fda09b2d	Add IR support, ELF section and user documentation for partitioning feature. The partitioning feature was proposed here: http://lists.llvm.org/pipermail/llvm-dev/2019-February/130583.html This is mostly just documentation. The feature itself will be contributed in subsequent patches. Differential Revision: https://reviews.llvm.org/D60242 llvm-svn: 361923	2019-05-29 03:29:01 +00:00
Jinsong Ji	f6cb3bcb4c	Support resource tracking with InstrSchedModel The current design use DFA to do resource tracking in SMS, and DFA only support InstrItins, and also has scaling limitation. This patch extend SMS to allow Subtarget to use ProcResource in InstrSchedModel instead. Differential Revision: https://reviews.llvm.org/D62163 llvm-svn: 361919	2019-05-29 03:02:59 +00:00
Pengfei Wang	72e3f9662b	Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to" This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b. llvm-svn: 361918	2019-05-29 02:49:59 +00:00
Pengfei Wang	818c652643	[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to avoid static check fail RegClassOrBank is an object of RegClassOrRegBank, which is defined as using llvm::RegClassOrRegBank = typedef PointerUnion<const TargetRegisterClass , const RegisterBank > so control flow can not get here. Use ""llvm_unreachable" here to avoid "null pointer" confusion. Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62006 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 361912	2019-05-29 02:20:37 +00:00
Quentin Colombet	a6f57ad2c9	[RegUsageInfoCollector] Don't mark as saved registers that don't have subregister lanes To determine the list of clobbered registers, the RegUsageInfoCollector pass uses the list of callee saved registers provided by the target and then augments it with the list of registers which have all their subregisters saved. It then basically does the difference between all the registers and the saved registers to come up with what is clobbered (plus it checks that the register is defined within that functions). The patch fixes a bug where when register does not have any subregister lane, hence when checking if any of its subregister are not saved, we would find none and think the register is saved as well. That's obviously wrong. The code was actually kind of checking for something like that with the CoveredBySubRegs bit. What this bit says is that a register is completely covered by its subregisters. We required that this bit was set, to check that a register was saved by its subregister lanes, since without this bit, we potentially would miss to check some part of the register. However, this bit is used de facto on registers that don't have any subregisters (e.g., on ARM) and the code was not prepared for that. This patch fixes this by checking that a register has subregisters before declaring it saved when none of its lanes are modified. llvm-svn: 361901	2019-05-28 23:43:12 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch add the ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Roman Lebedev	dfc34f0211	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361856, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361874	2019-05-28 20:40:10 +00:00
Roman Lebedev	d485c6bc9f	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361855, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361873	2019-05-28 20:40:03 +00:00
Roman Lebedev	96c9986199	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361853, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361872	2019-05-28 20:39:55 +00:00
Roman Lebedev	2feb7e56e2	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361871	2019-05-28 20:39:39 +00:00
Roman Lebedev	272d70c366	Revert DAGCombine "hoist binop with const" folds Appear to introduce test-suite compile-time hang. http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825 This reverts r361852,r361853,r361854,r361855,r361856 llvm-svn: 361865	2019-05-28 19:04:21 +00:00
Roman Lebedev	7669665432	[DAGCombine] (x - C) - y -> (x - y) - C fold Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361856	2019-05-28 17:54:21 +00:00
Roman Lebedev	8c9b3e4e4a	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361855	2019-05-28 17:54:13 +00:00
Roman Lebedev	6a24c9b9ab	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 361854	2019-05-28 17:54:04 +00:00
Roman Lebedev	1499f65ac1	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361853	2019-05-28 17:53:54 +00:00
Roman Lebedev	19f51ec04a	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361852	2019-05-28 17:53:43 +00:00
Simon Pilgrim	9cd9624fb6	[DAG] LegalizeVectorTypes - reduce scope of local variables. NFCI. Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings. llvm-svn: 361821	2019-05-28 13:46:26 +00:00
David Stenberg	5d0e6b6755	Stop undef fragments from closing non-overlapping fragments Summary: When DwarfDebug::buildLocationList() encountered an undef debug value, it would truncate all open values, regardless if they were overlapping or not. This patch fixes so that it only does that for overlapping fragments. This change unearthed a bug that I had introduced in D57511, which I have fixed in this patch. The code in DebugHandlerBase that changes labels for parameter debug values could break DwarfDebug's assumption that the labels for the entries in the debug value history are monotonically increasing. Before this patch, that bug could result in location list entries whose ending address was lower than the beginning address, and with the changes for undef debug values that this patch introduces it could trigger an assertion, due to attempting to emit location list entries with empty ranges. A reproducer for the bug is added in param-reg-const-mix.mir. Reviewers: aprantl, jmorse, probinson Reviewed By: aprantl Subscribers: javed.absar, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D62379 llvm-svn: 361820	2019-05-28 13:23:25 +00:00
Matt Arsenault	d3ed418ad3	MIR: Fix printer crashing on dead CSR frame indexes llvm-svn: 361819	2019-05-28 13:08:31 +00:00
Benjamin Kramer	57e267a2e9	[X86] Custom lower CONCAT_VECTORS of v2i1 The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814	2019-05-28 12:52:57 +00:00
Matt Arsenault	ca84c4be4b	RegAllocFast: Set MayLiveAcrossBlocks when allocating uses Setting mayLiveOut based only on use instructions after allocating the def block did not work if the use block was allocated before the def block, since the virtual register uses were already removed. Fixes bug 41973. llvm-svn: 361781	2019-05-27 20:37:31 +00:00
Sanjay Patel	2f99d009c1	[SelectionDAG] fold concat of extract subvectors This is derived from the related fold for build vectors. We also have a version of this in DAGCombiner. The benefit of having this fold at node creation time is (1) efficiency and (2) preventing infinite looping from creating patterns that should not exist in the first place. Currently, the inf-loop could happen with MergeConsecutiveStores() because it naively creates concat of extracts when forming a wider vector store. That could fight with target-specific store narrowing. llvm-svn: 361780	2019-05-27 20:26:21 +00:00
Sanjay Patel	e13ae3e4d8	[SelectionDAG] fix formatting and redundant comments; NFC There's a possible missing fold here for extracting from the same source vector. It's similar to a check that we use to squash a build vector with all extracted elements from the same source vector. llvm-svn: 361778	2019-05-27 18:26:43 +00:00
Michael Liao	9c70c574b4	[SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`. Summary: - The current implementation simplifies the case where the source of `copyto` is `implicit-def`ed. However, it only works when that `implicit-def` is single-used since it detects that from `implicit-def` and cannot determine which destination vreg should be used if there are multiple uses. - This patch changes that detection when `copyto` is being emitted. If that `copyto`'s source is defined from `implicit-def`, it simplifies it. Hence, it works even that `implicit-def` is multi-used. - Except it simplifies the internal IR, it won't improve the quality of code generation. However, it helps to detect 'implicit-def` in a straight-forward manner in some passes, such as `si-i1-copies`. A test case is added. Reviewers: sunfish, nhaehnle Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62342 llvm-svn: 361777	2019-05-27 18:26:29 +00:00
Simon Pilgrim	ebb053b139	[SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to. llvm-svn: 361773	2019-05-27 16:39:25 +00:00
Nikola Prica	441ad62531	Test commit (NFC) Add blank line. llvm-svn: 361761	2019-05-27 13:51:30 +00:00
David L. Jones	0ff41b8a5a	Revert r361356: "[MIR] Add simple PRE pass to MachineCSE" This is problematic on buildbots, as discussed here: https://reviews.llvm.org/rL361356 It seems like the plan already was to revert, but that hasn't happened yet. llvm-svn: 361746	2019-05-27 06:00:00 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was mlformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Simon Pilgrim	06e02856ab	[SelectionDAG] GetDemandedBits - cleanup to more closely match SimplifyDemandedBits. NFCI. Prep work before adding demanded elts support. llvm-svn: 361739	2019-05-26 18:58:14 +00:00
Simon Pilgrim	2916b9e28c	[SelectionDAG] MaskedValueIsZero - add demanded elements implementation Will be used in an upcoming patch but I've updated the original implementation to call this to ensure test coverage. llvm-svn: 361738	2019-05-26 18:43:44 +00:00
Sanjay Patel	91131b6500	[SelectionDAG] soften assertion when legalizing narrow vector FP ops The test based on PR42010: https://bugs.llvm.org/show_bug.cgi?id=42010 ...may show an inaccuracy for PPC's target defs, but we should not be so aggressive with an assert here. There's no telling what out-of-tree targets look like. llvm-svn: 361696	2019-05-25 13:48:07 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Bjorn Pettersson	b4771425f5	Use the DataLayout::typeSizeEqualsStoreSize helper. NFC Just a minor refactoring to use the new helper method DataLayout::typeSizeEqualsStoreSize(). This is done when checking if getTypeSizeInBits is equal/non-equal to getTypeStoreSizeInBits. llvm-svn: 361613	2019-05-24 09:20:20 +00:00
Tim Northover	3b2157aeed	GlobalISel: support swifterror attribute on AArch64. swifterror marks an argument as a register pretending to be a pointer, so we need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the infrastructure can be reused from the DAG world. llvm-svn: 361608	2019-05-24 08:40:13 +00:00
Tim Northover	3d7a057b0d	CodeGen: factor out swifterror value tracking. llvm-svn: 361607	2019-05-24 08:39:43 +00:00
Sanjay Patel	7d6c0bce50	[DAGCombiner] make folds of binops safe for opcodes that produce >1 value This is no-functional-change-intended currently because the definition of isBinOp() only includes opcodes that produce 1 value. But if we share that implementation with isCommutativeBinOp() as proposed in D62191, then we need to make sure that the callers bail out for opcodes that they are not prepared to handle correctly. llvm-svn: 361547	2019-05-23 20:17:25 +00:00
Matt Arsenault	0f3ba44b57	AMDGPU/GlobalISel: Legality for integer min/max llvm-svn: 361519	2019-05-23 17:58:48 +00:00
Shoaib Meenai	87226a7202	[AsmPrinter] Treat a narrowing PtrToInt like Trunc When printing assembly for PtrToInt, AsmPrinter::lowerConstant incorrectly assumed that if PtrToInt was not converting to an int with exactly the same number of bits, it must be widening to a larger int. But this isn't necessarily true; PtrToInt can also shrink the size, which is useful when you want to produce a known 32-bit pointer on a 64-bit platform (on x86_64 ELF this yields a R_X86_64_32 relocation). The old behavior of falling through to the widening case for a narrowing PtrToInt yields bogus assembly code like this, which fails to assemble because the no-op bit and it accidentally creates is not a valid relocation: ``` .long a&-1 ``` The fix is to treat a narrowing PtrToInt exactly the same as it already treats Trunc: just emit the expression and let the assembler deal with truncating it in the appropriate way. Patch by Mat Hostetter <mjh@fb.com>. Differential Revision: https://reviews.llvm.org/D61325 llvm-svn: 361508	2019-05-23 16:29:09 +00:00
Petar Jovanovic	aa28b6d198	[LiveDebugValues] Rename 'DMI' into 'DebugInstr' (NFC) This will improve code readability. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D62295 llvm-svn: 361497	2019-05-23 13:49:06 +00:00
Clement Courbet	43882b16a3	[MergeICmps] Make the pass compatible with the new pass manager. Reviewers: gchatelet, spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62287 llvm-svn: 361490	2019-05-23 12:35:26 +00:00
Petar Jovanovic	ff47d83e78	[DwarfExpression] Refactor dwarf expression (NFC) Refactor location description kind in order to be easier for extensions (needed for D60866). In addition, cut off some bits from the other class fields. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D62002 llvm-svn: 361480	2019-05-23 10:37:13 +00:00
Matt Arsenault	ca64ef2043	MC: Allow getMaxInstLength to depend on the subtarget Keep it optional in cases this is ever needed in some global context. Currently it's only used for getting an upper bound inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20-bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405	2019-05-22 16:28:41 +00:00
Kees Cook	c2187c20a4	[TargetLowering] Extend bool args to inline-asm according to getBooleanType Summary: This extends Krzysztof Parzyszek's X86-specific solution (https://reviews.llvm.org/D60208) to the generic code pointed out by James Y Knight. Reviewers: kparzysz, craig.topper, nickdesaulniers Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D60224 llvm-svn: 361404	2019-05-22 16:16:15 +00:00
Kees Cook	a7a687e500	[TargetLowering] Add blank line (test commit) llvm-svn: 361403	2019-05-22 16:02:13 +00:00
Anton Afanasyev	df00c6a54f	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 llvm-svn: 361356	2019-05-22 07:41:34 +00:00
Stanislav Mekhanoshin	44d17ca02e	Fix register coalescer failure to prune value Register coalescer fails for the test in the patch with the assertion in JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join live intervals for two adjacent instructions and erase the copy: %2:vreg_256 = COPY %1 %3:vreg_256 = COPY killed %1 The LI needs to be adjusted to kill subrange for the erased instruction and extend the subrange of the original def. That was done for the main interval only but not for the subrange. As a result subrange had a VNI pointing to the erased slot resulting in the above failure. Differential Revision: https://reviews.llvm.org/D62162 llvm-svn: 361293	2019-05-21 19:32:41 +00:00
Leonard Chan	0bada7ce6c	[Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic Add an intrinsic that takes 2 signed integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Differential Revision: https://reviews.llvm.org/D55720 llvm-svn: 361289	2019-05-21 19:17:19 +00:00
Sanjay Patel	10f6b39899	[SelectionDAG] fold insert subvector of undef into undef DAGCombiner simplifies this more liberally as: // If inserting an UNDEF, just return the original vector. if (N1.isUndef()) return N0; So there's no way to make this visible in output AFAIK, but doing this at node creation time should be slightly more efficient. llvm-svn: 361287	2019-05-21 18:53:53 +00:00
Sanjay Patel	51dc59d090	[SelectionDAG] remove redundant code; NFCI getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS(): // Concat of UNDEFs is UNDEF. if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); })) return DAG.getUNDEF(VT); llvm-svn: 361284	2019-05-21 18:28:22 +00:00
Sanjay Patel	78c3f58122	[DAGCombiner] prevent unsafe reassociation of FP ops There are no FP callers of DAGCombiner::reassociateOps() currently, but we can add a fast-math check to make sure this API is not being misused. This was noted as a potential risk (and that risk might increase) with: D62191 llvm-svn: 361268	2019-05-21 14:47:38 +00:00
Florian Hahn	f9b28e53c7	[ScheduleDAGInstrs] Compute topological ordering on demand. In most cases, the topological ordering does not get changed in ScheduleDAGInstrs. We can compute the ordering on demand, similar to D60125. This drastically cuts down the number of times we need to compute the topological ordering, e.g. for SPEC2006, SPEC2k and MultiSource, we get the following stats for -O3 -flto on X86 (showing the top reductions, with small absolute values filtered). The smallest reduction is -50%. Slightly positive impact on compile-time (-0.1 % geomean speedup for test-suite + SPEC & co, with -O1 on X86) Tests: 243 Metric: pre-RA-sched.NumTopoInits Program base patch diff test-suite...ngs-C/fixoutput/fixoutput.test 115.00 3.00 -97.4% test-suite...ks/Prolangs-C/cdecl/cdecl.test 957.00 26.00 -97.3% test-suite...math/automotive-basicmath.test 107.00 3.00 -97.2% test-suite...rolangs-C++/deriv2/deriv2.test 144.00 6.00 -95.8% test-suite...lowfish/security-blowfish.test 410.00 18.00 -95.6% test-suite...frame_layout/frame_layout.test 441.00 23.00 -94.8% test-suite...rolangs-C++/employ/employ.test 159.00 11.00 -93.1% test-suite...s/Ptrdist/anagram/anagram.test 157.00 11.00 -93.0% test-suite...s-C/unix-smail/unix-smail.test 829.00 59.00 -92.9% test-suite...chmarks/Olden/power/power.test 154.00 11.00 -92.9% test-suite...T95/147.vortex/147.vortex.test 19876.00 1434.00 -92.8% test-suite...000/255.vortex/255.vortex.test 19881.00 1435.00 -92.8% test-suite...ce/Applications/Burg/burg.test 2203.00 168.00 -92.4% test-suite...urce/Applications/hbd/hbd.test 1067.00 85.00 -92.0% test-suite...ternal/HMMER/hmmcalibrate.test 3145.00 251.00 -92.0% test-suite.../Applications/spiff/spiff.test 1037.00 84.00 -91.9% test-suite...SPEC/CINT95/130.li/130.li.test 5913.00 487.00 -91.8% test-suite.../CINT95/134.perl/134.perl.test 12532.00 1041.00 -91.7% test-suite...ce/Benchmarks/Olden/bh/bh.test 220.00 19.00 -91.4% test-suite :: External/Nurbs/nurbs.test 2304.00 206.00 -91.1% test-suite...arks/VersaBench/dbms/dbms.test 773.00 75.00 -90.3% test-suite...ce/Applications/siod/siod.test 9043.00 878.00 -90.3% test-suite...pplications/treecc/treecc.test 4510.00 438.00 -90.3% test-suite...T2006/456.hmmer/456.hmmer.test 7093.00 697.00 -90.2% test-suite...s-C/Pathfinder/PathFinder.test 882.00 87.00 -90.1% test-suite.../CINT2000/176.gcc/176.gcc.test 64978.00 6721.00 -89.7% test-suite...cations/hexxagon/hexxagon.test 657.00 69.00 -89.5% test-suite...fice-ispell/office-ispell.test 2712.00 285.00 -89.5% test-suite.../CINT2006/403.gcc/403.gcc.test 139613.00 14992.00 -89.3% test-suite...lications/ClamAV/clamscan.test 25880.00 2785.00 -89.2% Reviewers: MatzeB, atrick, efriedma, niravd Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D60839 llvm-svn: 361253	2019-05-21 13:04:53 +00:00
Dylan McKay	e967308da4	Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianess Summary: The endianess used in the calling convention does not always match the endianess of the target on all architectures, namely AVR. When an argument is too large to be legalised by the architecture and is split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian is queried to find the endianess that function arguments must be laid out in. This approach was recommended by Eli Friedman. Originally reported in https://github.com/avr-rust/rust/issues/129. Patch by Carl Peto. Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma Reviewed By: efriedma Subscribers: JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62003 llvm-svn: 361222	2019-05-21 06:38:02 +00:00
Craig Topper	97d4f7c194	[SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto. Since INLINEASM_BR is a terminator we need to flush the pending exports before emitting it. If we don't do this, a TokenFactor can be inserted between it and the BR instruction emitted to finish the callbr lowering. It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit the TokenFactor before that. Differential Revision: https://reviews.llvm.org/D59981 llvm-svn: 361177	2019-05-20 17:08:02 +00:00
Craig Topper	af7a188453	[Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64 We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well. Differential Revision: https://reviews.llvm.org/D62026 llvm-svn: 361169	2019-05-20 16:27:09 +00:00
Craig Topper	203bfdd0f0	[DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC This changes the isShift variable to include the constant operand check that was previously in the if statement. While there fix an 80 column violation and an unnecessary use of getNode. Also fix variable name capitalization. llvm-svn: 361168	2019-05-20 16:26:55 +00:00
Nikita Popov	9060b6df97	[SDAG] Vector op legalization for overflow ops Fixes issue reported by aemerson on D57348. Vector op legalization support is added for uaddo, usubo, saddo and ssubo (umulo and smulo were already supported). As usual, by extracting TargetLowering methods and calling them from vector op legalization. Vector op legalization doesn't really deal with multiple result nodes, so I'm explicitly performing a recursive legalization call on the result value that is not being legalized. There are some existing test changes because expansion happens earlier, so we don't get a DAG combiner run in between anymore. Differential Revision: https://reviews.llvm.org/D61692 llvm-svn: 361166	2019-05-20 16:09:22 +00:00
Matt Arsenault	7c8ec18964	RegAlloc: Fix verifier error with undef identity copies The code did not match the example in the comment, and was checking the undef flag on the copy dest instead of source. The existing tests were only hitting the > 2 operands case. llvm-svn: 361156	2019-05-20 14:09:36 +00:00
Guillaume Chatelet	e386a01e84	[NFC] Refactor visitIntrinsicCall so it doesn't return a const char* Summary: API simplification Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61306 llvm-svn: 361140	2019-05-20 11:01:30 +00:00
Petar Jovanovic	e85bbf564d	[DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC) Refactor DIExpression::With* into a flag enum in order to be less error-prone to use (as discussed on D60866). Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D61943 llvm-svn: 361137	2019-05-20 10:35:57 +00:00
Guillaume Chatelet	a760e69840	Revert "[NFC] Refactor visitIntrinsicCall so it doesn't return a const char*" This reverts commit 706d3cd6388cc3446aab282f3af879862b10cbed. llvm-svn: 361130	2019-05-20 09:00:12 +00:00
Guillaume Chatelet	fa8c152576	[NFC] Refactor visitIntrinsicCall so it doesn't return a const char* Summary: API simplification Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61306 llvm-svn: 361129	2019-05-20 08:52:10 +00:00
Simon Pilgrim	2b45a70fd6	MemCmpExpansion::getCompareLoadPairs - assert we find a comparison diff. NFCI. Fix scan-build uninitialized warning and assert the final diff isn't null. llvm-svn: 361095	2019-05-18 11:31:48 +00:00
Matt Arsenault	02b5ca8cd1	GlobalISel: Implement lower for S64->S32 [SU]ITOFP This is ported from the custom AMDGPU DAG implementation. I think this is a better default expansion than what the DAG currently uses, at least if the target has CTLZ. This implements the signed version in terms of the unsigned conversion, which is implemented with bit operations. SelectionDAG has several other implementations that should eventually be ported depending on what instructions are legal. llvm-svn: 361081	2019-05-17 23:05:13 +00:00
Matt Arsenault	f3cedf4823	GlobalISel: Define integer min/max instructions Doesn't attempt to emit them for anything yet, but some legalizations I want to port use them. llvm-svn: 361061	2019-05-17 18:36:31 +00:00
Roman Lebedev	64c756b991	[DAGCombiner] visitShiftByConstant(): drop bogus signbit check Summary: That check claims that the transform is illegal otherwise. That isn't true: 1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter https://rise4fun.com/Alive/K4A 2. For `ISD::AND`, there is no restriction on constants: https://rise4fun.com/Alive/Wy3 3. For `ISD::OR`, there is no restriction on constants: https://rise4fun.com/Alive/GOH 3. For `ISD::XOR`, there is no restriction on constants: https://rise4fun.com/Alive/ml6 So, why is it there then? This changes the testcase that was touched by @spatel in rL347478, but i'm not sure that test tests anything particular? Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin Reviewed By: spatel Subscribers: javed.absar, llvm-commits, spatel Tags: #llvm Differential Revision: https://reviews.llvm.org/D61918 llvm-svn: 361044	2019-05-17 15:52:58 +00:00
Matt Arsenault	1448f5689e	AMDGPU/GlobalISel: Legalize G_FCOPYSIGN llvm-svn: 361025	2019-05-17 12:19:52 +00:00
Fangrui Song	ec6dc3089e	[GlobalISel] Fix -Wsign-compare on 32-bit -DLLVM_ENABLE_ASSERTIONS=on builds llvm-svn: 360989	2019-05-17 05:53:39 +00:00
Ben Dunbobbin	1d16515fb4	[ELF] Implement Dependent Libraries Feature This patch implements a limited form of autolinking primarily designed to allow either the --dependent-library compiler option, or "comment lib" pragmas ( https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically add the specified library to the link when processing the input file generated by the compiler. Currently this extension is unique to LLVM and LLD. However, care has been taken to design this feature so that it could be supported by other ELF linkers. The design goals were to provide: - A simple linking model for developers to reason about. - The ability to to override autolinking from the linker command line. - Source code compatibility, where possible, with "comment lib" pragmas in other environments (MSVC in particular). Dependent library support is implemented differently for ELF platforms than on the other platforms. Primarily this difference is that on ELF we pass the dependent library specifiers directly to the linker without manipulating them. This is in contrast to other platforms where they are mapped to a specific linker option by the compiler. This difference is a result of the greater variety of ELF linkers and the fact that ELF linkers tend to handle libraries in a more complicated fashion than on other platforms. This forces us to defer handling the specifiers to the linker. In order to achieve a level of source code compatibility with other platforms we have restricted this feature to work with libraries that meet the following "reasonable" requirements: 1. There are no competing defined symbols in a given set of libraries, or if they exist, the program owner doesn't care which is linked to their program. 2. There may be circular dependencies between libraries. The binary representation is a mergeable string section (SHF_MERGE, SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES (0x6fff4c04). The compiler forms this section by concatenating the arguments of the "comment lib" pragmas and --dependent-library options in the order they are encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs sections with the normal mergeable string section rules. As an example, #pragma comment(lib, "foo") would result in: .section ".deplibs","MS",@llvm_dependent_libraries,1 .asciz "foo" For LTO, equivalent information to the contents of a the .deplibs section can be retrieved by the LLD for bitcode input files. LLD processes the dependent library specifiers in the following way: 1. Dependent libraries which are found from the specifiers in .deplibs sections of relocatable object files are added when the linker decides to include that file (which could itself be in a library) in the link. Dependent libraries behave as if they were appended to the command line after all other options. As a consequence the set of dependent libraries are searched last to resolve symbols. 2. It is an error if a file cannot be found for a given specifier. 3. Any command line options in effect at the end of the command line parsing apply to the dependent libraries, e.g. --whole-archive. 4. The linker tries to add a library or relocatable object file from each of the strings in a .deplibs section by; first, handling the string as if it was specified on the command line; second, by looking for the string in each of the library search paths in turn; third, by looking for a lib<string>.a or lib<string>.so (depending on the current mode of the linker) in each of the library search paths. 5. A new command line option --no-dependent-libraries tells LLD to ignore the dependent libraries. Rationale for the above points: 1. Adding the dependent libraries last makes the process simple to understand from a developers perspective. All linkers are able to implement this scheme. 2. Error-ing for libraries that are not found seems like better behavior than failing the link during symbol resolution. 3. It seems useful for the user to be able to apply command line options which will affect all of the dependent libraries. There is a potential problem of surprise for developers, who might not realize that these options would apply to these "invisible" input files; however, despite the potential for surprise, this is easy for developers to reason about and gives developers the control that they may require. 4. This algorithm takes into account all of the different ways that ELF linkers find input files. The different search methods are tried by the linker in most obvious to least obvious order. 5. I considered adding finer grained control over which dependent libraries were ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this is not necessary: if finer control is required developers can fall back to using the command line directly. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html. Differential Revision: https://reviews.llvm.org/D60274 llvm-svn: 360984	2019-05-17 03:44:15 +00:00
Amy Huang	c2029068bc	Emit global variables as S_CONSTANT records for codeview debug info. Summary: This emits S_CONSTANT records for global variables. Currently this emits records for the global variables already being tracked in the LLVM IR metadata, which are just constant global variables; we'll also want S_CONSTANTs for static data members and enums. Related to https://bugs.llvm.org/show_bug.cgi?id=41615 Reviewers: rnk Subscribers: aprantl, hiraditya, llvm-commits, thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D61926 llvm-svn: 360948	2019-05-16 22:28:52 +00:00
Tim Renouf	e3cbdaf1b5	[CodeGen] Fixed de-optimization of legalize subvector extract The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU 3-dword memory instructions, caused a de-optimization problem for code with such a load that then bitcasts via vector of i8, because v12i8 is not an MVT so it legalizes the bitcast by widening it. This commit adds the ability to widen a bitcast using extract_subvector on the result, so the value does not need to go via memory. Differential Revision: https://reviews.llvm.org/D60457 Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64 llvm-svn: 360942	2019-05-16 21:49:06 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Matt Arsenault	828b685ebe	RegAllocFast: Improve hinting heuristic Trace through multiple COPYs when looking for a physreg source. Add hinting for vregs that will be copied into physregs (we only hinted for vregs getting copied to a physreg previously). Give hinted a register a bonus when deciding which value to spill. This is part of my rewrite regallocfast series. In fact this one doesn't even have an effect unless you also flip the allocation to happen from back to front of a basic block. Nonetheless it helps to split this up to ease review of D52010 Patch by Matthias Braun llvm-svn: 360887	2019-05-16 12:50:39 +00:00
Matt Arsenault	27ac8408f6	GlobalISel: Add DstOp version of buildIntrinsic llvm-svn: 360879	2019-05-16 12:22:56 +00:00
Matt Arsenault	11be78bc7a	GlobalISel: Add buildFConstant for APFloat llvm-svn: 360853	2019-05-16 04:09:06 +00:00
Matt Arsenault	012ecbbbba	GlobalISel: Fix indentation llvm-svn: 360851	2019-05-16 04:08:46 +00:00
Matt Arsenault	55146d3139	GlobalISel: Add G_FCOPYSIGN llvm-svn: 360850	2019-05-16 04:08:39 +00:00
Reid Kleckner	4882490349	[codeview] Fix SDNode representation of annotation labels Before this change, they were erroneously constructed with the EH_LABEL SDNode opcode, which caused other passes to interact with them in incorrect ways. See the FIXME about fastisel that this addresses in the existing test case. Fixes PR41890 llvm-svn: 360818	2019-05-15 21:46:05 +00:00
Nicolai Haehnle	f672b6170c	[MachineOperand] Add a ChangeToGA method Summary: Analogous to the other ChangeToXXX methods. See the next patch for a use case. Change-Id: I6548d614706834fb9109ab3c8fe915e9c6ece2a7 Reviewers: arsenm, kzhuravl Subscribers: wdng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61651 llvm-svn: 360789	2019-05-15 17:48:10 +00:00
Nicolai Haehnle	664ceeda68	RegAlloc: try to fail more gracefully when out of registers Summary: The emitError path allows the program to continue, unlike report_fatal_error. This is friendlier to use cases where LLVM is embedded in a larger program, because the caller may be able to deal with the error somewhat gracefully. Change the number of requested NOP bytes in the AArch64 and PowerPC test cases to avoid triggering an unrelated assertion. The compilation still fails, as verified by the test. Change-Id: Iafb9ca341002a597b82e59ddc7a1f13c78758e3d Reviewers: arsenm, MatzeB Subscribers: qcolombet, nemanjai, wdng, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61489 llvm-svn: 360786	2019-05-15 17:29:58 +00:00
Clement Courbet	d9d0665d1c	[[DAGCombiner][NFC] Add a comment. As suggested in D61846. llvm-svn: 360755	2019-05-15 08:21:18 +00:00
Fangrui Song	f4dfd63c74	[IR] Disallow llvm.global_ctors and llvm.global_dtors of the 2-field form in textual format The 3-field form was introduced by D3499 in 2014 and the legacy 2-field form was planned to be removed in LLVM 4.0 For the textual format, this patch migrates the existing 2-field form to use the 3-field form and deletes the compatibility code. test/Verifier/global-ctors-2.ll checks we have a friendly error message. For bitcode, lib/IR/AutoUpgrade UpgradeGlobalVariables will upgrade the 2-field form (add i8* null as the third field). Reviewed By: rnk, dexonsmith Differential Revision: https://reviews.llvm.org/D61547 llvm-svn: 360742	2019-05-15 02:35:32 +00:00
Fangrui Song	2f6ef2fc92	DWARF v5: emit DW_AT_addr_base if DW_AT_low_pc references .debug_addr The condition !AddrPool.empty() is tested before attachRangesOrLowHighPC(), which may add an entry to AddrPool. We emit DW_AT_low_pc (DW_FORM_addrx) but may incorrectly omit DW_AT_addr_base for LineTablesOnly. This can be easily reproduced: clang -gdwarf-5 -gmlt -c a.cc Fix this by moving !AddrPool.empty() below. This was discovered while investigating an lld crash (fixed by D61889) on such object files: ld.lld --gdb-index a.o Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D61891 llvm-svn: 360678	2019-05-14 14:37:26 +00:00
Diana Picus	a568222ddd	[IRTranslator] Don't hardcode GEP index type When breaking up loads and stores of aggregates, the IRTranslator uses LLT::scalar(64) for the index type of the G_GEP instructions that compute the addresses. This is unnecessarily large for 32-bit targets. Use the int ptr type provided by the DataLayout instead. Note that we're already doing the right thing when translating getelementptr instructions from the IR. This is just an oversight when generating new ones while translating loads/stores. Both x86 and AArch64 already have tests confirming that the old behaviour is preserved for 64-bit targets. Differential Revision: https://reviews.llvm.org/D61852 llvm-svn: 360656	2019-05-14 09:25:17 +00:00
Sanjay Patel	99d6420a82	[SDAG] fix unused variable warning and unneeded indirection; NFC llvm-svn: 360640	2019-05-14 00:57:31 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	d3cedee3c6	[TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG More work for PR39709. llvm-svn: 360592	2019-05-13 15:51:26 +00:00
Sanjay Patel	05dafb1c97	[DAGCombiner] narrow vector binop with inserts/extract We catch most of these patterns (on x86 at least) by matching a concat vectors opcode early in combining, but the pattern may emerge later using insert subvector instead. The AVX1 diffs for add/sub overflow show another missed narrowing pattern. That one may be falling though the cracks because of combine ordering and multiple uses. llvm-svn: 360585	2019-05-13 14:31:14 +00:00
Kevin P. Neal	5987749e33	Add constrained fptrunc and fpext intrinsics. The new fptrunc and fpext intrinsics are constrained versions of the regular fptrunc and fpext instructions. Reviewed by: Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D55897 llvm-svn: 360581	2019-05-13 13:23:30 +00:00
Simon Pilgrim	d845bc3d0c	TargetLowering::SimplifyDemandedBits - early-out for UNDEF ops. NFCI. llvm-svn: 360579	2019-05-13 12:44:03 +00:00
Clement Courbet	9afc4764dd	[DAGCombiner] Fix invalid alias analysis. Summary: When we know for sure whether two addresses do or do not alias, we should immediately return from DAGCombiner::isAlias(). I think this comes from a bad copy/paste, Sorry for not catching that during the code review. Fixes PR41855. Reviewers: niravd, gchatelet, EricWF Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61846 llvm-svn: 360566	2019-05-13 09:07:37 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
Sanjay Patel	a09e686821	[DAGCombiner] try to move bitcast after extract_subvector I noticed that we were failing to narrow an x86 ymm math op in a case similar to the 'madd' test diff. That is because a bitcast is sitting between the math and the extract subvector and thwarting our pattern matching for narrowing: t56: v8i32 = add t59, t58 t68: v4i64 = bitcast t56 t73: v2i64 = extract_subvector t68, Constant:i64<2> t96: v4i32 = bitcast t73 There are a few wins and neutral diffs in the other tests. Differential Revision: https://reviews.llvm.org/D61806 llvm-svn: 360541	2019-05-12 14:43:20 +00:00
Simon Pilgrim	605a840747	[DAG] Add SimplifyDemandedBits support for BITREVERSE Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC llvm-svn: 360534	2019-05-11 20:56:05 +00:00
Simon Pilgrim	aeed0a30c0	SelectionDAGISel::CodeGenAndEmitDAG - remove unused variable. NFCI. llvm-svn: 360514	2019-05-11 11:00:37 +00:00
Jordan Rupprecht	16c7fbd112	Revert [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor This reverts r360171 (git commit `a9d6c32eaf`). A repro showing the asan/msan failures is forthcoming. llvm-svn: 360481	2019-05-10 23:20:02 +00:00
Craig Topper	114f763f37	[LegalizeVectorOps] Remove calls to LegalizeOp on the return value from ExpandLoad/ExpandStore. We already updated the LegalizedNodes map at the end of the Expand call. This would have marked the new node as being mapped to itself. So the LegalizeOp call will find that an immediately return. llvm-svn: 360472	2019-05-10 21:42:27 +00:00
Nikita Popov	9f7537bd48	[SDAG] Recursively legalize both vector mulo results Split out from D61692 per RKSimon's suggestion. Vector op legalization will automatically recursively legalize the returned SDValue, but we need to take care of the other results ourselves. Otherwise it will end up getting legalized only during op legalization, by which point it might be too late (though I'm not aware of any specific cases right now). There are codegen differences because expansion occurs earlier now and we don't get a DAGCombiner run in between. Differential Revision: https://reviews.llvm.org/D61744 llvm-svn: 360470	2019-05-10 20:42:48 +00:00
Sanjay Patel	b37ddeafc0	[DAGCombiner] reduce code duplication; NFC llvm-svn: 360462	2019-05-10 20:02:30 +00:00
David Blaikie	7598b71488	DebugInfo: Only move types out of type units if they're named or type united Follow up to r359122, after a bug was reported in it - the original change too aggressively tried to move related types out of type units, which included unnamed types (like array types) which can't reasonably be declared-but-not-defined. A step beyond that is that some types in type units can be anonymous, if they are types with a name for linkage purposes (eg: "typedef struct { } x;"). So ensure those don't get turned into plain declarations (without signatures) because, lacking names, they can't be resolved to the definition. [Also include a fix for llvm-dwarfdump/libDebugInfoDWARF to pretty print types in type units] llvm-svn: 360458	2019-05-10 19:15:29 +00:00
Momchil Velikov	c396f09ce9	Adjust MachineScheduler to use ProcResource counts This fix allows the scheduler to take into account the number of instances of each ProcResource specified. Previously a declaration in a scheduler of ProcResource<1> would be treated identically to a declaration of ProcResource<2>. Now the hazard recognizer would report a hazard only after all of the resource instances are busy. Patch by Jackson Woodruff and Momchil Velikov. Differential Revision: https://reviews.llvm.org/D51160 llvm-svn: 360441	2019-05-10 16:54:32 +00:00
Tim Northover	6c1e3f9493	SelectionDAG: accommodate atomic floating stores. We were applying a pointer truncation to floating types, which crashed LLVM. That is Not A Good Thing(TM). llvm-svn: 360421	2019-05-10 11:23:04 +00:00
Cameron McInally	156eb28289	[CodeGen] Add comment about FSUB <-> FNEG xforms Differential Revision: https://reviews.llvm.org/D61741 llvm-svn: 360366	2019-05-09 19:28:52 +00:00
Florian Hahn	be10bc71f9	[DAGCombiner] Limit number of nodes explored as store candidates. To find the candidates to merge stores we iterate over all nodes in a chain for each store, which leads to quadratic compile times for large basic blocks with a large number of stores. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61511 llvm-svn: 360357	2019-05-09 17:05:52 +00:00
Simon Pilgrim	38ef296265	[CodeGenPrepare] Ensure we get a non-null result from getTrueOrFalseValue. NFCI. llvm-svn: 360328	2019-05-09 10:51:26 +00:00
Markus Lavin	92d5db524e	Make sub-registers index names case sensitive in the MIRParser Prior to this change sub-register index names are assumed to be lower case (but they are printed with original casing). This means that if a target has some upper case characters in its sub-register names then mir-export directly followed by mir-import is not possible. This also means that sub-register indices currently are (and will continue to be) slightly inconsistent with register names which are printed and assumed to be lower case. As the current textual representation of mir has a few inconsistencies in this area it is a bit arbitrary how to address the matter. This change is towards the direction that we feel is most correct (i.e. case sensitivity). Differential Revision: https://reviews.llvm.org/D61499 llvm-svn: 360318	2019-05-09 08:29:04 +00:00
Pengfei Wang	c05aad0532	Bugfix for nullptr check by klocwork Klocwork static check: Pointer from call to function `DebugLoc::operator DILocation *() const ` may be NULL and will be dereference in function `printExtendedName``` Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D61715 llvm-svn: 360317	2019-05-09 08:09:21 +00:00
Bjorn Pettersson	8d19e94f13	[CodeGen] Use "DL.getPointerSizeInBits" instead of "8 * DL.getPointerSize". NFC llvm-svn: 360315	2019-05-09 08:07:36 +00:00
Leonard Chan	95b7abdcc5	[SelectionDAG] Expand ADD/SUBCARRY This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it. Differential Revision: https://reviews.llvm.org/D61411 llvm-svn: 360303	2019-05-09 01:17:48 +00:00
Eric Christopher	c93f56d39e	Temporarily Revert "[DebugInfo] Terminate more location-list ranges at the end of blocks" as it was causing significant compile time regressions. This reverts commit r359426 while we come up with testcases and additional ideas. llvm-svn: 360301	2019-05-08 23:54:03 +00:00
Sanjay Patel	902b3ecdad	[SelectionDAG] fold 'fneg undef' to undef This is extracted from the original draft of D61419 with some additional tests. We don't currently get this in IR (it's conservatively turned into a NaN), but presumably that'll get updated as we add real IR support for 'fneg' rather than 'fsub -0.0, x'. The x86-32 run shows the following, and I haven't looked further to see why, but that seems to be independent: Legalizing: t1: f32 = undef Trying to expand node Creating fp constant: t4: f32 = ConstantFP<0.000000e+00> Differential Revision: https://reviews.llvm.org/D61516 llvm-svn: 360296	2019-05-08 22:19:52 +00:00
Quentin Colombet	157427245a	[RegAllocFast] Scan physcial reg definitions before assigning virtual reg definitions When assigning the definitions of an instruction we were updating the available registers while walking the definitions. Some of those definitions may be from physical registers and thus, they are not available for other definitions to take, but by the time we see that we may have already assign these registers to another virtual register. Fix that by walking through all the definitions and mark as unavailable the physical register definitions, then do the virtual register assignments. PR41790 llvm-svn: 360278	2019-05-08 18:30:26 +00:00
Craig Topper	493aec3ef5	[FastISel][X86] Support FNeg instruction in target independent fast isel handling This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom Differential Revision: https://reviews.llvm.org/D61624 llvm-svn: 360273	2019-05-08 17:27:08 +00:00
Simon Pilgrim	2788ad3ee2	[LegalizeDAG] Assert non-power-of-2 load/store op splits are in range. NFCI. Fixes static analyzer undefined/out-of-range shift warnings. llvm-svn: 360245	2019-05-08 11:22:10 +00:00
Simon Pilgrim	97a0c54179	Fix cppcheck operator precedence warning. NFCI. llvm-svn: 360234	2019-05-08 10:07:34 +00:00
QingShan Zhang	0e71a6e755	[CodeGenPrepare] Don't split the store if it is volatile We shouldn't split the store when it is volatile. Differential Revision: https://reviews.llvm.org/D61169 llvm-svn: 360228	2019-05-08 07:32:12 +00:00
QingShan Zhang	e065af6a42	[NFC] Add a static function to do the endian check Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. Differential Revision: https://reviews.llvm.org/D61236 llvm-svn: 360226	2019-05-08 07:21:37 +00:00
Austin Kerbow	6e6480e216	[CodeGen] Rename DEBUG_TYPE for default hazard recognizer. Summary: The DEBUG_TYPE of the default hazard recognizer should be updated to match the DEBUG_TYPE of the machine-scheduler pass. Reviewers: rampitec Reviewed By: rampitec Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61359 llvm-svn: 360198	2019-05-07 22:09:04 +00:00
Adrian Prantl	e6e8db5e9b	Debug Info: Support address space attributes on rvalue references. DWARF5, 2.12 20ff says that Any debugging information entry representing a pointer or reference type [may have a DW_AT_address_class attribute]. The existing code (https://reviews.llvm.org/D29670) seems to take a quite literal interpretation of that wording. I don't see a reason why an rvalue reference isn't a reference type in the spirit of that paragraph. This patch allows rvalue references to also have address spaces. rdar://problem/50511483 Differential Revision: https://reviews.llvm.org/D61625 llvm-svn: 360176	2019-05-07 17:42:38 +00:00
Florian Hahn	a9d6c32eaf	[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor When simplifying TokenFactors, we potentially iterate over all operands of a large number of TokenFactors. This causes quadratic compile times in some cases and the large token factors cause additional scalability problems elsewhere. This patch adds some limits to the number of nodes explored for the cases mentioned above. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61397 llvm-svn: 360171	2019-05-07 16:47:27 +00:00
Simon Pilgrim	3044ac058b	Avoid use-after-move warnings by using swap instead. NFCI. Swap should be as quick in these cases, and leaves the original variables in a known (empty) state. llvm-svn: 360164	2019-05-07 15:45:00 +00:00
Craig Topper	c6d445f9c1	[FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub. Summary: If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd. I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that. Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma Reviewed By: cameron.mcinally Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61622 llvm-svn: 360111	2019-05-07 04:25:24 +00:00
Fangrui Song	da82ce99b7	[DebugInfo] Delete TypedDINodeRef TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T . Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} or their const variants. This allows us to delete many resolve() calls that clutter the code. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D61369 llvm-svn: 360108	2019-05-07 02:06:37 +00:00
Amy Huang	987b969bab	Fix bug in getCompleteTypeIndex in codeview debug info Summary: When there are multiple instances of a forward decl record type, only the first one is emitted with a type index, because the type is added to a map with a null type index. Avoid this by reordering so that forward decl types aren't added to the map. Reviewers: rnk Subscribers: aprantl, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61460 llvm-svn: 360101	2019-05-06 23:37:03 +00:00
Craig Topper	39f1a97417	[FastISel] Pass the fneg input operand to hasTrivialKill in FastISel::selectFNeg. We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here. llvm-svn: 360097	2019-05-06 23:09:09 +00:00
Philip Reames	2f53d79bff	Fix pr33010, a 2 year old crashing regression The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent it's creation. Arguably, we shouldn't be using TargetFrameIndices in StatepointLowering at all, but that's a much deeper change. llvm-svn: 360090	2019-05-06 22:09:31 +00:00
Craig Topper	ad56843dd7	[SelectionDAG][X86] Support inline assembly returning an mmx register into a type with fewer than 64 bits. It's possible to use the 'y' mmx constraint with a type narrower than 64-bits. This patch supports this by bitcasting the mmx type to 64-bits and then truncating to the desired type. There are probably other missing type combinations we need to support, but this is the case we have a bug report for. Fixes PR41748. Differential Revision: https://reviews.llvm.org/D61582 llvm-svn: 360069	2019-05-06 19:50:14 +00:00
Craig Topper	55a71b575c	Revert r359392 and r358887 Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066	2019-05-06 19:29:24 +00:00
Nikita Popov	cfe786a195	[SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635) This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635 by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is legal but former is not) for zero-or-all-ones boolean reductions (which are detected based on sign bits). Differential Revision: https://reviews.llvm.org/D61398 llvm-svn: 360054	2019-05-06 16:17:17 +00:00
Craig Topper	f723490e76	[SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with a report_fatal_error. Based on PR41748, not all cases are handled in this function. llvm_unreachable is treated as an optimization hint than can prune code paths in a release build. This causes weird behavior when PR41748 is encountered on a release build. It appears to generate an fp_round instruction from the floating point code. Making this a report_fatal_error prevents incorrect optimization of the code and will instead generate a message to file a bug report. llvm-svn: 360008	2019-05-06 04:01:49 +00:00
Roman Lebedev	1a1b922177	[NFC] BasicBlock: refactor changePhiUses() out of replacePhiUsesWith(), use it Summary: It is a common thing to loop over every `PHINode` in some `BasicBlock` and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block. `replaceSuccessorsPhiUsesWith()` already had code to do that, it just wasn't a function. So outline it into a new function, and use it. Reviewers: chandlerc, craig.topper, spatel, danielcdh Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61013 llvm-svn: 359996	2019-05-05 18:59:39 +00:00
Roman Lebedev	e3b1d82b53	[NFC] PHINode: introduce replaceIncomingBlockWith() function, use it Summary: There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()` and `PHINode::getNumOperands()`, but no function to replace every specified `BasicBlock` predecessor with some other specified `BasicBlock`. Clearly, there are a lot of places that could use that functionality. Reviewers: chandlerc, craig.topper, spatel, danielcdh Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61011 llvm-svn: 359995	2019-05-05 18:59:30 +00:00
Simon Pilgrim	0f89b76b84	[SelectionDAG] Use any_of/all_of where possible. NFCI. llvm-svn: 359974	2019-05-05 10:30:04 +00:00
Sanjay Patel	5ab41a7a05	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block (2nd try) This is a subset of the original commit from rL359879 which was reverted because it could crash when using the 'RemovedInstructions' structure that enables delayed deletion of dead instructions. The motivating compile-time win does not require that change though. We should get most of that win from this change alone. Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359969	2019-05-04 12:46:32 +00:00
Matt Arsenault	b6c599afd3	Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block" This reverts commit r359912. This should pass now, since the clang test was made less fragile in r359918. llvm-svn: 359919	2019-05-03 19:06:57 +00:00
Simon Pilgrim	5d3b100750	[DAGCombine] Remove repeated variables. NFCI. llvm-svn: 359915	2019-05-03 18:20:28 +00:00
Nico Weber	bb852a9672	Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block" Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail. llvm-svn: 359912	2019-05-03 18:08:03 +00:00
Simon Pilgrim	308b5ec1ff	[TargetLowering] SimplifySetCC - remove repeated variable. NFCI. Also reduce scope of Temp variable. llvm-svn: 359911	2019-05-03 18:02:33 +00:00
Evgeniy Stepanov	46ec57e576	Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic block" This reverts commit r359879, which introduced a compiler crash. llvm-svn: 359908	2019-05-03 17:31:49 +00:00
Matt Arsenault	daf2d653fa	RegAllocFast: Add heuristic to detect values not live-out of a block Add an improved/new heuristic to catch more cases when values are not live out of a basic block. Patch by Matthias Braun llvm-svn: 359906	2019-05-03 17:03:24 +00:00
Simon Pilgrim	d857f64c31	[SelectionDAG] CreateTopologicalOrder - don't use iterator We shouldn't use an iterator to loop across a std::vector when the same loop is adding elements to that std::vector Found by cppcheck llvm-svn: 359900	2019-05-03 15:50:37 +00:00
Simon Pilgrim	bc876df3a5	[TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI. Only ever used in one block llvm-svn: 359890	2019-05-03 14:38:24 +00:00
Sanjay Patel	8ff072e48e	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. Also, we were restarting the iterator loops when doing the overflow intrinsic transforms by marking the dominator tree for update. That was done to prevent iterating over a removed instruction. But we can postpone the deletion using the existing "RemovedInsts" structure, and that means we don't need to update the DT. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359879	2019-05-03 13:09:18 +00:00
Simon Pilgrim	e798e3a346	[TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI. Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings. llvm-svn: 359877	2019-05-03 12:55:25 +00:00
Anton Afanasyev	6d08b8dbae	Revert "[MIR] Add simple PRE pass to MachineCSE" This reverts commit `9c20156de3`. It breaks stage 2 of clang-ppc64be-linux-multistage. llvm-svn: 359875	2019-05-03 12:36:22 +00:00
Simon Pilgrim	42d2b604b5	[SelectionDAG] Use INT_MIN as (1 << 31) is UB for signed integers. NFCI. llvm-svn: 359873	2019-05-03 11:32:00 +00:00
Simon Pilgrim	bfd00a6440	[SelectionDAG] computeKnownBits - remove some duplicate/shadow variables. NFCI. llvm-svn: 359872	2019-05-03 11:11:03 +00:00
Anton Afanasyev	9c20156de3	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 Reviewers: RKSimon Subscribers: hfinkel, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D56772 llvm-svn: 359870	2019-05-03 10:30:59 +00:00
Quentin Colombet	c9256cc6ba	[IRTranslator] Use the alloc size instead of the store size when translating allocas We use to incorrectly use the store size instead of the alloc size when creating the stack slot for allocas. On aarch64 this can be demonstrated by allocating weirdly sized types. For instance, in the added test case, we use an alloca for i19. We used to allocate a slot of size 24-bit (19 rounded up to the next byte), whereas we really want to use a full 32-bit slot for this type. llvm-svn: 359856	2019-05-03 01:23:56 +00:00
Eli Friedman	0b61d220c9	[AArch64][Windows] Compute function length correctly in unwind tables. The primary fix here is to WinException.cpp: we need to exclude jump tables when computing the length of a function, or else we fail to correctly compute the length. (We can only compute the number of bytes consumed by certain assembler directives after the entire file is parsed. ".p2align" is one of those directives, and is used by jump table generation.) The secondary fix, to MCWin64EH, is to make sure we don't silently miscompile if we hit a similar situation in the future. It's possible we could extend ARM64EmitUnwindInfo so it allows function bodies that contain assembler directives, but that's a lot more complicated; see the FIXME in MCWin64EH.cpp. Fixes https://bugs.llvm.org/show_bug.cgi?id=41581 . Differential Revision: https://reviews.llvm.org/D61095 llvm-svn: 359849	2019-05-03 00:10:45 +00:00
Craig Topper	e8a1cde886	[SelectionDAG] Add asserts to verify the vectorness of input and output types of TRUNCATE/ZERO_EXTEND/ANY_EXTEND/SIGN_EXTEND agree As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR Differential Revision: https://reviews.llvm.org/D61463 llvm-svn: 359836	2019-05-02 22:26:26 +00:00
Sanjay Patel	1972826178	[DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try) The original patch was committed at rL359398 and reverted at rL359695 because of infinite looping. This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop. Original commit message: This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359793	2019-05-02 15:02:08 +00:00
Sanjay Patel	284472be6d	[SelectionDAG] remove constant folding limitations based on FP exceptions We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791	2019-05-02 14:47:59 +00:00
Sanjay Patel	64d5751254	Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate" This reverts commit `fb9a5307a9` (rL359398) because it can cause an infinite loop due to opposing combines. llvm-svn: 359695	2019-05-01 16:06:21 +00:00
Tim Northover	ee2474df9f	DAG: allow DAG pointer size different from memory representation. In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG builder code so that pointers are allowed to have a larger type when "live" in the DAG compared to memory. Pointers get zero-extended whenever they are loaded, and truncated prior to stores. In addition, a few not quite so obvious locations need updating: * A GEP that has not been marked inbounds needs to enforce the IR-documented 2s-complement wrapping at the memory pointer size. Inbounds GEPs are undefined if they overflow the address space, so no additional operations are needed. * Signed comparisons would give incorrect results if performed on the zero-extended values. This shouldn't affect CodeGen for now, but will become active when the AArch64 ILP32 support is committed. llvm-svn: 359676	2019-05-01 12:37:30 +00:00
Sanjay Patel	0387bf5269	[SelectionDAG] remove div-by-zero constant folding restriction We don't have this restriction in IR, so it should not be here either simply out of consistency. Code that wants to handle FP exceptions is expected to use the 'strict' variants of these nodes. We don't get the frem case because frem by 0.0 produces NaN (invalid), and that's the remaining check here (so the removed check for frem was dead code AFAIK). This is the only place in SDAG that uses "HasFPExceptions", so I think we should remove that entirely as a follow-up patch. llvm-svn: 359566	2019-04-30 14:37:15 +00:00
Sjoerd Meijer	0ed4619679	[TargetLowering] findOptimalMemOpLowering. NFCI. This was a local static funtion in SelectionDAG, which I've promoted to TargetLowering so that I can reuse it to estimate the cost of a memory operation in D59787. Differential Revision: https://reviews.llvm.org/D59766 llvm-svn: 359543	2019-04-30 10:09:15 +00:00
Fangrui Song	7bce25cd7d	[AsmPrinter] Make AsmPrinter::HandlerInfo::Handler a unique_ptr Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers. A unique_ptr makes the ownership clearer. llvm-svn: 359541	2019-04-30 09:14:02 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Markus Lavin	a475da36eb	[DebugInfo] DW_OP_deref_size in PrologEpilogInserter. The PrologEpilogInserter need to insert a DW_OP_deref_size before prepending a memory location expression to an already implicit expression to avoid having the existing expression act on the memory address instead of the value behind it. The reason for using DW_OP_deref_size and not plain DW_OP_deref is that big-endian targets need to read the right size as simply truncating a larger read would yield the wrong result (LSB bytes are not at the lower address). This re-commit fixes issues reported in the first one. Namely deref was inserted under wrong conditions and additionally the deref_size argument was incorrectly encoded. Differential Revision: https://reviews.llvm.org/D59687 llvm-svn: 359535	2019-04-30 07:58:57 +00:00
Zi Xuan Wu	49d60fdc2e	[DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry), if adde is not legal for the target. Even it's at type-legalize phase. Because adde is special and will not be legalized at operation-legalize phase later. This fixes: PR40922 https://bugs.llvm.org/show_bug.cgi?id=40922 Differential Revision: https://reviews.llvm.org//D60854 llvm-svn: 359532	2019-04-30 03:01:14 +00:00
Simon Pilgrim	9b17b80a0e	computePolynomialFromPointer - add missing early-out return for non-pointer types. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359486	2019-04-29 19:25:16 +00:00
Daniel Sanders	8f079844d0	[globalisel] Improve Legalizer debug output * LegalizeAction should be printed by name rather than number * Newly created instructions are incomplete at the point the observer first sees them. They are therefore recorded in a small vector and printed just before the legalizer moves on to another instruction. By this point, the instruction must be complete. llvm-svn: 359481	2019-04-29 18:45:59 +00:00
Bjorn Pettersson	820994572c	[DAG] Refactor DAGCombiner::ReassociateOps Summary: Extract the logic for doing reassociations from DAGCombiner::reassociateOps into a helper function DAGCombiner::reassociateOpsCommutative, and use that helper to trigger reassociation on the original operand order, or the commuted operand order. Codegen is not identical since the operand order will be different when doing the reassociations for the commuted case. That causes some unfortunate churn in some test cases. Apart from that this should be NFC. Reviewers: spatel, craig.topper, tstellar Reviewed By: spatel Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61199 llvm-svn: 359476	2019-04-29 17:50:10 +00:00
Jeremy Morse	055aee1d8a	[DebugInfo] Terminate more location-list ranges at the end of blocks This patch fixes PR40795, where constant-valued variable locations can "leak" into blocks placed at higher addresses. The root of this is that DbgEntityHistoryCalculator terminates all register variable locations at the end of each block, but not constant-value variable locations. Fixing this requires constant-valued DBG_VALUE instructions to be broadcast into all blocks where the variable location remains valid, as documented in the LiveDebugValues section of SourceLevelDebugging.rst, and correct termination in DbgEntityHistoryCalculator. Differential Revision: https://reviews.llvm.org/D59431 llvm-svn: 359426	2019-04-29 09:13:16 +00:00
Sanjay Patel	fb9a5307a9	[DAGCombiner] try repeated fdiv divisor transform before building estimate This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359398	2019-04-28 12:23:43 +00:00
Nick Desaulniers	7ab164c4a4	[AsmPrinter] refactor to support %c w/ GlobalAddress' Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overriden method in each base class. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target independent %a, %c, and %n. The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter. It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints. Fixes: - https://bugs.llvm.org/show_bug.cgi?id=41402 - https://github.com/ClangBuiltLinux/linux/issues/449 Reviewers: echristo, void Reviewed By: void Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60887 llvm-svn: 359337	2019-04-26 18:45:04 +00:00
Simon Pilgrim	ef54b1dddf	[DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI. Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format. llvm-svn: 359326	2019-04-26 17:49:02 +00:00
Simon Pilgrim	5d6ef94c36	[X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293	2019-04-26 10:49:13 +00:00
Marcello Maggioni	c596584f67	[GlobalISel] Fix inserting copies in the right position for reg definitions When constrainRegClass is called if the constraining happens on a use the COPY needs to be inserted before the instruction that contains the MachineOperand, but if we are constraining a definition it actually needs to be added after the instruction. In addition, the COPY needs to have its operands flipped (in the use case we are copying from the old unconstrained register to the new constrained register, while in the definition case we are copying from the new constrained register that the instruction defines to the old unconstrained register). llvm-svn: 359282	2019-04-26 07:21:56 +00:00
Craig Topper	f9c30eddd0	[SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the input needs to be be split and the output type is a vector. We had special case handling here, but it uses a scalar any_extend for the promotion then bitcasts to the final type. This won't split up the input data into multiple promoted elements like we need. This patch falls back to doing the conversion through memory. Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll changes. The changes to vector-half-conversions.ll are fixing a previously unknown miscompile from this issue. Differential Revision: https://reviews.llvm.org/D61114 llvm-svn: 359219	2019-04-25 18:19:59 +00:00
Jessica Paquette	ba55767f51	[GlobalISel][AArch64] Legalize G_FNEARBYINT Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc. Since the importer allows us to automatically select this after legalization, also add tests for selection etc. Also update arm64-vfloatintrinsics.ll. llvm-svn: 359204	2019-04-25 16:44:40 +00:00
Jessica Paquette	bd7ac30b15	[GlobalISel] Add IRTranslator support for G_FNEARBYINT Translate llvm.nearbyint into G_FNEARBYINT as a simple intrinsic. Update arm64-irtranslator.ll. Differential Revision: https://reviews.llvm.org/D60922 llvm-svn: 359203	2019-04-25 16:39:28 +00:00
Amy Huang	68c9199493	Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label). Summary: This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug info in codeview. Currently only changes FastISel, so emitting labels still needs to be implemented in SelectionDAG. Reviewers: rnk Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61083 llvm-svn: 359149	2019-04-24 23:02:48 +00:00
Sanjay Patel	6f41bf948b	[DAGCombiner] scale repeated FP divisor by splat factor If we have a vector FP division with a splatted divisor, use the existing transform that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then potentially be converted into a scalar FP division by existing combines (rL358984) as seen in the tests here. That can be a potentially big perf difference if scalar fdiv has better timing (including avoiding possible frequency throttling for vector ops). Differential Revision: https://reviews.llvm.org/D61028 llvm-svn: 359147	2019-04-24 22:28:58 +00:00
David Blaikie	832c7d9f36	DebugInfo: Emit only declarations (not whole definitions) of non-unit user defined types into type units While this doesn't come up in reasonable cases currently (the only user defined types not in type units are ones without linkage - which makes for near-ODR violations, because it'd be a type with linkage referencing a type without linkage - such a type can't be validly defined in more than one TU, so arguably it shouldn't be in a type unit to begin with - but it's a convenient way to demonstrate an issue that will become more revalent with homed modular debug info type definitions - which also don't need to be in type units but more legitimately so). Precursor to the Clang change to de-type-unit (by omitting the 'identifier') types homed due to strong linkage vtables. (making that change without this one would lead to major type duplication in type units) llvm-svn: 359122	2019-04-24 18:09:44 +00:00
Bjorn Pettersson	71e8c6f20f	Add "const" in GetUnderlyingObjects. NFC Summary: Both the input Value pointer and the returned Value pointers in GetUnderlyingObjects are now declared as const. It turned out that all current (in-tree) uses of GetUnderlyingObjects were trivial to update, being satisfied with have those Value pointers declared as const. Actually, in the past several of the users had to use const_cast, just because of ValueTracking not providing a version of GetUnderlyingObjects with "const" Value pointers. With this patch we get rid of those const casts. Reviewers: hfinkel, materi, jkorous Reviewed By: jkorous Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61038 llvm-svn: 359072	2019-04-24 06:55:50 +00:00
Francis Visoiu Mistrih	7fee2b89fd	[Remarks] Add string deduplication using a string table * Add support for uniquing strings in the remark streamer and emitting the string table in the remarks section. * Add parsing support for the string table in the RemarkParser. From this remark: ``` --- !Missed Pass: inline Name: NoDefinition DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c', Line: 7, Column: 3 } Function: printArgsNoRet Args: - Callee: printf - String: ' will not be inlined into ' - Caller: printArgsNoRet DebugLoc: { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c', Line: 6, Column: 0 } - String: ' because its definition is unavailable' ... ``` to: ``` --- !Missed Pass: 0 Name: 1 DebugLoc: { File: 3, Line: 7, Column: 3 } Function: 2 Args: - Callee: 4 - String: 5 - Caller: 2 DebugLoc: { File: 3, Line: 6, Column: 0 } - String: 6 ... ``` And the string table in the .remarks/__remarks section containing: ``` inline\0NoDefinition\0printArgsNoRet\0 test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c\0printf\0 will not be inlined into \0 because its definition is unavailable\0 ``` This is mostly supposed to be used for testing purposes, but it gives us a 2x reduction in the remark size, and is an incremental change for the updates to the remarks file format. Differential Revision: https://reviews.llvm.org/D60227 llvm-svn: 359050	2019-04-24 00:06:24 +00:00
Francis Visoiu Mistrih	1646851b87	[CGP] Look through bitcasts when duplicating returns for tail calls The simple case of: ``` int callee(); void caller(void *a) { if (a == NULL) return callee(); return a; } ``` would generate a regular call instead of a tail call because we don't look through the bitcast of the call to `callee` when duplicating the return blocks. Differential Revision: https://reviews.llvm.org/D60837 llvm-svn: 359041	2019-04-23 21:57:46 +00:00
Amy Huang	fc79ab9857	Revert "[MS] Emit S_HEAPALLOCSITE debug info" because of ToTWin64(db) buildbot failure. This reverts commit `d07d6d6177` and `c774f687b6`. llvm-svn: 359034	2019-04-23 21:12:58 +00:00
Jessica Paquette	3cc6d1f542	[AArch64][GlobalISel] Legalize G_INTRINSIC_ROUND Add it to the same rule as G_FCEIL etc. Add a legalizer test, and add a missing switch case to AArch64LegalizerInfo.cpp. llvm-svn: 359033	2019-04-23 21:11:57 +00:00
David Blaikie	2f51176223	Reapply: "DebugInfo: Emit only one kind of accelerated access/name table"" Originally committed in r358931 Reverted in r358997 Seems this change made Apple accelerator tables miss names (because names started respecting the CU NameTableKind GNU & assuming that shouldn't produce accelerated names too), which is never correct (apple accelerator tables don't have separators or CU lists - if present, they must describe all names in all CUs). Original Description: Currently to opt in to debug_names in DWARFv5, the IR must contain 'nameTableKind: Default' which also enables debug_pubnames. Instead, only allow one of {debug_names, apple_names, debug_pubnames, debug_gnu_pubnames}. nameTableKind: Default gives debug_names in DWARFv5 and greater, debug_pubnames in v4 and earlier - and apple_names when tuning for lldb on MachO. nameTableKind: GNU always gives gnu_pubnames llvm-svn: 359026	2019-04-23 19:00:45 +00:00
Jessica Paquette	56342642a0	[AArch64][GlobalISel] Legalize G_INTRINSIC_TRUNC Same patch as G_FCEIL etc. Add the missing switch case in widenScalar, add G_INTRINSIC_TRUNC to the correct rule in AArch64LegalizerInfo.cpp, and add a test. llvm-svn: 359021	2019-04-23 18:20:44 +00:00
David Blaikie	a2470a4653	Revert "DebugInfo: Emit only one kind of accelerated access/name table" Regresses some apple_names situations - still investigating. This reverts commit r358931. llvm-svn: 358997	2019-04-23 15:03:24 +00:00
Fangrui Song	efd94c56ba	Use llvm::stable_sort While touching the code, simplify if feasible. llvm-svn: 358996	2019-04-23 14:51:27 +00:00
Sanjay Patel	06ff5eae5b	[DAGCombiner] generalize binop-of-splats scalarization If we only match build vectors, we can miss some patterns that use shuffles as seen in the affected tests. Note that the underlying calls within getSplatSourceVector() have the potential for compile-time explosion because of exponential recursion looking through binop opcodes, but currently the list of supported opcodes is very limited. Both of those problems should be addressed in follow-up patches. llvm-svn: 358984	2019-04-23 13:16:41 +00:00
Bjorn Pettersson	f97b29be88	[DAGCombiner] Combine OR as ADD when no common bits are set Summary: The DAGCombiner is rewriting (canonicalizing) an ISD::ADD with no common bits set in the operands as an ISD::OR node. This could sometimes result in "missing out" on some combines that normally are performed for ADD. To be more specific this could happen if we already have rewritten an ADD into OR, and later (after legalizations or combines) we expose patterns that could have been optimized if we had seen the OR as an ADD (e.g. reassociations based on ADD). To make the DAG combiner less sensitive to if ADD or OR is used for these "no common bits set" ADD/OR operations we now apply most of the ADD combines also to an OR operation, when value tracking indicates that the operands have no common bits set. Reviewers: spatel, RKSimon, craig.topper, kparzysz Reviewed By: spatel Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59758 llvm-svn: 358965	2019-04-23 10:01:08 +00:00
Chandler Carruth	bbddf21f90	Revert "Use const DebugLoc&" This reverts r358910 (git commit `2b74466530`) While this patch seems trivial and safe and correct, it is not. The copies are actually load bearing copies. You can observe this with MSan or other ways of checking for use-after-destroy, but otherwise this may result in ... difficult to debug inexplicable behavior. I suspect the issue is that the debug location is used after the original reference to it is removed. The metadata backing it gets destroyed as its last references goes away, and then we reference it later through these const references. llvm-svn: 358940	2019-04-23 01:42:07 +00:00
David Blaikie	68602ab2f3	DebugInfo: Emit only one kind of accelerated access/name table Currently to opt in to debug_names in DWARFv5, the IR must contain 'nameTableKind: Default' which also enables debug_pubnames. Instead, only allow one of {debug_names, apple_names, debug_pubnames, debug_gnu_pubnames}. nameTableKind: Default gives debug_names in DWARFv5 and greater, debug_pubnames in v4 and earlier - and apple_names when tuning for lldb on MachO. nameTableKind: GNU always gives gnu_pubnames llvm-svn: 358931	2019-04-22 22:45:11 +00:00
Sanjay Patel	bf8aacb715	[SelectionDAG] move splat util functions up from x86 lowering This was supposed to be NFC, but the change in SDLoc definitions causes instruction scheduling changes. There's nothing x86-specific in this code, and it can likely be used from DAGCombiner's simplifyVBinOp(). llvm-svn: 358930	2019-04-22 22:43:36 +00:00
Matt Arsenault	2b74466530	Use const DebugLoc& llvm-svn: 358910	2019-04-22 19:14:27 +00:00
Matt Arsenault	8f624abc1d	GlobalISel: Legalize scalar G_EXTRACT sources llvm-svn: 358892	2019-04-22 15:10:42 +00:00
Simon Pilgrim	6276ce0142	[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887	2019-04-22 14:04:35 +00:00
Sanjay Patel	9bc6c77220	[DAGCombiner] make variable name less ambiguous; NFC llvm-svn: 358886	2019-04-22 13:42:50 +00:00
Sanjay Patel	d6989daae9	[DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC llvm-svn: 358884	2019-04-22 13:36:07 +00:00
Amara Emerson	4286652556	Revert r358800. Breaks Obsequi from the test suite. The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with a different issue. llvm-svn: 358829	2019-04-20 21:25:00 +00:00
Fangrui Song	d3b2682351	[ExecutionDomainFix] Optimize a binary search insertion llvm-svn: 358815	2019-04-20 13:00:50 +00:00
Amara Emerson	eac69e9377	Revert "Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores"" We were shifting the wrong component of a split load when trying to combine them back into a single value. llvm-svn: 358800	2019-04-19 23:54:44 +00:00
Jessica Paquette	d5c69e0836	[GlobalISel][AArch64] Legalize + select G_FRINT Exactly the same as G_FCEIL, G_FABS, etc. Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc. Differential Revision: https://reviews.llvm.org/D60895 llvm-svn: 358799	2019-04-19 23:41:52 +00:00
Jessica Paquette	ad69af3e95	[GlobalISel] Add IRTranslator support for G_FRINT Add it as a simple intrinsic, update arm64-irtranslator.ll. Differential Revision: https://reviews.llvm.org/D60893 llvm-svn: 358787	2019-04-19 21:46:12 +00:00
Amy Huang	d07d6d6177	Attempt to fix buildbot failure in commit 1bb57bac959ac163fd7d8a76d734ca3e0ecee6ab. llvm-svn: 358786	2019-04-19 21:44:30 +00:00
Amy Huang	c774f687b6	[MS] Emit S_HEAPALLOCSITE debug info Summary: This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug info in codeview. Currently only changes FastISel, so emitting labels still needs to be implemented in SelectionDAG. Reviewers: hans, rnk Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60800 llvm-svn: 358783	2019-04-19 21:09:11 +00:00
Amara Emerson	36c5baef49	Revert "[GlobalISel] Add legalization support for non-power-2 loads and stores" This introduces some runtime failures which I'll need to investigate further. llvm-svn: 358771	2019-04-19 17:42:13 +00:00
Jessica Paquette	dfd87f6fa1	[GlobalISel][AArch64] Legalize vector G_FPOW This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc. Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change. Differential Revision: https://reviews.llvm.org/D60218 llvm-svn: 358764	2019-04-19 16:28:08 +00:00
Sanjay Patel	e197c617a6	[SelectionDAG] soften splat mask assert/unreachable (PR41535) These are general queries, so they should not die when given a degenerate input like an all undef mask. Callers should be able to deal with an op that will eventually be simplified away. llvm-svn: 358761	2019-04-19 15:31:11 +00:00
Bjorn Pettersson	238c9d6308	[CodeGen] Add "const" to MachineInstr::mayAlias Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744	2019-04-19 09:08:38 +00:00
James Molloy	9ad4cb3de4	[PATCH] [MachineScheduler] Check pending instructions when an instruction is scheduled Pending instructions that may have been blocked from being available by the HazardRecognizer may no longer may not be blocked any more when an instruction is scheduled; pending instructions should be re-checked in this case. This is primarily aimed at VLIW targets with large parallelism and esoteric constraints. No testcase as no in-tree targets have this behavior. Differential revision: https://reviews.llvm.org/D60861 llvm-svn: 358743	2019-04-19 09:00:55 +00:00
Ali Tamur	783d84bb39	[llvm] Prevent duplicate files in debug line header in dwarf 5: another attempt Another attempt to land the changes in debug line header to prevent duplicate files in Dwarf 5. I rolled back my previous commit because of a mistake in generating the object file in a test. Meanwhile, I addressed some offline comments and changed the implementation; the largest difference is that MCDwarfLineTableHeader does not keep DwarfVersion but gets it as a parameter. I also merged the patch to fix two lld tests that will strt to fail into this patch. Original Commit: https://reviews.llvm.org/D59515 Original Message: Motivation: In previous dwarf versions, file name indexes started from 1, and the primary source file was not explicit. Dwarf 5 standard (6.2.4) prescribes the primary source file to be explicitly given an entry with an index number 0. The current implementation honors the specification by just duplicating the main source file, once with index number 0, and later maybe with another index number. While this is compliant with the letter of the standard, the duplication causes problems for consumers of this information such as lldb. (Some files are duplicated, where only some of them have a line table although all refer to the same file) With this change, dwarf 5 debug line section files always start from 0, and the zeroth entry is not duplicated whenever possible. This requires different handling of dwarf 4 and dwarf 5 during generation (e.g. when a function returns an index zero for a file name, it signals an error in dwarf 4, but not in dwarf 5) However, I think the minor complication is worth it, because it enables all consumers (lldb, gdb, dwarfdump, objdump, and so on) to treat all files in the file name list homogenously. llvm-svn: 358732	2019-04-19 02:26:56 +00:00
Michael Berg	d573aa0156	[NFC] FMF propagation for GlobalIsel llvm-svn: 358702	2019-04-18 18:48:57 +00:00
Ali Tamur	6263365b08	Fix a typo in comments. [NFC] llvm-svn: 358639	2019-04-18 02:39:37 +00:00
Aditya Nandakumar	9266337656	[GISel]:IRTranslator: Prefer a buidInstr form that allows CSE of cast instructions https://reviews.llvm.org/D60844 Use the style of buildInstr that allows CSEing. llvm-svn: 358637	2019-04-18 02:19:29 +00:00
Nick Desaulniers	9609ce2f33	[AsmPrinter] hoist %a output template to base class for ARM+Aarch64 Summary: X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do basically the same thing (Aarch64 did not correctly handle immediates, ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses %a with an immediate) for a flag that should be target independent anyways. Reviewers: echristo, peter.smith Reviewed By: echristo Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60841 llvm-svn: 358618	2019-04-17 22:21:10 +00:00
Amara Emerson	d51adf0568	Add a getSizeInBits() accessor to MachineMemOperand. NFC. Cleans up a bunch of places where we do getSize() * 8. Differential Revision: https://reviews.llvm.org/D60799 llvm-svn: 358617	2019-04-17 22:21:05 +00:00
Amara Emerson	daf6e66ac5	[GlobalISel] Add legalization support for non-power-2 loads and stores Legalize things like i24 load/store by splitting them into smaller power of 2 operations. This matches how SelectionDAG handles these operations. Differential Revision: https://reviews.llvm.org/D59971 llvm-svn: 358613	2019-04-17 21:30:07 +00:00
Nick Desaulniers	a2077bab40	[AsmPrinter] defer %c to base class for ARM, PPC, and Hexagon. NFC Summary: None of these derived classes do anything that the base class cannot. If we remove these case statements, then the base class can handle them just fine. Reviewers: peter.smith, echristo Reviewed By: echristo Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60803 llvm-svn: 358603	2019-04-17 18:22:48 +00:00
Simon Pilgrim	e7fe6dd5ed	[DAGCombine] Add SimplifyDemandedBits helper that handles demanded elts mask as well The other SimplifyDemandedBits helpers become wrappers to this new demanded elts variant. llvm-svn: 358585	2019-04-17 15:45:44 +00:00
Florian Hahn	258a425c69	[ScheduleDAGRRList] Recompute topological ordering on demand. Currently there is a single point in ScheduleDAGRRList, where we actually query the topological order (besides init code). Currently we are recomputing the order after adding a node (which does not have predecessors) and then we add predecessors edge-by-edge. We can avoid adding edges one-by-one after we added a new node. In that case, we can just rebuild the order from scratch after adding the edges to the DAG and avoid all the updates to the ordering. Also, we can delay updating the DAG until we query the DAG, if we keep a list of added edges. Depending on the number of updates, we can either apply them when needed or recompute the order from scratch. This brings down the geomean compile time for of CTMark with -O1 down 0.3% on X86, with no regressions. Reviewers: MatzeB, atrick, efriedma, niravd, paquette Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D60125 llvm-svn: 358583	2019-04-17 15:05:29 +00:00
Fangrui Song	c82e92bca8	Change some llvm::{lower,upper}_bound to llvm::bsearch. NFC llvm-svn: 358564	2019-04-17 07:58:05 +00:00
Simon Pilgrim	e5573f4f4e	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526	2019-04-16 20:57:28 +00:00
Luis Marques	eda370d4c8	[DAGCombiner] Add missing flag to addressing mode check The checks in `canFoldInAddressingMode` tested for addressing modes that have a base register but didn't set the `HasBaseReg` flag to true (it's false by default). This patch fixes that. Although the omission of the flag was technically incorrect it had no known observable impact, so no tests were changed by this patch. Differential Revision: https://reviews.llvm.org/D60314 llvm-svn: 358502	2019-04-16 15:09:18 +00:00
Amara Emerson	02a90ea73d	[AArch64][GlobalISel] Don't do extending loads combine for non-pow-2 types. Since non-pow-2 types are going to get split up into multiple loads anyway, don't do the [SZ]EXTLOAD combine for those and save us trouble later in legalization. llvm-svn: 358458	2019-04-15 22:34:08 +00:00
Tim Northover	2be3f868f9	DAG: propagate ConsecutiveRegs flags to returns too. Arguments already have a flag to inform backends when they have been split up. The AArch64 arm64_32 ABI makes use of these on return types too, so that code emitted for armv7k can be ABI-compliant. There should be no CodeGen changes yet, just making more information available. llvm-svn: 358399	2019-04-15 12:04:10 +00:00
Tim Northover	9db00f7e5b	DAG: propagate whether an arg is a pointer for CallingConv decisions. The arm64_32 ABI specifies that pointers (despite being 32-bits) should be zero-extended to 64-bits when passed in registers for efficiency reasons. This means that the SelectionDAG needs to be able to tell the backend that an argument was originally a pointer, which is implmented here. Additionally, some memory intrinsics need to be declared as taking an i8* instead of an iPTR. There should be no CodeGen change yet, but it will be triggered when AArch64 backend support for ILP32 is added. llvm-svn: 358398	2019-04-15 12:03:54 +00:00
Bjorn Pettersson	60569363a5	[SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry Summary: Use KnownBits::computeForAddSub/computeForAddCarry in SelectionDAG::computeKnownBits when doing value tracking for addition/subtraction. This should improve the precision of the known bits, as we only used to make a simple estimate of known zeroes. The KnownBits support functions are also able to deduce bits that are known to be one in the result. Reviewers: spatel, RKSimon, nikic, lebedev.ri Reviewed By: nikic Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60460 llvm-svn: 358372	2019-04-15 07:19:11 +00:00
Amara Emerson	946b1246d6	[GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only. Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't be degraded. This change also improves the IRTranslator so that in most places, but not all, it creates constants using the MIRBuilder directly instead of first creating a new destination vreg and then creating a constant. By doing this, the buildConstant() method can just return the vreg of an existing G_CONSTANT instead of having to create a COPY from it. I measured a 0.2% improvement in compile time and a 0.9% improvement in code size at -O0 ARM64. Compile time: Program base cse diff test-suite...ark/tramp3d-v4/tramp3d-v4.test 9.04 9.12 0.8% test-suite...Mark/mafft/pairlocalalign.test 2.68 2.66 -0.7% test-suite...-typeset/consumer-typeset.test 5.53 5.51 -0.4% test-suite :: CTMark/lencod/lencod.test 5.30 5.28 -0.3% test-suite :: CTMark/Bullet/bullet.test 25.82 25.76 -0.2% test-suite...:: CTMark/ClamAV/clamscan.test 6.92 6.90 -0.2% test-suite...TMark/7zip/7zip-benchmark.test 34.24 34.17 -0.2% test-suite :: CTMark/SPASS/SPASS.test 6.25 6.24 -0.1% test-suite...:: CTMark/sqlite3/sqlite3.test 1.66 1.66 -0.1% test-suite :: CTMark/kimwitu++/kc.test 13.61 13.60 -0.0% Geomean difference -0.2% Code size: Program base cse diff test-suite...-typeset/consumer-typeset.test 1315632 1266480 -3.7% test-suite...:: CTMark/ClamAV/clamscan.test 1313892 1297508 -1.2% test-suite :: CTMark/lencod/lencod.test 1439504 1423112 -1.1% test-suite...TMark/7zip/7zip-benchmark.test 2936980 2904172 -1.1% test-suite :: CTMark/Bullet/bullet.test 3478276 3445460 -0.9% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8082868 `8033492` -0.6% test-suite :: CTMark/kimwitu++/kc.test `3870380` 3853972 -0.4% test-suite :: CTMark/SPASS/SPASS.test 1434904 1434896 -0.0% test-suite...Mark/mafft/pairlocalalign.test 764528 764528 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 782092 782092 0.0% Geomean difference -0.9% Differential Revision: https://reviews.llvm.org/D60580 llvm-svn: 358369	2019-04-15 05:04:20 +00:00
Amara Emerson	d189680baa	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368	2019-04-15 04:53:46 +00:00
Amara Emerson	93e58d2396	[AArch64][GlobalISel] Enable copy elision in the pre-legalizer combine and fix a crash. This enables the simple copy combine that already exists in the CombinerHelper. However, it exposed a bug in the GISelChangeObserver where it wouldn't clear a set of MIs to process, and so would end up causing a crash when deleted MIs were being added to the combiner worklist again. Differential Revision: https://reviews.llvm.org/D60579 llvm-svn: 358318	2019-04-13 00:33:25 +00:00
Amara Emerson	bdb5e4e4ca	[GlobalISel] Fix a crash when handling an invalid MVT during call lowering. This crash was introduced in r358032 as we try to construct an EVT from an MVT in order to find the register type for the calling conv. Fall back instead of trying to do this with an invalid MVT coming from i256. llvm-svn: 358314	2019-04-12 22:05:46 +00:00
Sanjay Patel	5e4ad39af7	[DAGCombiner] narrow shuffle of concatenated vectors // shuffle (concat X, undef), (concat Y, undef), Mask --> // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1) The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements. The x86 changes look neutral or better. There's one test with an extra instruction, but that could be reversed for a subtarget with the right attributes. But by default, we want to avoid the 256-bit op when possible (in my motivating benchmark, a handful of ymm ops sprinkled into a sequence of xmm ops are triggering frequency throttling on Haswell resulting in significantly worse perf). Differential Revision: https://reviews.llvm.org/D60545 llvm-svn: 358291	2019-04-12 16:31:56 +00:00
Hiroshi Yamauchi	c27ff0d32d	Add options for MaxLoadsPerMemcmp(OptSize). Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60587 llvm-svn: 358287	2019-04-12 15:05:46 +00:00
Hans Wennborg	4e6b857922	Revert r358268 "[DebugInfo] DW_OP_deref_size in PrologEpilogInserter." It causes clang to crash while building Chromium. See https://crbug.com/952230 for reproducer. > The PrologEpilogInserter need to insert a DW_OP_deref_size before > prepending a memory location expression to an already implicit > expression to avoid having the existing expression act on the memory > address instead of the value behind it. > > The reason for using DW_OP_deref_size and not plain DW_OP_deref is that > big-endian targets need to read the right size as simply truncating a > larger read would yield the wrong result (LSB bytes are not at the lower > address). > > Differential Revision: https://reviews.llvm.org/D59687 llvm-svn: 358281	2019-04-12 12:54:52 +00:00
Fangrui Song	fb79ff6ab5	Use llvm::upper_bound. NFC llvm-svn: 358277	2019-04-12 11:31:16 +00:00
Markus Lavin	138c76129b	[DebugInfo] DW_OP_deref_size in PrologEpilogInserter. The PrologEpilogInserter need to insert a DW_OP_deref_size before prepending a memory location expression to an already implicit expression to avoid having the existing expression act on the memory address instead of the value behind it. The reason for using DW_OP_deref_size and not plain DW_OP_deref is that big-endian targets need to read the right size as simply truncating a larger read would yield the wrong result (LSB bytes are not at the lower address). Differential Revision: https://reviews.llvm.org/D59687 llvm-svn: 358268	2019-04-12 08:23:55 +00:00
Eric Christopher	6b06c6a5ef	Add explicit dependencies on MCSection.h and MCDwarf.h to the .cpp files rather than rely on transitive includes from MCStreamer.h. llvm-svn: 358263	2019-04-12 07:40:01 +00:00
Craig Topper	3b1239d2a8	[TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes. If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding. Differential Revision: https://reviews.llvm.org/D60358 llvm-svn: 358257	2019-04-12 06:49:28 +00:00
Eric Christopher	8bbc3039be	Move addFrameInst out of line and remove the MCDwarf.h include. This removes 500 transitive dependencies for a modification of MCDwarf.h in a build of llc for a single out of line function and reduces the build overhead by more than half without impacting test time of check-llvm. llvm-svn: 358255	2019-04-12 06:31:59 +00:00
Eric Christopher	b6c190da23	Include what's used in a few cpp files - these were getting transitive includes from MCDwarf.h. llvm-svn: 358254	2019-04-12 06:16:33 +00:00
Fangrui Song	cecc435250	Use llvm::lower_bound. NFC This reapplies rL358161. That commit inadvertently reverted an exegesis file to an old version. llvm-svn: 358246	2019-04-12 02:02:06 +00:00
Brendon Cahoon	57c3d4bed3	[Pipeliner] Fix incorrect loop carried dependence calculation The isLoopCarriedDep function does not correctly compute loop carried dependences when the array index offset is negative or the stride is smallar than the access size. Patch by Denis Antrushin. Differential Revision: https://reviews.llvm.org/D60135 llvm-svn: 358233	2019-04-11 21:57:51 +00:00
Ali Tamur	7822b46188	Revert "Use llvm::lower_bound. NFC" This reverts commit rL358161. This patch have broken the test: llvm/test/tools/llvm-exegesis/X86/uops-CMOV16rm-noreg.s llvm-svn: 358199	2019-04-11 17:35:20 +00:00
Sanjay Patel	fd314eca8f	[DAGCombiner] refactor narrowing of extracted vector binop; NFC There's a TODO comment about handling patterns with insert_subvector, and we do want to match that. llvm-svn: 358187	2019-04-11 15:59:47 +00:00
Sanjay Patel	c0f4a35e68	[DAGCombiner][x86] scalarize inserted vector FP ops // bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) --> // build_vec ...undef, (bo x, y), undef... The lifetime of the nodes in these examples is different for variables versus constants, but they are all build vectors briefly, so I'm proposing to catch them in this form to handle all of the leading examples in the motivating test file. Before we have build vectors, we might have insert_vector_element. After that, we might have scalar_to_vector and constant pool loads. It's going to take more work to ensure that FP vector operands are getting simplified with undef elements, so this transform can apply more widely. In a non-loose FP environment, we are likely simplifying FP elements to NaN values rather than undefs. We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors yet. Differential Revision: https://reviews.llvm.org/D60514 llvm-svn: 358172	2019-04-11 14:21:57 +00:00
Fangrui Song	71cce580b9	Use llvm::lower_bound. NFC llvm-svn: 358161	2019-04-11 10:25:41 +00:00
Shiva Chen	7cc03bd064	[RISCV] Put data smaller than eight bytes to small data section Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset could covert most of the small data section. Linker relaxation could transfer the multiple data accessing instructions to a gp base with signed twelve-bit offset instruction. Differential Revision: https://reviews.llvm.org/D57493 llvm-svn: 358150	2019-04-11 04:59:13 +00:00
Amara Emerson	ae878dab03	[AArch64][GlobalISel] Scalarize vector SDIV. llvm-svn: 358142	2019-04-10 23:06:08 +00:00
David Green	0861c87b06	Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not seeing through to the constant in other blocks. Revert this patch while we come up with a better way to handle that. I will try to follow this up with some better tests. llvm-svn: 358113	2019-04-10 18:00:41 +00:00
Matt Arsenault	2064e45ce3	GlobalISel: Move computeValueLLTs Call lowering should use this directly instead of going through the EVT version, but more work is needed to deal with this (mostly the passing of the IR type pointer instead of the relevant properties in ArgInfo). llvm-svn: 358111	2019-04-10 17:27:56 +00:00
Matt Arsenault	0aab99902b	GlobalISel: Fix invoke lowering creating invalid type registers Unlike the call handling, this wasn't checking for void results and creating a register with the invalid LLT llvm-svn: 358110	2019-04-10 17:27:55 +00:00
Matt Arsenault	7187272b2b	GlobalISel: Support legalizing G_CONSTANT with irregular breakdown llvm-svn: 358109	2019-04-10 17:27:53 +00:00
Matt Arsenault	9e0eeba569	GlobalISel: Handle odd breakdowns for bit ops llvm-svn: 358105	2019-04-10 17:07:56 +00:00
Nick Desaulniers	5277b3ff25	[AsmPrinter] refactor to remove remove AsmVariant. NFC Summary: The InlineAsm::AsmDialect is only required for X86; no architecture makes use of it and as such it gets passed around between arch-specific and general code while being unused for all architectures but X86. Since the AsmDialect is queried from a MachineInstr, which we also pass around, remove the additional AsmDialect parameter and query for it deep in the X86AsmPrinter only when needed/as late as possible. This refactor should help later planned refactors to AsmPrinter, as this difference in the X86AsmPrinter makes it harder to make AsmPrinter more generic. Reviewers: craig.topper Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60488 llvm-svn: 358101	2019-04-10 16:38:43 +00:00
Fangrui Song	ae6c9403d1	[MachineOutliner] Replace ostringstream based string concatenation with Twine This makes my libLLVMCodeGen.so.9svn 4936 bytes smaller. While here, delete unused #include <map> llvm-svn: 358089	2019-04-10 14:52:37 +00:00
David Stenberg	b96943b6a0	[DebugInfo] Track multiple registers in DbgEntityHistoryCalculator Summary: When calculating the debug value history, DbgEntityHistoryCalculator would only keep track of register clobbering for the latest debug value per inlined entity. This meant that preceding register-described debug value fragments would live on until the next overlapping debug value, ignoring any potential clobbering. This patch amends DbgEntityHistoryCalculator so that it keeps track of all registers that a inlined entity's currently live debug values are described by. The DebugInfo/COFF/pieces.ll test case has had to be changed since previously a register-described fragment would incorrectly outlive its basic block. The parent patch D59941 is expected to increase the coverage slightly, as it makes sure that location list entries are inserted after clobbered fragments, and this patch is expected to decrease it, as it stops preceding register-described from living longer than they should. All in all, this patch and the preceding patch has a negligible effect on the output from `llvm-dwarfdump -statistics' for a clang-3.4 binary built using the RelWithDebInfo build profile. "Scope bytes covered" increases by 0.5%, and "variables with location" increases from 2212083 to 2212088, but it should improve the accuracy quite a bit. This fixes PR40283. Reviewers: aprantl, probinson, dblaikie, rnk, bjope Reviewed By: aprantl Subscribers: llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D59942 llvm-svn: 358073	2019-04-10 11:28:28 +00:00
David Stenberg	5ffec6deef	[DebugInfo] Improve handling of clobbered fragments Summary: Currently the DbgValueHistorymap only keeps track of clobbered registers for the last debug value that it has encountered. This could lead to preceding register-described debug values living on longer in the location lists than they should. See PR40283 for an example. This patch does not introduce tracking of multiple registers, but changes the DbgValueHistoryMap structure to allow for that in a follow-up patch. This patch is not NFC, as it at least fixes two bugs in DwarfDebug (both are covered in the new clobbered-fragments.mir test): * If a debug value was clobbered (its End pointer set), the value would still be added to OpenRanges, meaning that the succeeding location list entries could potentially contain stale values. * If a debug value was clobbered, and there were non-overlapping fragments that were still live after the clobbering, DwarfDebug would not create a location list entry starting directly after the clobbering instruction. This meant that the location list could have a gap until the next debug value for the variable was encountered. Before this patch, the history map was represented by <Begin, End> pairs, where a new pair was created for each new debug value. When dealing with partially overlapping register-described debug values, such as in the following example: DBG_VALUE $reg2, $noreg, !1, !DIExpression(DW_OP_LLVM_fragment, 32, 32) [...] DBG_VALUE $reg3, $noreg, !1, !DIExpression(DW_OP_LLVM_fragment, 64, 32) [...] $reg2 = insn1 [...] $reg3 = insn2 the history map would then contain the entries `[<DV1, insn1>, [<DV2, insn2>]`. This would leave it up to the users of the map to be aware of the relative order of the instructions, which e.g. could make DwarfDebug::buildLocationList() needlessly complex. Instead, this patch makes the history map structure monotonically increasing by dropping the End pointer, and replacing that with explicit clobbering entries in the vector. Each debug value has an "end index", which if set, points to the entry in the vector that ends the debug value. The ending entry can either be an overlapping debug value, or an instruction which clobbers the register that the debug value is described by. The ending entry's instruction can thus either be excluded or included in the debug value's range. If the end index is not set, the debug value that the entry introduces is valid until the end of the function. Changes to test cases: * DebugInfo/X86/pieces-3.ll: The range of the first DBG_VALUE, which describes that the fragment (0, 64) is located in RDI, was incorrectly ended by the clobbering of RAX, which the second (non-overlapping) DBG_VALUE was described by. With this patch we get a second entry that only describes RDI after that clobbering. * DebugInfo/ARM/partial-subreg.ll: This test seems to indiciate a bug in LiveDebugValues that is caused by it not being aware of fragments. I have added some comments in the test case about that. Also, before this patch DwarfDebug would incorrectly include a register-described debug value from a preceding block in a location list entry. Reviewers: aprantl, probinson, dblaikie, rnk, bjope Reviewed By: aprantl Subscribers: javed.absar, kristof.beyls, jdoerfert, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D59941 llvm-svn: 358072	2019-04-10 11:28:20 +00:00
Fangrui Song	56f70c625a	[AsmPrinter] Delete unused RangeSpanList::addRange llvm-svn: 358068	2019-04-10 10:35:10 +00:00
David Stenberg	6feef56d1b	[DebugInfo] Rename DbgValueHistoryMap::{InstrRange -> Entry}, NFC Summary: In an upcoming commit the history map will be changed so that it contains explicit entries for instructions that clobber preceding debug values, rather than Begin- End range pairs, so generalize the name to "Entry". Also, prefix the iterator variable names in buildLocationList() with "E". In an upcoming commit the entry will have query functions such as "isD(e)b(u)gValue", which could at a glance make one confuse it for iterations over MachineInstrs, so make the iterator names a bit more distinct to avoid that. Reviewers: aprantl Reviewed By: aprantl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59939 llvm-svn: 358060	2019-04-10 09:07:43 +00:00
David Stenberg	3739979c20	[DebugInfo] Make InstrRange into a class, NFC Summary: Replace use of std::pair by creating a class for the debug value instruction ranges instead. This is a preparatory refactoring for improving handling of clobbered fragments. In an upcoming commit the Begin pointer will become a PointerIntPair, so it will be cleaner to have a getter for that. Reviewers: aprantl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59938 llvm-svn: 358059	2019-04-10 09:07:32 +00:00
Florian Hahn	83443c9a9e	[ScheduleDAG] Add statistics for maintaining the topological order. This is helpful to measure the impact of D60125 on maintaining topological orders. Reviewers: MatzeB, atrick, efriedma, niravd Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D60187 llvm-svn: 358058	2019-04-10 09:03:03 +00:00
Amara Emerson	2b523f8162	[GlobalISel][AArch64] Allow CallLowering to handle types which are normally required to be passed as different register types. E.g. <2 x i16> may need to be passed as a larger <2 x i32> type, so formal arg lowering needs to be able truncate it back. Likewise, when dealing with returns of these types, they need to be widened in the appropriate way back. Differential Revision: https://reviews.llvm.org/D60425 llvm-svn: 358032	2019-04-09 21:22:33 +00:00
Craig Topper	61e77b11d1	[DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate. This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate. Differential Revision: https://reviews.llvm.org/D60020 llvm-svn: 358027	2019-04-09 18:33:56 +00:00
Simon Pilgrim	d7cc0ec581	[TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support llvm-svn: 358019	2019-04-09 16:52:21 +00:00
Stanislav Mekhanoshin	913ba8eeb4	Revert LIS handling in MachineDCE One of out of tree targets has regressed with this patch. Reverting it for now and let liveness to be fully reconstructed in case pass was used after the LIS is created to resolve the regression. Differential Revision: https://reviews.llvm.org/D60466 llvm-svn: 358015	2019-04-09 16:13:53 +00:00
Simon Pilgrim	55f79ef9fe	[TargetLowering] SimplifyDemandedBits - Remove GetDemandedSrcMask lambda. NFCI. An older version of this could return false but now that this always succeeds we can just inline and simplify it. llvm-svn: 357999	2019-04-09 12:29:26 +00:00
Simon Pilgrim	345eacd555	[TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op. llvm-svn: 357992	2019-04-09 10:27:59 +00:00
David Stenberg	2028ae975c	[DebugInfo] Pass all values in DebugLocEntry's constructor, NFC Summary: With MergeValues() removed, amend DebugLocEntry's constructor so that it takes multiple values rather than a single, and keep non-fragment values in OpenRanges, as this allows some cleanup of the code in buildLocationList(). Reviewers: aprantl, dblaikie, loladiro Reviewed By: aprantl Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D59303 llvm-svn: 357988	2019-04-09 10:08:26 +00:00
David Stenberg	93b497a61d	[DebugInfo] Remove redundant DebugLocEntry::MergeValues() function, NFC Summary: The MergeValues() function would try to merge two entries if they shared the same beginning label. Having the same beginning label means that the former entry's range would be empty; however, after D55919 we no longer create entries for empty ranges, so we can no longer land in a situation where that check in MergeValues would succeed. Instead, the "merging" is done by keeping the live values from the preceding empty ranges in OpenRanges, and adding them to the first non-empty range. Reviewers: aprantl, dblaikie, loladiro Reviewed By: aprantl Subscribers: llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D59301 llvm-svn: 357974	2019-04-09 07:46:09 +00:00
Simon Pilgrim	9f74df7d5b	[TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts. llvm-svn: 357942	2019-04-08 20:59:38 +00:00
Adrian Prantl	6ed5706a2b	Add LLVM IR debug info support for Fortran COMMON blocks COMMON blocks are a feature of Fortran that has no direct analog in C languages, but they are similar to data sections in assembly language programming. A COMMON block is a named area of memory that holds a collection of variables. Fortran subprograms may map the COMMON block memory area to their own, possibly distinct, non-empty list of variables. A Fortran COMMON block might look like the following example. COMMON /ALPHA/ I, J For this construct, the compiler generates a new scope-like DI construct (!DICommonBlock) into which variables (see I, J above) can be placed. As the common block implies a range of storage with global lifetime, the !DICommonBlock refers to a !DIGlobalVariable. The Fortran variable that comprise the COMMON block are also linked via metadata to offsets within the global variable that stands for the entire common block. @alpha_ = common global %alphabytes_ zeroinitializer, align 64, !dbg !27, !dbg !30, !dbg !33 !14 = distinct !DISubprogram(…) !20 = distinct !DICommonBlock(scope: !14, declaration: !25, name: "alpha") !25 = distinct !DIGlobalVariable(scope: !20, name: "common alpha", type: !24) !27 = !DIGlobalVariableExpression(var: !25, expr: !DIExpression()) !29 = distinct !DIGlobalVariable(scope: !20, name: "i", file: !3, type: !28) !30 = !DIGlobalVariableExpression(var: !29, expr: !DIExpression()) !31 = distinct !DIGlobalVariable(scope: !20, name: "j", file: !3, type: !28) !32 = !DIExpression(DW_OP_plus_uconst, 4) !33 = !DIGlobalVariableExpression(var: !31, expr: !32) The DWARF generated for this is as follows. DW_TAG_common_block: DW_AT_name: alpha DW_AT_location: @alpha_+0 DW_TAG_variable: DW_AT_name: common alpha DW_AT_type: array of 8 bytes DW_AT_location: @alpha_+0 DW_TAG_variable: DW_AT_name: i DW_AT_type: integer4 DW_AT_location: @Alpha+0 DW_TAG_variable: DW_AT_name: j DW_AT_type: integer4 DW_AT_location: @Alpha+4 Patch by Eric Schweitz! Differential Revision: https://reviews.llvm.org/D54327 llvm-svn: 357934	2019-04-08 19:13:55 +00:00
Simon Pilgrim	561ba38623	[DAG] Pull out ComputeNumSignBits call to make debugging easier. NFCI. llvm-svn: 357861	2019-04-07 11:49:33 +00:00
Stanislav Mekhanoshin	c8f78f8dd3	[AMDGPU] Add MachineDCE pass after RenameIndependentSubregs Detect dead lanes can create some dead defs. Then RenameIndependentSubregs will break a REG_SEQUENCE which may use these dead defs. At this point a dead instruction can be removed but we do not run a DCE anymore. MachineDCE was only running before live variable analysis. The patch adds a mean to preserve LiveIntervals and SlotIndexes in case it works past this. Differential Revision: https://reviews.llvm.org/D59626 llvm-svn: 357805	2019-04-05 20:11:32 +00:00
Fangrui Song	2c5c12c041	Change some dyn_cast to more apropriate isa. NFC llvm-svn: 357773	2019-04-05 16:16:23 +00:00
Simon Pilgrim	17586cda4a	[SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D60006 llvm-svn: 357765	2019-04-05 14:56:21 +00:00
Matt Arsenault	106429b4e4	GlobalISel: Add another overload of buildUnmerge It's annoying to have to create an array of the result type, particularly when you don't care about the size of the value. llvm-svn: 357763	2019-04-05 14:03:07 +00:00
Sanjay Patel	50a8652785	[DAGCombiner][x86] scalarize splatted vector FP ops There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file. For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename. Other targets should likely enable the hook in a similar way. Differential Revision: https://reviews.llvm.org/D60150 llvm-svn: 357760	2019-04-05 13:32:17 +00:00
Piotr Sobczak	0376ac1d94	[SelectionDAG] Compute known bits of CopyFromReg Summary: Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if the virtual reg used has one def only. This can be particularly useful when calling isBaseWithConstantOffset() with the ISD::CopyFromReg argument, as more optimizations may get enabled in the result. Also add a missing truncation on X86, found by testing of this patch. Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa Reviewers: bogner, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59535 llvm-svn: 357745	2019-04-05 07:44:09 +00:00
Serguei Katkov	c39636cc2c	[FastISel] Fix crash for gc.relocate lowring Lowering safepoint checks that all gc.relocaes observed in safepoint must be lowered. However Fast-Isel is able to skip dead gc.relocate. To resolve this issue we just ignore dead gc.relocate in the check. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60184 llvm-svn: 357742	2019-04-05 05:41:08 +00:00
Eric Christopher	798e83b5d6	NFC: Move API uses of MD5::MD5Result to Optional rather than a pointer. Differential Revision: https://reviews.llvm.org/D60290 llvm-svn: 357736	2019-04-04 23:34:38 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
Serguei Katkov	fb44846e37	[FastISel] Fix the crash in gc.result lowering The Fast ISel has a fallback to SelectionDAGISel in case it cannot handle the instruction. This works as follows: Using reverse order, try to select instruction using Fast ISel, if it cannot handle instruction it fallbacks to SelectionDAGISel for these instructions if it is a call and continue fast instruction selections. However if unhandled instruction is not a call or statepoint related instruction it fallbacks to SelectionDAGISel for all remaining instructions in basic block. However gc.result instruction is missed and as a result it is possible that gc.result is processed earlier than statepoint causing breakage invariant the gc.results should be handled after statepoint. Test is updated because in the current form fast-isel cannot handle ret instruction (due to i1 ret type without explicit ext) and as a result test does not check fast-isel at all. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D60182 llvm-svn: 357672	2019-04-04 04:19:56 +00:00
Evandro Menezes	7c711ccf36	[IR] Create new method in `Function` class (NFC) Create method `optForNone()` testing for the function level equivalent of `-O0` and refactor appropriately. Differential revision: https://reviews.llvm.org/D59852 llvm-svn: 357638	2019-04-03 21:27:03 +00:00
Jessica Paquette	e794121cd0	[AArch64][GlobalISel] Legalize G_FEXP2 Same as G_EXP. Add a test, and update legalizer-info-validation.mir and f16-instructions.ll. Differential Revision: https://reviews.llvm.org/D60165 llvm-svn: 357605	2019-04-03 16:58:32 +00:00
Simon Pilgrim	8d248dbd77	[DAGCombiner] Rename variables Demanded -> DemandedBits/DemandedElts. NFCI. Use consistent variable names down the SimplifyDemanded* call stack so debugging isn't such a annoyance. llvm-svn: 357602	2019-04-03 16:00:59 +00:00
Sanjay Patel	00dae6b22d	[DAGCombiner] loosen restrictions for moving shuffles after vector binop There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580	2019-04-03 13:42:06 +00:00
Simon Pilgrim	02599de2e1	[DAGCombine] Don't use getZExtValue() until we know the constant is in range. Noticed during prep for a patch for PR40758. llvm-svn: 357571	2019-04-03 11:00:55 +00:00
Hans Wennborg	94b867dc7c	Revert r357256 "[DAGCombine] Improve Lifetime node chains." As it caused a pathological compile-time regressionin V8, see PR41352. > Improve both start and end lifetime nodes chain dependencies. > > Reviewers: courbet > > Reviewed By: courbet > > Subscribers: hiraditya, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D59795 This also reverts the follow-up r357309: > [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. > > Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357563	2019-04-03 07:41:58 +00:00
Jessica Paquette	ed23352379	[GlobalISel] Add IRTranslator support for llvm.stacksave and llvm.stackrestore Also update arm64-irtranslator.ll. Differential Revision: https://reviews.llvm.org/D60140 llvm-svn: 357538	2019-04-02 22:46:31 +00:00
Sanjay Patel	7cb7daabbb	[DAGCombiner] reduce code duplication; NFC llvm-svn: 357498	2019-04-02 17:20:54 +00:00
Sander de Smalen	7f23e0a62f	Enforce StackID definition in PEI There are various places in LLVM where the definition of StackID is not properly honoured, for example in PEI where objects with a StackID > 0 are allocated on the default stack (StackID0). This patch enforces that PEI only considers allocating objects to StackID 0. Reviewers: arsenm, thegameg, MatzeB Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60062 llvm-svn: 357460	2019-04-02 09:46:52 +00:00
Nick Lewycky	c0ebfbe3f3	Add an optional list of blocks to avoid when looking for a path in isPotentiallyReachable. The leads to some ambiguous overloads, so update three callers. Differential Revision: https://reviews.llvm.org/D60085 llvm-svn: 357447	2019-04-02 01:05:48 +00:00
Alex Bradbury	da20f5ca74	[RISCV] Generate address sequences suitable for mcmodel=medium This patch adds an implementation of a PC-relative addressing sequence to be used when -mcmodel=medium is specified. With absolute addressing, a 'medium' codemodel may cause addresses to be out of range. This is because while 'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as opposed to 'small', which implies the first 2 GiB only. Note that LLVM/Clang currently specifies code models differently to GCC, where small and medium imply the same functionality as GCC's medlow and medany respectively. Differential Revision: https://reviews.llvm.org/D54143 Patch by Lewis Revill. llvm-svn: 357393	2019-04-01 14:42:56 +00:00
Nirav Dave	54f7118de5	[DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357309	2019-03-29 20:26:23 +00:00
Nirav Dave	7e84cacdbd	[DAG] Avoid redundancy in StoreMerge TokenFactor generation. Avoid generating redundant TokenFactor when all merged stores have the same chain. llvm-svn: 357299	2019-03-29 18:50:22 +00:00
Nirav Dave	fe59e14031	[DAGCombine] Prune unnused nodes. Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283	2019-03-29 17:35:56 +00:00
Evandro Menezes	0f797b8732	[CodeGen] Refactor the option for the maximum jump table size Refactor the option `max-jump-table-size` to default to the maximum representable number. Essentially, NFC. llvm-svn: 357280	2019-03-29 17:28:11 +00:00
Nirav Dave	610036c506	[DAG] Set up infrastructure to avoid smart constructor-based dangling nodes Summary: Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations without fully pruning unused result values. This results in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter for the combiner to make sure such nodes have the chance of being pruned. This allows a number of additional peephole optimizations. Reviewers: efriedma, RKSimon, craig.topper, jyknight Reviewed By: jyknight Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58068 llvm-svn: 357279	2019-03-29 17:26:40 +00:00
Sanjay Patel	12685d0f7c	[DAGCombiner] simplify shuffle of shuffle After investigating the examples from D59777 targeting an SSE4.1 machine, it looks like a very different problem due to how we map illegal types (256-bit in these cases). We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand. We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that generality means it is limited to patterns with a one-use constraint, and the examples here have 2 uses. We don't need any uses or legality limitations for a simplification (no new value is created). It looks like we miss this pattern in IR too. In one of the zext examples here, we have shuffle masks like this: Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7> Shuf = vector_shuffle<4,u,6,7,u,u,u,u> ...so that's moving the high half of the 1st vector into the low half. But the high half of the 1st vector is already identical to the low half. Differential Revision: https://reviews.llvm.org/D59961 llvm-svn: 357258	2019-03-29 14:20:38 +00:00
Nirav Dave	9259de217e	[DAGCombine] Improve Lifetime node chains. Improve both start and end lifetime nodes chain dependencies. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59795 llvm-svn: 357256	2019-03-29 14:09:47 +00:00
Sanjay Patel	665a385035	[DAGCombiner] fold sext into decrement This is a sibling to rL357178 that I noticed we'd hit if we chose an alternate transform in D59818. %z = zext i8 %x to i32 %dec = add i32 %z, -1 %r = sext i32 %dec to i64 => %z2 = zext i8 %x to i64 %r = add i64 %z2, -1 https://rise4fun.com/Alive/kPP The x86 vector diffs show a slight regression, so there's a chance that we should limit this and the previous transform to scalars. But given that we allowed vectors before, I'm matching that behavior here. We should change both transforms together if that's the right thing to do. llvm-svn: 357254	2019-03-29 13:49:08 +00:00
Hans Wennborg	800b12f90a	Switch lowering: exploit unreachable fall-through when lowering case range cluster In the example below, we would previously emit two range checks, one for cases 1--3 and one for 4--6. This patch makes us exploit the fact that the fall-through is unreachable and only one range check is necessary. switch i32 %i, label %default [ i32 1, label %bb1 i32 2, label %bb1 i32 3, label %bb1 i32 4, label %bb2 i32 5, label %bb2 i32 6, label %bb2 ] default: unreachable llvm-svn: 357252	2019-03-29 13:40:05 +00:00
Clement Courbet	b70355f0b4	[ScheduleDAG] Move `Topo` and `addEdge` to base class. Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`. There is nothing actually specific to `ScheduleDAGMI` in `Topo`. llvm-svn: 357239	2019-03-29 08:33:05 +00:00
Craig Topper	ea626d8bdb	[SelectionDAGBuilder] Fix 80 column violation. NFC llvm-svn: 357213	2019-03-28 20:52:22 +00:00
Eli Friedman	96f295e23b	[InterleavedAccessPass] Don't increase the number of bytes loaded. Even if the interleaving transform would otherwise be legal, we shouldn't introduce an interleaved load that is wider than the original load: it might have undefined behavior. It might be possible to perform some sort of mask-narrowing transform in some cases (using a narrower interleaved load, then extending the results using shufflevectors). But I haven't tried to implement that, at least for now. Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 . Differential Revision: https://reviews.llvm.org/D59954 llvm-svn: 357212	2019-03-28 20:44:50 +00:00
Nirav Dave	8b9c9822a1	[DAG] Fix Lifetime Node ID hashing. llvm-svn: 357179	2019-03-28 15:53:01 +00:00
Sanjay Patel	ffa8d3def7	[DAGCombiner] fold sext into negation As noted in D59818: %z = zext i8 %x to i32 %neg = sub i32 0, %z %r = sext i32 %neg to i64 => %z2 = zext i8 %x to i64 %r = sub i64 0, %z2 https://rise4fun.com/Alive/KzSR llvm-svn: 357178	2019-03-28 15:46:02 +00:00
Simon Pilgrim	38a0616c1d	[DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161	2019-03-28 11:34:21 +00:00
Nirav Dave	6b741a8038	[DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations. Reviewers: courbet, jyknight Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59897 llvm-svn: 357121	2019-03-27 20:37:08 +00:00
Justin Bogner	b1650f0da9	[LegalizeVectorTypes] Allow single loads and stores for more short vectors When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable. (See https://reviews.llvm.org/rL236528 for reference.) This applies that behaviour to vector types. If the vector type is TypePromoteInteger, the element type is going to be TypePromoteInteger as well, which will lead to have a single promoting load rather than N individual promoting loads. For instance, if we have a v3i1, we would now have a load of v4i1 instead of 3 loads of i1. Patch by Guillaume Marques. Thanks! Differential Revision: https://reviews.llvm.org/D56201 llvm-svn: 357120	2019-03-27 20:35:56 +00:00
Nirav Dave	c6dfaa0e83	Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes." This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116	2019-03-27 19:54:41 +00:00
Teresa Johnson	b7e213808c	[CGP] Reset DT when optimizing select instructions Summary: A recent fix (r355751) caused a compile time regression because setting the ModifiedDT flag in optimizeSelectInst means that each time a select instruction is optimized the function walk in runOnFunction stops and restarts again (which was needed to build a new DT before we started building it lazily in r356937). Now that the DT is built lazily, a simple fix is to just reset the DT at this point, rather than restarting the whole function walk. In the future other places that set ModifiedDT may want to switch to just resetting the DT directly. But that will require an evaluation to ensure that they don't otherwise need to restart the function walk. Reviewers: spatel Subscribers: jdoerfert, llvm-commits, xur Tags: #llvm Differential Revision: https://reviews.llvm.org/D59889 llvm-svn: 357111	2019-03-27 18:44:25 +00:00
Nikita Popov	6d855ea024	[ConstantRange] Rename isWrappedSet() to isUpperWrapped() Split out from D59749. The current implementation of isWrappedSet() doesn't do what it says on the tin, and treats ranges like [X, Max] as wrapping, because they are represented as [X, 0) when using half-inclusive ranges. This also makes it inconsistent with the semantics of isSignWrappedSet(). This patch renames isWrappedSet() to isUpperWrapped(), in preparation for the introduction of a new isWrappedSet() method with corrected behavior. llvm-svn: 357107	2019-03-27 18:19:33 +00:00
Matt Arsenault	2e9ddcc30e	RegPressure: Fix crash on blocks with only dbg_value If there were only dbg_values in the block, recede would hit the beginning of the block and try to use thet dbg_value as a real instruction. llvm-svn: 357105	2019-03-27 18:14:02 +00:00
Amara Emerson	381188f1f3	[GlobalISel] Fix legalizer artifact combiner from crashing with invalid dead instructions. The artifact combiners push instructions which have been marked for deletion onto an list for the legalizer to deal with on return. However, for trunc(ext) combines the combiner routine recursively calls itself. When it does this the dead instructions list may not be empty, and the other combiners don't expect to be dealing with essentially invalid MIR (multiple vreg defs etc). This change fixes it by ensuring that the dead instructions are processed on entry into tryCombineInstruction. As a result, this fix exposed a few places in tests where G_TRUNC instructions were not being deleted even though they were dead. Differential Revision: https://reviews.llvm.org/D59892 llvm-svn: 357101	2019-03-27 17:47:42 +00:00
Quentin Colombet	89daf49e5c	[PeepholeOpt] Don't stop simplifying copies on sequence of subregs This patch removes an overly conservative check that would prevent simplifying copies when the value we were tracking would go through several subregister indices. Indeed, the intend of this check was to not track values whenever we have to compose subregister, but actually what the check was doing was bailing anytime we see a second subreg, even if that second subreg would actually be the new source of truth (as opposed to a part of that subreg). Differential Revision: https://reviews.llvm.org/D59891 llvm-svn: 357095	2019-03-27 17:27:56 +00:00
Matt Arsenault	b19361243b	PEI: Delay checking requiresFrameIndexReplacementScavenging Currently this is called before the frame size is set on the function. For AMDGPU, the scavenger is used for large frames where part of the offset needs to be materialized in a register, so estimating the frame size is useful for knowing whether the scavenger is useful. llvm-svn: 357087	2019-03-27 16:37:31 +00:00
Matt Arsenault	733b8571b4	MIR: Freeze reserved regs after parsing everything The AMDGPU implementation of getReservedRegs depends on MachineFunctionInfo fields that are parsed from the YAML section. This was reserving the wrong register since it was setting the reserved regs before parsing the correct one. Some tests were relying on the default reserved set for the assumed default calling convention. llvm-svn: 357083	2019-03-27 16:12:26 +00:00
Nirav Dave	b5630a2ab1	[DAGCombiner] Unify Lifetime and memory Op aliasing. Rework BaseIndexOffset and isAlias to fully work with lifetime nodes and fold in lifetime alias analysis. This is mostly NFC. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59794 llvm-svn: 357070	2019-03-27 14:14:46 +00:00
Nirav Dave	96a264e053	[DAGCombine] Refactor GatherAllAliases. NFCI. llvm-svn: 357069	2019-03-27 14:14:35 +00:00
Hans Wennborg	5c0d7a24e8	Re-commit r355490 "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default" Original commit by Ayonam Ray. This commit adds a regression test for the issue discovered in the previous commit: that the range check for the jump table can only be omitted if the fall-through destination of the jump table is unreachable, which isn't necessarily true just because the default of the switch is unreachable. This addresses the missing optimization in PR41242. > During the lowering of a switch that would result in the generation of a > jump table, a range check is performed before indexing into the jump > table, for the switch value being outside the jump table range and a > conditional branch is inserted to jump to the default block. In case the > default block is unreachable, this conditional jump can be omitted. This > patch implements omitting this conditional branch for unreachable > defaults. > > Differential Revision: https://reviews.llvm.org/D52002 > Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 357067	2019-03-27 14:10:11 +00:00
Jonas Paulsson	38342a5185	[DAGCombiner] Don't allow addcarry if the carry producer is illegal. getAsCarry() checks that the input argument is a carry-producing node before allowing a transformation to addcarry. This patch adds a check to make sure that the carry-producing node is legal. If it is not, it may not remain in a form that is manageable by the target backend. The test case caused a compilation failure during instruction selection for this reason on SystemZ. Patch by Ulrich Weigand. Review: Sanjay Patel https://reviews.llvm.org/D59822 llvm-svn: 357052	2019-03-27 08:41:46 +00:00
Francis Visoiu Mistrih	ee1a6e70fa	[Remarks] Emit a section containing remark diagnostics metadata A section containing metadata on remark diagnostics will be emitted if the flag (-mllvm) -remarks-section is present. For now, the metadata is: * a magic number for remarks: "REMARKS\0" * the version number: a little-endian uint64_t * the absolute file path to the serialized remark diagnostics: a null-terminated string. Differential Revision: https://reviews.llvm.org/D59571 llvm-svn: 357043	2019-03-27 01:13:59 +00:00
Quentin Colombet	c74271c537	[LiveRange] Reset the VNIs when splitting subranges When splitting a subrange we end up with two different subranges covering two different, non overlapping, lanes. As part of this splitting the VNIs of the original live-range need to be dispatched to the subranges according to which lanes they are actually defining. Prior to this patch we were assuming that all values were defining all lanes. This was wrong as demonstrated by llvm.org/PR40835. Differential Revision: https://reviews.llvm.org/D59731 llvm-svn: 357032	2019-03-26 21:27:15 +00:00
Sanjay Patel	bb5cba3cca	[SDAG] add simplifications for FP at node creation time We have the folds for fadd/fsub/fmul already in DAGCombiner, so it may be possible to remove that code if we can guarantee that these ops are zapped before they can exist. llvm-svn: 357029	2019-03-26 20:54:15 +00:00
Ali Tamur	02e96648d7	Revert "[llvm] Reapply "Prevent duplicate files in debug line header in dwarf 5."" This reverts commit rL357020. The commit broke the test llvm/test/tools/llvm-objdump/embedded-source.test on some builds including clang-ppc64be-linux-multistage, clang-s390x-linux, clang-with-lto-ubuntu, clang-x64-windows-msvc, llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast (and others). llvm-svn: 357026	2019-03-26 20:05:27 +00:00
Ali Tamur	2f5cd03a3f	[llvm] Reapply "Prevent duplicate files in debug line header in dwarf 5." Reapply rL356941 after regenerating the object file in the failing test llvm/test/tools/llvm-objdump/embedded-source.test from source. Original commit message: [llvm] Prevent duplicate files in debug line header in dwarf 5. Motivation: In previous dwarf versions, file name indexes started from 1, and the primary source file was not explicit. Dwarf 5 standard (6.2.4) prescribes the primary source file to be explicitly given an entry with an index number 0. The current implementation honors the specification by just duplicating the main source file, once with index number 0, and later maybe with another index number. While this is compliant with the letter of the standard, the duplication causes problems for consumers of this information such as lldb. (Some files are duplicated, where only some of them have a line table although all refer to the same file) With this change, dwarf 5 debug line section files always start from 0, and the zeroth entry is not duplicated whenever possible. This requires different handling of dwarf 4 and dwarf 5 during generation (e.g. when a function returns an index zero for a file name, it signals an error in dwarf 4, but not in dwarf 5) However, I think the minor complication is worth it, because it enables all consumers (lldb, gdb, dwarfdump, objdump, and so on) to treat all files in the file name list homogenously. Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D59515 llvm-svn: 357018	2019-03-26 18:53:23 +00:00
Nirav Dave	a28c514581	[DAG] Avoid smart constructor-based dangling nodes. Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations or not fully pruning unused result values. This can result in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter as the current node deleter to make sure such nodes have the chance of being pruned. Many minor changes, mostly positive. llvm-svn: 356996	2019-03-26 15:08:14 +00:00
Simon Pilgrim	e24441aab0	[TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT This helps us relax the extension of a lot of scalar elements before they are inserted into a vector. Its exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions. Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes. Differential Revision: https://reviews.llvm.org/D59484 llvm-svn: 356989	2019-03-26 12:32:01 +00:00
Yi Kong	74b874ac4c	Fix nondeterminism introduced in r353954 DenseMap iteration order is not guaranteed, use MapVector instead. Fix provided by srhines. Differential Revision: https://reviews.llvm.org/D59807 llvm-svn: 356988	2019-03-26 12:18:08 +00:00
Ali Tamur	fdce82a814	Revert "[llvm] Prevent duplicate files in debug line header in dwarf 5." This reverts commit `312ab05887`. My commit broke the build; I will revert and find out what happened. llvm-svn: 356951	2019-03-25 21:09:07 +00:00
Ali Tamur	312ab05887	[llvm] Prevent duplicate files in debug line header in dwarf 5. Summary: Motivation: In previous dwarf versions, file name indexes started from 1, and the primary source file was not explicit. Dwarf 5 standard (6.2.4) prescribes the primary source file to be explicitly given an entry with an index number 0. The current implementation honors the specification by just duplicating the main source file, once with index number 0, and later maybe with another index number. While this is compliant with the letter of the standard, the duplication causes problems for consumers of this information such as lldb. (Some files are duplicated, where only some of them have a line table although all refer to the same file) With this change, dwarf 5 debug line section files always start from 0, and the zeroth entry is not duplicated whenever possible. This requires different handling of dwarf 4 and dwarf 5 during generation (e.g. when a function returns an index zero for a file name, it signals an error in dwarf 4, but not in dwarf 5) However, I think the minor complication is worth it, because it enables all consumers (lldb, gdb, dwarfdump, objdump, and so on) to treat all files in the file name list homogenously. Reviewers: dblaikie, probinson, aprantl, espindola Reviewed By: probinson Subscribers: emaste, jvesely, nhaehnle, aprantl, javed.absar, arichardson, hiraditya, MaskRay, rupprecht, jdoerfert, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D59515 llvm-svn: 356941	2019-03-25 20:08:00 +00:00
Simon Pilgrim	167af1bafb	[SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC First half of PR40800, this patch adds DAG undef handling to icmp instructions to match the behaviour in llvm::ConstantFoldCompareInstruction and SimplifyICmpInst, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.). This involved a lot of tweaking to reduced tests as bugpoint loves to reduce icmp arguments to undef........ Differential Revision: https://reviews.llvm.org/D59363 llvm-svn: 356938	2019-03-25 18:51:57 +00:00
Teresa Johnson	3bd4b5a925	[CGP] Build the DominatorTree lazily Summary: In r355512 CGP was changed to build the DominatorTree only once per function traversal, to avoid repeatedly building it each time it was accessed. This solved one compile time issue but introduced another. In the second case, we now were building the DT unnecessarily many times when we performed many function traversals (i.e. more than once per function when running CGP because of changes made each time). Change to saving the DT in the CodeGenPrepare object, and building it lazily when needed. It is reset whenever we need to rebuild it. The case that exposed the issue there are 617 functions, and we walk them (i.e. execute the "while (MadeChange)" loop in runOnFunction) a total of 12083 times (so previously we were building the DT 12083 times). With this patch we only build the DT 844 times (average of 1.37 times per function). We dropped the total time to compile this file from 538.11s without this patch to 339.63s with it. There is still an issue as CGP is taking much longer than all other passes even with this patch, and before a recent compiler release cut at r355392 the total time to this compile was only 97 sec with a huge reduction in CGP time. I suspect that one of the other recent changes to CGP led to iterating each function many more times on average, but I need to do some more investigation. Reviewers: spatel Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59696 llvm-svn: 356937	2019-03-25 18:38:48 +00:00
Matt Arsenault	b27e4974d0	MISched: Don't schedule regions with 0 instructions I think this is correct, but may not necessarily be the correct fix for the assertion I'm really trying to solve. If a scheduling region was found that only has dbg_value instructions, the RegPressure tracker would end up in an inconsistent state because it would skip over any debug instructions and point to an instruction outside of the scheduling region. It may still be possible for this to happen if there are some real schedulable instructions between dbg_values, but I haven't managed to break this. The testcase is extremely sensitive and I'm not sure how to make it more resistent to future scheduler changes that would avoid stressing this situation. llvm-svn: 356926	2019-03-25 17:15:44 +00:00
Craig Topper	07e3071854	[LegalizeDAG] Expand i16 bswap directly to a rotate by 8 instead of relying on DAG combine. An i16 bswap can be implemented with an i16 rotate by 8. We previously emitted a shift and OR sequence that DAG combine should be able to turn back into rotate. But we might as well go there directly. If rotate isn't legal, LegalizeDAG should further legalize it to either the opposite rotate, or the shift and OR pattern. I don't know of any way to get the existing DAG combine reliance to fail. So I don't know any way to add new tests for this that wouldn't have worked previously. llvm-svn: 356860	2019-03-24 17:02:14 +00:00
Teresa Johnson	4dc851964c	[CGP] Make several static functions member functions (NFC) This is extracted from D59696 as suggested in the review. It is preparation for making the DominatorTree a member variable. llvm-svn: 356857	2019-03-24 15:18:50 +00:00
Simon Pilgrim	94e8f152c1	[TargetLowering] SimplifyDemandedBits trunc(srl(x, C1)) - early out for out of range C1. NFCI. llvm-svn: 356810	2019-03-22 20:53:49 +00:00
Matt Arsenault	b34afa311d	GlobalISel: Fix RegBankSelect for REG_SEQUENCE The AArch64 test was broken since the result register already had a set register class, so this test was a no-op. The mapping verify call would fail because the result size is not the same as the inputs like in a copy or phi. The AMDGPU testcases are half broken and introduce illegal VGPR->SGPR copies which need much more work to handle correctly (same for phis), but add them as a baseline. llvm-svn: 356713	2019-03-21 20:45:36 +00:00
Craig Topper	9f0b17a248	[ScalarizeMaskedMemIntrin] Add support for scalarizing expandload and compressstore intrinsics. This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle. I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway. Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that. Differential Revision: https://reviews.llvm.org/D59180 llvm-svn: 356687	2019-03-21 17:38:52 +00:00
Florian Hahn	71033f2987	[DAGCombiner] Use getTokenFactor in a few more cases. SDNodes can only have 64k operands and for some inputs (e.g. large number of stores), we can reach this limit when creating TokenFactor nodes. This patch is a follow up to D56740 and updates a few more places that potentially can create TokenFactors with too many operands. Reviewers: efriedma, craig.topper, aemerson, RKSimon Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D59156 llvm-svn: 356668	2019-03-21 14:32:09 +00:00
Simon Pilgrim	da4992bf8d	[DAGCombine] SimplifySelectCC - call FoldSetCC with the setcc result type We were calling FoldSetCC with the compare operand type instead of the result type. Found by OSS-Fuzz #13838 (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13838) llvm-svn: 356667	2019-03-21 14:07:18 +00:00
Sanjay Patel	d47eac59ef	[CodeGenPrepare] limit formation of overflow intrinsics (PR41129) This is probably a bigger limitation than necessary, but since we don't have any evidence yet that this transform led to real-world perf improvements rather than regressions, I'm making a quick, blunt fix. In the motivating x86 example from: https://bugs.llvm.org/show_bug.cgi?id=41129 ...and shown in the regression test, we want to avoid an extra instruction in the dominating block because that could be costly. The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version is any better than the other yet. Differential Revision: https://reviews.llvm.org/D59602 llvm-svn: 356665	2019-03-21 13:57:07 +00:00
Simon Pilgrim	54ed653870	[SelectionDAG] Add scalarization of ABS node (PR41149) Patch by: @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D59577 llvm-svn: 356656	2019-03-21 11:18:54 +00:00
Craig Topper	8de7bc0bff	[ScalarizeMaskedMemIntrinsics] Reverse some if conditions to reduce indentations to remove curly braces. Pre-commit for D59180 llvm-svn: 356646	2019-03-21 05:54:37 +00:00
Stanislav Mekhanoshin	0a11829ab2	Allow machine dce to remove uses in the same instruction Machine DCE cannot remove a dead definition if there are non-dbg uses. A use however can be in the same instruction: dead %0 = INST %0 Such instructions sometimes created by Detect dead lanes pass. Allow this instruction to be deleted despite the use if the only use belongs to the same instruction. Differential Revision: https://reviews.llvm.org/D59565 llvm-svn: 356619	2019-03-20 21:42:05 +00:00
Sanjay Patel	a2250e923b	[CGP] fix formatting; NFC llvm-svn: 356572	2019-03-20 16:47:53 +00:00
Sanjay Patel	d1ce455f7b	[CGP] convert chain of 'if' to 'switch'; NFC This should be extended, but CGP does some strange things, so I'm intentionally not changing the potential order of any transforms yet. llvm-svn: 356566	2019-03-20 15:53:06 +00:00
Simon Pilgrim	51f65171e9	Remove out of date comment. NFCI. DAGCombiner::convertBuildVecZextToZext just requires the extractions to be sequential, they don't have to start from 0'th index. llvm-svn: 356552	2019-03-20 12:24:15 +00:00
Clement Courbet	238af52ded	[ExpandMemCmp] Trigger on bcmp too. Summary: Fixes 41150. Reviewers: gchatelet Subscribers: hiraditya, llvm-commits, ckennelly, sbenza, jyknight Tags: #llvm Differential Revision: https://reviews.llvm.org/D59593 llvm-svn: 356550	2019-03-20 11:51:11 +00:00
Florian Hahn	1663c9466f	[DwarfDebug] Skip entries to big for 16 bit size field in Dwarf < 5. Nothing prevents entries from being bigger than the 16 bit size field in Dwarf < 5. For entries that are too big, just emit an empty entry instead of crashing. This fixes PR41038. Reviewers: probinson, aprantl, davide Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D59518 llvm-svn: 356514	2019-03-19 20:37:06 +00:00
Matt Arsenault	cf55a657f0	CodeGen: Refactor regallocator command line and target selection This will allow targets more flexibility to replace the register allocator core passes. In a future commit, AMDGPU will run the core register assignment passes twice, and will also want to disallow using the standard -regalloc option. llvm-svn: 356506	2019-03-19 19:33:12 +00:00
Matt Arsenault	3c98cdd218	RegAllocFast: Do not allocate registers for undef uses Do not actually allocate a register for an undef use. Previously we we would create unnecessary reload instruction for undef uses where the register wasn't live. Patch by Matthias Braun llvm-svn: 356501	2019-03-19 19:16:04 +00:00
Matt Arsenault	c2e35a6f32	RegAllocFast: Remove early selection loop, the spill calculation will report cost 0 anyway for free regs The 2nd loop calculates spill costs but reports free registers as cost 0 anyway, so there is little benefit from having a separate early loop. Surprisingly this is not NFC, as many register are marked regDisabled so the first loop often picks up later registers unnecessarily instead of the first one available in the allocation order... Patch by Matthias Braun llvm-svn: 356499	2019-03-19 19:01:34 +00:00
Simon Pilgrim	77482120da	Fix for ABS legalization on PPC buildbot. llvm-svn: 356498	2019-03-19 18:55:46 +00:00
Philip Reames	db65a5b776	Allow unordered loads to be considered invariant in CodeGen The actual code change is fairly straight forward, but exercising it isn't. First, it turned out we weren't adding the appropriate flags in SelectionDAG. Second, it turned out that we've got some optimization gaps, so obvious test cases don't work. My first attempt (in atomic-unordered.ll) points out a deficiency in our peephole-opt folding logic which I plan to fix separately. Instead, I'm exercising this through MachineLICM. Differential Revision: https://reviews.llvm.org/D59375 llvm-svn: 356494	2019-03-19 18:27:18 +00:00
Philip Reames	2153c4b828	[AtomicExpand] Fix a crash bug when lowering unordered loads to cmpxchg Add tests for wider atomic loads and stores. In the process, fix a crasher where we appearently handled unorder stores, but not loads, when lowering to cmpxchg idioms. llvm-svn: 356482	2019-03-19 17:20:49 +00:00
Justin Bogner	b353d6887e	[DAGCombine] Fix a miscompile when reducing BUILD_VECTORs to a shuffle In r311255 we added a case where we split vectors whose elements are all derived from the same input vector so that we could shuffle it more efficiently. In doing so, createBuildVecShuffle was taught to adjust for the fact that all indices would be based off of the first vector when this happens, but it's possible for the code that checked that to fire incorrectly if we happen to have a BUILD_VECTOR of extracts from subvectors and don't hit this new optimization. Instead of trying to detect if we've split the vector by checking if we have extracts from the same base vector, we can just pass that information into createBuildVecShuffle, avoiding the miscompile. Differential Revision: https://reviews.llvm.org/D59507 llvm-svn: 356476	2019-03-19 16:52:00 +00:00
Simon Pilgrim	a56f2822d0	[SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect These changes are related to PR37743 and include: SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node. Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner. Add promoting the integer ABS node in the LegalizeIntegerType. Expand-based legalization of integer result for the ABS nodes. Expand-based legalization of ABS vector operations. Add some integer abs testcases for different typesizes for Thumb arch Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to: tmp = (SRA, Hi, 31) Lo = (UADDO tmp, Lo) Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1)) Lo = (XOR tmp, Lo) The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern: (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))). Change integer abs testcases for codegen with the ABS node support for AArch64. Indicate that the ABS is legal for the i64 type when the NEON is supported. Change the integer abs testcases to show changing of codegen. Add combine and legalization of ABS nodes for Thumb arch. Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition. For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743 Patch by: @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D49837 llvm-svn: 356468	2019-03-19 16:24:55 +00:00
Markus Lavin	b86ce219f4	[DebugInfo] Introduce DW_OP_LLVM_convert Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows for a convenient way to perform type conversions on the Dwarf expression stack. As an additional bonus it paves the way for using other Dwarf v5 ops that need to reference a base_type. The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp to perform sext/zext on debug values but mainly the patch is about preparing terrain for adding other Dwarf v5 ops that need to reference a base_type. For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a complex shift & mask pattern is generated to emulate sext/zext. This is a recommit of r356442 with trivial fixes for the failing tests. Differential Revision: https://reviews.llvm.org/D56587 llvm-svn: 356451	2019-03-19 13:16:28 +00:00
Markus Lavin	ad78768d59	Revert "[DebugInfo] Introduce DW_OP_LLVM_convert" This reverts commit 1cf4b593a7ebd666fc6775f3bd38196e8e65fafe. Build bots found failing tests not detected locally. Failing Tests (3): LLVM :: DebugInfo/Generic/convert-debugloc.ll LLVM :: DebugInfo/Generic/convert-inlined.ll LLVM :: DebugInfo/Generic/convert-linked.ll llvm-svn: 356444	2019-03-19 09:17:28 +00:00
Markus Lavin	cd8a940b37	[DebugInfo] Introduce DW_OP_LLVM_convert Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows for a convenient way to perform type conversions on the Dwarf expression stack. As an additional bonus it paves the way for using other Dwarf v5 ops that need to reference a base_type. The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp to perform sext/zext on debug values but mainly the patch is about preparing terrain for adding other Dwarf v5 ops that need to reference a base_type. For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a complex shift & mask pattern is generated to emulate sext/zext. Differential Revision: https://reviews.llvm.org/D56587 llvm-svn: 356442	2019-03-19 08:48:19 +00:00
Amara Emerson	a140276a1e	[GlobalISel] Include missing change from r356396 Forgot to add a change to relax some asserts in r356396. llvm-svn: 356411	2019-03-18 21:29:21 +00:00
Amara Emerson	8627178d46	Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy() After review comments, it was preferred to not teach MachineIRBuilder about non-generic instructions beyond using buildInstr(). For AArch64 I've changed the buildCopy() calls to buildInstr() + a separate addReg() call. This also relaxes the MachineIRBuilder's COPY checking more because it may not always have a SrcOp given to it. llvm-svn: 356396	2019-03-18 19:20:10 +00:00
Adhemerval Zanella	664c1ef528	[TargetLowering] Add code size information on isFPImmLegal. NFC This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389	2019-03-18 18:40:07 +00:00
Nirav Dave	55c921f4bf	[DAG] Cleanup unused node in SimplifySelectCC. Delete temporarily constructed node uses for analysis after it's use, holding onto original input nodes. Ideally this would be rewritten without making nodes, but this appears relatively complex. Reviewers: spatel, RKSimon, craig.topper Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57921 llvm-svn: 356382	2019-03-18 17:02:38 +00:00
David Stenberg	8a2e4af7e7	[DebugInfo] Ignore bitcasts when lowering stack arg dbg.values Summary: Look past bitcasts when looking for parameter debug values that are described by frame-index loads in `EmitFuncArgumentDbgValue()`. In the attached test case we would be left with an undef `DBG_VALUE` for the parameter without this patch. A similar fix was done for parameters passed in registers in D13005. This fixes PR40777. Reviewers: aprantl, vsk, jmorse Reviewed By: aprantl Subscribers: bjope, javed.absar, jdoerfert, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D58831 llvm-svn: 356363	2019-03-18 11:27:32 +00:00
Tim Renouf	c4e128e221	[CodeGen] Defined MVTs v3i32, v3f32, v5i32, v5f32 AMDGPU would like to use these MVTs. Differential Revision: https://reviews.llvm.org/D58901 Change-Id: I6125fea810d7cc62a4b4de3d9904255a1233ae4e llvm-svn: 356351	2019-03-17 22:56:38 +00:00
Tim Renouf	c302b9b5fe	[CodeGen] Prepare for introduction of v3 and v5 MVTs AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Exclude non-legal non-power-of-2 vector types from ComputeRegisterProp mechanism in TargetLoweringBase::getTypeConversion. * Cope with SETCC and VSELECT for odd-width i1 vector when the other vectors are legal type. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58899 Change-Id: Ib5f23377dbef511be3a936211a0b9f94e46331f8 llvm-svn: 356350	2019-03-17 21:43:12 +00:00
Matt Arsenault	884a18d792	RegAllocFast: Add hint to debug printing llvm-svn: 356348	2019-03-17 21:31:40 +00:00
Nikita Popov	9a4453592b	[DAGCombine] Fold (x & ~y) \| y patterns Fold (x & ~y) \| y and it's four commuted variants to x \| y. This pattern can in particular appear when a vselect c, x, -1 is expanded to (x & ~c) \| (-1 & c) and combined to (x & ~c) \| c. This change has some overlap with D59066, which avoids creating a vselect of this form in the first place during uaddsat expansion. Differential Revision: https://reviews.llvm.org/D59174 llvm-svn: 356333	2019-03-17 15:45:38 +00:00
Sanjay Patel	6a6e808b69	[TargetLowering] improve the default expansion of uaddsat/usubsat This is a subset of what was proposed in: D59006 ...and may overlap with test changes from: D59174 ...but it seems like a good general optimization to turn selects into bitwise-logic when possible because we never know exactly what can happen at this stage of DAG combining depending on how the target has defined things. Differential Revision: https://reviews.llvm.org/D59066 llvm-svn: 356332	2019-03-17 14:57:40 +00:00
Simon Pilgrim	3b0a6c69ee	[DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097) rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef. llvm-svn: 356327	2019-03-16 17:36:26 +00:00
Heejin Ahn	66ce419468	[WebAssembly] Make rethrow take an except_ref type argument Summary: In the new wasm EH proposal, `rethrow` takes an `except_ref` argument. This change was missing in r352598. This patch adds `llvm.wasm.rethrow.in.catch` intrinsic. This is an intrinsic that's gonna eventually be lowered to wasm `rethrow` instruction, but this intrinsic can appear only within a catchpad or a cleanuppad scope. Also this intrinsic needs to be invokable - otherwise EH pad successor for it will not be correctly generated in clang. This also adds lowering logic for this intrinsic in `SelectionDAGBuilder::visitInvoke`. This routine is basically a specialized and simplified version of `SelectionDAGBuilder::visitTargetIntrinsic`, but we can't use it because if is only for `CallInst`s. This deletes the previous `llvm.wasm.rethrow` intrinsic and related tests, which was meant to be used within a `__cxa_rethrow` library function. Turned out this needs some more logic, so the intrinsic for this purpose will be added later. LateEHPrepare takes a result value of `catch` and inserts it into matching `rethrow` as an argument. `RETHROW_IN_CATCH` is a pseudo instruction that serves as a link between `llvm.wasm.rethrow.in.catch` and the real wasm `rethrow` instruction. To generate a `rethrow` instruction, we need an `except_ref` argument, which is generated from `catch` instruction. But `catch` instrutions are added in LateEHPrepare pass, so we use `RETHROW_IN_CATCH`, which takes no argument, until we are able to correctly lower it to `rethrow` in LateEHPrepare. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59352 llvm-svn: 356316	2019-03-16 05:38:57 +00:00
Amara Emerson	7097e83dab	[GlobalISel] Make isel verification checks of vregs run under NDEBUG only. llvm-svn: 356309	2019-03-16 01:02:10 +00:00
Amara Emerson	3739a20875	[GlobalISel] Allow MachineIRBuilder to build subregister copies. This relaxes some asserts about sizes, and adds an optional subreg parameter to buildCopy(). Also update AArch64 instruction selector to use this in places where we previously used MachineInstrBuilder manually. Differential Revision: https://reviews.llvm.org/D59434 llvm-svn: 356304	2019-03-15 21:59:50 +00:00
Simon Pilgrim	8fbe439345	[SelectionDAG] Add SimplifyDemandedBits handling for ISD::SCALAR_TO_VECTOR Fixes a lot of constant folding mismatches between i686 and x86_64 llvm-svn: 356273	2019-03-15 17:00:55 +00:00
Mikael Holmen	339daae806	[CodeGenPrepare] avoid crashing from replacing a phi twice Summary: This is a fix to bug 41052: https://bugs.llvm.org/show_bug.cgi?id=41052 While trying to optimize a memory instruction in a dead basic block, we end up registering the same phi for replacement twice. This patch avoids registering more than the first replacement candidate for a phi. Patch by: JesperAntonsson Reviewers: skatkov, aprantl Reviewed By: aprantl Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59358 llvm-svn: 356260	2019-03-15 13:51:05 +00:00
Sanjay Patel	2c9275a790	[CGP] add another bailout for degenerate code (PR41064) This is almost the same as: rL355345 ...and should prevent any potential crashing from examples like: https://bugs.llvm.org/show_bug.cgi?id=41064 ...although the bug was masked by: rL355823 ...and I'm not sure how to repro the problem after that change. llvm-svn: 356218	2019-03-14 23:14:31 +00:00
Matt Arsenault	bc6d07ca46	MIR: Allow targets to serialize MachineFunctionInfo This has been a very painful missing feature that has made producing reduced testcases difficult. In particular the various registers determined for stack access during function lowering were necessary to avoid undefined register errors in a large percentage of cases. Implement a subset of the important fields that need to be preserved for AMDGPU. Most of the changes are to support targets parsing register fields and properly reporting errors. The biggest sort-of bug remaining is for fields that can be initialized from the IR section will be overwritten by a default initialized machineFunctionInfo section. Another remaining bug is the machineFunctionInfo section is still printed even if empty. llvm-svn: 356215	2019-03-14 22:54:43 +00:00
Philip Reames	70d156991c	Allow code motion (and thus folding) for atomic (but unordered) memory operands Building on the work done in D57601, now that we can distinguish between atomic and volatile memory accesses, go ahead and allow code motion of unordered atomics. As seen in the diffs, this allows much better folding of memory operations into using instructions. (Mostly done by the PeepholeOpt pass.) Note: I have not reviewed all callers of hasOrderedMemoryRef since one of them - isSafeToMove - is very widely used. I'm relying on the documented semantics of each method to judge correctness. Differential Revision: https://reviews.llvm.org/D59345 llvm-svn: 356170	2019-03-14 17:20:59 +00:00
Adrian Prantl	e69917f166	Add IR debug info support for Elemental, Pure, and Recursive Procedures. Patch by Eric Schweitz! Differential Revision: https://reviews.llvm.org/D54043 llvm-svn: 356163	2019-03-14 16:29:54 +00:00
Matt Arsenault	133716929c	GlobalISel: Use multiple returns for intrinsic structs This is consistent with what SelectionDAG does and is much easier to work with than the extract sequence with an artificial wide register. For the AMDGPU control flow intrinsics, this was producing an s128 for the i64, i1 tuple return. Any legalization that should apply to a real s128 value would badly obscure the direct values that need to be seen. llvm-svn: 356147	2019-03-14 14:18:56 +00:00
Quentin Colombet	e77e5f44b8	[GlobalISel][Utils] Add a getConstantVRegVal variant that looks through instrs getConstantVRegVal used to only look for G_CONSTANT when looking at unboxing the value of a vreg. However, constants are sometimes not directly used and are hidden behind trunc, s\|zext or copy chain of computation. In particular this may be introduced by the legalization process that doesn't want to simplify these patterns because it can lead to infine loop when legalizing a constant. To circumvent that problem, add a new variant of getConstantVRegVal, named getConstantVRegValWithLookThrough, that allow to look through extensions. Differential Revision: https://reviews.llvm.org/D59227 llvm-svn: 356116	2019-03-14 01:37:13 +00:00
Craig Topper	66df7361ff	[ResetMachineFunctionPass] Add visited functions statistics info Adding a "NumFunctionsVisited" for collecting the visited function number. It can be used to collect function pass rate in some tests, the pass rate = (NumberVisited - NumberReset)/NumberVisited. e.g. it can be used for caculating GlobalISel pass rate in Test-Suite. Patch by Tianyang Zhu (zhutianyang) Differential Revision: https://reviews.llvm.org/D59285 llvm-svn: 356114	2019-03-14 01:13:15 +00:00
Nirav Dave	ee5183c796	[DAGCombiner] Fix Comment. NFC. llvm-svn: 356069	2019-03-13 17:44:40 +00:00
Nirav Dave	d6351340bb	[DAGCombiner] If a TokenFactor would be merged into its user, consider the user later. Summary: A number of optimizations are inhibited by single-use TokenFactors not being merged into the TokenFactor using it. This makes we consider if we can do the merge immediately. Most tests changes here are due to the change in visitation causing minor reorderings and associated reassociation of paired memory operations. CodeGen tests with non-reordering changes: X86/aligned-variadic.ll -- memory-based add folded into stored leaq value. X86/constant-combiners.ll -- Optimizes out overlap between stores. X86/pr40631_deadstore_elision -- folds constant byte store into preceding quad word constant store. Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet Reviewed By: courbet Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59260 llvm-svn: 356068	2019-03-13 17:07:09 +00:00
Clement Courbet	3bb5d0bb9b	Re-land r354244 "[DAGCombiner] Eliminate dead stores to stack." Always check candidates for hasOtherUses(), not only stores. llvm-svn: 356050	2019-03-13 13:56:23 +00:00
Simon Pilgrim	360ce82db2	[DAG] Move integer setcc %x, %x folding into FoldSetCC First step towards PR40800 - I intend to move the float case in a separate future patch. I had to tweak the (overly reduced) thumb2 test and the x86 widening test change is annoying (no longer rematerializable) but we should address this separately. Differential Revision: https://reviews.llvm.org/D59244 llvm-svn: 356040	2019-03-13 11:08:57 +00:00
Philip Reames	21a50ccf9c	[ImplicitNullChecks] Support unordered atomic accesses Update the INC pass to allow folding unordered atomics. This is the first optimization unblocked by the changes landed from D57601. llvm-svn: 356006	2019-03-13 03:25:20 +00:00
Matt Arsenault	bdfb6cfdf1	MIR: Stop reinitializing target information for every use Every time a physical register reference was parsed, this would initialize a string map for every register in in target, and discard it for the next. The same applies for the other fields initialized from target information. Follow along with how the function state is tracked, and add a new tracking class for target information. The string->register class/register bank for some reason were kept separately, so track them in the same place. llvm-svn: 355970	2019-03-12 20:42:12 +00:00
Philip Reames	18408d5e79	[CodeGen] Add MMOs to statepoint nodes during SelectionDAG The existing statepoint lowering code does something odd; it adds machine memory operands post instruction selection. This was copied from the stackmap/patchpoint implementation, but appears to be non-idiomatic. This change is largely NFC. It moves the MMO creation logic into SelectionDAG building. It ends up not quite being NFC because the size of the stack slot is reflected in the MMO. The old code blindly used pointer size for the MMO size, which appears to have always been incorrect for larger values. It just happened nothing actually relied on the MMOs, so it worked out okay. For context, I'm planning on removing the MOVolatile flag from these in a future commit, and then removing the MOStore flag from deopt spill slots in a separate one. Doing so is motivated by a small test case where we should be able to better schedule spill slots, but don't do so due to a memory use/def implied by the statepoint. Differential Revision: https://reviews.llvm.org/D59106 llvm-svn: 355953	2019-03-12 19:12:33 +00:00
Nikita Popov	149bc099f6	[SDAG] Expand pow2 mulo using shifts Expand MULO with constant power of two operand into a shift. The overflow is checked with (x << shift) >> shift == x, where the right shift will be logical for umulo and arithmetic for smulo (with exception for multiplications by signed_min). Differential Revision: https://reviews.llvm.org/D59041 llvm-svn: 355937	2019-03-12 16:57:25 +00:00
Simon Pilgrim	9f0a5ca843	[DAGCombine] Pull out repeated demanded bitmask generation. NFCI. llvm-svn: 355932	2019-03-12 15:58:28 +00:00
Tim Northover	8935aca9c7	CodeGenPrep: preserve inbounds attribute when sinking GEPs. Targets can potentially emit more efficient code if they know address computations never overflow. For example ILP32 code on AArch64 (which only has 64-bit address computation) can ignore the possibility of overflow with this extra information. llvm-svn: 355926	2019-03-12 15:22:23 +00:00
Eugene Leviant	1e249caaec	[CGP] Fix UB when GEP is bound to trivial PHINode Differential revision: https://reviews.llvm.org/D59140 llvm-svn: 355904	2019-03-12 10:10:29 +00:00
Sanjoy Das	3f5ce18658	Reland "Relax constraints for reduction vectorization" Change from original commit: move test (that uses an X86 triple) into the X86 subdirectory. Original description: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 llvm-svn: 355889	2019-03-12 01:31:44 +00:00
Nathan Lanza	cc51dc649a	Add Swift enumerator value for CodeView::SourceLanguage Summary: Swift now generates PDBs for debugging on Windows. llvm and lldb need a language enumerator value too properly handle the output emitted by swiftc. Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59231 llvm-svn: 355882	2019-03-11 23:27:59 +00:00
Sanjoy Das	2136a5bc49	Revert "Relax constraints for reduction vectorization" This reverts commit r355868. Breaks hexagon. llvm-svn: 355873	2019-03-11 22:37:31 +00:00
Evgeniy Stepanov	aedec3f684	Remove ASan asm instrumentation. Summary: It is incomplete and has no users AFAIK. Reviewers: pcc, vitalybuka Subscribers: srhines, kubamracek, mgorny, krytarowski, eraman, hiraditya, jdoerfert, #sanitizers, llvm-commits, thakis Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D59154 llvm-svn: 355870	2019-03-11 21:50:10 +00:00
Sanjoy Das	93f8cc186a	Relax constraints for reduction vectorization Summary: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 llvm-svn: 355868	2019-03-11 21:36:41 +00:00
Jessica Paquette	42d16501e6	[GlobalISel][AArch64] Always fall back on aarch64.neon.addp.* Overloaded intrinsics aren't necessarily safe for instruction selection. One such intrinsic is aarch64.neon.addp.*. This is a temporary workaround to ensure that we always fall back on that intrinsic. Eventually this will be replaced with a proper solution. https://bugs.llvm.org/show_bug.cgi?id=40968 Differential Revision: https://reviews.llvm.org/D59062 llvm-svn: 355865	2019-03-11 20:51:17 +00:00
Nikita Popov	aa7cfa75f9	[SDAG][AArch64] Legalize VECREDUCE Fixes https://bugs.llvm.org/show_bug.cgi?id=36796. Implement basic legalizations (PromoteIntRes, PromoteIntOp, ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes. There are more legalizations missing (esp float legalizations), but there's no way to test them right now, so I'm not adding them. This also includes a few more changes to make this work somewhat reasonably: * Add support for expanding VECREDUCE in SDAG. Usually experimental.vector.reduce is expanded prior to codegen, but if the target does have native vector reduce, it may of course still be necessary to expand due to legalization issues. This uses a shuffle reduction if possible, followed by a naive scalar reduction. * Allow the result type of integer VECREDUCE to be larger than the vector element type. For example we need to be able to reduce a v8i8 into an (nominally) i32 result type on AArch64. * Use the vector operand type rather than the scalar result type to determine the action, so we can control exactly which vector types are supported. Also change the legalize vector op code to handle operations that only have vector operands, but no vector results, as is the case for VECREDUCE. * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE), explicitly specify for which vector types the reductions are supported. This does not handle anything related to VECREDUCE_STRICT_*. Differential Revision: https://reviews.llvm.org/D58015 llvm-svn: 355860	2019-03-11 20:22:13 +00:00
Jonas Paulsson	8b8dc50e79	[RegAlloc] Avoid compile time regression with multiple copy hints. As a fix for https://bugs.llvm.org/show_bug.cgi?id=40986 ("excessive compile time building opencollada"), this patch makes sure that no phys reg is hinted more than once from getRegAllocationHints(). This handles the case were many virtual registers are assigned to the same physreg. The previous compile time fix (r343686) in weightCalcHelper() only made sure that physical/virtual registers are passed no more than once to addRegAllocationHint(). Review: Dimitry Andric, Quentin Colombet https://reviews.llvm.org/D59201 llvm-svn: 355854	2019-03-11 19:00:37 +00:00
Simon Pilgrim	f3be93a2ff	[DAG] FoldSetCC - reuse valuetype + ensure its simple. llvm-svn: 355847	2019-03-11 17:56:18 +00:00
Brian Gesiak	4349dc76fa	[Utils] Extract EliminateUnreachableBlocks (NFC) Summary: Extract the functionality of eliminating unreachable basic blocks within a function, previously encapsulated within the -unreachableblockelim pass, and make it available as a function within BlockUtils.h. No functional change intended other than making the logic reusable. Exposing this logic makes it easier to implement https://reviews.llvm.org/D59068, which fixes coroutines bug https://bugs.llvm.org/show_bug.cgi?id=40979. Reviewers: mkazantsev, wmi, davidxl, silvas, davide Reviewed By: davide Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59069 llvm-svn: 355846	2019-03-11 17:51:57 +00:00
Simon Pilgrim	1bb5b56485	[DAG] Move SetCC NaN handling into FoldSetCC llvm-svn: 355845	2019-03-11 17:43:10 +00:00
Simon Pilgrim	53518b45a5	[DAG] TargetLowering::SimplifySetCC - call FoldSetCC early to handle constant/commute folds. Noticed while looking at PR40800 (and also D57921) llvm-svn: 355828	2019-03-11 15:01:31 +00:00
Sam Parker	52760bf435	[CGP] Limit distance between overflow math and cmp Inserting an overflowing arithmetic intrinsic can increase register pressure by producing two values at a point where only one is needed, while the second use maybe several blocks away. This increase in pressure is likely to be more detrimental on performance than rematerialising one of the original instructions. So, check that the arithmetic and compare instructions are no further apart than their immediate successor/predecessor. Differential Revision: https://reviews.llvm.org/D59024 llvm-svn: 355823	2019-03-11 13:19:46 +00:00
Benjamin Kramer	6ff32e143a	[MIPS GlobalISel] Silence uninitialized variable warning The control flow here cannot ever use the uninitialized value, but it's too hard for the compiler to figure that out. Clang warns: llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2600:28: error: variable 'CarrySum' is used uninitialized whenever 'for' loop exits because its condition is false [-Werror,-Wsometimes-uninitialized] for (unsigned i = 2; i < Factors.size(); ++i) ^~~~~~~~~~~~~~~~~~ llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2604:26: note: uninitialized use occurs here CarrySumPrevDstIdx = CarrySum; ^~~~~~~~ llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2600:28: note: remove the condition if it is always true for (unsigned i = 2; i < Factors.size(); ++i) ^~~~~~~~~~~~~~~~~~ llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp:2583:22: note: initialize the variable 'CarrySum' to silence this warning unsigned CarrySum; ^ = 0 llvm-svn: 355818	2019-03-11 10:39:15 +00:00
Petar Avramovic	5229f47f9f	[MIPS GlobalISel] NarrowScalar G_UMULH NarrowScalar G_UMULH in LegalizerHelper using multiplyRegisters helper function. NarrowScalar G_UMULH for MIPS32. Differential Revision: https://reviews.llvm.org/D58825 llvm-svn: 355815	2019-03-11 10:08:44 +00:00
Petar Avramovic	0b17e59b5c	[MIPS GlobalISel] NarrowScalar G_MUL Narrow Scalar G_MUL for MIPS32. Revisit NarrowScalar implementation in LegalizerHelper. Introduce new helper function multiplyRegisters. It performs generic multiplication of values held in multiple registers. Generated instructions use only types NarrowTy and i1. Destination can be same or two times size of the source. Differential Revision: https://reviews.llvm.org/D58824 llvm-svn: 355814	2019-03-11 10:00:17 +00:00
Amaury Sechet	a135fd5562	Remove redundant extractBooleanFlip argument. NFC llvm-svn: 355794	2019-03-11 00:37:01 +00:00
Sanjay Patel	7d8260feb6	[CGP] fix comments; NFC llvm-svn: 355791	2019-03-10 18:42:30 +00:00
Craig Topper	1a872f2b15	Recommit r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary." Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher. Original commit message: Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts. By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up. This removes something like 40,000 bytes from the X86 isel table. Differential Revision: https://reviews.llvm.org/D58595 llvm-svn: 355784	2019-03-10 05:21:52 +00:00
Amaury Sechet	b62642a115	Refactor isBooleanFlip into extractBooleanFlip so that users do not depend on the patern matched. NFC llvm-svn: 355769	2019-03-09 02:51:52 +00:00
Craig Topper	69f8c1653d	[ScalarizeMaskedMemIntrin] Use IRBuilder functions that take uint32_t/uint64_t for getelementptr, extractelement, and insertelement. This saves needing to call getInt32 ourselves. Making the code a little shorter. The test changes are because insert/extract use getInt64 internally. Shouldn't be a functional issue. This cleanup because I plan to write similar code for expandload/compressstore. llvm-svn: 355767	2019-03-09 02:08:41 +00:00
Wei Mi	98214347c4	Rename a local variable counter to Counter. llvm-svn: 355759	2019-03-08 23:32:07 +00:00
Wei Mi	fb9693d1c9	[RegisterCoalescer][NFC] bind a DenseMap access to a reference to avoid repeated lookup operations llvm-svn: 355757	2019-03-08 23:29:46 +00:00
Craig Topper	d84f605910	[ScalarizeMaskedMemIntrin] Only set the ModifiedDT flag if new basic blocks were added. There are special cases in the scalarization for constant masks. If we hit one of the special cases we don't need to reset the iteration. Noticed while starting work on adding expandload/compressstore to this pass. llvm-svn: 355754	2019-03-08 23:03:43 +00:00
Rong Xu	ce3be45cac	[CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst r44412 fixed a huge compile time regression but it needed ModifiedDT flag to be maintained correctly in optimizations in optimizeBlock() and optimizeInst(). Function optimizeSelectInst() does not update the flag. This patch propagates the flag in optimizeSelectInst() back to optimizeBlock(). This patch also removes ModifiedDT in CodeGenPrepare class (which is not used). The property of ModifiedDT is now recorded in a ref parameter. Differential Revision: https://reviews.llvm.org/D59139 llvm-svn: 355751	2019-03-08 22:46:18 +00:00
Matt Arsenault	26e76ef0e2	DAG: Don't try to cluster loads with tied inputs This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. llvm-svn: 355728	2019-03-08 20:46:15 +00:00
Amaury Sechet	782ac933b5	[DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a) Summary: This pattern is sometime created after legalization. Reviewers: efriedma, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58874 llvm-svn: 355716	2019-03-08 19:39:32 +00:00
Wei Mi	72ec6801b5	[RegisterCoalescer] Limit the number of joins for large live interval with many valnos. Recently we found compile time out problem in several cases when SpeculativeLoadHardening was enabled. The significant compile time was spent in register coalescing pass, where register coalescer tried to join many other live intervals with some very large live intervals with many valnos. Specifically, every time JoinVals::mapValues is called, computeAssignment will be called by getNumValNums() times of the target live interval. If the large live interval has N valnos and has N copies associated with it, trying to coalescing those copies will at least cost N^2 complexity. The patch adds some limit to the effort trying to join those very large live intervals with others. By default, for live interval with > 100 valnos, and when it has been coalesced with other live interval by more than 100 times, we will stop coalescing for the live interval anymore. That put a compile time cap for the N^2 algorithm and effectively solves the compile time problem we saw. Differential revision: https://reviews.llvm.org/D59143 llvm-svn: 355714	2019-03-08 19:25:32 +00:00
Simon Pilgrim	04e8439f72	[DAGCombine] Merge visitSMULO+visitUMULO into visitMULO. NFCI. llvm-svn: 355690	2019-03-08 11:41:18 +00:00
Simon Pilgrim	c71d6d157f	[DAGCombine] Merge visitSADDO+visitUADDO into visitADDO. NFCI. llvm-svn: 355689	2019-03-08 11:30:33 +00:00
Simon Pilgrim	2c2e76a9e2	[DAGCombine] Merge visitSSUBO+visitUSUBO into visitSUBO. NFCI. llvm-svn: 355688	2019-03-08 11:16:55 +00:00
Brian Gesiak	4e467043fb	[CodeGen] Reuse BlockUtils for -unreachableblockelim pass (NFC) Summary: The logic in the -unreachableblockelim pass does the following: 1. It traverses the function it's given in depth-first order and creates a set of basic blocks that are unreachable from the function's entry node. 2. It iterates over each of those unreachable blocks and (1) removes any successors' references to the dead block, and (2) replaces any uses of instructions from the dead block with null. The logic in (2) above is identical to what the `llvm::DeleteDeadBlocks` function from `BasicBlockUtils.h` does. The only difference is that `llvm::DeleteDeadBlocks` replaces uses of instructions from dead blocks not with null, but with undef. Replace the duplicate logic in the -unreachableblockelim pass with a call to `llvm::DeleteDeadBlocks`. This results in less code but no functional change (NFC). Reviewers: mkazantsev, wmi, davidxl, silvas, davide Reviewed By: davide Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59064 llvm-svn: 355634	2019-03-07 20:40:55 +00:00
Paul Robinson	05efe0fdc4	[PS4] Emit a trap after a stack-protector fail call. llvm-svn: 355542	2019-03-06 19:57:43 +00:00
Philip Reames	9549f7560f	[AtomicExpand] Allow libcall expansion for non-zero address spaces (try 2) Restore a reverted commit, with the silly mistake fixed. Sorry for the previous breakage. Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead. Differential Revision: https://reviews.llvm.org/D58760 llvm-svn: 355540	2019-03-06 19:27:13 +00:00
Simon Pilgrim	9d6347cfc1	[DAGCombine] Improve select (not Cond), N1, N2 -> select Cond, N2, N1 fold Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well. Requested by @spatel on D59006 llvm-svn: 355533	2019-03-06 18:52:52 +00:00
Simon Pilgrim	cdf95f8f07	[DAGCombiner] Enable UADDO/USUBO vector combine support Differential Revision: https://reviews.llvm.org/D58965 llvm-svn: 355517	2019-03-06 16:11:03 +00:00
Sanjay Patel	1fefc30b08	[TargetLowering] simplify code for uaddsat/usubsat expansion; NFC We had 2 local variable names for the same type. llvm-svn: 355516	2019-03-06 16:06:27 +00:00
Alexander Kornienko	3d467a890e	Revert "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default" This reverts commit `2a0f2c5ef3` (r355490). The commit causes an assertion failure when compiling LLVM code: $ cat repro.cpp class QQQ { public: bool x() const; bool y() const; unsigned getSizeInBits() const { if (y() \|\| x()) return getScalarSizeInBits(); return getScalarSizeInBits() * 2; } unsigned getScalarSizeInBits() const; }; int f(const QQQ &Ty) { switch (Ty.getSizeInBits()) { case 1: case 8: return 0; case 16: return 1; case 32: return 2; case 64: return 3; default: __builtin_unreachable(); } } $ clang -O2 -o repro.o repro.cpp assert.h assertion failed at llvm/include/llvm/ADT/ilist_iterator.h:139 in llvm::ilist_iterator::reference llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, true, false>::operator() const [OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void>, IsReverse = true, IsConst = false]: !NodePtr->isKnownSentinel() Check failure stack trace: * @ 0x558aab4afc10 __assert_fail @ 0x558aa885479b llvm::ilist_iterator<>::operator() @ 0x558aa8854715 llvm::MachineInstrBundleIterator<>::operator() @ 0x558aa92c33c3 llvm::X86InstrInfo::optimizeCompareInstr() @ 0x558aa9a9c251 (anonymous namespace)::PeepholeOptimizer::optimizeCmpInstr() @ 0x558aa9a9b371 (anonymous namespace)::PeepholeOptimizer::runOnMachineFunction() @ 0x558aa99a4fc8 llvm::MachineFunctionPass::runOnFunction() @ 0x558aab019fc4 llvm::FPPassManager::runOnFunction() @ 0x558aab01a3a5 llvm::FPPassManager::runOnModule() @ 0x558aab01aa9b (anonymous namespace)::MPPassManager::runOnModule() @ 0x558aab01a635 llvm::legacy::PassManagerImpl::run() @ 0x558aab01afe1 llvm::legacy::PassManager::run() @ 0x558aa5914769 (anonymous namespace)::EmitAssemblyHelper::EmitAssembly() @ 0x558aa5910f44 clang::EmitBackendOutput() @ 0x558aa5906135 clang::BackendConsumer::HandleTranslationUnit() @ 0x558aa6d165ad clang::ParseAST() @ 0x558aa6a94e22 clang::ASTFrontendAction::ExecuteAction() @ 0x558aa590255d clang::CodeGenAction::ExecuteAction() @ 0x558aa6a94840 clang::FrontendAction::Execute() @ 0x558aa6a38cca clang::CompilerInstance::ExecuteAction() @ 0x558aa4e2294b clang::ExecuteCompilerInvocation() @ 0x558aa4df6200 cc1_main() @ 0x558aa4e1b37f ExecuteCC1Tool() @ 0x558aa4e1a725 main @ 0x7ff20d56abbd __libc_start_main @ 0x558aa4df51c9 _start llvm-svn: 355515	2019-03-06 15:23:50 +00:00
Teresa Johnson	b1daf0aef6	[CGP] Avoid repeatedly building DominatorTree causing long compile-time (NFC) Summary: In r354298 a DominatorTree construction was added via new function combineToUSubWithOverflow, which was subsequently restructured into replaceMathCmpWithIntrinsic in r354689. We are hitting a very long compile time due to this repeated construction, once per math cmp in the function. We shouldn't need to build the DominatorTree more than once per function, except when a transformation invalidates it. There is already a boolean flag that is returned from these methods indicating whether the DT has been modified. We can simply build the DT once per Function walk in CodeGenPrepare::runOnFunction, since any time a change is made we break out of the Function walk and restart it. I modified the code so that both replaceMathCmpWithIntrinsic as well as mergeSExts (which was also building a DT) use the DT constructed by the run method. From -mllvm -time-passes: Before this patch: CodeGen Prepare user time is 328s With this patch: CodeGen Prepare user time is 21s Reviewers: spatel Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58995 llvm-svn: 355512	2019-03-06 14:57:40 +00:00
Francis Visoiu Mistrih	6b622ebea0	Revert "[Remarks] Refactor remark diagnostic emission in a RemarkStreamer" This reverts commit 2e8c4997a2089f8228c843fd81b148d903472e02. Breaks bots. llvm-svn: 355511	2019-03-06 14:52:37 +00:00
Sanjay Patel	89e534746f	[TargetLowering] simplify code for uaddsat/usubsat expansion; NFC llvm-svn: 355508	2019-03-06 14:34:59 +00:00
Francis Visoiu Mistrih	9052f50cb4	[Remarks] Refactor remark diagnostic emission in a RemarkStreamer This allows us to store more info about where we're emitting the remarks without cluttering LLVMContext. This is needed for future support for the remark section. Differential Revision: https://reviews.llvm.org/D58996 llvm-svn: 355507	2019-03-06 14:32:08 +00:00
Simon Pilgrim	1bdc2d1874	[DAGCombiner] Add SADDO/SSUBO combine support Basic constant handling folds, for both scalars and vectors Differential Revision: https://reviews.llvm.org/D58967 llvm-svn: 355506	2019-03-06 14:22:21 +00:00
Simon Pilgrim	642f53d292	[DAGCombiner] Enable SMULO/UMULO vector combine support (PR40442) Differential Revision: https://reviews.llvm.org/D58968 llvm-svn: 355495	2019-03-06 11:04:21 +00:00
Ayonam Ray	2a0f2c5ef3	[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Differential Revision: https://reviews.llvm.org/D52002 Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 355490	2019-03-06 10:01:02 +00:00
Ayonam Ray	af92b7a3b8	Reversing the commit of revision 355483 since it is giving a regression on a newly added test. llvm-svn: 355487	2019-03-06 07:51:28 +00:00
Ayonam Ray	6025fa8e30	[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Differential Revision: https://reviews.llvm.org/D52002 Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev llvm-svn: 355483	2019-03-06 07:27:45 +00:00
Mitch Phillips	f0c21e2ff5	Revert "[AtomicExpand] Allow libcall expansion for non-zero address spaces" for buildbot failures. llvm-svn: 355461	2019-03-06 00:25:40 +00:00
Philip Reames	1e4c5d3611	[AtomicExpand] Allow libcall expansion for non-zero address spaces Be consistent about how we treat atomics in non-zero address spaces. If we get to the backend, we tend to lower them as if in address space 0. Do the same if we need to insert a libcall instead. Differential Revision: https://reviews.llvm.org/D58760 llvm-svn: 355453	2019-03-05 23:00:14 +00:00
Craig Topper	57fd733140	Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary." This caused the first matcher in the isel table for many targets to Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures. llvm-svn: 355433	2019-03-05 19:18:16 +00:00
Craig Topper	2982b846e9	[Subtarget] Merge ProcSched and ProcDesc arrays in MCSubtargetInfo into a single array. These arrays are both keyed by CPU name and go into the same tablegenerated file. Merge them so we only need to store keys once. This also removes a weird space saving quirk where we used the ProcDesc.size() to create to build an ArrayRef for ProcSched. Differential Revision: https://reviews.llvm.org/D58939 llvm-svn: 355431	2019-03-05 18:54:38 +00:00
Craig Topper	ca26808da9	[Subtarget] Create a separate SubtargetSubtargetKV struct for ProcDesc to remove fields from the stack tables that aren't needed for CPUs The description for CPUs was just the CPU name wrapped with "Select the " and " processor". We can just do that directly in the help printer instead of making a separate version in the binary for each CPU. Also remove the Value field that isn't needed and was always 0. Differential Revision: https://reviews.llvm.org/D58938 llvm-svn: 355429	2019-03-05 18:54:34 +00:00
Sanjay Patel	8b72080d4d	[SDAG] move FP constant folding to helper function; NFC llvm-svn: 355411	2019-03-05 16:42:33 +00:00
Sanjay Patel	3b2d0bc7c2	[CodeGenPrepare] avoid crashing on non-canonical/degenerate code The test is reduced from an example in the post-commit thread for: rL354746 http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190304/632396.html While we must avoid dying here, the real question should be: Why is non-canonical and/or degenerate code making it to CGP when using the new pass manager? llvm-svn: 355345	2019-03-04 22:47:13 +00:00
Craig Topper	509a8a3cf1	[DAGCombiner][X86][SystemZ][AArch64] Combine some cases of (bitcast (build_vector constants)) between legalize types and legalize dag. This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality Differential Revision: https://reviews.llvm.org/D58884 llvm-svn: 355324	2019-03-04 19:12:16 +00:00
Eugene Leviant	daea28ab64	[DebugInfo] Construct nested types on behalf of owner CU Differential revision: https://reviews.llvm.org/D58786 llvm-svn: 355303	2019-03-04 07:15:36 +00:00
Heejin Ahn	195a62e9ae	[WebAssembly] Delete ThrowUnwindDest map from WasmEHFuncInfo Summary: Before when we implemented the first EH proposal, 'catch <tag>' instruction may not catch an exception so there were multiple EH pads an exception can unwind to. That means a BB could have multiple EH pad successors. Now after we switched to the new proposal, every 'catch' instruction catches an exception, and there is only one catchpad per catchswitch, so we at most have one EH pad successor, making `ThrowUnwindDest` map in `WasmEHInfo` unnecessary. Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems, because other optimization passes can split a BB that contains possibly throwing calls (previously invokes), and we have to update the map every time that happens, which is not easy for common CodeGen passes. This also correctly updates successor info in LateEHPrepare when we add a rethrow instruction. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58486 llvm-svn: 355296	2019-03-03 22:35:56 +00:00
Simon Pilgrim	37a63a748e	Use SDValue::getConstantOperandAPInt helper where possible. NFCI. llvm-svn: 355267	2019-03-02 11:11:22 +00:00
Craig Topper	4cfc39179e	[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary. Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts. By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up. This removes something like 40,000 bytes from the X86 isel table. Differential Revision: https://reviews.llvm.org/D58595 llvm-svn: 355224	2019-03-01 20:18:38 +00:00
Thomas Lively	f3b4f99007	[WebAssembly] Remove uses of ThreadModel Summary: In the clang UI, replaces -mthread-model posix with -matomics as the source of truth on threading. In the backend, replaces -thread-model=posix with the atomics target feature, which is now collected on the WebAssemblyTargetMachine along with all other used features. These collected features will also be used to emit the target features section in the future. The default configuration for the backend is thread-model=posix and no atomics, which was previously an invalid configuration. This change makes the default valid because the thread model is ignored. A side effect of this change is that objects are never emitted with passive segments. It will instead be up to the linker to decide whether sections should be active or passive based on whether atomics are used in the final link. Reviewers: aheejin, sbc100, dschuff Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, steven_wu, dexonsmith, rupprecht, jfb, jdoerfert, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58742 llvm-svn: 355112	2019-02-28 18:39:08 +00:00
Bjorn Pettersson	d30f308a9f	Add support for computing "zext of value" in KnownBits. NFCI Summary: The description of KnownBits::zext() and KnownBits::zextOrTrunc() has confusingly been telling that the operation is equivalent to zero extending the value we're tracking. That has not been true, instead the user has been forced to explicitly set the extended bits as known zero afterwards. This patch adds a second argument to KnownBits::zext() and KnownBits::zextOrTrunc() to control if the extended bits should be considered as known zero or as unknown. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58650 llvm-svn: 355099	2019-02-28 15:45:29 +00:00
Matt Arsenault	d3093c2f1f	GlobalISel: Implement fewerElementsVector for phi llvm-svn: 355048	2019-02-28 00:16:32 +00:00
Matt Arsenault	72bcf15dbf	GlobalISel: Implement moreElementsVector for phi llvm-svn: 355047	2019-02-28 00:01:05 +00:00
Philip Reames	288a95fc8c	Seperate volatility and atomicity/ordering in SelectionDAG At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc..). This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success. To make sure this patch itself is NFC, it is build on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll).. Previously landed D57593 Fix a bug in the definition of isUnordered on MachineMemOperand D57596 [CodeGen] Be conservative about atomic accesses as for volatile D57802 Be conservative about unordered accesses for the moment rL353959: [Tests] First batch of cornercase tests for unordered atomics. rL353966: [Tests] RMW folding tests w/unordered atomic operations. rL353972: [Tests] More unordered atomic lowering tests. rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface rL354740: [Hexagon, SystemZ] Be super conservative about atomics rL354800: [Lanai] Be super conservative about atomics rL354845: [ARM] Be super conservative about atomics Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.) Differential Revision: https://reviews.llvm.org/D57601 llvm-svn: 355025	2019-02-27 20:20:08 +00:00
Eugene Leviant	7f78d4712f	[DebugInfo] Apply subprogram attributes on behalf of owner CU When using full LTO it is possible that template function definition DIE is bound to one compilation unit and it's declaration to another. We should add function declaration attributes on behalf of its owner CU otherwise we may end up with malformed file identifier in function declaration DW_AT_decl_file attribute. Differential revision: https://reviews.llvm.org/D58538 llvm-svn: 354978	2019-02-27 14:46:59 +00:00
Petar Avramovic	bd39569913	[MIPS GlobalISel] Select G_UADDO Lower G_UADDO. Legalize G_UADDO for MIPS32 Differential Revision: https://reviews.llvm.org/D58671 llvm-svn: 354900	2019-02-26 17:22:42 +00:00
Nirav Dave	582d46328c	[DAG] Fix constant store folding to handle non-byte sizes. Avoid crashes from zero-byte values due to sub-byte store sizes. Reviewers: uabelho, courbet, rnk Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58626 llvm-svn: 354884	2019-02-26 15:02:32 +00:00
Simon Pilgrim	566177c3d5	[LegalizeDAG] Use APInt::getSplat helper to create bitreverse masks. NFCI. llvm-svn: 354867	2019-02-26 11:44:23 +00:00
Simon Pilgrim	810fa04ac7	[LegalizeDAG] Expand SADDO/SSUBO using SADDSAT/SSUBSAT (PR37763) If SADDSAT/SSUBSAT are legal, then we can expand SADDO/SSUBO by performing a ADD/SUB and a SADDO/SSUBO and then compare the results. I looked at doing this for UADDO/USUBO as well but as we don't have to do as many range comparisons I didn't see any/much benefit. Differential Revision: https://reviews.llvm.org/D58637 llvm-svn: 354866	2019-02-26 11:27:53 +00:00
Aaron Smith	1d5f8632d7	[CodeView] Emit HasConstructorOrDestructor class option for non-trivial constructors Reviewers: zturner, rnk, llvm-commits, aleksandr.urakov Reviewed By: zturner, rnk Subscribers: jdoerfert, majnemer, asmith Tags: #llvm Differential Revision: https://reviews.llvm.org/D44406 llvm-svn: 354841	2019-02-26 03:23:56 +00:00
Matt Arsenault	752579736e	RegBankSelect: Handle slightly more complex value mappings Try to use concat_vectors. Also remove unnecessary assert on pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers for AMDGPU. llvm-svn: 354828	2019-02-25 22:24:13 +00:00
Matt Arsenault	0b14857415	RegisterScavenger: Allow fail without spill AMDGPU wants to use this in some contexts where the spilling is either impossible, or a worse alternative to doing something else. llvm-svn: 354816	2019-02-25 20:29:04 +00:00
Andrea Di Biagio	4a1e59a6e0	Fix a sign compare warning breaking the -Werror build. The warning was introduced at r354793. llvm-svn: 354810	2019-02-25 19:33:58 +00:00
Simon Pilgrim	80d0e9c563	[SelectionDAG] Add demanded elts variants to isConstOrConstSplat helpers. NFCI. These helpers extend the existing isConstOrConstSplat helper checks to support DemandedElts masks as well. We already had a local version of this in SelectionDAG that computeKnownBits/ComputeNumSignBits made use of, but this adds the functionality directly to the BuildVectorSDNode node and extends isConstOrConstSplat etc. to use that. This will allow us to reuse the functionality in SimplifyDemandedVectorElts/SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D58503 llvm-svn: 354797	2019-02-25 16:31:58 +00:00
Simon Pilgrim	28441ac75f	[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold Differential Revision: https://reviews.llvm.org/D58585 llvm-svn: 354793	2019-02-25 16:02:01 +00:00
Craig Topper	8c9724ea4f	[SelectionDAG] Add a OPC_CheckChild2CondCode to SelectionDAGISel to remove a MoveChild and MoveParent pair. OPC_CheckCondCode is always used as operand 2 of a setcc. And its always surrounded by a MoveChild2 and a MoveParent. By having a dedicated opcode for this case we can reduce the number of bytes needed for this pattern from 4 bytes to 2. This saves ~3000 bytes in the X86 table. llvm-svn: 354763	2019-02-25 03:11:44 +00:00
Craig Topper	be3348573e	[LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents Summary: When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it. We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used. Reviewers: spatel, RKSimon, nikic Reviewed By: nikic Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58567 llvm-svn: 354753	2019-02-24 19:23:36 +00:00
Sanjay Patel	cb04ba032f	[CGP] add special-cases to form unsigned add with overflow (PR40486) There's likely a missed IR canonicalization for at least 1 of these patterns. Otherwise, we wouldn't have needed the pattern-matching enhancement in D57516. Note that -- unlike usubo added with D57789 -- the TLI hook for this transform defaults to 'on'. So if there's any perf fallout from this, targets should look at how they're lowering the uaddo node in SDAG and/or override that hook. The x86 diffs suggest that there's some missing pattern-matching for forming inc/dec. This should fix the remaining known problems in: https://bugs.llvm.org/show_bug.cgi?id=40486 https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354746	2019-02-24 15:31:27 +00:00
Craig Topper	dc185522fb	[TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands. The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist. X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here. Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change. llvm-svn: 354738	2019-02-23 21:41:44 +00:00
Craig Topper	ccc860cb81	Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract" r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit." These were reverted in r354713 as their context depended on other patches that were reverted for a bug. llvm-svn: 354734	2019-02-23 19:51:32 +00:00
Jordan Rupprecht	6387fa2715	[NFC] Fix typos: preceeding -> preceding llvm-svn: 354715	2019-02-23 01:28:32 +00:00
Reid Kleckner	e3876637cf	Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case. I also had to revert some dependent changes: - r354648 - r354647 - r354640 - r354511 llvm-svn: 354713	2019-02-23 01:19:42 +00:00
Craig Topper	62619d064d	[LegalizeTypes] Use PromoteTargetBoolean in PromoteIntOp_ADDSUBCARRY instead of reimplementing it. NFCI llvm-svn: 354710	2019-02-23 00:38:19 +00:00
Daniel Sanders	07cda257f8	Restore ability for C++ API users to Enable IPRA. Summary: Prior to r310876 one of our out-of-tree targets was enabling IPRA by modifying the TargetOptions::EnableIPRA. This no longer works on current trunk since the useIPRA() hook overrides any values that are set in advance. This patch adjusts the behaviour of the hook so that API users and useIPRA() can both enable it but useIPRA() cannot disable it if the API user already enabled it. Reviewers: arsenm Reviewed By: arsenm Subscribers: wdng, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D38043 llvm-svn: 354692	2019-02-22 20:59:07 +00:00
Sanjay Patel	ffe1cf5e92	[CGP] move overflow intrinsic insertion to common location; NFCI We need to enhance the uaddo matching to handle special-cases as seen in PR40486 and PR31754. That means we won't necessarily have a def-use pattern, so we'll need to check dominance to determine where to place the intrinsic (as we already do for usubo). This preliminary patch is just rearranging the code, so the planned follow-up to improve uaddo will be more clear. llvm-svn: 354689	2019-02-22 20:20:24 +00:00
Matt Arsenault	7b55066a34	MIR: Preserve incoming frame index numbers Don't skip incrementing the frame index number if the object is dead. Instructions can still be referencing the old frame index number, and this doesn't attempt to remap those. The resulting MIR then fails to load because the use instructions use a higher frame index number than recorded list of stack objects. I'm not sure it's possible to craft a testcase with the existing set of passes. It requires selectively marking some stack objects dead in an essentially random order. StackSlotColoring condenses towards the low indexes. This avoids a regression in a future AMDGPU commit when some frame indexes are lowered separately from PEI. llvm-svn: 354688	2019-02-22 19:30:38 +00:00
Matt Arsenault	6d05d6a7b6	CodeGen: Make RegAllocRegistry a template class Will allow re-using the machinery for independent sets of register allocators. This will allow AMDGPU to use separate command line options for the allocator to use for SGPRs separate from VGPRs. llvm-svn: 354687	2019-02-22 19:16:52 +00:00
Guozhi Wei	4c8e480358	[MBP] Factor out function hasViableTopFallthrough and enhancement This patch factor out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. Original code checks only if there is a block can be placed before current loop top. This patch also checks if the loop top is the most possible successor of its predecessor. The attached test case shows its effect. Differential Revision: https://reviews.llvm.org/D58393 llvm-svn: 354682	2019-02-22 18:04:37 +00:00
Nirav Dave	46f939c118	Disable big-endian constant store merges from rL354676. llvm-svn: 354677	2019-02-22 16:20:34 +00:00
Nirav Dave	44037d7a63	[DAGCombine] Fold overlapping constant stores Fold a smaller constant store into larger constant stores immediately preceeding it. Reviewers: rnk, courbet Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58468 llvm-svn: 354676	2019-02-22 16:00:19 +00:00
Craig Topper	fa6187d230	[LegalizeVectorOps] Improve the placement of ANDs in the ExpandLoad path for non-byte-sized loads. When we need to merge two adjacent loads the AND mask for the low piece was still sized for the full src element size. But we didn't have that many bits. The upper bits are already zero due to the SRL. So we can skip the AND if we're going to combine with the high bits. We do need an AND to clear out any bits from the high part. We were anding the high part before combining with the low part, but it looks like ANDing after the OR gets better results. So we can just emit the final AND after the optional concatentation is done. That will handling skipping before the OR and get rid of extra high bits after the OR. llvm-svn: 354655	2019-02-22 07:03:25 +00:00
Craig Topper	069cf05e87	[LegalizeVectorOps] Simplify the non-byte sized load handling VectorLegalizer::ExpandLoad. NFCI Remove an if that should always be true. Merge the body of another into the only block that could make the if true. llvm-svn: 354654	2019-02-22 06:18:33 +00:00
Matt Arsenault	0280a5e143	DAG: Add helper for creating shifts with correct type llvm-svn: 354649	2019-02-22 03:38:47 +00:00
Craig Topper	be22f329a9	[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract. Otherwise we end up creating extract_vector_elts that then each need to have their input promoted. This can lead to truncates needing to be emitted for each of those. But we already emitted any_extends when we legalized the extract_subvector. So now we have pairs of any_extend+trunc that partially cancel. But depending on how DAGCombiner visits them we can get weird results. By promoting the input at the same time we can create only a single any_extend or truncate. There's one regression in the vector-narrow-binop.ll case, but that looks easy to fix with a follow up patch. llvm-svn: 354647	2019-02-22 01:49:50 +00:00
Sanjay Patel	ba5ee817e9	[DAGCombiner] prevent infinite looping by truncating 'and' (PR40793) This fold can occur during legalization, so it can fight with promotion to the larger type. It apparently takes a special sequence and subtarget to avoid more basic simplifications that would hide the problem. But there's a bigger question raised here: why does distributeTruncateThroughAnd() even exist? It duplicates functionality from a more minimal pattern that we already have. But getting rid of this function requires some preliminary steps. https://bugs.llvm.org/show_bug.cgi?id=40793 llvm-svn: 354594	2019-02-21 16:01:48 +00:00
Matt Arsenault	8df2f3dab2	RegBankSelect: Allow targets to introduce control flow for mapping For AMDGPU, if an operand requires an SGPR but is only available as a VGPR, a loop needs to be introduced to execute the instruction with each unique combination of values across all lanes. The rest of the instructions in the block will be moved to a new block following the loop. Check if the next instruction's parent changed, and update the iterators and insertion block if this happened. Tests will be included in a future patch. llvm-svn: 354591	2019-02-21 15:48:13 +00:00
Clement Courbet	a0321c23e8	Re-land part of r354244 "[DAGCombiner] Eliminate dead stores to stack." This part introduces the lifetime node. llvm-svn: 354578	2019-02-21 12:59:36 +00:00
Xin Tong	2d84c00dfa	Add skipFunction to PostRA machine sinking pass. Summary: Add skipFunction to PostRA machine sinking pass. Reviewers: junbuml Subscribers: arsenm, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57847 llvm-svn: 354541	2019-02-21 02:11:06 +00:00
Sanjay Patel	198cc305e9	[CGP] match a special-case of unsigned subtract overflow This is the 'sub0' (negate) pattern from PR31754: https://bugs.llvm.org/show_bug.cgi?id=31754 llvm-svn: 354519	2019-02-20 21:23:04 +00:00
Nirav Dave	48cf37b55c	[DAGCombine] Generalize Dead Store to overlapping stores. Summary: Remove stores that are immediately overwritten by larger stores. Reviewers: courbet, rnk Reviewed By: rnk Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58467 llvm-svn: 354518	2019-02-20 21:07:50 +00:00
Craig Topper	8d9c224a8c	[SelectionDAG] Teach GetDemandedBits to look at the known zeros of the LHS when handling ISD::AND If the LHS has known zeros, then the RHS immediate mask might have been simplified to remove those bits. This patch adds a call to computeKnownBits to get the known zeroes to handle that possibility. I left an early out to skip the call if all of the demanded bits are set in the mask. Differential Revision: https://reviews.llvm.org/D58464 llvm-svn: 354514	2019-02-20 20:52:26 +00:00
Nikita Popov	c3b496de7a	[SDAG] Support vector UMULO/SMULO Second part of https://bugs.llvm.org/show_bug.cgi?id=40442. This adds an extra UnrollVectorOverflowOp() method to SDAG, because the general UnrollOverflowOp() method can't deal with multiple results. Additionally we need to expand UMULO/SMULO during vector op legalization, as it may result in unrolling, which may need additional type legalization. Differential Revision: https://reviews.llvm.org/D57997 llvm-svn: 354513	2019-02-20 20:41:44 +00:00
Craig Topper	f4923db5a3	Revert r354498 "[X86] Add test case to show missed opportunity to remove an explicit AND on the bit position from BT when it has known zeros." I accidentally committed more than just the test. llvm-svn: 354499	2019-02-20 18:47:26 +00:00
Craig Topper	f8498a615b	[X86] Add test case to show missed opportunity to remove an explicit AND on the bit position from BT when it has known zeros. If the bit position has known zeros in it, then the AND immediate will likely be optimized to remove bits. This can prevent GetDemandedBits from recognizing that the AND is unnecessary. llvm-svn: 354498	2019-02-20 18:45:38 +00:00
Matt Arsenault	75e30c4d5d	GlobalISel: Fix fewerElementsVector for ctlz with different result type Also complete the set of related operations. llvm-svn: 354480	2019-02-20 16:42:52 +00:00
Matt Arsenault	c4d07554e4	GlobalISel: Implement moreElementsVector for g_insert results llvm-svn: 354477	2019-02-20 16:11:22 +00:00
Clement Courbet	62b3b91ab2	Re-land the refactoring part of r354244 "[DAGCombiner] Eliminate dead stores to stack." This is an NFC. llvm-svn: 354476	2019-02-20 15:45:58 +00:00
David Green	cb5a48b060	[Codegen] Remove dead flags on Physical Defs in machine cse We may leave behind incorrect dead flags on instructions that are CSE'd. Make sure we remove the dead flags on physical registers to prevent other incorrect code motion. Differential Revision: https://reviews.llvm.org/D58115 llvm-svn: 354443	2019-02-20 10:22:18 +00:00
Mikael Holmen	2d6bb13443	[RegAllocGreedy] Take last chance recoloring into account in split and assign Summary: This is a follow-up to r353988 where tryEvict was extended to take last chance recoloring into account. Now we do the same thing for trySplit and tryAssign. Now we always pass a "FixedRegisters" argument to canEvictInterference and tryEvict so it doesn't need to have a default value anymore. The need for this was found long ago in an out-of-tree target. Unfortunately I don't have a reproducer for an in-tree target. Reviewers: qcolombet, rudkx Reviewed By: qcolombet, rudkx Subscribers: rudkx, MatzeB, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58376 llvm-svn: 354439	2019-02-20 07:14:39 +00:00

... 24 25 26 27 28 ...

28401 Commits