llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	23f03f5059	AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca The lifetime intrinsic was erased, which was the next iterator. llvm-svn: 363668	2019-06-18 12:23:44 +00:00
Matt Arsenault	d5ce8ec778	AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale llvm-svn: 363667	2019-06-18 12:23:42 +00:00
Sjoerd Meijer	7a7009f7c8	[ARM] Some Thumb2ITBlock clean ups. NFC Some more refactoring, like registering the IT Block pass, less cryptic variable names, and some simplification of loops. Differential Revision: https://reviews.llvm.org/D63419 llvm-svn: 363666	2019-06-18 12:13:11 +00:00
Jonas Paulsson	5c64a8c4c6	[SystemZ] Fix AHIMuxK pseudo expansion. Do not emit a copy if the source and destination registers are the same. Review: Ulrich Weigand llvm-svn: 363665	2019-06-18 12:10:02 +00:00
Valery Pykhtin	7e854e1cdd	[AMDGPU] Speed up live-in virtual register set computaion in GCNScheduleDAGMILive. Differential revision: https://reviews.llvm.org/D62401 llvm-svn: 363661	2019-06-18 11:43:17 +00:00
Graham Hunter	43854e3ccc	[SVE][IR] Scalable Vector IR Type with pr42210 fix Recommit of D32530 with a few small changes: - Stopped recursively walking through aggregates in the verifier, so that we don't impose too much overhead on large modules under LTO (see PR42210). - Changed tests to match; the errors are slightly different since they only report the array or struct that actually contains a scalable vector, rather than all aggregates which contain one in a nested member. - Corrected an older comment Reviewers: thakis, rengolin, sdesmalen Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D63321 llvm-svn: 363658	2019-06-18 10:11:56 +00:00
Simon Pilgrim	7dd529e54d	[X86] Replace any_extend* vector extensions with zero_extend* equivalents First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal. In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...... Differential Revision: https://reviews.llvm.org/D63326 llvm-svn: 363655	2019-06-18 09:50:13 +00:00
Yevgeny Rouban	69daf4a72d	[SimplifyCFG] NFC, prof branch_weighs handling is simplified Using the new SwitchInstProfUpdateWrapper this patch simplifies 3 places of prof branch_weights handling. Differential Revision: https://reviews.llvm.org/D62123 llvm-svn: 363652	2019-06-18 06:50:52 +00:00
Craig Topper	f4284f8a9d	[X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper function. NFCI Preliminary step for D59909 llvm-svn: 363645	2019-06-18 04:23:58 +00:00
Craig Topper	587427716c	[X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644	2019-06-18 03:23:15 +00:00
Craig Topper	8582ecd8d9	[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643	2019-06-18 03:23:11 +00:00
Tom Stellard	1f7f64665c	GlobalISel: Remove redundant pass initialization Summary: All the GlobalISel passes are initialized when the target calls initializeGlobalISel(), so we don't need to call the initializers from the pass constructors. Reviewers: qcolombet, t.p.northover, paquette, dsanders, aemerson, aditya_nandakumar Reviewed By: aemerson Subscribers: rovka, kristof.beyls, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63235 llvm-svn: 363642	2019-06-18 02:05:06 +00:00
Matt Arsenault	5a321b899e	GlobalISel: Use the original flags when lowering fneg to fsub This was ignoring the flag on fneg, and using the source instruction's flags. Also fixes tests missing from r358702. Note the expansion itself isn't correct without nnan, but that should be fixed separately. llvm-svn: 363637	2019-06-17 23:48:43 +00:00
Peter Collingbourne	d57f7cc15e	hwasan: Use bits [3..11) of the ring buffer entry address as the base stack tag. This saves roughly 32 bytes of instructions per function with stack objects and causes us to preserve enough information that we can recover the original tags of all stack variables. Now that stack tags are deterministic, we no longer need to pass -hwasan-generate-tags-with-calls during check-hwasan. This also means that the new stack tag generation mechanism is exercised by check-hwasan. Differential Revision: https://reviews.llvm.org/D63360 llvm-svn: 363636	2019-06-17 23:39:51 +00:00
Peter Collingbourne	fb9ce100d1	hwasan: Add a tag_offset DWARF attribute to instrumented stack variables. The goal is to improve hwasan's error reporting for stack use-after-return by recording enough information to allow the specific variable that was accessed to be identified based on the pointer's tag. Currently we record the PC and lower bits of SP for each stack frame we create (which will eventually be enough to derive the base tag used by the stack frame) but that's not enough to determine the specific tag for each variable, which is the stack frame's base tag XOR a value (the "tag offset") that is unique for each variable in a function. In IR, the tag offset is most naturally represented as part of a location expression on the llvm.dbg.declare instruction. However, the presence of the tag offset in the variable's actual location expression is likely to confuse debuggers which won't know about tag offsets, and moreover the tag offset is not required for a debugger to determine the location of the variable on the stack, so at the DWARF level it is represented as an attribute so that it will be ignored by debuggers that don't know about it. Differential Revision: https://reviews.llvm.org/D63119 llvm-svn: 363635	2019-06-17 23:39:41 +00:00
Amara Emerson	146882242f	[GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block. Inter-block localization is the same as what currently happens, except now it only runs on the entry block because that's where the problematic constants with long live ranges come from. The second phase is a new intra-block localization phase which attempts to re-sink the already localized instructions further right before one of the multiple uses. One additional change is to also localize G_GLOBAL_VALUE as they're constants too. However, on some targets like arm64 it takes multiple instructions to materialize the value, so some additional heuristics with a TTI hook have been introduced attempt to prevent code size regressions when localizing these. Overall, these changes improve CTMark code size on arm64 by 1.2%. Full code size results: Program baseline new diff ------------------------------------------------------------------------------ test-suite...-typeset/consumer-typeset.test 1249984 1217216 -2.6% test-suite...:: CTMark/ClamAV/clamscan.test 1264928 1232152 -2.6% test-suite :: CTMark/SPASS/SPASS.test 1394092 1361316 -2.4% test-suite...Mark/mafft/pairlocalalign.test 731320 714928 -2.2% test-suite :: CTMark/lencod/lencod.test 1340592 `1324200` -1.2% test-suite :: CTMark/kimwitu++/kc.test 3853512 3820420 -0.9% test-suite :: CTMark/Bullet/bullet.test 3406036 3389652 -0.5% test-suite...ark/tramp3d-v4/tramp3d-v4.test 8017000 8016992 -0.0% test-suite...TMark/7zip/7zip-benchmark.test 2856588 2856588 0.0% test-suite...:: CTMark/sqlite3/sqlite3.test 765704 765704 0.0% Geomean difference -1.2% Differential Revision: https://reviews.llvm.org/D63303 llvm-svn: 363632	2019-06-17 23:20:29 +00:00
Michael Berg	f9bff2a55e	Propagate fmf in IRTranslate for fneg Summary: This case is related to D63405 in that we need to be propagating FMF on negates. Reviewers: volkan, spatel, arsenm Reviewed By: arsenm Subscribers: wdng, javed.absar Differential Revision: https://reviews.llvm.org/D63458 llvm-svn: 363631	2019-06-17 23:19:40 +00:00
Craig Topper	971ad74ba2	Use VR128X instead of FR32X/FR64X for the register class in VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns. llvm-svn: 363630	2019-06-17 23:08:29 +00:00
Craig Topper	0e18300802	[X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what types are allowed here. NFC Make it clear that only integer type with i32 or smaller elements shoudl get to this part of the code. llvm-svn: 363629	2019-06-17 23:08:09 +00:00
Stanislav Mekhanoshin	121956108f	[AMDGPU] Use custom inserter for gfx10 VOP2b This is part of the approved D63204 pending parent revision. This small change is in fact a part of the VOP2b legalization which does not technically belong to wave32 support, so extracted separately. llvm-svn: 363625	2019-06-17 22:37:37 +00:00
Philip Reames	44475363e8	Teach getSCEVAtScope how to handle loop phis w/invariant operands in loops w/taken backedges This patch really contains two pieces: Teach SCEV how to fold a phi in the header of a loop to the value on the backedge when a) the backedge is known to execute at least once, and b) the value is safe to use globally within the scope dominated by the original phi. Teach IndVarSimplify's rewriteLoopExitValues to allow loop invariant expressions which already exist (and thus don't need new computation inserted) even in loops where we can't optimize away other uses. Differential Revision: https://reviews.llvm.org/D63224 llvm-svn: 363619	2019-06-17 21:06:17 +00:00
Daniel Sanders	184c8ee920	[globalisel] Fix iterator invalidation in the extload combines Summary: Change the way we deal with iterator invalidation in the extload combines as it was still possible to neglect to visit a use. Even worse, it happened in the in-tree test cases and the checks weren't good enough to detect it. We now take a cheap copy of the use list before iterating over it. This prevents iterator invalidation from occurring and has the nice side effect of making the existing schedule-for-erase/schedule-for-insert mechanism moot. Reviewers: aditya_nandakumar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, javed.absar, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61813 llvm-svn: 363616	2019-06-17 20:56:31 +00:00
Stanislav Mekhanoshin	3138278287	[AMDGPU] Propagate function attributes thru bitcasts AMDGPUPropagateAttributes will not work on function bitcatsts, so move AMDGPUFixFunctionBitcasts before it. Differential Revision: https://reviews.llvm.org/D63455 llvm-svn: 363614	2019-06-17 20:42:48 +00:00
Philip Reames	fe8bd96ebd	Fix a bug w/inbounds invalidation in LFTR (recommit) Recommit r363289 with a bug fix for crash identified in pr42279. Issue was that a loop exit test does not have to be an icmp, leading to a null dereference crash when new logic was exercised for that case. Test case previously committed in r363601. Original commit comment follows: This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV. The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program. As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well. (Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.) Differential Revision: https://reviews.llvm.org/D62939 llvm-svn: 363613	2019-06-17 20:32:22 +00:00
Nicolai Haehnle	ae4fcb97dd	AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does. This also fixes multi-part shaders for Mesa. Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82 Reviewers: arsenm, rampitec, t-tye Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63427 llvm-svn: 363602	2019-06-17 19:28:43 +00:00
Joseph Tremoulet	daa1ae6142	[EarlyCSE] Fix hashing of self-compares Summary: Update compare normalization in SimpleValue hashing to break ties (when the same value is being compared to itself) by switching to the swapped predicate if it has a lower numerical value. This brings the hashing in line with isEqual, which already recognizes the self-compares with swapped predicates as equal. Fixes PR 42280. Reviewers: spatel, efriedma, nikic, fhahn, uabelho Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63349 llvm-svn: 363598	2019-06-17 19:11:28 +00:00
Alina Sbirlea	7a0098aa6e	[MemorySSA] Don't use template when the clone is a simplified instruction. Summary: LoopRotate doesn't create a faithful clone of an instruction, it may simplify it beforehand. Hence the clone of an instruction that has a MemoryDef associated may not be a definition, but a use or not a memory alternig instruction. Don't rely on the template when the clone may be simplified. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63355 llvm-svn: 363597	2019-06-17 18:58:40 +00:00
Jessica Paquette	49537bbf74	[GlobalISel][AArch64] Fold G_SUB into G_ICMP when it's safe to do so Basically porting over the behaviour in AArch64ISelLowering to GISel. See emitComparison for reference. When we have something like this: ``` lhs = G_SUB 0, y ... G_ICMP lhs, rhs ``` We can fold away the G_SUB and produce a cmn instead, given that we produce the same value in NZCV. Add a test showing that the transformation works, and also showing that we don't perform the transformation when it's unsafe. Also factor out the CSet emission into emitCSetForICMP. Differential Revision: https://reviews.llvm.org/D63163 llvm-svn: 363596	2019-06-17 18:40:06 +00:00
Craig Topper	f3f968adcd	[X86] Add TB_NO_REVERSE to some memory folding table entries where the register form requires 64-bit mode, but the memory form does not. We don't know if its safe to unfold if we're in 32-bit mode. This is simlar to what was done to some load opcodes in r363523. I think its pretty unlikely we will try to unfold these anyway so I don't think this is testable. llvm-svn: 363595	2019-06-17 18:38:07 +00:00
Simon Pilgrim	835999e48a	[X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026) If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592	2019-06-17 18:20:04 +00:00
Alina Sbirlea	05f77803f4	[MemorySSA] Add all MemoryPhis before filling their values. Summary: Add all MemoryPhis in IDF before filling in their incomign values. Otherwise, a new Phi can be added that needs to become the incoming value of another Phi. Test fails the verification in verifyPrevDefInPhis. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63353 llvm-svn: 363590	2019-06-17 18:16:53 +00:00
Stanislav Mekhanoshin	a9191c8492	[AMDGPU] gfx1010 wavefrontsize intrinsic folding Differential Revision: https://reviews.llvm.org/D63206 llvm-svn: 363588	2019-06-17 17:57:50 +00:00
Matt Arsenault	6d741f29ec	AMDGPU: Fold readlane/readfirstlane calls llvm-svn: 363587	2019-06-17 17:52:35 +00:00
Stanislav Mekhanoshin	ad04e7ad42	[AMDGPU] Pass to propagate ABI attributes from kernels to the functions The pass works in two modes: Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass. Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass. Differential Revision: https://reviews.llvm.org/D63208 llvm-svn: 363586	2019-06-17 17:47:28 +00:00
Simon Pilgrim	bb9adfdb4e	[X86][AVX] Split under-aligned vector nt-stores. If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582	2019-06-17 17:22:38 +00:00
Warren Ristow	6452bdd29b	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type DataType, unsigned Alignment) const; bool isLegalNTLoad(Type DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581	2019-06-17 17:20:08 +00:00
Matt Arsenault	3e140066bc	GlobalISel: Ignore callsite attributes when picking intrinsic type A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. I fixed the same bug in SelectionDAG in r287593. llvm-svn: 363580	2019-06-17 17:01:35 +00:00
Matt Arsenault	a7f09f3c9e	GlobalISel: Verify intrinsics I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now. llvm-svn: 363579	2019-06-17 17:01:32 +00:00
Matt Arsenault	fee1949b35	AMDGPU/GlobalISel: Account for multiple defs when finding intrinsic ID llvm-svn: 363578	2019-06-17 17:01:27 +00:00
Stanislav Mekhanoshin	5d00c3060e	[AMDGPU] gfx1010 wave32 metadata Differential Revision: https://reviews.llvm.org/D63207 llvm-svn: 363577	2019-06-17 16:48:56 +00:00
Tom Stellard	8b1c53b528	AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60640 llvm-svn: 363576	2019-06-17 16:27:43 +00:00
Francis Visoiu Mistrih	34667519dc	[Remarks] Extend -fsave-optimization-record to specify the format Use -fsave-optimization-record=<format> to specify a different format than the default, which is YAML. For now, only YAML is supported. llvm-svn: 363573	2019-06-17 16:06:00 +00:00
Simon Pilgrim	12cb792d7f	[X86] combineLoad - begun making the load split code more generic. NFCI. This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment. This is necessary for an upcoming patch to split under-aligned non-temporal vector loads. llvm-svn: 363570	2019-06-17 15:54:36 +00:00
Whitney Tsang	15b7f5b72d	PHINode: introduce setIncomingValueForBlock() function, and use it. Summary: There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue() but no function to replace incoming value for a specified BasicBlock* predecessor. Clearly, there are a lot of places that could use that functionality. Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn Reviewed By: Meinersbur, fhahn Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D63338 llvm-svn: 363566	2019-06-17 14:38:56 +00:00
Simon Pilgrim	454e6b9010	[X86][SSE] Prevent misaligned non-temporal vector load/store combines For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564	2019-06-17 14:26:10 +00:00
Matt Arsenault	1df203d78e	InferAddressSpaces: Fix cloning original addrspacecast If an addrspacecast needed to be inserted again, this was creating a clone of the original cast for each user. Just use the original, which also saves losing the value name. llvm-svn: 363562	2019-06-17 14:13:29 +00:00
Matt Arsenault	b10f097833	AMDGPU: Ignore subtarget for InferAddressSpaces Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561	2019-06-17 14:13:24 +00:00
Matt Arsenault	29e792659b	AMDGPU/GlobalISel: Fix default mapping for non-register operands Tests will be in future commits when new intrinsics are handled here. llvm-svn: 363559	2019-06-17 13:52:19 +00:00
Matt Arsenault	e683eba0ed	AMDGPU: Cleanup custom PseudoSourceValue definitions Use separate enums for each kind, avoid repeating overloads, and add missing classof implementation. llvm-svn: 363558	2019-06-17 13:52:15 +00:00
Sam Parker	1bd3d00e7e	[CodeGen] Check for HardwareLoop Latch ExitBlock The HardwareLoops pass finds exit blocks with a scevable exit count. If the target specifies to update the loop counter in a register, through a phi, we need to ensure that the exit block is a latch so that we can insert the phi with the correct value for the incoming edge. Differential Revision: https://reviews.llvm.org/D63336 llvm-svn: 363556	2019-06-17 13:39:28 +00:00
Bjorn Pettersson	83773b77a5	[LV] Deny irregular types in interleavedAccessCanBeWidened Summary: Avoid that loop vectorizer creates loads/stores of vectors with "irregular" types when interleaving. An example of an irregular type is x86_fp80 that is 80 bits, but that may have an allocation size that is 96 bits. So an array of x86_fp80 is not bitcast compatible with a vector of the same type. Not sure if interleavedAccessCanBeWidened is the best place for this check, but it solves the problem seen in the added test case. And it is the same kind of check that already exists in memoryInstructionCanBeWidened. Reviewers: fhahn, Ayal, craig.topper Reviewed By: fhahn Subscribers: hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63386 llvm-svn: 363547	2019-06-17 12:02:24 +00:00
Luis Marques	2e46312ffd	[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544	2019-06-17 10:54:12 +00:00
Fangrui Song	5401c2db6e	Fix clang -Wcovered-switch-default after stack-id change by D60137 llvm-svn: 363543	2019-06-17 10:20:20 +00:00
Simon Pilgrim	ef78e55205	[SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source. llvm-svn: 363542	2019-06-17 10:14:52 +00:00
Sam Parker	60d6fb2a63	[SCEV] Use NoWrapFlags when expanding a simple mul Second functional change following on from rL362687. Pass the NoWrapFlags from the MulExpr to InsertBinop when we're generating a shl or mul. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363540	2019-06-17 10:05:18 +00:00
Fangrui Song	4bde5d3c08	[ARM] Fix another -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 llvm-svn: 363535	2019-06-17 09:29:50 +00:00
Fangrui Song	89d6905c59	[ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D63265 llvm-svn: 363534	2019-06-17 09:26:50 +00:00
Sander de Smalen	5d6ee76c16	Describe stack-id as an enum This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'. This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID. Reviewers: arsenm, thegameg, qcolombet Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D60137 llvm-svn: 363533	2019-06-17 09:13:29 +00:00
Sam Parker	a059efa885	[ARM] Remove ARMComputeBlockSize Forgot to remove file! llvm-svn: 363532	2019-06-17 09:13:10 +00:00
Sam Parker	f7c0b3aeb2	[ARM] Add ARMBasicBlockInfo.cpp Forgot to add file! llvm-svn: 363531	2019-06-17 09:05:43 +00:00
Sam Parker	966f4e874e	[ARM] Extract some code from ARMConstantIslandPass Create the ARMBasicBlockUtils class for tracking and querying basic blocks sizes so we can use them when generating low-overhead loops. Differential Revision: https://reviews.llvm.org/D63265 llvm-svn: 363530	2019-06-17 08:49:09 +00:00
Hans Wennborg	a9e5d2f35d	Re-commit r357452 (take 3): "SimplifyCFG SinkCommonCodeFromPredecessors: Also sink function calls without used results (PR41259)" Third time's the charm. This was reverted in r363220 due to being suspected of an internal benchmark regression and a test failure, none of which turned out to be caused by this. llvm-svn: 363529	2019-06-17 07:47:28 +00:00
Yevgeny Rouban	ee62c40eae	[SimplifyCFG] Fix prof branch_weights MD while removing unreachable switch cases SimplifyCFG has a bug that results in inconsistent prof branch_weights metadata if unreachable switch cases are removed. This patch fixes this bug by making use of the newly introduced SwitchInstProfUpdateWrapper class (see patch D62122). A new test is created. Differential Revision: https://reviews.llvm.org/D62186 llvm-svn: 363527	2019-06-17 05:55:12 +00:00
Justin Hibbits	1d1cf30b73	PowerPC: Optimize SPE double parameter calling setup Summary: SPE passes doubles the same as soft-float, in register pairs as i32 types. This is all handled by the target-independent layer. However, this is not optimal when splitting or reforming the doubles, as it pushes to the stack and loads from, on either side. For instance, to pass a double argument to a function, assuming the double value is in r5, the sequence currently looks like this: evstdd 5, X(1) lwz 3, X(1) lwz 4, X+4(1) Likewise, to form a double into r5 from args in r3 and r4: stw 3, X(1) stw 4, X+4(1) evldd 5, X(1) This optimizes the fence to use SPE instructions. Now, to pass a double to a function: mr 4, 5 evmergehi 3, 5, 5 And to form a double into r5 from args in r3 and r4: evmergelo 5, 3, 4 This is comparable to the way that gcc generates the double splits. This also fixes a bug with expanding builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54583 llvm-svn: 363526	2019-06-17 03:15:23 +00:00
Craig Topper	9f2f127009	[X86] Add TB_NO_REVERSE to some folding table entries where the register from uses the REX prefix, but the memory form does not. It would not be safe to unfold the memory form the register form without checking that we are compiling for 64-bit mode. This probaby isn't a real functional issue since we are unlikely to unfold any of these instructions since they don't have any tied registers, aren't commutable, and don't have any inputs other than the address. llvm-svn: 363523	2019-06-16 22:33:09 +00:00
Roman Lebedev	5a663bd77a	[InstSimplify] Fix addo/subo undef folds (PR42209) Fix folds of addo and subo with an undef operand to be: `@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`, as per LLVM undef rules. Same for commuted variants. Based on the original version of the patch by @nikic. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 \| PR42209 ]] Differential Revision: https://reviews.llvm.org/D63065 llvm-svn: 363522	2019-06-16 20:39:45 +00:00
Nicolai Haehnle	41abf2766e	AMDGPU: Prepare for explicit absolute relocations in code generation Summary: We will use absolute relocations for LDS symbols. Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61492 llvm-svn: 363517	2019-06-16 17:43:37 +00:00
Nicolai Haehnle	6d71be4e67	AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes. The generated assembly is now more explicit about the kind of relocation that is to be used. Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61491 llvm-svn: 363516	2019-06-16 17:32:01 +00:00
Nicolai Haehnle	490e83cd43	AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539 Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62486 Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0 llvm-svn: 363514	2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin	5250021672	[AMDGPU] gfx10 conditional registers handling This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513	2019-06-16 17:13:09 +00:00
Sanjay Patel	c8d88ad1a9	[CodeGenPrepare][x86] shift both sides of a vector select when profitable This is based on the example/discussion in PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 Proper vector shift instructions don't appear until AVX2, so we may generate several extra instructions within a loop trying to compensate for that. It's difficult to recover from that shift expansion later than this, so use the existing TLI hook and splat analysis to enable better codegen. This extends CGP functionality introduced with: rL201655 Differential Revision: https://reviews.llvm.org/D63233 llvm-svn: 363511	2019-06-16 15:29:03 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Simon Pilgrim	fcffc2facc	[X86] CombineShuffleWithExtract - handle cases with different vector extract sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507	2019-06-16 08:00:41 +00:00
Simon Pilgrim	456ca5d7f7	[X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI. llvm-svn: 363501	2019-06-15 19:12:44 +00:00
Simon Pilgrim	90e87af303	[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500	2019-06-15 18:30:43 +00:00
Simon Pilgrim	990f3ceb67	[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499	2019-06-15 17:05:24 +00:00
Aaron Puchert	e1dc495e63	[Clang] Harmonize Split DWARF options with llc Summary: With Split DWARF the resulting object file (then called skeleton CU) contains the file name of another ("DWO") file with the debug info. This can be a problem for remote compilation, as it will contain the name of the file on the compilation server, not on the client. To use Split DWARF with remote compilation, one needs to either * make sure only relative paths are used, and mirror the build directory structure of the client on the server, * inject the desired file name on the client directly. Since llc already supports the latter solution, we're just copying that over. We allow setting the actual output filename separately from the value of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU. Fixes PR40276. Reviewers: dblaikie, echristo, tejohnson Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D59673 llvm-svn: 363496	2019-06-15 15:38:51 +00:00
Kang Zhang	2d51adcb57	[PowerPC] Set the innermost hot loop to align 32 bytes Summary: If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that we can decrease cache misses and branch-prediction misses. Actual alignment of the loop will depend on the hotness check and other logic in alignBlocks. The old code will only align hot loop to 32 bytes when the LoopSize larger than 16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop to 32 bytes not only for the hot loop whose size is 16~32 bytes. Reviewed By: steven.zhang, jsji Differential Revision: https://reviews.llvm.org/D61228 llvm-svn: 363495	2019-06-15 15:10:24 +00:00
Gauthier Harnisch	83c7b61052	[clang] Add storage for APValue in ConstantExpr Summary: When using ConstantExpr we often need the result of the expression to be kept in the AST. Currently this is done on a by the node that needs the result and has been done multiple times for enumerator, for constexpr variables... . This patch adds to ConstantExpr the ability to store the result of evaluating the expression. no functional changes expected. Changes: - Add trailling object to ConstantExpr that can hold an APValue or an uint64_t. the uint64_t is here because most ConstantExpr yield integral values so there is an optimized layout for integral values. - Add basic* serialization support for the trailing result. - Move conversion functions from an enum to a fltSemantics from clang::FloatingLiteral to llvm::APFloatBase. this change is to make it usable for serializing APValues. - Add basic* Import support for the trailing result. - ConstantExpr created in CheckConvertedConstantExpression now stores the result in the ConstantExpr Node. - Adapt AST dump to print the result when present. basic* : None, Indeterminate, Int, Float, FixedPoint, ComplexInt, ComplexFloat, the result is not yet used anywhere but for -ast-dump. Reviewers: rsmith, martong, shafik Reviewed By: rsmith Subscribers: rnkovacs, hiraditya, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62399 llvm-svn: 363493	2019-06-15 10:24:47 +00:00
Fangrui Song	b6dc09e725	[BranchProbability] Delete a redundant overflow check llvm-svn: 363492	2019-06-15 10:09:59 +00:00
Nikita Popov	8550fb386a	[SCEV] Use unsigned/signed intersection type in SCEV Based on D59959, this switches SCEV to use unsigned/signed range intersection based on the sign hint. This will prefer non-wrapping ranges in the relevant domain. I've left the one intersection in getRangeForAffineAR() to use the smallest intersection heuristic, as there doesn't seem to be any obvious preference there. Differential Revision: https://reviews.llvm.org/D60035 llvm-svn: 363490	2019-06-15 09:15:52 +00:00
Nikita Popov	9145562b48	[SimplifyIndVar] Simplify non-overflowing saturating add/sub If we can detect that saturating math that depends on an IV cannot overflow, replace it with simple math. This is similar to the CVP optimization from D62703, just based on a different underlying analysis (SCEV vs LVI) that catches different cases. Differential Revision: https://reviews.llvm.org/D62792 llvm-svn: 363489	2019-06-15 08:48:52 +00:00
Fangrui Song	44cc4e9351	[RISCV] Simplify RISCVAsmBackend::writeNopData(). NFC llvm-svn: 363486	2019-06-15 06:14:15 +00:00
Michael Berg	ad6bb86b2d	adding more fmf propagation for selects plus updated tests llvm-svn: 363484	2019-06-15 04:53:51 +00:00
Fangrui Song	968b5f84af	Revert "adding more fmf propagation for selects plus tests" This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482	2019-06-15 03:51:08 +00:00
Matt Arsenault	9487278010	Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This reapplies r363410, avoiding null dereference if there is no AltRegBank. llvm-svn: 363478	2019-06-15 00:33:26 +00:00
Mitch Phillips	0d44f129bb	Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect" This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error. This reverts commit `c2864c0de0`. This reverts rL363410. llvm-svn: 363476	2019-06-14 23:45:34 +00:00
Michael Berg	69394bedc5	adding more fmf propagation for selects plus tests llvm-svn: 363474	2019-06-14 23:30:52 +00:00
Guozhi Wei	d2210af332	[MBP] Move a latch block with conditional exit and multi predecessors to top of loop Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is: * a latch block * it has two successors, one is loop header, another is exit * it has more than one predecessors If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch. Differential Revision: https://reviews.llvm.org/D43256 llvm-svn: 363471	2019-06-14 23:08:59 +00:00
Akira Hatanaka	a704a8f28c	[ObjC][ARC] Delete ObjC runtime calls on global variables annotated with 'objc_arc_inert' Those calls are no-ops, so they can be safely deleted. rdar://problem/49839633 Differential Revision: https://reviews.llvm.org/D62433 llvm-svn: 363468	2019-06-14 22:06:32 +00:00
Matt Arsenault	aa41e92e17	AMDGPU: Avoid most waitcnts before calls Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything. Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead. llvm-svn: 363465	2019-06-14 21:52:26 +00:00
Ziang Wan	af857b93df	Add --print-supported-cpus flag for clang. This patch allows clang users to print out a list of supported CPU models using clang [--target=<target triple>] --print-supported-cpus Then, users can select the CPU model to compile to using clang --target=<triple> -mcpu=<model> a.c It is a handy feature to help cross compilation. llvm-svn: 363464	2019-06-14 21:42:21 +00:00
Matt Arsenault	282dac717e	SROA: Allow eliminating addrspacecasted allocas There is a circular dependency between SROA and InferAddressSpaces today that requires running both multiple times in order to be able to eliminate all simple allocas and addrspacecasts. InferAddressSpaces can't remove addrspacecasts when written to memory, and SROA helps move pointers out of memory. This should avoid inserting new commuting addrspacecasts with GEPs, since there are unresolved questions about pointer wrapping between different address spaces. For now, don't replace volatile operations that don't match the alloca addrspace, as it would change the address space of the access. It may be still OK to insert an addrspacecast from the new alloca, but be more conservative for now. llvm-svn: 363462	2019-06-14 21:38:31 +00:00
Jinsong Ji	bbab7acedf	[PowerPC][NFC] Comments update and remove some unused def llvm-svn: 363461	2019-06-14 21:33:51 +00:00
Matt Arsenault	9e5fa33378	AMDGPU: Fix dropping memref for ds append/consume The way SelectionDAG treats memory operands is very frustrating, and by default drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them. llvm-svn: 363455	2019-06-14 21:01:24 +00:00
Matt Arsenault	1c5a87956f	AMDGPU: Set isTrap on S_TRAP This seems to only be used for generating some kind of documentation, but might as well set it. llvm-svn: 363454	2019-06-14 21:01:24 +00:00
Lang Hames	1b091540d2	[JITLink] Move JITLinkMemoryManager into its own header. llvm-svn: 363444	2019-06-14 19:41:21 +00:00
Francis Visoiu Mistrih	0b0851399e	[Remarks] Use the RemarkSetup error in setupOptimizationRemarks Added the errors in r363415 but they were not used in the RemarkStreamer. llvm-svn: 363439	2019-06-14 18:18:26 +00:00
Amara Emerson	f79d3bc724	[GlobalISel] Add a G_BRJT opcode. This is a branch opcode that takes a jump table pointer, jump table index and an index into the table to do an indirect branch. We pass both the table pointer and JTI to allow targets like ARM64 to more easily use the existing jump table compression optimization without having to walk up the block to find a paired G_JUMP_TABLE. Differential Revision: https://reviews.llvm.org/D63159 llvm-svn: 363434	2019-06-14 17:55:48 +00:00
Florian Hahn	dcdd12b68c	Revert Fix a bug w/inbounds invalidation in LFTR Reverting because it breaks a green dragon build: http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208 This reverts r363289 (git commit `eb88badff9`) llvm-svn: 363427	2019-06-14 17:23:09 +00:00
Florian Hahn	a19809045c	Revert [LFTR] Stylistic cleanup as suggested in last review comment of D62939 [NFC] Reverting because it depends on r363289, which breaks a green dragon build: http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208 This reverts r363292 (git commit `42a3fc133d`) llvm-svn: 363426	2019-06-14 17:22:56 +00:00
Florian Hahn	e1b4b1b46e	Revert [LFTR] Rename variable to minimize confusion [NFC] Reverting because it depends on r363289, which breaks a green dragon build: http://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/18208 This reverts r363293 (git commit `c37be29634`) llvm-svn: 363425	2019-06-14 17:22:49 +00:00
Jinsong Ji	c9e3dbb0a5	[PowerPC][NFC] Format comments in P9InstrResrouce.td llvm-svn: 363423	2019-06-14 17:04:24 +00:00
Shawn Landden	f2e60fc4e8	[SimpligyCFG] NFC intended, remove GCD that was only used for powers of two and replace with an equilivent countTrailingZeros. GCD is much more expensive than this, with repeated division. This depends on D60823 Differential Revision: https://reviews.llvm.org/D61151 llvm-svn: 363422	2019-06-14 16:56:49 +00:00
Valery Pykhtin	ffeb01c113	[AMDGPU] Don't constrain callees with inlinehint from inlining on MaxBB check Summary: Function bodies marked inline in an opencl source are eliminated but MaxBB check may prevent inlining them leaving undefined references. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, Anastasia, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63337 llvm-svn: 363418	2019-06-14 16:37:33 +00:00
Kevin P. Neal	fece7c6c83	[FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86. I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.5 Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Craig Topper, Kevin P. Neal Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D63271 llvm-svn: 363417	2019-06-14 16:28:55 +00:00
Stanislav Mekhanoshin	cdf339266b	[AMDGPU] gfx1010 BoolReg definition. NFC. Earlier commit has added AMDGPUOperand::isBoolReg(). Turns out gcc issues warning about unused function since D63204 is not yet submitted. Added NFC part of D63204 to have a use of that function and mute the warning. llvm-svn: 363416	2019-06-14 16:25:46 +00:00
Francis Visoiu Mistrih	7a21113ce8	Reland: [Remarks] Refactor optimization remarks setup * Add a common function to setup opt-remarks * Rename common options to the same names * Add error types to distinguish between file errors and regex errors llvm-svn: 363415	2019-06-14 16:20:51 +00:00
Matt Arsenault	c2864c0de0	GlobalISel: Avoid producing Illegal copies in RegBankSelect Avoid producing illegal register bank copies for reg_sequence and phi. The default implementation assumes it is possible to pick any operand's bank and use that for the result, introducing a copy for operands with a different bank. This does not check for illegal copies. It is not legal to introduce a VGPR->SGPR copy, so any VGPR operand requires the result to be a VGPR. The changes in getInstrMappingImpl aren't strictly necessary, since AMDGPU now just bypasses this for reg_sequence/phi. This could be replaced with an assert in case other targets run into this. It is currently responsible for producing the error for unsatisfiable copies, but this will be better served with a verifier check. For phis, for now assume any undetermined operands must be VGPRs. Eventually, this needs to be able to defer mapping these operations. This also does not yet have a way to check for whether the block is in a divergent region. llvm-svn: 363410	2019-06-14 15:22:25 +00:00
Sanjay Patel	7ea378b940	[CodeGenPrepare] propagate debuginfo when copying a shuffle llvm-svn: 363409	2019-06-14 15:05:35 +00:00
Johannes Doerfert	282d34ee78	[Attributor] Disable the Attributor by default and fix a comment llvm-svn: 363408	2019-06-14 14:53:41 +00:00
Matt Arsenault	492d71cc99	AMDGPU: Fold readlane intrinsics of constants I'm not 100% sure about this, since I'm worried about IR transforms that might end up introducing divergence downstream once replaced with a constant, but I haven't come up with an example yet. llvm-svn: 363406	2019-06-14 14:51:26 +00:00
Mikhail Maltsev	d1cc2e1543	[ARM] Add MVE horizontal accumulation instructions This is the family of vector instructions that combine all the lanes in their input vector(s), and output a value in one or two GPRs. Differential Revision: https://reviews.llvm.org/D62670 llvm-svn: 363403	2019-06-14 14:31:13 +00:00
Eugene Leviant	c74910b842	Fix failing test on ARM buildbot r363261 caused test failure on 32-bit ARM buildbot, because of unsigned integer overflow. This patch fixes it changing offset type from size_t to uint64_t. llvm-svn: 363393	2019-06-14 13:45:21 +00:00
Matt Arsenault	731a81598e	RegBankSelect: Remove checks for invalid mappings Avoid a check for valid and a set of redundant asserts. The place InstructionMapping is constructed asserts all of the default fields are passed anyway for an invalid mapping, so don't overcomplicate this. llvm-svn: 363391	2019-06-14 13:42:40 +00:00
Matt Arsenault	5a86dbcf30	AMDGPU: Fix input chain when gluing copies to m0 I don't think this was causing any observable issues, but was making reading the DAG dump confusing. llvm-svn: 363389	2019-06-14 13:33:36 +00:00
Andrea Di Biagio	6b78e4d0a4	[MCA] Ignore invalid processor resource writes of zero cycles. NFCI In debug mode, the tool also raises a warning and prints out a message which helps identify the problematic MCWriteProcResEntry from the scheduling class. This message would have been useful to have when triaging PR42282. llvm-svn: 363387	2019-06-14 13:31:21 +00:00
Matt Arsenault	3062e87a1e	Fix not calling TargetCustom PSVs printer If the enum value was greater than the starting target custom value, the custom printer wasn't called. llvm-svn: 363386	2019-06-14 13:26:34 +00:00
Matt Arsenault	d3c84e6719	AMDGPU: Refactor to prepare for manually selecting more intrinsics llvm-svn: 363385	2019-06-14 13:26:32 +00:00
Matt Arsenault	74d67c2086	AMDGPU: Fix printing trailing whitespace after s_endpgm llvm-svn: 363384	2019-06-14 13:26:29 +00:00
Matt Arsenault	642f39c93e	AMDGPU: Fix missing const llvm-svn: 363383	2019-06-14 13:26:23 +00:00
Sjoerd Meijer	3058a62b90	[ARM] MVE VPT Block Pass Initial commit of a new pass to create vector predication blocks, called VPT blocks, that are supported by the Armv8.1-M MVE architecture. This is a first naive implementation. I.e., for 2 consecutive predicated instructions I1 and I2, for example, it will generate 2 VPT blocks: VPST I1 VPST I2 A more optimal implementation would obviously put instructions in the same VPT block when they are predicated on the same condition and when it is allowed to do this: VPTT I1 I2 We will address this optimisation with follow up patches when the groundwork is in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis that we need for the more optimal implementation. VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes cannot interact with each other. Instructions allowed in VPT blocks must be MVE instructions that are marked as VPT compatible. Differential Revision: https://reviews.llvm.org/D63247 llvm-svn: 363370	2019-06-14 11:46:05 +00:00
George Rimar	cfa1a62a4c	[yaml2obj] - Allow setting cutom Flags for implicit sections. With this patch we get ability to set any flags we want for implicit sections defined in YAML. Differential revision: https://reviews.llvm.org/D63136 llvm-svn: 363367	2019-06-14 11:01:14 +00:00
Sam Parker	0cf9639a9c	[SCEV] Pass NoWrapFlags when expanding an AddExpr InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363364	2019-06-14 09:19:41 +00:00
Eric Christopher	5e83d8fff4	Move commentary on opcode translation for code16 mov instructions to segment registers closer to the segment register check for when we add further optimizations. llvm-svn: 363355	2019-06-14 04:51:55 +00:00
David Blaikie	4129e3e0f8	DebugInfo: Include enumerators in pubnames This is consistent with GCC's behavior (which is the defacto standard for pubnames). Though I find the presence of enumerators from enum classes to be a bit confusing, possibly a bug on GCC's end (since they can't be named unqualified, unlike the other names - and names nested in classes don't go in pubnames, for instance - presumably because one must name the class first & that's enough to limit the scope of the search) llvm-svn: 363349	2019-06-14 01:58:56 +00:00
Stanislav Mekhanoshin	c43e67bfff	[AMDGPU] gfx1011/gfx1012 targets Differential Revision: https://reviews.llvm.org/D63307 llvm-svn: 363344	2019-06-14 00:33:31 +00:00
Francis Visoiu Mistrih	e4147ea1ef	Revert "[Remarks] Refactor optimization remarks setup" This reverts commit `6e6e3af55b`. This breaks greendragon. llvm-svn: 363343	2019-06-14 00:05:56 +00:00
Vedant Kumar	1e4882c890	[Coverage] Speculative fix for r363325 for an older compiler It looks like an older version of gcc can't figure out that it needs to move a unique_ptr while implicitly constructing an Expected object. llvm-svn: 363342	2019-06-14 00:03:22 +00:00
Stanislav Mekhanoshin	68a2fef9ae	[AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32 Differential Revision: https://reviews.llvm.org/D63301 llvm-svn: 363339	2019-06-13 23:47:36 +00:00
Amy Huang	49275272e3	Use fully qualified name when printing S_CONSTANT records Summary: Before it was using the fully qualified name only for static data members. Now it does for all variable names to match MSVC. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63012 llvm-svn: 363335	2019-06-13 22:53:43 +00:00
Peter Collingbourne	0feb6e52f1	Symbolize: Remove dead code. NFCI. The only caller of SymbolizableObjectFile::create passes a non-null DebugInfoContext and asserts that they do so. Move the assert into SymbolizableObjectFile::create and remove null checks. Differential Revision: https://reviews.llvm.org/D63298 llvm-svn: 363334	2019-06-13 22:49:34 +00:00
Amara Emerson	fb0a40f064	[GlobalISel][IRTranslator] Add debug loc with line 0 to constants emitted into the entry block. Constants, including G_GLOBAL_VALUE, are all emitted into the entry block which lets us use the vreg def assuming it dominates all other users. However, it can cause jumpy debug behaviour since the DebugLoc attached to these MIs are from a user instruction that could be in a different block. Fixes PR40887. Differential Revision: https://reviews.llvm.org/D63286 llvm-svn: 363331	2019-06-13 22:15:35 +00:00
Craig Topper	cf34a2bd5d	[X86Disassembler] Unify the EVEX and VEX code in emitContextTable. Merge the ATTR_VEXL/ATTR_EVEXL bits. NFCI Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes. Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately. llvm-svn: 363330	2019-06-13 22:15:25 +00:00
Jinsong Ji	1c88445840	[MachinePiepliner] Don't check boundary node in checkValidNodeOrder This was exposed by PowerPC target enablement. In ScheduleDAG, if we haven't seen any uses in this scheduling region, we will create a dependence edge to ExitSU to model the live-out latency. This is required for vreg defs with no in-region use, and prefetches with no vreg def. When we build NodeOrder in Scheduler, we ignore these boundary nodes. However, when we check Succs in checkValidNodeOrder, we did not skip them, so we still assume all the nodes have been sorted and in order in Indices array. So when we call lower_bound() for ExitSU, it will return Indices.end(), causing memory issues in following Node access. Differential Revision: https://reviews.llvm.org/D63282 llvm-svn: 363329	2019-06-13 21:51:12 +00:00
Francis Visoiu Mistrih	6e6e3af55b	[Remarks] Refactor optimization remarks setup * Add a common function to setup opt-remarks * Rename common options to the same names * Add error types to distinguish between file errors and regex errors llvm-svn: 363328	2019-06-13 21:46:57 +00:00
Vedant Kumar	901d04fc6d	[Coverage] Load code coverage data from archives Support loading code coverage data from regular archives, thin archives, and from MachO universal binaries which contain archives. Testing: check-llvm, check-profile (with {A,UB}San enabled) rdar://51538999 Differential Revision: https://reviews.llvm.org/D63232 llvm-svn: 363325	2019-06-13 20:48:57 +00:00
Stanislav Mekhanoshin	ccecd22db9	[AMDGPU] gfx1010 AMDGPUSetCCOp definition It was missing from D63293 and breaks in a debug tablegen w/o this part. llvm-svn: 363323	2019-06-13 20:23:02 +00:00
Lang Hames	2f8c6f9362	[ORC] Rename MaterializationResponsibility resolve and emit methods to notifyResolved/notifyEmitted. The 'notify' prefix better describes what these methods do: they update the JIT symbol states and notify any pending queries that the 'resolved' and 'emitted' states have been reached (rather than actually performing the resolution or emission themselves). Since new states are going to be introduced in the near future (to track symbol registration/initialization) it's worth changing the convention pre-emptively to avoid further confusion. llvm-svn: 363322	2019-06-13 20:11:23 +00:00
Nikita Popov	ad81d427ca	[LangRef] Clarify poison semantics I find the current documentation of poison somewhat confusing, mainly because its use of "undefined behavior" doesn't seem to align with our usual interpretation (of immediate UB). Especially the sentence "any instruction that has a dependence on a poison value has undefined behavior" is very confusing. Clarify poison semantics by: * Replacing the introductory paragraph with the standard rationale for having poison values. * Spelling out that instructions depending on poison return poison. * Spelling out how we go from a poison value to immediate undefined behavior and give the two examples we currently use in ValueTracking. * Spelling out that side effects depending on poison are UB. Differential Revision: https://reviews.llvm.org/D63044 llvm-svn: 363320	2019-06-13 19:45:36 +00:00
Philip Reames	038e01dc9a	Add a clarifying comment about branching on poison I recently got this wrong (again), and I'm sure I'm not the only one. Put a comment in the logical place someone would look to "fix" the obvious "missed optimization" which arrises based on the common misunderstanding. Hopefully, this will save others time. :) llvm-svn: 363318	2019-06-13 19:27:56 +00:00
Stanislav Mekhanoshin	8bcc9bb595	[AMDGPU] gfx1010 base changes for wave32 Differential Revision: https://reviews.llvm.org/D63293 llvm-svn: 363299	2019-06-13 19:18:29 +00:00
Philip Reames	c37be29634	[LFTR] Rename variable to minimize confusion [NFC] As pointed out by Nikita in D62625, BackedgeTakenCount is generally used to refer to the backedge taken count of the loop. A conditional backedge taken count - one which only applies if a particular exit is taken - is called a ExitCount in SCEV code, so be consistent here. llvm-svn: 363293	2019-06-13 18:40:15 +00:00
Philip Reames	42a3fc133d	[LFTR] Stylistic cleanup as suggested in last review comment of D62939 [NFC] llvm-svn: 363292	2019-06-13 18:32:55 +00:00
Philip Reames	eb88badff9	Fix a bug w/inbounds invalidation in LFTR This contains fixes for two cases where we might invalidate inbounds and leave it stale in the IR (a miscompile). Case 1 is when switching to an IV with no dynamically live uses, and case 2 is when doing pre-to-post conversion on the same pointer type IV. The basic scheme used is to prove that using the given IV (pre or post increment forms) would have to already trigger UB on the path to the test we're modifying. As such, our potential UB triggering use does not change the semantics of the original program. As was pointed out in the review thread by Nikita, this is defending against a separate issue from the hasConcreteDef case. This is about poison, that's about undef. Unfortunately, the two are different, see Nikita's comment for a fuller explanation, he explains it well. (Note: I'm going to address Nikita's last style comment in a separate commit just to minimize chance of subtle bugs being introduced due to typos.) Differential Revision: https://reviews.llvm.org/D62939 llvm-svn: 363289	2019-06-13 18:23:13 +00:00
Leonard Chan	09f56b51ec	[clang][NewPM] Fix broken -O0 test from missing assumptions Add an AssumptionCache callback to the InlineFuntionInfo used for the AlwaysInlinerPass to match codegen of the AlwaysInlinerLegacyPass to generate llvm.assume. This fixes CodeGen/builtin-movdir.c when new PM is enabled by default. Differential Revision: https://reviews.llvm.org/D63170 llvm-svn: 363287	2019-06-13 18:18:40 +00:00
David Bolvansky	896ece41e4	[Codegen] Merge tail blocks with no successors after block placement Summary: I found the following case having tail blocks with no successors merging opportunities after block placement. Before block placement: bb0: ... bne a0, 0, bb2: bb1: mv a0, 1 ret bb2: ... bb3: mv a0, 1 ret bb4: mv a0, -1 ret The conditional branch bne in bb0 is opposite to beq. After block placement: bb0: ... beq a0, 0, bb1 bb2: ... bb4: mv a0, -1 ret bb1: mv a0, 1 ret bb3: mv a0, 1 ret After block placement, that appears new tail merging opportunity, bb1 and bb3 can be merged as one block. So the conditional constraint for merging tail blocks with no successors should be removed. In my experiment for RISC-V, it decreases code size. Author of original patch: Jim Lin Reviewers: haicheng, aheejin, craig.topper, rnk, RKSimon, Jim, dmgreen Reviewed By: Jim, dmgreen Subscribers: xbolva00, dschuff, javed.absar, sbc100, jgravelle-google, aheejin, kito-cheng, dmgreen, PkmX, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54411 llvm-svn: 363284	2019-06-13 18:11:32 +00:00
Stanislav Mekhanoshin	2bda177da0	[AMDGPU] ImmArg and SourceOfDivergence for permlane/dpp Added missing ImmArg and SourceOfDivergence to the crosslane intrinsics. Differential Revision: https://reviews.llvm.org/D63216 llvm-svn: 363276	2019-06-13 16:31:51 +00:00
Joseph Tremoulet	3bc6e2a7aa	[EarlyCSE] Ensure equal keys have the same hash value Summary: The logic in EarlyCSE that looks through 'not' operations in the predicate recognizes e.g. that `select (not (cmp sgt X, Y)), X, Y` is equivalent to `select (cmp sgt X, Y), Y, X`. Without this change, however, only the latter is recognized as a form of `smin X, Y`, so the two expressions receive different hash codes. This leads to missed optimization opportunities when the quadratic probing for the two hashes doesn't happen to collide, and assertion failures when probing doesn't collide on insertion but does collide on a subsequent table grow operation. This change inverts the order of some of the pattern matching, checking first for the optional `not` and then for the min/max/abs patterns, so that e.g. both expressions above are recognized as a form of `smin X, Y`. It also adds an assertion to isEqual verifying that it implies equal hash codes; this fires when there's a collision during insertion, not just grow, and so will make it easier to notice if these functions fall out of sync again. A new flag --earlycse-debug-hash is added which can be used when changing the hash function; it forces hash collisions so that any pair of values inserted which compare as equal but hash differently will be caught by the isEqual assertion. Reviewers: spatel, nikic Reviewed By: spatel, nikic Subscribers: lebedev.ri, arsenm, craig.topper, efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62644 llvm-svn: 363274	2019-06-13 15:24:11 +00:00
Simon Pilgrim	757a2f13fd	[X86] Use fresh MemOps when emitting VAARG64 Previously it copied over MachineMemOperands verbatim which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on. Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D62726 llvm-svn: 363268	2019-06-13 14:05:37 +00:00
David Stenberg	1278a19282	Remove ';' after namespace's closing bracket [NFC] llvm-svn: 363267	2019-06-13 14:02:55 +00:00
Diogo N. Sampaio	0be2d25ecc	[FIX] Forces shrink wrapping to consider any memory access as aliasing with the stack Summary: Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472 The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed. Taking an exception can cause the stack to be corrupted. As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access the stack. Reviewers: dmgreen, qcolombet Reviewed By: qcolombet Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg Tags: #llvm Differential Revision: https://reviews.llvm.org/D63152 llvm-svn: 363265	2019-06-13 13:56:19 +00:00
Eugene Leviant	407c8f1f49	Extra error checking to ARMAttributeParser The patch checks for subsection length as discussed in D63191 llvm-svn: 363260	2019-06-13 13:25:20 +00:00
Jeremy Morse	d2cd9c23b4	[NFC] Sink a function call into LiveDebugValues::process This was requested in D62904, which I successfully missed. This is just a refactor and shouldn't change any behaviour. llvm-svn: 363259	2019-06-13 13:11:57 +00:00
Simon Tatham	286e1d2c2d	[ARM] Set up infrastructure for MVE vector instructions. This commit prepares the way to start adding the main collection of MVE instructions, which operate on the 128-bit vector registers. The most obvious thing that's needed, and the simplest, is to add the MQPR register class, which is like the existing QPR except that it has fewer registers in it. The more complicated part: MVE defines a system of vector predication, in which instructions operating on 128-bit vector registers can be constrained to operate on only a subset of the lanes, using a system of prefix instructions similar to the existing Thumb IT, in that you have one prefix instruction which designates up to 4 following instructions as subject to predication, and within that sequence, the predicate can be inverted by means of T/E suffixes ('Then' / 'Else'). To support instructions of this type, we've added two new Tablegen classes `vpred_n` and `vpred_r` for standard clusters of MC operands to add to a predicated instruction. Both include a flag indicating how the instruction is predicated at all (options are T, E and 'not predicated'), and an input register field for the register controlling the set of active lanes. They differ from each other in that `vpred_r` also includes an input operand for the previous value of the output register, for instructions that leave inactive lanes unchanged. `vpred_n` lacks that extra operand; it will be used for instructions that don't preserve inactive lanes in their output register (either because inactive lanes are zeroed, as the MVE load instructions do, or because the output register isn't a vector at all). This commit also adds the family of prefix instructions themselves (VPT / VPST), and all the machinery needed to work with them in assembly and disassembly (e.g. generating the 't' and 'e' mnemonic suffixes on disassembled instructions within a predicated block) I've added a couple of demo instructions that derive from the new Tablegen base classes and use those two operand clusters. The bulk of the vector instructions will come in followup commits small enough to be manageable. (One exception is that I've added the full version of `isMnemonicVPTPredicable` in the AsmParser, because it seemed pointless to carefully split it up.) Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62669 llvm-svn: 363258	2019-06-13 13:11:13 +00:00
Simon Pilgrim	6b56ad164c	[CodeGen] Add getMachineMemOperand + MachineMemOperand::Flags allocator helper wrapper. NFCI. Pre-commit for D62726 on behalf of @luke (Luke Lau) llvm-svn: 363257	2019-06-13 12:58:55 +00:00
Jeremy Morse	bf2b2f08b0	[DebugInfo] Honour variable fragments in LiveDebugValues This patch makes the LiveDebugValues pass consider fragments when propagating DBG_VALUE insts between blocks, fixing PR41979. Fragment info for a variable location is added to the open-ranges key, which allows distinct fragments to be tracked separately. To handle overlapping fragments things become slightly funkier. To avoid excessive searching for overlaps in the data-flow part of LiveDebugValues, this patch: * Pre-computes pairings of fragments that overlap, for each DILocalVariable * During data-flow, whenever something happens that causes an open range to be terminated (via erase), any fragments pre-determined to overlap are also terminated. The effect of which is that when encountering a DBG_VALUE fragment that overlaps others, the overlapped fragments do not get propagated to other blocks. We still rely on later location-list building to correctly handle overlapping fragments within blocks. It's unclear whether a mixture of DBG_VALUEs with and without fragmented expressions are legitimate. To avoid suprises, this patch interprets a DBG_VALUE with no fragment as overlapping any DBG_VALUE _with_ a fragment. Differential Revision: https://reviews.llvm.org/D62904 llvm-svn: 363256	2019-06-13 12:51:57 +00:00
Dmitry Preobrazhensky	1fca3b1972	[AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D61125 llvm-svn: 363255	2019-06-13 12:46:37 +00:00
Eugene Leviant	b00dbcbb43	[ThinLTO][Bitcode] Add 'entrycount' to FS_COMBINED_PROFILE. NFC Differential revision: https://reviews.llvm.org/D63078 llvm-svn: 363254	2019-06-13 12:33:26 +00:00
Simon Pilgrim	0baf136a4d	[X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases. Based on fuzz test from @craig.topper llvm-svn: 363251	2019-06-13 11:26:21 +00:00
Nikola Prica	076ae0d2e2	[DebugInfo] Move Value struct out of DebugLocEntry as DbgValueLoc (NFC) Since the DebugLocEntry::Value is used as part of DwarfDebug and DebugLocEntry make it as the separate class. Reviewers: aprantl, dstenb Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D63213 llvm-svn: 363246	2019-06-13 10:23:26 +00:00
Jeremy Morse	181bf0cefb	[DebugInfo] Use FrameDestroy to extend stack locations to end-of-function We aim to ignore changes in variable locations during the prologue and epilogue of functions, to avoid using space documenting location changes that aren't visible. However in D61940 / r362951 this got ripped out as the previous implementation was unsound. Instead, use the FrameDestroy flag to identify when we're in the epilogue of a function, and ignore variable location changes accordingly. This fits in with existing code that examines the FrameSetup flag. Some variable locations get shuffled in modified tests as they now cover greater ranges, which is what would be expected. Some additional single-location variables are generated too. Two tests are un-xfailed, they were only xfailed due to r362951 deleting functionality they depended on. Apparently some out-of-tree backends don't accurately maintain FrameDestroy flags -- if you're an out-of-tree maintainer and see changes in variable locations disappear due to a faulty FrameDestroy flag, it's safe to back this change out. The impact is just slightly more debug info than necessary. Differential Revision: https://reviews.llvm.org/D62314 llvm-svn: 363245	2019-06-13 10:03:17 +00:00
Simon Tatham	848d3d0d2c	[ARM] Refactor handling of IT mask operands. During assembly, the mask operand to an IT instruction (storing the sequence of T/E for 'Then' and 'Else') is parsed out of the mnemonic into a representation that encodes 'Then' and 'Else' in the same way regardless of the condition code. At some point during encoding it has to be converted into the instruction encoding used in the architecture, in which the mask encodes a sequence of replacement low-order bits for the condition code, so that which bit value means 'then' and which 'else' depends on whether the original condition code had its low bit set. Previously, that transformation was done by processInstruction(), half way through assembly. So an MCOperand storing an IT mask would sometimes store it in one format, and sometimes in the other, depending on where in the assembly pipeline you were. You can see this in diagnostics from `llvm-mc -debug -triple=thumbv8a -show-inst`, for example: if you give it an instruction such as `itete eq`, you'd see an `<MCOperand Imm:5>` in a diagnostic become `<MCOperand Imm:11>` in the final output. Having the same data structure store values with time-dependent semantics is confusing already, and it will get more confusing when we introduce the MVE VPT instruction which reuses the Then/Else bitmask idea in a different context. So I'm refactoring: now, all `ARMOperand` and `MCOperand` representations of an IT mask work exactly the same way, namely, 0 means 'Then' and 1 means 'Else', regardless of what original predicate is being referred to. The architectural encoding of IT that depends on the original condition is now constructed at the point when we turn the `MCOperand` into the final instruction bit pattern, and decoded similarly in the disassembler. The previous condition-independent parse-time format used 0 for Else and 1 for Then. I've taken the opportunity to flip the sense of it while I'm changing all of this anyway, because it seems to me more natural to use 0 for 'leave the starting condition unchanged' and 1 for 'invert it', as if those bits were an XOR mask. Reviewers: ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63219 llvm-svn: 363244	2019-06-13 10:01:52 +00:00
Sander de Smalen	51c2fa0e2a	Improve reduction intrinsics by overloading result value. This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240	2019-06-13 09:37:38 +00:00
Sam Parker	179e0fa881	[NFC] Simplify Call query Use getIntrinsicID() directly from IntrinsicInst. llvm-svn: 363235	2019-06-13 08:32:56 +00:00
Sam Parker	9d28473a35	[ARM][TTI] Scan for existing loop intrinsics TTI should report that it's not profitable to generate a hardware loop if it, or one of its child loops, has already been converted. Differential Revision: https://reviews.llvm.org/D63212 llvm-svn: 363234	2019-06-13 08:28:46 +00:00
Sander de Smalen	7957fc6547	[IntrinsicEmitter] Extend argument overloading with forward references. Extend the mechanism to overload intrinsic arguments by using either backward or forward references to the overloadable arguments. In for example: def int_something : Intrinsic<[LLVMPointerToElt<0>], [llvm_anyvector_ty], []>; LLVMPointerToElt<0> is a forward reference to the overloadable operand of type 'llvm_anyvector_ty' and would allow intrinsics such as: declare i32* @llvm.something.v4i32(<4 x i32>); declare i64* @llvm.something.v2i64(<2 x i64>); where the result pointer type is deduced from the element type of the first argument. If the returned pointer is not a pointer to the element type, LLVM will give an error: Intrinsic has incorrect return type! i64* (<4 x i32>)* @llvm.something.v4i32 Reviewers: RKSimon, arsenm, rnk, greened Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62995 llvm-svn: 363233	2019-06-13 08:19:33 +00:00
Shawn Landden	8b142bcc3f	[SimplifyCFG] reverting preliminary Switch patches again This reverts 363226 and 363227, both NFC intended I swear I fixed the test case that is failing, and ran the tests, but I will look into it again. llvm-svn: 363229	2019-06-13 05:26:17 +00:00
Shawn Landden	636220e83c	[SimpligyCFG] NFC intended, remove GCD that was only used for powers of two and replace with an equilivent countTrailingZeros. GCD is much more expensive than this, with repeated division. This depends on D60823 Differential Revision: https://reviews.llvm.org/D61151 llvm-svn: 363227	2019-06-13 05:01:44 +00:00
Tom Stellard	f335672218	X86: Clean up pass initialization Summary: - Remove redundant initializations from pass constructors that were already being initialized by LLVMInitializeX86Target(). - Add initialization function for the FPS pass. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63218 llvm-svn: 363221	2019-06-13 02:09:32 +00:00
David L. Jones	c73fadaa84	Revert r361811: 'Re-commit r357452 (take 2): "SimplifyCFG SinkCommonCodeFromPredecessors ...' We have observed some failures with internal builds with this revision. - Performance regressions: - llvm's SingleSource/Misc evalloop shows performance regressions (although these may be red herrings). - Benchmarks for Abseil's SwissTable. - Correctness: - Failures for particular libicu tests when building the Google AppEngine SDK (for PHP). hwennborg has already been notified, and is aware of reproducer failures. llvm-svn: 363220	2019-06-13 02:04:45 +00:00
Philip Reames	ae2581cef3	[IndVars] Extend diagnostic -replexitval flag w/ability to bypass hard use hueristic Note: This does mean that "always" is now more powerful than it was. llvm-svn: 363196	2019-06-12 19:52:05 +00:00
Stanislav Mekhanoshin	245b5ba344	[AMDGPU] gfx1010 dpp16 and dpp8 Differential Revision: https://reviews.llvm.org/D63203 llvm-svn: 363186	2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin	5f581c9f08	[AMDGPU] gfx1010 premlane instructions Differential Revision: https://reviews.llvm.org/D63202 llvm-svn: 363185	2019-06-12 17:52:51 +00:00
Simon Atanasyan	efc0d1a298	[Mips] Add s.d instruction alias for Mips1 Add support for s.d instruction for Mips1 which expands into two swc1 instructions. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D63199 llvm-svn: 363184	2019-06-12 17:52:05 +00:00
Philip Reames	e51c3d8b82	[SCEV] Teach computeSCEVAtScope benefit from one-input Phi. PR39673 SCEV does not propagate arguments through one-input Phis so as to make it easy for the SCEV expander (and related code) to preserve LCSSA. It's not entirely clear this restriction is neccessary, but for the moment it exists. For this reason, we don't analyze single-entry phi inputs. However it is possible that when an this input leaves the loop through LCSSA Phi, it is a provable constant. Missing that results in an order of optimization issue in loop exit value rewriting where we miss some oppurtunities based on order in which we visit sibling loops. This patch teaches computeSCEVAtScope about this case. We can generalize it later, but so far we can only replace LCSSA Phis with their constant loop-exiting values. We should probably also add similiar logic directly in the SCEV construction path itself. Patch by: mkazantsev (with revised commit message by me) Differential Revision: https://reviews.llvm.org/D58113 llvm-svn: 363180	2019-06-12 17:21:47 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Simon Pilgrim	5b0e0dd709	[X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c) Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1). An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit. llvm-svn: 363178	2019-06-12 16:38:20 +00:00
Matt Arsenault	f29366b1f5	StackProtector: Use PointerMayBeCaptured This was using its own, outdated list of possible captures. This was at minimum not catching cmpxchg and addrspacecast captures. One change is now any volatile access is treated as capturing. The test coverage for this pass is quite inadequate, but this required removing volatile in the lifetime capture test. Also fixes some infrastructure issues to allow running just the IR pass. Fixes bug 42238. llvm-svn: 363169	2019-06-12 14:23:33 +00:00
Mikael Holmen	030df51e27	[ARM] Fix compiler warning Without this fix clang 3.6 complains with: ../lib/Target/ARM/ARMAsmPrinter.cpp:1473:18: error: variable 'BranchTarget' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] } else if (MI->getOperand(1).isSymbol()) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1479:22: note: uninitialized use occurs here MCInst.addExpr(BranchTarget); ^~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1473:14: note: remove the 'if' if its condition is always true } else if (MI->getOperand(1).isSymbol()) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../lib/Target/ARM/ARMAsmPrinter.cpp:1465:33: note: initialize the variable 'BranchTarget' to silence this warning const MCExpr *BranchTarget; ^ = nullptr 1 error generated. Discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190610/661417.html llvm-svn: 363166	2019-06-12 14:19:22 +00:00
Matt Arsenault	aa6bdf9dcd	LoopVersioning: Respect convergent This changes the standalone pass only. Arguably the utility class itself should assert there are no convergent calls. However, a target pass with additional context may still be able to version a loop if all of the dynamic conditions are sufficiently uniform. llvm-svn: 363165	2019-06-12 14:05:58 +00:00
Anton Afanasyev	339b39b773	[MIR] Skip hoisting to basic block which may throw exception or return Summary: Fix hoisting to basic block which are not legal for hoisting cause it can be terminated by exception or it is return block. Reviewers: john.brawn, RKSimon, MatzeB Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63148 llvm-svn: 363164	2019-06-12 13:51:44 +00:00
Matt Arsenault	86325be3d7	LoopLoadElim: Respect convergent llvm-svn: 363162	2019-06-12 13:50:47 +00:00
Matt Arsenault	2466ba97bc	LoopDistribute/LAA: Respect convergent This case is slightly tricky, because loop distribution should be allowed in some cases, and not others. As long as runtime dependency checks don't need to be introduced, this should be OK. This is further complicated by the fact that LoopDistribute partially ignores if LAA says that vectorization is safe, and then does its own runtime pointer legality checks. Note this pass still does not handle noduplicate correctly, as this should always be forbidden with it. I'm not going to bother trying to fix it, as it would require more effort and I think noduplicate should be removed. https://reviews.llvm.org/D62607 llvm-svn: 363160	2019-06-12 13:34:19 +00:00
Nico Weber	8bbdea447e	Fix a Wunused-lambda-capture warning. The capture was added in the first commit of https://reviews.llvm.org/D61934 when it was used. In the reland, the use was removed but the capture wasn't removed. llvm-svn: 363155	2019-06-12 12:46:46 +00:00
Sam Parker	757ac02dc8	[ARM] Implement TTI::isHardwareLoopProfitable Implement the backend target hook to drive the HardwareLoops pass. The low-overhead branch extension for Arm M-class cores is flexible enough that we don't have to ensure correctness at this point, except checking that the loop counter variable can be stored in LR - a 32-bit register. For it to be profitable, we want to avoid loops that contain function calls, or any other instruction that alters the PC. This implementation uses TargetLoweringInfo, to query type and operation actions, looks at intrinsic calls and also performs some manual checks for remainder/division and FP operations. I think this should be a good base to start and extra details can be filled out later. Differential Revision: https://reviews.llvm.org/D62907 llvm-svn: 363149	2019-06-12 12:00:42 +00:00
Sam Parker	61de6a4e9c	[NFC][SCEV] Add NoWrapFlag argument to InsertBinOp 'Use wrap flags in InsertBinop' (rL362687) was reverted due to miscompiles. This patch introduces the previous change to pass no-wrap flags but now only FlagAnyWrap is passed. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363147	2019-06-12 11:53:55 +00:00
Nico Weber	1dc2123d64	Share /machine: handling code with llvm-cvtres too r363016 let lld-link and llvm-lib share the /machine: parsing code. This lets llvm-cvtres share it as well. Making llvm-cvtres depend on llvm-lib seemed a bit strange (it doesn't need llvm-lib's dependencies on BinaryFormat and BitReader) and I couldn't find a good place to put this code. Since it's just a few lines, put it in lib/Object for now. Differential Revision: https://reviews.llvm.org/D63120 llvm-svn: 363144	2019-06-12 11:32:43 +00:00
Simon Pilgrim	ca39de7199	[XCore] CombineSTORE - Use allowsMemoryAccess wrapper. NFCI. Noticed in D63075 - there was a allowsMisalignedMemoryAccesses call to check for unaligned loads and a check for aligned legal type loads - which is exactly what allowsMemoryAccess does. llvm-svn: 363141	2019-06-12 11:08:29 +00:00
Ben Dunbobbin	564d248ec2	[ThinLTO]LTO]Legacy] Fix dependent libraries support by adding querying of the IRSymtab Dependent libraries support for the legacy api was committed in a broken state (see: https://reviews.llvm.org/D60274). This was missed due to the painful nature of having to integrate the changes into a linker in order to test. This change implements support for dependent libraries in the legacy LTO api: - I have removed the current api function, which returns a single string, and added functions to access each dependent library specifier individually. - To reduce the testing pain, I have made the api functions as thin as possible to maximize coverage from llvm-lto. - When doing ThinLTO the system linker will load the modules lazily when scanning the input files. Unfortunately, when modules are lazily loaded there is no access to module level named metadata. To fix this I have added api functions that allow querying the IRSymtab for the dependent libraries. I hope to expand the api in the future so that, eventually, all the information needed by a client linker during scan can be retrieved from the IRSymtab. Differential Revision: https://reviews.llvm.org/D62935 llvm-svn: 363140	2019-06-12 11:07:56 +00:00
Simon Pilgrim	32c1e73603	[XCore] LowerLOAD/LowerSTORE - Use allowsMemoryAccess wrapper. NFCI. Noticed in D63075 - there was a allowsMisalignedMemoryAccesses call to check for unaligned loads and a check for aligned legal type loads - which is exactly what allowsMemoryAccess does. llvm-svn: 363137	2019-06-12 10:46:50 +00:00
Orlando Cazalet-Hyams	a947156396	Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion" This reverts commit `1a0f7a2077`. See phabricator thread for D60831. llvm-svn: 363132	2019-06-12 08:34:51 +00:00
Sjoerd Meijer	de73404b8c	[AArch64] Merge globals when optimising for size Extern global merging is good for code-size. There's definitely potential for performance too, but there's one regression in a benchmark that needs investigating, so that's why we enable it only when we optimise for size for now. Patch by Ramakota Reddy and Sjoerd Meijer. Differential Revision: https://reviews.llvm.org/D61947 llvm-svn: 363130	2019-06-12 08:28:35 +00:00
Craig Topper	ed4cd44870	[X86] Add VCMPSSZrr_Intk and VCMPSDZrr_Intk to isNonFoldablePartialRegisterLoad. The non-masked versions are already in there. I'm having some trouble coming up with a way to test this right now. Most load folding should happen during isel so I'm not sure how to get peephole pass to do it. llvm-svn: 363125	2019-06-12 06:29:53 +00:00
Hsiangkai Wang	04ddf39b44	[RISCV] Add CFI directives for RISCV prologue/epilog. In order to generate correct debug frame information, it needs to generate CFI information in prologue and epilog. Differential Revision: https://reviews.llvm.org/D61773 llvm-svn: 363120	2019-06-12 03:04:22 +00:00
Hsiangkai Wang	93be25b580	[NFC] Correct comments in RegisterCoalescer. Differential Revision: https://reviews.llvm.org/D63124 llvm-svn: 363119	2019-06-12 02:58:04 +00:00
Philip Reames	02f0b379f5	Fix a bug in getSCEVAtScope w.r.t. non-canonical loops The issue is that if we have a loop with multiple predecessors outside the loop, the code was expecting to merge them and only return if equal, but instead returned the first one seen. I have no idea if this actually tripped anywhere. I noticed it by accident when reading the code and have no idea how to go about constructing a test case. llvm-svn: 363112	2019-06-11 23:21:24 +00:00
Philip Reames	082cd30327	Generalize icmp matching in IndVars' eliminateTrunc We were only matching RHS being a loop invariant value, not the inverse. Since there's nothing which appears to canonicalize loop invariant values to RHS, this means we missed cases. Differential Revision: https://reviews.llvm.org/D63112 llvm-svn: 363108	2019-06-11 22:43:25 +00:00
Sanjay Patel	40e3bdf876	[Analysis] add isSplatValue() for vectors in IR We have the related getSplatValue() already in IR (see code just above the proposed addition). But sometimes we only need to know that the value is a splat rather than capture the splatted scalar value. Also, we have an isSplatValue() function already in SDAG. Motivation - recent bugs that would potentially benefit from improved splat analysis in IR: https://bugs.llvm.org/show_bug.cgi?id=37428 https://bugs.llvm.org/show_bug.cgi?id=42174 Differential Revision: https://reviews.llvm.org/D63138 llvm-svn: 363106	2019-06-11 22:25:18 +00:00
Amara Emerson	d133c15925	[GlobalISel] Add a G_JUMP_TABLE opcode. This opcode generates a pointer to the address of the jump table specified by the source operand, which is a jump table index. It will be used in conjunction with an upcoming G_BRJT opcode to support jump table codegen with GlobalISel. Differential Revision: https://reviews.llvm.org/D63111 llvm-svn: 363096	2019-06-11 19:58:06 +00:00
Alina Sbirlea	cb4ed8a7bc	[MemorySSA] When applying updates, clean unnecessary Phis. Summary: After applying a set of insert updates, there may be trivial Phis left over. Clean them up. Reviewers: george.burgess.iv Subscribers: jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63033 llvm-svn: 363094	2019-06-11 19:09:34 +00:00
Alina Sbirlea	3cef1f7d64	Only passes that preserve MemorySSA must mark it as preserved. Summary: The method `getLoopPassPreservedAnalyses` should not mark MemorySSA as preserved, because it's being called in a lot of passes that do not preserve MemorySSA. Instead, mark the MemorySSA analysis as preserved by each pass that does preserve it. These changes only affect the new pass mananger. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62536 llvm-svn: 363091	2019-06-11 18:27:49 +00:00
Amy Huang	9970817c57	Deduplicate S_CONSTANTs in LLD. Summary: Deduplicate S_CONSTANTS when linking, if they have the same value. Reviewers: rnk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63151 llvm-svn: 363089	2019-06-11 18:02:39 +00:00
Jinsong Ji	ef2d6d99c0	[PowerPC] Enable MachinePipeliner for P9 with -ppc-enable-pipeliner Implement necessary target hooks to enable MachinePipeliner for P9 only. The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9. Differential Revision: https://reviews.llvm.org/D62164 llvm-svn: 363085	2019-06-11 17:40:39 +00:00
Jonas Devlieghere	a6fe345ac9	[Path] Set FD to -1 in moved-from TempFile When moving a temp file, explicitly set the file descriptor to -1 so we can never accidentally close the moved-from TempFile. Differential revision: https://reviews.llvm.org/D63087 llvm-svn: 363083	2019-06-11 16:42:42 +00:00
Cameron McInally	08200d6d26	[InstCombine] Handle -(X-Y) --> (Y-X) for unary fneg when NSZ Differential Revision: https://reviews.llvm.org/D62612 llvm-svn: 363082	2019-06-11 16:21:21 +00:00
Cameron McInally	796de11331	[InstCombine] Update fptrunc (fneg x)) -> (fneg (fptrunc x) for unary FNeg Differential Revision: https://reviews.llvm.org/D62629 llvm-svn: 363080	2019-06-11 15:45:41 +00:00
Nico Weber	af6bc65ddf	lld-link: Reject more than one resource .obj file Users are exepcted to pass all .res files to the linker, which then merges all the resource in all .res files into a tree structure and then converts the final tree structure to a .obj file with .rsrc$01 and .rsrc$02 sections and then links that. If the user instead passes several .obj files containing such resources, the correct thing to do would be to have custom code to merge the trees in the resource sections instead of doing normal section merging -- but link.exe rejects if multiple resource obj files are passed in with LNK4078, so let lld-link do that too instead of silently writing broken .rsrc sections in that case. The only real way to run into this is if users manually convert .res files to .obj files by running cvtres and then handing the resulting .obj files to lld-link instead, which in practice likely never happens. (lld-link is slightly stricter than link.exe now: If link.exe is passed one .obj file created by cvtres, and a .res file, for some reason it just emits a warning instead of an error and outputs strange looking data. lld-link now errors out on mixed input like this.) One way users could accidentally run into this is the following scenario: If a .res file is passed to lib.exe, then lib.exe calls cvtres.exe on the .res file before putting it in the output .lib. (llvm-lib currently doesn't do this.) link.exe's /wholearchive seems to only add obj files referenced from the static library index, but lld-link current really adds all files in the archive. So if lld-link /wholearchive is used with .lib files produced by lib.exe and .res files were among the files handed to lib.exe, we previously silently produced invalid output, but now we error out. link.exe's /wholearchive semantics on the other hand mean that it wouldn't load the resource object files from the .lib file at all. Since this scenario is probably still an unlikely corner case, the difference in behavior here seems fine -- and lld-link might have to change to use link.exe's /wholearchive semantics in the future anyways. Vaguely related to PR42180. Differential Revision: https://reviews.llvm.org/D63109 llvm-svn: 363078	2019-06-11 15:22:28 +00:00
Lewis Revill	a5240361dd	[RISCV] Add lowering of addressing sequences for PIC This patch allows lowering of PIC addresses by using PC-relative addressing for DSO-local symbols and accessing the address through the global offset table for non-DSO-local symbols. Differential Revision: https://reviews.llvm.org/D55303 llvm-svn: 363058	2019-06-11 12:57:47 +00:00
Lewis Revill	28a5cadb3a	[RISCV] Lower inline asm constraints I, J & K for RISC-V This validates and lowers arguments to inline asm nodes which have the constraints I, J & K, with the following semantics (equivalent to GCC): I: Any 12-bit signed immediate. J: Immediate integer zero only. K: Any 5-bit unsigned immediate. Differential Revision: https://reviews.llvm.org/D54093 llvm-svn: 363054	2019-06-11 12:42:13 +00:00
Mikhail Maltsev	7bd5c55cad	[ARM] First MVE instructions: scalar shifts. This introduces a new decoding table for MVE instructions, and starts by adding the family of scalar shift instructions that are part of the MVE architecture extension: saturating shifts within a single GPR, and long shifts across a pair of GPRs (both saturating and normal). Some of these shift instructions have only 3-bit register fields in the encoding, with the low bit fixed. So they can only address an odd or even numbered GPR (depending on the operand), and therefore I add two new register classes, GPREven and GPROdd. Differential Revision: https://reviews.llvm.org/D62668 Change-Id: Iad95d5f83d26aef70c674027a184a6b1e0098d33 llvm-svn: 363051	2019-06-11 12:04:32 +00:00
Nico Weber	dd6019526d	Let writeWindowsResourceCOFF() take a TimeStamp parameter For lld, pass in Config->Timestamp (which is set based on lld's /timestamp: and /Brepro flags). Since the writeWindowsResourceCOFF() data is only used in-memory by LLD and the obj's timestamp isn't used for anything in the output, this doesn't change behavior. For llvm-cvtres, add an optional /timestamp: parameter, and use the current behavior of calling time() if the parameter is not passed in. This doesn't really change observable behavior (unless someone passes /timestamp: to llvm-cvtres, which wasn't possible before), but it removes the last unqualified call to time() from llvm/lib, which seems like a good thing. Differential Revision: https://reviews.llvm.org/D63116 llvm-svn: 363050	2019-06-11 11:26:50 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Orlando Cazalet-Hyams	1a0f7a2077	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 363046	2019-06-11 10:37:20 +00:00
Simon Tatham	14241378d3	[ARM] Fix unused-variable warning in rL363039. The variable `OffsetMask` is currently only used in an assertion, so if assertions are compiled out and -Werror is enabled, it becomes a build failure. llvm-svn: 363043	2019-06-11 10:09:12 +00:00
Simon Pilgrim	287e78c82b	[DAGCombine] GetNegatedExpression - constant float vector support (PR42105) Add support for negation of constant build vectors. Differential Revision: https://reviews.llvm.org/D62963 llvm-svn: 363040	2019-06-11 09:44:33 +00:00
Simon Tatham	8c865cacda	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need a new addressing mode. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 363039	2019-06-11 09:29:18 +00:00
Sander de Smalen	cbeb563cfb	Change semantics of fadd/fmul vector reductions. This patch changes how LLVM handles the accumulator/start value in the reduction, by never ignoring it regardless of the presence of fast-math flags on callsites. This change introduces the following new intrinsics to replace the existing ones: llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul and adds functionality to auto-upgrade existing LLVM IR and bitcode. Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D60261 llvm-svn: 363035	2019-06-11 08:22:10 +00:00
Craig Topper	627d8168e7	[X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032	2019-06-11 04:30:53 +00:00
Tom Stellard	4b0b26199b	Revert CMake: Make most target symbols hidden by default This reverts r362990 (git commit `374571301d`) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028	2019-06-11 03:21:13 +00:00
Peter Collingbourne	e5bdedac9d	Symbolize: Make DWPName a symbolizer option instead of an argument to symbolize{,Inlined}Code. This makes the interface simpler and more consistent with the interface for .dSYM files and fixes a bug where llvm-symbolizer would not read the dwp if it was asked to symbolize data before symbolizing code. Differential Revision: https://reviews.llvm.org/D63114 llvm-svn: 363025	2019-06-11 02:32:27 +00:00
Matt Arsenault	c5830f5f05	AtomicExpand: Don't crash on non-0 alloca This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022	2019-06-11 01:35:07 +00:00
Matt Arsenault	383e72fcfe	AMDGPU: Expand < 32-bit atomics Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021	2019-06-11 01:35:00 +00:00
Nico Weber	b941fa8821	llvm-lib: Implement /machine: argument And share some code with lld-link. While here, also add a FIXME about PR42180 and merge r360150 to llvm-lib. Differential Revision: https://reviews.llvm.org/D63021 llvm-svn: 363016	2019-06-11 01:13:41 +00:00
Yi Kong	432f48fcd4	[AArch64] Add more CPUs to host detection Returns "cortex-a73" for 3rd and 4th gen Kryo; not precisely correct, but close enough. Differential Revision: https://reviews.llvm.org/D63099 llvm-svn: 363013	2019-06-11 00:05:36 +00:00
Puyan Lotfi	4d89462a1c	[MIR-Canon] Fixing non-determinism that was breaking bots (NFC). An earlier fix of a subtle iterator invalidation bug had uncovered a nondeterminism that was present in the MultiUsers bag. Problem was that MultiUsers was being looked up using pointers. This patch is an NFC change that numbers each multiuser and processes each in numbered order. This fixes the test failure on netbsd and will likely fix the green-dragon bot too. llvm-svn: 363012	2019-06-11 00:00:25 +00:00
Shoaib Meenai	5062cf599c	[Support] Explicitly detect recursive response files Previous detection relied upon an arbitrary hard coded limit of 21 response files, which some code bases were running up against. The new detection maintains a stack of processing response files and explicitly checks if a newly encountered file is in the current stack. Some bookkeeping data is necessary in order to detect when to pop the stack. Patch by Chris Glover. Differential Revision: https://reviews.llvm.org/D62798 llvm-svn: 363005	2019-06-10 23:24:02 +00:00
Rong Xu	7ea131c20c	[PGO] Fix the buildbot failure in r362995 Fixed one unused variable warning. llvm-svn: 363004	2019-06-10 23:20:04 +00:00
Rong Xu	e44fa83c37	[PGO] Handle cases of non-instrument BBs As shown in PR41279, some basic blocks (such as catchswitch) cannot be instrumented. This patch filters out these BBs in PGO instrumentation. It also sets the profile count to the fail-to-instrument edge, so that we can propagate the counts in the CFG. Differential Revision: https://reviews.llvm.org/D62700 llvm-svn: 362995	2019-06-10 22:36:27 +00:00
Tom Stellard	374571301d	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990	2019-06-10 22:12:56 +00:00
Jessica Paquette	b22954384e	[GlobalISel] Translate memset/memmove/memcpy from undef ptrs into nops If the source is undef, then just don't do anything. This matches SelectionDAG's behaviour in SelectionDAG.cpp. Also add a test showing that we do the right thing here. (irtranslator-memfunc-undef.ll) Differential Revision: https://reviews.llvm.org/D63095 llvm-svn: 362989	2019-06-10 21:53:56 +00:00
Philip Reames	4bf1c23990	Factor out a helper function for readability and reuse in a future patch [NFC] llvm-svn: 362980	2019-06-10 20:41:27 +00:00
Philip Reames	a9633d5f0b	[LFTR] Use recomputed BE count This was discussed as part of D62880. The basic thought is that computing BE taken count after widening should produce (on average) an equally good backedge taken count as the one before widening. Since there's only one test in the suite which is impacted by this change, and it's essentially equivelent codegen, that seems to be a reasonable assertion. This change was separated from r362971 so that if this turns out to be problematic, the triggering piece is obvious and easily revertable. For the nestedIV example from elim-extend.ll, we end up with the following BE counts: BEFORE: (-2 + (-1 * %innercount) + %limit) AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>) Note that before is an i32 type, and the after is an i64. Truncating the i64 produces the i32. llvm-svn: 362975	2019-06-10 19:18:53 +00:00
Jinsong Ji	9c7f93e914	[PowerPC][HTM]Fix $zero is not a GPRC register for builtin_ttest This was found during HTM cleanup. Adding a test for builtin_ttest would expose following issue. * Bad machine code: Illegal physical register for instruction * - function: test10 - basic block: %bb.0 entry (0xf0e57497b58) - instruction: %5:crrc0 = TABORTWCI 0, $zero, 0 - operand 2: $zero $zero is not a GPRC register. LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D63079 llvm-svn: 362974	2019-06-10 19:04:14 +00:00
Philip Reames	5d84ccb230	Prepare for multi-exit LFTR [NFC] This change does the plumbing to wire an ExitingBB parameter through the LFTR implementation, and reorganizes the code to work in terms of a set of individual loop exits. Most of it is fairly obvious, but there's one key complexity which makes it worthy of consideration. The actual multi-exit LFTR patch is in D62625 for context. Specifically, it turns out the existing code uses the backedge taken count from before a IV is widened. Oddly, we can end up with a different (more expensive, but semantically equivelent) BE count for the loop when requerying after widening. For the nestedIV example from elim-extend, we end up with the following BE counts: BEFORE: (-2 + (-1 * %innercount) + %limit) AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>) This is the only test in tree which seems sensitive to this difference. The actual result of using the wider BETC on this example is that we actually produce slightly better code. :) In review, we decided to accept that test change. This patch is structured to preserve the old behavior, but a separate change will immediate follow with the behavior change. (I wanted it separate for problem attribution purposes.) Differential Revision: https://reviews.llvm.org/D62880 llvm-svn: 362971	2019-06-10 17:51:13 +00:00
Francis Visoiu Mistrih	a438432acc	[FastISel] Skip creating unnecessary vregs for arguments This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it. FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done. In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins. The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg). A few tests are affected by this: * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore. * llvm/test/CodeGen//swiftself.ll: we tail-call now. llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used. * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy * llvm/test/CodeGen/Mips/: get rid of spills and copies llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64 * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed Differential Revision: https://reviews.llvm.org/D62361 llvm-svn: 362963	2019-06-10 16:53:37 +00:00
Cameron McInally	670d0f478b	[ExecutionEngine] Fix rL362941: Add UnaryOperator visitor to the interpreter Missed break statements. This was D62881. llvm-svn: 362958	2019-06-10 16:05:25 +00:00
Piotr Sobczak	9b11e93d90	[AMDGPU] Optimize image_[load\|store]_mip Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957	2019-06-10 15:58:51 +00:00
Simon Tatham	67065c5c70	Revert rL362953 and its followup rL362955. These caused a build failure because I managed not to notice they depended on a later unpushed commit in my current stack. Sorry about that. llvm-svn: 362956	2019-06-10 15:58:19 +00:00
Simon Tatham	42078d41d5	[ARM] Add the non-MVE instructions in Arm v8.1-M. This should have been part of r362953, but I had a finger-trouble incident and committed the old rather than new version of the patch. Sorry. llvm-svn: 362955	2019-06-10 15:41:58 +00:00
Sanjay Patel	9650c95b7e	[InstCombine] allow unordered preds when canonicalizing to fabs() We have a known-never-nan value via 'nnan', so an unordered predicate is the same as its ordered sibling. Similar to: rL362937 llvm-svn: 362954	2019-06-10 15:39:00 +00:00
Simon Tatham	baeea91933	[ARM] Add the non-MVE instructions in Arm v8.1-M. This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need some new addressing modes. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 362953	2019-06-10 15:36:34 +00:00
Jeremy Morse	bcff417292	[DebugInfo] Terminate all location-lists at end of block This commit reapplies r359426 (which was reverted in r360301 due to performance problems) and rolls in D61940 to address the performance problem. I've combined the two to avoid creating a span of slow-performance, and to ease reverting if more problems crop up. The summary of D61940: This patch removes the "ChangingRegs" facility in DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect variable locations. An unchanging register doesn't mean a variable doesn't change its location. The patch kills off everything that calculates the ChangingRegs vector. Previously ChangingRegs spotted epilogues and marked registers as unchanging if they weren't modified outside the epilogue, increasing the chance that we can emit a single-location variable record. Without this feature, debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've split this into two steps as FrameDestroy isn't necessarily supported by all backends. The logic for terminating variable locations at the end of a basic block now becomes much more enjoyably simple: we just terminate them all. Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term. The current algorithm for detecting that a variable has a single-location doesn't work in this scenario (inlined function in multiple blocks), only other bugs were making this test work. fission-ranges.ll gets slightly refreshed too, as the location of "p" is now correctly determined to be a single location. Differential Revision: https://reviews.llvm.org/D61940 llvm-svn: 362951	2019-06-10 15:23:46 +00:00
Sanjay Patel	85de9634e6	[InstCombine] fix bug in canonicalization to fabs() Forgot to translate the predicate clauses in rL362943. llvm-svn: 362945	2019-06-10 14:57:45 +00:00
Sanjay Patel	8b6d9f60ed	[InstCombine] change canonicalization to fabs() to use FMF on fsub Similar to rL362909: This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fsub because they have the same operand. llvm-svn: 362943	2019-06-10 14:46:36 +00:00
Simon Tatham	b87669f166	[ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR. Arm v8.1-M supports the VMOV instructions that move a half-precision value to and from a GPR, but not if the GPR is SP or PC. To fix this, I've changed those instructions to use the rGPR register class instead of GPR. rGPR always excludes PC, and it excludes SP except in the presence of the HasV8Ops target feature (i.e. Arm v8-A). So the effect is that VMOV.F16 to and from PC is now illegal everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A cores (which I believe is all as it should be). Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60704 llvm-svn: 362942	2019-06-10 14:43:55 +00:00
Cameron McInally	ce49e2231b	[ExecutionEngine] Add UnaryOperator visitor to the interpreter This is to support the unary FNeg instruction. Differential Revision: https://reviews.llvm.org/D62881 llvm-svn: 362941	2019-06-10 14:38:48 +00:00
Sanjay Patel	8cd8c5784b	[InstCombine] allow unordered preds when canonicalizing to fabs() PR42179: https://bugs.llvm.org/show_bug.cgi?id=42179 llvm-svn: 362937	2019-06-10 14:14:51 +00:00
Andrea Di Biagio	47db08dbb1	[MCA] Further refactor the bottleneck analysis view. NFCI. llvm-svn: 362933	2019-06-10 12:50:08 +00:00
George Rimar	1e41007aeb	[yaml2obj/obj2yaml] - Make RawContentSection::Content and RawContentSection::Size optional This is a follow-up for D62809. Content and Size fields should be optional as was discussed in comments of the D62809's thread. With that, we can describe a specific string table and symbol table sections in a more correct way and also show appropriate errors. The patch adds lots of test cases where the behavior is described in details. Differential revision: https://reviews.llvm.org/D62957 llvm-svn: 362931	2019-06-10 12:43:18 +00:00
David Green	d847aa573b	[ARM] Enable Unroll UpperBound This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928	2019-06-10 10:22:14 +00:00
Nikola Prica	abc1dff7e4	[DebugInfo] More strict debug range for stack variables Variable's stack location can stretch longer than it should. If a variable is placed at the stack in a some nested basic block its range can be calculated to be up to the next occurrence of the variable's DBG_VALUE, or up to the end of the function, thus covering a basic blocks that should not be included in the variable’s location range. This happens because the DbgEntityHistoryCalculator ends register locations at the end of a basic block only if the variable’s location register has been changed throughout the function, which is not the case for the register used to reference stack objects. This patch also tries to produce a single value location if the location list builder managed to merge all the locations into one. Reviewers: aprantl, dstenb, jmorse Reviewed By: aprantl, dstenb, jmorse Subscribers: djtodoro, ivanbaev, asowda Tags: #debug-info Differential Revision: https://reviews.llvm.org/D61600 llvm-svn: 362923	2019-06-10 08:41:06 +00:00
QingShan Zhang	ab846da7e8	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921	2019-06-10 05:40:21 +00:00
Craig Topper	9000a72a4b	[X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign its. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920	2019-06-10 04:50:12 +00:00
Craig Topper	ceb807bbbc	[X86] Disable f32->f64 extload when sse2 is enabled Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize. This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62710 llvm-svn: 362919	2019-06-10 04:37:16 +00:00
Vivek Pandya	11cb15f8ed	Do not derive no-recurse attribute if function does not have exact definition. This is fix for https://bugs.llvm.org/show_bug.cgi?id=41336 Reviewers: jdoerfert Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D63045 llvm-svn: 362918	2019-06-10 04:16:04 +00:00
Craig Topper	dd10099d5c	[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled. llvm-svn: 362915	2019-06-10 01:18:55 +00:00
Craig Topper	f7ba8b808a	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vetors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. I ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt. While an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914	2019-06-10 00:41:07 +00:00
Nico Weber	80fee25776	Revert r361953 "[SVE][IR] Scalable Vector IR Type" This reverts commit `f4fc01f8dd`. It caused a 3-4x slowdown when doing thinlto links, PR42210. llvm-svn: 362913	2019-06-09 19:27:50 +00:00
David Bolvansky	dcf5e6abdf	[TargetLowering] Simplify (ctpop x) == 1 Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912	2019-06-09 18:18:57 +00:00
Roman Lebedev	d669758d84	[InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompiles A precondition 'x != 0' was forgotten by me: https://rise4fun.com/Alive/JFNP https://rise4fun.com/Alive/jHvL These 4 folds with non-constants could be re-enabled, but for now let's go for the simplest solution. https://bugs.llvm.org/show_bug.cgi?id=42198 llvm-svn: 362911	2019-06-09 16:30:42 +00:00
Sanjay Patel	87cd16a86e	[InstCombine] change canonicalization to fabs() to use FMF on fneg This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fneg (fsub) because they have the same operand. This works around the most glaring FMF logical inconsistency cited in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 llvm-svn: 362909	2019-06-09 16:22:01 +00:00
Sanjay Patel	866db10228	[InstSimplify] reduce code duplication for fcmp folds; NFC llvm-svn: 362904	2019-06-09 13:58:46 +00:00
Sanjay Patel	73f5a855b3	[InstSimplify] enhance fcmp fold with never-nan operand This is another step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF. By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp. This is a continuation of D62979 / rL362879. llvm-svn: 362903	2019-06-09 13:48:59 +00:00
Anton Afanasyev	623d9ba068	[MIR] Add simple PRE pass to MachineCSE This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 This is fixed recommit of r361356 after PowerPC64 multistage build failure. llvm-svn: 362901	2019-06-09 12:15:47 +00:00
Ayke van Laethem	f18cf230e4	[CaptureTracking] Don't let comparisons against null escape inbounds pointers Pointers that are in-bounds (either through dereferenceable_or_null or thorough a getelementptr inbounds) cannot be captured with a comparison against null. There is no way to construct a pointer that is still in bounds but also NULL. This helps safe languages that insert null checks before load/store instructions. Without this patch, almost all pointers would be considered captured even for simple loads. With this patch, an icmp with null will not be seen as escaping as long as certain conditions are met. There was a lot of discussion about this patch. See the Phabricator thread for detals. Differential Revision: https://reviews.llvm.org/D60047 llvm-svn: 362900	2019-06-09 10:20:33 +00:00
Jatin Bhateja	2a30aeb010	[X86] NFCI : Comment updation for EVEX to VEX translation. Reviewers: llvm-commits, jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63055 llvm-svn: 362898	2019-06-09 09:59:26 +00:00
Simon Pilgrim	5f337149fa	Use for-range loop. NFCI. llvm-svn: 362897	2019-06-09 09:07:30 +00:00
Amara Emerson	0d20969dea	[AArch64][GlobalISel] Select immediate forms of cmp instructions. A simple re-use of the immediate operand matcher and renderer functions. rdar://43795178 llvm-svn: 362896	2019-06-09 07:31:25 +00:00
Craig Topper	2ba0e2518b	[X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant. We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently. llvm-svn: 362895	2019-06-09 03:21:33 +00:00
Craig Topper	7d8494c41c	[X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns. Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table. llvm-svn: 362894	2019-06-08 23:53:31 +00:00
Simon Pilgrim	6bae6d5a5d	[DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI. Same codegen, only differ by the oneuse limit for the sextload case. llvm-svn: 362880	2019-06-08 17:02:00 +00:00
Sanjay Patel	4329c15f11	[InstSimplify] enhance fcmp fold with never-nan operand This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF. By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp. I'll update the 'ult' case below here as a follow-up assuming no problems here. Differential Revision: https://reviews.llvm.org/D62979 llvm-svn: 362879	2019-06-08 15:12:33 +00:00
David Green	c5471c2a57	[ARM] Adjust isLegalT1AddressImmediate for non-legal types Types such as float and i64's do not have legal loads in Thumb1, but will still be loaded with a LDR (or potentially multiple LDR's). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits. Differential Revision: https://reviews.llvm.org/D62968 llvm-svn: 362874	2019-06-08 10:32:53 +00:00
David Green	342d1b81a3	[ARM] Add MVE addressing to isLegalT2AddressImmediate Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored. Differential Revision: https://reviews.llvm.org/D62967 llvm-svn: 362873	2019-06-08 10:18:23 +00:00
David Green	4ecce205d5	[ARM] Add fp16 addressing to isLegalT2AddressImmediate The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependant upon hasFPRegs16 as this is what the load/store instructions require. Differential Revision: https://reviews.llvm.org/D62966 llvm-svn: 362872	2019-06-08 10:09:02 +00:00
David Green	10fbaa96c5	[ARM] Add HasNEON for all Neon patterns in ARMInstrNEON.td. NFCI We are starting to add an entirely separate vector architecture to the ARM backend. To do that we need at least some separation between the existing NEON and the new MVE code. This patch just goes through the Neon patterns and ensures that they are predicated on HasNEON, giving MVE a stable place to start from. No tests yet as this is largely an NFC, and we don't have the other target that will treat any of these intructions as legal. Differential Revision: https://reviews.llvm.org/D62945 llvm-svn: 362870	2019-06-08 09:36:49 +00:00
Jonas Paulsson	bca56ab073	[SystemZ] Fix CMakeLists.txt for alphabetical order (NFC). llvm-svn: 362869	2019-06-08 06:42:02 +00:00
Jonas Paulsson	fdc4ea34e3	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection. This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868	2019-06-08 06:19:15 +00:00
Amara Emerson	829037a914	Factor out SelectionDAG's switch analysis and lowering into a separate component. In order for GlobalISel to re-use the significant amount of analysis and optimization code in SDAG's switch lowering, we first have to extract it and create an interface to be used by both frameworks. No test changes as it's NFC. Differential Revision: https://reviews.llvm.org/D62745 llvm-svn: 362857	2019-06-08 00:05:17 +00:00
Keno Fischer	eb4a561fa3	[GVN] non-functional code movement Summary: Move some code around, in preparation for later fixes to the non-integral addrspace handling (D59661) Patch By Jameson Nash <jameson@juliacomputing.com> Reviewed By: reames, loladiro Differential Revision: https://reviews.llvm.org/D59729 llvm-svn: 362853	2019-06-07 23:08:38 +00:00
Matt Arsenault	ddd2c9ac86	AMDGPU: Force skips around traps llvm-svn: 362852	2019-06-07 23:02:52 +00:00
Alina Sbirlea	eaea538d18	[DomTreeUpdater] Add all insert before all delete updates to reduce compile time. Summary: The cleanup in D62751 introduced a compile-time regression due to the way DT updates are performed. Add all insert edges then all delete edges in DTU to match the previous compile time. Compile time on the test provided by @mstorsjo before and after this patch on my machine: 113.046s vs 35.649s Repro: clang -target x86_64-w64-mingw32 -c -O3 glew-preproc.c; on https://martin.st/temp/glew-preproc.c. Reviewers: kuhar, NutshellySima, mstorsjo Subscribers: jlebar, mstorsjo, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62981 llvm-svn: 362839	2019-06-07 20:43:55 +00:00
Craig Topper	bd03230cb0	[X86] Remove unnecessary new line escape from the end of a macro. NFC llvm-svn: 362837	2019-06-07 20:30:40 +00:00
Volkan Keles	97204a6788	[GlobalISel] IRTranslator: Translate the intrinsics ignored by CodeGen Summary: Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing as they have no effect on CodeGen. Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm Reviewed By: arsenm Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63022 llvm-svn: 362834	2019-06-07 20:19:27 +00:00
Nick Desaulniers	7ddd694d36	[APFloat] APFloat::Storage::Storage - refix use after move Summary: Re-land r360675 after it was reverted in r360770. This was reported in: https://llvm.org/reports/scan-build/ Based on feedback in: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190513/652286.html Reviewers: RKSimon, efriedma Reviewed By: RKSimon, efriedma Subscribers: eli.friedman, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62767 llvm-svn: 362833	2019-06-07 19:51:22 +00:00
Lang Hames	d4a8089f03	[ORC] Update symbol lookup to use a single callback with a required symbol state rather than two callbacks. The asynchronous lookup API (which the synchronous lookup API wraps for convenience) used to take two callbacks: OnResolved (called once all requested symbols had an address assigned) and OnReady to be called once all requested symbols were safe to access). This patch updates the asynchronous lookup API to take a single 'OnComplete' callback and a required state (SymbolState) to determine when the callback should be made. This simplifies the common use case (where the client is interested in a specific state) and will generalize neatly as new states are introduced to track runtime initialization of symbols. Clients who were making use of both callbacks in a single query will now need to issue two queries (one for SymbolState::Resolved and another for SymbolState::Ready). Synchronous lookup API clients who were explicitly passing the WaitOnReady argument will now need neeed to pass a SymbolState instead (for 'WaitOnReady == true' use SymbolState::Ready, for 'WaitOnReady == false' use SymbolState::Resolved). Synchronous lookup API clients who were using default arugment values should see no change. llvm-svn: 362832	2019-06-07 19:33:51 +00:00
Simon Pilgrim	f0240ee76d	[DAGCombine] visitAND - fix local shadow variable warnings. NFCI. llvm-svn: 362825	2019-06-07 18:36:43 +00:00
Simon Pilgrim	4c9db2045a	[DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI. llvm-svn: 362820	2019-06-07 18:07:06 +00:00
Sanjay Patel	e490e4a0e7	[Analysis] simplify code for getSplatValue(); NFC AFAIK, this is only currently called by TTI, but it could be used from instcombine or CGP to help solve problems like: https://bugs.llvm.org/show_bug.cgi?id=37428 https://bugs.llvm.org/show_bug.cgi?id=42174 llvm-svn: 362810	2019-06-07 16:09:54 +00:00
Jinsong Ji	7aafdef627	[MachineScheduler] checkResourceLimit boundary condition update When we call checkResourceLimit in bumpCycle or bumpNode, and we know the resource count has just reached the limit (the equations are equal). We should return true to mark that we are resource limited for next schedule, or else we might continue to schedule in favor of latency for 1 more schedule and create a schedule that actually overbook the resource. When we call checkResourceLimit to estimate the resource limite before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule for it . Differential Revision: https://reviews.llvm.org/D62345 llvm-svn: 362805	2019-06-07 14:54:47 +00:00
Stefan Stipanovic	128e8e8fb9	test-commit llvm-svn: 362802	2019-06-07 14:18:02 +00:00
Matt Arsenault	94a609e343	TailDuplicator: Remove no-op analyzeBranch call This could fail, which looked concerning. However nothing was actually using the results of this. I assume this was intended to use the anti-feature of analyzeBranch of removing instructions, but wasn't actually calling it with AllowModify = true. Fixes bug 42162. llvm-svn: 362800	2019-06-07 13:33:34 +00:00
Joerg Sonnenberger	b2e96169b0	[NFC] Don't export helpers of ConstantFoldCall llvm-svn: 362799	2019-06-07 13:28:52 +00:00
Nico Weber	d546b5052b	llvm-lib: Disallow mixing object files with different machine types lib.exe doesn't allow creating .lib files with object files that have differing machine types. Update llvm-lib to match. The motivation is to make it possible to infer the machine type of a .lib file in lld, so that it can warn when e.g. a 32-bit .lib file is passed to a 64-bit link (PR38965). Fixes PR38782. Differential Revision: https://reviews.llvm.org/D62913 llvm-svn: 362798	2019-06-07 13:24:34 +00:00
Sanjay Patel	6880bceda2	[x86] narrow extract subvector of vector select This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797	2019-06-07 13:17:46 +00:00
Simon Tatham	5d66f2b0af	[ARM] Fix bugs introduced by the fp64/d32 rework. Change D60691 caused some knock-on failures that weren't caught by the existing tests. Firstly, selecting a CPU that should have had a restricted FPU (e.g. `-mcpu=cortex-m4`, which should have 16 d-regs and no double precision) could give the unrestricted version, because `ARM::getFPUFeatures` returned a list of features including subtracted ones (here `-fp64`,`-d32`), but `ARMTargetInfo::initFeatureMap` threw away all the ones that didn't start with `+`. Secondly, the preprocessor macros didn't reliably match the actual compilation settings: for example, `-mfpu=softvfp` could still set `__ARM_FP` as if hardware FP was available, because the list of features on the cc1 command line would include things like `+vfp4`,`-vfp4d16` and clang didn't realise that one of those cancelled out the other. I've fixed both of these issues by rewriting `ARM::getFPUFeatures` so that it returns a list that enables every FP-related feature compatible with the selected FPU and disables every feature not compatible, which is more verbose but means clang doesn't have to understand the dependency relationships between the backend features. Meanwhile, `ARMTargetInfo::handleTargetFeatures` is testing for all the various forms of the FP feature names, so that it won't miss cases where it should have set `HW_FP` to feed into feature test macros. That in turn caused an ordering problem when handling `-mcpu=foo+bar` together with `-mfpu=something_that_turns_off_bar`. To fix that, I've arranged that the `+bar` suffixes on the end of `-mcpu` and `-march` cause feature names to be put into a separate vector which is concatenated after the output of `getFPUFeatures`. Another side effect of all this is to fix a bug where `clang -target armv8-eabi` by itself would fail to set `__ARM_FEATURE_FMA`, even though `armv8` (aka Arm v8-A) implies FP-Armv8 which has FMA. That was because `HW_FP` was being set to a value including only the `FPARMV8` bit, but that feature test macro was testing only the `VFP4FPU` bit. Now `HW_FP` ends up with all the bits set, so it gives the right answer. Changes to tests included in this patch: * `arm-target-features.c`: I had to change basically all the expected results. (The Cortex-M4 test in there should function as a regression test for the accidental double-precision bug.) * `arm-mfpu.c`, `armv8.1m.main.c`: switched to using `CHECK-DAG` everywhere so that those tests are no longer sensitive to the order of cc1 feature options on the command line. * `arm-acle-6.5.c`: been updated to expect the right answer to that FMA test. * `Preprocessor/arm-target-features.c`: added a regression test for the `mfpu=softvfp` issue. Reviewers: SjoerdMeijer, dmgreen, ostannard, samparker, JamesNagurne Reviewed By: ostannard Subscribers: srhines, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62998 llvm-svn: 362791	2019-06-07 12:42:54 +00:00
Sam Elliott	f720647ddd	[RISCV] Support Bit-Preserving FP in F/D Extensions Summary: This allows some integer bitwise operations to instead be performed by hardware fp instructions. This is correct because the RISC-V spec requires the F and D extensions to use the IEEE-754 standard representation, and fp register loads and stores to be bit-preserving. This is tested against the soft-float ABI, but with hardware float extensions enabled, so that the tests also ensure the optimisation also fires in this case. Reviewers: asb, luismarques Reviewed By: asb Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62900 llvm-svn: 362790	2019-06-07 12:20:14 +00:00
Valery Pykhtin	cb8de55f47	[AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance) Differential revision: https://reviews.llvm.org/D62917 llvm-svn: 362789	2019-06-07 12:16:46 +00:00
Cullen Rhodes	1f0d251244	[AArch64][AsmParser] error on unexpected SVE predicate type suffix Summary: This patch fixes a bug in the assembler that permitted a type suffix on predicate registers when not expected. For instance, the following was previously valid: faddv h0, p0.q, z1.h This bug was present in all SVE instructions containing predicates with no type suffix and no predication form qualifier, i.e. /z or /m. The latter instructions are already caught with an appropiate error message by the assembler, e.g.: .text <stdin>:1:13: error: not expecting size suffix cmpne p1.s, p0.b/z, z2.s, 0 ^ A similar issue for SVE vector registers was fixed in: https://reviews.llvm.org/D59636 Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62942 llvm-svn: 362780	2019-06-07 08:46:56 +00:00
Cullen Rhodes	f730548484	[AArch64][AsmParser] Provide better diagnostics for SVE predicates Patch by Sander de Smalen (sdesmalen) Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62941 llvm-svn: 362779	2019-06-07 08:37:00 +00:00
Pengfei Wang	f8b28931a7	[X86] -march=cooperlake (llvm) Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776	2019-06-07 08:31:35 +00:00
Sam Parker	67f9dc60b8	Fix for lld buildbot Removed unused (in non-debug builds) variable. llvm-svn: 362775	2019-06-07 08:04:18 +00:00
Sam Parker	c5ef502ee8	[CodeGen] Generic Hardware Loop Support Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend. Three generic intrinsics have been introduced: - void @llvm.set_loop_iterations Takes as a single operand, the number of iterations to be executed. - i1 @llvm.loop_decrement(anyint) Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit. - anyint @llvm.loop_decrement_reg(anyint, anyint) Takes the number of elements remaining to be processed as well as the maximum numbe of elements processed in an iteration of the loop body. Returns the updated number of elements remaining. llvm-svn: 362774	2019-06-07 07:35:30 +00:00
Dylan McKay	04b418f246	[AVR] Expand 16-bit rotations during the legalization stage In r356860, the legalization logic for BSWAP was modified to ISD::ROTL, rather than the old ISD::{SHL, SRL, OR} nodes. This works fine on AVR for 8-bit rotations, but 16-bit rotations are currently unimplemented - they always trigger an assertion error in the AVRExpandPseudoInsts pass ("RORW unimplemented"). This patch instructions the legalizer to expand 16-bit rotations into the previous SHL, SRL, OR pattern it did previously. This fixes the 'issue-cannot-select-bswap.ll' test. Interestingly, this test failure seems flaky - it passes successfully on the avr-build-01 buildbot, but fails locally on my Arch Linux install. llvm-svn: 362773	2019-06-07 06:55:00 +00:00
Fangrui Song	c841b9abf0	[MC][ELF] Don't create relocations with section symbols for STB_LOCAL ifunc We should keep the symbol type (STT_GNU_IFUNC) for a local ifunc because it may result in an IRELATIVE reloc that the dynamic loader will use to resolve the address at startup time. There is another problem that is not fixed by this patch: a PC relative relocation should also create a relocation with the ifunc symbol. llvm-svn: 362767	2019-06-07 03:47:22 +00:00
Fangrui Song	19189993c9	[LV] Fix -Wunused-function after r362736 llvm-svn: 362762	2019-06-07 01:48:26 +00:00
Matt Arsenault	c0edb8f5cf	AMDGPU: Don't count mask branch pseudo towards skip threshold llvm-svn: 362761	2019-06-07 00:14:55 +00:00
Matt Arsenault	99ee81b183	AMDGPU: Insert skips for blocks with FLAT This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though. llvm-svn: 362760	2019-06-07 00:14:45 +00:00
Nemanja Ivanovic	ef4a3aa549	[PowerPC] Exploit the vector min/max instructions Use the PPC vector min/max instructions for computing the corresponding operation as these should be faster than the compare/select sequences we currently emit. Differential revision: https://reviews.llvm.org/D47332 llvm-svn: 362759	2019-06-06 23:49:01 +00:00
Matt Arsenault	b6cfa129cc	AMDGPU: Insert skip branches over return blocks SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption. llvm-svn: 362754	2019-06-06 22:51:51 +00:00
Alexey Lapshin	b9f1e7b16e	[DebugInfo] Incorrect debug info record generated for loop counter. Incorrect Debug Variable Range was calculated while "COMPUTING LIVE DEBUG VARIABLES" stage. Range for Debug Variable("i") computed according to current state of instructions inside of basic block. But Register Allocator creates new instructions which were not taken into account when Live Debug Variables computed. In the result DBG_VALUE instruction for the "i" variable was put after these newly inserted instructions. This is incorrect. Debug Value for the loop counter should be inserted before any loop instruction. Differential Revision: https://reviews.llvm.org/D62650 llvm-svn: 362750	2019-06-06 21:19:39 +00:00
Alexander Timofeev	37bd9bd137	[AMDGPU] Partial revert for the `ba447bae74` "Divergence driven ISel. Assign register class for cross block values according to the divergence." that discovered the design flaw leading to several issues that required to be solved before. This change reverts AMDGPU specific changes and keeps common part unaffected. llvm-svn: 362749	2019-06-06 21:13:02 +00:00
Craig Topper	f320f26716	[X86] Make a bunch of merge masked binops commutable for loading folding. This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746	2019-06-06 21:00:04 +00:00
Craig Topper	ca541b20d0	[CFLGraph] Add support for unary fneg instruction. Differential Revision: https://reviews.llvm.org/D62791 llvm-svn: 362737	2019-06-06 19:21:23 +00:00
Renato Golin	9e97caf594	[LV] Wrap LV illegality reporting in a function. NFC. A function for loop vectorization illegality reporting has been introduced: void LoopVectorizationLegality::reportVectorizationFailure( const StringRef DebugMsg, const StringRef OREMsg, const StringRef ORETag, Instruction * const I) const; The function prints a debug message when the debug for the compilation unit is enabled as well as invokes the optimization report emitter to generate a message with a specified tag. The function doesn't cover any complicated logic when a custom lambda should be passed to the emitter, only generating a message with a tag is supported. The function always prints the instruction `I` after the debug message whenever the instruction is specified, otherwise the debug message ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.' Patch by Pavel Samolysov <samolisov@gmail.com> llvm-svn: 362736	2019-06-06 19:15:52 +00:00
Jason Liu	60ec248148	[AIX] Implement function descriptor on SDAG Summary: (1) Function descriptor on AIX On AIX, a called routine may have 2 distinct symbols associated with it: * A function descriptor (Name) * A function entry point (.Name) The descriptor structure on AIX is the same as those in the ELF V1 ABI: * The address of the entry point of the function. * The TOC base address for the function. * The environment pointer. The descriptor symbol uses the same name as the source level function in C. The function entry point is analogous to the symbol we would generate for a function in a non-descriptor-based ABI, except that it is renamed by prepending a ".". Which symbol gets referenced depends on the context: * Taking the address of the function references the descriptor symbol. * Calling the function references the entry point symbol. (2) Speaking of implementation on AIX, for direct function call target, we create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to replace original TargetGlobalAddress SDNode. Then down the path, we can take advantage of this MCSymbol. Patch by: Xiangling_L Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara Differential Revision: https://reviews.llvm.org/D62532 llvm-svn: 362735	2019-06-06 19:13:36 +00:00
Craig Topper	6cda33ba36	[InlineCost] Add support for unary fneg. This adds support for unary fneg based on the implementation of BinaryOperator without the soft float FP cost. Previously we would just delegate to visitUnaryInstruction. I think the only real change is that we will pass the FastMath flags to SimplifyFNeg now. Differential Revision: https://reviews.llvm.org/D62699 llvm-svn: 362732	2019-06-06 19:02:18 +00:00
Philip Reames	101915cfda	[LoopPred] Fix a bug in unconditional latch bailout introduced in r362284 This is a really silly bug that even a simple test w/an unconditional latch would have caught. I tried to guard against the case, but put it in the wrong if check. Oops. llvm-svn: 362727	2019-06-06 18:02:36 +00:00
Simon Pilgrim	842c7792aa	[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123) This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores: 1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another. 2 - The merged load\store node must retain the non-temporal flag. Differential Revision: https://reviews.llvm.org/D62910 llvm-svn: 362723	2019-06-06 17:04:13 +00:00
Dmitri Gribenko	5438cc6910	Remove unused PPC.h includes under llvm/lib/Target/PowerPC. llvm-svn: 362718	2019-06-06 16:47:06 +00:00
Craig Topper	6b67dfa54c	[X86] Make masked floating point equality/ordered compares commutable for load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717	2019-06-06 16:39:04 +00:00
Whitney Tsang	03e8369a72	[DA] Add an option to control delinearization validity checks Summary: Dependence Analysis performs static checks to confirm validity of delinearization. These checks often fail for 64-bit targets due to type conversions and integer wrapping that prevent simplification of the SCEV expressions. These checks would also fail at compile-time if the lower bound of the loops are compile-time unknown. For example: void foo(int n, int m, int a[][m]) { for (int i = 0; i < n; ++i) for (int j = 0; j < m; ++j) { a[i][j] = a[i+1][j-2]; } } opt -mem2reg -instcombine -indvars -loop-simplify -loop-rotate -inline -pass-remarks=.* -debug-pass=Arguments -da-permissive-validity-checks=false k3.ll -analyze -da will produce the following by default: da analyze - anti [* *\|<]! but will produce the following expected dependence vector if the validity checks are disabled: da analyze - consistent anti [1 -2]! This revision will introduce a debug option that will leave the validity checks in place by default, but allow them to be turned off. New tests are added for cases where it cannot be proven at compile-time that the individual subscripts stay in-bound with respect to a particular dimension of an array. These tests enable the option to provide user guarantee that the subscripts do not over/under-flow into other dimensions, thereby producing more accurate dependence vectors. For prior discussion on this topic, leading to this change, please see the following thread: http://lists.llvm.org/pipermail/llvm-dev/2019-May/132372.html Reviewers: Meinersbur, jdoerfert, kbarton, dmgreen, fhahn Reviewed By: Meinersbur, jdoerfert, dmgreen Subscribers: fhahn, hiraditya, javed.absar, llvm-commits, Whitney, etiotto Tag: LLVM Differential Revision: https://reviews.llvm.org/D62610 llvm-svn: 362711	2019-06-06 15:12:49 +00:00
Jason Liu	0338b88861	[AIX] Implement call lowering with parameters could pass onto GPRs Summary: This patch implements SDAG call lowering on AIX for functions which only have parameters that could fit into GPRs. Reviewers: hubert.reinterpretcast, syzaara Differential Revision: https://reviews.llvm.org/D62823 llvm-svn: 362708	2019-06-06 14:36:43 +00:00
Thomas Preud'homme	71d3f227a7	FileCheck [6/12]: Introduce numeric variable definition Summary: This patch is part of a patch series to add support for FileCheck numeric expressions. This specific patch introduces support for defining numeric variable in a CHECK directive. This commit introduces support for defining numeric variable from a litteral value in the input text. Numeric expressions can then use the variable provided it is on a later line. Copyright: - Linaro (changes up to diff 183612 of revision D55940) - GraphCore (changes in later versions of revision D55940 and in new revision created off D55940) Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield Tags: #llvm Differential Revision: https://reviews.llvm.org/D60386 llvm-svn: 362705	2019-06-06 13:21:06 +00:00
Adhemerval Zanella	559e69a821	AArch64] Handle ISD::LRINT and ISD::LLRINT for float16 This patch is a follow up for D62018 to add lrint/llrint support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62863 llvm-svn: 362700	2019-06-06 12:38:11 +00:00
Benjamin Kramer	f1249442cf	Revert "[SCEV] Use wrap flags in InsertBinop" This reverts commit r362687. Miscompiles llvm-profdata during selfhost. llvm-svn: 362699	2019-06-06 12:35:46 +00:00
Adhemerval Zanella	bce9e11a7b	[AArch64] Handle ISD::LROUND and ISD::LLROUND for float16 This patch is a follow up for D61391 to add lround/llround support for float16. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62861 llvm-svn: 362698	2019-06-06 11:53:26 +00:00
Dmitri Gribenko	8c2c072582	Include what you use in LanaiAsmParser.cpp llvm-svn: 362696	2019-06-06 10:37:06 +00:00
Simon Pilgrim	da993d08c8	[DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI. Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s llvm-svn: 362695	2019-06-06 10:21:18 +00:00
Petar Avramovic	81132ce0e9	[MIPS GlobalISel] Select sqrt Select G_FSQRT for MIPS32. Differential Revision: https://reviews.llvm.org/D62905 llvm-svn: 362692	2019-06-06 10:00:41 +00:00
Petar Avramovic	0a1fd355b2	[MIPS GlobalISel] Select fabs Select G_FABS for MIPS32. Differential Revision: https://reviews.llvm.org/D62903 llvm-svn: 362690	2019-06-06 09:22:37 +00:00
Petar Avramovic	a7d0006447	[MIPS GlobalISel] Select fpext and fptrunc Select G_FPEXT and G_FPTRUNC for MIPS32. Differential Revision: https://reviews.llvm.org/D62902 llvm-svn: 362689	2019-06-06 09:16:58 +00:00
Petar Avramovic	faaa2b5d21	[MIPS GlobalISel] Select floor and ceil Select G_FFLOOR and G_FCEIL for MIPS32. Differential Revision: https://reviews.llvm.org/D62901 llvm-svn: 362688	2019-06-06 09:02:24 +00:00
Sam Parker	7cc580f5e9	[SCEV] Use wrap flags in InsertBinop If the given SCEVExpr has no (un)signed flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 362687	2019-06-06 08:56:26 +00:00
Amara Emerson	d3144a4abc	[AArch64][GlobalISel] Add manual selection support for G_ZEXTLOADs to s64. We already get support for G_ZEXTLOAD to s32 from the importer, but it can't deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing with G_ZEXTLOAD isn't much work. Also add tests to check the imported pattern selections to s32 work. llvm-svn: 362681	2019-06-06 07:58:37 +00:00
Amara Emerson	d940e20051	[AArch64][GlobalISel] Add the new changes to fix PR42129 that were supposed to go into r362666. The changes weren't staged so ended up just re-commiting the unmodified reverted change. llvm-svn: 362677	2019-06-06 07:33:47 +00:00
Craig Topper	9226ba6b37	[X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes. This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675	2019-06-06 05:41:27 +00:00
Amara Emerson	c37ff0d138	Revert "Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp"" When looking through copies, make sure to not try to find the vreg def of a physreg. Normally getVRegDef will return nullptr in this case, but if there happens to be multiple defs then it will assert. This fixes PR42129. llvm-svn: 362666	2019-06-05 23:46:16 +00:00
Matt Arsenault	34c8b835b1	AMDGPU: Don't fix emergency stack slot at offset 0 This forced the caller to be aware of this, which is an ugly ABI feature. Partially reverts r295877. The original reasons for doing this are mostly fixed. Alloca is now in a non-0 address space, so it should be OK to have 0 as a valid pointer. Since we treat the absolute address as the pointer value, this part only really needed to apply to kernels. Since r357093, we avoid the need to increment/decrement the offset register in more cases, and since r354816 the scavenger can fail without spilling, so it's less critical that we try to avoid an offset that fits in the MUBUF offset. Restrict to callable functions for now to split this into 2 steps to limit thte number of test updates and in case anything breaks. llvm-svn: 362665	2019-06-05 22:37:50 +00:00
Cameron McInally	c72fbe5dc1	[MSAN] Add unary FNeg visitor to the MemorySanitizer Differential Revision: https://reviews.llvm.org/D62909 llvm-svn: 362664	2019-06-05 22:37:05 +00:00
Ulrich Weigand	6c5d5ce551	Allow target to handle STRICT floating-point nodes The ISD::STRICT_ nodes used to implement the constrained floating-point intrinsics are currently never passed to the target back-end, which makes it impossible to handle them correctly (e.g. mark instructions are depending on a floating-point status and control register, or mark instructions as possibly trapping). This patch allows the target to use setOperationAction to switch the action on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code will stop converting the STRICT nodes to regular floating-point nodes, but instead pass the STRICT nodes to the target using normal SelectionDAG matching rules. To avoid having the back-end duplicate all the floating-point instruction patterns to handle both strict and non-strict variants, we make the MI codegen explicitly aware of the floating-point exceptions by introducing two new concepts: - A new MCID flag "mayRaiseFPException" that the target should set on any instruction that possibly can raise FP exception according to the architecture definition. - A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI instruction resulting from expansion of any constrained FP intrinsic. Any MI instruction that is both marked as mayRaiseFPException and FPExcept then needs to be considered as raising exceptions by MI-level codegen (e.g. scheduling). Setting those two new flags is straightforward. The mayRaiseFPException flag is simply set via TableGen by marking all relevant instruction patterns in the .td files. The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes in the SelectionDAG, and gets inherited in the MachineSDNode nodes created from it during instruction selection. The flag is then transfered to an MIFlag when creating the MI from the MachineSDNode. This is handled just like fast-math flags like no-nans are handled today. This patch includes both common code changes required to implement the new features, and the SystemZ implementation. Reviewed By: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D55506 llvm-svn: 362663	2019-06-05 22:33:10 +00:00
Petr Hosek	2f94203e23	Revert "[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp" This reverts commit r362435 as this triggers ICE, see PR42129 for details. llvm-svn: 362662	2019-06-05 22:27:31 +00:00
Matt Arsenault	b812b7a45e	AMDGPU: Invert frame index offset interpretation Since the beginning, the offset of a frame index has been consistently interpreted backwards. It was treating it as an offset from the scratch wave offset register as a frame register. The correct interpretation is the offset from the SP on entry to the function, before the prolog. Frame index elimination then should select either SP or another register as an FP. Treat the scratch wave offset on kernel entry as the pre-incremented SP. Rely more heavily on the standard hasFP and frame pointer elimination logic, and clean up the private reservation code. This saves a copy in most callee functions. The kernel prolog emission code is still kind of a mess relying on checking the uses of physical registers, which I would prefer to eliminate. Currently selection directly emits MUBUF instructions, which require using a reference to some register. Use the register chosen for SP, and then ignore this later. This should probably be cleaned up to use pseudos that don't refer to any specific base register until frame index elimination. Add a workaround for shaders using large numbers of SGPRs. I'm not sure these cases were ever working correctly, since as far as I can tell the logic for figuring out which SGPR is the scratch wave offset doesn't match up with the shader input initialization in the shader programming guide. llvm-svn: 362661	2019-06-05 22:20:47 +00:00
Mircea Trofin	e3eeacd70a	[CallSite removal] Refactoring llvm::InlineFunction APIs Summary: This change only unifies the API previous API pair accepting CallInst and InvokeInst, thus making it easier to refactor inliner pass ode to CallBase. The implementation of the unified API still relies on the CallSite implementation. Reviewers: eraman, chandlerc, jdoerfert Reviewed By: jdoerfert Subscribers: jdoerfert, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62283 llvm-svn: 362656	2019-06-05 21:28:13 +00:00
Sanjay Patel	ac111e526d	[InstCombine] simplify code for bitcast of insertelement; NFC llvm-svn: 362655	2019-06-05 21:26:52 +00:00
Matt Arsenault	663d762c9a	NewGVN: Handle addrspacecast The AllConstant check needs to be moved out of the if/else if chain to avoid a test regression. The "there is no SimplifyZExt" comment puzzles me, since there is SimplifyCastInst. Additionally, the Simplify* calls seem to not see the operand as constant, so this needs to be tried if the simplify failed. llvm-svn: 362653	2019-06-05 21:15:52 +00:00
Craig Topper	3975b15dba	[X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable. One of the sources controls the pass through value for the upper bits of the result so we can't really commute it. In practice this problem isn't a functional issue because we would only try to commute this instruction in order to fold a load. But we can't do embedded rounding and fold a load at the same time. So the load fold would never succeed so I don't think we would ever commute or at least keep the version after commuting. llvm-svn: 362647	2019-06-05 21:00:31 +00:00
Whitney Tsang	2d0896c1cb	[LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, and loop induction variable. Summary: This PR extends the loop object with more utilities to get loop bounds, step, and loop induction variable. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. /// Example: /// for (int i = lb; i < ub; i+=step) /// <loop body> /// --- pseudo LLVMIR --- /// beforeloop: /// guardcmp = (lb < ub) /// if (guardcmp) goto preheader; else goto afterloop /// preheader: /// loop: /// i1 = phi[{lb, preheader}, {i2, latch}] /// <loop body> /// i2 = i1 + step /// latch: /// cmp = (i2 < ub) /// if (cmp) goto loop /// exit: /// afterloop: /// /// getBounds /// getInitialIVValue --> lb /// getStepInst --> i2 = i1 + step /// getStepValue --> step /// getFinalIVValue --> ub /// getCanonicalPredicate --> '<' /// getDirection --> Increasing /// getInductionVariable --> i1 /// getAuxiliaryInductionVariable --> {i1} /// isCanonical --> false Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn Reviewed By: kbarton Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D60565 llvm-svn: 362644	2019-06-05 20:42:47 +00:00
Tim Northover	8d7f118ab2	InstCombine: correctly change byval type attribute alongside call args. When the byval attribute has a type, it must match the pointee type of any parameter; but InstCombine was not updating the attribute when folding casts of various kinds away. llvm-svn: 362643	2019-06-05 20:38:17 +00:00
Tim Northover	607c8a9d14	IR: make getParamByValType Just Work. NFC. Most parts of LLVM don't care whether the byval type is derived from an explicit Attribute or from the parameter's pointee type, so it makes sense for the main access function to just return the right value. The very few users who do care (only BitcodeReader so far) can find out how it's specified by accessing the Attribute directly. llvm-svn: 362642	2019-06-05 20:37:47 +00:00
Matt Arsenault	4fb580c314	AMDGPU: Remove amdgpu-max-work-group-size attribute This has been deprecated for a long time, and mesa recently switched to amdgpu-flat-work-group-size. llvm-svn: 362641	2019-06-05 20:32:32 +00:00
Matt Arsenault	0f8a764e8f	AMDGPU: Fix using 2 different enums for same operand flags These enums are really for the same namespace of flags set on arbitrary MachineOperands, so merge them to avoid value collisions. llvm-svn: 362640	2019-06-05 20:32:25 +00:00
Dan Gohman	53572d0470	[WebAssembly] Limit PIC support to the Emscripten target The current PIC support currently only works with Emscripten, so disable it for other targets. This is the PIC portion of https://reviews.llvm.org/D62542. Reviewed By: dschuff, sbc100 llvm-svn: 362638	2019-06-05 20:01:01 +00:00
Craig Topper	d0fff89b81	[X86] Add the vector integer min/max instructions to isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The reduce test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629	2019-06-05 18:25:09 +00:00
Simon Pilgrim	77d6adc491	Fix shadow local variable warning. NFCI. llvm-svn: 362622	2019-06-05 17:26:29 +00:00
Sanjay Patel	2bf82879bd	[x86] split more 256-bit stores of concatenated vectors As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620	2019-06-05 16:40:57 +00:00
Simon Pilgrim	de586bd1fd	[X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI. Enables us to use this to split 512-bit vectors in future patches. llvm-svn: 362617	2019-06-05 16:14:14 +00:00
Whitney Tsang	590b1aee60	Revert "Title: [LOOPINFO] Extend Loop object to add utilities to get the loop" This reverts commit `d34797dfc2`. llvm-svn: 362615	2019-06-05 15:32:56 +00:00
Dinar Temirbulatov	15c657d13d	[SLP] Fix regression in broadcasts caused by operand reordering patch D59973. This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 . The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found. Please see the lit test for some examples. Committed on behalf of @vporpo (Vasileios Porpodas) Differential Revision: https://reviews.llvm.org/D62427 llvm-svn: 362613	2019-06-05 15:26:28 +00:00
Sanjay Patel	ad62a3a299	[LoopUtils][SLPVectorizer] clean up management of fast-math-flags Instead of passing around fast-math-flags as a parameter, we can set those using an IRBuilder guard object. This is no-functional-change-intended. The motivation is to eventually fix the vectorizers to use and set the correct fast-math-flags for reductions. Examples of that not behaving as expected are: https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast') https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0) D61802 (should be able to reduce with IR-level FMF) Differential Revision: https://reviews.llvm.org/D62272 llvm-svn: 362612	2019-06-05 14:58:04 +00:00
Benjamin Kramer	b90b354798	[LoopInfo] Fix unused variable warning. NFC. llvm-svn: 362610	2019-06-05 14:43:58 +00:00
Whitney Tsang	d34797dfc2	Title: [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, and loop induction variable. Summary: This PR extends the loop object with more utilities to get loop bounds, step, and loop induction variable. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information. /// Example: /// for (int i = lb; i < ub; i+=step) /// <loop body> /// --- pseudo LLVMIR --- /// beforeloop: /// guardcmp = (lb < ub) /// if (guardcmp) goto preheader; else goto afterloop /// preheader: /// loop: /// i1 = phi[{lb, preheader}, {i2, latch}] /// <loop body> /// i2 = i1 + step /// latch: /// cmp = (i2 < ub) /// if (cmp) goto loop /// exit: /// afterloop: /// /// getBounds /// getInitialIVValue --> lb /// getStepInst --> i2 = i1 + step /// getStepValue --> step /// getFinalIVValue --> ub /// getCanonicalPredicate --> '<' /// getDirection --> Increasing /// getInductionVariable --> i1 /// getAuxiliaryInductionVariable --> {i1} /// isCanonical --> false Reviewers: kbarton, hfinkel, dmgreen, Meinersbur, jdoerfert, syzaara, fhahn Reviewed By: kbarton Subscribers: tvvikram, bmahjour, etiotto, fhahn, jsji, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D60565 llvm-svn: 362609	2019-06-05 14:34:12 +00:00
Petar Avramovic	22e99c434f	[MIPS GlobalISel] Select fcmp Select floating point compare for MIPS32. Differential Revision: https://reviews.llvm.org/D62721 llvm-svn: 362603	2019-06-05 14:03:13 +00:00
Sjoerd Meijer	a1bb4fb79d	[ARM] Allow "-march=foo+fp" to vary with foo This is the LLVM part of this change, the Clang part contains the full description in its commit message. Differential Revision: https://reviews.llvm.org/D60697 llvm-svn: 362600	2019-06-05 13:11:51 +00:00
Simon Pilgrim	886a55eaa0	[X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y)) We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type. This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size. llvm-svn: 362599	2019-06-05 12:56:53 +00:00
Simon Pilgrim	5a81af547c	[TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI. Will be used more in an upcoming patch. llvm-svn: 362595	2019-06-05 10:59:04 +00:00
Simon Pilgrim	db134aaec2	[IPO] Disabled 'default only' switch statements to fix MSVC warnings. @jdoerfert Looks like these are placeholders for incoming abstract attributes patches so I've just commented the code out, even though this is usually frowned upon. llvm-svn: 362592	2019-06-05 10:04:05 +00:00
Dmitri Gribenko	6fc4c1cc54	Include what you use in PPCFrameLowering.h llvm-svn: 362590	2019-06-05 08:58:00 +00:00
Yevgeny Rouban	a3e16719c4	Resubmit "[CorrelatedValuePropagation] Fix prof branch_weights metadata handling for SwitchInst" This reverts commit `5b32f60ec3`. The fix is in commit `4f9e68148b`. This patch fixes the CorrelatedValuePropagation pass to keep prof branch_weights metadata of SwitchInst consistent. It makes use of SwitchInstProfUpdateWrapper. New tests are added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D62126 llvm-svn: 362583	2019-06-05 05:46:40 +00:00
Michael Liao	fa449a9bb2	Suppress false-positive GCC -Wreturn-type warning. llvm-svn: 362582	2019-06-05 04:18:12 +00:00
Johannes Doerfert	aade782a98	[Attributor] Pass infrastructure and fixpoint framework NOTE: Note that no attributes are derived yet. This patch will not go in alone but only with others that derive attributes. The framework is split for review purposes. This commit introduces the Attributor pass infrastructure and fixpoint iteration framework. Further patches will introduce abstract attributes into this framework. In a nutshell, the Attributor will update instances of abstract arguments until a fixpoint, or a "timeout", is reached. Communication between the Attributor and the abstract attributes that are derived is restricted to the AbstractState and AbstractAttribute interfaces. Please see the file comment in Attributor.h for detailed information including design decisions and typical use case. Also consider the class documentation for Attributor, AbstractState, and AbstractAttribute. Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59918 llvm-svn: 362578	2019-06-05 03:02:24 +00:00
Nemanja Ivanovic	7c842fadf1	[PowerPC] Collapse RLDICL/RLDICR into RLDIC when possible Generally speaking, we lower to an optimal rotate sequence for nodes visible in the SDAG. However, there are instances where the two rotates are not visible at ISEL time - most notably those in a very common sequence when lowering switch statements to jump tables. A common situation is a switch on a 32-bit integer. This value has to have the upper 32 bits cleared and because jump table offsets are word offsets, the value needs to be shifted left by 2 bits. We currently emit the clear and the left shift as two separate instructions, but this is not needed as we can lower it to a single RLDIC. This patch just cleans that up. Differential revision: https://reviews.llvm.org/D60402 llvm-svn: 362576	2019-06-05 02:36:40 +00:00
Fangrui Song	f090e6f7b6	[llvm-objdump/llvm-readobj/obj2yaml/yaml2obj] Support DT_PPC_GOT and DT_PPC_OPT In glibc, DT_PPC_GOT indicates that PowerPC32 Secure PLT ABI is used. I plan to use it in D62464. DT_PPC_OPT currently indicates if a TLSDESC inspired TLS optimization is enabled. Reviewed By: grimar, jhenderson, rupprecht Differential Revision: https://reviews.llvm.org/D62851 llvm-svn: 362569	2019-06-05 01:36:48 +00:00
Nemanja Ivanovic	fe97754acf	Initial support for IBM MASS vector library This is the LLVM portion of patch https://reviews.llvm.org/D59881. The clang portion is to follow. llvm-svn: 362568	2019-06-05 01:31:43 +00:00
Craig Topper	78fdce25a1	[X86] Cleanup convertIntLogicToFPLogic a little. NFCI -Use early returns to reduce indentation -Replace multipe ifs with a switch. -Replace an assert with an llvm_unreachable default in the switch. -Check that the FP type we're going to use for the X86ISD::FAND/FOR/FXOR is legal rather than checking that the integer type matches the width of a legal scalar fp type. This all runs after legalization so it shouldn't really matter, but making sure we're using a valid type in the X86ISD node is really whats important. llvm-svn: 362565	2019-06-05 01:00:34 +00:00
Cameron McInally	5c7245b830	[Scalarizer] Add UnaryOperator visitor to scalarization pass Differential Revision: https://reviews.llvm.org/D62858 llvm-svn: 362558	2019-06-04 23:01:36 +00:00
Amara Emerson	2d37cb82f0	[AArch64][GlobalISel] Make extloads to i64 legal. Although we had the support in the prelegalizer combiner to generate the G_SEXTLOAD or G_ZEXTLOAD ops, the legalizer definitions for arm64 had them as lowering back to separate ops. llvm-svn: 362553	2019-06-04 21:51:34 +00:00
Thomas Lively	3d9ca00e74	[WebAssembly] Fix ISel crash on sext_inreg/extract type mismatch Summary: Adjusts the index and adds a bitcast around the vector operand of EXTRACT_VECTOR_ELT so that its lane type matches the source type of its parent sext_inreg. Without this bitcast the ISel patterns do not match and ISel fails. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62646 llvm-svn: 362547	2019-06-04 21:08:20 +00:00
Johannes Doerfert	6b432dca5d	[SelectionDAG][FIX] Allow "returned" arguments to be bit-casted Summary: An argument that is return by a function but bit-casted before can still be annotated as "returned". Make sure we do not crash for this case. Reviewers: sunfish, stephenwlin, niravd, arsenm Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59917 llvm-svn: 362546	2019-06-04 20:34:43 +00:00
Johannes Doerfert	40107ce753	Introduce Value::stripPointerCastsSameRepresentation This patch allows current users of Value::stripPointerCasts() to force the result of the function to have the same representation as the value it was called on. This is useful in various cases, e.g., (non-)null checks. In this patch only a single call site was adjusted to fix an existing misuse that would cause nonnull where they may be wrong. Uses in attribute deduction and other areas, e.g., D60047, are to be expected. For a discussion on this topic, please see [0]. [0] http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html Reviewers: hfinkel, arsenm, reames Subscribers: wdng, hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61607 llvm-svn: 362545	2019-06-04 20:21:46 +00:00
Nico Weber	1dce82636c	llvm-undname: Correctly demangle vararg parameters FunctionSignatureNode already had an IsVariadic field, but it wasn't used anywhere yet. Set it and use it. llvm-svn: 362541	2019-06-04 19:10:08 +00:00
Nico Weber	4638548468	llvm-undname: More coverage-related cleanups - The loop in demangleFunctionParameterList() only exits on Error, @, and Z. All 3 cases were handled, so the rest of the function is DEMANGLE_UNREACHABLE. - The loop in demangleTemplateParameterList() always returns on Error, so there's no need to check for that in the loop header and after the loop. - Add test cases for invalid function parameter manglings. - Add a (redundant) test case for a simple template parameter list mangling. - Add a test case pointing out that varargs functions aren't demangled correctly. llvm-svn: 362540	2019-06-04 18:49:05 +00:00
Nemanja Ivanovic	aed7227b71	Revert r362472 as it is breaking PPC build bots The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots. Reverting it to bring the bots back to green. llvm-svn: 362539	2019-06-04 18:48:43 +00:00
Alina Sbirlea	bfceed49ce	[Utils] Clean another duplicated util method. Summary: Following the cleanup in D48202, method foldBlockIntoPredecessor has the same behavior. Replace its uses with MergeBlockIntoPredecessor. Remove foldBlockIntoPredecessor. Reviewers: chandlerc, dmgreen Subscribers: jlebar, javed.absar, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62751 llvm-svn: 362538	2019-06-04 18:45:15 +00:00
Nico Weber	878df1c2a9	llvm-undname: Add test coverage for demangleInitFiniStub() llvm-svn: 362536	2019-06-04 18:06:28 +00:00
Craig Topper	137de38009	[X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table. This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K. There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need. We can probably do something similar for scalar, but I haven't looked at it yet. Differential Revision: https://reviews.llvm.org/D62757 llvm-svn: 362535	2019-06-04 18:03:07 +00:00
Benjamin Kramer	03ff1b3c30	[X86] Fold single-use variable into assert. NFC. Avoids an unused variable warning in Release builds. llvm-svn: 362534	2019-06-04 18:01:07 +00:00
Craig Topper	09a4415803	[DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1) This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove where beneficial. Fixes PR42118 Differential Revision: https://reviews.llvm.org/D62828 llvm-svn: 362533	2019-06-04 17:44:18 +00:00
Alex Brachet	c33944832c	[MACHO] Replaced calls to getStruct with getStructOrErr in functions returning Error or Expected or similar llvm-svn: 362526	2019-06-04 16:55:30 +00:00
Sanjay Patel	606eb2367f	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is a reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 362524	2019-06-04 16:40:04 +00:00
Peter Smith	f15e3d856f	[AArch64][ELF] Add support for PLT decoding with BTI instructions present Arm Architecture v8.5a introduces Branch Target Identification (BTI). When enabled all indirect branches must target a bti instruction of the appropriate form. As PLT sequences may sometimes be the target of an indirect branch and PLT[0] always is, a static linker may need to generate PLT sequences that contain "bti c" as the first instruction. In effect: bti c adrp x16, page offset to .got.plt ... Instead of: adrp x16, page offset to .got.plt ... At present the PLT decoding assumes the adrp will always be the first instruction. This patch adds support for a single "bti c" to prefix it. A test binary has been uploaded with such a PLT sequence. A forthcoming LLD patch will make heavy use of the PLT decoding code. Differential Revision: https://reviews.llvm.org/D62598 llvm-svn: 362523	2019-06-04 16:35:40 +00:00
Nico Weber	d98a0a362f	llvm-undname: Yet more coverage for error paths - For error returns in demangleSpecialTableNode(), demangleLocalStaticGuard(), RTTITypeDescriptor, demangleRttiBaseClassDescriptorNode(), demangleUnsigned(), demangleUntypedVariable() (via RttiBaseClassArray) - For ?_A and ?_P which are handled at early levels of the demangler but are not implemented in a later stage; this is now more obvious - Replace a "default:" with an explicit list of cases, to get -Wswitch check we list all cases llvm-svn: 362520	2019-06-04 16:25:28 +00:00
Nikita Popov	df621bdfc8	[LVI][CVP] Add support for urem, srem and sdiv The underlying ConstantRange functionality has been added in D60952, D61207 and D61238, this just exposes it for LVI. I'm switching the code from using a whitelist to a blacklist, as we're down to one unsupported operation here (xor) and writing it this way seems more obvious :) Differential Revision: https://reviews.llvm.org/D62822 llvm-svn: 362519	2019-06-04 16:24:09 +00:00
Nico Weber	c1a0e6fe6b	llvm-undname: More no-op changes to increase test coverage - Add test coverage around invalid anon namespaces and for error paths in demanglePrimitiveType() and in demangleFullyQualifiedTypeName() - Use DEMANGLE_UNREACHABLE in two more unreachable places llvm-svn: 362514	2019-06-04 15:38:00 +00:00
Jinsong Ji	3144d7a2da	[PowerPC] P9 Scheduling Model: dispatching rule fixes This is to address some of the problems in existing P9 resource modeling, especially about the dispatching rules. Instead of using a hypothetical DISPATCHER , we try to use the number of actual dispatch slots, and define SchedWriteRes to model dispatch rules, then update instruction classes according to dispatch rules. All the dispatch rules and instruction classes update are made according to POWER9 User Manual. Differential Revision: https://reviews.llvm.org/D61873 llvm-svn: 362509	2019-06-04 15:22:23 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Nico Weber	880d21d3cb	llvm-undname: Several behavior-preserving changes to increase coverage - Replace `Error = true` in a few branches that are truly unreachable with DEMANGLE_UNREACHABLE - Remove early return early in startsWithLocalScopePattern() because it's redundant with the next two early returns - Remove unreachable `case '0'` (it's handled in the branch below) - Remove an unused bool return - Add test coverage for several early error returns, mostly in array type parsing llvm-svn: 362506	2019-06-04 15:13:30 +00:00
Simon Pilgrim	a6e289e9f8	[X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern match code. NFCI. As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher. llvm-svn: 362504	2019-06-04 15:02:33 +00:00
Shawn Landden	669775f9db	[Support] make countLeadingZeros() countTrailingZeros() countLeadingOnes() and countTrailingOnes() return unsigned This matches APInt's versions of these functions, and there is no need for these to be size_t. (as well as __builtin_clzll()) Differential Revision: https://reviews.llvm.org/D60823 llvm-svn: 362503	2019-06-04 14:51:15 +00:00
Dmitri Gribenko	454fc77872	Include what you use in PPCRegisterInfo.cpp llvm-svn: 362495	2019-06-04 12:55:00 +00:00
Peter Smith	49d7221f71	[AArch64][ELF][llvm-readobj] Add support for BTI and PAC dynamic tags ELF for the 64-bit Arm Architecture defines two processor-specific dynamic tags: DT_AARCH64_BTI_PLT 0x70000001, d_val DT_AARCH64_PAC_PLT 0x70000003, d_val These presence of these tags indicate that PLT sequences have been protected using Branch Target Identification and Pointer Authentication respectively. The presence of both indicates that the PLT sequences have been protected with both Branch Target Identification and Pointer Authentication. This patch adds the tags and tests for llvm-readobj and yaml2obj. As some of the processor specific dynamic tags overlap, this patch splits them up, keeping their original default value if they were not previously mentioned explicitly in a switch case. Differential Revision: https://reviews.llvm.org/D62596 llvm-svn: 362493	2019-06-04 11:44:33 +00:00
Roman Lebedev	3dce0326fe	[DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952) Summary: This might be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet. As far as i can tell, there are no regressions here (ignoring x86-32), all changes are either good or neutral. This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`) `@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/vMd3 Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma Reviewed By: RKSimon Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62774 llvm-svn: 362488	2019-06-04 11:06:21 +00:00
Roman Lebedev	be6ce7b3f2	[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold Summary: All changes except ARM look great. https://rise4fun.com/Alive/R2M The regression `test/CodeGen/ARM/addsubcarry-promotion.ll` is recovered fully by D62392 + D62450. Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma Reviewed By: efriedma Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62266 llvm-svn: 362487	2019-06-04 11:06:08 +00:00
Simon Pilgrim	ad298f86b7	[SelectionDAG] ComputeNumSignBits - support constant pool values from target As I mentioned on D61887 we don't get many hits on ComputeNumSignBits as we did on computeKnownBits. The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select. It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup. Differential Revision: https://reviews.llvm.org/D62777 llvm-svn: 362486	2019-06-04 10:49:06 +00:00
Simon Pilgrim	3178546a27	[SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD comments. NFCI. Pre-commit requested for D62777. llvm-svn: 362485	2019-06-04 10:17:56 +00:00
Owen Reynolds	5d5078e341	[llvm-ar] Reapply Fix relative thin archive path handling Includes a fix for an introduced build failure due to a post c++11 use of std::mismatch. This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command. Differential Revision: https://reviews.llvm.org/D59491 llvm-svn: 362484	2019-06-04 10:13:03 +00:00
Simon Pilgrim	3018d505a3	[SelectionDAG] Add fpto[us]i(undef) --> undef constant fold Follow up to D62807. Differential Revision: https://reviews.llvm.org/D62811 llvm-svn: 362483	2019-06-04 10:04:55 +00:00
Mikhail Maltsev	08da01b496	[ARM] Add FP16 vector insert/extract patterns This change adds two FP16 extraction and two insertion patterns (one per possible vector length). Extractions are handled by copying a Q/D register into one of VFP2 class registers, where single FP32 sub-registers can be accessed. Then the extraction of even lanes are simple sub-register extractions (because we don't care about the top parts of registers for FP16 operations). Odd lanes need an additional VMOVX instruction. Unfortunately, insertions cannot be handled in the same way, because: * There is no instruction to insert FP16 into an even lane (VINS only works with odd lanes) * The patterns for odd lanes will have a form of a DAG (not a tree), and will not be implementable in pure tablegen Because of this insertions are handled in the same way as 16-bit integer insertions (with conversions between FP registers and GPRs using VMOVHR instructions). Without these patterns the ARM backend would sometimes fail during instruction selection. This patch also adds patterns which combine: * an FP16 element extraction and a store into a single VST1 instruction * an FP16 load and insertion into a single VLD1 instruction Differential Revision: https://reviews.llvm.org/D62651 llvm-svn: 362482	2019-06-04 09:39:55 +00:00
Dmitri Gribenko	63846039f5	Silenced a warning "implicit conversion turns string literal into bool" introduced in r362473 llvm-svn: 362480	2019-06-04 09:31:07 +00:00
Dmitri Gribenko	73a15d4b78	Include what you use in PPC.h llvm-svn: 362477	2019-06-04 09:16:35 +00:00
Dmitri Gribenko	067a17b51d	Include what you use in PPCMachineScheduler.cpp llvm-svn: 362476	2019-06-04 09:16:31 +00:00
Dmitri Gribenko	9d1c5ea165	Include what you use in PPCRegisterInfo.h llvm-svn: 362475	2019-06-04 09:13:08 +00:00
Yevgeny Rouban	4f9e68148b	Make SwitchInstProfUpdateWrapper safer While prof branch_weights inconsistencies are being fixed patch by patch (pass by pass) we need SwitchInstProfUpdateWrapper to be safe with respect to inconsistent metadata that can come from passes that have not been fixed yet. See the bug found by @nikic in https://reviews.llvm.org/D62126. This patch introduces one more state (called Invalid) to the wrapper class that allows users to work with the underlying SwitchInst ignoring the prof metadata changes. Created a unit test for the SwitchInstProfUpdateWrapper class. Reviewers: davidx, nikic, eraman, reames, chandlerc Reviewed By: davidx Differential Revision: https://reviews.llvm.org/D62656 llvm-svn: 362473	2019-06-04 09:03:39 +00:00
QingShan Zhang	11de0e71b0	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D61843 llvm-svn: 362472	2019-06-04 08:53:53 +00:00
Simon Tatham	ac02445524	[ARM] Turn some undefined encoding bits into 0s. The family of 32-bit Thumb instruction encodings that include t2ORR, t2AND and t2EOR are all listed in the ArmARM as having (0) in bit 15. The Tablegen descriptions of those instructions listed them as ?. This change tightens that up by making them into 0 + Unpredictable. In the specific case of t2ORR, we tighten it up still further by making the zero bit mandatory. This change comes from Arm v8.1-M, in which encodings with that bit equal to 1 will now be used for different instructions. Reviewers: dmgreen, samparker, SjoerdMeijer, efriedma Reviewed By: dmgreen, efriedma Subscribers: efriedma, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60705 llvm-svn: 362470	2019-06-04 08:28:48 +00:00
Cameron McInally	89f9af5487	[SCCP] Add UnaryOperator visitor to SCCP for unary FNeg Differential Revision: https://reviews.llvm.org/D62819 llvm-svn: 362449	2019-06-03 21:53:56 +00:00
Michael Berg	6ff978ee05	Propagate fmf for setcc in SDAG for select folds llvm-svn: 362448	2019-06-03 21:53:26 +00:00
Matt Arsenault	0ceda9fb5c	AMDGPU: Disable stack realignment for kernels This is something of a workaround, and the state of stack realignment controls is kind of a mess. Ideally, we would be able to specify the stack is infinitely aligned on entry to a kernel. TargetFrameLowering provides multiple controls which apply at different points. The StackRealignable field is used during SelectionDAG, and for some reason distinct from this hook. StackAlignment is a single field not dependent on the function. It would probably be better to make that dependent on the calling convention, and the maximum value for kernels. Currently this doesn't really change anything, since the frame lowering mostly does its own thing. This helps avoid regressions in a future change which will rely more heavily on hasFP. llvm-svn: 362447	2019-06-03 21:33:22 +00:00
Jessica Paquette	7500c97ce4	[AArch64][GlobalISel] Optimize G_FCMP + G_SELECT pairs when G_SELECT is fp Instead of emitting all of the test stuff for a compare when it's only used by a select, instead, just emit the compare + select. The select will use the value of NZCV correctly, so we don't need to emit all of the test instructions etc. For now, only support fp selects which use G_FCMP. Also only support condition codes which will only require one select to represent. Also add a test. Differential Revision: https://reviews.llvm.org/D62695 llvm-svn: 362446	2019-06-03 20:47:20 +00:00
George Burgess IV	c24a2f4ad9	CFLAA: reflow comments; NFC llvm-svn: 362442	2019-06-03 19:56:22 +00:00
Craig Topper	7a4eabef39	[CFLGraph] Add FAdd to visitConstantExpr. This looks like an oversight as all the other binary operators are present. Accidentally noticed while auditing places that need FNeg handling. No test because as noted in the review it would be contrived and amount to "don't crash" Differential Revision: https://reviews.llvm.org/D62790 llvm-svn: 362441	2019-06-03 19:35:52 +00:00
Craig Topper	dcf865f0ca	[X86] Fix the pattern for merge masked vcvtps2pd. r362199 fixed it for zero masking, but not zero masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage. llvm-svn: 362440	2019-06-03 19:29:14 +00:00
Michael Berg	0b7f98da65	Propagate fmf for setcc/select folds Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG. Reviewers: qcolombet, spatel Reviewed By: qcolombet Subscribers: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D62552 llvm-svn: 362439	2019-06-03 19:12:15 +00:00
Nemanja Ivanovic	bad43d8f49	[PowerPC] Look through copies for compare elimination We currently miss the opportunities for optmizing comparisons in the peephole optimizer if the input is the result of a COPY since we look for record-form versions of the producing instruction. This patch simply lets the optimization peek through copies. Differential revision: https://reviews.llvm.org/D59633 llvm-svn: 362438	2019-06-03 19:09:15 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Nikita Popov	c061b99c5b	[ConstantRange] Add sdiv() support The implementation is conceptually simple: We separate the LHS and RHS into positive and negative components and then also compute the positive and negative components of the result, taking into account that e.g. only pos/pos and neg/neg will give a positive result. However, there's one significant complication: SignedMin / -1 is UB for sdiv, and we can't just ignore it, because the APInt result of SignedMin would break the sign segregation. Instead we drop SignedMin or -1 from the corresponding ranges, taking into account some edge cases with wrapped ranges. Because of the sign segregation, the implementation ends up being nearly fully precise even for wrapped ranges (the remaining imprecision is due to ranges that are both signed and unsigned wrapping and are divided by a trivial divisor like 1). This means that the testing cannot just check the signed envelope as we usually do. Instead we collect all possible results in a bitvector and construct a better sign wrapped range (than the full envelope). Differential Revision: https://reviews.llvm.org/D61238 llvm-svn: 362430	2019-06-03 18:19:54 +00:00
Andrew Kaylor	4172dbab5d	Fix a crash when the default of a switch is removed This patch fixes a problem that occurs in LowerSwitch when a switch statement has a PHI node as its condition, and the PHI node only has two incoming blocks, and one of those incoming blocks is through an unreachable default in the switch statement. When this condition occurs, LowerSwitch holds a pointer to the condition value, but removes the switch block as a predecessor of the PHI block, causing the PHI node to be replaced. LowerSwitch then tries to use its stale pointer to the original condition value, causing a crash. Differential Revision: https://reviews.llvm.org/D62560 llvm-svn: 362427	2019-06-03 17:54:15 +00:00
Dmitri Gribenko	26c43d0ef8	Include what you use in Lanai.h Other files were not relying on these transitive includes, so I'm submitting this change separately. llvm-svn: 362423	2019-06-03 17:02:15 +00:00
Dmitri Gribenko	b8aeaf882e	Include what you use in LanaiAsmPrinter.cpp llvm-svn: 362422	2019-06-03 17:02:07 +00:00
Dmitri Gribenko	dc136847e3	Include what you use in LanaiMemAluCombiner.cpp llvm-svn: 362421	2019-06-03 17:02:02 +00:00
Dmitri Gribenko	f4d22bd0b4	Include what you use in LanaiISelDAGToDAG.cpp llvm-svn: 362420	2019-06-03 17:01:57 +00:00
Dmitri Gribenko	179154f6b9	Include what you use in LanaiFrameLowering.{cpp,h} llvm-svn: 362419	2019-06-03 17:01:52 +00:00
Dmitri Gribenko	8e317e29da	Include what you use in LanaiRegisterInfo.cpp llvm-svn: 362416	2019-06-03 16:31:37 +00:00
Philip Reames	9ed1673703	[LoopPred] Convert a second member function to a static helper [NFC] (And remember to actually mark the first one static.) llvm-svn: 362415	2019-06-03 16:23:20 +00:00
Dmitri Gribenko	857de979a7	Revert "[llvm-ar] Fix relative thin archive path handling" This reverts commit r362407. It broke compilation of llvm/lib/Object/ArchiveWriter.cpp: error: type 'llvm::sys::path::const_iterator' does not provide a call operator llvm-svn: 362413	2019-06-03 16:21:37 +00:00
Nemanja Ivanovic	009d08f313	[PowerPC] Set PROT_READ flag for MF_EXEC to prevent segfaults on PPC machines The big endian PPC buildbots are all failing now due to calls to cache invalidation in unit tests on data that has only the PROT_EXEC flag set. This has been an issue all along on FreeBSD but it can affect Linux machines depending on configuration. This patch mitigates the issue the same way it is mitigated on FreeBSD. Since this is needed to bring the buildbots back to green, I plan to commit this and allow for post-commit review, but I thought I would also post it here for ease of access/readability. Differential revision: https://reviews.llvm.org/D62741 llvm-svn: 362412	2019-06-03 16:20:59 +00:00
Philip Reames	0912b06f78	[LoopPred] Convert member function to free helper function [NFC] llvm-svn: 362411	2019-06-03 16:17:14 +00:00
Dmitri Gribenko	bedcaea99a	Include what you use in LanaiInstrInfo.cpp llvm-svn: 362408	2019-06-03 15:26:25 +00:00
Owen Reynolds	fade9cbed7	[llvm-ar] Fix relative thin archive path handling This fixes some thin archive relative path issues, paths are shortened where possible and paths are output correctly when using the display table command. Differential Revision: https://reviews.llvm.org/D59491 llvm-svn: 362407	2019-06-03 15:26:07 +00:00
Dmitri Gribenko	b3bd866c7f	Include what you use in PPCInstrInfo.h llvm-svn: 362405	2019-06-03 15:04:05 +00:00
Dmitri Gribenko	2b369f83c5	Include what you use in NVPTX.h Other files were not relying on these transitive includes, so I'm submitting this change separately. llvm-svn: 362403	2019-06-03 14:37:26 +00:00
Dmitri Gribenko	14c69fefe6	Include what you use in NVPTX.h I also fixed all other files that were including NVPTX.h and were relying on transitive includes. llvm-svn: 362402	2019-06-03 14:26:50 +00:00
Dmitry Preobrazhensky	9111f35f02	[AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292 Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D62660 llvm-svn: 362400	2019-06-03 13:51:24 +00:00
Simon Pilgrim	cb7e4e8193	[SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205) We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction Differential Revision: https://reviews.llvm.org/D62807 llvm-svn: 362397	2019-06-03 13:02:07 +00:00
Dmitri Gribenko	7a3e4ab286	Include what you use in LanaiInstPrinter.cpp llvm-svn: 362395	2019-06-03 12:53:05 +00:00
Dmitri Gribenko	2f66316c96	Include what you use in LanaiMCCodeEmitter.cpp LanaiMCCodeEmitter.cpp was not using any APIs from Lanai.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Lanai target library and the MCTargetDesc library). llvm-svn: 362394	2019-06-03 12:42:48 +00:00
Dmitri Gribenko	c69ee63cb9	Include what you use in LanaiDisassembler.cpp llvm-svn: 362392	2019-06-03 12:37:11 +00:00
Nicolai Haehnle	edfa756f3f	AMDGPU/GFX10: V_CMPX_xxx instructions still have an omod operand Summary: Change-Id: If6ee98e4a723b643bc37254fc6ef8b3812db16da Reviewers: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62720 Change-Id: Id547ef152b2f92b24dc1c0efbf7e4467c4fb4b6e llvm-svn: 362390	2019-06-03 12:07:41 +00:00
Dmitri Gribenko	8668fc0102	Include what you use in HexagonInstPrinter.cpp HexagonInstPrinter.cpp was not using any APIs from HexagonAsmPrinter.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362389	2019-06-03 11:41:22 +00:00
Dmitri Gribenko	61b49ccb77	Include what you use in HexagonAsmPrinter.h llvm-svn: 362388	2019-06-03 11:41:18 +00:00
Dmitri Gribenko	03d1b33041	Include what you use in HexagonMCInstrInfo.cpp HexagonMCInstrInfo.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362387	2019-06-03 11:25:37 +00:00
Dmitri Gribenko	970b9f961f	Include what you use in HexagonMCCodeEmitter.cpp HexagonMCCodeEmitter.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362386	2019-06-03 11:20:53 +00:00
Dmitri Gribenko	ebe360edfa	Include what you use in HexagonMCCompound.cpp HexagonMCCompound.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362385	2019-06-03 11:20:48 +00:00
Dmitri Gribenko	6e076a081a	Include what you use in HexagonShuffler.cpp HexagonShuffler.cpp was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362384	2019-06-03 11:14:20 +00:00
Dmitri Gribenko	6214b577b7	Include what you use in HexagonMCChecker.cpp HexagonMCChecker.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362383	2019-06-03 11:14:15 +00:00
Dmitri Gribenko	bf2a356ec0	Include what you use in HexagonMCTargetDesc.cpp HexagonMCTargetDesc.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362382	2019-06-03 11:14:10 +00:00
Dmitri Gribenko	beb7f48a29	Include what you use in HexagonMCShuffler.cpp HexagonMCShuffler.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362381	2019-06-03 11:14:05 +00:00
Simon Tatham	dc83a3c449	[ARM] Fix recent breakage of -mfpu=none. The recent change D60691 introduced a bug in clang when handling option combinations such as `-mcpu=cortex-m4 -mfpu=none`. Those options together should select Cortex-M4 but disable all use of hardware FP, but in fact, now hardware FP instructions can still be generated in that mode. The reason is because the handling of FPUVersion::NONE disables all the same feature names it used to, of which the base one is `vfp2`. But now there are further features below that, like `vfp2d16fp` and (following D60694) `fpregs`, which also need to be turned off to disable hardware FP completely. Added a tiny test which double-checks that compiling a simple FP function doesn't access the FP registers. Reviewers: SjoerdMeijer, dmgreen Reviewed By: dmgreen Subscribers: lebedev.ri, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D62729 llvm-svn: 362380	2019-06-03 11:02:53 +00:00
Dmitri Gribenko	7ebfbebfe1	Include what you use in HexagonELFObjectWriter.cpp HexagonELFObjectWriter.cpp was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362376	2019-06-03 09:56:40 +00:00
Nikola Prica	2d0106a110	[LiveDebugValues] Close range for previous variable's location when adding newly deduced location When LiveDebugValues deduces new variable's location from spill, restore or register copy instruction it should close old variable's location. Otherwise we can have multiple block output locations for same variable. That could lead to inserting two DBG_VALUEs for same variable to the beginning of the successor block which results to ignoring of first DBG_VALUE. Reviewers: aprantl, jmorse, wolfgangp, dstenb Reviewed By: aprantl Subscribers: probinson, asowda, ivanbaev, petarj, djtodoro Tags: #debug-info Differential Revision: https://reviews.llvm.org/D62196 llvm-svn: 362373	2019-06-03 09:48:29 +00:00
Dmitri Gribenko	0aa374a306	Include what you use in HexagonAsmBackend.cpp HexagonAsmBackend.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362372	2019-06-03 09:43:05 +00:00
Dmitri Gribenko	301f8fd632	Include what you use in HexagonAsmParser.cpp HexagonAsmParser.cpp was not using any APIs from Hexagon.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the AsmParser library). llvm-svn: 362370	2019-06-03 09:38:48 +00:00
Dmitri Gribenko	c5327ab71d	Include what you use in HexagonShuffler.h HexagonShuffler.h was not using any APIs from Hexagon.h, and was only including it for transitive dependencies. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary Hexagon target library and the MCTargetDesc library). llvm-svn: 362369	2019-06-03 09:33:48 +00:00
Dmitri Gribenko	3c837201e0	Include what you use in BPFMCTargetDesc.cpp BPFMCTargetDesc.cpp was not using any APIs from BPF.h. Doing so is problematic from include-what-you-use perspective, but it is also a layering issue (it creates a dependency cycle between the primary BPF target library and the MCTargetDesc library). llvm-svn: 362368	2019-06-03 09:29:51 +00:00
Diogo N. Sampaio	df92f84110	[ARM][FIX] Ran out of registers due tail recursion Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to a same function. However, it disconsiders the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (llvm limiation). If all those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells the function IsEligibleForTailCallOptimization if we intend to perform indirect calls, as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366	2019-06-03 08:58:05 +00:00
Sam Parker	a0bd6f8a1a	[AArch64] Check for simple type in FPToUInt DAGCombiner was hitting a SimpleType assertion when trying to combine a v3f32 before type legalization. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41916 Differential Revision: https://reviews.llvm.org/D62734 llvm-svn: 362365	2019-06-03 08:49:17 +00:00
Jim Lin	20b14dacbb	[AVR] Fix incorrect source regclass of LDWRdPtr Summary: LDWRdPtr would be expanded to ld+ldd. ldd only accepts the pointer register is Y or Z. So the register class of pointer of LDWRdPtr should be PTRDISPREGS instead of PTRREGS. Reviewers: dylanmckay Reviewed By: dylanmckay Subscribers: dylanmckay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62300 llvm-svn: 362351	2019-06-03 02:31:07 +00:00
Florian Hahn	e71963c850	Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor. If we hit the limit, we do expand the outstanding tokenfactors. Otherwise, we might drop nodes with users in the unexpanded tokenfactors. This fixes the crashes reported by Jordan Rupprecht. Reviewers: niravd, spatel, craig.topper, rupprecht Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D62633 llvm-svn: 362350	2019-06-03 01:30:19 +00:00
Nico Weber	54362477c7	llvm-undname; Add more test coverage for demangleFunctionClass() Also add two FC_Far that seem to be missing, by symmetry from the public and protected cases. (But FC_Far isn't really a thing anymore, so this doesn't really have an observable effect.) llvm-svn: 362344	2019-06-02 23:26:57 +00:00
Craig Topper	50b35caf30	[DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask. Similar to what was done for masked load and gather. llvm-svn: 362342	2019-06-02 22:52:38 +00:00
Craig Topper	5f79d74946	[X86] Add test cases for masked store and masked scatter with an all zeroes mask. Fix bug in ScalarizeMaskedMemIntrin Need to cast only to Constant instead of ConstantVector to allow ConstantAggregateZero. llvm-svn: 362341	2019-06-02 22:52:34 +00:00
Simon Pilgrim	8a32ca381d	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338	2019-06-02 20:37:02 +00:00
Craig Topper	a7bc31ebc6	[DAGCombiner] Replace masked loads with a zero mask with the passthru value Similar to what was recently done for gathers in r362015. llvm-svn: 362337	2019-06-02 18:58:46 +00:00
Simon Pilgrim	59a8db628b	[TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI. Prep work before resurrecting D61257. llvm-svn: 362335	2019-06-02 18:06:42 +00:00
Nico Weber	b5cd6163f4	Remove code path that's dead after r358835 llvm-svn: 362333	2019-06-02 17:41:07 +00:00
Simon Pilgrim	71a39bcf68	[X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921) Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327	2019-06-02 15:47:49 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Simon Pilgrim	ffb4d2bff7	[DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020) Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot. PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where its safe. Differential Revision: https://reviews.llvm.org/D62783 llvm-svn: 362323	2019-06-02 11:56:39 +00:00
Simon Pilgrim	88522ce388	[TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis. These might have been replaced in multiple use cases. llvm-svn: 362322	2019-06-02 10:12:55 +00:00
Simon Pilgrim	30a6caa3e7	[TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI. Helps with debugging as we recurse between them. llvm-svn: 362321	2019-06-02 10:03:56 +00:00
Craig Topper	f58ef87bb7	[DAGCombiner] Replace two unchecked dyn_casts with casts. The results of the dyn_casts were immediately dereferenced on the next line so they had better not be null. I don't think there's any way for these dyn_casts to fail, so use a cast of adding null check. llvm-svn: 362315	2019-06-02 03:31:01 +00:00
Craig Topper	78c794a70b	[X86] Fix several places that weren't passing what they though they were to MachineInstr::print Over a year ago, MachineInstr gained a fourth boolean parameter that occurs before the TII pointer. When this happened, several places started accidentally passing TII into this boolean parameter instead of the TII parameter. llvm-svn: 362312	2019-06-02 01:36:48 +00:00
Craig Topper	396a915c26	[X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative. llvm-svn: 362309	2019-06-02 00:42:58 +00:00
Nikita Popov	900578d1c1	[SimplifyIndVar] Refactor overflow check elimination code; NFC Extract a willNotOverflow() helper function that is shared between eliminateOverflowIntrinsic() and strengthenOverflowingOperation(). Use WithOverflowInst for the former. We'll be able to reuse the same code for saturating intrinsics as well. llvm-svn: 362305	2019-06-01 20:21:53 +00:00
Craig Topper	7cebf0af40	[InlineCost] Don't add the soft float function call cost for the fneg idiom, fsub -0.0, %x Summary: Fneg can be implemented with an xor rather than a function call so we don't need to add the function call overhead. This was pointed out in D62699 Reviewers: efriedma, cameron.mcinally Reviewed By: efriedma Subscribers: javed.absar, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62747 llvm-svn: 362304	2019-06-01 19:40:07 +00:00
Andrea Di Biagio	6a989c358c	[MCA][Scheduler] Change how memory instructions are dispatched to the pending set. NFCI llvm-svn: 362302	2019-06-01 15:22:37 +00:00
Simon Atanasyan	25694e0084	[mips] Extend range of register indexes accepted by cfcmsa/ctcmsa The `cfcmsa` and `ctcmsa` instructions accept index of MSA control register. The MIPS64 SIMD Architecture define eight MSA control registers. But register index for `cfcmsa` and `ctcmsa` instructions might be any number in 0..31 range. If the index is greater then 7, `cfcmsa` writes zero to the destination registers and `ctcmsa` does nothing [1]. [1] MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD Architecture Module https://www.mips.com/?do-download=the-mips64-simd-architecture-module Differential Revision: https://reviews.llvm.org/D62597 llvm-svn: 362299	2019-06-01 13:55:18 +00:00
Dylan McKay	45eb4c7e55	[AVR] Disable register coalescing to the PTRDISPREGS class If we would allow register coalescing on PTRDISPREGS class then register allocator can lock Z register to some virtual register. Larger instructions requiring a memory acces then fail during the register allocation phase since there is no available register to hold a pointer if Y register was already taken for a stack frame. This patch prevents it by keeping Z register spillable. It does it by not allowing coalescer to lock it. Original discussion on https://github.com/avr-rust/rust/issues/128. llvm-svn: 362298	2019-06-01 12:38:56 +00:00
Nikita Popov	46d4dba6e6	[IndVarSimplify] Fixup nowrap flags during LFTR (PR31181) Fix for https://bugs.llvm.org/show_bug.cgi?id=31181 and partial fix for LFTR poison handling issues in general. When LFTR moves a condition from pre-inc to post-inc, it may now depend on value that is poison due to nowrap flags. To avoid this, we clear any nowrap flag that SCEV cannot prove for the post-inc addrec. Additionally, LFTR may switch to a different IV that is dynamically dead and as such may be arbitrarily poison. This patch will correct nowrap flags in some but not all cases where this happens. This is related to the adoption of IR nowrap flags for the pre-inc addrec. (See some of the switch_to_different_iv tests, where flags are not dropped or insufficiently dropped.) Finally, there are likely similar issues with the handling of GEP inbounds, but we don't have a test case for this yet. Differential Revision: https://reviews.llvm.org/D60935 llvm-svn: 362292	2019-06-01 09:40:18 +00:00
Dylan McKay	038e3b9f57	Extend the DWARFExpression address handling to support 16-bit addresses This allows the DWARFExpression class to handle addresses without crashing on targets with 16-bit pointers like AVR. This is required in order to generate assembly from clang via the '-S' flag. This fixes an error with the following message: clang: llvm/include/llvm/DebugInfo/DWARF/DWARFExpression.h:132: llvm::DWARFExpression::DWARFExpression(llvm::DataExtractor, uint16_t, uint8_t): Assertion `AddressSize == 8 \|\| AddressSize == 4' failed. llvm-svn: 362290	2019-06-01 09:18:26 +00:00
Craig Topper	c288a19bb7	[X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables. llvm-svn: 362288	2019-06-01 06:20:59 +00:00
Craig Topper	48fdb61766	[X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the output to make it easier to diff. Fix a few other formatting issues in the manual table. And remove some old FIXMEs. llvm-svn: 362287	2019-06-01 06:20:55 +00:00
Nick Desaulniers	b380846a12	[RuntimeDyld] fix too-small-bitmask error Summary: This was flagged in https://www.viva64.com/en/b/0629/ under "Snippet No. 33". It seems that this statement is doing the standard bitwise trick for adjusting a value to have a specific alignment. The issue is that getStubAlignment() returns an unsigned, while DataSize is declared a uint64_t. The right hand side of the expression is not extended to 64b before bitwise negation, resulting in the top half of the mask being 0s, which is not correct for realignment. Reviewers: lhames, MaskRay Reviewed By: MaskRay Subscribers: RKSimon, MaskRay, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62227 llvm-svn: 362286	2019-06-01 04:51:26 +00:00
Richard Trieu	4e875464df	Inline variable into assert to fix unused variable warning. llvm-svn: 362285	2019-06-01 03:32:20 +00:00
Philip Reames	19afdf74bb	[LoopPred] Eliminate a redundant/confusing cover function [NFC] llvm-svn: 362284	2019-06-01 03:09:28 +00:00
Philip Reames	099eca832e	[LoopPred] Handle a subset of NE comparison based latches At the moment, LoopPredication completely bails out if it sees a latch of the form: %cmp = icmp ne %iv, %N br i1 %cmp, label %loop, label %exit OR %cmp = icmp ne %iv.next, %NPlus1 br i1 %cmp, label %loop, label %exit This is unfortunate since this is exactly the form that LFTR likes to produce. So, go ahead and recognize simple cases where we can. For pre-increment loops, we leverage the fact that LFTR likes canonical counters (i.e. those starting at zero) and a (presumed) range fact on RHS to discharge the check trivially. For post-increment forms, the key insight is in remembering that LFTR had to insert a (N+1) for the RHS. CVP can hopefully prove that add nsw/nuw (if there's appropriate range on N to start with). This leaves us both with the post-inc IV and the RHS involving an nsw/nuw add, and SCEV can discharge that with no problem. This does still need to be extended to handle non-one steps, or other harder patterns of variable (but range restricted) starting values. That'll come later. Differential Revision: https://reviews.llvm.org/D62748 llvm-svn: 362282	2019-06-01 00:31:58 +00:00
Eli Friedman	d8e8722791	[CodeGen] Fix hashing for MO_ExternalSymbol MachineOperands. We were hashing the string pointer, not the string, so two instructions could be identical (isIdenticalTo), but have different hash codes. This showed up as a very rare, non-deterministic assertion failure rehashing a DenseMap constructed by MachineOutliner. So there's no "real" testcase, just a unittest which checks that the hash function behaves correctly. I'm a little scared fixing this is going to cause a regression in outlining or MachineCSE, but hopefully we won't run into any issues. Differential Revision: https://reviews.llvm.org/D61975 llvm-svn: 362281	2019-06-01 00:08:54 +00:00
Tom Tan	eb4d6142dc	[COFF, ARM64] Add CodeView register mapping CodeView has its own register map which is defined in cvconst.h. Missing this mapping before saving register to CodeView causes debugger to show incorrect value for all register based variables, like variables in register and local variables addressed by register (stack pointer + offset). This change added mapping between LLVM register and CodeView register so the correct register number will be stored to CodeView/PDB, it aso fixed the mapping from CodeView register number to register name based on current CPUType but print PDB to yaml still assumes X86 CPU and needs to be fixed. Differential Revision: https://reviews.llvm.org/D62608 llvm-svn: 362280	2019-05-31 23:43:31 +00:00
Nick Desaulniers	7fcad2f171	[PowerPC] check for INLINEASM_BR along w/ INLINEASM Summary: It looks like since INLINEASM_BR was created off of INLINEASM (r353563), a few checks for INLINEASM needed to be updated to check for either case. pr/41999 Reviewers: hfinkel Reviewed By: hfinkel Subscribers: nemanjai, hiraditya, kbarton, jsji, llvm-commits, craig.topper, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62403 llvm-svn: 362278	2019-05-31 23:02:13 +00:00
Reid Kleckner	eddd6c25b5	[codeview] Revert inline line table change of r362264 Testing with debuggers shows that our previous behavior was correct. The reason I thought MSVC did things differently is that MSVC prefers to use the 0xB combined code offset and code length update opcode when inline sites are discontiguous. Keep the test changes, and update the llvm-pdbutil inline line table dumper to account for this new interpretation of the opcodes. llvm-svn: 362277	2019-05-31 22:55:03 +00:00
Matt Arsenault	302eedcbfa	AMDGPU: Fix not adding ImplicitBufferPtr as a live-in Fixes missing test from r293000. llvm-svn: 362275	2019-05-31 22:47:36 +00:00
Erik Pilkington	abb2a93c53	[SimplifyLibCalls] Fold more fortified functions into non-fortified variants When the object size argument is -1, no checking can be done, so calling the _chk variant is unnecessary. We already did this for a bunch of these functions. rdar://50797197 Differential revision: https://reviews.llvm.org/D62358 llvm-svn: 362272	2019-05-31 22:41:36 +00:00
Erik Pilkington	5234921119	NFC: Pull out a function to reduce some duplication Part of https://reviews.llvm.org/D62358 llvm-svn: 362271	2019-05-31 22:41:31 +00:00
Craig Topper	bc9e04d0c3	[SelectionDAG] Make the code in mutateStrictFPToFP less aware of how many operands each node has. NFCI Just copy all of the operands except the chain and call MorphNode on that. This removes the IsUnary and IsTernary flags. Also always get the result type from the result type of the original nodes. Previously we got it from the operand except for two nodes where that didn't work. llvm-svn: 362269	2019-05-31 22:18:45 +00:00
Nick Desaulniers	103bd108a7	[RegisterCoalescer] fix potential use of undef value. NFC Summary: Fixes a warning produced from scan-build (llvm.org/reports/scan-build/), further warnings found by annotation isMoveInstr [[nodiscard]]. isMoveInstr potentially does not assign to its parameters, so if they were uninitialized, they will potentially stay uninitialized. It seems most call sites pass references to uninitialized values, then use them without checking the return value. Reviewers: wmi Reviewed By: wmi Subscribers: MatzeB, qcolombet, hiraditya, tpr, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D62109 llvm-svn: 362265	2019-05-31 21:20:13 +00:00
Reid Kleckner	e98cf5fe47	[codeview] Fix inline line table accuracy for discontiguous segments After improving the inline line table dumper in llvm-pdbutil and looking at MSVC's inline line tables, it is clear that setting the length of the inlined code region does not update the code offset. This means that the delta to the beginning of a new discontiguous inlined code region should be calculated relative to the last code offset, excluding the length. Implementing this is a one line fix for MC: simply don't update LastLabel. While I'm updating these test cases, switch them to use llvm-objdump -d and llvm-pdbutil. This allows us to show offsets of each instruction and correlate the line table offsets to the actual code. llvm-svn: 362264	2019-05-31 20:55:31 +00:00
Nikita Popov	7bafae55c0	Reapply [CVP] Simplify non-overflowing saturating add/sub If we can determine that a saturating add/sub will not overflow based on range analysis, convert it into a simple binary operation. This is a sibling transform to the existing with.overflow handling. Reapplying this with an additional check that the saturating intrinsic has integer type, as LVI currently does not support vector types. Differential Revision: https://reviews.llvm.org/D62703 llvm-svn: 362263	2019-05-31 20:48:26 +00:00
Nikita Popov	23a02f6a5f	[CVP] Fix assertion failure on vector with.overflow Noticed on D62703. LVI only handles plain integers, not vectors of integers. This was previously not an issue, because vector support for with.overflow is only a relatively recent addition. llvm-svn: 362261	2019-05-31 20:42:07 +00:00
Craig Topper	c669629e6c	[X86] Resync Host.cpp with compiler-rt's cpu_model.c to enable 0x55 to be identified as cascadelake when avx512vnni is detected. Some other formatting changes. llvm-svn: 362256	2019-05-31 19:18:07 +00:00
Nikita Popov	ccb63e0bfe	Revert "[CVP] Simplify non-overflowing saturating add/sub" This reverts commit `1e692d1777`. Causes assertion failure in builtins-wasm.c clang test. llvm-svn: 362254	2019-05-31 19:04:47 +00:00
Puyan Lotfi	3ea6b24f41	[MIR-Canon] Don't do vreg skip for independent instructions if there are none. We don't want to create vregs if there is nothing to use them for. That causes verifier errors. Differential Revision: https://reviews.llvm.org/D62740 llvm-svn: 362247	2019-05-31 17:34:25 +00:00
Nikita Popov	1e692d1777	[CVP] Simplify non-overflowing saturating add/sub If we can determine that a saturating add/sub will not overflow based on range analysis, convert it into a simple binary operation. This is a sibling transform to the existing with.overflow handling. Differential Revision: https://reviews.llvm.org/D62703 llvm-svn: 362242	2019-05-31 16:46:05 +00:00
Kevin P. Neal	ac79007205	Revert revert of r362112 with minor SystemZ test file corrections. [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widending, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully. Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Cameron McInally, Kevin P. Neal Approved by: Cameron McInally Differential Revision: https://reviews.llvm.org/D62546 llvm-svn: 362241	2019-05-31 16:32:12 +00:00
Stanislav Mekhanoshin	fbbe5230f4	[AMDGPU] Use InliningThresholdMultiplier for inline hint AMDGPU uses multiplier 9 for the inline cost. It is taken into account everywhere except for inline hint threshold. As a result we are penalizing functions with the inline hint making them less probable to be inlined than those without the hint. Defaults are 225 for a normal function and 325 for a function with an inline hint. Currently we have effective threshold 225 * 9 = 2025 for normal functions and just 325 for those with the hint. That is fixed by this patch. Differential Revision: https://reviews.llvm.org/D62707 llvm-svn: 362239	2019-05-31 16:19:26 +00:00
Guozhi Wei	c3a24e93d5	[PPC] Correctly adjust branch probability in PPCReduceCRLogicals In PPCReduceCRLogicals after splitting the original MBB into 2, the 2 impacted branches still use original branch probability. This is unreasonable. Suppose we have following code, and the probability of each successor is 50%. condc = conda \|\| condb br condc, label %target, label %fallthrough It can be transformed to following, br conda, label %target, label %newbb newbb: br condb, label %target, label %fallthrough Since each branch has a probability of 50% to each successor, the total probability to %fallthrough is 25% now, and the total probability to %target is 75%. This actually changed the original profiling data. A more reasonable probability can be set to 70% to the false side for each branch instruction, so the total probability to %fallthrough is close to 50%. This patch assumes the branch target with two incoming edges have same edge frequency and computes new probability fore each target, and keep the total probability to original targets unchanged. Differential Revision: https://reviews.llvm.org/D62430 llvm-svn: 362237	2019-05-31 16:11:17 +00:00
Jinsong Ji	18e7bf5c4d	[MachinePipeliner][NFC] Add some debug log and statistics This is to add some log and statistics for debugging Differential Revision: https://reviews.llvm.org/D62165 llvm-svn: 362233	2019-05-31 15:35:19 +00:00
Russell Gallop	802c9b59d5	ftime-trace: Trace loop passes These can take a significant amount of time in some builds. Suggested by Andrea Di Biagio. Differential Revision: https://reviews.llvm.org/D62666 llvm-svn: 362219	2019-05-31 10:14:04 +00:00
Roman Lebedev	39390d8317	[InstCombine] 'C-(C2-X) --> X+(C-C2)' constant-fold It looks this fold was already partially happening, indirectly via some other folds, but with one-use limitation. No other fold here has that restriction. https://rise4fun.com/Alive/ftR llvm-svn: 362217	2019-05-31 09:47:16 +00:00
Roman Lebedev	886c4ef35a	[InstCombine] 'add (sub C1, X), C2 --> sub (add C1, C2), X' constant-fold https://rise4fun.com/Alive/qJQ llvm-svn: 362216	2019-05-31 09:47:04 +00:00
Cullen Rhodes	0fc3a07398	[AArch64][SVE2] Asm: support WHILE instructions Summary: Patch adds support for the following instructions: * WHILEGE, WHILEGT, WHILEHS, WHILEHI, WHILEWR, WHILERW The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62601 llvm-svn: 362215	2019-05-31 09:13:55 +00:00
Cullen Rhodes	087d1337f8	[AArch64][SVE2] Asm: support TBL/TBX instructions Summary: A three sources variant of the TBL instruction is added to the existing SVE instruction in SVE2. This is implemented with minor changes to the existing TableGen class. TBX is a new instruction with its own definition. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62600 llvm-svn: 362214	2019-05-31 09:06:53 +00:00
Cullen Rhodes	2e870011b6	[AArch64][SVE2] Asm: support SVE2 store instructions Summary: Patch adds support for the following instructions: * STNT1B, STNT1H, STNT1S, STNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62599 llvm-svn: 362213	2019-05-31 08:59:40 +00:00
Petar Avramovic	efcd3c0009	[MIPS GlobalISel] Handle position independent code Handle position independent code for MIPS32. When callee is global address, lower call will emit callee as G_GLOBAL_VALUE and add target flag if needed. Support $gp in getRegBankFromRegClass(). Select G_GLOBAL_VALUE, specially handle case when there are target flags attached by lowerCall. Differential Revision: https://reviews.llvm.org/D62589 llvm-svn: 362210	2019-05-31 08:27:06 +00:00
Petar Avramovic	9058b50fb2	[mips] Move initGlobalBaseReg to MipsFunctionInfo. NFC Move initGlobalBaseReg from MipsSEDAGToDAGISel to MipsFunctionInfo. This way functions used for handling position independent code during instruction selection, getGlobalBaseReg and initGlobalBaseReg, end up in same class. Differential Revision: https://reviews.llvm.org/D62586 llvm-svn: 362206	2019-05-31 08:15:28 +00:00
Craig Topper	b457e430f3	[InstructionSimplify] Add missing implementation of llvm::SimplifyUnOp. NFC There are no callers currently, but the function is declared so we should at least implement it. llvm-svn: 362205	2019-05-31 08:10:23 +00:00
Petar Avramovic	f4a6dd28b6	[MIPS GlobalISel] Lower call for callee that is register Lower call for callee that is register for MIPS32. Register should contain callee function address. Differential Revision: https://reviews.llvm.org/D62585 llvm-svn: 362204	2019-05-31 08:06:17 +00:00
Craig Topper	31d00d80a2	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64. These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203	2019-05-31 07:38:26 +00:00
Craig Topper	b79cc5f802	[X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead. DAG combine will usually fold fpextend+load to an fp extload anyway. So the 256 and 512 patterns were probably unnecessary. The 128 bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64-bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199	2019-05-31 06:21:53 +00:00
Puyan Lotfi	0d63cef180	[MIR-Canon] Skip the first N vreg names lazily. This consolidates the vreg skip code into one function (SkipVRegs()). SkipVRegs() now knows if it should skip as if it is the first initialization or subsequent skips. The first skip is also done the first time createVirtualRegister is called by the cursor instead of by the cursor's constructor. This prevents verifier errors on machine functions that have no vregs (where the verifier will complain that there are vregs when the function uses none). Differential Revision: https://reviews.llvm.org/D62717 llvm-svn: 362195	2019-05-31 06:02:38 +00:00
Craig Topper	23066033a1	[X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions. This makes the 5 address operands come first. And the data operand comes last. This matches the operand order the instruction is created with. It's also the expected order in X86MCInstLower. So everything appeared to work, but the operands didn't match their declared type. Fixes a -verify-machineinstrs failure. Also remove the isel patterns from these instructions since they should only be used for stack spills and reloads. I'm not even sure what types the patterns were looking for to match. llvm-svn: 362193	2019-05-31 05:20:27 +00:00
Puyan Lotfi	2a901401fe	[MIR-Canon] Hardening propagateLocalCopies. This is am almost NFC, it does the following: - If there is no register class for a COPY's src or dst, bail. - Fixes uses iterator invalidation bug. Differential Revision: https://reviews.llvm.org/D62713 llvm-svn: 362191	2019-05-31 04:49:58 +00:00
Pengfei Wang	2e67d0c842	[X86] Add VP2INTERSECT instructions Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188	2019-05-31 02:50:41 +00:00
Sam Clegg	9d21f510ee	Fix -DBUILD_SHARED_LIBS=ON build after rL362160 Differential Revision: https://reviews.llvm.org/D62709 llvm-svn: 362180	2019-05-31 01:04:00 +00:00
Craig Topper	70dc2200a2	[X86] Remove result type constraints from the extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags. The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64. llvm-svn: 362175	2019-05-30 23:35:24 +00:00
Matt Arsenault	18659f84b2	MISched: Fix -misched-regpressure=0 if subreg liveness enabled Test is waiting on fixing several more crashes in the AMDGPU scheduler implementation with this. llvm-svn: 362174	2019-05-30 23:31:36 +00:00
Craig Topper	d6b74cc859	[X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC The LoadExt table defaults to all combinations being Legal. For vector types, only src VTs with an i1 element type were ever changed. So we don't need to mark them legal manually. llvm-svn: 362170	2019-05-30 22:29:06 +00:00
Francis Visoiu Mistrih	48998d10e0	[Remarks] Fix usage of enum class Breaks the build on some compilers: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9720/steps/build%20stage%201/logs/stdio llvm-svn: 362165	2019-05-30 22:01:56 +00:00
Francis Visoiu Mistrih	6ada11f134	[Remarks][NFC] Move the serialization to lib/Remarks Separate the remark serialization to YAML from the LLVM Diagnostics. This adds a new serialization abstraction: remarks::Serializer. It's completely independent from lib/IR and it provides an easy way to replace YAML by providing a new remarks::Serializer. Differential Revision: https://reviews.llvm.org/D62632 llvm-svn: 362160	2019-05-30 21:45:59 +00:00
Puyan Lotfi	daaecf98c9	[MIR-Canon] Fixing case where MachineFunction is empty. In cases where the machine function is empty: bail on the RPO traversal. Differential Revision: https://reviews.llvm.org/D62617 llvm-svn: 362158	2019-05-30 21:37:25 +00:00
Roman Lebedev	46511d75b5	[DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts I don't have a test case for these, but there is a test case for D62266 where, even after all the constant-folding patches, we still end up with endless combine loop. Which makes sense, since we don't constant fold for opaque constants. llvm-svn: 362156	2019-05-30 21:10:37 +00:00
Nikita Popov	e906f2a370	[CVP] Generalize willNotOverflow(); NFC Change argument from WithOverflowInst to BinaryOpIntrinsic, so this function can also be used for saturating math intrinsics. llvm-svn: 362152	2019-05-30 21:03:10 +00:00
Lang Hames	a100042b27	[RuntimeDyld] Update reserveAllocationSpace to account for stub padding. This should fix the buildbot failures caused by r362139. llvm-svn: 362151	2019-05-30 20:58:28 +00:00
Martin Storsjo	0fe645c086	[InstCombine] Avoid use after free in DenseMap, when built with GCC Previously, this used a statement like this: Map[A] = Map[B]; This is equivalent to the following: const auto &Src = Map[B]; auto &Dest = Map[A]; Dest = Src; The second statement, "auto &Dest = Map[A];" can insert a new element into the DenseMap, which can potentially grow and reallocate the DenseMap's internal storage, which will invalidate the existing reference to the source. When doing the actual assignment, the Src reference is dereferenced, accessing memory that was freed when the DenseMap grew. This issue hasn't shown up when LLVM was built with Clang, because the right hand side ended up dereferenced before evaulating the left hand side. (If the value type is a larger data type, Clang doesn't do this but behaves like GCC.) With GCC, a cast to Value* isn't enough to make it dereference the right hand side reference before invoking operator[] (while that is enough to make Clang/LLVM do the right thing for larger types), but storing it in an intermediate variable in a separate statement works. This fixes PR42065. Differential Revision: https://reviews.llvm.org/D62624 llvm-svn: 362150	2019-05-30 20:53:21 +00:00
Roman Lebedev	a4e3b50e26	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2 Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 362146	2019-05-30 20:37:49 +00:00
Roman Lebedev	57aa36ff91	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 362145	2019-05-30 20:37:39 +00:00
Roman Lebedev	63b4741534	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 362144	2019-05-30 20:37:29 +00:00
Roman Lebedev	05ad5fd213	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 362143	2019-05-30 20:37:18 +00:00
Roman Lebedev	1d9ec7a81b	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 \| PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 362142	2019-05-30 20:36:54 +00:00
Lang Hames	0e124b37bd	[RuntimeDyld] Apply padding and alignment bumps to all sections with stubs, and increase the MachO/x86-64 stub alignment to 8. Stub alignment should be guaranteed for any section containing RuntimeDyld stubs/GOT-entries. To do this we should pad and align all sections containing stubs, not just code sections. This commit also bumps the MachO/x86-64 stub alignment to 8, so that GOT entries will be aligned. llvm-svn: 362139	2019-05-30 19:59:20 +00:00

... 9 10 11 12 13 ...

124284 Commits