llvm-project

Commit Graph

Author	SHA1	Message	Date
Javed Absar	b16d146838	Add support for #pragma clang section This patch provides a means to specify section-names for global variables, functions and static variables, using #pragma directives. This feature is only defined to work sensibly for ELF targets. One can specify section names as: #pragma clang section bss="myBSS" data="myData" rodata="myRodata" text="myText" One can "unspecify" a section name with empty string e.g. #pragma clang section bss="" data="" text="" rodata="" Reviewers: Roger Ferrer, Jonathan Roelofs, Reid Kleckner Differential Revision: https://reviews.llvm.org/D33413 llvm-svn: 304704	2017-06-05 10:09:13 +00:00
Peter Smith	adde667007	[ARM] Support fixup for Thumb2 modified immediate This change adds a new fixup fixup_t2_so_imm for the t2_so_imm_asmoperand "T2SOImm". The fixup permits code such as: .L1: sub r3, r3, #.L2 - .L1 .L2: to assemble in Thumb2 as well as in ARM state. The operand predicate isT2SOImm() explicitly doesn't match expressions containing :upper16: and :lower16: as expressions with these operators must match the movt and movw instructions. The test mov r0, foo2 in thumb2-diagnostics is moved to a new file as the fixup delays the error message till after the assembler has quit due to the other errors. As the mov instruction shares the t2_so_imm_asmoperand mov instructions with a non constant expression now match t2MOVi rather than t2MOVi16 so the error message is slightly different. Fixes PR28647 Differential Revision: https://reviews.llvm.org/D33492 llvm-svn: 304702	2017-06-05 09:37:12 +00:00
Sven van Haastregt	78819e0fd4	[InstCombine] Fix extractelement use before def This fixes a bug that can cause extractelements with operands that haven't been defined yet to be inserted at a wrong point when optimising insertelements. Patch by Karl Hylen. Differential Revision: https://reviews.llvm.org/D33449 llvm-svn: 304701	2017-06-05 09:18:10 +00:00
Renato Golin	cdf840fd38	Revert "[sanitizer-coverage] one more flavor of coverage: -fsanitize-coverage=inline-8bit-counters. Experimental so far, not documenting yet." This reverts commit r304630, as it broke ARM/AArch64 bots for 2 days. llvm-svn: 304698	2017-06-05 07:35:52 +00:00
Stanislav Mekhanoshin	286a4225b9	[AMDGPU] Fix SIFoldOperands crash with clamp Fixes bug #33302. Pass did not account that Src1 of max instruction can be an immediate. Differential Revision: https://reviews.llvm.org/D33884 llvm-svn: 304696	2017-06-05 01:03:04 +00:00
Simon Pilgrim	46dd55f1e1	[X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 1> Step 2: unpcklps X, Y ==> <3, 2, 1, 0> The issue is that, because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc. Instead, this patch unpacks progressively larger sequential vector elements together: e.g. for v4f32: Step 1: unpcklps 0, 1 ==> X: <?, ?, 1, 0> : unpcklps 2, 3 ==> Y: <?, ?, 3, 2> Step 2: unpcklpd X, Y ==> <3, 2, 1, 0> This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree. Differential Revision: https://reviews.llvm.org/D33864 llvm-svn: 304688	2017-06-04 20:12:04 +00:00
Igor Breger	3bfba2c569	[GlobalISel][X86] merge irtranslator-call test files. NFC llvm-svn: 304683	2017-06-04 12:41:10 +00:00
Craig Topper	2b54baeb96	[X86] Replace 'REQUIRES: x86' in tests with 'REQUIRES: x86-registered-target' which seems to be the correct way to make them run on an x86 build. llvm-svn: 304682	2017-06-04 08:21:58 +00:00
Craig Topper	fe9ad82e44	[ConstantFolding] Properly support constant folding of vector powi intrinsic. The second argument is not a vector so needs special treatment. llvm-svn: 304679	2017-06-04 07:30:28 +00:00
Craig Topper	97f113e795	[InstSimplify] Add test case demonstrating that we fail to constant fold vector llvm.powi intrinsics due to the second argument not being a vector. llvm-svn: 304678	2017-06-04 07:30:23 +00:00
Craig Topper	0799ff9e64	[InstCombine] Add support for simplifying ctlz/cttz intrinsics based on known bits. llvm-svn: 304669	2017-06-03 18:50:32 +00:00
Craig Topper	7c553edced	[ConstantFolding] Fix constant folding for vector cttz and ctlz intrinsics to understand that the second argument is still a scalar. llvm-svn: 304668	2017-06-03 18:50:29 +00:00
Craig Topper	36fa2f0dee	[InstCombine][InstSimplify] Add various tests for ctlz/cttz with vectors, some showing missed optimizations. NFC llvm-svn: 304667	2017-06-03 18:50:26 +00:00
Craig Topper	622c0f89ec	[InstCombine] Use cttz instead of ctlz in the cttz_cmp_vec test case. Looks like a copy paste mistake. llvm-svn: 304666	2017-06-03 18:50:23 +00:00
Stanislav Mekhanoshin	0330660403	[AMDGPU] Untangle SDWA pass from SIShrinkInstructions Remove dependency of SDWA pass on SIShrinkInstructions. The goal is to move SDWA even higher in the stack to avoid second run of MachineLICM, MachineCSE and SIFoldOperands. Also added handling to preserve original src modifiers. Differential Revision: https://reviews.llvm.org/D33860 llvm-svn: 304665	2017-06-03 17:39:47 +00:00
Amaury Sechet	39fbe3bb60	Regenerate expectations for trunc-to-bool.ll . NFC llvm-svn: 304660	2017-06-03 11:35:40 +00:00
Simon Pilgrim	f93debb40c	[X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combining Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. llvm-svn: 304659	2017-06-03 11:12:57 +00:00
Kostya Serebryany	f7db346cdf	[sanitizer-coverage] one more flavor of coverage: -fsanitize-coverage=inline-8bit-counters. Experimental so far, not documenting yet. llvm-svn: 304630	2017-06-03 01:35:47 +00:00
Tom Stellard	e042412ef1	AMDGPU/GlobalISel: Mark 1-bit integer constants as legal Summary: These are mostly legal, but will probably need special lowering for some cases. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D33791 llvm-svn: 304628	2017-06-03 01:13:33 +00:00
Evgeniy Stepanov	704003ea3d	Revert "[CFI] Remove LinkerSubsectionsViaSymbols." This reverts commit r304582: breaks cfi-devirt :: anon-namespace.cpp on Darwin. llvm-svn: 304626	2017-06-03 00:46:27 +00:00
Stanislav Mekhanoshin	f154b4f52c	[AMDGPU] Preserve operand order in SIFoldOperands SIFoldOperands can commute operands even if no folding was done. This change is to preserve IR if no folding was done. Differential Revision: https://reviews.llvm.org/D33802 llvm-svn: 304625	2017-06-03 00:41:52 +00:00
Quentin Colombet	1ee8616ca0	[SystemZ] Simplify test case. NFC Remove useless successors information. llvm-svn: 304615	2017-06-02 23:40:58 +00:00
Sanjay Patel	56641ac497	[x86] fix over-specific triple; NFC There's nothing darwin-specific in these tests, and using that setting causes extra phantom diffs when the auto-generated check lines are regenerated today. llvm-svn: 304614	2017-06-02 23:40:46 +00:00
Philip Reames	80135bdf9e	Canonicalize a test via utils/update_test_checks.py Turns out I might not have further changes to make here, but with the way I'd written the tests, even I couldn't tell that. :( llvm-svn: 304613	2017-06-02 23:27:36 +00:00
Sanjay Patel	4cad0f0477	[x86] add tests for unsigned vector compares with known signbits; NFC (PR33276) llvm-svn: 304612	2017-06-02 23:24:28 +00:00
Matthias Braun	0021d46a1c	RegisterScavenging: Add ScavengerTest pass This pass allows to run the register scavenging independently of PrologEpilogInserter to allow targeted testing. Also adds some basic register scavenging tests. llvm-svn: 304606	2017-06-02 23:01:42 +00:00
Quentin Colombet	2145cf3f07	[RABasic] Properly update the LiveRegMatrix when LR splitting occurs Prior to this patch we used to not touch the LiveRegMatrix while doing live-range splitting. In other words, when live-range splitting was occurring, the LiveRegMatrix was not reflecting the changes. This is generally fine because it means the query to the LiveRegMatrix will be conservatively correct. However, when decisions are taken based on what is going to happen on the interferences (e.g., when we spill a register and know that it is going to be available for another one), we might hit an assertion that the color used for the assignment is still in use. This patch makes sure the changes on the live-ranges are properly reflected in the LiveRegMatrix, so the assertions don't break. An alternative could have been to remove the assertion, but it would make the invariants of the code and the general reasoning more complicated in my opinion. http://llvm.org/PR33057 llvm-svn: 304603	2017-06-02 22:46:31 +00:00
Quentin Colombet	ebbaed6d3c	[RABasic] Properly initialize the pass Use the initializeXXX method to initialize the RABasic pass in the pipeline. This enables us to take advantage of the .mir infrastructure. llvm-svn: 304602	2017-06-02 22:46:26 +00:00
Xinliang David Li	0b7d858fa3	[PartialInlining] Minor cost analysis tuning Also added a test option and 2 cost analysis related tests. llvm-svn: 304599	2017-06-02 22:08:04 +00:00
Jun Bum Lim	2960d41e68	[InlineCost] Enable the new switch cost heuristic Summary: This is to enable the new switch inline cost heuristic (r301649) by removing the old heuristic as well as the flag itself. In my experiment for LLVM test suite and spec2000/2006, +17.82% performance and 8% code size reduction was observed in spec2000/vertex with O3 LTO in AArch64. No significant code size / performance regression was found in O3/O2/Os. No significant complaint was reported from the llvm-dev thread. Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo Reviewed By: echristo Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini Differential Revision: https://reviews.llvm.org/D32653 llvm-svn: 304594	2017-06-02 20:42:54 +00:00
Ahmed Bougacha	018a68f9e4	[X86] Correctly broadcast NaN-like integers as float on AVX. Since r288804, we try to lower build_vectors on AVX using broadcasts of float/double. However, when we broadcast integer values that happen to have a NaN float bitpattern, we lose the NaN payload, thereby changing the integer value being broadcast. This is caused by ConstantFP::get, to which we pass the splat i32 as a float (by bitcasting it using bitsToFloat). ConstantFP::get takes a double parameter, so we end up lossily converting a single-precision NaN to double-precision. Instead, avoid any kinds of conversions by directly building an APFloat from the splatted APInt. Note that this also fixes another piece of code (broadcast of subvectors), that currently isn't susceptible to the same problem. Also note that we could really just use APInt and ConstantInt throughout: the constant pool type doesn't matter much. Still, for consistency, use the appropriate type. llvm-svn: 304590	2017-06-02 20:02:59 +00:00
Zachary Turner	92dcdda623	[CodeView] Support CodeView subsections in any order. Previously we would expect certain subsections to appear in a certain order because some subsections would reference other subsections, but in practice we need to support arbitrary orderings since some object file and PDB file producers generate them this way. This also paves the way for supporting Yaml <-> Object File conversion of CodeView, since Object Files typically have quite a large number of subsections in their debug info. Differential Revision: https://reviews.llvm.org/D33807 llvm-svn: 304588	2017-06-02 19:49:14 +00:00
Amaury Sechet	04ffaca604	Regenerate expectation for wide-fma-contraction.ll . NFC llvm-svn: 304586	2017-06-02 19:15:04 +00:00
Keno Fischer	514a6a54e7	[SROA] Fix crash due to bad bitcast Summary: As shown in the test case, SROA was crashing when trying to split stores (to the alloca) of loads (from anywhere), because it assumed the pointer operand to the loads and stores had to have the same address space. This isn't the case. Make sure to use the correct pointer type for both the load and the store. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D32593 llvm-svn: 304585	2017-06-02 19:04:17 +00:00
Evgeniy Stepanov	63f056327d	[CFI] Remove LinkerSubsectionsViaSymbols. Since D17854 LinkerSubsectionsViaSymbols is unnecessary. It is interfering with ThinLTO implementation of CFI-ICall, where the aliases used on the !LinkerSubsectionsViaSymbols branch are needed to export jump tables to ThinLTO backends. llvm-svn: 304582	2017-06-02 18:45:14 +00:00
Evgeniy Stepanov	b933ad3a77	Skip CFI for dead functions. Differential Revision: https://reviews.llvm.org/D33805 llvm-svn: 304578	2017-06-02 18:24:23 +00:00
Evgeniy Stepanov	659b3bc77d	Move summary dead stripping before regular LTO. This way dead stripping results are recorded in combined summary and can be used in regular LTO passes. Differential Revision: https://reviews.llvm.org/D33615 llvm-svn: 304577	2017-06-02 18:24:17 +00:00
Konstantin Zhuravlyov	be6c0ca5e2	AMDGPU: Make auto waitcnt before barrier a feature Differential Revision: https://reviews.llvm.org/D33793 llvm-svn: 304571	2017-06-02 17:40:26 +00:00
Philip Reames	94cc4a29ed	Add placeholder for more extensive verification of pseudo ops This initial patch doesn't actually do much that is useful. It's just to show where the new code goes. Once this is in, I'll extend the verification logic to check more useful properties. For those curious, the more complicated version of this patch already found one very suspicious thing. Differential Revision: https://reviews.llvm.org/D33819 llvm-svn: 304564	2017-06-02 16:36:37 +00:00
Sanjay Patel	ce241f48c5	[InstCombine] fix icmp with not op and constant to work with splat vector constant llvm-svn: 304562	2017-06-02 16:29:41 +00:00
Craig Topper	b23e7c78a5	[InstSimplify][ConstantFolding] Teach constant folding how to handle icmp null, (inttoptr x) as well as it handles icmp (inttoptr x), null Summary: The constant folding code currently assumes that the constant expression will always be on the left and the simple null will be on the right. But that's not true at least on the path from InstSimplify. This patch adds support to ConstantFolding to detect the reversed case. Reviewers: spatel, dberlin, majnemer, davide, joey Reviewed By: joey Subscribers: joey, llvm-commits Differential Revision: https://reviews.llvm.org/D33801 llvm-svn: 304559	2017-06-02 16:17:32 +00:00
Amaury Sechet	5746e7356a	Update select.ll expected results. NFC llvm-svn: 304557	2017-06-02 16:07:43 +00:00
Sanjay Patel	630a524e8d	[InstCombine] fix/add tests for icmp with not ops; NFC The existing test was not minimal, and there was no coverage for the variants with a constant or vector types. llvm-svn: 304555	2017-06-02 15:35:45 +00:00
Alexander Timofeev	3f70b619a9	AMDGPUAnnotateUniformValue should always treat volatile loads as divergent llvm-svn: 304554	2017-06-02 15:25:52 +00:00
Mark Searles	70359ac60d	[AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests. -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551	2017-06-02 14:19:25 +00:00
Zoran Jovanovic	2aae0649a1	[mips][microMIPS] Extending size reduction pass with LBU16, LHU16, SB16 and SH16 Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: LBU instruction is transformed into 16-bit instruction LBU16 LHU instruction is transformed into 16-bit instruction LHU16 SB instruction is transformed into 16-bit instruction SB16 SH instruction is transformed into 16-bit instruction SH16 Differential Revision: https://reviews.llvm.org/D33091 llvm-svn: 304550	2017-06-02 14:14:21 +00:00
Krzysztof Parzyszek	066e8b56a0	[Hexagon] Return 0 from getDotNewPredOp when .new opcode does not exist This allows using this function to test if an instruction can be converted to a .new form. llvm-svn: 304549	2017-06-02 14:07:06 +00:00
Amaury Sechet	2e1fed9ef8	Regenerate sse3.ll test results. NFC llvm-svn: 304548	2017-06-02 14:02:49 +00:00
Amaury Sechet	8e370f14cb	Regenerate and-sink.ll test results. NFC llvm-svn: 304547	2017-06-02 14:02:46 +00:00
Amaury Sechet	f0c066f140	Regenerate shrink-compare.ll test results. NFC llvm-svn: 304546	2017-06-02 14:02:43 +00:00
Benjamin Kramer	19092d783c	[X86] Don't fold memory operands into insertps in the generated folding tables. insertps behaves differently, the register form selects from an input register based on the immediate operand while the memory form just loads the given address. We have custom code to change the immediate in cases where that's legal, so completely remove insertps from the generated tables. llvm-svn: 304540	2017-06-02 10:50:22 +00:00
John Brawn	6671616cde	[GlobalMerge] Don't merge globals that may be preempted When a global may be preempted it needs to be accessed directly, instead of indirectly through a MergedGlobals symbol, for the preemption to work. This fixes PR33136. Differential Revision: https://reviews.llvm.org/D33727 llvm-svn: 304537	2017-06-02 10:24:14 +00:00
Diana Picus	e7aa90987d	[ARM] GlobalISel: Support struct params/returns Very very similar to the support for arrays. As with arrays, we don't support returning large structs that wouldn't fit in R0-R3. Most front-ends would likely use sret arguments for that anyway. The only significant difference is that when splitting a struct, we need to make sure we set the correct original alignment on each member, otherwise it may get split incorrectly between stack and registers. llvm-svn: 304536	2017-06-02 10:16:48 +00:00
Javed Absar	4ae7e81233	[ARM] Cortex-A57 scheduling model for ARM backend (AArch32) This patch implements the Cortex-A57 scheduling model. The main code is in ARMScheduleA57.td, ARMScheduleA57WriteRes.td. Small changes in .cpp, .h files to support required scheduling predicates. Scheduling model implemented according to: http://infocenter.arm.com/help/topic/com.arm.doc.uan0015b/Cortex_A57_Software_Optimization_Guide_external.pdf. Patch by: Andrew Zhogin (submitted on his behalf, as requested). Reviewed by: Renato Golin, Diana Picus, Javed Absar, Kristof Beyls. Differential Revision: https://reviews.llvm.org/D28152 llvm-svn: 304530	2017-06-02 08:53:19 +00:00
Amaury Sechet	9a6fdc0bd5	Specify triple for xor-icmp.ll . llvm-svn: 304526	2017-06-02 07:45:22 +00:00
Amaury Sechet	968dda7f81	Regenerate expectations for xor-icmp.ll . NFC llvm-svn: 304525	2017-06-02 07:25:02 +00:00
Gor Nishanov	053d2d24f7	[coroutines] PR33271: Remove stray coro.save intrinsics during CoroSplit Summary: Optimization passes may remove llvm.coro.suspend intrinsic while leaving matching llvm.coro.save intrinsic orphaned. Make sure we clean up orphaned coro.saves. The bug manifested with a crash similar to this: ``` llvm_unreachable("Unknown type!"); llvm::MVT::getVT (Ty=0x489518, HandleUnknown=false) llvm::EVT::getEVT llvm::TargetLoweringBase::getValueType llvm::ComputeValueVTs llvm::SelectionDAGBuilder::visitTargetIntrinsic ``` Reviewers: GorNishanov Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D33817 llvm-svn: 304518	2017-06-02 02:18:36 +00:00
Xinliang David Li	621e8dcf1f	[Profile] Enhance expect lowering to handle correlated branches builtin_expect applied on && or || expressions were not handled properly before. With this patch, the problem is fixed. Differential Revision: http://reviews.llvm.org/D33164 llvm-svn: 304517	2017-06-02 02:09:31 +00:00
Sam Clegg	c38e947e50	[WebAssembly] MC: Fix references to undefined externals in data section Undefined externals don't need to have a size or an offset. This was broken by r303915. Added a test for this case. This fixes the "Compile LLVM Torture (o)" step on the wasm waterfall. Differential Revision: https://reviews.llvm.org/D33803 llvm-svn: 304505	2017-06-02 01:05:24 +00:00
Mandeep Singh Grang	fce1f464ac	[PredicateInfo] Enable -reverse-iterate tests only for +Asserts builds Summary: The flag -reverse-iterate is present only on +Asserts builds. Reviewers: dberlin, davide, RKSimon, efriedma, chapuni Reviewed By: efriedma, chapuni Subscribers: chapuni, llvm-commits Differential Revision: https://reviews.llvm.org/D33795 llvm-svn: 304498	2017-06-01 23:52:59 +00:00
Tim Shen	4e912aa5af	[ThinLTO] Move -lto-use-new-pm to llvm-lto2, and change it to -use-new-pm. Summary: As we teach Clang to use ThinLTO + new PM, it's good for the users to inject through Config, instead of setting a flag in the LTOBackend library. Move the flag to llvm-lto2. As it moves to llvm-lto2, a new name -use-new-pm seems simpler and as clear. Reviewers: davide, tejohnson Subscribers: mehdi_amini, Prazek, inglorion, eraman, chandlerc, llvm-commits Differential Revision: https://reviews.llvm.org/D33799 llvm-svn: 304492	2017-06-01 23:13:44 +00:00
Zachary Turner	ebd3ae8371	[CodeView] Properly align symbol records on read/write. Object files have symbol records not aligned to any particular boundary (e.g. 1-byte aligned), while PDB files have symbol records padded to 4-byte aligned boundaries. Since they share the same reading / writing code, we have to provide an option to specify the alignment and propagate it up to the producer or consumer who knows what the alignment is supposed to be for the given container type. Added a test for this by modifying the existing PDB -> YAML -> PDB round-tripping code to round trip symbol records as well as types. Differential Revision: https://reviews.llvm.org/D33785 llvm-svn: 304484	2017-06-01 21:52:41 +00:00
Yaxun Liu	a618acf923	[AMDGPU] Fix kernel arg segment size for amdgizcl Differential Revision: https://reviews.llvm.org/D33307 llvm-svn: 304482	2017-06-01 21:31:53 +00:00
Craig Topper	5ea2d55e1c	[InstSimplify][ConstantFolding] Add test demonstrating failure to simplify (icmp eq null, inttoptr x) when the null is on the left hand side. NFC llvm-svn: 304474	2017-06-01 21:20:07 +00:00
Adrian Prantl	d9cd4d52e3	DbgValueHistoryCalculator: Ignore call instructions that claim to clobber SP. The AArch64 backend marks calls that involve aggregate function arguments as having an implicit def of SP. We already have the same workaround in LiveDebugValues and in DbgValueHistoryCalculator for SP clobbers in register masks. This adds register defs to the list. Fixes rdar://problem/30361929 and Swift SR-3851. llvm-svn: 304471	2017-06-01 21:14:58 +00:00
Nirav Dave	4952871630	[SDAG] Fix CombineTo ordering in visitZERO_EXTEND and visitSIGN_EXTEND Reorder CombineTo Calls to prevent references to stale/deleted SDNodes which caused undue assertions. Reviewers: dbabokin Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D31625 llvm-svn: 304460	2017-06-01 19:33:50 +00:00
Haicheng Wu	bf277f38ad	[InlineCost] Add a test case for GEP cost The added test case is to check whether the simplified value is passed to getGEPCost(). Differential Revision: https://reviews.llvm.org/D33779 llvm-svn: 304454	2017-06-01 19:06:07 +00:00
Xinliang David Li	ee8d6acb1f	[Profile] Fix builtin_expect lowering bug The lowerer wrongly assumes the ICMP instruction 1) always has a constant operand; 2) the operand has value 0. It also assumes the expected value can only be one, thus other values other than one will be considered 'zero'. This leads to wrong profile annotation when other integer values are used other than 0, 1 in the comparison or in the expect intrinsic. Also missing is handling of equal predicate. This patch fixes all the above problems. Differential Revision: http://reviews.llvm.org/D33757 llvm-svn: 304453	2017-06-01 19:05:55 +00:00
Xinliang David Li	0a0acbcf78	[PartialInlining] Emit branch info and profile data as remarks This allows us to collect profile statistics to tune static branch prediction. Differential Revision: http://reviews.llvm.org/D33746 llvm-svn: 304452	2017-06-01 18:58:50 +00:00
Mandeep Singh Grang	33a1b73600	[PredicateInfo] Fix non-determinism in codegen uncovered by reverse iterating SmallPtrSet Summary: Sort OpsToRename before iterating to make iteration order deterministic. Thanks to Daniel Berlin for the sorting logic. Reviewers: dberlin, RKSimon, efriedma, davide Reviewed By: dberlin, davide Subscribers: sanjoy, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D33265 llvm-svn: 304447	2017-06-01 18:36:24 +00:00
Krzysztof Parzyszek	3cf16576d5	[Hexagon] Fix dependence check in the packetizer An incorrect check in the packetizer led to an attempt to convert an unconditional branch to a .new (conditional) form. llvm-svn: 304442	2017-06-01 18:02:40 +00:00
Krzysztof Parzyszek	51fd5405d5	[Hexagon] Handle long-running simplification loop in idiom recognition The initial assumption was that the simplification would converge to a fixed point relatively quickly. Turns out that there are legitimate situations where the complexity of the code causes it to take a large number of iterations. Two main changes: - Instead of aborting upon hitting the limit, simply return nullptr. - Reduce the limit to 10,000 from 100,000. llvm-svn: 304441	2017-06-01 18:00:47 +00:00
Amaury Sechet	94eb633dd2	Fix addcarry-crash.ll llvm-svn: 304415	2017-06-01 14:24:31 +00:00
Amaury Sechet	b761959993	Add regression test for the addcarry crash. See D33770 for context. llvm-svn: 304414	2017-06-01 14:09:56 +00:00
Florian Hahn	fca7b8348f	[ARM] Create relocations for Thumb functions calling ARM fns in ELF. Summary: Without using a fixup in this case, BL will be used instead of BLX to call internal ARM functions from Thumb functions. Reviewers: rafael, t.p.northover, peter.smith, kristof.beyls Reviewed By: peter.smith Subscribers: srhines, echristo, aemerson, rengolin, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33436 llvm-svn: 304413	2017-06-01 13:50:57 +00:00
Chandler Carruth	8b3be4e59d	[PM/ThinLTO] Port the ThinLTO pipeline (both components) to the new PM. Based on the original patch by Davide, but I've adjusted the API exposed to just be different entry points rather than exposing more state parameters. I've factored all the common logic out so that we don't have any duplicate pipelines, we just stitch them together in different ways. I think this makes the build easier to reason about and understand. This adds a direct method for getting the module simplification pipeline as well as a method to get the optimization pipeline. While not my express goal, this seems nice and gives a good place to comment about the restrictions that are imposed on them. I did make some minor changes to the way the pipelines are structured here, but hopefully not ones that are significant or controversial: 1) I sunk the PGO indirect call promotion to only be run when we have PGO enabled (or as part of the special ThinLTO pipeline). 2) I made the extra GlobalOpt run in ThinLTO just happen all the time and at a slightly more powerful place (before we remove available externally functions). This seems like general goodness and not a big compile time sink, so it didn't make sense to only use it in ThinLTO. Fewer differences in the pipeline makes everything simpler IMO. 3) I hoisted the ThinLTO stop point pre-link above the RPO function attr inference. The RPO inference won't infer anything terribly meaningful pre-link (recursiveness?) so it didn't make a lot of sense. But if the placement of RPO inference starts to matter, we should move it to the canonicalization phase anyways which seems like a better place for it (and there is a FIXME to this effect!). But that seemed a bridge too far for this patch. If we ever need to parameterize these pipelines more heavily, we can always sink the logic to helper functions with parameters to keep those parameters out of the public API. But the changes above seemed minor enough that we could possibly get away without the parameters entirely. I added support for parsing 'thinlto' and 'thinlto-pre-link' names in pass pipelines to make it easy to test these routines and play with them in larger pipelines. I also added a really basic manifest of passes test that will show exactly how the pipelines behave and work as well as making updates to them clear. Lastly, this factoring does introduce a nesting layer of module pass managers in the default pipeline. I don't think this is a big deal and the flexibility of decoupling the pipelines seems easily worth it. Differential Revision: https://reviews.llvm.org/D33540 llvm-svn: 304407	2017-06-01 11:39:39 +00:00
Zvi Rackover	7693733e80	[X86] Match bitcast of vxi1 to pmovmsk Summary: Add an early combine to match patterns such as: (i16 bitcast (v16i1 x)) -> (i16 movmsk (v16i8 sext (v16i1 x))) This combine needs to happen early enough before type-legalization scalarizes the result of the setcc. Reviewers: igorb, craig.topper, RKSimon Subscribers: delena, llvm-commits Differential Revision: https://reviews.llvm.org/D33311 llvm-svn: 304406	2017-06-01 11:27:57 +00:00
Amaury Sechet	9c5d1e966b	[DAGCombine] Refactor common addcarry pattern. Summary: This pattern is not very useful per se, but it exposes optimization for other patterns that wouldn't kick in otherwise. It's very common and worth optimizing for. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32756 llvm-svn: 304402	2017-06-01 10:48:04 +00:00
Amaury Sechet	2e43cb6d03	[DAGCombine] (add/uaddo X, Carry) -> (addcarry X, 0, Carry) Summary: This enables further transforms. Depends on D32916 Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32925 llvm-svn: 304401	2017-06-01 10:42:39 +00:00
Tim Shen	6b41141863	[ThinLTO] Migrate ThinLTOBitcodeWriter to the new PM. Summary: Also see D33429 for other ThinLTO + New PM related changes. Reviewers: davide, chandlerc, tejohnson Subscribers: mehdi_amini, Prazek, cfe-commits, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D33525 llvm-svn: 304378	2017-06-01 01:02:12 +00:00
Xinliang David Li	32c5e809be	[PartialInlining] Reduce outlining overhead by removing unneeded live-out(s) Differential Revision: http://reviews.llvm.org/D33694 llvm-svn: 304375	2017-06-01 00:12:41 +00:00
Dehao Chen	6b737ddce7	Add LiveRangeShrink pass to shrink live range within BB. Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB. Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb Reviewed By: MatzeB, andreadb Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D32563 llvm-svn: 304371	2017-05-31 23:25:25 +00:00
Reid Kleckner	fc7ba565ed	[EH] Recognize __(gxx|gcc)_personality_seh0 as the GNU EH personalities These are no-ops when there are no invokes. We don't need to emit LSDAs for them. Fixes PR33220. llvm-svn: 304367	2017-05-31 22:35:52 +00:00
Matthias Braun	605f779516	ImplicitNullChecks: Clear kill/dead flags when moving instructions around The values are marked as livein in the successor blocks so marking them as killed or dead was wrong. llvm-svn: 304366	2017-05-31 22:23:08 +00:00
Reid Kleckner	c2f1bbfe4f	[EH] Fix the LSDA that we emit for unknown EH personalities We should have a single call site entry with no landing pad. This indicates that no EH action should be taken and the unwinder should unwind to the next frame. We currently don't recognize __gxx_personality_seh0 as a known personality, so we forcibly emit a table, and that table was wrong. This was filed as PR33220. Now we emit a correct table for that personality. The next step is to recognize that we can completely skip the table for this personality. llvm-svn: 304363	2017-05-31 22:18:49 +00:00
Wei Mi	0bd3f41588	Revert rL304050. It may break sanitizer bootstrap. Revert it for now while investigating. llvm-svn: 304350	2017-05-31 21:29:33 +00:00
Teresa Johnson	a6a3fb57a1	[ThinLTO] Reduce unnecessary map lookups during combined summary write Summary: Don't assign values to undefined references, simply don't emit those reference edges as they are not useful (we were already not emitting call edges to undefined refs). Also, streamline the later lookup of value ids when writing the summaries, by combining the check for value id existence with the access of that value id. Reviewers: pcc Subscribers: Prazek, llvm-commits, inglorion Differential Revision: https://reviews.llvm.org/D33634 llvm-svn: 304323	2017-05-31 18:58:11 +00:00
Nirav Dave	3424373f30	[ScheduleDAG] Deal with already scheduled loads in ScheduleDAG. Summary: If we attempt to unfold an SUnit in ScheduleDAG that results in finding an already scheduled load, we should abort the unfold as it will not improve scheduling. This fixes PR32610. Reviewers: jmolloy, sunfish, bogner, spatel Subscribers: llvm-commits, MatzeB Differential Revision: https://reviews.llvm.org/D32911 llvm-svn: 304321	2017-05-31 18:43:17 +00:00
Matthias Braun	d6a36ae282	TargetMachine: Indicate whether machine verifier passes. This adds a callback to the LLVMTargetMachine that lets targets indicate that they do not pass the machine verifier checks in all cases yet. This is intended to be a temporary measure while the targets are fixed, allowing us to enable the machine verifier by default with EXPENSIVE_CHECKS enabled! Differential Revision: https://reviews.llvm.org/D33696 llvm-svn: 304320	2017-05-31 18:41:23 +00:00
Kostya Serebryany	53b34c8443	[sanitizer-coverage] remove stale code (old coverage); llvm part llvm-svn: 304319	2017-05-31 18:27:33 +00:00
Sean Fertile	457ddd311a	[PowerPC] Correctly specify the cache line size for Power 7, 8 and 9. Fixes PPCTTIImpl::getCacheLineSize() returning the wrong cache line size for newer ppc processors. Committing on behalf of Stefan Pintilie. Differential Revision: https://reviews.llvm.org/D33656 llvm-svn: 304317	2017-05-31 18:20:17 +00:00
Anna Thomas	777bb90bdc	Revert "[Atomics][LoopIdiom] Recognize unordered atomic memcpy" This reverts commit r304310. It caused build failures in polly and mingw due to undefined reference to llvm::RTLIB::getMEMCPY_ELEMENT_ATOMIC. llvm-svn: 304315	2017-05-31 17:20:51 +00:00
Zaara Syeda	3a7578c658	[PPC] Inline expansion of memcmp This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight line code. The target specifies a maximum load size and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare to zero equality. Differential Revision: https://reviews.llvm.org/D28637 llvm-svn: 304313	2017-05-31 17:12:38 +00:00
Mark Searles	11d0a04050	[AMDGPU] Fix bugs in new waitcnt pass. Add test. - new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it - fix handling of PERMUTE ops - fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass ) - add new test Differential Revision: https://reviews.llvm.org/D33114 llvm-svn: 304311	2017-05-31 16:44:23 +00:00
Anna Thomas	056c009f1b	[Atomics][LoopIdiom] Recognize unordered atomic memcpy Summary: Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference in recognizing an unordered atomic memcpy instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations. Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html Patch by Daniel Neilson! Reviewers: reames, anna, skatkov Reviewed By: reames Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33243 llvm-svn: 304310	2017-05-31 16:39:52 +00:00
Dmitry Preobrazhensky	793c592652	[AMDGPU][MC] New syntax for ds_swizzle_b32 offset See Bug 28601: https://bugs.llvm.org//show_bug.cgi?id=28601 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33542 llvm-svn: 304309	2017-05-31 16:26:47 +00:00
Florian Hahn	ff25b6d8f6	[AArch64] Enable FeatureFuseAES on Cortex-A53. It improves performance on Cortex-A53. llvm-svn: 304307	2017-05-31 15:50:03 +00:00
Florian Hahn	064a2f9222	[AArch64] Enable FeatureFuseAES on Cortex-A73. It improves performance on Cortex-A73. llvm-svn: 304304	2017-05-31 15:25:25 +00:00
Nirav Dave	7c70fddba6	[DAG] Avoid use of stale store. Correct references to alignment of store which may be deleted in a previous iteration of merge. Instead use first store that would be merged. Corrects pr33172's use-after-poison caught by ASan. Reviewers: spatel, hfinkel, RKSimon Reviewed By: RKSimon Subscribers: thegameg, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33686 llvm-svn: 304299	2017-05-31 13:36:17 +00:00
Tony Jiang	60c247de18	[PowerPC] Fix a performance bug for PPC::XXPERMDI. There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI Instruction, this patch recognizes them and does the selection to improve the PPC performance. Differential Revision: https://reviews.llvm.org/D33404 llvm-svn: 304298	2017-05-31 13:09:57 +00:00
Amaury Sechet	6a303a4e73	Regenerate xchg-nofold.ll expected results. NFC. llvm-svn: 304291	2017-05-31 09:44:08 +00:00
Nemanja Ivanovic	accab033c9	[PowerPC] Eliminate integer compare instructions - vol. 3 This patch builds upon https://reviews.llvm.org/rL302810 to add handling for the 64-bit SETEQ patterns. Differential Revision: https://reviews.llvm.org/D33369 llvm-svn: 304286	2017-05-31 08:04:07 +00:00
Dylan McKay	043fa4b3d6	[AVR] Fix a bug in shift operator lowering; Authored by Dr. Gergo Erdi When generating code for a shift loop, check the shift amount against the literal value 0, not R0 llvm-svn: 304284	2017-05-31 06:27:46 +00:00
Nemanja Ivanovic	e597bd8230	[PowerPC] Eliminate integer compare instructions - vol. 2 This patch builds upon https://reviews.llvm.org/rL302810 to add handling for bitwise logical operations in general purpose registers. The idea is to keep the values in GPRs as long as possible - only extracting them to a condition register bit when no further operations are to be done. Differential Revision: https://reviews.llvm.org/D31851 llvm-svn: 304282	2017-05-31 05:40:25 +00:00
George Burgess IV	0a7b989036	[CFLAA] Add missing break; note things are broken. Thanks to Galina Kistanova for finding the missing break! When trying to make a test for this, I realized our logic for handling extractvalue/insertvalue/... is somewhat broken. This makes constructing a test-case for this missing break nontrivial. llvm-svn: 304275	2017-05-31 02:35:26 +00:00
Daniel Berlin	be3e7ba45e	NewGVN: Fix PR 33185 by checking whether we need to recursively generate a phi of ops, which we don't currently support. llvm-svn: 304272	2017-05-31 01:47:32 +00:00
Daniel Berlin	9ceafe267b	Fix test that wasn't update_test_check'd llvm-svn: 304271	2017-05-31 01:47:29 +00:00
Vedant Kumar	b745804bb1	Mark a test as requiring a default triple This test assumes that llc can infer a default triple. I'm not sure why exactly, but the Verify MachineInstrs bot requires tests to be explicit about this dependency. This commit follows the lead from r248452 and adds in 'REQUIRES: default_triple' to omit-empty.ll. Bot URL: http://lab.llvm.org:8080/green/job/Verify-Machineinstrs_AArch64/7500 llvm-svn: 304269	2017-05-31 01:42:55 +00:00
Matthias Braun	05eeadbfd1	ARM: Fix cmpxchg O0 expansion This is the equivalent of r304048 for ARM: - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1] - Zero the status register at the beginning of the loop to make sure it has a defined value. - Remove kill flags of values that need to stay alive throughout the loop. [1] An upcoming commit of mine will tighten the MachineVerifier to catch these. llvm-svn: 304267	2017-05-31 01:21:35 +00:00
Tim Shen	0bd0aa8f07	[AntiDepBreaker] Revert r299124 and add a test. Summary: AntiDepBreaker intends to add all live-outs, including the implicit CSRs, in StartBlock. r299124 was done without understanding that intention. Now with the live-ins propagated correctly (D32464), we can revert this change. Reviewers: MatzeB, qcolombet Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D33697 llvm-svn: 304251	2017-05-30 22:26:52 +00:00
Tim Northover	d276d85309	MIR: update test for noVRegs removal. I think I hadn't git pulled recently enough to bring it in. llvm-svn: 304250	2017-05-30 22:02:19 +00:00
Tim Northover	fb26d9a286	MIR: remove explicit "noVRegs" property. We can infer this from the incoming MIR, so there's no reason to represent it with a special flag. llvm-svn: 304246	2017-05-30 21:28:57 +00:00
Xinliang David Li	74480adafd	[PartialInlining] Shrinkwrap allocas with live range contained in outline region. Differential Revision: http://reviews.llvm.org/D33618 llvm-svn: 304245	2017-05-30 21:22:18 +00:00
Quentin Colombet	73141d5b4b	[Localizer] Don't try to be smart for the insertion point There is no guarantee that the first use of a constant that is traversed is actually the first in the related basic block. Thus, if we use that as the insertion point we may end up with definitions that don't dominate their use. llvm-svn: 304244	2017-05-30 20:53:06 +00:00
Ben Langmuir	a8217afe16	[llvm-config] Fix cflags test looking for "warning" This will fail if you configure with e.g. -Wno-unknown-warning-option. Change it to check for 'warning:' just like we did for 'error:' in r289484. llvm-svn: 304239	2017-05-30 20:21:47 +00:00
Matthew Simpson	646475a9bc	[LV] Reapply r303763 with fix for PR33193 r303763 caused build failures in some out-of-tree tests due to an assertion in TTI. The original patch updated cost estimates for induction variable update instructions marked for scalarization. However, it didn't consider that the incoming value of an induction variable phi node could be a cast instruction. This caused queries for cast instruction costs with a mix of vector and scalar types. This patch includes a fix for cast instructions and the test case from PR33193. The fix was suggested by Jonas Paulsson <paulsson@linux.vnet.ibm.com>. Reference: https://bugs.llvm.org/show_bug.cgi?id=33193 Original Differential Revision: https://reviews.llvm.org/D33457 llvm-svn: 304235	2017-05-30 19:55:57 +00:00
Vedant Kumar	87aefe9042	Revert "This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level." This reverts commit r304209. I think this change is responsible for a tablegen failure in stage2 builds: http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/2171/ I reproduced the failure locally (without ThinLTO), reverted the commit, rebuilt the stage1 clang, rebuilt the stage2 llvm-tblgen tool, and found that the crash disappears when the commit is reverted. Here is the stack trace: FAILED: lib/Target/ARM/ARMGenRegisterBank.inc.tmp cd /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM && /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp 0 llvm-tblgen 0x0000000106fc9568 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40 1 llvm-tblgen 0x0000000106fc9be6 SignalHandler(int) + 422 2 libsystem_platform.dylib 0x00000001076a7fba _sigtramp + 26 3 libsystem_platform.dylib 0x00007fff58deb468 _sigtramp + 1366570184 4 llvm-tblgen 0x0000000106e89cc7 llvm::CodeGenRegBank::getCompositeSubRegIndex(llvm::CodeGenSubRegIndex, llvm::CodeGenSubRegIndex) + 615 5 llvm-tblgen 0x0000000106e88be6 llvm::CodeGenRegister::computeSubRegs(llvm::CodeGenRegBank&) + 2182 6 llvm-tblgen 0x0000000106e8e9f0 llvm::CodeGenRegBank::CodeGenRegBank(llvm::RecordKeeper&) + 2192 7 llvm-tblgen 0x0000000106f384a1 llvm::EmitRegisterBank(llvm::RecordKeeper&, llvm::raw_ostream&) + 65 8 llvm-tblgen 0x0000000106f72c64 (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&) + 1172 9 llvm-tblgen 0x0000000106fcb15f llvm::TableGenMain(char, bool ()(llvm::raw_ostream&, llvm::RecordKeeper&)) + 3599 10 llvm-tblgen 0x0000000106f727a6 main + 134 11 libdyld.dylib 0x000000010733c6a5 start + 1 Stack dump: 0. Program arguments: /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp /bin/sh: line 1: 41986 Segmentation fault: 11 /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp llvm-svn: 304231	2017-05-30 19:25:22 +00:00
Eric Beckmann	72fb6a87fb	Adding parsing ability for .res file. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33566 llvm-svn: 304225	2017-05-30 18:19:06 +00:00
Craig Topper	4cec434a1b	[InstCombine] Add test cases to show missed opportunities to remove compare instructions after cttz/ctlz/ctpop where some bits of the input are known. llvm-svn: 304224	2017-05-30 17:47:59 +00:00
Krzysztof Parzyszek	ef58017b35	[Hexagon] Improve code generation for 32x32-bit multiplication For multiplications of 64-bit values (giving 64-bit result), detect cases where the arguments are sign-extended 32-bit values, on a per-operand basis. This will allow a few patterns to match a wider variety of combinations in which extensions can occur. llvm-svn: 304223	2017-05-30 17:47:51 +00:00
Stanislav Mekhanoshin	56ea488d8b	[AMDGPU] Allow SDWA in instructions with immediates and SGPRs The encoding does not allow SDWA in an instruction with scalar operands, either literals or SGPRs. It is, however, possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To clean up, MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219	2017-05-30 16:49:24 +00:00
Mark Searles	00ce96f6ee	[AMDGPU] Require waitcnt before barrier for all targets; adjust tests. Differential Revision: https://reviews.llvm.org/D33576 llvm-svn: 304217	2017-05-30 16:22:43 +00:00
Andrew V. Tischenko	8b04826663	This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level. llvm-svn: 304209	2017-05-30 13:00:44 +00:00
Ulrich Weigand	3f484e68cc	[SystemZ] Add decimal floating-point instructions This adds assembler / disassembler support for the decimal floating-point instructions. Since LLVM does not yet have support for decimal float types, these cannot be used for codegen at this point. llvm-svn: 304203	2017-05-30 10:15:16 +00:00
Ulrich Weigand	f32adf6944	[SystemZ] Add hexadecimal floating-point instructions This adds assembler / disassembler support for the hexadecimal floating-point instructions. Since the Linux ABI does not use any hex float data types, these are not useful for codegen. llvm-svn: 304202	2017-05-30 10:13:23 +00:00
Ulrich Weigand	6ceea9a4d3	[SystemZ] Add missing assembler/disassembler tests A few instructions that are actually correctly supported in the assembler and disassembler did not have any tests. llvm-svn: 304200	2017-05-30 10:11:13 +00:00
Oliver Stannard	3d0f9507d5	[MC] Fix constant pools with DenseMap sentinel values The MC ConstantPool class uses a DenseMap to track generated constants, with the int64_t value of the constant as the key. This fails when values of 0x7fffffffffffffff or 0x7ffffffffffffffe are inserted into the constant pool, as these are sentinel values for DenseMap. The fix is to use std::map instead, which doesn't use sentinel values. Differential revision: https://reviews.llvm.org/D33667 llvm-svn: 304199	2017-05-30 09:37:11 +00:00
Zoran Jovanovic	375b60de74	[mips] Expansion of LI.S and LI.D Author: smaksimovic Reviewers: dsanders sdardis Introduces LI.S and LI.D pseudo instructions with floating point operands. Differential Revision: https://reviews.llvm.org/D14390 llvm-svn: 304198	2017-05-30 09:33:43 +00:00
Kristof Beyls	2af1e90eb2	Fix PR33031: correct the estimate of maximum offset for instructions spilling/filling the stack. llvm-svn: 304196	2017-05-30 06:58:41 +00:00
Joerg Sonnenberger	9375a25342	Revert r303763, results in asserts e.g. while building Ruby. llvm-svn: 304179	2017-05-29 22:52:17 +00:00
Zvi Rackover	c7bf2a1fae	[X86] Add tests for (ix bitcast (vxi1 and ...)). NFC. To be improved by D33311. llvm-svn: 304171	2017-05-29 19:00:57 +00:00
Zvi Rackover	41e01b3c98	[X86] Replace undef value in flaky test D33311 exposes the flakiness in this test. Replacing the undef placed by bugpoint, makes it more interesting and robust. llvm-svn: 304168	2017-05-29 18:27:00 +00:00
Benjamin Kramer	41b61242a4	[wasm] Fix test after r304117. llvm-svn: 304164	2017-05-29 16:32:52 +00:00
Benjamin Kramer	fd1952761e	[X86] Don't fold away the memory operand of an xchg. xchg with a mem operand has different locking semantics. If we unfold it into a xchg r,r we will lose the implicit lock. Likewise we never want to fold a register xchg into a memory one as it would be a lot slower. This triggers during LLVM selfhost. llvm-svn: 304163	2017-05-29 16:25:20 +00:00
Sanjay Patel	51152a3727	[DAGCombiner] fix load narrowing transform to exclude loads with extension The extending load possibility was missed in: https://reviews.llvm.org/rL304072 We might want to handle this case as a follow-up, but bailing out for now to avoid miscompiling. llvm-svn: 304153	2017-05-29 13:24:58 +00:00
Nikolai Bozhenov	82f0801c1b	[Nios2] Target registration Reviewers: craig.topper, hfinkel, joerg, lattner, zvi Reviewed By: craig.topper Subscribers: oren_ben_simhon, igorb, belickim, tvvikram, mgorny, llvm-commits, pavel.v.chupin, DavidKreitzer Differential Revision: https://reviews.llvm.org/D32669 Patch by AndreiGrischenko <andrei.l.grischenko@intel.com> llvm-svn: 304144	2017-05-29 09:48:30 +00:00
Diana Picus	bf4aed2c38	[ARM] GlobalISel: Support array returns These are a bit rare in practice, but they don't require anything special compared to array parameters, so support them as well. llvm-svn: 304137	2017-05-29 08:19:19 +00:00
Diana Picus	8cca8cb0ce	[ARM] GlobalISel: Support array parameters/arguments Clang coerces structs into arrays, so it's a good idea to support them. Most of the support boils down to getting the splitToValueTypes helper to actually split types. We then use G_INSERT/G_EXTRACT to deal with the parts. llvm-svn: 304132	2017-05-29 07:01:52 +00:00
Mehdi Amini	96ab48f9da	DebugInfo: Include .dwo file name when hashing multiple CUs in a single file This is really a workaround for ThinLTO in particular - since it can import partial CUs that may end up looking very similar/the same as the same partial import in another ThinLTO compile. An alternative fix would be to change the DICompileUnit metadata to include a "primary file" or the like - and when importing for ThinLTO set the primary file to the name of the DICompileUnit that is being imported into. This involves changing the schema and would reduce the excessive uniqueness in the hash that this change creates - allowing diagnosing of more duplicate CUs than will be caught with this change. But duplicate CUs can still be caught in non-ThinLTO builds & are mostly a nuisance rather than a particularly deliberate/effective tool for finding broken code. (arguably the hash could always include the dwo file and nothing in fission would break, I think..) Reapply of r304119 after adding a triple to the test and moving it to the X86 directory. llvm-svn: 304130	2017-05-29 06:32:34 +00:00
Mehdi Amini	4181205563	DebugInfo: Omit an empty CU when a subprogram was moved into its use When the only use of a CU is for a subprogram that's only emitted into the using CU (to avoid cross-CU references in DWO files), avoid creating that CU at all. Reapply of r304111 after adding a triple to the test and moving it to the X86 directory. llvm-svn: 304129	2017-05-29 06:25:30 +00:00
Tobias Grosser	8cf785f6b1	Revert "[IfConversion] Keep the CFG updated incrementally in IfConvertTriangle" The reverted change introduced assertions like: "MachineBasicBlock::succ_iterator llvm::MachineBasicBlock::removeSuccessor(succ_iterator, bool): Assertion `I != Successors.end() && "Not a current successor!"' Mikael, the original committer, wrote me that he is working on a fix, but that it likely will take some time to get this resolved. As this bug is one of the last two issues that keep the AOSP buildbot from turning green, I revert the original commit r302876. I am looking forward to seeing this recommitted after the assertion has been resolved. llvm-svn: 304128	2017-05-29 06:12:18 +00:00
Mehdi Amini	e161ced16a	Revert "DebugInfo: Omit an empty CU when a subprogram was moved into its use" This reverts commit r304111. GreenDragon is broken. llvm-svn: 304126	2017-05-29 05:17:57 +00:00
Mehdi Amini	d8056bb7d8	Revert "DebugInfo: Include .dwo file name when hashing multiple CUs in a single file" This reverts commit r304119 and r304118. GreenDragon is broken. llvm-svn: 304125	2017-05-29 05:17:54 +00:00
Zachary Turner	df1832cf86	Resubmit "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables." This was reverted due to buildbot breakages and I was not familiar enough with this code to investigate it. But while trying to get a useful backtrace for the author, it turns out the fix was very obvious. Resubmitting this patch as is, and will submit the fix in a followup so that the fix is not hidden in the larger CL. llvm-svn: 304122	2017-05-29 02:19:37 +00:00
Zachary Turner	5b199be769	Revert "[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables." This reverts commit 28cb1003507f287726f43c771024a1dc102c45fe as well as all subsequent followups. llvm-tblgen currently segfaults with this change, and it seems it has been broken on the bots all day with no fixes in preparation. See, for example: http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/ llvm-svn: 304121	2017-05-29 01:48:53 +00:00
David Blaikie	ce0c205813	DebugInfo: Include .dwo file name when hashing multiple CUs in a single file This is really a workaround for ThinLTO in particular - since it can import partial CUs that may end up looking very similar/the same as the same partial import in another ThinLTO compile. An alternative fix would be to change the DICompileUnit metadata to include a "primary file" or the like - and when importing for ThinLTO set the primary file to the name of the DICompileUnit that is being imported into. This involves changing the schema and would reduce the excessive uniqueness in the hash that this change creates - allowing diagnosing of more duplicate CUs than will be caught with this change. But duplicate CUs can still be caught in non-ThinLTO builds & are mostly a nuisance rather than a particularly deliberate/effective tool for finding broken code. (arguably the hash could always include the dwo file and nothing in fission would break, I think..) llvm-svn: 304119	2017-05-29 00:48:45 +00:00
David Blaikie	02f8a07689	Attempt to fix buildbots... llvm-svn: 304118	2017-05-29 00:24:01 +00:00
David Blaikie	f2f898a044	DebugInfo: Omit an empty CU when a subprogram was moved into its use When the only use of a CU is for a subprogram that's only emitted into the using CU (to avoid cross-CU references in DWO files), avoid creating that CU at all. llvm-svn: 304111	2017-05-28 22:51:37 +00:00
Sanjay Patel	bb9fe3b409	[x86] auto-generate better checks; NFC llvm-svn: 304090	2017-05-28 13:57:59 +00:00
Ayman Musa	d9f1fe43a8	[X86] Adding new LLVM TableGen backend that generates the X86 backend memory folding tables. X86 backend holds huge tables in order to map between the register and memory forms of each instruction. This TableGen Backend automatically generated all these tables with the appropriate flags for each entry. Differential Revision: https://reviews.llvm.org/D32684 llvm-svn: 304088	2017-05-28 12:55:36 +00:00
David Blaikie	7b91deb68d	DebugInfo: Add source code/build instructions for split-dwarf-dwp symbolizer test Addressing post-commit code review feedback from Paul Robinson on r303609. llvm-svn: 304080	2017-05-27 19:52:20 +00:00
Gor Nishanov	ffbeb22b6f	Cloning: Fix debug info cloning Summary: I believe https://reviews.llvm.org/rL302576 introduced two bugs: 1) it produces duplicate distinct variables for every: dbg.value describing the same variable. To fix the problem I switched from getDistinct() to get() in DebugLoc.cpp: auto reparentVar = [&](DILocalVariable Var) { return DILocalVariable::getDistinct( 2) It passes NewFunction plain name as a linkagename parameter to Subprogram constructor. Breaks assert in: || DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed. #9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3 # (Edit: reproducer added) Here is how https://reviews.llvm.org/rL302576 broke coroutine debug info. Coroutine body of the original function is split into several parts by cloning and removing unneeded code. All parts describe the original function and variables present in the original function. For a simple case, prior to Split, original function has these two blocks: ``` PostSpill: ; preds = %AllocaSpillBB call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !14, metadata !15), !dbg !13 store i32 %x, i32* %x.addr, align 4 ... and sw.epilog: ; preds = %sw.bb %x.addr.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4, !dbg !20 %4 = load i32, i32* %x.addr.reload.addr, align 4, !dbg !20 call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !14, metadata !15), !dbg !13 !14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11) ``` Note that in the two blocks different expressions represent the same original user variable X. Before rL302576, for every cloned function there was exactly one cloned DILocalVariable(name: "x" as in: ``` define i8* @f(i32 %x) #0 !dbg !6 { ... 
!6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, ... !14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11) define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 { ... !25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2) !28 = !DILocalVariable(name: "x", arg: 1, scope: !25, file: !7, line: 55, type: !11) ``` After rL302576, for every cloned function there were as many DILocalVariable(name: "x" as there were "call void @llvm.dbg.value" for that variable. This was causing asserts in VerifyDebugInfo and AssemblyPrinter. Example: ``` !27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, !29 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11) !39 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11) !41 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11) ``` Second problem: Prior to rL302576, all clones were described by DISubprogram referring to original function. ``` define i8* @f(i32 %x) #0 !dbg !6 { ... !6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 { ... !25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, ``` After rL302576, DISubprogram for clones is of two minds, plain name refers to the original name, linkageName refers to plain name of the clone. 
``` !27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, ``` I think the assumption in AsmPrinter is that both name and linkageName should refer to the same entity. It asserts here when they are not: ``` || DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed. #9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3 ``` After this fix, behavior (with respect to coroutines) reverts to exactly what it was before, making them debuggable again or, even more importantly, compilable with "-g". Reviewers: dblaikie, echristo, aprantl Reviewed By: dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33614 llvm-svn: 304079	2017-05-27 19:41:09 +00:00
Tobias Grosser	e3684d0b84	[SCEV] Assume parameters coming from function calls contain IVs The optimistic delinearization implemented in LLVM detects array sizes by looking for non-linear products between parameters and induction variables. In OpenCL code, such products often look like: A[get_global_id(0) * N + get_global_id(1)] Hence, the IV is hidden in the get_global_id() call and consequently delinearization would fail as no induction variable is available that helps us to identify N as array size parameter. We now use a very simple heuristic to change this. We assume that each parameter that comes directly from a function call is a hidden induction variable. As a result, we can delinearize the access above to: A[get_global_id(0)][get_global_id(1)] llvm-svn: 304073	2017-05-27 15:17:49 +00:00
Sanjay Patel	33f4a97287	[DAGCombiner] use narrow load to avoid vector extract If we have (extract_subvector(load wide vector)) with no other users, that can just be (load narrow vector). This is intentionally conservative. Follow-ups may loosen the one-use constraint to account for the extract cost or just remove the one-use check. The memop chain updating is based on code that already exists multiple times in x86 lowering, so that should be pulled into a helper function as a follow-up. Background: this is a potential improvement noticed via regressions caused by making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see comments in D33137). Differential Revision: https://reviews.llvm.org/D33578 llvm-svn: 304072	2017-05-27 14:07:03 +00:00
Matthias Braun	88c8c9847d	AArch64/PEI: Do not add reserved regs to liveins We do not track liveness for reserved registers. It is unnecessary to add them to block livein lists. llvm-svn: 304059	2017-05-27 03:38:02 +00:00
Keno Fischer	090f1959c1	[SCEVExpander] Try harder to avoid introducing inttoptr Summary: This fixes introduction of an incorrect inttoptr/ptrtoint pair in the included test case which makes use of non-integral pointers. I suspect there are more cases like this left, but this takes care of the one I was seeing at the moment. Reviewers: sanjoy Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D33129 llvm-svn: 304058	2017-05-27 03:22:55 +00:00
Matthias Braun	868bbd4022	ScheduleDAGInstrs: Fix fixupKills() Rewrite fixupKills() to use the LivePhysRegs class. Simplifies the code and fixes a bug where the CSR registers in return blocks were missed, leading to invalid kill flags. Also remove the unnecessary rule that we wouldn't set kill flags on tied operands. No tests as I have an upcoming commit improving MachineVerifier checks to catch these cases in multiple existing lit tests. llvm-svn: 304055	2017-05-27 02:50:50 +00:00
Quentin Colombet	7a43eddf28	[AArch64][GlobalISel] Add the Localizer pass for the O0 pipeline This should fix most of the issue we have right now with constants being spilled all over the place. llvm-svn: 304052	2017-05-27 01:34:07 +00:00
Quentin Colombet	bece442bd8	[GlobalISel] Add a localizer pass for target to use This reverts commit r299287 plus clean-ups. The localizer pass is a helper pass that could be run at O0 in the GISel pipeline to work around the deficiency of the fast register allocator. It basically shortens the live-ranges of the constants so that the allocator does not spill all over the place. Long term fix would be to make the greedy allocator fast. llvm-svn: 304051	2017-05-27 01:34:00 +00:00
Wei Mi	5bbb5aafc1	[GVN] Recommit the patch "Add phi-translate support in scalarpre". The recommit is to fix a bug about ExtractValue and InsertValue ops. For those ops, some varargs inside GVN::Expression are not value numbers but raw index numbers. It is wrong to do phi-translate for raw index numbers, and the fix is to stop doing that. Right now scalarpre doesn't have phi-translate support, so it will miss some simple pre opportunities. Like the following testcase, current scalarpre cannot recognize the last "a * b" is fully redundant because a and b used by the last "a * b" expr are both defined by phis. long a[100], b[100], g1, g2, g3; __attribute__((pure)) long goo(); void foo(long a, long b, long c, long d) { g1 = a * b; if (__builtin_expect(g2 > 3, 0)) { a = c; b = d; g2 = a * b; } g3 = a * b; // fully redundant. } The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available. Differential Revision: https://reviews.llvm.org/D32252 llvm-svn: 304050	2017-05-27 00:54:19 +00:00
Matthias Braun	b4f74224ff	AArch64: Fix cmpxchg O0 expansion - Rewrite livein calculation to use the computeLiveIns() helper function. This is slightly less efficient but easier to reason about and doesn't unnecessarily add pristine and reserved registers[1] - Zero the status register at the beginning of the loop to make sure it has a defined value. - Remove kill flags of values that need to stay alive throughout the loop. [1] An upcoming commit of mine will tighten the MachineVerifier to catch these. llvm-svn: 304048	2017-05-26 23:48:59 +00:00
David Blaikie	23d2f0d77a	Fix test broken by r304020 It's a workaround because the test was flakey passing to begin with, but it looks like (going off commit history) it really did want to test in the presence of debug info, so keep that behavior (by adding something to the CU so it's not dropped) & restore the flakey pass in the process. (added a FIXME in case someone else decides to look at it later) llvm-svn: 304042	2017-05-26 22:11:18 +00:00
Konstantin Zhuravlyov	b2ff8dfea0	Resubmit r303859 with test fixed. [AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Patch by Tim Corringham llvm-svn: 304031	2017-05-26 20:38:26 +00:00
Craig Topper	1da22c3244	[InstSimplify] Use m_APInt instead of m_ConstantInt in ((V + N) & C1) | (V & C2) handling in order to support splat vectors. The tests here have operands commuted to provide more coverage. I also commuted one of the instructions in the scalar tests so the 4 tests cover the 4 commuted variations Differential Revision: https://reviews.llvm.org/D33599 llvm-svn: 304021	2017-05-26 19:03:53 +00:00
David Blaikie	07963bd1d1	DebugInfo: Do not emit empty CUs Consistent with GCC and addresses a shortcoming with ThinLTO where many imported CUs may end up being empty (because the functions imported from them either ended up not being used (and were then discarded, since they're imported as available_externally) or optimized away entirely). Test cases previously testing empty CUs (either intentionally, or because they didn't need anything more complicated) had a trivial 'int' or similar basic type added to their retained types list. This is a first order approximation - a deeper implementation could do things like: 1) Be more lazy about construction of the CU - for example if two CUs containing a single identical retained type are linked together, with this change one of the two CUs will be produced but empty (since a duplicate type won't be produced). 2) Go further and invert all the CU links the same way the subprogram link is inverted - keep named CU lists of retained types, macros, etc, and have those link back to the CU. Then if they're emitted, the CU is emitted, but never otherwise - this would allow the metadata itself to be dropped earlier too, though it seems unlikely that's an important optimization as there shouldn't be many CUs relative to the number of other entities. llvm-svn: 304020	2017-05-26 18:52:56 +00:00
Peter Collingbourne	7730b24448	PMB: Run the whole-program-devirt pass during LTO at --lto-O0. The whole-program-devirt pass needs to run at -O0 because only it knows about the llvm.type.checked.load intrinsic: it needs to both lower the intrinsic itself and handle it in the summary. Differential Revision: https://reviews.llvm.org/D33571 llvm-svn: 304019	2017-05-26 18:27:13 +00:00
Dmitry Preobrazhensky	6a2431df0b	[AMDGPU][MC][GFX9] Corrected encoding of flat_scratch* for SDWA opcodes See bug 33171: https://bugs.llvm.org/show_bug.cgi?id=33171 Reviewers: Sam Kolton Differential Revision: https://reviews.llvm.org/D33553 llvm-svn: 304015	2017-05-26 18:01:29 +00:00
David Blaikie	7f2b717b52	DebugInfo: Don't include locations for debug-having code inlined into nodebug functions This produced 'strange' DWARF anyway - the CU would have no ranges (or at least not a range including the inlined code) nor any subprogram or inlined_subroutine - yet the line table would have entries for these instructions. (this actually becomes more relevant with changes coming after this, where a CU without any contents will be omitted entirely - so there would be no line table to put this on anyway) llvm-svn: 304004	2017-05-26 17:05:15 +00:00
Tom Stellard	dde28a8c92	AMDGPU/GlobalISel: Mark 32-bit float constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33212 llvm-svn: 304003	2017-05-26 16:40:03 +00:00
Matthias Braun	eec1f3672a	LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI Re-commit r303938 and r303954 with a fix for addLiveIns(): the internal addPristines() function must be called on an empty set or it may accidentally reset saved registers. - addLiveOutsNoPristines() needs to add callee saved registers that are actually saved and restored somewhere to the set (they are not pristine). - Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines(). This fixes the problem from D32156. Differential Revision: https://reviews.llvm.org/D32464 llvm-svn: 304001	2017-05-26 16:23:08 +00:00
Sam Kolton	363f47a2c7	[AMDGPU] SDWA: add disassembler support for GFX9 Summary: Added decoder methods and tests Reviewers: vpykhtin, artem.tamazov, dp Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33545 llvm-svn: 303999	2017-05-26 15:52:00 +00:00
Sanjay Patel	ec13ebf2c8	[DAGCombiner] use narrow vector ops to eliminate concat/extract (PR32790) In the best case: extract (binop (concat X1, X2), (concat Y1, Y2)), N --> binop XN, YN ...we kill all of the extract/concat and just have narrow binops remaining. If only one of the binop operands is amenable, this transform is still worthwhile because we kill some of the extract/concat. Optional bitcasting makes the code more complicated, but there doesn't seem to be a way to avoid that. The TODO about extending to more than bitwise logic is there because we really will regress several x86 tests including madd, psad, and even a plain integer-multiply-by-2 or shift-left-by-1. I don't think there's anything fundamentally wrong with this patch that would cause those regressions; those folds are just missing or brittle. If we extend to more binops, I found that this patch will fire on at least one non-x86 regression test. There's an ARM NEON test in test/CodeGen/ARM/coalesce-subregs.ll with a pattern like: t5: v2f32 = vector_shuffle<0,3> t2, t4 t6: v1i64 = bitcast t5 t8: v1i64 = BUILD_VECTOR Constant:i64<0> t9: v2i64 = concat_vectors t6, t8 t10: v4f32 = bitcast t9 t12: v4f32 = fmul t11, t10 t13: v2i64 = bitcast t12 t16: v1i64 = extract_subvector t13, Constant:i32<0> There was no functional change in the codegen from this transform from what I could see though. For the x86 test changes: 1. PR32790() is the closest call. We don't reduce the AVX1 instruction count in that case, but we improve throughput. Also, on a core like Jaguar that double-pumps 256-bit ops, there's an unseen win because two 128-bit ops have the same cost as the wider 256-bit op. SSE/AVX2/AVX512 are not affected which is expected because only AVX1 has the extract/concat ops to match the pattern. 2. do_not_use_256bit_op() is the best case. Everyone wins by avoiding the concat/extract. Related bug for IR filed as: https://bugs.llvm.org/show_bug.cgi?id=33026 3. The SSE diffs in vector-trunc-math.ll are just scheduling/RA, so nothing real AFAICT. 4. The AVX1 diffs in vector-tzcnt-256.ll are all the same pattern: we reduced the instruction count by one in each case by eliminating two insert/extract while adding one narrower logic op. https://bugs.llvm.org/show_bug.cgi?id=32790 Differential Revision: https://reviews.llvm.org/D33137 llvm-svn: 303997	2017-05-26 15:33:18 +00:00
John Brawn	9009d2905d	[ARM] Fix lowering of misaligned memcpy/memset Currently getOptimalMemOpType returns i32 for large enough sizes without checking for alignment, leading to poor code generation when misaligned accesses aren't permitted as we generate a word store then later split it up into byte stores. This means we inadvertently go over the MaxStoresPerMemcpy limit and for memset we splat the memset value into a word then immediately split it up again. Fix this by leaving it up to FindOptimalMemOpLowering to figure out which type to use, but also fix a bug there where it wasn't correctly checking if misaligned memory accesses are allowed. Differential Revision: https://reviews.llvm.org/D33442 llvm-svn: 303990	2017-05-26 13:59:12 +00:00
Amaury Sechet	ba9d8ba82a	nits in wide-integer-cmp.ll . NFC llvm-svn: 303989	2017-05-26 13:56:54 +00:00
John Brawn	57b2492b38	[ARM] Add tests for 6-M memcpy/memset code generation Differential Revision: https://reviews.llvm.org/D33495 llvm-svn: 303987	2017-05-26 13:52:36 +00:00
Andrew V. Tischenko	fdb264e263	The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands." llvm-svn: 303985	2017-05-26 13:23:34 +00:00
Max Kazantsev	41450329f7	Re-enable "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start" The patch rL303730 was reverted because test lsr-expand-quadratic.ll failed on many non-X86 configs with this patch. The reason for this is that the patch makes a correctness fix that changes optimizer's behavior for this test. Without the change, LSR was making an overconfident simplification based on a wrong SCEV. Apparently it did not need the IV analysis to do this. With the change, it chose a different way to simplify (that wasn't so confident), and this way required the IV analysis. Now, following the right execution path, LSR tries to make a transformation relying on IV Users analysis. This analysis is target-dependent due to this code: // LSR is not APInt clean, do not touch integers bigger than 64-bits. // Also avoid creating IVs of non-native types. For example, we don't want a // 64-bit IV in 32-bit code just because the loop has one 64-bit cast. uint64_t Width = SE->getTypeSizeInBits(I->getType()); if (Width > 64 || !DL.isLegalInteger(Width)) return false; To make a proper transformation in this test case, the type i32 needs to be legal for the specified data layout. When the test runs on some non-X86 configuration (e.g. pure ARM 64), opt gets confused by the specified target and does not use it, rejecting the specified data layout as well. Instead, it uses some default layout that does not treat i32 as a legal type (currently the layout that is used when it is not specified does not have legal types at all). As a result, the transformation we expect to happen does not happen for this test. This re-enabling patch does not have any source code changes compared to the original patch rL303730. The only difference is that the failing test is moved to X86 directory and now has requirement of running on x86 only to comply with the specified target triple and data layout. Differential Revision: https://reviews.llvm.org/D33543 llvm-svn: 303971	2017-05-26 06:47:04 +00:00
Wei Mi	3250ae3f7c	Revert rL303923 since it broke the sanitizer bootstrap build bot. llvm-svn: 303969	2017-05-26 05:42:50 +00:00
Matthias Braun	c93c063993	Revert "LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI" Tentatively revert this to see if it fixes the buildbot stage2 breakages. This reverts commit r303938. This reverts commit r303954. llvm-svn: 303960	2017-05-26 02:25:20 +00:00
Matthias Braun	9e6826de77	Test for r303938 llvm-svn: 303954	2017-05-26 01:29:25 +00:00
Chandler Carruth	86248d5632	[PM] Enable the new simple loop unswitch pass in the new pass manager (where it is the only realistic option). This passes the LLVM test suite for me, but I'm clearly still hammering on this. llvm-svn: 303952	2017-05-26 01:24:11 +00:00
Peter Collingbourne	f87197ad91	LTO: Do summary-based prevailing symbol resolution at --lto-O0. Prevailing symbol resolution is necessary for correctness. Without this we can end up dropping a referenced linkonce symbol from the link. Differential Revision: https://reviews.llvm.org/D33570 llvm-svn: 303939	2017-05-25 23:40:11 +00:00
Tim Shen	a2b85da879	[PPC] Fix atomics lowering in DAG lowering. I forgot to forward the chain, causing some missing instruction dependencies. The test crashes the compiler without this patch. Inspired by the test case, D33519 also tries to remove the extra sync. Differential Revision: https://reviews.llvm.org/D33573 llvm-svn: 303931	2017-05-25 22:58:35 +00:00
David Blaikie	9f8669461d	Fix test to handle running on platforms which don't enable pubnames at all Check that there are no entries in the pub sections, but that they may either be not present or present-but-empty. llvm-svn: 303927	2017-05-25 22:10:51 +00:00
Wei Mi	fd257fa7bf	[GVN] Add phi-translate support in scalarpre. Right now scalarpre doesn't have phi-translate support, so it will miss some simple pre opportunities. Like the following testcase, current scalarpre cannot recognize the last "a * b" is fully redundant because a and b used by the last "a * b" expr are both defined by phis. long a[100], b[100], g1, g2, g3; __attribute__((pure)) long goo(); void foo(long a, long b, long c, long d) { g1 = a * b; if (__builtin_expect(g2 > 3, 0)) { a = c; b = d; g2 = a * b; } g3 = a * b; // fully redundant. } The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available. Differential Revision: https://reviews.llvm.org/D32252 llvm-svn: 303923	2017-05-25 21:49:02 +00:00
Andrew Kaylor	f466001eef	Add constrained intrinsics for some libm-equivalent operations Differential revision: https://reviews.llvm.org/D32319 llvm-svn: 303922	2017-05-25 21:31:00 +00:00
Matthias Braun	1527baab0c	CodeGen: Rename DEBUG_TYPE to match passnames Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible. llvm-svn: 303921	2017-05-25 21:26:32 +00:00
Bob Haarman	55256ada25	[pdb] pad source file name buffer at the end instead of the beginning Summary: DbiStreamBuilder calculated the offset of the source file names inside the file info substream as the size of the file info substream minus the size of the file names. Since the file info substream is padded to a multiple of 4 bytes, this caused the first file name to be aligned on a 4-byte boundary. By contrast, DbiModuleList would read the file names immediately after the file name offset table, without skipping to the next 4-byte boundary. This change makes it so that the file names are written to the location where DbiModuleList expects them, and puts any necessary padding for the file info substream after the file names instead of before it. Reviewers: amccarth, rnk, zturner Reviewed By: amccarth, zturner Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33475 llvm-svn: 303917	2017-05-25 21:12:15 +00:00
Sam Clegg	1c154a6107	[WebAssembly] MC: Include unnamed data when writing wasm files Also, include global entries for all data symbols, not just external ones, since these are referenced by the relocation records. Add a test case that includes unnamed data. Differential Revision: https://reviews.llvm.org/D33079 llvm-svn: 303915	2017-05-25 21:08:07 +00:00
Nico Weber	b3d83a092a	Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots. llvm-svn: 303902	2017-05-25 19:19:29 +00:00
Manoj Gupta	d536180fdc	[AArch64]: add 'a' inline asm operand modifier. Summary: This is used in the Linux kernel, and effectively just means "print an address". This brings back r193593. Reviewed by: Renato Golin Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls Subscribers: aemerson, javed.absar, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D33558 llvm-svn: 303901	2017-05-25 19:07:57 +00:00
Adrian Prantl	f062192632	Fix SelectionDAGBuilder::getDbgValue to not expect DW_OP_deref on FI vars This fixes an oversight in r300522, which changed alloca dbg.values to no longer emit a DW_OP_deref. The array.ll testcase was regenerated from source. Fixes PR33166: https://bugs.llvm.org/show_bug.cgi?id=33166 llvm-svn: 303897	2017-05-25 18:54:10 +00:00
David Blaikie	b3cee2fb42	DebugInfo: Produce debug_{gnu_}pub{names,types} entries when explicitly requested, even in -gmlt or when empty Turns out gold doesn't use the DW_AT_GNU_pubnames to decide whether to parse the rest of the DIEs when building gdb-index. This causes gold to trip over LLVM's output when there are DW_FORM_ref_addr present. Gold does use the presence of a debug_gnu_pub{names,types} entry for the CU to skip parsing the debug_info portion, so make sure that's included even when empty (technically, when empty there couldn't be any ref_addr anyway - it only came up when gmlt didn't produce any (even non-empty) pubnames - but given what that reveals about gold's implementation, this seems like a good thing to do for consistency). llvm-svn: 303894	2017-05-25 18:50:28 +00:00
Bob Haarman	ea91fafd33	[llvm-pdbdump] [yaml2pdb] always include object file name in module info Summary: Previously, the yaml2pdb subcommand of llvm-pdbdump only included object file names in module info if a module info stream was present. This change makes it so that we include the object file name even if there is no module info stream for the module. As a result, running llvm-pdbdump pdb2yaml -dbi-module-info original.pdb > original.yaml && llvm-pdbdump yaml2pdb -pdb=new.pdb original.yaml && llvm-pdbdump pdb2yaml -dbi-module-info new.pdb > new.yaml now produces identical original.yaml and new.yaml files. Reviewers: amccarth, zturner Reviewed By: zturner Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D33463 llvm-svn: 303891	2017-05-25 18:04:17 +00:00
Daniel Berlin	e67c322260	NewGVN: Fix PR 33119, PR 33129, due to regressed undef handling Fix PR33120 and others by eliminating self-cycles a different way. llvm-svn: 303875	2017-05-25 15:44:20 +00:00
Artur Pilipenko	315eafc339	[InstCombine] Teach isAllocSiteRemovable to look through addrspacecasts Reviewed By: reames Differential Revision: https://reviews.llvm.org/D28565 llvm-svn: 303870	2017-05-25 15:14:48 +00:00
Sanjay Patel	5150612012	[InstCombine] make icmp-mul fold more efficient There's probably a lot more like this (see also comments in D33338 about responsibility), but I suspect we don't usually get a visible manifestation. Given the recent interest in improving InstCombine efficiency, another potential micro-opt that could be repeated several times in this function: morph the existing icmp pred/operands instead of creating a new instruction. llvm-svn: 303860	2017-05-25 14:13:57 +00:00
Tim Corringham	32d0d38679	[AMDGPU] add intrinsic for s_getpc Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32862 llvm-svn: 303859	2017-05-25 14:04:14 +00:00
Oren Ben Simhon	7bf27f03f2	[X86] Adding vpopcntd and vpopcntq instructions AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858	2017-05-25 13:45:23 +00:00
James Molloy	a929063233	[GVNSink] GVNSink pass This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for committing but I've uploaded it to gather some initial thoughts. This pass attempts to sink instructions into successors, reducing static instruction count and enabling if-conversion. We use a variant of global value numbering to decide what can be sunk. Consider: [ %a1 = add i32 %b, 1 ] [ %c1 = add i32 %d, 1 ] [ %a2 = xor i32 %a1, 1 ] [ %c2 = xor i32 %c1, 1 ] \ / [ %e = phi i32 %a2, %c2 ] [ add i32 %e, 4 ] GVN would number %a1 and %c1 differently because they compute different results - the VN of an instruction is a function of its opcode and the transitive closure of its operands. This is the key property for hoisting and CSE. What we want when sinking however is for a numbering that is a function of the uses of an instruction, which allows us to answer the question "if I replace %a1 with %c1, will it contribute in an equivalent way to all successive instructions?". The (new) PostValueTable class in GVN provides this mapping. This pass has shown some really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG. Differential revision: https://reviews.llvm.org/D24805 llvm-svn: 303850	2017-05-25 12:51:11 +00:00
Chandler Carruth	f4d62c480c	[PM] Teach the PGO instrumentation passes to run GlobalDCE before instrumenting code. This is important in the new pass manager. The old pass manager's inliner has a small DCE routine embedded within it. The new pass manager relies on the actual GlobalDCE pass for this. Without this patch, instrumentation profiling with the new PM results in massive code bloat in the object files because the instrumentation itself ends up preventing DCE from working to remove the code. We should probably change the instrumentation (and/or DCE) so that we can eliminate dead code even if instrumented, but we shouldn't even spend the time generating instrumentation for that code so this still seems like a good patch. Differential Revision: https://reviews.llvm.org/D33535 llvm-svn: 303845	2017-05-25 07:15:09 +00:00
Chandler Carruth	dd2e275a47	[PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch pass. The original logic only considered direct successors of the hoisted domtree nodes, but that isn't really enough. If there are other basic blocks that are completely within the subtree, their successors could just as easily be impacted by the hoisting. The more I think about it, the more I think the correct update here is to hoist every block on the dominance frontier which has an idom in the chain we hoist across. However, this is subtle enough that I'd definitely appreciate some more eyes on it. Sadly, if this is the correct algorithm, it requires computing a (highly localized) dominance frontier. I've done this in the simplest (IE, least code) way I could come up with, but that may be too naive. Suggestions welcome here, dominance update algorithms are not an area I've studied much, so I don't have strong opinions. In good news, with this patch, turning on simple unswitch passes the LLVM test suite for me with asserts enabled. Differential Revision: https://reviews.llvm.org/D32740 llvm-svn: 303843	2017-05-25 06:33:36 +00:00
George Karpenkov	a1c532784d	Fix coverage check for full post-dominator basic blocks. Coverage instrumentation which does not instrument full post-dominators and full-dominators may skip valid paths, as the reasoning for skipping blocks may become circular. This patch fixes that, by only skipping full post-dominators with multiple predecessors, as such predecessors by definition can not be full-dominators. llvm-svn: 303827	2017-05-25 01:41:46 +00:00
Gor Nishanov	0ea1863b27	[coroutines] Relocate instructions that maybe spilled after coro.begin Summary: Frontend generates store instructions after allocas, for example: ``` define i8* @f(i64 %this) "coroutine.presplit"="1" personality i32 0 { entry: %this.addr = alloca i64 store i64 %this, i64* %this.addr .. %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc) ``` Such instructions may require spilling into coro.frame, but, coro-frame address is only available after coro.begin and thus needs to be moved after coro.begin. The only instructions that should not be moved are the arguments of coro.begin and all of their operands. Reviewers: GorNishanov, majnemer Reviewed By: GorNishanov Subscribers: llvm-commits, EricWF Differential Revision: https://reviews.llvm.org/D33527 llvm-svn: 303825	2017-05-25 00:46:20 +00:00
Tony Jiang	0a429f040e	[PowerPC] Fix a performance bug for PPC::XXSLDWI. There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI instruction, this patch recognizes them and does the selection to improve the PPC performance. llvm-svn: 303822	2017-05-24 23:48:29 +00:00
Rafael Espindola	8b78185e00	Print symbols from COFF import libraries. This change allows llvm-nm to print symbols found in import libraries, in part by allowing COFFImportFiles to be cast to SymbolicFiles. Patch by Dave Lee! llvm-svn: 303821	2017-05-24 23:40:36 +00:00
Gor Nishanov	1f72d75714	[coroutines] Allow rematerialization up to 4 times. Remove incorrect assert Reviewers: majnemer Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D33524 llvm-svn: 303819	2017-05-24 23:01:02 +00:00
Sanjay Patel	07b1ba54b5	[InstCombine] use m_APInt to allow icmp-mul-mul vector fold The swapped operands in the first test is a manifestation of an inefficiency for vectors that doesn't exist for scalars because the IRBuilder checks for an all-ones mask for scalars, but not vectors. llvm-svn: 303818	2017-05-24 22:58:17 +00:00
Sanjay Patel	a8ac360a0c	[InstCombine] add tests for icmp eq (mul X, C), (mul Y, C); NFC llvm-svn: 303816	2017-05-24 22:36:14 +00:00
Sanjay Patel	3e8935bdc5	[InstCombine] move tests and use FileCheck; NFC llvm-svn: 303808	2017-05-24 21:48:25 +00:00
Teresa Johnson	cd2aa0d2e4	Fix a couple of typos in memory intrinsic optimization output (NFC) s/instrinsic/intrinsic llvm-svn: 303782	2017-05-24 17:55:25 +00:00
Zaara Syeda	932978315b	P9: D-form vector load/store. Differential Revision: https://reviews.llvm.org/D33248 llvm-svn: 303780	2017-05-24 17:50:37 +00:00
Krzysztof Parzyszek	6a0005d1b4	Move machine-cse-physreg.mir to test/CodeGen/Thumb llvm-svn: 303778	2017-05-24 17:20:47 +00:00
Craig Topper	77e07cc010	[InstSimplify] Simplify uadd/sadd/umul/smul with overflow intrinsics when the Zero or Undef is on the LHS. Summary: This code was migrated from InstCombine a few years ago. InstCombine had nearby code that would move Constants to the RHS for these, but InstSimplify doesn't have such code on this path. Reviewers: spatel, majnemer, davide Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33473 llvm-svn: 303774	2017-05-24 17:05:28 +00:00
Matthew Simpson	6349380fa4	Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor The default vector insert/extract cost is more profitable on Falkor than the reduced cost. llvm-svn: 303771	2017-05-24 16:48:39 +00:00
Matthew Simpson	d6f179cad6	[LV] Update type in cost model for scalarization For non-uniform instructions marked for scalarization, we should update `VectorTy` when computing instruction costs to reflect the scalar type. In addition to determining instruction costs, this type is also used to signal that all instructions in the loop will be scalarized. This currently affects memory instructions and non-pointer induction variables and their updates. (We also mark GEPs scalar after vectorization, but their cost is computed together with memory instructions.) For scalarized induction updates, this patch also scales the scalar cost by the vectorization factor, corresponding to each induction step. llvm-svn: 303763	2017-05-24 15:26:15 +00:00
Vadzim Dambrouski	b07351f4f8	[MSP430] Fix PR33050: Don't use ADD16ri to lower FrameIndex. Use ADDframe pseudo instruction instead. This will fix machine verifier error, and will help to fix PR32146. Differential Revision: https://reviews.llvm.org/D33452 llvm-svn: 303758	2017-05-24 15:08:30 +00:00
Sanjay Patel	6232406b34	[InstCombine] add tests to show potential missing folds; NFC As noted in https://bugs.llvm.org/show_bug.cgi?id=33138 and the comments, there are multiple ways to view this. If we choose not to solve this in InstCombine, these tests will serve as documentation of that choice. llvm-svn: 303755	2017-05-24 14:56:51 +00:00
Sanjay Patel	d12df92457	[InstCombine] add tests to document bitcast + bitwise-logic behavior; NFC The solution for PR26702 ( https://bugs.llvm.org/show_bug.cgi?id=26702 ) added a canonicalization rule, but the minimal regression tests don't demonstrate how that rule interacts with other folds. llvm-svn: 303750	2017-05-24 14:21:31 +00:00
Diana Picus	183863fc3b	Revert "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start" This reverts commit r303730 because it broke all the buildbots. llvm-svn: 303747	2017-05-24 14:16:04 +00:00
Jonas Paulsson	8624b7e1ce	[LoopVectorizer] Let target prefer scalar addressing computations. The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking shows improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744	2017-05-24 13:42:56 +00:00
Mikael Holmen	2676f8269a	MachineCSE: Respect interblock physreg liveness Summary: This is a fix for PR32538. MachineCSE first looks at MO.isDead(), but if it is not marked dead, MachineCSE still wants to do its own check to see if it is trivially dead. This check for the trivial case assumed that physical registers cannot be live out of a block. Patch by Mattias Eriksson. Reviewers: qcolombet, jbhateja Reviewed By: qcolombet, jbhateja Subscribers: jbhateja, llvm-commits Differential Revision: https://reviews.llvm.org/D33408 llvm-svn: 303731	2017-05-24 09:35:23 +00:00
Max Kazantsev	13e016bf48	[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that the loop of our base recurrence is the bottom-most in terms of domination. This assumption may be broken by an expression which is treated as invariant, and which depends on a complex Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence. Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike other SCEVs, SCEVUnknown are sometimes position-bound. For example, here: for (...) { // loop phi = {A,+,B} } X = load ... Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot exist while we are iterating in loop (this memory may not even be allocated or filled at this moment). It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop> may exist. This patch prohibits folding of SCEVUnknown (and those that use them) into the start value of an AddRecExpr, if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer expect such behavior. llvm-svn: 303730	2017-05-24 08:52:18 +00:00
Daniel Sanders	35b72229b1	Explicitly set CPU and -slow-incdec to try to fix r303678's test on llvm-clang-x86_64-expensive-checks-win. llvm-svn: 303727	2017-05-24 07:02:37 +00:00
Daniel Sanders	b40e16fbef	Revert r303720: Tweak r303678's test to try to fix llvm-clang-x86_64-expensive-checks-win. It doesn't fix that builder. llvm-svn: 303721	2017-05-24 06:44:55 +00:00
Daniel Sanders	2087483748	Tweak r303678's test to try to fix llvm-clang-x86_64-expensive-checks-win. I suspect this buildbot has slow-incdec set by default, most likely due to the default CPU having this set. This feature bit can prevent optsize from having an effect on this IR. llvm-svn: 303720	2017-05-24 06:05:14 +00:00
Davide Italiano	fd9100e056	[NewGVN] Update additionalUsers when we simplify to a value. Otherwise we don't revisit an instruction that could be simplified, and when we verify, we discover there's something that changed, i.e. what we had wasn't a maximal fixpoint. Fixes PR32836. llvm-svn: 303715	2017-05-24 02:30:24 +00:00
George Karpenkov	018472c34a	Revert "Disable coverage opt-out for strong postdominator blocks." This reverts commit 2ed06f05fc10869dd1239cff96fcdea2ee8bf4ef. Buildbots do not like this on Linux. llvm-svn: 303710	2017-05-24 00:29:12 +00:00
George Karpenkov	a793cfb441	Revert "Fixes for tests for r303698" This reverts commit 69bfaf72e7502eb08bbca88a57925fa31c6295c6. llvm-svn: 303709	2017-05-24 00:29:08 +00:00
George Karpenkov	65ab07b1f1	Fixes for tests for r303698 llvm-svn: 303701	2017-05-23 22:42:34 +00:00
Davide Italiano	4bc91190ea	[LIR] Strengthen the check for recurrence variable in popcnt/CTLZ. Fixes PR33114. Differential Revision: https://reviews.llvm.org/D33420 llvm-svn: 303700	2017-05-23 22:32:56 +00:00
George Karpenkov	9017ca290a	Disable coverage opt-out for strong postdominator blocks. Coverage instrumentation has an optimization not to instrument extra blocks, if the pass is already "accounted for" by a successor/predecessor basic block. However (https://github.com/google/sanitizers/issues/783) this reasoning may become circular, which stops valid paths from having coverage. In the worst case this can cause fuzzing to stop working entirely. This change simplifies logic to something which trivially can not have such circular reasoning, as losing valid paths does not seem like a good trade-off for a ~15% decrease in the # of instrumented basic blocks. llvm-svn: 303698	2017-05-23 21:58:54 +00:00
Vadzim Dambrouski	49dd6e68c2	[MSP430] Add subtarget features for hardware multiplier. Also add more processors to make -mcpu option behave similar to gcc. Differential Revision: https://reviews.llvm.org/D33335 llvm-svn: 303695	2017-05-23 21:49:42 +00:00
Simon Pilgrim	c910a70b21	[AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045) This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045 Differential Revision: https://reviews.llvm.org/D33451 llvm-svn: 303691	2017-05-23 21:27:15 +00:00
Francis Visoiu Mistrih	1c98701e57	AsmPrinter: mark the beginning and the end of a function in verbose mode llvm-svn: 303690	2017-05-23 21:22:16 +00:00
Changpeng Fang	1dbace195d	AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent ways of handling Alloca and should not affect each other. As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage related checking and places it after the calling convention checking. Reviewer: arsenm Differential Revision: http://reviews.llvm.org/D33139 llvm-svn: 303684	2017-05-23 20:25:41 +00:00
Stanislav Mekhanoshin	53a21292f8	[AMDGPU] Combine and (srl) into shl (bfe) Perform DAG combine: and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb Where nb is a number of trailing zeroes in mask. It replaces two instructions with two and BFE is generally a more expensive one. However this is only done if we are selecting a byte or word at an aligned boundary which results in a proper SDWA operand pattern. It is only done if SDWA is supported. TODO: improve SDWA pass to actually convert this pattern. It is not done now because we have an immediate in the instruction, which has to be moved into a VGPR. Differential Revision: https://reviews.llvm.org/D33455 llvm-svn: 303681	2017-05-23 19:54:48 +00:00
Oleg Ranevskyy	09df0020fc	[ARM] Temporarily disable globals promotion to constant pools to prevent miscompilation Summary: A temporary workaround for PR32780 - rematerialized instructions accessing the same promoted global through different constant pool entries. The patch turns off the globals promotion optimization leaving all its code in place, so that it can be easily turned on once PR32780 is fixed. Since this is a miscompilation issue causing generation of misbehaving code, and the problem is very subtle, the patch might be valuable enough to get into 4.0.1. Reviewers: efriedma, jmolloy Reviewed By: efriedma Subscribers: aemerson, javed.absar, llvm-commits, rengolin, asl, tstellar Differential Revision: https://reviews.llvm.org/D33446 llvm-svn: 303679	2017-05-23 19:38:37 +00:00
Daniel Sanders	452c8aec61	[globalisel][tablegen] Add support for (set $dst, 1) and test X86's OptForSize predicate. Summary: It's rare but a small number of patterns use IntInit's at the root of the match. On X86, one such rule is enabled by the OptForSize predicate and causes the compiler to use the smaller: %0 = MOV32r1 instead of the usual: %0 = MOV32ri 1 This patch adds support for matching IntInit's at the root and uses this as a test case for the optsize attribute that was implemented in r301750 Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls, aditya_nandakumar Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32791 llvm-svn: 303678	2017-05-23 19:33:16 +00:00
Craig Topper	ae292aaad1	[InstSimplify] Add more tests for undef inputs and multiplying by 0 for the add/sub/mul with overflow intrinsics. NFC llvm-svn: 303671	2017-05-23 18:42:58 +00:00
Craig Topper	15288da293	[InstSimplify] auto-generate test checks. NFC llvm-svn: 303664	2017-05-23 17:57:36 +00:00
Sanjay Patel	b0244b5c9a	[InstCombine] auto-generate test checks; NFC llvm-svn: 303663	2017-05-23 17:51:22 +00:00
Sanjay Patel	d3106add77	[InstCombine] allow icmp-xor folds for vectors (PR33138) This fixes the first part of: https://bugs.llvm.org/show_bug.cgi?id=33138 More work is needed for the bitcasted variant. llvm-svn: 303660	2017-05-23 17:29:58 +00:00
Craig Topper	dfd27dd107	[InstCombine] Use update_test_checks to regenerate the ctpop test. NFC llvm-svn: 303659	2017-05-23 17:20:18 +00:00
Sanjay Patel	7ad3dbe836	[InstCombine] add icmp-xor tests to show vector neglect; NFC Also, rename the tests and the file, add comments, and add more tests because there are no existing tests for some of these folds. These patterns are particularly important for crippled vector ISAs that have limited compare predicates (PR33138). llvm-svn: 303652	2017-05-23 16:53:05 +00:00
Stanislav Mekhanoshin	a96ec3f360	[AMDGPU] Convert shl (add) into add (shl) shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1) This allows folding a constant into an address in some cases as well as eliminating the second shift if the expression is used as an address and the second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641	2017-05-23 15:59:58 +00:00
Ulrich Weigand	7f02d67fce	[RuntimeDyld, PowerPC] Fix check for external symbols when detecting relocation overflow The PowerPC part of processRelocationRef currently assumes that external symbols can be identified by checking for SymType == SymbolRef::ST_Unknown. This is actually incorrect in some cases, causing relocation overflows to be mis-detected. The correct check is to test whether Value.SymbolName is null. Includes test case. Note that it is a bit tricky to replicate the exact condition that triggers the bug in a test case. The one included here seems to fail reliably (before the fix) across different operating system versions on Power, but it still makes a few assumptions (called out in the test case comments). Also add ppc64le platform name to the supported list in the lit.local.cfg files for the MCJIT and OrcMCJIT directories, since those tests were currently not run at all. Fixes PR32650. Reviewer: hfinkel Differential Revision: https://reviews.llvm.org/D33402 llvm-svn: 303637	2017-05-23 14:51:18 +00:00
Anna Thomas	c07d5544dd	[JumpThreading] Safely replace uses of condition This patch builds over https://reviews.llvm.org/rL303349 and replaces the use of the condition only if it is safe to do so. We should not blindly RAUW the condition if experimental.guard or assume is a use of that condition. This is because LVI may have used the guard/assume to identify the value of the condition, and RAUWing will fold the guard/assume and uses before the guards/assumes. Reviewers: sanjoy, reames, trentxintong, mkazantsev Reviewed by: sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33257 llvm-svn: 303633	2017-05-23 13:36:25 +00:00
Sam Kolton	f7659d71eb	[AMDGPU] SDWA: Add assembler support for GFX9 Summary: Added separate pseudo and real instructions for GFX9 SDWA instructions. Currently supported only in the assembler. Depends on D32493 Reviewers: vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D33132 llvm-svn: 303620	2017-05-23 10:08:55 +00:00
Florian Hahn	abb4218b98	[AArch64] Make instruction fusion more aggressive. Summary: This patch makes instruction fusion more aggressive by * adding artificial edges between the successors of FirstSU and SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps. * updating PostGenericScheduler::tryCandidate to keep clusters together, similar to GenericScheduler::tryCandidate. This change increases the number of AES instruction pairs generated on Cortex-A57 and Cortex-A72. This doesn't change code at all in most benchmarks or general code, but we've seen improvement on kernels using AESE/AESMC and AESD/AESIMC. Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB Reviewed By: evandro Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33230 llvm-svn: 303618	2017-05-23 09:33:34 +00:00
Igor Breger	617be6e475	[GlobalISel][X86] G_LOAD/G_STORE vec256/512 support Summary: mark G_LOAD/G_STORE vec256/512 legal for AVX/AVX512. Implement instruction selection. Reviewers: zvi, guyblank Reviewed By: zvi Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33268 llvm-svn: 303617	2017-05-23 08:23:51 +00:00
Ayal Zaks	589e1d9610	[LV] Report multiple reasons for not vectorizing under allowExtraAnalysis The default behavior of -Rpass-analysis=loop-vectorizer is to report only the first reason encountered for not vectorizing, if one is found, at which time the vectorizer aborts its handling of the loop. This patch allows multiple reasons for not vectorizing to be identified and reported, at the potential expense of additional compile-time, under allowExtraAnalysis which can currently be turned on by Clang's -fsave-optimization-record and opt's -pass-remarks-missed. Removed from LoopVectorizationLegality::canVectorize() the redundant checking and reporting if we CantComputeNumberOfIterations, as LAI::canAnalyzeLoop() also does that. This redundancy is caught by a lit test once multiple reasons are reported. Patch initially developed by Dror Barak. Differential Revision: https://reviews.llvm.org/D33396 llvm-svn: 303613	2017-05-23 07:08:02 +00:00
David Blaikie	15d85fc537	libDebugInfo: Support symbolizing using DWP files llvm-svn: 303609	2017-05-23 06:48:53 +00:00
Akira Hatanaka	e8ae3346a3	[AArch64] Fix PR33100. This commit fixes a bug introduced in r301019 where optimizeLogicalImm would replace a logical node's immediate operand that was CSE'd and was also an operand of another node. This commit fixes the bug by replacing the logical node instead of its immediate operand. rdar://problem/32295276 llvm-svn: 303607	2017-05-23 06:08:37 +00:00
Amaury Sechet	beef4d7887	Update expected result for or-branch.ll . NFC llvm-svn: 303606	2017-05-23 05:42:54 +00:00
Teresa Johnson	2db1369c1f	Support for taking the max of module flags when linking, use for PIE/PIC Summary: Add Max ModFlagBehavior, which can be used to take the max of two module flag values when merging modules. Use it for the PIE and PIC levels. This avoids an error when we try to import from a module built -fpic into a module built -fPIC, for example. For both PIE and PIC levels, this will be legal, since the code generation gets more conservative as the level is increased. Therefore we can take the max instead of somehow trying to block importing between modules compiled with different levels. Reviewers: tmsriram, pcc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33418 llvm-svn: 303590	2017-05-23 00:08:00 +00:00
Tim Northover	997f5f10c6	InstructionSimplify: don't speculate about Constants changing. When presented with an icmp/select pair, we can end up asking what would happen if we replaced one constant with another in an instruction. This is a mistake: while non-constant Values could become a constant, constants cannot change, and trying to do so can lead to completely invalid IR (a GEP referencing a nonexistent field in the original case). llvm-svn: 303580	2017-05-22 21:28:08 +00:00
Evgeniy Stepanov	b9f1b014e1	Infer relocation model from module flags in relocatable LTO link. Fix for PR33096. llvm-svn: 303578	2017-05-22 21:11:35 +00:00
Zachary Turner	d4136e945e	Implement various flavors of type merging. The previous algorithm assumed that types and ids are in a single unified stream. For inputs that come from object files, this is the case. But if the input is already a PDB, or is the result of a previous merge, then the types and ids will already have been split up, in which case we need an algorithm that can operate on independent streams of types and ids that refer across stream boundaries to each other. Differential Revision: https://reviews.llvm.org/D33417 llvm-svn: 303577	2017-05-22 21:07:43 +00:00
Adrian Prantl	fb31da1306	Don't generate line&scope debug info for meta-instructions. MachineInstructions that don't generate any code (such as IMPLICIT_DEFs) should not generate any debug info either. Fixes PR33107. https://bugs.llvm.org/show_bug.cgi?id=33107 This reapplies r303566 without any modifications. The stage2 build failures persisted even after reverting this patch, and looking back through history, it looks like these tests are flaky. llvm-svn: 303575	2017-05-22 20:47:09 +00:00
Teresa Johnson	525dcb617b	Fix update VP metadata after inlining for instrumentation PGO Summary: With instrumentation profiling, when updating the VP metadata after an inline, VP metadata on the inlined copy was inadvertently having all counts zeroed out. This was causing indirect calls from code inlined during the call step to be marked as cold in the ThinLTO summaries and not imported. The CallerBFI needs to be passed down so that the CallSiteCount can be computed from the profile summary info. With Sample PGO this was working since the count is extracted from the branch weight metadata on the call being inlined (even before we stopped looking at metadata for non-sample PGO in r302844 this largely wasn't working for instrumentation PGO since only promoted indirect calls would be getting inlined and have the metadata). Added an instrumentation PGO test and renamed the sample PGO test. Reviewers: danielcdh, eraman Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D33389 llvm-svn: 303574	2017-05-22 20:28:18 +00:00
Adrian Prantl	334a130a6f	Revert "Don't generate line&scope debug info for meta-instructions." This reverts commit r303566 while investigating a stage2 buildbot failure. llvm-svn: 303570	2017-05-22 18:50:12 +00:00
Stanislav Mekhanoshin	5fa289f0d8	[AMDGPU] Narrow lshl from 64 to 32 bit if possible Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569	2017-05-22 16:58:10 +00:00
Adrian Prantl	4c047f8931	Don't generate line&scope debug info for meta-instructions. MachineInstructions that don't generate any code (such as IMPLICIT_DEFs) should not generate any debug info either. Fixes PR33107. https://bugs.llvm.org/show_bug.cgi?id=33107 llvm-svn: 303566	2017-05-22 16:21:02 +00:00
Nirav Dave	3465861177	[X86] Remove target feature info from mul-i256.ll test. NFC. llvm-svn: 303558	2017-05-22 15:04:08 +00:00
Simon Atanasyan	e0b726f2fa	[mips] Support micromips attribute passed by front-end This patch adds handling of the `micromips` and `nomicromips` attributes passed by front-end. The patch depends on D33363. Differential revision: https://reviews.llvm.org/D33364 llvm-svn: 303545	2017-05-22 12:47:41 +00:00
Daniel Sanders	c244ff6a1e	Revert r303259 - [globalisel][tablegen] Import rules containing intrinsic_wo_chain. It's causing some buildbots to timeout whenever tablegen needs re-compilation, particularly those with -fsanitize=memory but not only them. A compile time regression was expected since it triples the amount of SelectionDAG rules we are able to import but it's currently too high. llvm-svn: 303542	2017-05-22 10:14:33 +00:00
James Molloy	6110be9759	Re-apply r302416: [ARM] Clear the constant pool cache on explicit .ltorg directives Re-applying now that PR32825 which was raised on the commit this fixed up is now known to have also been fixed by this commit. Original commit message: Multiple ldr pseudoinstructions with the same constant value will reuse the same constant pool entry. However, if the constant pool is explicitly flushed with a .ltorg directive, we should not try to reference constants in the previous pool any longer, since they may be out of range. This fixes assembling hand-written assembler source which repeatedly loads the same constant value, across a binary size larger than the pc-relative fixup range for ldr instructions (4096 bytes). Such assembler source already uses explicit .ltorg instructions to emit constant pools with regular intervals. However if we try to reuse constants emitted in earlier pools, they end up out of range. This makes the output of the testcase match what binutils gas does (prior to this patch, it would fail to assemble). Differential Revision: https://reviews.llvm.org/D32847 llvm-svn: 303540	2017-05-22 09:42:07 +00:00
James Molloy	5193c80830	Re-apply r286006: Fix 24560: assembler does not share constant pool for same constants Re-applying now that the open bug on this commit, PR32825, is known to be fixed. Original commit message: Summary: This patch returns the same label if the CP entry with the same value has been created. Reviewers: eli.friedman, rengolin, jmolloy Subscribers: majnemer, jmolloy, llvm-commits Differential Revision: https://reviews.llvm.org/D25804 llvm-svn: 303539	2017-05-22 09:42:01 +00:00
Strahinja Petrovic	ab9573f37c	[MIPS] Add support to match more patterns for DINS instruction This patch adds support for recognizing patterns to match DINS instruction. Differential Revision: https://reviews.llvm.org/D31465 llvm-svn: 303537	2017-05-22 09:06:44 +00:00
James Molloy	5cc75ae8f9	Revert "[ARM] Clear the constant pool cache on explicit .ltorg directives" This reverts commit r302416. This was a fixup for r286006, which has now been reverted so this doesn't apply (either in concept or in code). This commit itself has no problems, but the underlying issue it was fixing has now disappeared from the codebase. llvm-svn: 303536	2017-05-22 08:49:28 +00:00
James Molloy	5a9cf2e22d	Revert "Fix 24560: assembler does not share constant pool for same constants" This reverts commit r286006. It caused PR32825 and wasn't fixed. llvm-svn: 303535	2017-05-22 08:42:47 +00:00
Amaury Sechet	89d733a505	Regenerate expected result for test constant-combines.ll . NFC llvm-svn: 303533	2017-05-22 07:49:16 +00:00
David Blaikie	d2f3a941e0	libDebugInfo/DWARF: Apply relocations for debug_addr addresses in object files llvm-symbolizer would fail to symbolize addresses in unlinked object files when handling .dwo file data because the addresses would not be relocated in the same way as the ranges in the skeleton CU in the object file. Fix that so object files can be symbolized the same as executables. llvm-svn: 303532	2017-05-22 07:02:47 +00:00
Sanjoy Das	036dda25a5	[SCEV] Clarify behavior around max backedge taken count This is a re-application of a r303497 that was reverted in r303498. I thought it had broken a bot when it had not (the breakage did not go away with the revert). This change makes the split between the "exact" backedge taken count and the "maximum" backedge taken count a bit more obvious. Both of these are upper bounds on the number of times the loop header executes (since SCEV does not account for most kinds of abnormal control flow), but the latter is guaranteed to be a constant. There were a few places where the max backedge taken count was a non-constant; I've changed those to compute constants instead. At this point, I'm not sure if the constant max backedge count can be computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without losing precision. If it can, we can simplify even further by making `getMaxBackedgeTakenCount` a thin wrapper around `getBackedgeTakenCount` and `getUnsignedRange`. llvm-svn: 303531	2017-05-22 06:46:04 +00:00
Zvi Rackover	fdddf671d8	[X86] Add (ix bitcast(vsetcc)) test cases with illegal types. NFC. llvm-svn: 303530	2017-05-22 06:39:12 +00:00
Amaury Sechet	1b195c35a0	Add a test case for large integer subtraction via subcarry. NFC llvm-svn: 303528	2017-05-22 06:06:45 +00:00
Amaury Sechet	822ba7eddc	Add test case for subcarry optimization. llvm-svn: 303525	2017-05-22 02:31:42 +00:00
Daniel Berlin	d130b6c27d	NewGVN: Fix PR 33116, the memoryphi version of bug 32838. llvm-svn: 303521	2017-05-21 23:41:58 +00:00
Davide Italiano	3318970cd3	[NewGVN] Actually check the NewGVN output. Apparently I messed up squashing two consecutive commits. llvm-svn: 303516	2017-05-21 20:55:53 +00:00
Davide Italiano	f3540cff9d	[NewGVN] Add a test for non most dominating leader. Taken from PR32845. Dan removed the most dominating leader check in r303443, but we check this test anyway to make sure things don't regress. llvm-svn: 303515	2017-05-21 20:50:16 +00:00
Davide Italiano	21a49dcdf1	[InstCombine] Take into account the size in sext->lshr->trunc patterns. Otherwise we end up miscompiling, transforming: define i8 @tinky() { %sext = sext i1 1 to i16 %hibit = lshr i16 %sext, 15 %tr = trunc i16 %hibit to i8 ret i8 %tr } into: %sext = sext i1 1 to i8 ret i8 %sext and the first gets folded to ret i8 1, while the second gets folded to ret i8 -1. Eventually we should get rid of this transform entirely, but for now, this at least fixes a known correctness bug. Differential Revision: https://reviews.llvm.org/D33338 llvm-svn: 303513	2017-05-21 20:30:27 +00:00
Sanjay Patel	7d383d6a69	[InstCombine] add tests for potential (lshr(sext X), C) folds; NFC As discussed in: https://reviews.llvm.org/D33338 ...we may be able to remove a wider pattern match by doing these more basic canonicalizations. llvm-svn: 303504	2017-05-21 15:18:52 +00:00
Igor Breger	014fc566e7	[GlobalISel][X86] Fix G_TRUNC instruction selection. Updated tests with -verify-machineinstrs flag. It fixes 3 tests failed with machine verifier enabled and listed in PR27481 llvm-svn: 303502	2017-05-21 11:13:56 +00:00
Hiroshi Inoue	37e63b1b21	Summary PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass. This patch improves this optimization to eliminate more compare instructions in two types of common case. - comparison against a constant 1 or -1 The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant. This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0". With this patch, compare can be eliminated in the following code sequence, as an example. uint64_t a, b; if ((a | b) & 0x8000000000000000ull) { ... } else { ... } - andi for 32-bit comparison on PPC64 Record-form instructions execute a 64-bit signed comparison, so there are limitations in eliminating a 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check. %1 = and i32 %a, 10 %2 = icmp ne i32 %1, 0 br i1 %2, label %foo, label %bar In this simple example, LLVM generates andi. + cmplwi + beq on PPC64. This patch makes it possible to eliminate the cmplwi for this case. I added andi. for optimization targets if it is safe to do so. Differential Revision: https://reviews.llvm.org/D30081 llvm-svn: 303500	2017-05-21 06:00:05 +00:00
Sanjoy Das	8963650cfa	Revert "[SCEV] Clarify behavior around max backedge taken count" This reverts commit r303497 since it breaks the msan bootstrap bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1379/ llvm-svn: 303498	2017-05-21 05:02:12 +00:00
Sanjoy Das	5207168383	[SCEV] Clarify behavior around max backedge taken count This change makes the split between the "exact" backedge taken count and the "maximum" backedge taken count a bit more obvious. Both of these are upper bounds on the number of times the loop header executes (since SCEV does not account for most kinds of abnormal control flow), but the latter is guaranteed to be a constant. There were a few places where the max backedge taken count was a non-constant; I've changed those to compute constants instead. At this point, I'm not sure if the constant max backedge count can be computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without losing precision. If it can, we can simplify even further by making `getMaxBackedgeTakenCount` a thin wrapper around `getBackedgeTakenCount` and `getUnsignedRange`. llvm-svn: 303497	2017-05-21 01:47:50 +00:00
Xin Tong	9fbfeefadf	Revert "Add pthread_self function prototype and make it speculatable." This reverts commit 143d7445b5dfa2f6d6c45bdbe0433d9fc531be21. Build breaking llvm-svn: 303496	2017-05-21 00:37:55 +00:00
Xin Tong	75af3af957	Add pthread_self function prototype and make it speculatable. Summary: This allows pthread_self to be pulled out of a loop by LICM. Reviewers: hfinkel, arsenm, davide Reviewed By: davide Subscribers: davide, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D32782 llvm-svn: 303495	2017-05-20 22:40:25 +00:00
David Blaikie	8d039d40c5	llvm-symbolizer: Support multiple CUs in a single DWO file llvm-svn: 303482	2017-05-20 03:32:49 +00:00
Eric Beckmann	a6bdf751a2	Add functionality to cvtres to parse all entries in res file. Summary: Added the new modules in the Object/ folder. Updated the llvm-cvtres interface as well, and added additional tests. Subscribers: llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D33180 llvm-svn: 303480	2017-05-20 01:49:19 +00:00
Davide Italiano	9a0f542db6	[NewGVN] Create a StoreExpression instead of a VariableExpression. In the case where we have an operand defined by a load of the same memory location. Historically this was a VariableExpression because we wanted to make sure they ended up in the same class, but if we create the right expression, they end up in the same class anyway. Fixes PR32897. Thanks to Dan for the detailed discussion and the fix suggestion. llvm-svn: 303475	2017-05-20 00:46:54 +00:00
Adrian Prantl	981a799896	Revert "Revert "ThinLTO: Verify bitcode before launching the ThinLTOCodeGenerator."" This reapplies commit r303438 modified to not verify cross-imported bitcode in FunctionImporter. rdar://problem/31233625 Differential Revision: https://reviews.llvm.org/D33370 llvm-svn: 303470	2017-05-20 00:00:08 +00:00
Adrian Prantl	660437975b	Revert "ThinLTO: Verify bitcode before launching the ThinLTOCodeGenerator." This reverts commit r303438 while deliberating buildbot breakage. llvm-svn: 303467	2017-05-19 23:32:21 +00:00
Matthias Braun	50ec0b5dce	SimplifyLibCalls: Optimize wcslen Refactor the strlen optimization code to work for both strlen and wcslen. This especially helps with programs in the wild where people pass L"string"s to const std::wstring& function parameters and the wstring constructor gets inlined. This also fixes a lingering API problem/bug in getConstantStringInfo() where zeroinitializers would always give you an empty string (without a length) back regardless of the actual length of the initializer, which did not work well in the TrimAtNul==false case, causing the PR mentioned below. Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG memcpy lowering and may lead to some cases for out-of-bounds zeroinitializer accesses not getting optimized anymore. So some code with UB may produce out of bound memory reads now instead of just producing zeros. The refactoring "accidentally" fixes http://llvm.org/PR32124 Differential Revision: https://reviews.llvm.org/D32839 llvm-svn: 303461	2017-05-19 22:37:09 +00:00
Evgeniy Stepanov	2acea2786b	[safestack] Disable stack coloring by default. Workaround for apparent miscompilation of PR32143. llvm-svn: 303456	2017-05-19 20:58:48 +00:00
Daniel Berlin	e021d2d629	NewGVN: Fix PR32838. This is a complicated bug involving two issues: 1. What do we do with phi nodes when we prove all arguments are not live? 2. When is it safe to use value leaders to determine if we can ignore an argument? llvm-svn: 303453	2017-05-19 20:22:20 +00:00
Simon Pilgrim	c74e7f0a42	Fix line-endings. llvm-svn: 303448	2017-05-19 19:47:29 +00:00
Davide Italiano	7dc2efbce4	[InstCombine] Actually commit the test showing the miscompile. Clarify a comment while I'm here. llvm-svn: 303447	2017-05-19 19:41:11 +00:00
Zachary Turner	526f4f2aa8	Resubmit "[CodeView] Provide a common interface for type collections." This was originally reverted because it was breaking a bunch of bots and the breakage was not surfacing on Windows. After much head-scratching this was ultimately traced back to a bug in the lit test runner related to its pipe handling. Now that the bug in lit is fixed, Windows correctly reports these test failures, and as such I have finally (hopefully) fixed all of them in this patch. llvm-svn: 303446	2017-05-19 19:26:58 +00:00
Davide Italiano	d837b0f9b9	[InstCombine] Add tests to demonstrate the miscompile in PR33078. llvm-svn: 303445	2017-05-19 19:23:24 +00:00
Daniel Berlin	b527b2cf13	Last of the major pieces to NewGVN - yay! Summary: NewGVN: Handle equivalence between phi of ops and op of phis. This makes our GVN mostly-complete. It would be complete, modulo some deliberate choices we make. This means it detects roughly all Herbrand equivalences in polynomial time, including cases notoriously hard for other GVN's to detect. It also detects a very large swath of the cases we currently rely on instcombine to detect that involve folding upwards through phis. Fixes PR 31125, 31463, PR 31868 Reviewers: davide Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32151 llvm-svn: 303444	2017-05-19 19:01:27 +00:00
Amaury Sechet	77cfb4a85f	[DAGCombine] (addcarry 0, 0, X) -> (ext/trunc X) Summary: While this makes some cases better and some cases worse - so it's unclear if it is a worthy combine just by itself - this is a useful canonicalisation. As per discussion in D32756 . Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32916 llvm-svn: 303441	2017-05-19 18:20:44 +00:00
Adrian Prantl	f9ab9bfc39	ThinLTO: Verify bitcode before launching the ThinLTOCodeGenerator. rdar://problem/31233625 Differential Revision: https://reviews.llvm.org/D33151 llvm-svn: 303438	2017-05-19 17:55:02 +00:00
Matthias Braun	420713c40b	Fix typo in test llvm-svn: 303436	2017-05-19 17:25:20 +00:00
Simon Pilgrim	63892402ba	[X86][FMA] Tests showing missed fmsubadd opportunities (PR30633) llvm-svn: 303435	2017-05-19 17:19:26 +00:00
Dmitry Preobrazhensky	ce941c9c38	[AMDGPU][MC] Corrected disassembler to decode instructions with 2 literals See bug 32922: https://bugs.llvm.org//show_bug.cgi?id=32922 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32912 llvm-svn: 303428	2017-05-19 14:27:52 +00:00
Dmitry Preobrazhensky	9321e8fcec	[AMDGPU][MC] Fixed bugs in export instruction See Bugs 33019, 33056: https://bugs.llvm.org//show_bug.cgi?id=33019 https://bugs.llvm.org//show_bug.cgi?id=33056 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33288 llvm-svn: 303423	2017-05-19 13:36:09 +00:00
Guy Blank	548e22a1a7	[X86][AVX512] Make i1 illegal in the CodeGen This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421	2017-05-19 12:35:15 +00:00
Volkan Keles	6a36c64720	[GlobalISel] IRTranslator: Translate ConstantStruct Reviewers: qcolombet, ab, t.p.northover, aditya_nandakumar, dsanders Reviewed By: qcolombet Subscribers: rovka, kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D33317 llvm-svn: 303412	2017-05-19 09:47:02 +00:00
Zachary Turner	1dfcf8d92c	Revert "[CodeView] Provide a common interface for type collections." This is a squash of ~5 reverts of, well, pretty much everything I did today. Something is seriously broken with lit on Windows right now, and as a result assertions that fire in tests are triggering failures. I've been breaking non-Windows bots all day which has seriously confused me because all my tests have been passing, and after running lit with -a to view the output even on successful runs, I find out that the tool is crashing and yet lit is still reporting it as a success! At this point I don't even know where to start, so rather than leave the tree broken for who knows how long, I will get this back to green, and then once lit is fixed on Windows, hopefully fix the remaining set of problems for real. llvm-svn: 303409	2017-05-19 05:57:45 +00:00
Davide Italiano	ee49f4943c	[NewGVN] Delete the old store when we find congruent to a load. (or non-store, more in general). Fixes PR33086. Caught by the store verifier. llvm-svn: 303406	2017-05-19 04:06:10 +00:00
Zachary Turner	27ac223a85	Fix a broken test. Similar to my previous fix, it turns out llvm-pdbdump has been printing an incorrect value since the beginning of time, but we didn't know it was incorrect. Specifically, we were interpreting a TypeIndex as referencing a type from the TPI stream when it actually should come from the IPI stream. So we were printing a string that looked like a valid string, but was just from the wrong place. llvm-svn: 303403	2017-05-19 03:04:08 +00:00
Matthias Braun	d6e75ed93e	LiveIntervalAnalysis: Fix missing case in pruneSubRegValues() pruneSubRegValues() needs to remove subregister ranges starting at instructions that later get removed by eraseInstrs(). It failed to check one case in which eraseInstrs() would remove an instruction. Fixes http://llvm.org/PR32688 llvm-svn: 303396	2017-05-19 00:18:03 +00:00
Davide Italiano	eab0de2b82	[NewGVN] Break infinite recursion in singleReachablePHIPath(). We can have cycles between PHIs and this causes singleReachablePhi() to call itself indefinitely (until we run out of stack). The proper solution would be that of computing SCCs, but it's not worth it for now, so just keep a visited set and give up when we find a cycle. Thanks to Dan for the discussion/help with this. Fixes PR33014. llvm-svn: 303393	2017-05-18 23:22:44 +00:00
Zachary Turner	8fb441ab9c	[llvm-pdbdump] Add the ability to merge PDBs. Merging PDBs is a feature that will be used heavily by the linker. The functionality already exists but does not have deep test coverage because it's not easily exposed through any tools. This patch aims to address that by adding the ability to merge PDBs via llvm-pdbdump. It takes arbitrarily many PDBs and outputs a single PDB. Using this new functionality, a test is added for merging type records. Future patches will add the ability to merge symbol records, module information, etc. llvm-svn: 303389	2017-05-18 23:03:41 +00:00
Sanjay Patel	86bf0a874c	[InstCombine] add more tests for xor-of-icmps; NFC llvm-svn: 303387	2017-05-18 22:47:57 +00:00
Davide Italiano	a76e5fa111	[NewGVN] Replace predicate info leftovers. This time with an additional fix, i.e. we remove the dead @llvm.ssa.copy instruction. llvm-svn: 303385	2017-05-18 21:43:23 +00:00
Craig Topper	df01feb40e	[InstSimplify] Make m_Not work for xor -1, X Currently m_Not only works the canonical xor X, -1 form that InstCombine produces. InstSimplify can't rely on this canonicalization. Differential Revision: https://reviews.llvm.org/D33331 llvm-svn: 303379	2017-05-18 20:27:32 +00:00
Hans Wennborg	b00ffd8cb7	Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB." This also reverts follow-ups r303292 and r303298. It broke some Chromium tests under MSan, and apparently also internal tests at Google. llvm-svn: 303369	2017-05-18 18:50:05 +00:00
Craig Topper	93898495b9	[InstSimplify] Add test cases for missing fold (A & B) \| ~(A ^ B) -> ~(A ^ B). llvm-svn: 303367	2017-05-18 18:14:40 +00:00
Sanjay Patel	0a84eb8d6e	[InstCombine] move test and use better checks; NFC Previously, this test was checking for 'or i1', but that was actually matched by 'xor i1'. llvm-svn: 303364	2017-05-18 17:48:07 +00:00
Wei Mi	8848c1e3c7	[LSR] Call canonicalize after we generate a new Formula in GenerateTruncates. Fix PR33077. The testcase in PR33077 generates a LSR Use Formula with two SCEVAddRecExprs for the same loop. Such uncommon formula will become non-canonical after GenerateTruncates adds sign extension to the ScaledReg of the Formula, and it will break the assertion that every Formula to be inserted is canonical. The fix is to call canonicalize for the raw Formula generated by GenerateTruncates before inserting it. llvm-svn: 303361	2017-05-18 17:21:22 +00:00
Francis Visoiu Mistrih	8b61764cbb	[LegacyPassManager] Remove TargetMachine constructors This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 llvm-svn: 303360	2017-05-18 17:21:13 +00:00
Anna Thomas	7bca59152a	[JumpThreading] Dont RAUW condition incorrectly Summary: We have a bug when RAUWing the condition if experimental.guard or assumes is a use of that condition. This is because LazyValueInfo may have used the guards/assumes to identify the value of the condition at the end of the block. RAUW replaces the uses at the guard/assume as well as uses before the guard/assume. Both of these are incorrect. For now, disable RAUW for conditions and fix the logic as a next step: https://reviews.llvm.org/D33257 Reviewers: sanjoy, reames, trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33279 llvm-svn: 303349	2017-05-18 13:12:18 +00:00
Sam Kolton	ebfdaf7394	[AMDGPU] SDWA operands should not intersect with potential MIs Summary: There should be no intersection between SDWA operands and potential MIs. E.g.: ``` v_and_b32 v0, 0xff, v1 -> src:v1 sel:BYTE_0 v_and_b32 v2, 0xff, v0 -> src:v0 sel:BYTE_0 v_add_u32 v3, v4, v2 ``` In that example it is possible that we would fold 2nd instruction into 3rd (v_add_u32_sdwa) and then try to fold 1st instruction into 2nd (that was already destroyed). So if SDWAOperand is also a potential MI then do not apply it. Reviewers: vpykhtin, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32804 llvm-svn: 303347	2017-05-18 12:12:03 +00:00
Guy Blank	d19632fa16	[MVT] add v1i1 MVT Adds the v1i1 MVT as a preparation for another commit (https://reviews.llvm.org/D32273) Differential Revision: https://reviews.llvm.org/D32540 llvm-svn: 303346	2017-05-18 11:29:41 +00:00
Igor Breger	842b5b36ba	[GlobalISel][X86] G_ADD/G_SUB vector legalizer/selector support. Summary: G_ADD/G_SUB vector legalizer/selector support. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33232 llvm-svn: 303345	2017-05-18 11:10:56 +00:00
Simon Pilgrim	6bba6068be	[X86][AVX512] Add 512-bit vector ctpop costs + tests llvm-svn: 303342	2017-05-18 10:42:34 +00:00
Daniel Sanders	89e9308623	Re-commit: [globalisel][tablegen] Import rules containing intrinsic_wo_chain. Summary: As of this patch, 1018 out of 3938 rules are currently imported. Depends on D32275 Reviewers: qcolombet, kristof.beyls, rovka, t.p.northover, ab, aditya_nandakumar Reviewed By: qcolombet Subscribers: dberris, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32278 The previous commit failed on test-suite/Bitcode/simd_ops/AArch64_halide_runtime.bc because isImmOperandEqual() assumed MO was a register operand and that's not always true. llvm-svn: 303341	2017-05-18 10:33:36 +00:00
Zvi Rackover	d17d13d2a9	[X86] Add explicit triple to test invocation llvm-svn: 303340	2017-05-18 09:32:56 +00:00
Lama Saba	2ea271b54a	[X86] Replace slow LEA instructions in X86 According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303333	2017-05-18 08:11:50 +00:00
Serguei Katkov	00211c3faa	Fix buildbot failure after rL303327: [BPI] Reduce the probability of unreachable edge to minimal value greater than 0. One more test is updated to meet new branch probability for unreachable branches. llvm-svn: 303329	2017-05-18 07:20:52 +00:00
Zvi Rackover	c20c6d07cf	[X86] Adding tests for scalar bitcasts from vsetcc. NFC. llvm-svn: 303328	2017-05-18 07:04:48 +00:00
Serguei Katkov	ba831f78fd	[BPI] Reduce the probability of unreachable edge to minimal value greater than 0 The probability of an edge coming to an unreachable block should be as low as possible. The change reduces the probability to the minimal value greater than zero. The bug https://bugs.llvm.org/show_bug.cgi?id=32214 shows an example where the probability of an edge coming to an unreachable block is greater than for an edge going out of the loop, and it causes incorrect loop rotation. Please note that with this change the behavior of the unreachable heuristic is a bit different from the others. Specifically, before this change the sum of probabilities coming to unreachable blocks had the same weight for all branches (it was just split over all edges of this block coming to unreachable blocks). With this change it might be slightly different, but not by much, because the probability of a taken branch to an unreachable block is really small. Reviewers: chandlerc, sanjoy, vsk, congh, junbuml, davidxl, dexonsmith Reviewed By: chandlerc, dexonsmith Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D30633 llvm-svn: 303327	2017-05-18 06:11:56 +00:00
Akira Hatanaka	b10bff1183	[ThinLTO] Do not assert when adding a module with a different but compatible target triple Currently, an assertion fails in ThinLTOCodeGenerator::addModule when the target triple of the module being added doesn't match that of the one stored in TMBuilder. This patch relaxes the constraint and makes changes to allow target triples that only differ in their version numbers on Apple platforms, similarly to what r228999 did. rdar://problem/30133904 Differential Revision: https://reviews.llvm.org/D33291 llvm-svn: 303326	2017-05-18 03:52:29 +00:00
Justin Bogner	be42c4aec8	Update three tests I missed in r302979 and r302990 llvm-svn: 303319	2017-05-18 00:58:06 +00:00
Sanjay Patel	7f4687f164	[InstCombine] add test for xor-of-icmps; NFC This is another form of the problem discussed in D32143. llvm-svn: 303315	2017-05-17 23:22:52 +00:00
Quentin Colombet	a072d13e54	Revert "[globalisel][tablegen] Import rules containing intrinsic_wo_chain." This reverts commit r303259. This breaks the GISel bot: http://lab.llvm.org:8080/green/job/Compiler_Verifiers_GlobalISEL/5163/consoleFull#-134276167849ba4694-19c4-4d7e-bec5-911270d8a58c llvm-svn: 303313	2017-05-17 23:17:29 +00:00
Sanjay Patel	ba212c241a	[InstCombine] handle icmp i1 X, C early to avoid creating an unknown pattern The missing optimization for xor-of-icmps still needs to be added, but by being more efficient (not generating unnecessary logic ops with constants) we avoid the bug. See discussion in post-commit comments: https://reviews.llvm.org/D32143 llvm-svn: 303312	2017-05-17 22:29:40 +00:00
Sanjay Patel	3cd38a8d4c	[InstCombine] add test for missing icmp bool fold; NFC llvm-svn: 303310	2017-05-17 22:20:02 +00:00
Matt Arsenault	2b1f9aa577	AMDGPU: Start defining a calling convention Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely neither do sret or other on-stack return values. llvm-svn: 303308	2017-05-17 21:56:25 +00:00
Kyle Butt	f6c61ef64d	CodeGen: Power: Add lowering for shifts of v1i128. When legalizing vector operations on vNi128, they will be split to v1i128 because that is a legal type on ppc64, but then the compiler will crash in selection dag because it fails to select for these operations. This patch fixes shift operations. Logical shift right and left shift can be performed in the vector unit, but algebraic shift right requires being split. Differential Revision: https://reviews.llvm.org/D32774 llvm-svn: 303307	2017-05-17 21:54:41 +00:00
Michael Liao	ab12984634	Fix PR33028 - '-verify-machineinstrs' starts to complain about allocatable live-in physical registers on non-entry or non-landing-pad basic blocks. - Refactor the XBEGIN translation to define EAX on a dedicated fallback code path due to XABORT. Add a pseudo instruction to define EAX explicitly to avoid adding a physical register live-in. Differential Revision: https://reviews.llvm.org/D33168 llvm-svn: 303306	2017-05-17 21:48:00 +00:00
Matt Arsenault	a53292779a	AMDGPU: Remove old intrinsic uses llvm-svn: 303305	2017-05-17 21:38:21 +00:00
Simon Pilgrim	23ef26728a	[X86][AVX512] Add 512-bit vector ctlz costs + tests llvm-svn: 303300	2017-05-17 21:02:18 +00:00
Dehao Chen	00549e47bd	update the test that should have been updated in r303292. (NFC) llvm-svn: 303298	2017-05-17 20:44:08 +00:00
Matt Arsenault	98f2946ab3	AMDGPU: Make better use of op_sel with high components Handle more general swizzles. llvm-svn: 303296	2017-05-17 20:30:58 +00:00
Sanjay Patel	e2787b9a35	[InstSimplify] handle all icmp i1 X, C in one place; NFCI We already handled all of the new tests identically, but several of those went through a lot of unnecessary processing before getting folded. Another motivation for grouping these cases together is that InstCombine needs a similar fold. Currently, it handles the 'not' cases inefficiently which can lead to bugs as described in the post-commit comments of: https://reviews.llvm.org/D32143 llvm-svn: 303295	2017-05-17 20:27:55 +00:00
Simon Pilgrim	d0365967c4	[X86][AVX512] Add 512-bit vector cttz costs + tests llvm-svn: 303293	2017-05-17 20:22:54 +00:00
Dehao Chen	02828a93e8	Only enable LiveRangeShrink for x86. Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for architectures with great register pressure. Reviewers: MatzeB, qcolombet Reviewed By: qcolombet Subscribers: jholewinski, jyknight, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33294 llvm-svn: 303292	2017-05-17 20:18:13 +00:00
Matt Arsenault	786eeea23e	AMDGPU: Try to use op_sel when selecting packed instructions Avoids instructions to pack a vector when the source is really a scalar being broadcast. Also be smarter and look for per-component fneg. Doesn't yet handle scalar from upper half of register or other swizzles. llvm-svn: 303291	2017-05-17 20:00:00 +00:00
Simon Pilgrim	91b46c99be	[X86] Split ctpop/ctlz/cttz cost tests This will make things a lot easier to test all the permutations of avx512 llvm-svn: 303290	2017-05-17 19:57:20 +00:00
Matt Arsenault	ee324ffc1f	AMDGPU: Fix min3/max3 combines for f16/i16 Fix missing instruction definitions for min3/max3. llvm-svn: 303284	2017-05-17 19:25:06 +00:00
Simon Pilgrim	a9a92a1a6a	[X86][AVX512] Add 512-bit vector bitreverse costs + tests llvm-svn: 303283	2017-05-17 19:20:20 +00:00
Sanjay Patel	b2e7003103	[InstCombine] add isCanonicalPredicate() helper function and use it; NFCI There should be a slight efficiency improvement from handling icmp/fcmp with one matcher and reducing duplicated code. The larger motivation is that there are questions about how predicate canonicalization is handled, and the refactoring should make it easier if we want to change any of that behavior. 1. As noted in the code comment, we've chosen 3 of the 16 FCMP preds as not canonical. Why those 3? It goes back to rL32751 from what I can tell, but I'm not sure if there's a justification for that rule. 2. We currently do not canonicalize integer select conditions. Should we use the same rule that applies to branches for selects? 3. We currently do canonicalize some FP select conditions, and those rules would conflict with the rule shown here. Should one or both be changed? No-functional-change-intended, but adding tests anyway because there's no coverage for most of the predicates. Differential Revision: https://reviews.llvm.org/D33247 llvm-svn: 303261	2017-05-17 14:21:19 +00:00
Daniel Sanders	52c9a0c9f2	[globalisel][tablegen] Import rules containing intrinsic_wo_chain. Summary: As of this patch, 1018 out of 3938 rules are currently imported. Depends on D32275 Reviewers: qcolombet, kristof.beyls, rovka, t.p.northover, ab, aditya_nandakumar Reviewed By: qcolombet Subscribers: dberris, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32278 llvm-svn: 303259	2017-05-17 13:39:49 +00:00
Sanjay Patel	9c8f7a2eff	[x86] Update tests in psubus.ll; NFC Remove unnecessary memops to minimize tests. Patch by Yulia Koval! Differential Revision: https://reviews.llvm.org/D32643 llvm-svn: 303258	2017-05-17 13:39:16 +00:00
Krzysztof Parzyszek	2b0533126e	[PPC] Properly update register save area offsets The variables MinGPR/MinG8R were not updated properly when resetting the offsets, which in the included testcase lead to saving the CR register in the same location as R30. This fixes another issue reported in PR26519. Differential Revision: https://reviews.llvm.org/D33017 llvm-svn: 303257	2017-05-17 13:25:09 +00:00
Igor Breger	28f290fab8	[GlobalISel][X86] Support add i64 in IA32. Summary: support G_UADDE instruction selection. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D33096 llvm-svn: 303255	2017-05-17 12:48:08 +00:00
Jonas Paulsson	8722ade770	[SystemZ] Modelling of costs of divisions with a constant power of 2. Such divisions will eventually be implemented with shifts which should be reflected in the cost function. Review: Ulrich Weigand llvm-svn: 303254	2017-05-17 12:46:26 +00:00
Daniel Sanders	ed205a090d	[globalisel][tablegen] Require that all registers between instructions of a match are virtual. Summary: Without this, it's possible to encounter multiple defs for a register. This is triggered by the current version of D32868 when applied to trunk. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D32869 llvm-svn: 303253	2017-05-17 12:43:30 +00:00
Daniel Cederman	4af795b499	[Sparc] Remove execute permissions from non-executable text files Reviewers: jyknight, lero_chris, venkatra Reviewed By: jyknight Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27127 llvm-svn: 303245	2017-05-17 11:05:20 +00:00
Diana Picus	382602f176	[GlobalISel][TableGen] Fix handling of default operands When looping through a destination pattern's operands to decide how many default operands we need to introduce, we used to count the "expanded" number of operands. So if one default operand would be rendered as 2 values, we'd count it as 2 operands, when in fact it needs to count as only 1 operand regardless of how many values it expands to. This turns out to be a problem only in some very specific cases, e.g. when we have one operand with multiple default values followed by more operands with default values (see the new test). In such a situation we'd stop looping before looking at all the operands, and then error out assuming that we don't have enough default operands to make up the shortfall. At the moment this only affects ARM. The patch removes the loop counting default operands entirely and assumes that we'll have to introduce values for any default operand that we find (i.e. we're assuming it cannot be given as a child at all). It also extracts the code for adding renderers for default operands into a helper method. Differential Revision: https://reviews.llvm.org/D33031 llvm-svn: 303240	2017-05-17 08:57:28 +00:00
Pavel Labath	859d302349	[RuntimeDyld] Fix debug section relocation (pr20457) Summary: Debug info sections, (or non-SHF_ALLOC sections in general) should be linked as if their load address was zero to emulate the behavior of the static linker. This bug was discovered because it was breaking lldb expression evaluation on linux. Reviewers: lhames Subscribers: aprantl, eugene, clayborg, lldb-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D32899 llvm-svn: 303239	2017-05-17 08:47:28 +00:00
Gor Nishanov	db38485588	[coroutines] Handle spills before catchswitch If we need to spill the result of the PHI instruction, we insert the spill after all of the PHIs and EHPads, however, in a catchswitch block there is no room to insert the spill. Make room by splitting away catchswitch into a separate block. Before the fix: catch.dispatch: %val = phi i32 [ 1, %if.then ], [ 2, %if.else ] %switch = catchswitch within none [label %catch] unwind label %cleanuppad After: catch.dispatch: %val = phi i32 [ 1, %if.then ], [ 2, %if.else ] %tok = cleanuppad within none [] ; spill goes here cleanupret from %tok unwind label %catch.dispatch.switch catch.dispatch.switch: %switch = catchswitch within none [label %catch] unwind label %cleanuppad https://reviews.llvm.org/D31846 llvm-svn: 303232	2017-05-17 03:09:22 +00:00
Davide Italiano	65699e5e7d	[NewGVN] Re-enable test now that the nondeterminism has been fixed. llvm-svn: 303217	2017-05-16 22:27:06 +00:00
NAKAMURA Takumi	3c386711f7	llvm/test/Transforms/InstCombine/debuginfo-skip.ll REQUIRES +asserts. llvm-svn: 303216	2017-05-16 22:19:56 +00:00
Sanjay Patel	877364ff99	[InstSimplify] add folds for constant mask of value shifted by constant We would eventually catch these via demanded bits and computing known bits in InstCombine, but I think it's better to handle the simple cases as soon as possible as a matter of efficiency. This fold allows further simplifications based on distributed ops transforms. eg: %a = lshr i8 %x, 7 %b = or i8 %a, 2 %c = and i8 %b, 1 InstSimplify can directly fold this now: %a = lshr i8 %x, 7 Differential Revision: https://reviews.llvm.org/D33221 llvm-svn: 303213	2017-05-16 21:51:04 +00:00
Amara Emerson	c9916d7e97	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Easwaran Raman	3cd1479c3f	[Inliner] Do not mix callsite and callee hotness based updates. Update threshold based on callee's hotness only when BFI is not available. Otherwise use only callsite's hotness. This makes it easier to reason about hotness related threshold updates. Differential revision: https://reviews.llvm.org/D33157 llvm-svn: 303210	2017-05-16 21:18:09 +00:00
Tim Shen	0fbbef43e0	[PPC] Add -ppc-asm-full-reg-names to atomic-2.ll. NFC. Differential Revisions: https://reviews.llvm.org/D32763 llvm-svn: 303209	2017-05-16 20:58:55 +00:00
Matthias Braun	83a11ca664	Test for r303197 llvm-svn: 303208	2017-05-16 20:53:27 +00:00
Tim Shen	3bef27cc6f	[PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync. Summary: This fixes pr32392. The lowering pipeline is: llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in expandPostRAPseudo. The reason why expandPostRAPseudo is chosen is because previous passes are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4 (some branch pass(s)). Differential Revision: https://reviews.llvm.org/D32763 llvm-svn: 303205	2017-05-16 20:18:06 +00:00
Sanjay Patel	6b6ce6350f	[InstCombine] auto-generate better checks; NFC llvm-svn: 303203	2017-05-16 20:09:32 +00:00
Dmitry Mikulin	fce148c568	In debug builds non-trivial amount of time is spent in InstCombine processing @llvm.dbg.* calls in visitCallInst(). They can be safely ignored. llvm-svn: 303202	2017-05-16 20:08:49 +00:00
Reid Kleckner	0ad69fc89f	Revert "[X86] Replace slow LEA instructions in X86" This reverts commit r303183, it broke various buildbots and introduced sanitizer errors. llvm-svn: 303199	2017-05-16 19:55:03 +00:00
Nirav Dave	da8f221273	Elide stores which are overwritten without being observed. Summary: In SelectionDAG, when a store is immediately chained to another store to the same address, elide the first store as it has no observable effects. This causes small improvements when dealing with intrinsics lowered to stores. Test notes: * Many testcases overwrite store addresses multiple times and needed minor changes, mainly making stores volatile to prevent the optimization from optimizing the test away. * Many X86 test cases optimized out instructions associated with va_start. * Note that test_splat in CodeGen/AArch64/misched-stp.ll no longer has dependencies to check and can probably be removed and potentially replaced with another test. Reviewers: rnk, john.brawn Subscribers: aemerson, rengolin, qcolombet, jyknight, nemanjai, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33206 llvm-svn: 303198	2017-05-16 19:43:56 +00:00
Renato Golin	d69570e017	Revert "[ARM] Mark LEApcrel instructions as isAsCheapAsAMove" Revert "[ARM] Mark LEApcrel as not having side effects" This reverts commit r303054 and r303053, as they broke the ARM self-hosting buildbots: http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/1550 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost-neon/builds/1349 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/1845 Offline investigation on course. llvm-svn: 303193	2017-05-16 17:59:07 +00:00
Sanjay Patel	f5eeb35dce	[InstCombine] add motivational comment for tests; NFC The referenced tests are derived from: https://bugs.llvm.org/show_bug.cgi?id=32791 and: https://reviews.llvm.org/D33172 The motivation for including negative tests may not be clear, so I'm adding an explanatory comment here. In the post-commit thread for r303133: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170515/453793.html ...it was mentioned that we don't want to add redundant tests. This is a valid point. But in this case, we have a patch under review (D33172) that demonstrates that no existing regression tests are affected by a proposed code change, but these are. Therefore, I think these tests have value not visible in any existing regression tests regardless of whether they show a transform. Differential Revision: https://reviews.llvm.org/D33242 llvm-svn: 303185	2017-05-16 16:30:46 +00:00
Lama Saba	52e892577d	[X86] Replace slow LEA instructions in X86 According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303183	2017-05-16 16:01:36 +00:00
Matthew Simpson	af60af1ed5	Revert 303174, 303176, and 303178 These commits are breaking the bots. Reverting to investigate. llvm-svn: 303182	2017-05-16 15:50:30 +00:00
Matthew Simpson	62a7fab6b9	Make test target-specific llvm-svn: 303178	2017-05-16 15:33:22 +00:00
Matthew Simpson	c3c92cf2c7	Fix test case to unbreak bots llvm-svn: 303176	2017-05-16 15:20:27 +00:00
Matthew Simpson	b7b5d55c38	[LV] Avoid potential division by zero when selecting IC llvm-svn: 303174	2017-05-16 14:43:55 +00:00
Gor Nishanov	23453c11ff	[coroutines] Handle unwind edge splitting Summary: RewritePHIs algorithm used in building of CoroFrame inserts a placeholder ``` %placeholder = phi [%val] ``` on every edge leading to a block starting with PHI node with multiple incoming edges, so that if one of the incoming values was spilled and needs to be reloaded, we have a place to insert a reload. We use SplitEdge helper function to split the incoming edge. SplitEdge function does not deal with unwind edges coming into a block with an EHPad. This patch adds an ehAwareSplitEdge function that can correctly split the unwind edge. For landing pads, we clone the landing pad into every edge block and replace the original landing pad with a PHI collecting the values from all incoming landing pads. For WinEH pads, we keep the original EHPad in place and insert cleanuppad/cleanupret in the edge blocks. Reviewers: majnemer, rnk Reviewed By: majnemer Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D31845 llvm-svn: 303172	2017-05-16 14:11:39 +00:00
Igor Breger	3a45504498	[GlobalISel][X86] Split memop test file. NFC llvm-svn: 303169	2017-05-16 13:37:31 +00:00
Max Kazantsev	b09b5db793	[SCEV] Fix sorting order for AddRecExprs The existing sorting order defined in CompareSCEVComplexity sorts AddRecExprs by loop depth, but does not pay attention to dominance of loops. This can lead us to the following buggy situation: for (...) { // loop1 op1 = {A,+,B} } for (...) { // loop2 op2 = {A,+,B} S = add op1, op2 } In this case there is no guarantee that in operand list of S the op2 comes before op1 (loop depth is the same, so they will be sorted just lexicographically), so we can incorrectly treat S as a recurrence of loop1, which is wrong. This patch changes the sorting logic so that it places the dominated recs before the dominating recs. This ensures that when we pick the first recurrence in the operands order, it will be the bottom-most in terms of the dominator tree. The attached test set includes some tests that produce incorrect SCEV estimations and crashes with the old logic. Reviewers: sanjoy, reames, apilipenko, anna Reviewed By: sanjoy Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33121 llvm-svn: 303148	2017-05-16 07:27:06 +00:00
Davide Italiano	a641842845	Revert "[NewGVN] Replace predicate info leftovers." It's breaking the bots. llvm-svn: 303142	2017-05-16 05:51:21 +00:00
Davide Italiano	331058fcc4	[NewGVN] Replace predicate info leftovers. Fixes PR32945. Differential Revision: https://reviews.llvm.org/D33226 llvm-svn: 303141	2017-05-16 05:23:23 +00:00
Sanjay Patel	515d1a6804	[InstCombine] add tests for PR32791; NFC llvm-svn: 303133	2017-05-15 23:59:28 +00:00
Francis Visoiu Mistrih	ebbc7159e9	[ShrinkWrapping] Handle restores on no-return paths Shrink-wrapping uses post-dominators to find a restore point that post-dominates all the uses of CSR / stack. The way dominator trees are modeled in LLVM today is that unreachable blocks are not present in a generic dominator tree, so, an unreachable node is dominated by anything: include/llvm/Support/GenericDomTree.h:467. Since for post-dominators, a no-return block is considered "unreachable", calling findNearestCommonDominator on an unreachable node A and a non-unreachable node B, will return B, which can be false. If we find such node, we bail out since there is no good restore point available. rdar://problem/30186931 llvm-svn: 303130	2017-05-15 23:13:35 +00:00
Sanjay Patel	9edfbc4409	[InstSimplify] add tests for unnecessary mask of shifted values; NFC llvm-svn: 303127	2017-05-15 22:54:37 +00:00
Justin Bogner	2847c99909	Add "REQUIRES:" to the last few tests that use target specific intrinsics llvm-svn: 303123	2017-05-15 22:15:22 +00:00
Tim Northover	203c6f055d	AArch64: use linker-private symbols for globals in MachO. We don't use section-relative relocations on AArch64, so all symbols must be at least visible to the linker (i.e. properly global or l_whatever, but not L_whatever). llvm-svn: 303118	2017-05-15 21:51:38 +00:00
David Blaikie	441cfee780	PR32288: Describe a bool parameter's DWARF location with a simple register There's no need (& a bit incorrect) to mask off the high bits of the register reference when describing a simple bool value. Reviewers: aprantl Differential Revision: https://reviews.llvm.org/D31062 llvm-svn: 303117	2017-05-15 21:34:01 +00:00
Adam Nemet	e29686e5c1	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Hans Wennborg	bd6e9e77a7	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Tim Northover	8b96c7e9b5	AArch64: diagnose unrecognized features in .cpu directive. We were silently ignoring any features we couldn't match up, which led to errors in an inline asm block missing the conventional "\n\t". llvm-svn: 303108	2017-05-15 19:42:15 +00:00
Sanjay Patel	878715f978	[InstCombine] restrict icmp fold with 2 sdiv exact operands (PR32949) This is the InstCombine counterpart to D32954. I added some comments about the code duplication in: rL302436 Alive-based verification: http://rise4fun.com/Alive/dPw This is a 2nd fix for the problem reported in: https://bugs.llvm.org/show_bug.cgi?id=32949 Differential Revision: https://reviews.llvm.org/D32970 llvm-svn: 303105	2017-05-15 19:27:53 +00:00
Sanjay Patel	a23b141cd2	[InstSimplify] restrict icmp fold with 2 sdiv exact operands (PR32949) These folds were introduced with https://reviews.llvm.org/rL127064 as part of solving: https://bugs.llvm.org/show_bug.cgi?id=9343 As shown here: http://rise4fun.com/Alive/C8 ...however, the sdiv exact case needs a stronger predicate. I opted for duplicated code instead of adding another fallthrough because I think that's easier to read (and edit in case we need/want to restrict/loosen the predicates any more). This should fix: https://bugs.llvm.org/show_bug.cgi?id=32949 https://bugs.llvm.org/show_bug.cgi?id=32948 Differential Revision: https://reviews.llvm.org/D32954 llvm-svn: 303104	2017-05-15 19:16:49 +00:00
Evgeny Stupachenko	2fecd38ab8	The patch adds CTLZ idiom recognition. Summary: The following loops should be recognized: i = 0; while (n) { n = n >> 1; i++; body(); } use(i); And replaced with builtin_ctlz(n) if body() is empty or for CPUs that have CTLZ instruction converted to countable: for (j = 0; j < builtin_ctlz(n); j++) { n = n >> 1; i++; body(); } use(builtin_ctlz(n)); Reviewers: rengolin, joerg Differential Revision: http://reviews.llvm.org/D32605 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 303102	2017-05-15 19:08:56 +00:00
Davide Italiano	6e7a212748	[NewGVN] Fix verification of MemoryPhis in verifyMemoryCongruency(). verifyMemoryCongruency() filters out trivially dead MemoryDef(s), as we find them immediately dead, before moving from TOP to a new congruence class. This fixes the same problem for PHI(s) skipping MemoryPhis if all the operands are dead. Differential Revision: https://reviews.llvm.org/D33044 llvm-svn: 303100	2017-05-15 18:50:53 +00:00
Teresa Johnson	41db92f9ae	Add support for handling ifuncs to GlobalValue::getBaseObject Summary: All GlobalIndirectSymbol types (not just GlobalAlias) should return their base object. Without this patch LTO would warn "Unable to determine comdat of alias!" for an ifunc. Reviewers: pcc Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D33202 llvm-svn: 303096	2017-05-15 18:28:29 +00:00
Kyle Butt	7d531daece	CodeGen: BlockPlacement: Increase tail duplication size for O3. At O3 we are more willing to increase size if we believe it will improve performance. The current threshold for tail-duplication of 2 instructions is conservative, and can be relaxed at O3. Benchmark results: llvm test-suite: 6% improvement in aha, due to duplication of loop latch 3% improvement in hexxagon 2% slowdown in lpbench. Seems related, but couldn't completely diagnose. Internal google benchmark: Produces 4% improvement on internal google protocol buffer serialization benchmarks. Differential-Revision: https://reviews.llvm.org/D32324 llvm-svn: 303084	2017-05-15 17:30:47 +00:00
Simon Pilgrim	55ff57861a	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146) Follow up to D33147 NVPTXTargetLowering::LowerCall was trusting the default argument values. Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33189 llvm-svn: 303082	2017-05-15 17:17:44 +00:00
Rafael Espindola	04bf953de4	Add an extra test for archive symbol tables. The table should include only defined symbols. llvm-svn: 303075	2017-05-15 15:56:23 +00:00
Simon Pilgrim	7d2f06ae22	[SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 add/sub/mul llvm-svn: 303074	2017-05-15 15:48:15 +00:00
Florian Hahn	af91e7e6d2	[AArch64] Enable FeatureFuseAES on Cortex-A72. This patch enables fusing dependent AESE/AESMC and AESD/AESIMC instruction pairs on Cortex-A72, as recommended in the Software Optimization Guide, section 4.10. llvm-svn: 303073	2017-05-15 15:15:22 +00:00
Dmitry Preobrazhensky	167f8b69e3	[AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64 See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D33123 llvm-svn: 303070	2017-05-15 14:28:23 +00:00
Simon Pilgrim	a12d65972a	[SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 shifts llvm-svn: 303069	2017-05-15 14:27:11 +00:00
Dinar Temirbulatov	aa2b7a6faa	Test commit. llvm-svn: 303059	2017-05-15 13:14:04 +00:00
Dmitry Preobrazhensky	03852a9dca	[AMDGPU][MC] Removed V_MQSAD_U16_U8 This instruction does not really exist See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018 Reviewers: vpykhtin, artem.tamazov Differential Revision: https://reviews.llvm.org/D33126 llvm-svn: 303055	2017-05-15 12:37:03 +00:00
John Brawn	9486becf09	[ARM] Mark LEApcrel instructions as isAsCheapAsAMove Doing this means that if an LEApcrel is used in two places we will rematerialize instead of generating two MOVs. This is particularly useful for printfs using the same format string, where we want to generate an address into a register that's going to get corrupted by the call. Differential Revision: https://reviews.llvm.org/D32858 llvm-svn: 303054	2017-05-15 11:57:54 +00:00
John Brawn	43132c46a6	[ARM] Mark LEApcrel as not having side effects Doing this lets us hoist it out of loops, and I've also marked it as rematerializable the same as the thumb1 and thumb2 counterparts. It looks like it being marked as such was just a mistake, as the commit that made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the LEApcrelJT instructions were marked as having side-effects, so it looks like the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was accidentally marked as such also. Differential Revision: https://reviews.llvm.org/D32857 llvm-svn: 303053	2017-05-15 11:50:21 +00:00
Ayman Musa	c5490e5a29	[X86] Relocate code of replacement of subtarget unsupported masked memory intrinsics to run also on -O0 option. Currently, when masked load, store, gather or scatter intrinsics are used, we check in CodeGenPrepare pass if the subtarget supports these intrinsics, if not we replace them with scalar code - this is a functional transformation not an optimization (not optional). CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0). Functional transformation should run with all optimization levels, so here I created a new pass which runs on all optimization levels and does no more than this transformation. Differential Revision: https://reviews.llvm.org/D32487 llvm-svn: 303050	2017-05-15 11:30:54 +00:00
Sam Kolton	1a5a5e6a2a	[TableGen] Add EncoderMethod to RegisterOperand Reviewers: stoklund, grosbach, vpykhtin Differential Revision: https://reviews.llvm.org/D32493 llvm-svn: 303044	2017-05-15 10:13:07 +00:00
Arnaud A. de Grandmaison	6d2417924c	MCObjectStreamer : fail with a diagnostic when emitting an out of range value. We were previously silently emitting bogus data in release mode, making it very hard to diagnose the error, or crashing with an assert in debug mode. A proper diagnostic is now always emitted when the value to be emitted is out of range. llvm-svn: 303041	2017-05-15 08:43:27 +00:00
Igor Breger	06c61e8639	[GlobalISel][X86] G_BR instruction select test llvm-svn: 303036	2017-05-15 07:03:38 +00:00
Daniel Jasper	61fa0dcac3	Add '#' to test regex that I forgot in r303025. llvm-svn: 303034	2017-05-15 04:58:27 +00:00
Daniel Jasper	54392a20a2	Fix two tests that weren't correctly copied. One didn't correctly define the regex variable, the other still had a RUN line for FNOBUILTIN-checks, which weren't copied to the file. llvm-svn: 303025	2017-05-14 22:07:50 +00:00
Simon Pilgrim	d0ef9d8e93	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts llvm-svn: 303023	2017-05-14 20:52:11 +00:00
Simon Pilgrim	f96b4ab92d	[X86][AVX2] Fix costs for v4i64 ashr by splat llvm-svn: 303022	2017-05-14 20:25:42 +00:00
Simon Pilgrim	de4467b182	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat llvm-svn: 303021	2017-05-14 20:02:34 +00:00
Craig Topper	c27dc9f797	[X86] Add avx512vl command lines to the 128/256-bit vector-lzcnt tests so we can see what compare instructions are being used in the lookup table code. I noticed the 512-bit lzcnts don't use the X86 specific lookup table code and instead use the EXPAND case in LegalizeDAG. I was toying around with fixing this and noticed it would require compare instructions that generate i1 masks and then converting from mask to vector. Then I noticed that we don't test which compares are used with avx512vl and no avx512cd. llvm-svn: 303020	2017-05-14 19:38:11 +00:00
Craig Topper	87804dfe76	[X86] Cleanup some of the check-prefixes in the vector-lzcnt tests. Remove an unneeded prefix from the 32-bit command line. Make all the 64-bit triples match. Replace ALL with X64 and remove it from the 32-bit test. llvm-svn: 303019	2017-05-14 19:38:09 +00:00
Simon Pilgrim	d3f0d03cc5	[X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences llvm-svn: 303017	2017-05-14 18:52:15 +00:00
Shoaib Meenai	ee97c5f012	[COFF] Gracefully handle empty .drectve sections Running `llvm-readobj -coff-directives msvcrt.lib` resulted in this error: Invalid data was encountered while parsing the file This happened because some of the object files in the archive have empty `.drectve` sections. These empty sections result in a `parse_failed` error being returned from `COFFObjectFile::getSectionContents()`, which in turn caused `llvm-readobj` to stop. With this change, `getSectionContents` now returns success, and like before the resulting array is empty. Patch by Dave Lee. Differential Revision: https://reviews.llvm.org/D32652 llvm-svn: 303014	2017-05-14 18:34:56 +00:00
Simon Pilgrim	5bef9c627e	[X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask. Tweak cost model to match what lowering actually does. llvm-svn: 303013	2017-05-14 17:59:46 +00:00
Simon Pilgrim	aa8dffb69b	[X86][SSE] Account for cost of extract/insert of v32i8 vector shifts llvm-svn: 303012	2017-05-14 17:36:07 +00:00
Simon Pilgrim	4599eaa09a	[X86][XOP] Account for cost of extract/insert of 256-bit vector shifts llvm-svn: 303010	2017-05-14 13:38:53 +00:00
Simon Pilgrim	f3ee9c6997	[X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts. llvm-svn: 303009	2017-05-14 11:46:26 +00:00
Simon Pilgrim	f3e87ac5f0	[X86][AVX] Add additional 32-bit target vector shift tests Shows issue with 32-bits not being able to peek through subvectors to extract constant splats llvm-svn: 303008	2017-05-14 11:13:03 +00:00
Craig Topper	479daaf74c	[InstSimplify] Add patterns for folding (A & B) | (~A ^ B) -> (~A ^ B) and its commuted variants. We already had (A & ~B) | (A ^ B), but we missed the cases where the not was part of the xor. llvm-svn: 303004	2017-05-14 07:54:43 +00:00
Craig Topper	982cc3b1d5	foo llvm-svn: 303003	2017-05-14 07:54:40 +00:00
Xinliang David Li	90a9ef6ced	Re-enable test that was disabled due to cost analysis llvm-svn: 303000	2017-05-14 02:58:39 +00:00
Zachary Turner	0683be2ebc	[llvm-pdbdump] Add the option to sort functions and data. llvm-svn: 302998	2017-05-14 01:13:40 +00:00
Simon Pilgrim	754c1618ec	[SelectionDAG] Added support for EXTRACT_SUBVECTOR/CONCAT_VECTORS demandedelts in ComputeNumSignBits llvm-svn: 302997	2017-05-13 22:10:58 +00:00
Simon Pilgrim	78b0ce03e9	[X86][SSE] Test showing missing EXTRACT_SUBVECTOR/CONCAT_VECTORS demandedelts support in ComputeNumSignBits llvm-svn: 302994	2017-05-13 21:50:18 +00:00
Simon Pilgrim	7666afd042	[SelectionDAG] Add VECTOR_SHUFFLE support to ComputeNumSignBits llvm-svn: 302993	2017-05-13 19:57:10 +00:00
Simon Pilgrim	ded23a7fb1	[X86][SSE] Test showing inability of ComputeNumSignBits to resolve shuffles llvm-svn: 302992	2017-05-13 17:41:07 +00:00
Justin Bogner	3a3e115e81	MSan: Mark MemorySanitizer tests that use x86 intrinsics as REQUIRES: x86 Tests that use target intrinsics are inherently target specific. Mark them as such. llvm-svn: 302990	2017-05-13 16:24:38 +00:00
Simon Pilgrim	ef46c2762a	[x86, SSE] AVX1 PR28129 (256-bit all-ones rematerialization) Further perf tests on Jaguar indicate that: vxorps %ymm0, %ymm0, %ymm0 vcmpps $15, %ymm0, %ymm0, %ymm0 is consistently faster (by about 9%) than: vpcmpeqd %xmm0, %xmm0, %xmm0 vinsertf128 $1, %xmm0, %ymm0, %ymm0 Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well. Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D32416 llvm-svn: 302989	2017-05-13 13:42:35 +00:00
Simon Pilgrim	7d62e4b455	[LoopOptimizer][Fix]PR32859, PR24738 The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form. Here it is: before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ] and after this change we have: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ] Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D33055 llvm-svn: 302988	2017-05-13 13:25:57 +00:00
Craig Topper	935f7b050f	[InstCombine] Prevent InstCombine from triggering an extra iteration if something changed in the initial Worklist creation Summary: If the Worklist build causes an IR change this change flag currently factors into the flag for running another iteration of the iteration loop. But only changes during processing should trigger another loop. This patch captures the worklist creation change flag into the outside the loop flag currently used for DbgDeclares and only sends that flag up to the caller. Rerunning the loop only depends on IC.run() now. This uses the debug output of InstCombine to determine if one or two iterations run. I couldn't think of a better way to detect it since the second spurious iteration shouldn't make any visible changes. Just wasted computation. I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this. This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that. Reviewers: davide, majnemer, spatel, chandlerc Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32453 llvm-svn: 302982	2017-05-13 06:56:04 +00:00
Justin Bogner	d2a10ad761	ConstProp: Split x86 SSE intrinsic tests out of calls.ll This allows us to mark this as `REQUIRES: x86`, since it uses x86 target specific intrinsics. llvm-svn: 302980	2017-05-13 05:52:17 +00:00
Justin Bogner	3c6fbad388	InstCombine: Move tests that use target intrinsics into subdirectories Tests with target intrinsics are inherently target specific, so it doesn't actually make sense to run them if we've excluded their target. llvm-svn: 302979	2017-05-13 05:39:46 +00:00
NAKAMURA Takumi	5057086a35	Disable llvm/test/Transforms/NewGVN/pr32934.ll while Davide is investigating. llvm-svn: 302977	2017-05-13 03:05:38 +00:00
Davide Italiano	d580dcd4da	[NewGVN] XFAIL a flaky test until I find out what's going on. I bet the change is correct but this test seems to expose some underlying problem that manifests only on some buildbots, and I'm not able to reproduce locally. Unfortunately I can't debug right now but I don't want to annoy people with spurious failures, so I'll XFAIL until I can take a look (over the weekend). llvm-svn: 302976	2017-05-13 02:45:47 +00:00
Dylan McKay	0c4debc123	[AVR] When lowering Select8/Select16, put newly generated MBBs in the same spot Contributed by Dr. Gergő Érdi. Fixes a bug. Raised from (https://github.com/avr-rust/rust/issues/49). llvm-svn: 302973	2017-05-13 00:22:34 +00:00
Justin Bogner	b713266331	AA: Use generic intrinsics for tests instead of target specific ones Update a few tests to use llvm.masked.load/store instead of arm neon vector loads and stores, and move the tests that are actually specific to those arm intrinsics to their own files. This lets us mark the tests that use target specific intrinsics as requiring those targets. llvm-svn: 302972	2017-05-13 00:12:52 +00:00
Xinliang David Li	66bdfca77a	[PartialInlining] Profile based cost analysis Implemented frequency based cost/saving analysis and related options. The pass is now in a state ready to be turned on in the pipeline (in follow up). Differential Revision: http://reviews.llvm.org/D32783 llvm-svn: 302967	2017-05-12 23:41:43 +00:00
Andrew Kaylor	b01e94ee8d	[TLI] Add mapping for various '__<func>_finite' forms of the math routines to SVML routines Patch by Chris Chrulski Differential Revision: https://reviews.llvm.org/D31789 llvm-svn: 302957	2017-05-12 22:11:26 +00:00
Andrew Kaylor	f7c864f89c	[ConstantFolding] Add folding for various math '__<func>_finite' routines generated from -ffast-math Patch by Chris Chrulski Differential Revision: https://reviews.llvm.org/D31788 llvm-svn: 302956	2017-05-12 22:11:20 +00:00
Andrew Kaylor	3cd8c16d7f	[TLI] Add declarations for various math header file routines from math-finite.h that create '__<func>_finite' as functions Patch by Chris Chrulski Differential Revision: https://reviews.llvm.org/D31787 llvm-svn: 302955	2017-05-12 22:11:12 +00:00
Sanjay Patel	1b8589407b	[x86] add vector tests for demanded bits; NFC llvm-svn: 302949	2017-05-12 20:53:48 +00:00
Changpeng Fang	161e8c39af	AMDGPU/SI: Don't promote to vector if the load/store is volatile. Summary: We should not change volatile loads/stores in promoting alloca to vector. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D33107 llvm-svn: 302943	2017-05-12 20:31:12 +00:00
Simon Pilgrim	a1978aaefd	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146) This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146. Differential Revision: https://reviews.llvm.org/D33147 llvm-svn: 302942	2017-05-12 19:56:43 +00:00
Dehao Chen	65dd23e273	Add LiveRangeShrink pass to shrink live range within BB. Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB. Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb Reviewed By: MatzeB, andreadb Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D32563 llvm-svn: 302938	2017-05-12 19:29:27 +00:00
Tom Stellard	fab6b1af6e	AMDGPU: Add lit.local.cfg to disable global-isel tests when global-isel is disabled This should fix bots broken by r302919. llvm-svn: 302928	2017-05-12 17:59:30 +00:00
Reid Kleckner	5bc8543a36	[codeview] Fix assertion failure introduced in r295354 refactoring CodeViewDebug sets Asm to nullptr to disable debug info generation. You can get a .ll file like no-cus.ll from 'clang -gcodeview -g0', which happens in the ubsan test suite. llvm-svn: 302923	2017-05-12 17:02:40 +00:00
Tom Stellard	a0d67c748a	AMDGPU/GlobalISel: Mark 32-bit integer constants as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D33115 llvm-svn: 302919	2017-05-12 16:46:46 +00:00
James Y Knight	d4e1b00e7c	[SPARC] Support 'f' and 'e' inline asm constraints. Based on patch by Patrick Boettcher and Chris Dewhurst. Differential Revision: https://reviews.llvm.org/D29116 llvm-svn: 302911	2017-05-12 15:59:10 +00:00
Sanjay Patel	ae362303fd	[x86] add tests for potential vector narrowing optimization (PR32790) llvm-svn: 302910	2017-05-12 15:56:39 +00:00
Davide Italiano	cc7257c200	[LoopUnroll] Fix a test. REQUIRE should be REQUIRES. Found by inspection. llvm-svn: 302909	2017-05-12 15:30:58 +00:00
Davide Italiano	41f5c7bcba	[NewGVN] Don't incorrectly reset the memory leader. This code was missing a check for stores, so we were thinking the congruency class didn't have any memory members, and reset the memory leader. Differential Revision: https://reviews.llvm.org/D33056 llvm-svn: 302905	2017-05-12 15:22:45 +00:00
Serguei Katkov	63c9c81152	[BPI] Ignore remainder while distributing the remaining probability from unreachable This is a follow up patch for https://reviews.llvm.org/rL300440 to address a comment. To make implementation to be consistent with other cases we just ignore the remainder after distribution of remaining probability between reachable edges. If we reduced the probability of some edges coming to unreachable blocks we should distribute the remaining part across other edges coming to reachable blocks to satisfy the condition that sum of all probabilities should be equal to one. If this remaining part is not divided by number of "reachable" edges then we get this remainder. This remainder probability should be pretty small. Other cases just ignore if the sum of probabilities is not equal to one so we do the same. Reviewers: chandlerc, sanjoy, vsk, junbuml, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D32124 llvm-svn: 302883	2017-05-12 07:50:06 +00:00
Jonas Paulsson	d1ec738502	Handle a COPY with undef source operand in LowerCopy() Llvm-stress discovered that a COPY may end up in ExpandPostRA::LowerCopy() with an undef source operand. It is not possible for the target to handle this, as this flag is not passed to TII->copyPhysReg(). This patch solves this by treating such a COPY as an identity COPY. Review: Matthias Braun https://reviews.llvm.org/D32892 llvm-svn: 302877	2017-05-12 06:32:03 +00:00
Mikael Holmen	ce3ec4519b	[IfConversion] Keep the CFG updated incrementally in IfConvertTriangle Summary: Instead of using RemoveExtraEdges (which uses analyzeBranch, which cannot always be trusted) at the end to fixup the CFG we keep the CFG updated as we go along and remove or add branches and merge blocks. This way we won't have any problems if the involved MBBs contain unanalyzable instructions. This fixes PR32721. In that case we had a triangle EBB | \ | | | TBB | / FBB where FBB didn't have any successors at all since it ended with an unconditional return. Then TBB and FBB were merged into EBB, but EBB would still keep its successors, and the use of analyzeBranch and CorrectExtraCFGEdges wouldn't help to remove them since the return instruction is not analyzable (at least not on ARM). Reviewers: kparzysz, iteratee, MatzeB Reviewed By: iteratee Subscribers: aemerson, rengolin, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D33037 llvm-svn: 302876	2017-05-12 06:28:58 +00:00
Chandler Carruth	d869b18826	[PM/Unswitch] Teach the new simple loop unswitch to handle loop invariant PHI inputs and to rewrite PHI nodes during the actual unswitching. The checking is quite easy, but rewriting the PHI nodes is somewhat surprisingly challenging. This should handle both branches and switches. I think this is now a full featured trivial unswitcher, and more full featured than the trivial cases in the old pass while still being (IMO) somewhat simpler in how it works. Next up is to verify its correctness in more widespread testing, and then to add non-trivial unswitching. Thanks to Davide and Sanjoy for the excellent review. There is one remaining question that I may address in a follow-up patch (see the review thread for details) but it isn't related to the functionality specifically. Differential Revision: https://reviews.llvm.org/D32699 llvm-svn: 302867	2017-05-12 02:19:59 +00:00
David Blaikie	488393f822	DWARF: Avoid cross-CU references under Fission Turns out that the Fission/Split DWARF package format (DWP) is currently insufficient to handle cross-CU (ref_addr) references. So for now, duplicate any debug info needed in these situations: * inlined_subroutine's abstract_origin * inlined variable's abstract_origin * types Keep the ref_addr behavior in general, including in the split DWARF inline debug info that can be emitted into the object files for online symbolication. Keep a flag to use the old (ref_addr) behavior for testing ways of addressing this limitation in the DWP tool (& for those not using DWP packaging). llvm-svn: 302858	2017-05-12 01:13:45 +00:00
Dehao Chen	8d1c983f45	Change sample profile writer to make it deterministic. Summary: This patch changes the function profile output order to be deterministic. In order to make it easier to understand, the hottest functions (with the most total samples) are ordered first. Reviewers: dnovillo, davidxl Reviewed By: dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33111 llvm-svn: 302851	2017-05-11 23:43:44 +00:00
Teresa Johnson	2a6b7991d4	Restrict call metadata based hotness detection to Sample PGO mode Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 llvm-svn: 302844	2017-05-11 23:18:05 +00:00
Guozhi Wei	22e7da9597	[PPC] Change the register constraint of the first source operand of instruction mtvsrdd to g8rc_nox0 According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0. This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified. Differential Revision: https://reviews.llvm.org/D32880 llvm-svn: 302834	2017-05-11 22:17:35 +00:00
Easwaran Raman	c103ef89ee	Decrease inlinecold-threshold to 45 I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size: Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource. I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value. Differential revision: https://reviews.llvm.org/D33106 llvm-svn: 302829	2017-05-11 21:36:28 +00:00
Chad Rosier	aeffffdb44	[AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD. Differential Revision: http://reviews.llvm.org/D33101. llvm-svn: 302822	2017-05-11 20:07:24 +00:00
Vadzim Dambrouski	38e30197c3	[MSP430] Generate EABI-compliant libcalls Updates the MSP430 target to generate EABI-compatible libcall names. As a byproduct, adjusts the hardware multiplier options available in the MSP430 target, adds support for promotion of the ISD::MUL operation for 8-bit integers, and correctly marks R11 as used by call instructions. Patch by Andrew Wygle. Differential Revision: https://reviews.llvm.org/D32676 llvm-svn: 302820	2017-05-11 19:56:14 +00:00
Matt Arsenault	47ccafe787	AMDGPU: Remove tfe bit from flat instruction definitions We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814	2017-05-11 17:38:33 +00:00
Matt Arsenault	bf5482e4bb	AMDGPU: Pull fneg out of extract_vector_elt This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813	2017-05-11 17:26:25 +00:00
Adam Nemet	0aca09fc6c	[SLP] Emit optimization remarks The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense of how far the tree is spanning I've included the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811	2017-05-11 17:06:17 +00:00
Nemanja Ivanovic	96c3d626a2	[PowerPC] Eliminate integer compare instructions - vol. 1 This patch is the first in a series of patches to provide code gen for doing compares in GPRs when the compare result is required in a GPR. It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64 extensions. This first patch handles equality comparison on i32 operands with the result sign or zero extended. Differential Revision: https://reviews.llvm.org/D31847 llvm-svn: 302810	2017-05-11 16:54:23 +00:00
Simon Pilgrim	e2c055b8c5	[X86][AVX] Added zeroall/zeroupper scheduler tests Missing on SandyBridge and Btver2 models llvm-svn: 302804	2017-05-11 15:02:49 +00:00
Javed Absar	f3d7904d20	[IR] Allow attributes with global variables This patch extends llvm-ir to allow attributes to be set on global variables. An RFC was sent out earlier by my colleague James Molloy: http://lists.llvm.org/pipermail/cfe-dev/2017-March/053100.html A key part of that proposal was to extend LLVM-IR to carry attributes on global variables. This generic feature could be useful for multiple purposes. In our present context, it would be useful to carry user specified sections for bss/rodata/data. Reviewed by: Jonathan Roelofs, Reid Kleckner Differential Revision: https://reviews.llvm.org/D32009 llvm-svn: 302794	2017-05-11 12:28:08 +00:00
Alexander Potapenko	a658ae8fe2	[msan] Fix PR32842 It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless. This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero). Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0. This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact. For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code: orq %rcx, %rax jne .LBB0_6 , instead of: orl %ecx, %eax testb $1, %al jne .LBB0_6 llvm-svn: 302787	2017-05-11 11:07:48 +00:00
Chandler Carruth	97500a9918	[x86] Fix a failure to select with AVX-512 when the type legalizer manages to form a VSELECT with a non-i1 element type condition. Those are technically allowed in SDAG (at least, the generic type legalization logic will form them and I wouldn't want to try to audit everything to preclude forming them) so we need to be able to lower them. This isn't too hard to implement. We mark VSELECT as custom so we get a chance in C++, add a fast path for i1 conditions to get directly handled by the patterns, and a fallback when we need to manually force the condition to be an i1 that uses the vptestm instruction to turn a non-mask into a mask. This, unsurprisingly, generates awful code. But it at least doesn't crash. This was actually impacting open source packages built with LLVM for AVX-512 in the wild, so quickly landing a patch that at least stops the immediate bleeding. I think I've found where to fix the codegen quality issue, but less confident of that change so separating it out from the thing that doesn't change the result of any existing test case but causes mine to not crash. llvm-svn: 302785	2017-05-11 10:52:16 +00:00
Diana Picus	9cfbc6d94f	[ARM][GlobalISel] Legalize narrow scalar ops by widening This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB and MUL to 32 bits since we only have TableGen patterns for 32 bits. See the commit message for r292827 for more details. At this point we could just remove some of the tests for regbankselect and instruction-select, since we're not going to see any narrow operations at those levels anymore. Instead I decided to update them with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences generated by the legalizer. llvm-svn: 302782	2017-05-11 09:45:57 +00:00
Diana Picus	657bfd3302	[ARM][GlobalISel] Support for G_ANYEXT G_ANYEXT can be introduced by the legalizer when widening scalars. Add support for it in the register bank info (same mapping as everything else) and in the instruction selector. When selecting it, we treat it as a COPY, just like G_TRUNC. On this occasion we get rid of some assertions in selectCopy so we can reuse it. This shouldn't be a problem at the moment since we're not supporting any complicated cases (e.g. FPR, different register banks). We might want to separate the paths when we do. llvm-svn: 302778	2017-05-11 08:28:31 +00:00
Igor Breger	c7b5977bb1	[GlobalISel][X86] G_ICMP support. Summary: support G_ICMP for scalar types i8/i16/i64. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits, krytarowski Differential Revision: https://reviews.llvm.org/D32995 llvm-svn: 302774	2017-05-11 07:17:40 +00:00
David L. Jones	bbd97d273b	Revert "[SDAG] Relax conditions under stores of loaded values can be merged" This reverts r302712. The change fails with ASAN enabled: ERROR: AddressSanitizer: use-after-poison on address ... at ... READ of size 2 at ... thread T0 #0 ... in llvm::SDNode::getNumValues() const <snip>/include/llvm/CodeGen/SelectionDAGNodes.h:855:42 #1 ... in llvm::SDNode::hasAnyUseOfValue(unsigned int) const <snip>/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7270:3 #2 ... in llvm::SDValue::use_empty() const <snip> include/llvm/CodeGen/SelectionDAGNodes.h:1042:17 #3 ... in (anonymous namespace)::DAGCombiner::MergeConsecutiveStores(llvm::StoreSDNode*) <snip>/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12944:7 Reviewers: niravd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33081 llvm-svn: 302746	2017-05-10 23:56:21 +00:00
Sanjay Patel	40a87a909b	[InstCombine] remove fold that swaps xor/or with constants; NFCI // (X ^ C1) \| C2 --> (X \| C2) ^ (C1&~C2) This canonicalization was added at: https://reviews.llvm.org/rL7264 By moving xors out/down, we can more easily combine constants. I'm adding tests that do not change with this patch, so we can verify that those kinds of transforms are still happening. This is no-functional-change-intended because there's a later fold: // (X^C)\|Y -> (X\|Y)^C iff Y&C == 0 ...and demanded-bits appears to guarantee that any fold that would have hit the fold we're removing here would be caught by that 2nd fold. Similar reasoning was used in: https://reviews.llvm.org/rL299384 The larger motivation for removing this code is that it could interfere with the fix for PR32706: https://bugs.llvm.org/show_bug.cgi?id=32706 Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'. Differential Revision: https://reviews.llvm.org/D33050 llvm-svn: 302733	2017-05-10 21:33:55 +00:00
Matt Arsenault	3c5e4237c6	AMDGPU: Make some packed shuffles free VOP3P instructions can encode access to either half of the register. llvm-svn: 302730	2017-05-10 21:29:33 +00:00
Nirav Dave	a38c049fc5	[SDAG] Relax conditions under stores of loaded values can be merged Summary: Allow consecutive stores whose values come from consecutive loads to be merged in the presence of other uses of the loads. Previously this was disallowed as in general the merged load cannot be shared with the other uses. Merging N stores into 1 may cause as many as N redundant loads. However in the context of caching this should have negligible effect on memory pressure and reduce instruction count making it almost always a win. Fixes PR32086. Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30471 llvm-svn: 302712	2017-05-10 19:53:41 +00:00
Sanjay Patel	0b24b7ef72	[InstSimplify, InstCombine] move 'or' simplification tests; NFC Surprisingly, I don't think these are redundant for InstSimplify. They were just misplaced as InstCombine tests. llvm-svn: 302684	2017-05-10 15:57:47 +00:00
Simon Pilgrim	3fc619a49c	[X86][SSE] Check vec_set BUILD_VECTOR tests on both 32 and 64-bit targets llvm-svn: 302683	2017-05-10 15:52:59 +00:00
Quentin Colombet	307e29124c	[AArch64][RegisterBankInfo] Change the default mapping of fp stores. For stores, check if the stored value is defined by a floating point instruction and if yes, we return a default mapping with FPR instead of GPR. llvm-svn: 302679	2017-05-10 15:19:41 +00:00
Amara Emerson	816542ceb3	[AArch64] Enable use of reduction intrinsics. The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns is replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678	2017-05-10 15:15:38 +00:00
Sanjay Patel	30a7157372	[InstCombine] remove redundant tests The first test in this file is duplicated exactly in and.ll -> test33. We have commuted and vector variants there too. The second test is a composite of 2 folds. The first fold is tested independently in add.ll -> flip_and_mask (including vector variant). After that transform fires, the IR is identical to the first transform. llvm-svn: 302676	2017-05-10 14:54:49 +00:00
Sanjay Patel	beac508fc9	[InstCombine] fix auto-generated FileCheck-captured variable refs The script at utils/update_test_checks.py has (had?) a bug when variables start with the same sequence of letters (clearly, not all of the time). llvm-svn: 302674	2017-05-10 14:40:04 +00:00
Sanjay Patel	3d8905f3a0	[InstCombine] fix typo in test comment; NFC llvm-svn: 302669	2017-05-10 14:25:23 +00:00
Ulrich Weigand	93b369ed11	[SystemZ] Add miscellaneous instructions This adds a few missing instructions for the assembler and disassembler. Those should be the last missing general- purpose (Chapter 7) instructions for the z10 ISA. llvm-svn: 302667	2017-05-10 14:20:15 +00:00
Ulrich Weigand	d3604dc72c	[SystemZ] Add missing arithmetic instructions This adds the remaining general arithmetic instructions for assembler / disassembler use. Most of these are not useful for codegen; a few might be, and those are listed in the README.txt for future improvements. llvm-svn: 302665	2017-05-10 14:18:47 +00:00
Sam Clegg	c0d76649d4	[llvm-readobj] Improve errors on invalid binary The previous code was discarding the error message from createBinary() by calling errorToErrorCode(). This meant that such errors were always reported unhelpfully as "Invalid data was encountered while parsing the file". Other tools such as llvm-objdump already produce a more useful error message in this case. Differential Revision: https://reviews.llvm.org/D32985 llvm-svn: 302664	2017-05-10 14:18:11 +00:00
Sanjay Patel	2e069f250a	[InstCombine] add (ashr (shl i32 X, 31), 31), 1 --> and (not X), 1 This is another step towards favoring 'not' ops over random 'xor' in IR: https://bugs.llvm.org/show_bug.cgi?id=32706 This transformation may have occurred in longer IR sequences using computeKnownBits, but that could be much more expensive to calculate. As the scalar result shows, we do not currently favor 'not' in all cases. The 'not' created by the transform is transformed again (unnecessarily). Vectors don't have this problem because vectors are (wrongly) excluded from several other combines. llvm-svn: 302659	2017-05-10 13:56:52 +00:00
Michael Zuckerman	1f1a912c60	[LLVM][inline-asm] Altmacro string escape character '!' This patch is the fourth patch in a series of reviews for the Altmacro feature. This patch introduces a new escape character '!' and it depends on D32701. According to https://sourceware.org/binutils/docs/as/Altmacro.html: "single-character string escape To include any single character literally in a string (even if the character would otherwise have some special meaning), you can prefix the character with `!' (an exclamation mark). For example, you can write `<4.3 !> 5.4!!>' to get the literal text `4.3 > 5.4!'." Differential Revision: https://reviews.llvm.org/D32792 llvm-svn: 302652	2017-05-10 13:08:11 +00:00
Mikael Holmen	21c867c26e	[IfConversion] Add missing check in IfConversion/canFallThroughTo Summary: When trying to figure out if MBB could fallthrough to ToMBB (possibly by falling through a bunch of other MBBs) we didn't actually check if there was fallthrough between the last two blocks in the chain. Reviewers: kparzysz, iteratee, MatzeB Reviewed By: kparzysz, iteratee Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D32996 llvm-svn: 302650	2017-05-10 13:06:13 +00:00
Jonas Paulsson	11d251c05c	[SystemZ] Implement getRepRegClassFor() This method must return a valid register class, or the list-ilp isel scheduler will crash. For MVT::Untyped nullptr was previously returned, but now ADDR128BitRegClass is returned instead. This is needed just as long as list-ilp (and probably also list-hybrid) is still there. Review: Ulrich Weigand, A Trick https://reviews.llvm.org/D32802 llvm-svn: 302649	2017-05-10 13:03:25 +00:00
Dmitry Preobrazhensky	da61a7f9ef	[AMDGPU][MC] Corrected v_madak/madmk to avoid printing "_e32" in disassembler output See bug 32927: https://bugs.llvm.org//show_bug.cgi?id=32927 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D32913 llvm-svn: 302648	2017-05-10 13:00:28 +00:00
Igor Breger	8e5e40862a	[GlobalISel][X86] Split test file. NFC llvm-svn: 302647	2017-05-10 12:58:31 +00:00
Ulrich Weigand	c7eb5a95b2	[SystemZ] Add decimal integer instructions This adds the set of decimal integer (BCD) instructions for assembler / disassembler use. llvm-svn: 302646	2017-05-10 12:42:45 +00:00
Ulrich Weigand	33a441adf9	[SystemZ] Add crypto instructions This adds the set of message-security assist instructions for assembler / disassembler use. llvm-svn: 302645	2017-05-10 12:42:00 +00:00
Ulrich Weigand	435cd1a3e4	[SystemZ] Add translate/convert instructions This adds the set of character-set translate and convert instructions for assembler / disassembler use. llvm-svn: 302644	2017-05-10 12:41:12 +00:00
Ulrich Weigand	eb17909536	[SystemZ] Add missing memory/string instructions This adds a number of missing memory and string instructions for assembler / disassembler use. llvm-svn: 302643	2017-05-10 12:40:15 +00:00
Ulrich Weigand	52461726dd	[SystemZ] Reformat assembler/disassembler tests The assembler and disassembler test cases started out formatted and sorted in a particular way, but this got lost over time as patches were added. Reformat them again. NFC. llvm-svn: 302642	2017-05-10 12:39:11 +00:00
Simon Pilgrim	c29af824bf	[DAGCombiner] Add vector support to fold (shl/srl 0, x) -> 0 llvm-svn: 302641	2017-05-10 12:34:27 +00:00
Chandler Carruth	f3bd8ddedb	Revert r301950: SpeculativeExecution: Stop using whitelist for costs This pass doesn't correctly handle testing for when it is legal to hoist arbitrary instructions. The whitelist happens to make it safe, so before it is removed the pass's legality checks will need to be enhanced. Details have been added to the code review thread for the patch. llvm-svn: 302640	2017-05-10 12:30:07 +00:00
Amara Emerson	836b0f48c1	Add a late IR expansion pass for the experimental reduction intrinsics. This pass uses a new target hook to decide whether or not to expand a particular intrinsic to the shufflevector sequence. Differential Revision: https://reviews.llvm.org/D32245 llvm-svn: 302631	2017-05-10 09:42:49 +00:00
Igor Breger	fda31e64e0	[GlobalISel][X86] G_ZEXT i1 to i32/i64 support. Summary: Support G_ZEXT i1 to i32/i64 instruction selection. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32965 llvm-svn: 302623	2017-05-10 06:52:58 +00:00
Ahmed Bougacha	a09ff59cc2	[CodeGen] Don't require AA in TwoAddress at -O0. This is a follow-up to r302611, which moved an -O0 computation of DT from SDAGISel to TwoAddress. Don't use it here either, and avoid computing it completely. The only use was forwarding the analysis as an optional argument to utility functions. Differential Revision: https://reviews.llvm.org/D32766 llvm-svn: 302612	2017-05-10 00:56:00 +00:00
Ahmed Bougacha	604526fe87	[CodeGen] Don't require AA in SDAGISel at -O0. Before r247167, the pass manager builder controlled which AA implementations were used, exporting them all in the AliasAnalysis analysis group. Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA implementations if made available in the pass pipeline. But regardless, SDAGISel is required at O0, and really doesn't need to be doing fancy optimizations based on useful AA results. Don't require AA at CodeGenOpt::None, and only use it otherwise. This does have a functional impact (and one testcase is pessimized because we can't reuse a load). But I think that's desirable no matter what. Note that this alone doesn't result in less DT computations: TwoAddress was previously able to reuse the DT we computed for SDAG. That will be fixed separately. Differential Revision: https://reviews.llvm.org/D32766 llvm-svn: 302611	2017-05-10 00:39:30 +00:00
Ahmed Bougacha	8c358e3016	[CodeGen] Compute DT/LI lazily in SafeStackLegacyPass. NFC. We currently require SCEV, which requires DT/LI. Those are expensive to compute, but the pass only runs for functions that have the safestack attribute. Compute DT/LI to build SCEV lazily, only when the pass is actually going to transform the function. Differential Revision: https://reviews.llvm.org/D31302 llvm-svn: 302610	2017-05-10 00:39:25 +00:00
Ahmed Bougacha	bcb79e6386	[CodeGen] Add an -O0 backend pipeline test. NFC. This should hopefully make changes to the O0 pipeline obvious; it's easy to require expensive passes, and this helps make informed decisions. Case in point: in the few weeks separating the time when I initially wrote this patch to the time when I committed, the test regressed as r302103 added another use of DT! llvm-svn: 302608	2017-05-10 00:39:17 +00:00
Sam Clegg	2ffff5af85	[WebAssembly] Improve libObject support for wasm imports and exports Previously we had only supported the importing and exporting of functions and globals. Also, add useful overloads of getWasmSymbol() and getNumberOfSymbols() in support of lld port. Differential Revision: https://reviews.llvm.org/D33011 llvm-svn: 302601	2017-05-09 23:48:41 +00:00
Sanjay Patel	a06384f3d8	[InstCombine] add tests for andn; NFC llvm-svn: 302599	2017-05-09 23:40:13 +00:00
Keno Fischer	06f962c1e8	[GVN] Fix a crash on encountering non-integral pointers Summary: This fixes the immediate crash caused by introducing an incorrect inttoptr before attempting the conversion. There may still be a legality check missing somewhere earlier for non-integral pointers, but this change seems necessary in any case. Reviewers: sanjoy, dberlin Reviewed By: dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32623 llvm-svn: 302587	2017-05-09 21:07:20 +00:00
Sanjay Patel	608cde04ab	[InstCombine] update test file to use FileCheck; NFC llvm-svn: 302585	2017-05-09 20:46:12 +00:00
Zvi Rackover	b483e28c77	DAGCombine: Combine shuffles of splat-shuffles Summary: Reapply r299047, but this time handle correctly splat-masks with undef elements. Reviewers: spatel, RKSimon, eli.friedman, andreadb Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31961 llvm-svn: 302583	2017-05-09 20:25:38 +00:00
Matthew Simpson	78fd46b230	[AArch64] Consider widening instructions in cost calculations The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582	2017-05-09 20:18:12 +00:00
Reid Kleckner	b5fced7324	[codeview] Check for a DIExpression offset for local variables Fixes inalloca parameters, which previously all pointed to the same offset. Extend the test to use llvm-readobj so that we can test the offset in a readable way. llvm-svn: 302578	2017-05-09 19:59:29 +00:00
Adrian Prantl	c10d0e5ccd	Make it illegal for two Functions to point to the same DISubprogram As recently discussed on llvm-dev [1], this patch makes it illegal for two Functions to point to the same DISubprogram and updates FunctionCloner to also clone the debug info of a function to conform to the new requirement. To simplify the implementation it also factors out the creation of inlineAt locations from the Inliner into a general-purpose utility in DILocation. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html <rdar://problem/31926379> Differential Revision: https://reviews.llvm.org/D32975 This reapplies r302469 with a fix for a bot failure (reparentDebugInfo now checks for the case the orig and new function are identical). llvm-svn: 302576	2017-05-09 19:47:37 +00:00
Wolfgang Pieb	15fa44698c	[DWARF] Fix a parsing issue with type unit headers. Reviewers: dblaikie Differential Revision: https://reviews.llvm.org/D32987 llvm-svn: 302574	2017-05-09 19:38:38 +00:00
Jacques Pienaar	0dbcc34f6b	[lanai] Add computeKnownBitsForTargetNode for Lanai. Summary: computeKnownBitsForTargetNode was not defined for Lanai which resulted in additional AND's with 0x1 for the output of SETCC instructions. Reviewers: eliben, majnemer Reviewed By: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29605 llvm-svn: 302568	2017-05-09 18:35:26 +00:00
Sam Clegg	a0efcfe92b	[WebAssembly] Fix validation of start function The check for valid start function was inverted. Added a new test in test/Object to check this case and fixed the existing tests in for ObjectYAML. Differential Revision: https://reviews.llvm.org/D32986 llvm-svn: 302560	2017-05-09 17:51:38 +00:00
Davide Italiano	d6bb8cab03	[NewGVN] Fix a consistent order for phi nodes operands. The way we currently define congruency for two PHIExpression(s) is: 1) The operands to the phi functions are congruent 2) The PHIs are defined in the same BasicBlock. NewGVN works under the assumption that phi operands are in predecessor order, or at least in some consistent order. OTOH, this is valid IR: patatino: %meh = phi i16 [ %0, %winky ], [ %conv1, %tinky ] %banana = phi i16 [ %0, %tinky ], [ %conv1, %winky ] br label %end and the in-memory representations of the two SSA registers have an inconsistent order. This violation of NewGVN assumptions results in two PHIs being found congruent when they're not. While we think it's useful to always have a consistent order enforced, let's fix this in NewGVN by sorting uses in predecessor order before creating a PHI expression. Differential Revision: https://reviews.llvm.org/D32990 llvm-svn: 302552	2017-05-09 16:58:28 +00:00
Craig Topper	f893d49f0c	[X86] Add more patterns for BZHI isel This patch adds more patterns that a reasonable person might write that can be compiled to BZHI. This adds support for (~0U >> (32 - b)) & a; and a << (32 - b) >> (32 - b); This was inspired by the code in APInt::clearUnusedBits. This can pass an index of 32 to the bzhi instruction which a quick test of Haswell hardware shows will not mask any bits. Though the description text in the Intel manual says the "index is saturated to OperandSize-1". The pseudocode in the same manual indicates no bits will be zeroed for this case. I think this is still missing cases where the subtract portion is an 8-bit operation. Differential Revision: https://reviews.llvm.org/D32616 llvm-svn: 302549	2017-05-09 16:32:11 +00:00
Sanjay Patel	6844e21f59	[InstCombineCasts] Fix checks in sext->lshr->trunc pattern. The comment says to avoid the case where zero bits are shifted into the truncated value, but the code checks that the shift is smaller than the truncated value instead of the number of bits added by the sign extension. Fixing this allows a shift by more than the value size to be introduced, which is undefined behavior, so the shift is capped at the value size minus one, which has the expected behavior of filling the value with the sign bit. Patch by Jacob Young! Differential Revision: https://reviews.llvm.org/D32285 llvm-svn: 302548	2017-05-09 16:24:59 +00:00
Guy Blank	0c42d8c35b	[AVX512] Only look at lower bit in constant scalar masks For scalar masked instructions, only the lower bit of the mask is relevant, so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit. This patch handles cases where the lower bit is '1'. Differential Revision: https://reviews.llvm.org/D32805 llvm-svn: 302546	2017-05-09 16:16:48 +00:00
Reid Kleckner	3a363fff7e	Re-land "Use the frame index side table for byval and inalloca arguments" This re-lands r302483. It was not the cause of PR32977. llvm-svn: 302544	2017-05-09 16:02:20 +00:00
Reid Kleckner	84075fddff	Re-land "Don't add DBG_VALUE instructions for static allocas in dbg.declare" This re-lands commit r302461. It was not the cause of PR32977. llvm-svn: 302543	2017-05-09 16:01:47 +00:00
Hans Wennborg	66fb0d9768	Revert r302469 "Make it illegal for two Functions to point to the same DISubprogram" This caused PR32977. Original commit message: > Make it illegal for two Functions to point to the same DISubprogram > > As recently discussed on llvm-dev [1], this patch makes it illegal for > two Functions to point to the same DISubprogram and updates > FunctionCloner to also clone the debug info of a function to conform > to the new requirement. To simplify the implementation it also factors > out the creation of inlineAt locations from the Inliner into a > general-purpose utility in DILocation. > > [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html > <rdar://problem/31926379> > > Differential Revision: https://reviews.llvm.org/D32975 llvm-svn: 302533	2017-05-09 14:44:15 +00:00
Anna Thomas	0691483435	[LV] Fix insertion point for shuffle vectors in first order recurrence Summary: In first order recurrence vectorization, when the previous value is a phi node, we need to set the insertion point to the first non-phi node. We can have the previous value being a phi node, due to the generation of new IVs as part of trunc optimization [1]. [1] https://reviews.llvm.org/rL294967 Reviewers: mssimpso, mkuper Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32969 llvm-svn: 302532	2017-05-09 14:29:33 +00:00
Guy Blank	31b37297de	[X86][AVX512] Refine some avx512er intrinsics tests. NFC. The modified tests should test the masked intrinsics. Currently the mask is constant, which with a future patch (https://reviews.llvm.org/D32805) will cause the intrinsics to be replaced with an unmasked version. This patch changes the constant mask to be a variable one. llvm-svn: 302529	2017-05-09 14:03:51 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Simon Dardis	659c43f11a	Revert "[MIPS] Add support to match more patterns for DINS instruction" This reverts commit rL302512. This broke the mips buildbots. llvm-svn: 302526	2017-05-09 13:18:48 +00:00
Simon Pilgrim	ca3a63a849	[X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X) Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD. Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal. Differential Revision: https://reviews.llvm.org/D32973 llvm-svn: 302525	2017-05-09 13:14:40 +00:00
Guy Blank	5995802911	[X86][AVX512] Add test for masking of scalar instructions. llvm-svn: 302519	2017-05-09 12:32:48 +00:00
Nikolai Bozhenov	b7bf386e80	[X86] Clang option -fuse-init-array has no effect when generating for MCU target Reviewers: Eugene.Zelenko, dschuff, craig.topper Reviewed By: craig.topper Subscribers: ahatanak, aaboud, DavidKreitzer, llvm-commits, cfe-commits Differential Revision: https://reviews.llvm.org/D32543 Patch by AndreiGrischenko <andrei.l.grischenko@intel.com> llvm-svn: 302513	2017-05-09 10:14:03 +00:00
Strahinja Petrovic	27ae4c3259	[MIPS] Add support to match more patterns for DINS instruction This patch adds support for recognizing patterns to match DINS instruction. Differential Revision: https://reviews.llvm.org/D31465 llvm-svn: 302512	2017-05-09 10:02:00 +00:00
Reid Kleckner	41bb94233b	Revert "Don't add DBG_VALUE instructions for static allocas in dbg.declare" This reverts commit r302461. It appears to be causing failures compiling gtest with debug info on the Linux sanitizer bot. I was unable to reproduce the failure locally, however. llvm-svn: 302504	2017-05-09 01:57:44 +00:00
Teresa Johnson	720d9b4111	Fix code section prefix for proper layout Summary: r284533 added hot and cold section prefixes based on profile information, to enable grouping of hot/cold functions at link time. However, it used "cold" as the prefix for cold sections, but gold only recognizes "unlikely" (which is used by gcc for cold sections). Therefore, cold sections were not properly being grouped. Switch to using "unlikely" Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32983 llvm-svn: 302502	2017-05-09 01:43:24 +00:00
Reid Kleckner	9f29914d40	Revert "Use the frame index side table for byval and inalloca arguments" This reverts r302483 and its follow-up fix. llvm-svn: 302493	2017-05-09 01:14:39 +00:00
Evgeniy Stepanov	f7e8acf0fc	Ignore !associated metadata with null argument. Fixes PR32577 (comment 10). Such metadata may legitimately appear in LTO. llvm-svn: 302485	2017-05-08 23:46:20 +00:00
Reid Kleckner	918e8157d8	Relax Dwarf filecheck test for 32-bit hosts llvm-svn: 302484	2017-05-08 23:27:52 +00:00
Reid Kleckner	45efcf0c96	Use the frame index side table for byval and inalloca arguments Summary: For inalloca functions, this is a very common code pattern: %argpack = type <{ i32, i32, i32 }> define void @f(%argpack* inalloca %args) { entry: %a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0 %b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1 %c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2 tail call void @llvm.dbg.declare(metadata i32* %a, ... "a") tail call void @llvm.dbg.declare(metadata i32* %c, ... "b") tail call void @llvm.dbg.declare(metadata i32* %b, ... "c") Even though these GEPs can be simplified to a constant offset from EBP or RSP, we don't do that at -O0, and each GEP is computed into a register. Registers used to compute argument addresses are typically spilled and clobbered very quickly after the initial computation, so live debug variable tracking loses information very quickly if we use DBG_VALUE instructions. This change moves processing of dbg.declare between argument lowering and basic block isel, so that we can ask if an argument has a frame index or not. If the argument lives in a register as is the case for byval arguments on some targets, then we don't put it in the side table and during ISel we emit DBG_VALUE instructions. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32980 llvm-svn: 302483	2017-05-08 23:20:27 +00:00
Sanjoy Das	9c52c7656c	Add basic test case for -instnamer llvm-svn: 302482	2017-05-08 23:18:46 +00:00
Sanjay Patel	2a895ce134	[InstCombine] add tests from D32285 to show current problems; NFC llvm-svn: 302475	2017-05-08 22:33:20 +00:00
Adrian Prantl	200a5ef526	Make it illegal for two Functions to point to the same DISubprogram As recently discussed on llvm-dev [1], this patch makes it illegal for two Functions to point to the same DISubprogram and updates FunctionCloner to also clone the debug info of a function to conform to the new requirement. To simplify the implementation it also factors out the creation of inlineAt locations from the Inliner into a general-purpose utility in DILocation. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html <rdar://problem/31926379> Differential Revision: https://reviews.llvm.org/D32975 llvm-svn: 302469	2017-05-08 21:17:08 +00:00
Sanjay Patel	a1c8814891	[InstCombine] add folds for not-of-shift-right This is another step towards getting rid of dyn_castNotVal, so we can recommit: https://reviews.llvm.org/rL300977 As the tests show, we were missing the lshr case for constants and both ashr/lshr vector splat folds. The ashr case with constant was being performed inefficiently in 2 steps. It's also possible there was a latent bug in that case because we can't do that fold if the constant is positive: http://rise4fun.com/Alive/Bge llvm-svn: 302465	2017-05-08 20:49:59 +00:00
Tim Northover	c48c993b75	ARM: use divmod libcalls on embedded MachO platforms too. The separated libcalls are implemented in terms of __divmodsi4 and __udivmodsi4 anyway, so we should always use them if possible. llvm-svn: 302462	2017-05-08 20:00:14 +00:00
Reid Kleckner	bf828eedb4	Don't add DBG_VALUE instructions for static allocas in dbg.declare Summary: An llvm.dbg.declare of a static alloca is always added to the MachineFunction dbg variable map, so these values are entirely redundant. They survive all the way through codegen to be ignored by DWARF emission. Effectively revert r113967 Two bugpoint-reduced test cases from 2012 broke as a result of this change. Despite my best efforts, I haven't been able to rewrite the test case using dbg.value. I'm not too concerned about the lost coverage because these were reduced from the test-suite, which we still run. Reviewers: aprantl, dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32920 llvm-svn: 302461	2017-05-08 19:58:15 +00:00
Quentin Colombet	55a72b3b05	[AArch64][RegisterBankInfo] Change the default mapping of fp loads. This fixes PR32550, in a way that does not imply running the greedy mode at O0. The fix consists in checking if a load is used by any floating point instruction and if yes, we return a default mapping with FPR instead of GPR. llvm-svn: 302453	2017-05-08 18:16:31 +00:00
Sanjay Patel	322db476f3	[InstCombine] move/add tests for not(shr (not X), Y); NFC llvm-svn: 302451	2017-05-08 18:16:04 +00:00
Daniel Berlin	0f2af7f93b	ConstantFold: Handle gep nonnull, undef as well llvm-svn: 302447	2017-05-08 17:37:33 +00:00
Daniel Berlin	74ffa5c62f	ConstantFold: Fold getelementptr (i32, i32* null, i64 undef) to null. Transforms/IndVarSimplify/2011-10-27-lftrnull will fail if this regresses. Transforms/GVN/PRE/2011-06-01-NonLocalMemdepMiscompile.ll has been changed to still test what it was trying to test. llvm-svn: 302446	2017-05-08 17:37:29 +00:00
Craig Topper	868813ffbb	[ValueTracking] Use KnownOnes to provide a better bound on known zeros for ctlz/cttz intrinsics This patch uses KnownOnes of the input of ctlz/cttz to bound the value that can be returned from these intrinsics. This makes these intrinsics more similar to the handling for ctpop which already uses known bits to produce a similar bound. Differential Revision: https://reviews.llvm.org/D32521 llvm-svn: 302444	2017-05-08 17:22:34 +00:00
Zvi Rackover	0f1ffb6cab	[X86] Split test configurations. NFC. Split test that includes reproducer for pr32967 to KNL and SKX. llvm-svn: 302442	2017-05-08 16:54:25 +00:00
Sanjay Patel	0fbdaa1f0c	[InstCombine] add another test for PR32949; NFC A patch for the InstSimplify variant of this bug is up for review here: https://reviews.llvm.org/D32954 llvm-svn: 302434	2017-05-08 15:58:57 +00:00
Zvi Rackover	7fa777fb74	Adding reproducer for pr32967. NFC. llvm-svn: 302426	2017-05-08 14:47:32 +00:00
Simon Pilgrim	df39b03f29	[X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks. Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead. This is a first step in several things we can do to improve PBLENDV support: * Better matching of X86ISD::ANDNP patterns. * Handle floating point cases. * Better vector and bitcast support in computeNumSignBits. * Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.). Differential Revision: https://reviews.llvm.org/D32953 llvm-svn: 302424	2017-05-08 14:16:39 +00:00
Simon Pilgrim	8a3b9c7401	Normalize line endings. NFCI. llvm-svn: 302422	2017-05-08 13:32:34 +00:00
Simon Pilgrim	f5ca255d18	[ARM][NEON] Add support for ISD::ABS lowering Update NEON int_arm_neon_vabs intrinsic to use the ISD::ABS opcode directly Added constant folding tests. Differential Revision: https://reviews.llvm.org/D32938 llvm-svn: 302417	2017-05-08 10:37:34 +00:00
Martin Storsjo	fd4c158a84	[ARM] Clear the constant pool cache on explicit .ltorg directives Multiple ldr pseudoinstructions with the same constant value will reuse the same constant pool entry. However, if the constant pool is explicitly flushed with a .ltorg directive, we should not try to reference constants in the previous pool any longer, since they may be out of range. This fixes assembling hand-written assembler source which repeatedly loads the same constant value, across a binary size larger than the pc-relative fixup range for ldr instructions (4096 bytes). Such assembler source already uses explicit .ltorg instructions to emit constant pools with regular intervals. However if we try to reuse constants emitted in earlier pools, they end up out of range. This makes the output of the testcase match what binutils gas does (prior to this patch, it would fail to assemble). Differential Revision: https://reviews.llvm.org/D32847 llvm-svn: 302416	2017-05-08 10:26:24 +00:00
Igor Breger	810c6257f1	[GlobalISel][X86] G_GEP selection support. Summary: [GlobalISel][X86] G_GEP selection support. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32396 llvm-svn: 302412	2017-05-08 09:40:43 +00:00
Igor Breger	605b965ae5	[GlobalISel][X86] G_MUL legalizer/selector support. Summary: G_MUL legalizer/selector/regbank support. Use only Tablegen-erated instruction selection. This patch deals with legal operations only. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: krytarowski, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32698 llvm-svn: 302410	2017-05-08 09:03:37 +00:00
Andrew Ng	9f9c41adec	[Lit] Fix to prevent creation of "%SystemDrive%" directory on Windows. This patch propagates the environment variable SYSTEMDRIVE on Windows when running the unit tests. This prevents the creation of a directory named "%SystemDrive%" when running the unit tests from FileSystemTest that use the function llvm::sys::fs::remove_directories which in turn uses SHFileOperationW. It is within SHFileOperationW that this environment variable may be used and if undefined causes the creation of a "%SystemDrive%" directory in the current directory. Differential Revision: https://reviews.llvm.org/D32910 llvm-svn: 302409	2017-05-08 08:55:38 +00:00
Dean Michael Berris	9bcaed867a	[XRay] Custom event logging intrinsic This patch introduces an LLVM intrinsic and a target opcode for custom event logging in XRay. Initially, its use case will be to allow users of XRay to log some type of string ("poor man's printf"). The target opcode compiles to a noop sled large enough to enable calling through to a runtime-determined relative function call. At runtime, when X-Ray is enabled, the sled is replaced by compiler-rt with a trampoline to the logic for creating the custom log entries. Future patches will implement the compiler-rt parts and clang-side support for emitting the IR corresponding to this intrinsic. Reviewers: timshen, dberris Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D27503 llvm-svn: 302405	2017-05-08 05:45:21 +00:00
Eric Beckmann	521a739f5f	Quick fix to D32609, it seems .o files are not transferred in all cases. Therefore the .o file in question is renamed to .obj.coff. llvm-svn: 302400	2017-05-08 02:47:25 +00:00
Eric Beckmann	efef15a0c7	Update llvm-readobj -coff-resources to display tree structure. Summary: Continue making updates to llvm-readobj to display resource sections. This is necessary for testing the up and coming cvtres tool. Reviewers: zturner Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32609 llvm-svn: 302399	2017-05-08 02:47:07 +00:00
Eric Beckmann	03de7c1501	Revert "Hopefully one last commit to fix this patch, addresses string reference" Summary: This reverts commit 56beec1b1cfc6d263e5eddb7efff06117c0724d2. Revert "Quick fix to D32609, it seems .o files are not transferred in all cases." This reverts commit 7652eecd29cfdeeab7f76f687586607a99ff4e36. Revert "Update llvm-readobj -coff-resources to display tree structure." This reverts commit 422b62c4d302cfc92401418c2acd165056081ed7. Reviewers: zturner Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32958 llvm-svn: 302397	2017-05-08 02:25:03 +00:00
Eric Beckmann	300831c0c4	Quick fix to D32609, it seems .o files are not transferred in all cases. Therefore the .o file in question is renamed to .obj.coff. llvm-svn: 302388	2017-05-07 23:31:14 +00:00
Eric Beckmann	33fca46ec3	Update llvm-readobj -coff-resources to display tree structure. Summary: Continue making updates to llvm-readobj to display resource sections. This is necessary for testing the up and coming cvtres tool. Reviewers: zturner Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32609 llvm-svn: 302386	2017-05-07 22:47:22 +00:00
Simon Pilgrim	2d1c6d6e8d	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics. Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378	2017-05-07 20:58:55 +00:00
Sanjay Patel	390f1dc6ba	[InstSimplify] add tests for PR32949 miscompile; NFC llvm-svn: 302374	2017-05-07 18:19:13 +00:00
Zvi Rackover	973ff7c74c	InstructionSimplify: Relanding r301766 Summary: Re-applying r301766 with a fix to a typo and a regression test. The log message for r301766 was: ================================================================================== InstructionSimplify: Canonicalize shuffle operands. NFC-ish. Summary: Apply canonicalization rules: 1. Input vectors with no elements selected from can be replaced with undef. 2. If only one input vector is constant it shall be the second one. This allows constant-folding to cover more ad-hoc simplifications that were in place and avoid duplication for RHS and LHS checks. There are more rules we may want to add in the future when we see a justification. e.g. mask elements that select undef elements can be replaced with undef. ================================================================================== Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32863 llvm-svn: 302373	2017-05-07 18:16:37 +00:00
Sanjay Patel	599e65b1ff	[InstSimplify] use ConstantRange to simplify or-of-icmps We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases. I had to check some of these with Alive to prove to myself it's right, but everything seems to check out. Eg, the deleted code in instcombine was completely ignoring predicates with mismatched signedness. This is a follow-up to: https://reviews.llvm.org/rL301260 https://reviews.llvm.org/D32143 llvm-svn: 302370	2017-05-07 15:11:40 +00:00
Simon Pilgrim	33f7397cc0	[X86][AVX512] Relax assertion and just exit combine for unsupported types (PR32907) llvm-svn: 302361	2017-05-06 20:53:52 +00:00
Simon Pilgrim	fea153f341	[X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegen Extend NoVLX targets to use the 512-bit versions llvm-svn: 302359	2017-05-06 19:11:59 +00:00
Simon Pilgrim	781cb10104	[X86][SSE] Break register dependencies on v16i8/v8i16 BUILD_VECTOR on SSE41 rL294581 broke unnecessary register dependencies on partial v16i8/v8i16 BUILD_VECTORs, but on SSE41 we (currently) use insertion for full BUILD_VECTORs as well. By allowing full insertion to occur on SSE41 targets we can break register dependencies here as well. llvm-svn: 302355	2017-05-06 17:30:39 +00:00
Simon Pilgrim	946f08c618	[X86][AVX2] Add scheduling latency/throughput tests for some AVX2 instructions Many more to come... llvm-svn: 302338	2017-05-06 13:46:09 +00:00
Simon Pilgrim	2c15447f99	[DAGCombiner] If ISD::ABS is legal/custom, use it directly instead of canonicalizing first. Remove an extra canonicalization step if ISD::ABS is going to be used anyway. Updated x86 abs combine to check that we are lowering from both canonicalizations. llvm-svn: 302337	2017-05-06 13:44:42 +00:00
Krzysztof Parzyszek	d0c71ef8ab	[RDF] Remove covered parts of reached uses for phi and use in same block llvm-svn: 302305	2017-05-05 22:10:32 +00:00
Matthias Braun	4682ac6c83	ARM: Compute MaxCallFrame size early This exposes a method in MachineFrameInfo that calculates MaxCallFrameSize and calls it after instruction selection in the ARM target. This avoids ARMBaseRegisterInfo::canRealignStack()/ARMFrameLowering::hasReservedCallFrame() giving different answers in early/late phases of codegen. The testcase shows a particular nasty example result of that where we would fail to properly align an alloca. Differential Revision: https://reviews.llvm.org/D32622 llvm-svn: 302303	2017-05-05 22:04:05 +00:00
Matthias Braun	c1c5691686	Add missing target triple to test llvm-svn: 302301	2017-05-05 21:50:26 +00:00
Kannan Narayanan	5e73b04b84	[AMDGPU] In the new waitcnt insertion pass, use getHeader instead of getTopBlock to find the loop header. Differential Revision: https://reviews.llvm.org/D32831 llvm-svn: 302290	2017-05-05 21:10:17 +00:00
Matthias Braun	8940114f61	MIParser/MIRPrinter: Compute block successors if not explicitly specified - MIParser: If the successor list is not specified successors will be added based on basic block operands in the block and possible fallthrough. - MIRPrinter: Adds a new `simplify-mir` option, with that option set: Skip printing of block successor lists in cases where the parser is guaranteed to reconstruct it. This means we still print the list if some successor cannot be determined (happens for example for jump tables), if the successor order changes or branch probabilities being unequal. Differential Revision: https://reviews.llvm.org/D31262 llvm-svn: 302289	2017-05-05 21:09:30 +00:00
Konstantin Zhuravlyov	6ccb076aeb	AMDGPU/AMDHSA: Set COMPUTE_PGM_RSRC2:LDS_SIZE to 0 This field is populated by the CP Differential Revision: https://reviews.llvm.org/D32619 llvm-svn: 302277	2017-05-05 20:13:55 +00:00
Sam Clegg	03cdd1241f	[WebAssembly] Add ObjectYAML support for wasm name section Differential Revision: https://reviews.llvm.org/D32841 llvm-svn: 302266	2017-05-05 18:12:34 +00:00
Alexei Starovoitov	7bab73b1f8	[bpf] fix a bug which causes incorrect big endian reloc fixup o Add bpfeb support in BPF dwarfdump unit test case Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@fb.com> llvm-svn: 302265	2017-05-05 18:05:00 +00:00
Amaury Sechet	841b0907c4	Add more variations of addcarry in the tests. NFC. llvm-svn: 302252	2017-05-05 16:27:55 +00:00
Sanjay Patel	3bf1d6b763	[InstSimplify] fix copy-paste mistake in test comments; NFC llvm-svn: 302251	2017-05-05 16:24:58 +00:00
Sanjay Patel	34cd5e4307	[InstSimplify] add tests for (icmp X, C1 | icmp X, C2); NFC These are the 'or' counterparts for the tests added with r300493. llvm-svn: 302248	2017-05-05 16:12:05 +00:00
Simon Pilgrim	3f8d8f5f43	[X86][SSE] Add 128/256/512 bit vector build vector from register tests llvm-svn: 302243	2017-05-05 15:36:31 +00:00
Aditya Kumar	1c42d135e1	[LoopIdiom] check for safety while expanding Loop Idiom recognition was generating memset in a case that would result in generating a division operation to an unsafe location. Differential Revision: https://reviews.llvm.org/D32674 llvm-svn: 302238	2017-05-05 14:49:45 +00:00
Simon Pilgrim	ac3c4b6da4	[X86][AVX512] Improve support and testing for CTLZ of 512-bit vectors without CDI llvm-svn: 302233	2017-05-05 13:31:52 +00:00
Krzysztof Parzyszek	31d4b3b247	Remove stale live-ins in the branch folder Hoisting common code can cause registers that are live-in in the successor blocks to no longer be live-in. The live-in information needs to be updated to reflect this, or otherwise incorrect code can be generated later on. Differential Revision: https://reviews.llvm.org/D32661 llvm-svn: 302228	2017-05-05 12:20:07 +00:00
John Brawn	1b74f8c51f	[ARM] Add support for ORR and ORN instruction substitutions Recently support was added for substituting one instruction for another by negating or inverting the immediate, but ORR and ORN were missed so this patch adds them. This one is slightly different to the others in that ORN only exists in thumb, so we only do the substitution in thumb. Differential Revision: https://reviews.llvm.org/D32534 llvm-svn: 302224	2017-05-05 11:31:25 +00:00
George Rimar	2122ff64c6	[llvm-dwarfdump] - Print an error message if section decompression failed. llvm-dwarfdump currently prints no message if decompression fails for some reason. I noticed that during work on one of LLD patches where LLD produced a broken output. It was a bit confusing to see no output for section dumped and no error message at all. Patch adds error message for such cases. Differential revision: https://reviews.llvm.org/D32865 llvm-svn: 302221	2017-05-05 10:52:39 +00:00
Martin Storsjo	2b0fae877e	[ArgPromotion] Add a testcase for PR32917 Differential Revision: https://reviews.llvm.org/D32882 llvm-svn: 302216	2017-05-05 08:40:24 +00:00
Dehao Chen	a75d0da91b	Update VP prof metadata during inlining. Summary: r298270 added profile update logic for branch_weights. This patch implements profile update logic for VP prof metadata too. Reviewers: eraman, tejohnson, davidxl Reviewed By: eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32773 llvm-svn: 302209	2017-05-05 00:47:34 +00:00
Evgeniy Stepanov	9aff829f78	Remap metadata attached to global variables. Fix for PR32577. Global variables may have !associated metadata, which includes a reference to another global. It needs remapping. llvm-svn: 302203	2017-05-04 23:29:39 +00:00
Marek Olsak	584d2c05d4	AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D32645 llvm-svn: 302200	2017-05-04 22:25:20 +00:00
Aditya Nandakumar	117b667bd9	[GISel]: Add support to translate ConstantVectors Reviewed by Quentin https://reviews.llvm.org/D32814 llvm-svn: 302196	2017-05-04 21:43:12 +00:00
Sanjay Patel	e42b4d566e	[InstSimplify] add folds for or-of-casted-icmps The sibling folds for 'and' with casts were added with https://reviews.llvm.org/rL273200. This is a preliminary step for adding the 'or' variants for the folds added with https://reviews.llvm.org/rL301260. The reason for the strange form with constant LHS in the 1st test is because there's another missing fold in that case for the inverted predicate. That should be fixed when we add the ConstantRange functionality for 'or-of-icmps' that already exists for 'and-of-icmps'. I'm hoping to share more code for the and/or cases, so we won't have these differences. This will allow us to remove code from InstCombine. It's also possible that we can remove some code here in InstSimplify. I think we have some duplicated folds because patterns are not matched in a general way. Differential Revision: https://reviews.llvm.org/D32876 llvm-svn: 302189	2017-05-04 19:51:34 +00:00
Sam Clegg	fc5b5cd29e	[WebAssembly] Add wasm symbol table support to llvm-objdump Differential Revision: https://reviews.llvm.org/D32760 llvm-svn: 302185	2017-05-04 19:32:43 +00:00
Krzysztof Parzyszek	038a0546db	[PPC] When restoring R30 (PIC base pointer), mark it as <def> This happened on the PPC32/SVR4 path and was discovered when building FreeBSD on PPC32. It was a typo-class error in the frame lowering code. This fixes PR26519. llvm-svn: 302183	2017-05-04 19:14:54 +00:00
Reid Kleckner	6d2ea6ec80	[ms-inline-asm] Use the frontend size only for ambiguous instructions This avoids problems on code like this: char buf[16]; __asm { movups xmm0, [buf] mov [buf], eax } The frontend size in this case (1) is wrong, and the register makes the instruction matching unambiguous. There are also enough bytes available that we shouldn't complain to the user that they are potentially using an incorrectly sized instruction to access the variable. Supersedes D32636 and D26586 and fixes PR28266 llvm-svn: 302179	2017-05-04 18:19:52 +00:00
Sanjay Patel	500e5122d3	[InstSimplify] add tests for or-of-casted-icmps; NFC llvm-svn: 302174	2017-05-04 17:36:53 +00:00
Easwaran Raman	5e6f9bd4f8	[PM] Add ProfileSummaryAnalysis as a required pass in the new pipeline. Differential revision: https://reviews.llvm.org/D32768 llvm-svn: 302170	2017-05-04 16:58:45 +00:00
Adrian Prantl	ba701469a9	Add accidentally deleted testcase back. llvm-svn: 302167	2017-05-04 16:26:07 +00:00
Adrian Prantl	defc99a94e	Cleanup tests to not share a DISubprogram between multiple Functions. rdar://problem/31926379 llvm-svn: 302166	2017-05-04 16:24:31 +00:00
Chad Rosier	84a238dd62	[DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)). Differential Revision: http://reviews.llvm.org/D32596 llvm-svn: 302153	2017-05-04 14:14:44 +00:00
Simon Pilgrim	66af84bfc0	[X86][AVX512] Fix VPABSD file checks Fix capitalization and string matching llvm-svn: 302150	2017-05-04 13:42:57 +00:00
Simon Pilgrim	960a8e71e0	[X86][SSE] Add i686 triple tests for partial vector and re-association llvm-svn: 302149	2017-05-04 13:35:40 +00:00
Jonas Paulsson	4fd156261e	[SystemZ] Make copyPhysReg() add impl-use operands of super reg. When a 128 bit COPY is lowered into two instructions, an impl-use operand of the super-reg should be added to each new instruction in case one of the sub-regs is undefined. Review: Ulrich Weigand llvm-svn: 302146	2017-05-04 13:33:30 +00:00
Simon Pilgrim	5127dbbb23	[X86][SSE] Add i686 triple tests for PBLENDW commutation llvm-svn: 302145	2017-05-04 13:08:09 +00:00
Simon Pilgrim	fbaaf25739	[X86][AVX1] Regenerate checks and add i686 triple tests for folded logical ops llvm-svn: 302144	2017-05-04 13:00:30 +00:00
Michael Zuckerman	763e60e1f8	[LLVM][inline-asm][Altmacro] Altmacro string delimiter '<..>' In this patch, I introduce a new altmacro string delimiter. This review is the second review in a series of four reviews. (one for each altmacro feature: LOCAL, string delimiter, string '!' escape sign and absolute expression as a string '%' ). In the alternate macro mode, you can delimit strings with matching angle brackets <..> when using it as a part of calling macro arguments. As described in the https://sourceware.org/binutils/docs-2.27/as/Altmacro.html "<string> You can delimit strings with matching angle brackets." assumptions: 1. If an argument begins with '<' and ends with '>', the argument is considered a string. 2. Except adding new string mark '<..>', a regular macro behavior is expected. 3. The altmacro cannot affect the regular less/greater behavior. 4. If a comma is present inside angle brackets, it is considered a character and not a separator. Differential Revision: https://reviews.llvm.org/D32701 llvm-svn: 302135	2017-05-04 10:37:00 +00:00
Igor Breger	70583606b1	[X86][AVX-512] Allow EVEX encoded instruction selection when available for mul v8i32. Differential Revision: https://reviews.llvm.org/D32679 llvm-svn: 302127	2017-05-04 07:34:58 +00:00
Sam Parker	df337704f0	[ARM] ACLE Chapter 9 intrinsics Added the integer data processing intrinsics from ACLE v2.1 Chapter 9 but I have missed out the saturation_occurred intrinsics for now. For the instructions that read and write the GE bits, a chain is included and the only instruction that reads these flags (sel) is only selectable via the implemented intrinsic. Differential Revision: https://reviews.llvm.org/D32281 llvm-svn: 302126	2017-05-04 07:31:28 +00:00
Oren Ben Simhon	51de0330eb	[X86] Disabling PLT in Regcall CC Functions According to psABI, PLT stub clobbers XMM8-XMM15. In Regcall calling convention those registers are used for passing parameters. Thus we need to prevent lazy binding in Regcall. Differential Revision: https://reviews.llvm.org/D32430 llvm-svn: 302124	2017-05-04 07:22:49 +00:00
Igor Breger	0d5949e366	[AVX-512VL] Autogenerate checks. Add --show-mc-encoding to check instruction predicate. llvm-svn: 302123	2017-05-04 06:53:31 +00:00
Craig Topper	d4d09fd73d	[SelectionDAG] Improve known bits support for CTPOP. This is based on the same concept from ValueTracking's version of computeKnownBits. llvm-svn: 302110	2017-05-04 04:33:27 +00:00
Dean Michael Berris	bdfe90050b	[XRay] Create an Index of sleds per function Summary: This change adds a new section to the xray-instrumented binary that stores an index into ranges of the instrumentation map, where sleds associated with the same function can be accessed as an array. At runtime, we can get access to this index by function ID offset allowing for selective patching and unpatching by function ID. Each entry in this new section (xray_fn_idx) will include two pointers indicating the start and one past the end of the sleds associated with the same function. These entries will be 16 bytes long on x86 and aarch64. On arm, we align to 16 bytes anyway so the runtime has to take that into consideration. __{start,stop}_xray_fn_idx will be the symbols that the runtime will look for when we implement the selective patching/unpatching by function id APIs. Because XRay synthesizes the function id's in a monotonically increasing manner at runtime now, implementations (and users) can use this table to look up the sleds associated with a specific function. This is useful in implementations that want to do things like: - Implement coverage mode for functions by patching everything pre-main, then as functions are encountered, the installed handler can unpatch the function that's been encountered after recording that it's been called. - Do "learning mode", so that the implementation can figure out some statistical information about function calls by function id for a time being, and then determine which functions are worth uninstrumenting at runtime. - Do "selective instrumentation" where an implementation can specifically instrument only certain function id's at runtime (either based on some external data, or through some other heuristics) instead of patching all the instrumented functions at runtime. Reviewers: dblaikie, echristo, chandlerc, javed.absar Subscribers: pelikan, aemerson, kpw, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D32693 llvm-svn: 302109	2017-05-04 03:37:57 +00:00
Dean Michael Berris	22f2bcf4b9	[XRay] Detect loops in functions being lowered Summary: This is an implementation of the loop detection logic that XRay needs to determine whether a function might take time at runtime. Without this heuristic, XRay will tend to not instrument short functions that have loops that might have runtime dependent on inputs or external values. While this implementation doesn't do any further analysis than just figuring out whether there is a loop in the MachineFunction being code-gen'ed, we're paving the way for being able to perform more sophisticated analysis of the function in the future (for example to determine whether the trip count for the loop might be constant, and make a decision on that instead). This enables us to cover more functions with the default heuristics, and potentially identify ones that have variable runtime latency just by looking for the presence of loops. Reviewers: chandlerc, rnk, pelikan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32274 llvm-svn: 302103	2017-05-04 01:24:26 +00:00
Michael Zolotukhin	37162adf3e	[SCEV] createAddRecFromPHI: Optimize for the most common case. Summary: The existing implementation creates a symbolic SCEV expression every time we analyze a phi node and then has to remove it, when the analysis is finished. This is very expensive, and in most of the cases it's also unnecessary. According to the data I collected, ~60-70% of analyzed phi nodes (measured on SPEC) have the following form: PN = phi(Start, OP(Self, Constant)) Handling such cases separately significantly speeds this up. Reviewers: sanjoy, pete Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32663 llvm-svn: 302096	2017-05-03 23:53:38 +00:00
Matthias Braun	739a7b2f9c	strlen-1.ll: Fix test Change test for `strlen(x) == 0 --> *x == 0` to actually test the pattern. llvm-svn: 302094	2017-05-03 23:32:51 +00:00
Craig Topper	cff357c322	[InstCombine][KnownBits] Use KnownBits better to detect nsw adds Change checkRippleForAdd from a heuristic to a full check - if it is provable that the add does not overflow return true, otherwise false. Patch by Yoav Ben-Shalom Differential Revision: https://reviews.llvm.org/D32686 llvm-svn: 302093	2017-05-03 23:22:46 +00:00
Reid Kleckner	5c0bdef5aa	Mark functions as not having CFI once we finalize an x86 stack frame We'll set it back to true in emitPrologue if it gets called. It doesn't get called for naked functions. Fixes PR32912 llvm-svn: 302092	2017-05-03 23:13:42 +00:00
Saleem Abdulrasool	87f033885e	DebugInfo: elide type index entries for synthetic types Compiler emitted synthetic types may not have an associated DIFile (translation unit). In such a case, when generating CodeView debug type information, we would attempt to compute an absolute filepath which would result in a segfault due to a NULL DIFile*. If there is no source file associated with the type, elide the type index entry for the type and record the type information. This actually results in higher fidelity debug information than clang/C2 as of this writing. Resolves PR32668! llvm-svn: 302085	2017-05-03 21:39:01 +00:00
Ahmed Bougacha	a1991bdde2	[AArch64] armv8-A doesn't have CRC. That's only a required extension as of v8.1a. Remove it from the "generic" CPU as well: it should only support the base ISA (and binutils agrees). Also unify the MC tests into crc.s and arm64-crc32.s llvm-svn: 302077	2017-05-03 20:33:52 +00:00
Krzysztof Parzyszek	2af5037d34	[Hexagon] Use automatically-generated scheduling information for HVX Patch by Jyotsna Verma. llvm-svn: 302073	2017-05-03 20:10:36 +00:00
Alexei Starovoitov	4198f2a702	[bpf] add relocation support . there should be no runtime relocation inside the bpf function. . relocation supported here mostly for debugging. . a test case is added. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 302055	2017-05-03 17:30:56 +00:00
Tim Northover	761bcdaf06	ARM: add extra test for addrmode folding. I was worried we might replace a mul with a mul+shift even if there were later uses. Turns out to be unfounded but I'd just as well add an actual test for it. llvm-svn: 302051	2017-05-03 16:54:30 +00:00
Simon Pilgrim	03ccf91d85	[X86][LWP] Add stack folding mappings and tests for LWPINS/LWPVAL instructions llvm-svn: 302049	2017-05-03 16:46:30 +00:00
Amaury Sechet	666c705953	[DAGCombine] (addcarry (add\|uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) Summary: Do the transform when the carry isn't used. It's a pattern exposed when legalizing large integers. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32755 llvm-svn: 302047	2017-05-03 16:28:10 +00:00
Simon Pilgrim	99b925bdf3	[X86][LWP] Add llvm support for LWP instructions (reapplied). This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041	2017-05-03 15:51:39 +00:00
Simon Pilgrim	a271c54324	Revert rL302028 due to accidental line ending changes. llvm-svn: 302038	2017-05-03 15:42:29 +00:00
Krzysztof Parzyszek	4763c2d999	[Hexagon] Adjust latency between allocframe and the first store on stack Allocframe and the following stores on the stack have a latency of 2 cycles when not in the same packet. This happens because R29 is needed early by the store instruction. Since one of such stores can be packetized along with allocframe and use old value of R29, we can assign it 0 cycle latency while leaving latency of other stores to the default value of 2 cycles. Patch by Jyotsna Verma. llvm-svn: 302034	2017-05-03 15:33:09 +00:00
Simon Pilgrim	b2e0464fde	[X86][LWP] Add llvm support for LWP instructions. This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028	2017-05-03 15:18:34 +00:00
Oren Ben Simhon	dbd4bba1ec	[X86] Support of no_caller_saved_registers attribute This patch implements the LLVM part for no_caller_saved_registers attribute as appears here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5ed3cc7b66af4758f7849ed6f65f4365be8223be. In order to implement the attribute, we use the dynamic CSR mechanism to remove returned/passed arguments from the function regmask/CSR list. Differential Revision: https://reviews.llvm.org/D31876 llvm-svn: 302020	2017-05-03 13:07:19 +00:00
Elad Cohen	ef5798acf5	Support arbitrary address space pointers in masked gather/scatter intrinsics. Fixes PR31789 - When loop-vectorize tries to use these intrinsics for a non-default address space pointer we fail with a "Calling a function with a bad signature!" assertion. This patch solves this by adding the 'vector of pointers' argument as an overloaded type which will determine the address space. Differential revision: https://reviews.llvm.org/D31490 llvm-svn: 302018	2017-05-03 12:28:54 +00:00
Dylan McKay	4aedb8a6b7	[AVR] Reserve the Y register in all functions llvm-svn: 302017	2017-05-03 11:56:01 +00:00
Anna Thomas	53c8d95c85	[Loop Deletion] Delete loops that are never executed Summary: Currently, loop deletion deletes loops where the only values that are used outside the loop are loop-invariant. This patch adds logic to delete loops where the loop is proven to be never executed (i.e. the only predecessor of the loop preheader has a constant conditional branch as terminator, and the preheader is not the taken target). This will remove loops that become dead after loop-unswitching generates constant conditional branches. The next steps are: 1. Move the loop deletion implementation to LoopUtils. 2. Add logic in loop-simplifyCFG which will support changing conditional constant branches to unconditional branches. If loops become unreachable in this process, they can be removed using the `deleteDeadLoop` function. Reviewers: chandlerc, efriedma, sanjoy, reames Reviewed by: sanjoy Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32494 llvm-svn: 302015	2017-05-03 11:47:11 +00:00
Alex Lorenz	c748d7b57b	[Triple] Add a "macos" OS type that acts as a synonym for "macosx" The "macosx" OS type is still the canonical type. In the future "macos" will become the canonical OS type (but we will still support "macosx"). rdar://27043820 Differential Revision: https://reviews.llvm.org/D32748 llvm-svn: 302011	2017-05-03 10:42:35 +00:00
Matt Arsenault	6a288c1e32	Replace hardcoded intrinsic list with speculatable attribute. No change in which intrinsics should be speculated. llvm-svn: 301995	2017-05-03 02:26:10 +00:00
Peter Collingbourne	e95901caa4	Revert r295861, "[ModuleSummaryAnalysis] Don't crash when referencing unnamed globals." We should always expect values to be named before running the module summary analysis (see NameAnonGlobals pass), so it's fine if we crash in that case. llvm-svn: 301991	2017-05-03 00:18:48 +00:00
Tim Shen	e59d06fe78	[PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single instruction Summary: This is the corresponding llvm change to D28037 to ensure no performance regression. Reviewers: bogner, kbarton, hfinkel, iteratee, echristo Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28329 llvm-svn: 301990	2017-05-03 00:07:02 +00:00
Tim Northover	4a01ffbd6a	ARM: avoid handing a deleted node back to TableGen during ISel. When we replaced the multiplicand the destination node might already exist. When that happens the original gets CSEd and deleted. However, it's actually used as the offset so nonsense is produced. Should fix PR32726. llvm-svn: 301983	2017-05-02 22:45:19 +00:00
Joel Jones	6513405735	[AArch64] ILP32 Backend Relocation Support Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and TLSDESC_ADD_LO12 relocations Rearrange ordering in AArch64.def to follow relocation encoding Fix name: R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC Add support for several "TLS", "TLSGD", and "TLSLD" relocations for ILP32 Fix return values from isNonILP32reloc Add implementations for R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC, R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC, R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC, TLSLD_LDST128_DTPREL_LO12, TLSLD_LDST128_DTPREL_LO12_NC, TLSLE_LDST128_TPREL_LO12, TLSLE_LDST128_TPREL_LO12_NC Modify error messages to give name of equivalent relocation in the ABI not being used, along with better checking for non-existent requested relocations. Added assembler support for "pg_hi21_nc" Relocation definitions added without implementations: R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21, R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19, R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL, R_AARCH64_P32_TLSDESC Fix encoding: R_AARCH64_P32_TLSDESC_ADR_PAGE21 Reviewers: Peter Smith Patch by: Joel Jones (jjones@cavium.com) Differential Revision: https://reviews.llvm.org/D32072 llvm-svn: 301980	2017-05-02 22:01:48 +00:00
Paul Robinson	2bc3873fe6	[DWARFv5] Parse new line-table header format. The directory and file tables now have form-based content descriptors. Parse these and extract the per-directory/file records based on the descriptors. For now we support only DW_FORM_string (inline) for the path names; follow-up work will add support for indirect forms (i.e., DW_FORM_strp, strx<N>, and line_strp). Differential Revision: http://reviews.llvm.org/D32713 llvm-svn: 301978	2017-05-02 21:40:47 +00:00
Tim Northover	f9d8eee3db	ARM: add arm1176j-f processor I doubt anyone actually uses it, and I'm not even entirely convinced it exists myself; but it is our default for "clang -arch armv6". Functionally, if it does exist it's identical to the arm1176jz-f from LLVM's point of view (the difference is apparently in the "Security Extensions"). llvm-svn: 301962	2017-05-02 19:06:13 +00:00
Xinliang David Li	ab8722f80a	[PartialInlining] Add more early filtering This is a follow up to the previous inline cost patch for quicker filtering. llvm-svn: 301959	2017-05-02 18:43:21 +00:00
Matt Arsenault	5c80618fb7	AMDGPU: Don't promote alloca to LDS for leaf functions LDS use in leaf functions not currently handled. llvm-svn: 301958	2017-05-02 18:33:18 +00:00
Krzysztof Parzyszek	57a8bb4343	[Hexagon] Change iconst to emit 27bit relocation Patch by Colin LeMahieu. llvm-svn: 301956	2017-05-02 18:19:11 +00:00
Krzysztof Parzyszek	a750383d0f	[Hexagon] Add extenders for GD_PLT_B22_PCREL and LD_PLT_B22_PCREL Patch by Sid Manning. llvm-svn: 301955	2017-05-02 18:15:33 +00:00
Krzysztof Parzyszek	9aaf923376	[Hexagon] Don't ignore multi-cycle latency information The compiler was generating code that ends up ignoring a multiple latency dependence between two instructions by scheduling the instructions in back-to-back packets. The packetizer needs to end a packet if the latency between the current instruction and the source in the previous packet is greater than 1 cycle. This case occurs when there is still room in the current packet, but scheduling the instruction causes a stall. Instead, the packetizer should start a new packet. Also, if the current packet already contains a stall, then it is okay to add another instruction to the packet that also causes a stall. This occurs when there are no instructions that can be scheduled in between the producer and consumer instructions. This patch changes the latency for loads to 2 cycles from 3 cycles. This change reflects that a load only needs to be separated by one extra packet to eliminate the stall. Patch by Ikhlas Ajbar. llvm-svn: 301954	2017-05-02 18:12:19 +00:00
Krzysztof Parzyszek	b0af1ef741	[Hexagon] Make sure duplexed dealloc_returns are checked for double jumps Patch by Colin LeMahieu. llvm-svn: 301951	2017-05-02 18:03:08 +00:00
Matt Arsenault	9ac7d6be3c	SpeculativeExecution: Stop using whitelist for costs Just let TTI's cost do this instead of arbitrarily restricting this. llvm-svn: 301950	2017-05-02 18:02:18 +00:00
Krzysztof Parzyszek	49f7e0a98b	[Hexagon] Move checking AXOK to checker Patch by Colin LeMahieu. llvm-svn: 301949	2017-05-02 18:00:37 +00:00
Krzysztof Parzyszek	c15f8d2a08	[Hexagon] Extract function that checks endloops with other branches Change location number to point to conflicting branch instruction. Patch by Colin LeMahieu. llvm-svn: 301946	2017-05-02 17:56:11 +00:00
Zachary Turner	a0aae2757d	Revert "Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and" This reverts commit c08155afc5d3230792da2ad30a046a8617735a73. This is causing undefined symbol errors with some of the constants. llvm-svn: 301944	2017-05-02 17:51:27 +00:00
Joel Jones	705103e523	Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and TLSDESC_ADD_LO12 relocations Rearrange ordering in AArch64.def to follow relocation encoding Fix name: R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC Add support for several "TLS", "TLSGD", and "TLSLD" relocations for ILP32 Fix return values from isNonILP32reloc Add implementations for R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC, R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC, R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC, TLSLD_LDST128_DTPREL_LO12, TLSLD_LDST128_DTPREL_LO12_NC, TLSLE_LDST128_TPREL_LO12, TLSLE_LDST128_TPREL_LO12_NC Modify error messages to give name of equivalent relocation in the ABI not being used, along with better checking for non-existent requested relocations. Added assembler support for "pg_hi21_nc" Relocation definitions added without implementations: R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21, R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21, R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19, R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL, R_AARCH64_P32_TLSDESC Fix encoding: R_AARCH64_P32_TLSDESC_ADR_PAGE21 Reviewers: Peter Smith Patch by: Joel Jones (jjones@cavium.com) Differential Revision: https://reviews.llvm.org/D32072 llvm-svn: 301939	2017-05-02 17:14:31 +00:00
Matt Arsenault	7b82b4bddb	AMDGPU: Make intrinsics speculatable llvm-svn: 301937	2017-05-02 16:57:44 +00:00
Zachary Turner	edef14510e	[PDB/CodeView] Read/write codeview inlinee line information. Previously we wrote line information and file checksum information, but we did not write information about inlinee lines and functions. This patch adds support for that. llvm-svn: 301936	2017-05-02 16:56:09 +00:00
Amaury Sechet	3847996d74	Add new test case for addcarry. NFC. llvm-svn: 301932	2017-05-02 16:07:32 +00:00
Marek Olsak	a302a736ec	AMDGPU: Add AMDGPU_HS calling convention Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 llvm-svn: 301930	2017-05-02 15:41:10 +00:00
Sanjay Patel	6381db18fe	[InstCombine] don't use DeMorgan's Law on integer constants (2nd try) This was originally checked in here: https://reviews.llvm.org/rL301923 And reverted here: https://reviews.llvm.org/rL301924 Because there's a clang test that would fail after this. I fixed/removed the offending CHECK lines in: https://reviews.llvm.org/rL301928 So let's try this again. Original commit message: This is the fold that causes the infinite loop in BoringSSL (https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c) when we fix instcombine demanded bits to prefer 'not' ops as in https://reviews.llvm.org/D32255. There are 2 or 3 problems with dyn_castNotVal, and I don't think we can reinstate https://reviews.llvm.org/D32255 until dyn_castNotVal is completely eliminated. 1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot. 2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors. 3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X \| ~Y) That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X \| ~ Y) --> ~(X & Y) Differential Revision: https://reviews.llvm.org/D32665 llvm-svn: 301929	2017-05-02 15:31:40 +00:00
Sanjay Patel	da0b4deafa	revert r301923 : [InstCombine] don't use DeMorgan's Law on integer constants There's a clang test that is wrongly using -O1 and failing after this commit. llvm-svn: 301924	2017-05-02 14:48:23 +00:00
Sanjay Patel	096a981982	[InstCombine] don't use DeMorgan's Law on integer constants This is the fold that causes the infinite loop in BoringSSL (https://github.com/google/boringssl/blob/master/crypto/cipher/e_rc2.c) when we fix instcombine demanded bits to prefer 'not' ops as in D32255. There are 2 or 3 problems with dyn_castNotVal, and I don't think we can reinstate D32255 until dyn_castNotVal is completely eliminated. 1. As shown here, it transforms 'not' into random xor. This transform is harmful to SCEV and codegen because 'not' can often be folded while random xor cannot. 2. It does not transform vector constants. This is actually a good thing, but if you don't believe the above argument, then we shouldn't have excluded vectors. 3. It tries to avoid transforming not(not(X)). That's nice, but it doesn't match the greedy nature of instcombine. If we DeMorganize a pattern that has an extra 'not' in it: ~(~(~X) & Y) --> (~X \| ~Y) That's just another case of DeMorgan, so we should trust that we'll fold that pattern too: (~X \| ~ Y) --> ~(X & Y) Differential Revision: https://reviews.llvm.org/D32665 llvm-svn: 301923	2017-05-02 14:31:30 +00:00
Amaury Sechet	106a7eab84	[DAGCombine] (uaddo X, (addcarry Y, 0, Carry)) -> (addcarry X, Y, Carry) Summary: This is a common pattern that arises when legalizing large integer operations. Only do it when Y + 1 cannot overflow as this would change the carry behavior of uaddo. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32687 llvm-svn: 301922	2017-05-02 14:15:48 +00:00
Rafael Espindola	3ba2573744	Add llvm::object::getELFSectionTypeName(). This is motivated by https://reviews.llvm.org/D32488 where I am trying to add printing of the section type for incompatible sections to LLD error messages. This patch allows us to use the same code in llvm-readobj and LLD instead of duplicating the function inside LLD. Patch by Alexander Richardson! llvm-svn: 301921	2017-05-02 14:04:52 +00:00
Amaury Sechet	153911f71d	[DAGCombine] (add X, (addcarry Y, 0, Carry)) -> (addcarry X, Y, Carry) Summary: Common pattern when legalizing large integer operations. Similar to D32687, when the carry isn't used. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Differential Revision: https://reviews.llvm.org/D32738 llvm-svn: 301919	2017-05-02 13:34:25 +00:00
Simon Pilgrim	6615bde93b	[X86][SSE] Add test for PR30264 (combining multiple constants inputs in a shuffle) llvm-svn: 301915	2017-05-02 12:25:17 +00:00
Simon Pilgrim	89ad89cc73	[SelectionDAG] Improve support for promotion of <1 x fX> floating point argument types (PR31088) PR31088 demonstrated that we were assuming that only integers require promotion from <1 x iX> types, when in fact float types may require it as well - in this case half floats. This patch adds support for extension/truncation for both integer and float types. Differential Revision: https://reviews.llvm.org/D32391 llvm-svn: 301910	2017-05-02 10:33:08 +00:00
Simon Pilgrim	8deb87a6c0	[DAGCombiner] Improve MatchBswapHword logic (PR31357) The existing code only looks at half of the tree when matching bswap + rol patterns ending in an OR tree (as opposed to a cascade). Patch originally introduced by Jim Lewis. Submitted on the behalf of Dinar Temirbulatov. Differential Revision: https://reviews.llvm.org/D32039 llvm-svn: 301907	2017-05-02 10:16:19 +00:00
Xinliang David Li	6133846be1	[PartialInlining] Hook up inline cost analysis Differential Revision: http://reviews.llvm.org/D32666 llvm-svn: 301894	2017-05-02 02:44:14 +00:00
Dylan McKay	28355efdad	[AVR] Save/restore the frame pointer for all functions A recent commit I made made it so that we only did this for signal or interrupt handlers. This broke normal functions. llvm-svn: 301893	2017-05-02 01:57:48 +00:00
Nemanja Ivanovic	b89c27f515	[PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE Fixes PR30730. This is a re-commit of a pulled commit. The commit was pulled because some software projects contained uses of Altivec vectors that violated alignment requirements. Known issues have now been fixed. Committing on behalf of Lei Huang. Differential Revision: https://reviews.llvm.org/D26861 llvm-svn: 301892	2017-05-02 01:47:34 +00:00
Ahmed Bougacha	899a75cefe	[AArch64] armv8-A doesn't have LSE. r288279 mistakenly added it to all arches, but it's only available from v8.1 onwards. The testcase is awkward, because (I suspect) of PR32873. Spotted by inspection. llvm-svn: 301890	2017-05-02 00:45:01 +00:00
George Burgess IV	7bc507a2e8	Revert r301880 This change caused buildbot failures, apparently because we're not passing around types that InstSimplify is used to seeing. I'm not overly familiar with InstSimplify, so I'm reverting this until I can figure out what exactly is wrong. llvm-svn: 301885	2017-05-01 23:54:41 +00:00
Zachary Turner	8a2ebfb1cd	[CodeView] Write CodeView line information. Differential Revision: https://reviews.llvm.org/D32716 llvm-svn: 301882	2017-05-01 23:27:42 +00:00
George Burgess IV	6935aefdf0	[InstSimplify] Handle selects of GEPs with 0 offset In particular (since it wouldn't fit nicely in the summary): (select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V) Differential Revision: https://reviews.llvm.org/D31435 llvm-svn: 301880	2017-05-01 23:12:08 +00:00
Matthias Braun	ab9438cb03	MachineFrameInfo: Track whether MaxCallFrameSize is computed yet; NFC This tracks whether MaxCallFrameSize is computed yet. Ideally we would assert and fail when the value is queried before it is computed, however this fails various targets that need to be fixed first. Differential Revision: https://reviews.llvm.org/D32570 llvm-svn: 301851	2017-05-01 22:32:25 +00:00
Davide Italiano	2dfd46bf08	[NewGVN] Don't derive incorrect implications. In the testcase attached, we believe %tmp1 implies %tmp4, where: br i1 %tmp1, label %bb2, label %bb7 br i1 %tmp4, label %bb5, label %bb7 because while looking at PredicateInfo stuff we end up calling isImpliedTrueByMatchingCmp() with the arguments backwards. Differential Revision: https://reviews.llvm.org/D32718 llvm-svn: 301849	2017-05-01 22:26:28 +00:00
Sanjay Patel	59d0aeaafe	[InstCombine] check one-use before applying DeMorgan nor/nand folds If we have ~(~X & Y), it only makes sense to transform it to (X \| ~Y) when we do not need the intermediate (~X & Y) value. In that case, we would need an extra instruction to generate ~Y + 'or' (as shown in the test changes). It's ok if we have multiple uses of ~X or Y, however. In those cases, we may not reduce the instruction count or critical path, but we might improve throughput because we can generate ~X and ~Y in parallel. Whether that actually makes perf sense or not for a target is something we can't answer in IR. Differential Revision: https://reviews.llvm.org/D32703 llvm-svn: 301848	2017-05-01 22:25:42 +00:00
Peter Collingbourne	c15d60b772	Object: Remove ModuleSummaryIndexObjectFile class. Differential Revision: https://reviews.llvm.org/D32195 llvm-svn: 301832	2017-05-01 20:42:32 +00:00
Krzysztof Parzyszek	55db483a46	[Hexagon] Improving error reporting for writing to read only registers Patch by Colin LeMahieu. llvm-svn: 301828	2017-05-01 20:10:41 +00:00
Krzysztof Parzyszek	e96d27a997	[Hexagon] Give better error messages for solo instruction errors Patch by Colin LeMahieu. llvm-svn: 301827	2017-05-01 20:06:01 +00:00
Xin Tong	a4b9b9f42a	Take indirect branch into account as well when folding. We may not be able to rewrite indirect branch target, but we also want to take it into account when folding, i.e. if it and all its successor's predecessors go to the same destination, we can fold, i.e. no need to thread. llvm-svn: 301816	2017-05-01 17:15:37 +00:00
Sanjoy Das	ddebb703fc	Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnknownInsts In cases where an instruction (a call site, say) is RAUW'ed with some other value (this is possible via the `returned` attribute, for instance), we want the slot in UnknownInsts to point to the original Instruction we wanted to track, not the value it got replaced by. Fixes PR32587. This relands r301426. llvm-svn: 301814	2017-05-01 17:07:56 +00:00
Craig Topper	6b1b630a98	[SelectionDAG] Use known ones to provide a better bound for the known zeros for CTTZ/CTLZ operations. This is the SelectionDAG version of D32521. If we know where at least one 1 is located in the input to these intrinsics, we can place an upper bound on the number of bits needed to represent the count and thus increase the number of known zeros in the output. I think we can also refine this further for CTTZ_UNDEF/CTLZ_UNDEF by assuming that the answer will never be BitWidth. I've left this out for now because it caused other test failures across multiple targets, usually because of turning ADD into OR based on this new information. I'll fix CTPOP in a future patch. Differential Revision: https://reviews.llvm.org/D32692 llvm-svn: 301806	2017-05-01 16:08:06 +00:00
Xin Tong	21f8ac235e	[JumpThread] Do RAUW in case Cond folds to a constant in the CFG Summary: [JumpThread] Do RAUW in case Cond folds to a constant in the CFG Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32407 llvm-svn: 301804	2017-05-01 15:34:17 +00:00
Sanjay Patel	d2f13b62d9	[InstCombine] add multi-use variants for DeMorgan folds; NFC llvm-svn: 301802	2017-05-01 14:52:17 +00:00
Sanjay Patel	c526fbcfa9	[InstCombine] use FileCheck and auto-generate checks; NFC llvm-svn: 301801	2017-05-01 14:20:30 +00:00
Sanjay Patel	4e312203af	[InstCombine] consolidate more DeMorgan tests; NFC llvm-svn: 301800	2017-05-01 14:10:59 +00:00
Michael Zuckerman	da4b52e4bf	Fix test for altmacro llvm-svn: 301799	2017-05-01 14:00:54 +00:00
Michael Zuckerman	56704618aa	[LLVM][inline-asm] Altmacro absolute expression '%' feature In this patch, I introduce a new altmacro feature. This feature adds meaning for the % when using it as a prefix to the calling macro arguments. In altmacro mode, the percent sign '%' before an absolute expression first converts the expression to a string. As described in https://sourceware.org/binutils/docs-2.27/as/Altmacro.html "Expression results as strings You can write `%expr' to evaluate the expression expr and use the result as a string." Expression assumptions: 1. '%' can only evaluate an absolute expression. 2. Altmacro '%' must be the first character of the evaluated expression. 3. If no '%' is located before the expression, a regular modulo operation is expected. 4. The result of an absolute expression can only be an integer. Differential Revision: https://reviews.llvm.org/D32526 llvm-svn: 301797	2017-05-01 13:20:12 +00:00
Dylan McKay	59e7fe3da8	[AVR] Implement non-constant bit rotations This lets us do bit rotations of variable amount. llvm-svn: 301794	2017-05-01 09:48:55 +00:00
Igor Breger	4064dc76c5	[GlobalISel][X86] rename test file. NFC. llvm-svn: 301793	2017-05-01 08:11:02 +00:00
Craig Topper	c8b5693948	[X86] Add tests for opportunities to improve known bits for CTTZ and CTLZ. llvm-svn: 301791	2017-05-01 06:33:17 +00:00
Igor Breger	c08a783521	[GlobalISel][X86] G_SEXT/G_ZEXT support. Reviewers: zvi, guyblank Reviewed By: zvi Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32591 llvm-svn: 301790	2017-05-01 06:30:16 +00:00
Igor Breger	a9edb88d46	[GlobalISel][X86] G_LOAD/G_STORE pointer selection support. Summary: [GlobalISel][X86] G_LOAD/G_STORE pointer selection support. Reviewers: zvi, guyblank Reviewed By: zvi, guyblank Subscribers: dberris, rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D32217 llvm-svn: 301788	2017-05-01 06:08:32 +00:00
Dylan McKay	2e8718bcbb	[AVR] Fix a bug so that we now emit R_AVR_16 fixups with the correct offset Before this, the LDS/STS instructions would have their opcodes overwritten while linking. llvm-svn: 301782	2017-04-30 23:33:52 +00:00
Sanjay Patel	ad13826aea	[DAGCombiner] shrink/widen a vselect to match its condition operand size (PR14657) We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check legality of the select that we want to produce. A few things to note: 1. We can't wait until after legalization and do this generically because (at least in the x86 tests from PR14657), we'll have PACKSS and bitcasts in the pattern. 2. This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but that requires a closer look to make sure we don't end up worse. 3. There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases. That should be fixed next. 4. I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple legal vector sizes, but if there are other targets like that, we should add more tests. 5. There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests): despite IR that is terrible for the target, this patch allows us to generate the optimal loop code because something post-ISEL is hoisting the splat extends above the vector loops. Differential Revision: https://reviews.llvm.org/D32620 llvm-svn: 301781	2017-04-30 22:44:51 +00:00
Sanjoy Das	08989c7ecd	Rename isKnownNotFullPoison to programUndefinedIfPoison; NFC Summary: programUndefinedIfPoison makes more sense, given what the function does; and I'm about to add a function with a name similar to isKnownNotFullPoison (so do the rename to avoid confusion). Reviewers: broune, majnemer, bjarke.roune Reviewed By: broune Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30444 llvm-svn: 301776	2017-04-30 19:41:19 +00:00
Amaury Sechet	8ac81f3924	Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry Summary: As per discussion on how to get better codegen on large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allows for more flexibility. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29872 llvm-svn: 301775	2017-04-30 19:24:09 +00:00
Sanjay Patel	0c6086f493	[InstCombine] consolidate tests for DeMorgan folds; NFC I'm proposing to add tests and change behavior in D32665. llvm-svn: 301774	2017-04-30 18:57:12 +00:00
Zvi Rackover	4086e13e0d	InstructionSimplify: Simplify a shuffle with an undef mask to undef Summary: Following the discussion in pr32486, adding the simplification: shuffle %x, %y, undef -> undef Reviewers: spatel, RKSimon, andreadb, davide Reviewed By: spatel Subscribers: jroelofs, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D32293 llvm-svn: 301764	2017-04-30 06:06:26 +00:00
Simon Atanasyan	3979f43813	[mips] Emit R_MICROMIPS_TLS_GOTTPREL relocation for %gottprel in case of microMIPS In case of microMIPS mode %gottprel operator should emit microMIPS relocation R_MICROMIPS_TLS_GOTTPREL, not R_MIPS_TLS_GOTTPREL. Differential Revision: http://reviews.llvm.org/D32617 llvm-svn: 301763	2017-04-30 04:27:23 +00:00
Daniel Sanders	887a141d4d	[globalisel][tablegen] Fix the test after silencing the unused variable warning in r301755. llvm-svn: 301756	2017-04-29 19:46:27 +00:00
Daniel Sanders	e9fdba39e0	[globalisel][tablegen] Compute available feature bits correctly. Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750	2017-04-29 17:30:09 +00:00
Simon Pilgrim	694cb2c838	[X86][AVX] Added codegen tests for _mm256_zext* helper intrinsics (PR32839) Not great codegen, especially as VEX moves support implicit zeroing of upper bits.... llvm-svn: 301748	2017-04-29 17:15:12 +00:00
Simon Pilgrim	ac7f3e24d3	[X86][SSE] Add initial <2 x half> tests for PR31088 As discussed on D32391, test X86/X64 SSE2 and X64 F16C. llvm-svn: 301744	2017-04-29 14:29:06 +00:00
Matt Arsenault	2a80369ae4	AMDGPU: Fix copies from physical registers in SIFixSGPRCopies This would assert when there were multiple defs of a physical register. We just need to move all of the users of it. llvm-svn: 301730	2017-04-29 01:26:34 +00:00
Zachary Turner	5b6e4e0aed	[llvm-pdbdump] Abstract some of the YAML/Raw printing code. There is a lot of duplicate code for printing line info between YAML and the raw output printer. This introduces a base class that can be shared between the two, and makes some minor cleanups in the process. llvm-svn: 301728	2017-04-29 01:13:21 +00:00
Akira Hatanaka	6fdcb3c2ce	[ObjCARC] Do not move a release between a call and a retainAutoreleasedReturnValue that retains the returned value. This commit fixes a bug in ARC optimizer where it moves a release between a call and a retainAutoreleasedReturnValue, causing the returned object to be released before the retainAutoreleasedReturnValue can retain it. This commit accomplishes that by doing a lookahead and checking whether the call prevents the release from moving upwards. In the long term, we should treat the region between the retainAutoreleasedReturnValue and the call as a critical section and disallow moving anything there (possibly using operand bundles). rdar://problem/20449878 llvm-svn: 301724	2017-04-29 00:23:11 +00:00
Davide Italiano	534e314356	[LoopUnswitch] Don't remove instructions with side effects. This fixes PR32818. Differential Revision: https://reviews.llvm.org/D32664 llvm-svn: 301722	2017-04-29 00:12:18 +00:00
Sanjay Patel	c8ab6bb27d	[InstCombine] add tests to show potentially bogus application of DeMorgan (NFC) llvm-svn: 301714	2017-04-28 23:14:33 +00:00
Matt Arsenault	e0f9e984fd	InferAddressSpaces: Search constant expressions for addrspacecasts These are pretty common when using local memory, and the 64-bit generic addressing is much more expensive to compute. llvm-svn: 301711	2017-04-28 22:52:41 +00:00
Adrian Prantl	fed4f399d3	Remove line and file from DINamespace. Fixes the issue highlighted in http://lists.llvm.org/pipermail/cfe-dev/2014-June/037500.html. The DW_AT_decl_file and DW_AT_decl_line attributes on namespaces can prevent LLVM from uniquing types that are in the same namespace. They also don't carry any meaningful information. rdar://problem/17484998 Differential Revision: https://reviews.llvm.org/D32648 llvm-svn: 301706	2017-04-28 22:25:46 +00:00
Matt Arsenault	a1e734050c	InferAddressSpaces: Infer from just addrspacecasts Eliminates some more cases where some subset of the addressing computation remains flat. Some cases with addrspacecasts in nested constant expressions are still left behind however. llvm-svn: 301704	2017-04-28 22:18:08 +00:00
Krzysztof Parzyszek	072ddb383c	[RDF] Correctly calculate lane masks for defs llvm-svn: 301700	2017-04-28 21:57:53 +00:00
Krzysztof Parzyszek	2065a2f4e6	Properly handle PHIs with subregisters in UnreachableBlockElim When a PHI operand has a subregister, create a COPY instead of simply replacing the PHI output with the input. Differential Revision: https://reviews.llvm.org/D32650 llvm-svn: 301699	2017-04-28 21:56:33 +00:00
Krzysztof Parzyszek	0b3acbb1dd	[Hexagon] Do not move a block if it is on a fall-through path llvm-svn: 301698	2017-04-28 21:54:11 +00:00
Sam Clegg	a06de02889	[WebAssembly] Add size of section header to data relocation offsets. Also, add test for data relocations and fix addend to be signed. Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D32513 llvm-svn: 301690	2017-04-28 21:22:38 +00:00
Matt Arsenault	cf5e7fe358	[ValueTracking] Teach isSafeToSpeculativelyExecute() about the speculatable attribute Patch by Tom Stellard llvm-svn: 301688	2017-04-28 21:13:09 +00:00
Sam Clegg	ff0730b3fc	[WebAssembly] Write initial memory in pages not bytes Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D32660 llvm-svn: 301687	2017-04-28 21:12:09 +00:00
Matt Arsenault	b19b57ea60	Add speculatable function attribute This attribute tells the optimizer that the function may be speculated. Patch by Tom Stellard llvm-svn: 301680	2017-04-28 20:25:27 +00:00
Marek Olsak	2d82590f64	AMDGPU: Add new amdgcn.init.exec intrinsics v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677	2017-04-28 20:21:58 +00:00
Alexei Starovoitov	f7bd5ebd3b	[bpf] add bigendian support to disassembler . swap 4-bit register encoding, 16-bit offset and 32-bit imm to support big endian archs . add a test Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 301653	2017-04-28 16:51:01 +00:00
Jun Bum Lim	919f9e8d65	[InlineCost] Improve the cost heuristic for Switch Summary: The motivation example is like below which has 13 cases but only 2 distinct targets ``` lor.lhs.false2: ; preds = %if.then switch i32 %Status, label %if.then27 [ i32 -7012, label %if.end35 i32 -10008, label %if.end35 i32 -10016, label %if.end35 i32 15000, label %if.end35 i32 14013, label %if.end35 i32 10114, label %if.end35 i32 10107, label %if.end35 i32 10105, label %if.end35 i32 10013, label %if.end35 i32 10011, label %if.end35 i32 7008, label %if.end35 i32 7007, label %if.end35 i32 5002, label %if.end35 ] ``` which is compiled into a balanced binary tree like this on AArch64 (similar on X86) ``` .LBB853_9: // %lor.lhs.false2 mov w8, #10012 cmp w19, w8 b.gt .LBB853_14 // BB#10: // %lor.lhs.false2 mov w8, #5001 cmp w19, w8 b.gt .LBB853_18 // BB#11: // %lor.lhs.false2 mov w8, #-10016 cmp w19, w8 b.eq .LBB853_23 // BB#12: // %lor.lhs.false2 mov w8, #-10008 cmp w19, w8 b.eq .LBB853_23 // BB#13: // %lor.lhs.false2 mov w8, #-7012 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_14: // %lor.lhs.false2 mov w8, #14012 cmp w19, w8 b.gt .LBB853_21 // BB#15: // %lor.lhs.false2 mov w8, #-10105 add w8, w19, w8 cmp w8, #9 // =9 b.hi .LBB853_17 // BB#16: // %lor.lhs.false2 orr w9, wzr, #0x1 lsl w8, w9, w8 mov w9, #517 and w8, w8, w9 cbnz w8, .LBB853_23 .LBB853_17: // %lor.lhs.false2 mov w8, #10013 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_18: // %lor.lhs.false2 mov w8, #-7007 add w8, w19, w8 cmp w8, #2 // =2 b.lo .LBB853_23 // BB#19: // %lor.lhs.false2 mov w8, #5002 cmp w19, w8 b.eq .LBB853_23 // BB#20: // %lor.lhs.false2 mov w8, #10011 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_21: // %lor.lhs.false2 mov w8, #14013 cmp w19, w8 b.eq .LBB853_23 // BB#22: // %lor.lhs.false2 mov w8, #15000 cmp w19, w8 b.ne .LBB853_3 ``` However, the inline cost model estimates the cost to be linear with the number of distinct targets and the cost of the above switch is just 2 InstrCosts. 
The function containing this switch is then inlined about 900 times. This change uses the general way of switch lowering for the inline heuristic. It estimates the number of case clusters with the suitability check for a jump table or bit test. Considering the binary search tree built for the clusters, this change modifies the model to be linear with the size of the balanced binary tree. The model is off by default for now : -inline-generic-switch-cost=false This change was originally proposed by Haicheng in D29870. Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier Reviewed By: hans Subscribers: joerg, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31085 llvm-svn: 301649	2017-04-28 16:04:03 +00:00
Teresa Johnson	51177295c4	Memory intrinsic value profile optimization: Avoid divide by 0 Summary: Skip memops if the total value profiled count is 0, we can't correctly scale up the counts and there is no point anyway. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32624 llvm-svn: 301645	2017-04-28 14:30:54 +00:00
Simon Pilgrim	7ae9419dc0	[DAGCombiner] Add ComputeNumSignBits vector demanded elements support to ASHR and INSERT_VECTOR_ELT (reapplied) Reapplied r299221 after fix for nondeterminism in ThinLTO builder (rL301599), with extra check for implicit truncation of inserted element. llvm-svn: 301644	2017-04-28 13:21:18 +00:00
Simon Pilgrim	ec93334317	[X86][SSE] Added new tests from D32416 to show codegen delta llvm-svn: 301641	2017-04-28 11:53:08 +00:00
Simon Pilgrim	04928fd021	[X86][SSE] Renames all ones test to better match type. Added 8f32/4f64 optsize tests discussed on D32416 llvm-svn: 301639	2017-04-28 11:12:30 +00:00
Simon Pilgrim	67b1a79985	[X86][SSE] Add codegen test for _mm_set_pd1 (PR32827) llvm-svn: 301638	2017-04-28 10:31:42 +00:00
Andrew Ng	03e35b6bc0	[DebugInfo][X86] Improve X86 Optimize LEAs handling of debug values. This is a follow up to the fix in r298360 to improve the handling of debug values when redundant LEAs are removed. The fix in r298360 effectively discarded the debug values. This patch now attempts to preserve the debug values by using the DWARF DW_OP_stack_value operation via prependDIExpr. Moved functions appendOffset and prependDIExpr from Local.cpp to DebugInfoMetadata.cpp and made them available as static member functions of DIExpression. Differential Revision: https://reviews.llvm.org/D31604 llvm-svn: 301630	2017-04-28 08:44:30 +00:00
Diana Picus	0674a3ce97	[ARM] GlobalISel: Tighten test. NFC Explicitly check types and load sizes in the IRTranslator test. llvm-svn: 301627	2017-04-28 07:50:47 +00:00
Max Kazantsev	531db9a504	[EarlyCSE] Mark the condition of assume intrinsic as true EarlyCSE should not just ignore assumes. It should use the fact that its condition is true for all dominated instructions. Reviewers: sanjoy, reames, apilipenko, anna, skatkov Reviewed By: reames, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32482 llvm-svn: 301625	2017-04-28 06:25:39 +00:00
Max Kazantsev	0589d9fa0f	[EarlyCSE] Remove guards with conditions known to be true If a condition is calculated only once, and there are multiple guards on this condition, we should be able to remove all guards dominated by the first of them. This patch allows EarlyCSE to try to find the condition of a guard among the known values, and if it is true, remove the guard. Otherwise we keep the guard and mark its condition as 'true' for future consideration. Reviewers: sanjoy, reames, apilipenko, skatkov, anna, dberlin Reviewed By: reames, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32476 llvm-svn: 301623	2017-04-28 06:05:48 +00:00
Sanjoy Das	ba0daee6b2	[StackMaps] Increase the size of the "location size" field Summary: In some cases LLVM (especially the SLP vectorizer) will create vectors that are 256 bytes (or larger). Given that this is intentional[0] and is likely to get more common, this patch updates the StackMap binary format to deal with the spill locations for said vectors. This change also bumps the stack map version from 2 to 3. [0]: https://reviews.llvm.org/D32533#738350 Reviewers: reames, kavon, skatkov, javed.absar Subscribers: mcrosier, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D32629 llvm-svn: 301615	2017-04-28 04:48:42 +00:00
Saleem Abdulrasool	41d9ef3ced	COFF Import: expose both symbols COFF Import libraries which use the obsolete CONSTANT export are supposed to get two symbols, one with the `_imp_` prefix and one without. Ensure that we expose both for iteration. This is necessary to fix the librarian with COFF CONSTANT exports. llvm-svn: 301614	2017-04-28 04:29:43 +00:00
Zachary Turner	7159ab95c7	[llvm-pdbdump] Allow printing only a portion of a stream. When dumping raw data from a stream, you might know the offset of a certain record you're interested in, as well as how long that record is. Previously, you had to dump the entire stream and wade through the bytes to find the interesting record. This patch allows you to specify an offset and length on the command line, and it will only dump the requested range. llvm-svn: 301607	2017-04-28 00:43:38 +00:00
Sam Clegg	10545c9c24	[WebAssembly] Add some tests for wasm MC layer Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D32558 llvm-svn: 301606	2017-04-28 00:36:36 +00:00
Sanjay Patel	73d8c43da8	[InstCombine] fix matcher to bind to specific operand (PR32830) Matching any random value would be very wrong: https://bugs.llvm.org/show_bug.cgi?id=32830 llvm-svn: 301594	2017-04-27 21:55:03 +00:00
Evgeniy Stepanov	964f4663c4	[asan] Fix dead stripping of globals on Linux. Use a combination of !associated, comdat, @llvm.compiler.used and custom sections to allow dead stripping of globals and their asan metadata. Sometimes. Currently this works on LLD, which supports SHF_LINK_ORDER with sh_link pointing to the associated section. This also works on BFD, which seems to treat comdats as all-or-nothing with respect to linker GC. There is a weird quirk where the "first" global in each link is never GC-ed because of the section symbols. At this moment it does not work on Gold (as in the globals are never stripped). This is a second re-land of r298158. This time, this feature is limited to -fdata-sections builds. llvm-svn: 301587	2017-04-27 20:27:27 +00:00
Evgeniy Stepanov	716f0ff222	[asan] Put ctor/dtor in comdat. When possible, put ASan ctor/dtor in comdat. The only reason not to is global registration, which can be TU-specific. This is not the case when there are no instrumented globals. This is also limited to ELF targets, because MachO does not have comdat, and COFF linkers may GC comdat constructors. The benefit of this is a lot less __asan_init() calls: one per DSO instead of one per TU. It's also necessary for the upcoming gc-sections-for-globals change on Linux, where multiple references to section start symbols trigger quadratic behaviour in gold linker. This is a second re-land of r298756. This time with a flag to disable the whole thing to avoid a bug in the gold linker: https://sourceware.org/bugzilla/show_bug.cgi?id=19002 llvm-svn: 301586	2017-04-27 20:27:23 +00:00
Simon Pilgrim	9a08ad8abd	[X86][SSE] Add tests for broadcast from larger vector loads llvm-svn: 301583	2017-04-27 20:19:00 +00:00
Zachary Turner	8d6396d3b0	[llvm-readobj] Dump COFF Resources section. This patch dumps the raw bytes of the .rsrc sections that are present in COFF object and executable files. Subsequent patches will parse this information and dump in a more human readable format. Differential Revision: https://reviews.llvm.org/D32463 Patch By: Eric Beckmann llvm-svn: 301578	2017-04-27 19:38:38 +00:00
Chandler Carruth	1353f9a48b	[PM/LoopUnswitch] Introduce a new, simpler loop unswitch pass. Currently, this pass only focuses on trivial loop unswitching. At that reduced problem it remains significantly better than the current loop unswitch: - Old pass is worse than cubic complexity. New pass is (I think) linear. - New pass is much simpler in its design by focusing on full unswitching. (See below for details on this). - New pass doesn't carry state for thresholds between pass iterations. - New pass doesn't carry state for correctness (both miscompile and infloop) between pass iterations. - New pass produces substantially better code after unswitching. - New pass can handle more trivial unswitch cases. - New pass doesn't recompute the dominator tree for the entire function and instead incrementally updates it. I've ported all of the trivial unswitching test cases from the old pass to the new one to make sure that major functionality isn't lost in the process. For several of the test cases I've worked to improve the precision and rigor of the CHECKs, but for many I've just updated them to handle the new IR produced. My initial motivation was the fact that the old pass carried state in very unreliable ways between pass iterations, and these mechanisms were incompatible with the new pass manager. However, I discovered many more improvements to make along the way. This pass makes two very significant assumptions that enable most of these improvements: 1) Focus on full unswitching -- that is, completely removing whatever control flow construct is being unswitched from the loop. In the case of trivial unswitching, this means removing the trivial (exiting) edge. In non-trivial unswitching, this means removing the branch or switch itself. This is in opposition to partial unswitching where some part of the unswitched control flow remains in the loop. Partial unswitching only really applies to switches and to folded branches. 
These are very similar to full unrolling and partial unrolling. The full form is an effective canonicalization, the partial form needs a complex cost model, cannot be iterated, isn't canonicalizing, and should be a separate pass that runs very late (much like unrolling). 2) Leverage LLVM's Loop machinery to the fullest. The original unswitch dates from a time when a great deal of LLVM's loop infrastructure was missing, ineffective, and/or unreliable. As a consequence, a lot of complexity was added which we no longer need. With these two overarching principles, I think we can build a fast and effective unswitcher that fits in well in the new PM and in the canonicalization pipeline. Some of the remaining functionality around partial unswitching may not be relevant today (not many test cases or benchmarks I can find) but if they are I'd like to add support for them as a separate layer that runs very late in the pipeline. Purely to make reviewing and introducing this code more manageable, I've split this into first a trivial-unswitch-only pass and in the next patch I'll add support for full non-trivial unswitching against a fixed threshold, exactly like full unrolling. I even plan to re-use the unrolling thresholds, as these are incredibly similar cost tradeoffs: we're cloning a loop body in order to end up with simplified control flow. We should only do that when the total growth is reasonably small. One of the biggest changes with this pass compared to the previous one is that previously, each individual trivial exiting edge from a switch was unswitched separately as a branch. Now, we unswitch the entire switch at once, with cases going to the various destinations. This lets us unswitch multiple exiting edges in a single operation and also avoids numerous extremely bad behaviors, where we would introduce 1000s of branches to test for thousands of possible values, all of which would take the exact same exit path bypassing the loop. 
Now we will use a switch with 1000s of cases that can be efficiently lowered into a jumptable. This avoids relying on somehow forming a switch out of the branches or getting horrible code if that fails for any reason. Another significant change is that this pass actively updates the CFG based on unswitching. For trivial unswitching, this is actually very easy because of the definition of loop simplified form. Doing this makes the code coming out of loop unswitch dramatically more friendly. We still should run loop-simplifycfg (at the least) after this to clean up, but it will have to do a lot less work. Finally, this pass makes much fewer attempts to simplify instructions based on the unswitch. Something like loop-instsimplify, instcombine, or GVN can be used to do increasingly powerful simplifications based on the now dominating predicate. The old simplifications are things that something like loop-instsimplify should get today or a very, very basic loop-instcombine could get. Keeping that logic separate is a big simplifying technique. Most of the code in this pass that isn't in the old one has to do with achieving specific goals: - Updating the dominator tree as we go - Unswitching all cases in a switch in a single step. I think it is still shorter than just the trivial unswitching code in the old pass despite having this functionality. Differential Revision: https://reviews.llvm.org/D32409 llvm-svn: 301576	2017-04-27 18:45:20 +00:00
Eli Friedman	10ab923b32	[GlobalOpt] Correctly update metadata when localizing a global. Just calling dropAllReferences leaves pointers to the ConstantExpr behind, so we would eventually crash with a null pointer dereference. Differential Revision: https://reviews.llvm.org/D32551 llvm-svn: 301575	2017-04-27 18:39:08 +00:00
Sanjoy Das	40c32dd9a0	Use a pointer type for target frame indices during statepoint lowering Summary: The type of the target frame index is intptr, not the type of the value we're going to store into it. Without this change we crash in the attached test case when trying to type-legalize a TargetFrameIndex. Patchpoint lowering types the target frame index as intptr as well. Reviewers: reames, bogner, arsenm Subscribers: arsenm, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32256 llvm-svn: 301566	2017-04-27 17:17:16 +00:00
Xinliang David Li	d21601a929	[PartialInlining]: Improve partial inlining to handle complex conditions Differential Revision: http://reviews.llvm.org/D32249 llvm-svn: 301561	2017-04-27 16:34:00 +00:00
Sanjay Patel	c3e00fcadd	[x86] add minimal tests for potential size-changing vsel transforms; NFC llvm-svn: 301554	2017-04-27 16:10:20 +00:00
Sam Kolton	5d99386b4d	[AMDGPU] DPP: add support for GFX9 Reviewers: artem.tamazov Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32588 llvm-svn: 301551	2017-04-27 15:42:38 +00:00
Zoran Jovanovic	ffef3e3c6a	[mips][microMIPS] Adding code size reduction pass for MicroMIPS Author: milena.vujosevic.janicic Reviewers: sdardis The code implements a size reduction pass for MicroMIPS. Load and store instructions are examined and transformed, if possible. lw32 instruction is transformed into 16-bit instruction lwsp sw32 instruction is transformed into 16-bit instruction swsp Arithmetic instructions are examined and transformed, if possible. addu32 instruction is transformed into 16-bit instruction addu16 subu32 instruction is transformed into 16-bit instruction subu16 Differential Revision: https://reviews.llvm.org/D15144 llvm-svn: 301540	2017-04-27 13:10:48 +00:00
Diana Picus	4f46be327c	[ARM] GlobalISel: Fix extended stack operands Fix a crash when trying to extend a value passed as a sign- or zero-extended stack parameter. The cause of the crash was that we were setting the size of the loaded value to 32 bits, and then trying to extend again to 32 bits. This patch addresses the issue by also introducing a G_TRUNC after the load. This will leave the unused bits to their original values set by the caller, while being consistent about the types. For values that are not extended, we just use a smaller load. llvm-svn: 301531	2017-04-27 10:23:30 +00:00
Andrew V. Tischenko	9108ae2b50	2 tests that were lost in rL301390 llvm-svn: 301529	2017-04-27 10:20:35 +00:00
George Rimar	e6ef4488e1	[llvm-dwarfdump] - Change format for .gdb_index dump. It is useful to output the size of ranges when the address ranges section of .gdb_index is dumped. It helps to compare outputs produced by different linkers, for example. In that case address ranges can look very different when they are in fact the same. The difference comes from a different low address because of a different address of .text. Differential revision: https://reviews.llvm.org/D32492 llvm-svn: 301527	2017-04-27 10:00:13 +00:00
Igor Breger	360d0f23ee	[GlobalISel][X86] handle not symmetric G_COPY Summary: handle not symmetric G_COPY Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D32420 llvm-svn: 301523	2017-04-27 08:02:03 +00:00
Konstantin Zhuravlyov	97a663b6a2	AMDGPU: Fix assert in scheduler Assert is triggered if DBG_VALUE is first instruction in BB Differential Revision: https://reviews.llvm.org/D32572 llvm-svn: 301511	2017-04-27 03:22:44 +00:00
Chandler Carruth	c246a4c973	Disable GVN Hoist due to still more bugs being found in it. There is also a discussion about exactly what we should do prior to re-enabling it. The current bug is http://llvm.org/PR32821 and the discussion about this is in the review thread for r300200. llvm-svn: 301505	2017-04-27 00:28:03 +00:00
Rui Ueyama	0fcbb2893e	Revert r301487: Replace HashString algorithm with xxHash64 This reverts commit r301487 to make buildbots green. llvm-svn: 301491	2017-04-26 23:15:10 +00:00
Adrian Prantl	1d12b885b0	Add support for DW_TAG_thrown_type. For Swift we would like to be able to encode the error types that a function may throw, so the debugger can display them alongside the function's return value when finish-ing a function. DWARF defines DW_TAG_thrown_type (intended to be used for C++ throw() declarations) that is a perfect fit for this purpose. This patch wires up support for DW_TAG_thrown_type in LLVM by adding a list of thrown types to DISubprogram. To offset the cost of the extra pointer, there is a follow-up patch that turns DISubprogram into a variable-length node. rdar://problem/29481673 Differential Revision: https://reviews.llvm.org/D32559 llvm-svn: 301489	2017-04-26 22:56:44 +00:00
Rui Ueyama	87b30ac9d3	Replace HashString algorithm with xxHash64 The previous algorithm processed one character at a time, which is very painful on a modern CPU. Replace it with xxHash64, which both already exists in the codebase and is fairly fast. Patch from Scott Smith! Differential Revision: https://reviews.llvm.org/D32509 llvm-svn: 301487	2017-04-26 22:45:04 +00:00
Sanjay Patel	a0547c3d9f	[DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X) Besides better codegen, the motivation is to be able to canonicalize this pattern in IR (currently we don't) knowing that the backend is prepared for that. This may also allow removing code for special constant cases in DAGCombiner::foldSelectOfConstants() that was added in D30180. Differential Revision: https://reviews.llvm.org/D31944 llvm-svn: 301457	2017-04-26 20:26:46 +00:00
Dmitry Preobrazhensky	43d297eb45	[AMDGPU][MC] Added arg checks for vmcnt, expcnt, lgkmcnt helpers Summary of changes: - corrected vmcnt, expcnt, lgkmcnt helpers to check their argument for truncation; - added saturated versions of these helpers. See bug 32711 for details: https://bugs.llvm.org//show_bug.cgi?id=32711 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32546 llvm-svn: 301439	2017-04-26 17:55:50 +00:00
Peter Collingbourne	fa58f7528e	LTO: Mark undefined module asm symbols as used. Marking them as used causes them to be considered visible outside of LTO. This prevents the symbols from being internalized or discarded, either by GlobalDCE or by summary-based dead stripping in ThinLTO. This change makes it unnecessary to add these symbols to llvm.compiler.used in the backend, as the symbols are kept alive by virtue of being external, so remove the backend code that handles that. Fixes PR32798. Differential Revision: https://reviews.llvm.org/D32544 llvm-svn: 301438	2017-04-26 17:53:39 +00:00
Sanjoy Das	2cbeb00f38	Reverts commit r301424, r301425 and r301426 Commits were: "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts" "Add a new WeakVH value handle; NFC" "Rename WeakVH to WeakTrackingVH; NFC" The changes assumed pointers are 8 byte aligned on all architectures. llvm-svn: 301429	2017-04-26 16:37:05 +00:00
Matthew Simpson	9eed0bee3d	[LV] Handle external uses of floating-point induction variables Reference: https://bugs.llvm.org/show_bug.cgi?id=32758 Differential Revision: https://reviews.llvm.org/D32445 llvm-svn: 301428	2017-04-26 16:23:02 +00:00
Sanjoy Das	8b32b81954	Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts Summary: In cases where an instruction (a call site, say) is RAUW'ed with some other value (this is possible via the `returned` attribute, amongst other things), we want the slot in UnknownInsts to point to the original Instruction we wanted to track, not the value it got replaced by. Fixes PR32587. Reviewers: davide Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32268 llvm-svn: 301426	2017-04-26 16:21:02 +00:00
Vedant Kumar	7f5b3d6fc8	[sampleprof] Drop test dependency on the string hash func (NFC) The SampleProfWriter emits function information in an order determined by the string hash function. The situation is a bit brittle, because changing the hash function can break the tests. Instead of sorting the function samples to get a reliable ordering (that might be too expensive), make the tests not depend on a particular ordering of function samples. Differential Revision: https://reviews.llvm.org/D32516 llvm-svn: 301419	2017-04-26 15:39:53 +00:00
Dmitry Preobrazhensky	c7d35a0d6a	[AMDGPU][MC] Added check for truncation of SOPK imm operand See bug 30827: https://bugs.llvm.org//show_bug.cgi?id=30827 Reviewers: artem.tamazov, vpykhtin Differential Revision: https://reviews.llvm.org/D32535 llvm-svn: 301418	2017-04-26 15:34:19 +00:00
Sanjay Patel	3603e3f22d	[x86] change tests to use sext, not zext; NFC These are intended to exercise D31944, so we need sexts. llvm-svn: 301412	2017-04-26 14:35:54 +00:00
Sanjay Patel	e2ec05a62a	[TargetLowering] fix isConstTrueVal to account for build vector truncation Build vectors have magical truncation powers, so we have things like this: v4i1 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1> v4i16 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1> If we don't truncate the splat node returned by getConstantSplatNode(), then we won't find truth when ZeroOrNegativeOneBooleanContent is the rule. Differential Revision: https://reviews.llvm.org/D32505 llvm-svn: 301408	2017-04-26 14:05:42 +00:00
Ranjeet Singh	acbd4e141f	Fix signed multiplication with overflow fallback. For targets that don't have ISD::MULHS or ISD::SMUL_LOHI for the type and the double width type is illegal, then the two operands are sign extended to twice their size then multiplied to check for overflow. The extended upper halves were mismatched causing an incorrect result. This fixes the mismatch. A test was added for ARM V6-M where the bug was detected. Patch by James Duley. Differential Revision: https://reviews.llvm.org/D31807 llvm-svn: 301404	2017-04-26 13:41:43 +00:00
Simon Pilgrim	e093594074	[X86] Added pointer math zext test case (PR22970) llvm-svn: 301401	2017-04-26 13:03:00 +00:00
Simon Pilgrim	e6a7708448	[X86][SSE] Add test case for repeated vector insertions of the same element (PR15298) llvm-svn: 301396	2017-04-26 12:23:32 +00:00
Filipe Cabecinhas	92dc348773	Simplify the CFG after loop pass cleanup. Summary: Otherwise we might end up with some empty basic blocks or single-entry-single-exit basic blocks. This fixes PR32085 Reviewers: chandlerc, danielcdh Subscribers: mehdi_amini, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D30468 llvm-svn: 301395	2017-04-26 12:02:41 +00:00
Sagar Thakur	b458b468a2	[mips] Fix test mips64fpldst.ll with machine verifier enabled Removed micro mips register classes for gp initialization because gp initialization uses pure mips64 instruction. Even when compiling for micro mips, gp initialization can be done with pure mips64 instructions. Reviewed by Simon Dardis Differential: D32286 llvm-svn: 301394	2017-04-26 11:40:12 +00:00
Ayman Musa	d9fb157845	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction. Replace "mov{d\|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 llvm-svn: 301386	2017-04-26 07:08:44 +00:00
Craig Topper	17a2b694c0	[InstCombine] Add test cases for opportunities to improve knownbits handling for cttz and ctlz intrinsics. llvm-svn: 301385	2017-04-26 05:59:19 +00:00
Vadzim Dambrouski	d91fb8c367	[MSP430] Fix PR32769: Select8 and Select16 need to have SR in Uses. If the Select pseudo instruction doesn't have SR in its Uses, then CMP instructions are marked as dead and can later be removed by the MachineCSE pass. This leads to incorrect code generation. Differential Revision: https://reviews.llvm.org/D32473 llvm-svn: 301372	2017-04-26 00:33:59 +00:00
Vedant Kumar	77deb5c788	[gcov] Sort file info before printing it The order in which GCOV file info is printed depends on the string hash function. This makes some GCOV tests brittle, because the tests must be updated whenever the hash function changes. Sort the filenames before printing out the file info to solve the problem. This should be relatively cheap. Differential Revision: https://reviews.llvm.org/D32512 llvm-svn: 301371	2017-04-26 00:16:10 +00:00
Sam Clegg	cc182aaaef	[WebAssembly] Allow for signed relocation addends Summary: Addends are used as offsets to addresses of globals and can be both positive and negative. This change prints libObject in line with the spec and the MC layer. Subscribers: jfb, dschuff Differential Revision: https://reviews.llvm.org/D32507 llvm-svn: 301369	2017-04-26 00:02:31 +00:00
Sanjay Patel	227c901dd8	[x86] add more tests for potential change in bool math folding; NFC Also, use AVX2 to show a potential difference for 256-bit vectors. llvm-svn: 301362	2017-04-25 20:56:14 +00:00
Konstantin Zhuravlyov	54ba4312a3	AMDGPU: Fix ValueKind code object metadata for images Differential Revision: https://reviews.llvm.org/D32504 llvm-svn: 301360	2017-04-25 20:38:26 +00:00
Sanjay Patel	7e6ee7c00d	[x86] regenerate checks; NFC llvm-svn: 301359	2017-04-25 20:30:08 +00:00
Zachary Turner	ee3b9c2558	[llvm-pdbdump] Dump File / Line Info to YAML. We were already parsing and dumping this to the human readable format, but not to the YAML format. This does so, in preparation for reading it in and reconstructing the line information from YAML. llvm-svn: 301357	2017-04-25 20:22:02 +00:00
Matthias Braun	c36a78c3f3	SimplifyLibCalls: Fix crash on memset(notmalloc()) rdar://31520787 llvm-svn: 301352	2017-04-25 19:44:25 +00:00
Adrian Prantl	dd21502482	Fix an assertion when skipping stack values in DWARF2 mode. The fix consists of resetting LocationKind when addMachineRegExpression fails. rdar://problem/31803010 llvm-svn: 301351	2017-04-25 19:40:53 +00:00

45938 Commits