llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	28f1e0dab9	[X86][AVX512] Added some more complex v64i8 shuffles llvm-svn: 287444	2016-11-19 17:50:14 +00:00
Sanjay Patel	47e577eb92	[InstCombine] add tests to show likely unwanted select widening; NFC This is a prerequisite patch for D26556: https://reviews.llvm.org/D26556 ...because there was no direct coverage for these folds (which in some cases are adding instructions). llvm-svn: 287400	2016-11-18 23:22:00 +00:00
Konstantin Zhuravlyov	aefee42e0f	[AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input Differential Revision: https://reviews.llvm.org/D26862 llvm-svn: 287389	2016-11-18 22:31:08 +00:00
Simon Pilgrim	e40900dddd	[SelectionDAG] Add knownbits support for CONCAT_VECTOR opcode llvm-svn: 287387	2016-11-18 22:21:22 +00:00
Simon Pilgrim	3a5328ecdd	[X86] Add knownbits concat_vector test Support coming in a future patch llvm-svn: 287385	2016-11-18 21:59:38 +00:00
Michael Zolotukhin	5020c9971b	[LoopSimplify] Preserve LCSSA when removing edges from unreachable blocks. This fixes PR30454. llvm-svn: 287379	2016-11-18 21:01:12 +00:00
Geoff Berry	de50acc31e	[MIRPrinter] XFAIL test for powerpc This test introduced in r287368 is failing on powerpc for reasons unrelated to branch probabilities. See PR31062. llvm-svn: 287375	2016-11-18 20:08:05 +00:00
Matthias Braun	db39fd6c53	Statistic/Timer: Include timers in PrintStatisticsJSON(). Differential Revision: https://reviews.llvm.org/D25588 llvm-svn: 287370	2016-11-18 19:43:24 +00:00
Geoff Berry	b51774ac8c	[MIRPrinter] Print raw branch probabilities as expected by MIRParser Fixes PR28751. Reviewers: MatzeB, qcolombet Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26775 llvm-svn: 287368	2016-11-18 19:37:24 +00:00
Hans Wennborg	105e05a2a4	Fix test from r287353: don't use /dev/null llvm-svn: 287360	2016-11-18 18:27:31 +00:00
Adam Nemet	e9bd022c41	[LTO] Add option to generate optimization records It is used to drive this from the clang driver via -mllvm. Same option name is used as in opt. Differential Revision: https://reviews.llvm.org/D26832 llvm-svn: 287356	2016-11-18 18:06:28 +00:00
Hans Wennborg	aeacdc258b	IRMover: Avoid accidentally mapping types from the destination module (PR30799) During Module linking, it's possible for SrcM->getIdentifiedStructTypes(); to return types that are actually defined in the destination module (DstM). Depending on how the bitcode file was read, getIdentifiedStructTypes() might do a walk over all values, including metadata nodes, looking for types. In my case, a debug info metadata node was shared between the two modules, and it referred to a type defined in the destination module (see test case). Differential Revision: https://reviews.llvm.org/D26212 llvm-svn: 287353	2016-11-18 17:33:05 +00:00
Simon Pilgrim	7bde5df5f0	[X86][AVX512] Split AVX512F/AVX512VL tests to demonstrate missed int2fp opportunities without AVX512VL llvm-svn: 287348	2016-11-18 15:31:36 +00:00
Tom Stellard	df613198c0	GlobalISel: Fix unconditional fallback when global isel abort is disabled Reviewers: t.p.northover, ab, qcolombet Subscribers: mehdi_amini, vkalintiris, wdng, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D26765 llvm-svn: 287344	2016-11-18 14:14:35 +00:00
Tom Stellard	01e65d2cfc	AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts Summary: The 32-bit instructions don't zero the high 16-bits like the 16-bit instructions do. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26828 llvm-svn: 287342	2016-11-18 13:53:34 +00:00
Florian Hahn	77382be56b	[simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations. insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now propagates existing llvm.loop metadata to the newly added backedge. llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp now propagates existing llvm.loop metadata to the branch instructions in the predecessor blocks of the empty block that is removed. Differential Revision: https://reviews.llvm.org/D26495 llvm-svn: 287341	2016-11-18 13:12:07 +00:00
Nicolai Haehnle	ce2b589df5	AMDGPU: Fix legalization of MUBUF instructions in shaders Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339	2016-11-18 11:55:52 +00:00
Ehsan Amiri	ff0942e6ea	[Power9] Add patterns for vnegd, vnegw Exploit new instructions by adding patterns to .td file. https://reviews.llvm.org/D26551 llvm-svn: 287334	2016-11-18 11:05:55 +00:00
Simon Pilgrim	3e5045e8f1	[X86][AVX2] Add v8i32->v8i64 mul test (PR30845) llvm-svn: 287332	2016-11-18 11:00:36 +00:00
Ehsan Amiri	85818684c6	[PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended When we see a SETCC whose only users are zero extend operations, we can replace it with a subtraction. This results in doing all calculations in GPRs and avoids CR use. Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are ways that this can be extended. For example for signed condition codes. In that case we will be introducing additional sign extend instructions, so more careful profitability analysis may be required. Another direction to extend this is for equal, not equal conditions. Also when users of SETCC are any_ext or sign_ext, we might be able to do something similar. llvm-svn: 287329	2016-11-18 10:41:44 +00:00
Craig Topper	1de753f7f5	[InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements. This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional intrinsics to the switches. llvm-svn: 287316	2016-11-18 06:04:33 +00:00
Craig Topper	02b5a1b50f	[AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects. The same thing was done to 32-bit and 64-bit element sizes previously. This will allow us to support these shifts in InstCombineCalls along with the other variable shift intrinsics. llvm-svn: 287312	2016-11-18 05:04:44 +00:00
Matt Arsenault	742deb2495	AMDGPU: Fix crash on illegal type for inlineasm There are still crashes on non-MVT types in other places. llvm-svn: 287310	2016-11-18 04:42:57 +00:00
Alexei Starovoitov	8f9f8210c1	convert bpf assembler to look like kernel verifier output since bpf instruction set was introduced people learned to read and understand kernel verifier output whereas llvm asm output stayed obscure and unknown. Convert llvm to emit assembler text similar to kernel to avoid this discrepancy Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 287300	2016-11-18 02:32:35 +00:00
Craig Topper	07f1c15995	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64 Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298	2016-11-18 02:25:34 +00:00
Anna Zaks	9cd5ed1241	[asan] Turn on Mach-O global metadata liveness tracking by default This patch turns on the metadata liveness tracking since all known issues have been resolved. The feature has been implemented in https://reviews.llvm.org/D16737 and enables support of dead code stripping option on Mach-O platforms. As part of enabling the feature, I also plan on reverting the following patch to compiler-rt: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160704/369910.html Differential Revision: https://reviews.llvm.org/D26772 llvm-svn: 287235	2016-11-17 16:55:40 +00:00
Konstantin Zhuravlyov	0a1a7b6b23	Revert "AMDGPU: Enable ConstrainCopy DAG mutation" This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233	2016-11-17 16:41:49 +00:00
Simon Pilgrim	8eca5520dc	[X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves. If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer). This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases). Partial fix for PR30845 Differential Revision: https://reviews.llvm.org/D26590 llvm-svn: 287223	2016-11-17 12:14:49 +00:00
Pablo Barrio	c41e856f53	[ARM] Relax restriction on variadic functions for tailcall optimization Summary: Variadic functions can be treated in the same way as normal functions with respect to the number and types of parameters. Reviewers: grosbach, olista01, t.p.northover, rengolin Subscribers: javed.absar, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26748 llvm-svn: 287219	2016-11-17 10:56:58 +00:00
Oren Ben Simhon	489d6eff4f	[X86] RegCall - Handling v64i1 in 32/64 bit target Register Calling Convention defines a new behavior for v64i1 types. This type should be saved in GPR. However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit). Differential Revision: https://reviews.llvm.org/D26181 llvm-svn: 287217	2016-11-17 09:59:40 +00:00
Sanjoy Das	4a8fe09040	[ImplicitNullCheck] Fix an edge case where we were hoisting incorrectly ImplicitNullCheck keeps track of one instruction that the memory operation depends on that it also hoists with the memory operation. When hoisting this dependency, it would sometimes clobber a live-in value to the basic block we were hoisting the two things out of. Fix this by explicitly looking for such dependencies. I also noticed two redundant checks on `MO.isDef()` in IsMIOperandSafe. They're redundant since register MachineOperands are either Defs or Uses -- there is no third kind. I'll change the checks to asserts in a later commit. llvm-svn: 287213	2016-11-17 07:29:40 +00:00
Craig Topper	dfaf9201cb	[X86] Add a test case where, due to a bug in selectScalarSSELoad, we fold the same load twice. llvm-svn: 287210	2016-11-17 05:37:39 +00:00
Konstantin Zhuravlyov	20ba24e231	[AMDGPU] Add missing test for rL287203 llvm-svn: 287204	2016-11-17 04:33:20 +00:00
Konstantin Zhuravlyov	3f0cdc7a11	[AMDGPU] Promote f16/i16 conversions to f32/i32 llvm-svn: 287201	2016-11-17 04:00:46 +00:00
Konstantin Zhuravlyov	662e01dfbe	[AMDGPU] Expand `br_cc` for f16 Differential Revision: https://reviews.llvm.org/D26732 llvm-svn: 287199	2016-11-17 03:49:01 +00:00
Dehao Chen	41d72a8632	Use profile info to adjust loop unroll threshold. Summary: For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold. For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling. Reviewers: davidxl, mzolotukhin Subscribers: sanjoy, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D26527 llvm-svn: 287186	2016-11-17 01:17:02 +00:00
Dylan McKay	48c26b2b12	[AVR] Remove some accidentally-committed code that broke the bots This is a remnant of an on-chip unit testing tool that has since been moved out-of-tree. It was accidentally committed in r287162. llvm-svn: 287180	2016-11-17 00:09:38 +00:00
Peter Collingbourne	f72a8d4e08	Introduce GlobalSplit pass. This pass splits globals into elements using inrange annotations on getelementptr indices. Differential Revision: https://reviews.llvm.org/D22295 llvm-svn: 287178	2016-11-16 23:40:26 +00:00
Dylan McKay	6dd69032c9	[AVR] Fix basic block naming in ctlz and cttz tests The branch selector would change the names. llvm-svn: 287174	2016-11-16 22:48:38 +00:00
Dylan McKay	9701c42de9	[AVR] Add tests for counting leading/trailing zeros This adds two test files that verify the 'cttz' and 'ctlz' operations. llvm-svn: 287172	2016-11-16 22:38:43 +00:00
Sanjay Patel	066139a3ec	[x86] allow FP-logic ops when one operand is FP and result is FP We save an inter-register file move this way. If there's any CPU where the FP logic is slower, we could transform this back to int-logic in MachineCombiner. This helps, but doesn't solve, PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 The 'andn' test shows that we're missing a pattern match to recognize the xor with -1 constant as a 'not' op. llvm-svn: 287171	2016-11-16 22:34:05 +00:00
Kevin Enderby	7fa40c9f2b	General clean up of error handling in llvm-objdump to remove its use of report_fatal_error(). No real functional change with this commit. The problem with report_fatal_error() is it does not include the tool name and the file name for which the error message was generated. Uses of report_fatal_error() were changed to report_error() or error() to get a better error and to make the code smaller and cleaner. Also changed things like error(errorToErrorCode(SOrErr.takeError())) to use report_error() with a file name and the llvm::Error (as well as the ArchitectureName if available) so the error message is printed. llvm-svn: 287163	2016-11-16 22:17:38 +00:00
Dylan McKay	a789f40002	[AVR] Add the pseudo instruction expansion pass Summary: A lot of the pseudo instructions are required because LLVM assumes that all integers of the same size as the pointer size are legal. This means that it will not currently expand 16-bit instructions to their 8-bit variants because it thinks 16-bit types are legal for the operations. This also adds all of the CodeGen tests that required the pass to run. Reviewers: arsenm, kparzysz Subscribers: wdng, mgorny, modocache, llvm-commits Differential Revision: https://reviews.llvm.org/D26577 llvm-svn: 287162	2016-11-16 21:58:04 +00:00
Sanjoy Das	df4b162e4d	[ImplicitNullChecks] Do not handle call MachineInstrs We don't track callee clobbered registers correctly, so avoid hoisting across calls. Note: for this bug to trigger we need a `readonly` call target, since we already have logic to not hoist across potentially storing instructions either. llvm-svn: 287159	2016-11-16 21:45:22 +00:00
Peter Collingbourne	7a74803abf	Bitcode: Introduce initial multi-module reader API. Implement getLazyBitcodeModule() and parseBitcodeFile() in terms of it. Differential Revision: https://reviews.llvm.org/D26719 llvm-svn: 287156	2016-11-16 21:44:45 +00:00
Tim Northover	397f9d9d05	ARM: fix CodeGen for 64-bit shifts. One half of the shifts obviously needed conditional selection based on whether the shift amount is more than 32-bits, but leaving the other half as the natural shift isn't acceptable either: it's undefined behaviour to shift a 32-bit value by more than 31. llvm-svn: 287149	2016-11-16 20:54:28 +00:00
Matt Arsenault	3b36bb1d87	AMDGPU: Enable ConstrainCopy DAG mutation This fixes a probably unintended divergence from the default scheduler behavior. llvm-svn: 287146	2016-11-16 20:35:23 +00:00
Geoff Berry	8301c645c8	[AArch64] Handle vector types in replaceZeroVectorStore. Summary: Extend replaceZeroVectorStore to handle more vector type stores, floating point zero vectors and set alignment more accurately on split stores. This is a follow-up change to r286875. This change fixes PR31038. Reviewers: MatzeB Subscribers: mcrosier, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26682 llvm-svn: 287142	2016-11-16 19:35:19 +00:00
Tom Stellard	0d162b1c4f	AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass Summary: 1. Don't try to copy values to and from the same register class. 2. Replace copies of registers with immediate values with v_mov/s_mov instructions. The main purpose of this change is to make MachineSink do a better job of determining when it is beneficial to split a critical edge, since the pass assumes that copies will become move instructions. This prevents a regression in uniform-cfg.ll if we enable critical edge splitting for AMDGPU. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23408 llvm-svn: 287131	2016-11-16 18:42:17 +00:00
Sanjay Patel	7f3d51f840	[x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of compilers, but logically equivalent int, float, and double variants of bitwise-logic instructions are reality in x86, and the float variant may be a shorter instruction depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all the time. This is a preliminary step towards solving PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137 Differential Revision: https://reviews.llvm.org/D26712 llvm-svn: 287122	2016-11-16 17:42:40 +00:00
Reid Kleckner	3a83e76811	[sancov] Name the global containing the main source file name If the global name doesn't start with __sancov_gen, ASan will insert unnecessary red zones around it. llvm-svn: 287117	2016-11-16 16:50:43 +00:00
Simon Pilgrim	79416ea76a	[X86] Add integer division test for PR23590 Shows missed opportunity to recognise reduced integer division result size llvm-svn: 287110	2016-11-16 14:54:34 +00:00
Simon Pilgrim	b57dd17142	[X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with generic IR Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen. LLVM counterpart to D26686 Differential Revision: https://reviews.llvm.org/D26736 llvm-svn: 287108	2016-11-16 14:48:32 +00:00
Simon Pilgrim	9e355bc5bb	[X86][AVX512] Added some mask/maskz tests for sitofp/uitofp i32 to f64 llvm-svn: 287106	2016-11-16 14:24:04 +00:00
Simon Pilgrim	c223aa52b1	[X86] Regenerated integer divide tests to test on 32 and 64 bit targets llvm-svn: 287104	2016-11-16 14:12:11 +00:00
Simon Pilgrim	dd8c71c646	[X86][SSE] Added PSUBUS from SELECT tests from D25987 llvm-svn: 287103	2016-11-16 13:59:03 +00:00
Simon Dardis	8ca1cbccc6	[mips] Fix unsigned/signed type error MipsFastISel uses a class to represent addresses with a signed member to represent the offset. MipsFastISel::emitStore, emitLoad and computeAddress all treated the offset as being positive. In cases where the offset was actually negative and a frame pointer was used, this would cause the constant synthesis routine to crash as it would generate an unexpected instruction sequence when frame indexes are replaced. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D26192 llvm-svn: 287099	2016-11-16 11:29:07 +00:00
Simon Dardis	7b7cb8d9dd	[mips] not instruction alias This patch adds the single operand form of the not alias to microMIPS and MIPS along with additional tests. This partially resolves PR/30381. Thanks to Sean Bruno for reporting the issue! llvm-svn: 287097	2016-11-16 11:04:49 +00:00
Ayman Musa	4d60243bfd	[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics. Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 287087	2016-11-16 09:00:28 +00:00
Craig Topper	6910fa0ef4	[X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file. Reviewers: zvi, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083	2016-11-16 05:24:10 +00:00
Davide Italiano	6cf09265f9	[ELF] Convert ELF.h to Expected<T>. This has two advantages: 1) We slowly move away from ErrorOr to the new handling interface, in the hope of having a uniform error handling in LLVM, eventually. 2) We're starting to have meaningful error messages for invalid object ELF files, rather than a generic "parse error". At some point we should include also the offset to improve the quality of the diagnostic. llvm-svn: 287081	2016-11-16 05:10:28 +00:00
Saleem Abdulrasool	d05c5aea47	test: use separate input file for test Rather than using sed to generate the input and pipe the result to strings, use the static input instead. llvm-svn: 287079	2016-11-16 04:08:46 +00:00
Matthias Braun	3d51cf0a2c	AArch64: Use DeadRegisterDefinitionsPass before regalloc. Doing this before register allocation reduces register pressure as we do not even have to allocate a register for those dead definitions. Differential Revision: https://reviews.llvm.org/D26111 llvm-svn: 287076	2016-11-16 03:38:27 +00:00
Konstantin Zhuravlyov	2a87a42035	[AMDGPU] Handle f16 select{_cc} - Select `select` to `v_cndmask_b32` - Expand `select_cc` - Refactor patterns Differential Revision: https://reviews.llvm.org/D26714 llvm-svn: 287074	2016-11-16 03:16:26 +00:00
Quentin Colombet	fb9b0cdcfe	[RegAllocGreedy] Record missed hint for late recoloring. In https://reviews.llvm.org/D25347, Geoff noticed that we still have useless copies that we can eliminate after register allocation. At the time the allocation is chosen for those copies, they are not useless but, because of changes in the surrounding code, later on they might become useless. The Greedy allocator already has a mechanism to deal with such cases with a late recoloring. However, we failed to record some of the missed hints. This commit fixes that. llvm-svn: 287070	2016-11-16 01:07:12 +00:00
Vyacheslav Klochkov	b3dc774a99	Fixed the lost FastMathFlags for CALL operations in SLPVectorizer. Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26575 llvm-svn: 287064	2016-11-16 00:55:50 +00:00
Justin Lebar	2860573529	[BypassSlowDivision] Handle division by constant numerators better. Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant: * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code). * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26699 llvm-svn: 287062	2016-11-16 00:44:47 +00:00
Joerg Sonnenberger	8c1a9ac52b	Always use relative jump table encodings on PowerPC64. For the default, small and medium code model, use the existing difference from the jump table towards the label. For all other code models, setup the picbase and use the difference between the picbase and the block address. Overall, this results in smaller data tables at the expense of one or two more arithmetic operations at the jump site. Given that we only create jump tables with a lot more than two entries, it is a net win in size. For larger code models the assumption remains that individual functions are no larger than 2GB. Differential Revision: https://reviews.llvm.org/D26336 llvm-svn: 287059	2016-11-16 00:37:30 +00:00
Jan Vesely	e8cc395e4f	AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument wbinvl.* are vector instructions that do not use vector registers. v2: check only M?BUF instructions Differential Revision: https://reviews.llvm.org/D26633 llvm-svn: 287056	2016-11-15 23:55:15 +00:00
Sanjay Patel	aaf430452b	[x86] regenerate checks; NFC llvm-svn: 287051	2016-11-15 23:09:53 +00:00
Kevin Enderby	844c4ac55a	General clean up of Mach-O error handling in llvm-objdump. To get a good error message for all files that could contain Mach-O files the code in llvm-objdump needs to use the archive member name and name of the architecture of a slice of a universal file in those cases where the error comes from a Mach-O file in an archive or a universal file. Most of this is fixed by moving the call to checkSymbolTable() into ProcessMachO() and calling it when the operation needs the symbol table. And then calling the form of report_error() that has the ArchiveName and ArchitectureName arguments. One other place needed to call this form of report_error() also with these arguments. Also changed the code in MachODump.cpp to not use report_fatal_error() and use report_error() instead to make the code smaller and cleaner. All cases of this are for errors with the symbol table which should now never be tripped since checkSymbolTable() should be called first to get a good error message in these cases. llvm-svn: 287050	2016-11-15 23:07:41 +00:00
Sanjay Patel	07529a313a	[x86] auto-generate better checks; NFC llvm-svn: 287049	2016-11-15 23:01:11 +00:00
Sanjay Patel	87cb0745eb	[x86] auto-generate better checks; NFC llvm-svn: 287048	2016-11-15 22:42:20 +00:00
Filipe Cabecinhas	ec350b71fa	[AddressSanitizer] Add support for (constant-)masked loads and stores. This patch adds support for instrumenting masked loads and stores under ASan, if they have a constant mask. isInterestingMemoryAccess now supports returning a mask to be applied to the loads, and instrumentMop will use it to generate additional checks. Added tests for v4i32 v8i32, and v4p0i32 (~v4i64) for both loads and stores (as well as a test to verify we don't add checks to non-constant masks). Differential Revision: https://reviews.llvm.org/D26230 llvm-svn: 287047	2016-11-15 22:37:30 +00:00
Sanjay Patel	9a4ce290d0	[x86] auto-generate better checks; NFC llvm-svn: 287046	2016-11-15 22:33:16 +00:00
Amaury Sechet	003216b319	[C API] Prevent nullptr dereferences in C API for counting attributes. See https://reviews.llvm.org/D26392 Patch by @maleadt llvm-svn: 287044	2016-11-15 22:19:59 +00:00
Peter Collingbourne	bc9a574657	Object: replace backslashes with slashes in embedded relative thin archive paths on Windows. This makes these thin archives portable between *nix and Windows. Differential Revision: https://reviews.llvm.org/D26696 llvm-svn: 287038	2016-11-15 21:36:35 +00:00
Chad Rosier	201fc1ed26	[AArch64] Add support for Qualcomm's Falkor CPU. Differential Revision: https://reviews.llvm.org/D26673 llvm-svn: 287036	2016-11-15 21:34:12 +00:00
Tom Stellard	d23de360db	AMDGPU/SI: Fix pattern for i16 = sign_extend i1 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26670 llvm-svn: 287035	2016-11-15 21:25:56 +00:00
Sanjay Patel	2a51748a5d	[x86] add tests for FP-logic equivalent instruction replacement The ANDN test needs at least 3 different fixes. llvm-svn: 287032	2016-11-15 21:19:28 +00:00
Kostya Serebryany	9d6dc7b164	[sanitizer-coverage] make sure asan does not instrument coverage guards (reported in https://github.com/google/oss-fuzz/issues/84 ) llvm-svn: 287030	2016-11-15 21:12:50 +00:00
Tim Northover	bf55f7ea59	llvm-objdump: deal with unexpected object files more gracefully. Specifically, we don't want to segfault on release builds, so print the problem instead. llvm-svn: 287022	2016-11-15 20:26:01 +00:00
Matt Arsenault	d4bb5e4831	AMDGPU: Enable store clustering Also respect the TII hook for these like the generic code does in case we want a flag later to disable this. llvm-svn: 287021	2016-11-15 20:22:55 +00:00
Haicheng Wu	faee2b71a7	[AArch64] Lower multiplication by a constant int to shl+add+shl Lower a = b * C where C = (2^n + 1) * 2^m to add w0, w0, w0, lsl n lsl w0, w0, m Differential Revision: https://reviews.llvm.org/D229245 llvm-svn: 287019	2016-11-15 20:16:48 +00:00
Matt Arsenault	3666629837	AMDGPU: Analyze mubuf with immediate soffset Fixes giving up on clustering common addr64 accesses with constant 0 soffset. llvm-svn: 287018	2016-11-15 20:14:27 +00:00
Wei Mi	37c4aaaf52	Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific. llvm-svn: 287014	2016-11-15 19:42:05 +00:00
Stanislav Mekhanoshin	ea91cca593	[AMDGPU] Add wave barrier builtin The wave barrier represents the discardable barrier. Its main purpose is to carry convergent attribute, thus preventing illegal CFG optimizations. All lanes in a wave come to convergence point simultaneously with SIMT, thus no special instruction is needed in the ISA. The barrier is discarded during code generation. Differential Revision: https://reviews.llvm.org/D26585 llvm-svn: 287007	2016-11-15 19:00:15 +00:00
Sanjay Patel	22465125b3	[x86] auto-generate checks; NFC Also, fix the test params to use an attribute rather than a CPU model and remove the AVX run because that does nothing but check for a 'v' prefix in all of these tests. llvm-svn: 287003	2016-11-15 18:44:53 +00:00
Wei Mi	7ccf7651c0	[LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handles inner loops now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 286999	2016-11-15 18:35:53 +00:00
Pawel Bylica	c3f6c97f71	Integer legalization: fix MUL expansion Summary: This fixes the runtime results produced by the fallback multiplication expansion introduced in r270720. For tests I created a fuzz tester that compares the results with Boost.Multiprecision. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26628 llvm-svn: 286998	2016-11-15 18:29:24 +00:00
Zaara Syeda	a19c9e60e9	vector load store with length (left justified) llvm portion llvm-svn: 286993	2016-11-15 17:54:19 +00:00
Wei Mi	d2948cef70	[IndVars] Change the order to compute WidenAddRec in widenIVUse. When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence return non-null but different WideAddRec, if getWideRecurrence is called before getExtendedOperandRecurrence, we won't bother to call getExtendedOperandRecurrence again. But as we know it is possible that after SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null WideAddRec, we know for sure that it is legal to do widening for current instruction. So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which will increase the chance of successful widening. Differential Revision: https://reviews.llvm.org/D26059 llvm-svn: 286987	2016-11-15 17:34:52 +00:00
Craig Topper	c7486af9c9	[AVX-512] Add AVX-512 vector shift intrinsics to memory sanitizer. Just needed to add the intrinsics to the existing switch. The code is generic enough to support the wider vectors with no changes. llvm-svn: 286980	2016-11-15 16:27:33 +00:00
Simon Pilgrim	ceffb43b1b	[X86][SSE] Improve SINT_TO_FP of boolean vector results (signum) This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND and so helps improve the truncation logic. This is not necessary for AVX512 targets where boolean vectors are legal - AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases. Fix for PR13248 Differential Revision: https://reviews.llvm.org/D26583 llvm-svn: 286979	2016-11-15 16:24:40 +00:00
Sanjay Patel	bb238bb4e5	[InstCombine] add tests for bitcasted selects; NFC llvm-svn: 286978	2016-11-15 16:01:16 +00:00
Pablo Barrio	4f80c93a2e	Revert "[JumpThreading] Unfold selects that depend on the same condition" This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc. llvm-svn: 286976	2016-11-15 15:42:23 +00:00
Robert Lougher	b0905209dd	[LoopVectorizer] When estimating reg usage, unused insts may "end" another use The register usage algorithm incorrectly treats instructions whose value is not used within the loop (e.g. those that do not produce a value). The algorithm first calculates the usages within the loop. It iterates over the instructions in order, and records at which instruction index each use ends (in fact, they're actually recorded against the next index, as this is when we want to delete them from the open intervals). The algorithm then iterates over the instructions again, adding each instruction in turn to a list of open intervals. Instructions are then removed from the list of open intervals when they occur in the list of uses ended at the current index. The problem is, instructions which are not used in the loop are skipped. However, although they aren't used, the last use of a value may have been recorded against that instruction index. In this case, the use is not deleted from the open intervals, which may then bump up the estimated register usage. This patch fixes the issue by simply moving the "is used" check after the loop which erases the uses at the current index. Differential Revision: https://reviews.llvm.org/D26554 llvm-svn: 286969	2016-11-15 14:27:33 +00:00
Tony Jiang	5f850cd1b1	[PowerPC] Implement BE VSX load/store builtins - llvm portion. This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE, they behave exactly the same as vec_xl and vec_xst, therefore they are simply implemented by defining a matching macro. On LE, they are implemented by defining new builtins and intrinsics. For int/float/long long/double, it is just a load (lxvw4x/lxvd2x) or store (stxvw4x/stxvd2x). For char/char/short, we also need some extra shuffling before or after calling the builtins to get the desired BE order. For int128, simply call vec_xl or vec_xst. llvm-svn: 286967	2016-11-15 14:25:56 +00:00
Zvi Rackover	f0b9b57bd3	[X86][FastISel] Fix lowering of overflow result on AVX512 targets Summary: Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class. We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction. Fixes pr30981. Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet Subscribers: qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D26620 llvm-svn: 286958	2016-11-15 13:29:23 +00:00
Javed Absar	f043dac25d	[ARM] Add machine scheduler for Cortex-R52 This patch adds the Sched Machine Model for Cortex-R52. Details of the pipeline and descriptions are in comments in file ARMScheduleR52.td included in this patch. Reviewers: rengolin, jmolloy Differential Revision: https://reviews.llvm.org/D26500 llvm-svn: 286949	2016-11-15 11:34:54 +00:00
Asaf Badouh	b573553424	DAGCombiner: fix combine of trunc and select bugzilla: https://llvm.org/bugs/show_bug.cgi?id=29002 pr29002 Differential Revision: https://reviews.llvm.org/D26449 llvm-svn: 286938	2016-11-15 07:55:22 +00:00
Matt Arsenault	1c8d933881	TableGen: Add operator !or llvm-svn: 286936	2016-11-15 06:49:28 +00:00
Zvi Rackover	76dbf26599	[X86][GlobalISel] Add minimal call lowering support to the IRTranslator Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935	2016-11-15 06:34:33 +00:00
Craig Topper	0637099f24	[AVX-512] Add an example test case for PR31018. llvm-svn: 286934	2016-11-15 05:21:55 +00:00
Matt Arsenault	c79dc70d50	AMDGPU: Fix f16 fabs/fneg llvm-svn: 286931	2016-11-15 02:25:28 +00:00
Saleem Abdulrasool	f7009b42f8	llvm-strings: support the `-n` option Permit specifying the match length (the `-n` or `--bytes` option). The deprecated `-[length]` form is not supported as an option. This allows the strings tool to display only the specified length strings rather than the hardcoded default length of >= 4. llvm-svn: 286914	2016-11-15 00:43:52 +00:00
Matt Arsenault	972034bda9	AMDGPU: Fix formatting of 1/2pi immediate llvm-svn: 286912	2016-11-15 00:04:33 +00:00
Tom Stellard	9c884e495c	MIRParser: Add support for parsing vreg reg alloc hints Reviewers: qcolombet, MatzeB Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26573 llvm-svn: 286911	2016-11-15 00:03:14 +00:00
Evandro Menezes	9fc54826e0	[AArch64] Compute the Newton series for reciprocals natively Implement the Newton series for square root, its reciprocal and reciprocal natively using the specialized instructions in AArch64 to perform each series iteration. Differential revision: https://reviews.llvm.org/D26518 llvm-svn: 286907	2016-11-14 23:29:01 +00:00
Peter Collingbourne	3cb86272fc	Linker: Remove unnecessary call to copyMetadata in IRLinker::linkGlobalVariable. This was causing us to create duplicate metadata on global variables. Debug info test case by Adrian Prantl, additional test cases by me. Fixes PR31012. Differential Revision: https://reviews.llvm.org/D26622 llvm-svn: 286905	2016-11-14 23:18:38 +00:00
Tim Northover	e33b175411	GlobalISel: add tests for G_ZEXT/G_SEXT to types smaller than 32-bits. Support was accidentally added in r286407, but there were no tests at the time. llvm-svn: 286903	2016-11-14 22:50:22 +00:00
Sanjay Patel	aaa06fa486	[InstCombine] add tests to show missing bitcast folds llvm-svn: 286900	2016-11-14 22:44:06 +00:00
Tom Stellard	11e60ff7da	RegAllocGreedy: Properly initialize this pass, so that -run-pass will work Reviewers: qcolombet, MatzeB Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26572 llvm-svn: 286895	2016-11-14 21:50:13 +00:00
Kuba Brecka	ddfdba3b01	[tsan] Add support for C++ exceptions into TSan (call __tsan_func_exit during unwinding), LLVM part This adds support for TSan C++ exception handling, where we need to add extra calls to __tsan_func_exit when a function is exited via exception mechanisms. Otherwise the shadow stack gets corrupted (leaked). This patch moves and enhances the existing implementation of EscapeEnumerator that finds all possible function exit points, and adds extra EH cleanup blocks where needed. Differential Revision: https://reviews.llvm.org/D26177 llvm-svn: 286893	2016-11-14 21:41:13 +00:00
Saleem Abdulrasool	f10a871419	Revert "Revert "llvm-strings: support printing the filename"" Change the dynamic files to static in the hope that it will actually fix the transient errors that I've been unable to reproduce. llvm-svn: 286891	2016-11-14 21:10:41 +00:00
Kevin Enderby	22fc007809	Add a checkSymbolTable() method to the MachOObjectFile class. The philosophy of the error checking in libObject for Mach-O files is that the constructor will check the load commands so that for their tables the offsets and sizes are properly contained in the file. But there is no checking of the entries of any of the tables. For the contents of the tables themselves the methods accessing the contents of the entries return errors as needed. In some cases this however makes it difficult or cumbersome to produce a good error message which would include the tool name, file name, archive member, and name of the architecture of a slice of a universal file the error occurred in. So the idea is that there will be a method to check a table which can be called up front before using it, allowing a good error message to be produced before a table is used. And if only verification of the Mach-O file and its tables is wanted, a new possible method checkAllTables() could be added to call all of the methods to check all the tables at some time when such methods exist. The checkSymbolTable() is the first of such methods to check one of the Mach-O file tables. This method will initially be used in llvm-objdump’s DisassembleMachO() routine before it gets the section and symbol information. If there are problems with the symbol table, currently the error is first encountered by the bool operator() in the SymbolSorter() struct which is passed to std::sort(). In this case there is no context as to the file name or the symbol, which results in a poor error message: LLVM ERROR: truncated or malformed object (bad string index: 22 for symbol at index 1) With the added call to the checkSymbolTable() method the error message includes the tool name and file name: llvm-objdump: 'macho-invalid-symbol-strx': truncated or malformed object (bad string table index: 22 past the end of string table, for symbol at index 1) llvm-svn: 286887	2016-11-14 20:57:04 +00:00
Tim Northover	46a6f0fbf0	Recommit: ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. Fixed usage of std::sort so that we (hopefully) use instantiations that actually exist in GCC 4.8. llvm-svn: 286881	2016-11-14 20:28:24 +00:00
Michael Kuperstein	f221f13ccc	[X86] Tests exhibiting bad partial reloading behavior. NFC. llvm-svn: 286878	2016-11-14 19:58:11 +00:00
Geoff Berry	526c50588d	[AArch64] Split 0 vector stores into scalar store pairs. Summary: Replace a splat of zeros to a vector store by scalar stores of WZR/XZR. The load store optimizer pass will merge them to store pair stores. This should be better than a movi to create the vector zero followed by a vector store if the zero constant is not re-used, since one instruction and one register live range will be removed. For example, the final generated code should be: stp xzr, xzr, [x0] instead of: movi v0.2d, #0 str q0, [x0] Reviewers: t.p.northover, mcrosier, MatzeB, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26561 llvm-svn: 286875	2016-11-14 19:39:04 +00:00
Teresa Johnson	4fef68cb8d	[ThinLTO] Only promote exported locals as marked in index Summary: We have always speculatively promoted all renamable local values (except const non-address taken variables) for both the exporting and importing module. We would then internalize them back based on the ThinLink results if they weren't actually exported. This is inefficient, and results in unnecessary renames. It also meant we had to check the non-renamability of a value in the summary, which was already checked during function importing analysis in the ThinLink. Made renameModuleForThinLTO (which does the promotion/renaming) instead use the index when exporting, to avoid unnecessary renames/promotions. For importing modules, we can simply promote all values as any local we import by definition is exported and needs promotion. This required changes to the method used by the FunctionImport pass (only invoked from 'opt' for testing) and when invoked from llvm-link, since neither does a ThinLink. We simply conservatively mark all locals in the index as promoted, which preserves the current aggressive promotion behavior. I also needed to change an llvm-lto based test where we had previously been aggressively promoting values that weren't importable (aliasees), but now will not promote. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26467 llvm-svn: 286871	2016-11-14 19:21:41 +00:00
Tim Northover	1b66f39cf2	Revert "ARM: sort register lists by encoding in push/pop instructions." This reverts commit 286866. It broke a bot, something to do with exactly which templates std::sort accepts. llvm-svn: 286867	2016-11-14 19:05:28 +00:00
Tim Northover	e908ea844c	ARM: sort register lists by encoding in push/pop instructions. For example we were producing push {r8, r10, r11, r4, r5, r7, lr} This is misleading (r4, r5 and r7 are actually pushed before the rest), and other components (stack folding recently) often forget to deal with the extra complexity coming from the different order, leading to miscompiles. Finally, we warn about our own code in -no-integrated-as mode without this, which is really not a good idea. llvm-svn: 286866	2016-11-14 19:02:17 +00:00
Sean Fertile	a435e07de8	[PPC] Add intrinsic mapping to the xscvhpsp instruction add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to Single-Precision' instruction. Differential review: https://reviews.llvm.org/D26536 llvm-svn: 286862	2016-11-14 18:43:59 +00:00
Changpeng Fang	8236fe103f	AMDGPU/SI: Support data types other than V4f32 in image intrinsics Summary: Extend image intrinsics to support data types of V1F32 and V2F32. TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active, even though such case should be very rare. Reviewers: tstellarAMD Differential Revision: http://reviews.llvm.org/D26472 llvm-svn: 286860	2016-11-14 18:33:18 +00:00
Zvi Rackover	35bb7fdadc	[X86] Adding reproducer for pr30981 llvm-svn: 286855	2016-11-14 18:10:44 +00:00
Teresa Johnson	3624bdf60a	Restore "[ThinLTO] Prevent exporting of locals used/defined in module level asm" This restores the rest of r286297 (part was restored in r286475). Specifically, it restores the part requiring adding a dependency from the Analysis to Object library (downstream use changed to correctly model split BitReader vs BitWriter libraries). Original description of this part of patch follows: Module level asm may also contain defs of values. We need to prevent export of any refs to local values defined in module level asm (e.g. a ref in normal IR), since that also requires renaming/promotion of the local. To do that, the summary index builder looks at all values in the module level asm string that are not marked Weak or Global, which is exactly the set of locals that are defined. A summary is created for each of these local defs and flagged as NoRename. This required adding handling to the BitcodeWriter to look at GV declarations to see if they have a summary (rather than skipping them all). Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to ensure that an MCAsmParser is available, otherwise the module asm parse would silently fail. Initialized the asm parser in the opt tool for use in testing this fix. Fixes PR30610. llvm-svn: 286844	2016-11-14 17:12:32 +00:00
Sumanth Gundapaneni	d428cf8b5f	[Hexagon] Remove unsafe load instructions that affect Stack Slot Coloring The Stack slot coloring pass removes a store that is followed by a load when both deal with the same stack slot. The function isLoadFromStackSlot is supposed to consider only the loads that have no side-effects. This patch fixes the issue by removing the unsafe loads from this function. E.g.: %vreg0<def> = L2_loadruh_io <fi#15>, 0 S2_storeri_io <fi#15>, 0, %vreg0 In this case, we load an unsigned extended half word and store it into the same stack slot. The Stack slot coloring pass considers it safe to remove the store. This patch marks all the non-vector byte and half word loads as unsafe. llvm-svn: 286843	2016-11-14 17:11:00 +00:00
Simon Pilgrim	779da8e5ea	[CostModel][X86] Added mul costs for vXi8 vectors More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838	2016-11-14 15:54:24 +00:00
Simon Pilgrim	27fed8e5d6	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832	2016-11-14 14:45:16 +00:00
Sean Fertile	adda5b2d2b	[PPC] add intrinsics for vec extract exp/significand and vec test data class. Differential Revision: https://reviews.llvm.org/D26272 llvm-svn: 286829	2016-11-14 14:42:37 +00:00
Renato Golin	199b6b941d	Revert "llvm-strings: support printing the filename" Also, Revert "test: remove the archive before modifying it" Revert "test: explicitly use gnu format" This reverts commits r286778, r286729 and r286767, as they are randomly failing on many bots (AArch64, x86_64). llvm-svn: 286820	2016-11-14 13:09:24 +00:00
James Molloy	6df8f27c95	[InlineCost] Remove skew when calculating call costs When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself. However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case: int g() { h(); } int f() { g(); } llvm-svn: 286814	2016-11-14 11:14:41 +00:00
Craig Topper	8f85ad1755	[AVX-512] Add suffixless aliases for EVEX encoded vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior. Fixes another problem from PR28850. llvm-svn: 286790	2016-11-14 02:46:58 +00:00
Craig Topper	b8596e4d1d	[X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions. -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions. -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax. -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing. This should fix at least some of PR28850. llvm-svn: 286787	2016-11-14 01:53:29 +00:00
Craig Topper	353e59b6d6	[AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects. llvm-svn: 286786	2016-11-14 01:53:22 +00:00
Saleem Abdulrasool	25d7683fe7	test: remove the archive before modifying it The archive may already exist when not doing a clean test run. The dirty state can cause a test failure. Remove the archive first. llvm-svn: 286778	2016-11-13 20:43:41 +00:00
Saleem Abdulrasool	7091820a96	llvm-cxxfilt: support reading from stdin `c++filt` when given no arguments runs as a REPL, decoding each line as a decorated name. Unify the test structure to be more uniform, with the tests for llvm-cxxfilt living under test/tools/llvm-cxxfilt. llvm-svn: 286777	2016-11-13 20:43:38 +00:00
Sanjay Patel	cfcc42bdc2	[ValueTracking] recognize even more variants of smin/smax Similar to: https://reviews.llvm.org/rL285499 https://reviews.llvm.org/rL286318 We can't minimally expose this in IR tests because we don't have min/max intrinsics, but the difference is visible in codegen because SelectionDAGBuilder::visitSelect() uses matchSelectPattern(). We're not canonicalizing these patterns in IR (yet), so I don't expect there to be any regressions as noted here: http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html llvm-svn: 286776	2016-11-13 20:04:52 +00:00
Craig Topper	ba13703bb3	[AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd. Fixes PR24941. llvm-svn: 286775	2016-11-13 19:58:18 +00:00
Saleem Abdulrasool	6ac46b42ac	test: synchronise lit substitutions llvm-strings was added to the test dependencies without updating the lit substitutions. Synchronise the list. llvm-svn: 286773	2016-11-13 19:37:00 +00:00
Saleem Abdulrasool	8b9be8fd41	llvm-strings: support printing the filename This adds support for the `-f` or `--print-file-name` option for strings. llvm-svn: 286767	2016-11-13 19:07:48 +00:00
Matt Arsenault	dc45274d54	AMDGPU: Implement SGPR spilling with scalar stores This avoids the nasty problems caused by using memory instructions that read the exec mask while spilling / restoring registers used for control flow masking, but only for VI when these were added. This always uses the scalar stores when enabled currently, but it may be better to still try to spill to a VGPR and use this on the fallback memory path. The cache also needs to be flushed before wave termination if a scalar store is used. llvm-svn: 286766	2016-11-13 18:20:54 +00:00
Igor Breger	e2399f9e0e	revert commit r286761, some builds failed on Win platforms llvm-svn: 286765	2016-11-13 15:48:11 +00:00
Simon Pilgrim	055c09c1c0	[X86][SSE] Add zero lower 32-bits test case for PR30845 llvm-svn: 286764	2016-11-13 15:32:11 +00:00
Simon Pilgrim	8f7c56125e	[X86][AVX512] Add masked VPMOVZX test case for PR26762 llvm-svn: 286763	2016-11-13 15:16:43 +00:00
Simon Pilgrim	ce59a536f7	[X86][SSE] Add additional test case for PR30845 llvm-svn: 286762	2016-11-13 14:57:52 +00:00
Ayman Musa	c09b3769ae	[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss\|sd} intrinsics. Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 286761	2016-11-13 14:51:25 +00:00
Ayman Musa	46af8f9c6f	[X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions. Differential Revision: https://reviews.llvm.org/D26022 llvm-svn: 286758	2016-11-13 14:29:32 +00:00
Craig Topper	b4173a5a70	[InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked AVX-512 variable shift intrinsics. llvm-svn: 286755	2016-11-13 07:26:19 +00:00
Craig Topper	43e97649a1	[AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords. These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2. llvm-svn: 286754	2016-11-13 07:26:15 +00:00
Konstantin Zhuravlyov	f86e4b7266	[AMDGPU] Add f16 support (VI+) Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753	2016-11-13 07:01:11 +00:00
Craig Topper	706d897d8a	[AVX-512] Move masked shift intrinsics tests to the autoupgrade test file. These missed being moved in r286725. llvm-svn: 286746	2016-11-13 03:42:27 +00:00
Craig Topper	8b831cbb2a	[InstCombine][AVX-512] Expand vector shift handling to work on the AVX-512 shift by immediate and shift by single value. This does not include support for the AVX-512 variable shifts. That will be coming in a future patch. llvm-svn: 286739	2016-11-13 01:51:55 +00:00
Sanjay Patel	a1b8c10bf6	[x86] add smin/smax with zero tests These are vector tests corresponding to the discussion at: http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html Apart from the lack of min/max matching, the and/andn difference shows a lack of DAG-level canonicalization. llvm-svn: 286737	2016-11-13 00:32:39 +00:00
Simon Pilgrim	6e09afa9d0	[X86][SSE] Add test case for PR30845 llvm-svn: 286734	2016-11-12 23:44:58 +00:00
Saleem Abdulrasool	43400b4c06	test: explicitly use gnu format This should fix the Darwin buildbots. llvm-svn: 286729	2016-11-12 19:03:08 +00:00
Saleem Abdulrasool	be3a2919f4	llvm-strings: trivialise logic until we support more options Until we have handling for ignoring unloaded sections, simplify the logic to the point of triviality. This fixes the scanning of archives, particularly when embedded in archives. llvm-svn: 286727	2016-11-12 18:37:04 +00:00
Craig Topper	da6a63db1c	[AVX-512] Remove the remaining masked shift by immediate or by single value. Autoupgrade them to recently introduced unmasked versions and a select. After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics. llvm-svn: 286725	2016-11-12 18:04:46 +00:00
Michal Gorny	a583be4e52	[OCaml] Clear cross-target test deps when building out-of-tree Clear cross-target test dependencies when using LLVM_OCAML_OUT_OF_TREE, in order to make it possible to run check-llvm-bindings-ocaml without rebuilding the whole LLVM. Differential Revision: https://reviews.llvm.org/D26580 llvm-svn: 286720	2016-11-12 14:58:30 +00:00
Craig Topper	9d25c5e2fa	[AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM. Summary: This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new intrinsics in the frontend. This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang. Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26333 llvm-svn: 286711	2016-11-12 05:28:24 +00:00
Craig Topper	5cb13062d2	[AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases. Reviewers: delena, RKSimon Subscribers: Farhana, llvm-commits Differential Revision: https://reviews.llvm.org/D26297 llvm-svn: 286709	2016-11-12 05:05:27 +00:00
Saleem Abdulrasool	c1f8e1f35c	build: add a dependency on llvm-strings Since we now have tests for llvm-strings, add a dependency on the tool. llvm-svn: 286707	2016-11-12 03:45:21 +00:00
Saleem Abdulrasool	2729786fff	llvm-strings: ensure that the last string is correctly printed We would ignore the last string that appeared if the file ended with a printable character. Ensure that we get the last string. llvm-svn: 286706	2016-11-12 03:39:21 +00:00
whitequark	4dcf92a27e	[OCaml] Adapt to the new attribute C API. llvm-svn: 286705	2016-11-12 03:38:30 +00:00
Tom Stellard	b4c8e8e30b	AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI Summary: This fixes a regression caused by r286464. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26570 llvm-svn: 286687	2016-11-12 00:19:11 +00:00
Tom Stellard	9fdbec870c	AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies Summary: This pass was assuming that when a PHI instruction defined a register used by another PHI instruction that the defining instruction would be legalized before the using instruction. This assumption was causing the pass to not legalize some PHI nodes within divergent flow-control. This fixes a bug that was uncovered by r285762. Reviewers: nhaehnle, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26303 llvm-svn: 286676	2016-11-11 23:35:42 +00:00
Sanjay Patel	8e68394376	[InstCombine] update test to use FileCheck; NFC llvm-svn: 286668	2016-11-11 23:12:46 +00:00
Anna Zaks	3c43737832	[tsan][llvm] Implement the function attribute to disable TSan checking at run time This implements a function annotation that disables TSan checking for the function at run time. The benefit over __attribute__((no_sanitize("thread"))) is that the accesses within the callees will also be suppressed. The motivation for this attribute is a guarantee given by the Objective-C language that the calls to the reference count decrement and object deallocation will be synchronized. To model this properly, we would need to intercept all ref count decrement calls (which are very common in ObjC due to use of ARC) and also every single message send. Instead, we propose to just ignore all accesses made from within dealloc at run time. The main downside is that this still does not introduce any synchronization, which means we might still report false positives if the code that relies on this synchronization is not executed from within dealloc. However, we have not seen this in practice so far and think these cases will be very rare. Differential Revision: https://reviews.llvm.org/D25858 llvm-svn: 286663	2016-11-11 23:01:02 +00:00
Adam Nemet	9bfbf8bbdf	[LV] Stop saying "use -Rpass-analysis=loop-vectorize" This is PR28376. Unfortunately given the current structure of optimization diagnostics we lack the capability to tell whether the user has passed -Rpass-analysis=loop-vectorize since this is local to the front-end (BackendConsumer::OptimizationRemarkHandler). So rather than printing this even if the user has already passed -Rpass-analysis, this patch just punts and stops recommending this option. I don't think that getting this right is worth the complexity. Differential Revision: https://reviews.llvm.org/D26563 llvm-svn: 286662	2016-11-11 22:51:46 +00:00
Nemanja Ivanovic	ec4b0c360f	[PowerPC] Add remaining vector permute builtins in altivec.h - LLVM portion This patch corresponds to review: https://reviews.llvm.org/D26480 Adds all the intrinsics used for various permute builtins that will be added to altivec.h. llvm-svn: 286638	2016-11-11 21:42:01 +00:00
Evgeniy Stepanov	1fe189d795	[cfi] Fix weak functions handling. When a function pointer is replaced with a jumptable pointer, special case is needed to preserve the semantics of extern_weak functions. Since a jumptable entry can not be extern_weak, we emulate that behaviour by replacing all references to F (the extern_weak function) with the following expression: F != nullptr ? JumpTablePtr : nullptr. Extra special care is needed for global initializers, since most (or probably all) backends can not lower an initializer that includes this kind of constant expression. Initializers like that are replaced with a global constructor (i.e. a runtime initializer). llvm-svn: 286636	2016-11-11 21:39:26 +00:00
Vyacheslav Klochkov	f1a12fe0f5	Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer. Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26543 llvm-svn: 286626	2016-11-11 19:55:29 +00:00
Sanjay Patel	da0149dd74	[InstCombine] add tests to show size-increasing select transforms llvm-svn: 286619	2016-11-11 19:37:54 +00:00
Chad Rosier	811e76dbcd	[AArch64] Add test to show narrow zero store merging is disabled with strict align. NFC. llvm-svn: 286617	2016-11-11 19:25:48 +00:00
Geoff Berry	25fa4999ff	[AArch64] Fix bugs in isel lowering replaceSplatVectorStore. Summary: Fix off-by-one indexing error in loop checking that inserted value was a splat vector. Add code to check that INSERT_VECTOR_ELT nodes constructing the splat vector have the expected constant index values. Reviewers: t.p.northover, jmolloy, mcrosier Subscribers: aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D26409 llvm-svn: 286616	2016-11-11 19:25:20 +00:00
Evgeniy Stepanov	f48ffab554	[cfi] Implement cfi-icall using inline assembly. The current implementation is emitting a global constant that happens to evaluate to the same bytes + relocation as a jump instruction on X86. This does not work for PIE executables and shared libraries though, because we end up with a wrong relocation type. And it has no chance of working on ARM/AArch64 which use different relocation types for jump instructions (R_ARM_JUMP24) that is never generated for data. This change replaces the constant with module-level inline assembly followed by a hidden declaration of the jump table. Works fine for ARM/AArch64, but has some drawbacks. * Extra symbols are added to the static symbol table, which inflate the size of the unstripped binary a little. Stripped binaries are not affected. This happens because jump table declarations must be external (because their body is in the inline asm). * Original functions that were anonymous are now named <original name>.cfi, and it affects symbolization sometimes. This is necessary because the only user of these functions is the (inline asm) jump table, so they had to be added to @llvm.used, which does not allow unnamed functions. llvm-svn: 286611	2016-11-11 18:49:09 +00:00
Adrian Prantl	554fd99dd5	Revert "Use private linkage for MergedGlobals variables" on Darwin. This is a partial revert of r244615 (http://reviews.llvm.org/D11942), which caused a major regression in debug info quality. Turning the artificial __MergedGlobal symbols into private symbols (l__MergedGlobal) means that the linker will not include them in the symbol table of the final executable. Without a symbol table entry dsymutil is not able to process the debug info for any of the merged globals and thus drops the debug info for all of them. This patch is enabling the old behavior for all MachO targets while leaving all other targets unaffected. rdar://problem/29160481 https://reviews.llvm.org/D26531 llvm-svn: 286607	2016-11-11 17:50:09 +00:00
Nemanja Ivanovic	2efc3cb968	[PowerPC] Add vector conversion builtins to altivec.h - LLVM portion This patch corresponds to review: https://reviews.llvm.org/D26307 Adds all the intrinsics used for various conversion builtins that will be added to altivec.h. These are type conversions between various types of vectors. llvm-svn: 286596	2016-11-11 14:41:19 +00:00
John Brawn	3e0edbf269	Fix test/tools/gold/X86/thinlto_funcimport.ll on non-X86 hosts Pass -m elf_x86_64 to gold, as is done in other tests. llvm-svn: 286593	2016-11-11 14:12:15 +00:00
Chad Rosier	10c7aaaee9	[AArch64] Enable merging of adjacent zero stores for all subtargets. This optimization merges adjacent zero stores into a wider store. e.g., strh wzr, [x0] strh wzr, [x0, #2] ; becomes str wzr, [x0] e.g., str wzr, [x0] str wzr, [x0, #4] ; becomes str xzr, [x0] Previously, this was only enabled for Kryo and Cortex-A57. Differential Revision: https://reviews.llvm.org/D26396 llvm-svn: 286592	2016-11-11 14:10:12 +00:00
Ulrich Weigand	a0e7325023	[SystemZ] Support CL(G)T instructions This adds support for the compare logical and trap (memory) instructions that were added as part of the miscellaneous instruction extensions feature with zEC12. llvm-svn: 286587	2016-11-11 12:48:26 +00:00
Ulrich Weigand	92c2c672e5	[SystemZ] Support load-and-zero-rightmost-byte facility This adds support for the LZRF/LZRG/LLZRGF instructions that were added on z13, and uses them for code generation where appropriate. SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF over RISBG where both would be possible. llvm-svn: 286586	2016-11-11 12:46:28 +00:00
Ulrich Weigand	5dc7b67c62	[SystemZ] Use LLGT(R) instructions This adds support for the 31-to-64-bit zero extension instructions LLGT and LLGTR and uses them for code generation where appropriate. Since this operation can also be performed via RISBG, we have to update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT over RISBG in case both are possible. The patch includes some simplification to the tryRISBGZero code; this is not intended to cause any (further) functional change in codegen. llvm-svn: 286585	2016-11-11 12:43:51 +00:00
Simon Pilgrim	807f9cf243	[SelectionDAG] Add support for vector demandedelts in BSWAP opcodes llvm-svn: 286582	2016-11-11 11:51:29 +00:00
Simon Pilgrim	08dedfc589	[X86] Add knownbits vector BSWAP test In preparation for demandedelts support llvm-svn: 286579	2016-11-11 11:33:21 +00:00
Simon Pilgrim	813721e98a	[SelectionDAG] Add support for vector demandedelts in UREM/SREM opcodes llvm-svn: 286578	2016-11-11 11:23:43 +00:00
Simon Pilgrim	8bc531d349	[X86] Add knownbits vector UREM/SREM tests In preparation for demandedelts support llvm-svn: 286577	2016-11-11 11:11:40 +00:00
Simon Pilgrim	0652227814	[SelectionDAG] Add support for vector demandedelts in UDIV opcodes llvm-svn: 286576	2016-11-11 10:47:24 +00:00
Simon Pilgrim	da1a43e861	[X86] Add knownbits vector UDIV test In preparation for demandedelts support llvm-svn: 286575	2016-11-11 10:39:15 +00:00
Diana Picus	22274934f4	[ARM] Add plumbing for GlobalISel Add GlobalISel skeleton, up to the point where we can select a ret void. llvm-svn: 286573	2016-11-11 08:27:37 +00:00
Matthias Braun	325cd2c98a	ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps() addSchedBarrierDeps() is supposed to add use operands to the ExitSU node. The current implementation adds uses for calls/barrier instruction and the MBB live-outs in all other cases. The use operands of conditional jump instructions were missed. Also added code to macrofusion to set the latencies between nodes to zero to avoid problems with the fusing nodes lingering around in the pending list now. Differential Revision: https://reviews.llvm.org/D25140 llvm-svn: 286544	2016-11-11 01:34:21 +00:00
Stanislav Mekhanoshin	6fc8a1cdaa	Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies" This reverts commit r286171, it breaks piglit test fs-discard-exit-2 llvm-svn: 286530	2016-11-11 00:22:34 +00:00
Matthias Braun	f29b12dca8	ScheduleDAGInstrs: Ignore dependencies of constant physregs There is no need to track dependencies for constant physregs, as they don't change their value no matter in what order you read/write to them. Differential Revision: https://reviews.llvm.org/D26221 llvm-svn: 286526	2016-11-10 23:46:44 +00:00
Simon Pilgrim	38f0045cb0	[SelectionDAG] Add support for vector demandedelts in ADD/SUB opcodes llvm-svn: 286516	2016-11-10 22:41:49 +00:00
Justin Lebar	ea27ef6969	[LSR] Tweak loop-strength-reduce-crash test. Test-only change. Run opt instead of llc, and update the comment. llvm-svn: 286515	2016-11-10 22:37:13 +00:00
Peter Collingbourne	d93620bf4d	IR: Introduce inrange attribute on getelementptr indices. If the inrange keyword is present before any index, loading from or storing to any pointer derived from the getelementptr has undefined behavior if the load or store would access memory outside of the bounds of the element selected by the index marked as inrange. This can be used, e.g. for alias analysis or to split globals at element boundaries where beneficial. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html Differential Revision: https://reviews.llvm.org/D22793 llvm-svn: 286514	2016-11-10 22:34:55 +00:00
Simon Pilgrim	a0dee61df3	[X86] Updated knownbits vector ADD/SUB test In preparation for demandedelts support llvm-svn: 286513	2016-11-10 22:34:12 +00:00
Simon Pilgrim	8bbfacaf2c	[X86] Add knownbits vector ADD test llvm-svn: 286511	2016-11-10 22:21:04 +00:00
Simon Pilgrim	fe3a54371d	[SelectionDAG] Add support for splatted vectors in SUB opcode llvm-svn: 286509	2016-11-10 21:57:42 +00:00
Simon Pilgrim	7e0a4b8fdf	[X86] Add knownbits vector SUB test llvm-svn: 286508	2016-11-10 21:50:23 +00:00
Matthias Braun	9d62c5571b	RegisterCoalescer: Ignore interferences for constant physregs When copying to/from a constant register interferences can be ignored. Also update the documentation for isConstantPhysReg() to make it more obvious that this transformation is valid. Differential Revision: https://reviews.llvm.org/D26106 llvm-svn: 286503	2016-11-10 21:22:47 +00:00
Yaxun Liu	d6fbe65040	AMDGPU: Emit runtime metadata as a note element in .note section Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata. However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html). This patch lets AMDGPU backend emits runtime metadata as a note element in .note section. Differential Revision: https://reviews.llvm.org/D25781 llvm-svn: 286502	2016-11-10 21:18:49 +00:00
Adam Nemet	7da20c39ee	[OptDiag] Remove non-printable chars from function name The r283656 did this in the remark arguments. We also need to do this in the main function attribute as that is written to YAML as well. llvm-svn: 286482	2016-11-10 17:47:03 +00:00
Simon Pilgrim	d67af68f06	[SelectionDAG] Add support for vector demandedelts in TRUNCATE opcodes llvm-svn: 286481	2016-11-10 17:43:52 +00:00
Simon Pilgrim	e517f0a417	[X86] Add knownbits vector TRUNC test In preparation for demandedelts support llvm-svn: 286477	2016-11-10 17:24:33 +00:00
Teresa Johnson	a081145ebd	Restore part of "[ThinLTO] Prevent exporting of locals used/defined in module level asm" This restores the part of r286297 that didn't require adding a dependency from the Analysis to Object library. There are two parts to the original fix, and this will address the handling for the case where locals are used in module level asm. The part that requires functionality in libObject handles local defs in module level asm, and was reverted because our downstream build of clang builds lib/Bitcode into a single library, and this new dependency introduced a cycle there. I am trying to get that fixed (see D26502), so for now that change isn't being restored llvm-svn: 286475	2016-11-10 16:57:32 +00:00
Simon Pilgrim	ee187fd6e7	[SelectionDAG] Add support for vector demandedelts in MUL opcodes llvm-svn: 286471	2016-11-10 16:27:42 +00:00
Asaf Badouh	bb2338e939	reproducer for pr29002 https://reviews.llvm.org/D26449 llvm-svn: 286470	2016-11-10 16:27:27 +00:00
Tom Stellard	115a61560e	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464	2016-11-10 16:02:37 +00:00
Simon Pilgrim	2cf393c8fe	[X86] Add knownbits vector MUL test In preparation for demandedelts support llvm-svn: 286463	2016-11-10 15:57:33 +00:00
Simon Pilgrim	ca57e53ded	[SelectionDAG] Add support for vector demandedelts in SRA opcodes llvm-svn: 286461	2016-11-10 15:05:09 +00:00
Sanjay Patel	40d33e7554	[InstCombine] auto-generate better checks; NFC Note that the existing metadata checking was re-added by hand because the script doesn't currently know how to generate checks for lines outside of functions. llvm-svn: 286460	2016-11-10 14:58:17 +00:00
Simon Pilgrim	7be6d99442	[X86] Add knownbits vector arithmetic shift test In preparation for demandedelts support llvm-svn: 286457	2016-11-10 14:46:24 +00:00
Simon Pilgrim	37c9034bd6	[DAGCombiner] Correctly extract the ConstOrConstSplat shift value for SHL nodes We were failing to extract a constant splat shift value if the shifted value was being masked. The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this. llvm-svn: 286454	2016-11-10 14:35:09 +00:00
Chad Rosier	c16824d217	Remove unnecessary check prefix directives. NFC. llvm-svn: 286453	2016-11-10 14:28:44 +00:00
Simon Pilgrim	87f38fa85c	[DAGCombiner] Show missed opportunity to UNDEF out-of-range SHL Fails to match constant shift value due to presence of AND mask. llvm-svn: 286452	2016-11-10 14:19:45 +00:00
Tobias Grosser	455b9bd65c	[RegionInfo] Add three tests that include infinite loops These examples are variations that were inspired from a small subgraph taken from paper.ll which are interesting as they show certain issues with infinite loops. llvm-svn: 286450	2016-11-10 13:56:19 +00:00
Simon Pilgrim	3bf99c056a	[SelectionDAG] Add support for vector demandedelts in SHL/SRL opcodes llvm-svn: 286448	2016-11-10 13:52:42 +00:00
Simon Pilgrim	ede8ad7c5a	[X86] Add knownbits vector logical shift test In preparation for demandedelts support llvm-svn: 286447	2016-11-10 13:34:17 +00:00
Oliver Stannard	18ca2adf2d	[ARM] Thumb2 LDR (literal) should accept PC as the destination The version of this instruction with the .w suffix already correctly accepts this, but the alias without the .w did not. Differential Revision: https://reviews.llvm.org/D26499 llvm-svn: 286446	2016-11-10 13:20:41 +00:00
Craig Topper	bd298c37d1	[AVX-512] Allow legacy cvtpd2dq intrinsics to select EVEX encoded instruction when available. llvm-svn: 286435	2016-11-10 07:47:17 +00:00
Craig Topper	e0845d8e8c	[AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions. This allows these intrinsics to also use EVEX instructions. llvm-svn: 286434	2016-11-10 07:24:52 +00:00
Craig Topper	f37b9b9b5f	[X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available. llvm-svn: 286433	2016-11-10 06:45:39 +00:00
Craig Topper	924c5ec472	[AVX-512] Add test cases to show missed opportunities for using VALIGND/Q to handle shuffles. llvm-svn: 286425	2016-11-10 03:39:19 +00:00
Sanjay Patel	4e1b5a53c7	[InstCombine] avoid infinite loop from shuffle-extract-insert sequence (PR30923) Removing the limitation in visitInsertElementInst() causes several regressions because we're not prepared to fold sequences of shuffles or inserts and extracts separated by shuffles. Fixing that appears to be a difficult mission because we are purposely trying to avoid creating shuffles with arbitrary shuffle masks because some targets may choke on those. https://llvm.org/bugs/show_bug.cgi?id=30923 llvm-svn: 286423	2016-11-10 00:15:14 +00:00
Peter Collingbourne	32ab3a817d	Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86. Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand. llvm-svn: 286420	2016-11-09 23:53:43 +00:00
Dylan McKay	0d4778f841	[AVR] Add a selection of CodeGen tests Summary: This adds all of the CodeGen tests which currently pass. Reviewers: arsenm, kparzysz Subscribers: japaric, wdng Differential Revision: https://reviews.llvm.org/D26388 llvm-svn: 286418	2016-11-09 23:46:52 +00:00
Dylan McKay	3ffc449597	[AVR] Add all of the machine code test suite Summary: This adds all of the AVR machine code tests. Reviewers: arsenm, kparzysz Subscribers: wdng, japaric Differential Revision: https://reviews.llvm.org/D26387 llvm-svn: 286417	2016-11-09 23:46:25 +00:00
Tim Northover	a9105be437	GlobalISel: translate invoke and landingpad instructions Pretty bare-bones support for exception handling (no weird MSVC stuff, no SjLj etc), but it should get things going. llvm-svn: 286407	2016-11-09 22:39:54 +00:00
Dehao Chen	06e079a530	Update vectorization debug info unittest. Summary: The change will test the change in r286159. The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location inside the loop header from !21 to !22 so that it will reflect the correct location. Reviewers: probinson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26428 llvm-svn: 286403	2016-11-09 22:25:19 +00:00
Sanjay Patel	600631daf3	[InstCombine] regenerate checks; NFC llvm-svn: 286402	2016-11-09 22:21:58 +00:00
Sanjay Patel	16da6c466f	[InstCombine] regenerate checks; NFC llvm-svn: 286399	2016-11-09 21:41:34 +00:00
Krzysztof Parzyszek	a540997ce4	[Hexagon] Separate Hexagon subreg indices for different register classes For pairs of 32-bit registers: isub_lo, isub_hi. For pairs of vector registers: vsub_lo, vsub_hi. Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg) that returns the appropriate subreg index for RegClass. llvm-svn: 286377	2016-11-09 16:19:08 +00:00
Krzysztof Parzyszek	601d7eb11a	[Hexagon] Eliminate Insert4 pseudo-instruction, use combines instead llvm-svn: 286368	2016-11-09 14:16:29 +00:00
Alexandros Lamprineas	0ee3ec2fe4	[ARM] Loop Strength Reduction crashes when targeting ARM or Thumb. Scalar Evolution asserts when not all the operands of an Add Recurrence Expression are loop invariants. Loop Strength Reduction should only create affine Add Recurrences, so that both the start and the step of the expression are loop invariants. Differential Revision: https://reviews.llvm.org/D26185 llvm-svn: 286347	2016-11-09 08:53:07 +00:00
Craig Topper	f334ac19ad	[AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32 This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions. If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result. It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result. Differential Revision: https://reviews.llvm.org/D26331 llvm-svn: 286345	2016-11-09 07:48:51 +00:00
Craig Topper	731bf9c5d6	[X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ. Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26330 llvm-svn: 286344	2016-11-09 07:31:32 +00:00
Craig Topper	ef1807fb73	[AVX-512] Add more varied alignments to tests for storing the lower 128-bits of a 256 or 512-bit subvector extract. llvm-svn: 286343	2016-11-09 05:38:47 +00:00
Craig Topper	28e3dfc02b	[AVX-512] Use alignedstore256 in patterns that look for stores of the lower 256-bits of a 512-bit vector to use a 256-bit aligned store. Previously we were only checking for 16 byte alignment instead of 32 byte alignment. Fixes PR30947. llvm-svn: 286342	2016-11-09 05:31:57 +00:00
Craig Topper	abf5041537	[AVX-512] Add test cases to demonstrate PR30947. We accidentally use 32 byte aligned store instructions when the original store was only 16 byte aligned if the store is from the lower bits of a subvector extract. llvm-svn: 286341	2016-11-09 05:31:53 +00:00
Craig Topper	5c842be9a0	[AVX-512] Make VBMI instruction set enabling imply that the BWI instruction set is also enabled. Summary: This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912. Reviewers: delena, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26322 llvm-svn: 286339	2016-11-09 04:50:48 +00:00
Mehdi Amini	b6a11a7879	Revert "[ThinLTO] Prevent exporting of locals used/defined in module level asm" This reverts commit r286297. Introduces a dependency from libAnalysis to libObject, which I missed during the review. llvm-svn: 286329	2016-11-09 01:45:13 +00:00
Dehao Chen	947dbe1254	Enable Loop Sink pass for functions that have profile. Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code. Reviewers: davidxl, hfinkel Subscribers: mehdi_amini, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26155 llvm-svn: 286325	2016-11-09 00:58:19 +00:00
Peter Collingbourne	58f7f0759f	Bitcode: Change the BitcodeReader to use llvm::Error internally. Differential Revision: https://reviews.llvm.org/D26430 llvm-svn: 286323	2016-11-09 00:51:04 +00:00
Sanjay Patel	e104554412	[ValueTracking] recognize obfuscated variants of umin/umax The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs. If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares. I.e., running the optimizer pessimizes the codegen for this case without this patch: define <4 x i32> @umax_vec(<4 x i32> %x) { %cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647> %sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647> ret <4 x i32> %sel } $ ./opt umax.ll -S | ./llc -o - -mattr=avx vpmaxud LCPI0_0(%rip), %xmm0, %xmm0 $ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx vpxor %xmm1, %xmm1, %xmm1 vpcmpgtd %xmm0, %xmm1, %xmm1 vmovaps LCPI0_0(%rip), %xmm2 ## xmm2 = [2147483647,2147483647,2147483647,2147483647] vblendvps %xmm1, %xmm0, %xmm2, %xmm0 Differential Revision: https://reviews.llvm.org/D26096 llvm-svn: 286318	2016-11-09 00:24:44 +00:00
Sanjay Patel	4e9d6cd354	[InstCombine] fix profitability equation for max-of-nots transform As the test change shows, we can increase the critical path by adding a 'not' instruction, so make sure that we're actually removing an instruction if we do this transform. This transform could also cause us to miss folds of min/max pairs. llvm-svn: 286315	2016-11-09 00:13:11 +00:00
Adrian Prantl	3502f2089c	Emit the DW_AT_type for a C++ static member definition if it is more specific than the one in its DW_AT_specification. If a static member is an array, the translation unit containing the member definition may have a more specific type (including its length) than TUs only seeing the class declaration. This patch adds a DW_AT_type to the member's DW_TAG_variable in addition to the DW_AT_specification in these cases. The member type in the DW_AT_specification still shows the more generic type (without the length) to avoid defeating type uniquing. The DWARF standard discourages “duplicating” a DW_AT_type in a member variable definition but doesn’t explicitly forbid it. Having the more specific type (with the array length) available is what allows the debugger to print the contents of a static array member variable. https://reviews.llvm.org/D26368 rdar://problem/28706946 llvm-svn: 286302	2016-11-08 22:11:38 +00:00
Teresa Johnson	6955feebf3	[ThinLTO] Prevent exporting of locals used/defined in module level asm Summary: This patch uses the same approach added for inline asm in r285513 to similarly prevent promotion/renaming of locals used or defined in module level asm. All static global values defined in normal IR and used in module level asm should be included on either the llvm.used or llvm.compiler.used global. The former were already being flagged as NoRename in the summary, and I've simply added llvm.compiler.used values to this handling. Module level asm may also contain defs of values. We need to prevent export of any refs to local values defined in module level asm (e.g. a ref in normal IR), since that also requires renaming/promotion of the local. To do that, the summary index builder looks at all values in the module level asm string that are not marked Weak or Global, which is exactly the set of locals that are defined. A summary is created for each of these local defs and flagged as NoRename. This required adding handling to the BitcodeWriter to look at GV declarations to see if they have a summary (rather than skipping them all). Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to ensure that an MCAsmParser is available, otherwise the module asm parse would silently fail. Initialized the asm parser in the opt tool for use in testing this fix. Fixes PR30610. Reviewers: mehdi_amini Subscribers: johanengelen, krasin, llvm-commits Differential Revision: https://reviews.llvm.org/D26146 llvm-svn: 286297	2016-11-08 21:53:35 +00:00
Kuba Brecka	a49dcbb743	[asan] Speed up compilation of large C++ stringmaps (tons of allocas) with ASan This addresses PR30746, <https://llvm.org/bugs/show_bug.cgi?id=30746>. The ASan pass iterates over entry-block instructions and checks each alloca whether it's in NonInstrumentedStaticAllocaVec, which is apparently slow. This patch gathers the instructions to move during visitAllocaInst. Differential Revision: https://reviews.llvm.org/D26380 llvm-svn: 286296	2016-11-08 21:30:41 +00:00
Andrew Kaylor	9604f34996	[BasicAA] Teach BasicAA to handle the inaccessiblememonly and inaccessiblemem_or_argmemonly attributes Differential Revision: https://reviews.llvm.org/D26382 llvm-svn: 286294	2016-11-08 21:07:42 +00:00
Ulrich Weigand	05effca2d8	[SystemZ] Add missing FP extension instructions This completes assembler / disassembler support for all BFP instructions provided by the floating-point extensions facility. The instructions added here are not currently used for codegen. llvm-svn: 286285	2016-11-08 20:18:41 +00:00
Ulrich Weigand	4006e09d1d	[SystemZ] Add program mask and addressing mode instructions Add several instructions that operate on the program mask or the addressing mode. These are not really needed for code generation under Linux, but are provided for completeness for the assembler/disassembler. llvm-svn: 286284	2016-11-08 20:17:02 +00:00
Ulrich Weigand	fffc7110d6	[SystemZ] Model access registers as LLVM registers Add the 16 access registers as LLVM registers. This allows removing a lot of special cases in the assembler and disassembler where we were handling access registers; this can all just use the generic register code now. Also add a bunch of instructions to operate on access registers, for assembler/disassembler use only. No change in code generation intended. llvm-svn: 286283	2016-11-08 20:15:26 +00:00
Dan Gohman	e81021a5cb	[WebAssembly] Convert stackified IMPLICIT_DEF into constant 0. Since IMPLICIT_DEF instructions are omitted in the output, when the output of an IMPLICIT_DEF instruction is stackified, the resulting register lacks an explicit push, leading to a push/pop mismatch. Fix this by converting such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit pushes. llvm-svn: 286274	2016-11-08 19:40:38 +00:00
Davide Italiano	1e77aaca8a	[LibcallsShrinkWrap] This pass doesn't preserve the CFG. For example, it invalidates the domtree, causing assertions in later passes which need dominator infos. Make it preserve GlobalsAA, as suggested by Eli. Differential Revision: https://reviews.llvm.org/D26381 llvm-svn: 286271	2016-11-08 19:18:20 +00:00
Nirav Dave	e833c6c61a	[MC][AArch64] Cleanup end-of-line parsing in AArch64 AsmParser. Reviewers: t.p.northover, rengolin Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D26309 llvm-svn: 286265	2016-11-08 18:31:04 +00:00
Ulrich Weigand	d2148caffc	[SystemZ] Refactor branch and conditional instruction patterns Rework patterns for branches, call & return instructions, compare-and-branch, compare-and-trap, and conditional move instructions. In particular, simplify creation of patterns for the extended opcodes of instructions that take a CC mask. Also, use semantical instruction classes for all the instructions instead of open-coding them in SystemZInstrInfo.td. Adds a couple of the basic branch instructions (that are unused for codegen) for the assembler/disassembler. llvm-svn: 286263	2016-11-08 18:30:50 +00:00
Sanjay Patel	8625c43662	[InstCombine] move min/max tests to min/max test file; NFC llvm-svn: 286256	2016-11-08 18:12:19 +00:00
Sanjay Patel	686cf49f7a	[InstCombine] update checks; NFC llvm-svn: 286255	2016-11-08 18:06:14 +00:00
Tim Northover	5f7dea85c2	GlobalISel: support selecting fpext/fptrunc instructions on AArch64. llvm-svn: 286253	2016-11-08 17:44:07 +00:00
Anton Korobeynikov	243a4700ce	Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes. Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before. Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23718 llvm-svn: 286252	2016-11-08 17:19:59 +00:00
Simon Pilgrim	bdb3c38157	[X86][SSE] Regenerate test (just adds missing header) llvm-svn: 286241	2016-11-08 15:42:49 +00:00
Simon Pilgrim	778596bf59	[TargetLowering] Fix undef vector element issue with true/false result handling Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching. The comment said we shouldn't handle constant splat vectors with undef elements. But the actual code was returning false if the build vector contained no undef elements.... This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs). The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed. Differential Revision: https://reviews.llvm.org/D26031 llvm-svn: 286238	2016-11-08 15:07:01 +00:00
Pablo Barrio	9f45254138	[JumpThreading] Unfold selects that depend on the same condition Summary: These are good candidates for jump threading. This enables later opts (such as InstCombine) to combine instructions from the selects with instructions out of the selects. SimplifyCFG will fold the select again if unfolding wasn't worth it. Patch by James Molloy and Pablo Barrio. Reviewers: rengolin, haicheng, sebpop Subscribers: jojo, jmolloy, llvm-commits Differential Revision: https://reviews.llvm.org/D26391 llvm-svn: 286236	2016-11-08 14:53:30 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
Roger Ferrer Ibanez	80c0f33c29	[AArch64] Fix incorrect CSEL node created Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64 transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to "a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have the same type which can lead to a wrong CSEL node that crashes later due to nonsensical copies. Differential Revision: https://reviews.llvm.org/D26394 llvm-svn: 286231	2016-11-08 13:34:41 +00:00
Simon Dardis	e7cc54058d	[mips] Re-enable small data section test. llvm-svn: 286230	2016-11-08 13:03:45 +00:00
Craig Topper	c6a0339fb0	[AVX-512] Add an avx512f without avx512vl command line to vec_fp_to_int.ll and regenerate. This will make a change in a future patch easier to see. NFC llvm-svn: 286216	2016-11-08 06:58:53 +00:00
Tim Northover	9ac0eba672	GlobalISel: support selecting G_SELECT on AArch64. llvm-svn: 286185	2016-11-08 00:45:29 +00:00
Tim Northover	7d88da6a46	GlobalISel: constrain PHI registers on AArch64. Self-referencing PHI nodes need their destination operands to be constrained because nothing else is likely to do so. For now we just pick a register class naively. Patch mostly by Ahmed again. llvm-svn: 286183	2016-11-08 00:34:06 +00:00
Chad Rosier	583a307e17	[AArch64] Remove dead check prefixes after r286110. NFC. llvm-svn: 286174	2016-11-07 23:13:59 +00:00
Chad Rosier	d8447a7d30	[AArch64] Rename test to reflect changes after r286110. NFC. llvm-svn: 286173	2016-11-07 23:13:55 +00:00
Stanislav Mekhanoshin	92e01ee90b	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a forward propagation of a v_cmp 64 bit result to a user is implemented. Additional side effect of this is that we may consume fewer VGPRs at the cost of more SGPRs if multiple conditions need to be held, and that is a clear win in most cases. llvm-svn: 286171	2016-11-07 23:04:50 +00:00
Adam Nemet	b103fc52d3	[OptDiag, opt-viewer] Save callee's location and display as link With this we get a new field in the YAML record if the value being streamed out has a debug location. For examples, please see the changes to the tests. This is then used in opt-viewer to display a link for the callee function in the inlining remarks. Differential Revision: https://reviews.llvm.org/D26366 llvm-svn: 286169	2016-11-07 22:41:13 +00:00
Sanjin Sijaric	6f020d91a1	[AArch64] Transfer memory operands when lowering vector load/store intrinsics Summary: Some vector loads and stores generated from AArch64 intrinsics alias each other unnecessarily, preventing better scheduling. We just need to transfer memory operands during lowering. Reviewers: mcrosier, t.p.northover, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26313 llvm-svn: 286168	2016-11-07 22:39:02 +00:00
Derek Schuff	0d41b7b3f3	[WebAssembly] Emit a BasePointer when we have overly-aligned stack objects Because we shift the stack pointer by an unknown amount, we need an additional pointer. In the case where we have variable-size objects as well, we can't reuse the frame pointer, thus three pointers. Patch by Jacob Gravelle Differential Revision: https://reviews.llvm.org/D26263 llvm-svn: 286160	2016-11-07 22:00:48 +00:00
Sanjoy Das	e06ef141fc	Avoid tail recursion elimination across calls with operand bundles Summary: In some specific scenarios with well understood operand bundle types (like `"deopt"`) it may be possible to go ahead and convert recursion to iteration, but TailRecursionElimination does not have that logic today so avoid doing the right thing for now. I need some input on whether `"funclet"` operand bundles should also block tail recursion elimination. If not, I'll allow TRE across calls with `"funclet"` operand bundles and add a test case. Reviewers: rnk, majnemer, nlewycky, ahatanak Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26270 llvm-svn: 286147	2016-11-07 21:01:49 +00:00
Kuba Brecka	44e875ad5b	[tsan] Cast floating-point types correctly when instrumenting atomic accesses, LLVM part Although rare, atomic accesses to floating-point types seem to be valid, i.e. `%a = load atomic float ...`. The TSan instrumentation pass however tries to emit inttoptr, which is incorrect, we should use a bitcast here. Anyway, IRBuilder already has a convenient helper function for this. Differential Revision: https://reviews.llvm.org/D26266 llvm-svn: 286135	2016-11-07 19:09:56 +00:00
Matt Arsenault	f530e8b3f0	AMDGPU: Remove unnecessary and on conditional branch The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134	2016-11-07 19:09:33 +00:00
Matt Arsenault	52f14ec596	AMDGPU: Preserve vcc undef flags when inverting branch If the branch was on a read-undef of vcc, passes that used analyzeBranch to invert the branch condition wouldn't preserve the undef flag resulting in a verifier error. Fixes verifier failures in a future commit. Also fix verifier error when inserting copy for vccz corruption bug. llvm-svn: 286133	2016-11-07 19:09:27 +00:00
Benjamin Kramer	1697d39eef	[MemCpyOpt] Don't emit IR in an unspecified order Argument evaluation order is one of the edge cases where Clang differs from GCC, yielding different IR depending on which compiler LLVM was built with. Make the order deterministic and tune the test to actually verify the order instead of trying to hide it. llvm-svn: 286126	2016-11-07 17:47:28 +00:00
Richard Smith	857efb0880	Add -O0 support for @llvm.invariant.group.barrier by discarding it if it gets to ISel. Differential Revision: https://reviews.llvm.org/D26292 llvm-svn: 286119	2016-11-07 16:47:20 +00:00
Sanjay Patel	86408a8048	[InstCombine] allow splat vector folds in adjustMinMax() (retry r285732) This was reverted at r285866 because there was a crash handling a scalar select of vectors. I added a check for that pattern and a test case based on the example provided in the post-commit thread for r285732. llvm-svn: 286113	2016-11-07 15:52:45 +00:00
Amara Emerson	614b44bbe9	This patch adds support for 16 bit floating point registers to the inline asm register selection on AArch64. Without this patch, register allocation for the example below fails. define half @test(half %a1, half %a2) #0 { entry: %0 = tail call half asm "sqrshl ${0:h}, ${1:h}, ${2:h}", "=w,w,w" (half %a1, half %a2) #1 ret half %0 } Patch by Florian Hahn. Differential Revision: https://reviews.llvm.org/D25080 llvm-svn: 286111	2016-11-07 15:42:12 +00:00
Chad Rosier	d6daac4746	[AArch64] Removed the narrow load merging code in the ld/st optimizer. This feature has been disabled for some time now, so remove cruft. Differential Revision: https://reviews.llvm.org/D26248 llvm-svn: 286110	2016-11-07 15:27:22 +00:00
Chad Rosier	8f348017b0	[AliasSetTracker] Make AST smarter about assume intrinsics that don't actually affect memory. Differential Revision: https://reviews.llvm.org/D26252 llvm-svn: 286108	2016-11-07 14:11:45 +00:00
James Molloy	b03e0879fc	[Thumb1] Move padding earlier when synthesizing TBBs off of the PC When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding. If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset. In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4. llvm-svn: 286107	2016-11-07 13:38:21 +00:00
Simon Pilgrim	b56c731f18	[X86][AVX512] Add AVX512VL/AVX512BWVL vector truncation tests llvm-svn: 286105	2016-11-07 13:34:29 +00:00
Simon Pilgrim	02666ac9c3	[X86][SSE] Drop unnecessary -mcpu argument from trunc tests cpu/triple duplication llvm-svn: 286104	2016-11-07 13:28:20 +00:00
Craig Topper	b110e04851	[AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext. This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select. llvm-svn: 286092	2016-11-07 02:12:57 +00:00
Craig Topper	7e545335d6	[AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to legacy intrinsics and a select. llvm-svn: 286089	2016-11-07 00:13:39 +00:00
Saleem Abdulrasool	804e12eeb5	ARM: lower fpowi appropriately for Windows ARM This handles the last case of the builtin function calls that we would generate code which differed from Microsoft's ABI. Rather than generating a call to `__pow{d,s}i2` we now promote the parameter to a float or double and invoke `powf` or `pow` instead. Addresses PR30825! llvm-svn: 286082	2016-11-06 19:46:54 +00:00
Simon Pilgrim	39df78e384	[SelectionDAG] Add support for vector demandedelts in XOR opcodes llvm-svn: 286075	2016-11-06 16:49:19 +00:00
Simon Pilgrim	3ac353cb51	[X86] Add knownbits vector xor test In preparation for demandedelts support llvm-svn: 286074	2016-11-06 16:36:29 +00:00
Craig Topper	46de41330c	[AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead upgrade them to a select and the older AVX2 intrinsic. llvm-svn: 286073	2016-11-06 16:29:19 +00:00
Craig Topper	af9b3fe752	[AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286072	2016-11-06 16:29:14 +00:00
Simon Pilgrim	dd4809a603	[SelectionDAG] Add support for vector demandedelts in OR opcodes llvm-svn: 286071	2016-11-06 16:29:09 +00:00
Craig Topper	c9467ed31e	[AVX-512] Remove intrinsics for 128/256-bit masked shift by single element in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. llvm-svn: 286070	2016-11-06 16:29:08 +00:00
Craig Topper	1b468b4e3a	[AVX-512] Remove a 512-bit test case from the avx512vl test file. It already exists in the avx512f test file. llvm-svn: 286069	2016-11-06 16:29:03 +00:00
Simon Pilgrim	c104185580	[X86] Add knownbits vector or test In preparation for demandedelts support llvm-svn: 286068	2016-11-06 16:05:59 +00:00
Craig Topper	6b3e7b47d8	[X86] Add a few more fptoui test cases to the vec_fp_to_int.ll. The codegen for these test cases will be improved for AVX512 in a future commit. llvm-svn: 286063	2016-11-06 07:50:25 +00:00
Craig Topper	5471fc29e4	[AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 addr:)) -> VCVTPS2PDZ128rm llvm-svn: 286059	2016-11-06 04:12:52 +00:00
Craig Topper	bd156195b0	[AVX-512] Add avx512vl command line to the fpext test and add -show-mc-encoding to show where we aren't using EVEX instructions. llvm-svn: 286058	2016-11-06 04:12:49 +00:00
Craig Topper	1162857ec4	[AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available. llvm-svn: 286057	2016-11-06 04:12:46 +00:00
Craig Topper	9a4a3af5dd	[AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they can use EVEX instructions when available. llvm-svn: 286056	2016-11-06 04:12:42 +00:00
Craig Topper	a4a51f1afe	[AVX-512] Add -show-mc-encoding to legacy vector intrinsic tests so we can see when VEX or EVEX encoded instructions are being emitted. Make sure the tests all have an avx2 command line and an skx command line. llvm-svn: 286055	2016-11-06 02:03:58 +00:00
Justin Lebar	54b0be048e	[LoopStrengthReduce] Don't use a DenseSet<int64_t> when we might add any valid int64_t to the set. Summary: SmallSetVector uses DenseSet, but that means we need to reserve some values for the empty and tombstone keys. It seems to me we should have a general way to let us store full-range ints inside of DenseSets, and furthermore that we probably shouldn't silently let you add ints into DenseSets without explicitly promising that they're in range. But that's a battle for another day; for now, just fix this code, since we currently do something Very Bad when compiling ffmpeg. Fixes PR30914. Reviewers: jeremyhu Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D26323 llvm-svn: 286038	2016-11-05 16:47:25 +00:00
Krzysztof Parzyszek	b7eb7fc892	[Hexagon] Account for <def,read-undef> when validating moves for predication llvm-svn: 286009	2016-11-04 20:41:03 +00:00
Weiming Zhao	6100118a52	Fix 24560: assembler does not share constant pool for same constants Summary: This patch returns the same label if the CP entry with the same value has been created. Reviewers: eli.friedman, rengolin, jmolloy Subscribers: majnemer, jmolloy, llvm-commits Differential Revision: https://reviews.llvm.org/D25804 llvm-svn: 286006	2016-11-04 19:17:32 +00:00
NAKAMURA Takumi	b4eef1fa4a	llvm/test/Transforms/DCE/calls-errno.ll: Suppress checking @pow(+0,-1). It depends on host's pow(3), and mingw's pow doesn't raise any errors, just returns +INF. llvm-svn: 286005	2016-11-04 18:50:45 +00:00
Zvi Rackover	85bc64c734	[X86] Broadcast from memory instructions aren't unfoldable Broadcast from memory instructions should be treated as moves. They can't be unfolded. Fixes pr30693. llvm-svn: 285998	2016-11-04 15:15:19 +00:00
Zvi Rackover	1522b33195	Add bugpoint-reduced reproducer for pr30693 llvm-svn: 285997	2016-11-04 14:53:22 +00:00
Tom Stellard	2d2d33f1dc	Revert "AMDGPU: Add VI i16 support" This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995	2016-11-04 13:06:34 +00:00
Weiming Zhao	962eaaea9c	[Cortex-M0] Atomic lowering Summary: ARMv6m supports dmb etc fence instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls. Reviewers: rengolin, efriedma, t.p.northover, jmolloy Subscribers: efriedma, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D26120 llvm-svn: 285969	2016-11-03 21:49:08 +00:00
Kevin Enderby	7747cb55dc	Add support for the ARM_THREAD_STATE64 and in llvm-objdump for Mach-O files add the printing of the ARM_THREAD_STATE64 in the same format as otool-classic(1) on darwin. To do this the 64-bit ARM general thread state needed to be defined in include/llvm/Support/MachO.h . rdar://28985800 llvm-svn: 285967	2016-11-03 20:51:28 +00:00
Adrian Prantl	dbfda63695	Add DWARF debug info support for C++11 inline namespaces. This implements the DWARF 5 DW_AT_export_symbols feature: http://dwarfstd.org/ShowIssue.php?issue=141212.1 <rdar://problem/18616046> llvm-svn: 285959	2016-11-03 19:42:02 +00:00
Rafael Espindola	ed1395a792	Add error handling to getEntry. Issue found by inspection. llvm-svn: 285951	2016-11-03 18:05:33 +00:00
Rafael Espindola	6a4949756a	Replace a report_fatal_error with an ErrorOr. llvm-svn: 285942	2016-11-03 17:28:33 +00:00
Tom Stellard	2b3379cdff	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939	2016-11-03 17:13:50 +00:00
Davide Italiano	a1f241d1c3	Make this test Windows-only (try to placate buildbots). llvm-svn: 285931	2016-11-03 16:43:10 +00:00
Alexander Timofeev	f867a40bf6	[AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads. Change exploits the fact that LDS reads may be reordered even if they access the same location. Prior to the change, the algorithm immediately stops as soon as any memory access is encountered between loads that are expected to be merged together, although a Read-After-Read conflict cannot affect execution correctness. Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%. Also improvement expected on any massive sequences of reads from LDS. Differential Revision: https://reviews.llvm.org/D25944 llvm-svn: 285919	2016-11-03 14:37:13 +00:00
James Molloy	e7d97368f2	Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently" This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 . llvm-svn: 285912	2016-11-03 14:08:01 +00:00
Rafael Espindola	7b2750afa5	Replace a report_fatal_error with an ErrorOr. llvm-svn: 285910	2016-11-03 13:58:15 +00:00
James Molloy	b60d8b1987	[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk. For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)). 1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS. 2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS. 3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS). 4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask. 1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win. llvm-svn: 285893	2016-11-03 10:18:20 +00:00
Craig Topper	7b9cc1474f	[AVX-512] Use 'vnot' instead of 'not' in patterns involving vXi1 vectors. This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR. Reviewers: delena, igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26134 llvm-svn: 285878	2016-11-03 06:04:28 +00:00
Elena Demikhovsky	caaceef4b3	Expandload and Compressstore intrinsics 2 new intrinsics covering AVX-512 compress/expand functionality. This implementation includes syntax, DAG builder, operation lowering and tests. Does not include: handling of illegal data types, codegen prepare pass and the cost model. llvm-svn: 285876	2016-11-03 03:23:55 +00:00
Teresa Johnson	0515fb8d4b	[ThinLTO] Handle distributed backend case when doing renaming Summary: The recent change I made to consult the summary when deciding whether to rename (to handle inline asm) in r285513 broke the distributed build case. In a distributed backend we will only have a portion of the combined index, specifically for imported modules we only have the summaries for any imported definitions. When renaming on import we were asserting because no summary entry was found for a local reference being linked in (def wasn't imported). We only need to consult the summary for a renaming decision for the exporting module. For imports, we would have prevented importing any references to NoRename values already. Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26250 llvm-svn: 285871	2016-11-03 01:07:16 +00:00
Greg Bedwell	5fc6f94591	Revert "[InstCombine] allow splat vector folds in adjustMinMax()" This reverts commit r285732. This change introduced a new assertion failure in the following testcase at -O2: typedef short __v8hi __attribute__((__vector_size__(16))); __v8hi foo(__v8hi &V1, __v8hi &V2, unsigned mask) { __v8hi Result = V1; if (mask & 0x80) Result[0] = V2[0]; return Result; } llvm-svn: 285866	2016-11-02 23:17:05 +00:00
Adrian McCarthy	4333daab1c	Emit S_COMPILE3 record once per TU rather than once per function This has some ripple effects in several tests. llvm-svn: 285862	2016-11-02 21:30:35 +00:00
Kevin Enderby	fbebe1632a	Add the rest of the additional error checks for invalid Mach-O files when the offsets and sizes of an element of the Mach-O file overlaps with another element in the Mach-O file. Some other tests for malformed Mach-O files now run into these checks so their tests were also adjusted. llvm-svn: 285860	2016-11-02 21:08:39 +00:00
Davide Italiano	807c699bbb	[RuntimeDyld] Move an X86 only test to the correct directory. This is an attempt to placate the bots after r285841. llvm-svn: 285859	2016-11-02 21:05:42 +00:00
Eli Friedman	b6befc3bc4	DCE math library calls with a constant operand. On platforms which use -fmath-errno, math libcalls without any uses require some extra checks to figure out if they are actually dead. Fixes https://llvm.org/bugs/show_bug.cgi?id=30464 . Differential Revision: https://reviews.llvm.org/D25970 llvm-svn: 285857	2016-11-02 20:48:11 +00:00
Vedant Kumar	5a0e92b04c	[llvm-cov] Turn line numbers in html reports into clickable links llvm-svn: 285853	2016-11-02 19:44:13 +00:00
Krzysztof Parzyszek	ead77016d8	[Hexagon] Remove registers coalesced in expand-condsets from live intervals llvm-svn: 285846	2016-11-02 17:59:54 +00:00
Artem Tamazov	e8bb4bcafc	[AMDGPU][mc] Improve test of special asm symbols. Test simplified. Coverage extended. Differential Revision: https://reviews.llvm.org/D26198 llvm-svn: 285844	2016-11-02 17:45:58 +00:00
Davide Italiano	6b2bba14a9	[lli/COFF] Set the correct alignment for common symbols Otherwise we set it always to zero, which is not correct, and we assert inside alignTo (Assertion failed: Align != 0u && "Align can't be 0."). Differential Revision: https://reviews.llvm.org/D26173 llvm-svn: 285841	2016-11-02 17:32:19 +00:00
Matt Arsenault	bf9ee26aea	AMDGPU: Cleanup some xfailed tests Some of these are already fixed or tested somewhere else. llvm-svn: 285840	2016-11-02 17:24:54 +00:00
Zachary Turner	7251ede7c5	Add CodeViewRecordIO for reading and writing. Using a pattern similar to that of YamlIO, this allows us to have a single codepath for translating codeview records to and from serialized byte streams. The current patch only hooks this up to the reading of CodeView type records. A subsequent patch will hook it up for writing of CodeView type records, and then a third patch will hook up the reading and writing of CodeView symbols. Differential Revision: https://reviews.llvm.org/D26040 llvm-svn: 285836	2016-11-02 17:05:19 +00:00
Nicolai Haehnle	368972c3b3	AMDGPU: Allow additional implicit operands on MOVRELS instructions Summary: The post-RA scheduler occasionally uses additional implicit operands when the vector implicit operand as a whole is killed, but some subregisters are still live because they are directly referenced later. Unfortunately, this seems incredibly subtle to reproduce. Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test and others. Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25656 llvm-svn: 285835	2016-11-02 17:03:11 +00:00
Nirav Dave	0a392a8e7f	[ARM][MC] Cleanup ARM Target Assembly Parser Summary: Correctly parse end-of-statement tokens and handle preprocessor end-of-line comments in ARM assembly processor. Reviewers: rnk, majnemer Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26152 llvm-svn: 285830	2016-11-02 16:22:51 +00:00
Matt Arsenault	44deb7914e	BranchRelaxation: Fix computing indirect branch block size llvm-svn: 285828	2016-11-02 16:18:29 +00:00
Adrian Prantl	56527dc804	Emit DW_OP_piece also if the previous value was a constant. This fixes a bug in the DWARF backend. llvm-svn: 285826	2016-11-02 16:12:16 +00:00
Rafael Espindola	25be8c8856	Avoid a report_fatal_error in sections(). Have it return an ErrorOr<Range> and delete section_begin and section_end. llvm-svn: 285807	2016-11-02 14:10:57 +00:00
Bjorn Pettersson	7424c8ccd1	[Reassociate] Skip analysis of dead code to avoid infinite loop. Summary: It was detected that the reassociate pass could enter an infinite loop when analysing dead code. Simply skipping the analysis of basic blocks that are dead avoids such problems (and as a side effect we avoid spending time on optimising dead code). The solution is using the same Reverse Post Order ordering of the basic blocks when doing the optimisations, as when building the precalculated rank map. A nice side-effect of this solution is that we now know that we only try to do optimisations for blocks with ranked instructions. Fixes https://llvm.org/bugs/show_bug.cgi?id=30818 Reviewers: llvm-commits, davide, eli.friedman, mehdi_amini Subscribers: dberlin Differential Revision: https://reviews.llvm.org/D26154 llvm-svn: 285793	2016-11-02 08:55:19 +00:00
Peter Collingbourne	ff2c2ec6b2	Bitcode: Check file size before reading bitcode header. Should unbreak ocaml binding tests. Also added an llvm-dis test that checks for the same thing. llvm-svn: 285777	2016-11-02 00:39:11 +00:00
Peter Collingbourne	028eb5a3f8	Bitcode: Change reader interface to take memory buffers. As proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106595.html This change also fixes an API oddity where BitstreamCursor::Read() would return zero for the first read past the end of the bitstream, but would report_fatal_error for subsequent reads. Now we always report_fatal_error for all reads past the end. Updated clients to check for the end of the bitstream before reading from it. I also needed to add padding to the invalid bitcode tests in test/Bitcode/. This is because the streaming interface was not checking that the file size is a multiple of 4. Differential Revision: https://reviews.llvm.org/D26219 llvm-svn: 285773	2016-11-02 00:08:19 +00:00
Matt Arsenault	663ab8c119	AMDGPU: Use brev for materializing SGPR constants This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765	2016-11-01 23:14:20 +00:00
Matt Arsenault	3d463193a9	AMDGPU: Default to using scalar mov to materialize immediate This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762	2016-11-01 22:55:07 +00:00
Rafael Espindola	7909e22c7c	Don't compute DotShstrtab eagerly. This saves a field that is not always used. It also avoids failing a program that doesn't need the section names. llvm-svn: 285753	2016-11-01 21:33:55 +00:00
Rafael Espindola	120dca3b63	Use the existing std::error_code out parameter. This avoids calling exit with a partially constructed object. llvm-svn: 285738	2016-11-01 20:24:22 +00:00

41169 Commits