llvm-project

Commit Graph

Author	SHA1	Message	Date
Wei Mi	785858cf6c	Recommit "Use ValueOffsetPair to enhance value reuse during SCEV expansion". The fix for PR28705 will be committed consecutively. In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion. However, const folding and sext/zext distribution can make the reuse still difficult. A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and S1 = S2 + C_a S3 = S2 + C_b where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused by the fact that S3 is generated from S1 after const folding. In order to do that, we represent ExprValueMap as a mapping from SCEV to ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to V1 - C_a + C_b. Differential Revision: https://reviews.llvm.org/D21313 llvm-svn: 278160	2016-08-09 20:37:50 +00:00
Sanjay Patel	b61346b8b0	regenerate checks and remove 'opt' run dependency llvm-svn: 278154	2016-08-09 20:09:16 +00:00
Anna Thomas	b2d12b81c3	[EarlyCSE] Teach about CSE'ing over invariant.start intrinsics Summary: Teach EarlyCSE about invariant.start intrinsic. Specifically, we can perform store-load, load-load forwarding over this call. Reviewers: majnemer, reames, dberlin, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23268 llvm-svn: 278153	2016-08-09 20:00:47 +00:00
Ying Yi	6b1f5f891f	[llvm-cov] Swapped the line and count columns. In the coverage report, the line and count columns have been swapped to make it more readable. A follow-up commit in compiler-rt is needed Differential Revision: https://reviews.llvm.org/D23281 llvm-svn: 278152	2016-08-09 19:53:35 +00:00
Sanjay Patel	a814d89b61	update to use FileCheck and auto-generate checks llvm-svn: 278150	2016-08-09 19:42:52 +00:00
Lang Hames	bb9431acda	Re-apply r278065 (Weak symbol support in RuntimeDyld) with a fix for ELF. llvm-svn: 278149	2016-08-09 19:27:17 +00:00
David Majnemer	adc688ce9c	[X86] Don't model UD2/UD2B as a terminator A UD2 might make its way into the program via a call to @llvm.trap. Obviously, calls are not terminators. However, we modeled the X86 instruction, UD2, as a terminator. Later on, this confuses the epilogue insertion machinery which results in the epilogue getting inserted before the UD2. For some platforms, like x64, the result is a violation of the ABI. Instead, model UD2/UD2B as a side effecting instruction which may observe memory. llvm-svn: 278144	2016-08-09 17:55:12 +00:00
Simon Pilgrim	76964e3140	[DAGCombiner] Better support for shifting large value type by constants As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths. This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required. Differential Revision: https://reviews.llvm.org/D23007 llvm-svn: 278141	2016-08-09 17:39:11 +00:00
Anna Thomas	037e540f08	[AliasAnalysis] Treat invariant.start as read-memory Summary: We teach alias analysis that invariant.start is readonly. This helps with GVN and memcopy optimizations that currently treat invariant.start as a clobber. We need to treat this as readonly, so that DSE does not incorrectly remove stores prior to the invariant.start. Reviewers: sanjoy, reames, majnemer, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23214 llvm-svn: 278138	2016-08-09 17:18:05 +00:00
Sanjay Patel	4ee824a88d	auto-generate checks llvm-svn: 278137	2016-08-09 17:03:51 +00:00
Sanjay Patel	25af7444df	auto-generate checks llvm-svn: 278136	2016-08-09 17:02:17 +00:00
Sanjay Patel	52958dc111	auto-generate checks llvm-svn: 278135	2016-08-09 16:59:54 +00:00
Sanjay Patel	9f36a2d54b	add tests for missing vector icmp folds llvm-svn: 278132	2016-08-09 16:39:05 +00:00
Sanjay Patel	f36a29199f	update to use FileCheck and auto-generate checks llvm-svn: 278131	2016-08-09 16:19:57 +00:00
Sanjay Patel	a6090256d5	regenerate checks llvm-svn: 278130	2016-08-09 16:17:46 +00:00
Sanjay Patel	e466865f67	add tests for missing vector icmp folds llvm-svn: 278129	2016-08-09 16:05:57 +00:00
Xinliang David Li	9035cfceef	[Profile] turn off verbose warnings by default no prof data for func warning is turned off by default due to its high verbosity and minimal usefulness. Differential Revision: http://reviews.llvm.org/D23295 llvm-svn: 278127	2016-08-09 15:35:28 +00:00
Artur Pilipenko	c710a461b5	[LVI] Make LVI smarter about comparisons with non-constants Make LVI smarter about comparisons with a non-constant. For example, a s< b constrains a to be in the [INT_MIN, INT_MAX) range. This is a part of https://llvm.org/bugs/show_bug.cgi?id=28620 fix. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D23205 llvm-svn: 278122	2016-08-09 14:50:08 +00:00
Simon Pilgrim	27740d038c	[X86][XOP] Add support for combining target shuffles to VPERMIL2PD/VPERMIL2PS llvm-svn: 278120	2016-08-09 12:56:15 +00:00
Elena Demikhovsky	0e0e07f436	AVX-512: A new test for FMA intrinsic A new test that explores sub-optimal sequence of FMA intrinsic and FNEG operation. An upcoming patch will fix it. llvm-svn: 278117	2016-08-09 11:54:14 +00:00
Simon Pilgrim	aae7d4a1b6	[X86][XOP] Add support for combining target shuffles to VPPERM llvm-svn: 278114	2016-08-09 10:56:29 +00:00
Dean Michael Berris	3a25d84a51	[XRay] Test for xray_instr_map in object file. (NFC) This makes a trivial change in the emission of the per-function XRay tables, and makes sure that the xray_instr_map section does show up in the object file. llvm-svn: 278113	2016-08-09 10:42:11 +00:00
Artur Pilipenko	d97eedff40	Revert 278107 which causes buildbot failures and in addition has wrong commit message llvm-svn: 278109	2016-08-09 10:00:22 +00:00
Artur Pilipenko	a410d81f64	Teach CorrelatedValuePropagation to mark adds as no wrap Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem. Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D23059 llvm-svn: 278107	2016-08-09 09:41:34 +00:00
Simon Pilgrim	54c32ddf55	[X86][SSE] Fix memory folding of (v)roundsd / (v)roundss We only had partial memory folding support for the intrinsic definitions, and (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier. This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481. Differential Revision: https://reviews.llvm.org/D23276 llvm-svn: 278106	2016-08-09 09:32:34 +00:00
Craig Topper	92a4ff1294	[AVX-512] Add support for execution domain switching masked logical ops between floating point and integer domain. This switches PS<->D and PD<->Q. llvm-svn: 278097	2016-08-09 05:26:07 +00:00
Craig Topper	9bd6241106	[X86] Remove the Fv packed logical operation alias instructions. Replace them with patterns to the regular instructions. This enables execution domain fixing which is why the tests changed. llvm-svn: 278090	2016-08-09 03:06:33 +00:00
Craig Topper	de06b51d3d	[X86] Remove unnecessary bitcast from the front of AVX1Only 256-bit logical operation patterns. llvm-svn: 278088	2016-08-09 03:06:26 +00:00
Matthias Braun	7313ca6dbf	X86InstrInfo: Update liveness in classifyLea() We need to update liveness information when we create COPYs in classifyLea(). This fixes http://llvm.org/28301 llvm-svn: 278086	2016-08-09 01:47:26 +00:00
Derek Schuff	53b9af02c8	[WebAssembly] Fix bugs in WebAssemblyLowerEmscriptenExceptions pass * Delete extra '_' prefixes from JS library function names. fixImports() function in JS glue code deals with this for wasm. * Change command-line option names in order to be consistent with asm.js. * Add missing lowering code for llvm.eh.typeid.for intrinsics * Delete commas in mangled function names * Fix a function argument attributes bug. Because we add the pointer to the original callee as the first argument of invoke wrapper, all argument attribute indices have to be incremented by one. Patch by Heejin Ahn Differential Revision: https://reviews.llvm.org/D23258 llvm-svn: 278081	2016-08-09 00:29:55 +00:00
Derek Schuff	b7d6d9e3cd	[WebAssembly] Fix CFI index to account for padding nullptr function The WebAssembly linker now creates a dummy function at index 0 to prevent miscomparisons with the NULL pointer, see https://github.com/WebAssembly/binaryen/pull/658. Thanks to pcc for pointing out this problem! Patch by Dominic Chen Differential Revision: https://reviews.llvm.org/D23137 llvm-svn: 278073	2016-08-08 23:56:01 +00:00
Lang Hames	072728d419	Revert r278065 while I investigate some build-bot breakage. llvm-svn: 278069	2016-08-08 22:57:30 +00:00
Lang Hames	33c0b6bfca	[RuntimeDyld][Orc][MCJIT] Add partial weak-symbol support to RuntimeDyld. This patch causes RuntimeDyld to check for existing definitions when it encounters weak symbols. If a definition already exists then the new weak definition is discarded. All symbol lookups within a "logical dylib" should now agree on the address of any given weak symbol. This allows the JIT to better match the behavior of the static linker for C++ code. This support is only partial, as it does not allow strong definitions that occur after the first weak definition (in JIT symbol lookup order) to override the previous weak definitions. Support for this will be added in a future patch. llvm-svn: 278065	2016-08-08 22:53:37 +00:00
Charles Davis	e9c32c7ed3	Revert "[X86] Support the "ms-hotpatch" attribute." This reverts commit r278048. Something changed between the last time I built this--it takes awhile on my ridiculously slow and ancient computer--and now that broke this. llvm-svn: 278053	2016-08-08 21:20:15 +00:00
Charles Davis	0822aa118e	[X86] Support the "ms-hotpatch" attribute. Summary: Based on two patches by Michael Mueller. This is a target attribute that causes a function marked with it to be emitted as "hotpatchable". This particular mechanism was originally devised by Microsoft for patching their binaries (which they are constantly updating to stay ahead of crackers, script kiddies, and other ne'er-do-wells on the Internet), but is now commonly abused by Windows programs to hook API functions. This mechanism is target-specific. For x86, a two-byte no-op instruction is emitted at the function's entry point; the entry point must be immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding. This padding is where the patch code is written. The two byte no-op is then overwritten with a short jump into this code. The no-op is usually a `movl %edi, %edi` instruction; this is used as a magic value indicating that this is a hotpatchable function. Reviewers: majnemer, sanjoy, rnk Subscribers: dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D19908 llvm-svn: 278048	2016-08-08 21:01:39 +00:00
Krzysztof Parzyszek	341cf3fbe5	[Hexagon] Add pattern for 64-bit mulhs llvm-svn: 278040	2016-08-08 19:24:25 +00:00
Elliot Colp	d9e6668928	Re-add SystemZ SNaN test The floating-point bug affecting ninja-x64-msvc-RA-centos6 is fixed (r277813) so this test should now pass llvm-svn: 278034	2016-08-08 18:11:13 +00:00
Nirav Dave	f45fd2ba87	[X86] Improve code size on X86 segment moves A move of a value to a segment register from a 16-bit register is equivalent to one from its corresponding 32-bit register. Match gas's behavior and rewrite instructions to the shorter of equivalent forms. Reviewers: rnk, ab Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23166 llvm-svn: 278031	2016-08-08 18:01:04 +00:00
Oliver Stannard	8331aaee8f	[ARM] Add support for embedded position-independent code This patch adds support for some new relocation models to the ARM backend: * Read-only position independence (ROPI): Code and read-only data is accessed PC-relative. The offsets between all code and RO data sections are known at static link time. This does not affect read-write data. * Read-write position independence (RWPI): Read-write data is accessed relative to the static base register (r9). The offsets between all writeable data sections are known at static link time. This does not affect read-only data. These two modes are independent (they specify how different objects should be addressed), so they can be used individually or together. They are otherwise the same as the "static" relocation model, and are not compatible with SysV-style PIC using a global offset table. These modes are normally used by bare-metal systems or systems with small real-time operating systems. They are designed to avoid the need for a dynamic linker, the only initialisation required is setting r9 to an appropriate value for RWPI code. I have only added support to SelectionDAG, not FastISel, because FastISel is currently disabled for bare-metal targets where these modes would be used. Differential Revision: https://reviews.llvm.org/D23195 llvm-svn: 278015	2016-08-08 15:28:31 +00:00
Zhan Jun Liau	4fbc3f4a37	[SystemZ] Add support for the .insn directive Summary: Add support for the .insn directive. .insn is an s390 specific directive that allows encoding of an instruction instead of using a mnemonic. The motivating case is some code in node.js that requires support for the .insn directive. Reviewers: koriakin, uweigand Subscribers: koriakin, llvm-commits Differential Revision: https://reviews.llvm.org/D21809 llvm-svn: 278012	2016-08-08 15:13:08 +00:00
Sebastian Pop	bfb96c5bfd	GVN-hoist: enable by default llvm-svn: 278010	2016-08-08 14:46:15 +00:00
Silviu Baranga	fa00ba3c1a	[AArch64] PR28877: Don't assume we're running after legalization when creating vcvtfp2fxs Summary: The DAG combine transformation that was generating the aarch64_neon_vcvtfp2fxs node was assuming that all inputs were legal and wasn't accounting for the possibility that the input could be a v4f64 if we're trying to do the transformation before legalization. We now bail out in this case. All illegal types besides v4f64 were already rejected. Fixes https://llvm.org/bugs/show_bug.cgi?id=28877. Reviewers: jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D23261 llvm-svn: 278002	2016-08-08 13:13:57 +00:00
Daniel Sanders	3feeb9c851	Re-commit r277988: [mips][ias] Fix all the hacks related to MIPS-specific unary operators (%hi/%lo/%gp_rel/etc.). Hopefully with the MSVC builds fixed. I've added a missing '#include <tuple>' that gcc and clang don't seem to need. llvm-svn: 277995	2016-08-08 11:50:25 +00:00
Daniel Sanders	cae9aeed39	Revert r277988: [mips][ias] Fix all the hacks related to MIPS-specific unary operators (%hi/%lo/%gp_rel/etc.). It seems that MSVC doesn't like std::tie(). llvm-svn: 277990	2016-08-08 09:33:14 +00:00
Daniel Sanders	2ab623b5a3	[mips][ias] Fix all the hacks related to MIPS-specific unary operators (%hi/%lo/%gp_rel/etc.). Summary: They are now lexed as a single token on targets where MCAsmInfo::HasMipsExpressions is true and then parsed in a similar way to the '~' operator as part of MCExpr::parseExpression. As a result: * expressions and immediates no longer have different parsing rules. The difference is now solely down to whether evaluateAsAbsolute() succeeds. * %hi(%neg(%gp_rel(x))) are no longer parsed as a single operator and decomposed into the three MipsMCExpr nodes. They are parsed directly as three MipsMCExpr nodes. * parseMemOperand no longer needs to eat all the surrounding parenthesis to get at the outermost operator to make this work * %hi(%neg(%gp_rel(x))) and %lo(%neg(%gp_rel(x))) are no longer the only 3-in-1 relocs that parse for N64. They're still the only combinations that are permitted in relocatable expressions though. Fixing that should be a later patch. * We no longer need to list all the tokens that can occur as the first token of an expression or immediate. test/MC/Mips/expr1.s: This change also prevents the incorrect lowering of %lo(2*4)+foo to %lo(8+foo) which is not an equivalent expression (the difference is whether foo is truncated to 16-bit or not) and the test has been updated to account for the macro expansion the correct expression requires. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D23110 llvm-svn: 277988	2016-08-08 09:20:52 +00:00
Daniel Berlin	4b4c722e79	[MSSA] Fix PR28880 by fixing use optimizer's lower bound tracking behavior. Summary: In the use optimizer, we need to keep track of whether the lower bound still dominates us or else we may decide a lower bound is still valid when it is not due to intervening pushes/pops. Fixes PR28880 (and probably a bunch of other things). Reviewers: george.burgess.iv Subscribers: MatzeB, llvm-commits, sebpop Differential Revision: https://reviews.llvm.org/D23237 llvm-svn: 277978	2016-08-08 04:44:53 +00:00
Eli Friedman	02419a9849	[JumpThreading] Fix handling of aliasing metadata. Summary: The correctness fix here is that when we CSE a load with another load, we need to combine the metadata on the two loads. This matches the behavior of other passes, like instcombine and GVN. There's also a minor optimization improvement here: for load PRE, the aliasing metadata on the inserted load should be the same as the metadata on the original load. Not sure why the old code was throwing it away. Issue found by inspection. Differential Revision: http://reviews.llvm.org/D21460 llvm-svn: 277977	2016-08-08 04:10:22 +00:00
Davide Italiano	e3b916d164	[SimplifyLibCalls] Emit sqrt intrinsic instead of a libcall. llvm-svn: 277972	2016-08-08 03:23:01 +00:00
Eli Friedman	2a65dd1ba6	[SROA] Fix crash with lifetime intrinsic partially covering alloca. Summary: PromoteMemToReg looks specifically for the pattern bitcast+lifetime.start (or a bitcast-equivalent GEP); any offset will lead to an assertion failure. Fixes https://llvm.org/bugs/show_bug.cgi?id=27999 . Differential Revision: https://reviews.llvm.org/D22737 llvm-svn: 277969	2016-08-08 01:30:53 +00:00
Craig Topper	f44423120f	[AVX-512] Improve lowering of inserting a single element into lowest element of a 512-bit vector of zeroes by using vmovq/vmovd/vmovss/vmovsd. llvm-svn: 277965	2016-08-07 21:52:59 +00:00
Davide Italiano	27da131f32	[SLC] Emit an intrinsic instead of a libcall for pow. Differential Revision: https://reviews.llvm.org/D22104 llvm-svn: 277963	2016-08-07 20:27:03 +00:00
Nico Weber	99ceee8a85	Revert r277905, it caused PR28894 llvm-svn: 277962	2016-08-07 20:18:04 +00:00
Craig Topper	2c51c74d52	[AVX-512] Add 512-bit logical operations to load folding tables. Add avx512f stack folding test and move some tests from the avx512vl test. llvm-svn: 277961	2016-08-07 17:14:09 +00:00
Craig Topper	938e7ab9e1	[AVX-512] Add EVEX encoded floating point MAX/MIN instructions to the load folding tables. llvm-svn: 277960	2016-08-07 17:14:05 +00:00
Elena Demikhovsky	dca03bebd3	AVX-512: Changed lowering of BITCAST between i1 vectors and i8/i16/i32 integer values Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV. This allows suppressing two opposite BITCAST operations and avoids redundant "movs". Differential Revision: https://reviews.llvm.org/D23247 llvm-svn: 277958	2016-08-07 13:05:58 +00:00
Simon Pilgrim	69f2299efc	[X86][AVX512BW] Add sext/zext AVX512BW 512-bit vector tests llvm-svn: 277957	2016-08-07 12:41:36 +00:00
Simon Pilgrim	a23141eca7	[X86][AVX512] Add sext/zext to 512-bit vector tests llvm-svn: 277956	2016-08-07 12:10:46 +00:00
Elena Demikhovsky	2fabdcc60a	AVX-512: Added a test for cmp intrinsics This is a new test that should explore a current suboptimal sequence in passing values between cmp and kor intrinsics. The code will be optimized in an upcoming patch. Submitted bug here: https://llvm.org/bugs/show_bug.cgi?id=28839 llvm-svn: 277954	2016-08-07 09:29:34 +00:00
David Majnemer	d150137f64	[InstSimplify] Fold gep (gep V, C), (sub 0, V) to C llvm-svn: 277952	2016-08-07 07:58:12 +00:00
David Majnemer	dc8767a49a	[InstSimplify] Try hard to simplify pointer comparisons Simplify ptrtoint comparisons involving operands with different source types. llvm-svn: 277951	2016-08-07 07:58:10 +00:00
David Majnemer	4e4f4437c2	[InstCombine] Infer inbounds on geps of allocas llvm-svn: 277950	2016-08-07 07:58:00 +00:00
Craig Topper	49841c3812	[X86] Add commutable floating point max/min instructions to the load folding tables. llvm-svn: 277949	2016-08-07 05:39:51 +00:00
Craig Topper	2c1f6706de	[AVX-512] Add andnps/andnpd to the avx512vl stack folding test. llvm-svn: 277948	2016-08-07 05:39:48 +00:00
Michael Zolotukhin	442b82f0eb	Revert "Revert "[LoopSimplify] Fix updating LCSSA after separating nested loops."" This reverts commit r277901. Reapply the commit as it looks like it has nothing to do with the bot failures. llvm-svn: 277946	2016-08-07 01:56:54 +00:00
Lang Hames	73976f622d	[ORC] Re-apply r277896, removing bogus triples and datalayouts that broke tests on linux last time. llvm-svn: 277942	2016-08-06 22:36:26 +00:00
Simon Pilgrim	bc573ca1b8	[X86][AVX2] Improve sign/zero extension on AVX2 targets Split extensions to large vectors into 256-bit chunks - the equivalent of what we do with pre-AVX2 into 128-bit chunks llvm-svn: 277939	2016-08-06 21:21:12 +00:00
Gor Nishanov	874651129e	[Coroutines] Pacify the build bots. Remove restart-trigger.ll test for now llvm-svn: 277937	2016-08-06 21:01:22 +00:00
Gor Nishanov	2ed6e788a8	[Coroutines] Part 5: Add CGSCC restart trigger Summary: CoroSplit pass processes the coroutine twice. First, it lets it go through complete IPO optimization pipeline as a single function. It forces restart of the pipeline by inserting an indirect call to an empty function "coro.devirt.trigger" which is devirtualized by CoroElide pass that triggers a restart of the pipeline by CGPassManager. (In later patches, when CoroSplit pass sees the same coroutine the second time, it splits it up, adds coroutine subfunctions to the SCC to be processed by IPO pipeline.) Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) 3.Add empty coroutine passes. (https://reviews.llvm.org/D22847) 4.Add coroutine devirtualization + tests. ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998) c) Do devirtualization (https://reviews.llvm.org/D23229) 5.Add CGSCC restart trigger + tests. <= we are here 6.Add coroutine heap elision + tests. 7.Add the rest of the logic (split into more patches) Reviewers: mehdi_amini, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23234 llvm-svn: 277936	2016-08-06 20:44:39 +00:00
Craig Topper	19505bc354	[AVX-512] Add AVX-512 scalar CVT instructions to hasUndefRegUpdate. llvm-svn: 277933	2016-08-06 19:31:50 +00:00
Craig Topper	b0476fcc1f	[AVX-512] Add AVX512 run line to a test and re-generate the checks. Future commits will refine some of the sequences. llvm-svn: 277932	2016-08-06 19:31:47 +00:00
Simon Pilgrim	7d168e19e8	[X86][SSE] Enable commutation between MOVHLPS and UNPCKHPD Assuming SSE2 is available then we can safely commute between these, removing some unnecessary register moves and improving memory folding opportunities. VEX encoded versions don't benefit so I haven't added support to them. llvm-svn: 277930	2016-08-06 18:40:28 +00:00
Simon Pilgrim	ef10e922d8	[X86][SSE] Regenerate SSE1 shuffle tests llvm-svn: 277925	2016-08-06 13:46:09 +00:00
David Majnemer	a19d0f2f3e	[ValueTracking] Teach computeKnownBits about [su]min/max Reasoning about a select in terms of a min or max allows us to derive a tighter bound on the result. llvm-svn: 277914	2016-08-06 08:16:00 +00:00
Sanjoy Das	ba04d3a620	[InstCombine] Don't coerce non-integral pointers to integers Reviewers: majnemer Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23231 llvm-svn: 277910	2016-08-06 02:58:48 +00:00
Matthias Braun	9a0035d8d2	Revert "(refs/bisect/bad) GVN-hoist: enable by default" GVN-Hoist appears to miscompile llvm-testsuite SingleSource/Benchmarks/Misc/fbench.c at the moment. I filed http://llvm.org/PR28880 This reverts commit r277786. llvm-svn: 277909	2016-08-06 02:23:15 +00:00
Gor Nishanov	31d8c9af89	Part 4c: Coroutine Devirtualization: Devirtualize coro.resume and coro.destroy. Summary: This is the 4c patch of the coroutine series. CoroElide pass now checks if PostSplit coro.begin is referenced by coro.subfn.addr intrinsics. If so replace coro.subfn.addrs with an appropriate coroutine subfunction associated with that coro.begin. Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) 3.Add empty coroutine passes. (https://reviews.llvm.org/D22847) 4.Add coroutine devirtualization + tests. ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998) c) Do devirtualization <= we are here 5.Add CGSCC restart trigger + tests. 6.Add coroutine heap elision + tests. 7.Add the rest of the logic (split into more patches) Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23229 llvm-svn: 277908	2016-08-06 02:16:35 +00:00
Nico Weber	c893e603ab	Revert r277896. It breaks ExecutionEngine/OrcLazy/weak-function.ll on most bots. Script: -- ... -- Exit Code: 1 Command Output (stderr): -- Could not find main function. llvm-svn: 277907	2016-08-06 02:00:45 +00:00
Kyle Butt	71cb44d969	CodeGen: If Convert blocks that would form a diamond when tail-merged. The following function currently relies on tail-merging for if conversion to succeed. The common tail of cond_true and cond_false is extracted, and this then forms a diamond pattern that can be successfully if converted. If this block does not get extracted, either because tail-merging is disabled or the threshold is higher, we should still recognize this pattern and if-convert it. define i32 @t2(i32 %a, i32 %b) nounwind { entry: %tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1] br i1 %tmp1434, label %bb17, label %bb.outer bb.outer: ; preds = %cond_false, %entry %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ] %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ] br label %bb bb: ; preds = %cond_true, %bb.outer %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ] %tmp. = sub i32 0, %b_addr.021.0.ph %tmp.40 = mul i32 %indvar, %tmp. %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph br i1 %tmp3, label %cond_true, label %cond_false cond_true: ; preds = %bb %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph %indvar.next = add i32 %indvar, 1 br i1 %tmp1437, label %bb17, label %bb cond_false: ; preds = %bb %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0 %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10 br i1 %tmp14, label %bb17, label %bb.outer bb17: ; preds = %cond_false, %cond_true, %entry %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ] ret i32 %a_addr.026.1 } Without tail-merging or diamond-tail if conversion: LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ble LBB1_3 @ BB#2: @ %cond_true @ in Loop: Header=BB1_1 Depth=1 subs r0, r0, r1 cmp r1, r0 it ne cmpne r0, r1 bgt LBB1_4 LBB1_3: @ %cond_false @ in Loop: Header=BB1_1 Depth=1 subs r1, r1, r0 cmp r1, r0 bne LBB1_1 LBB1_4: @ %bb17 bx lr With diamond-tail if conversion, but without tail-merging: @ BB#0: @ %entry cmp r0, r1 it eq bxeq lr LBB1_1: @ %bb @ =>This Inner Loop Header: Depth=1 cmp r0, r1 ite le suble r1, r1, r0 subgt r0, r0, r1 cmp r1, r0 bne LBB1_1 @ BB#2: @ %bb17 bx lr llvm-svn: 277905	2016-08-06 01:52:37 +00:00
Michael Zolotukhin	09cf304ebc	Revert "[LoopSimplify] Fix updating LCSSA after separating nested loops." This reverts commit r277877. Try to appease clang-x64-ninja-win7 buildbot. llvm-svn: 277901	2016-08-06 01:48:51 +00:00
Lang Hames	62a459603c	[ORC] Add (partial) weak symbol support to the CompileOnDemand layer. This adds partial support for weak functions to the CompileOnDemandLayer by modifying the addLogicalModule method to check for existing stub definitions before building a new stub for a weak function. This scheme is sufficient to support ODR definitions, but fails for general weak definitions if strong definition is encountered after the first weak definition. (A more extensive refactor will be required to fully support weak symbols). This patch does not add weak symbol support to RuntimeDyld: I hope to add that in the near future. llvm-svn: 277896	2016-08-06 00:54:43 +00:00
Sanjoy Das	cf181867a6	[IRCE] Preserve loop-simplify form Fixes PR28764. Right now there is no way to test this, but (as mentioned on the PR) with Michael Zolotukhin's yet to be checked in LoopSimplify verifier, 8 of the llvm-lit tests for IRCE crash. llvm-svn: 277891	2016-08-06 00:01:56 +00:00
Michael Zolotukhin	4c65c3596a	[LoopSimplify] Fix updating LCSSA after separating nested loops. This fixes PR28825. The problem was that we only checked if a value from a created inner loop is used in the outer loop, and fixed LCSSA for them. But we failed to fix up LCSSA for values used in exits of the outer loop. llvm-svn: 277877	2016-08-05 21:52:58 +00:00
Justin Bogner	6863027f00	PowerPC: Add a triple to this test This is running opt without specifying a triple, which isn't correct. llvm-svn: 277875	2016-08-05 21:49:54 +00:00
Marek Olsak	355a8642b4	AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland Summary: This is the setting of the Vulkan closed source driver. It decreases the max wave count from 10 to 8. 26010 shaders in 14650 tests Totals: VGPRS: 829593 -> 808440 (-2.55 %) Spilled SGPRs: 81878 -> 42226 (-48.43 %) Spilled VGPRs: 367 -> 358 (-2.45 %) Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread Code Size: 36677864 -> 35923932 (-2.06 %) bytes There is a massive decrease in SGPR spilling in general and -7.4% spilled VGPRs for DiRT Showdown (= SGPRs spilled to scratch?) Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23034 llvm-svn: 277867	2016-08-05 21:23:29 +00:00
Weiming Zhao	f68a6a720c	[ARM] Constant Materialize: imms with specific value can be encoded into mov.w Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes. I'm resubmitting this patch. The test case in the original commit r277610 does not specify a triple, so builds with a different default triple will have different output. This patch fixes the triple as thumb-darwin-apple. Reviewers: john.brawn, jmolloy, bruno Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits Differential Revision: https://reviews.llvm.org/D23090 llvm-svn: 277865	2016-08-05 20:58:29 +00:00
Sanjoy Das	d4c85af7fd	[SCEV] Un-grep'ify tests; NFC llvm-svn: 277861	2016-08-05 20:33:49 +00:00
Dehao Chen	de39cb9384	Change the hot-callsite based heuristic to use its own threshold parameter instead of sharing the inline-hint parameter Summary: Hot callsites should have a higher threshold than inline hints. This patch uses a separate threshold parameter for hot callsites. Reviewers: davidxl, eraman Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D22368 llvm-svn: 277860	2016-08-05 20:28:41 +00:00
Ivan Krasin	b05e06e4fd	WholeProgramDevirt: print remarks with devirtualized method names. Summary: Chrome on Linux uses WholeProgramDevirt for speed ups, and it's important to detect regressions on both sides: the toolchain, if fewer methods get devirtualized after an update, and Chrome, if an innocent-looking change caused many hot methods to become virtual again. The need to track devirtualized methods is not Chrome-specific, but it's probably the only user of the pass at this time. Reviewers: kcc Differential Revision: https://reviews.llvm.org/D23219 llvm-svn: 277856	2016-08-05 19:45:16 +00:00
Sanjoy Das	6fa08aafcc	[ConstantFolding] Don't create illegal (non-integral) inttoptrs Reviewers: majnemer, arsenm Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23182 llvm-svn: 277854	2016-08-05 19:23:29 +00:00
Sanjoy Das	b0b4e86215	[SCEV] Don't infinitely recurse on unreachable code llvm-svn: 277848	2016-08-05 18:34:14 +00:00
Kevin Enderby	600fb3f28e	Add the first of what will be a long line of additional error checks for invalid Mach-O files. This is where an LC_SEGMENT load command has a fileoff field that extends past the end of the file. Also fix llvm-nm and llvm-size to remove the errorToErrorCode() call so error messages are printed. And needed to update a few test cases now that they do print the error messages just a bit differently. llvm-svn: 277845	2016-08-05 18:19:40 +00:00
Dehao Chen	17c6afc35b	Do not assign new discriminator for all intrinsics. Summary: We do not care about intrinsic calls when assigning discriminators. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23212 llvm-svn: 277843	2016-08-05 17:56:49 +00:00
Tim Northover	14e7f73a0f	GlobalISel: clear pending phis after MachineFunction translated Test is just reordering the existing functions (it would trigger for any function after one with a phi). llvm-svn: 277841	2016-08-05 17:50:36 +00:00
Simon Pilgrim	69b6a70834	[X86][SSE] Add initial support for 2 input target shuffle combining. At the moment only the INSERTPS matching can actually use 2 inputs but the plumbing is now in place. llvm-svn: 277839	2016-08-05 17:36:14 +00:00
Tim Northover	97d0cb3165	GlobalISel: IRTranslate PHI instructions llvm-svn: 277835	2016-08-05 17:16:40 +00:00
Gor Nishanov	f3bb361750	opt: Adding -O0 to opt tool Summary: Having -O0 in opt allows testing that -O0 optimization pipeline is built correctly. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23208 llvm-svn: 277829	2016-08-05 16:27:33 +00:00
Ulrich Weigand	c3b495a649	[PowerPC] Wrong fast-isel codegen for VSX floating-point loads There were two locations where fast-isel would generate a LFD instruction with a target register class VSFRC instead of F8RC when VSX was enabled. This can cause invalid registers to be used in certain cases, like: lfd 36, ... instead of using a VSX load instruction. The wrong register number gets silently truncated, causing invalid code to be generated. The first place is PPCFastISel::PPCEmitLoad, which had multiple problems: 1.) The IsVSSRC and IsVSFRC flags are not initialized correctly, since they are computed from resultReg, which is still zero at this point in many cases. Fixed by changing the helper routines to operate on a register class instead of a register and passing in UseRC. 2.) Even with this fixed, Is64VSXLoad is still wrong due to a typo: bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS; bool Is64VSXLoad = IsVSSRC && Opc == PPC::LFD; The second line needs to use IsVSFRC (like PPCEmitStore does). 3.) Once both the above are fixed, we're now generating a VSX instruction -- but an incorrect one, since generation of an indexed instruction with null index is wrong. Fixed by copying the code handling the same issue in PPCEmitStore. The second place is PPCFastISel::PPCMaterializeFP, where we would emit an LFD to load a constant from the literal pool, and use the wrong result register class. Fixed by hardcoding a F8RC class even on systems supporting VSX. Fixes: https://llvm.org/bugs/show_bug.cgi?id=28630 Differential Revision: https://reviews.llvm.org/D22632 llvm-svn: 277823	2016-08-05 15:22:05 +00:00
Zhan Jun Liau	8d3f29759f	[SystemZ] Add missing classes and instructions Summary: Add instruction formats E, RSI, SSd, SSE, and SSF. Added BRXH, BRXLE, PR, MVCK, STRAG, and ECTG instructions to test out those formats. Reviewers: uweigand Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23179 llvm-svn: 277822	2016-08-05 15:14:34 +00:00
Benjamin Kramer	000a87d1b0	Actually, r277337 was fine. Just kill the DAGs that made the test allow nondeterminism. llvm-svn: 277821	2016-08-05 14:58:34 +00:00
Benjamin Kramer	aa160c22f7	[SimplifyCFG] Make range reduction code deterministic. This generated IR based on the order of evaluation, which is different between GCC and Clang. With that in mind you get bootstrap miscompares if you compare a Clang built with GCC-built Clang vs. Clang built with Clang-built Clang. Diagnosing that made my head hurt. This also reverts commit r277337, which "fixed" the test case. llvm-svn: 277820	2016-08-05 14:55:02 +00:00
Sanjay Patel	5a9b9f98c0	reduce tests; auto-generate checks llvm-svn: 277819	2016-08-05 14:50:11 +00:00
Strahinja Petrovic	30e0ce8e9f	[PowerPC] fix passing long double arguments to function (soft-float) This patch fixes passing long double type arguments to a function in soft-float mode. If there are fewer than 4 argument registers free (the long double type is mapped to 4 GPR registers in soft-float mode), the long double argument must be passed through the stack. Differential Revision: https://reviews.llvm.org/D20114. llvm-svn: 277804	2016-08-05 08:47:26 +00:00
Nicolai Haehnle	870bf1788c	[InstCombine] try to fold (select C, (sext A), B) into logical ops Summary: Turn (select C, (sext A), B) into (sext (select C, A, B')) when A is i1 and B is a compatible constant, also for zext instead of sext. This will then be further folded into logical operations. The transformation would be valid for non-i1 types as well, but other parts of InstCombine prefer to have sext from non-i1 as an operand of select. Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits i32 for boolean operations. With this change, the boolean logic is fully recovered. Reviewers: majnemer, spatel, tstellarAMD Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22747 llvm-svn: 277801	2016-08-05 08:22:29 +00:00
Bruno Cardoso Lopes	358e60a6b3	[LIT][Darwin] Change %ld64 to be prefixed with DYLD_INSERT_LIBRARIES Followup from r277778, after Mehdi's comments. Expand %ld64 to perform the necessary preload instead, that way new tests do not need to worry about setting up DYLD_INSERT_LIBRARIES themselves. rdar://problem/24300926 llvm-svn: 277788	2016-08-04 23:58:30 +00:00
Sebastian Pop	c33f0e25c9	GVN-hoist: enable by default llvm-svn: 277786	2016-08-04 23:49:07 +00:00
Sebastian Pop	429740a6c2	GVN-hoist: fix early exit logic The patch splits a complex && if condition into easier to read and understand logic. That wrong early exit condition was letting some instructions with not all operands available pass through when HoistingGeps was true. Differential Revision: https://reviews.llvm.org/D23174 llvm-svn: 277785	2016-08-04 23:49:05 +00:00
Michael Kuperstein	3ceac2bbd5	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to take this into account by making shifts by a uniform count cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782	2016-08-04 22:48:03 +00:00
Sanjay Patel	3bade138b5	[InstCombine] use m_APInt to allow icmp eq (mul X, C1), C2 folds for splat constant vectors This concludes the splat vector enhancements for foldICmpEqualityWithConstant(). Other commits in this series: https://reviews.llvm.org/rL277762 https://reviews.llvm.org/rL277752 https://reviews.llvm.org/rL277738 https://reviews.llvm.org/rL277731 https://reviews.llvm.org/rL277659 https://reviews.llvm.org/rL277638 https://reviews.llvm.org/rL277629 llvm-svn: 277779	2016-08-04 22:19:27 +00:00
Bruno Cardoso Lopes	8daab7582b	[LIT][Darwin] Preload libclang_rt.asan_osx_dynamic.dylib when necessary Green Dragon's darwin stage2 asan bot fails on some checks: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check test/tools/lto/hide-linkonce-odr.ll test/tools/lto/opt-level.ll ERROR: Interceptors are not working. This may be because AddressSanitizer is loaded too late (e.g. via dlopen) To fix this, %ld64 needs to load 'libclang_rt.asan_osx_dynamic.dylib' before libLTO.dylib, via DYLD_INSERT_LIBRARIES. This won't work by updating config.environment, since some shim binary in the way scrubs the env vars. Instead, provide the path to this lib through %asanrtlib, which can then be used by tests directly with DYLD_INSERT_LIBRARIES. rdar://problem/24300926 llvm-svn: 277778	2016-08-04 22:01:38 +00:00
Tim Northover	61c16142b4	GlobalISel: extend add widening to SUB, MUL, OR, AND and XOR. These are the operations that are trivially identical. Division is omitted for now because you need to use the correct sign/zero extension. llvm-svn: 277775	2016-08-04 21:39:49 +00:00
Tim Northover	1cfa919b3d	GlobalISel: add support for G_MUL llvm-svn: 277774	2016-08-04 21:39:44 +00:00
David Majnemer	b48ed0f721	[CloneFunction] Add a testcase for r277691/r277693 PR28848 had a very nice reduction of the underlying cause of the bug. Our ValueMap had, in an entry for an Instruction, a ConstantInt. This is not at all unexpected but should be handled properly. llvm-svn: 277773	2016-08-04 21:28:59 +00:00
Chris Bieneman	17e42a0980	[MachOYAML] Change n_type from uint8_t to llvm::yaml::Hex8 Since this field is generally masked, it is way easier to understand it as a hex value than as a decimal. llvm-svn: 277770	2016-08-04 21:07:39 +00:00
Tim Northover	9656f1476c	GlobalISel: implement narrowing for G_ADD. llvm-svn: 277769	2016-08-04 20:54:13 +00:00
Matt Arsenault	6ad97732aa	GVNHoist: Don't hoist convergent calls llvm-svn: 277767	2016-08-04 20:52:57 +00:00
David Majnemer	f93082e71a	[coroutines] Part 4[ab]: Coroutine Devirtualization: Lower coro.resume and coro.destroy. This is the fourth patch in the coroutine series. The CoroEarly pass now lowers coro.resume and coro.destroy intrinsics by replacing them with an indirect call to an address returned by the coro.subfn.addr intrinsic. This is done so that CGPassManager recognizes devirtualization when CoroElide replaces a call to coro.subfn.addr with an appropriate function address. Patch by Gor Nishanov! Differential Revision: https://reviews.llvm.org/D22998 llvm-svn: 277765	2016-08-04 20:30:07 +00:00
Sanjay Patel	d938e88e89	[InstCombine] use m_APInt to allow icmp eq (and X, C1), C2 folds for splat constant vectors llvm-svn: 277762	2016-08-04 20:05:02 +00:00
Yaxun Liu	86c052238a	[OpenCL] Add missing tests for getOCLTypeName Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown. Patch by Aaron En Ye Shi. Differential Revision: https://reviews.llvm.org/D22964 llvm-svn: 277759	2016-08-04 19:45:00 +00:00
Tim Northover	2f32e7f0ac	AArch64: don't assume all i128s are BUILD_PAIRs It leads to a crash when they're not. I'm sure I've made this mistake before, at least once. llvm-svn: 277755	2016-08-04 19:32:28 +00:00
Chris Bieneman	9f749c8e03	[macho2yaml] String table can contain null strings Since the string table being read from the MachO is a properly bounded StringRef, including null strings is safe and reasonable. This occurs frequently with stripped binaries where the string table has been modified. llvm-svn: 277753	2016-08-04 19:19:25 +00:00
Sanjay Patel	b3de75d3a0	[InstCombine] use m_APInt to allow icmp eq (or X, C1), C2 folds for splat constant vectors llvm-svn: 277752	2016-08-04 19:12:12 +00:00
Tim Northover	06db18fbf8	GlobalISel: also add G_TRUNC to IRTranslator. llvm-svn: 277749	2016-08-04 18:35:17 +00:00
Tim Northover	323358184e	GlobalISel: add code to widen scalar G_ADD llvm-svn: 277747	2016-08-04 18:35:11 +00:00
Sanjay Patel	80f2eec4b2	remove FIXME comments (fixed with r277738) llvm-svn: 277744	2016-08-04 18:14:02 +00:00
Derek Schuff	732636d901	[WebAssembly] Check return value of getRegForValue in FastISel Previously, FastISel for WebAssembly wasn't checking the return value of `getRegForValue` in certain cases, which would generate instructions referencing NoReg. This patch fixes this behavior. Patch by Dominic Chen Differential Revision: https://reviews.llvm.org/D23100 llvm-svn: 277742	2016-08-04 18:01:52 +00:00
Krzysztof Parzyszek	04c0796e37	[Hexagon] Validate register class when doing bit simplification llvm-svn: 277740	2016-08-04 17:56:19 +00:00
Sanjay Patel	bcaf6f39dd	[InstCombine] use m_APInt to allow icmp eq (op X, Y), C folds for splat constant vectors I'm removing a misplaced pair of more specific folds from InstCombine in this patch as well, so we know where those folds are happening in InstSimplify. llvm-svn: 277738	2016-08-04 17:48:04 +00:00
Sanjay Patel	bf82f44e7b	add tests for missing vector folds llvm-svn: 277736	2016-08-04 16:48:30 +00:00
Alina Sbirlea	6f937b1144	LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments. Summary: TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses. A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted. Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures: - x86 failing tests either did not have the right target, or the right alignment. - NVPTX failing tests did not have the right alignment. - AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks for a maximum size allowed and relies on the next condition checking for %4 for correctness. This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list). Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure), so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context. Reviewers: arsenm, jlebar, tstellarAMD Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23068 llvm-svn: 277735	2016-08-04 16:38:44 +00:00
Daniel Sanders	5dcbac57c5	[mips] Set Personality and LSDA encoding for FreeBSD Reviewers: seanbruno, sdardis Subscribers: tberghammer, danalbert, srhines, dsanders, sdardis, llvm-commits, seanbruno Differential Revision: https://reviews.llvm.org/D23113 llvm-svn: 277732	2016-08-04 15:36:03 +00:00
Sanjay Patel	9d591d15ec	[InstCombine] use m_APInt to allow icmp eq (sub C1, X), C2 folds for splat constant vectors llvm-svn: 277731	2016-08-04 15:19:25 +00:00
Krzysztof Parzyszek	7773c58458	[Hexagon] Clear kill flags from modified registers in peephole optimizer llvm-svn: 277727	2016-08-04 14:17:16 +00:00
Nikolai Bozhenov	f679530ba1	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725	2016-08-04 12:47:28 +00:00
Hrvoje Varga	846bdb746d	[mips][microMIPS] Implement CFC1, CFC2, CTC1 and CTC2 instructions Differential Revision: https://reviews.llvm.org/D22347 llvm-svn: 277719	2016-08-04 11:22:52 +00:00
Simon Pilgrim	c8fe132756	[X86] Dropped XOP ctbits checks - they match the AVX checks llvm-svn: 277718	2016-08-04 11:04:13 +00:00
Simon Pilgrim	5d5ca9c0cb	[X86][SSE] Add initial costs for vector CTTZ/CTLZ llvm-svn: 277716	2016-08-04 10:51:41 +00:00
Ying Yi	0ef31b7960	[LLVM-COV] Replace tabs with space indentation in the HTML coverage report. When using orbis-llvm-cov.exe to generate the HTML report, the HTML report can look quite different from the source file if it includes tabs. The default tab size is 2 spaces instead of 8 spaces. A command line switch is added to set the tab size. Differential Revision: https://reviews.llvm.org/D23087 llvm-svn: 277715	2016-08-04 10:39:43 +00:00
Simon Pilgrim	8ae6dad49b	[X86][SSE] Don't decide when to scalarize CTTZ/CTLZ for performance at lowering - this is what cost models are for Improved CTTZ/CTLZ costings will be added shortly llvm-svn: 277713	2016-08-04 10:14:39 +00:00
Simon Dardis	57f4ae4625	[mips] Enable tail calls by default Enable tail calls by default for (micro)MIPS(64). microMIPS is slightly more tricky than doing it for MIPS(R6) or microMIPSR6. microMIPS has two instruction encodings: 16bit and 32bit along with some restrictions on the size of the instruction that can fill the delay slot. For safe tail calls for microMIPS, the delay slot filler attempts to find a correct size instruction for the delay slot of TAILCALL pseudos. Reviewers: dsanders, vkalintris Subscribers: jfb, dsanders, sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D21138 llvm-svn: 277708	2016-08-04 09:17:07 +00:00
Dean Michael Berris	7e9abea2ae	[XRay] Align entry and return sleds to 2 byte boundaries This should ensure that we can atomically write two bytes (on top of the retq and the one past it) and have those two bytes not straddle cache lines. We also move the label past the alignment instruction so that we can refer to the actual first instruction, as opposed to potential padding before the aligned instruction. Update the tests to allow us to reflect the new order of assembly. Reviewers: rSerge, echristo, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23101 llvm-svn: 277701	2016-08-04 07:37:28 +00:00
Matt Arsenault	b0e32f1ba1	AMDGPU: Fix a slow test by using basic regalloc This just tests that the register limit isn't exceeded, so the register allocation doesn't need to be great. The critically slow part is all in greedy RA, so switch to basic. llvm-svn: 277700	2016-08-04 07:04:54 +00:00
Amaury Sechet	bf3adfdbfb	Fix intrinsics.ll test llvm-svn: 277695	2016-08-04 05:35:25 +00:00
Amaury Sechet	6bea674c43	Add popcount(n) == bitsize(n) -> n == -1 transformation. Summary: As per title. Reviewers: majnemer, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23139 llvm-svn: 277694	2016-08-04 05:27:20 +00:00
David Majnemer	909793fa63	Reinstate "[CloneFunction] Don't remove side effecting calls" This reinstates r277611 + r277614 and reverts r277642. A cast_or_null should have been a dyn_cast_or_null. llvm-svn: 277691	2016-08-04 04:24:02 +00:00
Bruno Cardoso Lopes	bd887581fc	Revert "GVN-hoist: enable by default" & "Make GVN Hoisting obey optnone/bisect." This reverts commits r277685 & r277688. r277685 broke compiler-rt compilation http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/23335 and r277688 is a followup from it. llvm-svn: 277690	2016-08-04 04:16:24 +00:00
Chandler Carruth	a053a88df5	[PM] Change the name of the repeating utility to something less overloaded (and simpler). Sean rightly pointed out in code review that we've started using "wrapper pass" as a specific part of the old pass manager, and in fact it is more applicable there. Here, we really have a pass template to build a repeated pass, so call it that. llvm-svn: 277689	2016-08-04 03:52:53 +00:00
Sebastian Pop	b33bfa198c	Make GVN Hoisting obey optnone/bisect. Differential Revision: https://reviews.llvm.org/D23136 llvm-svn: 277688	2016-08-04 02:05:08 +00:00
Rui Ueyama	7e49549d4f	pdbdump: Add a test to verify the result of PDB -> YAML -> PDB conversions. Currently not all information can be restored from YAML. This test verifies only the PDB header. llvm-svn: 277682	2016-08-03 23:54:39 +00:00
Rui Ueyama	d1d8c8312a	pdbdump: Fix crash bug. pdbdump calls DbiStreamBuilder::commit through PDBFileBuilder::commit without calling DbiStreamBuilder::finalize. Because `finalize` initializes `Header` member, `Header` remained nullptr which caused a crash bug. Differential Revision: https://reviews.llvm.org/D23143 llvm-svn: 277681	2016-08-03 23:43:23 +00:00
Matthias Braun	1873998b16	RenameIndependentSubregs: Fix liveness query in rewriteOperands() rewriteOperands() always performed liveness queries at the base index rather than the RegSlot/Base as appropriate for the machine operand. This could lead to illegal rewriting in some cases. llvm-svn: 277661	2016-08-03 22:37:47 +00:00
Sanjay Patel	00a324e893	[InstCombine] use m_APInt to allow icmp eq (add X, C1), C2 folds for splat constant vectors llvm-svn: 277659	2016-08-03 22:08:44 +00:00
Guozhi Wei	9584d18d48	[PPC] Handling CallInst in PPCBoolRetToInt This patch fixes pr25548. Current implementation of PPCBoolRetToInt doesn't handle CallInst correctly, so it failed to do the intended optimization when there is a CallInst with parameters. This patch fixed that. llvm-svn: 277655	2016-08-03 21:43:51 +00:00
Bruno Cardoso Lopes	3fcf832cce	Revert "[ARM] Constant Materialize: imms with specific value can be encoded into mov.w" This reverts commit r277610 / d619aa8878c3dafcc0d29a46517f63ff3209fdd4. This makes subtarget-no-movt.ll fail in http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/26892 llvm-svn: 277654	2016-08-03 21:26:21 +00:00
Sebastian Pop	5d3822fc12	GVN-hoist: compute MSSA once per function (PR28670) With this patch we compute the MemorySSA once and update it in the code generator. Differential Revision: https://reviews.llvm.org/D22966 llvm-svn: 277649	2016-08-03 20:54:33 +00:00
Sanjoy Das	ac5bf59b6e	[IndVars] Un-grepify test; NFC Some of these tests need to be cleaned up further to make it obvious what they're testing, but as a first step remove all instances of "grep". llvm-svn: 277648	2016-08-03 20:53:23 +00:00
Matthias Braun	4dc6933d44	opt-bisect-legacy-pass-manager.ll: Test only works with default triple configured llvm-svn: 277645	2016-08-03 20:28:19 +00:00
Reid Kleckner	a6be60871f	Revert "[CloneFunction] Don't remove side effecting calls" This reverts commit r277611 and the followup r277614. Bootstrap builds and chromium builds are crashing during inlining after this change. llvm-svn: 277642	2016-08-03 20:01:01 +00:00
George Burgess IV	024f3d2683	[MSSA] Add special handling for invariant/constant loads. This is a follow-up to r277637. It teaches MemorySSA that invariant loads (and loads of provably constant memory) are always liveOnEntry. llvm-svn: 277640	2016-08-03 19:57:02 +00:00
Sanjay Patel	2e9675ff52	[InstCombine] use m_APInt to allow icmp eq (srem X, C1), C2 folds for splat constant vectors llvm-svn: 277638	2016-08-03 19:48:40 +00:00
George Burgess IV	82e355ce48	[MSSA] Add logic for special handling of atomics/volatiles. This patch makes MemorySSA recognize atomic/volatile loads, and makes MSSA treat said loads specially. This allows us to be a bit more aggressive in some cases. Administrative note: Revision was LGTM'ed by reames in person. Additionally, this doesn't include the `invariant.load` recognition in the differential revision, because I feel it's better to commit that separately. Will commit soon. Differential Revision: https://reviews.llvm.org/D16875 llvm-svn: 277637	2016-08-03 19:39:54 +00:00
Elliot Colp	6af6f64f87	I can't reproduce this buildbot failure locally, so temporarily remove this test while I investigate. http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/27427 llvm-svn: 277636	2016-08-03 19:39:20 +00:00
Tobias Grosser	8757e387dd	[InstCombine] Refactor optimization of zext(or(icmp, icmp)) to enable more aggressive cast-folding Summary: InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation. Tested via `make check` and `lnt`. Background ========== For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again. 
Details on the submitted changes ================================ - In `visitZExt()` we combine the unfolding and optimization of `zext` instructions. - In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause the corresponding assertions to be triggered (in our case this happened, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly. - In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions. - The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes. - The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105. Reviewers: grosser, majnemer, spatel Subscribers: eli.friedman, majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D22864 Contributed-by: Matthias Reisinger <d412vv1n@gmail.com> llvm-svn: 277635	2016-08-03 19:30:35 +00:00
Nicolai Haehnle	fcac6f8376	[InstCombine] Cleanup select-bitext.ll tests Follow-up to r277596. llvm-svn: 277633	2016-08-03 19:10:13 +00:00
Simon Pilgrim	898f030f70	[X86][SSE] Enable target shuffle combining to combine multiple shuffle inputs. We currently only support combining target shuffles that consist of a single source input (plus elements known to be undef/zero). This patch generalizes the recursive combining of the target shuffle to collect all the inputs, merging any duplicates along the way, into a full set of src ops and its shuffle mask. We uncover a number of cases where we have failed to combine a unary shuffle because the input has been duplicated and separated during lowering. This will allow us to combine to 2-input shuffles in a future patch. Differential Revision: https://reviews.llvm.org/D22859 llvm-svn: 277631	2016-08-03 19:08:24 +00:00
Vedant Kumar	4031d9f80e	Reapply "More fixes to get good error messages for bad archives." This reverts the revert commit r277627. The build errors mentioned in r277627 were likely caused by an unclean build directory. Sorry for the noise. llvm-svn: 277630	2016-08-03 19:02:50 +00:00
Sanjay Patel	43aeb001c9	[InstCombine] use m_APInt to allow icmp (binop X, Y), C folds with constant splat vectors This removes the restriction for the icmp constant, but as noted by the FIXME comments, we still need to change individual checks for binop operand constants. llvm-svn: 277629	2016-08-03 18:59:03 +00:00
Vedant Kumar	bfb6072d84	Revert "More fixes to get good error messages for bad archives." This reverts commit r277540. It breaks the build with: ../lib/Object/Archive.cpp:264:41: error: return type of out-of-line definition of 'llvm::object::ArchiveMemberHeader::getUID' differs from that in the declaration Expected<unsigned> ArchiveMemberHeader::getUID() const { ~~~~~~~~~~~~~~~~~~ ^ include/llvm/Object/Archive.h:53:12: note: previous declaration is here unsigned getUID() const; ~~~~~~~~ ^ llvm-svn: 277627	2016-08-03 18:44:32 +00:00
Krzysztof Parzyszek	23ee12e173	[Hexagon] Generate COPY/REG_SEQUENCE more aggressively for vectors llvm-svn: 277626	2016-08-03 18:35:48 +00:00
Duncan P. N. Exon Smith	9cbc69d1fe	IR: Drop uniquing when an MDNode Value operand is deleted This is a fix for PR28697. An MDNode can indirectly refer to a GlobalValue, through a ConstantAsMetadata. When the GlobalValue is deleted, the MDNode operand is reset to `nullptr`. If the node is uniqued, this can lead to a hard-to-detect cache invalidation in a Metadata map that's shared across an LLVMContext. Consider: 1. A map from Metadata* to `T` called RemappedMDs. 2. A node that references a global variable, `!{i1* @GV}`. 3. Insert `!{i1* @GV} -> SomeT` in the map. 4. Delete `@GV`, leaving behind `!{null} -> SomeT`. Looking up the generic and uninteresting `!{null}` gives you `SomeT`, which is likely related to `@GV`. Worse, `SomeT`'s lifetime may be tied to the deleted `@GV`. This occurs in practice in the shared ValueMap used since r266579 in the IRMover. Other code that handles more than one Module (with different lifetimes) in the same LLVMContext could hit it too. The fix here is a partial revert of r225223: in the rare case that an MDNode operand is a ConstantAsMetadata (i.e., wrapping a node from the Value hierarchy), drop uniquing if it gets replaced with `nullptr`. This changes step #4 above to leave behind `distinct !{null} -> SomeT`, which can't be confused with the generic `!{null}`. In theory, this can cause some churn in the LLVMContext's MDNode uniquing map when Values are being deleted. However: - The number of GlobalValues referenced from uniqued MDNodes is expected to be quite small. E.g., the debug info metadata schema only references GlobalValues from distinct nodes. - Other Constants have the lifetime of the LLVMContext, whose teardown is careful to drop references before deleting the constants. As a result, I don't expect a compile time regression from this change. llvm-svn: 277625	2016-08-03 18:19:43 +00:00
Ehsan Amiri	a538b0f023	Adding -verify-machineinstrs option to PowerPC tests Currently we have a number of tests that fail with -verify-machineinstrs. To detect these cases earlier we add the option to the testcases, with the exception of tests that will currently fail with this option. PR 27456 keeps track of these failures. No code review, as discussed with Hal Finkel. llvm-svn: 277624	2016-08-03 18:17:35 +00:00
David Majnemer	fad0490869	[CloneFunction] Don't remove side effecting calls We were able to figure out that the result of a call is some constant. While propagating that fact, we added the constant to the value map. This is problematic because it results in us losing the call site when processing the value map. This fixes PR28802. llvm-svn: 277611	2016-08-03 17:12:47 +00:00
Weiming Zhao	57dc4cf0e1	[ARM] Constant Materialize: imms with specific value can be encoded into mov.w Summary: Thumb2 supports encoding immediates with specific patterns into mov.w by splatting the low 8 bits into other bytes. Reviewers: john.brawn, jmolloy Subscribers: jmolloy, aemerson, rengolin, samparker, llvm-commits Differential Revision: https://reviews.llvm.org/D23090 llvm-svn: 277610	2016-08-03 17:05:23 +00:00
Renato Golin	f583097434	Revert "Teach CorrelatedValuePropagation to mark adds as no wrap" This reverts commit r277592, trying to fix the AArch64 42VMA buildbot. llvm-svn: 277607	2016-08-03 16:20:48 +00:00
Elliot Colp	82b1468a4d	Disable shrinking of SNaN constants When expanding FP constants, we attempt to shrink doubles to floats and perform an extending load. However, on SystemZ, and possibly on other targets (I've only confirmed the problem on SystemZ), the FP extending load instruction may convert SNaN into QNaN, or may cause an exception. So in the general case, we would still like to shrink FP constants, but SNaNs should be left as doubles. Differential Revision: https://reviews.llvm.org/D22685 llvm-svn: 277602	2016-08-03 15:09:21 +00:00
Krzysztof Parzyszek	ed4e7827bb	[Hexagon] Do not check alignment for unsized types in isLegalAddressingMode When the same base address is used to load two different data types, LSR would assume a memory type of "void". This type is not sized and has no alignment information. Checking for it causes a crash. llvm-svn: 277601	2016-08-03 15:06:18 +00:00
Sanjay Patel	91bab5364e	add a vector variant of each test llvm-svn: 277598	2016-08-03 14:25:55 +00:00
Nicolai Haehnle	c1f1ad998a	[InstCombine] Add select-bitext.ll tests As requested for D22747. llvm-svn: 277596	2016-08-03 13:37:56 +00:00
Artur Pilipenko	68cb947cc9	Teach CorrelatedValuePropagation to mark adds as no wrap Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem. Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D23059 llvm-svn: 277592	2016-08-03 13:11:39 +00:00
Igor Breger	c59b3a2236	[AVX512] Add aliases for vcvttss2si{l|q}, vcvttsd2si{l|q}, vcvttss2usi{l|q}, vcvttsd2usi{l|q} instructions. Differential Revision: http://reviews.llvm.org/D23111 llvm-svn: 277586	2016-08-03 10:58:05 +00:00
Chandler Carruth	241bf2456f	[PM] Add a generic 'repeat N times' pass wrapper to the new pass manager. While this has some utility for debugging and testing on its own, it is primarily intended to demonstrate the technique for adding custom wrappers that can provide more interesting iteration behavior in a nice, orthogonal, and composable layer. Being able to write these kinds of very dynamic and customized controls for running passes was one of the motivating use cases of the new pass manager design, and this gives a hint at how they might look. The actual logic is tiny here, and most of this is just wiring in the pipeline parsing so that this can be widely used. I'm adding this now to show the wiring without a lot of business logic. This is a precursor patch for showing how an "iterate up to N times as long as we devirtualize a call" utility can be added as a separable and composable component alongside the CGSCC pass management. Differential Revision: https://reviews.llvm.org/D22405 llvm-svn: 277581	2016-08-03 07:44:48 +00:00
Dean Michael Berris	0b8f6c8777	[XRay] Make the xray_instr_map section specification more correct Summary: We also add a test to show what currently happens when we create a section per function and emit an xray_instr_map. This illustrates the relationship (or lack thereof) between the per-function section and the xray_instr_map section. We also change the code generation slightly so that we don't always create group sections, but rather only do so if a function where the table is associated with is in a group. Also in this change: - Remove the "merge" flag on the xray_instr_map section. - Test that we're generating the right table for comdat and non-comdat functions. Reviewers: echristo, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23104 llvm-svn: 277580	2016-08-03 07:21:55 +00:00
Mehdi Amini	f9721ba5f1	RecordStreamer: handle inline asm "lazy_reference" and mark symbols as "used" llvm-svn: 277564	2016-08-03 03:51:42 +00:00
Chandler Carruth	6cb2ab2c60	[PM] Significantly refactor the pass pipeline parsing to be easier to reason about and less error prone. The core idea is to fully parse the text without trying to identify passes or structure. This is done with a single state machine. There were various bugs in the logic around this previously that were repeated and scattered across the code. Having a single routine makes it much easier to fix and get correct. For example, this routine doesn't suffer from PR28577. Then the actual pass construction is handled using much easier to read code and simple loops, with particular pass manager construction sunk to live with other pass construction. This is especially nice as the pass managers are in fact passes. Finally, the "implicit" pass manager synthesis is done much more simply by forming "pre-parsed" structures rather than having to duplicate tons of logic. One of the bugs fixed by this was evident in the tests where we accepted a pipeline that wasn't really well formed. Another bug is PR28577 for which I have added a test case. The code is less efficient than the previous code but I'm really hoping that's not a priority. ;] Thanks to Sean for the review! Differential Revision: https://reviews.llvm.org/D22724 llvm-svn: 277561	2016-08-03 03:21:41 +00:00
Ivan Krasin	3aade11252	Add -lowertypetests-bitsets-level to control bitsets generation. Summary: Sometimes, bitsets can get really large (>300k entries) and we might want to drop a check, as it would cost too much. Adding a flag to control how much penalty we are willing to pay for bitsets. Reviewers: kcc Differential Revision: https://reviews.llvm.org/D23088 llvm-svn: 277556	2016-08-03 00:59:38 +00:00
Sanjay Patel	287b81d27b	add vector test for icmp+sub llvm-svn: 277555	2016-08-03 00:36:54 +00:00
Daniel Berlin	df10119e4e	Support for lifetime begin/end markers in the MemorySSA use optimizer Summary: Depends on D23072 Reviewers: george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23076 llvm-svn: 277553	2016-08-03 00:01:46 +00:00
Sanjoy Das	4babac89cc	[Verifier] Add more tests related to non-integral pointers As suggested by Matt Arsenault in post-commit review. llvm-svn: 277550	2016-08-02 23:32:53 +00:00
Rui Ueyama	057625f616	Fix a test for r277545. This change should have been submitted with that commit. llvm-svn: 277548	2016-08-02 23:25:59 +00:00
Evgeniy Stepanov	d99f80b48e	[safestack] Layout large allocas first to reduce fragmentation. llvm-svn: 277544	2016-08-02 23:21:30 +00:00
Derek Schuff	39bf39f35c	[WebAssembly] Initial SIMD128 support. Kicks off the implementation of wasm SIMD128 support (spec: https://github.com/stoklund/portable-simd/blob/master/portable-simd.md), adding support for add, sub, mul for i8x16, i16x8, i32x4, and f32x4. The spec is WIP, and might change in the near future. Patch by João Porto Differential Revision: https://reviews.llvm.org/D22686 llvm-svn: 277543	2016-08-02 23:16:09 +00:00
Tim Northover	765777ce67	ARM: only form SMMLS when SUBE flags unused. In this particular example we wouldn't want the smmls anyway (the value is actually unused), but in general smmls does not provide the required flags register so if that SUBE result is used we can't replace it. llvm-svn: 277541	2016-08-02 23:12:36 +00:00
Kevin Enderby	395cc09444	More fixes to get good error messages for bad archives. Fixed the last incorrect uses of llvm_unreachable() in the code which were actually just cases of errors in the input Archives. llvm-svn: 277540	2016-08-02 22:58:55 +00:00
Matt Arsenault	979902b3ff	AMDGPU: fdiv -1, x -> rcp -x llvm-svn: 277535	2016-08-02 22:25:04 +00:00
Krzysztof Parzyszek	824d347d2d	[Hexagon] Recognize vcombine in copy propagation llvm-svn: 277528	2016-08-02 21:49:20 +00:00
Michael Zolotukhin	ca0d48b742	[LoopUnroll] Fix a PowerPC test broken by r277524. llvm-svn: 277527	2016-08-02 21:43:25 +00:00
Michael Zolotukhin	b2738e41bf	[LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its original value. As agreed in post-commit review of r265388, I'm switching the flag to its original value until the 90% runtime performance regression on SingleSource/Benchmarks/Stanford/Bubblesort is addressed. llvm-svn: 277524	2016-08-02 21:24:14 +00:00
Artem Belevich	db4bc667af	[NVPTX] remove unnecessary named metadata update that happens to break debug info. Also added test case to verify IR changes done by NVPTXGenericToNVVM pass. Differential Revision: https://reviews.llvm.org/D22837 llvm-svn: 277520	2016-08-02 20:58:24 +00:00
Wei Mi	dc7001afb2	[LoopVectorize] Change comment for isOutOfScope in collectLoopUniforms, NFC Update comment for isOutOfScope and add a testcase for uniform value being used out of scope. Differential Revision: https://reviews.llvm.org/D23073 llvm-svn: 277515	2016-08-02 20:27:49 +00:00
Tim Northover	1021d89398	AArch64: properly calculate cmpxchg status in FastISel. We were relying on the misleadingly-named $status result to actually be the status. Actually it's just a scratch register that may or may not be valid (and is the inverse of the real status anyway). Success can be determined by comparing the value loaded against the one we wanted to see for "cmpxchg strong" loops like this. Should fix PR28819. llvm-svn: 277513	2016-08-02 20:22:36 +00:00
Sanjoy Das	f45e03e201	[IRCE] Preserve DomTree and LCSSA This changes IRCE to "preserve" LCSSA and DomTree by recomputing them. It still does not preserve LoopSimplify. llvm-svn: 277505	2016-08-02 19:31:54 +00:00
Nicolai Haehnle	8a482b33fe	AMDGPU: Stay in WQM for non-intrinsic stores Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays. For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway. This is a candidate for the 3.9 release branch. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D22675 llvm-svn: 277504	2016-08-02 19:31:14 +00:00
Michael Zolotukhin	d9b6ad3c01	[LoopUnroll] Ensure we create prolog loops in simplified form. llvm-svn: 277502	2016-08-02 19:19:31 +00:00
Nirav Dave	9263ae3b5a	Fix handling of end-of-line preprocessor comments Attempt 2 Attempt 2: Retrying after Tsan.mman test fix. Attempt 1: Recommitting after fixing test. When parsing assembly where the line comment syntax is not hash, the lexer cannot distinguish between hash's that start a hash line comment and one that is part of an assembly statement and must be distinguished during parsing. Previously, this was incompletely handled by not checking for EndOfStatement at the end of statements and interpreting hash prefixed statements as comments. Change EndOfStatement Parsing to check for Hash comments and reintroduce Hash statement parsing to catch previously handled cases. Reviewers: rnk, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23017 llvm-svn: 277501	2016-08-02 19:17:54 +00:00
Nicolai Haehnle	bef0e90cf1	AMDGPU: Track physical registers in SIWholeQuadMode Summary: There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM. The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0. This is a candidate for the 3.9 branch, as it fixes a possible hang. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22673 llvm-svn: 277500	2016-08-02 19:17:37 +00:00
Ahmed Bougacha	91bdeb1cc2	[AArch64][GlobalISel] Replace test REQUIRES with lit.local.cfg. NFC. I forgot the REQUIRES once (see r277486). Let's prevent it from happening again. llvm-svn: 277499	2016-08-02 19:04:29 +00:00
Ahmed Bougacha	8a31ed2432	[AArch64] Remove useless 'import re' from CodeGen lit.local.cfg. NFC. llvm-svn: 277498	2016-08-02 19:04:25 +00:00
Krzysztof Parzyszek	962932c2e2	[Hexagon] Prefer _io over _rr for 64-bit store with constant offset Identify patterns where the address is aligned to an 8-byte boundary, but both the base address and the constant offset are proper multiples of 4. In such cases, extract Base+4 into a separate instruction, and use S2_storerd_io, instead of using S4_storerd_rr. llvm-svn: 277497	2016-08-02 18:50:05 +00:00
Nirav Dave	8601ac11aa	[MC] Fix Intel Operand assembly parsing for .set ids Recommitting after fixing overaggressive fastpath return in parsing. Fix intel syntax special case identifier operands that refer to a constant (e.g. .set <ID> n) to be interpreted as immediate not memory in parsing. Associated commit to fix clang test committed shortly. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22585 llvm-svn: 277489	2016-08-02 17:56:03 +00:00
Ahmed Bougacha	0d020190dd	[AArch64][GlobalISel] Add REQUIRES: global-isel to verifier tests. I thought the directory had a lit.local.cfg, but it doesn't. I'll add one, but for now, add the REQUIRES line. While there, move the triple into the IR and add a datalayout. llvm-svn: 277486	2016-08-02 17:19:35 +00:00
Ahmed Bougacha	bfaddd999a	[GlobalISel] Set the Selected MF property. None of GlobalISel requires the property, but this lets us use the verifier instead of rolling our own "all instructions selected" check. llvm-svn: 277484	2016-08-02 16:49:25 +00:00
Ahmed Bougacha	b14e944cdb	[GlobalISel] Verify Selected MF property. After instruction selection, there should be no pre-isel generic instructions remaining, nor should generic virtual registers be used. Verify that. llvm-svn: 277483	2016-08-02 16:49:22 +00:00
Ahmed Bougacha	b109d51865	[GlobalISel] Add Selected MachineFunction property. Selected: the InstructionSelect pass ran and all pre-isel generic instructions have been eliminated; i.e., all instructions are now target-specific or non-pre-isel generic instructions (e.g., COPY). Since only pre-isel generic instructions can have generic virtual register operands, this also means that all generic virtual registers have been constrained to virtual registers (assigned to register classes) and that all sizes attached to them have been eliminated. This lets us enforce certain invariants across passes. This property is GlobalISel-specific, but is always available. llvm-svn: 277482	2016-08-02 16:49:19 +00:00
Ahmed Bougacha	4628e37e7f	[GlobalISel] Set and require RegBankSelected MF property. The InstructionSelect pass assumes that RegBankSelect ran; set the property on all tests (thereby verifying the test inputs) and require it in the pass. llvm-svn: 277477	2016-08-02 16:17:18 +00:00
Ahmed Bougacha	3681c772cf	[GlobalISel] Verify RegBankSelected MF property. RegBankSelected functions shouldn't have any generic virtual register not assigned to a bank. Verify that. llvm-svn: 277476	2016-08-02 16:17:15 +00:00
Ahmed Bougacha	2471265508	[GlobalISel] Add RegBankSelected MachineFunction property. RegBankSelected: the RegBankSelect pass ran and all generic virtual registers have been assigned to a register bank. This lets us enforce certain invariants across passes. This property is GlobalISel-specific, but is always available. llvm-svn: 277475	2016-08-02 16:17:10 +00:00
Matthew Simpson	18d8898317	[LV] Generate both scalar and vector integer induction variables This patch enables the vectorizer to generate both scalar and vector versions of an integer induction variable for a given loop. Previously, we only generated a scalar induction variable if we knew all its users were going to be scalar. Otherwise, we generated a vector induction variable. In the case of a loop with both scalar and vector users of the induction variable, we would generate the vector induction variable and extract scalar values from it for the scalar users. With this patch, we now generate both versions of the induction variable when there are both scalar and vector users and select which version to use based on whether the user is scalar or vector. Differential Revision: https://reviews.llvm.org/D22869 llvm-svn: 277474	2016-08-02 15:25:16 +00:00
Ahmed Bougacha	24d0d4d2ec	[GlobalISel] Set, require, and verify Legalized MF property. RegBankSelect and InstructionSelect run after the legalizer and require a Legalized function: check that all instructions are legal. Note that this should be in the MachineVerifier, but it can't use the MachineLegalizer as it's currently in the separate GlobalISel library. Note that the RegBankSelect verifier checks have the same layering problem, but we only use inline methods so end up not needing to link against the GlobalISel library. llvm-svn: 277472	2016-08-02 15:10:32 +00:00
Ahmed Bougacha	0d7b0cb865	[GlobalISel] Add Legalized MachineFunction property. Legalized: The MachineLegalizer ran; all pre-isel generic instructions have been legalized, i.e., all instructions are now one of: - generic and always legal (e.g., COPY) - target-specific - legal pre-isel generic instructions. This lets us enforce certain invariants across passes. This property is GlobalISel-specific, but is always available. llvm-svn: 277470	2016-08-02 15:10:25 +00:00
Nirav Dave	f94cd9df0f	Revert "[MC] Fix handling of end-of-line preprocessor comments" Causes TSan failure on PPC64 This reverts commit r277459. llvm-svn: 277468	2016-08-02 15:08:52 +00:00
Matthew Simpson	58f562887b	[LV] Untangle the concepts of uniform and scalar This patch refactors the logic in collectLoopUniforms and collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It adds isScalarAfterVectorization alongside isUniformAfterVectorization to distinguish the two. Known scalar values include those that are uniform, getelementptr instructions that won't be vectorized, and induction variables and induction variable update instructions whose users are all known to be scalar. This patch includes the following functional changes: - In collectLoopUniforms, we mark uniform the pointer operands of interleaved accesses. Although non-consecutive, these pointers are treated like consecutive pointers during vectorization. - In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it isScalarAfterVectorization rather than isUniformAfterVectorization. This differs from the previous functionality in that we now add getelementptr instructions that will not be vectorized into VecValuesToIgnore. This patch also removes the ValuesNotWidened set used for induction variable scalarization since, after the above changes, it is now equivalent to isScalarAfterVectorization. Differential Revision: https://reviews.llvm.org/D22867 llvm-svn: 277460	2016-08-02 14:29:41 +00:00
Nirav Dave	9b0ee9c522	[MC] Fix handling of end-of-line preprocessor comments Recommitting after fixing test. When parsing assembly where the line comment syntax is not hash, the lexer cannot distinguish between hash's that start a hash line comment and one that is part of an assembly statement and must be distinguished during parsing. Previously, this was incompletely handled by not checking for EndOfStatement at the end of statements and interpreting hash prefixed statements as comments. Change EndOfStatement Parsing to check for Hash comments and reintroduce Hash statement parsing to catch previously handled cases. Reviewers: rnk, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23017 llvm-svn: 277459	2016-08-02 14:25:49 +00:00
Sam Parker	18bc3a002e	[ARM] Improve smul* and smla* isel for Thumb2 Added (sra (shl x, 16), 16) to the sext_16_node PatLeaf for ARM to simplify some pattern matching. This has allowed several patterns for smul* and smla* to be removed as well as making it easier to add the matching for the corresponding instructions for Thumb2 targets. Also added two Pat classes that are predicated on Thumb2 with the hasDSP flag and UseMulOps flags. Updated the smul codegen test with the wider range of patterns plus the ThumbV6 and ThumbV6T2 targets. Differential Revision: https://reviews.llvm.org/D22908 llvm-svn: 277450	2016-08-02 12:44:27 +00:00
Ahmed Bougacha	45eb3b94d4	[GlobalISel] Don't RegBankSelect target-specific instructions. They don't have types and should be using register classes. llvm-svn: 277447	2016-08-02 11:41:16 +00:00
Ahmed Bougacha	faf8e9f8c6	[GlobalISel] Don't legalize non-generic instructions. They don't have types and should be legal. llvm-svn: 277446	2016-08-02 11:41:09 +00:00
Igor Breger	f44b79d08e	[AVX512] Don't use i128 masked gather/scatter/load/store. Check dataWidth more accurately. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435	2016-08-02 09:15:28 +00:00
Matt Arsenault	dfa7683d71	AArch64: Add missing branch relaxation tests The branch relaxation pass has the worst test coverage of any pass in AArch64. Add a few tests that hit some large pieces of code in the pass. llvm-svn: 277428	2016-08-02 07:41:05 +00:00
Craig Topper	05948fb36c	[AVX-512] Correct ExeDomain for many AVX-512 instructions. llvm-svn: 277416	2016-08-02 05:11:15 +00:00
Sanjoy Das	65ec15b095	[Verifier] Improve test coverage for rL277413 As suggested via post-commit review. llvm-svn: 277414	2016-08-02 03:23:22 +00:00
Sanjoy Das	e1129ee64a	[Verifier] Disallow illegal ptr<->int casts in ConstantExprs This should have been a part of rL277085, but I hadn't considered this case. llvm-svn: 277413	2016-08-02 02:55:57 +00:00
Bruno Cardoso Lopes	42327a32b2	Revert r277408 and r277407 Revert r277408 "Fix test from rL277407." Revert r277407 "[MC] Fix handling of end-of-line preprocessor comments" This is currently breaking: http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_check/20731 llvm-svn: 277412	2016-08-02 02:53:59 +00:00
Sean Silva	f801575fd0	CodeExtractor : Add ability to preserve profile data. Added ability to estimate the entry count of the extracted function and the branch probabilities of the exit branches. Patch by River Riddle! Differential Revision: https://reviews.llvm.org/D22744 llvm-svn: 277411	2016-08-02 02:15:45 +00:00
Nirav Dave	d0e8d251eb	Fix test from rL277407. llvm-svn: 277408	2016-08-02 01:27:09 +00:00
Nirav Dave	3140fec182	[MC] Fix handling of end-of-line preprocessor comments Summary: When parsing assembly where the line comment syntax is not hash, the lexer cannot distinguish between hash's that start a hash line comment and one that is part of an assembly statement and must be distinguished during parsing. Previously, this was incompletely handled by not checking for EndOfStatement at the end of statements and interpreting hash prefixed statements as comments. Change EndOfStatement Parsing to check for Hash comments and reintroduce Hash statement parsing to catch previously handled cases. Reviewers: rnk, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23017 llvm-svn: 277407	2016-08-02 01:05:29 +00:00
Hans Wennborg	7a3a49b18a	Revert r276895 "[MC][X86] Fix Intel Operand assembly parsing for .set ids" This caused PR28805. Adding a regression test. llvm-svn: 277402	2016-08-01 23:00:01 +00:00
Derek Schuff	c64d7655b2	[WebAssembly] Support CFI for WebAssembly target Summary: This patch implements CFI for WebAssembly. It modifies the LowerTypeTest pass to pre-assign table indexes to functions that are called indirectly, and lowers type checks to test against the appropriate table indexes. It also modifies the WebAssembly backend to support a special ".indidx" assembly directive that propagates the table index assignments out to the linker. Patch by Dominic Chen Differential Revision: https://reviews.llvm.org/D21768 llvm-svn: 277398	2016-08-01 22:25:02 +00:00
Lang Hames	7643d98d86	[Orc] Fix common symbol support in ORC. Common symbol support in ORC was broken in r270716 when the symbol resolution rules in RuntimeDyld were changed. With the switch to lazily materialized symbols in r277386, common symbols can be supported by having RuntimeDyld::emitCommonSymbols search for (but not materialize!) definitions elsewhere in the logical dylib. This patch adds the 'Common' flag to JITSymbolFlags, and the necessary check to RuntimeDyld::emitCommonSymbols. llvm-svn: 277397	2016-08-01 22:23:24 +00:00
Michael Kuperstein	c40618610f	[PM] Port SpeculativeExecution to the new PM Differential Revision: https://reviews.llvm.org/D23033 llvm-svn: 277393	2016-08-01 21:48:33 +00:00
Derek Schuff	f41f67d3d9	[WebAssembly] Add asm.js-style exception handling support Summary: This patch includes asm.js-style exception handling support for WebAssembly. The WebAssembly MVP does not have any support for unwinding or non-local control flow. In order to support C++ exceptions, emscripten currently uses JavaScript exceptions along with some support code (written in JavaScript) that is bundled by emscripten with the generated code. This scheme lowers exception-related instructions for wasm such that wasm modules can be compatible with emscripten's existing scheme and share the support code. Patch by Heejin Ahn Differential Revision: https://reviews.llvm.org/D22958 llvm-svn: 277391	2016-08-01 21:34:04 +00:00
Zachary Turner	d3c7b8e303	[msf] Teach LLVM to parse a split Fpm. The FPM is split at regular intervals across the MSF file, as the MS code suggests. It turns out that the value of the interval is precisely the block size. If the block size is 4096, then there are two Fpm pages every 4096 blocks. So here we teach the PDBFile class to parse a split FPM, and also add more options when dumping the FPM to display some additional information such as orphaned pages (pages which the FPM says are allocated, but which nothing appears to use), use after free pages (pages which the FPM says are not allocated, but which are referenced by a stream), and multiple use pages (pages which the FPM says are allocated but are used more than once). Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D23022 llvm-svn: 277388	2016-08-01 21:19:45 +00:00
Michael Kuperstein	c97da7f3a4	[DAGCombine] Make sext(setcc) combine respect getBooleanContents We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)" Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value of T is 1 or -1, depending on the type of the setcc, and getBooleanContents() for the type if it is not i1. This fixes PR28504. llvm-svn: 277371	2016-08-01 19:39:49 +00:00
Ron Lieberman	8123b966cb	[Hexagon] Generate vector printing instructions llvm-svn: 277370	2016-08-01 19:36:39 +00:00
George Burgess IV	5f0e76dca6	[CFLAA] Remove modref queries from CFLAA. As it turns out, modref queries are broken with CFLAA. Specifically, the data source we were using for determining modref behaviors explicitly ignores operations on non-pointer values. So, it wouldn't note e.g. storing an i32 to an i32* (or loading an i64 from an i64*). It also ignores external function calls, rather than acting conservatively for them. (N.B. These operations, where necessary, *are* tracked by CFLAA; we just use a different mechanism to do so. Said mechanism is relatively imprecise, so it's unlikely that we can provide reasonably good modref answers with it as implemented.) Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22978 llvm-svn: 277366	2016-08-01 18:47:28 +00:00
Evandro Menezes	82e245a202	[AArch64] Add support for Samsung Exynos M2 (NFC). llvm-svn: 277364	2016-08-01 18:39:45 +00:00
David Majnemer	d1548eaa17	Included test for r277360. llvm-svn: 277361	2016-08-01 18:07:19 +00:00
David Majnemer	ba6665d88a	[Verifier] Resume instructions can only be in functions w/ a personality This fixes PR28799. llvm-svn: 277360	2016-08-01 18:06:34 +00:00
Krzysztof Parzyszek	ddafa2cd5f	[Hexagon] Check for offset overflow when reserving scavenging slots Scavenging slots were only reserved when pseudo-instruction expansion in frame lowering created new virtual registers. It is possible to still need a scavenging slot even if no virtual registers were created, in cases where the stack is large enough to overflow instruction offsets. llvm-svn: 277355	2016-08-01 17:15:30 +00:00
Nirav Dave	6e0b732009	Add removed inline-assembly-comment test from r277146 llvm-svn: 277349	2016-08-01 15:36:10 +00:00
Daniel Sanders	b3ae33c7a6	[mips][fastisel] Correct argument lowering for (f64, f64, i32) and similar. Summary: Allocating an AFGR64 shadows two GPR32's instead of just one. This fixes an LNT regression detected by our internal buildbots. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D23012 llvm-svn: 277348	2016-08-01 15:32:51 +00:00
Valery Pykhtin	902db3101b	[AMDGPU] refactor DS instruction definitions. NFC. Differential revision: https://reviews.llvm.org/D22522 llvm-svn: 277344	2016-08-01 14:21:30 +00:00
Simon Pilgrim	46f119a59f	[X86] Use implicit masking of SHLD/SHRD shift double instructions Similar to the regular shift instructions, SHLD/SHRD only use the bottom bits of the shift value llvm-svn: 277341	2016-08-01 12:11:43 +00:00
Simon Pilgrim	7fd4ad6849	Fixed test check ordering issue on windows buildbots llvm-svn: 277337	2016-08-01 10:40:15 +00:00
James Molloy	bade86cedc	[SimplifyCFG] Fix nasty RAUW bug from r277325 Using RAUW was wrong here; if we have a switch transform such as: 18 -> 6 then 6 -> 0 If we use RAUW, while performing the second transform the transformed 6 from the first will be also replaced, so we end up with: 18 -> 0 6 -> 0 Found by clang stage2 bootstrap; testcase added. llvm-svn: 277332	2016-08-01 09:34:48 +00:00
Diana Picus	ab5a4c7dbb	[AArch64] Return the correct size for TLSDESC_CALLSEQ The branch relaxation pass is computing the wrong offsets because it assumes TLSDESC_CALLSEQ eats up 4 bytes, when in fact it is lowered to an instruction sequence taking up 16 bytes. This can become a problem in huge files with lots of TLS accesses, as it may slowly move branch targets out of the range computed by the branch relaxation pass. Fixes PR24234 https://llvm.org/bugs/show_bug.cgi?id=24234 Differential Revision: https://reviews.llvm.org/D22870 llvm-svn: 277331	2016-08-01 08:38:49 +00:00
Craig Topper	d2b2d745ff	[AVX-512] Fix a test missed in r277327. llvm-svn: 277330	2016-08-01 08:15:30 +00:00
James Molloy	91821bd0b4	[SimplifyCFG] Try and pacify buildbots after r277325 It looks like the two independent parts of the rotate operation (a lshr and shl) are being reordered on some bots. Add CHECK-DAGs to account for this. llvm-svn: 277329	2016-08-01 08:09:55 +00:00
Craig Topper	c48c029610	[AVX-512] Fix duplicate column in AVX512 execution dependency table that was preventing VMOVDQU32/VMOVDQA32 from being recognized. Fix a bug in the code that stops execution dependency fix from turning operations on 32-bit integer element types into operations on 64-bit integer element types. llvm-svn: 277327	2016-08-01 07:55:33 +00:00
Craig Topper	ddc96cd33d	[X86] Regenerate a test to pick up shuffle comments that were added at some point. llvm-svn: 277326	2016-08-01 07:55:24 +00:00
James Molloy	b2e436de42	[SimplifyCFG] Range reduce switches If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example: switch (i) { case 5: ... case 9: ... case 13: ... case 17: ... } can become: if ( (i - 5) % 4 ) goto default; switch ((i - 5) / 4) { case 0: ... case 1: ... case 2: ... case 3: ... } or even better: switch ( ROTR(i - 5, 2) ) { case 0: ... case 1: ... case 2: ... case 3: ... } The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables. llvm-svn: 277325	2016-08-01 07:45:11 +00:00
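A minimal sketch of why the right-rotate in the entry above can replace the remainder check and the division, assuming a 32-bit switch condition. The helper names `rotr32` and `reduce_case` are hypothetical; this is not the SimplifyCFG code itself.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, r):
    """Rotate a 32-bit value right by r bits."""
    x &= MASK32
    r %= 32
    return ((x >> r) | (x << (32 - r))) & MASK32

def reduce_case(i, base=5, shift=2):
    """Map a sparse case value to a dense index via ROTR(i - base, shift)."""
    return rotr32(i - base, shift)

# The cases from the commit message land on 0..3.
dense = [reduce_case(c) for c in (5, 9, 13, 17)]
print(dense)

# A value that is not base + 4*k has low bits set in (i - base); the rotate
# moves them to the top, so the result is huge and falls through to default.
print(hex(reduce_case(6)))
```

Values such as 21 that are in the progression but beyond the last case reduce to 4 or more, so they also miss every dense case and take the default path.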
Hrvoje Varga	00d96ee7b9	[mips] Clang generates unaligned offset for MSA instruction st.d Differential Revision: https://reviews.llvm.org/D19475 llvm-svn: 277323	2016-08-01 06:46:20 +00:00
Diana Picus	850043b25a	[AArch64] Register passes so they can be run by llc Initialize all AArch64-specific passes in the TargetMachine so they can be run by llc. This can lead to conflicts in opt with some command line options that share the same name as the pass, so I took this opportunity to do some cleanups: * rename all relevant command line options from "aarch64-blah" to "aarch64-enable-blah" and update the tests accordingly * run clang-format on their declarations * move all these declarations to a common place (the TargetMachine) as opposed to having them scattered around (AArch64BranchRelaxation and AArch64AddressTypePromotion were the only offenders) llvm-svn: 277322	2016-08-01 05:56:57 +00:00
Craig Topper	749a111f1e	[AVX-512] Teach X86InstrInfo::getLargestLegalSuperClass to inflate to FR32X/FR64X if AVX512 is supported and VR128X/VR256X if VLX is supported. Had to update a stack folding test to clobber the other 16 registers since this now made them get used instead of spilling. llvm-svn: 277321	2016-08-01 05:31:50 +00:00
Craig Topper	9161e4ec22	[AVX512] Replace scalar fp arithmetic intrinsics with native IR in an AVX512 test. The intrinsics aren't lowered to AVX512 instructions. The intrinsics really should be removed and autoupgraded. llvm-svn: 277320	2016-08-01 04:29:16 +00:00
Sean Silva	423c7149dc	Revert r277313 and r277314. They seem to trigger an LSan failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/15140/steps/check-llvm%20asan/logs/stdio Revert "Add the tests for r277313" This reverts commit r277314. Revert "CodeExtractor : Add ability to preserve profile data." This reverts commit r277313. llvm-svn: 277317	2016-08-01 04:16:09 +00:00
Sean Silva	e5a5c966cd	Move this test to x86-specific directory. No bots have yelled yet, but this test references an x86 intrinsic. Also, it invokes llc on x86 IR. Fixup to r277315. llvm-svn: 277316	2016-08-01 03:22:05 +00:00
Sean Silva	a0a802abe3	Fix - CodeExtractor : Inherit Target Dependent Attributes from the parent function. When extracting a set of blocks make sure to inherit all of the target dependent attributes to make sure that the function will be valid for lowering. One example is the "target-features" attribute for x86, if the extracted region has functionality that relies on a specific feature it will fail to be lowered. This also allows for extracted functions to be valid for inlining, at least back into the parent function, as the target attributes are tested when inlining for compatibility. Patch by River Riddle! Differential Revision: https://reviews.llvm.org/D22713 llvm-svn: 277315	2016-08-01 03:15:32 +00:00
Sean Silva	72be9a6937	Add the tests for r277313 Forgot to `git add` them. llvm-svn: 277314	2016-08-01 03:04:34 +00:00
Simon Pilgrim	8ae4354df6	[X86][SSE] Regenerate frem tests llvm-svn: 277311	2016-07-31 21:59:23 +00:00
Simon Pilgrim	b089ba4c65	[X86][SSE] Regenerate fpext tests llvm-svn: 277310	2016-07-31 21:55:33 +00:00
Craig Topper	7afdc0fb25	[AVX512] Always use EVEX encodings for 128/256-bit move instructions in getLoadStoreRegOpcode if VLX is supported. llvm-svn: 277305	2016-07-31 20:20:05 +00:00
Craig Topper	4c53e60360	[AVX512] Add VLX packed move instructions to the execution dependency fix pass and update tests. llvm-svn: 277304	2016-07-31 20:20:01 +00:00
Craig Topper	338ec9a0cb	[AVX512] Stop treating VR512 specially in getLoadStoreRegOpcode and use the regular switch which already tried to handle it, but was unreachable. This has the added benefit of enabling aligned loads/stores if the stack is aligned. llvm-svn: 277302	2016-07-31 20:19:53 +00:00
Craig Topper	2a6bbb8203	[AVX512] Add X86::VR512RegClassID to X86RegisterInfo::getLargestLegalSuperClass. llvm-svn: 277301	2016-07-31 20:19:50 +00:00
Simon Pilgrim	6be48e4aa7	[X86] Improve 64-bit shifts on 32-bit targets (PR14593) As discussed on PR14593, this patch adds support for lowering to SHLD/SHRD from the patterns generated by DAGTypeLegalizer::ExpandShiftWithKnownAmountBit. Differential Revision: https://reviews.llvm.org/D23000 llvm-svn: 277299	2016-07-31 19:50:45 +00:00
Simon Pilgrim	64845bb8b4	[X86] Add tests for the lowering SHLD/SHRD from manual pattern similar to those generated by ExpandShiftWithKnownAmountBit Test for add(v,v) as well as shl(v,1) llvm-svn: 277293	2016-07-31 17:51:37 +00:00
Craig Topper	00d34ed64f	[AVX-512] Don't let ExeDependencyFix pass convert VPANDD/Q to VPANDPS/PD unless DQI instructions are supported. Same for ANDN, OR, and XOR. Thanks to Igor Breger for pointing out my mistake. llvm-svn: 277292	2016-07-31 17:15:07 +00:00
Simon Pilgrim	1e096b3a7a	[X86] Add tests for the lowering SHLD/SHRD from manual patterns As discussed on D23000 llvm-svn: 277291	2016-07-31 17:11:49 +00:00
Simon Pilgrim	9e201eac32	[SLPVectorizer][X86] Added vXi8/vXi16 sitofp/uitofp tests Dropped useless 2i32-2f32 test llvm-svn: 277281	2016-07-30 21:01:34 +00:00
Simon Pilgrim	fbe0fcb009	[X86][SSE] Regenerate vshift tests llvm-svn: 277278	2016-07-30 20:28:02 +00:00
Simon Pilgrim	f5134a2867	[SLPVectorizer][X86] Added SITOFP/UITOFP vectorization tests llvm-svn: 277275	2016-07-30 18:43:30 +00:00
Simon Pilgrim	2e9de1cebb	[X86][AVX] Added signum example test functions from PR13248 These are good examples of missed combine opportunities with zero/all bit vector compare results llvm-svn: 277274	2016-07-30 16:29:19 +00:00
Simon Pilgrim	b9e1f9c5c5	[X86][X87] Add vector arithmetic tests for targets with sse disabled To make sure the X86_64 target isn't doing anything stupid llvm-svn: 277272	2016-07-30 16:01:30 +00:00
Simon Pilgrim	cf49fa3251	[X86][SSE] Let 64-bit targets use the fast 2i32-2f32 UINT_TO_FP conversion as well as 32-bit The 2i32-2i64 legalization means that we can use the slightly quicker double bits + fptrunc approach for the same results llvm-svn: 277271	2016-07-30 14:06:59 +00:00
Matt Arsenault	749035b7b1	AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior This should really be true for any immediate, not just inline ones. llvm-svn: 277260	2016-07-30 01:40:36 +00:00
Weiming Zhao	812fde3603	DAG: avoid duplicated truncating for sign extended operand Summary: When performing cmp for EQ/NE and the operand is sign extended, we can avoid the truncation if the bits to be tested are no less than original bits. Reviewers: eli.friedman Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits Differential Revision: https://reviews.llvm.org/D22933 llvm-svn: 277252	2016-07-29 23:33:48 +00:00
Tim Northover	5fc93b75d9	GlobalISel: translate "unreachable" (into nothing) Easiest instruction ever! llvm-svn: 277225	2016-07-29 22:41:55 +00:00
Tim Northover	5fb414d870	GlobalISel: support translation of intrinsic calls. These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS. We may decide to split the latter up with finer-grained restrictions later, if necessary. llvm-svn: 277224	2016-07-29 22:32:36 +00:00
Kevin Enderby	31b07f1445	Think this will fix issues with the error messages generated for malformed-archives.test in r277177 and added back this test which was deleted in r277196 while I tracked down these problems. Changed from constructing Twine's to std::string's as Twine's don't work across statements. Also removed a few unneeded Twine() constructions. Fix the write_escaped() calls to not pass the unintended second argument fixing the warning on the ld-x86_64-win7 bot. llvm-svn: 277223	2016-07-29 22:32:02 +00:00
Michael Kuperstein	f396b4c40d	[X86] Match PSADBW in straight-line code Up until now, we only had code to match PSADBW patterns that look like what comes out of the loop vectorizer - a partial reduction inside the loop body that gets fed into a horizontal operation in a different basic block. This adds support for straight-line patterns, like those generated by the SLP vectorizer. Differential Revision: https://reviews.llvm.org/D22889 llvm-svn: 277219	2016-07-29 21:45:51 +00:00
Michael Kuperstein	e9ac9b9aaf	[Hexagon] Fix test that uses -debug-only to require asserts. llvm-svn: 277218	2016-07-29 21:44:33 +00:00
Rui Ueyama	7a5cdc6225	pdbdump: Dump Free Page Map contents. Differential Revision: https://reviews.llvm.org/D22974 llvm-svn: 277216	2016-07-29 21:38:00 +00:00
Simon Pilgrim	f107ffa8f0	[X86][AVX] Fix VBROADCASTF128 selection bug (PR28770) Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors. llvm-svn: 277214	2016-07-29 21:05:10 +00:00
Tim Northover	6b3bd61283	CodeGen: add new "intrinsic" MachineOperand kind. This will be used during GlobalISel, where we need a more robust and readable way to write tests than a simple immediate ID. llvm-svn: 277209	2016-07-29 20:32:59 +00:00
Eli Bendersky	c08ff1267f	Add a REQUIRES: assert on a Lanai test that uses a -debug-only flag llvm-svn: 277204	2016-07-29 19:35:22 +00:00
Adam Nemet	12937c361f	[LoopUnroll] Include hotness of region in opt remark LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on. The BFI and indirectly BPI used in this pass is computed lazily so no overhead should be observed unless -pass-remarks-with-hotness is used. This is how the patch affects the O3 pipeline: Dominator Tree Construction Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Rotate Loops Loop Invariant Code Motion Unswitch loops Simplify the CFG Dominator Tree Construction Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Combine redundant instructions Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Induction Variable Simplification Recognize loop idioms Delete dead loops Unroll loops ... llvm-svn: 277203	2016-07-29 19:29:47 +00:00
Simon Pilgrim	455460f310	Fixed line endings llvm-svn: 277199	2016-07-29 18:58:57 +00:00
David Majnemer	718da3d1f6	[ConstantFolding] Handle bitcasts of undef fp vector elements We used the wrong type for constructing a zero vector element which led to type mismatches. This fixes PR28771. llvm-svn: 277197	2016-07-29 18:48:27 +00:00
Kevin Enderby	5207b5dcae	Remove the test/tools/llvm-objdump/malformed-archives.test for now while I investigate the bot failures with this test. llvm-svn: 277196	2016-07-29 18:46:24 +00:00
Andrew Kaylor	b99d1cc7ed	Recommitting r275284: add support to inline __builtin_mempcpy Patch by Sunita Marathe Third try, now following fixes to MSan to handle mempcpy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!) llvm-svn: 277189	2016-07-29 18:23:18 +00:00
Kyle Butt	02d8d054ab	Codegen: MachineBlockPlacement Improve probability layout. The following pattern was being laid out poorly: A / \ B C / \ / \ D E ? (Doesn't matter) Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E) The current algorithm gives: A,B,C,E (D goes on worklist) It does this even if C has a frequency count of 0. This patch adjusts the layout calculation so that if freq(B->E) >> freq(C->E) then we go ahead and layout E rather than C. Fallthrough half the time is better than fallthrough never, or fallthrough very rarely. The resulting layout is: A,B,E, (C and D are in a worklist) llvm-svn: 277187	2016-07-29 18:09:28 +00:00
Kyle Butt	af324f76ad	Tests: Add branch weights to non-layout tests. Add branch weights to a few tests that aren't testing layout to make them less sensitive to changes in the layout algorithm. llvm-svn: 277186	2016-07-29 18:09:25 +00:00
Tim Northover	69c2ba546f	GlobalISel: add generic conditional branch. Just the basic equivalent to DAG's condbr for now, we'll get to things like br_cc when we start doing more legalization. llvm-svn: 277184	2016-07-29 17:58:00 +00:00
Krzysztof Parzyszek	b0c4376697	[Hexagon] Testcase for not merging stores into a misaligned store The DAG combiner will try to merge consecutive stores into a bigger store, unless the resulting store is not fast. Misaligned vector stores are allowed on Hexagon, but are not fast. Add a testcase to make sure this type of merging does not occur. Patch by Pranav Bhandarkar. llvm-svn: 277182	2016-07-29 17:55:37 +00:00
Krzysztof Parzyszek	3e137e3429	Revert r277178, the actual change had already been applied Will submit another patch with the testcase only. llvm-svn: 277180	2016-07-29 17:50:47 +00:00
Krzysztof Parzyszek	68fe439d06	[Hexagon] Misaligned loads and stores are not fast The DAG combiner tries to merge stores to adjacent vector wide memory locations by creating stores which are integral multiples of the vector width. Discourage this by informing it that this is slow. This should not affect legalization passes, because all of them ignore the "Fast" argument. Patch by Pranav Bhandarkar. llvm-svn: 277178	2016-07-29 17:45:16 +00:00
Kevin Enderby	f4586039f6	The next step along the way to getting good error messages for bad archives. As mentioned in commit log for r276686 this next step is adding a new method in the ArchiveMemberHeader class to get the full name that does proper error checking, and can be used for error messages. To do this the name of ArchiveMemberHeader::getName() is changed to ArchiveMemberHeader::getRawName() to be consistent with Archive::Child::getRawName(). Then the “new” method is the addition of a new implementation of ArchiveMemberHeader::getName() which gets the full name and provides proper error checking. Which is mostly a rewrite of what was Archive::Child::getName() and cleaning up incorrect uses of llvm_unreachable() in the code which were actually just cases of errors in the input Archives. Then Archive::Child::getName() is changed to return Expected<> and use the new implementation of ArchiveMemberHeader::getName() . Also needed to change Archive::getMemoryBufferRef() with these changes to return Expected<> as well to propagate Errors up. As well as changing Archive::isThinMember() to return Expected<> . llvm-svn: 277177	2016-07-29 17:44:13 +00:00
Ahmed Bougacha	6db3cfe2da	[AArch64][GlobalISel] Select G_XOR. llvm-svn: 277173	2016-07-29 16:56:25 +00:00
Ahmed Bougacha	784e3423e6	[GlobalISel] Add G_XOR. llvm-svn: 277172	2016-07-29 16:56:20 +00:00
Ahmed Bougacha	7adfac56b3	[AArch64][GlobalISel] Select G_LOAD/G_STORE. Mostly straightforward as we ignore addressing modes and just use the base + unsigned immediate offset (always 0) variants. This currently fails to select extloads because we have yet to agree on a representation. llvm-svn: 277171	2016-07-29 16:56:16 +00:00
Brendon Cahoon	254f889dc5	MachinePipeliner pass that implements Swing Modulo Scheduling Software pipelining is an optimization for improving ILP by overlapping loop iterations. Swing Modulo Scheduling (SMS) is an implementation of software pipelining that attempts to reduce register pressure and generate efficient pipelines with a low compile-time cost. This implementation of SMS is a target-independent back-end pass. When enabled, the pass should run just prior to the register allocation pass, while the machine IR is in SSA form. If the pass is successful, then the original loop is replaced by the optimized loop. The optimized loop contains one or more prolog blocks, the pipelined kernel, and one or more epilog blocks. This pass is enabled for Hexagon only. To enable for other targets, a couple of target specific hooks must be implemented, and the pass needs to be called from the target's TargetMachine implementation. Differential Review: http://reviews.llvm.org/D16829 llvm-svn: 277169	2016-07-29 16:44:44 +00:00
Krzysztof Parzyszek	0bd55a7608	[Hexagon] Custom lower VECTOR_SHUFFLE and EXTRACT_SUBVECTOR for HVX If the mask of a vector shuffle has alternating odd or even numbers starting with 1 or 0 respectively up to the largest possible index for the given type in the given HVX mode (single or double) we can generate vpacko or vpacke instruction respectively. E.g. %42 = shufflevector <32 x i16> %37, <32 x i16> %41, <32 x i32> <i32 1, i32 3, ..., i32 63> is %42.h = vpacko(%41.w, %37.w) Patch by Pranav Bhandarkar. llvm-svn: 277168	2016-07-29 16:44:27 +00:00
Matt Masten	a6669a1e05	Initial support for vectorization using svml (short vector math library). Differential Revision: https://reviews.llvm.org/D19544 llvm-svn: 277166	2016-07-29 16:42:44 +00:00
Paul Robinson	fd0fb094be	Reinstate optnone test for GVN Hoisting, removed in r276479. llvm-svn: 277158	2016-07-29 16:05:50 +00:00
Nirav Dave	b20c34edf3	Remove inline-comment-2.ll until I can debug why it fails on some builds llvm-svn: 277152	2016-07-29 15:24:06 +00:00
Krzysztof Parzyszek	0006e1afdd	[Hexagon] Improve balancing of address calculation Rebalances address calculation trees and applies Hexagon-specific optimizations to the trees to improve instruction selection. Patch by Tobias Edler von Koch. llvm-svn: 277151	2016-07-29 15:15:35 +00:00
Nirav Dave	3853f608be	Fix inline-comment-2.ll triple llvm-svn: 277149	2016-07-29 15:12:00 +00:00
David L Kreitzer	8b959e5cfa	Avoid unnecessary 32-bit to 64-bit zero extensions following 32-bit CMOV instructions on x86_64. The 32-bit CMOV implicitly zero extends. Differential Revision: https://reviews.llvm.org/D22941 llvm-svn: 277148	2016-07-29 15:09:54 +00:00
Nirav Dave	8b3dc876ea	[MC] When emitting output hash comments always use standard line comment separator llvm-svn: 277146	2016-07-29 14:42:00 +00:00
Daniel Sanders	cbaca42a03	Re-commit: [mips][fastisel] Handle 0-4 arguments without SelectionDAG. Summary: Implements fastLowerArguments() to avoid the need to fall back on SelectionDAG for 0-4 argument functions that don't do tricky things like passing double in a pair of i32's. This allows us to move all except one test to -fast-isel-abort=3. The remaining one has function prototypes of the form 'i32 (i32, double, double)' which requires floats to be passed in GPR's. The previous commit had an uninitialized variable that caused the incoming argument region to have undefined size. This has been fixed. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D22680 llvm-svn: 277136	2016-07-29 12:27:28 +00:00
Simon Pilgrim	cb780b32a3	[X86][SSE] Optimize the truncation of vector comparison results with PACKSS We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element. Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently. We avoid performing this on AVX512 as it should have better alternative truncation instructions. Differential Revision: https://reviews.llvm.org/D22814 llvm-svn: 277132	2016-07-29 10:23:10 +00:00
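The reasoning in the entry above, that comparison results survive signed saturation unchanged, can be checked with a small model. This is a sketch under stated assumptions: `sat_i8`, `packss_i16`, `as_i16_lanes` and `bytes_as_i16` are hypothetical helpers that model a 128-bit signed-saturating pack of i16 lanes on Python integers, and they ignore the AVX2 lane reordering mentioned in the commit message.

```python
def sat_i8(x):
    """Signed saturation of an integer to the i8 range."""
    return max(-128, min(127, x))

def packss_i16(a, b):
    """Model of a 128-bit pack: 8 + 8 signed i16 lanes -> 16 saturated i8 lanes."""
    assert len(a) == len(b) == 8
    return [sat_i8(x) for x in a + b]

def as_i16_lanes(mask_lanes, width_bits):
    """Reinterpret all-ones (-1) or all-zero lanes of any width as i16 lanes."""
    out = []
    for m in mask_lanes:
        assert m in (0, -1)
        out.extend([m] * (width_bits // 16))
    return out

def bytes_as_i16(lanes):
    """Reinterpret pairs of identical 0/-1 bytes as i16 lanes."""
    assert all(lanes[i] == lanes[i + 1] for i in range(0, len(lanes), 2))
    return lanes[::2]

# Two v4i32 comparison results, each lane either -1 (true) or 0 (false).
a = [-1, 0, 0, -1]
b = [0, 0, -1, -1]

# One pack on the i16 view halves the element size: 8 x i32 -> 8 x i16.
packed = packss_i16(as_i16_lanes(a, 32), as_i16_lanes(b, 32))
truncated = bytes_as_i16(packed)
print(truncated)
```

Because -1 and 0 are inside the i8 range, saturation never alters them, whereas an arbitrary value such as 300 would clamp to 127, which is why general truncation needs masking or shuffles.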
Prakhar Bahuguna	d1233e857e	[Thumb] Emit Thumb move in both Thumb modes for struct_byval predicates Summary: The MOV/MOVT instructions being chosen for struct_byval predicates was conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction being incorrectly emitted in Thumb1 mode. This is especially apparent with v8-m.base targets. This patch ensures that Thumb instructions are emitted in both Thumb modes. Reviewers: rengolin, t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D22865 llvm-svn: 277128	2016-07-29 09:16:46 +00:00
Craig Topper	e4f868ea16	[AVX512] Mark EVEX VMOVSSrm and VMOVSDrm as canFoldAsLoad and isReMaterializable. llvm-svn: 277120	2016-07-29 06:06:04 +00:00
Craig Topper	07aa37039e	[AVX512] Add AVX512 run lines to some tests for scalar fma/add/sub/mul/div and regenerate. Follow up commits will bring AVX512 code up to the same quality as AVX/SSE. llvm-svn: 277118	2016-07-29 06:05:58 +00:00
David Majnemer	130b9f99d6	[EarlyCSE] Correctly handle simplified, but live, instructions Some instructions may have their uses replaced with a symbolic constant. However, the instruction may still have side effects which precludes it from being removed from the function. EarlyCSE treated such an instruction as if it were removed, resulting in PR28763. llvm-svn: 277114	2016-07-29 05:39:21 +00:00
David Majnemer	e4218cf11e	[ConstantFolding] Fold bitcasts of vectors w/ undef elements An undef vector element can be treated as if it had any value. Folding such a vector element to 0 in a bitcast can open up further folding opportunities. llvm-svn: 277104	2016-07-29 04:06:09 +00:00
David Majnemer	57b94c8d6a	[ConstantFolding] Use ConstantExpr::getWithOperands ConstantExpr::getWithOperands does much of the hard work that ConstantFoldInstOperandsImpl tries to do but more completely. This lets us fold ExtractValue/InsertValue expressions. llvm-svn: 277100	2016-07-29 03:27:31 +00:00
David Majnemer	d536f2328e	[ConstantFolding] Teach the folder how to fold ConstantVector A ConstantVector can have ConstantExpr operands and vice versa. However, the folder had no ability to fold ConstantVectors which, in some cases, was an optimization barrier. Instead, rephrase the folder in terms of Constants instead of ConstantExprs and teach callers how to deal with failure. llvm-svn: 277099	2016-07-29 03:27:26 +00:00
Craig Topper	c7de3a1018	[AVX512] Remove the intrinsic forms of VMOVSS/VMOVSD. We don't need two different forms of 'rr' and 'rm'. This matches SSE/AVX. I'm not convinced the patterns for the rm_Int were correct anyway. It had a tied source that shouldn't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently. llvm-svn: 277098	2016-07-29 02:49:08 +00:00
Teresa Johnson	7de70738df	Capture stderr when checking for gold version On MacOS the ld version is emitted to stderr, resulting in lots of messages in the ninja check output. llvm-svn: 277092	2016-07-29 00:39:56 +00:00
Piotr Padlewski	84abc74f2c	Added ThinLTO inlining statistics Summary: copypasta doc of ImportedFunctionsInliningStatistics class \brief Calculate and dump ThinLTO specific inliner stats. The main statistics are: (1) Number of inlined imported functions, (2) Number of imported functions inlined into importing module (indirect), (3) Number of non imported functions inlined into importing module (indirect). The difference between first and the second is that first stat counts all performed inlines on imported functions, but the second one only the functions that have been eventually inlined to a function in the importing module (by a chain of inlines). Because llvm uses bottom-up inliner, it is possible to e.g. import function `A`, `B` and then inline `B` to `A`, and after this `A` might be too big to be inlined into some other function that calls it. It calculates this statistic by building graph, where the nodes are functions, and edges are performed inlines and then by marking the edges starting from not imported function. If `Verbose` is set to true, then it also dumps statistics per each inlined function, sorted by the greatest inlines count like - number of performed inlines - number of performed inlines to importing module Reviewers: eraman, tejohnson, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D22491 llvm-svn: 277089	2016-07-29 00:27:16 +00:00
Sanjoy Das	c6af5ead86	[IR] Introduce a non-integral pointer type Summary: This change adds a `ni` specifier in the `datalayout` string to denote pointers in some given address spaces as "non-integral", and adds some typing rules around these special pointers. Reviewers: majnemer, chandlerc, atrick, dberlin, eli.friedman, tstellarAMD, arsenm Subscribers: arsenm, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22488 llvm-svn: 277085	2016-07-28 23:43:38 +00:00
Adam Nemet	aa3506c5f0	[BPI] Add new LazyBPI analysis Summary: The motivation is the same as in D22141: In order to add the hotness attribute to optimization remarks we need BFI to be available in all passes that emit optimization remarks. BFI depends on BPI so unless we make this lazy as well we would still compute BPI unconditionally. The solution is to use the new LazyBPI pass in LazyBFI and only compute BPI when computation of BFI is requested by the client. I extended the laziness test using a LoopDistribute test to also cover BPI. Reviewers: hfinkel, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22835 llvm-svn: 277083	2016-07-28 23:31:12 +00:00
Changpeng Fang	26fb9d268b	AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB. Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073	2016-07-28 23:01:45 +00:00
Vitaly Buka	0ab23cf1c8	Do not remove empty lifetime.start/lifetime.end ranges Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277072	2016-07-28 22:59:03 +00:00
Vitaly Buka	2fae6a7702	Should be committed as one CL. This reverts commits r277068 r277067 r277066. llvm-svn: 277071	2016-07-28 22:59:01 +00:00
Vitaly Buka	f0500b6ae5	Do not remove empty lifetime.start/lifetime.end ranges Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277068	2016-07-28 22:50:48 +00:00
Vitaly Buka	3645793872	maned llvm-svn: 277067	2016-07-28 22:50:45 +00:00
Michael Kuperstein	e45d4d9b35	[PM] Port LowerGuardIntrinsic to the new PM. llvm-svn: 277057	2016-07-28 22:08:41 +00:00
David Majnemer	3d32b7ed0d	[coroutines] Part 3 of N: Adding Boilerplate for Coroutine Passes This adds boilerplate code for all coroutine passes, the passes are no-ops for now. Also, a small test has been added to verify that passes execute in the expected order or not at all if coroutine support is disabled. Patch by Gor Nishanov! Differential Revision: https://reviews.llvm.org/D22847 llvm-svn: 277033	2016-07-28 21:04:31 +00:00
Krzysztof Parzyszek	6400dec5ab	Fix build breaks after r277028 llvm-svn: 277031	2016-07-28 20:25:21 +00:00
Krzysztof Parzyszek	167d918225	[Hexagon] Implement MI-level constant propagation llvm-svn: 277028	2016-07-28 20:01:59 +00:00
Krzysztof Parzyszek	c43644d332	[Hexagon] Insert CFI instructions before throwing calls Normally, CFI instructions should be inserted after allocframe, but if allocframe is in the same packet with a call, the CFI instructions should be inserted before that packet. llvm-svn: 277020	2016-07-28 19:13:46 +00:00
Ahmed Bougacha	8550509b64	[AArch64][GlobalISel] Select G_BR. This is the first unsized instruction we support; move down the 'sized' check to binops. llvm-svn: 277007	2016-07-28 17:15:15 +00:00
Ahmed Bougacha	d760de0b32	[MIRParser] Accept unsized generic instructions. Since r276158, we require generic instructions to have a sized type. G_BR doesn't; relax the restriction. llvm-svn: 277006	2016-07-28 17:15:12 +00:00
Ahmed Bougacha	d7748d6491	[AArch64][GlobalISel] Select GPR G_SUB. llvm-svn: 277003	2016-07-28 16:58:35 +00:00
Ahmed Bougacha	61a7928dde	[AArch64][GlobalISel] Select GPR G_AND. llvm-svn: 277002	2016-07-28 16:58:31 +00:00
Ahmed Bougacha	46c05fc861	[GlobalISel] Remove types on selected insts instead of using LLT(). LLT() has a particular meaning: it's one invalid type. But we really want selected instructions to have no type whatsoever. Also verify that types don't linger after ISel, and enable the verifier on the AArch64 select test. llvm-svn: 277001	2016-07-28 16:58:27 +00:00
Ahmed Bougacha	07994ec39b	[AArch64][GlobalISel] Remove 'alignment' from MIR tests. NFC. llvm-svn: 277000	2016-07-28 16:58:21 +00:00
Wei Ding	07e03712d3	AMDGPU : Add intrinsics for compare with the full wavefront result Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998	2016-07-28 16:42:13 +00:00
Daniel Sanders	6e74651658	Revert r276982 and r276984: [mips][fastisel] Handle 0-4 arguments without SelectionDAG It seems that the stack offset in callabi.ll varies between machines. I'll look into it. llvm-svn: 276989	2016-07-28 15:37:42 +00:00
Craig Topper	7e27885f69	[X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own. Differential Revision: https://reviews.llvm.org/D22799 llvm-svn: 276987	2016-07-28 15:28:56 +00:00
Daniel Sanders	e0b529f619	[mips][fastisel] Handle 0-4 arguments without SelectionDAG. Summary: Implements fastLowerArguments() to avoid the need to fall back on SelectionDAG for 0-4 argument functions that don't do tricky things like passing double in a pair of i32's. This allows us to move all except one test to -fast-isel-abort=3. The remaining one has function prototypes of the form 'i32 (i32, double, double)' which requires floats to be passed in GPR's. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D22680 llvm-svn: 276982	2016-07-28 14:55:28 +00:00
Nicolai Haehnle	3b572002a2	AMDGPU: add execfix flag to SI_ELSE Summary: SI_ELSE is lowered into two parts: s_or_saveexec_b64 dst, src (at the start of the basic block) s_xor_b64 exec, exec, dst (at the end of the basic block) The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction s_and_b64 exec, exec, s[...] which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be: s_or_savexec_b64 dst, src s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode s_and_b64 dst, dst, exec <-- added by SILowerControlFlow s_xor_b64 exec, exec, dst Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode. Finally: It also occured to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit s_or_saveexec_b64 dst, src ... s_and_b64 dst, dst, exec s_xor_b64 exec, exec, dst and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22846 llvm-svn: 276972	2016-07-28 11:39:24 +00:00
David Majnemer	19d024b2fd	[ConstantFolding] Don't bail on folding if ConstantFoldConstantExpression fails When folding an expression, we run ConstantFoldConstantExpression on each operand of that expression. However, ConstantFoldConstantExpression can fail and return nullptr. Previously, we would bail on further refining the expression. Instead, use the original operand and see if we can refine a later operand. llvm-svn: 276959	2016-07-28 06:39:48 +00:00
David Majnemer	67f684e18e	[CodeView] Don't crash on functions without subprograms A function may have instructions annotated with debug info without having a subprogram. This fixes PR28747. llvm-svn: 276956	2016-07-28 05:03:22 +00:00
David Majnemer	0be7155350	[InstCombine] Handle failures from ConstantFoldConstantExpression ConstantFoldConstantExpression returns null when folding fails. This fixes PR28745. llvm-svn: 276952	2016-07-28 02:29:06 +00:00
Wei Mi	315bb33f27	Fix the assertion error in collectLoopUniforms caused by empty Worklist before expanding. Contributed-by: David Callahan Differential Revision: https://reviews.llvm.org/D22886 llvm-svn: 276943	2016-07-27 23:53:58 +00:00
George Burgess IV	dbd35c44d4	[CFLAA] Add getModRefBehavior to CFLAnders. This patch lets CFLAnders respond to mod-ref queries. It also includes a small bugfix to CFLSteens. Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22823 llvm-svn: 276939	2016-07-27 23:07:07 +00:00
Vedant Kumar	fc07e8b428	[llvm-cov] Add a debug mode for source range highlighting (in html) llvm-cov's `-dump' option now emits information which helps debug source range highlighting in html mode. llvm-svn: 276924	2016-07-27 21:57:15 +00:00
Justin Lebar	23a9686011	[LSV] Don't assume that bitcast ops are Instructions. Summary: When we ask the builder to create a bitcast on a constant, we get back a constant, not an instruction. Reviewers: asbirlea Subscribers: jholewinski, mzolotukhin, llvm-commits, arsenm Differential Revision: https://reviews.llvm.org/D22878 llvm-svn: 276922	2016-07-27 21:45:48 +00:00
Krzysztof Parzyszek	06a2b6b1ee	[Hexagon] Find speculative loop preheader in hardware loop generation Before adding a new preheader block, check if there is a candidate block where the loop setup could be placed speculatively. This will be off by default. llvm-svn: 276919	2016-07-27 21:20:54 +00:00
Krzysztof Parzyszek	5241b8efcf	[Hexagon] Do not optimize volatile stack spill slots llvm-svn: 276916	2016-07-27 20:50:42 +00:00
Andrew Kaylor	9155354ff2	Revert EH-specific checks in BranchFolding that were causing blow ups in compile time. Differential Revision: https://reviews.llvm.org/D22839 llvm-svn: 276898	2016-07-27 17:55:33 +00:00
Tim Northover	8d2f52e035	GlobalISel: support zero-sized allocas All allocas must be at least 1 byte at the MachineIR level so we allocate just one byte. llvm-svn: 276897	2016-07-27 17:47:54 +00:00
Nirav Dave	06a99a46e2	[MC][X86] Fix Intel Operand assembly parsing for .set ids Fix intel syntax special case identifier operands that refer to a constant (e.g. .set <ID> n) to be interpreted as immediate not memory in parsing. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22585 llvm-svn: 276895	2016-07-27 17:39:41 +00:00
Simon Pilgrim	7efafbc583	[X86][SSE] Updated test so that both are applying the post-multiply This is to ensure that there are no diffs other than due to buildvector/legalization llvm-svn: 276882	2016-07-27 15:30:20 +00:00
Renato Golin	5267d53779	[ARM] Check that the thumb COFF segment flag gets set on thumb windows Patch by Martin Storsjö. llvm-svn: 276877	2016-07-27 14:37:18 +00:00
Ahmed Bougacha	6756a2c953	[GlobalISel] Introduce an instruction selector. And implement it for AArch64, supporting x/w ADD/OR. Differential Revision: https://reviews.llvm.org/D22373 llvm-svn: 276875	2016-07-27 14:31:55 +00:00
Daniel Sanders	c5537427c2	[mips][ias] Check '$rs = $rd' constraints when both registers are in AsmText. Summary: This is one possible solution to the problem of ignoring constraints that Simon raised in D21473 but it's a bit of a hack. The integrated assembler currently ignores violations of the tied register constraints when the operands involved in a tie are both present in the AsmText. For example, 'dati $rs, $rt, $imm' with the '$rs = $rt' will silently replace $rt with $rs. So 'dati $2, $3, 1' is processed as if the user provided 'dati $2, $2, 1' without any diagnostic being emitted. This is difficult to solve properly because there are multiple parts of the matcher that are silently forcing these constraints to be met. Tied operands are rendered to instructions by cloning previously rendered operands but this is unnecessary because the matcher was already instructed to render the operand it would have cloned. This is also unnecessary because earlier code has already replaced the MCParsedOperand with the one it was tied to (so the parsed input is matched as if it were 'dati <RegIdx 2>, <RegIdx 2>, <Imm 1>'). As a result, it looks like fixing this properly amounts to a rewrite of the tied operand handling which affects all targets. This patch however, merely inserts a checking hook just before the substitution of MCParsedOperands and the Mips target overrides it. It's not possible to accurately check the registers are the same this early (because numeric registers haven't been bound to a register class yet) so it cheats a bit and checks that the tokens that produced the operand are lexically identical. This works because tied registers need to have the same register class but it does have a flaw. It will reject 'dati $4, $a0, 1' for violating the constraint even though $a0 ends up as the same register as $4. 
Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D21994 llvm-svn: 276867	2016-07-27 13:49:44 +00:00
Teresa Johnson	d92012d51c	[test/gold] Add gold test subdirectory tests needing v1.12 (or higher) Summary: As discussed in the review for D22677, added a subdirectory to enable tests that require at least version 1.12 of gold. Add an initial test requiring this version. Reviewers: davidxl, mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D22827 llvm-svn: 276860	2016-07-27 12:59:51 +00:00
Renato Golin	80e58869f8	[ARM] Set a non-conflicting comment character for assembly in MSVC mode Currently, for ARMCOFFMCAsmInfoMicrosoft, no comment character is set, thus the default, '#', is used. The hash character doesn't work as a comment character in ARM assembly, since '#' is used for immediate values. The comment character is set to ';', which is the comment character used by MS armasm.exe. (The Microsoft armasm.exe uses a different directive syntax than what LLVM currently supports though, similar to ARM's armasm.) This allows inline assembly with immediate constants to be built (and brings the assembly output from clang -S closer to being possible to assemble). A test is added that verifies that ';' is correctly interpreted as comments in this mode, and verifies that assembling code that includes literal constants with a '#' works. Patch by Martin Storsjö. llvm-svn: 276859	2016-07-27 12:31:58 +00:00
Renato Golin	b9edb5c9b0	[ARM] Adds test for immediate encoding The encoding of expressions as immediates wasn't correct, and was reported in PR23000. However, we have done some refactoring on how immediates are handled and now it seems the problem is fixed. This is a test just to make sure it won't regress again. llvm-svn: 276858	2016-07-27 12:15:26 +00:00
Simon Pilgrim	10bf0ff879	[DAGCombiner] Use APInt directly to detect out of range shift constants Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match As mentioned on D22726 llvm-svn: 276855	2016-07-27 10:30:55 +00:00
Davide Italiano	7c9fc738b1	[MC] Add command-line option to choose the max nest level in asm macros. Submitted by: t83wCSLq Differential Revision: https://reviews.llvm.org/D22313 llvm-svn: 276842	2016-07-27 05:51:56 +00:00
Sebastian Pop	55c3007b88	GVN-hoist: improve code generation for recursive GEPs When loading or storing in a field of a struct like "a.b.c", GVN is able to detect the equivalent expressions, and GVN-hoist would fail in the code generation. This is because the GEPs are not hoisted as scalar operations to avoid moving the GEPs too far from their ld/st instruction when the ld/st is not movable. So we end up having to generate code for the GEP of a ld/st when we move the ld/st. In the case of a GEP referring to another GEP as in "a.b.c" we need to code generate all the GEPs necessary to make all the operands available at the new location for the ld/st. With this patch we recursively walk through the GEP operands checking whether all operands are available, and in the case of a GEP operand, it recursively makes all its operands available. Code generation happens from the inner GEPs out until reaching the GEP that appears as an operand of the ld/st. Differential Revision: https://reviews.llvm.org/D22599 llvm-svn: 276841	2016-07-27 05:48:12 +00:00
Vedant Kumar	90be9db7d9	[llvm-cov] Escape '\' in strings when emitting JSON Test that Windows path separators are escaped properly. Add a round-trip test to verify the JSON produced by the exporter. llvm-svn: 276832	2016-07-27 04:08:32 +00:00
David Majnemer	bc36b15253	[ConstantFolding] Correctly handle failures in ConstantFoldConstantExpressionImpl Failures in ConstantFoldConstantExpressionImpl were ignored causing crashes down the line. This fixes PR28725. llvm-svn: 276827	2016-07-27 02:39:16 +00:00
Andrew Kaylor	f990fa5f7b	Reverting r276771 due to MSan failures. llvm-svn: 276824	2016-07-27 01:19:24 +00:00
Matt Arsenault	e3862cdc93	AMDGPU: Use rcp for fdiv 1, x with fpmath metadata Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823	2016-07-26 23:25:44 +00:00
Hans Wennborg	685e8ff953	Revert r276136 "Use ValueOffsetPair to enhance value reuse during SCEV expansion." It causes Clang tests to fail after Windows self-host (PR28705). (Also reverts follow-up r276139.) llvm-svn: 276822	2016-07-26 23:25:13 +00:00
Matt Arsenault	918e81c3d4	AMDGPU: Add more tests for LDS size with occupancy llvm-svn: 276821	2016-07-26 23:15:59 +00:00
Vedant Kumar	7101d73c71	Retry: [llvm-cov] Add support for exporting coverage data to JSON This enables users to export coverage information as portable JSON for use by analysis tools and storage in document based databases. The export sub-command is invoked just like the others: llvm-cov export -instr-profile path/to/foo.profdata path/to/foo.binary The resulting JSON contains a list of files and functions. Every file object contains a list of segments, expansions, and a summary of the file's region, function, and line coverage. Every function object contains the function's name and regions. There is also a total summary for the entire object file. Changes since the initial commit (r276813): - Fixed the regexes in the tests to handle Windows filepaths. Patch by Eddie Hurtig! Differential Revision: https://reviews.llvm.org/D22651 llvm-svn: 276818	2016-07-26 22:50:58 +00:00
Vedant Kumar	e85353b849	Revert "[llvm-cov] Add support for exporting coverage data to JSON" This reverts commit r276813. The Windows bots are complaining about some of the filename regexes in the tests: http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/5299 llvm-svn: 276816	2016-07-26 21:55:39 +00:00
Matthias Braun	333e468d15	MIRParser: Use dot instead of colon to mark subregisters Change the syntax to use `%0.sub8` to denote a subregister. This seems like a more natural fit to denote subregisters; I also plan to introduce a new ":classname" syntax in upcoming patches to denote the register class of a vreg. Note that this commit disallows plain identifiers to start with a '.' character. This shouldn't affect anything as external names/IR references are all prefixed with '$'/'%', plain identifiers are only used for instruction names, register mask names and subreg indexes. Differential Revision: https://reviews.llvm.org/D22390 llvm-svn: 276815	2016-07-26 21:49:34 +00:00
Vedant Kumar	d5b7436c1f	[llvm-cov] Add support for exporting coverage data to JSON This enables users to export coverage information as portable JSON for use by analysis tools and storage in document based databases. The export sub-command is invoked just like the others: llvm-cov export -instr-profile path/to/foo.profdata path/to/foo.binary The resulting JSON contains a list of files and functions. Every file object contains a list of segments, expansions, and a summary of the file's region, function, and line coverage. Every function object contains the function's name and regions. There is also a total summary for the entire object file. Patch by Eddie Hurtig! Differential Revision: https://reviews.llvm.org/D22651 llvm-svn: 276813	2016-07-26 21:35:43 +00:00
Krzysztof Parzyszek	2a480599bb	[Hexagon] Post-increment loads/stores enhancements - Generate vector post-increment stores more aggressively. - Predicate post-increment and vector stores in early if-conversion. llvm-svn: 276800	2016-07-26 20:30:30 +00:00
Tim Northover	ad2b717f2c	GlobalISel: add generic load and store instructions. Pretty straightforward, the only oddity is the MachineMemOperand (which it's surprisingly difficult to share code for). llvm-svn: 276799	2016-07-26 20:23:26 +00:00
Krzysztof Parzyszek	57c3ddddec	[Hexagon] Gracefully handle reg class mismatch in HexagonLoopReschedule llvm-svn: 276793	2016-07-26 19:17:13 +00:00
Krzysztof Parzyszek	6eba5b8c37	[Hexagon] Rerun bit tracker on new instructions in RIE Consider this case: vreg1 = A2_zxth vreg0 (1) ... vreg2 = A2_zxth vreg1 (2) Redundant instruction elimination could delete the instruction (1) because the user (2) only cares about the low 16 bits. Then it could delete (2) because the input is already zero-extended. The problem is that the properties allowing each individual instruction to be deleted depend on the existence of the other instruction, so either one can be deleted, but not both. The existing check for this situation in RIE was insufficient. The fix is to update all dependent cells when an instruction is removed (replaced via COPY) in RIE. llvm-svn: 276792	2016-07-26 19:08:45 +00:00
Krzysztof Parzyszek	1adca30c39	[Hexagon] Bitwise operations for insert/extract word not simplified Change the bit simplifier to generate REG_SEQUENCE instructions in addition to COPY, which will handle cases of word insert/extract. llvm-svn: 276787	2016-07-26 18:30:11 +00:00
Justin Lebar	16da82f4d2	Fix NVPTX/call-with-alloca-buffer.ll after r276777. r276777 makes InstSimplify stronger, letting it see through some unnecessary addrspace casts. llvm-svn: 276786	2016-07-26 18:28:33 +00:00
Matthias Braun	ee0679207b	MIRParser: Use shorter cfi identifiers In an instruction like: CFI_INSTRUCTION .cfi_def_cfa ... we can drop the '.cfi_' prefix since that should be obvious from the context: CFI_INSTRUCTION def_cfa ... Besides being a terser and cleaner syntax, this also prepares for dropping support for identifiers starting with a dot character so we can use it for expressions. Differential Revision: http://reviews.llvm.org/D22388 llvm-svn: 276785	2016-07-26 18:20:00 +00:00
Davide Italiano	f17d48e58a	[MC] Don't crash when trying to emit a relocation against .bss. Turn that into an error instead. llvm-svn: 276783	2016-07-26 18:16:33 +00:00
David Majnemer	6774d612d4	[InstSimplify] Cast folding can be made more generic Use isEliminableCastPair to determine if a pair of casts are foldable. llvm-svn: 276777	2016-07-26 17:58:05 +00:00
Tim Northover	ab395cb071	GlobalISel: add correct operand type to G_FRAME_INDEX instrs. Frame indices should use "addFrameIndex", not "addImm". llvm-svn: 276775	2016-07-26 17:42:40 +00:00
Krzysztof Parzyszek	29c567a3f0	[Hexagon] Add support for proper handling of H and L constraints H -> High part of reg pair. L -> Low part of reg pair. Patch by Sundeep Kushwaha. llvm-svn: 276773	2016-07-26 17:31:02 +00:00
Tim Northover	26e40bdb9b	GlobalISel: omit braces on MachineInstr types when there's only one. Tidies up the representation a bit in the common case. llvm-svn: 276772	2016-07-26 17:28:01 +00:00
Andrew Kaylor	3104a6bad0	Re-committing r275284: add support to inline __builtin_mempcpy Patch by Sunita Marathe Differential Revision: http://reviews.llvm.org/D21920 llvm-svn: 276771	2016-07-26 17:23:13 +00:00
Matt Arsenault	07f65718bb	AMDGPU: Add missing tests for xnack option for HSA llvm-svn: 276765	2016-07-26 16:45:50 +00:00
Matt Arsenault	32fc527c65	AMDGPU: Add fp legacy instruction intrinsics This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764	2016-07-26 16:45:45 +00:00
Oliver Stannard	1c6e591457	[ARM] Improve error messages for .arch_extension directive - More informative message when extension name is not an identifier token. - Stop parsing directive if extension is unknown (avoid duplicate error messages). - Report unsupported extensions with a source location, rather than report_fatal_error. Differential Revision: https://reviews.llvm.org/D22806 llvm-svn: 276748	2016-07-26 14:24:43 +00:00
Oliver Stannard	2171828a49	[ARM] Implement -mimplicit-it assembler option This option, compatible with gas's -mimplicit-it, controls the generation/checking of implicit IT blocks in ARM/Thumb assembly. This option allows two behaviours that were not possible before: - When in ARM mode, emit a warning when assembling a conditional instruction that is not in an IT block. This is enabled with -mimplicit-it=never and -mimplicit-it=thumb. - When in Thumb mode, automatically generate IT instructions when an instruction with a condition code appears outside of an IT block. This is enabled with -mimplicit-it=thumb and -mimplicit-it=always. The default option is -mimplicit-it=arm, which matches the existing behaviour (allow conditional ARM instructions outside IT blocks without warning, and error if a conditional Thumb instruction is outside an IT block). The general strategy for generating IT blocks in Thumb mode is to keep a small list of instructions which should be in the IT block, and only emit them when we encounter something in the input which means we cannot continue the block. This could be caused by: - A non-predicable instruction - An instruction with a condition not compatible with the IT block - The IT block already contains 4 instructions - A branch-like instruction (including ALU instructions with the PC as the destination), which cannot appear in the middle of an IT block - A label (branching into an IT block is not legal) - A change of section, architecture, ISA, etc - The end of the assembly file. Some of these, such as change of section and end of file, are parsed outside of the ARM asm parser, so I've added a new virtual function to AsmParser to ensure any previously-parsed instructions have been emitted. The ARM implementation of this flushes the currently pending IT block. 
We now have to try instruction matching up to 3 times, because we cannot know if the current IT block is valid before matching, and instruction matching changes depending on the IT block state (due to the 16-bit ALU instructions, which set the flags iff not in an IT block). In the common case of not having an open implicit IT block and the instruction being matched not needing one, we still only have to run the matcher once. I've removed the ITState.FirstCond variable, because it does not store any information that isn't already represented by CurPosition. I've also updated the comment on CurPosition to accurately describe its meaning (which this patch doesn't change). Differential Revision: https://reviews.llvm.org/D22760 llvm-svn: 276747	2016-07-26 14:19:47 +00:00
Simon Pilgrim	0280959c0d	[X86][SSE] Added extra memory folding tests for cvtsd2ss intrinsic SSE only fold partial reg update instructions when optsize is enabled llvm-svn: 276743	2016-07-26 12:44:50 +00:00
Simon Pilgrim	019e102426	[X86][SSE] Fixed issue with memory folding of (v)cvtsd2ss intrinsics Fixed typo in the intrinsic definitions of (v)cvtsd2ss with memory folding. This was only unearthed when rL276102 started using the intrinsic again..... llvm-svn: 276740	2016-07-26 10:41:28 +00:00
Simon Dardis	68a204ddc1	[mips] MIPS64R6 compact branch support MIPS64R6 compact branch support. As the MIPS LLVM backend uses distinct MachineInstrs for certain 32 and 64 bit instructions (e.g. BEQ & BEQ64) that map to the same instruction, extend compact branch support for the corresponding 64bit branches. Reviewers: dsanders Differential Revision: https://reviews.llvm.org/D20164 llvm-svn: 276739	2016-07-26 10:25:07 +00:00
Simon Dardis	273fc26b79	[mips] sgtu, s[rl]l, sra, dnegu, neg instruction aliases Add the instruction alias sgtu (register form only), two operand forms of s[rl]l and sra, and missing single/two operand forms of dnegu/neg. Reviewers: dsanders Differential Revision: https://reviews.llvm.org/D22752 llvm-svn: 276736	2016-07-26 09:13:46 +00:00
David Majnemer	a90a621d1e	Reapply: "[InstSimplify] Add support for bitcasts" This reverts commit r276700 and reapplies r276698. The relevant clang tests have been updated. llvm-svn: 276727	2016-07-26 05:52:29 +00:00
Sebastian Pop	91d4a30159	GVN-hoist: use a DFS numbering of instructions (PR28670) Instead of DFS numbering basic blocks we now DFS number instructions, which avoids the costly operation of determining which instruction comes first in a basic block. Patch mostly written by Daniel Berlin. Differential Revision: https://reviews.llvm.org/D22777 llvm-svn: 276714	2016-07-26 00:15:10 +00:00
Evgeniy Stepanov	906f6fb565	[safestack] Fix stack guard live range. Stack guard slot is live throughout the function. llvm-svn: 276712	2016-07-26 00:05:14 +00:00
Adam Nemet	39e039c606	[lit] Don't match tool names within new PM's <> markers For example, stop expanding 'opt' in -passes='require<opt-remark-emit>'. llvm-svn: 276707	2016-07-25 23:09:10 +00:00
Renato Golin	32b165f561	[ARM] Saturation instructions are DSP-only The saturation instructions appeared in v6T2, with DSP extensions, but they were being accepted / generated on any core since the recent introduction of saturation detection in the back-end. This commit restricts their usage to DSP-enabled cores only. Fixes PR28607. llvm-svn: 276701	2016-07-25 22:25:25 +00:00
David Majnemer	6e06b577cc	Revert "[InstSimplify] Add support for bitcasts" This reverts commit r276698. Clang has tests which rely on the optimizer :( llvm-svn: 276700	2016-07-25 22:24:59 +00:00
David Majnemer	62611fd3f7	[InstSimplify] Add support for bitcasts BitCasts of BitCasts can be folded away as can BitCasts which don't change the type of the operand. llvm-svn: 276698	2016-07-25 22:04:58 +00:00
Simon Pilgrim	fa863fb2f1	[X86] Regenerate v2i256 shift legalization tests llvm-svn: 276692	2016-07-25 21:14:22 +00:00
Simon Pilgrim	2d7735e428	[X86] Regenerate i64 shift legalization tests llvm-svn: 276691	2016-07-25 21:11:45 +00:00
Tim Northover	7c9eba90ff	GlobalISel: add generic casts to IRTranslator This adds LLVM's 3 main cast instructions (inttoptr, ptrtoint, bitcast) to the IRTranslator. The first two are direct translations (with 2 MachineInstr types each). Since LLT discards information, a bitcast might become trivial and we emit a COPY in those cases instead. llvm-svn: 276690	2016-07-25 21:01:29 +00:00
Tim Northover	e2e0067352	GlobalISel[AArch64]: support pointer types in argument lowering. They're basically i64 for AArch64, but we'll leave them intact for stranger targets. Also add some tests for the (very few) other cases we can handle right now. llvm-svn: 276689	2016-07-25 21:01:17 +00:00
Michael Kuperstein	39feb6290c	[PM] Port SymbolRewriter to the new PM Differential Revision: https://reviews.llvm.org/D22703 llvm-svn: 276687	2016-07-25 20:52:00 +00:00
Kevin Enderby	95b0842e64	Next step along the way to getting good error messages for bad archives. I consulted with Lang Hames on this work, and the goal was to add a bit of "where" in the archive the error occurred along with what the error was. So this step changes ArchiveMemberHeader into a class with a pointer to the archive header and the parent archive. This allows the methods in the ArchiveMemberHeader to determine which member the header is for, to include that information in the error message. For this first step the "where" is just the offset to the member in the archive. The next step will be a new method on ArchiveMemberHeader to get the full name, if possible, to be used in the error message. This will now be possible as ArchiveMemberHeader contains a pointer to the Archive with its string table and its size, etc. so the full name can be determined from the header if it is valid. Also this change adds the missing checks that the archive header is actually contained in the buffer and is not truncated, as well as whether the terminating characters are correct in the header. It also changes one error message in Archive::Child::getNext() where the name or offset to member is now added. llvm-svn: 276686	2016-07-25 20:36:36 +00:00
Jan Vesely	b64c8925e9	AMDGPU: Remove read_workdim intrinsic Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682	2016-07-25 20:17:02 +00:00
Matt Arsenault	7cddfed7e8	Scalarizer: Support scalarizing intrinsics llvm-svn: 276681	2016-07-25 20:02:54 +00:00
Matt Arsenault	e047b2598e	AMDGPU: Fix missing verify-machineinstrs in control flow test llvm-svn: 276679	2016-07-25 19:39:06 +00:00
Evgeniy Stepanov	8d78bd5041	Fix invalid iterator use in safestack coloring. llvm-svn: 276676	2016-07-25 19:25:40 +00:00
Rong Xu	705f7775bb	[PGO] Fix profile mismatch in COMDAT function with pre-inliner Pre-instrumentation inline (pre-inliner) greatly improves the IR instrumentation code performance, among other benefits. One issue of the pre-inliner is it can introduce CFG-mismatch for COMDAT functions. This is due to the fact that the same COMDAT function may have different early inline decisions across different modules -- that means different copies of COMDAT functions will have different CFG checksums. In this patch, we propose partially renaming the COMDAT group and its member function/variable so we have a different profile counter for each version. We will post-fix the COMDAT function and the group name with its FunctionHash. Differential Revision: http://reviews.llvm.org/D22600 llvm-svn: 276673	2016-07-25 18:45:37 +00:00
Simon Pilgrim	ce8d82775c	[X86][SSE] Added 2048-bit vector comparison tests Upper limit of what can be held in a <32 x i8> result llvm-svn: 276666	2016-07-25 17:56:01 +00:00
Elena Demikhovsky	64e5f929d0	AVX-512: Fixed [US]INT_TO_FP selection for i1 vectors. It failed with assertion before this patch. Differential Revision: https://reviews.llvm.org/D22735 llvm-svn: 276648	2016-07-25 16:51:00 +00:00
Wei Mi	97de034e18	Remove useless pass from the pipeline in test/Analysis/Dominators/2007-01-14-BreakCritEdges.ll. llvm-svn: 276644	2016-07-25 16:27:34 +00:00
Krzysztof Parzyszek	080bebd212	[Hexagon] Add target feature to generate long calls llvm-svn: 276638	2016-07-25 14:42:11 +00:00
Sam Parker	d5ca0a65b5	[ARM] Improve longMAC codegen test Added thumb targets and dataflow checks to the longMAC test. Differential Revision: https://reviews.llvm.org/D22684 llvm-svn: 276629	2016-07-25 10:11:00 +00:00
Simon Dardis	618975206e	[mips] Optimize materialization of i64 constants Avoid MipsAnalyzeImmediate usage if the constant fits in a 32-bit integer. This allows us to generate the same instructions for the materialization of the same constants regardless of the width of their type. Patch by: Vasileios Kalintiris Contributions by: Simon Dardis Reviewers: Daniel Sanders Differential Review: https://reviews.llvm.org/D21689 llvm-svn: 276628	2016-07-25 09:57:28 +00:00
Sam Parker	68c71cd1e4	[ARM] Enable ISel of SMMLS for ARM and Thumb2 Use ISelDAGToDAG to recognise the SMMLS instruction pattern. Differential Revision: https://reviews.llvm.org/D22562 llvm-svn: 276624	2016-07-25 09:20:20 +00:00
Craig Topper	ce415ff9c5	[AVX512] Add load folding support for the unmasked forms of the FMA instructions. llvm-svn: 276615	2016-07-25 07:20:35 +00:00
Craig Topper	318e40b6f7	[AVX512] Add some additional patterns so that we can fold broadcast loads in the first argument of an FMADD/FMSUB/FNMADD/FNMSUB/FMADDSUB/FMSUBADD node. Also add patterns to support all combinations of the broadcast input and the preserved input for masked versions. llvm-svn: 276614	2016-07-25 07:20:31 +00:00
Craig Topper	6bcbf5338c	[AVX512] Cleanup FMA operand order in patterns to match the VEX versions and to really be 213, 231, and 132. llvm-svn: 276613	2016-07-25 07:20:28 +00:00
Sean Silva	fe5abd5e0c	Fix : Partial Inliner requires AssumptionCacheTracker The public InlineFunction utility assumes that the passed in InlineFunctionInfo has a valid AssumptionCacheTracker. Patch by River Riddle! Differential Revision: https://reviews.llvm.org/D22706 llvm-svn: 276609	2016-07-25 05:00:00 +00:00
David Majnemer	68623a0e9f	[GVNHoist] Merge metadata on hoisted instructions less conservatively We can combine metadata from multiple instructions intelligently for certain metadata nodes. llvm-svn: 276602	2016-07-25 02:21:25 +00:00
David Majnemer	4728569d0a	[GVNHoist] Properly merge alignments when hoisting If we have two loads of two different alignments, we must use the minimum of the two alignments when hoisting. Same deal for stores. For allocas, use the maximum of the two alignments. llvm-svn: 276601	2016-07-25 02:21:23 +00:00
Simon Pilgrim	a6878bdc0f	[X86][SSE] Added PR27854 tests llvm-svn: 276571	2016-07-24 16:39:50 +00:00
Simon Pilgrim	b7d75fee74	[X86] Add shift double tests for PR14593 llvm-svn: 276570	2016-07-24 16:10:21 +00:00
Simon Pilgrim	381a0ade5a	[X86] Add 'FeatureSlowSHLD' to cpu 'bdver4' As with all AMD CPUs, excavator has poor SHLD/SHRD performance. Also added bdver3 to the test as it was missing. llvm-svn: 276569	2016-07-24 16:00:53 +00:00
Simon Pilgrim	b2df2dc298	[X86] Add SHRD shift combine tests llvm-svn: 276568	2016-07-24 15:47:44 +00:00
Simon Pilgrim	0ededcb344	[X86] Regenerate shift by parts tests llvm-svn: 276567	2016-07-24 15:38:51 +00:00
Simon Pilgrim	114205076d	[X86][SSE] Regenerate shifts tests llvm-svn: 276566	2016-07-24 15:25:36 +00:00
Simon Pilgrim	30a7cc2e1f	[X86][SSE] Regenerate SSE copysign tests llvm-svn: 276565	2016-07-24 15:17:50 +00:00
Simon Pilgrim	7f1c1db73c	[X86][AVX512VL] Added AVX512VL half2float vector conversions tests to demonstrate PR23941 llvm-svn: 276563	2016-07-24 13:01:51 +00:00
Craig Topper	2dca3b287b	[X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions. This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in their name. This new format should be consistent with the general naming of instructions. llvm-svn: 276559	2016-07-24 08:26:38 +00:00
Elena Demikhovsky	376a18bd92	[Loop Vectorizer] Handling loops with FP induction variables. Allowed loop vectorization with secondary FP IVs. Like this: float *A; float x = init; for (int i=0; i < N; ++i) { A[i] = x; x -= fp_inc; } The auto-vectorization is possible when the induction binary operator is "fast" or the function has the "unsafe" attribute. Differential Revision: https://reviews.llvm.org/D21330 llvm-svn: 276554	2016-07-24 07:24:54 +00:00
Simon Pilgrim	9e99dd9e8b	[X86][SSE] Added float widened broadcast tests llvm-svn: 276535	2016-07-23 21:24:02 +00:00
Simon Pilgrim	8d18969716	[X86][SSE] Added more widened broadcast tests Added more vXi16 and vXi8 tests llvm-svn: 276534	2016-07-23 21:15:31 +00:00
Simon Pilgrim	b9e47a8cd0	[X86][SSE] Added tests where we should be trying to widen a load+splat into a broadcast llvm-svn: 276527	2016-07-23 16:19:17 +00:00
Simon Pilgrim	8aa6f34455	[X86][SSE] Regenerated uitofp <2 x i32> -> <2 x float> conversion tests Demonstrate difference in codegen discussed on PR14760 llvm-svn: 276526	2016-07-23 15:55:42 +00:00
Sanjay Patel	1271bf9178	[InstCombine] allow icmp (bit-manipulation-intrinsic(), C) folds for vectors llvm-svn: 276523	2016-07-23 13:06:49 +00:00
Craig Topper	b6519db90d	[AVX512] Implement commuting support for EVEX encoded FMA3 instructions. llvm-svn: 276521	2016-07-23 07:16:56 +00:00
Xinliang David Li	9239245401	[Profile] Use explicit flag to enable IR PGO Patch by Jake VanAdrighem Differential Revision: http://reviews.llvm.org/D22607 llvm-svn: 276516	2016-07-23 04:28:52 +00:00
Sanjoy Das	a7d9ec8751	[SCEV] Make isImpliedCondOperandsViaRanges smarter This change lets us prove things like "{X,+,10} s< 5000" implies "{X+7,+,10} does not sign overflow" It does this by replacing getConstantDifference by computeConstantDifference (which is smarter) in isImpliedCondOperandsViaRanges. llvm-svn: 276505	2016-07-23 00:54:36 +00:00
Sanjay Patel	8d8594acb9	auto-generate checks llvm-svn: 276501	2016-07-23 00:09:54 +00:00
Tom Stellard	b8253c88b6	Revert "[AMDGPU] Emit read-only data to .rodata for hsa" This reverts commit r276298. Data stored in .rodata can have a negative offset from .text, but we don't support negative values in relocations yet. This caused a regression in one of the amp conformance tests: 5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01 llvm-svn: 276498	2016-07-22 23:46:40 +00:00
Adam Nemet	9e6e63fba2	[LoopDataPrefetch] Include hotness of region in opt remark llvm-svn: 276488	2016-07-22 22:53:17 +00:00
Sanjay Patel	e063ddb347	add tests for icmp vector folds llvm-svn: 276482	2016-07-22 22:19:52 +00:00
Tim Northover	98a56eb7f4	GlobalISel: allow multiple types on MachineInstrs. llvm-svn: 276481	2016-07-22 22:13:36 +00:00
Vitaly Buka	e3a032a740	Unpoison stack before resume instruction Summary: Clang inserts cleanup code before resume in a similar way as before a return instruction. This makes asan poison local variables, causing false use-after-scope reports. __asan_handle_no_return does not help here as it was executed before llvm.lifetime.end inserted into resume block. To avoid the false report we need to unpoison the stack for resume the same way as for return. PR27453 Reviewers: kcc, eugenis Differential Revision: https://reviews.llvm.org/D22661 llvm-svn: 276480	2016-07-22 22:04:38 +00:00
Alina Sbirlea	ba21ffebff	Add flag to PassManagerBuilder to disable GVN Hoist Pass. Summary: Adding a flag to disable GVN Hoisting by default. Note: The GVN Hoist Pass causes some Halide tests to hang. Halide will disable the pass while investigating. Reviewers: llvm-commits, chandlerc, spop, dberlin Subscribers: mehdi_amini Differential Revision: https://reviews.llvm.org/D22639 llvm-svn: 276479	2016-07-22 22:02:19 +00:00
Michael Kuperstein	38e7298093	[SLPVectorizer] Vectorize reverse-order loads in horizontal reductions When vectorizing a tree rooted at a store bundle, we currently try to sort the stores before building the tree, so that the stores can be vectorized. For other trees, the order of the root bundle - which determines the order of all other bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads happens to appear in the wrong order, we will not vectorize it. This is partially mitigated when the root is a binary operator, by trying to build a "reversed" tree when that's considered profitable. This patch extends the workaround we have for binops to trees rooted in a horizontal reduction. This fixes PR28474. Differential Revision: https://reviews.llvm.org/D22554 llvm-svn: 276477	2016-07-22 21:28:48 +00:00
Sanjay Patel	cbc4377af1	add tests for icmp vector folds llvm-svn: 276476	2016-07-22 21:28:20 +00:00
Sanjay Patel	97e61dcc2d	add tests for icmp vector folds llvm-svn: 276475	2016-07-22 21:13:08 +00:00
Sanjay Patel	b73d7aed71	add tests for icmp vector folds llvm-svn: 276472	2016-07-22 21:02:33 +00:00
Vedant Kumar	127d0502a0	[llvm-cov] Don't copy stylesheets into index files Just link in the stylesheet from the toplevel dir of the report. llvm-svn: 276468	2016-07-22 20:49:23 +00:00
Sanjay Patel	859278005d	update to use FileCheck and auto-generate checks llvm-svn: 276466	2016-07-22 20:39:07 +00:00
Sanjay Patel	296a776a5b	add tests for icmp vector folds llvm-svn: 276464	2016-07-22 20:11:08 +00:00
Tim Northover	33b07d6725	GlobalISel: implement legalization pass, with just one transformation. This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors. llvm-svn: 276461	2016-07-22 20:03:43 +00:00
Teresa Johnson	f432c9cefa	[ThinLTO/gold] Remove thin archive part of new test due to bot failures I am getting a bot failure from the thin archive part of this test: From http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/40468/steps/test_llvm/logs/LLVM%20%3A%3A%20tools__gold__X86__thinlto_emit_linked_objects.ll: Command Output (stderr): -- /home/bb/cmake-llvm-x86_64-linux/build/./bin/llvm-ar: creating /home/bb/cmake-llvm-x86_64-linux/build/test/tools/gold/X86/Output/thinlto_emit_linked_objects.ll.tmp2.a /usr/bin/ld.gold: internal error in add_writer, at ../../gold/token.h:124 -- This appears to be an issue with an older version of gold. The test case passes for me locally when I use the gold v1.12 I was testing with, but when I tried the gold installed on my system which is v1.11 I get the same error. Remove the thin archive version of the test, since there isn't a way to predicate it on gold version. llvm-svn: 276453	2016-07-22 18:32:30 +00:00
Jun Bum Lim	6a7dc5c430	Recommit - [DSE] Enhance shortening MemIntrinsic based on OverlapIntervals Recommitting r275571 after fixing crash reported in PR28270. Now we erase elements of IOL in deleteDeadInstruction(). Original Summary: This change uses the overlap interval map built from partial overwrite tracking to perform shortening of MemIntrinsics. Add test cases which were missing opportunities before. llvm-svn: 276452	2016-07-22 18:27:24 +00:00
Sanjay Patel	beaea95a0d	add tests for vector bit manipulation intrinsics llvm-svn: 276451	2016-07-22 18:22:25 +00:00
Teresa Johnson	1e2708c9e0	[ThinLTO/gold] Support for getting list of included objects from gold Summary: In the distributed backend case, the ThinLink step and the final native object link are separate processes. This can be problematic when archive libraries are involved in the link (e.g. via --start-lib/--end-lib pairs). The linker only includes objects from libraries when there is a strong reference to them, and depending on the intervening ThinLTO backend processes' importing/inlining, the strong references may appear different in the two link steps. See D22356 and D22467 for two scenarios where this causes issues. To ensure that the final link includes the same objects, this patch adds support for an "=filename" form of the thinlto-index-only plugin option, in which case objects gold included in the link are emitted to the given filename. This should be used as input to the final link (e.g. via the @filename option to gold), instead of listing all the objects within --start-lib/--end-lib pairs again. Note that the support for the gold callback that identifies included objects was added in gold version 1.12. Reviewers: davidxl, mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D22677 llvm-svn: 276450	2016-07-22 18:20:22 +00:00
Wei Mi	e04d0eff29	[PM] Port BreakCriticalEdges to the new PM. Differential Revision: https://reviews.llvm.org/D22688 llvm-svn: 276449	2016-07-22 18:04:25 +00:00
Anna Thomas	0be4a0e6a4	Invariant start/end intrinsics overloaded for address space Summary: The llvm.invariant.start and llvm.invariant.end intrinsics currently support specifying invariant memory objects only in the default address space. With this change, these intrinsics are overloaded for any address space for memory objects and we can use these llvm invariant intrinsics in non-default address spaces. Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr) This overloaded intrinsic is needed for representing final or invariant memory in managed languages. Reviewers: apilipenko, reames Subscribers: llvm-commits llvm-svn: 276447	2016-07-22 17:49:40 +00:00
Matt Arsenault	e2fe67b951	AMDGPU: Remove redundant test llvm-svn: 276439	2016-07-22 17:01:36 +00:00
Matt Arsenault	3c07c813c0	AMDGPU: Fix groupstaticsize for large LDS The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438	2016-07-22 17:01:33 +00:00
Matt Arsenault	8d718dcfda	AMDGPU: Add HSA dispatch id intrinsic llvm-svn: 276437	2016-07-22 17:01:30 +00:00
Matt Arsenault	7fb961f3e6	AMDGPU: Fix i1 fp_to_int R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to. llvm-svn: 276435	2016-07-22 17:01:21 +00:00
Tim Northover	bd5054602e	GlobalISel: implement alloca instruction llvm-svn: 276433	2016-07-22 16:59:52 +00:00
Simon Pilgrim	820f87a72d	[SelectionDAG] Optimization of BITREVERSE legalization for power-of-2 integer scalar/vector types An extension of D19978, this patch replaces the default BITREVERSE evaluation of individual bit masks+shifts with block mask+shifts when we have integer elements of power-of-2 bits in size. After calling BSWAP to reverse the order of the constituent bytes (which typically follows a similar approach), every neighbouring 4-bits, 2-bits and finally 1-bit pairs are masked off and swapped over with shifts. In doing so we can significantly reduce the number of operations required. Differential Revision: https://reviews.llvm.org/D21578 llvm-svn: 276432	2016-07-22 16:46:25 +00:00
Krzysztof Parzyszek	d3d0a4bda3	[Hexagon] Use loop data prefetch on Hexagon llvm-svn: 276422	2016-07-22 14:22:43 +00:00
Simon Pilgrim	ea0d4f9962	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied) As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be submitted shortly). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276416	2016-07-22 13:58:44 +00:00
Ahmed Bougacha	29333c9de6	[FastISel] Ignore @llvm.assume. llvm-svn: 276410	2016-07-22 12:54:53 +00:00
Ying Yi	e59ee43cf1	[llvm-cov] - Add the coverage of lines in the summary report. The llvm-cov 'report' command displays a summary of the coverage of a binary file. The summary report currently only includes covered regions and covered functions. This patch adds the coverage of lines in the summary report. Differential Revision: https://reviews.llvm.org/D22569 llvm-svn: 276409	2016-07-22 12:46:13 +00:00
Benjamin Kramer	a81f4728f3	[llvm-profdata] Bring back reading profile data from STDIN. This feature was lost in r276197. llvm-svn: 276407	2016-07-22 12:39:55 +00:00
Benjamin Kramer	5ba0e20315	Revert "[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128" It caused PR28657. This reverts commit r276281. llvm-svn: 276405	2016-07-22 11:03:10 +00:00
Ying Yi	60a3da3f4c	[llvm-cov] - Improve llvm-cov error message Summary: When giving the following command: % llvm-cov report -instr-profile=default.profraw llvm-cov will give the following error message: >llvm-cov report: Not enough positional command line arguments specified! >Must specify at least 1 positional arguments: See: orbis-llvm-cov report -help This patch changes the error message from '1 positional arguments' to '1 positional argument'. Differential Revision: https://reviews.llvm.org/D22621 llvm-svn: 276404	2016-07-22 10:52:21 +00:00
Hrvoje Varga	2db00ce4b6	[mips][microMIPS] Implement SLT, SLTI, SLTIU, SLTU microMIPS32r6 instructions Differential Revision: https://reviews.llvm.org/D19906 llvm-svn: 276397	2016-07-22 07:18:33 +00:00
Craig Topper	52e2e8381b	[AVX512] Add ExeDomain to vector extend and truncate instructions. llvm-svn: 276394	2016-07-22 05:46:44 +00:00
Craig Topper	f4151bea72	[AVX512] Add initial support for the Execution Domain fixing pass to change some EVEX instructions. llvm-svn: 276393	2016-07-22 05:00:52 +00:00
Craig Topper	0b90756b0a	[AVX512] Add load folding for some AVX512VL logic and arithmetic instructions. llvm-svn: 276391	2016-07-22 05:00:39 +00:00
Craig Topper	ab13b33ded	[AVX512] Update X86InstrInfo::foldMemoryOperandCustom to handle the EVEX encoded instructions too. llvm-svn: 276390	2016-07-22 05:00:35 +00:00
David Majnemer	522a91181a	Don't remove side effecting instructions due to ConstantFoldInstruction Just because we can constant fold the result of an instruction does not imply that we can delete the instruction. It may have side effects. This fixes PR28655. llvm-svn: 276389	2016-07-22 04:54:44 +00:00
Vitaly Buka	53054a7024	Fix detection of stack-use-after scope for char arrays. Summary: Clang inserts GetElementPtrInst so findAllocaForValue was not able to find allocas. PR27453 Reviewers: kcc, eugenis Differential Revision: https://reviews.llvm.org/D22657 llvm-svn: 276374	2016-07-22 00:56:17 +00:00
Sanjoy Das	aae623f4c2	[IRCE] Don't misuse CHECK-LABEL; NFC llvm-svn: 276373	2016-07-22 00:41:02 +00:00
Sanjoy Das	bb969791b4	[IRCE] Add an option to skip profitability checks If `-irce-skip-profitability-checks` is passed in, IRCE will kick in in all cases where it is legal for it to kick in. This flag is intended to help diagnose and analyse performance issues. llvm-svn: 276372	2016-07-22 00:40:56 +00:00
Vedant Kumar	fa522ca3b3	[llvm-cov] Strengthen a test case Check that stylesheets work when we're not using -output-dir. llvm-svn: 276363	2016-07-21 23:31:26 +00:00
Vedant Kumar	c076c49076	[llvm-cov] Use relative paths to the stylesheet (for html reports) This makes it easy to swap out the default stylesheet for a custom one. It also shaves ~6.62 MB out of the report directory for a full coverage build of llvm+clang. While we're at it, prune the CSS and add tests for it. llvm-svn: 276359	2016-07-21 23:26:15 +00:00
Sebastian Pop	31fd506623	GVN-hoist: only clone GEPs (PR28606) Do not clone stored values unless they are GEPs that are special cased to avoid hoisting them without hoisting their associated ld/st. Differential revision: https://reviews.llvm.org/D22652 llvm-svn: 276358	2016-07-21 23:22:10 +00:00
Wei Mi	1cf58f8996	[PM] Port NaryReassociate to the new PM Differential Revision: https://reviews.llvm.org/D22648 llvm-svn: 276349	2016-07-21 22:28:52 +00:00
Quentin Colombet	ecd81a3d1b	[MIRTesting] Abort when failing to parse a function. When we fail to parse a function in the mir parser, we should abort the whole compilation instead of continuing in a weird state. Indeed, this was creating strange machine function pass failures that were hard to understand, until we noticed that the function actually did not get parsed correctly! llvm-svn: 276348	2016-07-21 22:25:57 +00:00
Michael Kuperstein	c523333bbf	[X86] Do not use AND8ri8 in AVX512 pattern This variant is (as documented in the TD) for disassembler use only, and should not be used in patterns - it is longer, and is broken on 64-bit. llvm-svn: 276347	2016-07-21 22:24:08 +00:00
Sanjay Patel	e9fc79bb13	[InstSimplify] don't crash handling a pointer or aggregate type llvm-svn: 276345	2016-07-21 21:56:00 +00:00
Akira Hatanaka	b8d2873d93	[AArch64][Inline-Asm] Return the 32-bit floating point register class when constraint "w" is used on a 32-bit operand. This enables compiling the following code, which used to error out in the backend: void foo1(int a) { asm volatile ("sqxtn h0, %s0\n" : : "w"(a):); } Fixes PR28633. llvm-svn: 276344	2016-07-21 21:39:05 +00:00
Sanjay Patel	a3bfb4e313	[InstSimplify] recognize trunc + icmp sgt/slt variants of select simplifications (PR28466) rL245171 exposed a hole in InstSimplify that manifested in a strange way in PR28466: https://llvm.org/bugs/show_bug.cgi?id=28466 It's possible to use trunc + icmp sgt/slt in place of an and + icmp eq/ne, so we need to recognize that pattern to eliminate selects that are choosing between some value and some bitmasked version of that value. Note that there is significant room for improvement (refactoring) and enhancement (more patterns, possibly in InstCombine rather than here). Differential Revision: https://reviews.llvm.org/D22537 llvm-svn: 276341	2016-07-21 21:26:45 +00:00
Adam Nemet	84a6425d61	[OptDiag,LDist] Convert remaining opt remarks to use the new API llvm-svn: 276340	2016-07-21 21:21:34 +00:00
Matthew Simpson	102729cf1b	[LV] Move vector int induction update to end of latch This patch moves the update instruction for vectorized integer induction phi nodes to the end of the latch block. This ensures consistent placement of all induction updates across all the kinds of int inductions we create (scalar, splat vector, or vector phi). Differential Revision: https://reviews.llvm.org/D22416 llvm-svn: 276339	2016-07-21 21:20:15 +00:00
Sanjay Patel	9eec550a2b	add vector tests and a simpler version of the negative tests llvm-svn: 276328	2016-07-21 20:11:08 +00:00
Anna Thomas	c858faa244	Revert "Invariant start/end intrinsics overloaded for address space" This reverts commit r276316. llvm-svn: 276320	2016-07-21 19:06:28 +00:00
Anna Thomas	29b24dfe44	Invariant start/end intrinsics overloaded for address space Summary: The llvm.invariant.start and llvm.invariant.end intrinsics currently support specifying invariant memory objects only in the default address space. With this change, these intrinsics are overloaded for any address space for memory objects and we can use these llvm invariant intrinsics in non-default address spaces. Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr) This overloaded intrinsic is needed for representing final or invariant memory in managed languages. Reviewers: tstellarAMD, reames, apilipenko Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22519 llvm-svn: 276316	2016-07-21 18:41:44 +00:00
Quentin Colombet	2b59eab79f	[IRTranslator] Add G_SUB opcode. This commit adds a generic SUB opcode to global-isel. llvm-svn: 276308	2016-07-21 17:26:50 +00:00
Konstantin Zhuravlyov	3c0d8d22fe	[AMDGPU] Emit read-only data to .rodata for hsa Differential Revision: https://reviews.llvm.org/D22538 llvm-svn: 276298	2016-07-21 15:59:23 +00:00
Quentin Colombet	7bcc921dd8	[IRTranslator] Add G_AND opcode. This commit adds a generic AND opcode to global-isel. llvm-svn: 276297	2016-07-21 15:50:42 +00:00
Konstantin Zhuravlyov	155626238b	AMDGPU/SI: Add support for R_AMDGPU_ABS32 Differential Revision: https://reviews.llvm.org/D21646 llvm-svn: 276294	2016-07-21 15:29:19 +00:00
Geoff Berry	4ff2e36d32	[AArch64] Load/store opt: Don't count transient instructions towards search limits. Summary: This change also changes findMatchingInsn and findMatchingUpdateInsnForward to take DBG_VALUE opcodes into account when tracking register defs and uses, which could potentially inhibit these optimizations in the presence of debug information. Reviewers: mcrosier Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22582 llvm-svn: 276293	2016-07-21 15:20:25 +00:00
Simon Pilgrim	88e0940d3b	[X86][SSE] Allow folding of store/zext with PEXTRW of 0'th element Under normal circumstances we prefer the higher performance MOVD to extract the 0'th element of a v8i16 vector instead of PEXTRW. But as detailed on PR27265, this prevents the SSE41 implementation of PEXTRW from folding the store of the 0'th element. Additionally it prevents us from making use of the fact that the (SSE2) reg-reg version of PEXTRW implicitly zero-extends the i16 element to the i32/i64 destination register. This patch only preferentially lowers to MOVD if we will not be zero-extending the extracted i16, nor prevent a store from being folded (on SSE41). Fix for PR27265. Differential Revision: https://reviews.llvm.org/D22509 llvm-svn: 276289	2016-07-21 14:54:17 +00:00
Simon Pilgrim	4caefdf834	Fixed line endings llvm-svn: 276287	2016-07-21 14:36:41 +00:00
Simon Pilgrim	c8e20b1150	[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector. This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match. We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts). Differential Revision: https://reviews.llvm.org/D22460 llvm-svn: 276281	2016-07-21 14:10:54 +00:00
Marina Yatsina	c1fa163392	ExecutionDepsFix - Fix bug in clearance calculation The clearance calculation did not take into account registers defined as outputs or clobbers in inline assembly machine instructions because these register defs are implicit. Differential Revision: http://reviews.llvm.org/D22580 llvm-svn: 276266	2016-07-21 12:37:07 +00:00
Matt Arsenault	f0ba86a4d5	AMDGPU: Fix phis from blocks split due to register indexing llvm-svn: 276257	2016-07-21 09:40:57 +00:00
David Majnemer	825e4ab9e3	[GVNHoist] Preserve optimization hints which agree If we have optimization hints which agree with each other along different paths, preserve them. llvm-svn: 276248	2016-07-21 07:16:26 +00:00
David Majnemer	4808f26422	[GVNHoist] Don't wrongly preserve TBAA We hoisted loads/stores without taking TBAA into account, which can cause miscompiles. llvm-svn: 276240	2016-07-21 05:59:53 +00:00
Matthias Braun	d9fdad72ae	IPRA: Fix RegMask calculation for alias registers This patch fixes a very subtle bug in regmask calculation. Thanks to zan jyu Wong <zyfwong@gmail.com> for bringing this to notice. For example, if only CL is clobbered then CH should not be marked clobbered, but CX, RCX and ECX should be marked clobbered. Previously, for each modified register all of its aliases were marked clobbered by markRegClobbred() in RegUsageInfoCollector.cpp, but that is wrong: when CL is clobbered, MRI::isPhysRegModified() will return true for CL, CX, ECX, RCX, which is correct behavior, but then for CX, ECX, RCX we also mark CH clobbered because CH is aliased to CX, ECX, RCX. So markRegClobbred() is not required, because isPhysRegModified already takes care of properly aliasing registers. A very simple test case has been added to verify this change. Please find the relevant bug report here: http://llvm.org/PR28567 Patch by Vivek Pandya <vivekvpandya@gmail.com> Differential Revision: https://reviews.llvm.org/D22400 llvm-svn: 276235	2016-07-21 03:50:39 +00:00
Adam Nemet	7cfd5971ab	[OptDiag,LV] Add hotness attribute to applied-optimization remarks Test coverage is provided by modifying the function in the FP-math testcase that we are allowed to vectorize. llvm-svn: 276223	2016-07-21 01:07:13 +00:00
Sanjay Patel	0753c06d9c	[InstCombine] LogicOpc (zext X), C --> zext (LogicOpc X, C) (PR28476) The benefits of this change include: 1. Remove DeMorgan-matching code that was added specifically to work-around the missing transform in http://reviews.llvm.org/rL248634. 2. Makes the DeMorgan transform work for vectors too. 3. Fix PR28476: https://llvm.org/bugs/show_bug.cgi?id=28476 Extending this transform to other casts and other associative operators may be useful too. See https://reviews.llvm.org/D22421 for a prerequisite for doing that though. Differential Revision: https://reviews.llvm.org/D22271 llvm-svn: 276221	2016-07-21 00:24:18 +00:00
Adam Nemet	0e0e2d5d26	[OptDiag,LV] Add hotness attribute to the derived analysis remarks This includes FPCompute and Aliasing. Testcase is based on no_fpmath.ll. llvm-svn: 276211	2016-07-20 23:50:32 +00:00
Sanjay Patel	5f3c70307d	[InstSimplify][InstCombine] don't crash when folding vector selects of icmp Differential Revision: https://reviews.llvm.org/D22602 llvm-svn: 276209	2016-07-20 23:40:01 +00:00
Xinliang David Li	fb64ebe313	Fix test failure on Win llvm-svn: 276202	2016-07-20 22:53:39 +00:00
Xinliang David Li	9a1bfcfa16	Reapply r276185 Fix the test case that should not depend on dir iteration order. llvm-svn: 276197	2016-07-20 22:24:52 +00:00
Justin Lebar	cd564c6b46	[NVPTX] Enable the load-store vectorizer on nvptx. Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196	2016-07-20 22:11:36 +00:00
Xinliang David Li	ce3f385eeb	Revert r276185 -- build bot failure llvm-svn: 276194	2016-07-20 21:50:38 +00:00
Adam Nemet	5b3a5cf6b0	[OptDiag,LV] Add hotness attribute to analysis remarks The earlier change added hotness attribute to missed-optimization remarks. This follows up with the analysis remarks (the ones explaining the reason for the missed optimization). llvm-svn: 276192	2016-07-20 21:44:26 +00:00
Artem Belevich	7e9c9a6582	[NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC. After r276153 the pass applies to both kernels and regular functions. Differential Revision: https://reviews.llvm.org/D22583 llvm-svn: 276189	2016-07-20 21:44:07 +00:00
Xinliang David Li	d0b867e3e5	[Profile] support directory reading in profile merging Differential Revision: http://reviews.llvm.org/D22560 llvm-svn: 276185	2016-07-20 21:31:29 +00:00
Ahmed Bougacha	a0cdd79070	[AArch64][FastISel] Select -O0 legal cmpxchg. At -O0, cmpxchg survives AtomicExpand: it's mostly straightforward to select it in fast-isel, and let the pseudo be expanded later. extractvalues on the result are the tricky part: the generic logic only works for legal types (and it would be painful to make it support illegal types), so we can only support i32/i64 cmpxchg. llvm-svn: 276183	2016-07-20 21:12:32 +00:00
Ahmed Bougacha	b0674d1143	[AArch64][FastISel] Select atomic stores into STLR. llvm-svn: 276182	2016-07-20 21:12:27 +00:00
David Majnemer	bd21012c6c	[GVNHoist] Don't hoist PHI nodes We hoisted PHIs without respecting their special insertion point in the block, leading to verifier errors. This fixes PR28626. llvm-svn: 276181	2016-07-20 21:05:01 +00:00
Davide Italiano	15ff2d6d0c	[SCCP] Zap multiple return values. We can replace the return values with undef if we replaced all the call uses with a constant/undef. Differential Revision: https://reviews.llvm.org/D22336 llvm-svn: 276174	2016-07-20 20:17:13 +00:00
Justin Lebar	a272c12b73	[LSV] Don't move stores across may-load instrs, and loosen restrictions on moving loads. Summary: Previously we wouldn't move loads/stores across instructions that had side-effects, where that was defined as may-write or may-throw. But this is not sufficiently restrictive: Stores can't safely be moved across instructions that may load. This patch also adds a DEBUG check that all instructions in our chain are either loads or stores. Reviewers: asbirlea Subscribers: llvm-commits, jholewinski, arsenm, mzolotukhin Differential Revision: https://reviews.llvm.org/D22547 llvm-svn: 276171	2016-07-20 20:07:37 +00:00
Justin Lebar	62b03e344e	[LSV] Vectorize up to side-effecting instructions. Summary: Previously if we had a chain that contained a side-effecting instruction, we wouldn't vectorize it at all. Now we'll vectorize everything that comes before the side-effecting instruction. Reviewers: asbirlea Subscribers: arsenm, jholewinski, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22536 llvm-svn: 276170	2016-07-20 20:07:34 +00:00
Rui Ueyama	d8388aaecb	[pdbdump] Use the "flow" style to print out a sequence of uint32_t. Summary: Lists can be written either with "-" or "[]" in YAML. Differential Revision: https://reviews.llvm.org/D22579 llvm-svn: 276168	2016-07-20 19:41:47 +00:00
Tim Northover	62ae568bbb	GlobalISel: implement low-level type with just size & vector lanes. This should be all the low-level instruction selection needs to determine how to implement an operation, with the remaining context taken from the opcode (e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math). llvm-svn: 276158	2016-07-20 19:09:30 +00:00
Artem Belevich	74158b5061	[NVPTX] deal with all aggregate return types. Fixes a crash in llvm_unreachable when a function has array return type. Differential Revision: https://reviews.llvm.org/D22524 llvm-svn: 276154	2016-07-20 18:39:52 +00:00
Artem Belevich	b2e76a5e7a	[NVPTX] Improve lowering of byval args of device functions. Avoid unnecessary spills of byval arguments of device functions to local space on SASS level and subsequent pointer conversion to generic address space that follows. Instead, make a local copy in IR, provide a way to access arguments directly, and let LLVM optimize the copy away when possible. Differential Review: https://reviews.llvm.org/D21421 llvm-svn: 276153	2016-07-20 18:39:47 +00:00
Sanjay Patel	c0812702f8	minimize tests and auto-generate checks llvm-svn: 276147	2016-07-20 17:58:20 +00:00
Wei Mi	481232e991	Fix test/Analysis/ScalarEvolution/scev-expander-existing-value-offset.ll for rL276136. The content in this testcase was accidentally duplicated. Fix the error. llvm-svn: 276139	2016-07-20 16:54:58 +00:00
Wei Mi	db80c0c77f	Use ValueOffsetPair to enhance value reuse during SCEV expansion. In D12090, the ExprValueMap was added to reuse existing value during SCEV expansion. However, const folding and sext/zext distribution can make the reuse still difficult. A simplified case is: suppose we know S1 expands to V1 in ExprValueMap, and S1 = S2 + C_a S3 = S2 + C_b where C_a and C_b are different SCEVConstants. Then we'd like to expand S3 as V1 - C_a + C_b instead of expanding S2 literally. It is helpful when S2 is a complex SCEV expr and S2 has no entry in ExprValueMap, which is usually caused by the fact that S3 is generated from S1 after const folding. In order to do that, we represent ExprValueMap as a mapping from SCEV to ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a} into the ExprValueMap when we create SCEV for V1. When S3 is expanded, it will first expand S2 to V1 - C_a because of S2->{V1, C_a} in the map, then expand S3 to V1 - C_a + C_b. Differential Revision: https://reviews.llvm.org/D21313 llvm-svn: 276136	2016-07-20 16:40:33 +00:00
Matt Arsenault	f14db7a933	AMDGPU: Add missing test coverage for control flow breaks None of the current lit tests hit si_break handling. llvm-svn: 276129	2016-07-20 15:20:35 +00:00
Yaxun Liu	4b1d9f7f18	AMDGPU: Fix bug causing crash due to invalid opencl version metadata. Differential Revision: https://reviews.llvm.org/D22526 llvm-svn: 276119	2016-07-20 14:38:06 +00:00
Benjamin Kramer	b4d64cf27d	Revert "[InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))" Makes InstCombine infloop when compiling v8. This reverts commit r275989 and r276105. llvm-svn: 276106	2016-07-20 11:40:16 +00:00
Tobias Grosser	8c6201b49f	[InstCombine] Provide more test cases for cast-folding [NFC] Summary: In r275989 we enabled the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))`. Here we add more test cases to assure this folding works for all logical operations `and`/`or`/`xor`. Reviewers: grosser Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22561 Contributed-by: Matthias Reisinger llvm-svn: 276105	2016-07-20 11:24:27 +00:00
Simon Pilgrim	1b4f511aaa	[X86][SSE] Add cost model values for CTPOP of vectors This patch adds costs for the vectorized implementations of CTPOP. The default values were seriously underestimating the cost of these and were encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104	2016-07-20 10:41:28 +00:00
Diana Picus	f345d40ae2	[ARM] Skip inline asm memory operands in DAGToDAGISel Retry r275776 (no changes, we suspect the issue was with another commit). The current logic for handling inline asm operands in DAGToDAGISel interprets the operands by looking for constants, which should represent the flags describing the kind of operand we're dealing with (immediate, memory, register def etc). The operands representing actual data are skipped only if they are non-const, with the exception of immediate operands which are skipped explicitly when a flag describing an immediate is found. The oversight is that memory operands may be const too (e.g. for device drivers reading a fixed address), so we should explicitly skip the operand following a flag describing a memory operand. If we don't, we risk interpreting that constant as a flag, which is definitely not intended. Fixes PR26038 Differential Revision: https://reviews.llvm.org/D22103 llvm-svn: 276101	2016-07-20 09:48:24 +00:00
David Majnemer	a75736087d	Forgot to add a test for r276008. llvm-svn: 276082	2016-07-20 04:13:05 +00:00
David Majnemer	5d26127752	Revert "Disable this-return argument forwarding on ARM/AArch64" Inference of the 'returned' attribute was fixed in r276008, let's try turning the backend support back on. This reverts commit r275677. llvm-svn: 276081	2016-07-20 04:13:01 +00:00
Adam Nemet	67c8929a2c	[LV] Add hotness attribute to missed-optimization remarks The new OptimizationRemarkEmitter analysis pass is hooked up to both new and old PM passes. llvm-svn: 276080	2016-07-20 04:03:43 +00:00
Michael Zolotukhin	6bc56d552a	Revert "Revert r275883 and r275891. They seem to cause PR28608." This reverts commit r276064, and thus reapplies r275891 and r275883 with a fix for PR28608. llvm-svn: 276077	2016-07-20 01:55:27 +00:00
Justin Lebar	6114b37838	[LSV] Don't assume that loads/stores appear in address order in the BB. Summary: getVectorizablePrefix previously didn't work properly in the face of aliasing loads/stores. It unwittingly assumed that the loads/stores appeared in the BB in address order. If they didn't, it would do the wrong thing. Reviewers: asbirlea, tstellarAMD Subscribers: arsenm, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22535 llvm-svn: 276072	2016-07-20 00:55:12 +00:00
Matthias Braun	5b9722d6c7	Revert "RegScavenging: Add scavengeRegisterBackwards()" Reverting this commit for now as it seems to be causing failures on test-suite tests on the clang-ppc64le-linux-lnt bot. This reverts commit r276044. llvm-svn: 276068	2016-07-20 00:21:32 +00:00
Sean Silva	554efb28d2	Revert r275883 and r275891. They seem to cause PR28608. Revert "[LoopSimplify] Update LCSSA after separating nested loops." This reverts commit r275891. Revert "[LCSSA] Post-process PHI-nodes created by SSAUpdate when constructing LCSSA form." This reverts commit r275883. llvm-svn: 276064	2016-07-19 23:54:29 +00:00
Sean Silva	e3c18a5ae8	[PM] Port LoopUnroll. We just set PreserveLCSSA to always true since we don't have an analogous method `mustPreserveAnalysisID(LCSSA)`. Also port LoopInfo verifier pass to test LoopUnrollPass. llvm-svn: 276063	2016-07-19 23:54:23 +00:00
Justin Lebar	8778c62629	[LSV] Insert stores at the right point. Summary: Previously, the insertion point for stores was the last instruction in Chain before calling getVectorizablePrefixEndIdx. Thus if getVectorizablePrefixEndIdx didn't return Chain.size(), we still would insert at the last instruction in Chain. This patch changes our internal API a bit in an attempt to make it less prone to this sort of error. As a result, we end up recalculating the Chain's boundary instructions, but I think worrying about the speed hit of this is a premature optimization right now. Reviewers: asbirlea, tstellarAMD Subscribers: mzolotukhin, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D22534 llvm-svn: 276056	2016-07-19 23:19:20 +00:00
Justin Lebar	d9446d3770	[LSV] Add detail to correct-order.ll test. Summary: This helps keep us honest -- there were a number of ways we could screw up and still have passed this test. Reviewers: asbirlea Subscribers: llvm-commits, arsenm Differential Revision: https://reviews.llvm.org/D22531 llvm-svn: 276053	2016-07-19 23:18:59 +00:00
Matt Arsenault	a1fe17c9ad	AMDGPU: Change fdiv lowering based on !fpmath metadata If 2.5 ulp is acceptable, denormals are not required, and isn't a reciprocal which will already be handled, replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. llvm-svn: 276051	2016-07-19 23:16:53 +00:00
Paul Robinson	2d23c029f7	Make GVN Hoisting obey optnone/bisect. Differential Revision: http://reviews.llvm.org/D22545 llvm-svn: 276048	2016-07-19 22:57:14 +00:00
Matthias Braun	84fd4bee6c	RegScavenging: Add scavengeRegisterBackwards() This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 276044	2016-07-19 22:37:09 +00:00
Sanjay Patel	d4ea94eb94	regenerate checks llvm-svn: 276042	2016-07-19 22:32:15 +00:00
Evandro Menezes	238fa76574	[AArch64] Properly validate the reciprocal estimation. Add check for legal data types when expanding into a Newton series. Differential Revision: https://reviews.llvm.org/D22267 llvm-svn: 276041	2016-07-19 22:31:11 +00:00
Sanjay Patel	2d477e59e8	[InstCombine] fold add(zext(xor X, C), C) --> sext X when C is INT_MIN in the source type The pattern may look more obviously like a sext if written as: define i32 @g(i16 %x) { %zext = zext i16 %x to i32 %xor = xor i32 %zext, 32768 %add = add i32 %xor, -32768 ret i32 %add } We already have that fold in visitAdd(). Differential Revision: https://reviews.llvm.org/D22477 llvm-svn: 276035	2016-07-19 22:09:34 +00:00
George Burgess IV	8b85321bae	[CFLAA] Make a test tell the truth. NFC. Dishonesty noted by Jia Chen. llvm-svn: 276028	2016-07-19 20:56:41 +00:00
George Burgess IV	3b059841ff	[CFLAA] Add some interproc. analysis to CFLAnders. This patch adds function summary support to CFLAnders. It also comes with a lot of tests! Woohoo! Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22450 llvm-svn: 276026	2016-07-19 20:47:15 +00:00
Kevin Enderby	6524bd8c00	Next step along the way to getting good error messages for bad archives. This step builds on Lang Hames' work to change Archive::child_iterator for better interoperation with Error/Expected. Building on that it is now possible to return an error message when the size field of an archive contains non-decimal characters. llvm-svn: 276025	2016-07-19 20:47:07 +00:00
Sanjay Patel	47c04f9543	add even more missing tests for simplifySelectBitTest() llvm-svn: 276024	2016-07-19 20:47:00 +00:00
Vedant Kumar	57faf2d208	[tsan] Don't instrument __llvm_gcov_global_state_pred or __llvm_gcda* r274801 did not go far enough to allow gcov+tsan to cooperate. With this commit it's possible to run the following code without false positives: std::thread T1(fib), T2(fib); T1.join(); T2.join(); llvm-svn: 276015	2016-07-19 20:16:08 +00:00
Tim Northover	554fbd05e8	ARM: move feature for Thumb2 pkhbt/pkhtb onto architectures. There's not much functional change, but it really is an architectural feature (on v6T2, v7A, v7R and v7EM) rather than something each CPU implements individually. The main functional change is the default behaviour you get when specifying only "-triple". llvm-svn: 276013	2016-07-19 19:49:13 +00:00
Ahmed Bougacha	5a59b24bdd	[GlobalISel] Mark newly-created gvregs as having a bank. Also verify that we never try to set the size of a vreg associated to a register class. Report an error when we encounter that in MIR. Fix a testcase that hit that error and had a size for no reason. llvm-svn: 276012	2016-07-19 19:48:36 +00:00
David Majnemer	5246e0b2c2	[FunctionAttrs] Correct the safety analysis for inference of 'returned' We skipped over ReturnInsts which didn't return an argument which would lead us to incorrectly conclude that an argument returned by another ReturnInst was 'returned'. This reverts commit r275756. This fixes PR28610. llvm-svn: 276008	2016-07-19 18:50:26 +00:00
David Majnemer	07ea344222	Add a testcase for r275581 llvm-svn: 276002	2016-07-19 17:52:41 +00:00
Sanjay Patel	8b76ebe5b8	add tests related to PR28466 llvm-svn: 275995	2016-07-19 17:07:35 +00:00
Simon Pilgrim	5366d0e0bc	[X86][AVX512] Added AVX512 subvector broadcast tests llvm-svn: 275994	2016-07-19 17:04:28 +00:00
Simon Pilgrim	f2d02cb0f6	[X86][AVX] Fixed typo in test names llvm-svn: 275992	2016-07-19 16:52:05 +00:00
Sanjay Patel	d2ff6d727f	add missing test for simplifySelectBitTest() llvm-svn: 275990	2016-07-19 16:49:55 +00:00
Tobias Grosser	1c38262279	[InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp)) Summary: Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one: > Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp. Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check: `if ((!isa<ICmpInst>(Cast0Src) \|\| !isa<ICmpInst>(Cast1Src)) && ...` This check seems to sort out more cases than necessary because: - the reverse transformation is obviously done for `or` instructions only - and also not every `zext icmp` pair is necessarily the result of this reverse transformation Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. 
That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`). As an example, consider the following IR snippet ``` %1 = icmp sgt i64 %a, %b %2 = zext i1 %1 to i8 %3 = icmp slt i64 %a, %c %4 = zext i1 %3 to i8 %5 = and i8 %2, %4 ``` which would now be transformed to ``` %1 = icmp sgt i64 %a, %b %2 = icmp slt i64 %a, %c %3 = and i1 %1, %2 %4 = zext i1 %3 to i8 ``` This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code. Reviewers: grosser, vtjnash, majnemer Subscribers: majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D22511 Contributed-by: Matthias Reisinger llvm-svn: 275989	2016-07-19 16:39:17 +00:00
Simon Pilgrim	0ea8d275cc	[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match). This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding. A companion clang patch is at D22105 Differential Revision: https://reviews.llvm.org/D22106 llvm-svn: 275981	2016-07-19 15:07:43 +00:00
Sam Parker	6ca4bbb00d	[ARM] Refactor Thumb2 Mul and Mla instr descs Recommitting after r274347 was reverted. This patch introduces some classes to refactor the 3 and 4 register Thumb2 multiplication instruction descriptions, plus improved tests for some of those instructions. Differential Revision: https://reviews.llvm.org/D21929 llvm-svn: 275979	2016-07-19 14:44:05 +00:00
Peter Smith	cbcecca538	Add support for tlsldm assembler operator to ARM target The standard local dynamic model for TLS on ARM systems needs two relocations: - R_ARM_TLS_LDM32 (module idx) - R_ARM_TLS_LDO32 (offset of object from origin of module TLS block) In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to produce these relocations. llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm). This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32. TLS for ARM is defined in Addenda to, and Errata in, the ABI for the ARM Architecture Differential Revision: https://reviews.llvm.org/D22461 llvm-svn: 275977	2016-07-19 14:15:33 +00:00
Simon Pilgrim	b87a21f1c3	[AARCH64] Fix linu triple typo As promised in D22191 llvm-svn: 275976	2016-07-19 14:12:45 +00:00
Simon Pilgrim	fc4d4b251d	[AARCH64] Enable AARCH64 lit tests on windows dev machines As discussed on PR27654, this patch fixes the triples of a lot of aarch64 tests and enables lit tests on windows This will hopefully help stop cases where windows developers break the aarch64 target Differential Revision: https://reviews.llvm.org/D22191 llvm-svn: 275973	2016-07-19 13:35:11 +00:00
Daniel Sanders	3878412875	[mips][ias] R_MIPS_GOT_(PAGE\|OFST) do not need symbols Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: https://reviews.llvm.org/D22458 llvm-svn: 275968	2016-07-19 10:58:06 +00:00
Daniel Sanders	6a73883c48	[mips] Correct label prefixes for N32 and N64. Summary: N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own ($). This fixes the majority of object differences between -fintegrated-as and -fno-integrated-as. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: https://reviews.llvm.org/D22412 llvm-svn: 275967	2016-07-19 10:49:03 +00:00
Elena Demikhovsky	2c0780b8e5	AVX-512: Fixed BT instruction selection. The following condition expression (a >> n) & 1 is converted to "bt a, n" instruction. It works on all Intel targets. But on AVX-512 it was broken because the expression is modified to (truncate (a >> n) to i1). I added the new sequence (truncate (a >> n) to i1) to the BT pattern. Differential Revision: https://reviews.llvm.org/D22354 llvm-svn: 275950	2016-07-19 07:14:21 +00:00
Craig Topper	d6ca1dc45e	[AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions. llvm-svn: 275942	2016-07-19 02:00:38 +00:00
George Burgess IV	5f30897b7b	[MemorySSA] Update to the new shiny walker. This patch updates MemorySSA's use-optimizing walker to be more accurate and, in some cases, faster. Essentially, this changed our core walking algorithm from a cache-as-you-go DFS to an iteratively expanded DFS, with all of the caching happening at the end. Said expansion happens when we hit a Phi, P; we'll try to do the smallest amount of work possible to see if optimizing above that Phi is legal in the first place. If so, we'll expand the search to see if we can optimize to the next phi, etc. An iteratively expanded DFS lets us potentially quit earlier (because we don't assume that we can optimize above all phis) than our old walker. Additionally, because we don't cache as we go, we can now optimize above loops. As an added bonus, this patch adds a ton of verification (if EXPENSIVE_CHECKS are enabled), so finding bugs is easier. Differential Revision: https://reviews.llvm.org/D21777 llvm-svn: 275940	2016-07-19 01:29:15 +00:00
Vedant Kumar	e3a0bf5048	Retry: [llvm-profdata] Speed up merging by using a thread pool Add a "-j" option to llvm-profdata to control the number of threads used. Auto-detect NumThreads when it isn't specified, and avoid spawning threads when they wouldn't be beneficial. I tested this patch using a raw profile produced by clang (147MB). Here is the time taken to merge 4 copies together on my laptop: No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total Changes since the initial commit: - When handling odd-length inputs, call ThreadPool::wait() before merging the last profile. Should fix a race/off-by-one (see r275937). Differential Revision: https://reviews.llvm.org/D22438 llvm-svn: 275938	2016-07-19 01:17:20 +00:00
Vedant Kumar	21ab20e005	Revert "[llvm-profdata] Speed up merging by using a thread pool" This reverts commit r275921. It broke the ppc64be bot: http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/3537 I'm not sure why it broke, but based on the output, it looks like an off-by-one (one profile left un-merged). llvm-svn: 275937	2016-07-19 00:57:09 +00:00
Wei Mi	79997a24d7	Recommit the patch "Use uniforms set to populate VecValuesToIgnore". For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 The recommit fixed the testcase global_alias.ll. llvm-svn: 275936	2016-07-19 00:50:43 +00:00
Matt Arsenault	cb540bc03c	AMDGPU: Expand register indexing pseudos in custom inserter This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934	2016-07-19 00:35:03 +00:00
Sanjoy Das	ab73c9d88e	[LoopReroll] Reroll loops with unordered atomic memory accesses Reviewers: hfinkel, jfb, reames Subscribers: mcrosier, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22385 llvm-svn: 275932	2016-07-19 00:23:54 +00:00
Matt Arsenault	50b76399ed	AMDGPU: Fix test name and broken CHECK-LABEL llvm-svn: 275928	2016-07-18 23:09:51 +00:00
Vedant Kumar	0bd9907581	[llvm-profdata] Speed up merging by using a thread pool Add a "-j" option to llvm-profdata to control the number of threads used. Auto-detect NumThreads when it isn't specified, and avoid spawning threads when they wouldn't be beneficial. I tested this patch using a raw profile produced by clang (147MB). Here is the time taken to merge 4 copies together on my laptop: No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total Differential Revision: https://reviews.llvm.org/D22438 llvm-svn: 275921	2016-07-18 22:02:39 +00:00
Artem Belevich	9f97dcb018	[NVPTX] Make sure we adjust alignment at all call sites .. including calls from kernel functions that were ignored by mistake before. llvm-svn: 275920	2016-07-18 21:58:48 +00:00
Dehao Chen	6132ee8502	[PM] Convert Loop Strength Reduce pass to new PM Summary: Convert Loop Strength Reduce pass to new PM Reviewers: davidxl, silvas Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22468 llvm-svn: 275919	2016-07-18 21:41:50 +00:00
Teresa Johnson	2124157102	[PM] Port FunctionImport Pass to new PM Summary: Port FunctionImport Pass to new PM. Reviewers: mehdi_amini, davide Subscribers: davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D22475 llvm-svn: 275916	2016-07-18 21:22:24 +00:00
Wei Mi	f9afff71a2	Revert rL275912. llvm-svn: 275915	2016-07-18 21:14:43 +00:00
Wei Mi	1fd25726af	Use uniforms set to populate VecValuesToIgnore. For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 llvm-svn: 275912	2016-07-18 20:59:53 +00:00
Sanjay Patel	dbf44f5016	add tests for missed sext transform llvm-svn: 275908	2016-07-18 20:37:51 +00:00
Sanjay Patel	8a2bf3099f	auto-generate checks llvm-svn: 275899	2016-07-18 20:06:51 +00:00
Artem Belevich	052b1ed2fd	[NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions. Taking address of a byval variable in PTX is legal, but currently runs into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042). Work around the issue by enforcing minimum alignment on byval arguments of device functions. The change is a no-op on SASS level for sm_3x where ptxas already aligns local copy by at least 4. Differential Revision: https://reviews.llvm.org/D22428 llvm-svn: 275893	2016-07-18 19:54:56 +00:00
Michael Zolotukhin	ea5b72825b	[LoopSimplify] Update LCSSA after separating nested loops. Summary: Usually LCSSA survives this transformation, but in some cases (see attached test) it doesn't: values from the original loop after separating might be used from the outer loop. Before the transformation it was the same loop, so LCSSA phis were not required. This fixes PR28272. Reviewers: sanjoy, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21665 llvm-svn: 275891	2016-07-18 19:44:19 +00:00
Vitaly Buka	c93e10fcbb	Revert "[ARM] Skip inline asm memory operands in DAGToDAGISel" Breaks asan, see https://reviews.llvm.org/D22103 This reverts commit r275776. llvm-svn: 275890	2016-07-18 19:44:01 +00:00
Vitaly Buka	fa474e3eb9	Revert "[ARM] Update test to use CHECK-LABEL. NFCI." Breaks asan, see https://reviews.llvm.org/D22103 This reverts commit r275777. llvm-svn: 275889	2016-07-18 19:43:58 +00:00
Michael Zolotukhin	7a3040dc83	[LCSSA] Post-process PHI-nodes created by SSAUpdate when constructing LCSSA form. Summary: SSAUpdate might insert PHI-nodes inside loops, which can break LCSSA form unless we fix it up. This fixes PR28424. Reviewers: sanjoy, chandlerc, hfinkel Subscribers: uabelho, llvm-commits Differential Revision: http://reviews.llvm.org/D21997 llvm-svn: 275883	2016-07-18 19:05:08 +00:00
Simon Pilgrim	069c732f82	[X86][SSE] Regenerate extraction from promotion test Added tests for SSE2 as well as SSE41 llvm-svn: 275878	2016-07-18 18:53:15 +00:00
Simon Pilgrim	a68b8df3a7	[X86][SSE] Regenerate extraction+store memop tests Added tests for SSE2 as well as SSE41+AVX llvm-svn: 275876	2016-07-18 18:44:01 +00:00
Simon Pilgrim	b21b47ba61	[X86][SSE] Regenerate truncate+extension memop tests Added tests for SSE2 as well as SSE41 llvm-svn: 275875	2016-07-18 18:42:33 +00:00
Simon Pilgrim	600baaed89	Regenerate test llvm-svn: 275872	2016-07-18 18:38:51 +00:00
Matt Arsenault	c96e1deffa	AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32 llvm-svn: 275871	2016-07-18 18:35:05 +00:00
Matt Arsenault	4c519d3518	AMDGPU/R600: Replace barrier intrinsics llvm-svn: 275870	2016-07-18 18:34:59 +00:00
Matt Arsenault	efb24540b1	AMDGPU: Remove dead check in AMDGPUPromoteAlloca This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case. Also fix crashes on 0 x and 1 x arrays. llvm-svn: 275869	2016-07-18 18:34:53 +00:00
Tim Northover	918f05063c	CodeGenPrep: use correct function to determine Global's alignment. Elsewhere (particularly computeKnownBits) we assume that a global will be aligned to the value returned by Value::getPointerAlignment. This is used to boost the alignment on memcpy/memset, so any target-specific request can only increase that value. llvm-svn: 275866	2016-07-18 18:28:52 +00:00
Vedant Kumar	2e0893629a	[llvm-cov] Place anchors around line numbers in html reports Based on a suggestion by Harlan Haskins! llvm-svn: 275840	2016-07-18 17:53:16 +00:00
Simon Pilgrim	c941f6b329	[X86][AVX] Add target shuffle decode support for VBROADCAST Currently we only decode broadcasts from a vector of the same size. llvm-svn: 275823	2016-07-18 17:32:59 +00:00
Krzysztof Parzyszek	5948ea78b9	[Hexagon] Handle returning small structures by value This is compliant with the official ABI, but allows experimentation with calling conventions. llvm-svn: 275822	2016-07-18 17:30:41 +00:00
Chih-Hung Hsieh	4d9f2c154d	[X86] Accept SELECT op code for x86-64 fp128 type DAGTypeLegalizer::CanSkipSoftenFloatOperand should allow SELECT op code for x86_64 fp128 type for MME targets, so SoftenFloatOperand does not abort on SELECT op code. Differential Revision: http://reviews.llvm.org/D21758 llvm-svn: 275818	2016-07-18 17:20:09 +00:00
Adam Nemet	d6ba0bf831	[LoopDist] This test does not require ASSERTS Only its counterpart, diagnostics-with-hotness-lazy-BFI.ll, which invokes opt with -debug-only=. llvm-svn: 275812	2016-07-18 16:37:32 +00:00
Adam Nemet	b2593f78ca	[LoopDist] Port to new PM Summary: The direct motivation for the port is to ensure that the OptRemarkEmitter tests work with the new PM. This remains a function pass because we not only create multiple loops but could also version the original loop. In the test I need to invoke opt with -passes='require<aa>,loop-distribute'. LoopDistribute does not directly depend on AA; however, LAA does. LAA uses getCachedResult so I think we need to manually pull in 'aa'. Reviewers: davidxl, silvas Subscribers: sanjoy, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22437 llvm-svn: 275811	2016-07-18 16:29:27 +00:00
Simon Pilgrim	4ac7420618	[X86][AVX2] Added tests that demonstrate duplicate broadcasts We don't yet decode broadcasts as a target shuffle llvm-svn: 275808	2016-07-18 16:17:34 +00:00
Krzysztof Parzyszek	786333ffcc	[Hexagon] Enable .cur formation in MISched for Hexagon V60 Schedule a load and its use in the same packet in MISched. Previously, isResourceAvailable was returning false for dependences in the same packet, which prevented MISched from packetizing a load and its use in the same packet for v60. Patch by Ikhlas Ajbar. llvm-svn: 275804	2016-07-18 16:05:27 +00:00
Alexander Kornienko	63dd36faa5	Revert "r275571 [DSE]Enhance shorthening MemIntrinsic based on OverlapIntervals" Causes https://llvm.org/bugs/show_bug.cgi?id=28588 llvm-svn: 275801	2016-07-18 15:51:31 +00:00
Nemanja Ivanovic	d3c284f645	[PowerPC] Remove redundant direct moves when extracting integers and converting to FP This patch corresponds to review: https://reviews.llvm.org/D21354 We use direct moves for extracting integer elements from vectors. We also use direct moves when converting integers to FP. When these operations are chained, we get a direct move out of a VSR followed by a direct move back into a VSR. These are redundant - all we need to do is line up the element and convert. llvm-svn: 275796	2016-07-18 15:30:00 +00:00
Nirav Dave	a645433c5f	[MC] Cleanup Error Handling in AsmParser Add parseToken and compatriot functions to stitch error checks in straight linear code. As part of this, fix some erroneous handling of directives where the EndOfStatement token either was not checked or Lexed on termination. Reviewers: rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22312 llvm-svn: 275795	2016-07-18 15:24:03 +00:00
Krzysztof Parzyszek	393b37937b	[Hexagon] Use timing class info as tie-breaker in machine scheduler Patch by Sirish Pande. llvm-svn: 275794	2016-07-18 15:17:10 +00:00
Krzysztof Parzyszek	3467e9d0a9	[Hexagon] HexagonMachineScheduler should account for resources The machine scheduler needs to account for available resources more accurately in order to avoid scheduling an instruction that forces a new packet to be created. This occurs in two ways: First, an instruction without an available resource may have a large priority due to other metrics and be scheduled when there are other instructions with available resources. Second, an instruction with a non-zero latency may become available prematurely. In both these cases, we attempt to change the priority in order to allow a better instruction to be scheduled. Patch by Brendon Cahoon. llvm-svn: 275793	2016-07-18 14:52:13 +00:00
Krzysztof Parzyszek	748d3efec6	[Hexagon] Fix zero latency instructions with multiple predecessors An instruction may have multiple predecessors that are candidates for using .cur. However, only one of them can use .cur in the packet. When this case occurs, we need to make sure that only one of the dependences gets a 0 latency value. Patch by Brendon Cahoon. llvm-svn: 275790	2016-07-18 14:23:10 +00:00
Simon Pilgrim	1b2ab113fb	[SLPVectorizer][X86] Added sqrt vectorization tests llvm-svn: 275788	2016-07-18 13:20:54 +00:00
Simon Dardis	d32a2d30cb	[inlineasm] Propagate operand constraints to the backend When SelectionDAGISel transforms a node representing an inline asm block, memory constraint information is not preserved. This can cause constraints to be broken when a memory offset is of the form: offset + frame index when the frame is resolved. By propagating the constraints all the way to the backend, targets can enforce memory operands of inline assembly to conform to their constraints. For MIPSR6, some instructions had their offsets reduced to 9 bits from 16 bits such as ll/sc. This becomes problematic when using inline assembly to perform atomic operations, as an offset can be generated that is too big to encode in the instruction. Reviewers: dsanders, vkalintris Differential Review: https://reviews.llvm.org/D21615 llvm-svn: 275786	2016-07-18 13:17:31 +00:00
Nicolai Haehnle	bef1ceb815	AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions. Summary: The work item intrinsics are not available for the shader calling conventions. And even if we did hook them up, most shader stages have some extra restrictions on the amount of available LDS. Reviewers: tstellarAMD, arsenm Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D20728 llvm-svn: 275779	2016-07-18 09:02:47 +00:00
Diana Picus	6731f13458	[ARM] Update test to use CHECK-LABEL. NFCI. llvm-svn: 275777	2016-07-18 07:48:42 +00:00
Diana Picus	73ed44d328	[ARM] Skip inline asm memory operands in DAGToDAGISel The current logic for handling inline asm operands in DAGToDAGISel interprets the operands by looking for constants, which should represent the flags describing the kind of operand we're dealing with (immediate, memory, register def etc). The operands representing actual data are skipped only if they are non-const, with the exception of immediate operands which are skipped explicitly when a flag describing an immediate is found. The oversight is that memory operands may be const too (e.g. for device drivers reading a fixed address), so we should explicitly skip the operand following a flag describing a memory operand. If we don't, we risk interpreting that constant as a flag, which is definitely not intended. Fixes PR26038 Differential Revision: https://reviews.llvm.org/D22103 llvm-svn: 275776	2016-07-18 07:35:14 +00:00
Craig Topper	a3c55f5915	[AVX512] Add EVEX versions of scalar ADD/SUB/MUL/DIV to load folding tables. llvm-svn: 275775	2016-07-18 06:49:32 +00:00
Craig Topper	83613bb436	[X86] Fix test checks to include leading 'v' on avx mnemonic names. llvm-svn: 275774	2016-07-18 06:49:29 +00:00
Diana Picus	774d157a5d	[ARM] Honour ABI for rem under -O0 for EABI, GNUEABI, Android and Musl At higher optimization levels, we generate the libcall for DIVREM_Ix, which is fine: aeabi_{u\|i}divmod. At -O0 we generate the one for REM_Ix, which is the default {u}mod{q\|h\|s\|d}i3. This commit makes sure that we don't generate REM_Ix calls for ABIs that don't support them (i.e. where we need to use DIVREM_Ix instead). This is achieved by bailing out of FastISel, which can't handle non-double multi-reg returns, and letting the legalization infrastructure expand the REM_Ix calls. It also updates the divmod-eabi.ll test to run under -O0 as well, and adds some Windows checks to it to make sure we don't break things for it. Fixes PR27068 Differential Revision: https://reviews.llvm.org/D21926 llvm-svn: 275773	2016-07-18 06:48:25 +00:00
Craig Topper	1af6cc00dc	[X86] Add VPADD instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275769	2016-07-18 06:14:54 +00:00
Craig Topper	ba9b93d7f2	[X86] Add floating point packed logical ops to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275768	2016-07-18 06:14:50 +00:00
Craig Topper	3a99de4067	[X86] Add AVX512 instructions to X86InstrInfo::isAssociativeAndCommutative. llvm-svn: 275767	2016-07-18 06:14:47 +00:00
Craig Topper	f7a06c29bc	[X86] Add AVX512 load opcodes and a couple AVX load opcodes to X86InstrInfo::areLoadsFromSameBasePtr. llvm-svn: 275765	2016-07-18 06:14:43 +00:00
Craig Topper	650a15e2b3	[X86] Add more opcodes to isFrameLoadOpcode/isFrameStoreOpcode. Mainly AVX-512 related. llvm-svn: 275764	2016-07-18 06:14:39 +00:00
Craig Topper	5c913e84df	[AVX512] Use VMOVAPSZ128rr/VMOVAPS256rr for VR128X/VR256X physreg moves when VLX is supported. Ideally we would use VEX encoded moves instead of EVEX if the high 16 registers aren't referenced, but this is a good first step. llvm-svn: 275763	2016-07-18 06:14:34 +00:00
David Majnemer	04c7c225a1	[GVNHoist] Change the key for VNtoInsns to a pair While debugging GVNHoist, I found it confusing that the entries in a VNtoInsns were not always value numbers. They _usually_ were except for StoreInst in which case they were a hash of two different value numbers. This leads to two observations: - It is more difficult to debug things when the semantic contents of VNtoInsns changes over time. - Using a single value number is not much cheaper, the value of VNtoInsns is a SmallVector. - It is not immediately clear what the algorithm would do if there were hash collisions in the StoreInst case. Using a DenseMap of std::pair sidesteps all of this. N.B. The changes in the test were due to their sensitivity to the iteration order of VNtoInsns which has changed. llvm-svn: 275761	2016-07-18 06:11:37 +00:00
Vedant Kumar	733f795947	[llvm-cov] Attempt to fix a test failure on Windows Don't make the test/tools/llvm-cov/demangle.test depend on the order in which symbols are seen, or on the exact formatting llvm-cov emits after a symbol is printed. This is an attempt to fix a Windows bot failure: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/9141 I don't know what the root cause of the failure is, or why the showTemplateInstantiations test doesn't fail in the same way on the Windows bots. However, this measure can't hurt, and it'll at least get me on the blamelists again. llvm-svn: 275758	2016-07-18 04:49:42 +00:00
NAKAMURA Takumi	966bde50c3	Revert r275678, "Revert "Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute"" This reverts also r275029, "Update Clang tests after adding inference for the returned argument attribute" It broke LTO build. Seems miscompilation. llvm-svn: 275756	2016-07-18 03:23:25 +00:00
Davide Italiano	4edd54794b	[GVN] Move other PRE tests to a subdirectory. llvm-svn: 275742	2016-07-17 23:55:20 +00:00
Davide Italiano	ed8e0881c1	[GVN] Move the PRE/LOADPRE test in a subdirectory. llvm-svn: 275741	2016-07-17 23:48:18 +00:00
Davide Italiano	6a69f829bd	[GVN] Use FileCheck instead of grep for tests. llvm-svn: 275739	2016-07-17 23:21:26 +00:00
Simon Pilgrim	47638635cc	[X86] Add CTPOP/CTLZ/CTTZ scalar cost tests llvm-svn: 275725	2016-07-17 18:29:19 +00:00
Simon Pilgrim	5aa90c55b6	[X86][AVX] Added VBROADCASTF128/VBROADCASTI128 tests llvm-svn: 275713	2016-07-17 17:44:18 +00:00
Simon Pilgrim	d1e941ae85	[X86] Regenerated ctlz/cttz scalar tests for 32/64-bit targets with/without LZCNT/TZCNT support llvm-svn: 275710	2016-07-17 16:15:51 +00:00
Simon Pilgrim	0bf66c9d62	[X86] Regenerated popcnt scalar tests for 32/64-bit targets with/without POPCNT support llvm-svn: 275709	2016-07-17 16:04:19 +00:00
Teresa Johnson	cd21a646f6	[ThinLTO] Perform profile-guided indirect call promotion Summary: To enable profile-guided indirect call promotion in ThinLTO mode, we simply add call graph edges for each profitable target from the profile to the summaries, then the summary-guided importing will consider the callee for importing as usual. Also we need to enable the indirect call promotion pass creation in the PassManagerBuilder when PerformThinLTO=true (we are in the ThinLTO backend), so that the newly imported functions are considered for promotion in the backends. The IC promotion profiles refer to callees by GUID, which required adding GUIDs to the per-module VST in bitcode (and assigning them valueIds similar to how they are assigned valueIds in the combined index). Reviewers: mehdi_amini, xur Subscribers: mehdi_amini, davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D21932 llvm-svn: 275707	2016-07-17 14:47:01 +00:00
Elena Demikhovsky	eaa356501d	X86: Updated a test file. NFC. This test shows suboptimal code generated for AVX-512 vs PENTIUM4. The issue will be fixed in an upcoming commit. llvm-svn: 275702	2016-07-17 07:03:13 +00:00
Dehao Chen	1a44452b11	[PM] Convert IVUsers analysis to new pass manager. Summary: Convert IVUsers analysis to new pass manager. Reviewers: davidxl, silvas Subscribers: junbuml, sanjoy, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22434 llvm-svn: 275698	2016-07-16 22:51:33 +00:00
Sanjay Patel	79acd2a96b	[InstCombine] allow X + signbit --> X ^ signbit for vector splats llvm-svn: 275691	2016-07-16 18:29:26 +00:00
Sanjay Patel	040bd16e56	add vector test to show missing transform llvm-svn: 275690	2016-07-16 18:24:18 +00:00
Sanjay Patel	eb50476f77	update tests to use FileCheck, consolidate tests, fix comments llvm-svn: 275688	2016-07-16 18:08:22 +00:00
Sanjay Patel	e540a31f59	update test to use FileCheck llvm-svn: 275687	2016-07-16 16:31:58 +00:00
Sanjay Patel	972a53fb42	auto-generate checks llvm-svn: 275686	2016-07-16 16:27:58 +00:00
Sanjay Patel	309f98ef3a	auto-generate checks llvm-svn: 275685	2016-07-16 16:24:06 +00:00
Sanjay Patel	f9d2b20daf	[InstCombine] reassociate logic ops with constants separated by a zext This is a partial implementation of a general fold for associative+commutative operators: (op (cast (op X, C2)), C1) --> (cast (op X, op (C1, C2))) (op (cast (op X, C2)), C1) --> (op (cast X), op (C1, C2)) There are 7 associative operators and 13 cast types, so this could potentially go a lot further. Differential Revision: https://reviews.llvm.org/D22421 llvm-svn: 275684	2016-07-16 15:20:19 +00:00
Hal Finkel	660096b260	Revert "Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute" This reverts commit r275042; the initial commit triggered self-hosting failures on ARM/AArch64. James Molloy identified the problematic backend code, which has been disabled in r275677. Trying again... Original commit message: Let FuncAttrs infer the 'returned' argument attribute A function can have one argument with the 'returned' attribute, indicating that the associated argument is always the return value of the function. Add FuncAttrs inference logic. llvm-svn: 275678	2016-07-16 07:21:28 +00:00
Hal Finkel	04b5330ccd	Disable this-return argument forwarding on ARM/AArch64 r275042 reverted function-attribute inference for the 'returned' attribute because the feature triggered self-hosting failures on ARM and AArch64. James Molloy determined that the this-return argument forwarding feature, which directly ties the returned input argument to the returned value, was the cause. It seems likely that this forwarding code contains, or triggers, a subtle bug. Disabling for now until we can track that down. llvm-svn: 275677	2016-07-16 07:07:29 +00:00
Yaxun Liu	a711cc7951	Re-commit [AMDGPU] Add metadata for runtime Attempting to fix lit test failure on ppc. llvm-svn: 275676	2016-07-16 05:09:21 +00:00
Matthias Braun	538859cca3	llc: Add support for -run-pass none This does not schedule any passes besides the ones necessary to construct and print the machine function. This is useful to test .mir file reading and printing. Differential Revision: http://reviews.llvm.org/D22432 llvm-svn: 275664	2016-07-16 02:24:59 +00:00
Matthias Braun	c92a5fc9f6	ARM/MIR: Move test from MIR to CodeGen/ARM directory test/CodeGen/MIR/ARM/ARMLoadStoreDBG.mir is an actual test for the ARM load store optimization pass and not a test of the mir parser/printer. It belongs to test/CodeGen/ARM; This also updates the test to use the new -run-pass llc syntax. llvm-svn: 275662	2016-07-16 02:24:13 +00:00
Matthias Braun	5d00b3213e	MIParser: reject subregister indexes on physregs llvm-svn: 275658	2016-07-16 01:36:18 +00:00
Vedant Kumar	424f51bb04	[llvm-cov] Optionally use a symbol demangler when preparing reports Add an option to specify a symbol demangler (as well as options to the demangler). This can be used to make reports more human-readable. This option is especially useful in -output-dir mode, since it isn't as easy to manually pipe reports into a demangler in this mode. llvm-svn: 275640	2016-07-15 22:44:57 +00:00
Matt Arsenault	73d2f8954a	AMDGPU: Fix verifier error from partially undef copy In this situation: %VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11, %VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use> %VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3 %VGPR4<def> = COPY %VGPR2 The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1, but VGPR4 is defined immediately after this copy. llvm-svn: 275635	2016-07-15 22:32:02 +00:00
Michael Kuperstein	be2e3f5ce5	ExpandPostRAPseudos should transfer implicit uses, not only implicit defs Previously, we would expand: %BL<def> = COPY %DL<kill>, %EBX<imp-use,kill>, %EBX<imp-def> Into: %BL<def> = MOV8rr %DL<kill>, %EBX<imp-def> Dropping the imp-use on the floor. That confused CriticalAntiDepBreaker, which (correctly) assumes that if an instruction defs but doesn't use a register, that register is dead immediately before the instruction - while in this case, the high lanes of EBX can be very much alive. This fixes PR28560. Differential Revision: https://reviews.llvm.org/D22425 llvm-svn: 275634	2016-07-15 22:31:14 +00:00
Zachary Turner	b927e02e1b	[pdb] Teach MsfBuilder and other classes about the Free Page Map. Block 1 and 2 of an MSF file are bit vectors that represent the list of blocks allocated and free in the file. We had been using these blocks to write stream data and other data, so we mark them as the free page map now. We don't yet serialize these pages to the disk, but at least we make a note of what it is, and avoid writing random data to them. Doing this also necessitated cleaning up some of the tests to be more general and hardcode fewer values, which is nice. llvm-svn: 275629	2016-07-15 22:17:19 +00:00
Zachary Turner	5e534c7fb3	[pdb] Round trip the NameMap data structure to YAML. llvm-svn: 275628	2016-07-15 22:17:08 +00:00
Zachary Turner	faa554b2fd	[pdb] Use MsfBuilder to handle the writing PDBs. Previously we would read a PDB, then write some of it back out, but write the directory, super block, and other pertinent metadata back out unchanged. This generates incorrect PDBs since the amount of data written was not always the same as the amount of data read. This patch changes things to use the newly introduced `MsfBuilder` class to write out a correct and accurate set of Msf metadata for the data actually written, which opens up the door for adding and removing type records, symbol records, and other types of data to an existing PDB. llvm-svn: 275627	2016-07-15 22:16:56 +00:00
Matt Arsenault	93be6e8c0a	StructurizeCFG: Fix inverting constantexpr conditions llvm-svn: 275626	2016-07-15 22:13:16 +00:00
Matt Arsenault	a65e6b8335	AMDGPU: Remove brev intrinsic llvm-svn: 275620	2016-07-15 21:27:13 +00:00
Matt Arsenault	82e5e1e564	AMDGPU: Fix TargetPrefix for remaining r600 intrinsics llvm-svn: 275619	2016-07-15 21:27:08 +00:00
Matt Arsenault	11d3e21f2b	AMDGPU: Remove AMDGPU.ldexp llvm-svn: 275618	2016-07-15 21:26:56 +00:00
Matt Arsenault	09b2c4aee8	AMDGPU: Remove legacy rsq.clamped intrinsic Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining. Also fix mismatch with non-IEEE rsq selecting to IEEE rsq. llvm-svn: 275617	2016-07-15 21:26:52 +00:00
Saleem Abdulrasool	467269a40e	CodeGen: avoid emitting unnecessary CFI Remove unnecessary clutter in assembly output. When using SjLj EH, the CFI is not actually used for anything. Do not emit the CFI needlessly. The minor test adjustments are interesting. The prologue test was just overzealous matching. The interesting case is the LSDA change. It was originally added to ensure that various compilations did not mangle the name (it explicitly checked the name!). However, subsequent cleanups made it more reliant on the CFI to find the name. Parse the generated code flow to generically find the label still. llvm-svn: 275614	2016-07-15 21:10:29 +00:00
Nico Weber	8d66df15f4	Teach fast isel about the win64 calling convention. This mostly just works. Vectorcall rets are still not supported. The win64_eh test change is because fast isel doesn't use rsi for temporary computations, so it doesn't need to be pushed. The test case I'm changing was originally added to test pushes, but by now there are other test cases in that file exercising that code path. https://reviews.llvm.org/D22422 llvm-svn: 275607	2016-07-15 20:18:37 +00:00
George Burgess IV	22682e293b	[CFLAA] Add attributes handling for CFLAnders. This patch adds proper handling of stratified attributes into our anders-style CFLAA implementation. It also comes bundled with more CFLAnders tests. :) Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22325 llvm-svn: 275604	2016-07-15 20:02:49 +00:00
George Burgess IV	6d30aa03a0	[CFLAA] Add an initial CFLAnders implementation. This adds an incomplete anders-style implementation for CFLAA. It's incomplete in that it's missing interprocedural analysis, attrs handling, etc. and that it needs more tests. More tests and features will be added in future commits. Patch by Jia Chen. Differential Revision: https://reviews.llvm.org/D22291 llvm-svn: 275602	2016-07-15 19:53:25 +00:00
Vitaly Buka	7f64844481	Revert "[AMDGPU] Add metadata for runtime" This reverts commit r275566. llvm-svn: 275599	2016-07-15 19:14:57 +00:00
Jingyue Wu	2b353a9522	[ReassociateGEP] Update tests to allow missing "inbounds" on certain GEPs. With r275532 fixing miscompilation of GVN, "inbounds" on certain GEPs in these tests cannot be preserved any more. Left a TODO in the tests for future reference. llvm-svn: 275596	2016-07-15 18:47:17 +00:00
Sanjay Patel	27fefb2fcf	add tests for associative ops blocked by a cast These are more generalized versions of the cases added in r275302 and r275297. llvm-svn: 275594	2016-07-15 18:39:02 +00:00
Rong Xu	96a19d35ae	[PGO] IRPGO pre-cleanup pass changes This patch adds a selected set of cleanup passes including a pre-inline pass before LLVM IR PGO instrumentation. The inline is only intended to apply those obvious/trivial ones before instrumentation so that much less instrumentation is needed to get better profiling information. This will drastically improve the instrumented code performance for large C++ applications. Another benefit is the context sensitive counts that can potentially improve the PGO optimization. Differential Revision: http://reviews.llvm.org/D21405 llvm-svn: 275588	2016-07-15 18:10:49 +00:00
Adam Nemet	aad816083e	[OptRemark,LDist] RFC: Add hotness attribute Summary: This is the first set of changes implementing the RFC from http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334 This is a cross-sectional patch; rather than implementing the hotness attribute for all optimization remarks and all passes in a patch set, it implements it for the 'missed-optimization' remark for Loop Distribution. My goal is to shake out the design issues before scaling it up to other types and passes. Hotness is computed as an integer as the multiplication of the block frequency with the function entry count. It's only printed in opt currently since clang prints the diagnostic fields directly. E.g.: remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300) A new API added is similar to emitOptimizationRemarkMissed. The difference is that it additionally takes a code region that the diagnostic corresponds to. From this, hotness is computed using BFI. The new API is exposed via an analysis pass so that it can be made dependent on LazyBFI. (Thanks to Hal for the analysis pass idea.) This feature can all be enabled by setDiagnosticHotnessRequested in the LLVM context. If this is off, LazyBFI is not calculated (D22141) so there should be no overhead. A new command-line option is added to turn this on in opt. My plan is to switch all users of emitOptimizationRemark* to use this module instead. Reviewers: hfinkel Subscribers: rcox2, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D21771 llvm-svn: 275583	2016-07-15 17:23:20 +00:00
Jun Bum Lim	a5737d8eac	[DSE] Enhance shortening MemIntrinsic based on OverlapIntervals Summary: This change uses the overlap interval map built from partial overwrite tracking to perform shortening of MemIntrinsics. Add test cases for opportunities that were missed before. Reviewers: hfinkel, eeckstein, mcrosier Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D21909 llvm-svn: 275571	2016-07-15 16:14:34 +00:00
Krzysztof Parzyszek	bba0bf7d37	[Hexagon] Improve patterns with stack-based addressing - Treat bitwise OR with a frame index as an ADD wherever possible, fold it into addressing mode. - Extend patterns for memops to allow memops with frame indexes as address operands. llvm-svn: 275569	2016-07-15 15:35:52 +00:00
Nico Weber	f7f2b81602	In dag-optnone.ll, use varargs instead of win64 to fast SDIsel. The test used to rely on targeting win64 to disable fast isel, but I'd like to teach fast isel about win64 rets. Change the test to use varargs to disable fast isel. llvm-svn: 275568	2016-07-15 15:30:18 +00:00
Yaxun Liu	b3d17690eb	[AMDGPU] Add metadata for runtime Added emitting metadata to elf for runtime. Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream. Differential Revision: https://reviews.llvm.org/D21849 llvm-svn: 275566	2016-07-15 14:58:21 +00:00
Sebastian Pop	4177480aad	code hoisting pass based on GVN This pass hoists duplicated computations in the program. The primary goal of gvn-hoist is to reduce the size of functions before inline heuristics to reduce the total cost of function inlining. Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki. Important algorithmic contributions by Daniel Berlin under the form of reviews. Differential Revision: http://reviews.llvm.org/D19338 llvm-svn: 275561	2016-07-15 13:45:20 +00:00
Simon Pilgrim	efd841e294	[X86][AVX] Added shuffle tests for UNPCK+PERMUTE lowerVectorShuffleAsPermuteAndUnpack could solve this if it worked with 256-bit vectors llvm-svn: 275554	2016-07-15 11:51:46 +00:00
Simon Pilgrim	cf9c31550c	[X86][AVX2] Added a memory version of test_mm256_broadcastsi128_si256 This should lower to vbroadcasti128 llvm-svn: 275552	2016-07-15 11:40:27 +00:00
Simon Pilgrim	2683ad54ad	[X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level. This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen. Fixes PR28136. llvm-svn: 275543	2016-07-15 09:49:12 +00:00
James Molloy	b3326df56a	[Thumb-1] Select post-increment load and store where possible Thumb-1 doesn't have post-inc or pre-inc load or store instructions. However the LDM/STM instructions with writeback can function as post-inc load/store: ldm r0!, {r1} @ load from r0 into r1 and increment r0 by 4 Obviously, this only works if the post increment is 4. llvm-svn: 275540	2016-07-15 08:03:56 +00:00
James Molloy	a454a11d60	[ARM] Prefer indirect calls in minsize mode ... When we emit several calls to the same function in the same basic block. An indirect call uses a "BLX r0" instruction which has a 16-bit encoding. If many calls are made to the same target, this can enable significant code size reductions. llvm-svn: 275537	2016-07-15 07:55:21 +00:00
David Majnemer	959a6623b5	XFAIL two SeparateConstOffsetFromGEP tests They appear to have relied on bugs hidden in copyIRFlags/andIRFlags. This has been filed as PR28564. llvm-svn: 275533	2016-07-15 05:37:22 +00:00
David Majnemer	92f84ccf0f	[IR] andIRFlags and copyIRFlags needs to handle GEP We didn't consider the inbounds flag on GEPs leading to downstream users introducing UB. This fixes PR28562. llvm-svn: 275532	2016-07-15 05:02:31 +00:00
Vedant Kumar	71d515b0b3	[llvm-cov] Relax a test for Windows Attempt to address this bot failure: http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/4967 llvm-svn: 275522	2016-07-15 02:11:37 +00:00
Vedant Kumar	b95dc4608d	[llvm-cov] Improve error messages While we're at it, extend an existing test to make sure that error messages look reasonable. llvm-svn: 275520	2016-07-15 01:53:39 +00:00
Matt Arsenault	b91805ea2b	AMDGPU: Fix not expanding control flow after some kill blocks Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later. Fixes bug 28550 llvm-svn: 275510	2016-07-15 00:58:15 +00:00
Matt Arsenault	fa5a86a403	AMDGPU: Fix trying to skip from a block with no successors Found while reducing bug 28550 llvm-svn: 275509	2016-07-15 00:58:13 +00:00
Matt Arsenault	83ab049af2	AMDGPU: Fix splitting kill blocks with defs before kill llvm-svn: 275508	2016-07-15 00:58:09 +00:00
Reid Kleckner	c29b4f07f9	[codeview] Shrink inlined call site line info tables For a fully inlined call chain like a -> b -> c -> d, we were emitting line info for 'd' 3 separate times: once for d's actual InlineSite line table, and twice for 'b' and 'c'. This is particularly inefficient when all these functions are in different headers, because now we need to encode the file change. Windbg was coping with our suboptimal output, so this should not be noticeable from the debugger. llvm-svn: 275502	2016-07-14 23:47:15 +00:00
Tim Northover	fbefee3bff	llvm-objdump: extend __mh_execute_header handling to other special syms We don't need to print any of the special __mh_*_header symbols when disassembling. Since they point at the beginning of the segment (not where the actual code is) they're pretty misleading. Should also fix lld bots. llvm-svn: 275498	2016-07-14 23:13:03 +00:00
Simon Pilgrim	420b266d0a	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied) This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. This was incorrectly reverted in rL275421 during triage of PR28552. llvm-svn: 275497	2016-07-14 23:05:09 +00:00
Adam Nemet	74730d9ab0	[LoopDist] Fix typo in diagnostic llvm-svn: 275495	2016-07-14 22:33:46 +00:00
Tim Northover	f203ab5be3	llvm-objdump: handle stubbed and malformed dylibs better We were quite happy to read past the end of the valid section data when disassembling. Instead we entirely skip stub dylibs, and tell the user what's happened if their section only has partial data. llvm-svn: 275487	2016-07-14 22:13:32 +00:00
Ekaterina Romanova	7aea5906c0	[GVN] Fold constant expression in GVN. Fix for PR 28418. opt never finishes compiling a test when -gvn option is passed. The problem is caused by the fact that GVN fails to fold a constant expression. Differential Revision: https://reviews.llvm.org/D22185 llvm-svn: 275483	2016-07-14 22:02:25 +00:00
Teresa Johnson	35e0204eec	[ThinLTO/gold] Perform index-based weak/linkonce resolution Summary: Invoke the weak/linkonce symbol resolution support (already used by libLTO) that operates via the summary index. This ensures prevailing linkonce are kept, by making them weak, and marks preempted copies as available_externally when possible. With this change, the older support for keeping the prevailing linkonce (by changing their symbol resolution) is removed. Reviewers: mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: http://reviews.llvm.org/D22302 llvm-svn: 275474	2016-07-14 21:13:24 +00:00
Matthew Simpson	65ca32b83c	[LV] Allow interleaved accesses in loops with predicated blocks This patch allows the formation of interleaved access groups in loops containing predicated blocks. However, the predicated accesses are prevented from forming groups. Differential Revision: https://reviews.llvm.org/D19694 llvm-svn: 275471	2016-07-14 20:59:47 +00:00
Krzysztof Parzyszek	ecea07c50e	[Hexagon] Packetize function call arguments with tail call instructions On Hexagon it is legal to packetize the instructions setting up call arguments with the call instruction itself. This was already done, except for tail calls. Make sure tail calls are handled as well. llvm-svn: 275458	2016-07-14 19:30:55 +00:00
Sanjoy Das	13623ad009	[JumpThreading] PRE unordered loads Summary: Extend JumpThreading's PRE to unordered atomic loads. Reviewers: hfinkel, reames Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D22326 llvm-svn: 275456	2016-07-14 19:21:15 +00:00
Jun Bum Lim	c837af306e	[PM] Port Dead Loop Deletion Pass to the new PM Summary: Port Dead Loop Deletion Pass to the new pass manager. Reviewers: silvas, davide Subscribers: llvm-commits, sanjoy, mcrosier Differential Revision: https://reviews.llvm.org/D21483 llvm-svn: 275453	2016-07-14 18:28:29 +00:00
Kostya Serebryany	dd5c7f9313	[sanitizer-coverage] make sure that calls to __sanitizer_cov_trace_pc are not merged (otherwise different calls get the same PC and confuse fuzzers) llvm-svn: 275449	2016-07-14 17:59:01 +00:00
Nirav Dave	a6c7595d0f	[X86][MC] Fix bracket expression parsing in intel-style assembly. Only perform struct field check on Identifier tokens. Fixes PR28547. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22361 llvm-svn: 275445	2016-07-14 17:37:05 +00:00
Saleem Abdulrasool	0233cc55de	X86: handle external tail calls in Windows JIT If there was a tail call, we would incorrectly handle the relocation. It would end up indexing into the array with an incorrect section id. The symbol was external to the module, so the Section ID was UNDEFINED (-1). We would then index the SmallVector with this ID, triggering an assertion. Use the Value rather than the section load address in this case. llvm-svn: 275442	2016-07-14 17:27:06 +00:00
Sanjay Patel	2996a342f3	auto-generate checks Note: I removed the checks after each jump because that's noise, but we apparently need branches rather than returning i1 to see the bt codegen in some cases. llvm-svn: 275439	2016-07-14 17:07:55 +00:00
Tim Northover	6003fb58df	ARM: fix vmov.i64 immediate validity check Typo meant we were only checking the low byte (repeatedly). llvm-svn: 275437	2016-07-14 17:04:34 +00:00
Tom Stellard	1b5cf6217e	GlobalsAA: Functions with the argmemonly attribute won't read arbitrary globals Summary: In preparation for changing GlobalsAA to stop assuming that intrinsics can't read arbitrary globals, we need to make sure GlobalsAA is querying function attributes rather than relying on this assumption. This patch was inspired by: http://reviews.llvm.org/D20206 Reviewers: jmolloy, hfinkel Subscribers: eli.friedman, llvm-commits Differential Revision: https://reviews.llvm.org/D21318 llvm-svn: 275433	2016-07-14 15:50:27 +00:00
Nico Weber	5bb284226b	Don't optimize movs to pushes in -O0 builds. https://reviews.llvm.org/D22362 llvm-svn: 275431	2016-07-14 15:40:22 +00:00
Ahmed Bougacha	85dc93c56b	[X86] Decode MPX BND registers. We were able to assemble, but not disassemble. Note that fixupRMValue was truncating EA_REG_BND0-3 because we hit the uint8_t max. The control registers were already squarely above it, but I don't think they ever go in .r/m, only in .reg. I also did notice an extra REX.W in our encoding, but I think that's fine. llvm-svn: 275427	2016-07-14 14:53:21 +00:00
Sam Kolton	7a2a323feb	[AMDGPU] Assembler: fix row_bcast parsing Summary: This change fixes bug 28538 Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: https://reviews.llvm.org/D22355 llvm-svn: 275422	2016-07-14 14:50:35 +00:00
Nico Weber	3afaf16abc	Revert r275411, it caused PR28552. llvm-svn: 275421	2016-07-14 14:49:35 +00:00
Nico Weber	755cd760cd	Revert r275401, it caused PR28551. llvm-svn: 275420	2016-07-14 14:41:25 +00:00
Matthew Simpson	3c3b4a257b	[LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions This patch prevents increases in the number of instructions, pre-instcombine, due to induction variable scalarization. An increase in instructions can lead to an increase in the compile-time required to simplify the induction variables. We now maintain a new map for scalarized induction variables to prevent us from converting between the scalar and vector forms. This patch should resolve compile-time regressions seen after r274627. llvm-svn: 275419	2016-07-14 14:36:06 +00:00
Nico Weber	ecdf45b1e6	Teach fast isel calls and rets about stdcall. stdcall is callee-pop like thiscall, so the thiscall changes already did most of the work for this. This change only opts stdcall in and adds tests. llvm-svn: 275414	2016-07-14 13:54:26 +00:00
Simon Pilgrim	bed37ccd54	[X86][AVX] Added an additional vperm2f128 memory folding test llvm-svn: 275413	2016-07-14 13:40:53 +00:00
Simon Pilgrim	3ecb6bdd5f	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. llvm-svn: 275411	2016-07-14 13:28:43 +00:00
Daniel Sanders	46fe6550ac	[mips] SelectionDAGISel subclasses now follow the optimization level. Summary: It was recently discovered that, for Mips's SelectionDAGISel subclasses, all optimization levels caused SelectionDAGISel to behave like -O2. This change adds the necessary plumbing to initialize the optimization level. Reviewers: andrew.w.kaylor Subscribers: andrew.w.kaylor, sdardis, dean, llvm-commits, vradosavljevic, petarj, qcolombet, probinson, dsanders Differential Revision: https://reviews.llvm.org/D14900 llvm-svn: 275410	2016-07-14 13:25:22 +00:00
Simon Pilgrim	053d32906f	[X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations. llvm-svn: 275406	2016-07-14 12:58:04 +00:00
Sjoerd Meijer	716abbb2f5	This converts a signed remainder instruction to unsigned remainder, which enables the code size optimisation to fold a rem and div into a single aeabi_uidivmod call. This was not happening before because sdiv was converted but srem not, and instructions with different signedness are not combined. Differential Revision: http://reviews.llvm.org/D22214 llvm-svn: 275403	2016-07-14 12:23:48 +00:00
Simon Pilgrim	700e4a1ab8	[X86][AVX] Add 128-bit wide shuffle tests that should combine to blend-with-zero llvm-svn: 275402	2016-07-14 12:21:40 +00:00
Sebastian Pop	63847d04e7	code hoisting pass based on GVN This pass hoists duplicated computations in the program. The primary goal of gvn-hoist is to reduce the size of functions before inline heuristics to reduce the total cost of function inlining. Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki. Important algorithmic contributions by Daniel Berlin under the form of reviews. Differential Revision: http://reviews.llvm.org/D19338 llvm-svn: 275401	2016-07-14 12:18:53 +00:00
Simon Pilgrim	a76a8e50e5	[X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments support llvm-svn: 275400	2016-07-14 12:07:43 +00:00
Simon Pilgrim	9e812169cc	[X86][AVX] Regenerate broadcast upgrade tests llvm-svn: 275398	2016-07-14 11:05:43 +00:00
Sjoerd Meijer	38c2cd0c14	This implements a more optimal algorithm for selecting a base constant in constant hoisting. It not only takes into account the number of uses and the cost of expressions in which constants appear, but now also the resulting integer range of the offsets. Thus, the algorithm maximizes the number of uses within an integer range that will enable more efficient code generation. On ARM, for example, this will enable code size optimisations because fewer negative offsets will be created. Negative offsets/immediates are not supported by Thumb1, thus preventing more compact instruction encoding. Differential Revision: http://reviews.llvm.org/D21183 llvm-svn: 275382	2016-07-14 07:44:20 +00:00
David Majnemer	666aa945a5	[InstCombine] Masked loads with undef masks can fold to normal loads We were able to fold masked loads with an all-ones mask to a normal load. However, we couldn't turn a masked load with a mask with mixed ones and undefs into a normal load. llvm-svn: 275380	2016-07-14 06:58:42 +00:00
David Majnemer	17a95aaa7b	Simplify llvm.masked.load w/ undef masks We can always pick the passthru value if the mask is undef: we are permitted to treat the mask as-if it were filled with zeros. llvm-svn: 275379	2016-07-14 06:58:37 +00:00
Eli Friedman	17e8ea18e9	[X86] Fix stupid typo in isel lowering. Apparently someone miscounted the number of zeros in the immediate. Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 . llvm-svn: 275376	2016-07-14 05:48:25 +00:00
Matt Arsenault	ca7f5701f8	AMDGPU/R600: Delete/rename intrinsics no longer used by mesa Use the replacement pass to update the tests, and delete old names. llvm-svn: 275375	2016-07-14 05:47:17 +00:00
Matt Arsenault	897eee4187	AMDGPU: Remove unused intrinsics llvm-svn: 275371	2016-07-14 05:23:19 +00:00
Matt Arsenault	aa94c1e7ee	AMDGPU: Fix test not actually testing anything It wasn't actually running the pass, and since it is missing the llvm prefix, the eh intrinsic was not really an IntrinsicInst. Also add missing test for lifetime markers. llvm-svn: 275370	2016-07-14 05:23:15 +00:00
Dean Michael Berris	52735fc435	XRay: Add entry and exit sleds Summary: In this patch we implement the following parts of XRay: - Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. - Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts). - X86-specific nop sleds as described in the white paper. - A machine function pass that adds the different instrumentation marker instructions at a very late stage. - A way of identifying which return opcode is considered "normal" for each architecture. There are some caveats here: 1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to be unpacked by default for platforms where XRay is not available yet. 2) The generated section for X86 is different from what is described in the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library. Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D19904 llvm-svn: 275367	2016-07-14 04:06:33 +00:00
Davide Italiano	7dac027ed7	[IPSCCP] Constant fold struct argument/instructions when all the lattice values are constant. This now should also work with the interprocedural variant of the pass. Slightly easier now that the yak is shaved. Differential Revision: http://reviews.llvm.org/D22329 llvm-svn: 275363	2016-07-14 02:51:41 +00:00
Nico Weber	af7e8465e1	Teach fast isel about thiscall (and callee-pop) calls. http://reviews.llvm.org/D22315 llvm-svn: 275360	2016-07-14 01:52:51 +00:00
Mehdi Amini	8484f92f7f	[Scalarizer] PR28108: Skip over nullptr rather than crashing on it. Summary: In Scalarizer::gather we see if we already have a scattered form of Op, and in that case use the new form. In the particular case of PR28108, the found ValueVector SV has size 2, where the first Value is nullptr, and the second is indeed a proper Value. The nullptr then caused an assert to blow when we tried to do cast<Instruction>(SV[I]). With this patch we check SV[I] before doing the cast, and if it's nullptr we just skip over it. I don't know the Scalarizer well enough to know if this is the best fix or if something should be done elsewhere to prevent the nullptr from being in the ValueVector at all, but at least this avoids the crash and looking at the test case output it looks reasonable. Reviewers: hfinkel, frasercrmck, wala, mehdi_amini Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21518 llvm-svn: 275359	2016-07-14 01:31:25 +00:00
Mehdi Amini	9e332a7719	Add missing test for r275347 "[IPRA] Set callee saved registers to none for local function when IPRA is enabled." llvm-svn: 275358	2016-07-14 01:31:20 +00:00
Adrian Prantl	0418ef2691	Synchronize LLVM and clang's ObjCDeclSpec::ObjCPropertyAttributeKind. This adds Clang-specific DWARF constants for nullability and ObjC class properties that are already generated by clang. This patch adds dwarfdump support and a more comprehensive testcase. <rdar://problem/27335745> llvm-svn: 275354	2016-07-14 00:41:18 +00:00
David Majnemer	7f781aba97	[ConstantFolding] Fold masked loads We can constant fold a masked load if the operands are appropriately constant. Differential Revision: http://reviews.llvm.org/D22324 llvm-svn: 275352	2016-07-14 00:29:50 +00:00
David Majnemer	f89660aba7	[ConstantFolding] Extend FoldReinterpretLoadFromConstPtr to handle negative offsets Treat loads which clip before the start of a global initializer the same way we treat clipping beyond the end of the initializer: use zeros. llvm-svn: 275345	2016-07-13 23:33:07 +00:00
Michael Kuperstein	be837fa40f	[DAG] Correctly chain masked loads If a masked load is not added to the chain, it should not reset the chain's root. This fixes the remaining part of PR28515. llvm-svn: 275340	2016-07-13 23:23:40 +00:00
Quentin Colombet	68a84587c5	[MIR] Fix one GlobalISel test case that I missed in r275314. llvm-svn: 275333	2016-07-13 22:35:33 +00:00
Nico Weber	b888555bcc	Add a triple to fix test on bots after 275320. llvm-svn: 275327	2016-07-13 22:19:40 +00:00
Nico Weber	eb9488b151	Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact. This happens to make X86CallFrameOptimization run in -O0 / FastISel builds as well, but it's not clear if the pass should run in that setup. http://reviews.llvm.org/D22314 llvm-svn: 275320	2016-07-13 21:38:27 +00:00
Alina Sbirlea	640a61cd8b	Extended LoadStoreVectorizer to vectorize subchains. Summary: LSV used to abort vectorizing a chain for interleaved load/store accesses that alias. Allow a valid prefix of the chain to be vectorized, mark just the prefix and retry vectorizing the remaining chain. Reviewers: llvm-commits, jlebar, arsenm Subscribers: mzolotukhin Differential Revision: http://reviews.llvm.org/D22119 llvm-svn: 275317	2016-07-13 21:20:01 +00:00
Quentin Colombet	545e558b82	[MIR] Print on the given output instead of stderr. Currently the MIR framework prints all its outputs (errors and actual representation) on stderr. This patch fixes that by printing the regular output in the output specified with -o. Differential Revision: http://reviews.llvm.org/D22251 llvm-svn: 275314	2016-07-13 20:36:03 +00:00
Matt Arsenault	f071102647	AMDGPU: Remove last AMDIL intrinsics llvm-svn: 275309	2016-07-13 19:42:06 +00:00
Andrew Kaylor	346dd7f1bd	Reverting r275284 due to platform-specific test failures llvm-svn: 275304	2016-07-13 19:09:16 +00:00
Sanjay Patel	eff2aa70fc	add more tests for zexty xor sandwiches ...mmm sandwiches llvm-svn: 275302	2016-07-13 18:58:55 +00:00
Simon Pilgrim	5d664af3c3	[X86][SSE] Regenerate truncated shift test Check SSE2 and AVX2 implementations llvm-svn: 275300	2016-07-13 18:50:10 +00:00
Simon Pilgrim	631643e7d9	Regenerate test llvm-svn: 275299	2016-07-13 18:46:37 +00:00
Sanjay Patel	904a88025a	add test for zexty xor sandwich llvm-svn: 275297	2016-07-13 18:40:38 +00:00
Krzysztof Parzyszek	cb4dd7656b	Move mempcpy_call.ll to X86 subdirectory llvm-svn: 275294	2016-07-13 18:28:45 +00:00
Sanjay Patel	c00e48a3db	[InstCombine] extend vector select matching for non-splat constants In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean way to handle the regular m_Not cases and these non-splat vector patterns, so I've opted for the direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could move that helper function to a higher level. There is an open question as to which of these forms should be considered the canonical IR: %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3> Differential Revision: http://reviews.llvm.org/D22114 llvm-svn: 275289	2016-07-13 18:07:02 +00:00
Andrew Kaylor	12cccdd731	Fix for Bug 26903, adds support to inline __builtin_mempcpy Patch by Sunita Marathe Differential Revision: http://reviews.llvm.org/D21920 llvm-svn: 275284	2016-07-13 17:25:11 +00:00
Matthias Braun	512424f28a	PatchableFunction: Skip pseudos that do not create code This fixes http://llvm.org/PR28524 llvm-svn: 275278	2016-07-13 16:37:29 +00:00
Teresa Johnson	b907d06151	[ThinLTO/gold] Enable symbol resolution in distributed backend case While testing a follow-on change to enable index-based symbol resolution and internalization in the distributed backends, I realized that a test case change I made in r275247 was only required because we were not analyzing symbols in the claimed files in thinlto-index-only mode. In the fixed test case there should be no internalization because we are linking in -shared mode, so f() is in fact exported, which is detected properly when we analyze symbols in thinlto-index-only mode. Note that this is not (yet) a correctness issue (because we are not yet performing the index-based linkage optimizations in the distributed backends - that's coming in a follow-on patch). llvm-svn: 275277	2016-07-13 16:35:56 +00:00
Sanjay Patel	610a2f6525	[x86][SSE/AVX] optimize pcmp results better (PR28484) We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading. One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the test cases is filed as PR28505: https://llvm.org/bugs/show_bug.cgi?id=28505 Differential Revision: http://reviews.llvm.org/D22225 llvm-svn: 275276	2016-07-13 16:04:07 +00:00
Simon Pilgrim	a99368fa35	[X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask comments llvm-svn: 275272	2016-07-13 15:45:36 +00:00
Simon Pilgrim	48d8340760	[X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask Added AVX512F VPERMILPS shuffle decoding support llvm-svn: 275270	2016-07-13 15:10:43 +00:00
Tom Stellard	418beb7671	AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21484 llvm-svn: 275268	2016-07-13 14:23:33 +00:00
Nirav Dave	8ea792db60	[MC] Fix lexing ordering in assembly label parsing to preserve same line comment placement. llvm-svn: 275265	2016-07-13 14:03:12 +00:00
Matt Arsenault	0056868c4a	AMDGPU: Fold out no-op kill intrinsics llvm-svn: 275253	2016-07-13 06:04:22 +00:00
David Majnemer	1b3db33e3d	[ConstantFolding] Don't treat negative GEP offsets as positive GEP offsets are signed, don't treat them as huge positive numbers. llvm-svn: 275251	2016-07-13 05:16:16 +00:00
Adam Nemet	c2f791d8a7	[BFI] Add new LazyBFI analysis pass Summary: This is necessary for D21771. In order to add the hotness attribute to optimization remarks we need BFI to be available in all passes that emit optimization remarks. However we don't want to pay for computing BFI unless the hotness attribute is requested. This is achieved by making BFI lazy at the very high-level through a new analysis pass -- BFI is not calculated unless requested. I am adding a test to check the laziness under D21771 where the first user of the analysis is added. Reviewers: hfinkel, dexonsmith, davidxl Subscribers: davidxl, dexonsmith, llvm-commits Differential Revision: http://reviews.llvm.org/D22141 llvm-svn: 275250	2016-07-13 05:01:48 +00:00
Teresa Johnson	27694571b1	[ThinLTO/gold] ThinLTO internalization fixes Internalization was missing cases where we originally had a local symbol that was promoted eagerly but not actually exported. This is because we were only internalizing the set of global (non-local) symbols that were PREVAILING_DEF_IRONLY. Instead, collect the set of global symbols that are referenced outside of a single IR file, and skip internalization for those. llvm-svn: 275247	2016-07-13 03:42:41 +00:00
David Majnemer	a7b6c973e5	[ConstantFold] Don't incorrectly infer inbounds on array GEP The many levels of nesting inside the responsible code made it easy for bugs to sneak in. Flattening the logic makes it easier to see what's going on. llvm-svn: 275244	2016-07-13 03:24:41 +00:00
Keno Fischer	1efc3b70c5	Fix ScalarEvolutionExpander step scaling bug The expandAddRecExprLiterally function incorrectly transforms `[Start + Step * X]` into `Step * [Start + X]` instead of the correct transform of `[Step * X] + Start`. This caused https://github.com/JuliaLang/julia/issues/14704#issuecomment-174126219 due to what appeared to be sufficiently complicated loop interactions. Patch by Jameson Nash (jameson@juliacomputing.com). Reviewers: sanjoy Differential Revision: http://reviews.llvm.org/D16505 llvm-svn: 275239	2016-07-13 01:28:12 +00:00
Dehao Chen	9cba1f4e7e	New pass manager for LICM. Summary: Port LICM to the new pass manager. Reviewers: davidxl, silvas Subscribers: krasin, vitalybuka, silvas, davide, sanjoy, llvm-commits, mehdi_amini Differential Revision: http://reviews.llvm.org/D21772 llvm-svn: 275222	2016-07-12 22:37:48 +00:00
Tim Northover	72eebfa4b0	GlobalISel: freeze reserved regs after IRTranslator. We can freeze the registers after the MachineFrameInfo has been configured (by telling it about calls, inline asm, ...). This doesn't happen at all yet, but will be part of IR translation. Fixes -verify-machineinstrs assertion. llvm-svn: 275221	2016-07-12 22:23:42 +00:00
Matt Arsenault	786724a22e	AMDGPU: Follow up to r275203 I meant to squash this into it. llvm-svn: 275220	2016-07-12 21:41:32 +00:00
Nemanja Ivanovic	f0407e3902	The test case I added is PowerPC specific but I accidentally had it in the wrong directory. Moved it to CodeGen/PowerPC. Sorry about the noise. llvm-svn: 275218	2016-07-12 21:24:08 +00:00
Michael Kuperstein	a99c46cc73	[LV] Remove wrong assumption about LCSSA The LCSSA pass itself will not generate several redundant PHI nodes in a single exit block. However, such redundant PHI nodes don't violate LCSSA form, and may be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass itself. So, assuming a single PHI node per exit block is not safe. llvm-svn: 275217	2016-07-12 21:24:06 +00:00
Nemanja Ivanovic	b43bb6141e	[Power9] Add codegen for VSX word insert/extract instructions This patch corresponds to review: http://reviews.llvm.org/D20239 It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that are useful in some cases for inserting and extracting vector elements of v4[if]32 vectors. llvm-svn: 275215	2016-07-12 21:00:10 +00:00
Piotr Padlewski	fa0cdb371b	Review fixes to lit documentation Reviewers: mehdi_amini Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22245 llvm-svn: 275214	2016-07-12 20:59:17 +00:00
Simon Pilgrim	6fa71da4a4	[X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128 llvm-svn: 275212	2016-07-12 20:27:32 +00:00
Davide Italiano	0080269342	[SCCP] Constant fold structs if all the lattice values are constant. Differential Revision: http://reviews.llvm.org/D22269 llvm-svn: 275208	2016-07-12 19:54:19 +00:00
Matthias Braun	96ec47db74	X86FixupBWInsts: No need for forward liveness analysis. With r274952 and r275201 in place there are no cases left where a forward liveness analysis yields different results than a backward one. So we can remove the forward stepping logic. Differential Revision: http://reviews.llvm.org/D22083 llvm-svn: 275204	2016-07-12 19:04:30 +00:00
Matt Arsenault	657f871a4e	AMDGPU: Fix verifier error with kill intrinsic Don't create a terminator in the middle of the block. We should probably get rid of this intrinsic. llvm-svn: 275203	2016-07-12 19:01:23 +00:00
Dehao Chen	b9f8e29290	[PM] Port LoopIdiomRecognize Pass to new PM Summary: Port LoopIdiomRecognize Pass to new PM Reviewers: davidxl Subscribers: davide, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D22250 llvm-svn: 275202	2016-07-12 18:45:51 +00:00
Wei Ding	5b2636a152	AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8 Differential Revision: http://reviews.llvm.org/D22239 llvm-svn: 275197	2016-07-12 18:02:14 +00:00
Xinliang David Li	9eb472ba4b	[PGO] Don't include full file path in static function profile counter names Patch by Jake VanAdrighem Differential Revision: http://reviews.llvm.org/D22028 llvm-svn: 275193	2016-07-12 17:14:51 +00:00
Sanjay Patel	4a6a751dce	add tests for missing DeMorgan's Law folds llvm-svn: 275192	2016-07-12 17:05:04 +00:00
Sanjay Patel	3900191ecc	auto-generate checks llvm-svn: 275188	2016-07-12 16:21:55 +00:00
Sanjay Patel	93dffe629a	auto-generate checks llvm-svn: 275187	2016-07-12 16:17:30 +00:00
Sanjay Patel	6d1f227e6b	auto-generate checks llvm-svn: 275186	2016-07-12 16:13:04 +00:00
Haicheng Wu	711ca868fc	[AArch64] Set FMOVS0 and FMOVD0 as isAsCheapAsAMove when needed. If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove. Differential Revision: http://reviews.llvm.org/D22256 llvm-svn: 275178	2016-07-12 15:31:41 +00:00
Nemanja Ivanovic	eebbcb6d57	[PowerPC] Canonicalize applicable vector shift immediates as swaps This patch corresponds to review: http://reviews.llvm.org/D21358 Vector shifts that have the same semantics as a vector swap are canonicalized as such to provide additional opportunities for swap removal optimization to remove unnecessary swaps. llvm-svn: 275168	2016-07-12 12:16:27 +00:00
Amjad Aboud	acee568545	[codeview] Improved array type support. Added support for: 1. Multi dimension array. 2. Array of structure type, which previously was declared incompletely. 3. Dynamic size array. 4. Array where element type is a typedef, volatile or constant (this should resolve PR28311). Differential Revision: http://reviews.llvm.org/D21526 llvm-svn: 275167	2016-07-12 12:06:34 +00:00
Nicolai Haehnle	7968c34586	AMDGPU: Unify MOVRELSOffset and MOVRELDOffset Summary: Previously, constant index insertelements would be turned into SI_INDIRECT_DST, which is bound to prevent some optimization opportunities. Worse, it misled the heuristic that decides whether immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D22217 llvm-svn: 275160	2016-07-12 08:12:16 +00:00
Vitaly Buka	204dc533c5	Revert "New pass manager for LICM." Summary: This reverts commit r275118. Subscribers: sanjoy, mehdi_amini Differential Revision: http://reviews.llvm.org/D22259 llvm-svn: 275156	2016-07-12 06:25:32 +00:00
Craig Topper	a6e6febe2c	[AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR. llvm-svn: 275155	2016-07-12 05:27:53 +00:00
Ivan Krasin	5474645dc8	Print remarks from WholeProgramDevirt pass for each call site. Summary: It's useful to have some visibility about which call sites are devirtualized, especially for debug purposes. Another use case is a regression test on the application side (like, Chromium). Reviewers: pcc Differential Revision: http://reviews.llvm.org/D22252 llvm-svn: 275145	2016-07-12 02:38:37 +00:00
NAKAMURA Takumi	e92e2124f6	llvm/test/CodeGen/AMDGPU/selected-stack-object.ll REQUIRES +Asserts, since it expects assertion failure. llvm-svn: 275144	2016-07-12 02:18:09 +00:00
Haicheng Wu	1e39574e9f	[Kryo] Enable ZCZeroing feature This feature uses immediate #0 to zero a register. Differential Revision: http://reviews.llvm.org/D19985 llvm-svn: 275143	2016-07-12 02:04:01 +00:00
Nico Weber	c7bf646a99	Teach FastISel about thiscall (and, hence, about callee-pop). http://reviews.llvm.org/D22115 llvm-svn: 275135	2016-07-12 01:30:35 +00:00
Matt Arsenault	45f8216cee	AMDGPU: Remove superfluous string attributes from tests Also fix v_mac.ll not testing right thing for fneg llvm-svn: 275129	2016-07-11 23:35:48 +00:00
Mehdi Amini	e75aa6f674	Add a libLTO API to query a memory buffer and check if it contains ObjC categories The linker supports a feature to force load an object from a static archive if it defines an Objective-C category. This API supports this feature by looking at every section in the module to find if a category is defined in the module. llvm-svn: 275125	2016-07-11 23:10:18 +00:00
Dehao Chen	7ef5820fa3	New pass manager for LICM. Summary: Port LICM to the new pass manager. Reviewers: davidxl, silvas Subscribers: silvas, davide, sanjoy, llvm-commits, mehdi_amini Differential Revision: http://reviews.llvm.org/D21772 llvm-svn: 275118	2016-07-11 22:45:24 +00:00
Alina Sbirlea	cbc6ac2afd	Correct ordering of loads/stores. Summary: Aiming to correct the ordering of loads/stores. This patch changes the insert point for loads to the position of the first load. It updates the ordering method for loads to insert before, rather than after. Before this patch the following sequence: "load a[1], store a[1], store a[0], load a[2]" Would incorrectly vectorize to "store a[0,1], load a[1,2]". The correctness check was assuming the insertion point for loads is at the position of the first load, when in practice it was at the last load. An alternative fix would have been to invert the correctness check. The current fix changes insert position but also requires reordering of instructions before the vectorized load. Updated testcases to reflect the changes. Reviewers: tstellarAMD, llvm-commits, jlebar, arsenm Subscribers: mzolotukhin Differential Revision: http://reviews.llvm.org/D22071 llvm-svn: 275117	2016-07-11 22:34:29 +00:00
Tim Northover	3e0361710a	ARM: validate immediate branch targets in AsmParser. Immediate branch targets aren't commonly used, but if they are we should make sure they can actually be encoded. This means they must be divisible by 2 when targeting Thumb mode, and by 4 when targeting ARM mode. Also do a little naming cleanup while I was changing everything around anyway. llvm-svn: 275116	2016-07-11 22:29:37 +00:00
Nicolai Haehnle	c06bfa1daa	AMDGPU: Treat texture gather instructions more like other MIMG instructions Summary: Setting MIMG to 0 has a bunch of unexpected side effects, including that isVMEM returns false which leads to incorrect treatment in the hazard recognizer. The reason I noticed it is that it also leads to incorrect treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug. The only reason why MIMG was set to 0 is to signal the special handling of dmasks, but that can be checked differently. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D22210 llvm-svn: 275113	2016-07-11 21:59:43 +00:00
Zachary Turner	dbeaea7b35	Refactor the PDB writing to use a builder approach llvm-svn: 275110	2016-07-11 21:45:26 +00:00
Zachary Turner	f6b9382467	[pdb] Add a pdb2yaml option to not dump file headers. This will be useful once we start adding the ability to dump type records and symbol records, since it will allow us to generate mergeable information instead of information that specifies an entire file. llvm-svn: 275109	2016-07-11 21:45:09 +00:00
Nicolai Haehnle	f52c3cf272	AMDGPU: fix local stack slot allocation bugs Summary: The main bug fix here is using the 32-bit encoding of V_ADD_I32 in materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary immediates work. The second part is that we may now require the SegmentWaveByteOffset even when there are initially no stack objects and VGPR spilling isn't enabled, for stack slots that are allocated later. This means that some bits become effectively dead and can be cleaned up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602 Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21551 llvm-svn: 275108	2016-07-11 21:44:40 +00:00
Michael Kuperstein	f0c59330e9	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106	2016-07-11 21:39:44 +00:00
Quentin Colombet	fb82c7bc94	[X86] Fix tailcall return address clobber bug. This bug (llvm.org/PR28124) was introduced by r237977, which refactored the tail call sequence to be generated in two passes instead of one. Unfortunately, the stack adjustment produced by the first pass was not recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing code such as the following, which clobbers the return address, to be generated: popl %edi popl %edi pushl %eax jmp tailcallee # TAILCALL To fix the problem, the entire stack adjustment is performed in X86ExpandPseudo::ExpandMI() for tail calls. Patch by Magnus Lång <margnus1@gmail.com> Differential Revision: http://reviews.llvm.org/D21325 llvm-svn: 275103	2016-07-11 21:03:03 +00:00
Alina Sbirlea	327955e057	Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains. Add additional parameters: AddressSpace, Alignment, Fast. Reviewers: llvm-commits, jlebar Subscribers: arsenm, mzolotukhin Differential Revision: http://reviews.llvm.org/D21935 llvm-svn: 275100	2016-07-11 20:46:17 +00:00
Michael Kuperstein	cfbac5f361	[X86] Disable FixupSetCC for CodeGenOpt::None It is an optimization pass, and should not run at -O0. Especially since Fast RA will not do the required register coalescing anyway, so it's a loss even from the optimization standpoint. This also works around (but doesn't quite fix) PR28489. llvm-svn: 275099	2016-07-11 20:40:44 +00:00
Chad Rosier	4f0dad1674	[IPRA] Properly compute register usage at call sites. Differential Revision: http://reviews.llvm.org/D21395 Patch by Vivek Pandya. PR28144 llvm-svn: 275087	2016-07-11 18:45:49 +00:00
Zhan Jun Liau	def708a0f9	[SystemZ] Recognize Load On Condition Immediate (LOCHI/LOCGHI) opportunities Summary: Add support for the z13 instructions LOCHI and LOCGHI which conditionally load immediate values. Add target instruction info hooks so that if conversion will allow predication of LHI/LGHI. Author: RolandF Reviewers: uweigand Subscribers: zhanjunl Committing on behalf of Roland. Differential Revision: http://reviews.llvm.org/D22117 llvm-svn: 275086	2016-07-11 18:45:03 +00:00
Jingyue Wu	641cfee976	[SLSR] Call getPointerSizeInBits with the correct address space. llvm-svn: 275083	2016-07-11 18:13:28 +00:00
Davide Italiano	e8ae0b5eb4	[PM/IPO] Port LowerTypeTests to the new PassManager. There's a little bit of churn in this patch because the initialization mechanism is now shared between the old and the new PM. Other than that, it's just a pretty mechanical translation. llvm-svn: 275082	2016-07-11 18:10:06 +00:00
Jacques Pienaar	c3a162c451	[lanai] Add more tests for assembly of conditional ALU ops llvm-svn: 275081	2016-07-11 17:58:16 +00:00
Dehao Chen	9232f98279	Implement callsite-hotness based inline cost for Sample-based PGO Summary: For sample-based PGO, using BFI to calculate callsite count is sometimes not accurate. This is because with a sampling-based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch. E.g. if (A1 && A2 && A3 && ..... && A10) { for (i=0; i < 100000000; i++) { callsite(); } } Assume that A1 to A10 are all 100% taken, and callsite has 1000 samples and thus is considered hot. Because the loop's trip count is huge, it's normal that all branches outside the loop have no samples at all. As a result, we can only use static branch probability to derive the frequency of the loop header. Assuming that the static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value. In order to get a more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness. Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR. Reviewers: davidxl, eraman, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22118 llvm-svn: 275073	2016-07-11 16:48:54 +00:00
Dehao Chen	29d2641f52	Tune the weight propagation algorithm for sample profile. Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22180 llvm-svn: 275072	2016-07-11 16:40:17 +00:00
Sanjay Patel	8f1d408c74	[x86] make some of the tests 256-bit for testing diversity llvm-svn: 275070	2016-07-11 15:08:37 +00:00
Nirav Dave	8603062ee4	Fix branch relaxation in 16-bit mode. Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation to generate jumps with 16-bit sized immediates in 16-bit mode. This fixes PR22097. Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D20830 llvm-svn: 275068	2016-07-11 14:23:53 +00:00
Sanjay Patel	b428951990	[x86] specify triple to avoid bot failures llvm-svn: 275067	2016-07-11 14:17:54 +00:00
Nicolai Haehnle	889a20cf40	[Sink] Don't move calls to readonly functions across stores Summary: Reviewers: hfinkel, majnemer, tstellarAMD, sunfish Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17279 llvm-svn: 275066	2016-07-11 14:11:51 +00:00
Sanjay Patel	0d38830aca	[x86] update checks llvm-svn: 275064	2016-07-11 14:07:31 +00:00
Nirav Dave	53a72f4d3c	Provide support for preserving assembly comments Preserve assembly comments from input in output assembly and flags to toggle property. This is on by default for inline assembly and off in llvm-mc. Parsed comments are emitted immediately before an EOL which generally places them on the expected line. Reviewers: rtrieu, dwmw2, rnk, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20020 llvm-svn: 275058	2016-07-11 12:42:14 +00:00
Artem Tamazov	53c9de08d2	[AMDGPU][llvm-mc] Quickfix for r272748 to enable labels in branch instructions. Fixes issue mentioned at: https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13. Lit tests added. Differential Revision: http://reviews.llvm.org/D22133 llvm-svn: 275054	2016-07-11 12:07:18 +00:00
Zlatko Buljan	cba9f80ba8	[mips][microMIPS] Implement LDC1, SDC1, LDC2, SDC2, LWC1, SWC1, LWC2 and SWC2 instructions and add CodeGen support Differential Revision: http://reviews.llvm.org/D18824 llvm-svn: 275050	2016-07-11 07:41:56 +00:00
Elena Demikhovsky	d84f337953	AVX-512: DAG lowering for scalar MIN/MAX commutable ops DAG lowering was missing for the scalar FMINC, FMAXC nodes. The nodes are generated only in the "unsafe-fp-math" mode. Added tests. llvm-svn: 275048	2016-07-11 06:08:06 +00:00
Craig Topper	7ee070e7bc	[AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching. llvm-svn: 275046	2016-07-11 05:36:53 +00:00
Craig Topper	516e14cd8e	[AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors. llvm-svn: 275045	2016-07-11 05:36:48 +00:00
Hal Finkel	02012bcfee	Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute Reverting r275027 and r275033. These seem to cause miscompiles on the AArch64 buildbot. llvm-svn: 275042	2016-07-11 04:51:23 +00:00
Hal Finkel	2cac58f604	Pointer-comparison folding should look through returned-argument functions For functions which are known to return a specific argument, pointer-comparison folding can look through the function calls as part of its analysis. Differential Revision: http://reviews.llvm.org/D9387 llvm-svn: 275039	2016-07-11 03:37:59 +00:00
Hal Finkel	bf3957a553	Teach isDereferenceablePointer to look through returned-argument functions For functions which are known to return their argument, isDereferenceableAndAlignedPointer can examine the argument value. Differential Revision: http://reviews.llvm.org/D9384 llvm-svn: 275038	2016-07-11 03:08:49 +00:00
Hal Finkel	e186debb8b	Teach SCEV to look through returned-argument functions When building SCEVs, if a function is known to return its argument, then we can build the SCEV using the corresponding argument value. Differential Revision: http://reviews.llvm.org/D9381 llvm-svn: 275037	2016-07-11 02:48:23 +00:00
Hal Finkel	6fd5e1f02b	Teach computeKnownBits to look through returned-argument functions If a function is known to return one of its arguments, we can use that in order to compute known bits of the return value. Differential Revision: http://reviews.llvm.org/D9397 llvm-svn: 275036	2016-07-11 02:25:14 +00:00
Hal Finkel	5c12d8fe8f	BasicAA should look through functions with returned arguments Motivated by the work on the llvm.noalias intrinsic, teach BasicAA to look through returned-argument functions when answering queries. This is essential so that we don't lose all other AA information when supplementing with llvm.noalias. Differential Revision: http://reviews.llvm.org/D9383 llvm-svn: 275035	2016-07-11 01:32:20 +00:00
Hal Finkel	d66a7b05db	Let FuncAttrs infer the 'returned' argument attribute A function can have one argument with the 'returned' attribute, indicating that the associated argument is always the return value of the function. Add FuncAttrs inference logic. Differential Revision: http://reviews.llvm.org/D22202 llvm-svn: 275027	2016-07-10 22:02:55 +00:00
Jan Vesely	2fa28c330c	AMDGPU/R600: Add implicitarg.ptr intrinsic Differential Revision: http://reviews.llvm.org/D21622 llvm-svn: 275024	2016-07-10 21:20:29 +00:00
Simon Pilgrim	2191faa433	[X86][SSE] Add support for target shuffle combining to PSHUFLW/PSHUFHW llvm-svn: 275022	2016-07-10 21:02:47 +00:00
Sanjay Patel	ccd08fc8c4	[x86, SSE, AVX] add tests for icmp+zext (PR28484) Note the inconsistent vpbroadcast generation for AVX2; another bug. llvm-svn: 275020	2016-07-10 20:45:14 +00:00
Simon Pilgrim	51c786bd91	[X86][SSE] Added tests for combining shuffles to PSHUFLW/PSHUFHW llvm-svn: 275019	2016-07-10 20:19:56 +00:00
Marcin Koscielnicki	cf7cc724a7	[SystemZ] Utilize Test Data Class instructions. This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32\|64\|128), which maps straight to the test data class instructions. A new IR pass is added to recognize instructions that can be converted to TDC and perform the necessary replacements. Differential Revision: http://reviews.llvm.org/D21949 llvm-svn: 275016	2016-07-10 14:41:22 +00:00
Craig Topper	0b0954570a	[AVX512] Add support for lowering to 512-bit SHUFPS. llvm-svn: 275011	2016-07-10 05:55:53 +00:00
Sean Silva	db90d4d9c1	[PM] Port LoopVectorize to the new PM. llvm-svn: 275000	2016-07-09 22:56:50 +00:00
Simon Pilgrim	606126e848	[X86][SSE] Add support for target shuffle combining to INSERTPS llvm-svn: 274990	2016-07-09 21:47:55 +00:00
Simon Pilgrim	890b415902	[X86][SSE] Regenerate vector shift tests llvm-svn: 274987	2016-07-09 20:55:20 +00:00
David Majnemer	28c3646f82	[COFF, Dwarf] Don't emit DW_AT_location for dllimported entities There exists no relocation which can describe the address of a dllimported variable: do not try to describe their location. llvm-svn: 274986	2016-07-09 20:47:48 +00:00
Jingyue Wu	debce55ac3	[SLSR] Fix crash on handling 128-bit integers. ConstantInt::getSExtValue may fail on >64-bit integers. Add checks to call getSExtValue only on narrow integers. As a minor aside, simplify slsr-gep.ll to remove unnecessary load instructions. llvm-svn: 274982	2016-07-09 19:13:18 +00:00
Jacques Pienaar	b32a912f72	[lanai] Treat .t as optional in assembly parser for RR operands and add predicate operand to ShiftRR llvm-svn: 274980	2016-07-09 18:26:04 +00:00
Matt Arsenault	c1e6a45f2e	AMDGPU: Merge / reorganize tests llvm-svn: 274972	2016-07-09 08:02:28 +00:00
Matt Arsenault	b2cb5f8105	AMDGPU: Simplify tests with per function subtargets llvm-svn: 274971	2016-07-09 07:55:03 +00:00
Matt Arsenault	dfec5ce032	AMDGPU: Fix fdiv lowering when f32 denormals supported Also fix test not actually using function labels. llvm-svn: 274969	2016-07-09 07:48:11 +00:00
Craig Topper	70610cf7b6	[X86] Remove and autoupgrade 512-bit non-temporal store intrinsics. llvm-svn: 274966	2016-07-09 04:38:27 +00:00
Davide Italiano	92b933a55c	[PM] Port CrossDSOCFI to the new pass manager. llvm-svn: 274962	2016-07-09 03:25:35 +00:00
Davide Italiano	cd96cfd8df	[PM] Port LoopSimplify to the new pass manager. While here move simplifyLoop() function to the new header, as suggested by Chandler in the review. Differential Revision: http://reviews.llvm.org/D21404 llvm-svn: 274959	2016-07-09 03:03:01 +00:00
Matt Arsenault	1322b6f8bb	AMDGPU: Improve offset folding for register indexing llvm-svn: 274954	2016-07-09 01:13:56 +00:00
Matthias Braun	152e7c8b12	VirtRegMap: Replace some identity copies with KILL instructions. An identity COPY like this: %AL = COPY %AL, %EAX<imp-def> has no semantic effect, but encodes liveness information: Further users of %EAX only depend on this instruction even though it does not define the full register. Replace the COPY with a KILL instruction in those cases to maintain this liveness information. (This reverts a small part of r238588 but this time adds a comment explaining why a KILL instruction is useful). llvm-svn: 274952	2016-07-09 00:19:07 +00:00
Piotr Padlewski	7a298c1df0	Added REQUIRES to TestingGuide documentation Reviewers: alexfh, wolfgangp, rengolin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22172 llvm-svn: 274949	2016-07-08 23:47:29 +00:00
Piotr Padlewski	3b77612839	Add 'thinlto_src_module' md with asserts or -enable-import-metadata Summary: This way the metadata will only be generated when asserts are enabled, or when -enable-import-metadata is specified. FIXED missing colon on requires. Reviewers: tejohnson, eraman, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D22167 llvm-svn: 274947	2016-07-08 23:01:49 +00:00
Piotr Padlewski	d4b792346c	Revert "Add 'thinlto_src_module' md with asserts or -enable-import-metadata" Reverting because of 17463 http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17463 This reverts commit d20cb431bba2ba43b4c65a8556cff445bfefbb7c. llvm-svn: 274946	2016-07-08 22:55:48 +00:00
Jacques Pienaar	9e70127b0a	[lanai] Update test to use peephole-opt and not peephole-opts llvm-svn: 274945	2016-07-08 22:28:29 +00:00
Anna Thomas	9ad45adfd7	Revert "InstCombine rule to fold truncs whose value is available" This reverts commit r274853. Caused failure in ppcBE build llvm-svn: 274943	2016-07-08 22:15:08 +00:00
David Majnemer	230bbfbeec	[MC, COFF] Permit a variable to be redefined Our assertions in WinCOFFStreamer had unexpected side effects resulting in symbols getting unexpectedly marked as used. This fixes PR28462. llvm-svn: 274941	2016-07-08 21:54:16 +00:00
Piotr Padlewski	d6efefa2b8	Add 'thinlto_src_module' md with asserts or -enable-import-metadata Summary: This way the metadata will only be generated when asserts are enabled, or when -enable-import-metadata is specified. Reviewers: tejohnson, eraman, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D22167 llvm-svn: 274938	2016-07-08 21:25:39 +00:00
Matt Arsenault	3fb8f9eabf	Reapply r274829 with fix for FP vectors llvm-svn: 274937	2016-07-08 21:25:33 +00:00
Adam Nemet	f836067cc0	[LAA] Port test to the new PM This is a follow-on to r274452. The LAA with the new PM is a loop pass so we go from inner to outer loops. Also using a CHECK-NOT didn't make much sense because we print something in either case; whether an invariant is 'found' or 'not found'. llvm-svn: 274935	2016-07-08 21:24:06 +00:00
Sanjay Patel	664514f7fe	[InstCombine] don't form select from bitcasted logic ops if bitcasts have >1 use This isn't a sure thing (are 2 extra bitcasts less expensive than a logic op?), but we'll try to err on the conservative side by going with the case that has fewer IR instructions. Note: This question came up in http://reviews.llvm.org/D22114 , but this part is independent of that patch proposal, so I'm making this small change ahead of that one. See also: http://reviews.llvm.org/rL274926 llvm-svn: 274932	2016-07-08 21:17:51 +00:00
Sanjay Patel	5246482c7a	add another multi-use test for logic->select transform llvm-svn: 274929	2016-07-08 21:08:16 +00:00
Sanjay Patel	f4a08ede03	[InstCombine] don't form select from logic ops if it's unlikely that we'll eliminate any ops llvm-svn: 274926	2016-07-08 20:53:29 +00:00
Sanjay Patel	297a0e67b6	adjust test so it won't completely optimize away llvm-svn: 274925	2016-07-08 20:35:53 +00:00
Sanjay Patel	0733e6b61c	add tests for multi-use folding to select llvm-svn: 274922	2016-07-08 20:22:27 +00:00
Dehao Chen	429f5c735f	Remove inline hints computation from SampleProfile.cpp Summary: As we will move to a uniform hotness check in the inliner, we no longer need inline hints in the SampleProfile pass. Reviewers: dnovillo, davidxl Subscribers: eraman, llvm-commits Differential Revision: http://reviews.llvm.org/D19287 llvm-svn: 274918	2016-07-08 20:12:44 +00:00
Nico Weber	28410c6846	Revert r274829, it caused PR28472. llvm-svn: 274916	2016-07-08 19:52:19 +00:00
Simon Pilgrim	0a0e0d4e8e	[X86] Regenerated bitreverse tests to demonstrate what is going on. llvm-svn: 274915	2016-07-08 19:51:08 +00:00
Simon Pilgrim	aaaeedb8cb	[X86] Added bitreverse tests for non-legal types Requested on D21578 llvm-svn: 274914	2016-07-08 19:48:33 +00:00
Simon Pilgrim	950419f948	[X86][AVX2] Add support for target shuffle combining to VPERMPD/VPERMQ llvm-svn: 274908	2016-07-08 19:23:29 +00:00
Davide Italiano	d555bde59f	[SCCP] Fold constants as we build them when visiting cast instructions. This should be slightly more efficient and could avoid spurious overdefined markings, as Eli pointed out. Differential Revision: http://reviews.llvm.org/D22122 llvm-svn: 274905	2016-07-08 19:13:40 +00:00
Sanjay Patel	1b6b824548	[InstCombine] check for one-use before turning simple logic op into a select llvm-svn: 274891	2016-07-08 17:26:47 +00:00
Simon Pilgrim	4ca42e232d	[SLPVectorizer][X86] Added fma vectorization tests llvm-svn: 274889	2016-07-08 17:19:13 +00:00
Sanjay Patel	910ce0d511	add test to show multi-use output llvm-svn: 274887	2016-07-08 17:12:27 +00:00
Simon Pilgrim	b600ba3b79	[X86][AVX] Added combine test that should simplify to insertps llvm-svn: 274884	2016-07-08 17:01:42 +00:00
Sanjay Patel	cbfca9e8ef	[InstCombine] allow or(sext(A), B) --> A ? -1 : B transform for vectors llvm-svn: 274883	2016-07-08 17:01:15 +00:00
Zhan Jun Liau	7d4d436c74	[SystemZ] Add support for the .word directive. Summary: Branch off the work to add support for the .word directive, using addAliasForDirective. Reviewers: koriakin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22142 llvm-svn: 274878	2016-07-08 16:50:02 +00:00

39519 Commits