llvm-project

Commit Graph

Author	SHA1	Message	Date
Tom Stellard	bb763e6b47	R600/SI: Refactor SIFoldOperands to simplify immediate folding This will make a future patch much less intrusive. llvm-svn: 225358	2015-01-07 17:42:16 +00:00
Ahmed Bougacha	aa2d290997	[X86] Teach FCOPYSIGN lowering to recognize constant magnitudes. For code like: float foo(float x) { return copysign(1.0, x); } We used to generate: andps <-0.000000e+00,0,0,0>, %xmm0 movss <1.000000e+00>, %xmm1 andps <nan>, %xmm1 orps %xmm0, %xmm1 Basically doing an abs(1.0f) in the two middle instructions. We now generate: andps <-0.000000e+00,0,0,0>, %xmm0 orps <1.000000e+00,0,0,0>, %xmm0 Builds on cleanups r223415, r223542. rdar://19049548 Differential Revision: http://reviews.llvm.org/D6555 llvm-svn: 225357	2015-01-07 17:33:03 +00:00
Jonas Paulsson	fcf0cba88c	New method SDep::isNormalMemoryOrBarrier() in ScheduleDAGInstrs.cpp. Used to iterate over previously added memory dependencies in adjustChainDeps() and iterateChainSucc(). SDep::isCtrl() was previously used in these places, that also gave anti and output edges. The code may be worse if these are followed, because MisNeedChainEdge() will conservatively return true since a non-memory instruction has no memory operands, and a false chain dep will be added. It is also unnecessary since all memory accesses of interest will be reached by memory dependencies, and there is a budget limit for the number of edges traversed. This problem was found on an out-of-tree target with enabled alias analysis. No test case for an in-tree target has been found. Reviewed by Hal Finkel. llvm-svn: 225351	2015-01-07 13:38:29 +00:00
Jonas Paulsson	bf408bbe38	Fix typos in comment and option help texts. For -enable-aa-sched-mi and -use-tbaa-in-sched-mi. llvm-svn: 225350	2015-01-07 13:20:57 +00:00
Asiri Rathnayake	77436f848f	Fix regression in r225266. The change in r225266 was reviewed under D6722. But the commit r225266 has a typo, causing some MCHammer failures. This patch fixes it. Change-Id: I573efcff25003af7478ac02548ebbe929fc7f5fd llvm-svn: 225347	2015-01-07 11:22:58 +00:00
Craig Topper	39354e1b1a	[X86] Merge a switch statement inside a default case of another switch statement on the same variable. There was no additional code in the default so this should be no functional change. llvm-svn: 225345	2015-01-07 08:10:38 +00:00
Craig Topper	8b3c47ca57	[X86] Don't mark the shift by 1 instructions as isConvertibleToThreeAddress. There is no handling for them. llvm-svn: 225344	2015-01-07 08:10:36 +00:00
Craig Topper	23fa478709	[X86] Remove some unused TYPE enums from the disassembler. llvm-svn: 225343	2015-01-07 07:47:52 +00:00
Karthik Bhat	9ba55334dc	Revert r225165 and r225169 Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case where the original subtraction could have caused an overflow. Reverting the same. llvm-svn: 225341	2015-01-07 06:34:34 +00:00
Chandler Carruth	fdb4180514	[PM] Fix a pretty nasty bug where the new pass manager would invalidate passes too many time. I think this is actually the issue that someone raised with me at the developer's meeting and in an email, but that we never really got to the bottom of. Having all the testing utilities made it much easier to dig down and uncover the core issue. When a pass manager is running many passes over a single function, we need it to invalidate the analyses between each run so that they can be re-computed as needed. We also need to track the intersection of preserved higher-level analyses across all the passes that we run (for example, if there is one module analysis which all the function analyses preserve, we want to track that and propagate it). Unfortunately, this interacted poorly with any enclosing pass adaptor between two IR units. It would see the intersection of preserved analyses, and need to invalidate any other analyses, but some of the un-preserved analyses might have already been invalidated and recomputed! We would fail to propagate the fact that the analysis had already been invalidated. The solution to this struck me as really strange at first, but the more I thought about it, the more natural it seemed. After a nice discussion with Duncan about it on IRC, it seemed even nicer. The idea is that invalidating an analysis causes it to be preserved! Preserving the lack of result is trivial. If it is recomputed, great. Until something else invalidates it again, we're good. The consequence of this is that the invalidate methods on the analysis manager which operate over many passes now consume their PreservedAnalyses object, update it to "preserve" every analysis pass to which it delivers an invalidation (regardless of whether the pass chooses to be removed, or handles the invalidation itself by updating itself). Then we return this augmented set from the invalidate routine, letting the pass manager take the result and use the intersection of that across each pass run to compute the final preserved set. This accounts for all the places where the early invalidation of an analysis has already "preserved" it for a future run. I've beefed up the testing and adjusted the assertions to show that we no longer repeatedly invalidate or compute the analyses across nested pass managers. llvm-svn: 225333	2015-01-07 01:58:35 +00:00
Tom Stellard	d00a923e5d	R600/SI: Add check for amdgcn triple forgotten in r225276. llvm-svn: 225331	2015-01-07 01:17:37 +00:00
David Majnemer	5310c1e954	Analysis: Reformulate WillNotOverflowUnsignedAdd for reusability WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as computeOverflowForUnsignedAdd. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225329	2015-01-07 00:39:50 +00:00
David Majnemer	3b83b3fa0b	InstCombine: Just a small tidy-up llvm-svn: 225328	2015-01-07 00:39:42 +00:00
Hal Finkel	b0e9b35bc3	[PowerPC] Transform a README.txt entry into a FIXME Remove the README.txt entry regarding register allocation of CR logical ops, and replace it with a FIXME in PPCInstrInfo.td. The text in the README.txt was not really accurate, and thanks goes to Pat Haugen (and Bill Schmidt) from IBM for clarifying what was intended and highlighting the relevant text in the ISA specification. llvm-svn: 225325	2015-01-07 00:15:29 +00:00
Lang Hames	66f755f84f	Revert r224935 "Refactor duplicated code. No intended functionality change." This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin. Reverting to get the bots green while I track down the source of the changed behavior. llvm-svn: 225311	2015-01-06 23:04:36 +00:00
Matt Arsenault	d0101a2dfd	R600/SI: Add combine for isinfinite pattern llvm-svn: 225310	2015-01-06 23:00:46 +00:00
Matt Arsenault	6f6233dc58	R600/SI: Pattern match isinf to v_cmp_class instructions llvm-svn: 225307	2015-01-06 23:00:41 +00:00
Matt Arsenault	f2290336b7	R600/SI: Add basic DAG combines for fp_class llvm-svn: 225306	2015-01-06 23:00:39 +00:00
Matt Arsenault	4831ce5491	R600/SI: Add class intrinsic llvm-svn: 225305	2015-01-06 23:00:37 +00:00
Rafael Espindola	83a362cde8	Change the .ll syntax for comdats and add a syntactic sugar. In order to make comdats always explicit in the IR, we decided to make the syntax a bit more compact for the case of a GlobalObject in a comdat with the same name. Just dropping the $name causes problems for @foo = globabl i32 0, comdat $bar = comdat ... and declare void @foo() comdat $bar = comdat ... So the syntax is changed to @g1 = globabl i32 0, comdat($c1) @g2 = globabl i32 0, comdat and declare void @foo() comdat($c1) declare void @foo() comdat llvm-svn: 225302	2015-01-06 22:55:16 +00:00
Hal Finkel	ed844c4ad1	[PowerPC] Reuse a load operand in int->fp conversions int->fp conversions on PPC must be done through memory loads and stores. On a modern core, this process begins by storing the int value to memory, then loading it using a (sometimes special) FP load instruction. Unfortunately, we would do this even when the value to be converted was itself a load, and we can just use that same memory location instead of copying it to another first. There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs, because the fp_to_int operand has not been lowered when the int_to_fp is being lowered. We handle this specially by invoking fp_to_int's lowering logic (partially) and getting the necessary memory location (some trivial refactoring was done to make this possible). This is all somewhat ugly, and it would be nice if some later CodeGen stage could just clean this stuff up, but because doing so would involve modifying target-specific nodes (or instructions), it is not immediately clear how that would work. Also, remove a related entry from the README.txt for which we now generate reasonable code. llvm-svn: 225301	2015-01-06 22:31:02 +00:00
Colin LeMahieu	507dd32703	[Hexagon] Adding compound jump encodings. llvm-svn: 225291	2015-01-06 20:03:31 +00:00
Tom Stellard	9d6797ae58	R600/SI: Insert s_waitcnt before s_barrier instructions. This ensures that all memory operations are complete when all threads reach the barrier. llvm-svn: 225290	2015-01-06 19:52:07 +00:00
Tom Stellard	b3931b814a	R600/SI: Fix dependency calculation for DS writes instructions in SIInsertWaits In DS write instructions, the address operand comes before the value operand(s) which is reversed from every other instruction type. The SIInsertWait assumed that the first use for each instruction was the value, so for DS write it was protecting the address operand with s_waitcnt instructions when it should have been protecting the value operand. llvm-svn: 225289	2015-01-06 19:52:04 +00:00
Adrian Prantl	52f943b536	Revert "Reapply: Teach SROA how to update debug info for fragmented variables." because of a tsan buildbot failure. This reverts commit 225272. Fix should be coming soon. llvm-svn: 225288	2015-01-06 19:47:27 +00:00
Colin LeMahieu	68b2e050f0	[Hexagon] Adding encoding for misc v4 instructions: boundscheck, tlbmatch, dcfetch. llvm-svn: 225283	2015-01-06 19:03:20 +00:00
Sanjoy Das	7c0ce26614	This patch teaches IndVarSimplify to add nuw and nsw to certain kinds of operations that provably don't overflow. For example, we can prove %civ.inc below does not sign-overflow. With this change, IndVarSimplify changes %civ.inc to an add nsw. define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) { entry: %length = load i32* %length_ptr, !range !0 %len.sub.1 = sub i32 %length, 1 %upper = icmp slt i32 %init, %len.sub.1 br i1 %upper, label %loop, label %exit loop: %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ] %civ.inc = add i32 %civ, 1 %cmp = icmp slt i32 %civ.inc, %length br i1 %cmp, label %latch, label %break latch: store i32 0, i32* %array %check = icmp slt i32 %civ.inc, %len.sub.1 br i1 %check, label %loop, label %break break: ret i32 %civ.inc exit: ret i32 42 } Differential Revision: http://reviews.llvm.org/D6748 llvm-svn: 225282	2015-01-06 19:02:56 +00:00
Colin LeMahieu	d9c605ddae	[Hexagon] Adding encoding information for absolute address loads. llvm-svn: 225279	2015-01-06 18:38:26 +00:00
Mehdi Amini	f3721bf619	SelectionDAGBuilder: move constant initialization out of loop No semantic change intended. Reviewers: resistor Differential Revision: http://reviews.llvm.org/D6834 llvm-svn: 225278	2015-01-06 18:20:04 +00:00
Tom Stellard	49f8bfdcb7	R600/SI: Add a stub GCNTargetMachine This is equivalent to the AMDGPUTargetMachine now, but it is the starting point for separating R600 and GCN functionality into separate targets. It is recommened that users start using the gcn triple for GCN-based GPUs, because using the r600 triple for these GPUs will be deprecated in the future. llvm-svn: 225277	2015-01-06 18:00:21 +00:00
Tom Stellard	07f1160a0c	Triple: Add amdgcn triple This will be used for AMD GPUs with the Graphics Core Next architecture, which are currently using by the r600 triple. llvm-svn: 225276	2015-01-06 18:00:00 +00:00
Tom Stellard	4bc014f0a7	R600/SI: Remove MachineFunction dump from AsmPrinter The dump was dependent on a feature string, which meant that it couldn't be disabled or enable on a per compile basis. llvm-svn: 225275	2015-01-06 17:59:56 +00:00
Andrea Di Biagio	f807a6f297	[CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz. This patch improves the logic added at revision 224899 (see review D6728) that teaches the backend when it is profitable to speculate calls to cttz/ctlz. The original algorithm conservatively avoided speculating more than one instruction from a basic block in a control flow grap modelling an if-statement. In particular, the only allowed instruction (excluding the terminator) was a call to cttz/ctlz. However, there are cases where we could be less conservative and still be able to speculate a call to cttz/ctlz. With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the result is zero extended/truncated in the same basic block, and the zext/trunc instruction is "free" for the target. Added new test cases to CodeGen/X86/cttz-ctlz.ll Differential Revision: http://reviews.llvm.org/D6853 llvm-svn: 225274	2015-01-06 17:41:18 +00:00
Adrian Prantl	8335a5724a	Reapply: Teach SROA how to update debug info for fragmented variables. This also rolls in the changes discussed in http://reviews.llvm.org/D6766. Defers migrating the debug info for new allocas until after all partitions are created. Thanks to Chandler for reviewing! llvm-svn: 225272	2015-01-06 17:14:10 +00:00
Filipe Cabecinhas	e71bd0c89b	Don't loop endlessly for MachO files with 0 ncmds llvm-svn: 225271	2015-01-06 17:08:26 +00:00
Colin LeMahieu	243a5481d9	[Hexagon] Fix 225267. GP register is not yet fully implemented. Removing Uses [GP] maintains existing behavior. llvm-svn: 225270	2015-01-06 16:52:38 +00:00
Adrian Prantl	0c36a75499	Implement a very basic colored syntax highlighting for llvm-dwarfdump. The color scheme is the same as the one used by the colorize dwarfdump script on Darwin. A new --color option can be used to forcibly turn color on or off. http://reviews.llvm.org/D6852 llvm-svn: 225269	2015-01-06 16:50:25 +00:00
Colin LeMahieu	1445553474	[Hexagon] Adding dealloc_return encoding and absolute address stores. llvm-svn: 225267	2015-01-06 16:15:15 +00:00
Asiri Rathnayake	52376acb69	[ARM] Cleanup so_imm* tblgen defintions No functional changes. Support for ARM's modified immediate syntax was added in r223113 and r223115 (review: D6408). That patch introduced the mod_imm* tblegen definitions which renders the existing so_imm* definitions redundant. This patch gets rid of them completely. Reviewed as: D6722 llvm-svn: 225266	2015-01-06 15:55:09 +00:00
Matt Arsenault	55e7312cd8	Convert fcmp with 0.0 from casted integers to icmp This is already handled in general when it is known the conversion can't lose bits with smaller integer types casted into wider floating point types. This pattern happens somewhat often in GPU programs that cast workitem intrinsics to float, which are often compared with 0. Specifically handle the special case of compares with zero which should also be known to not lose information. I had a more general version of this which allows equality compares if the casted float is exactly representable in the integer, but I'm not 100% confident that is always correct. Also fold cases that aren't integers to true / false. llvm-svn: 225265	2015-01-06 15:50:59 +00:00
Craig Topper	639445494f	[X86] Add OpSize32 to XBEGIN_4. Add XBEGIN_2 with OpSize16. Requires new AsmParserOperand types that detect 16-bit and 32/64-bit mode so that we choose the right instruction based on default sizing without predicates. This is necessary since predicates mess up the disassembler table building. llvm-svn: 225256	2015-01-06 08:59:30 +00:00
David Majnemer	9b6b822814	InstCombine: Bitcast call arguments from/to pointer/integer type Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing arguments and return values when DataLayout says it is safe to do so. llvm-svn: 225254	2015-01-06 08:41:31 +00:00
Craig Topper	ddbf51f904	[X86] Make isel select the 2-byte register form of INC/DEC even in non-64-bit mode. Convert to the 1-byte form in non-64-bit mode as part of MCInst lowering. Overall this seems simpler. It reduces duplication of patterns between both modes and it simplifies the memory folding/unfolding tables as they don't need to create fake instructions just to keep track of 64-bitness. llvm-svn: 225252	2015-01-06 07:35:50 +00:00
Hal Finkel	bde27836ce	[PowerPC] Remove old README.txt entry regarding struct passing Because of how Clang represents structs as arrays (at least on non-Darwin platforms), and what SROA does, etc. this is no longer a problem. llvm-svn: 225251	2015-01-06 07:23:13 +00:00
David Majnemer	29c52f7449	X86: Don't make illegal GOTTPOFF relocations "ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF relocation target a movq or addq instruction. Prohibit the truncation of such loads to movl or addl. This fixes PR22083. Differential Revision: http://reviews.llvm.org/D6839 llvm-svn: 225250	2015-01-06 07:12:52 +00:00
Hal Finkel	3fe09ea4d9	[PowerPC] Add some missing names in getTargetNodeName These are used for debugging output; NFC. llvm-svn: 225249	2015-01-06 07:02:15 +00:00
Hal Finkel	5efb918844	[PowerPC] Improve int_to_fp(fp_to_int(x)) combining The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x)) without a load/store pair is updated here with support for unsigned integers, and to support single-precision values without a third rounding step, on newer cores with the appropriate instructions. llvm-svn: 225248	2015-01-06 06:01:57 +00:00
Chandler Carruth	3472ffb37e	[PM] Add a utility pass template that synthesizes the invalidation of a specific analysis result. This is quite handy to test things, and will also likely be very useful for debugging issues. You could narrow down pass validation failures by walking these invalidate pass runs up and down the pass pipeline, etc. I've added support to the pass pipeline parsing to be able to create one of these for any analysis pass desired. Just adding this class uncovered one latent bug where the AnalysisManager CRTP base class had a hard-coded Module type rather than using IRUnitT. I've also added tests for invalidation and caching of analyses in a basic way across all the pass managers. These in turn uncovered two more bugs where we failed to correctly invalidate an analysis -- its results were invalidated but the key for re-running the pass was never cleared and so it was never re-run. Quite nasty. I'm very glad to debug this here rather than with a full system. Also, yes, the naming here is horrid. I'm going to update some of the names to be slightly less awful shortly. But really, I've no "good" ideas for naming. I'll be satisfied if I can get it to "not bad". llvm-svn: 225246	2015-01-06 04:49:44 +00:00
Craig Topper	0f2c4ac649	[X86] Remove 16-bit and 32-bit offset jump instructions from the AsmParser. We always select the 8-bit size and let the assembler backend relax to the larger size. llvm-svn: 225243	2015-01-06 04:23:57 +00:00
Craig Topper	49758aab94	[X86] Make isel select the shorter form of jump instructions instead of the long form. The assembler backend will relax to the long form if necessary. This removes a swap from long form to short form in the MCInstLowering code. Selecting the long form used to be required by the old JIT. llvm-svn: 225242	2015-01-06 04:23:53 +00:00
Chandler Carruth	0b576b377f	[PM] Add a collection of no-op analysis passes and switch the new pass manager tests to use them and be significantly more comprehensive. This, naturally, uncovered a bug where the CGSCC pass manager wasn't printing analyses when they were run. The only remaining core manipulator is I think an invalidate pass similar to the require pass. That'll be next. =] llvm-svn: 225240	2015-01-06 02:50:06 +00:00
Eric Christopher	cb0799c16d	Remove dead variable. llvm-svn: 225233	2015-01-06 01:12:42 +00:00
Eric Christopher	822f1e4dc4	Use the same call off of the TargetMachine rather than the subtarget. llvm-svn: 225232	2015-01-06 01:12:40 +00:00
Eric Christopher	d20ee0a245	Rewrite the Mips16HardFloat pass to avoid using the Subtarget. llvm-svn: 225231	2015-01-06 01:12:30 +00:00
Lang Hames	04b37c4043	Revert r225048: It broke ObjC on AArch64. I've filed http://llvm.org/PR22100 to track this issue. llvm-svn: 225228	2015-01-06 00:54:32 +00:00
Brad Smith	e78889c669	Remove X86 .quad workaround for buggy GNU assembler on OpenBSD / Bitrig. llvm-svn: 225227	2015-01-06 00:53:52 +00:00
Duncan P. N. Exon Smith	bcd960a1bc	IR: Don't drop MDNode uniquing on null operands Now that `LLVMContextImpl` can call `MDNode::dropAllReferences()` to prevent teardown madness, stop dropping uniquing just because an operand drops to null. Part of PR21532. llvm-svn: 225223	2015-01-05 23:31:54 +00:00
Duncan P. N. Exon Smith	6e6428d981	Revert "Use the integrated assembler by default on 32-bit PowerPC and SPARC" This reverts commit r225213. It's failing on multiple buildbots [1][2]. [1]: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22032 [2]: http://lab.llvm.org:8080/green/view/Clang/job/clang-stage1-cmake-RA-incremental_check/2357/ llvm-svn: 225222	2015-01-05 23:31:51 +00:00
Hal Finkel	6837077fcf	[PowerPC] Remove old README.txt entry We no longer generate horrible code for the stated function: void f(signed char a, _Bool b, _Bool c) { signed char t = 0; if (b) t = a; if (c) *a = t; } for which we now generate: .L.f: andi. 5, 5, 1 cmpldi 1, 4, 0 li 5, 0 beq 1, .LBB0_2 lbz 5, 0(3) .LBB0_2: # %if.end bclr 4, 1, 0 stb 5, 0(3) blr so we don't need the README.txt entry. llvm-svn: 225217	2015-01-05 22:20:22 +00:00
Simon Pilgrim	4c55af6850	[X86][SSE] lowerVectorShuffleAsByteShift tidyup Removed local isSequential predicate and use standard helper isSequentialOrUndefInRange instead. llvm-svn: 225216	2015-01-05 22:08:48 +00:00
Hal Finkel	9187711f08	[PowerPC] Convert a README.txt entry into a better test We now produce the desired code as noted in the README.txt file (no spurious or). Remove the README entry and improve the regression test. llvm-svn: 225214	2015-01-05 21:53:52 +00:00
Brad Smith	9cf1e26201	Use the integrated assembler by default on 32-bit PowerPC and SPARC llvm-svn: 225213	2015-01-05 21:48:16 +00:00
Hal Finkel	f4044b02a5	[PowerPC] Remove README.txt entry This entry has been rendered irrelevant now that we have proper CR bit tracking. llvm-svn: 225211	2015-01-05 21:41:26 +00:00
Colin LeMahieu	dacf057bdc	[Hexagon] Adding add/sub with carry, logical shift left by immediate and memop instructions. Removing old defs without bits and updating references. llvm-svn: 225210	2015-01-05 21:36:38 +00:00
Hal Finkel	c7d35bb5b1	[PowerPC] Add a test for truncating a shifted load We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. llvm-svn: 225209	2015-01-05 21:33:14 +00:00
Frederic Riss	e541e0b327	Make DIE.h a public CodeGen header. dsymutil would like to use all the AsmPrinter/MCStreamer infrastructure to stream out the DWARF. In order to do so, it will reuse the DIE object and so this header needs to be public. The interface exposed here has some corners that cannot be used without a DwarfDebug object, but clients that want to stream Dwarf can just avoid these. Differential Revision: http://reviews.llvm.org/D6695 llvm-svn: 225208	2015-01-05 21:29:41 +00:00
Hal Finkel	a4750dec99	[PowerPC] Add another test for load/store with update We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. llvm-svn: 225205	2015-01-05 21:22:42 +00:00
Hal Finkel	200d2ad188	[PowerPC] Fold i1 extensions with other ops Consider this function from our README.txt file: int foo(int a, int b) { return (a < b) << 4; } We now explicitly track CR bits by default, so the comment in the README.txt about not really having a SETCC is no longer accurate, but we did generate this somewhat silly code: cmpw 0, 3, 4 li 3, 0 li 12, 1 isel 3, 12, 3, 0 sldi 3, 3, 4 blr which generates the zext as a select between 0 and 1, and then shifts the result by a constant amount. Here we preprocess the DAG in order to fold the results of operations on an extension of an i1 value into the SELECT_I[48] pseudo instruction when the resulting constant can be materialized using one instruction (just like the 0 and 1). This was not implemented as a DAGCombine because the resulting code would have been anti-canonical and depends on replacing chained user nodes, which does not fit well into the lowering paradigm. Now we generate: cmpw 0, 3, 4 li 3, 0 li 12, 16 isel 3, 12, 3, 0 blr which is less silly. llvm-svn: 225203	2015-01-05 21:10:24 +00:00
Simon Pilgrim	71b96b35e1	[X86][SSE] Fixed description for isSequentialOrUndefInRange. NFC. llvm-svn: 225202	2015-01-05 21:09:48 +00:00
Colin LeMahieu	28bb02a8c7	[Hexagon] Adding rounding reg/reg variants, accumulating multiplies, and accumulating shifts. llvm-svn: 225201	2015-01-05 20:56:41 +00:00
Duncan P. N. Exon Smith	1c00c9f7fc	IR: Prune arguments to ValueAsMetadata::ValueAsMetadata() `LLVMContext` isn't actually used. llvm-svn: 225200	2015-01-05 20:41:25 +00:00
Colin LeMahieu	abdf2b37d8	[Hexagon] Adding V4 bit manipulating instructions, removing ALU defs without encoding bits. llvm-svn: 225199	2015-01-05 20:35:54 +00:00
Colin LeMahieu	3acfddd6b5	[Hexagon] Adding V4 logic-logic instructions and tests. llvm-svn: 225198	2015-01-05 20:14:58 +00:00
Colin LeMahieu	ff10c8c95c	[Hexagon] Adding orand, bitsplit reg/reg, and modwrap instructions. llvm-svn: 225197	2015-01-05 20:04:40 +00:00
Hal Finkel	49557f1b42	[PowerPC] Remove zexts after i32 ctlz The 64-bit semantics of cntlzw are not special, the 32-bit population count is stored as a 64-bit value in the range [0,32]. As a result, it is always zero extended, and it can be added to the PPCISelDAGToDAG peephole optimization as a frontier instruction for the removal of unnecessary zero extensions. llvm-svn: 225192	2015-01-05 18:52:29 +00:00
Hal Finkel	4e2c78228a	[PowerPC] Remove zexts after byte-swapping loads lhbrx and lwbrx not only load their data with byte swapping, but also clear the upper 32 bits (at least). As a result, they can be added to the PPCISelDAGToDAG peephole optimization as frontier instructions for the removal of unnecessary zero extensions. llvm-svn: 225189	2015-01-05 18:09:06 +00:00
Colin LeMahieu	5e079577e1	[Hexagon] Adding round reg/imm and bitsplit instructions. llvm-svn: 225188	2015-01-05 18:08:21 +00:00
Saleem Abdulrasool	150a1dc5c2	SymbolRewriter: use iplist::splice The swap implementation for iplist is currently unsupported. Simply splice the old list into place, which achieves the same purpose. This is needed in order to thread the -frewrite-map-file frontend option correctly. NFC. llvm-svn: 225186	2015-01-05 17:56:32 +00:00
Saleem Abdulrasool	d37ce30888	SymbolRewriter: 80-column Wrap a couple of lines. NFC. llvm-svn: 225185	2015-01-05 17:56:29 +00:00
Ahmed Bougacha	d54c448d34	[AArch64] Improve codegen of store lane instructions by avoiding GPR usage. We used to generate code similar to: umov.b w8, v0[2] strb w8, [x0, x1] because the STRro patterns were preferred to ST1. Instead, we can avoid going through GPRs, and generate: add x8, x0, x1 st1.b { v0 }[2], [x8] This patch increases the ST1 AddedComplexity to achieve that. rdar://16372710 Differential Revision: http://reviews.llvm.org/D6202 llvm-svn: 225183	2015-01-05 17:10:26 +00:00
Ahmed Bougacha	f964df3640	[AArch64] Improve codegen of store lane 0 instructions by directly storing the subregister. For 0-lane stores, we used to generate code similar to: fmov w8, s0 str w8, [x0, x1, lsl #2] instead of: str s0, [x0, x1, lsl #2] To correct that: for store lane 0 patterns, directly match to STR <subreg>0. Byte-sized instructions don't have the special case for a 0 index, because FPR8s are defined to have untyped content. rdar://16372710 Differential Revision: http://reviews.llvm.org/D6772 llvm-svn: 225181	2015-01-05 17:02:28 +00:00
Karthik Bhat	93f27ce886	Select lower fsub,fabs pattern to fabd on AArch64 This patch lowers patterns such as- fsub v0.4s, v0.4s, v1.4s fabs v0.4s, v0.4s to fabd v0.4s, v0.4s, v1.4s on AArch64. Review: http://reviews.llvm.org/D6791 llvm-svn: 225169	2015-01-05 13:57:59 +00:00
Charlie Turner	6632d1f67e	Parse Tag_compatibility correctly. Tag_compatibility takes two arguments, but before this patch it would erroneously accept just one, it now produces an error in that case. Change-Id: I530f918587620d0d5dfebf639944d6083871ef7d llvm-svn: 225167	2015-01-05 13:26:37 +00:00
Charlie Turner	8b2caa458f	Emit the build attribute Tag_conformance. Claim conformance to version 2.09 of the ARM ABI. This build attribute must be emitted first amongst the build attributes when written to an object file. This is to simplify conformance detection by consumers. Change-Id: If9eddcfc416bc9ad6e5cc8cdcb05d0031af7657e llvm-svn: 225166	2015-01-05 13:12:17 +00:00
Karthik Bhat	8ec742c2f9	Select lower sub,abs pattern to sabd on AArch64 This patch lowers patterns such as- sub v0.4s, v0.4s, v1.4s abs v0.4s, v0.4s to sabd v0.4s, v0.4s, v1.4s on AArch64. Review: http://reviews.llvm.org/D6781 llvm-svn: 225165	2015-01-05 13:11:07 +00:00
Chandler Carruth	539dc4b9d5	[PM] Don't run the machinery of invalidating all the analysis passes when all are being preserved. We want to short-circuit this for a couple of reasons. One, I don't really want passes to grow a dependency on actually receiving their invalidate call when they've been preserved. I'm thinking about removing this entirely. But more importantly, preserving everything is likely to be the common case in a lot of scenarios, and it would be really good to bypass all of the invalidation and preservation machinery there. Avoiding calling N opaque functions to try to invalidate things that are by definition still valid seems important. =] This wasn't really inpsired by much other than seeing the spam in the logging for analyses, but it seems better ot get it checked in rather than forgetting about it. llvm-svn: 225163	2015-01-05 12:32:11 +00:00
Chandler Carruth	e5e8fb3bf6	[PM] Add names and debug logging for analysis passes to the new pass manager. This starts to allow us to test analyses more easily, but it's really only the beginning. Some of the code here is still untestable without manual changes to create analysis passes, but I wanted to factor it into a small of chunks as possible. Next up in order to be able to test things are, in no particular order: - No-op analyses passes so we don't have to use real ones to exercise the pass maneger itself. - Automatic way of generating dummy passes that require an analysis be run, including a variant that calls a 'print' method on a pass to make it even easier to print out the results of an analysis. - Dummy passes that invalidate all analyses for their IR unit so we can test invalidation and re-runs. - Automatic way to print each analysis pass as it is re-run. - Automatic but optional verification of analysis passes everywhere possible. I'm not claiming I'll get to all of these immediately, but that's what is in the pipeline at some stage. I'm fleshing out exactly what I need and what to prioritize by working on converting analyses and then trying to test the conversion. =] llvm-svn: 225162	2015-01-05 12:21:44 +00:00
Craig Topper	d3c02f177a	Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert. llvm-svn: 225160	2015-01-05 10:15:49 +00:00
Jiangning Liu	40c1b35292	Fixed a bug in memory dependence checking module of loop vectorization. The following loop should not be vectorized with current algorithm. {code} // loop body ... = a[i] (1) ... = a[i+1] (2) ....... a[i+1] = .... (3) a[i] = ... (4) {code} The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed. For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized. The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly. llvm-svn: 225159	2015-01-05 10:08:58 +00:00
Craig Topper	dc2fc8035d	[X86] Remove the predicates from the register forms of the 2-byte inc and dec instructions. Remove the 32-bit mode only versions that existed for the disassembler. Move the patterns out of the instructions so they can still be qualified with predicates. llvm-svn: 225157	2015-01-05 08:19:12 +00:00
Craig Topper	3dcdde2e92	[X86] Simplify code a little by just summing flags instead of conditionally incrementing. NFC llvm-svn: 225156	2015-01-05 08:19:10 +00:00
Craig Topper	859677edef	[X86] Remove unnecessary redeclaration of a variable with the same assignment as the beginning of the function. NFC. llvm-svn: 225155	2015-01-05 08:19:07 +00:00
Craig Topper	9cf67c6f57	[X86] Remove a strange fixme referring to a hack that doesn't seem to exist since the code is in a comment. Can't figure out what the body of the 'if' was supposed to be anyway. llvm-svn: 225154	2015-01-05 08:19:05 +00:00
Craig Topper	e9e081cd29	[x86] Reduce text duplication for similar operand class declarations in tablegen instruction info. No functional change intended. llvm-svn: 225153	2015-01-05 08:19:03 +00:00
Craig Topper	21f984966d	[X86] Fix the immediate size to match the address size in the operand types for the move to/from absolute memory instructions. llvm-svn: 225152	2015-01-05 08:18:59 +00:00
Hal Finkel	9bb61de1be	[PowerPC] Enable speculation of cttz/ctlz PPC has an instruction for ctlz with defined zero behavior, and our lowering of cttz (provided by DAGCombine) is also efficient and branchless, so speculating these makes sense. llvm-svn: 225150	2015-01-05 05:24:42 +00:00
Chandler Carruth	73b0164fe5	[SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an assert out of the new pre-splitting in SROA. This fix makes the code do what was originally intended -- when we have a store of a load both dealing in the same alloca, we force them to both be pre-split with identical offsets. This is really quite hard to do because we can keep discovering problems as we go along. We have to track every load over the current alloca which for any resaon becomes invalid for pre-splitting, and go back to remove all stores of those loads. I've included a couple of test cases derived from PR22093 that cover the different ways this can happen. While that PR only really triggered the first of these two, its the same fundamental issue. The other challenge here is documented in a FIXME now. We end up being quite a bit more aggressive for pre-splitting when loads and stores don't refer to the same alloca. This aggressiveness comes at the cost of introducing potentially redundant loads. It isn't clear that this is the right balance. It might be considerably better to require that we only do pre-splitting when we can presplit every load and store involved in the entire operation. That would give more consistent if conservative results. Unfortunately, it requires a non-trivial change to the actual pre-splitting operation in order to correctly handle cases where we end up pre-splitting stores out-of-order. And it isn't 100% clear that this is the right direction, although I'm starting to suspect that it is. llvm-svn: 225149	2015-01-05 04:17:53 +00:00
Hal Finkel	2f61879ff4	[PowerPC] Materialize i64 constants using rotation with masking r225135 added the ability to materialize i64 constants using rotations in order to reduce the instruction count. Sometimes we can use a rotation only with some extra masking, so that we take advantage of the fact that generating a bunch of extra higher-order 1 bits is easy using li/lis. llvm-svn: 225147	2015-01-05 03:41:38 +00:00
Chandler Carruth	d174ce4ad1	[PM] Switch the new pass manager to use a reference-based API for IR units. This was debated back and forth a bunch, but using references is now clearly cleaner. Of all the code written using pointers thus far, in only one place did it really make more sense to have a pointer. In most cases, this just removes immediate dereferencing from the code. I think it is much better to get errors on null IR units earlier, potentially at compile time, than to delay it. Most notably, the legacy pass manager uses references for its routines and so as more and more code works with both, the use of pointers was likely to become really annoying. I noticed this when I ported the domtree analysis over and wrote the entire thing with references only to have it fail to compile. =/ It seemed better to switch now than to delay. We can, of course, revisit this is we learn that references are really problematic in the API. llvm-svn: 225145	2015-01-05 02:47:05 +00:00
Chandler Carruth	75c11b88af	[PM] Cleanup a const_cast and other machinery left over in this code from before I removed thet non-const use of the function. The unused variable that held the const_cast was already kindly removed by Michael. llvm-svn: 225143	2015-01-04 23:13:57 +00:00
Hal Finkel	241ba79f95	[PowerPC] Materialize i64 constants using rotation Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing a rotated constant, and then applying the inverse rotation, requires fewer instructions than the direct method. If so, do that instead. In r225132, I added support for forming constants using bit inversion. In effect, this reverts that commit and replaces it with rotation support. The bit inversion is useful for turning constants that are mostly ones into ones that are mostly zeros (thus enabling a more-efficient shift-based materialization), but the same effect can be obtained by using negative constants and a rotate, and that is at least as efficient, if not more. llvm-svn: 225135	2015-01-04 15:43:55 +00:00
Michael Kuperstein	58c3f6cc31	Fix unused variable warning for non-asserts builds. NFC. llvm-svn: 225133	2015-01-04 13:35:44 +00:00
Hal Finkel	ca6375fb75	[PowerPC] Materialize i64 constants using bit inversion Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing the bit-reversed constant, and then flipping the bits, requires fewer instructions than the direct method. If so, do that instead. llvm-svn: 225132	2015-01-04 12:35:03 +00:00
Chandler Carruth	66b3130cda	[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the many utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131	2015-01-04 12:03:27 +00:00
David Majnemer	087dc8b831	InstCombine: match can find ConstantExprs, don't assume we have a Value We assumed the output of a match was a Value, this would cause us to assert because we would fail a cast<>. Instead, use a helper in the Operator family to hide the distinction between Value and Constant. This fixes PR22087. llvm-svn: 225127	2015-01-04 07:36:02 +00:00
David Majnemer	6ee8d17bc6	ValueTracking: ComputeNumSignBits should tolerate misshapen phi nodes PHI nodes can have zero operands in the middle of a transform. It is expected that utilities in Analysis don't freak out when this happens. Note that it is considered invalid to allow these misshapen phi nodes to make it to another pass. This fixes PR22086. llvm-svn: 225126	2015-01-04 07:06:53 +00:00
Lang Hames	12b12e800b	[APFloat][ADT] Fix sign handling logic for FMA results that truncate to zero. This patch adds a check for underflow when truncating results back to lower precision at the end of an FMA. The additional sign handling logic in APFloat::fusedMultiplyAdd should only be performed when the result of the addition step of the FMA (in full precision) is exactly zero, not when the result underflows to zero. Unit tests for this case and related signed zero FMA results are included. Fixes <rdar://problem/18925551>. llvm-svn: 225123	2015-01-04 01:20:55 +00:00
Saleem Abdulrasool	67f729933f	ARM: permit tail calls to weak externals on COFF Weak externals are resolved statically, so we can actually generate the tail call on PE/COFF targets without breaking the requirements. It is questionable whether we want to propagate the current behaviour for MachO as the requirements are part of the ARM ELF specifications, and it seems that prior to the SVN r215890, we would have tail'ed the call. For now, be conservative and only permit it on PE/COFF where the call will always be fully resolved. llvm-svn: 225119	2015-01-03 21:35:00 +00:00
Hal Finkel	5772566ed6	[PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference The existing code provided for specifying a global loop alignment preference. However, the preferred loop alignment might depend on the loop itself. For recent POWER cores, loops between 5 and 8 instructions should have 32-byte alignment (while the others are better with 16-byte alignment) so that the entire loop will fit in one i-cache line. To support this, getPrefLoopAlignment has been made virtual, and can be provided with an optional MachineLoop* so the target can inspect the loop before answering the query. The default behavior, as before, is to return the value set with setPrefLoopAlignment. MachineBlockPlacement now queries the target for each loop instead of only once per function. There should be no functional change for other targets. llvm-svn: 225117	2015-01-03 17:58:24 +00:00
Hal Finkel	d73bfba7eb	[PowerPC] Use 16-byte alignment for modern cores for functions/loops Most modern PowerPC cores prefer that functions and loops start on 16-byte-aligned boundaries (), so instruct block placement, etc. to make this happen. The branch selector has also been adjusted so account for the extra nops that might now be inserted before loop headers. () Some cores actually prefer other alignments for small loops, but that will be addressed in a follow-up commit. llvm-svn: 225115	2015-01-03 14:58:25 +00:00
Craig Topper	589ceee7f4	Minor cleanup to all the switches after MatchInstructionImpl in all the AsmParsers. Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation. llvm-svn: 225114	2015-01-03 08:16:34 +00:00
David Majnemer	8df46c9289	ValueTracking: Make computeKnownBits for Arguments a little more clear We would sometimes leave the out-param APInts untouched while going through computeKnownBits. While I don't know of a way to trigger a bug involving this in practice, it goes against the overall design of computeKnownBits. Found via code inspection. llvm-svn: 225109	2015-01-03 02:33:25 +00:00
Hal Finkel	4edc66b8de	[PowerPC] Add support for the CMPB instruction Newer POWER cores, and the A2, support the cmpb instruction. This instruction compares its operands, treating each of the 8 bytes in the GPRs separately, returning a 'mask' result of 0 (for false) or -1 (for true) in each byte. Code generation support is added, in the form of a PPCISelDAGToDAG DAG-preprocessing routine, that recognizes patterns close to what the instruction computes (either exactly, or related by a constant masking operation), and generates the cmpb instruction (along with any necessary constant masking operation). This can be expanded if use cases arise. llvm-svn: 225106	2015-01-03 01:16:37 +00:00
Kostya Serebryany	d421db05bb	[asan] simplify the tracing code, make it use the same guard variables as coverage llvm-svn: 225103	2015-01-03 00:54:43 +00:00
Craig Topper	ae8e1b3831	[X86] Disassembler support for move to/from %rax with a 32-bit memory offset is REX.W and AdSize prefix are both present. llvm-svn: 225099	2015-01-03 00:00:20 +00:00
Craig Topper	017b830564	[X86] Use 32-bit sign extended immediate for 64-bit LOCK_ArithBinOp with sign extended immediate. llvm-svn: 225098	2015-01-03 00:00:14 +00:00
Chandler Carruth	ce08983110	[PM] Fix some formatting where clang-format has improved recently. llvm-svn: 225092	2015-01-02 22:51:44 +00:00
Andrea Di Biagio	6477847ef4	Improved comments. No functional change intended. llvm-svn: 225080	2015-01-02 10:47:46 +00:00
Craig Topper	4e5ab81a12	[X86] Bring some better consistency to the naming of the move to/from %al/ax/eax/rax with memory offset. llvm-svn: 225078	2015-01-02 07:36:23 +00:00
David Majnemer	c8a576b5c0	InstCombine: Detect when llvm.umul.with.overflow always overflows We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero and LHSKnownOne * RHSKnownOne overflow. llvm-svn: 225077	2015-01-02 07:29:47 +00:00
David Majnemer	491331aca8	Analysis: Reformulate WillNotOverflowUnsignedMul for reusability WillNotOverflowUnsignedMul's smarts will live in ValueTracking as computeOverflowForUnsignedMul. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225076	2015-01-02 07:29:43 +00:00
Craig Topper	055845f5cb	[X86] Make the instructions that use AdSize16/32/64 co-exist together without using mode predicates. This is necessary to allow the disassembler to be able to handle AdSize32 instructions in 64-bit mode when address size prefix is used. Eventually we should probably also support 'addr32' and 'addr16' in the assembler to override the address size on some of these instructions. But for now we'll just use special operand types that will lookup the current mode size to select the right instruction. llvm-svn: 225075	2015-01-02 07:02:25 +00:00
Chandler Carruth	24ac830d7c	[SROA] Teach SROA to be more aggressive in splitting now that we have a pre-splitting pass over loads and stores. Historically, splitting could cause enough problems that I hamstrung the entire process with a requirement that splittable integer loads and stores must cover the entire alloca. All smaller loads and stores were unsplittable to prevent chaos from ensuing. With the new pre-splitting logic that does load/store pair splitting I introduced in r225061, we can now very nicely handle arbitrarily splittable loads and stores. In order to fully benefit from these smarts, we need to mark all of the integer loads and stores as splittable. However, we don't actually want to rewrite partitions with all integer loads and stores marked as splittable. This will fail to extract scalar integers from aggregates, which is kind of the point of SROA. =] In order to resolve this, what we really want to do is only do pre-splitting on the alloca slices with integer loads and stores fully splittable. This allows us to uncover all non-integer uses of the alloca that would benefit from a split in an integer load or store (and where introducing the split is safe because it is just memory transfer from a load to a store). Once done, we make all the non-whole-alloca integer loads and stores unsplittable just as they have historically been, repartition and rewrite. The result is that when there are integer loads and stores anywhere within an alloca (such as from a memcpy of a sub-object of a larger object), we can split them up if there are non-integer components to the aggregate hiding beneath. I've added the challenging test cases to demonstrate how this is able to promote to scalars even a case where we have even partially overlapping loads and stores. This restores the single-store behavior for small arrays of i8s which is really nice. I've restored both the little endian testing and big endian testing for these exactly as they were prior to r225061. It also forced me to be more aggressive in an alignment test to actually defeat SROA. =] Without the added volatiles there, we actually split up the weird i16 loads and produce nice double allocas with better alignment. This also uncovered a number of bugs where we failed to handle splittable load and store slices which didn't have a begininng offset of zero. Those fixes are included, and without them the existing test cases explode in glorious fireworks. =] I've kept support for leaving whole-alloca integer loads and stores as splittable even for the purpose of rewriting, but I think that's likely no longer needed. With the new pre-splitting, we might be able to remove all the splitting support for loads and stores from the rewriter. Not doing that in this patch to try to isolate any performance regressions that causes in an easy to find and revert chunk. llvm-svn: 225074	2015-01-02 03:55:54 +00:00
Chandler Carruth	5986b541d4	[SROA] Make the computation of adjusted pointers not leak GEP instructions. I noticed this when working on dialing up how aggressively we can pre-split loads and stores. My test case wasn't passing because dead GEPs into the allocas persisted when they were built by this routine. This isn't terribly harmful, we still rewrote and promoted the alloca and I can't conceive of how to cause this to happen in a case where we will keep the exact same alloca but rewrite and promote the uses of it. If that ever happened, we'd get an assert out of mem2reg. So I don't have a direct test case yet, but the subsequent commit's test case wouldn't pass without this. There are other problems fixed by this patch that I spotted purely by inspection such as the fact that getAdjustedPtr could have actually deleted dead base pointers. I don't know how to get a base pointer to go into getAdjustedPtr today, so I think this bug could never have manifested (and I certainly can't write a test case for it) but, it wasn't the intent of the code. The code really just wanted to GC the new instructions built. That can be done more directly by comparing with the base pointer which is the only non-new instruction that this code can return. llvm-svn: 225073	2015-01-02 02:47:38 +00:00
Chandler Carruth	29c22fae46	[SROA] Fix the loop exit placement to be prior to indexing the splits array. This prevents it from walking out of bounds on the splits array. Bug found with the existing tests by ASan and by the MSVC debug build. llvm-svn: 225069	2015-01-02 00:10:22 +00:00
Chandler Carruth	c39eaa5041	[SROA] Fix two total think-os in r225061 that should have been caught on a +asserts bootstrap, but my bootstrap had asserts off. Oops. Anyways, in some places it is reasonable to cast (as a sanity check) the pointer operand to a load or store to an instruction within SROA -- namely when the pointer operand is expected to be derived from an alloca, and thus always an instruction. However, the pre-splitting code also deals with loads and stores to non-alloca pointers and there we need to just use the Value*. Nothing about the code relied on the instruction cast, it was only there essentially as an invariant assertion. Remove the two that don't actually hold. This should fix the proximate issue in PR22080, but I'm also doing an asserts bootstrap myself to see if there are other issues lurking. I'll craft a reduced test case in a moment, but I wanted to get the tree healthy as quickly as possible. llvm-svn: 225068	2015-01-01 23:26:16 +00:00
Hal Finkel	ddf8d7d155	[PowerPC] use UINT64_C instead of ul Attempting to fix PR22078 (building on 32-bit systems) by replacing my careless use of 1ul to be a uint64_t constant with UINT64_C(1). llvm-svn: 225066	2015-01-01 19:33:59 +00:00
Chandler Carruth	6044c0bc78	[SROA] Switch to using a more direct debug logging technique in one part of my new load and store splitting, and fix a bug where it logged a totally irrelevant slice rather than the actual slice in question. The logging here previously worked because we used to place new slices onto the back of the core sequence, but that caused other problems. I updated the actual code to store new slices in their own vector but didn't update the logging. There isn't a good way to reuse the logging any more, and frankly it wasn't needed. We can directly log this bit more easily. llvm-svn: 225063	2015-01-01 12:56:47 +00:00
Chandler Carruth	994cde8869	[SROA] Fix formatting with clang-format which I managed to fail to do prior to committing r225061. Sorry for that. llvm-svn: 225062	2015-01-01 12:01:03 +00:00
Chandler Carruth	0715cba02d	[SROA] Teach SROA how to much more intelligently handle split loads and stores. When there are accesses to an entire alloca with an integer load or store as well as accesses to small pieces of the alloca, SROA splits up the large integer accesses. In order to do that, it uses bit math to merge the small accesses into large integers. While this is effective, it produces insane IR that can cause significant problems in the rest of the optimizer: - It can cause load and store mismatches with GVN on the non-alloca side where we end up loading an i64 (or some such) rather than loading specific elements that are stored. - We can't always get rid of the integer bit math, which is why we can't always fix the loads and stores to work well with GVN. - This is especially bad when we have operations that mix poorly with integer bit math such as floating point operations. - It will block things like the vectorizer which might be able to handle the scalar stores that underly the aggregate. At the same time, we can't just directly split up these loads and stores in all cases. If there is actual integer arithmetic involved on the values, then using integer bit math is actually the perfect lowering because we can often combine it heavily with the surrounding math. The solution this patch provides is to find places where SROA is partitioning aggregates into small elements, and look for splittable loads and stores that it can split all the way to some other adjacent load and store. These are uniformly the cases where failing to split the loads and stores hurts the optimizer that I have seen, and I've looked extensively at the code produced both from more and less aggressive approaches to this problem. However, it is quite tricky to actually do this in SROA. We may have loads and stores to the same alloca, or other complex patterns that are hard to handle. This complexity leads to the somewhat subtle algorithm implemented here. We have to do this entire process as a separate pass over the partitioning of the alloca, and split up all of the loads prior to splitting the stores so that we can handle safely the cases of overlapping, including partially overlapping, loads and stores to the same alloca. We also have to reconstitute the post-split slice configuration so we can avoid iterating again over all the alloca uses (the slow part of SROA). But we also have to ensure that when we split up loads and stores to other allocas, we do re-iterate over them in SROA to adapt to the more refined partitioning now required. With this, I actually think we can fix a long-standing TODO in SROA where I avoided splitting as many loads and stores as probably should be splittable. This limitation historically mitigated the fallout of all the bad things mentioned above. Now that we have more intelligent handling, I plan to remove the FIXME and more aggressively mark integer loads and stores as splittable. I'll do that in a follow-up patch to help with bisecting any fallout. The net result of this change should be more fine-grained and accurate scalars being formed out of aggregates. At the very least, Clang now generates perfect code for this high-level test case using std::complex<float>: #include <complex> void g1(std::complex<float> &x, float a, float b) { x += std::complex<float>(a, b); } void g2(std::complex<float> &x, float a, float b) { x -= std::complex<float>(a, b); } void foo(const std::complex<float> &x, float a, float b, std::complex<float> &x1, std::complex<float> &x2) { std::complex<float> l1 = x; g1(l1, a, b); std::complex<float> l2 = x; g2(l2, a, b); x1 = l1; x2 = l2; } This code isn't just hypothetical either. It was reduced out of the hot inner loops of essentially every part of the Eigen math library when using std::complex<float>. Those loops would consistently and pervasively hop between the floating point unit and the integer unit due to bit math extraction and insertion of floating point values that were "stored" in a 64-bit integer register around the loop backedge. So far, this change has passed a bootstrap and I have done some other testing and so far, no issues. That doesn't mean there won't be though, so I'll be prepared to help with any fallout. If you performance swings in particular, please let me know. I'm very curious what all the impact of this change will be. Stay tuned for the follow-up to also split more integer loads and stores. llvm-svn: 225061	2015-01-01 11:54:38 +00:00
Hal Finkel	c58ce4132a	[PowerPC] Improve instruction selection bit-permuting operations (64-bit) This is the second installment of improvements to instruction selection for "bit permutation" instruction sequences. r224318 added logic for instruction selection for 32-bit bit permutation sequences, and this adds lowering for 64-bit sequences. The 64-bit sequences are more complicated than the 32-bit ones because: a) the 64-bit versions of the 32-bit rotate-and-mask instructions work by replicating the lower 32-bits of the value-to-be-rotated into the upper 32 bits -- and integrating this into the cost modeling for the various bit group operations is non-trivial b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions cannot, in one instruction, specify the mask starting index, the mask ending index, and the rotation factor. Also, forming arbitrary 64-bit constants is more complicated than in 32-bit mode because the number of instructions necessary is value dependent. Plus, support for 'late masking' was added: it is sometimes more efficient to treat the overall value as if it had no mandatory zero bits when planning the bit-group insertions, and then mask them in at the very end. Unfortunately, as the structure of the bit groups is different in the two cases, the more feasible implementation technique was to generate both instruction sequences, and then pick the shorter one. And finally, we now generate reasonable code for i64 bswap: rldicl 5, 3, 16, 0 rldicl 4, 3, 8, 0 rldicl 6, 3, 24, 0 rldimi 4, 5, 8, 48 rldicl 5, 3, 32, 0 rldimi 4, 6, 16, 40 rldicl 6, 3, 48, 0 rldimi 4, 5, 24, 32 rldicl 5, 3, 56, 0 rldimi 4, 6, 40, 16 rldimi 4, 5, 48, 8 rldimi 4, 3, 56, 0 vs. what we used to produce: li 4, 255 rldicl 5, 3, 24, 40 rldicl 6, 3, 40, 24 rldicl 7, 3, 56, 8 sldi 8, 3, 8 sldi 10, 3, 24 sldi 12, 3, 40 rldicl 0, 3, 8, 56 sldi 9, 4, 32 sldi 11, 4, 40 sldi 4, 4, 48 andi. 5, 5, 65280 andis. 6, 6, 255 andis. 7, 7, 65280 sldi 3, 3, 56 and 8, 8, 9 and 4, 12, 4 and 9, 10, 11 or 6, 7, 6 or 5, 5, 0 or 3, 3, 4 or 7, 9, 8 or 4, 6, 5 or 3, 3, 7 or 3, 3, 4 which is 12 instructions, instead of 25, and seems optimal (at least in terms of code size). llvm-svn: 225056	2015-01-01 02:53:29 +00:00
Sanjay Patel	e68f71574f	InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X Some day the backend may handle instruction-level fast math flags and make this transform unnecessary, but it's still better practice to use the canonical representation of fneg when possible (use a -0.0). This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ). See also http://reviews.llvm.org/D6723. Differential Revision: http://reviews.llvm.org/D6731 llvm-svn: 225050	2014-12-31 22:14:05 +00:00
Rafael Espindola	54b435ec3c	Add r224985 back with a fix. The issues was that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. Original message: Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225048	2014-12-31 17:19:34 +00:00
Colin LeMahieu	5691eb5ee7	Reverting 225045 and 225043 and XFAIL multiline.ll on hexagon llvm-svn: 225047	2014-12-31 17:14:35 +00:00
Colin LeMahieu	79e8ebada2	[Hexagon] Removing assertion to appease buildbot until I can reproduce the problem llvm-svn: 225045	2014-12-31 16:20:00 +00:00
Rafael Espindola	d4da9040de	Revert "Remove doesSectionRequireSymbols." This reverts commit r224985. I am investigating why it made an Apple bot unhappy. llvm-svn: 225044	2014-12-31 16:06:48 +00:00
Colin LeMahieu	94272611ac	[Hexagon] Changing an llvm_unreachable to an assertion and returning 0. Relocations aren't implemented yet but we don't need to abort for this in release builds. llvm-svn: 225043	2014-12-31 15:57:38 +00:00
Craig Topper	99bcab7b85	[X86] Fix disassembly of absolute moves to work correctly in 16 and 32-bit modes with all 4 combinations of OpSize and AdSize prefixes being present or not. llvm-svn: 225036	2014-12-31 07:07:31 +00:00
Craig Topper	6e518776e3	[x86] Simplify detection of jcxz/jecxz/jrcxz in disassembler. llvm-svn: 225035	2014-12-31 07:07:11 +00:00
David Majnemer	f89dc3edc9	InstCombine: try to transform A-B < 0 into A < B We are allowed to move the 'B' to the right hand side if we an prove there is no signed overflow and if the comparison itself is signed. llvm-svn: 225034	2014-12-31 04:21:41 +00:00
Alexey Samsonov	553185ee4b	Revert "merge consecutive stores of extracted vector elements" This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031	2014-12-31 00:40:28 +00:00
Colin LeMahieu	bc405294f0	[Hexagon] Adding accumulating add/sub, doubleword logic-not variants, doubleword bitfield extract, word parity, accumulating multiplies with saturation. llvm-svn: 225024	2014-12-31 00:08:34 +00:00
Colin LeMahieu	8971e055ae	[Hexagon] Adding double-logic on predicate instructions. llvm-svn: 225018	2014-12-30 23:22:39 +00:00
Colin LeMahieu	65f3e12ed1	[Hexagon] Adding newvalue compare and jumps. llvm-svn: 225015	2014-12-30 23:04:21 +00:00
Peter Collingbourne	64faca9b84	RTDyldMemoryManager.cpp: Make the reference to __morestack weak. This fixes the DSO build for now. Eventually we should develop some other mechanism to make this work correctly with DSOs. llvm-svn: 225014	2014-12-30 22:52:33 +00:00
David Blaikie	aeaa5bf55e	DebugInfo: Omit is_stmt from line table entries on the same line. GCC does this for non-zero discriminators and since GCC doesn't produce column info, that was the only place it comes up there. For LLVM, since we can emit discriminators and/or column info, it makes more sense to invert the condition and just test for changes in line number. This should resolve at least some of the GDB 7.5 test suite failures created by recent Clang changes that increase the location fidelity (which, since Clang defaults to including column info on Linux by default created a bunch of cases that confused GDB). In theory we could do this better/differently by grouping actual source statements together in a similar manner to the way lexical scopes are handled but given that GDB isn't really in a position to consume that (& users are probably somewhat used to different lines being different 'statements') this seems the safest and cheapest change. (I'm concerned that doing this 'right' would bloat the debugloc data even further - something Duncan's working hard to address) llvm-svn: 225011	2014-12-30 22:47:13 +00:00
Colin LeMahieu	0cba5f1b43	[Hexagon] Adding postincrement register newvalue stores. llvm-svn: 225010	2014-12-30 22:34:08 +00:00
Colin LeMahieu	9014890819	[Hexagon] Removing old newvalue store variants. Adding postincrement immediate newvalue stores. llvm-svn: 225009	2014-12-30 22:28:31 +00:00
Zoran Jovanovic	10646918d1	[mips][microMIPS] Relocate with symbol for micromips symbols Differential Revision: http://reviews.llvm.org/D6796 llvm-svn: 225008	2014-12-30 22:04:16 +00:00
Colin LeMahieu	820d5cb608	[Hexagon] Adding indexed store new-value variants. llvm-svn: 225007	2014-12-30 22:00:26 +00:00

1 2 3 4 5 ...

75497 Commits