llvm-project

Commit Graph

Author	SHA1	Message	Date
Justin Hibbits	98a532dd8e	Add saving and restoring of r30 to the prologue and epilogue, respectively Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that. Test Plan: Tests updated. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6876 llvm-svn: 225450	2015-01-08 15:47:19 +00:00
Rafael Espindola	bec6af62b8	Explicitly handle LinkOnceODRAutoHideLinkage. NFC. We already have a test. llvm-svn: 225449	2015-01-08 15:39:50 +00:00
Rafael Espindola	7b4b2dcd0a	Update naming style and clang-format. NFC. llvm-svn: 225448	2015-01-08 15:36:32 +00:00
Kristof Beyls	933de7aa06	Fix large stack alignment codegen for ARM and Thumb2 targets This partially fixes PR13007 (ARM CodeGen fails with large stack alignment): for ARM and Thumb2 targets, but not for Thumb1, as it seems stack alignment for Thumb1 targets hasn't been supported at all. Producing an aligned stack pointer is done by zero-ing out the lower bits of the stack pointer. The BIC instruction was used for this. However, the immediate field of the BIC instruction only allows to encode an immediate that can zero out up to a maximum of the 8 lower bits. When a larger alignment is requested, a BIC instruction cannot be used; llvm was silently producing incorrect code in this case. This commit fixes code generation for large stack alignments by using the BFC instruction instead, when the BFC instruction is available. When not, it uses 2 instructions: a right shift, followed by a left shift to zero out the lower bits. The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code that unconditionally uses BIC to realign the stack pointer, so it very likely has the same problem. However, I wasn't able to produce a test case for that. This commit adds an assert so that the compiler will fail the assert instead of silently generating wrong code if this is ever reached. llvm-svn: 225446	2015-01-08 15:09:14 +00:00
Tom Stellard	654d669e56	R600/SI: Remove SIISelLowering::legalizeOperands() Its functionality has been replaced by calling SIInstrInfo::legalizeOperands() from SIISelLowering::AdjstInstrPostInstrSelection() and running the SIFoldOperands and SIShrinkInstructions passes. llvm-svn: 225445	2015-01-08 15:08:17 +00:00
Elena Demikhovsky	285fbd551a	Masked Load/Store - fixed a bug in type legalization. llvm-svn: 225441	2015-01-08 12:29:19 +00:00
Michael Kuperstein	698ea3b488	Fix include ordering, NFC. llvm-svn: 225439	2015-01-08 11:59:43 +00:00
Michael Kuperstein	46f7d525c3	[X86] Don't try to generate direct calls to TLS globals The call lowering assumes that if the callee is a global, we want to emit a direct call. This is correct for regular globals, but not for TLS ones. Differential Revision: http://reviews.llvm.org/D6862 llvm-svn: 225438	2015-01-08 11:50:58 +00:00
Michael Kuperstein	8c65e31a5a	Move SPAdj logic from PEI into the targets (NFC) PEI tries to keep track of how much starting or ending a call sequence adjusts the stack pointer by, so that it can resolve frame-index references. Currently, it takes a very simplistic view of how SP adjustments are done - both FrameStartOpcode and FrameDestroyOpcode adjust it exactly by the amount written in its first argument. This view is in fact incorrect for some targets (e.g. due to stack re-alignment, or because it may want to adjust the stack pointer in multiple steps). However, that doesn't cause breakage, because most targets (the only in-tree exception appears to be 32-bit ARM) rely on being able to simplify the call frame pseudo-instructions earlier, so this code is never hit. Moving the computation into TargetInstrInfo allows targets to override the way the adjustment is computed if they need to have a non-zero SPAdj. Differential Revision: http://reviews.llvm.org/D6863 llvm-svn: 225437	2015-01-08 11:04:38 +00:00
Craig Topper	7c10252943	[X86] Don't print 'dword ptr' or 'qword ptr' on the operand to some of the LEA variants in Intel syntax. The memory operand is inherently unsized. llvm-svn: 225432	2015-01-08 07:41:30 +00:00
Adrian Prantl	2561bb8831	Revert "Reapply: Teach SROA how to update debug info for fragmented variables." This reverts commit r225379 while investigating an assertion failure reported by Alexey. llvm-svn: 225424	2015-01-08 02:02:00 +00:00
Quentin Colombet	a799e2e014	[RegAllocGreedy] Introduce a late pass to repair broken hints. A broken hint is a copy where both ends are assigned different colors. When a variable gets evicted in the neighborhood of such copies, it is likely we can reconcile some of them. Context Copies are inserted during the register allocation via splitting. These split points are required to relax the constraints on the allocation problem. When such a point is inserted, both ends of the copy would not share the same color with respect to the current allocation problem. When variables get evicted, the allocation problem becomes different and some split points may not be required anymore. However, the related variables may already have been colored. This usually shows up in the assembly with a pattern like this: def A ... save A to B def A use A restore A from B ... use B Whereas we could simply have done: def B ... def A use A ... use B Proposed Solution A variable having a broken hint is marked for late recoloring if and only if selecting a register for it evicts another variable. Indeed, if no eviction happens it is pointless to look for recoloring opportunities, as it means the situation was the same as the initial allocation problem where we had to break the hint. Finally, when everything has been allocated, we look for recoloring opportunities for all the identified candidates. The recoloring is performed very late to rely on accurate copy cost (all involved variables are allocated). The recoloring is simple, unlike the last chance recoloring. It propagates the color of the broken hint to all its copy-related variables. If the color is available for them, the recoloring uses it, otherwise it gives up on that hint even if a more complex coloring would have worked. The recoloring happens only if it is profitable. The profitability is evaluated using the expected frequency of the copies of the currently recolored variable with a) its current color and b) with the target color. 
If a) is greater than or equal to b), then it is profitable and the recoloring happens. Example Consider the following example: BB1: a = b = BB2: ... = b = a Let us assume b gets split: BB1: a = b = BB2: c = b ... d = c = d = a Because of how the allocation works, b, c, and d may be assigned different colors. Now, if a gets evicted to make room for c, assuming b and d were assigned to something different than a, we end up with: BB1: a = st a, SpillSlot b = BB2: c = b ... d = c = d e = ld SpillSlot = e It is likely that we can assign the same register for b, c, and d, getting rid of 2 copies. Performances Both ARM64 and x86_64 show performance improvements of up to 3% for the llvm-testsuite + externals with Os and O3. There are a few regressions too that come from the (in)accuracy of the block frequency estimate. <rdar://problem/18312047> llvm-svn: 225422	2015-01-08 01:16:39 +00:00
Ahmed Bougacha	2b6917b020	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421	2015-01-08 00:51:32 +00:00
Nick Lewycky	c99cc19650	Remove empty statement. No functionality change. llvm-svn: 225420	2015-01-08 00:47:03 +00:00
Matthias Braun	ada0adf396	X86: VZeroUpperInserter: shortcut should not trigger if we have any function live-ins. llvm-svn: 225419	2015-01-08 00:33:48 +00:00
Matthias Braun	9d7bc0874c	RegisterCoalescer: Do not remove IMPLICIT_DEFS if they are required for subranges. The register coalescer used to remove implicit_defs when they are covered by the main range anyway. With subreg liveness tracking we can't do that anymore in places where the IMPLICIT_DEF is required as begin of a subregister liverange. llvm-svn: 225416	2015-01-08 00:21:23 +00:00
Matthias Braun	d55e6ddacf	RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases. I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a subregister index in SrcReg/DstReg, but they are actually subregister indices of the coalesced register that get you back to SrcReg/DstReg when applied. Fixed the bug, improved comments and simplified code accordingly. Testcase by Tom Stellard! llvm-svn: 225415	2015-01-07 23:58:38 +00:00
Matthias Braun	4fe686af00	LiveInterval: Implement feedback by Quentin Colombet. llvm-svn: 225413	2015-01-07 23:35:11 +00:00
Philip Reames	76ebd15437	[GC] improve testing around gc.relocate and fix a test Patch by: Ramkumar Ramachandra <artagnon@gmail.com> "This patch started out as an exploration of gc.relocate, and an attempt to write a simple test in call-lowering. I then noticed that the arguments of gc.relocate were not checked fully, so I went in and fixed a few things. Finally, the most important outcome of this patch is that my new error handling code caught a bug in a callsite in stackmap-format." Differential Revision: http://reviews.llvm.org/D6824 llvm-svn: 225412	2015-01-07 22:48:01 +00:00
Tom Stellard	0599297cb4	R600/SI: Commute instructions to enable more folding opportunities llvm-svn: 225410	2015-01-07 22:44:19 +00:00
Duncan P. N. Exon Smith	5e5b85098d	IR: Add MDNode::getDistinct() Allow distinct `MDNode`s to be explicitly created. There's no way (yet) of representing their distinctness in assembly/bitcode, however, so this still isn't first-class. Part of PR22111. llvm-svn: 225406	2015-01-07 22:24:46 +00:00
Tom Stellard	26cc18df43	R600/SI: Only fold immediates that have one use Folding the same immediate into multiple instructions will increase program size, which can hurt performance. llvm-svn: 225405	2015-01-07 22:18:27 +00:00
Adrian Prantl	d88af278b9	Update a comment. llvm-svn: 225399	2015-01-07 21:35:13 +00:00
Duncan P. N. Exon Smith	df55d8ba83	Linker: Don't use MDNode::replaceOperandWith() `MDNode::replaceOperandWith()` changes all instances of metadata. Stop using it when linking module flags, since (due to uniquing) the flag values could be used by other metadata. Instead, use new API `NamedMDNode::setOperand()` to update the reference directly. llvm-svn: 225397	2015-01-07 21:32:27 +00:00
Ahmed Bougacha	67dd2d25a3	[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392	2015-01-07 21:27:10 +00:00
Tom Stellard	45c0b3a882	R600/SI: Remove VReg_32 register class Use VGPR_32 register class instead. These two register classes were identical and having separate classes was causing SIInstrInfo::isLegalOperands() to be overly conservative in some cases. This change is necessary to prevent future patches from missing a folding opportunity in fneg-fabs.ll. llvm-svn: 225382	2015-01-07 20:59:25 +00:00
Olivier Sallenave	0451532996	More FMA folding opportunities. llvm-svn: 225380	2015-01-07 20:54:17 +00:00
Adrian Prantl	72b8ee708f	Reapply: Teach SROA how to update debug info for fragmented variables. The two buildbot failures were addressed in LLVM r225378 and CFE r225359. This reapplies commit 225272 without modifications. llvm-svn: 225379	2015-01-07 20:52:22 +00:00
Adrian Prantl	3dd48c6fde	Debug info: Allow aggregate types to be described by constants. llvm-svn: 225378	2015-01-07 20:48:58 +00:00
Colin LeMahieu	92b49c3e39	[Hexagon] Fix 225372 USR register is not fully complete. Removing Uses = [USR] maintains existing functionality to old instructions without encodings. llvm-svn: 225377	2015-01-07 20:43:38 +00:00
Colin LeMahieu	627df427eb	[Hexagon] Adding floating point classification and creation. llvm-svn: 225374	2015-01-07 20:28:57 +00:00
Tom Stellard	4842c05216	R600/SI: Add a V_MOV_B64 pseudo instruction This is used to simplify the SIFoldOperands pass and make it easier to fold immediates. llvm-svn: 225373	2015-01-07 20:27:25 +00:00
Colin LeMahieu	290ece7d4c	[Hexagon] Adding encodings for v5 floating point instructions. llvm-svn: 225372	2015-01-07 20:24:09 +00:00
Colin LeMahieu	777abcb1d7	[Hexagon] Adding encoding for popcount, fastcorner, dword asr with rounding. llvm-svn: 225371	2015-01-07 20:07:28 +00:00
Tom Stellard	ef3b864a07	R600/SI: Teach SIFoldOperands to split 64-bit constants when folding This allows folding of sequences like: s[0:1] = s_mov_b64 4 v_add_i32 v0, s0, v0 v_addc_u32 v1, s1, v1 into v_add_i32 v0, 4, v0 v_add_i32 v1, 0, v1 llvm-svn: 225369	2015-01-07 19:56:17 +00:00
Olivier Sallenave	e64ad7cedd	Test commit llvm-svn: 225368	2015-01-07 19:45:17 +00:00
Ahmed Bougacha	b994d0c0c5	[X86] Fix 512->256 typo in comments. NFC. llvm-svn: 225367	2015-01-07 19:38:50 +00:00
Philip Reames	352fb93773	Add a missing file from 225365 llvm-svn: 225366	2015-01-07 19:13:28 +00:00
Philip Reames	4ac17a3026	Introduce an example statepoint GC strategy This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.) Most of the validation logic added here as proof of concept will soon move in to the Verifier. That move is dependent on http://reviews.llvm.org/D6811 There was discussion in the review thread about addrspace(1) being reserved for something. I'm going to follow up on a separate llvmdev thread. If needed, I'll update all the code at once. Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others. Differential Revision: http://reviews.llvm.org/D6808 llvm-svn: 225365	2015-01-07 19:07:50 +00:00
David Majnemer	4d77fdf311	X86: Allow the stack probe size to be configurable per function LLVM emits stack probes on Windows targets to ensure that the stack is correctly accessed. However, the amount of stack allocated before emitting such a probe is hardcoded to 4096. It is desirable to have this be configurable so that a function might opt-out of stack probes. Our level of granularity is at the function level instead of, say, the module level to permit proper generation of code after LTO. Patch by Andrew H! N.B. The inliner needs to be updated to properly consider what happens after inlining a function with a specific stack-probe-size into another function with a different stack-probe-size. llvm-svn: 225360	2015-01-07 18:14:07 +00:00
Tom Stellard	bb763e6b47	R600/SI: Refactor SIFoldOperands to simplify immediate folding This will make a future patch much less intrusive. llvm-svn: 225358	2015-01-07 17:42:16 +00:00
Ahmed Bougacha	aa2d290997	[X86] Teach FCOPYSIGN lowering to recognize constant magnitudes. For code like: float foo(float x) { return copysign(1.0, x); } We used to generate: andps <-0.000000e+00,0,0,0>, %xmm0 movss <1.000000e+00>, %xmm1 andps <nan>, %xmm1 orps %xmm0, %xmm1 Basically doing an abs(1.0f) in the two middle instructions. We now generate: andps <-0.000000e+00,0,0,0>, %xmm0 orps <1.000000e+00,0,0,0>, %xmm0 Builds on cleanups r223415, r223542. rdar://19049548 Differential Revision: http://reviews.llvm.org/D6555 llvm-svn: 225357	2015-01-07 17:33:03 +00:00
Jonas Paulsson	fcf0cba88c	New method SDep::isNormalMemoryOrBarrier() in ScheduleDAGInstrs.cpp. Used to iterate over previously added memory dependencies in adjustChainDeps() and iterateChainSucc(). SDep::isCtrl() was previously used in these places, that also gave anti and output edges. The code may be worse if these are followed, because MisNeedChainEdge() will conservatively return true since a non-memory instruction has no memory operands, and a false chain dep will be added. It is also unnecessary since all memory accesses of interest will be reached by memory dependencies, and there is a budget limit for the number of edges traversed. This problem was found on an out-of-tree target with enabled alias analysis. No test case for an in-tree target has been found. Reviewed by Hal Finkel. llvm-svn: 225351	2015-01-07 13:38:29 +00:00
Jonas Paulsson	bf408bbe38	Fix typos in comment and option help texts. For -enable-aa-sched-mi and -use-tbaa-in-sched-mi. llvm-svn: 225350	2015-01-07 13:20:57 +00:00
Asiri Rathnayake	77436f848f	Fix regression in r225266. The change in r225266 was reviewed under D6722. But the commit r225266 has a typo, causing some MCHammer failures. This patch fixes it. Change-Id: I573efcff25003af7478ac02548ebbe929fc7f5fd llvm-svn: 225347	2015-01-07 11:22:58 +00:00
Craig Topper	39354e1b1a	[X86] Merge a switch statement inside a default case of another switch statement on the same variable. There was no additional code in the default so this should be no functional change. llvm-svn: 225345	2015-01-07 08:10:38 +00:00
Craig Topper	8b3c47ca57	[X86] Don't mark the shift by 1 instructions as isConvertibleToThreeAddress. There is no handling for them. llvm-svn: 225344	2015-01-07 08:10:36 +00:00
Craig Topper	23fa478709	[X86] Remove some unused TYPE enums from the disassembler. llvm-svn: 225343	2015-01-07 07:47:52 +00:00
Karthik Bhat	9ba55334dc	Revert r225165 and r225169 Even though gcc produces similar instructions, as Owen pointed out, the two patterns aren’t equivalent in the case where the original subtraction could have caused an overflow. Reverting the same. llvm-svn: 225341	2015-01-07 06:34:34 +00:00
Chandler Carruth	fdb4180514	[PM] Fix a pretty nasty bug where the new pass manager would invalidate passes too many times. I think this is actually the issue that someone raised with me at the developer's meeting and in an email, but that we never really got to the bottom of. Having all the testing utilities made it much easier to dig down and uncover the core issue. When a pass manager is running many passes over a single function, we need it to invalidate the analyses between each run so that they can be re-computed as needed. We also need to track the intersection of preserved higher-level analyses across all the passes that we run (for example, if there is one module analysis which all the function analyses preserve, we want to track that and propagate it). Unfortunately, this interacted poorly with any enclosing pass adaptor between two IR units. It would see the intersection of preserved analyses, and need to invalidate any other analyses, but some of the un-preserved analyses might have already been invalidated and recomputed! We would fail to propagate the fact that the analysis had already been invalidated. The solution to this struck me as really strange at first, but the more I thought about it, the more natural it seemed. After a nice discussion with Duncan about it on IRC, it seemed even nicer. The idea is that invalidating an analysis causes it to be preserved! Preserving the lack of result is trivial. If it is recomputed, great. Until something else invalidates it again, we're good. The consequence of this is that the invalidate methods on the analysis manager which operate over many passes now consume their PreservedAnalyses object, update it to "preserve" every analysis pass to which it delivers an invalidation (regardless of whether the pass chooses to be removed, or handles the invalidation itself by updating itself). 
Then we return this augmented set from the invalidate routine, letting the pass manager take the result and use the intersection of that across each pass run to compute the final preserved set. This accounts for all the places where the early invalidation of an analysis has already "preserved" it for a future run. I've beefed up the testing and adjusted the assertions to show that we no longer repeatedly invalidate or compute the analyses across nested pass managers. llvm-svn: 225333	2015-01-07 01:58:35 +00:00
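
Commit 933de7aa06 above describes aligning the stack pointer by zeroing its low bits, either with one bit-clear instruction or, as a fallback, with a right shift followed by a left shift. The following is a minimal C++ sketch of that arithmetic only, not of the ARM lowering itself; the function names are hypothetical and 32-bit values with `bits` below 32 are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Clear the low `bits` bits of `sp` with a single AND-NOT mask. This is the
// effect of a bit-clear instruction when the mask is encodable.
// Assumption for this sketch: 0 <= bits < 32.
constexpr std::uint32_t alignDownWithMask(std::uint32_t sp, unsigned bits) {
  return sp & ~((std::uint32_t{1} << bits) - 1u);
}

// Same result with the two-instruction fallback from the commit message:
// a right shift drops the low bits, a left shift restores the magnitude.
constexpr std::uint32_t alignDownWithShifts(std::uint32_t sp, unsigned bits) {
  return (sp >> bits) << bits;
}

// Hypothetical self-check: both forms agree for a 4096-byte alignment.
inline void checkAlignDown() {
  assert(alignDownWithMask(0x12345678u, 12) == 0x12345000u);
  assert(alignDownWithShifts(0x12345678u, 12) == 0x12345000u);
}
```

Both helpers compute the same value; which instruction sequence a backend picks depends on what the target can encode, as the commit message explains.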

75437 Commits
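
Commit aa2d290997 above notes that `copysign(1.0, x)` needs only two bitwise operations once the magnitude is a known constant: mask out the sign of `x`, then OR in the constant. The sketch below illustrates that idea on scalar `float` bit patterns in C++. It assumes IEEE-754 binary32 `float`, the helper names are hypothetical, and it is not the X86 lowering code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its 32-bit pattern (assumes IEEE-754 binary32).
inline std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Reinterpret a 32-bit pattern as a float.
inline float floatOf(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// copysign(1.0f, x) as one AND and one OR: keep only the sign bit of x,
// then OR in the bits of the constant 1.0f, whose sign bit is already clear.
inline float copysignOne(float x) {
  const std::uint32_t signMask = 0x80000000u; // bit pattern of -0.0f
  return floatOf((bitsOf(x) & signMask) | bitsOf(1.0f));
}

// Hypothetical self-check covering both signs.
inline void checkCopysignOne() {
  assert(copysignOne(-3.5f) == -1.0f);
  assert(copysignOne(2.0f) == 1.0f);
}
```

Because the constant magnitude is non-negative, there is no need to clear its sign bit at run time, which is the step the commit message says the old code spent two instructions on.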