Commit Graph

377 Commits

Mehdi Amini 99eab3dd06 Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.
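
A minimal sketch of the runtime mechanism this relies on (hedged: the
demo function and exact call sites are illustrative, not code from this
commit):

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/LLVMContext.h"

  void demo() {
    llvm::LLVMContext Ctx;
    // One runtime flag on the context replaces the compile-time
    // PreserveNames template parameter: when set, names passed to the
    // builder's Create* calls are simply dropped.
    Ctx.setDiscardValueNames(true);
    llvm::IRBuilder<> Builder(Ctx);
  }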

Reviewers: chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18023

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
2016-03-11 17:15:50 +00:00
Mehdi Amini 1e9c925182 Do not specialize IRBuilder to strip names in SROA
Summary:
Following r263086, we are replacing this by a runtime check.
More cleanup will follow on the IRBuilder itself, but I submitted
this patch separately as SROA has a fancy "prefixInserter" class
that needs extra-love.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18022

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263256
2016-03-11 17:15:34 +00:00
Chandler Carruth b47f8010a9 [PM] Make the AnalysisManager parameter to run methods a reference.
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.

In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.
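
For illustration, a hedged sketch of the resulting run-method shape
(modern spellings: PassInfoMixin arrived slightly later than this
commit, and MyPass is a made-up name):

  #include "llvm/IR/Function.h"
  #include "llvm/IR/PassManager.h"

  struct MyPass : llvm::PassInfoMixin<MyPass> {
    // The AnalysisManager now arrives by reference: it can no longer be
    // null, which much of the pass manager already assumed anyway.
    llvm::PreservedAnalyses run(llvm::Function &F,
                                llvm::FunctionAnalysisManager &AM) {
      return llvm::PreservedAnalyses::all();
    }
  };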

llvm-svn: 263219
2016-03-11 11:05:24 +00:00
Chandler Carruth 37f1f12226 [SROA] Fix PR25873, which Andrea Di Biagio analyzed the daylights out
of, and I misdiagnosed for months and months.

Andrea has had a patch for this forever, but I just couldn't see how
it was fixing the root cause of the problem. It didn't make sense to me,
even though the patch was perfectly good and the analysis of the actual
failure event was *fantastic*.

Well, I came back to it today because the patch has sat for *far* too
long and needs attention and decided I wouldn't let it go until I really
understood what was going on. After quite some time in the debugger,
I finally realized that in fact I had just missed an important case with
my previous attempt to fix PR22093 in r225149. Not only do we need to
handle loads that won't be split, but stores-of-loads that we won't
split. We *do* actually have enough logic in the presplitting to form
new slices for split stores.... *unless* we decided not to split them!

I'm so sorry that it took me this long to come to the realization that
this is the issue. It seems so obvious in hindsight (of course).
Anyways, the fix becomes *much* smaller and more focused. The fact that
we're left doing integer smashing is related to the FIXME in my original
commit: fundamentally, we're not aggressive about pre-splitting for
loads and stores to the same alloca. If we want to get aggressive about
this, it'll need both what Andrea had put into the proposed fix, but
also a *lot* more logic to essentially iteratively pre-split the alloca
until we can't do any more. As I said in that commit log, it's really
unclear that this is the right call. Instead, the integer blending and
letting targets lower this to narrower stores seems slightly better. But
we definitely shouldn't really go down that path just to fix this bug.

Again, tons of thanks are owed to Andrea and others at Sony for working
on this bug. I really should have seen what was going on here and
re-directed them sooner. =////

llvm-svn: 263121
2016-03-10 15:31:17 +00:00
Chandler Carruth d94a5962cc [SROA] Clean up some really weird code, no functionality changed.
We already have the instruction extracted into 'I', just cast that to
a store the way we do for loads. Also, we don't enter the if unless SI
is non-null, so don't test it again for null.

I'm pretty sure the entire test there can be nuked, but this is just the
trivial cleanup.

llvm-svn: 263112
2016-03-10 14:16:18 +00:00
Artur Pilipenko aba8fdc480 Fix buildbot failure introduced by r258010. Remove local variables that became unused.
llvm-svn: 258011
2016-01-17 12:59:40 +00:00
Artur Pilipenko f84dc06e5b Push isDereferenceableAndAlignedPointer down into isSafeToLoadUnconditionally
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D16226

llvm-svn: 258010
2016-01-17 12:35:29 +00:00
Artur Pilipenko 6dd6969cee Change isSafeToLoadUnconditionally arguments order. Separated from http://reviews.llvm.org/D10920.
llvm-svn: 257894
2016-01-15 15:27:46 +00:00
Keno Fischer d5354fdddb [SROA] Also insert a bit piece expression if only one piece is needed
Summary: If SROA creates only one piece (e.g. because the other is not needed),
it still needs to create a bit_piece expression if that bit piece is smaller
than the original size of the alloca.
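
A hedged sketch of the call involved (this is the API of that era; it
was later renamed createFragmentExpression, and the wrapper function
here is illustrative):

  #include "llvm/IR/DIBuilder.h"

  llvm::DIExpression *pieceFor(llvm::DIBuilder &DIB, unsigned OffsetInBits,
                               unsigned SizeInBits) {
    // Even a single surviving piece needs a bit-piece expression when
    // it is narrower than the original alloca.
    return DIB.createBitPieceExpression(OffsetInBits, SizeInBits);
  }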

Reviewers: aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16187

llvm-svn: 257795
2016-01-14 20:06:34 +00:00
Sanjay Patel af674fbfd9 getParent() ^ 3 == getModule() ; NFCI
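
A hedged illustration of the cleanup (the wrapper function is made up;
Instruction::getModule() is the real accessor):

  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Module.h"

  llvm::Module *moduleOf(llvm::Instruction *I) {
    // Instruction -> BasicBlock -> Function -> Module, spelled directly
    // instead of I->getParent()->getParent()->getParent().
    return I->getModule();
  }
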
llvm-svn: 255511
2015-12-14 17:24:23 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Benjamin Kramer 6db3338cb1 [ScalarOpts] Remove dead code.
Does not touch debug dumpers. NFC.

llvm-svn: 250417
2015-10-15 15:08:58 +00:00
Duncan P. N. Exon Smith be4d8cba1c Scalar: Remove remaining ilist iterator implicit conversions
Remove remaining `ilist_iterator` implicit conversions from
LLVMScalarOpts.

This change exposed some scary behaviour in
lib/Transforms/Scalar/SCCP.cpp around line 1770.  This patch changes a
call from `Function::begin()` to `&Function::front()`, since the return
was immediately being passed into another function that takes a
`Function*`.  `Function::front()` started to assert, since the function
was empty.  Note that `Function::end()` does not point at a legal
`Function*` -- it points at an `ilist_half_node` -- so the other
function was getting garbage before.  (I added the missing check for
`Function::isDeclaration()`.)
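
A hedged sketch of the pattern (useBlock and the guard are
illustrative, not the actual SCCP.cpp code):

  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"

  void useBlock(llvm::BasicBlock *BB);  // illustrative callee

  void demo(llvm::Function &F) {
    // F.begin() is an ilist_iterator and no longer converts implicitly
    // to a pointer. Take the address of the first element instead, and
    // only for functions that actually have a body.
    if (!F.isDeclaration())
      useBlock(&F.front());
  }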

Otherwise, no functionality change intended.

llvm-svn: 250211
2015-10-13 19:26:58 +00:00
Chandler Carruth 29a18a4663 [PM] Port SROA to the new pass manager.
In some ways this is a very boring port to the new pass manager as there
are no interesting analyses or dependencies or other oddities.

However, this does introduce the first good example of a transformation
pass with non-trivial state porting to the new pass manager. I've tried
to carve out patterns here to replicate elsewhere, and would appreciate
comments on whether folks like these patterns:

- A common need in the new pass manager is to effectively lift the pass
  class and some of its state into a public header file. Prior to this,
  LLVM used anonymous namespaces to provide "module private" types and
  utilities, but that doesn't scale to cases where a public header file
  is needed and the new pass manager will exacerbate that. The pattern
  I've adopted here is to use the namespace-cased-name of the core pass
  (what would be a module if we had them) as a module-private namespace.
  Then utility and other code can be declared and defined in this
  namespace. At some point in the future, we could even have
  (conditionally compiled) code that used modules features when
  available to do the same basic thing.

- I've split the actual pass run method in two in order to expose
  a private method usable by the old pass manager to wrap the new class
  with a minimum of duplicated code. I actually looked at a bunch of
  ways to automate or generate these, but they are all quite terrible
  IMO. The fundamental need is to extract the set of analyses which need
  to cross this interface boundary, and that will end up being too
  unpredictable to effectively encapsulate IMO. This is also
  a relatively small amount of boilerplate that will live a relatively
  short time, so I'm not too worried about the fact that it is
  boilerplate (a hedged sketch of the wrapper pattern follows below).
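
A hedged sketch of that wrapper pattern (names and the analysis set are
illustrative; the real pass carries more state):

  #include "llvm/IR/Dominators.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/PassManager.h"
  #include "llvm/Pass.h"

  class SROA {
  public:
    llvm::PreservedAnalyses run(llvm::Function &F,
                                llvm::FunctionAnalysisManager &AM) {
      auto &DT = AM.getResult<llvm::DominatorTreeAnalysis>(F);
      return runImpl(F, DT);
    }

  private:
    friend class SROALegacyPass;
    // Shared entry point: the analyses are already extracted, so both
    // pass managers reach the same logic with minimal duplication.
    llvm::PreservedAnalyses runImpl(llvm::Function &F,
                                    llvm::DominatorTree &DT) {
      // ... the actual transformation would live here ...
      return llvm::PreservedAnalyses::all();
    }
  };

  class SROALegacyPass : public llvm::FunctionPass {
    SROA Impl;  // the lifted pass state

  public:
    static char ID;
    SROALegacyPass() : llvm::FunctionPass(ID) {}

    bool runOnFunction(llvm::Function &F) override {
      auto &DT =
          getAnalysis<llvm::DominatorTreeWrapperPass>().getDomTree();
      return !Impl.runImpl(F, DT).areAllPreserved();
    }
  };

  char SROALegacyPass::ID = 0;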

The rest of the patch is totally boring but results in a massive diff
(sorry). It just moves code around and removes or adds qualifiers to
reflect the new name and nesting structure.

Differential Revision: http://reviews.llvm.org/D12773

llvm-svn: 247501
2015-09-12 09:09:14 +00:00
James Molloy efbba72cb2 Add GlobalsAA as preserved to a bunch of transforms
GlobalsAA must by definition be preserved in function passes, but the passmanager doesn't know that. Make each pass explicitly preserve GlobalsAA.

llvm-svn: 247263
2015-09-10 10:22:12 +00:00
Chandler Carruth 4b682f6f24 [SROA] Fix PR24463, a crash I introduced in SROA by allowing it to
handle more allocas with loads past the end of the alloca.

I suspect there are some related crashers with slightly different
patterns, but I'll fix those and add test cases as I find them.

Thanks to David Majnemer for the excellent test case reduction here.
Made this super simple to debug and fix.

llvm-svn: 246289
2015-08-28 09:03:52 +00:00
Chandler Carruth 748d095ff0 [SROA] Rip out all support for SSAUpdater in SROA.
This was only added to preserve the old ScalarRepl's use of SSAUpdater
which was originally to avoid use of dominance frontiers. Now, we only
need a domtree, and we'll need a domtree right after this pass as well
and so it makes perfect sense to always and only use the dom-tree
powered mem2reg. This was flag-flipped earlier and has stuck reasonably
so I wanted to gut the now-dead code out of SROA before we waste more
time with it. Among other things, this will make passmanager porting
easier.

llvm-svn: 246028
2015-08-26 09:09:29 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Chandler Carruth ccffdaf7ed [SROA] Fix a nasty pile of bugs to do with big-endian, different alloca
types and loads, loads or stores widened past the size of an alloca,
etc.

This started off with a bug report about big-endian behavior with
bitfields and loads and stores to a { i32, i24 } struct. An initial
attempt to fix this was sent for review in D10357, but that didn't
really get to the root of the problem.

The core issue was that canConvertValue and convertValue in SROA were
handling different bitwidth integers by doing a zext of the integer. It
wouldn't do a trunc though, only a zext! This would in turn lead SROA to
form an i24 load from an i24 alloca, zext it to i32, and then use it.
This would at least produce the wrong value for big-endian systems.

One of my many false starts here was to correct the computation for
big-endian systems by shifting. But this doesn't actually work because
the original code has a 64-bit store to the entire 8 bytes, and a 32-bit
load of the last 4 bytes, and because the alloc size is 8 bytes, we
can't lose that last (least significant if big-endian) byte! The real
problem here is that we're forming an i24 load in SROA which is actually
not sufficiently wide to load all of the necessary bits here. The source
has an i32 load, and SROA needs to form that as well.

The straightforward way to do this is to disable the zext logic in
canConvertValue and convertValue, forcing us to actually load all
32-bits. This seems like a really good change, but it in turn breaks
several other parts of SROA.

First in the chain of knock-on failures, we had places where we were
doing integer-widening promotion even though some of the integer loads
or stores extended *past the end* of the alloca's memory! There was even
a comment about preventing this, but it only prevented the case where
the type had a different bit size from its store size. So I added checks
to handle the cases where we actually have a widened load or store and
to avoid trying to special integer widening promotion in those cases.

Second, we actually rely on the ability to promote in the face of loads
past the end of an alloca! This is important so that we can (for
example) speculate loads around PHI nodes to do more promotion. The bits
loaded are garbage, but as long as they aren't used and the alignment is
suitably high (which it wasn't in the test case!) this is "fine". And we
can't stop promoting here, lots of things stop working well if we do. So
we need to add specific logic to handle the extension (and truncation)
case, but *only* where that extension or truncation are over bytes that
*are outside the alloca's allocated storage* and thus totally bogus to
load or store.

And of course, once we add back this correct handling of extension or
truncation, we need to correctly handle big-endian systems to avoid
re-introducing the exact bug that started us off on this chain of misery
in the first place, but this time even more subtle as it only happens
along speculated loads atop a PHI node.

I've ported an existing test for PHI speculation to the big-endian test
file and checked that we get that part correct, and I've added several
more interesting big-endian test cases that should help check that we're
getting this correct.

Fun times.

llvm-svn: 242869
2015-07-22 03:32:42 +00:00
David Majnemer 62690b1952 [SROA] Don't de-atomic volatile loads and stores
Volatile loads and stores are made visible in global state regardless of
what memory is involved.  It is not correct to disregard the ordering
and synchronization scope because it is possible to synchronize with
memory operations performed by hardware.

This partially addresses PR23737.

llvm-svn: 242126
2015-07-14 06:19:58 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Pete Cooper 7c4d7b8fbe Construct ArrayRef<const T*> from vector<T>
ArrayRef already has a SFINAE constructor which can construct ArrayRef<const T*> from ArrayRef<T*>.

This adds methods to do the same directly from SmallVector and std::vector.  This avoids an intermediate step through the use of makeArrayRef.

Also update the users of this in LICM and SROA to remove the now unnecessary makeArrayRef call.
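
A hedged sketch of the new direct construction (the element type and
callee are illustrative):

  #include "llvm/ADT/ArrayRef.h"
  #include <vector>

  void takesConst(llvm::ArrayRef<const int *> Xs);  // illustrative

  void demo(std::vector<int *> &V) {
    // Previously: takesConst(llvm::makeArrayRef(V)); the new SFINAE
    // constructor builds ArrayRef<const T*> straight from vector<T*>.
    takesConst(V);
  }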

Reviewed by David Blaikie.

llvm-svn: 237309
2015-05-13 22:43:09 +00:00
Pete Cooper 41e0ee3074 Change LoadAndStorePromoter to take ArrayRef instead of SmallVectorImpl&.
The array passed to LoadAndStorePromoter's constructor was a constant reference to a SmallVectorImpl, which is just the same as passing an ArrayRef.

Also, the data in the array can be 'const Instruction*' instead of 'Instruction*'.  It's not possible to convert a SmallVectorImpl<T*> to SmallVectorImpl<const T*>, but ArrayRef does provide such a method.

This currently adds calls to makeArrayRef, which should be a no-op, but I'm going to kick off a discussion about improving ArrayRef to not need these.

llvm-svn: 237226
2015-05-13 01:12:16 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Philip Reames 5461d45abf Move Value.isDereferenceablePointer to ValueTracking [NFC]
Move isDereferenceablePointer function to Analysis. This function recursively tracks dereferencability over a chain of values like other functions in ValueTracking.

This refactoring is motivated by further changes to support dereferenceable_or_null attribute (http://reviews.llvm.org/D8650). isDereferenceablePointer will be extended to perform context-sensitive analysis and IR is not a good place to have such functionality.

Patch by: Artur Pilipenko <apilipenko@azulsystems.com>
Differential Revision: http://reviews.llvm.org/D9075

llvm-svn: 235611
2015-04-23 17:36:48 +00:00
Duncan P. N. Exon Smith 60635e39b6 DebugInfo: Drop rest of DIDescriptor subclasses
Delete the remaining subclasses of (the already deleted) `DIDescriptor`.
Part of PR23080.

llvm-svn: 235404
2015-04-21 18:44:06 +00:00
Duncan P. N. Exon Smith cd1aecfe36 DebugInfo: Require a DebugLoc in DIBuilder::insertDeclare()
Change `DIBuilder::insertDeclare()` and `insertDbgValueIntrinsic()` to
take an `MDLocation*`/`DebugLoc` parameter which it attaches to the
created intrinsic.  Assert at creation time that the `scope:` field's
subprogram matches the variable's.  There's a matching `clang` commit to
use the API.
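
A hedged sketch of the new requirement (modern class names; the
location type was spelled MDLocation when this landed):

  #include "llvm/IR/DIBuilder.h"
  #include "llvm/IR/DebugInfoMetadata.h"

  void declareVar(llvm::DIBuilder &DIB, llvm::Value *Storage,
                  llvm::DILocalVariable *Var, llvm::DILocation *Loc,
                  llvm::BasicBlock *BB) {
    // The debug location is now a required argument; it is attached to
    // the emitted llvm.dbg.declare, and its scope must agree with the
    // variable's subprogram (asserted at creation time).
    DIB.insertDeclare(Storage, Var, DIB.createExpression(), Loc, BB);
  }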

The context for this is PR22778, which is removing the `inlinedAt:`
field from `MDLocalVariable`, instead deferring to the `!dbg` location
attached to the debug info intrinsic.  The best way to ensure we always
have a `!dbg` attachment is to require one at creation time.  I'll be
adding verifier checks next, but this API change is the best way to
shake out frontend bugs.

Note: I added an `llvm_unreachable()` in `bindings/go` and passed in
`nullptr` for the `DebugLoc`.  The `llgo` folks will eventually need to
pass a valid `DebugLoc` here.

llvm-svn: 235041
2015-04-15 21:18:07 +00:00
Duncan P. N. Exon Smith 6a0320a991 DebugInfo: Gut DIExpression
Completely gut `DIExpression`, turning it into a simple wrapper around
`MDExpression *`.  There are two bits of magic left:

  - It's constructed from `const MDExpression*` but convertible to
    `MDExpression*`.
  - It's default-constructed to `nullptr`.

Otherwise, it should behave quite like a raw pointer.  Once I've done
the same to the rest of the `DIDescriptor` subclasses, I'll come back to
delete them entirely (and update call sites as necessary to deal with
the missing magic).

llvm-svn: 234832
2015-04-14 01:12:42 +00:00
David Blaikie aa41cd57e0 [opaque pointer type] More GEP IRBuilder API migrations...
llvm-svn: 234058
2015-04-03 21:33:42 +00:00
Mehdi Amini a28d91d81b DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointers to DataLayout into references as possible;
this helped in figuring out all the places where a nullptr could come up.
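
For illustration, the by-reference access pattern this moves toward (a
hedged sketch; exact signatures changed over many commits):

  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/Module.h"

  unsigned pointerBits(llvm::Module &M) {
    // The DataLayout is reached by reference: no nullptr checks needed.
    const llvm::DataLayout &DL = M.getDataLayout();
    return DL.getPointerSizeInBits();
  }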

I initially had a local version of this patch broken into over 30
independent commits, but some later commits were cleaning the API and
touching parts of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Mehdi Amini 46a43556db Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never return nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
Benjamin Kramer 4f6ac16292 Replace std::copy with a back inserter with vector append where feasible
All of the cases were just appending from random access iterators to a
vector. Using insert/append can grow the vector to the perfect size
directly and moves the growing out of the loop. No intended functionality
change.
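
A hedged sketch of the pattern (SmallVector shown; std::vector's range
insert behaves the same way):

  #include "llvm/ADT/SmallVector.h"

  void copyAll(const llvm::SmallVectorImpl<int> &Src,
               llvm::SmallVectorImpl<int> &Dst) {
    // Before: std::copy(Src.begin(), Src.end(), std::back_inserter(Dst))
    // grows the vector one element at a time. One append sizes it
    // correctly up front:
    Dst.append(Src.begin(), Src.end());
  }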

llvm-svn: 230845
2015-02-28 10:11:12 +00:00
Adrian Prantl 34e7590e0d Debug info: When updating debug info during SROA, do not emit debug info
for any padding introduced by SROA. In particular, do not emit debug info
for an alloca that represents only the padding introduced by a previous
iteration.

Fixes PR22495.

llvm-svn: 228632
2015-02-09 23:57:22 +00:00
Adrian Prantl 27bd01f71c Debug info: Use DW_OP_bit_piece instead of DW_OP_piece in the
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.

Part of PR22495.

llvm-svn: 228631
2015-02-09 23:57:15 +00:00
Adrian Prantl 152ac396db Fix PR22393. When recursively replacing an aggregate with a smaller
aggregate or scalar, the debug info needs to refer to the absolute offset
(relative to the entire variable) instead of storing the offset inside
the smaller aggregate.

llvm-svn: 227702
2015-02-01 00:58:04 +00:00
Adrian Prantl 565cc18d8f Reapply: Teach SROA how to update debug info for fragmented variables.
This reapplies r225379.

ChangeLog:
- The assertion that this commit previously ran into about the inability
  to handle indirect variables has since been removed and the backend
  can handle this now.
- Testcases were upgrade to the new MDLocation format.
- Instead of keeping a DebugDeclares map, we now use
  llvm::FindAllocaDbgDeclare().

Original commit message follows.

Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as

 typedef struct { long int a; int b;} S;

 int foo(S s) {
   return s.b;
 }

which at -O1 on x86_64 is codegen'd into

 define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
   ret i32 %s.coerce1, !dbg !24
 }

with this patch we emit the following debug info for this

 TAG_formal_parameter [3]
   AT_location( 0x00000000
                0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                AT_name( "s" )
                AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )

Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!

llvm-svn: 226598
2015-01-20 19:42:22 +00:00
Adrian Prantl 2561bb8831 Revert "Reapply: Teach SROA how to update debug info for fragmented variables."
This reverts commit r225379 while investigating an assertion failure reported
by Alexey.

llvm-svn: 225424
2015-01-08 02:02:00 +00:00
Adrian Prantl 72b8ee708f Reapply: Teach SROA how to update debug info for fragmented variables.
The two buildbot failures were addressed in LLVM r225378 and CFE r225359.

This reapplies commit r225272 without modifications.

llvm-svn: 225379
2015-01-07 20:52:22 +00:00
Adrian Prantl 52f943b536 Revert "Reapply: Teach SROA how to update debug info for fragmented variables."
because of a tsan buildbot failure.
This reverts commit r225272.

Fix should be coming soon.

llvm-svn: 225288
2015-01-06 19:47:27 +00:00
Adrian Prantl 8335a5724a Reapply: Teach SROA how to update debug info for fragmented variables.
This also rolls in the changes discussed in http://reviews.llvm.org/D6766.
Defers migrating the debug info for new allocas until after all partitions
are created.

Thanks to Chandler for reviewing!

llvm-svn: 225272
2015-01-06 17:14:10 +00:00
Chandler Carruth 73b0164fe5 [SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an
assert out of the new pre-splitting in SROA.

This fix makes the code do what was originally intended -- when we have
a store of a load both dealing in the same alloca, we force them to both
be pre-split with identical offsets. This is really quite hard to do
because we can keep discovering problems as we go along. We have to
track every load over the current alloca which for any reason becomes
invalid for pre-splitting, and go back to remove all stores of those
loads. I've included a couple of test cases derived from PR22093 that
cover the different ways this can happen. While that PR only really
triggered the first of these two, it's the same fundamental issue.

The other challenge here is documented in a FIXME now. We end up being
quite a bit more aggressive for pre-splitting when loads and stores
don't refer to the same alloca. This aggressiveness comes at the cost of
introducing potentially redundant loads. It isn't clear that this is the
right balance. It might be considerably better to require that we only
do pre-splitting when we can presplit every load and store involved in
the entire operation. That would give more consistent if conservative
results. Unfortunately, it requires a non-trivial change to the actual
pre-splitting operation in order to correctly handle cases where we end
up pre-splitting stores out-of-order. And it isn't 100% clear that this
is the right direction, although I'm starting to suspect that it is.

llvm-svn: 225149
2015-01-05 04:17:53 +00:00
Chandler Carruth 66b3130cda [PM] Split the AssumptionTracker immutable pass into two separate APIs:
a cache of assumptions for a single function, and an immutable pass that
manages those caches.

The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating a separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.

Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.

For now, this adds boilerplate to the various passes, but this kind of
boilerplate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.

llvm-svn: 225131
2015-01-04 12:03:27 +00:00
Chandler Carruth 24ac830d7c [SROA] Teach SROA to be more aggressive in splitting now that we have
a pre-splitting pass over loads and stores.

Historically, splitting could cause enough problems that I hamstrung the
entire process with a requirement that splittable integer loads and
stores must cover the entire alloca. All smaller loads and stores were
unsplittable to prevent chaos from ensuing. With the new pre-splitting
logic that does load/store pair splitting I introduced in r225061, we
can now very nicely handle arbitrarily splittable loads and stores. In
order to fully benefit from these smarts, we need to mark all of the
integer loads and stores as splittable.

However, we don't actually want to rewrite partitions with all integer
loads and stores marked as splittable. This will fail to extract scalar
integers from aggregates, which is kind of the point of SROA. =] In
order to resolve this, what we really want to do is only do
pre-splitting on the alloca slices with integer loads and stores fully
splittable. This allows us to uncover all non-integer uses of the alloca
that would benefit from a split in an integer load or store (and where
introducing the split is safe because it is just memory transfer from
a load to a store). Once done, we make all the non-whole-alloca integer
loads and stores unsplittable just as they have historically been,
repartition and rewrite.

The result is that when there are integer loads and stores anywhere
within an alloca (such as from a memcpy of a sub-object of a larger
object), we can split them up if there are non-integer components to the
aggregate hiding beneath. I've added the challenging test cases to
demonstrate how this is able to promote to scalars even a case where we
have even *partially* overlapping loads and stores.

This restores the single-store behavior for small arrays of i8s which is
really nice. I've restored both the little endian testing and big endian
testing for these exactly as they were prior to r225061. It also forced
me to be more aggressive in an alignment test to actually defeat SROA.
=] Without the added volatiles there, we actually split up the weird i16
loads and produce nice double allocas with better alignment.

This also uncovered a number of bugs where we failed to handle
splittable load and store slices which didn't have a beginning offset of
zero. Those fixes are included, and without them the existing test cases
explode in glorious fireworks. =]

I've kept support for leaving whole-alloca integer loads and stores as
splittable even for the purpose of rewriting, but I think that's likely
no longer needed. With the new pre-splitting, we might be able to remove
all the splitting support for loads and stores from the rewriter. Not
doing that in this patch to try to isolate any performance regressions
that causes in an easy to find and revert chunk.

llvm-svn: 225074
2015-01-02 03:55:54 +00:00
Chandler Carruth 5986b541d4 [SROA] Make the computation of adjusted pointers not leak GEP
instructions.

I noticed this when working on dialing up how aggressively we can
pre-split loads and stores. My test case wasn't passing because dead
GEPs into the allocas persisted when they were built by this routine.
This isn't terribly harmful, we still rewrote and promoted the alloca
and I can't conceive of how to cause this to happen in a case where we
will keep the exact same alloca but rewrite and promote the uses of it.
If that ever happened, we'd get an assert out of mem2reg.

So I don't have a direct test case yet, but the subsequent commit's test
case wouldn't pass without this. There are other problems fixed by this
patch that I spotted purely by inspection such as the fact that
getAdjustedPtr could have actually deleted dead base pointers. I don't
know how to get a base pointer to go into getAdjustedPtr today, so
I think this bug could never have manifested (and I certainly can't
write a test case for it), but it wasn't the intent of the code. The
code really just wanted to GC the new instructions built. That can be
done more directly by comparing with the base pointer which is the only
non-new instruction that this code can return.

llvm-svn: 225073
2015-01-02 02:47:38 +00:00
Chandler Carruth 29c22fae46 [SROA] Fix the loop exit placement to be prior to indexing the splits
array. This prevents it from walking out of bounds on the splits array.

Bug found with the existing tests by ASan and by the MSVC debug build.

llvm-svn: 225069
2015-01-02 00:10:22 +00:00
Chandler Carruth c39eaa5041 [SROA] Fix two total think-os in r225061 that should have been caught on
a +asserts bootstrap, but my bootstrap had asserts off. Oops.

Anyways, in some places it is reasonable to cast (as a sanity check) the
pointer operand to a load or store to an instruction within SROA --
namely when the pointer operand is expected to be derived from an
alloca, and thus always an instruction. However, the pre-splitting code
also deals with loads and stores to non-alloca pointers and there we
need to just use the Value*. Nothing about the code relied on the
instruction cast, it was only there essentially as an invariant
assertion. Remove the two that don't actually hold.

This should fix the proximate issue in PR22080, but I'm also doing an
asserts bootstrap myself to see if there are other issues lurking.

I'll craft a reduced test case in a moment, but I wanted to get the tree
healthy as quickly as possible.

llvm-svn: 225068
2015-01-01 23:26:16 +00:00
Chandler Carruth 6044c0bc78 [SROA] Switch to using a more direct debug logging technique in one part
of my new load and store splitting, and fix a bug where it logged
a totally irrelevant slice rather than the actual slice in question.

The logging here previously worked because we used to place new slices
onto the back of the core sequence, but that caused other problems.
I updated the actual code to store new slices in their own vector but
didn't update the logging. There isn't a good way to reuse the logging
any more, and frankly it wasn't needed. We can directly log this bit
more easily.

llvm-svn: 225063
2015-01-01 12:56:47 +00:00
Chandler Carruth 994cde8869 [SROA] Fix formatting with clang-format which I managed to fail to do
prior to committing r225061. Sorry for that.

llvm-svn: 225062
2015-01-01 12:01:03 +00:00
Chandler Carruth 0715cba02d [SROA] Teach SROA how to much more intelligently handle split loads and
stores.

When there are accesses to an entire alloca with an integer
load or store as well as accesses to small pieces of the alloca, SROA
splits up the large integer accesses. In order to do that, it uses bit
math to merge the small accesses into large integers. While this is
effective, it produces insane IR that can cause significant problems in
the rest of the optimizer:

- It can cause load and store mismatches with GVN on the non-alloca side
  where we end up loading an i64 (or some such) rather than loading
  specific elements that are stored.
- We can't always get rid of the integer bit math, which is why we can't
  always fix the loads and stores to work well with GVN.
- This is especially bad when we have operations that mix poorly with
  integer bit math such as floating point operations.
- It will block things like the vectorizer which might be able to handle
  the scalar stores that underlie the aggregate.

At the same time, we can't just directly split up these loads and stores
in all cases. If there is actual integer arithmetic involved on the
values, then using integer bit math is actually the perfect lowering
because we can often combine it heavily with the surrounding math.

The solution this patch provides is to find places where SROA is
partitioning aggregates into small elements, and look for splittable
loads and stores that it can split all the way to some other adjacent
load and store. These are uniformly the cases where failing to split the
loads and stores hurts the optimizer that I have seen, and I've looked
extensively at the code produced both from more and less aggressive
approaches to this problem.

However, it is quite tricky to actually do this in SROA. We may have
loads and stores to the same alloca, or other complex patterns that are
hard to handle. This complexity leads to the somewhat subtle algorithm
implemented here. We have to do this entire process as a separate pass
over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can handle safely the cases of
overlapping, including partially overlapping, loads and stores to the
same alloca. We also have to reconstitute the post-split slice
configuration so we can avoid iterating again over all the alloca uses
(the slow part of SROA). But we also have to ensure that when we split
up loads and stores to *other* allocas, we *do* re-iterate over them in
SROA to adapt to the more refined partitioning now required.

With this, I actually think we can fix a long-standing TODO in SROA
where I avoided splitting as many loads and stores as probably should be
splittable. This limitation historically mitigated the fallout of all
the bad things mentioned above. Now that we have more intelligent
handling, I plan to remove the FIXME and more aggressively mark integer
loads and stores as splittable. I'll do that in a follow-up patch to
help with bisecting any fallout.

The net result of this change should be more fine-grained and accurate
scalars being formed out of aggregates. At the very least, Clang now
generates perfect code for this high-level test case using
std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot
inner loops of essentially every part of the Eigen math library when
using std::complex<float>. Those loops would consistently and
pervasively hop between the floating point unit and the integer unit due
to bit math extraction and insertion of floating point values that were
"stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and I have done some other
testing and so far, no issues. That doesn't mean there won't be though,
so I'll be prepared to help with any fallout. If you see performance swings
in particular, please let me know. I'm very curious what all the impact
of this change will be. Stay tuned for the follow-up to also split more
integer loads and stores.

llvm-svn: 225061
2015-01-01 11:54:38 +00:00
Chandler Carruth ffb7ce56a6 [SROA] Update the documentation and names for accessing the slices
within a partition of an alloca in SROA.

This reflects the fact that the organization of the slices isn't really
ideal for analysis, but is the naive way in which the slices are
available while we're processing them in the core partitioning
algorithm.

It is possible we could improve matters, and I've left a FIXME with
one of my ideas for how to do this, but it is a lot of work, the benefit
is somewhat minor, and it isn't clear that it would be strictly better.
=/ Not really satisfying, but I'm out of really good ideas.

This also improves one place where the debug logging failed to mark some
split partitions. Now we log in one place, slightly later, and with
accurate information about whether the slice is split by the partition
being rewritten.

llvm-svn: 224800
2014-12-24 01:48:09 +00:00
Chandler Carruth 5031bbe86a [SROA] Refactor the integer and vector promotion testing logic to
operate in terms of the new Partition class, and generally have a more
clear set of arguments. No functionality changed.

The most notable improvements here are consistently using the
terminology of 'partition' for a collection of slices that will be
rewritten together and 'slice' for a region of an alloca that is used by
a particular instruction.

This also makes it more clear that the split things are actually slices
as well, just ones that will be split by the proposed partition.

This doesn't yet address the confusing aspects of the partition's
interface where slices that will be split by the partition and start
prior to the partition are accessed via Partition::splitSlices() while
the core range of slices exposed by a Partition includes both unsplit
slices and slices which will be split by the end, but started within the
offset range of the partition. This is particularly hard to address
because the algorithm which computes partitions quite literally doesn't
know which slices these will end up being until too late. I'm looking at
whether I can fix that or not, but I'm not optimistic. I'll update the
comments and/or names to further explain this either way. I've also
added one FIXME in this patch relating to this confusion so that I don't
forget about it.

llvm-svn: 224798
2014-12-24 01:05:14 +00:00
Chandler Carruth c7d1e24b34 Revert r224739: Debug info: Teach SROA how to update debug info for
fragmented variables.

This caused codegen to start crashing when we built somewhat large
programs with debug info and optimizations. 'check-msan' hit in, and
I suspect a bootstrap would as well. I mailed a test case to the
review thread.

llvm-svn: 224750
2014-12-23 02:58:14 +00:00
Chandler Carruth e2f66ceed9 [SROA] Lift the logic for traversing the alloca slices one partition at
a time into a partition iterator and a Partition class.

There is a lot of knock-on simplification that this enables, largely
stemming from having a Partition object to refer to in lots of helpers.
I've only done a minimal amount of that because enough stuff is changing
as-is in this commit.

This shouldn't change any observable behavior. I've worked hard to
preserve the *exact* traversal semantics which were originally present
even though some of them make no sense. I'll be changing some of this in
subsequent commits now that the logic is carefully factored into
a reusable place.

The primary motivation for this change is to break the rewriting into
phases in order to support more intelligent rewriting. For example, I'm
planning to change how split loads and stores are rewritten to remove
the significant overuse of integer bit packing in the resulting code and
allow more effective secondary splitting of aggregates. For any of this
to work, they have to share the exact traversal logic.

llvm-svn: 224742
2014-12-22 22:46:00 +00:00
Adrian Prantl a47ace5901 Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as

  typedef struct { long int a; int b;} S;

  int foo(S s) {
    return s.b;
  }

which at -O1 on x86_64 is codegen'd into

  define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
    ret i32 %s.coerce1, !dbg !24
  }

with this patch we emit the following debug info for this

  TAG_formal_parameter [3]
    AT_location( 0x00000000
                 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                 AT_name( "s" )
                 AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )

Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!

llvm-svn: 224739
2014-12-22 22:26:00 +00:00
Chandler Carruth 113dc64c67 [SROA] Run clang-format over the entire SROA pass as I wrote it before
much of the glory of clang-format, and now any time I touch it I risk
introducing formatting changes as part of a functional commit.

Also, clang-format is *way* better at formatting my code than I am.
Most of this is a huge improvement although I reverted a couple of
places where I hit a clang-format bug with lambdas that has been filed
but not (fully) fixed.

llvm-svn: 224666
2014-12-20 02:39:18 +00:00
Chandler Carruth 68ea415d04 [SROA] Cleanup - remove the use of std::mem_fun_ref nonsense and use
a lambda now that we have them.
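
A hedged sketch of the change (Slice here is an illustrative stand-in,
not the real SROA type):

  #include <algorithm>
  #include <vector>

  struct Slice { void kill() {} };  // illustrative

  void killAll(std::vector<Slice> &Slices) {
    // Pre-C++11: std::for_each(Slices.begin(), Slices.end(),
    //                          std::mem_fun_ref(&Slice::kill));
    // With lambdas:
    std::for_each(Slices.begin(), Slices.end(),
                  [](Slice &S) { S.kill(); });
  }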

llvm-svn: 224500
2014-12-18 05:19:47 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
and I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
David Majnemer c0a313b57c SROA: The alloca type isn't a candidate promotion type for vectors
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.

This manifested as an assertion failure when we compared the various
types: we had a size mismatch.

This fixes PR21480.

llvm-svn: 222499
2014-11-21 02:34:55 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This led to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
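
A hedged sketch of the resulting idiom (process() is illustrative):

  #include "llvm/ADT/SmallPtrSet.h"
  #include "llvm/IR/Value.h"

  void process(llvm::Value *V);  // illustrative

  void visitOnce(llvm::Value *V,
                 llvm::SmallPtrSet<llvm::Value *, 8> &Seen) {
    // insert() now returns std::pair<iterator, bool>, so "was it newly
    // inserted?" is spelled .second, matching std::set.
    if (Seen.insert(V).second)
      process(V);
  }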

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Chandler Carruth 2dc9682e59 [SROA] Change how SROA does vector-based promotion of allocas to handle
cases where the alloca type, the load types, and the store types used
all disagree.

Previously, the only way that vector-based promotion occurred was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.

The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.

When looking at loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.

With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.

llvm-svn: 220116
2014-10-18 00:44:02 +00:00
Chandler Carruth 8393406f05 [SROA] Switch the common variable name for the 'AllocaSlices' class to
'AS'.

Using 'S' for this was a terrible idea. Arguably, 'AS' is not much
better, but it at least follows the idea of using initialisms and
removes active confusion about the AllocaSlices variable and a Slice
variable.

llvm-svn: 219963
2014-10-16 21:11:55 +00:00
Chandler Carruth 61747042c1 [SROA] More range-based cleanups to SROA, these brought to you by
clang-modernize.

I did have to clean up the variable types and whitespace a bit because
the use of auto made the code much less readable here.

llvm-svn: 219962
2014-10-16 21:05:14 +00:00
Chandler Carruth 57d4cae202 [SROA] Switch a couple of overly complex iterator accessors to just be
ArrayRef accessors.

I think it even came up in review that this was over-engineered, and
indeed it was. Time to un-build it.

llvm-svn: 219958
2014-10-16 20:42:08 +00:00
Chandler Carruth c659df9389 [SROA] Start more deeply moving SROA to use ranges rather than just
iterators.

There are a ton of places where it essentially wants ranges
rather than just iterators. This is just the first step that adds the
core slice range typedefs and uses them in a couple of places. I still
have to explicitly construct them because they've not been punched
throughout the entire set of code. More range-based cleanups incoming.

llvm-svn: 219955
2014-10-16 20:24:07 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that had an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that had an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Hal Finkel 60db05896a Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.

As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.

The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.

Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This prevents a query that uses, for
example, assume(a == b) from recursing on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.

This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).
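
As a hedged illustration (the parameter order and the AssumptionTracker
name are approximate for this era of the code), a caller threads the
new context through like so:

  // Given IR such as:
  //   %and = and i32 %x, 3
  //   %cmp = icmp eq i32 %and, 0
  //   call void @llvm.assume(i1 %cmp)
  // the query below can report the low two bits of %x as known zero,
  // but only at context instructions dominated by the assume.
  APInt KnownZero(32, 0), KnownOne(32, 0);
  computeKnownBits(X, KnownZero, KnownOne, DL, /*Depth=*/0, AT, CxtI, DT);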

llvm-svn: 217342
2014-09-07 18:57:58 +00:00
David Majnemer d4cffcf073 SROA: Don't insert instructions before a PHI
SROA may decide that it needs to insert a bitcast and would set its
insertion point before a PHI.  This will create an invalid module
right quick.

Instead, choose the first insertion point in the basic block that holds
our PHI.
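
A minimal sketch of the fix, assuming an IRBuilder<> IRB and the
offending PHINode *PN:

  // Never insert between PHIs; skip to the block's first position that
  // permits ordinary instructions.
  BasicBlock *BB = PN->getParent();
  IRB.SetInsertPoint(BB, BB->getFirstInsertionPt());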

This fixes PR20822.

Differential Revision: http://reviews.llvm.org/D5141

llvm-svn: 216891
2014-09-01 21:20:14 +00:00
Jingyue Wu ec33fa9aca [SROA] Fold a PHI node if all its incoming values are the same
Summary:
Fixes PR20425.

During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes allocas used by PHI nodes easier to promote.
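
A hedged sketch of the check; PHINode::hasConstantValue returns the
common incoming value when one exists:

  if (Value *Common = PN.hasConstantValue()) {
    // All incoming values agree, so the PHI is redundant.
    PN.replaceAllUsesWith(Common);
    PN.eraseFromParent();
  }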

Test Plan: Added three more tests in phi-and-select.ll

Reviewers: nlewycky, eliben, meheff, chandlerc

Reviewed By: chandlerc

Subscribers: zinovy.nis, hfinkel, baldrick, llvm-commits

Differential Revision: http://reviews.llvm.org/D4659

llvm-svn: 216299
2014-08-22 22:45:57 +00:00
Reid Kleckner c36f48f08a SROA: Handle a case of store size being smaller than allocation size
In this case, we are creating an x86_fp80 slice for a union from C where
the padding bytes may contain real data. An x86_fp80 alloca is 16 bytes,
and that's just fine. We can't, however, use regular loads and stores to
access the slice, because the store size is only 10 bytes / 80 bits.
Instead, use memcpy and memset.
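
The kind of C input involved looks roughly like this (illustrative, not
the exact test case):

  union U {
    long double fp;  // x86_fp80: a 16-byte slot, but only 10 bytes stored
    char bytes[16];  // the 6 padding bytes may hold live data
  };

A plain store of the x86_fp80 would write only 80 bits, so memcpy and
memset are used to move all 16 bytes.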

Fixes PR18726.

Reviewed By: chandlerc

Differential Revision: http://reviews.llvm.org/D5012

llvm-svn: 216248
2014-08-22 00:09:56 +00:00
Craig Topper 71b7b68b74 Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 216158
2014-08-21 05:55:13 +00:00
Craig Topper 6230691c91 Revert "Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size."
Getting a weird buildbot failure that I need to investigate.

llvm-svn: 215870
2014-08-18 00:24:38 +00:00
Craig Topper 5229cfd163 Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 215868
2014-08-17 23:47:00 +00:00
Owen Anderson 6c19ab1b5d Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.

llvm-svn: 215148
2014-08-07 21:07:35 +00:00
Hal Finkel cc39b67530 AA metadata refactoring (introduce AAMDNodes)
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).

This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.

No functionality change intended.

llvm-svn: 213859
2014-07-24 12:16:19 +00:00
Hal Finkel 2e42c34d05 Allow isDereferenceablePointer to look through some bitcasts
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a smaller
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.

In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer though
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).

llvm-svn: 212686
2014-07-10 05:27:53 +00:00
Duncan P. N. Exon Smith 73686d305a SROA: Only split loads on byte boundaries
r199771 accidentally broke the logic that makes sure that SROA only splits
loads on byte boundaries.  If such a split happens, some bits get lost
when reassembling loads of wider types, causing data corruption.

Move the width check up to reject such splits early, avoiding the
corruption.  Fixes PR19250.

Patch by: Björn Steinbrink <bsteinbr@gmail.com>

llvm-svn: 211082
2014-06-17 00:19:35 +00:00
Craig Topper f40110f4d8 [C++] Use 'nullptr'. Transforms edition.
llvm-svn: 207196
2014-04-25 05:29:35 +00:00
Chandler Carruth 964daaaf19 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Transforms/...
edition.

This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes. I've sunk their
name macros with the DEBUG_TYPE.

Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.

llvm-svn: 206844
2014-04-22 02:55:47 +00:00
Chandler Carruth cdf4788401 [C++11] Add range based accessors for the Use-Def chain of a Value.
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
   detail
2) Change it to actually be a *Use* iterator rather than a *User*
   iterator.
3) Add an adaptor which is a User iterator that always looks through the
   Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
   they wanted a use_iterator (and to explicitly dig out the User when
   needed), or a user_iterator which makes the Use itself totally
   opaque.

Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lines of code.

The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.

However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
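
Usage looks like this, for any Value *V:

  // Iterate the Use objects themselves:
  for (Use &U : V->uses())
    errs() << "operand " << U.getOperandNo() << " of " << *U.getUser() << "\n";

  // Or look straight through to the Users:
  for (User *Usr : V->users())
    errs() << "user: " << *Usr << "\n";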

llvm-svn: 203364
2014-03-09 03:16:01 +00:00
Chandler Carruth 7da14f1ab9 [Layering] Move InstVisitor.h into the IR library as it is pretty
obviously coupled to the IR.

llvm-svn: 203064
2014-03-06 03:23:41 +00:00
Chandler Carruth 9a4c9e597b [Layering] Move DebugInfo.h into the IR library where its implementation
already lives.

llvm-svn: 203046
2014-03-06 00:46:21 +00:00
Chandler Carruth 12664a0b17 [Layering] Move DIBuilder.h into the IR library where its implementation
already lives.

llvm-svn: 203038
2014-03-06 00:22:06 +00:00
Craig Topper 3e4c697ca1 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 202953
2014-03-05 09:10:37 +00:00
Chandler Carruth d031fe9fcf [C++11] Remove the completely unnecessary requirement on SetVector's
remove_if that its predicate is adaptable. We don't actually need this,
we can write a generic adapter for any predicate.

This lets us remove some very wrong std::function usages. We should
never be using std::function for predicates to algorithms. This incurs
an *indirect* call overhead for every evaluation of the predicate, and
makes it very hard to inline through.
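
With the adaptable-predicate requirement gone, a plain lambda suffices;
a small sketch:

  SetVector<Instruction *> Worklist;
  // ... populate Worklist ...
  Worklist.remove_if([](Instruction *I) { return I->use_empty(); });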

llvm-svn: 202742
2014-03-03 19:28:52 +00:00
Chandler Carruth 1583e99c23 [C++11] Add two range adaptor views to User: operands and
operand_values. The first provides a range view over operand Use
objects, and the second provides a range view over the Value*s being
used by those operands.

The naming is "STL-style" rather than "LLVM-style" because we have
historically named iterator methods STL-style, and range methods seem to
have far more in common with their iterator counterparts than with
"normal" APIs. Feel free to bikeshed on this one if you want, I'm happy
to change these around if people feel strongly.

I've switched code in SROA and LCG to exercise these mostly to ensure
they work correctly -- we don't really have an easy way to unittest this
and they're trivial.
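
The two views side by side, for any User *I (OldV and NewV are
placeholders):

  for (Use &Op : I->operands())         // range of Use objects
    if (Op.get() == OldV)
      Op.set(NewV);

  for (Value *V : I->operand_values())  // range of the used Value*s
    errs() << *V << "\n";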

llvm-svn: 202687
2014-03-03 10:42:58 +00:00
Benjamin Kramer d6f1f84f51 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Benjamin Kramer 3a377bce4e Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Chandler Carruth dfb2efd0da [SROA] Use the correct index integer size in GEPs through non-default
address spaces.

This isn't really a correctness issue (the values are truncated) but it's
much cleaner.

Patch by Matt Arsenault!
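
A hedged sketch of the core of the change: ask DataLayout for the index
type of the pointer's own address space instead of assuming the default
one.

  // Ptr is the GEP's pointer operand; getIntPtrType honors the
  // pointer's address space when picking the integer width.
  Type *IdxTy = DL.getIntPtrType(Ptr->getType());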

llvm-svn: 202252
2014-02-26 10:08:16 +00:00
Chandler Carruth 286d87ed38 [SROA] Teach SROA how to handle pointers from address spaces other than
the default.

Based on the patch by Matt Arsenault, D1764!

I switched one place to use the more direct pointer type to compute the
desired address space, and I reworked the memcpy rewriting section to
reflect significant refactorings that this patch helped inspire.

Thanks to several of the folks who helped review and improve the patch
as well.

llvm-svn: 202247
2014-02-26 08:25:02 +00:00
Chandler Carruth aa72b93ae7 [SROA] Split the alignment computation complete for the memcpy rewriting
to work independently for the slice side and the other side.

This allows us to only compute the minimum of the two when we actually
rewrite to a memcpy that needs to take the minimum, and preserve higher
alignment for one side or the other when rewriting to loads and stores.

This fix was inspired by seeing the result of some refactoring that
makes addrspace handling better.

llvm-svn: 202242
2014-02-26 07:29:54 +00:00
Chandler Carruth 181ed05b6a [SROA] The original refactoring inspired by the addrspace patch in
D1764, which in turn set off the other refactorings to make
'getSliceAlign()' a sensible thing.

There are two possible inputs to the required alignment of a memory
transfer intrinsic: the alignment constraints of the source and the
destination. If we are *only* introducing a (potentially new) offset
onto one side of the transfer, we don't need to consider the alignment
constraints of the other side. Use this to simplify the logic feeding
into alignment computation for unsplit transfers.

Also, hoist the clamp of the magical zero alignment for these intrinsics
to the more customary alignment of one early. This lets several other
conditions melt away.
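
A hedged sketch of both simplifications (API names approximate for this
era):

  // Clamp the magical zero alignment to the customary one up front...
  unsigned Align = II.getAlignment();
  if (Align == 0)
    Align = 1;
  // ...and only combine both sides when the rewritten memcpy really has
  // to satisfy source and destination constraints at once.
  unsigned NewAlign = MinAlign(SrcAlign, DstAlign);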

No functionality changed. There is a further improvement this exposes
which *will* change functionality, but that's arriving in a separate
patch.

llvm-svn: 202232
2014-02-26 05:33:36 +00:00
Chandler Carruth 47954c80ed [SROA] Yet another slight refactoring that simplifies an API in the
rewriting logic: don't pass custom offsets for the adjusted pointer to
the new alloca.

We always passed NewBeginOffset here. Sometimes we spelled it
BeginOffset, but only when they were in fact equal. What's worse, the API
is set up so that you can't reasonably call it with anything else -- it
assumes that you're passing it an offset relative to the *original*
alloca that happens to fall within the new one. That's the whole point
of NewBeginOffset, it's the clamped beginning offset.

No functionality changed.

llvm-svn: 202231
2014-02-26 05:12:43 +00:00
Chandler Carruth 2659e503c3 [SROA] Simplify the computing of alignment: we only ever need the
alignment of the slice being rewritten, not any arbitrary offset.

Every caller is really just trying to compute the alignment for the
whole slice, never for some arbitrary alignment. They are also just
passing a type when they have one to see if we can skip an explicit
alignment in the IR by using the type's alignment. This makes for a much
simpler interface.

Another refactoring inspired by the addrspace patch for SROA, although
only loosely related.

llvm-svn: 202230
2014-02-26 05:02:19 +00:00
Chandler Carruth 735d5bee48 [SROA] Use NewOffsetBegin in the unsplit case for memset merely for
consistency with memcpy rewriting, and fix a latent bug in the alignment
management for memset.

The alignment issue is that getAdjustedAllocaPtr is computing the
*relative* offset into the new alloca, but the alignment wasn't being set
from the relative offset; it was using the absolute offset into the old
alloca.

I don't think it's possible to write a test case that actually reaches
this code where the resulting alignment would be observably different,
but the intent was clearly to use the relative offset within the new
alloca.
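
A hedged sketch of the intended computation (member names are
illustrative):

  // Alignment must follow the offset *relative to the new alloca*, not
  // the absolute offset into the old one.
  uint64_t RelOffset = NewBeginOffset - NewAllocaBeginOffset;
  unsigned SliceAlign = MinAlign(NewAI.getAlignment(), RelOffset);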

llvm-svn: 202229
2014-02-26 04:45:24 +00:00
Chandler Carruth ea27cf08d8 [SROA] Use the members for New{Begin,End}Offset in the rewrite helpers
rather than passing them as arguments.

While I generally prefer actual arguments, in this case the readability
loss is substantial. By using members we avoid repeatedly calculating
the offsets, and once we're using members it is useful to ensure that
those names *always* refer to the original-alloca-relative new offset
for a rewritten slice.

No functionality changed. Follow-up refactoring, all toward getting the
address space patch merged.

llvm-svn: 202228
2014-02-26 04:25:04 +00:00
Chandler Carruth c46b6eb302 [SROA] Compute the New{Begin,End}Offset values once for each alloca
slice being rewritten.

We had the same code scattered across most of the visits. Instead,
compute the new offsets and the slice size once when we start to visit
a particular slice, and use the member variables from then on. This
reduces quite a bit of code duplication.

No functionality changed. Refactoring inspired to make it easier to
apply the address space patch to SROA.

llvm-svn: 202227
2014-02-26 04:20:00 +00:00
Chandler Carruth 6aedc106ba [SROA] Fix PR18615 with some long overdue simplifications to the bounds
checking in SROA.

The primary change is to just rely on uge for checking that the offset
is within the allocation size. This removes the explicit checks against
isNegative which were terribly error prone (including the reversed logic
that led to PR18615) and prevented us from supporting stack allocations
larger than half the address space.... Ok, so maybe the latter isn't
*common* but it's a silly restriction to have.

Also, we used to try to support a PHI node which loaded from before the
start of the allocation if any of the loaded bytes were within the
allocation. This doesn't make any sense, we have never really supported
loading or storing *before* the allocation starts. The simplified logic
just doesn't care.

We continue to allow loading past the end of the allocation in part to
support cases where there is a PHI and some loads are larger than others
and the larger ones reach past the end of the allocation. We could solve
this a different and more conservative way, but I'm still somewhat
paranoid about this.
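
The simplified check is essentially one unsigned comparison (sketch;
names illustrative):

  // Offset is the APInt GEP offset, AllocSize the alloca size in bytes.
  if (Offset.uge(AllocSize)) {
    // The access begins at or beyond the end of the allocation, so it
    // can never be a live use of the alloca.
  }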

llvm-svn: 202224
2014-02-26 03:14:14 +00:00
Chandler Carruth 3b79b2ab4e [SROA] Add an off-by-default *strict* inbounds check to SROA. I had SROA
implemented this way a long time ago and due to the overwhelming bugs
that surfaced, moved to a much more relaxed variant. Richard Smith would
like to understand the magnitude of this problem and it seems fairly
harmless to keep some flag-controlled logic to get the extremely strict
behavior here. I'll remove it if it doesn't prove useful.

llvm-svn: 202193
2014-02-25 21:24:45 +00:00
Rafael Espindola 935125126c Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
that don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Chandler Carruth 25adb7b00c [SROA] Use the original load name with the SROA-prefixed IRB rather than
just "load". This helps avoid pointless de-duping with order-sensitive
numbers as we already have unique names from the original load. It also
makes the resulting IR quite a bit easier to read.

llvm-svn: 202140
2014-02-25 11:21:48 +00:00
Chandler Carruth cb93cd2dc9 [SROA] Thread the ability to add a pointer-specific name prefix through
the pointer adjustment code. This is the primary code path that creates
totally new instructions in SROA and being able to lump them based on
the pointer value's name for which they were created causes
*significantly* fewer name collisions and general noise in the debug
output. This is particularly significant because it is making it much
harder to track down instability in the output of SROA, as name
de-duplication is a totally harmless form of instability that gets in
the way of seeing real problems.

The new fancy naming scheme tries to dig out the root "pre-SROA" name
for pointer values and associate that all the way through the pointer
formation instructions. Digging out the root is important to prevent the
multiple iterative rounds of SROA from just layering too much cruft on
top of cruft here. We already track the layers of SROAs iteration in the
alloca name prefix. We don't need to duplicate it here.

Should have no functionality change, and shouldn't have any really
measurable impact on NDEBUG builds, as most of the complex logic is
debug-only.

llvm-svn: 202139
2014-02-25 11:19:56 +00:00
Chandler Carruth 5117553301 [SROA] Rather than copying the logic for building a name prefix into the
PHI-pointer builder, just copy the builder and clobber the obvious
fields.

llvm-svn: 202136
2014-02-25 11:12:04 +00:00
Chandler Carruth 8183a50f9b [SROA] Simplify some of the logic to dig out the old pointer value by
using OldPtr more heavily. Lots of this code was written before the
rewriter had an OldPtr member set up ahead of time. There are already
asserts in place that should ensure this doesn't change any
functionality.

llvm-svn: 202135
2014-02-25 11:08:02 +00:00
Chandler Carruth 7625c54eb4 [SROA] Adjust to new clang-format style.
llvm-svn: 202134
2014-02-25 11:07:58 +00:00
Chandler Carruth a8c4cc68f5 [SROA] Fix a *glaring* bug in r202091: you have to actually *write*
the break statement, not just think it to yourself....

No idea how this worked at all, much less survived most bots, my
bootstrap, and some bot bootstraps!

The Polly one didn't survive, and this was filed as PR18959. I don't
have a reduced test case and honestly I'm not seeing the need. What we
probably need here are better asserts / debug-build behavior in
SmallPtrSet so that this madness doesn't make it so far.

llvm-svn: 202129
2014-02-25 09:45:27 +00:00
Alexey Samsonov 26af6f7f1b Silence GCC warning
llvm-svn: 202119
2014-02-25 07:56:00 +00:00
Chandler Carruth 83cee7722d [SROA] Add a debugging tool which shuffles the slices sequence prior to
sorting it. This helps uncover latent reliance on the original ordering
which isn't guaranteed to be preserved by std::sort (but often is),
and which is based on the use-def chain ordering, which also isn't
(technically) guaranteed.

Only available in C++11 debug builds, and behind a flag to prevent noise
at the moment, but this is generally useful so figured I'd put it in the
tree rather than keeping it out-of-tree.

llvm-svn: 202106
2014-02-25 03:59:29 +00:00
Chandler Carruth bb2a93241d [SROA] Use a more direct way of determining whether we are processing
the destination operand or source operand of a memmove.

It so happens that it was impossible for SROA to try to rewrite
self-memmove where the operands are *identical*, because either such
a thing is volatile (and we don't rewrite) or it is non-volatile, and we
don't even register it as a use of the alloca.

However, making the 'IsDest' test *rely* on this subtle fact is... Very
confusing for the reader. We should use the direct and readily available
test of the Use* which gives us concrete information about which operand
is being rewritten.
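
A hedged sketch of the direct test; operand 0 of a memmove/memcpy is
the destination:

  // U is the Use of the alloca-derived pointer within the intrinsic.
  bool IsDest = U->getOperandNo() == 0;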

No functionality changed, I hope! ;]

llvm-svn: 202103
2014-02-25 03:50:14 +00:00
Chandler Carruth 3bf18ed5e3 [SROA] Fix another instability in SROA with respect to the slice
ordering.

The fundamental problem that we're hitting here is that the use-def
chain ordering is *itself* not a stable thing to be relying on in the
rewriting for SROA. Further, we use a non-stable sort over the slices to
arrange them based on the section of the alloca they're operating on.
With a debugging STL implementation (or different implementations in
stage2 and stage3) this can cause stage2 != stage3.

The specific aspect of this problem fixed in this commit deals with the
rewriting and load-speculation around PHIs and Selects. This, like many
other aspects of the use-rewriting in SROA, is really part of the
"strong SSA-formation" that is doen by SROA where it works very hard to
canonicalize loads and stores in *just* the right way to satisfy the
needs of mem2reg[1]. When we have a select (or a PHI) with 2 uses of the
same alloca, we test that loads downstream of the select are
speculatable around it twice. If only one of the operands to the select
needs to be rewritten, then if we get lucky we rewrite that one first
and the select is immediately speculatable. This can cause the order of
operand visitation, and thus the order of slices to be rewritten, to
change an alloca from promotable to non-promotable and vice versa.

The fix is to defer all of the speculation until *after* the rewrite
phase is done. Once we've rewritten everything, we can accurately test
for whether speculation will work (once, instead of twice!) and the
order ceases to matter.

This also happens to simplify the other subtlety of speculation -- we
need to *not* speculate anything unless the result of speculating will
make the alloca fully promotable by mem2reg. I had a previous attempt at
simplifying this, but it was still pretty horrible.

There is actually already a *really* nice test case for this in
basictest.ll, but on multiple STL implementations and inputs, we just
got "lucky". Fortunately, the test case is very small and we can
essentially build it in exactly the opposite way to get reasonable
coverage in both directions even from normal STL implementations.

llvm-svn: 202092
2014-02-25 00:07:09 +00:00
Rafael Espindola 8eee97ddce Trivial cleanup: reuse existing variable.
Extracted while trying to understand http://llvm-reviews.chandlerc.com/D1764.

Patch by Matt Arsenault.

llvm-svn: 201425
2014-02-14 19:02:01 +00:00
Paul Robinson af4e64d095 Disable most IR-level transform passes on functions marked 'optnone'.
Ideally only those transform passes that run at -O0 remain enabled;
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves; it's not the job of
the pass manager to do it for them.

llvm-svn: 200892
2014-02-06 00:07:05 +00:00
Chandler Carruth 4de315430c [SROA] Fix a bug which could cause the common type finding to return
inconsistent results for different orderings of alloca slices. The
fundamental issue is that it is just always a mistake to return early
from this function. There is no effective early exit to leverage. This
patch stops trying to do so and simplifies the code a bit as
a consequence.

Original diagnosis and patch by James Molloy with some name tweaks by me
in part reflecting feedback from Duncan Smith on the mailing list.

llvm-svn: 199771
2014-01-21 23:16:05 +00:00
Chandler Carruth 1bf38c6a71 Fix a really nasty SROA bug with how we handled out-of-bounds memcpy
intrinsics.

Reported on the list by Evan with a couple of attempts to fix, but it
took a while to dig down to the root cause. There are two overlapping
bugs here, both centering around the circumstance of discovering
a memcpy operand which is known to be completely outside the bounds of
the alloca.

First, we need to kill the *other* side of the memcpy if it was added to
this alloca. Otherwise we'll factor it into our slicing and try to
rewrite it even though we know for a fact that it is dead. This is made
more tricky because we can visit the sides in either order. So we have
to both kill the other side and skip instructions marked as dead. The
latter really should be goodness in every case, but here it is a matter of
correctness.

Second, we need to actually remove the *uses* of the alloca by the
memcpy when queuing it for later deletion. Otherwise it may still be
using the alloca when we go to promote it (if the rewrite re-uses the
existing alloca instruction). Do this by factoring out the
use-clobbering used for nixing a Phi argument and re-using it
across the operands of a to-be-deleted instruction.

llvm-svn: 199590
2014-01-19 12:16:54 +00:00
Chandler Carruth 73523021d0 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! midway through some other pass's run !!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth 5ad5f15cff [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that even Clang's CFGBlocks can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Alp Toker f929e09b10 Add missed cleanup from r198456
All other uses of this macro in LLVM/clang have been moved to the function
definition, so follow suit (and the usage advice) here too for consistency.

llvm-svn: 198516
2014-01-04 22:47:48 +00:00
Nico Weber 7408c7066a Add a LLVM_DUMP_METHOD macro.
The motivation is to mark dump methods as used in debug builds so that they can
be called from lldb, but to not do so in release builds so that they can be
dead-stripped.

There's lots of potential follow-up work suggested in the thread
"Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
but everyone seems to agree on this subset.

Macro name chosen by fair coin toss.

llvm-svn: 198456
2014-01-03 22:53:37 +00:00
Chandler Carruth a126200665 Fix an issue where SROA computed different results based on the relative
order of slices of the alloca which have exactly the same size and other
properties. This was found by a perniciously unstable sort
implementation used to flush out buggy uses of the algorithm.

The fundamental idea is that findCommonType should return the best
common type it can find across all of the slices in the range. There
were two bugs here previously:

1) We would accept an integer type smaller than a byte-width multiple,
   and if there were different bit-width integer types, we would accept
   the first one. This caused an actual failure in the testcase updated
   here when the sort order changed.
2) If we found a bad combination of types or a non-load, non-store use
   before an integer typed load or store we would bail, but if we found
the integer typed load or store, we would use it. The correct
   behavior is to always use an integer typed operation which covers the
   partition if one exists.

While a clever debugging sort algorithm found problem #1 in our existing
test cases, I have no useful test case ideas for #2. I spotted it by
inspection when looking at this code.

llvm-svn: 195118
2013-11-19 09:03:18 +00:00
Benjamin Kramer 5626259506 Drop spurious handle in comment.
llvm-svn: 191172
2013-09-22 11:24:58 +00:00
Benjamin Kramer 90901a35ce SROA: Handle casts involving vectors of pointers and integer scalars.
SROA wants to convert any types of equivalent widths but it's not possible to
convert vectors of pointers to an integer scalar with a single cast. As a
workaround we add a bitcast to the corresponding int ptr type first. This type
of cast used to be an edge case but has become common with SLP vectorization.
Fixes PR17271.

llvm-svn: 191143
2013-09-21 20:36:04 +00:00
Nick Lewycky c7776f737f Revert r187191, which broke opt -mem2reg on the testcases included in PR16867.
However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146
when SROA started sending more things directly down the PromoteMemToReg path.

In order to revert r187191, I also revert dependent revisions r187296, r187322
and r188146. Fixes PR16867. Does not add the testcases from that PR, but both
of them should get added for both mem2reg and sroa when this revert gets
unreverted.

llvm-svn: 188327
2013-08-13 22:51:58 +00:00
Chandler Carruth d7cd7e367e Re-instate r187323 which fast-tracks promotable allocas as soon as the
SROA-based analysis has enough information. This should work now that
both mem2reg *and* the SSAUpdater-based AllocaPromoter have been updated
to be able to promote the types of allocas that the SROA analysis
detects.

I've included tests for the AllocaPromoter that were only possible to
write once we fast-tracked promotable allocas without rewriting them.
This includes a test both for r187347 and r188145.

Original commit log for r187323:
"""
Now that mem2reg understands how to cope with a slightly wider set of uses of
an alloca, we can pre-compute promotability while analyzing an alloca for
splitting in SROA. That lets us short-circuit the common case of a bunch of
trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for
typical frontend-generated IR sequneces I'm seeing. It gets the new SROA to
within 20% of ScalarRepl for such code. My current benchmark for these numbers
is PR15412, but it fits the general pattern of IR emitted by Clang so it should
be widely applicable.
"""

llvm-svn: 188146
2013-08-11 02:17:11 +00:00
Chandler Carruth c17283b407 Finish fixing the SSAUpdater-based AllocaPromoter strategy in SROA to cope with
the more general set of patterns that are now handled by mem2reg and that we
can detect quickly while doing SROA's initial analysis. Notably, this allows it
to promote through no-op bitcast and GEP sequences. A core part of the
SSAUpdater approach is the ability to test whether a particular instruction is
part of the set being promoted. Testing this becomes significantly more complex
in the world where the operand to every load and store isn't the alloca itself.
I ended up using the approach of walking up the def-chain until we find the
alloca. I benchmarked this against keeping a set of pointer operands and
keeping a set of the loads and stores we care about, and this one seemed faster
although the difference was very small.
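
A hedged sketch of that def-chain walk (simplified; the real code
handles more cases):

  static AllocaInst *getUnderlyingAlloca(Value *V) {
    for (;;) {
      if (auto *AI = dyn_cast<AllocaInst>(V))
        return AI;
      if (auto *BC = dyn_cast<BitCastInst>(V))
        V = BC->getOperand(0);
      else if (auto *GEP = dyn_cast<GetElementPtrInst>(V))
        V = GEP->getPointerOperand();
      else
        return nullptr;  // not (directly) derived from an alloca
    }
  }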

No test case yet because currently the rewriting always "fixes" the inputs to
not require this. The next patch which re-enables early promotion of easy cases
in SROA will include a test case that specifically exercises this aspect of the
alloca promoter.

llvm-svn: 188145
2013-08-11 01:56:15 +00:00
Chandler Carruth 45b136f4cf Reformat some bits of AllocaPromoter and simplify the name and type of
our visiting datastructures in the AllocaPromoter/SSAUpdater path of
SROA. Also shift the order of clears around to be more consistent.

No functionality changed here, this is just a cleanup.

llvm-svn: 188144
2013-08-11 01:03:18 +00:00
Chandler Carruth cd7c8cdfa1 Teach the AllocaPromoter, which is wrapped around the SSAUpdater
infrastructure to do promotion without a domtree, the same smarts about
looking through GEPs, bitcasts, etc., that I just taught mem2reg.
This way, if SROA chooses to promote an alloca which still has some
noisy instructions this code can cope with them.

I've not used as principled of an approach here for two reasons:
1) This code doesn't really need it as we were already set up to zip
   through the instructions used by the alloca.
2) I view the code here as more of a hack, and hopefully a temporary one.

The SSAUpdater path in SROA is a real sore point for me. It doesn't make
a lot of architectural sense for many reasons:
- We're likely to end up needing the domtree anyways in a subsequent
  pass, so why not compute it earlier and use it.
- In the future we'll likely end up needing the domtree for parts of the
  inliner itself.
- If we need to we could teach the inliner to preserve the domtree. Part
  of the re-work of the pass manager will allow this to be very powerful
  even in large SCCs with many functions.
- Ultimately, computing a domtree has gotten significantly faster since
  the original SSAUpdater-using code went into ScalarRepl. We no longer
  use domfrontiers, and much of domtree is lazily done based on queries
  rather than eagerly.
- At this point keeping the SSAUpdater-based promotion saves a total of
  0.7% on a build of the 'opt' tool for me. That's not a lot of
  performance given the complexity!

So I'm leaving this a bit ugly in the hope that eventually we just
remove all of this nonsense.

I can't even readily test this because this code isn't reachable except
through SROA. When I re-instate the patch that fast-tracks allocas
already suitable for promotion, I'll add a testcase there that failed
before this change. Before that, SROA will fix any test case I give it.

llvm-svn: 187347
2013-07-29 09:06:53 +00:00
Chandler Carruth d31370e060 Temporarily revert r187323 until I update SSAUpdater to match mem2reg.
I forgot that we had two totally independent things here. :: sigh ::

llvm-svn: 187327
2013-07-28 09:05:49 +00:00
Chandler Carruth 9d96100ff0 Now that mem2reg understands how to cope with a slightly wider set of
uses of an alloca, we can pre-compute promotability while analyzing an
alloca for splitting in SROA. That lets us short-circuit the common case
of a bunch of trivially promotable allocas. This cuts 20% to 30% off the
run time of SROA for typical frontend-generated IR sequences I'm seeing.
It gets the new SROA to within 20% of ScalarRepl for such code. My
current benchmark for these numbers is PR15412, but it fits the general
pattern of IR emitted by Clang so it should be widely applicable.

llvm-svn: 187323
2013-07-28 08:27:12 +00:00
Chandler Carruth d5b806a27f Thread DataLayout through the callers and into mem2reg. This will be
useful in a subsequent patch, but causes an unfortunate amount of noise,
so I pulled it out into a separate patch.

llvm-svn: 187322
2013-07-28 06:43:11 +00:00
Chandler Carruth 8e3c4dc50e Don't use all the #ifdefs to hide the stats counters and instead rely on
their being optimized out in debug mode. Realistically, this just isn't
going to be the slow part anyways. This also fixes unused variable
warnings that are breaking LLD build bots. =/ I didn't see these at
first, and kept losing track of the fact that they were broken.

llvm-svn: 187297
2013-07-27 10:17:49 +00:00
Chandler Carruth 58e25d3905 Fix a problem I introduced in r187029 where we would over-eagerly
schedule an alloca for another iteration in SROA. This only showed up
with a mixture of promotable and unpromotable selects and phis. Added
a test case for this.

llvm-svn: 187031
2013-07-24 12:12:17 +00:00
Chandler Carruth 83ea195d40 Fix PR16687 where we were incorrectly promoting an alloca that had
pending speculation for a phi node. The problem here is that we were
using growth of the speculation set as an indicator of whether
speculation would occur, and if the phi node is already in the set we
don't see it grow. This is a symptom of the fact that this signal is
a total hack.

Unfortunately, I couldn't really come up with a non-hacky way of
signaling that promotion remains valid *after* speculation occurs, such
that we only speculate when all else looks good for promotion. In the
end, I went with at least a much more explicit approach of doing the
work of queuing inside the phi and select processing and setting
a preposterously named flag to convey that we're in the special state of
requiring speculating before promotion.

Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing
a testcase for this from a pretty giant, nasty assert in a big
application. =] The testcase was excellent.

llvm-svn: 187029
2013-07-24 09:47:28 +00:00
Nick Lewycky 6ab9d936d5 Remove extraneous null statement. No functionality change!
llvm-svn: 186893
2013-07-22 23:38:27 +00:00
Jakub Staszak cb132face0 OldPtr is llvm::Instruction. Remove unneeded cast<>.
llvm-svn: 186880
2013-07-22 22:10:43 +00:00
Benjamin Kramer 08e5070bf5 SROA: Microoptimization: Remove dead entries first, then sort.
While there, replace an explicit struct with std::mem_fun.

llvm-svn: 186761
2013-07-20 08:38:34 +00:00
Chandler Carruth 6c321c131b Cleanup the stats counters for the new implementation. These actually
count the right things and have the right names.

llvm-svn: 186667
2013-07-19 10:57:36 +00:00
Chandler Carruth 1ed848d55c Fix another assert failure very similar to PR16651's test case. This
test case came from Benjamin and found the parallel bug in the vector
promotion code.

llvm-svn: 186666
2013-07-19 10:57:32 +00:00
Chandler Carruth 9f21fe1d65 Try to move to a more reasonable set of naming conventions given the new
implementation of the SROA algorithm. We were using the term 'partition'
in many places that no longer ever represented an actual partition, but
rather just an arbitrary slice of an alloca.

No functionality change intended here. Mostly just renaming of types,
functions, variables, and rewording of comments. Several comments were
rewritten to make a lot more sense in the new structure of things.

The stats are still weird and not reflective of how this really works.
I'll fix those up in a separate patch as it is a touch more semantic of
a change...

llvm-svn: 186659
2013-07-19 09:13:58 +00:00
Chandler Carruth 90a735d606 A long overdue cleanup in SROA to use 'DL' instead of 'TD' for the
DataLayout variables.

llvm-svn: 186656
2013-07-19 07:21:28 +00:00
Chandler Carruth 5955c9e4da Fix PR16651, an assert introduced in my recent re-work of the innards of
SROA.

The crux of the issue is that now we track uses of a partition of the
alloca in two places: the iterators over the partitioning uses and the
previously collected split uses vector. We weren't accounting for the
fact that the split uses might invalidate integer widening in ways other
than due to their width (in this case due to being volatile).

Further reduced testcase added to the tests.

llvm-svn: 186655
2013-07-19 07:12:23 +00:00
Chandler Carruth f0546402af Reapply r186316 with a fix for one bug where the code could walk off the
end of a vector. This was found with ASan. I've had one other report of
a crasher, but thus far been unable to reproduce the crash. It may well
be fixed with this version, and if not I'd like to get more information
from the build bots about what is happening.

See r186316 for the full commit log for the new implementation of the
SROA algorithm.

llvm-svn: 186565
2013-07-18 07:15:00 +00:00
Chandler Carruth e3899f2c2c Revert r186316 while I track down an ASan failure and an assert from
a bot.

This reverts the commit which introduced a new implementation of the
fancy SROA pass designed to reduce its overhead. I'll skip the huge
commit log here, refer to r186316 if you're looking for how this all
works and why it works that way.

llvm-svn: 186332
2013-07-15 17:36:21 +00:00
Chandler Carruth e74ff4c643 Reimplement SROA yet again. Same fundamental principle, but a totally
different core implementation strategy.

Previously, SROA would build a relatively elaborate partitioning of an
alloca, associate uses with each partition, and then rewrite the uses of
each partition in an attempt to break apart the alloca into chunks that
could be promoted. This was very wasteful in terms of memory and compile
time because regardless of how complex the alloca is or how much we're able
to do in breaking it up, all of the datastructure work to analyze the
partitioning was done up front.

The new implementation attempts to form partitions of the alloca lazily
and on the fly, rewriting the uses that make up that partition as it
goes. This has a few significant effects:
1) Much simpler data structures are used throughout.
2) No more double walk of the recursive use graph of the alloca, only
   walk it once.
3) No more complex algorithms for associating a particular use with
   a particular partition.
4) PHI and Select speculation is simplified and happens lazily.
5) More precise information is available about a specific use of the
   alloca, removing the need for some side datastructures.

Ultimately, I think this is a much better implementation. It removes
about 300 lines of code, but arguably removes more like 500 considering
that some code grew in the process of being factored apart and cleaned
up for this all to work.

I've re-used as much of the old implementation as possible, which
includes the lion's share of code in the form of the rewriting logic.
The interesting new logic centers around how the uses of a partition are
sorted, and split into actual partitions.

Each instruction using a pointer derived from the alloca gets
a 'Partition' entry. This name is totally wrong, but I'll do a rename in
a follow-up commit as there is already enough churn here. The entry
describes the offset range accessed and the nature of the access. Once
we have all of these entries we sort them in a very specific way:
increasing order of begin offset, followed by whether they are
splittable uses (memcpy, etc), followed by the end offset or whatever.
Sorting by splittability is important as it simplifies the collection of
uses into a partition.

Once we have these uses sorted, we walk from the beginning to the end
building up a range of uses that form a partition of the alloca.
Overlapping unsplittable uses are merged into a single partition while
splittable uses are broken apart and carried from one partition to the
next. A partition is also introduced to bridge splittable uses between
the unsplittable regions when necessary.
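
A hedged sketch of that ordering as a comparator (field names and
tie-break directions are illustrative):

  bool operator<(const Slice &RHS) const {
    if (BeginOffset != RHS.BeginOffset)
      return BeginOffset < RHS.BeginOffset;  // increasing begin offset
    if (IsSplittable != RHS.IsSplittable)
      return !IsSplittable;                  // unsplittable uses first
    return EndOffset > RHS.EndOffset;        // then longer slices first
  }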

I've looked at the performance PRs fairly closely. PR15471 no longer
will even load (the module is invalid). Not sure what is up there.
PR15412 improves by between 5% and 10%, however it is nearly impossible
to know what is holding it up as SROA (the entire pass) takes less time
than reading the IR for that test case. The analysis takes the same time
as running mem2reg on the final allocas. I suspect (without much
evidence) that the new implementation will scale much better however,
and it is just the small nature of the test cases that makes the changes
small and noisy. Either way, it is still simpler and cleaner I think.

llvm-svn: 186316
2013-07-15 10:30:19 +00:00
Craig Topper 31ee5866de Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185540
2013-07-03 15:07:05 +00:00
Bob Wilson acfc01dedf Fix SROA to avoid unnecessary scalar conversions for 1-element vectors.
When a 1-element vector alloca is promoted, a store instruction can often be
rewritten without converting the value to a scalar and using an insertelement
instruction to stuff it into the new alloca.  This patch just adds a check
to skip that conversion when it is unnecessary.  This turns out to be really
important for some ARM Neon operations where <1 x i64> is used to get around
the fact that i64 is not a legal type.
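
A hedged sketch of the added check (rewriteDirectStore is a hypothetical
helper standing in for the direct-store path):

  // For a <1 x T> alloca the stored value can be written directly; no
  // scalar conversion plus insertelement round trip is needed.
  if (VecTy->getNumElements() == 1)
    return rewriteDirectStore(SI, VecTy);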

llvm-svn: 184870
2013-06-25 19:09:50 +00:00
Nadav Rotem 1e211913b5 SROA: Generate selects instead of shuffles when blending values because this is the canonical form.
Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often.

llvm-svn: 180875
2013-05-01 19:53:30 +00:00