llvm-project

Commit Graph

Author	SHA1	Message	Date
Dawid Jurczak	43234b1595	[DSE] Transform memset + malloc --> calloc (PR25892) After this change DSE can eliminate malloc + memset and emit calloc. It's https://reviews.llvm.org/D101440 follow-up. Differential Revision: https://reviews.llvm.org/D103009	2021-07-20 11:39:05 +02:00
Florian Mayer	5f08219322	Revert "[hwasan] Use stack safety analysis." This reverts commit `e9c63ed10b`.	2021-07-20 10:36:46 +01:00
Florian Mayer	e9c63ed10b	[hwasan] Use stack safety analysis. This avoids unnecessary instrumentation. Reviewed By: eugenis, vitalybuka Differential Revision: https://reviews.llvm.org/D105703	2021-07-20 10:06:35 +01:00
Johannes Doerfert	c66cbee140	[Attributor] Use set vector instead of vector to prevent duplicates	2021-07-20 01:39:34 -05:00
Johannes Doerfert	5957cf9f11	[Attributor] Simplify to values in the genericValueTraversal We already simplified to a constant, given the new interface we can also simplify to a generic value.	2021-07-20 01:39:34 -05:00
Johannes Doerfert	5eba7846a5	[Attributor] Use checkForAllUses instead of custom use tracking AAMemoryBehaviorFloating used a custom use tracking mechanism even though checkForAllUses exists and is already more powerful. Further, AAMemoryBehaviorFloating uses AANoCapture to guarantee that there are no aliases and following the uses is sufficient. This is an OK assumption if checkForAllUses is used but custom tracking is easily out of sync with AANoCapture and problems follow.	2021-07-20 01:39:33 -05:00
Johannes Doerfert	b96ea6b1fd	[Attributor] Ensure to simplify operands in AAValueConstantRange As with other patches before, the simplification callback interface requires us to go through the Attributor::getAssumedSimplified API first before we recurs. It is unclear if the problem can be explicitly tested with our current infrastructure.	2021-07-20 00:35:14 -05:00
Johannes Doerfert	5fbb51d8d5	[Attributor] Extend the AAValueSimplify compare simplification logic We first simplify the operands of a compare and then reason on the simplified versions, e.g., with AANonNull. This does improve the simplification capabilities but also fixes a potential problem that has not yet been observed by simplifying the operands first.	2021-07-20 00:35:14 -05:00
Johannes Doerfert	9c00aabd60	[Attributor][NFCI] Expose `getAssumedUnderlyingObjects` API	2021-07-20 00:35:13 -05:00
Johannes Doerfert	5e169818fb	[Attributor][NFC] Fix function name spelling	2021-07-20 00:35:13 -05:00
Johannes Doerfert	44a9ee170c	[Attributor][FIX] Do not simplify byval arguments A byval argument is a different value in the caller and callee, we cannot propagate the information as part of AAValueSimplify. Users that want to deal with byval arguments need to specifically perform the argument -> call site step. We do not do this for now.	2021-07-19 22:48:51 -05:00
Johannes Doerfert	c2281f1565	[Attributor] Introduce AAPointerInfo This patch introduces AAPointerInfo which tracks the uses of a pointer and places them in "bins" based on their offset from the base and access size. As with other AAs, any pointer can be tracked but it is up to the user to make sense of the results. The user in this patch is AAValueSimplify and AAPotentialValues which both utilize AAPointerInfo to determine the value of a load. For now, this is restricted to loads of allocas and internal globals. Through the use of AAPointerInfo and the "bins" we can track struct members separately. The users also know that storing only zeros (at unknown indices) will result in loading only 0 (from unknown indices). Other than that, the users are flow and context insensitive (for now). To deal with the "bins" more easily, AAPointerInfo provides a forallInterfearingAccesses that applies a callback on all accesses that might interfere with a given load or store. Differential Revision: https://reviews.llvm.org/D104432	2021-07-19 22:48:35 -05:00
Johannes Doerfert	28c78a9e12	[Attributor] Simplify loads As a first step to simplify loads we only handle `null` and `undef` underlying objects, as well as objects that have the load as a single user. Loads of those values can be replaced by the initializer, if any. Proper reasoning is introduced in a follow up patch Differential Revision: https://reviews.llvm.org/D103862	2021-07-19 22:47:29 -05:00
Johannes Doerfert	97387fdf6d	[OpenMP] Fix carefully track SPMDCompatibilityTracker We did not properly use SPMDCompatibilityTracker in various places. This patch makes sure we look at the validity properly and also fix the state if we can. Differential Revision: https://reviews.llvm.org/D106085	2021-07-19 22:47:03 -05:00
Artem Belevich	1a43ee65d1	Revert "[MemCpyOpt] Enable memcpy optimizations unconditionally." This reverts commit `2c98298a75` which breaks sanitizers.	2021-07-19 14:27:41 -07:00
Petr Hosek	54902e00d1	[InstrProfiling] Use weak alias for bias variable We need the compiler generated variable to override the weak symbol of the same name inside the profile runtime, but using LinkOnceODRLinkage results in weak symbol being emitted in which case the symbol selected by the linker is going to depend on the order of inputs which can be fragile. This change replaces the use of weak definition inside the runtime with a weak alias. We place the compiler generated symbol inside a COMDAT group so dead definition can be garbage collected by the linker. We also disable the use of runtime counter relocation on Darwin since Mach-O doesn't support weak external references, but Darwin already uses a different continous mode that relies on overmapping so runtime counter relocation isn't needed there. Differential Revision: https://reviews.llvm.org/D105176	2021-07-19 12:23:51 -07:00
Artem Belevich	b988d69ea2	[infer-address-spaces] Handle complex non-pointer constexpr arguments. Fixes https://bugs.llvm.org/show_bug.cgi?id=51099 Differential Revision: https://reviews.llvm.org/D106098	2021-07-19 12:15:52 -07:00
Artem Belevich	2c98298a75	[MemCpyOpt] Enable memcpy optimizations unconditionally. The patch does not depend on the availability of the library functions for memcpy/memset as it operates on LLVM intrinsics. The optimizations are useful on the targets that have these functions disabled (e.g. NVPTX & AMDGPU). Differential Revision: https://reviews.llvm.org/D104801	2021-07-19 11:58:02 -07:00
maekawatoshiki	74f0f9a455	[LICM] Create LoopNest Invariant Code Motion (LNICM) pass This patch adds a new pass called LNICM which is a LoopNest version of LICM and a test case to show how LNICM works. Basically, LNICM only hoists invariants out of loop nest (not a loop) to keep/make perfect loop nest. This enables later optimizations that require perfect loop nest. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D104180	2021-07-20 00:31:18 +09:00
Alexey Bataev	d8d8b4574a	[SLP]Fix possible crash on unreachable incoming values sorting. The incoming values for PHI nodes may come from unreachable BasicBlocks, need to handle this case. Differential Revision: https://reviews.llvm.org/D106264	2021-07-19 04:54:53 -07:00
Mindong Chen	e908e063d1	[LoopUtils] Fix incorrect RT check bounds of loop-invariant mem accesses This fixes the lower and upper bound calculation of a RuntimeCheckingPtrGroup when it has more than one loop invariant pointers. Resolves PR50686. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D104148	2021-07-19 19:38:24 +08:00
Florian Mayer	807d50100c	Revert "[hwasan] Use stack safety analysis." This reverts commit `12268fe14a`.	2021-07-19 12:08:32 +01:00
Florian Mayer	12268fe14a	[hwasan] Use stack safety analysis. This avoids unnecessary instrumentation. Reviewed By: eugenis, vitalybuka Differential Revision: https://reviews.llvm.org/D105703	2021-07-19 11:54:44 +01:00
Rosie Sumpter	34d6820551	[LoopFlatten] Use Loop to identify loop induction phi. NFC Replace code which identifies induction phi with helper function getInductionVariable to improve robustness. Differential Revision: https://reviews.llvm.org/D106045	2021-07-19 09:06:57 +01:00
Krishna Kariya	da92e86263	[InstCombine] Fold IntToPtr/PtrToInt to bitcast The inttoptr/ptrtoint roundtrip optimization is not always correct. We are working towards removing this optimization and adding support to specific cases where this optimization works. This patch is the first one on this line. Consider the example: %i = ptrtoint i8* %X to i64 %p = inttoptr i64 %i to i16* %cmp = icmp eq i8* %load, %p In this specific case, the inttoptr/ptrtoint optimization is correct as it only compares the pointer values. In this patch, we fold inttoptr/ptrtoint to a bitcast (if src and dest types are different). Differential Revision: https://reviews.llvm.org/D105088	2021-07-18 23:13:25 +02:00
Sanjay Patel	c0f2c4ce10	[SimplifyCFG] remove unnecessary state variable; NFC Keeping a marker for Changed might have made sense before this code was refactored, but we never touch that variable after initialization now.	2021-07-18 13:42:22 -04:00
Nikita Popov	59c33a0bc8	[Cloning] Remove unused parameter from CloneAndPruneFunctionInto() (NFC)	2021-07-18 18:38:06 +02:00
Sanjay Patel	0e15de2d0c	[InstCombine] fold reassociative FP add into start value of fadd reduction This pattern is visible in unrolled and vectorized loops. Although the backend seems to be able to reassociate to ideal form in the examples I looked at, we might as well do that in IR for efficiency.	2021-07-18 06:26:20 -04:00
Shilei Tian	d3454ee8d2	[AbstractAttributor] Fix two issues in folding __kmpc_is_spmd_exec_mode This patch fixed two issues found when folding `__kmpc_is_spmd_exec_mode`: 1. When the reaching kernels are empty, it should not fold to generic mode. 2. When creating AA for the caller when updating information, the dependency should be required. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D106209	2021-07-17 13:13:44 -04:00
Wenlei He	f9f3c34e0f	[CSSPGO] Turn on iterative-BFI for CSSPGO Iterative-BFI produces better count quality and performance when evaluated on internal benchmarks. Turning it on by default now for CSSPGO. We can consider turn it on by default for AutoFDO as well in the future. Differential Revision: https://reviews.llvm.org/D106202	2021-07-16 17:35:49 -07:00
Sami Tolvanen	0ad1d9fdf2	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This reverts commit `8e3b5cb39e`. Reverting to investigate test failures.	2021-07-16 14:47:33 -07:00
Sami Tolvanen	8e3b5cb39e	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. This version uses module inline assembly to avoid issues with LowerTypeTestsModule. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D104058	2021-07-16 14:33:34 -07:00
Alexey Bataev	da3dbfcacf	[SLP]Improve calculations of the cost for reused/reordered scalars. Part of D105020. Also, fixed FIXMEs that need to use wider vector type when trying to calculate the cost of reused scalars. This may cause regressions unless D100486 is landed to improve the cost estimations for long vectors shuffling. Differential Revision: https://reviews.llvm.org/D106060	2021-07-16 13:40:15 -07:00
Alexey Bataev	1b18e9ab67	[PATCH] D105827: [SLP]Workaround for InsertSubVector cost. The cost of the InsertSubvector shuffle kind cost is not complete and may end up with just extracts + inserts costs in many cases. Added a workaround to represent it as a generic PermuteSingleSrc, which is still pessimistic but better than InsertSubvector. Differential Revision: https://reviews.llvm.org/D105827	2021-07-16 12:59:08 -07:00
Joseph Huber	b910a109f8	[OpenMP][NFC] Update the comment header for optimizations.	2021-07-16 14:13:13 -04:00
Joseph Huber	2c31d5ebfb	[OpenMP] Add IDs to OpenMP remarks This patch adds unique idenfitiers to the existing OpenMP remarks. This makes it easier to identify the corresponding documentation for each remark that will be hosted in the OpenMP webpage. Depends on D105898 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105939	2021-07-16 14:07:03 -04:00
Joseph Huber	eef6601b0f	[OpenMP] Rework OpenMP remarks This patch rewrites and reworks a few of the existing remarks to make the mmore concise and consistent prior to writing the documentation for them. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105898	2021-07-16 14:07:00 -04:00
Congzhe Cao	a7b7d22d6e	[LoopInterchange] Check lcssa phis in the inner latch in scenarios of multi-level nested loops We already know that we need to check whether lcssa phis are supported in inner loop exit block or in outer loop exit block, and we have logic to check them already. Presumably the inner loop latch does not have lcssa phis and there is no code that deals with lcssa phis in the inner loop latch. However, that assumption is not true, when we have loops with more than two-level nesting. This patch adds checks for lcssa phis in the inner latch. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D102300	2021-07-16 11:59:20 -04:00
Kerry McLaughlin	49d73130ca	[LV] Avoid scalable vectorization for loops containing alloca This patch returns an Invalid cost from getInstructionCost() for alloca instructions if the VF is scalable, as otherwise loops which contain these instructions will crash when attempting to scalarize the alloca. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105824	2021-07-16 11:47:13 +01:00
Sander de Smalen	239d01fa88	Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs." The original patch was: https://reviews.llvm.org/D105806 There were some issues with undeterministic behaviour of the sorting function, which led to scalable-call.ll passing and/or failing. This patch fixes the issue by numbering all instructions in the array first, and using that number as the order, which should provide a consistent ordering. This reverts commit `a607f64118`.	2021-07-16 10:52:01 +01:00
Max Kazantsev	f98ed74f69	[LSR] Handle case 1reg => reg. PR50918 This patch addresses assertion failure in case when the only found formula for LSR is `1reg => reg` which was supposed to be an impossible situation, however there is a test that shows it is possible. In this case, we can use scale register with scale of 1 as the missing base register. Reviewed By: huihuiz, reames Differential Revision: https://reviews.llvm.org/D105009	2021-07-16 11:33:59 +07:00
Shilei Tian	c23da666b5	[Attributor] Add support for compound assignment for ChangeStatus A common use of `ChangeStatus` is as follows: ``` ChangeStatus Changed = ChangeStatus::UNCHANGED; Changed \|= foo(); ``` where `foo` returns `ChangeStatus` as well. Currently `ChangeStatus` doesn't support compound assignment, we have to write as ``` Changed = Changed \| foo(); ``` which is not that convenient. This patch add the support for compound assignment for `ChangeStatus`. Compound assignment is usually implemented as a member function, and binary arithmetic operator is therefore implemented using compound assignment. However, unlike regular C++ class, enum class doesn't support member functions. As a result, they can only be implemented in the way shown in the patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106109	2021-07-15 23:51:46 -04:00
Vitaly Buka	bba8a76b87	[NFC][hwasan] Remove default arguments in internal class	2021-07-15 15:28:02 -07:00
Shilei Tian	ca662297d5	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of current kernels. In many cases, user programs only contain target region executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compliation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-15 18:23:23 -04:00
Sanjay Patel	81ce3aa30c	[SLP] avoid leaking poison in reduction of safe boolean logic ops This bug was introduced with D105730 / `25ee55c0ba` . If we are not converting all of the operations of a reduction into a vector op, we need to preserve the existing select form of the remaining ops. Otherwise, we are potentially leaking poison where it did not in the original code. Alive2 agrees that the version that freezes some inputs and then falls back to scalar is correct: https://alive2.llvm.org/ce/z/erF4K2	2021-07-15 17:33:06 -04:00
Simon Pilgrim	0a614ca225	Fix "unknown pragma 'GCC'" MSVC warning. NFCI.	2021-07-15 18:50:19 +01:00
Arthur Eubanks	99cb2507f3	Revert "[SLP]Workaround for InsertSubVector cost." This reverts commit `2eb50baf05`. Causes hangs, see comments on D105827.	2021-07-15 10:19:41 -07:00
Nikita Popov	c191035f42	[IR] Add elementtype attribute This implements the elementtype attribute specified in D105407. It just adds the attribute and the specified verifier rules, but doesn't yet make use of it anywhere. Differential Revision: https://reviews.llvm.org/D106008	2021-07-15 18:04:26 +02:00
Arthur Eubanks	04b75c05b0	[InstCombine] Look through invariant group intrinsics when removing malloc Fixes some regressions with -fstrict-vtable-pointers in llvm-test-suite. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D106017	2021-07-15 09:02:40 -07:00
Philip Reames	95346ba877	[LV] Enable vectorization of multiple exit loops w/computable exit counts This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzeable and to dominate the latch. The majority of work to support this was done in a set of previous patches. In particular,, `72314466` avoids having multiple edges from the middle block to the exits, and `4b33b2387` which added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit. Differential Revision: https://reviews.llvm.org/D105817	2021-07-15 08:53:51 -07:00
Shilei Tian	a70ef3f568	Revert "[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible" This reverts commit `1100e4aafe`.	2021-07-15 11:19:28 -04:00
Sander de Smalen	a607f64118	Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs." This reverts commit `efaf3099c8`. This reverts commit `dc7bdc1e71`. Reverting patches due to buildbot failures.	2021-07-15 15:21:57 +01:00
Roman Lebedev	3e6c383dc6	[SimplifyCFG] Rerun PHI deduplication after common code sinkinkg (PR51092) `SinkCommonCodeFromPredecessors()` doesn't itself ensure that duplicate PHI nodes aren't created. I suppose, we could teach it to do that on-the-fly (& account for the already-existing PHI nodes, & adjust costmodel), the diff will be bigger than this. The alternative is to schedule a new EarlyCSE pass invocation somewhere later in the pipeline. Clearly, we don't have any EarlyCSE runs in module optimization passline, so this pattern isn't cleaned up... That would perhaps better, but it will again have some compile time impact. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D106010	2021-07-15 16:34:34 +03:00
Sander de Smalen	dc7bdc1e71	[LV] Fix determinism for failing scalable-call.ll test. The sort function for emitting an OptRemark was not deterministic, which caused scalable-call.ll to fail on some buildbots. This patch fixes that. This patch also fixes an issue where `Instruction::comesBefore()` is called when two Instructions are in different basic blocks, which would otherwise cause an assertion failure.	2021-07-15 13:16:59 +01:00
Stephen Tozer	47633af9d4	Reapply "[DebugInfo] Enable variadic debug value salvaging" Reapplied after previous build failures were fixed in `14b62f7e2`. This reverts commit `540b4a5fb3`.	2021-07-15 12:54:51 +01:00
Simon Pilgrim	944f39f38d	[InstCombine] Strip inbounds from (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) fold As discussed on rGd561b6fbdbe6, we can't guarantee that the new gep is inbounds	2021-07-15 12:19:10 +01:00
Ilya Leoshkevich	3845f2cd94	[TSan] Use zeroext for function parameters SystemZ ABI requires zero-extending function parameters to 64-bit. The compiler is free to optimize the code around this assumption, e.g. failing to zero-extend __tsan_atomic32_load()'s morder may cause crashes in to_mo() switch table lookup. Fix by adding zeroext attributes to TSan's FunctionCallees, similar to how it was done in commit `3bc439bdff` ("[MSan] Add instrumentation for SystemZ"). This is a no-op on arches that don't need it. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D105629	2021-07-15 12:18:47 +02:00
Florian Mayer	0ed1747a92	[NFC] [hwasan] Split argument logic into functions. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D105971	2021-07-15 10:45:43 +01:00
Chuanqi Xu	8a1727ba51	[Coroutines] Run coroutine passes by default This patch make coroutine passes run by default in LLVM pipeline. Now the clang and opt could handle IR inputs containing coroutine intrinsics without special options. It should be fine. On the one hand, the coroutine passes seems to be stable since there are already many projects using coroutine feature. On the other hand, the coroutine passes should do nothing for IR who doesn't contain coroutine intrinsic. Test Plan: check-llvm Reviewed by: lxfind, aeubanks Differential Revision: https://reviews.llvm.org/D105877	2021-07-15 14:33:40 +08:00
Kuter Dinel	ade190c5ea	[Attributor] AACallEdges, Add a way to ask nonasm unknown callees This patch adds a feature to AACallEdges AbstractAttribute that allows users to ask if there is a unknown callee that isn't a inline assembly. This feature is needed by some of it's users. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105992	2021-07-15 06:10:42 +03:00
Jon Roelofs	d143103068	[GlobalOpt] Fix a miscompile when evaluating struct initializers. The bug was that evaluateBitcastFromPtr attempts a narrowing to a struct's 0th element of a store that covers other elements. While this is okay on the load side, applying it to stores causes us to miss the writes to the additionally covered elements. rdar://79503568 Differential revision: https://reviews.llvm.org/D105838	2021-07-14 15:37:01 -07:00
Arthur Eubanks	5366de7375	[SimpleLoopUnswitch] Don't non-trivially unswitch loops with catchswitch exits SplitBlock() can't handle catchswitch. Fixes PR50973. Reviewed By: aheejin Differential Revision: https://reviews.llvm.org/D105672	2021-07-14 14:07:28 -07:00
Alexey Bataev	ba2690b17b	[SLP][NFC]Fix variables names, NFC.	2021-07-14 12:43:45 -07:00
Simon Pilgrim	4fd0addb68	[SLP] Fix case of variable name. NFCI.	2021-07-14 20:20:04 +01:00
Sanjay Patel	ca6e117d86	[InstCombine] reorder icmp with offset folds for better results This set of folds was added recently with: `c7b658aeb5` `0c400e8953` `40b752d28d` ...and I noted that this wasn't likely to fire in code derived from C/C++ source because of nsw in particular. But I didn't notice that I had placed the code above the no-wrap block of transforms. This is likely the cause of regressions noted from the previous commit because -- as shown in the test diffs -- we may have transformed into a compare with an arbitrary constant rather than a simpler signbit test.	2021-07-14 12:12:05 -04:00
Sander de Smalen	efaf3099c8	[LV] Print remark when loop cannot be vectorized due to invalid costs. This patch emits remarks for instructions that have invalid costs for a given set of vectorization factors. Some example output: t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load dst[i] = sinf(src[i]); ^ t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32 dst[i] = sinf(src[i]); ^ t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store dst[i] = sinf(src[i]); ^ Reviewed By: fhahn, kmclaughlin Differential Revision: https://reviews.llvm.org/D105806	2021-07-14 17:11:33 +01:00
Alexey Bataev	2eb50baf05	[SLP]Workaround for InsertSubVector cost. The cost of the InsertSubvector shuffle kind cost is not complete and may end up with just extracts + inserts costs in many cases. Added a workaround to represent it as a generic PermuteSingleSrc, which is still pessimistic but better than InsertSubvector. Differential Revision: https://reviews.llvm.org/D105827	2021-07-14 07:54:24 -07:00
Sanjay Patel	25ee55c0ba	[SLP] match logical and/or as reduction candidates This has been a work-in-progress for a long time...we finally have all of the pieces in place to handle vectorization of compare code as shown in: https://llvm.org/PR41312 To do this (see PhaseOrdering tests), we converted SimplifyCFG and InstCombine to the poison-safe (select) forms of the logic ops, so now we need to have SLP recognize those patterns and insert a freeze op to make a safe reduction: https://alive2.llvm.org/ce/z/NH54Ah We get the minimal patterns with this patch, but the PhaseOrdering tests show that we still need adjustments to get the ideal IR in some or all of the motivating cases. Differential Revision: https://reviews.llvm.org/D105730	2021-07-14 09:02:31 -04:00
Simon Pilgrim	d561b6fbdb	[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED) As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep': define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) { %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1 %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3 %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1 ret <4 x i32>* %sel } --> define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) { %sel = select i1 %a2, i64 %a1, i64 %a3 %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel ret <4 x i32>* %gep } This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address: define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) { %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1 %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep ret <4 x i32>* %sel } --> define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) { %sel = select i1 %a2, i64 0, i64 %a1 %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel ret <4 x i32>* %gep } Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests Differential Revision: https://reviews.llvm.org/D105901	2021-07-14 12:21:01 +01:00
Chuanqi Xu	12d04ce956	[NFC] [Coroutines] Remove unused CoroFree	2021-07-14 19:13:12 +08:00
Simon Pilgrim	0722f3d0fa	Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)" Missed some BPF test changes that need addressing	2021-07-14 11:48:37 +01:00
Stephen Tozer	810e4c3c66	[DebugInfo] Correctly update dbg.values with duplicated location ops This patch fixes code that incorrectly handled dbg.values with duplicate location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in question were caused by either applying an update to dbg.value multiple times when the update is only valid once, or by updating the DIExpression for only the first instance of a value that appears multiple times. Differential Revision: https://reviews.llvm.org/D105831	2021-07-14 11:17:24 +01:00
Simon Pilgrim	b803294cf7	[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep': define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) { %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1 %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3 %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1 ret <4 x i32>* %sel } --> define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) { %sel = select i1 %a2, i64 %a1, i64 %a3 %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel ret <4 x i32>* %gep } This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address: define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) { %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1 %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep ret <4 x i32>* %sel } --> define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) { %sel = select i1 %a2, i64 0, i64 %a1 %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel ret <4 x i32>* %gep } Differential Revision: https://reviews.llvm.org/D105901	2021-07-14 10:49:12 +01:00
Shilei Tian	1100e4aafe	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of current kernels. In many cases, user programs only contain target region executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compliation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-13 22:28:35 -04:00
Arthur Eubanks	0024ec59a0	[NewPM][SimpleLoopUnswitch] Add option to not trivially unswitch To help with debugging non-trivial unswitching issues. Don't care about the legacy pass, nobody is using it. If a pass's string params are empty (e.g. "simple-loop-unswitch"), don't default to the empty constructor for the pass params. We should still let the parser take care of it in case the parser has its own defaults. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D105933	2021-07-13 16:09:42 -07:00
Eli Friedman	5d1ba53404	[LoopReroll] Add an extra defensive check to avoid SCEV assertion. Make sure getMinusSCEV() didn't return a pointer. The following check would never succeed if it was a pointer, anyway, but calling getMulExpr() on a pointer SCEV now asserts.	2021-07-13 12:17:09 -07:00
Arthur Eubanks	489742991f	[NFC] Inline variable to prevent unused variable warning	2021-07-13 09:57:59 -07:00
Arthur Eubanks	ab5693aa4a	[OpaquePtr] Use byval type more	2021-07-13 09:34:34 -07:00
Arthur Eubanks	113a807977	[OpaquePtr] Get load/store type without PointerType::getElementType()	2021-07-13 09:34:34 -07:00
Arthur Eubanks	693bc04bf6	[OpaquePtr] Use GlobalValue::getValueType() more	2021-07-13 09:34:34 -07:00
Simon Pilgrim	b2f6cf1479	[InstCombine] Fold lshr/ashr(or(neg(x),x),bw-1) --> zext/sext(icmp_ne(x,0)) (PR50816) Handle the missing fold reported in PR50816, which is a variant of the existing ashr(sub_nsw(X,Y),bw-1) --> sext(icmp_sgt(X,Y)) fold. We also handle the lshr(or(neg(x),x),bw-1) --> zext(icmp_ne(x,0)) equivalent - https://alive2.llvm.org/ce/z/SnZmSj We still allow multi uses of the neg(x) - as this is likely to let us further simplify other uses of the neg - but not multi uses of the or() which would increase instruction count. Differential Revision: https://reviews.llvm.org/D105764	2021-07-13 14:44:54 +01:00
Jeroen Dobbelaere	1d8030053d	[NFC] Do not track calls to inlined intrinsics in IFI. Just like intrinsics are not tracked for IFI.InlinedCalls, they should not be tracked for IFI.InlinedCallSites. In the current top-of-tree this change is a NFC, but the full restrict patches (D68484) potentially trigger an read-after-free if intrinsics are also added to the InlindeCallSites, due to a late optimization potentially removing some of the inlined intrinsics. Also see https://lists.llvm.org/pipermail/llvm-dev/2021-July/151722.html for a discussion about the problem. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D105805	2021-07-13 10:18:23 +02:00
hyeongyu kim	e338d08ae6	[SimplifyCFG] Fix SimplifyBranchOnICmpChain to be undef/poison safe. This patch fixes the problem of SimplifyBranchOnICmpChain that occurs when extra values are Undef or poison. Suppose the %mode is 51 and the %Cond is poison, and let's look at the case below. ``` %A = icmp ne i32 %mode, 0 %B = icmp ne i32 %mode, 51 %C = select i1 %A, i1 %B, i1 false %D = select i1 %C, i1 %Cond, i1 false br i1 %D, label %T, label %F => br i1 %Cond, label %switch.early.test, label %F switch.early.test: switch i32 %mode, label %T [ i32 51, label %F i32 0, label %F ] ``` incorrectness: https://alive2.llvm.org/ce/z/BWScX Code before transformation will not raise UB because %C and %D is false, and it will not use %Cond. But after transformation, %Cond is being used immediately, and it will raise UB. This problem can be solved by adding freeze instruction. correctness: https://alive2.llvm.org/ce/z/x9x4oY Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104569	2021-07-13 15:35:18 +09:00
Arthur Eubanks	0e6424acbd	[OpaquePointers][ThreadSanitizer] Cleanup calls to PointerType::getElementType() Reviewed By: #opaque-pointers, dblaikie Differential Revision: https://reviews.llvm.org/D105710	2021-07-12 20:46:08 -07:00
Nikita Popov	7ed3e87825	[Attributes] Determine attribute properties from TableGen data Continuing from D105763, this allows placing certain properties about attributes in the TableGen definition. In particular, we store whether an attribute applies to fn/param/ret (or a combination thereof). This information is used by the Verifier, as well as the ForceFunctionAttrs pass. I also plan to use this in LLParser, which also duplicates info on which attributes are valid where. This keeps metadata about attributes in one place, and makes it more likely that it stays in sync, rather than in various functions spread across the codebase. Differential Revision: https://reviews.llvm.org/D105780	2021-07-12 22:13:38 +02:00
Nikita Popov	6ac32872ee	[Attributes] Replace doesAttrKindHaveArgument() (NFC) This is now the same as isIntAttrKind(), so use that instead, as it does not require manual maintenance. The naming is also more accurate in that both int and type attributes have an argument, but this method was only targeting int attributes. I initially wanted to tighten the AttrBuilder assertion, but we have some in-tree uses that would violate it.	2021-07-12 21:57:26 +02:00
Sanjay Patel	a488c7879e	[InstCombine] reduce signbit test of logic ops to cmp with zero This is the pattern from the description of: https://llvm.org/PR50816 There might be a way to generalize this to a smaller or more generic pattern, but I have not found it yet. https://alive2.llvm.org/ce/z/ShzJoF define i1 @src(i8 %x) { %add = add i8 %x, -1 %xor = xor i8 %x, -1 %and = and i8 %add, %xor %r = icmp slt i8 %and, 0 ret i1 %r } define i1 @tgt(i8 %x) { %r = icmp eq i8 %x, 0 ret i1 %r }	2021-07-12 09:01:26 -04:00
Yevgeny Rouban	88024a724c	[RS4GC] Use one DVCache for both inlineGetBaseAndOffset() and insertParsePoints() This new test demonstrates a case where a base ptr is generated twice for the same value: the first one is generated while the gc.get.pointer.base() is inlined, the second is generated for the statepoint. This happens because the methods inlineGetBaseAndOffset() and insertParsePoints() do not share their defining value cache used by the findBasePointer() method. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D103240	2021-07-12 18:13:00 +07:00
Sander de Smalen	d2e4ccc790	[LV] Ignore candidate VFs with invalid costs. This follows on from discussion on the mailing-list: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html to interpret an Invalid cost as 'infinitely expensive', as this simplifies some of the legalization issues with scalable vectors. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105473	2021-07-12 09:58:22 +01:00
Liqiang Tao	6ebeb7f8ba	[llvm][Inliner] Templatize PriorityInlineOrder The patch templatize PriorityInlinerOrder so that it can accept any type priority metric. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D104972	2021-07-12 10:38:22 +08:00
Johannes Doerfert	792aac9897	[Attributor][NFCI] Add UsedAssumedInformation to more interfaces As with other Attributor interfaces we often want to know if assumed information was used to answer a query. This is important if only known information is allowed or if known information can lead to an early fixpoint. The users have been adjusted but none of them utilizes the new information yet.	2021-07-11 19:18:03 -05:00
Eli Friedman	6144085c29	[IndVars] Don't widen pointers in WidenIV::getWideRecurrence It's not a reasonable transform, and calling getSignExtendExpr() on a pointer hits an assertion.	2021-07-11 17:04:50 -07:00
Florian Hahn	c6e4c1fbd8	[VPlan] Remove default arg from getVPValue (NFC). The const version of VPValue::getVPValue still had a default value for the value index. Remove the default value and use getVPSingleValue instead, which is the proper function.	2021-07-11 22:03:09 +02:00
Craig Topper	e38b7e8948	[DivRemPairs] Add an initial case for hoisting to a common predecessor. This patch adds support for hoisting the division and maybe the remainder for control flow graphs like this. ``` PredBB \| \ \| Rem \| / Div ``` If we have DivRem we'll hoist both to PredBB. If not we'll just hoist Div and expand Rem using the Div. This improves our codegen for something like this ``` __uint128_t udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t remainder) { if (remainder != 0) remainder = dividend % divisor; return dividend / divisor; } ``` Reviewed By: spatel, lebedev.ri Differential Revision: https://reviews.llvm.org/D87555	2021-07-11 10:03:07 -07:00
hyeongyu kim	1a5f4cbe1b	[InstCombine] Add optimization to prevent poison from being propagated. In D104569, Freeze was inserted just before br to solve the `branching on undef` miscompilation problem. But value analysis was being disturbed by added freeze. ``` v = load ptr cond = freeze(icmp (and v, const), const') br cond, ... ``` The case in which value analysis disturbed is as above. By changing freeze to add immediately after load, value analysis will be successful again. ``` v = load ptr freeze(icmp (and v, const), const') => v = load ptr v' = freeze v icmp (and v', const), const' ``` In this patch, I propose the above optimization. With this patch, the poison will not spread as the freeze is performed early. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D105392	2021-07-11 12:40:43 +09:00
Johannes Doerfert	2e7e2994a9	[Attributor][FIX] Destroy bump allocator objects to avoid leaks AllocationInfo and DeallocationInfo objects themselves are allocated with the Attributor bump allocator and do not need to be deallocated. That said, the sets in AllocationInfo and DeallocationInfo need to be destroyed to avoid memory leaks.	2021-07-10 18:53:37 -05:00
Johannes Doerfert	514c033db1	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 18:44:25 -05:00
Johannes Doerfert	8cb7d71355	[OpenMP][FIX] Add missing `)` to remark	2021-07-10 18:40:32 -05:00
Johannes Doerfert	d9659bf6a0	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 17:57:08 -05:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Johannes Doerfert	4761d29633	[Attributor][FIX] Sanitize queries to LVI and ScalarEvolution When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need to be careful with the query. The particular error occurred because we folded a PHI node before the LVI query but the context location was now not dominated by the value anymore. This is not supported by LVI so we have to filter these situations before we query the outside analyses.	2021-07-10 16:45:19 -05:00
Johannes Doerfert	c1d53a316d	[Attributor] Look through selects in genericValueTraversal If we can simplify the select condition we can avoid one value in the traversal. Differential Revision: https://reviews.llvm.org/D103861	2021-07-10 16:44:05 -05:00
Johannes Doerfert	c1c1fe9385	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helps to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 16:32:24 -05:00
Johannes Doerfert	dbb3a65f5b	[Attributor][FIX] Do not replace a value with a non-dominating instruction We have to be careful when we replace values to not use a non-dominating instruction. It makes sense that simplification offers those as "simplified values" but we can't manifest them in the IR without PHI nodes. In the future we should consider potentially adding those PHI nodes.	2021-07-10 16:09:30 -05:00
Johannes Doerfert	5ef18e2421	[Attributor] Use AAValueSimplify to simplify returned values We should use AAValueSimplify for all value simplification, however there was some leftover logic that predates AAValueSimplify in AAReturnedValues. This remove the AAReturnedValues part and provides a replacement by making AAValueSimplifyReturned strong enough to handle all previously covered cases. Further, this improve AAValueSimplifyCallSiteReturned to handle returned arguments. AAReturnedValues is now much easier and the collected returned values/instructions are now from the associated function only, making it much more sane. We also do not have the brittle logic anymore that looks for unresolved calls. Instead, we use AAValueSimplify to handle recursion. Useful code has been split into helper functions, e.g., an Attributor interface to get a simplified value. Differential Revision: https://reviews.llvm.org/D103860	2021-07-10 15:52:36 -05:00
Johannes Doerfert	0aab13aaf9	[Attributor] Introduce an optimistic getUnderlyingObjects helper As the `llvm::getUnderlyingObjects` helper, the optimistic version collects objects that might be the base of a given pointer. In contrast to the llvm variant, the optimistic one will use assumed information, e.g., about select conditions or dead blocks, to provide a more precise result. Differential Revision: https://reviews.llvm.org/D103859	2021-07-10 15:47:30 -05:00
Johannes Doerfert	5b12cf3e65	[Attributor][FIX] Traverse uses even if a value is assumed constant Not all attributes are able to handle the interprocedural step and follow the uses into a call site. Let them be able to combine call site uses instead. This might result in some unused values/arguments being leftover but it removes problems where we misused "is dead" even though it was actually "is simplified/replaced". We explicitly check for dead values due to constant propagation in `AAIsDeadValueImpl::areAllUsesAssumedDead` instead. Differential Revision: https://reviews.llvm.org/D103858	2021-07-10 15:47:20 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	269416d419	[Attributor][NFCI] Add UsedAssumedInformation to more interfaces As with other Attributor interfaces we often want to know if assumed information was used to answer a query. This is important if only known information is allowed or if known information can lead to an early fixpoint. The users have been adjusted but none of them utilizes the new information yet.	2021-07-10 12:32:51 -05:00
Johannes Doerfert	d39179d7fa	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 12:32:51 -05:00
Johannes Doerfert	966342790e	[Attributor][FIX] Sanitize queries to LVI and ScalarEvolution When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need to be careful with the query. The particular error occurred because we folded a PHI node before the LVI query but the context location was now not dominated by the value anymore. This is not supported by LVI so we have to filter these situations before we query the outside analyses.	2021-07-10 12:32:51 -05:00
Johannes Doerfert	ae08df87df	[Attributor][FIX] Do not replace a value with a non-dominating instruction We have to be careful when we replace values to not use a non-dominating instruction. It makes sense that simplification offers those as "simplified values" but we can't manifest them in the IR without PHI nodes. In the future we should consider potentially adding those PHI nodes.	2021-07-10 12:32:50 -05:00
Johannes Doerfert	f0628c6ff7	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 12:32:50 -05:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Johannes Doerfert	5003ba2542	[Attributor] Look through selects in genericValueTraversal If we can simplify the select condition we can avoid one value in the traversal. Differential Revision: https://reviews.llvm.org/D103861	2021-07-10 12:32:50 -05:00
Johannes Doerfert	1eb31d6de3	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared in to AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helps to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 12:32:50 -05:00
Johannes Doerfert	374e573cfc	[Attributor] Use AAValueSimplify to simplify returned values We should use AAValueSimplify for all value simplification, however there was some leftover logic that predates AAValueSimplify in AAReturnedValues. This remove the AAReturnedValues part and provides a replacement by making AAValueSimplifyReturned strong enough to handle all previously covered cases. Further, this improve AAValueSimplifyCallSiteReturned to handle returned arguments. AAReturnedValues is now much easier and the collected returned values/instructions are now from the associated function only, making it much more sane. We also do not have the brittle logic anymore that looks for unresolved calls. Instead, we use AAValueSimplify to handle recursion. Useful code has been split into helper functions, e.g., an Attributor interface to get a simplified value. Differential Revision: https://reviews.llvm.org/D103860	2021-07-10 12:32:50 -05:00
Johannes Doerfert	93a279a67d	[Attributor] Introduce an optimistic getUnderlyingObjects helper As the `llvm::getUnderlyingObjects` helper, the optimistic version collects objects that might be the base of a given pointer. In contrast to the llvm variant, the optimistic one will use assumed information, e.g., about select conditions or dead blocks, to provide a more precise result. Differential Revision: https://reviews.llvm.org/D103859	2021-07-10 12:32:49 -05:00
Johannes Doerfert	be5d46e9bb	[Attributor][FIX] Traverse uses even if a value is assumed constant Not all attributes are able to handle the interprocedural step and follow the uses into a call site. Let them be able to combine call site uses instead. This might result in some unused values/arguments being leftover but it removes problems where we misused "is dead" even though it was actually "is simplified/replaced". We explicitly check for dead values due to constant propagation in `AAIsDeadValueImpl::areAllUsesAssumedDead` instead. Differential Revision: https://reviews.llvm.org/D103858	2021-07-10 12:32:49 -05:00
Sander de Smalen	239fcda268	[LV] NFCI: Do cost comparison on InstructionCost directly. Instead of performing the isMoreProfitable() operation on InstructionCost::CostTy the operation is performed on InstructionCost directly, so that it can handle the case where one of the costs is Invalid. This patch also changes the CostTy to be int64_t, so that the type is wide enough to deal with multiplications with e.g. `unsigned MaxTripCount`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105113	2021-07-10 11:57:16 +01:00
Eli Friedman	9c4baf5101	[ScalarEvolution] Strictly enforce pointer/int type rules. Rules: 1. SCEVUnknown is a pointer if and only if the LLVM IR value is a pointer. 2. SCEVPtrToInt is never a pointer. 3. If any other SCEV expression has no pointer operands, the result is an integer. 4. If a SCEVAddExpr has exactly one pointer operand, the result is a pointer. 5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other pointer operands, the result is a pointer. 6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a pointer. 7. Otherwise, the SCEV expression is invalid. I'm not sure how useful rule 6 is in practice. If we exclude it, we can guarantee that ScalarEvolution::getPointerBase always returns a SCEVUnknown, which might be a helpful property. Anyway, I'll leave that for a followup. This is basically mop-up at this point; all the changes with significant functional effects have landed. Some of the remaining changes could be split off, but I don't see much point. Differential Revision: https://reviews.llvm.org/D105510	2021-07-09 17:29:26 -07:00
Valery N Dmitriev	8e9216fe87	[SLP] Do not make an attempt to match reduction on already erased instruction. Differential Revision: https://reviews.llvm.org/D105752	2021-07-09 17:13:15 -07:00
Kazu Hirata	49d66d9f9f	[AFDO] Merge function attributes after inlining This patch teaches the sample profile loader to merge function attributes after inlining functions. Without this patch, the compiler could inline a function requiring the 512-bit vector width into its caller without merging function attributes, triggering a failure during instruction selection. Differential Revision: https://reviews.llvm.org/D105729	2021-07-09 16:47:12 -07:00
Sanjay Patel	c2b7f09d8c	[SLP] make invalid operand explicit for extra arg in reduction matching; NFC This makes it clearer when we have encountered the extra arg. Also, we may need to adjust the way the operand iteration works when handling logical and/or.	2021-07-09 15:32:12 -04:00
Varun Gandhi	92dcb1d2db	[Clang] Introduce Swift async calling convention. This change is intended as initial setup. The plan is to add more semantic checks later. I plan to update the documentation as more semantic checks are added (instead of documenting the details up front). Most of the code closely mirrors that for the Swift calling convention. Three places are marked as [FIXME: swiftasynccc]; those will be addressed once the corresponding convention is introduced in LLVM. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D95561	2021-07-09 11:50:10 -07:00
Arthur Eubanks	9a7afae492	[OpaquePtr][InferAddrSpace] Use PointerType::getWithSamePointeeType()	2021-07-09 10:29:08 -07:00
Arthur Eubanks	4e6013250d	[NFC][OpaquePtr] Use GlobalValue::getValueType() more Instead of getType()->getElementType().	2021-07-09 09:55:41 -07:00
Sanjay Patel	486992f958	[SLP] improve code comments; NFC This likely started out only supporint binops, but now we handle min/max using cmp+sel, and we may extend to handle bool logic in the form of select.	2021-07-09 12:49:54 -04:00
Sanjay Patel	544f2711bb	[SLP] make checks for cmp+select min/max more explicit This is NFC-intended currently (so no test diffs). The motivation is to eventually allow matching for poison-safe logical-and and logical-or (these are in the form of a select-of-bools). ( https://llvm.org/PR41312 ) Those patterns will not have all of the same constraints as min/max in the form of cmp+sel. We may also end up removing the cmp+sel min/max matching entirely (if we canonicalize to intrinsics), so this will make that step easier.	2021-07-09 12:43:43 -04:00
Arthur Eubanks	e4f66a1055	[OpaquePointers][CallPromotion] Don't look at pointee type for byval byval's type parameter is now always required.	2021-07-09 09:34:05 -07:00
Nico Weber	97c675d3d4	Revert "Revert "Temporarily do not drop volatile stores before unreachable"" This reverts commit `52aeacfbf5`. There isn't full agreement on a path forward yet, but there is agreement that this shouldn't land as-is. See discussion on https://reviews.llvm.org/D105338 Also reverts unreviewed "[clang] Improve `-Wnull-dereference` diag to be more in-line with reality" This reverts commit `f4877c78c0`. And all the related changes to tests: This reverts commit `9a0152799f`. This reverts commit `3f7c9cc274`. This reverts commit `329f8197ef`. This reverts commit `aa9f58cc2c`. This reverts commit `2df37d5ddd`. This reverts commit `a72a441812`.	2021-07-09 11:44:34 -04:00
Kevin P. Neal	52900486a1	[FPEnv][InstSimplify] Constrained FP support for NaN Currently InstructionSimplify.cpp knows how to simplify floating point instructions that have a NaN operand. It does not know how to handle the matching constrained FP intrinsic. This patch teaches it how to simplify so long as the exception handling is not "fpexcept.strict". Differential Revision: https://reviews.llvm.org/D103169	2021-07-09 11:26:28 -04:00
Roman Lebedev	4e332cd41a	Revert "Transform memset + malloc --> calloc (PR25892)" It broke `check-msan`, see e.g. https://lab.llvm.org/buildbot/#/builders/18/builds/1934 This reverts commit `375694a07b`.	2021-07-09 16:26:48 +03:00
Roman Lebedev	52aeacfbf5	Revert "Temporarily do not drop volatile stores before unreachable" This reverts commit `4e413e1621`, which landed almost 10 months ago under premise that the original behavior didn't match reality and was breaking users, even though it was correct as per the LangRef. But the LangRef change still hasn't appeared, which might suggest that the affected parties aren't really worried about this problem. Please refer to discussion in: * https://reviews.llvm.org/D87399 (`Revert "[InstCombine] erase instructions leading up to unreachable"`) * https://reviews.llvm.org/D53184 (`[LangRef] Clarify semantics of volatile operations.`) * https://reviews.llvm.org/D87149 (`[InstCombine] erase instructions leading up to unreachable`) clang has `-Wnull-dereference` which will diagnose the obvious cases of null dereference, it was adjusted in `f4877c78c0`, but it will only catch the cases where the pointer is a null literal, it will not catch the cases where an arbitrary store is expected to trap. Differential Revision: https://reviews.llvm.org/D105338	2021-07-09 14:16:54 +03:00
Max Kazantsev	9c5e65691e	[LoopDeletion] Handle switch in proving that loop exits on first iteration Added check for switch-terminated blocks in loops. Now if a block is terminated with a switch, we try to find out which of the cases is taken on 1st iteration and mark corresponding edge from the block to the case successor as live. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D105688 Reviewed By: nikic, mkazantsev	2021-07-09 18:03:34 +07:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between paiwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Dawid Jurczak	375694a07b	Transform memset + malloc --> calloc (PR25892) After this change DSE can eliminate malloc + memset and emit calloc. It's https://reviews.llvm.org/D101440 follow-up. Differential Revision: https://reviews.llvm.org/D103009	2021-07-09 11:01:12 +02:00
Bjorn Pettersson	472462c472	[NewPM] Consistently use 'simplifycfg' rather than 'simplify-cfg' There was an alias between 'simplifycfg' and 'simplify-cfg' in the PassRegistry. That was the original reason for this patch, which effectively removes the alias. This patch also replaces all occurrances of 'simplify-cfg' by 'simplifycfg'. Reason for choosing that form for the name is that it matches the DEBUG_TYPE for the pass, and the legacy PM name and also how it is spelled out in other passes such as 'loop-simplifycfg', and in other options such as 'simplifycfg-merge-cond-stores'. I for some reason the name should be changed to 'simplify-cfg' in the future, then I think such a renaming should be more widely done and not only impacting the PassRegistry. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D105627	2021-07-09 09:47:03 +02:00
Reshabh Sharma	2e194dec60	[ASan][AMDGPU] Make shadow offset match X86 on Linux This patch explicitly sets the shadow offset for AMDGPU to match that of X86 on Linux. Reviewed By: vitalybuka https://reviews.llvm.org/D105282	2021-07-09 07:48:03 +05:30
Alexey Bataev	8af69975af	[InstCombine][NFC]Use only `replaceInstUsesWith`, NFC.	2021-07-08 13:58:30 -07:00
David Blaikie	1def2579e1	PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23 C++23 will make these conversions ambiguous - so fix them to make the codebase forward-compatible with C++23 (& a follow-up change I've made will make this ambiguous/invalid even in <C++23 so we don't regress this & it generally improves the code anyway)	2021-07-08 13:37:57 -07:00
Vitaly Buka	915e07605c	[msan] Handle funnel shifts Fixes https://bugs.llvm.org/show_bug.cgi?id=50840 Differential Revision: https://reviews.llvm.org/D105387	2021-07-08 12:49:49 -07:00
Alexey Bataev	c574d2fbac	[SLP]Improve vectorization of stores. Patch tries to improve the vectorization of stores. Originally, we just check the type and the base pointer of the store. Patch adds some extra checks to avoid non-profitable vectorization cases. It includes analysis of the scalar values to be stored and triggers the vectorization attempt only if the scalar values have same/alt opcode and are from same basic block, i.e. we don't end up immediately with the gather node, which is not profitable. This also improves compile time by filtering out non-profitable cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D104122	2021-07-08 12:35:39 -07:00
Alexey Bataev	0d74fd3fdf	[SLP][COST][X86]Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization cehcking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) in case if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which insrease code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Alexey Bataev	b5113bff46	[Instcombine]Transform reduction+(sext/zext(<n x i1>) to <n x im>) to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im. Some of the SPEC tests end up with reduction+(sext/zext(<n x i1>) to <n x im>) pattern, which can be transformed to [-]zext/trunc(ctpop(bitcast <n x i1> to in)) to im. Also, reduction+(<n x i1>) can be transformed to ctpop(bitcast <n x i1> to in) & 1 != 0. Differential Revision: https://reviews.llvm.org/D105587	2021-07-08 07:56:41 -07:00
Michael Liao	4e5d9c8803	[Internalize] Preserve variables externally initialized. - ``externally_initialized`` variables would be initialized or modified elsewhere. Particularly, CUDA or HIP may have host code to initialize or modify ``externally_initialized`` device variables, which may not be explicitly referenced on the device side but may still be used through the host side interfaces. Not preserving them triggers the elimination of them in the GlobalDCE and breaks the user code. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D105135	2021-07-08 10:48:19 -04:00
Nikita Popov	84c15bc018	[SCEVExpander] Support opaque pointers This adds support for opaque pointers to expandAddToGEP() by always generating an i8 GEP for opaque pointers. After looking at some other cases (constexpr GEP folding, SROA GEP generation), I've come around to the idea that we should use i8 GEPs for opaque pointers, because the alternative would be to guess a GEP type from surrounding code, which will not be reliable. Ultimately, i8 GEPs is where we want to end up anyway, and opaque pointers just make that the natural choice. There are a couple of other places in SCEVExpander that check pointer element types, I plan to update those when I run across usable test coverage that doesn't assert elsewhere. Differential Revision: https://reviews.llvm.org/D105398	2021-07-07 20:47:59 +02:00
Sanjay Patel	97c473ad39	[SLP] rename variable to not be misleading; NFC The reduction matching was probably only dealing with binops when it was written, but we have now generalized it to handle select and intrinsics too, so assert on that too.	2021-07-07 14:40:21 -04:00
Arnold Schwaighofer	2937f8d148	[coro async] Cap the alignment of spilled values (vs. allocas) at the max frame alignment Before this patch we would normally use the ABI alignment which can be to high for the context alginment. For spilled values we don't need ABI alignment, since the frame entry's address is not escaped. rdar://79664965 Differential Revision: https://reviews.llvm.org/D105288	2021-07-07 08:06:25 -07:00
Philip Reames	723144665b	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4) Resubmit after the following changes: * Fix a latent bug related to unrolling with required epilogue (see `e49d65f`). I believe this is the cause of the prior PPC buildbot failure. * Disable non-latch exits for epilogue vectorization to be safe (`9ffa90d`) * Split out assert movement (`600624a`) to reduce churn if this gets reverted again. Previous commit message (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-07-07 07:44:35 -07:00
Dylan Fleming	7215dcfe36	[SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths Depends on D104239 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105341	2021-07-07 15:30:10 +01:00
Arnold Schwaighofer	033de11150	[coro async] Move code to proper switch While upstreaming patches this code somehow was applied to the wrong switch statement. Differential Revision: https://reviews.llvm.org/D105504	2021-07-07 06:19:08 -07:00
Max Kazantsev	19885c7adf	[NFC] Remove duplicate function calls Removed repeated call of L->getHeader(). Now using previously stored return value. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D105535 Reviewed By: mkazantsev	2021-07-07 17:02:36 +07:00
Dylan Fleming	7586b47fb6	[SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104239	2021-07-07 09:58:05 +01:00
Johannes Doerfert	168a9234d7	[Attributor][FIX] Replace uses first, then values Before we replaced value by registering all their uses. However, as we replace a value old uses become stale. We now replace values explicitly and keep track of "new values" when doing so to avoid replacing only uses in stale/old values but not their replacements.	2021-07-06 22:43:51 -05:00
Johannes Doerfert	aa3768278d	[Attributor] Introduce a helper function to deal with undef + none We often need to deal with the value lattice that contains none and undef as special values. A simple helper makes this much nicer. Differential Revision: https://reviews.llvm.org/D103857	2021-07-06 22:41:21 -05:00
Johannes Doerfert	fc82409b5c	[Attributor] Simplify operands inside of simplification AAs first When we do simplification via AAPotentialValues or AAValueConstantRange we need to simplify the operands of an instruction we deconstruct first. This does not only improve the result, see for example range.ll, but is required as we allow outside AAs to provide simplification rules via callbacks. If we do ignore the simplification rules and base other simplifications on the IR instead we can create an inconsistent state.	2021-07-06 22:41:18 -05:00
Eli Friedman	7ac1c7bead	Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out how to repair it to test what it was actually trying to test. Recommitting with fix to MemoryDepChecker::isDependent. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 12:16:05 -07:00
Eli Friedman	a6d081b2cb	Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers." This reverts commit `74d6ce5d5f`. Seeing crashes on buildbots in MemoryDepChecker::isDependent.	2021-07-06 11:17:13 -07:00
Philip Reames	9ffa90d6c2	[LV] Disable epilogue vectorization for non-latch exits When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed. Better late (4 months), than never right? I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not modivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error. Thus, let's play it safe for now.	2021-07-06 10:57:10 -07:00
Philip Reames	600624a103	[LoopVersion] Move an assert [nfc-ish]	2021-07-06 10:57:10 -07:00
Eli Friedman	74d6ce5d5f	[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers. As part of making ScalarEvolution's handling of pointers consistent, we want to forbid multiplying a pointer by -1 (or any other value). This means we can't blindly subtract pointers. There are a few ways we could deal with this: 1. We could completely forbid subtracting pointers in getMinusSCEV() 2. We could forbid subracting pointers with different pointer bases (this patch). 3. We could try to ptrtoint pointer operands. The option in this patch is more friendly to non-integral pointers: code that works with normal pointers will also work with non-integral pointers. And it seems like there are very few places that actually benefit from the third option. As a minimal patch, the ScalarEvolution implementation of getMinusSCEV still ends up subtracting pointers if they have the same base. This should eliminate the shared pointer base, but eventually we'll need to rewrite it to avoid negating the pointer base. I plan to do this as a separate step to allow measuring the compile-time impact. This doesn't cause obvious functional changes in most cases; the one case that is significantly affected is ICmpZero handling in LSR (which is the source of almost all the test changes). The resulting changes seem okay to me, but suggestions welcome. As an alternative, I tried explicitly ptrtoint'ing the operands, but the result doesn't seem obviously better. I deleted the test lsr-undef-in-binop.ll becuase I couldn't figure out how to repair it to test what it was actually trying to test. Differential Revision: https://reviews.llvm.org/D104806	2021-07-06 10:54:41 -07:00
Arnold Schwaighofer	846a530e7d	Fix coro lowering of single predecessor phis Code assumes that uses of single predecessor phis are not live accross suspend points. Cleanup any single predecessor phis preceeding the code making this assumption. rdar://76020301 Differential Revision: https://reviews.llvm.org/D105488	2021-07-06 10:22:25 -07:00
Alexey Bataev	4e1a0684f1	[SLP]Fix non-determinism in PHI sorting. Compare type IDs and DFS numbering for basic block instead of addresses to fix non-determinism. Differential Revision: https://reviews.llvm.org/D105031	2021-07-06 08:45:45 -07:00
Arnold Schwaighofer	130ea3ceb4	Use swift mangling for resume functions The resume partial functions generated for swift suspend points will now use a Swift mangling suffix. Await resume partial functions will use the suffix 'TQ'[0-9]+'_' (e.g "...TQ0_") and suspend resume partial functions will use the suffix 'TY'[0-9]+'_' (e.g "...TY1_"). Reviewed By: nate_chandler Differential Revision: https://reviews.llvm.org/D104144	2021-07-06 08:27:46 -07:00
Florian Hahn	ef0d147cdc	Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups. This reverts commit `706bbfb35b`. The committed version moves the definition of VPReductionPHIRecipe out of an ifdef only intended for ::print helpers. This should resolve the build failures that caused the revert	2021-07-06 14:15:42 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Florian Hahn	706bbfb35b	Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups This reverts commit `3fed6d443f`, `bbcbf21ae6` and `6c3451cd76`. The changes causing build failures with certain configurations, e.g. https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction, llvm::ArrayRef<llvm::VPValue>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]': LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe' collect2: error: ld returned 1 exit status	2021-07-06 12:10:03 +01:00
Florian Hahn	3fed6d443f	[VPlan] Mark overriden function in VPWidenPHIRecipe as virtual. VPReductionRecipe overrides those implementations. Mark them as virtual in the VPWidenPHIRecipe to unbreak build in certain configurations.	2021-07-06 12:00:41 +01:00
Florian Hahn	bbcbf21ae6	[VPlan] Add destructor to VPReductionRecipe to unbreak build. Attempt to unbreak https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio	2021-07-06 11:41:20 +01:00
Florian Hahn	6c3451cd76	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Kerry McLaughlin	17b701c43c	[LV] Collect a list of all element types found in the loop (NFC) Splits `getSmallestAndWidestTypes` into two functions, one of which now collects a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not have to iterate over all instructions in the loop again in other places, such as in D102253 which disables scalable vectorization of a loop if any of the instructions use invalid types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105437	2021-07-06 10:37:41 +01:00
Akira Hatanaka	28fe9afdba	[ObjC][ARC] Prevent moving objc_retain calls past objc_release calls that release the retained object This patch fixes what looks like a longstanding bug in ARC optimizer where it reverses the order of objc_retain calls and objc_release calls that retain and release the same object. The code in ARC optimizer that is responsible for code motion takes the following steps: 1. Traverse the CFG bottom-up and determine how far up objc_release calls can be moved. Determine the insertion points for the objc_release calls, but don't actually move them. 2. Traverse the CFG top-down and determine how far down objc_retain calls can be moved. Determine the insertion points for the objc_retain calls, but don't actually move them. 3. Try to move the objc_retain and objc_release calls if they can't be removed. The problem is that the insertion points for the objc_retain calls are determined in step 2 without taking into consideration the insertion points for objc_release calls determined in step 1, so the order of an objc_retain call and an objc_release call can be reversed, which is incorrect, even though each step is correct in isolation. To fix this bug, this patch teaches the top-down traversal step to take into consideration the insertion points for objc_release calls determined in the bottom-up traversal step. Code motion for an objc_retain call is disabled if there is a possibility that it can be moved past an objc_release call that releases the retained object. rdar://79292791 Differential Revision: https://reviews.llvm.org/D104953	2021-07-05 12:16:15 -07:00
Sanjay Patel	40b752d28d	[InstCombine] fold icmp slt/sgt of offset value with constant This follows up patches for the unsigned siblings: `0c400e8953` `c7b658aeb5` We are translating an offset signed compare to its unsigned equivalent when one end of the range is at the limit (zero or unsigned max). (X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1) (X + C2) <s C --> X >u (C ^ SMAX) (if C == C2) This probably does not show up much in IR derived from C/C++ source because that would likely have 'nsw', and we have folds for that already. As with the previous unsigned transforms, the folds could be generalized to handle non-constant patterns: https://alive2.llvm.org/ce/z/Y8Xrrm ; sgt define i1 @src(i8 %a, i8 %c) { %c2 = add i8 %c, 1 %t = add i8 %a, %c2 %ov = icmp sgt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_off = sub i8 127, %c ; SMAX %ov = icmp ult i8 %a, %c_off ret i1 %ov } https://alive2.llvm.org/ce/z/c8uhnk ; slt define i1 @src(i8 %a, i8 %c) { %t = add i8 %a, %c %ov = icmp slt i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c) { %c_offnot = xor i8 %c, 127 ; SMAX %ov = icmp ugt i8 %a, %c_offnot ret i1 %ov }	2021-07-05 10:08:31 -04:00
Caroline Concatto	b868a2d2c6	[SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector. The function vectorizeChainsInBlock does not support scalable vector, because function like canReuseExtract and isCommutative in the code path assert with scalable vectors. This patch avoids vectorizing blocks that have extract instructions with scalable vector.. Differential Revision: https://reviews.llvm.org/D104809	2021-07-05 12:43:41 +01:00
Stephen Tozer	14b62f7e2f	[DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops This patch fixes an issue which occurred in CodeGenPrepare and HWAddressSanitizer, which both at some point create a map of Old->New instructions and update dbg.value uses of these. They did this by iterating over the dbg.value's location operands, and if an instance of the old instruction was found, replaceVariableLocationOp would be called on that dbg.value. This would cause an error if the same operand appeared multiple times as a location operand, as the first call to replaceVariableLocationOp would update all uses of the old instruction, invalidating the old iterator and eventually hitting an assertion. This has been fixed by no longer iterating over the dbg.value's location operands directly, but by first collecting them into a set and then iterating over that, ensuring that we never attempt to replace a duplicated operand multiple times. Differential Revision: https://reviews.llvm.org/D105129	2021-07-05 10:35:19 +01:00
Nikita Popov	a213f735d8	[IR] Deprecate GetElementPtrInst::CreateInBounds without element type This API is not compatible with opaque pointers, the method accepting an explicit pointer element type should be used instead. Thankfully there were few in-tree users. The BPF case still ends up using the pointer element type for now and needs something like D105407 to avoid doing so.	2021-07-04 16:49:30 +02:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Roman Lebedev	fc150cecd7	[SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable This replaces the current ad-hoc implementation, by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`, with one exception that here in SimplifyCFG we are allowed to remove EH instructions. Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return), arithmetic/logic operations, etc. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105374	2021-07-03 10:45:44 +03:00
Fangrui Song	252a1eecc0	[ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for isDeclarationForLinker `GlobalValue`s. It missed a case for imported declarations (`doImportAsDefinition` is false while `isPerformingImport` is true). This can lead to a linker error for a default visibility symbol in `ld.lld -shared`. When `ClearDSOLocalOnDeclarations` is true, we check `isPerformingImport() && !doImportAsDefinition(&GV)` along with `GV.isDeclarationForLinker()`. The new condition checks an imported declaration. This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104986	2021-07-02 17:08:25 -07:00
Roman Lebedev	53fef0b293	[NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs Mimics similar change for InstCombine: `ce192ced2b` / D104602 All these uses are in blocks that aren't reachable from function's entry, and said blocks are removed by SimplifyCFG itself, so we can't really test this change.	2021-07-02 22:11:52 +03:00
Heejin Ahn	51fecd17bb	[InstCombine] Don't combine PHI before catchswitch This tries to bail out if the PHI is in a `catchswitch` BB in InstCombine. A PHI cannot be combined into a non-PHI instruction if it is in a `catchswitch` BB, because `catchswitch` BB cannot have any non-PHI instruction other than `catchswitch` itself. The given test case started crashing after D98058. Reviewed By: lebedev.ri, rnk Differential Revision: https://reviews.llvm.org/D105309	2021-07-02 12:10:24 -07:00
Roman Lebedev	da81ec6158	[SimplifyCFG] Volatile memory operations do not trap Somewhat related to D105338. While it is up for discussion whether or not volatile store traps, so far there has been no complaints that volatile load/cmpxchg/atomicrmw also may trap. And even if simplifycfg currently concervatively believes that to be the case, instcombine does not: https://godbolt.org/z/5vhv4K5b8 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D105343	2021-07-02 21:47:44 +03:00
Alexey Bataev	7f7e4aed21	[SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by V.Dmitriev. Reduces number of arguments	2021-07-02 10:30:13 -07:00
Jon Roelofs	37b6e03c18	[Intrinsics] Make MemCpyInlineInst a MemCpyInst This opens up more optimization opportunities in passes that already handle MemCpyInst's. Differential revision: https://reviews.llvm.org/D105247	2021-07-02 10:25:24 -07:00
Roman Lebedev	13e35ac124	[NFC][InstCombine] visitUnreachableInst(): enhance comments somewhat	2021-07-02 17:30:01 +03:00
Roman Lebedev	dadedc99e9	[InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it lead to infinite combine loops, but i'm not seeing anything like that now, neither in the `check-llvm`, nor on some codebases i tried. This is a recommit of `d9d65527c2`, which i immediately reverted because i have messed up something during branch switch, and `597ccc92ce` accidentally ended up being pushed, which was very much not the intention. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:20:21 +03:00
Roman Lebedev	24d271bb18	Revert "https://godbolt.org/z/5vhv4K5b8 " This reverts commit `597ccc92ce`.	2021-07-02 17:17:55 +03:00
Roman Lebedev	93a1642763	Revert "[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable" This reverts commit `d9d65527c2`.	2021-07-02 17:17:47 +03:00
Roman Lebedev	d9d65527c2	[NFCI][InstCombine] visitUnreachableInst(): iteratively erase instructions leading to unreachable In the original review D87149 it was mentioned that this approach was tried, and it lead to infinite combine loops, but i'm not seeing anything like that now, neither in the `check-llvm`, nor on some codebases i tried. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D105339	2021-07-02 17:17:03 +03:00
Roman Lebedev	597ccc92ce	https://godbolt.org/z/5vhv4K5b8	2021-07-02 17:16:19 +03:00
Nico Weber	a92964779c	Revert "[InstrProfiling] Use external weak reference for bias variable" This reverts commit `33a7b4d9d8`. Breaks check-profile on macOS, see comments on https://reviews.llvm.org/D105176	2021-07-02 09:05:12 -04:00
Florian Hahn	a3ca578eb9	[Matrix] Fix crash during fusion if the same load is re-used. This patch fixes a crash when the same load is used for both operands of a fuseable multiply.	2021-07-02 14:00:17 +01:00
Alexey Bataev	28ac873bcb	[SLP]Fix gathering of the scalars by not ignoring UndefValues. The compiler should not ignore UndefValue when gathering the scalars, otherwise the resulting code may be less defined than the original one. Also, grouped scalars to insert them at first to reduce the analysis in further passes. Differential Revision: https://reviews.llvm.org/D105275	2021-07-02 04:46:48 -07:00
Florian Hahn	7655061cc6	[Matrix] Hoist address computation before multiply to enable fusion. If the store address does not dominate the matrix multiply, try to hoist address computation instructions without side-effects and/or memory reads before the multiply, to allow fusion. Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D105193	2021-07-02 09:52:11 +01:00
Evgeniy Brevnov	9568811cb8	[NFC][DSE]Change 'do-while' to 'for' loop to simplify code structure With 'for' loop there is is a single place where 'Current' is adjusted. It helps to avoid copy paste and makes a bit easy to understand overall loop controll flow. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D101044	2021-07-02 10:00:47 +07:00
Craig Topper	066524ea54	[ScalarizeMaskedMemIntrin][SelectionDAGBuilder] Use the element type to calculate alignment for gather/scatter when alignment operand is 0. Previously we used the vector type, but we're loading/storing invididual elements so I think only element alignment should matter. Noticed while looking at the code for something else so I don't have a test case. Differential Revision: https://reviews.llvm.org/D105220	2021-07-01 19:08:47 -07:00
Petr Hosek	33a7b4d9d8	[InstrProfiling] Use external weak reference for bias variable We need the compiler generated variable to override the weak symbol of the same name inside the profile runtime, but using LinkOnceODRLinkage results in weak symbol being emitted which leads to an issue where the linker might choose either of the weak symbols potentially disabling the runtime counter relocation. This change replaces the use of weak definition inside the runtime with an external weak reference to address the issue. We also place the compiler generated symbol inside a COMDAT group so dead definition can be garbage collected by the linker. Differential Revision: https://reviews.llvm.org/D105176	2021-07-01 15:25:31 -07:00
Philip Reames	955f125899	[instcombine] Fold overflow check using overflow intrinsic to comparison This follows up to D104665 (which added umulo handling alongside the existing uaddo case), and generalizes for the remaining overflow intrinsics. I went to add analogous handling to LVI, and discovered that LVI already had a more general implementation. Instead, we can port was LVI does to instcombine. (For context, LVI uses makeExactNoWrapRegion to constrain the value 'x' in blocks reached after a branch on the condition `op.with.overflow(x, C).overflow`.) Differential Revision: https://reviews.llvm.org/D104932	2021-07-01 09:41:55 -07:00
Arnold Schwaighofer	4a361f5209	[coro async] Add support for specifying which parameter is swiftself in async resume functions Differential Revision: https://reviews.llvm.org/D104147	2021-07-01 07:33:15 -07:00
David Sherwood	51b4ab26ca	[NFC] Add new setDebugLocFromInst that uses the class Builder by default In lots of places we were calling setDebugLocFromInst and passing in the same Builder member variable found in InnerLoopVectorizer. I personally found this confusing so I've changed the interface to take an Optional<IRBuilder<> *> and we can now pass in None when we want to use the class member variable. Differential Revision: https://reviews.llvm.org/D105100	2021-07-01 14:23:34 +01:00
Roman Lebedev	333d3a3cdf	[NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO' I.e. it will be `false` for thin lto.	2021-07-01 10:09:24 +03:00
Chuanqi Xu	51fbd18706	[Coroutine] Recommit Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-07-01 11:01:28 +08:00
Sanjay Patel	0c400e8953	[InstCombine] fold icmp ult of offset value with constant This is one sibling of the fold added with `c7b658aeb5` . (X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN) I'm still not sure how to describe it best, but we're translating 2 constants from an unsigned range comparison to signed because that eliminates the offset (add) op. This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/K-fMBf define i1 @src(i8 %a, i8 %c2) { %t = add i8 %a, %c2 %c = add i8 %c2, 128 ; SMIN %ov = icmp ult i8 %t, %c ret i1 %ov } define i1 @tgt(i8 %a, i8 %c2) { %not_c2 = xor i8 %c2, -1 %ov = icmp sgt i8 %a, %not_c2 ret i1 %ov }	2021-06-30 19:00:12 -04:00
Xun Li	822b92aae4	[Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened Relevant discussion can be found at: https://lists.llvm.org/pipermail/llvm-dev/2021-January/148197.html In the existing design, An SCC that contains a coroutine will go through the folloing passes: Inliner -> CoroSplitPass (fake) -> FunctionSimplificationPipeline -> Inliner -> CoroSplitPass (real) -> FunctionSimplificationPipeline The first CoroSplitPass doesn't do anything other than putting the SCC back to the queue so that the entire pipeline can repeat. As you can see, we run Inliner twice on the SCC consecutively without doing any real split, which is unnecessary and likely unintended. What we really wanted is this: Inliner -> FunctionSimplificationPipeline -> CoroSplitPass -> FunctionSimplificationPipeline (note that we don't really need to run Inliner again on the ramp function after split). Hence the way we do it here is to move CoroSplitPass to the end of the CGSCC pipeline, make it once for real, insert the newly generated SCCs (the clones) back to the pipeline so that they can be optimized, and also add a function simplification pipeline after CoroSplit to optimize the post-split ramp function. This approach also conforms to how the new pass manager works instead of relying on an adhoc post split cleanup, making it ready for full switch to new pass manager eventually. By looking at some of the changes to the tests, we can already observe that this changes allows for more optimizations applied to coroutines. Reviewed By: aeubanks, ChuanqiXu Differential Revision: https://reviews.llvm.org/D95807	2021-06-30 11:38:14 -07:00
Sanjay Patel	c7b658aeb5	[InstCombine] fold icmp of offset value with constant There must be a better way to describe this pattern in words? (X + C2) >u C --> X <s -C2 (if C == C2 + SMAX) This could be extended to handle the more general (non-constant) pattern too: https://alive2.llvm.org/ce/z/rdfNFP define i1 @src(i8 %a, i8 %c1) { %t = add i8 %a, %c1 %c2 = add i8 %c1, 127 ; SMAX %ov = icmp ugt i8 %t, %c2 ret i1 %ov } define i1 @tgt(i8 %a, i8 %c1) { %neg_c1 = sub i8 0, %c1 %ov = icmp slt i8 %a, %neg_c1 ret i1 %ov } The pattern was noticed as a by-product of D104932.	2021-06-30 13:37:31 -04:00
Philip Reames	c4fc2cb5b2	[instcombine] umin(x, 1) == zext(x != 0) We already implemented this for the select form, but the intrinsic form was missing. Note that this doesn't change poison behavior as 1 is non-poison, and the optimized form is still poison exactly when x is.	2021-06-30 10:20:01 -07:00
Joseph Huber	ecabc6684f	[OpenMP] Change analysis remarks to not emit on cold functions The remarks will trigger on some functions that are marked cold, such as the `__muldc3` intrinsic functions. Change the remarks to avoid these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105196	2021-06-30 11:54:24 -04:00
Nico Weber	db86e5c914	Revert "[Coroutine] Add statistics for the number of elided coroutine" This reverts commit `1d9539cf49`. Test fails in LLVM_ENABLE_ASSERTIONS=OFF builds (such as regular release builds).	2021-06-30 10:22:45 -04:00
Joseph Huber	0edb87773b	[OpenMP] Add additional remarks for OpenMPOpt This patch adds additional remarks, suggesting the use of `noescape` for failed globalization and indicating when internalization failed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105150	2021-06-30 09:49:25 -04:00
David Sherwood	7b7b5b5a26	[NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-30 11:11:49 +01:00
Chuanqi Xu	801c2b9bba	[FuncSpec] Add an option to specializing literal constant Now the option is off by default. Since we are not sure if this option would make the compile time increase aggressively. Although we tested it on SPEC2017, we may need to test more to make it on by default. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104365	2021-06-30 11:26:44 +08:00
Chuanqi Xu	1d9539cf49	[Coroutine] Add statistics for the number of elided coroutine Now we lack a benchmark to measure the performance change for each commit. Since coro elide is the main optimization in coroutine module, I wonder it may be an estimation to count the number of elided coroutine in private code bases. e.g., for a certain commit, if we found that the number of elided goes down, we could find it before the commit check-in. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D105095	2021-06-30 11:20:53 +08:00
Jianzhou Zhao	ae6648cee0	[dfsan] Expose dfsan_get_track_origins to get origin tracking status This allows application code checks if origin tracking is on before printing out traces. -dfsan-track-origins can be 0,1,2. The current code only distinguishes 1 and 2 in compile time, but not at runtime. Made runtime distinguish 1 and 2 too. Reviewed By: browneee Differential Revision: https://reviews.llvm.org/D105128	2021-06-29 20:32:39 +00:00
Nikita Popov	c4de78e91c	[SanitizerCoverage] Fix global type check with opaque pointers The code was previously relying on the fact that an incorrectly typed global would result in the insertion of a BitCast constant expression. With opaque pointers, this is no longer the case, so we should check the type explicitly.	2021-06-29 20:32:14 +02:00
Alexey Bataev	129ae515fb	[INSTCOMBINE] Transform reduction(shuffle V, poison, unique_mask) to reduction(V). After SLP + LTO we may have have reduction(shuffle V, poison, mask). This can be simplified to just reduction(V) if the mask is only for single vector and just all elements from this vector are permuted, without reusing, replacing with undefs and/or other values, etc. Differential Revision: https://reviews.llvm.org/D105053	2021-06-29 10:02:38 -07:00
Philip Reames	e49d65f36d	[LV] Fix bug when unrolling (only) a loop with non-latch exit If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires a epilogue be generated for correctness, the code generation must actually do so. The included test case on an unmodified opt will access memory one past the expected bound. As a result, this patch is fixing a latent miscompile. Differential Revision: https://reviews.llvm.org/D103700	2021-06-29 08:04:26 -07:00
Johannes Doerfert	7af91a2b8f	[Attributor][NFCI] Make the state of AAValueSimplify explicit As we have done with other states we want the AAValueSimplify state to be explicit to use it more easily in our helpers.	2021-06-29 09:38:22 -05:00
Johannes Doerfert	dcbe58d94c	[Attributor][NFCI] Remove unneeded namespace	2021-06-29 09:38:20 -05:00
Johannes Doerfert	457bd5c8d5	[Attributor] Teach AAPotentialValues about constant select conditions There was a TODO but now we actually check if the select condition is assumed constant and only look at the relevant operand.	2021-06-29 09:38:18 -05:00
Johannes Doerfert	8dc9bb6d85	[Attributor][NFC] Clang format	2021-06-29 09:38:15 -05:00
Johannes Doerfert	a33e128012	[InstCombine] Gracefully handle an alloca outside the alloca-AS While we might eventually want to disallow allocas that do not have the alloca-AS set, it seems undesirable to crash on them. Add a cast when required so that we can support such allocas (at least here). Differential Revision: https://reviews.llvm.org/D104866	2021-06-29 09:38:13 -05:00
David Sherwood	9de63367d8	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `9dde514162`.	2021-06-29 15:20:22 +01:00
David Sherwood	9dde514162	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 14:34:30 +01:00
David Sherwood	8a3365fba2	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `dcfc2c3fac`.	2021-06-29 14:04:42 +01:00
Florian Hahn	47215e1c62	[LV] Fix crash when target instruction for sinking is dead. This patch fixes a crash when the target instruction for sinking is dead. In that case, no recipe is created and trying to get the recipe for it results in a crash. To ensure all sink targets are alive, find & use the first previous alive instruction. Note that the case where the sink source is dead is already handled. Found by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104603	2021-06-29 13:31:22 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possibly to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
Roman Lebedev	6cf6f6f65f	[NFC][InstCombine] foldAggregateConstructionIntoAggregateReuse(): cast to Instruction eagerly In all of these, the value must be an instruction for us to succeed anyway, so change it to maybe hopefully make further changes more straight-forward.	2021-06-29 13:29:18 +03:00
David Sherwood	dcfc2c3fac	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 09:14:35 +01:00
Sanjay Patel	9d0bf7699c	[InstCombine] don't try to fold a constant expression that can trap (PR50906) We could use a bigger hammer and bail out on any constant expression, but there's a regression test that appears to validly do the transform (although it may not have been intending to check that optimization).	2021-06-28 17:00:21 -04:00
Joseph Huber	57ad2e1067	[OpenMP] Prevent OpenMPOpt from internalizing uncalled functions Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnnecessary. Depends on D102423 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104890	2021-06-28 16:47:53 -04:00
Nikita Popov	7ac0442fe5	[SanitizerCoverage] Support opaque pointers Pass element type rather than pointer type to some functions, so we know which type to use for the global variables.	2021-06-28 22:18:42 +02:00
Arnold Schwaighofer	3dee1e8a84	[coro] Fix rematerializable instruction sinking to coro.suspend blocks There is a constraint that coro.suspend instructions need to be in their own blocks. The coro split pass initially creates IR that obeys this constraint (which is later checked). Sinking rematerializable instructions into these blocks breaks that constraint. Instead rematerialize in the predecessor block to the suspend's single predecessor block. Differential Revision: https://reviews.llvm.org/D104051	2021-06-28 09:37:45 -07:00
Nico Weber	540b4a5fb3	Revert "[DebugInfo] Enable variadic debug value salvaging" This reverts commit `adace79652`. Still breaks things, see comment on https://reviews.llvm.org/D91722	2021-06-28 11:25:09 -04:00
Reshabh Sharma	ae983de6cc	[InferAddressSpaces] NFC: For noop IntToPtr/PtrToInt pair cast to operator instead of PtrToInt Compiler crashes at an assertion while casting operands to PtrToIntInst at some cases when ptrtoint is present as an explicit operand to inttoptr. Explicit instruction operator as operand can not be casted to an Instruction. This patch replaces cast from PtrToInst to Operator which are later checked for constant expressions. Differential Revision: https://reviews.llvm.org/D105002	2021-06-28 19:24:26 +05:30
Joseph Huber	13b2fba239	[OpenMP][NFC] Fix typo in OpenMPOpt	2021-06-28 09:49:14 -04:00
Joseph Huber	4024087731	[OpenMP][NFC] Fix missing argument	2021-06-28 09:15:01 -04:00
Joseph Huber	4a6bd8e3e7	[OpenMP] Increase attributor iterations on the GPU Increase the number of attributor iterations on a GPU target. I forgot to change this in D104416. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104920	2021-06-28 08:50:49 -04:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Max Kazantsev	d58514d41c	[LSR][NFC] Make sure that after the canonicalization the formula is canonical	2021-06-28 12:50:04 +07:00
Max Kazantsev	7c73c2ede8	[LoopDeletion] Benefit from branches by undef conditions when symbolically executing 1st iteration We can exploit branches by `undef` condition. Frankly, the LangRef says that such branches are UB, so we can assume that all outgoing edges of such blocks are dead. However, from practical perspective, we know that this is not supported correctly in some other places. So we are being conservative about it. Branch by undef is treated in the following way: - If it is a loop-exiting branch, we always assume it exits the loop; - If not, we arbitrarily assume it takes `true` value. Differential Revision: https://reviews.llvm.org/D104689 Reviewed By: nikic	2021-06-28 11:39:46 +07:00
Nikita Popov	e81702912e	[DSE] Preserve address space Preserve address space when inserting i8* cast.	2021-06-27 20:26:00 +02:00
Nikita Popov	9aa951e80e	[MemCpyOpt] Preserve address space Preserve address space when generating the cast to i8*.	2021-06-27 20:21:19 +02:00
Nikita Popov	f00941e061	[DSE] Support opaque pointers For the start shortening optimization, always use a i8 type for the GEP, as it is a raw offset calculation. Handling of non-i8* memset/memcpy arguments requires insertion of casts. These cases were previously miscompiled, as the offset calculation was performed on the wrong type.	2021-06-27 17:41:40 +02:00
Nikita Popov	f025053977	[MemCpyOpt] Handle unusual memcpy element type Apparently, it is legal to use memcpy/memset with pointer types other than i8. Prior to `81fcdae68c` this case was silently miscompiled, as the i8 offset calculation was performed on some other type. Now it would crash due to a type mismatch. Fix this by inserting an explicit bitcast to i8.	2021-06-27 16:21:44 +02:00
Sanjay Patel	153da08a6c	[InstCombine] hoist min/max intrinsics above select with constant op This is an extension of the handling for unary intrinsics and follows the logic that we use for binary ops. We don't canonicalize to min/max intrinsics yet, but this might help unlock other folds seen in D98152.	2021-06-27 10:02:23 -04:00
Nikita Popov	81fcdae68c	[MemCpyOpt] Support opaque pointers	2021-06-27 15:52:38 +02:00
Nikita Popov	a9129f8964	[LoadStoreVectorizer] Support opaque pointers There are remaining redundant bitcasts.	2021-06-27 15:42:16 +02:00
Florian Hahn	f1a6430272	[VPlan] Track both incoming values for first-order recurrence phis. This patch updates VPWidenPHI recipes for first-order recurrences to also track the incoming value from the back-edge. Similar to D99294, which did the same for reductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104197	2021-06-27 14:29:35 +01:00
Florian Hahn	7f36981977	[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC). Suggested in D104197, avoids the early exit.	2021-06-26 13:13:25 +01:00
Andrew Browne	45f6d5522f	[DFSan] Change shadow and origin memory layouts to match MSan. Previously on x86_64: +--------------------+ 0x800000000000 (top of memory) \| application memory \| +--------------------+ 0x700000008000 (kAppAddr) \| \| \| unused \| \| \| +--------------------+ 0x300000000000 (kUnusedAddr) \| origin \| +--------------------+ 0x200000008000 (kOriginAddr) \| unused \| +--------------------+ 0x200000000000 \| shadow memory \| +--------------------+ 0x100000008000 (kShadowAddr) \| unused \| +--------------------+ 0x000000010000 \| reserved by kernel \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem & ~0x600000000000 SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow Now for x86_64: +--------------------+ 0x800000000000 (top of memory) \| application 3 \| +--------------------+ 0x700000000000 \| invalid \| +--------------------+ 0x610000000000 \| origin 1 \| +--------------------+ 0x600000000000 \| application 2 \| +--------------------+ 0x510000000000 \| shadow 1 \| +--------------------+ 0x500000000000 \| invalid \| +--------------------+ 0x400000000000 \| origin 3 \| +--------------------+ 0x300000000000 \| shadow 3 \| +--------------------+ 0x200000000000 \| origin 2 \| +--------------------+ 0x110000000000 \| invalid \| +--------------------+ 0x100000000000 \| shadow 2 \| +--------------------+ 0x010000000000 \| application 1 \| +--------------------+ 0x000000000000 MEM_TO_SHADOW(mem) = mem ^ 0x500000000000 SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000 Reviewed By: stephan.yichao.zhao, gbalats Differential Revision: https://reviews.llvm.org/D104896	2021-06-25 17:00:38 -07:00
Nikita Popov	fdd4c199a1	Revert "[InstCombine] Make indexed compare fold opaque ptr compatible" This reverts commit `5cb20ef8a2`. Assertion failures with this patch were reported on https://reviews.llvm.org/rG5cb20ef8a235, revert for now.	2021-06-26 00:32:59 +02:00
Eli Friedman	8d5bf0709d	[NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion The implementation is identical, but it makes the semantics a bit more obvious.	2021-06-25 14:43:13 -07:00
Juneyoung Lee	1605593440	[SimplifyLibCalls] Fix memchr opt to use CreateLogicalAnd This fixes a bug at LibCallSimplifier::optimizeMemChr which does the following transformation: ``` // memchr("\r\n", C, 2) != nullptr -> (1 << C & ((1 << '\r') \| (1 << '\n'))) // != 0 // after bounds check. ``` As written above, a bounds check on C (whether it is less than integer bitwidth) is done before doing `1 << C` otherwise 1 << C will overflow. If the bounds check is false, the result of (1 << C & ...) must not be used at all, otherwise the result of shift (which is poison) will contaminate the whole results. A correct way to encode this is `select i1 (bounds check), (1 << C & ...), false` because select does not allow the unused operand to contaminate the result. However, this optimization was introducing `and (bounds check), (1 << C & ...)` which cannot do that. The bug was found from compilation of this C++ code: https://reviews.llvm.org/rG2fd3037ac615#1007197 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104901	2021-06-26 05:59:35 +09:00
Joseph Huber	5ccb7424fa	[OpenMP] Change OpenMPOpt to check openmp metadata The metadata added in D102361 introduces a module flag that we can check to determine if the module was compiled with `-fopenmp` enables. We can now check for the precense of this instead of scanning the call graph for OpenMP runtime functions. Depends on D102361 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102423	2021-06-25 16:34:22 -04:00
Hongtao Yu	3638085ff0	[Coroutines] Define __coro_frame_ty in function scope Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside in its parent type scope. We were seeing a type defined in a local scope causing trouble to the dwarf emitter where a context is required to be a funciton scope, a namespace or a global scope. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D104937	2021-06-25 12:33:20 -07:00
Florian Hahn	cc5ee857f9	[LV] Doxygenize VectorizationFactor member comments (NFC). Minor cleanup for follow-up patch.	2021-06-25 18:35:00 +01:00
Philip Reames	2cd23eb243	[instcombine] Fold overflow check using umulo to comparison If we have a umul.with.overflow where the multiply result is not used and one of the operands is a constant, we can perform the overflow check cheaper with a comparison then by performing the multiply and extracting the overflow flag. (Noticed when looking at the conditions SCEV emits for overflow checks.) Differential Revision: https://reviews.llvm.org/D104665	2021-06-25 10:25:45 -07:00
Florian Hahn	91053e327c	[LV] Reflow comment for VectorizationCostTy (NFC).	2021-06-25 14:20:06 +01:00
Arthur Eubanks	1aa02b37e7	Revert "[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions." This reverts commit `1eda5453f2`. Causes https://crbug.com/1223647: Incompatible argument and return types for 'returned' attribute tail call void @llvm.memset.p0i8.i64(i8* noalias noundef returned writeonly align 1 dereferenceable(255) %arraydecay, i8 0, i64 255, i1 false), !dbg !985	2021-06-24 19:24:34 -07:00
Nikita Popov	5cb20ef8a2	[InstCombine] Make indexed compare fold opaque ptr compatible Rather than relying on pointer type equality (which, for a change, is silently incorrect with opaque pointers) check that the GEP source element types match.	2021-06-24 22:33:01 +02:00
Arthur Eubanks	7110510eca	[WPD] Don't optimize calls more than once WPD currently assumes that there is a one to one correspondence between type test assume sequences and virtual calls. However, with -fstrict-vtable-pointers this may not be true. This ends up causing crashes when we try to optimize a virtual call more than once ( applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()). applySingleImplDevirt() actually didn't previous crash because it would replace the devirtualized call with the same direct call. Adding an assert that the call is indirect causes the corresponding test to crash with the rest of the patch. This makes Chrome successfully build with -fstrict-vtable-pointers + WPD. Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D104798	2021-06-24 13:28:09 -07:00
Nikita Popov	8e0ff44bf8	[InstCombine] Make varargs cast transform compatible with opaque ptrs The whole transform can be dropped once we have fully transitioned to opaque pointers (as it's purpose is to remove no-op pointer casts). For now, make sure that it handles opaque pointers correctly.	2021-06-24 21:57:05 +02:00
Jonas Paulsson	1eda5453f2	[BuildLibCalls/SimplifyLibCalls] Fix attributes on created CallInst instructions. - When emitting libcalls, do not only pass the calling convention from the function prototype but also the attributes. - Do not pass attributes from e.g. libc memcpy to llvm.memcpy. Review: Reid Kleckner, Eli Friedman, Arthur Eubanks Differential Revision: https://reviews.llvm.org/D103992	2021-06-24 14:47:24 -05:00
Roman Lebedev	d064182612	[SimplifyCFG] Tail-merging all blocks with `resume` terminator Similar to what we already do for `ret` terminators. As noted by @rnk, clang seems to already generate a single `ret`/`resume`, so this isn't likely to cause widespread changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104849	2021-06-24 21:25:06 +03:00
Florian Hahn	833bdbe93c	[LV] Support sinking recipe in replicate region after another region. This patch handles sinking a replicate region after another replicate region. In that case, we can connect the sink region after the target region. This properly handles the case for which an assertion has been added in `337d765282`. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D103514	2021-06-24 13:58:42 +01:00
Stephen Tozer	adace79652	[DebugInfo] Enable variadic debug value salvaging This patch enables the salvaging of debug values that may be calculated from more than one SSA value, such as with binary operators that do not use a constant argument. The actual functionality for this behaviour is added in a previous commit (`c7270567`), but with the ability to actually emit the resulting debug values switched off. The reason for this is that the prior patch has been reverted several times due to issues discovered downstream, some time after the actual landing of the patch. The patch in question is rather large and touches several widely used header files, and all issues discovered are more related to the handling of variadic debug values as a whole rather than the details of the patch itself. Therefore, to minimize the build time impact and risk of conflicts involved in any potential future revert/reapply of that patch, this significantly smaller patch (that touches no header files) will instead be used as the capstone to enable variadic debug value salvaging. The review linked to this patch is mostly implemented by the previous commit, `c7270567`, but also contains the changes in this patch. Differential Revision: https://reviews.llvm.org/D91722	2021-06-24 13:16:29 +01:00
Roman Lebedev	9c4c2f2472	[SimplifyCFG] Tail-merging all blocks with `ret` terminator Based ontop of D104598, which is a NFCI-ish refactoring. Here, a restriction, that only empty blocks can be merged, is lifted. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104597	2021-06-24 13:15:39 +03:00
Stephen Tozer	c72705678c	Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands" This is a partial reapply of the original commit and the followup commit that were previously reverted; this reapply also includes a small fix for a potential source of non-determinism, but also has a small change to turn off variadic debug value salvaging, to ensure that any future revert/reapply steps to disable and renable this feature do not risk causing conflicts. Differential Revision: https://reviews.llvm.org/D91722 This reverts commit `386b66b2fc`.	2021-06-24 09:46:38 +01:00
Zequan Wu	9393894331	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This casues compiler crash: Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' This reverts commit `e3d24b45b8`.	2021-06-23 19:24:56 -07:00
Evgenii Stepanov	78f7e6d8d7	[hwasan] Respect llvm.asan.globals. This enable no_sanitize C++ attribute to exclude globals from hwasan testing, and automatically excludes other sanitizers' globals (such as ubsan location descriptors). Differential Revision: https://reviews.llvm.org/D104825	2021-06-23 18:37:00 -07:00
Nikita Popov	8321335fd8	[InstCombine] Use getFunctionType() Avoid fetching pointer element type...	2021-06-23 20:28:34 +02:00
Sami Tolvanen	e3d24b45b8	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. This relands commit `4474958d3a` with a fix to a use-of-uninitialized-value error that tripped MemorySanitizer. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-23 10:56:13 -07:00
Kuter Dinel	5d44d56f7d	[Attributor] Derive AAFunctionReachability attribute. This attribute uses Attributor's internal 'optimistic' call graph information to answer queries about function call reachability. Functions can become reachable over time as new call edges are discovered. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104599	2021-06-23 20:43:10 +03:00
Nikita Popov	00d3f7cc3c	[LAA] Make getPointersDiff() API compatible with opaque pointers Make getPointersDiff() and sortPtrAccesses() compatible with opaque pointers by explicitly passing in the element type instead of determining it from the pointer element type. The SLPVectorizer result is slightly non-optimal in that unnecessary pointer bitcasts are added. Differential Revision: https://reviews.llvm.org/D104784	2021-06-23 18:44:34 +02:00
Datta Nagraj	ad0085d338	[InstCombine] Eliminate casts to optimize ctlz operation If a ctlz operation is performed on higher datatype and then downcasted, then this can be optimized by doing a ctlz operation on a lower datatype and adding the difference bitsize to the result of ctlz to provide the same output: https://alive2.llvm.org/ce/z/8uup9M The original problem is shown in https://llvm.org/PR50173 Differential Revision: https://reviews.llvm.org/D103788	2021-06-23 11:19:12 -04:00
Sanjay Patel	1e9b6b89a7	[InstCombine] convert FP min/max with negated op to fabs This is part of improving floating-point patterns seen in: https://llvm.org/PR39480 We don't require any FMF because the 2 potential corner cases (-0.0 and NaN) are correctly handled without FMF: 1. -0.0 is treated as strictly less than +0.0 with maximum/minimum, so fabs/fneg work as expected. 2. +/- 0.0 with maxnum/minnum is indeterminate, so transforming to fabs/fneg is more defined. 3. The sign of a NaN may be altered by this transform, but that is allowed in the default FP environment. If there are FMF, they are propagated from the min/max call to one or both new operands which seems to agree with Alive2: https://alive2.llvm.org/ce/z/bem_xC	2021-06-23 10:41:39 -04:00
Roman Lebedev	ff4b1d379f	[NFCI-ish][SimplifyCFGPass] Rework and generalize `ret` block tail-merging This changes the approach taken to tail-merge the blocks to always create a new block instead of trying to reuse some block, and generalizes it to support dealing not with just the `ret` in the future. This effectively lifts the CallBr restriction, although this isn't really intentional. That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it. Other restrictions of the transform remain. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D104598	2021-06-23 14:33:18 +03:00
Joe Ellis	3c4dbf6ea9	[Verifier] Fail on overrunning and invalid indices for {insert,extract} vector intrinsics With regards to overrunning, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1) must be valid ``vec`` indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. (llvm.experimental.vector.extract) Elements ``idx`` through (``idx`` + num_elements(result_type) - 1) must be valid vector indices. If this condition cannot be determined statically but is false at runtime, then the result vector is undefined. For the non-mixed cases (e.g. inserting/extracting a scalable into/from another scalable, or inserting/extracting a fixed into/from another fixed), it is possible to statically check whether or not the above conditions are met. This was previously missing from the verifier, and if the conditions were found to be false, the result of the insertion/extraction would be replaced with an undef. With regards to invalid indices, the langref (llvm/docs/LangRef.rst) specifies: (llvm.experimental.vector.insert) ``idx`` represents the starting element number at which ``subvec`` will be inserted. ``idx`` must be a constant multiple of ``subvec``'s known minimum vector length. (llvm.experimental.vector.extract) The ``idx`` specifies the starting element number within ``vec`` from which a subvector is extracted. ``idx`` must be a constant multiple of the known-minimum vector length of the result type. Similarly, these conditions were not previously enforced in the verifier. In some circumstances, invalid indices were permitted silently, and in other circumstances, an undef was spawned where a verifier error would have been preferred. This commit adds verifier checks to enforce the constraints above. Differential Revision: https://reviews.llvm.org/D104468	2021-06-23 10:33:22 +00:00
Max Kazantsev	842b4c83cb	[LoopDeletion] Exploit undef Phi inputs when symbolically executing 1st iteration Follow-up on Roman's idea expressed in D103959. - If a Phi has undefined inputs from live blocks: - and no other inputs, assume it is undef itself; - and exactly one non-undef input, we can assume that all undefs are equal to this input. Differential Revision: https://reviews.llvm.org/D104618 Reviewed By: lebedev.ri, nikic	2021-06-23 11:53:48 +07:00
Max Kazantsev	b7d2c173eb	[LSR] Filter out zero factors. PR50765 Zero factor leads to division by zero and failure of corresponding assert as shown in PR50765. We should filter out such factors. Differential Revision: https://reviews.llvm.org/D104702 Reviewed By: huihuiz, reames	2021-06-23 10:43:06 +07:00
Jon Roelofs	493d6928fe	[Remarks] Make memsize remarks report as an analysis, not a missed opportunity. Differential revision: https://reviews.llvm.org/D104078	2021-06-22 18:22:47 -07:00
Liqiang Tao	a0d96fdd3a	[llvm][Inliner] Make PriorityInlineOrder lazily updated This patch makes PriorityInlineOrder lazily updated. The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D104654	2021-06-23 08:59:53 +08:00
Joseph Huber	1cfdcae653	[Attributor] Fix AAExecutionDomain returning true on invalid states This patch fixes a problem with the AAExecutionDomain attributor not checking if it is in a valid state. This can cause it to incorrectly return that a block is executed in a single threaded context after the attributor failed for any reason. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103186	2021-06-22 18:12:43 -04:00
Joseph Huber	44feacc736	[OpenMP] Change remaining globalization from an analysis remark to missed After landing the globalization optimizations, the precense of globalization on the device that was not put in shared or stack memory is a failed optimization with performance consequences so it should indicate a missed remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104735	2021-06-22 16:52:06 -04:00
Nikita Popov	7bb7fa12e7	[OpaquePtr] Support changing load type in InstCombine When the load type is changed to ptr, we need the load pointer type to also be ptr, because it's not allowed to create a pointer to an opaque pointer. This is achieved by adjusting the getPointerTo() API to return an opaque pointer for an opaque pointer base type. Differential Revision: https://reviews.llvm.org/D104718	2021-06-22 21:16:15 +02:00
Sami Tolvanen	33c9438f11	Revert "ThinLTO: Fix inline assembly references to static functions with CFI" This reverts commit `4474958d3a`. Breaks check-llvm on Mac.	2021-06-22 12:10:58 -07:00
Joseph Huber	ca1560da72	[OpenMP][NFC] Add new optimizations to OpenMPOpt comment header Summary: Adds mentions to the new globalization optimizations added to the OpenMPOpt comment header.	2021-06-22 14:40:31 -04:00
Joseph Huber	b54ccab509	[Attributor] Add an option to increase the max number of iterations Right now the Attributor defaults to 32 fixed point iterations unless it is set explicitly by a command line flag. This patch allows this to be configured when the attributor instance is created. The maximum is then increased in OpenMPOpt if the target is a kernel. This is because the globalization analysis can result in larger iteration counts due to many dependent instances running at once. Depends on D102444 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104416	2021-06-22 14:38:25 -04:00
Sanjay Patel	b1f6ef92ec	[InstCombine] reduce code duplication for FP min/max with casts fold; NFC	2021-06-22 14:15:04 -04:00
Joseph Huber	30e36c9b3c	[Attributor] Add interface to emit remarks in Attributor Summary: This patch adds support for the Attributor to emit remarks on behalf of some other pass. The attributor can now optionally take a callback function that returns an OptimizationRemarkEmitter object when given a Function pointer. If this is availible then a remark will be emitted for the corresponding pass name. Depends on D102197 Reviewed By: sstefan1 thegameg Differential Revision: https://reviews.llvm.org/D102444	2021-06-22 14:12:46 -04:00
Joseph Huber	7d69da71dd	[OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls Summary: The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087. Depends on D97818 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102197	2021-06-22 13:23:05 -04:00
Sami Tolvanen	4474958d3a	ThinLTO: Fix inline assembly references to static functions with CFI Create an internal alias with the original name for static functions that are renamed in promoteInternals to avoid breaking inline assembly references to them. Link: https://github.com/ClangBuiltLinux/linux/issues/1354 Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D104058	2021-06-22 10:01:55 -07:00
Joseph Huber	03d7e61c87	[OpenMP] Internalize functions in OpenMPOpt to improve IPO passes Summary: Currently the attributor needs to give up if a function has external linkage. This means that the optimization introduced in D97818 will only apply to static functions. This change uses the Attributor to internalize OpenMP device routines by making a copy of each function with private linkage and replacing the uses in the module with it. This allows for the optimization to be applied to any regular function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102824	2021-06-22 12:38:10 -04:00
Joseph Huber	6fc51c9f7d	[OpenMP] Replace GPU globalization calls with shared memory in the middle-end Summary: The changes introduced in D97680 create a simpler interface to code that needs to be globalized. This interface is used to simplify the globalization calls in the middle end. We can check any globalization call that is only called by a single thread in the team and replace it with a static shared memory buffer. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97818	2021-06-22 11:55:44 -04:00
Nikita Popov	e790d3667e	[OpaquePtr] Handle addrspacecasts in InstCombine This adds support for addrspace casts involving opaque pointers to InstCombine, as well as the isEliminableCastPair() helper (otherwise the assertion failure would just move there). Add PointerType::hasSameElementTypeAs() to hide the element type details. Differential Revision: https://reviews.llvm.org/D104668	2021-06-22 17:45:30 +02:00
Jingu Kang	873ff5a728	[SimpleLoopUnswich] Fixa a bug on ComputeUnswitchedCost with partial unswitch There was a bug from cost calculation for partially invariant unswitch. The costs of non-duplicated blocks are substracted from the total LoopCost, so anything that is duplicated should not be counted. Differential Revision: https://reviews.llvm.org/D103816	2021-06-22 16:18:00 +01:00
Joseph Huber	68d133a3e8	[OpenMP] Simplify GPU memory globalization Summary: Memory globalization is required to maintain OpenMP standard semantics for data sharing between worker and master threads. The GPU cannot share data between its threads so must allocate global or shared memory to store the data in. Currently this is implemented fully in the frontend using the `__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard CPU stack sharing. The front-end scans the target region for variables that escape the region and must be shared between the threads. Each variable then has a field created for it in a global record type. This patch replaces this functinality with a single allocation command, effectively mimicing an alloca instruction for the variables that must be shared between the threads. This will be much slower than the current solution, but makes it much easier to optimize as we can analyze each variable independently and determine if it is not captured. In the future, we can replace these calls with an `alloca` and small allocations can be pushed to shared memory. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D97680	2021-06-22 10:52:46 -04:00
Max Kazantsev	4c4f1ae93e	Re-land "[LoopDeletion] Handle Phis with similar inputs from different blocks" Patch was reverted due to a bug that existed before it and was exposed by it. Returning after the underlying bug has been fixed. Differential Revision: https://reviews.llvm.org/D103959	2021-06-22 12:28:46 +07:00
Max Kazantsev	575253887b	[LoopDeletion] Require loop to have a predecessor when executing 1st iteration symbolically Two predecessors break the further logic, and the loop may come to the opt in non-canonicalized state.	2021-06-22 12:20:55 +07:00
Nikita Popov	39796e1ad0	Reapply [InstCombine] Don't try converting opaque pointer bitcast to GEP Reapplied without changes -- this was reverted together with an underlying patch. ----- Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 22:15:56 +02:00
Nikita Popov	e2c2124a4b	Reapply [InstCombine] Extract bitcast -> gep transform Relative to the original patch, an InstCombine test has been added to show a previously missed pattern, and the Coroutine test that resulted in the revert has been regenerated. ----- Move this into a separate function, to make sure that early returns do not accidentally skip other transforms. This previously happened for the isSized() check, which skipped folds like distributing a bitcast over a select.	2021-06-21 22:03:15 +02:00
Nikita Popov	6922ab73a5	Revert "[InstCombine] Extract bitcast -> gep transform" This reverts commit `d9f5d7b959`. This reverts commit `5780611d7e`. This causes a failure in Coroutine tests.	2021-06-21 21:34:17 +02:00
Nikita Popov	862313cf59	[LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI) As these are no longer passed to UnrollLoop(), there is no need to modify them in computeUnrollCount(). Make them non-reference parameters. Differential Revision: https://reviews.llvm.org/D104590	2021-06-21 21:34:17 +02:00
Alexey Bataev	908b753661	[SLP]Improve vectorization of PHI instructions. Perform better analysis when trying to vectorize PHIs. 1. Do not try to vectorize vector PHIs. 2. Do deeper analysis for more profitable nodes for the vectorization. Before we just tried to vectorize the PHIs of the same type. Patch improves this and tries to vectorize PHIs with incoming values which come from the same basic block, have the same and/or alternative opcodes. It allows to save the compile time and provides better vectorization results in general. Part of D57059. Differential Revision: https://reviews.llvm.org/D103638	2021-06-21 12:26:24 -07:00
Nikita Popov	5780611d7e	[InstCombine] Don't try converting opaque pointer bitcast to GEP Bitcasts having opaque pointer source or result type cannot be converted into a zero-index GEP, GEP source and result types always have the same opaque-ness.	2021-06-21 21:24:50 +02:00
Nikita Popov	d9f5d7b959	[InstCombine] Extract bitcast -> gep transform Move this into a separate function, to make sure that early returns do not accidentally skip other transforms. There is already one isSized() check that could run into this issue, thus this change is not strictly NFC.	2021-06-21 21:24:50 +02:00
Nikita Popov	a969bdc56f	[InstCombine] Remove unnecessary addres space check (NFC) It's not possible to bitcast between different address spaces, and this is ensured by the IR verifier. As such, this bitcast to addrspacecast canonicalization can never be hit.	2021-06-21 20:11:39 +02:00
Nathan Chancellor	f52666985d	Revert "[LoopDeletion] Handle Phis with similar inputs from different blocks" This reverts commit `bb1dc876eb`. This patch causes an assertion failure when building an arm64 defconfig Linux kernel. See https://reviews.llvm.org/D103959 for a link to the original bug report and a reduced reproducer.	2021-06-21 10:18:55 -07:00
Sanjay Patel	198b79caae	[InstCombine] move bitmanipulation-of-select folds This is no outwardly-visible-difference-intended, but it is obviously better to have all transforms for an intrinsic housed together since we already have helper functions in place. It is also potentially more efficient to zap a simple pattern match before trying to do expensive computeKnownBits() calls.	2021-06-21 11:32:16 -04:00
Sanjay Patel	64b2676ca8	[InstCombine] fold ctlz/cttz-of-select with 1 or more constant arms Building on: `4c44b02d87` ...and adding handling for the extra operand in these intrinsics. This pattern is discussed in: https://llvm.org/PR50140	2021-06-21 11:04:12 -04:00
Nikita Popov	80e0424b2c	[Mem2Reg] Use poison for unreachable cases Use poison instead of undef for cases dealing with unreachable code. This still leaves the more interesting case of "load from uninitialized memory" as undef.	2021-06-21 10:54:13 +02:00
Juneyoung Lee	c038845f58	[InstCombine] Fold icmp (select c,const,arg), null if icmp arg, null can be simplified This patch folds icmp (select c,const,arg), null if icmp arg, null can be simplified. Resolves llvm.org/pr48975. Reviewed By: nikic, xbolva00 Differential Revision: https://reviews.llvm.org/D96663	2021-06-21 17:39:05 +09:00
Sjoerd Meijer	342bbb7832	[FuncSpec] Don't specialise functions with NoDuplicate instructions. getSpecializationCost was returning INT_MAX for a case when specialisation shouldn't happen, but this wasn't properly checked if specialisation was forced. Differential Revision: https://reviews.llvm.org/D104461	2021-06-21 09:02:11 +01:00
Max Kazantsev	bb1dc876eb	[LoopDeletion] Handle Phis with similar inputs from different blocks This patch lifts the requirement to have the only incoming live block for Phis. There can be multiple live blocks if the same value comes to phi from all of them. Differential Revision: https://reviews.llvm.org/D103959 Reviewed By: nikic, lebedev.ri	2021-06-21 11:37:06 +07:00
Juneyoung Lee	ce192ced2b	[InstCombine] Use poison constant to represent the result of unreachable instrs This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction. This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll . Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104602	2021-06-21 09:58:44 +09:00
Nikita Popov	1ae266f452	[LoopUnroll] Use smallest exact trip count from any exit This is a more general alternative/extension to D102635. Rather than handling the special case of "header exit with non-exiting latch", this unrolls against the smallest exact trip count from any exit. The latch exit is no longer treated as priviledged when it comes to full unrolling. The motivating case is in full-unroll-one-unpredictable-exit.ll. Here the header exit is an IV-based exit, while the latch exit is a data comparison. This kind of loop does not get rotated, because the latch is already exiting, and loop rotation doesn't try to distinguish IV-based/analyzable latches. Differential Revision: https://reviews.llvm.org/D102982	2021-06-20 20:58:26 +02:00
David Green	a24b02193a	[DSE] Remove stores in the same loop iteration DSE will currently only remove stores in the same block unless they can be guaranteed to be loop invariant. This expands that to any stores that are in the same Loop, at the same loop level. This should still account for where AA/MSSA will not handle aliasing between loops, but allow the dead stores to be removed where they overlap in the same loop iteration. It requires adding loop info to DSE, but that looks fairly harmless. The test case this helps is from code like this, which can come up in certain matrix operations: for(i=..) dst[i] = 0; for(j=..) dst[i] += src[in+j]; After LICM, this becomes: for(i=..) dst[i] = 0; sum = 0; for(j=..) sum += src[in+j]; dst[i] = sum; The first store is dead, and with this patch is now removed. Differntial Revision: https://reviews.llvm.org/D100464	2021-06-20 17:03:30 +01:00
Sanjay Patel	4c44b02d87	[InstCombine] fold ctpop-of-select with 1 or more constant arms The general pattern is mentioned in: https://llvm.org/PR50140 ...but we need to do a bit more to handle intrinsics with extra operands like ctlz/cttz.	2021-06-20 11:28:45 -04:00
Sanjay Patel	240acb0cff	[InstCombine] avoid infinite loops with select folds of constant expressions This pair of transforms was added recently with: `8591640379` And could lead to conflicting folds: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399	2021-06-20 09:46:25 -04:00
Roman Lebedev	c5b7335dc8	[SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has it's address taken Same as with HoistThenElseCodeToIf() (`ad87761925`).	2021-06-20 12:37:14 +03:00
Roman Lebedev	ad87761925	[SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has it's address taken This problem is exposed by D104598, after it tail-merges `ret` in `@test_inline_constraint_S_label`, the verifier would start complaining `invalid operand for inline asm constraint 'S'`. Essentially, taking address of a block is mismodelled in IR. It should probably be an explicit instruction, a first one in block, that isn't identical to any other instruction of the same type, so that it can't be hoisted.	2021-06-20 12:18:15 +03:00
Nikita Popov	1bd4085e0b	[LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop() Currently, UnrollLoop() is passed an AllowRuntime flag and decides itself whether runtime unrolling should be used or not. This patch pushes the decision into the caller and allows us to eliminate the ULO.TripCount and ULO.TripMultiple parameters. Differential Revision: https://reviews.llvm.org/D104487	2021-06-19 09:25:57 +02:00
Liqiang Tao	671a87104b	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining. The callsite which size is smaller would have a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-19 10:17:32 +08:00
Guozhi Wei	575ba6f425	[InstCombine] Don't transform code if DoTransform is false In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case. Differential Revision: https://reviews.llvm.org/D104567	2021-06-18 18:01:34 -07:00
Fangrui Song	3307240f05	[InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling On ELF, the D1003372 optimization can apply to more cases. There are two prerequisites for making `__profd_` private: * `__profc_` keeps `__profd_` live under compiler/linker GC * `__profd_` is not referenced by code The first is satisfied because all counters/data are in a section group (either `comdat any` or `comdat noduplicates`). The second requires that the function does not use value profiling. Regarding the second point: `__profd_` may be referenced by other text sections due to inlining. There will be a linker error if a prevailing text section references the non-prevailing local symbol. With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`) clang is 4.2% smaller (1-169620032/177066968). `stat -c %s */.o \| awk '{s+=$1}END{print s}' is 2.5% smaller. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-18 17:01:17 -07:00
Hongtao Yu	bd52495518	[CSSPGO] Undoing the concept of dangling pseudo probe As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen. I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D104477	2021-06-18 15:14:11 -07:00
Nikita Popov	3308205ae9	[LoopUnroll] Simplify optimization remarks Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and debug code. For debug code, print information about all exits. For optimization remarks, only include the unroll count and the type of unroll (complete, partial or runtime), but omit detailed information about exit folding, now that more than one exit may be folded. Differential Revision: https://reviews.llvm.org/D104482	2021-06-18 23:47:03 +02:00
Nick Desaulniers	bef2992861	[GCOVProfiling] don't profile Fn's w/ noprofile attribute Similar to D104475, the Linux kernel would like to avoid compiler generated code in certain functions. The no_profile function attribute can be used in C to generate the the noprofile fn attr in IR. Respect that from GCOVProfiling. Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/ Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D104257	2021-06-18 13:58:34 -07:00
Andrew Browne	14407332de	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-18 11:21:46 -07:00
Liqiang Tao	93183a41b9	Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder"	2021-06-18 18:52:00 +08:00
Max Kazantsev	de92287cf8	[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3) This patch handles one particular case of one-iteration loops for which SCEV cannot straightforwardly prove BECount = 1. The idea of the optimization is to symbolically execute conditional branches on the 1st iteration, moving in topoligical order, and only visiting blocks that may be reached on the first iteration. If we find out that we never reach header via the latch, then the backedge can be broken. This implementation uses InstSimplify. SCEV version was rejected due to high compile time impact. Differential Revision: https://reviews.llvm.org/D102615 Reviewed By: nikic	2021-06-18 17:31:57 +07:00
Haojian Wu	3f5d53a525	[Attributor] Fix UB behavior on uninitalized bool variables. Found by ASAN.	2021-06-18 11:49:42 +02:00
Daniil Seredkin	6643e51d79	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 16:28:06 +07:00
Liqiang Tao	a740b707d1	[llvm][Inliner] Add an optional PriorityInlineOrder This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining. The callsite which size is smaller would have a higher priority. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D104028	2021-06-18 16:55:38 +08:00
Max Kazantsev	fa5eb22ad4	[NFC] Assert non-zero factor before division This is to ensure that zero denominator leads to controlled assertion failure rather than UB.	2021-06-18 15:50:50 +07:00
Haojian Wu	7670938bba	[Attributor] Don't print the call-graph in a hard-coded file. This looks like not a practical pattern in our codebase (it could fail in some sandbox environement). Instead we print it via standard output, and it is controled by the -attributor-print-call-graph, this follows a similiar pattern of attributor-print-dep.	2021-06-18 09:38:07 +02:00
Daniil Seredkin	6de741de08	Revert "[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X)" This reverts commit `31053338c9`.	2021-06-18 14:21:02 +07:00
Daniil Seredkin	31053338c9	[InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X) InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case. Differential Revision: https://reviews.llvm.org/D104193	2021-06-18 14:12:00 +07:00
Fangrui Song	5798be8458	Revert D103717 "[InstrProfiling] Make __profd_ unconditionally private for ELF" This reverts commit `76d0747e08`. If a group has `__llvm_prf_vals` due to static value profiler counters (`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text section may reference `__llvm_prf_data` and will cause a `relocation refers to a discarded section` linker error. Note: while a `__profc_` group is non-prevailing, it may be referenced by a prevailing text section due to inlining. ``` group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections: [Index] Name [ 67] __llvm_prf_cnts [ 68] __llvm_prf_vals [ 69] __llvm_prf_data [ 70] .rela__llvm_prf_data ```	2021-06-17 23:38:17 -07:00
Johannes Doerfert	30c9d68ad9	[Attributor][FIX] Arguments of unknown functions can be undef This should fix PR50683. The wrong assumption was that we could always know what the callee is when we replace a call site argument with undef. We wanted to know that to remove the `noundef` that might be attached to the argument. Since no callee means we did the propagation on the caller site, there is no need to remove an attribute. It is only needed if we replace all uses and therefore pass `undef` instead of the value that was passed in otherwise.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	666dc6f126	[Attributor] Use a centralized value simplification interface To allow outside AAs that simplify values we need to ensure all value simplification goes through the Attributor, not AAValueSimplify (or any of the other AAs we have already like AAPotentialValues). This patch also introduces an interface for the outside AAs to register simplification callbacks for an IRPosition. To make this work as expected we have to pass IRPositions instead of Values in AAValueSimplify, which makes sense by itself.	2021-06-18 01:07:53 -05:00
Johannes Doerfert	d9194b6efb	[Attributor] Introduce a helper do deal with constant type mismatches If we simplify values we sometimes end up with type mismatches. If the value is a constant we can often cast it though to still allow propagation. The logic is now put into a helper and it replaces some ad hoc things we did before. This also introduces the AA namespace for abstract attribute related functions and types. Differential Revision: https://reviews.llvm.org/D103856	2021-06-18 01:07:52 -05:00
Johannes Doerfert	9959eee001	[Attributor] Make sure Heap2Stack works properly on a GPU target If the target stack is not accessible between different running "threads" we have to make sure not to create allocas for mallocs that might be used by multiple "threads". The "use check" is sufficient to prevent this but if we apply the "free check" we have to make sure the pointer is not communicated to others before the free is reached. Differential Revision: https://reviews.llvm.org/D98608	2021-06-18 01:07:52 -05:00
Johannes Doerfert	9a23e673ca	[OpenMP][NFC] Expose AAExecutionDomain and rename its getter The initial use for AAExecutionDomain was to determine if a single thread executes a block. While this is sometimes informative most of the time, and for other reasons, we actually want to know if it is the "initial thread". Thus, the thread that started execution on the current device. The deduction needs to be adjusted in a follow up as the methods we use right not are looking for the OpenMP thread id which is resets whenever a thread enters a parallel region. What we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and equivalent functions.	2021-06-18 01:07:52 -05:00
Johannes Doerfert	8d7bace3b5	[Attributor][NFC] AAReachability is currently stateless, don't invalidate it We invalidated AAReachabilityImpl directly which is not helpful and confusing as we still used it regardless. We now avoid invalidating it (not needed anyway) and add checks for the state. This has by itself no actual effect but prepares for later extensions.	2021-06-18 01:07:51 -05:00
George Balatsouras	c6b5a25eeb	[dfsan] Replace dfs$ prefix with .dfsan suffix The current naming scheme adds the `dfs$` prefix to all DFSan-instrumented functions. This breaks mangling and prevents stack trace printers and other tools from automatically demangling function names. This new naming scheme is mangling-compatible, with the `.dfsan` suffix being a vendor-specific suffix: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure With this fix, demangling utils would work out-of-the-box. Reviewed By: stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104494	2021-06-17 22:42:47 -07:00
Xun Li	3522167efd	[Coroutine] Properly deal with byval and noalias parameters This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857. Previous attempts can be found in D104007 and D101980. A lot of discussions can be found in those two patches. To summarize the bug: When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly. However, in some cases we find that arguments are still directly used: When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and latter uses will lead to memory use-after-free. This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly. For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutiune as noalias. This can be taken care of in CoroEarly pass. For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer. Differential Revision: https://reviews.llvm.org/D104184	2021-06-17 19:06:10 -07:00
Roman Lebedev	84eeb82888	[NFC][SimpleLoopUnswitch] unswitchTrivialBranch(): add debug output explaining unswitching failure It's not prohibitively verbose, and allows easier understanding why certain unswitching ultimately wasn't performed.	2021-06-18 00:46:04 +03:00
Kuter Dinel	eaf1b6810c	[Attributor] Derive AACallEdges attribute This attribute computes the optimistic live call edges using the attributor liveness information. This attribute will be used for deriving a inter-procedural function reachability attribute. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104059	2021-06-18 03:29:22 +03:00
Andrew Browne	39295e92f7	Revert "[DFSan] Cleanup code for platforms other than Linux x86_64." This reverts commit `8441b993bd`. Buildbot failures.	2021-06-17 14:19:18 -07:00
Fangrui Song	76d0747e08	[InstrProfiling] Make __profd_ unconditionally private for ELF For ELF, since all counters/data are in a section group (either `comdat any` or `comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the D1003372 optimization prerequisite (linker GC cannot discard data variables while the text section is retained) is always satisified, we can make __profd_ unconditionally private. Reviewed By: davidxl, rnk Differential Revision: https://reviews.llvm.org/D103717	2021-06-17 14:16:54 -07:00
Craig Topper	99e95856fb	[PartiallyInlineLibCalls] Disable sqrt expansion for strictfp. This pass emits a floating point compare and a conditional branch, but if strictfp is enabled we don't emit a constrained compare intrinsic. The backend also won't expand the readonly sqrt call this pass inserts to a sqrt instruction under strictfp. So we end up with 2 libcalls as seen here. https://godbolt.org/z/oax5zMEWd Fix these things by disabling the pass. Differential Revision: https://reviews.llvm.org/D104479	2021-06-17 14:15:12 -07:00
Andrew Browne	8441b993bd	[DFSan] Cleanup code for platforms other than Linux x86_64. These other platforms are unsupported and untested. They could be re-added later based on MSan code. Reviewed By: gbalats, stephan.yichao.zhao Differential Revision: https://reviews.llvm.org/D104481	2021-06-17 14:08:40 -07:00
Nikita Popov	f7c54c4603	[LoopUnroll] Fold all exits based on known trip count/multiple Fold all exits based on known trip count/multiple information from SCEV. Previously only the latch exit or the single exit were folded. This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple entirely: They're still used to a) decide whether runtime unrolling should be performed and b) for ORE remarks. However, the core unrolling logic is independent of them now. Differential Revision: https://reviews.llvm.org/D104203	2021-06-17 20:58:34 +02:00
Roman Lebedev	37dfc467ac	[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg This really isn't talking about vectors in general, but only about either fixed or scalable vectors, and it's pretty confusing to see it state that there aren't any vectors :)	2021-06-17 21:07:34 +03:00
hyeongyukim	69b0ed9a0a	[InstCombine] Fix miscompile on GEP+load to icmp fold (PR45210) As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210 ...the bug is triggered as Eli say when sext(idx) * ElementSize overflows. ``` // assume that GV is an array of 4-byte elements GEP = gep GV, 0, Idx // this is accessing Idx * 4 L = load GEP ICI = icmp eq L, value => ICI = icmp eq Idx, NewIdx ``` The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp. And there is a problem because Idx * ElementSize can overflow. Let's assume that the wanted value is at offset 0. Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00. We should return true for all these values, but currently, the new icmp only returns true for 0x00..00. This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx. ``` ... => Idx' = and Idx, 0x3F..FF ICI = icmp eq Idx', NewIdx ``` Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D99481	2021-06-17 19:46:17 +09:00
Sjoerd Meijer	dcd23d875a	[FuncSpec] Don't specialise functions with attribute NoDuplicate. Differential Revision: https://reviews.llvm.org/D104378	2021-06-17 10:32:29 +01:00
Florian Hahn	80a403348b	[VPlan] Support PHIs as LastInst when inserting scalars in ::get(). At the moment, we create insertelement instructions directly after LastInst when inserting scalar values in a vector in VPTransformState::get. This results in invalid IR when LastInst is a phi, followed by another phi. In that case, the new instructions should be inserted just after the last PHI node in the block. At the moment, I don't think the problematic case can be triggered, but it can happen once predicate regions are merged and multiple VPredInstPHI recipes are in the same block (D100260). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104188	2021-06-17 09:36:44 +01:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seem to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions, typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Sjoerd Meijer	49ab3b1735	[FuncSpec] Statistics Adds some bookkeeping for collecting the number of specialised functions and a test for that. Differential Revision: https://reviews.llvm.org/D104102	2021-06-16 09:11:51 +01:00
Evgeniy Brevnov	96cded5b79	[SLP] Incorrect handling of external scalar values Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D103954	2021-06-16 13:27:36 +07:00
Andrew Browne	e652d99169	[DFSan][NFC] Fix shadowing variable name.	2021-06-15 22:58:22 -07:00
Rong Xu	82a0bb1afc	[SampleFDO] Place the discriminator flag variable into the used list. We create flag variable "__llvm_fs_discriminator__" in the binary to indicate that FSAFDO hierarchical discriminators are used. This variable might be GC'ed by the linker since it is not explicitly reference. I initially added the var to the use list in pass MIRFSDiscriminator but it did not work. It turned out the used global list is collected in lowering (before MIR pass) and then emitted in the end of pass pipeline. Here I add the variable to the use list in IR level's AddDiscriminators pass. The machine level code is still keep in the case IR's AddDiscriminators is not invoked. If this is the case, this just use -Wl,--export-dynamic-symbol=__llvm_fs_discriminator__ to force the emit. Differential Revision: https://reviews.llvm.org/D103988	2021-06-15 21:51:04 -07:00
Chuanqi Xu	86906304d8	[FuncSpec] Use std::pow instead of operator^ The original implementation calculating UserBonus uses operator ^, which means XOR in C++ language. At the first glance of reviewing, I thought it should be power, my bad. It doesn't make sense to use XOR here. So I believe it should be a carelessness as I made. Test Plan: check-all Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D104282	2021-06-16 10:13:21 +08:00
Andrew Browne	af93157625	[DFSan] Handle landingpad inst explicitly as zero shadow. Before this change, DFSan was relying fallback cases when getting origin address. Differential Revision: https://reviews.llvm.org/D104266	2021-06-15 18:28:20 -07:00
Vitaly Buka	6478ef61b1	[asan] Remove Asan, Ubsan support of RTEMS and Myriad Differential Revision: https://reviews.llvm.org/D104279	2021-06-15 12:59:05 -07:00
Roman Lebedev	e52364532a	[NewPM] Remove SpeculateAroundPHIs pass Addition of this pass has been botched. There is no particular reason why it had to be sold as an inseparable part of new-pm transition. It was added when old-pm was still the default, and very very few users were actually tracking new-pm, so it's effects weren't measured. Which means, some of the turnoil of the new-pm transition are actually likely regressions due to this pass. Likewise, there has been a number of post-commit feedback (post new-pm switch), namely * https://reviews.llvm.org/D37467#2787157 (regresses HW-loops) * https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before) * https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata) and in the half year past, the pass authors (google) still haven't found time to respond to any of that. Hereby it is proposed to backout the pass from the pipeline, until someone who cares about it can address the issues reported, and properly start the process of adding a new pass into the pipeline, with proper performance evaluation. Furthermore, neither google nor facebook reports any perf changes from this change, so i'm dropping the pass completely. It can always be re-reverted should/if anyone want to pick it up again. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D104099	2021-06-15 20:35:55 +03:00
Andrew Litteken	2c21278e74	[IROutliner] Adding DebugInfo handling for IR Outlined Functions This adds support for functions outlined by the IR Outliner to be recognized by the debugger. The expected behavior is that it will skip over the instructions included in that section. This is due to the fact that we can not say which of the original locations the instructions originated from. These functions will show up in the call stack, but you cannot step through them. Reviewers: paquette, vsk, djtodoro Differential Revision: https://reviews.llvm.org/D87302	2021-06-15 10:57:08 -05:00
Florian Hahn	f7fc8927c0	[LoopDeletion] Check for irreducible cycles when deleting loops. Loops with irreducible cycles may loop infinitely. Those cannot be removed, unless the loop/function is marked as mustprogress. Also discussed in D103382. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D104238	2021-06-15 12:56:12 +01:00
Neil Henning	1540da3b78	ABI breaking changes fixes. This commit mostly just replaces bad uses of `NDEBUG` with uses of `LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI breaking changes (normally extra struct elements in headers). Differential Revision: https://reviews.llvm.org/D104216	2021-06-15 11:08:13 +01:00
Vitaly Buka	b8919fb0ea	[NFC][sanitizer] clang-format some code	2021-06-14 18:05:22 -07:00
Huihui Zhang	1c096bf09f	[SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE. Currently, Loop strengh reduce is not handling loops with scalable stride very well. Take loop vectorized with scalable vector type <vscale x 8 x i16> for instance, (refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added). Memory accesses are incremented by "16vscale", while induction variable is incremented by "8vscale". The scaling factor "2" needs to be extracted to build candidate formula i.e., "reg(%in) + 2reg({0,+,(8 %vscale)}". So that addrec register reg({0,+,(8vscale)}) can be reused among Address and ICmpZero LSRUses to enable optimal solution selection. This patch allow LSR getExactSDiv to recognize special cases like "C1XY /s C2X*Y", and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR is missing candidate formula with proper scaled factor to leverage target scaled-index addressing mode. Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable vector. But allow simple valid scale to pass through. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103939	2021-06-14 16:42:34 -07:00
Matt Morehouse	b87894a1d2	[HWASan] Enable globals support for LAM. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104265	2021-06-14 14:20:44 -07:00
Sanjay Patel	8591640379	[InstCombine] add DeMorgan folds for logical ops in select form We canonicalized to these select patterns (poison-safe logic) with D101191, so we need to reduce 'not' ops when possible as we would with 'and'/'or' instructions. This is shown in a secondary example in: https://llvm.org/PR50389 https://alive2.llvm.org/ce/z/BvsESh	2021-06-14 12:54:35 -04:00
Florian Hahn	96ca03493a	[VectorCombine] Limit scalarization to non-poison indices for now. As Eli mentioned post-commit in D103378, the result of the freeze may still be out-of-range according to Alive2. So for now, just limit the transform to indices that are non-poison.	2021-06-14 16:40:14 +01:00
Jeroen Dobbelaere	bb8ce25e88	Intrinsic::getName: require a Module argument Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary. This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288 (as mentioned by @fhahn), after committing D91250. Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D99173	2021-06-14 14:52:29 +02:00
Aditya Kumar	dcbbc69cc5	Calculate getTerminator only when necessary Differential Revision: https://reviews.llvm.org/D104202	2021-06-13 20:16:07 -07:00
Simon Pilgrim	9efe89d82f	BoundsChecking.cpp - tidy implicit header dependencies. NFCI. We don't use <vector> but we do use std::pair (<utility>)	2021-06-13 17:08:15 +01:00
Simon Pilgrim	56541d1377	GVN.cpp - remove unused <vector> include. NFCI.	2021-06-13 14:06:32 +01:00
Simon Pilgrim	c14fd171fe	LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI.	2021-06-13 14:06:32 +01:00
Sanjay Patel	afd44bb6f2	[InstCombine] fold ctlz/cttz of bool types https://alive2.llvm.org/ce/z/tX4pUT	2021-06-13 08:26:40 -04:00
Simon Pilgrim	2477b498f2	ArgumentPromotion.cpp - remove unused <string> include. NFCI.	2021-06-13 13:03:47 +01:00
Simon Pilgrim	b013c58e82	VPlanSLP.cpp - tidy implicit header dependencies. NFCI. We don't use std::string and std::vector, but we do use std::pair and std::max.	2021-06-13 12:37:17 +01:00
Xun Li	fae7debadc	[CHR] Don't run ControlHeightReduction if any BB has address taken This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610. In computed goto pattern, there are usually a list of basic blocks that are all targets of indirectbr instruction, and each basic block also has address taken and stored in a variable. CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and clonved version of all basic blocks in the list. However these basic blocks will not have their addresses taken and stored anywhere. So latter SimplifyCFG pass will simply remove all tehse cloned basic blocks, resulting in incorrect code. To fix this, when searching for scopes, we skip scopes that contains BBs with addresses taken. Added a few test cases. Reviewed By: aeubanks, wenlei, hoy Differential Revision: https://reviews.llvm.org/D103867	2021-06-12 10:29:53 -07:00
Kevin Athey	1d22596b2f	[sanitizer] Remove numeric values from -asan-use-after-return flag. (NFC) for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104152	2021-06-11 15:14:51 -07:00
Kevin Athey	e0b469ffa1	[clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang. Also: - add driver test (fsanitize-use-after-return.c) - add basic IR test (asan-use-after-return.cpp) - (NFC) cleaned up logic for generating table of __asan_stack_malloc depending on flag. for issue: https://github.com/google/sanitizers/issues/1394 Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D104076	2021-06-11 12:07:35 -07:00
Valery N Dmitriev	94a07c79cf	[SLP][NFC] Fix condition that was supposed to save a bit of compile time. It was found by chance revealing discrepancy between comment (few lines above), the condition and how re-ordering of instruction is done inside the if statement it guards. The condition was always evaluated to true. Differential Revision: https://reviews.llvm.org/D104064	2021-06-11 10:08:55 -07:00
Adam Nemet	e0efebb8eb	[Matrix] In transpose opts, handle a^t * a^t Without the fix the testcase crashes because we remove the same instruction twice. Differential Revision: https://reviews.llvm.org/D104127	2021-06-11 09:29:43 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Matt Morehouse	0867edfc64	[HWASan] Add basic stack tagging support for LAM. Adds the basic instrumentation needed for stack tagging. Currently does not support stack short granules or TLS stack histories, since a different code path is followed for the callback instrumentation we use. We may simply wait to support these two features until we switch to a custom calling convention. Patch By: xiangzhangllvm, morehouse Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102901	2021-06-11 08:21:17 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Sjoerd Meijer	9907746f5d	Move Function Specialization to its correct location. NFC. As a follow up of rGc4a0969b9c14, and as part of D104102, move it to the IPO transformations directory.	2021-06-11 15:00:10 +01:00
Sanjay Patel	602ab24833	[SimplifyCFG] avoid crash on degenerate loop The problematic code pattern in the test is based on: https://llvm.org/PR50638 If the IfCond is itself the phi that we are trying to remove, then the loop around line 2835 can end up with something like: %cmp = select i1 %cmp, i1 false, i1 true That can then lead to a use-after-free and assert (although I'm still not seeing that locally in my release + asserts build). I think this can only happen with unreachable code. Differential Revision: https://reviews.llvm.org/D104063	2021-06-11 09:37:06 -04:00
Simon Pilgrim	61cdaf66fe	[ADT] Remove APInt/APSInt toString() std::string variants <string> is currently the highest impact header in a clang+llvm build: https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps. This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code. Differential Revision: https://reviews.llvm.org/D103888	2021-06-11 13:19:15 +01:00
Roman Lebedev	20542b47d6	[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper This results in slightly more optimistic alignments in some cases	2021-06-11 12:47:10 +03:00
Roman Lebedev	abc0e0125c	[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function	2021-06-11 12:47:09 +03:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00

... 6 7 8 9 10 ...

28315 Commits