This intrinsic clears the upper bits starting at a specified index. If the index is a constant we can do some simplifications.
This could be in InstSimplify, but we don't handle any target specific intrinsics there today.
Differential Revision: https://reviews.llvm.org/D36069
llvm-svn: 309604
This patch adds simplification support for the BEXTR/BEXTRI intrinsics to match gcc. This only supports cases that fold to 0 or can be fully constant folded. Theoretically we could support converting to AND if the shift part is unused or to only a shift if the mask doesn't modify any bits after an equivalent shl. gcc doesn't do these transformations either.
I put this in InstCombine, but it could be done in InstSimplify. It would be the first target specific intrinsic in InstSimplify.
Differential Revision: https://reviews.llvm.org/D36063
llvm-svn: 309603
The Loop Vectorizer generates redundant operations when manipulating masks:
AND with true, OR with false, compare equal to true. Instead of relying on
a subsequent pass to clean them up, this patch avoids generating them.
Use null (no-mask) to represent all-one full masks, instead of a constant
all-one vector, following the convention of masked gathers and scatters.
Preparing for a follow-up VPlan patch in which these mask manipulating
operations are modeled using recipes.
Differential Revision: https://reviews.llvm.org/D35725
llvm-svn: 309558
Summary:
Since r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.
Reviewers: MatzeB
Reviewed By: MatzeB
Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D35949
llvm-svn: 309553
Summary:
Without any information about the called function, we cannot be sure
that it is safe to interchange loops which contain function calls. For
example there could be dependences that prevent interchanging between
accesses in the called function and the loops. Even functions without any
parameters could cause problems, as they could access memory using
global pointers.
For now, I think it is only safe to interchange loops with calls marked
as readnone.
With this patch, the LLVM test suite passes with `-O3 -mllvm
-enable-loopinterchange` and LoopInterchangeProfitability::isProfitable
returning true for all loops. check-llvm and check-clang also pass when
bootstrapped in a similar fashion, although only 3 loops got
interchanged.
Reviewers: karthikthecool, blitz.opensource, hfinkel, mcrosier, mkuper
Reviewed By: mcrosier
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35489
llvm-svn: 309547
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
Summary:
After some changes in SLP vectorizer we missed some additional checks to
limit the instructions for vectorization. We should not perform analysis
of the instructions if the parent of instruction is not the same as the
parent of the first instruction in the tree or it was analyzed already.
Subscribers: mzolotukhin
Differential Revision: https://reviews.llvm.org/D34881
llvm-svn: 309425
Recommit after workaround the bug PR31652.
Three bugs fixed in previous recommits: The first one is to use CurrentBlock
instead of PREInstr's Parent as param of performScalarPREInsertion because
the Parent of a clone instruction may be uninitialized. The second one is stop
PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst
is defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 309397
JumpThreading claims to preserve LVI, but it doesn't preserve
the analyses which LVI holds a reference to (e.g. the Dominator).
In the current pass manager infrastructure, after JT runs, the
PM frees these analyses (including DominatorTree) but preserves
LVI.
CorrelatedValuePropagation runs immediately after and queries
a corrupted domtree, causing weird miscompiles.
This commit disables the preservation of LVI for the time being.
Eventually, we should either move LVI to a proper dependency
tracking mechanism (i.e. an analyses shouldn't hold references
to other analyses and compute them on demand if needed), or
we should teach all the passes preserving LVI to preserve the
analyses LVI depends on.
The new pass manager has a mechanism to invalidate LVI in case
one of the analyses it depends on becomes invalid, so this problem
shouldn't exist (at least not in this immediate form), but handling
of analyses holding references is still a very delicate subject.
Fixes PR33917 (and rustc).
llvm-svn: 309355
Summary: The original 3.0 hot mupltiplier is too small, and would prevent hot callsites from being inline. This patch increases the hot multilier to 10.0
Reviewers: davidxl, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D35969
llvm-svn: 309344
The alias support was dead code since 2011. It was last touched
in r124182, where it was reintroduced after being removed
in r110434, and since then it was gated behind a HasGlobalAliases
flag that was permanently stuck as `false`.
It is also broken. I'm not sure if it bitrotted or was just broken
in the first place because it appears to have never been tested,
but the following IR results in a crash:
define internal i32 @a(i32 %a, i32 %b) unnamed_addr {
%c = add i32 %a, %b
%d = xor i32 %a, %c
ret i32 %c
}
define internal i32 @b(i32 %a, i32 %b) unnamed_addr {
%c = add i32 %a, %b
%d = xor i32 %a, %c
ret i32 %c
}
It seems safe to remove buggy untested code that no one cared about
for seven years.
Differential Revision: https://reviews.llvm.org/D34802
llvm-svn: 309313
Summary:
Pointer difference simplifications currently happen only if input GEPs don't have other uses or their indexes are all constants, to avoid duplicating indexing arithmetic.
This patch enables cases with exactly one non-constant index among input GEPs to happen where there is no duplicated arithmetic or code size increase even if input GEPs have other uses.
For example, this patch allows "(&A[42][i]-&A[42][0])" --> "i", which didn't happen previously, if the input GEP(s) have other uses.
Reviewers: sanjoy, bkramer
Reviewed By: sanjoy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D35499
llvm-svn: 309304
This is a module pass so for the old PM, we can't use ORE, the function
analysis pass. Instead ORE is created on the fly.
A few notes:
- isPromotionLegal is folded in the caller since we want to emit the Function
in the remark but we can only do that if the symbol table look-up succeeded.
- There was good test coverage for remarks in this pass.
- promoteIndirectCall uses ORE conditionally since it's also used from
SampleProfile which does not use ORE yet.
Fixes PR33792.
Differential Revision: https://reviews.llvm.org/D35929
llvm-svn: 309294
Summary:
It is possible for some passes to materialize a call to a libcall (ex: ldexp, exp2, etc),
but these passes will not mark the call as a gc-leaf-function. All libcalls are
actually gc-leaf-functions, so we change llvm::callsGCLeafFunction() to tell us that
available libcalls are equivalent to gc-leaf-function calls.
Reviewers: sanjoy, anna, reames
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35840
llvm-svn: 309291
Summary:
Until a more advanced version of importing can be implemented for
aliases (one that imports an alias as an available_externally definition
of the aliasee), skip the narrow subset of cases that was possible but
came at a cost: aliases of linkonce_odr functions could be imported
because the linkonce_odr function could be safely duplicated from the
source module. This came/comes at the cost of not being able to 'home'
imported linkonce functions (they had to be emitted linkonce_odr in all
the destination modules (even if they weren't used by an alias) rather
than as available_externally - causing extra object size).
Tangentially, this also was the only reason ThinLTO would emit multiple
CUs in to the resulting DWARF - which happens to be a problem for
Fission (there's a fix for this in GDB but not released yet, etc).
(actually it's not the only reason - but I'm sending a patch to fix the
other reason shortly)
There's no reason to believe this particularly narrow alias importing
was especially/meaningfully important, only that it was /possible/ to
implement in this way. When a more general solution is done, it should
still satisfy the DWARF concerns above, since the import will still be
available_externally, and thus not create extra CUs.
Since now all aliases are treated the same, I removed/simplified some
test cases since they were testing corner cases where there are no
longer any corners.
Reviewers: tejohnson, mehdi_amini
Differential Revision: https://reviews.llvm.org/D35875
llvm-svn: 309278
Summary:
This changes SimplifyLibCalls to use the new OptimizationRemarkEmitter
API.
In fact, as SimplifyLibCalls is only ever called via InstCombine,
(as far as I can tell) the OptimizationRemarkEmitter is added there,
and then passed through to SimplifyLibCalls later.
I have avoided changing any remark text.
This closes PR33787
Patch by Sam Elliott!
Reviewers: anemet, davide
Reviewed By: anemet
Subscribers: davide, mehdi_amini, eraman, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D35608
llvm-svn: 309158
This is a workaround for the bug described in PR31652 and
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115497.html. The temporary
solution is to add a function EqualityPropUnSafe. In EqualityPropUnSafe, for
some simple patterns we can know the equality comparison may contains undef,
so we regard such comparison as unsafe and will not do loop-unswitching for
them. We also need to disable the select simplification when one of select
operand is undef and its result feeds into equality comparison.
The patch cannot clear the safety issue caused by the bug, but it can suppress
the issue from happening to some extent.
Differential Revision: https://reviews.llvm.org/D35811
llvm-svn: 309059
it when safe.
Very often the BE count is the trip count minus one, and the plus one
here should fold with that minus one. But because the BE count might in
theory be UINT_MAX or some such, adding one before we extend could in
some cases wrap to zero and break when we scale things.
This patch checks to see if it would be safe to add one because the
specific case that would cause this is guarded for prior to entering the
preheader. This should handle essentially all of the common loop idioms
coming out of C/C++ code once canonicalized by LLVM.
Before this patch, both forms of loop in the added test cases ended up
subtracting one from the size, extending it, scaling it up by 8 and then
adding 8 back onto it. This is really silly, and it turns out made it
all the way into generated code very often, so this is a surprisingly
important cleanup to do.
Many thanks to Sanjoy for showing me how to do this with SCEV.
Differential Revision: https://reviews.llvm.org/D35758
llvm-svn: 308968
Summary:
The remaining non range-based for loops do not iterate over full ranges,
so leave them as they are.
Reviewers: karthikthecool, blitz.opensource, mcrosier, mkuper, aemerson
Reviewed By: aemerson
Subscribers: aemerson, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35777
llvm-svn: 308872
Summary: Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.
Reviewers: danielcdh, tejohnson, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D35334
llvm-svn: 308760
Separated out the profitability from the safety analysis for multiexit
loop unrolling. Currently, this is an NFC because profitability is true
only if the unroll-runtime-multi-exit is set to true (off-by-default).
This is to ease adding the profitability heuristic up for review at
D35380.
llvm-svn: 308753
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
Previously we were (mis)handling jump table members with a prevailing
definition in a full LTO module and a non-prevailing definition in a
ThinLTO module by dropping type metadata on those functions entirely,
which would cause type tests involving such functions to fail.
This patch causes us to drop metadata only if we are about to replace
it with metadata from cfi.functions.
We also want to replace metadata for available_externally functions,
which can arise in the opposite scenario (prevailing ThinLTO
definition, non-prevailing full LTO definition). The simplest way
to handle that is to remove the definition; there's little value in
keeping it around at this point (i.e. after most optimization passes
have already run) and later code will try to use the function's linkage
to create an alias, which would result in invalid IR if the function
is available_externally.
Fixes PR33832.
Differential Revision: https://reviews.llvm.org/D35604
llvm-svn: 308642
Large CFGs can cause us to blow up the stack because we would have a
recursive step for each basic block in a region.
Instead, create a worklist and iterate it. This limits the stack usage
to something more manageable.
Differential Revision: https://reviews.llvm.org/D35609
llvm-svn: 308582
If OpValue is non-null, we only consider operations similar to OpValue
when intersecting.
Differential Revision: https://reviews.llvm.org/D35292
llvm-svn: 308428
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop unrolling and loop interleaving).
This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.
Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: efriedma
Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35411
llvm-svn: 308422
Generate a single test to decide if there are enough iterations to jump to the
vectorized loop, or else go to the scalar remainder loop. This test compares the
Scalar Trip Count: if STC < VF * UF go to the scalar loop. If
requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the
rest can be used to form vector iterations. So in this case the test checks
instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the
scalar loop if so. Otherwise the vector loop is entered for at-least one vector
iteration.
This test covers the case where incrementing the backedge-taken count will
overflow leading to an incorrect trip count of zero. In this (rare) case we will
also avoid the vector loop and jump to the scalar loop.
This patch simplifies the existing tests and effectively removes the basic-block
originally named "min.iters.checked", leaving the single test in block
"vector.ph".
Original observation and initial patch by Evgeny Stupachenko.
Differential Revision: https://reviews.llvm.org/D34150
llvm-svn: 308421
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
llvm-svn: 308417
Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost.
Reviewers: dberlin, davide, aprantl
Reviewed By: aprantl
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34639
llvm-svn: 308404
Summary:
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.
Submitted on behalf of @morehouse (Matt Morehouse)
Reviewers: eugenis, vitalybuka
Reviewed By: eugenis
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34789
llvm-svn: 308387
Coverage hooks that take less-than-64-bit-integers as parameters need the
zeroext parameter attribute (http://llvm.org/docs/LangRef.html#paramattrs)
to make sure they are properly extended by the x86_64 ABI.
llvm-svn: 308296
In some particular cases eq/ne conditions can be turned into equivalent
slt/sgt conditions. This patch teaches parseLoopStructure to handle some
of these cases.
Differential Revision: https://reviews.llvm.org/D35010
llvm-svn: 308264
Rename the enum value from X86_64_Win64 to plain Win64.
The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
Differential Revision: https://reviews.llvm.org/D34474
llvm-svn: 308208
This reverts commit r308114 (and follow on fixes to test).
There is a linking failure in a ThinLTO bot:
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/
(and undefined reference). It seems like it must be a second order
effect of the heuristic change I made, and may take some time to try
to reproduce locally and track down. Therefore, reverting for now.
llvm-svn: 308206
This restores r308078/r308079 with a fix for bot non-determinisim (make
sure we run llvm-lto in single threaded mode so the debug output doesn't get
interleaved).
llvm-svn: 308114
Summary:
If one side simplifies to the identity value for inner opcode, we can replace the value with just the operation that can't be simplified.
I've removed a couple now unneeded special cases in visitAnd and visitOr. There are probably other cases I missed.
Reviewers: spatel, majnemer, hfinkel, dberlin
Reviewed By: spatel
Subscribers: grandinj, llvm-commits, spatel
Differential Revision: https://reviews.llvm.org/D35451
llvm-svn: 308111
that appears to exhibit non-determinism and is flaking on the bots
pretty consistently.
r308078: [ThinLTO] Ensure we always select the same function copy to import
r308079: Require asserts in new test that uses debug flag
llvm-svn: 308095
Summary:
Check if the first eligible callee is under the instruction threshold.
Checking this on the first eligible callee ensures that we don't end
up selecting different callees to import when we invoke this routine
with different thresholds due to reaching the callee via paths that
are shallower or hotter (when there are multiple copies, i.e. with
weak or linkonce linkage). We don't want to leave the decision of which
copy to import up to the backend.
Reviewers: mehdi_amini
Subscribers: inglorion, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D35436
llvm-svn: 308078
Summary:
When checking for memory dependencies between calls using MemorySSA,
handle cases where the calls have no MemoryAccess associated with them
because the AA analysis being used has determined that the call does not
read/write memory.
Fixes PR33756
Reviewers: dberlin, davide
Subscribers: mcrosier, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D35317
llvm-svn: 308051
Add the following pattern to TryToUnfoldSelectInCurrBB()
bb:
%p = phi [0, %bb1], [1, %bb2], [0, %bb3], [1, %bb4], ...
%c = cmp %p, 0
%s = select %c, trueval, falseval
The Select in the above pattern will be unfolded and then jump-threaded. The
current implementation does not allow CMP in the middle of PHI and Select.
Differential Revision: https://reviews.llvm.org/D34762
llvm-svn: 308050
Summary:
DominatorTreeBase used to have IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.
This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.
Reviewers: dberlin, sanjoy, davide, grosser
Reviewed By: dberlin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35315
llvm-svn: 308040
When iterating through loop
for (int i = INT_MAX; i > 0; i--)
We fail to generate the pre-loop for it. It happens because we use the
overflown value in a comparison predicate when identifying whether or not
we need it.
In old logic, we used SLE predicate against Greatest value which exceeds all
seen values of the IV and might be overflown. Now we use the GreatestSeen
value of this IV with SLT predicate.
Also added a test that ensures that a pre-loop is generated for such loops.
Differential Revision: https://reviews.llvm.org/D35347
llvm-svn: 308001
Summary:
When we runtime unroll with multiple exit blocks, we also need to update the
immediate dominators of the immediate successors of the exit blocks.
Reviewers: reames, mkuper, mzolotukhin, apilipenko
Reviewed by: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35304
llvm-svn: 307909
This is an incremental change to the promotion feature.
There are two problems with the current behavior:
1) loops with multiple exiting blocks are totally disabled
2) a counter update can only be promoted one level up in
the loop nest -- which does help much for short trip
count inner loops inside a high trip-count outer loops.
Due to this limitation, we still saw very large profile
count fluctuations from run to run for the affected loops
which are usually very hot.
This patch adds the support for promotion counters iteratively
across the loop nest. It also turns on the promotion for
loops with multiple exiting blocks (with a limit).
For single-threaded applications, the performance impact is flat
on average. For instance, dealII improves, but povray regresses.
llvm-svn: 307863
Refactored the code and separated out a function
`canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it
easier to add profitability heuristics later.
Added tests to runtime unrolling to make sure that unrolling for
multi-exit loops is not done unless the option
-unroll-runtime-multi-exit is true.
llvm-svn: 307843
Where is is needed (at the end of headers that define it), be
consistent about its use.
Also fix a few header guards that I found in the process.
Differential Revision: https://reviews.llvm.org/D34916
llvm-svn: 307840
Summary:
LoopRotate manually updates the DoomTree by iterating over all predecessors of a basic block and computing the Nearest Common Dominator.
When a predecessor happens to be unreachable, `DT.findNearestCommonDominator` returns nullptr.
This patch teaches LoopRotate to handle this case and fixes [[ https://bugs.llvm.org/show_bug.cgi?id=33701 | PR33701 ]].
In the future, LoopRotate should be taught to use the new incremental API for updating the DomTree.
Reviewers: dberlin, davide, uabelho, grosser
Subscribers: efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D35074
llvm-svn: 307828
This normally indicates mixed CFI + non-CFI compilation, and will
result in us treating the function in the same way as a function
defined outside of the LTO unit.
Part of PR33752.
Differential Revision: https://reviews.llvm.org/D35281
llvm-svn: 307744
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.
llvm-svn: 307729
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
This is fine as nothing in the code relies on leader and memory
leader being the same for a given congruency class. Ack'ed by
Dan.
Fixes PR33720.
llvm-svn: 307699
The loop structure for the outer loop does not contain the epilog
preheader when we try to unroll inner loop with multiple exits and
epilog code is generated. For now, we just bail out in such cases.
Added a test case that shows the problem. Without this bailout, we would
trip on assert saying LCSSA form is incorrect for outer loop.
llvm-svn: 307676
querying for analysis results on a function declaration rather than
a definition.
The only reason this worked previously is by chance -- because the way
we got alias analysis results with the legacy PM, we happened to not
compute a dominator tree and so we happened to not hit an assert even
though it didn't make any real sense. Now we bail out before trying to
compute alias analysis so that we don't hit these asserts.
llvm-svn: 307625
Summary:
As metioned in https://reviews.llvm.org/D34576, checkings in
`collectConstantCandidates` can be replaced by using
`llvm::canReplaceOperandWithVariable`.
The only special case is that `collectConstantCandidates` return false for
all `IntrinsicInst` but it is safe for us to collect constant candidates from
`IntrinsicInst`.
Reviewers: pirama, efriedma, srhines
Reviewed By: efriedma
Subscribers: llvm-commits, javed.absar
Differential Revision: https://reviews.llvm.org/D34921
llvm-svn: 307587
When unrolling under multiple exits which is under off-by-default option,
the assert that checks for VMap entry in loop exit values is too strong.
(assert if VMap entry did not exist, the value should be a
constant). However, values derived from
constants or from values outside loop, does not have a VMap entry too.
Removed the assert and added a testcase showcasing the property for
non-constant values.
llvm-svn: 307542
Summary:
This solves PR33641.
When removing a dead argument we must also handle possibly existing calls
to llvm.dbg.value that use the removed argument. Now we change the use
of the otherwise dead argument to an undef for some other pass to cleanup
later.
If the calls are left untouched, they will later on cause errors:
"function-local metadata used in wrong function"
since the ArgumentPromotion rewrites the code by creating a new function
with the wanted signature, but the metadata is not recreated so the new
function may then erroneously use metadata from the old function.
Reviewers: mstorsjo, rnk, arsenm
Reviewed By: rnk
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34874
llvm-svn: 307521