Prior to r348205, extracting code regions with live output values was
disabled because of a miscompilation (PR39433). Lift the restriction as
PR39433 has been addressed.
Tested on LNT+externals, on a run of check-llvm in a stage2 build, and
with a full build of iOS (with hot/cold splitting enabled).
As a drive-by, remove an errant TODO.
llvm-svn: 351492
Resuming exception unwinding is roughly as unlikely as throwing an
exception.
Tested on LNT+externals (in particular, the C++ EH regression tests
provide end-to-end test coverage), as well as with a full build of iOS.
llvm-svn: 351491
This gets rid of the brittle/mysterious calls to @sink()/@sideeffect()
peppered throughout the test cases. They are no longer needed to force
splitting to occur.
llvm-svn: 351480
If the sample profile has no inlining hierachy information included, we call
the sample profile is flattened. For flattened profile, in ThinLTO postlink
phase, SampleProfileLoader's hot function inlining and profile annotation will
do nothing, so it is better to save the effort to read in the profile and run
the sample profile loader pass. It is helpful for reducing compile time when
the flattened profile is huge.
Differential Revision: https://reviews.llvm.org/D54819
llvm-svn: 351476
Summary:
InstCombine's sinking algorithm only thinks about memory. It doesn't
think about non-memory constraints like stack object lifetime. It can
sink dynamic allocas across a stacksave call, which may be used with
stackrestore, which can incorrectly reduce the lifetime of the dynamic
alloca.
Fixes PR40365
Reviewers: hfinkel, efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D56872
llvm-svn: 351475
Summary:
If LTOUnit splitting is disabled, the module summary analysis computes
the summary information necessary to perform single implementation
devirtualization during the thin link with the index and no IR. The
information collected from the regular LTO IR in the current hybrid WPD
algorithm is summarized, including:
1) For vtable definitions, record the function pointers and their offset
within the vtable initializer (subsumes the information collected from
IR by tryFindVirtualCallTargets).
2) A record for each type metadata summarizing the vtable definitions
decorated with that metadata (subsumes the TypeIdentiferMap collected
from IR).
Also added are the necessary bitcode records, and the corresponding
assembly support.
The index-based WPD will be sent as a follow-on.
Depends on D53890.
Reviewers: pcc
Subscribers: mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54815
llvm-svn: 351453
During the transforms in LoopSimplifyCFG, when we remove a dead exiting edge, the
parent loop may stop being reachable from the child loop, and therefore they become
siblings. If the former child loop had uses of some values from its former parent loop,
now such uses will require LCSSA Phis, even if they weren't needed before. So we must
form LCSSA for all loops that stopped being ancestors of the current loop in this case.
Differential Revision: https://reviews.llvm.org/D56144
Reviewed By: fedor.sergeev
llvm-svn: 351434
Function `DeleteDeadBlock` requires that all predecessors of a block
being deleted have already been deleted, with the exception of a
single-block loop. When we use it for removal of dead subloops that
contain more than one block, we may not fulfull this requirement and
fail an assertion.
This patch replaces invocation of `DeleteDeadBlock` with a generalized
version `DeleteDeadBlocks` that is able to deal with multiple dead blocks,
even if they contain some cycles.
Differential Revision: https://reviews.llvm.org/D56121
Reviewed By: fedor.sergeev
llvm-svn: 351433
The function merging pass miscompiles identical vararg functions. The
forwarding thunk it emits doesn't forward the full variable-length list
of arguments. Disable merging for vararg functions for now.
I've filed llvm.org/PR40345 to track the issue.
rdar://47326238
llvm-svn: 351411
Essentially, do not treat `call` and `musttail call` as the same thing.
As a drive-by, fold CallInst and InvokeInst handling together using the
CallSite helper.
Differential Revision: https://reviews.llvm.org/D56815
llvm-svn: 351405
Currently we have pgo options defined in PassManagerBuilder.cpp only for
instrument pgo, but not for sample pgo. We also have pgo options defined
in NewPMDriver.cpp in opt only for new pass manager and for all kinds of
pgo. They have some inconsistency.
To make the options more consistent and make tests writing easier, the
patch let old pass manager to share the same pgo options with new pass
manager in opt, and removes the options in PassManagerBuilder.cpp.
Differential Revision: https://reviews.llvm.org/D56749
llvm-svn: 351392
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside of the loops,
when there are some vectorizable PHIs. Patch fixes this by checking if
the node is the reduction node and thus it must not be vectorized, it must
be gathered.
Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56783
llvm-svn: 351349
For the given test SROA detects possible replacement and creates a correct alloca. After that SROA is adding lifetime markers for this new alloca. The function getNewAllocaSlicePtr is trying to deduce the pointer type based on the original alloca, which is split, to use it later in lifetime intrinsic.
For the test we ended up with such code (rA is initial alloca [10 x float], which is split, and rA.sroa.0.0 is a new split allocation)
```
%rA.sroa.0.0.rA.sroa_cast = bitcast i32* %rA.sroa.0 to [10 x float]* <----- this one causing the assertion and is an extra bitcast
%5 = bitcast [10 x float]* %rA.sroa.0.0.rA.sroa_cast to i8*
call void @llvm.lifetime.start.p0i8(i64 4, i8* %5)
```
isAllocaPromotable code assumes that a user of alloca may go into lifetime marker through bitcast but it must be the only one bitcast to i8* type. In the test it's not a i8* type, return false and throw the assertion.
As we are creating a pointer, which will be used in lifetime markers only, the proposed fix is to create a bitcast to i8* immediately to avoid extra bitcast creation.
The test is a greatly simplified to just reproduce the assertion.
Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>
Reviewers: chandlerc, craig.topper
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D55934
llvm-svn: 351325
Summary: To avoid adding an extern function to the global ctors list, apply the changes of D56538 also to MSan.
Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan
Subscribers: hiraditya, bollu, llvm-commits
Differential Revision: https://reviews.llvm.org/D56734
llvm-svn: 351322
Summary:
Second iteration of D56433 which got reverted in rL350719. The problem
in the previous version was that we dropped the thunk calling the tsan init
function. The new version keeps the thunk which should appease dyld, but is not
actually OK wrt. the current semantics of function passes. Hence, add a
helper to insert the functions only on the first time. The helper
allows hooking into the insertion to be able to append them to the
global ctors list.
Reviewers: chandlerc, vitalybuka, fedor.sergeev, leonardchan
Subscribers: hiraditya, bollu, llvm-commits
Differential Revision: https://reviews.llvm.org/D56538
llvm-svn: 351314
Summary:
Check to make sure that the caller and the callee have compatible
function arguments before promoting arguments. This uses the same
TargetTransformInfo queries that are used to determine if attributes
are compatible for inlining.
The goal here is to avoid breaking ABI when a called function's ABI
depends on a target feature that is not enabled in the caller.
This is a very conservative fix for PR37358. Ideally we would have a more
sophisticated check for ABI compatiblity rather than checking if the
attributes are compatible for inlining.
Reviewers: echristo, chandlerc, eli.friedman, craig.topper
Reviewed By: echristo, chandlerc
Subscribers: nikic, xbolva00, rkruppe, alexcrichton, llvm-commits
Differential Revision: https://reviews.llvm.org/D53554
llvm-svn: 351296
InstCombine is able to transform mem transfer instrinsic to alone store or store/load pair.
It might result in generation of unaligned atomic load/store which later in backend
will be transformed to libcall. It is not an evident gain and it is better to keep intrinsic as is
and handle it at backend.
Reviewers: reames, anna, apilipenko, mkazantsev
Reviewed By: reames
Subscribers: t.p.northover, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D56582
llvm-svn: 351295
Summary:
InvokeInst should be treated like CallInst and
assigned a separate discriminator. This is particularly
import when an Invoke is converted to a Call
during compilation and so can invalidate sample profile
data collected wtih different link time optimizations
Reviewers: twoh, Kader, danielcdh, wmi
Reviewed By: wmi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56491
llvm-svn: 351251
Summary:
Comdat groups override weak symbol behavior, allowing the linker to keep
the comdats for weak symbols in favor of comdats for strong symbols.
Fixes the issue described in:
https://bugs.chromium.org/p/chromium/issues/detail?id=918662
Reviewers: eugenis, pcc, rnk
Reviewed By: pcc, rnk
Subscribers: smeenai, rnk, bd1976llvm, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D56516
llvm-svn: 351247
Increment statistics counter NumSwitches at unswitchNontrivialInvariants() for
unswitching a non-trivial switch instruction. This is to fix a bug that it
increments NumBranches even for the case of switch instruction.
There is no functional change in this patch.
Differential Revision: https://reviews.llvm.org/D56408
llvm-svn: 351193
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.
And we can also get the mask from and i1, or i1, xor i1.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52060
llvm-svn: 351150
Summary:
Use appendToUsed instead of include to ensure that
SanitizerCoverage's constructors are not stripped.
Also, use isOSBinFormatCOFF() to determine if target
binary format is COFF.
Reviewers: pcc
Reviewed By: pcc
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D56369
llvm-svn: 351118
Summary: Like branch instructions, phi nodes frequently do not have debug information related to the block they are in and so they should be ignored.
Reviewers: danielcdh, twoh, Kader, wmi
Reviewed By: wmi
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D55094
llvm-svn: 351102
This reverts commit a9788dd6587d67c856df74eedff5a6ad34ce8320, reversing
changes made to f1309ffebf718d16aec4fab83380556c660e2825.
unintended merge pushed
llvm-svn: 351095
Following PR39807, the way in which SimplifyCFG hoists common code on
branch paths was fixed in r347782. However this left extra code hanging
around HoistThenElseCodeToIf that wasn't necessary and needlessly
complicated matters -- we no longer need to look up through the 'if'
basic block to find a location for hoisted 'select' insts, we can instead
use the location chosen by applyMergedLocation.
This patch deletes that extra logic, and updates a regression test to
reflect the new logic (selects get the merged location, not a previous
insts location).
Differential Revision: https://reviews.llvm.org/D55272
llvm-svn: 351058
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.
This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.
This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).
There's an additional fix now to avoid a dmask=0
For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.
Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.
The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:
%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1
This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.
Differential revision: https://reviews.llvm.org/D48826
Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
Work around for ppcle compiler bug
Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
Utility function `DeleteDeadBlock` expects that all predecessors of a block being
deleted are already deleted, with the exception of single-block loop. It makes it
hard to use for deletion of a set of blocks that may contain cyclic dependencies.
The is no correct order of invocations of this function that does not produce
dangling pointers on already deleted blocks.
This patch introduces a generalized version of this function `DeleteDeadBlocks`
that allows us to remove multiple blocks at once, even if there are cycles among
them. The only requirement is that no block being deleted should have a predecessor
that is not being deleted.
The logic of `DeleteDeadBlocks` is following:
for each block
create relevant DT updates;
remove all instructions (replace with undef if needed);
replace terminator with unreacheable;
apply DT updates;
for each block
delete block;
Therefore, `DeleteDeadBlock` becomes a particular case of
the general algorithm called for a single block.
Differential Revision: https://reviews.llvm.org/D56120
Reviewed By: skatkov
llvm-svn: 351045
Summary:
Records in the module summary index whether the bitcode was compiled
with the option necessary to enable splitting the LTO unit
(e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit).
The information is passed down to the ModuleSummaryIndex builder via a
new module flag "EnableSplitLTOUnit", which is propagated onto a flag
on the summary index.
This is then used during the LTO link to check whether all linked
summaries were built with the same value of this flag. If not, an error
is issued when we detect a situation requiring whole program visibility
of the class hierarchy. This is the case when both of the following
conditions are met:
1) We are performing LowerTypeTests or Whole Program Devirtualization.
2) There are type tests or type checked loads in the code.
Note I have also changed the ThinLTOBitcodeWriter to also gate the
module splitting on the value of this flag.
Reviewers: pcc
Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D53890
llvm-svn: 350948
MergeFunc only deletes unused duplicate functions if they have local
linkage, but it should be safe to relax this to any "discardable if
unused" linkage type.
Differential Revision: https://reviews.llvm.org/D56574
llvm-svn: 350939
Currently when a select has a constant value in one branch and the select feeds
a conditional branch (via a compare/ phi and compare) we unfold the select
statement. This results in threading the conditional branch later on. Similar
opportunity exists when a select (with a constant in one branch) feeds a
switch (via a phi node). The patch unfolds select under this condition.
A testcase is provided.
llvm-svn: 350931
Summary:
The original patch addressed the use of BlockRPONumber by forcing a sequence point when accessing that map in a conditional. In short we found cases where that map was being accessed with blocks that had not yet been added to that structure. For context, I've kept the wall of text below, to what we are trying to fix, by always ensuring a updated BlockRPONumber.
== Backstory ==
I was investigating an ICE (segfault accessing a DenseMap item). This failure happened non-deterministically, with no apparent reason and only on a Windows build of LLVM (from October 2018).
After looking into the crashes (multiple core files) and running DynamoRio, the cores and DynamoRio (DR) log pointed to the same code in `GVN::performScalarPRE()`. The values in the map are unsigned integers, the keys are `llvm::BasicBlock*`. Our test case that triggered this warning and periodic crash is rather involved. But the problematic line looks to be:
GVN.cpp: Line 2197
```
if (BlockRPONumber[P] >= BlockRPONumber[CurrentBlock] &&
```
To test things out, I cooked up a patch that accessed the items in the map outside of the condition, by forcing a sequence point between accesses. DynamoRio stopped warning of the issue, and the test didn't seem to crash after 1000+ runs.
My investigation was on an older version of LLVM, (source from October this year). What it looks like was occurring is the following, and the assembly from the latest pull of llvm in December seems to confirm this might still be an issue; however, I have not witnessed the crash on more recent builds. Of course the asm in question is generated from the host compiler on that Windows box (not clang), but it hints that we might want to consider how we access the BlockRPONumber map in this conditional (line 2197, listed above). In any case, I don't think the host compiler is wrong, rather I think it is pointing out a possibly latent bug in llvm.
1) There is no sequence point for the `>=` operation.
2) A call to a `DenseMapBase::operator[]` can have the side effect of the map reallocating a larger store (more Buckets, via a call to `DenseMap::grow`).
3) It seems perfectly legal for a host compiler to generate assembly that stores the result of a call to `operator[]` on the stack (that's what my host compile of GVN.cpp is doing) . A second call to `operator[]` //might// encourage the map to 'grow' thus making any pointers to the map's store invalid. The `>=` compares the first and second values. If the first happens to be a pointer produced from operator[], it could be invalid when dereferenced at the time of comparison.
The assembly generated from the Window's host compiler does show the result of the first access to the map via `operator[]` produces a pointer to an unsigned int. And that pointer is being stored on the stack. If a second call to the map (which does occur) causes the map to grow, that address (on the stack) is now invalid.
Reviewers: t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D55974
llvm-svn: 350880
Summary:
Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.
Enable flag in LICM sink/hoist tests to test correctness of this change. Moved one test which
relied on promotion, in order to test all sinking tests.
Reviewers: sanjoy, davide, gberry, george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D40375
llvm-svn: 350879
That is, remove many of the calls to Type::getNumContainedTypes(),
Type::subtypes(), and Type::getContainedType(N).
I'm not intending to remove these accessors -- they are
useful/necessary in some cases. However, removing the pointee type
from pointers would potentially break some uses, and reducing the
number of calls makes it easier to audit.
llvm-svn: 350835
The C standard says "The memchr function locates the first
occurrence of c (converted to an unsigned char)[...]". The expansion
was missing the conversion to unsigned char.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39041 .
Differential Revision: https://reviews.llvm.org/D55947
llvm-svn: 350775
Summary:
Instead of using two separate callbacks to return the entry count and the
relative block frequency, use a single callback to return callsite
count. This would allow better supporting hybrid mode in the future as
the count of callsite need not always be derived from entry count (as in
sample PGO).
Reviewers: davidxl
Subscribers: mehdi_amini, steven_wu, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D56464
llvm-svn: 350755
Current strategy of dropping `InstructionPrecedenceTracking` cache is to
invalidate the entire basic block whenever we change its contents. In fact,
`InstructionPrecedenceTracking` has 2 internal strictures: `OrderedInstructions`
that is needed to be invalidated whenever the contents changes, and the map
with first special instructions in block. This second map does not need an
update if we add/remove a non-special instuction because it cannot
affect the contents of this map.
This patch changes API of `InstructionPrecedenceTracking` so that it now
accounts for reasons under which we invalidate blocks. This should lead
to much less recalculations of the map and should save us some compile time
because in practice we don't typically add/remove special instructions.
Differential Revision: https://reviews.llvm.org/D54462
Reviewed By: efriedma
llvm-svn: 350694
This is matching the equivalent of the DAG expansion,
so it should never end up with worse perf than the
original code even if the target doesn't have a rotate
instruction.
llvm-svn: 350672
A straightforward port of tsan to the new PM, following the same path
as D55647.
Differential Revision: https://reviews.llvm.org/D56433
llvm-svn: 350647
Summary:
This fixes the IDom for exit blocks and all blocks reachable from the exit blocks, when runtime unrolling under multiexit/exiting case.
We initially had a restrictive check that the IDom is only updated when
it is the header of the loop.
However, we also need to update the IDom to the correct one when the
IDom is any block within the original loop. See added test cases (which
fail dom tree verification without the patch).
Reviewers: reames, mzolotukhin, mkazantsev, hfinkel
Reviewed by: brzycki, kuhar
Subscribers: zzheng, dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D56284
llvm-svn: 350640
update client code.
Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.
Differential Revision: https://reviews.llvm.org/D56183
llvm-svn: 350508
minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
Thanks for the review from Saleem!
Differential Revision: https://reviews.llvm.org/D55641
llvm-svn: 350503
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
Differential Revision: https://reviews.llvm.org/D55786
llvm-svn: 350463
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.
llvm-svn: 350456
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.
I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.
Differential Revision: https://reviews.llvm.org/D56247
llvm-svn: 350435
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.
The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.
This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.
Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.
Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.
Differential Revision: https://reviews.llvm.org/D56038
llvm-svn: 350429
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):
```
entry:
+------------+
| x = alloca |
| y = alloca |
+------------+
/ \
lhs: rhs:
+-------------------+ +-------------------+
| lifetime_start(x) | | lifetime_start(x) |
| use(x) | | lifetime_start(y) |
| lifetime_end(x) | | use(x, y) |
| lifetime_start(y) | | lifetime_end(y) |
| use(y) | | lifetime_end(x) |
| lifetime_end(y) | +-------------------+
+-------------------+
```
Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.
This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.
Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.
Fixes llvm.org/PR39671 (rdar://45939472).
Differential Revision: https://reviews.llvm.org/D55967
llvm-svn: 350420
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.
llvm-svn: 350419
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.
Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.
Differential Revision: https://reviews.llvm.org/D55266
llvm-svn: 350408
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info everytime to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.
llvm-svn: 350334
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.
Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
library functions, for which it inserts declarations as necessary.
- Update tests.
Caveats:
- There is one test that I dropped, because it specifically tested the
definition of the ctor.
Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka
Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji
Differential Revision: https://reviews.llvm.org/D55647
llvm-svn: 350305
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.
Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.
This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.
rdar://problem/47005143
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D56235
llvm-svn: 350284
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.
Reviewers: tejohnson, grimar
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56117
llvm-svn: 350269
The caller to EraseInstruction had this conditional:
// ARC calls with null are no-ops. Delete them.
if (IsNullOrUndef(Arg))
but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.
This adds support for both of these cases.
rdar://problem/47003805
llvm-svn: 350261
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.
Differential Revision: https://reviews.llvm.org/D56185
llvm-svn: 350257
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
llvm-svn: 350199
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.
llvm-svn: 350173
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.
llvm-svn: 350116
Summary:
Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom)
This commit:
1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left.
2. Prepare for BitScan idiom recognition.
Patch by Yuanfang Chen (tabloid.adroit)
Reviewers: craig.topper, evstupac
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55876
llvm-svn: 350074
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.
Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev
llvm-svn: 350049
Both of these places reference memset-like loops. Memset is precise.
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.
llvm-svn: 350044
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.
srcSize is derived from the size of an alloca, and we quit out if the
size of that is > the size of the thing we're copying to. Hence, we
should always copy everything over, so these sizes are precise.
Don't make srcSize itself a LocationSize, since optionality isn't
helpful, and we do some comparisons against other sizes elsewhere in
that function.
llvm-svn: 350019
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)
The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.
Reviewers: davidxl, danielcdh, wmi, dblaikie
Reviewed By: dblaikie
Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D55681
llvm-svn: 349973
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
Currently, runtime unrolling does not support loops where multiple
exiting blocks exit to the latchExit. Added TODO and other code
clarifications for ConnectProlog code.
llvm-svn: 349944
This verification is linear in the size of the function, so it can cause
a quadratic compile-time explosion in a function with many loops to
unroll.
Differential Revision: https://reviews.llvm.org/D54732
llvm-svn: 349871
Summary:
This function is very similar to add_llvm_library(), so this patch merges it
into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...)
with add_llvm_library(lib MODULE ...)
Reviewers: philip.pfaffe, beanz, chandlerc
Reviewed By: philip.pfaffe
Subscribers: chapuni, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D51748
llvm-svn: 349839