update client code.
Also rename it to use the more generic term `call` instead of something
that could be confused with a praticular type.
Differential Revision: https://reviews.llvm.org/D56183
llvm-svn: 350508
minted `CallBase` class instead of the `CallSite` wrapper.
This moves the largest interwoven collection of APIs that traffic in
`CallSite`s. While a handful of these could have been migrated with
a minorly more shallow migration by converting from a `CallSite` to
a `CallBase`, it hardly seemed worth it. Most of the APIs needed to
migrate together because of the complex interplay of AA APIs and the
fact that converting from a `CallBase` to a `CallSite` isn't free in its
current implementation.
Out of tree users of these APIs can fairly reliably migrate with some
combination of `.getInstruction()` on the `CallSite` instance and
casting the resulting pointer. The most generic form will look like `CS`
-> `cast_or_null<CallBase>(CS.getInstruction())` but in most cases there
is a more elegant migration. Hopefully, this migrates enough APIs for
users to fully move from `CallSite` to the base class. All of the
in-tree users were easily migrated in that fashion.
Thanks for the review from Saleem!
Differential Revision: https://reviews.llvm.org/D55641
llvm-svn: 350503
The cttz/ctlz intrinsics have a parameter specifying whether the
result is undefined for zero. cttz(x, false) can be relaxed to
cttz(x, true) if x is known non-zero, and in fact such an optimization
is already performed. However, this currently doesn't work if x is
non-zero as a result of a select rather than an explicit branch.
This patch adds handling for this case, thus allowing
x != 0 ? cttz(x, false) : y to simplify to x != 0 ? cttz(x, true) : y.
Differential Revision: https://reviews.llvm.org/D55786
llvm-svn: 350463
This has some minor optimizations to shouldBeDeferred. This is not
strictly NFC because the early exit inside the loop assumes
TotalSecondaryCost is monotonically non-decreasing, which is not true if
the threshold used by CostAnalyzer is negative. AFAICT the thresholds do
not go below 0 for the default values of the various options we use.
llvm-svn: 350456
In addition to finding dead uses of instructions, also find dead uses
of function arguments, and replace them with zero as well.
I'm changing the way the known bits are computed here to remove the
coupling between the transfer function and the algorithm. It previously
relied on the first op being visited first and computing known bits --
unless the first op is not an instruction, in which case they're computed
on the second op. I could have adjusted this to check for "instruction
or argument", but I think it's better to avoid the repeated calculation
with an explicit flag.
Differential Revision: https://reviews.llvm.org/D56247
llvm-svn: 350435
The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.
The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer its prologue checks to see whether the value in the
sanitizer TLS slot is null, and if so it calls __hwasan_thread_enter
and reloads from the TLS slot. The runtime does the same thing if it
needs to access this data structure.
This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is set to interceptor since it's assumed that it will
be more common that users will be compiling application code than
platform code.
Because we can no longer assume that the TLS slot is initialized,
the pthread_create interceptor is no longer necessary, so it has
been removed.
Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.
Differential Revision: https://reviews.llvm.org/D56038
llvm-svn: 350429
At -O0, globalopt is not run during the compile step, and we can have a
chain of an alias having an immediate aliasee of another alias. The
summaries are constructed assuming aliases in a canonical form
(flattened chains), and as a result only the base object but no
intermediate aliases were preserved.
Fix by adding a pass that canonicalize aliases, which ensures each
alias is a direct alias of the base object.
Reviewers: pcc, davidxl
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D54507
llvm-svn: 350423
Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):
```
entry:
+------------+
| x = alloca |
| y = alloca |
+------------+
/ \
lhs: rhs:
+-------------------+ +-------------------+
| lifetime_start(x) | | lifetime_start(x) |
| use(x) | | lifetime_start(y) |
| lifetime_end(x) | | use(x, y) |
| lifetime_start(y) | | lifetime_end(y) |
| use(y) | | lifetime_end(x) |
| lifetime_end(y) | +-------------------+
+-------------------+
```
Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.
This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.
Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.
Fixes llvm.org/PR39671 (rdar://45939472).
Differential Revision: https://reviews.llvm.org/D55967
llvm-svn: 350420
Similar to rL350199 - there are no known analysis/codegen holes for
funnel shift intrinsics now, so we can canonicalize the 6+ regular
instructions to funnel shift to improve vectorization, inlining,
unrolling, etc.
llvm-svn: 350419
In some cases the order that we hoist instructions in means that when rehoisting
(which uses the same order as hoisting) we can rehoist to a block A, then a
block B, then block A again. This currently causes an assertion failure as it
expects that when changing the hoist point it only ever moves to a block that
dominates the hoist point being moved from.
Fix this by moving the re-hoist point when it doesn't dominate the dominator of
hoisted instruction, or in other words when it wouldn't dominate the uses of
the instruction being rehoisted.
Differential Revision: https://reviews.llvm.org/D55266
llvm-svn: 350408
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info everytime to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.
llvm-svn: 350334
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.
Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
library functions, for which it inserts declarations as necessary.
- Update tests.
Caveats:
- There is one test that I dropped, because it specifically tested the
definition of the ctor.
Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka
Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji
Differential Revision: https://reviews.llvm.org/D55647
llvm-svn: 350305
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.
Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.
This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.
rdar://problem/47005143
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D56235
llvm-svn: 350284
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.
Reviewers: tejohnson, grimar
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56117
llvm-svn: 350269
The caller to EraseInstruction had this conditional:
// ARC calls with null are no-ops. Delete them.
if (IsNullOrUndef(Arg))
but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.
This adds support for both of these cases.
rdar://problem/47003805
llvm-svn: 350261
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.
Differential Revision: https://reviews.llvm.org/D56185
llvm-svn: 350257
The final piece of IR-level analysis to allow this was committed with:
rL350188
Using the intrinsics should improve transforms based on cost models
like vectorization and inlining.
The backend should be prepared too, so we can now canonicalize more
sequences of shift/logic to the intrinsics and know that the end
result should be equal or better to the original code even if the
target does not have an actual rotate instruction.
llvm-svn: 350199
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
MSan used to report false positives in the case the argument of
llvm.is.constant intrinsic was uninitialized.
In fact checking this argument is unnecessary, as the intrinsic is only
used at compile time, and its value doesn't depend on the value of the
argument.
llvm-svn: 350173
Deletion of dead blocks in arbitrary order may lead to failure
of assertion in `DeleteDeadBlock` that requires that we have
deleted all predecessors before we can delete the current block.
We should instead delete them in RPO order.
llvm-svn: 350116
Summary:
Existing LIR recognizes CTLZ where shifting input variable right until it is zero. (Shift-Until-Zero idiom)
This commit:
1. Augments Shift-Until-Zero idiom to recognize CTTZ where input variable is shifted left.
2. Prepare for BitScan idiom recognition.
Patch by Yuanfang Chen (tabloid.adroit)
Reviewers: craig.topper, evstupac
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D55876
llvm-svn: 350074
This patch teaches LoopSimplifyCFG to remove dead exiting edges
from loops.
Differential Revision: https://reviews.llvm.org/D54025
Reviewed By: fedor.sergeev
llvm-svn: 350049
Both of these places reference memset-like loops. Memset is precise.
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.
llvm-svn: 350044
Trying to keep these patches super small so they're easily post-commit
verifiable, as requested in D44748.
srcSize is derived from the size of an alloca, and we quit out if the
size of that is > the size of the thing we're copying to. Hence, we
should always copy everything over, so these sizes are precise.
Don't make srcSize itself a LocationSize, since optionality isn't
helpful, and we do some comparisons against other sizes elsewhere in
that function.
llvm-svn: 350019
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)
The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.
Reviewers: davidxl, danielcdh, wmi, dblaikie
Reviewed By: dblaikie
Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D55681
llvm-svn: 349973
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
Currently, runtime unrolling does not support loops where multiple
exiting blocks exit to the latchExit. Added TODO and other code
clarifications for ConnectProlog code.
llvm-svn: 349944
This verification is linear in the size of the function, so it can cause
a quadratic compile-time explosion in a function with many loops to
unroll.
Differential Revision: https://reviews.llvm.org/D54732
llvm-svn: 349871
Summary:
This function is very similar to add_llvm_library(), so this patch merges it
into add_llvm_library() and replaces all calls to add_llvm_loadable_module(lib ...)
with add_llvm_library(lib MODULE ...)
Reviewers: philip.pfaffe, beanz, chandlerc
Reviewed By: philip.pfaffe
Subscribers: chapuni, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D51748
llvm-svn: 349839
LLVM treats void* pointers passed to assembly routines as pointers to
sized types.
We used to emit calls to __msan_instrument_asm_load() for every such
void*, which sometimes led to false positives.
A less error-prone (and truly "conservative") approach is to unpoison
only assembly output arguments.
llvm-svn: 349734
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.
This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).
This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.
Differential Revision: https://reviews.llvm.org/D52116
llvm-svn: 349725
Summary:
On non-Windows these are already removed by ShouldInstrumentGlobal.
On Window we will wait until we get actual issues with that.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55899
llvm-svn: 349707
Summary:
ICF prevented by removing unnamed_addr and local_unnamed_addr for all sanitized
globals.
Also in general unnamed_addr is not valid here as address now is important for
ODR violation detector and redzone poisoning.
Before the patch ICF on globals caused:
1. false ODR reports when we register global on the same address more than once
2. globals buffer overflow if we fold variables of smaller type inside of large
type. Then the smaller one will poison redzone which overlaps with the larger one.
Reviewers: eugenis, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D55857
llvm-svn: 349706