With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
%.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!
I will re-commit this if the bot does not recover.
llvm-svn: 250366
A PDB can be thought of as a very simple file system. It is
occasionally illuminating to see the contents of the underlying files.
Differential Revision: http://reviews.llvm.org/D13674
llvm-svn: 250356
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250345
Summary:
IDFCalculator used a DominatorTree instance for its calculations. Since the PostDominatorTree struct is not a subclass of DominatorTree, it wasn't possible to use PDT in IDFCalculator to compute post-dominance frontiers.
This patch makes IDFCalculator work with a DominatorTreeBase<BasicBlock> instead, which enables PDTs to be utilized.
Patch by Victor Campos (vhscampos@gmail.com)
Reviewers: dberlin
Subscribers: dberlin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13725
llvm-svn: 250320
This adds documentation for the binary profile encoding and moves the
documentation for the text encoding into the header file
SampleProfReader.h.
llvm-svn: 250309
Add a new command line switch, -gnu-hash-table, to print the content of that section.
Differential Revision: http://reviews.llvm.org/D13696
llvm-svn: 250291
Binary encoded profiles used to encode all function names inline at
every reference. This is clearly suboptimal in terms of space. This
patch fixes this by adding a name table to the header of the file.
llvm-svn: 250241
ArchiveMemberHeader, suggestion by Rafael Espíndola.
Also The clang-x86-win2008-selfhost bot still does not like the
malformed-machos 00000031.a test, so removing it for now. All
the other bots are fine with it however.
llvm-svn: 250222
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250204
In a later commit, `SplitBinaryAdd` will be used outside `IsConstDiff`,
so lift that out. And lift out `IsConstDiff` as
`computeConstantDifference` to keep things clean and to avoid playing
C++ access specifier games.
NFC.
llvm-svn: 250143
Continuing the work from last week to remove implicit ilist iterator
conversions. First related commit was probably r249767, with some more
motivation in r249925. This edition gets LLVMTransformUtils compiling
without the implicit conversions.
No functional change intended.
llvm-svn: 250142
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
llvm-svn: 250129
We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical).
This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.
After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode().
Differential Revision: http://reviews.llvm.org/D13665
llvm-svn: 250118
that caused aborts. This was because of the characters of the ‘Size’ field in
the archive header did not contain decimal characters.
rdar://22983603
llvm-svn: 250117
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250089
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.
For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.
Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.
The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.
The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.
This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.
No non-noise compile time impact was seen on a clang bootstrap build.
llvm-svn: 250032
The new implementation works at least as well as the old implementation
did.
Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.
llvm-svn: 249918
Remove implicit ilist iterator conversions from MachineBasicBlock.cpp.
I've also added an overload of `splice()` that takes a pointer, since
it's a natural API. This is similar to the overloads I added for
`remove()` and `erase()` in r249867.
llvm-svn: 249883
Be explicit about changes between pointers and iterators, as with other
recent commits. This transitively removes implicit ilist iterator
conversions from about 20 source files in CodeGen.
llvm-svn: 249869
Remove a few more implicit ilist iterator conversions, this time from
Analysis.cpp and BranchFolding.cpp.
I added a few overloads for `remove()` and `erase()`, which quite
naturally take pointers as well as iterators as parameters. This will
reduce the churn at least in the short term, but I don't really have a
problem with these existing for longer.
llvm-svn: 249867
This covers the common case of operations that cannot be sunk.
Operations that cannot be hoisted should already be handled properly via
the safe-to-speculate rules and mechanisms.
llvm-svn: 249865
With this patch we can now read and write inline stacks in sample
profiles to the binary encoded profiles.
In a subsequent patch, I will add a string table to the binary encoding.
Right now function names are emitted as strings every time we find them.
This is too bloated and will produce large files in applications with
lots of inlining.
llvm-svn: 249861
This reverts commit r249783, fully reinstating r249782. I've fixed the
bug in clang: it was a non-const iterator that dereferenced to const
(but had an implicit conversion to non-const).
llvm-svn: 249850
Summary:
Previously the relative address flag only affected PDB debug info. Now
both DIContext implementations always expect to be passed virtual
addresses. llvm-symbolizer is now responsible for adding ImageBase to
module offsets when --relative-offset is passed.
Reviewers: zturner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12883
llvm-svn: 249784
Apparently the iterators in `clang::CFGBlock` have an auto-conversion to
`CFGBlock *`, but the dereference operator gives `const CFGBlock &`.
Until I have a moment to fix that, revert the GenericDomTree chagnes
from r249782.
llvm-svn: 249783
Stop converting implicitly between iterators and pointers/references in
lib/IR. For convenience, I've added a `getIterator()` accessor to
`ilist_node` so that callers don't need to know how to spell the
iterator class (i.e., they can use `X.getIterator()` instead of
`Function::iterator(X)`).
I'll eventually disallow these implicit conversions entirely, but
there's a lot of code, so it doesn't make sense to do it all in one
patch. One library or so at a time.
Why? To root out cases of `getNextNode()` and `getPrevNode()` being
used in iterator logic. The design of `ilist` makes that invalid when
the current node could be at the back of the list, but it happens to
"work" right now because of a bug where those functions never return
`nullptr` if you're using a half-node sentinel. Before I can fix the
function, I have to remove uses of it that rely on it misbehaving.
(Maybe the function should just be deleted anyway? But I don't want
deleting it -- potentially a huge project -- to block fixing
ilist/iplist.)
llvm-svn: 249782
Summary:
These non-semantic changes will help make a later change adding
support for deopt operand bundles more streamlined.
Reviewers: reames, swaroop.sridhar
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13491
llvm-svn: 249779
This is to enable me to address review for D13491 -- `Flags` is a
bitfield of `StatepointFlags`, not an individual item out of the enum,
so it should be represented as an `uint32_t`.
llvm-svn: 249778
Summary:
This will be used in a later change to RewriteStatepointsForGC.
Reviewers: reames, swaroop.sridhar
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13490
llvm-svn: 249777
This fixes memory allocation problems by making the merge operation keep
the profile readers around until the merged profile has been emitted.
This is needed to prevent the inlined function names to disappear from
the function profiles. Since all the names are kept as references, once
the reader disappears, the names are also deallocated.
Additionally, XFAIL on big-endian architectures. The test case uses a
gcov file generated on a little-endian system.
llvm-svn: 249724
Like adds and subtracts, muls ripple only to the left so we can use
the same logic.
While we're here, add a print method to DemandedBits so it can be used
with -analyze, which we'll use in the testcase.
llvm-svn: 249686
The algorithm itself is still eager, but it doesn't get run until a
query function is called. This greatly reduces the compile-time impact
of requiring DemandedBits when at runtime it is not often used.
NFCI.
llvm-svn: 249685
This patch adds support for reading sample profiles with inline stacks.
Inline stacks in a profile are generated when the sampled binary has
samples in inlined functions.
For instance, if main() calls foo() and foo() calls bar(), and bar() is
inlined into foo() and foo() inlined into main(), the profile may look
something like:
main total:364084 head:0
[ ... ]
2.3: _Z3fool total:243786
1: 60149
1.2: 38568
1.4: 46511
1.7: _Z3bari total:98558
1.1: 52672
1.2: 45886
At line 2, discriminator 3, main() calls foo(). In turn, foo() calls
bar() at line 1, discriminator 7.
In the textual format, this stacking of inline calls is represented
with indentation.
With this change, LLVM can now read sample profile files generated by
the create_gcov tool from https://github.com/google/autofdo.
llvm-svn: 249644
In r224059, we started verifying after addPass, but missed doing so on
insertPass. There isn't a good reason for the discrepancy, and
skipping the verifier in these cases causes bugs.
This also exposes a verifier error that was introduced in r249087, but
the verifier doesn't run until after the register coalescer, when the
issue happens to have been resolved. I've skipped the verifier after
SIFixSGPRLiveRangesID to avoid the failures for now and will follow up
with Matt for a proper fix.
llvm-svn: 249643
Previously the CompileOnDemand layer always created single-function partitions.
In theory this new API allows for more interesting partitions, though this has
not been well tested yet.
llvm-svn: 249623
The __CxxFrameHandler3 tables for 32-bit are supposed to hold stack
offsets relative to EBP, not ESP. I blindly updated the win-catchpad.ll
test case, and immediately noticed that 32-bit catching stopped working.
While I'm at it, move the frame index to frame offset WinEH table logic
out of PEI. PEI shouldn't have to know about WinEHFuncInfo. I realized
we can calculate frame index offsets just fine from the table printer.
llvm-svn: 249618
Recycler just needs a singly-linked list, and it takes less (and
simpler) code to hand-roll one of those than to build up the equivalent
`iplist_traits`. In theory, this should speed things up a bit too, but
this is really just a drive-by cleanup so I haven't measured.
llvm-svn: 249615
Create `SymbolTableList`, a wrapper around `iplist` for lists that
automatically manage a symbol table. This commit reduces a ton of code
duplication between the six traits classes that were used previously.
As a drive by, reduce the number of template parameters from 2 to 1 by
using a SymbolTableListParentType metafunction (I originally had this as
a separate commit, but it touched most of the same lines so I squashed
them).
I'm in the process of trying to remove the UB in `createSentinel()` (see
the FIXMEs I added for `ilist_embedded_sentinel_traits` and
`ilist_half_embedded_sentinel_traits`). My eventual goal is to separate
the list logic into a base class layer that knows nothing about (and
isn't templated on) the downcasted nodes -- removing the need to invoke
UB -- but for now I'm just trying to get a handle on all the current use
cases (and cleaning things up as I see them).
Besides these six SymbolTable lists, there are two others that use the
addNode/removeNode/transferNodes() hooks: the `MachineInstruction` and
`MachineBasicBlock` lists. Ideally there'll be a way to factor these
hooks out of the low-level API entirely, but I'm not quite there yet.
llvm-svn: 249602
Summary:
This adds some more routines to `IRBuilder` around creating calls and
invokes to `gc.statepoint`. These will be used later.
Reviewers: reames, swaroop.sridhar
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13371
llvm-svn: 249596
There was an off-by-one bug in ip2state tables which manifested when one
call immediately preceded the try-range of the next. The return address
of the previous call would appear to be within the try range of the next
scope, resulting in extra destructors or catches running.
We also computed the wrong offset for catch parameter stack objects. The
offset should be from RSP, not from RBP.
llvm-svn: 249578
I'll be using the function in a similar combine for AArch64. The helper was
also improved to handle undef values.
Part of http://reviews.llvm.org/D13442
llvm-svn: 249572
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.
This should fix the exception handling issues in PR24792.
Differential Revision: http://reviews.llvm.org/D13132
llvm-svn: 249522
Since the `const` version of `getOperandBundle` returns a value of the
same type as the non-`const` version, the non-`const` version is
redundant.
llvm-svn: 249509
This allows modules containing aliases to be lazily jit'd. Previously these
failed with missing symbol errors because the aliases weren't cloned from the
original module.
llvm-svn: 249481
No classes are specializing the symbol table traits, so no need to look
through a typedef for class API. Make a few more functions private
since only SymbolTableListTraits should be using them.
llvm-svn: 249476
Nothing inherits from `MachineBasicBlock`, so this should have no real
functionality change. Just makes the code easier to understand.
llvm-svn: 249473
Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).
This commit changes `SimplifyIndVar` to require a `DominatorTree`. I
don't think this is a problem because `ScalarEvolution` requires it
anyway.
Fixes PR25051.
Depends on D13459.
Reviewers: atrick, hfinkel
Subscribers: joker.eph, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13460
llvm-svn: 249471
The only specializations of `getSymTab()` were identical to the default
defined in `SymbolTableListTraits::getSymTab()`. Remove the
specializations, and stop treating it like a configuration point. Just
to be sure no one else accesses this, make it private.
llvm-svn: 249469
Summary:
Assign one state number per handler/funclet, tracking parent state,
handler type, and catch type token.
State numbers are arranged such that ancestors have lower state numbers
than their descendants.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13450
llvm-svn: 249457
Summary:
- Add CoreCLR to if/else ladders and switches as appropriate.
- Rename isMSVCEHPersonality to isFuncletEHPersonality to better
reflect what it captures.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13449
llvm-svn: 249455
Given the work we are doing on ThinLTO, we will never need to support
module groups and working sets in GCC's implementation of LIPO. These
are currently dead code, and will continue to be so.
llvm-svn: 249351
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.
With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.
In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.
This patch is a joint work by Maxim Ostapenko andy myself.
llvm-svn: 249303
Summary:
The bitcode format is described in this document:
https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view
For more info on ThinLTO see:
https://sites.google.com/site/llvmthinlto
The first customer is ThinLTO, however the data structures are designed
and named more generally based on prior feedback. There are a few
comments regarding how certain interfaces are used by ThinLTO, and the
options added here to gold currently have ThinLTO-specific names as the
behavior they provoke is currently ThinLTO-specific.
This patch includes support for generating per-module function indexes,
the combined index file via the gold plugin, and several tests
(more are included with the associated clang patch D11908).
Reviewers: dexonsmith, davidxl, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13107
llvm-svn: 249270
Track which basic blocks belong to which funclets. Permit branch
folding to fire but only if it can prove that doing so will not cause
code in one funclet to be reused in another.
llvm-svn: 249257
The trailing backslashes in some ASCII art added in r248527 cause a
"error: multi-line comment [-Werror=comment]" when building with gcc
4.9.1 -Wall. Swallow (ASCII-)artistic integrity and use pipes instead.
llvm-svn: 249212
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.
http://reviews.llvm.org/D12992
llvm-svn: 249196
Summary:
This change teaches SCEV that to prove `A u< B` it is sufficient to
prove each of these facts individually:
- B >= 0
- A s< B
- A >= 0
In practice, SCEV sometimes finds it easier to prove these facts
individually than to prove `A u< B` as one atomic step.
Reviewers: reames, atrick, nlewycky, hfinkel
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13042
llvm-svn: 249168
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.
With this, some basic __try / __except examples work.
llvm-svn: 249078
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of. Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.
llvm-svn: 249052
v2: Add test (Matt).
Fix capitalization of isEOP (Matt).
Move pattern to class parameter (Matt).
Make the instruction available to Cayman (Matt).
Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.
Patch by: Zoltan Gilian
llvm-svn: 249042
Summary:
Without this patch, the memory manager would call `mprotect` on every memory
region it ever allocated whenever it wanted to finalize memory (i.e. not just
the ones it just allocated). This caused terrible performance problems for
long running memory managers. In one particular compile heavy julia benchmark,
we were spending 50% of time in `mprotect` if running under MCJIT.
Fix this by splitting allocated memory blocks into those on which memory
permissions have been set and those on which they haven't and only running
`mprotect` on the latter.
Reviewers: lhames
Subscribers: reames, llvm-commits
Differential Revision: http://reviews.llvm.org/D13156
llvm-svn: 248981
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.
Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call. Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.
The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.
llvm-svn: 248959
glibc's PowerPC /usr/include/asm/sigcontext.h, has this:
#ifdef __powerpc64__
#include <asm/elf.h>
#endif
and that contains defines of all of the relocation symbols, like this:
#define R_PPC_NONE 0
and if that file is included prior to including
include/llvm/Support/ELFRelocs/PowerPC*.def, which we cannot in general
prevent, the result will fail.
As it turns out, this happens when compiling
lld/unittests/DriverTests/GnuLdDriverTest.cpp under PPC64/Linux, because:
lld/include/lld/ReaderWriter/ELFLinkingContext.h includes
lld/unittests/DriverTests/DriverTest.h which includes
utils/unittest/googletest/include/gtest/gtest.h which includes
utils/unittest/googletest/include/gtest/internal/gtest-internal.h which includes
/usr/include/sys/wait.h which includes
/usr/include/signal.h which includes
/usr/include/bits/sigcontext.h which includes
/usr/include/asm/sigcontext.h which includes
/usr/include/asm/elf.h
the test could be fixed to include ReaderWriter/ELFLinkingContext.h before
including unittests/DriverTests/DriverTest.h, but dealing with this in the
*.def files is a more-general solution that localizes the fix to the headers
instead of requiring changes to an unbounded number of other source files (both
in-tree and external).
llvm-svn: 248957
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.
In particular, there were no tests for TrustZone being supported in these
architectures.
The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.
Differential Revision: http://reviews.llvm.org/D13236
llvm-svn: 248921
Summary:
As per Duncan's review for D12536, I extracted the sub-byte bit aligned
reading and writing code into lib/Support, and generalized it. Added calls from
BackpatchWord. Also added unittests.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13189
llvm-svn: 248897
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,
<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)
to
<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)
This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.
Differential Revision: http://reviews.llvm.org/D12985
llvm-svn: 248887
Add support to the indexed instrprof reader and writer for the format
that will be used for value profiling.
Patch by Betul Buyukkurt, with minor modifications.
llvm-svn: 248833
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.
In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.
When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.
One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)
Differential Revision: http://reviews.llvm.org/D12681
llvm-svn: 248832
Summary:
Funclets have been turned into functions by the time they hit the object
file. Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.
Differential Revision: http://reviews.llvm.org/D13261
llvm-svn: 248824
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.
Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have
%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)
the x86 back-end emits:
movaps 32(%ecx), %xmm2
movaps (%ecx), %xmm0
movaps 16(%ecx), %xmm1
movaps 48(%ecx), %xmm3
subl $20, %esp <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards
movaps %xmm3, (%esp) <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl $0, 16(%esp)
calll __bar
To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:
subl $32, %esp
movaps %xmm3, (%esp)
Differential Revision: http://reviews.llvm.org/D12337
llvm-svn: 248786
When llvm declarations have argument names, it's helpful to actually
print those names when debugging. Arguably, it'd be nice to print them
all the time, but that would mean the IR we output wouldn't round trip
through bitcode, which doesn't store the names.
Make the varous print() methods in AsmWriter optionally print "for
debug" and set that flag in the dump() methods. The only thing this
does differently for now is print the argument names in declarations.
llvm-svn: 248692
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.
Reviewers: andrew.w.kaylor, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13152
llvm-svn: 248677
llvm::format compiles down to snprintf which has no defined rounding for
floating point arguments, and MSVC has implemented it differently from
what the BSD libcs and glibc do. Try to emulate the glibc rounding
behavior to avoid changing tests.
While there simplify code a bit and move trivial methods inline.
llvm-svn: 248665
Summary:
This change teaches SCEV's `isImpliedCond` two new identities:
A u< B u< -C => (A + C) u< (B + C)
A s< B s< INT_MIN - C => (A + C) s< (B + C)
While these are useful on their own, they're really intended to support
D12950.
The original checkin, r248606 had to be backed out due to an issue with
a ObjCXX unit test. That issue is now fixed, so re-landing.
Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
Subscribers: aadg, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12948
llvm-svn: 248637
I realized that the live-out set computed for the return block is
missing the callee saved registers (the non-pristine ones to be exact).
This only affects the liveness computed for instructions inside the
function epilogue which currently none of the LivePhysRegs users in llvm
cares about, so this is just a drive-by fix without a testcase.
Differential Revision: http://reviews.llvm.org/D13180
llvm-svn: 248636
BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:
Pros:
1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.
Cons:
1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.
One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.
Differential revision: http://reviews.llvm.org/D12603
llvm-svn: 248633
Summary:
The default behavior is to omit the .section directive for .text, .data,
and sometimes .bss, but some targets may want to omit this directive for
other sections too.
The AMDGPU backend will uses this to emit a simplified syntax for section
switches. For example if the section directive is not omitted (current
behavior), section switches to .hsatext will be printed like this:
.section .hsatext,#alloc,#execinstr,#write
This is actually wrong, because .hsatext has some custom STT_* flags,
which MC doesn't know how to print or parse.
If the section directive is omitted (made possible by this commit),
section switches will be printed like this:
.hsatext
The motivation for this patch is to make it possible to emit sections
with custom STT_* flags without having to teach MC about all the target
specific STT_* flags.
Reviewers: rafael, grosbach
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12423
llvm-svn: 248618
Summary:
This new helper routine will be used in a subsequent change.
Reviewers: hfinkel
Subscribers: hfinkel, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12949
llvm-svn: 248607
Summary:
This change teaches SCEV's `isImpliedCond` two new identities:
A u< B u< -C => (A + C) u< (B + C)
A s< B s< INT_MIN - C => (A + C) s< (B + C)
While these are useful on their own, they're really intended to support
D12950.
Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
Subscribers: aadg, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12948
llvm-svn: 248606
Arguments to function calls marked "nocapture" can be marked as
non-escaping. However, nocapture is defined in terms of the lifetime
of the callee, and if the callee can directly or indirectly recurse to
the caller, the semantics of nocapture are invalid.
Therefore, we eagerly discover which SCC each function belongs to,
and later can check if callee and caller of a callsite belong to
the same SCC, in which case there could be recursion.
This means that we can't be so optimistic in
getModRefInfo(ImmutableCallsite) - previously we assumed all call
arguments never aliased with an escaping global. Now we need to check,
because a global could now be passed as an argument but still not
escape.
This also solves a related conformance problem: MemCpyOptimizer can
turn non-escaping stores of globals into calls to intrinsics like
llvm.memcpy/llvm/memset. This confuses GlobalsAA, which knows the
global can't escape and so returns NoModRef when queried, when
obviously a memcpy/memset call does indeed reference and modify its
arguments.
This fixes PR24800, PR24801, and PR24802.
llvm-svn: 248576
Summary:
This also adds the first set of tests for operand bundles.
The optimizer has not been audited to ensure that it does the right
thing with operand bundles.
Depends on D12456.
Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner
Subscribers: maksfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D12457
llvm-svn: 248551
The doesn't seem to be a difference and ELFOSABI_NONE seems to be far more
common:
* Linux doesn't care when loading and puts ELFOSABI_NONE on core dumps.
* Gold and bfd ld produce files with ELFOSABI_NONE.
* Gold and bfd ld seems to ignore EI_OSABI other than for freebsd.
* Gas puts ELFOSABI_NONE in most .o files.
llvm-svn: 248534
These are necessary for implementing mem_fence for
OpenCL 2.0.
The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.
llvm-svn: 248532
Summary:
This change teaches `CallInst`s and `InvokeInst`s to maintain a set of
operand bundles as part of its operands. `CallInst`s and `InvokeInst`s
with operand bundles co-allocate some space before their `Use` array to
hold meta information about which of its operands are part of an operand
bundle.
The strings corresponding to the bundle tags are interned into
`LLVMContextImpl::BundleTagCache`
This change does not include any parsing / bitcode support. That's the
next change.
Depends on D12455.
Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner
Subscribers: MatzeB, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12456
llvm-svn: 248527
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.
This patch changes the handling of +t2dsp to be in line with other
architecture extensions.
Following a revert of r248152 and new review comments, this patch also includes
renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its
possible external usage.
Differential Revision: http://reviews.llvm.org/D12937
llvm-svn: 248519
Allow a target to do something other than search for copies
that will avoid cross register bank copies.
Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.
I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid kinds
of subregister copies on some targets.
I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.
The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.
llvm-svn: 248478
Summary:
With this change, subclasses of `llvm::User` will be able to co-allocate
a variable number of bytes (called a "descriptor") with the `llvm::User`
instance. The co-allocated descriptor can later be accessed using
`llvm::User::getDescriptor`. This will be used in later changes to
implement operand bundles.
This change steals one bit from `NumUserOperands`, but given that it is
still 28 bits wide I don't think this will be a practical issue.
This change does not allow allocating hung off uses with descriptors.
This only for simplicity, not for any fundamental reason; and we can
easily add this functionality later if needed.
Reviewers: reames, chandlerc, dexonsmith, kmod, majnemer, pete, JosephTremoulet
Subscribers: pete, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12455
llvm-svn: 248453
...because that's what the cost model was intended to do.
As discussed in D12882, this fix has a temporary unintended consequence for
SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make
PR24818 right, and two wrongs make PR24343 act right even though it's really
still wrong.
I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this
cost model change and preserve the righteousness for the bug report cases.
https://llvm.org/bugs/show_bug.cgi?id=24818https://llvm.org/bugs/show_bug.cgi?id=24343
Differential Revision: http://reviews.llvm.org/D12882
llvm-svn: 248439
Note: I'm am not trying to describe what "should be"; I'm only describing what is true today.
This came out of my recent question to llvm-dev titled: When can the dominator tree not contain a node for a basic block?
Differential Revision: http://reviews.llvm.org/D13078
llvm-svn: 248417
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
This is a re-commit of a change in r248357 that was reverted in
r248358.
llvm-svn: 248405
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.
I've refactored the call sites I could find easily to use getZero /
getOne.
Reviewers: hfinkel, majnemer, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12947
llvm-svn: 248362
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
llvm-svn: 248357
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
llvm-svn: 248291
Because mod is always exact, this function should have never taken a rounding mode argument. The actual implementation still has issues, which I'll look at resolving in a subsequent patch.
llvm-svn: 248195
Based on conversations with Justin and a few others, these constructors
are really useful to have in the executable so that you can call them
from the debugger. After some measurements, these *particular* calls
aren't so problematic as to make them a good tradeoff for always inline.
Please let me know if there are other functions really needed for
debugging. The always inline attribute is a hack that we should only
really employ when it doesn't hurt.
llvm-svn: 248188
The definition of the DivergenceAnalysis pass was in a CPP
file and wasn't accessible to users of the analysis to get it
through "getAnalysis<>()".
This patch extracts the definition into a separate header that
can be used by users of the analysis to fetch the results.
Patch by Volkan Keles (vkeles@apple.com)
llvm-svn: 248186
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.
This patch changes the handling of +t2dsp to be in line with other
architecture extensions.
Following review comments, also updating the description of FeatureDSPThumb2
in ARM.td.
Differential Revision: http://reviews.llvm.org/D12937
llvm-svn: 248152
and assert when mask is too large to apply in the small case,
previously the extra words were silently ignored.
clang-format the entire function to match current code standards.
This is a rewrite of r247972 which was reverted in r247983 due to
warning and possible UB on 32-bits hosts.
llvm-svn: 247993
We shifted the MachineBasicBlocks to the end of the MachineFunction in
DFS order. This will not ensure that MachineBasicBlocks which fell
through to one another will remain contiguous. Instead, implement
a stable sort algorithm for iplist.
This partially reverts commit r214150.
llvm-svn: 247978
Extend mask value to 64 bits before taking its complement and assert when mask is
too large to apply in the small case (previously the extra words were silently ignored).
http://reviews.llvm.org/D11890
Patch by James Touton!
llvm-svn: 247972
Since aliases actually use and verify their explicit type already, no
further invalid testing is required here. The
invalid.test:ALIAS-TYPE-MISMATCH case catches errors due to emitting a
non-pointee type in the new format or a non-pointer type in the old
format.
llvm-svn: 247952
Windows EH funclets need to be contiguous. The FuncletLayout pass will
ensure that the funclets are together and begin with a funclet entry MBB.
Differential Revision: http://reviews.llvm.org/D12943
llvm-svn: 247937
This makes catchret look more like a branch, and less like a weird use
of BlockAddress. It also lets us get away from
llvm.x86.seh.restoreframe, which relies on the old parentfpoffset label
arithmetic.
llvm-svn: 247936
This reverts commit r247898 (which reverted r247894).
Patch fixed to address two issues exposed by buildbots:
- unused variable warning in NDEBUG mode
- std::initializer_list lifetime issue causing test failures
Original Summary:
Support for including the function bitcode indices in the Value Symbol
Table. This requires writing the VST after the function blocks, which in
turn requires a new VST forward declaration record encoding the offset of
the full VST (which is backpatched to contain the offset after the VST
is written).
This patch also enables the lazy function reader to use the new function
indices out of the VST. This support will be used by ThinLTO as well, which
will be in a follow on patch. Backwards compatibility with older bitcode
files is maintained.
A new test is also included.
The bitcode format (used for the lazy reader as well as the upcoming
ThinLTO patches) came out of discussions with Duncan and others and is
described here:
https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view
Reviewers: dexonsmith, davidxl, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12536
llvm-svn: 247927
getLandingPadSuccessor assumes that each invoke can have at most one EH
pad successor, but WinEH invokes can have more than one. Two out of
three callers of getLandingPadSuccessor don't use the returned
landingpad, so we can make them use this simple predicate instead.
Eventually we'll have to circle back and fix SplitKit.cpp so that
register allocation works. Baby steps.
llvm-svn: 247904
Temporarily revert to fix some buildbot issues. One is a minor issue
with a variable unused in NDEBUG mode. More concerning are some test
failures on win7 that I need to dig into.
This reverts commit 4e66a74543459832cfd571db42b4543580ae1d1d.
llvm-svn: 247898
Summary:
Support for including the function bitcode indices in the Value Symbol
Table. This requires writing the VST after the function blocks, which in
turn requires a new VST forward declaration record encoding the offset of
the full VST (which is backpatched to contain the offset after the VST
is written).
This patch also enables the lazy function reader to use the new function
indices out of the VST. This support will be used by ThinLTO as well, which
will be in a follow on patch. Backwards compatibility with older bitcode
files is maintained.
A new test is also included.
The bitcode format (used for the lazy reader as well as the upcoming
ThinLTO patches) came out of discussions with Duncan and others and is
described here:
https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view
Reviewers: dexonsmith, davidxl, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12536
llvm-svn: 247894
This adds enough machinery to support reading simple GCC AutoFDO
profiles. It now supports reading flat profiles (no function calls).
Subsequent patches will add support for:
- Inlined calls (in particular, the inline call stack is not traversed
to accumulate samples).
- Working sets and modules. These are used mostly for GCC's LIPO
optimizations, so they're not needed in LLVM atm. I'm not sure that
we will ever need them. For now, I've if0'd around the calls.
The patch also adds support in GCOV.h for gcov version V704 (generated
by GCC's profile conversion tool).
llvm-svn: 247874
from outside the BitstreamWriter.
Split out of patch D12536 (Function bitcode index in Value Symbol Table
and lazy reading support), which will use it to patch in the VST offset.
llvm-svn: 247847
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`). This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.
Later, we can also consider optimizing:
icmp slt signum(x), 0 --> icmp slt x, 0
icmp sle signum(x), 1 --> true
etc.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12703
llvm-svn: 247846
Clang now passes the adjectives as an argument to catchpad.
Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.
llvm-svn: 247844
Otherwise we'd try to emit the thunk that passes the LSDA to
__CxxFrameHandler3. We don't emit the LSDA if there were no landingpads,
so we'd end up with an assembler error when trying to write the COFF
object.
llvm-svn: 247820
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing,
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests:
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.
This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.
This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.
Differential Revision: http://reviews.llvm.org/D12095
llvm-svn: 247815
This is the mirror image of r242395.
When X86FrameLowering::emitEpilogue() looks for where to insert the %esp addition that
deallocates stack space used for local allocations, it assumes that any sequence of pop
instructions from function exit backwards consists purely of restoring callee-save registers.
This may be false, since from some point backward, the pops may be clean-up of stack space
allocated for arguments to a call.
Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D12688
llvm-svn: 247784
When building LLVM as a (potentially dynamic) library that can be linked against
by multiple compilers, the default triple is not really meaningful.
We allow to explicitely set it to an empty string when configuring LLVM.
In this case, said "target independent" tests in the test suite that are using
the default triple are disabled by matching the newly available feature
"default_triple".
Reviewers: probinson, echristo
Differential Revision: http://reviews.llvm.org/D12660
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247775
While packaging 3.7 for Fedora, the debug info splitting
process fell over this, so fix it upstream seems like a good plan.
This should be put in the 3.7 branch as well.
Noticed by Dave Airlie <airlied@redhat.com>
llvm-svn: 247757
The verifier currently runs three times in LTO: (1) after parsing, (2)
at the beginning of the optimization pipeline, and (3) at the end of it.
The first run is important, since we're not sure where the bitcode comes
from and it's nice to validate it, but in release builds the extra runs
aren't appropriate.
This commit:
- Allows these runs to be disabled in LTOCodeGenerator.
- Adds command-line options to llvm-lto.
- Adds command-line options to libLTO.dylib, and disables the verifier
by default in release builds (based on NDEBUG).
This shaves about 3.5% off the runtime of ld64 when linking
verify-uselistorder with -flto -g.
rdar://22509081
llvm-svn: 247729
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247692
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).
For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.
This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.
This commit also contains a trivial patch to clang to account for the C++ API
change.
Reviewers: rengolin
Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10969
llvm-svn: 247683
This is needed by all GlobalObjects (GlobalAlias, Function,
GlobalVariable), see the GlobalObject::getValueType which is used in
many places. If at some point that can be removed, then we can remove
this member.
llvm-svn: 247621
Targets that have non-traditional jump table mechanisms may need to do
something substantially different for jump tables than what
AsmPrinter::EmitJumpTableInfo does. This patch makes that function
virtual so that targets can override it.
Differential Revision: http://reviews.llvm.org/D12786
llvm-svn: 247604
This was a flawed change - it just caused the getElementType call to be
deferred until later, when we really need to remove it. Now that the IR
for GlobalAliases has been updated, the root cause is addressed that way
instead and this change is no longer needed (and in fact gets in the way
- because we want to pass the pointee type directly down further).
Follow up patches to push this through GlobalValue, bitcode format, etc,
will come along soon.
This reverts commit 236160.
llvm-svn: 247585
DeletionCallbackHandle holds GAR in its creation. It assumes;
- It is registered as CallbackVH. It should not be moved in its life.
- Its parent, GAR, may be moved.
To move list<DeletionCallbackHandle> GlobalsAAResult::Handles,
GAR must be updated with the destination in GlobalsAAResult(&&).
llvm-svn: 247534
In some ways this is a very boring port to the new pass manager as there
are no interesting analyses or dependencies or other oddities.
However, this does introduce the first good example of a transformation
pass with non-trivial state porting to the new pass manager. I've tried
to carve out patterns here to replicate elsewhere, and would appreciate
comments on whether folks like these patterns:
- A common need in the new pass manager is to effectively lift the pass
class and some of its state into a public header file. Prior to this,
LLVM used anonymous namespaces to provide "module private" types and
utilities, but that doesn't scale to cases where a public header file
is needed and the new pass manager will exacerbate that. The pattern
I've adopted here is to use the namespace-cased-name of the core pass
(what would be a module if we had them) as a module-private namespace.
Then utility and other code can be declared and defined in this
namespace. At some point in the future, we could even have
(conditionally compiled) code that used modules features when
available to do the same basic thing.
- I've split the actual pass run method in two in order to expose
a private method usable by the old pass manager to wrap the new class
with a minimum of duplicated code. I actually looked at a bunch of
ways to automate or generate these, but they are all quite terrible
IMO. The fundamental need is to extract the set of analyses which need
to cross this interface boundary, and that will end up being too
unpredictable to effectively encapsulate IMO. This is also
a relatively small amount of boiler plate that will live a relatively
short time, so I'm not too worried about the fact that it is boiler
plate.
The rest of the patch is totally boring but results in a massive diff
(sorry). It just moves code around and removes or adds qualifiers to
reflect the new name and nesting structure.
Differential Revision: http://reviews.llvm.org/D12773
llvm-svn: 247501
Summary: This fixes a variety of typos in docs, code and headers.
Subscribers: jholewinski, sanjoy, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12626
llvm-svn: 247495
We had asserts in PHINode::addIncoming to check that the value types matched
the type of the PHI, and that the associated BB was not null. These did not
catch, however, later uses of setIncomingValue and setIncomingBlock (which are
called by addIncoming as well). Moving the asserts to PHINode::setIncoming*
provides better coverage. NFC.
llvm-svn: 247492
realignment should be forced.
With this commit, we can now force stack realignment when doing LTO and
do so on a per-function basis. Also, add a new cl::opt option
"stackrealign" to CommandFlags.h which is used to force stack
realignment via llc's command line.
Out-of-tree projects currently using -force-align-stack to force stack
realignment should make changes to attach the attribute to the functions
in the IR.
Differential Revision: http://reviews.llvm.org/D11814
llvm-svn: 247450
We used to have this magic "hasLoadLinkedStoreConditional()" callback,
which really meant two things:
- expand cmpxchg (to ll/sc).
- expand atomic loads using ll/sc (rather than cmpxchg).
Remove it, and, instead, introduce explicit callbacks:
- bool shouldExpandAtomicCmpXchgInIR(inst)
- AtomicExpansionKind shouldExpandAtomicLoadInIR(inst)
Differential Revision: http://reviews.llvm.org/D12557
llvm-svn: 247429
The former setup once resulted in us ignoring the module for C compilations,
but Clang now errors on this if the header is included from C code (which it is).
llvm-svn: 247377
When the driver tries to locate a program by its name, e.g. a linker, it
scans the paths provided by the toolchain using the ScanDirForExecutable
function. If the lookup fails, the driver uses
llvm::sys::findProgramByName. Unlike llvm::sys::findProgramByName,
ScanDirForExecutable is not aware of file extensions. If the program has
the "exe" extension in its name, which is very common on Windows,
ScanDirForExecutable won't find it under the toolchain-provided paths.
This patch changes the Windows version of the "`can_execute`" function
called by ScanDirForExecutable to respect file extensions, similarly to
llvm::sys::findProgramByName.
Patch by Oleg Ranevskyy
Reviewers: rnk
Differential Revision: http://reviews.llvm.org/D12711
llvm-svn: 247358
Except the changes that defined virtual destructors as =default, because that
ran into problems with GCC 4.7 and overriding methods that weren't noexcept.
llvm-svn: 247298
SmallVector to further help debug builds not waste their time calling
one line functions.
To give you an idea of why this is worthwhile, this change alone gets
another >10% reduction in the runtime of TripleTest.Normalization! It's
now under 9 seconds for me. Sadly, this is the end of the easy wins for
that test. Anything further will require some different architecture of
the test itself. Still, I'm pretty happy. 'check-llvm' now is under 35s
for me.
llvm-svn: 247259
These are now quite heavily used in unit tests and the host tools,
making it worth having them be reasonably fast even in an unoptimized
build. This change reduces the total runtime of TripleTest.Normalization
by yet another 10% to 15%. It is now under 10 seconds on my machine, and
the total check-llvm time has dropped from 38s to around 36s.
I experimented with a number of different options, and the code pattern
here consistently seemed to lower the cleanest, likely due to the
significantly simple CFG and far fewer redundant tests of 'Result'.
llvm-svn: 247257
The logic of this follows something Howard does in libc++ and something
I discussed with Chris eons ago -- for a lot of functions, there is
really no benefit to preserving "debug information" by leaving the
out-of-line even in debug builds. This is especially true as we now do
a very good job of preserving most debug information even in the face of
inlining. There are a bunch of methods in StringRef that we are paying
a completely unacceptable amount for with every debug build of every
LLVM developer.
Some day, we should fix Clang/LLVM so that developers can reasonable
use a default of something other than '-O0' and not waste their lives
waiting on *completely* unoptimized code to execute. We should have
a default that doesn't impede debugging while providing at least
plausable performance.
But today is not that day.
So today, I'm applying always_inline to the functions that are really
hurting the critical path for stuff like 'check_llvm'. I'm being very
cautious here, but there are a few other APIs that we really should do
this for as a matter of pragmatism. Hopefully we can rip this out some
day.
With this change, TripleTest.Normalization runtime decreases by over
10%, and the total 'check-llvm' time on my 48-core box goes from 38s to
just under 37s.
llvm-svn: 247253
'inline' specifier. That specifier may or may not be valid for a given
function, or it may be required for correct linkage even when the
compiler doesn't support the always_inline attribute.
llvm-svn: 247252
with the StringRef::split method when used with a MaxSplit argument
other than '-1' (which nobody really does today, but which should
actually work).
The spec claimed both to split up to MaxSplit times, but also to append
<= MaxSplit strings to the vector. One of these doesn't make sense.
Given the name "MaxSplit", let's go with it being a max over how many
*splits* occur, which means the max on how many strings get appended is
MaxSplit+1. I'm not actually sure the implementation correctly provided
this logic either, as it used a really opaque loop structure.
The implementation was also playing weird games with nullptr in the data
field to try to rely on a totally opaque hidden property of the split
method that returns a pair. Nasty IMO.
Replace all of this with what is (IMO) simpler code that doesn't use the
pair returning split method, and instead just finds each separator and
appends directly. I think this is a lot easier to read, and it most
definitely matches the spec. Added some tests that exercise the corner
cases around StringRef() and StringRef("") that all now pass.
I'll start using this in code in the next commit.
llvm-svn: 247249
on StringRef. Finding and splitting on a single character is
substantially faster than doing it on even a single character StringRef
-- we immediately get to a *very* tuned memchr call this way.
Even nicer, we get to this even in a debug build, shaving 18% off the
runtime of TripleTest.Normalization, helping PR23676 some more.
llvm-svn: 247244
manager to avoid a slow linear scan of every immutable pass and on every
attempt to find an analysis pass.
This speeds up 'check-llvm' on an unoptimized build for me by 15%, YMMV.
It should also help (a tiny bit) other folks that are really
bottlenecked on repeated runs of tiny pass pipelines across small IR
files.
llvm-svn: 247240
All of the complexity is in cleanupret, and it mostly follows the same
codepaths as catchret, except it doesn't take a return value in RAX.
This small example now compiles and executes successfully on win32:
extern "C" int printf(const char *, ...) noexcept;
struct Dtor {
~Dtor() { printf("~Dtor\n"); }
};
void has_cleanup() {
Dtor o;
throw 42;
}
int main() {
try {
has_cleanup();
} catch (int) {
printf("caught it\n");
}
}
Don't try to put the cleanup in the same function as the catch, or Bad
Things will happen.
llvm-svn: 247219
This reapply commit r247178 after post-commit review from D.Blaikie
in a way that makes it compatible with the existing API.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247215
The purpose is to allow templated wrapper to work with either
ArrayRef or any convertible operation:
template<typename Container>
void wrapper(const Container &Arr) {
impl(makeArrayRef(Arr));
}
with Container being a std::vector, a SmallVector, or an ArrayRef.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247214
The 32-bit tables don't actually contain PC range data, so emitting them
is incredibly simple.
The 64-bit tables, on the other hand, use the same table for state
numbering as well as label ranges. This makes things more difficult, so
it will be implemented later.
llvm-svn: 247192
This change enables EmitRecord to pass the supplied record Code to
EmitRecordWithAbbrevImpl, rather than insert it into the Vals array.
It is an enabler for changing EmitRecord to take an ArrayRef<uintty> instead
of a SmallVectorImpl<uintty>&
Patch suggested by Duncan P. N. Exon Smith, modified by myself a bit to get
correct assertion checking.
llvm-svn: 247186
With subregister liveness enabled we can detect the case where only
parts of a register are live in, this is expressed as a 32bit lanemask.
The current code only keeps registers in the live-in list and therefore
enumerated all subregisters affected by the lanemask. This turned out to
be too conservative as the subregister may also cover additional parts
of the lanemask which are not live. Expressing a given lanemask by
enumerating a minimum set of subregisters is computationally expensive
so the best solution is to simply change the live-in list to store the
lanemasks as well. This will reduce memory usage for targets using
subregister liveness and slightly increase it for other targets
Differential Revision: http://reviews.llvm.org/D12442
llvm-svn: 247171
Now that we have an explicit iterator over the idx2MBBMap in SlotIndices
we can use the fact that segments and the idx2MBBMap is sorted by
SlotIndex position so can advance both simultaneously instead of
starting from the beginning for each segment.
This complicates the code for the subregister case somewhat but should
be more efficient and has the advantage that we get the final lanemask
for each block immediately which will be important for a subsequent
change.
Removes the now unused SlotIndexes::findMBBLiveIns function.
Differential Revision: http://reviews.llvm.org/D12443
llvm-svn: 247170
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
Removed "cortex-r5f" and "cortex-m4f" from Target Parser, sinced they are
unknown cpu names for llvm and clang. Also updated default FPUs for R5 and M4
accordingly.
Differential Revision: http://reviews.llvm.org/D12692
Change-Id: Ib81c7216521a361d8ee1296e4b6a2aa00bd479c5
llvm-svn: 247136
Currently this hits an assert that extload should
always be supported, which assumes integer extloads.
This moves a hack out of SI's argument lowering and
is covered by existing tests.
llvm-svn: 247113
Change `EmitRecordWithAbbrev()` and friends to take an `ArrayRef<T>`
instead of requiring a `SmallVectorImpl<T>`. No functionality change
intended.
llvm-svn: 247107
Summary:
32-bit funclets have short prologues that allocate enough stack for the
largest call in the whole function. The runtime saves CSRs for the
funclet. It doesn't restore CSRs after we finally transfer control back
to the parent funciton via a CATCHRET, but that's a separate issue.
32-bit funclets also have to adjust the incoming EBP value, which is
what llvm.x86.seh.recoverframe does in the old model.
64-bit funclets need to spill CSRs as normal. For simplicity, this just
spills the same set of CSRs as the parent function, rather than trying
to compute different CSR sets for the parent function and each funclet.
64-bit funclets also allocate enough stack space for the largest
outgoing call frame, like 32-bit.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12546
llvm-svn: 247092
This change extends the bitset lowering pass to support bitsets that may
contain either functions or global variables. A function bitset is lowered to
a jump table that is laid out before one of the functions in the bitset.
Also add support for non-string bitset identifier names. This allows for
distinct metadata nodes to stand in for names with internal linkage,
as done in D11857.
Differential Revision: http://reviews.llvm.org/D11856
llvm-svn: 247080
This prevents MC clients from getting COFF.h, which conflicts with
winnt.h macros. Also a minor IWYU cleanup. Now the only public headers
including COFF.h are in Object, and they actually need it.
llvm-svn: 246784
Summary:
This function was not taking into account that the
input type could be a vector, and wasn't properly
working for vector types.
This caused an assert when building spec2k6 perlbmk for armv8.
Reviewers: rengolin, mzolotukhin
Subscribers: silviu.baranga, mzolotukhin, rengolin, eugenis, jmolloy, aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D12559
llvm-svn: 246759
Summary:
This intrinsic can be used to extract a pointer to the exception caught by
a given catchpad. Its argument has token type and must be a `catchpad`.
Also clarify ExtendingLLVM documentation regarding overloaded intrinsics.
Reviewers: majnemer, andrew.w.kaylor, sanjoy, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12533
llvm-svn: 246752
Summary:
Add a `cleanupendpad` instruction, used to mark exceptional exits out of
cleanups (for languages/targets that can abort a cleanup with another
exception). The `cleanupendpad` instruction is similar to the `catchendpad`
instruction in that it is an EH pad which is the target of unwind edges in
the handler and which itself has an unwind edge to the next EH action.
The `cleanupendpad` instruction, similar to `cleanupret` has a `cleanuppad`
argument indicating which cleanup it exits. The unwind successors of a
`cleanuppad`'s `cleanupendpad`s must agree with each other and with its
`cleanupret`s.
Update WinEHPrepare (and docs/tests) to accomodate `cleanupendpad`.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12433
llvm-svn: 246751
Function::print isn't interestingly different from Value::print. Just
let the only caller (in PrintCallGraphPass) call the Value version.
llvm-svn: 246720
This patch defines 'unpredictable' metadata. This metadata can be used to signal to the optimizer
or backend that a branch or switch is unpredictable, and therefore, it's probably better to not
split a compound predicate into multiple branches such as in CodeGenPrepare::splitBranchCondition().
This was discussed in:
https://llvm.org/bugs/show_bug.cgi?id=23827
Dependent patches to alter codegen and expose this in clang to follow.
Differential Revision; http://reviews.llvm.org/D12341
llvm-svn: 246688
Summary:
Add the necessary plumbing so that llvm_token_ty can be used as an
argument/return type in intrinsic definitions and correspondingly require
TokenTy in function types. TokenTy is an opaque type that has no target
lowering, but can be used in machine-independent intrinsics. It is
required for the upcoming llvm.eh.padparam intrinsic.
Reviewers: majnemer, rnk
Subscribers: stoklund, llvm-commits
Differential Revision: http://reviews.llvm.org/D12532
llvm-svn: 246651
We can just ask the ObjectWriter for it's stream instead of caching
around our own reference to it. No functionality change is intended.
llvm-svn: 246604
COFF sections are accompanied with an auxiliary symbol which includes a
checksum. This checksum used to be filled with just zero but this seems
to upset LINK.exe when it is processing a /INCREMENTAL link job.
Instead, fill the CheckSum field with the JamCRC of the section
contents. This matches MSVC's behavior.
This fixes PR19666.
N.B. A rather simple implementation of JamCRC is given. It implements
a byte-wise calculation using the method given by Sarwate. There are
implementations with higher throughput like slice-by-eight and making
use of PCLMULQDQ. We can switch to one of those techniques if it turns
out to be a significant use of time.
llvm-svn: 246590
This was last used by the pre-MC object emitter and has been dead for
quite a while. We have better ways to emit endian-dependent stuff now.
llvm-svn: 246571
-only-needed -- link in only symbols needed by destination module
-internalize -- internalize linked symbols
Differential Revision: http://reviews.llvm.org/D12459
llvm-svn: 246561
There are occasions where it is useful to consider the entirety of the
contents of a section. For example, compressed debug info needs the
entire section available before it can compress it and write it out.
The compressed debug info scenario was previously implemented by
mirroring the implementation of writeSectionData in the ELFObjectWriter.
Instead, allow the output stream to be swapped on demand. This lets
callers redirect the output stream to a more convenient location before
it hits the object file.
No functionality change is intended.
Differential Revision: http://reviews.llvm.org/D12509
llvm-svn: 246554
Follow LLVM style for the parameter names (`CamelCase` not `camelCase`),
and surface the header docs in doxygen. No functionality change
intended.
llvm-svn: 246509
Hopefully this will end the GEPs saga!
This commit reverts r245394, i.e., it reapplies r221876 while incorporating the
fixes from D11847.
r221876 was not reapplied alone because it was not safe and D11847 was not
applied alone because it needs r221876 to produce correct results.
This should fix PR24596.
Original commit message for r221876:
Let's try this again...
This reverts r219432, plus a bug fix.
Description of the bug in r219432 (by Nick):
The bug was using AllPositive to break out of the loop; if the loop break
condition i != e is changed to i != e && AllPositive then the
test_modulo_analysis_with_global test I've added will fail as the Modulo will
be calculated incorrectly (as the last loop iteration is skipped, so Modulo
isn't updated with its Scale).
Nick also adds this comment:
ComputeSignBit is safe to use in loops as it takes into account phi nodes, and
the == EK_ZeroEx check is safe in loops as, no matter how the variable changes
between iterations, zero-extensions will always guarantee a zero sign bit. The
isValueEqualInPotentialCycles check is therefore definitely not needed as all
the variable analysis holds no matter how the variables change between loop
iterations.
And this patch also adds another enhancement to GetLinearExpression - basically
to convert ConstantInts to Offsets (see test_const_eval and
test_const_eval_scaled for the situations this improves).
Original commit message:
This reverts r218944, which reverted r218714, plus a bug fix.
Description of the bug in r218714 (by Nick):
The original patch forgot to check if the Scale in VariableGEPIndex flipped the
sign of the variable. The BasicAA pass iterates over the instructions in the
order they appear in the function, and so BasicAliasAnalysis::aliasGEP is
called with the variable it first comes across as parameter GEP1. Adding a
%reorder label puts the definition of %a after %b so aliasGEP is called with %b
as the first parameter and %a as the second. aliasGEP later calculates that %a
== %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first
parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) -
ignoring that %idxprom is scaled by -1 here lead the patch to incorrectly
conclude that %a > %b.
Revised patch by Nick White, thanks! Thanks to Lang to isolating the bug.
Slightly modified by me to add an early exit from the loop and avoid
unnecessary, but expensive, function calls.
Original commit message:
Two related things:
1. Fixes a bug when calculating the offset in GetLinearExpression. The code
previously used zext to extend the offset, so negative offsets were converted
to large positive ones.
2. Enhance aliasGEP to deduce that, if the difference between two GEP
allocations is positive and all the variables that govern the offset are also
positive (i.e. the offset is strictly after the higher base pointer), then
locations that fit in the gap between the two base pointers are NoAlias.
Patch by Nick White!
Message from D11847:
Un-revert of r241981 and fix for PR23626. The 'Or' case of GetLinearExpression
delegates to 'Add' if possible, and if not it returns an Opaque value.
Unfortunately the Scale and Offsets weren't being set (and so defaulted to 0) -
and a scale of zero effectively removes the variable from the GEP instruction.
This meant that BasicAA would return MustAliases when it should have been
returning PartialAliases (and PR23626 was an example of the GVN pass using an
incorrect MustAlias to merge loads from what should have been different
pointers).
Differential Revision: http://reviews.llvm.org/D11847
Patch by Nick White <n.j.white@gmail.com>!
llvm-svn: 246502
This would have suppressed bug 24578, about use-after-
destroy on User and MDNode. Rolled back suppression for
the sake of code cleanliness, in preferance for bug
tracking to keep track of this issue.
This reverts commit 6ff2baabc4625d5b0a8dccf76aa0f72d930ea6c0.
llvm-svn: 246484
Also delete and simplify a lot of MachineModuleInfo code that used to be
needed to handle personalities on landingpads. Now that the personality
is on the LLVM Function, we no longer need to track it this way on MMI.
Certainly it should not live on LandingPadInfo.
llvm-svn: 246478
Based on comments from Hal
(http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150810/292978.html),
I've changed the interface to add a callback mechanism to the
TargetFrameLowering class to query whether the specific target
supports shrink wrapping. By default, shrink wrapping is disabled by
default. Each target can override the default behaviour using the
TargetFrameLowering::targetSupportsShrinkWrapping() method. Shrink
wrapping can still be explicitly enabled or disabled from the command
line, using the existing -enable-shrink-wrap=<true|false> option.
Phabricator: http://reviews.llvm.org/D12293
llvm-svn: 246463
Avoid marking some MCSymbols as used in MC/AsmParser.cpp when no uses
exist. This fixes a bug in parseAssignmentExpression() which
inadvertently sets IsUsed, thereby triggering:
"invalid re-assignment of non-absolute variable"
on otherwise valid code. No other functionality change intended.
The original version of this patch touched many calls to MCSymbol
accessors. On rafael's advice, I have stripped this patch down a bit.
As a follow-up, I intend to find the call sites which intentionally set
IsUsed and force them to do so explicitly.
Differential Revision: http://reviews.llvm.org/D12347
llvm-svn: 246457
Summary:
JumpThreading shouldn't duplicate a convergent call, because that would move a convergent call into a control-inequivalent location. For example,
if (cond) {
...
} else {
...
}
convergent_call();
if (cond) {
...
} else {
...
}
should not be optimized to
if (cond) {
...
convergent_call();
...
} else {
...
convergent_call();
...
}
Test Plan: test/Transforms/JumpThreading/basic.ll
Patch by Xuetian Weng.
Reviewers: resistor, arsenm, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12484
llvm-svn: 246415
Specifically, the header now provides llvm::thread, which is either a
typedef of std::thread or a replacement that calls the function synchronously
depending on the value of LLVM_ENABLE_THREADS.
llvm-svn: 246402
This reverts commit r246371, as it cause a rather obscure bug in AArch64
test-suite paq8p (time outs, seg-faults). I'll investigate it before
reapplying.
llvm-svn: 246379
of its strings when expanding the string literals from the macros, and
push all of the APIs to be StringRef instead of C-string APIs.
This (remarkably) removes a very non-trivial number of strlen calls. It
even deletes code and complexity from one of the primary users -- Clang.
llvm-svn: 246374
Value *getSplatValue(Value *Val);
It complements the CreateVectorSplat(), which creates 2 instructions - insertelement and shuffle with all-zero mask.
The new function recognizes the pattern - insertelement+shuffle and returns the splat value (or nullptr).
It also returns a splat value form ConstantDataVector, for completeness.
Differential Revision: http://reviews.llvm.org/D11124
llvm-svn: 246371
the necessary tables.
This will allow me to restructure the code and structures using this to
be significantly more efficient. It also removes the duplication of the
list of several enumerators. It also enshrines that the order of
enumerators match the order of the entries in the tables, something the
implementation code actually uses.
No functionality changed (yet).
llvm-svn: 246370
parsing logic prior to making substantial changes to it.
This parsing logic is incredibly wasteful, so I'm planning to rewrite
it. Just unittesting the triple parsing logic spends well over 80% of
its time in the ARM parsing logic, and others have measured significant
time spent here in real production compiles.
Stay tuned...
llvm-svn: 246369
This fixes PR24621 and matches what we do for `DILocation`. Although
the limit seems somewhat artificial, there are places in the backend
that also assume 16-bit columns, so we may as well just be consistent
about the limits.
llvm-svn: 246349
Add `Function::setSubprogram()` and `Function::getSubprogram()`,
convenience methods to forward to `setMetadata()` and `getMetadata()`,
respectively, and deal in `DISubprogram` instead of `MDNode`.
Also add a verifier check to enforce that `!dbg` attachments are always
subprograms.
Originally (when I had the llvm-dev discussion back in April) I thought
I'd store a pointer directly on `llvm::Function` for these attachments
-- we frequently have debug info, and that's much cheaper than using map
in the context if there are no other function-level attachments -- but
for now I'm just using the generic infrastructure. Let's add the extra
complexity only if this shows up in a profile.
llvm-svn: 246339