llvm/lib/Support/Signals.cpp:66:13: warning: unused function 'printSymbolizedStackTrace' [-Wunused-function]
llvm/lib/Support/Signals.cpp:52:13: warning: function 'findModulesAndOffsets' has internal linkage but is not defined [-Wundefined-internal]
llvm-svn: 252418
Under most circumstances, if SCEV can simplify X-Y to a constant, then it can
also simplify Y-X to a constant. However, there is no guarantee that this is
always true, and concensus is not to consider that a correctness bug in SCEV
(although it is undesirable).
PPCLoopPreIncPrep gathers pointers used to access memory (via loads, stores and
prefetches) into buckets, where in each bucket the relative pointer offsets are
constant. We used to keep each bucket as a multimap, where SCEV's subtraction
operation was used to define the ordering predicate. Instead, use a fixed SCEV
base expression for each bucket, record the constant offsets from that base
expression, and adjust it later, if desirable, once all pointers have been
collected.
Doing it this way should be more compile-time efficient than the previous
scheme (in addition to making the implementation less sensitive to SCEV
simplification quirks).
Fixes PR25170.
llvm-svn: 252417
The TailDuplication machine pass ran across a malformed CFG: a PHI node
referred it's predecessor's predecessor instead of it's predecessor.
This occurred because we split the edge in X86ISelLowering when we
processed the CATCHRET but forgot to do something about the PHI nodes.
This fixes PR25444.
llvm-svn: 252413
This commit adds enums in LLVMBitCodes.h to improve readability and
maintainability. This is a follow-up to r252368 which was discussed
here:
http://reviews.llvm.org/D12923
llvm-svn: 252395
Summary:
Teach the FunctionAttrs to do the right thing for IR with operand
bundles.
Reviewers: reames, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14408
llvm-svn: 252387
Summary:
This change fixes an iterator wraparound bug in
`determinePointerReadAttrs`.
Ideally, ++'ing off the `end()` of an iplist should result in a failed
assert, but currently iplist seems to silently wrap to the head of the
list on `end()++`. This is why the bad behavior is difficult to
demonstrate.
Reviewers: chandlerc, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14350
llvm-svn: 252386
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.
Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.
Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.
Reviewers: pgavlin, majnemer, rnk
Subscribers: jyknight, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14344
llvm-svn: 252383
FoldPHIArgZextsIntoPHI cannot insert an instruction after the PHI if
there is an EHPad in the BB. Doing so would result in an instruction
inserted after a terminator.
llvm-svn: 252377
Some implicit ilist iterator conversions have crept back into Analysis,
Transforms, Hexagon, and llvm-stress. This removes them.
I'll commit a patch immediately after this to disallow them (in a
separate patch so that it's easy to revert if necessary).
llvm-svn: 252371
We tried to insert a cast of a phi in a block whose terminator is an
EHPad. This is invalid. Do not attempt the transform in these
circumstances.
llvm-svn: 252370
This marker prevents optimization passes from adding 'tail' or
'musttail' markers to a call. Is is used to prevent tail call
optimization from being performed on the call.
rdar://problem/22667622
Differential Revision: http://reviews.llvm.org/D12923
llvm-svn: 252368
Summary:
In general GetTempDir follows the same logic as the replaced code: checks env variables TMP, TEMP, USERPROFILE in order. However, it also perform other checks like making separators native (\), making the path absolute, etc.
This change fixes FileSystemTest.CreateDir unittest that had been failing when run from Unix-like shell on Windows (Unix-like path separator (/) used in env variables).
Reviewers: chapuni, rafael, aaron.ballman
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D14231
llvm-svn: 252366
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fallback to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).
Give up, let SDAG agree with itself on a vNi1 legalization strategy.
llvm-svn: 252364
When matching non-LSB-extracting truncating broadcasts, we now insert
the necessary SRL. If the scalar resulted from a load, the SRL will be
folded into it, creating a narrower, offset, load.
However, i16 loads aren't Desirable, so we get i16->i32 zextloads.
We already catch i16 aextloads; catch these as well.
llvm-svn: 252363
Now that we recognize this, we can support it instead of bailing out.
That is, we can fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))
llvm-svn: 252362
We used to incorrectly assume that the offset we're extracting from
was a multiple of the element size. So, we'd fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc Y))))
whereas we should have extracted the higher bits from X.
Instead, bail out if the assumption doesn't hold.
llvm-svn: 252361
Summary:
Pass the VOPProfile object all the through to *_m multiclasses. This will
allow us to do more simplifications in the future.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13437
llvm-svn: 252339
The SLPVectorizer had a very crude way of trying to benefit
from associativity: it tried to optimize for splat/broadcast
or in order to have the same operator on the same side.
This is benefitial to the cost model and allows more vectorization
to occur.
This patch improve the logic and make the detection optimal (locally,
we don't look at the full tree but only at the immediate children).
Should fix https://llvm.org/bugs/show_bug.cgi?id=25247
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D13996
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 252337
All 3 operands of FMA3 instructions are commutable now.
Patch by Slava Klochkov
Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab).
Differential Revision: http://reviews.llvm.org/D13269
llvm-svn: 252335
Modelling of the expression stack is evolving. This patch takes another
step by making pushes explicit.
Differential Revision: http://reviews.llvm.org/D14338
llvm-svn: 252334
Summary:
This change makes the `isImpliedCondition` interface similar to the rest
of the functions in ValueTracking (in that it takes a DataLayout,
AssumptionCache etc.). This is an NFC, intended to make a later diff
less noisy.
Depends on D14369
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14391
llvm-svn: 252333
Summary:
Currently `isImpliedCondition` will optimize "I +_nuw C < L ==> I < L"
only if C is positive. This is an unnecessary restriction -- the
implication holds even if `C` is negative.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14369
llvm-svn: 252332
Summary:
This change adds a framework for adding more smarts to
`isImpliedCondition` around inequalities. Informally,
`isImpliedCondition` will now try to prove "A < B ==> C < D" by proving
"C <= A && B <= D", since then it follows "C <= A < B <= D".
While this change is in principle NFC, I could not think of a way to not
handle cases like "i +_nsw 1 < L ==> i < L +_nsw 1" (that ValueTracking
did not handle before) while keeping the change understandable. I've
added tests for these cases.
Reviewers: reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14368
llvm-svn: 252331
Mark kernels that use certain features that require user
SGPRs to support with kernel attributes. We need to know
before instruction selection begins because it impacts
the kernel calling convention lowering.
For now this only detects the workitem intrinsics.
llvm-svn: 252323
For some reason VS_32 ends up factoring into the pressure heuristics
even though we should never see a virtual register with this class.
When SGPRs are reserved for register spilling, this for some reason
triggers reg-crit scheduling.
Setting isAllocatable = 0 may help with this since that seems to remove
it from the default implementation's generated table.
llvm-svn: 252321
Summary:
This reverts commit r251965.
Restore "Move metadata linking after lazy global materialization/linking."
This restores commit r251926, with fixes for the LTO bootstrapping bot
failure.
The bot failure was caused by references from debug metadata to
otherwise unreferenced globals. Previously, this caused the lazy linking
to link in their defs, which is unnecessary. With this patch, because
lazy linking is complete when we encounter the metadata reference, the
materializer created a declaration. For definitions such as aliases and
comdats, it is illegal to have a declaration. Furthermore, metadata
linking should not change code generation. Therefore, when linking of
global value bodies is complete, the materializer will simply return
nullptr as the new reference for the linked metadata.
This change required fixing a different test to ensure there was a
real reference to a linkonce global that was only being reference from
metadata.
Note that the new changes to the only-needed-named-metadata.ll test
illustrate an issue with llvm-link -only-needed handling of comdat
groups, whereby it may result in an incomplete comdat group. I note this
in the test comments, but the issue is orthogonal to this patch (it can
be reproduced without any metadata at head).
Reviewers: dexonsmith, rafael, tra
Subscribers: tobiasvk, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D14447
llvm-svn: 252320
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer
Subscribers: MatzeB, llvm-commits
Differential Revision: http://reviews.llvm.org/D14407
llvm-svn: 252318
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only in cortex-a57 on which performance benefits were verified.
llvm-svn: 252316
We now create the .eh_frame section early, just like every other special
section.
This means that the special flags are visible in code that explicitly
asks for ".eh_frame".
llvm-svn: 252313
Summary:
The bug was that the sldi instructions have immediate widths dependant on
their element size. So sldi.d has a 1-bit immediate and sldi.b has a 4-bit
immediate. All of these were using 4-bit immediates previously.
Reviewers: vkalintiris
Subscribers: llvm-commits, atanasyan, dsanders
Differential Revision: http://reviews.llvm.org/D14018
llvm-svn: 252297
Summary:
The bug was that the MIPS32R6/MIPS64R6/microMIPS32R6 versions of LSA and DLSA
(unlike the MSA version) failed to account for the off-by-one encoding of the
immediate. The range is actually 1..4 rather than 0..3.
Reviewers: vkalintiris
Subscribers: atanasyan, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14015
llvm-svn: 252295
Summary:
Without these patterns we would generate a complete LL/SC sequence.
This would be problematic for memory regions marked as WRITE-only or
READ-only, as the instructions LL/SC would read/write to the protected
memory regions correspondingly.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D14397
llvm-svn: 252293
This attribute allows the compiler to assume that the function never recurses into itself, either directly or indirectly (transitively). This can be used among other things to demote global variables to locals.
llvm-svn: 252282
This adds the EH_RESTORE x86 pseudo instr, which is responsible for
restoring the stack pointers: EBP and ESP, and ESI if stack realignment
is involved. We only need this on 32-bit x86, because on x64 the runtime
restores CSRs for us.
Previously we had to keep the CATCHRET instruction around during SEH so
that we could convince X86FrameLowering to restore our frame pointers.
Now we can split these instructions earlier.
This was confusing, because we had a return instruction which wasn't
really a return and was ultimately going to be removed by
X86FrameLowering. This change also simplifies X86FrameLowering, which
really shouldn't be building new MBBs.
No observable functional change currently, but with the new register
mask stuff in D14407, CATCHRET will become a register allocator barrier,
and our existing tests rely on us having reasonable register allocation
around SEH.
llvm-svn: 252266
Windows EH funclets need to always return to a single parent funclet. However, it is possible for earlier optimizations to combine funclets (probably based on one funclet having an unreachable terminator) in such a way that this condition is violated.
These changes add code to the WinEHPrepare pass to detect situations where a funclet has multiple parents and clone such funclets, fixing up the unwind and catch return edges so that each copy of the funclet returns to the correct parent funclet.
Differential Revision: http://reviews.llvm.org/D13274?id=39098
llvm-svn: 252249
The bug: I missed adding break statements in the switch / case.
Original commit message:
[SCEV] Teach SCEV some axioms about non-wrapping arithmetic
Summary:
- A s< (A + C)<nsw> if C > 0
- A s<= (A + C)<nsw> if C >= 0
- (A + C)<nsw> s< A if C < 0
- (A + C)<nsw> s<= A if C <= 0
Right now `C` needs to be a constant, but we can later generalize it to
be a non-constant if needed.
Reviewers: atrick, hfinkel, reames, nlewycky
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13686
llvm-svn: 252236
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.
For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.
This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.
Since this is an IR change, a bitcode upgrade has been provided.
Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.
Differential Revision: http://reviews.llvm.org/D14265
llvm-svn: 252219
inalloca variables were not treated as static allocas, therefore didn't
participate in regular stack instrumentation. We don't want them to
participate in dynamic alloca instrumentation as well.
llvm-svn: 252213
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.
While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.
llvm-svn: 252210
Mangling type information into MachineInstr opcode names was a temporary
measure, and it's starting to get hairy. At the same time, the MC instruction
printer wants to use AsmString strings for printing. This patch takes the
first step, starting the process of adding AsmStrings for instructions.
llvm-svn: 252203
Also, remove an enum hack where enum values were used as indexes into an array.
We may want to make this a real class to allow pattern-based queries/customization (D13417).
llvm-svn: 252196
The needed lld matching changes to be submitted immediately next,
but this revision will cause lld failures with this alone which is expected.
This removes the eating of the error in Archive::Child::getSize() when the characters
in the size field in the archive header for the member is not a number. To do this we
have all of the needed methods return ErrorOr to push them up until we get out of lib.
Then the tools and can handle the error in whatever way is appropriate for that tool.
So the solution is to plumb all the ErrorOr stuff through everything that touches archives.
This include its iterators as one can create an Archive object but the first or any other
Child object may fail to be created due to a bad size field in its header.
Thanks to Lang Hames on the changes making child_iterator contain an
ErrorOr<Child> instead of a Child and the needed changes to ErrorOr.h to add
operator overloading for * and -> .
We don’t want to use llvm_unreachable() as it calls abort() and is produces a “crash”
and using report_fatal_error() to move the error checking will cause the program to
stop, neither of which are really correct in library code. There are still some uses of
these that should be cleaned up in this library code for other than the size field.
The test cases use archives with text files so one can see the non-digit character,
in this case a ‘%’, in the size field.
These changes will require corresponding changes to the lld project. That will be
committed immediately after this change. But this revision will cause lld failures
with this alone which is expected.
llvm-svn: 252192
Summary:
This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it.
D11268 is quite old and has merge conflicts against the current trunk.
This request
- rebases D11268 onto the new trunk;
- resolves the merge conflicts;
- fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct.
Reviewers: echristo, rengolin, kubabrecka
Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl
Differential Revision: http://reviews.llvm.org/D14338
llvm-svn: 252177
This fixes the issue of wrong CFA calculation in the following case:
0x08048400 <+0>: push %ebx
0x08048401 <+1>: sub $0x8,%esp
0x08048404 <+4>: **call 0x8048409 <test+9>**
0x08048409 <+9>: **pop %eax**
0x0804840a <+10>: add $0x1bf7,%eax
0x08048410 <+16>: mov %eax,%ebx
0x08048412 <+18>: call 0x80483f0 <bar>
0x08048417 <+23>: add $0x8,%esp
0x0804841a <+26>: pop %ebx
0x0804841b <+27>: ret
The highlighted instructions are a product of movpc instruction. The call
instruction changes the stack pointer, and pop instruction restores its
value. However, the rule for computing CFA is not updated and is wrong on
the pop instruction. So, e.g. backtrace in gdb does not work when on the pop
instruction. This adds cfi instructions for both call and pop instructions.
cfi_adjust_cfa_offset** instruction is used with the appropriate offset for
setting the rules to calculate CFA correctly.
Patch by Violeta Vukobrat.
Differential Revision: http://reviews.llvm.org/D14021
llvm-svn: 252176
We can conservatively know that CMOV's known bits are the intersection of known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities.
I tried hard to create a testcase for this and failed - we have to sufficiently confuse DAG.computeKnownBits which can see through all the cheap tricks I tried to narrow my larger testcase down :(
This code is actually exercised in CodeGen/ARM/bfi.ll, there's just no functional difference because DAG.computeKnownBits gets the right answer in that case.
llvm-svn: 252168
We were correctly skipping dbginfo intrinsics and terminators, but the initial bailout wasn't, causing it to bail out on almost any block.
llvm-svn: 252152
Summary:
GetUnderlyingObjects() can return "null" among its list of objects,
we don't want to deduce that two pointers can point to the same
memory in this case, so filter it out.
Reviewers: anemet
Subscribers: dexonsmith, llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 252149
Summary:
Remove the loop over the uses of the CallSite in ArgumentUsesTracker.
Since we have the `Use *` for actual argument operand, we can just use
pointer subtraction.
The time complexity remains the same though (except for a vararg
argument) -- `std::advance` is O(UseIndex) for the ArgumentList
iterator.
The real motivation is to make a later change adding support for operand
bundles simpler.
Reviewers: reames, chandlerc, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14363
llvm-svn: 252141
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.
This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.
llvm-svn: 252140
Summary:
The CLR's personality routine passes the pointer to the establisher frame
in RCX, not RDX.
Reviewers: pgavlin, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14343
llvm-svn: 252135
Summary:
llvm-symbolizer understands both PDBs and DWARF, so it is more likely to
succeed at symbolization. If llvm-symbolizer is unavailable, we will
fall back to dbghelp. This also makes our crash traces more similar
between Windows and Linux.
Reviewers: Bigcheese, zturner, chapuni
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12884
llvm-svn: 252118
With this change, instrumentation code and reader/write
code related to profile data structs are kept strictly
in-sync. THis will be extended to cfe and compile-rt
references as well.
Differential Revision: http://reviews.llvm.org/D13843
llvm-svn: 252113
Summary:
Earlier CaptureTracking would assume all "interesting" operands to a
call or invoke were its arguments. With operand bundles this is no
longer true.
Note: an earlier change got `doesNotCapture` working correctly with
operand bundles.
This change uses DSE to test the changes to CaptureTracking. DSE is a
vehicle for testing only, and is not directly involved in this change.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14306
llvm-svn: 252095
The generic infrastructure already did a lot of work to decide if the
fixup value is know or not. It doesn't make sense to reimplement a very
basic case: same fragment.
llvm-svn: 252090
Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.
Fixes PR24193.
llvm-svn: 252088
Splits PrintLoopPass into a new-style pass and a PrintLoopPassWrapper,
much like we already do for PrintFunctionPass and PrintModulePass.
llvm-svn: 252085
This is part-1 of the patch that replaces all edge weights in MBB by
probabilities, which only adds new interfaces. No functional changes.
Differential revision: http://reviews.llvm.org/D13908
llvm-svn: 252083
Summary:
Data operands of a call or invoke consist of the call arguments, and
the bundle operands associated with the `call` (or `invoke`)
instruction. The motivation for this change is that we'd like to be
able to query "argument attributes" like `readonly` and `nocapture`
for bundle operands naturally.
This change also provides a conservative "implementation" for these
attributes for any bundle operand, and an extension point for future
work.
Reviewers: chandlerc, majnemer, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14305
llvm-svn: 252077
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.
The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.
This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.
It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.
Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll
Differential Revision: http://reviews.llvm.org/D13988
llvm-svn: 252074
Summary:
This is intended to make a later change simpler.
Note: adding this bounds checking required fixing `X86FastISel`. As
far I can tell I've preserved original behavior but a careful review
will be appreciated.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14304
llvm-svn: 252073
Patch by Slava Klochkov
The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result.
Reviewers: Quentin Colombet, Elena Demikhovsky
Differential Revision: http://reviews.llvm.org/D13710
llvm-svn: 252060
If we have a CMOV, OR and AND combination such as:
if (x & CN)
y |= CM;
And:
* CN is a single bit;
* All bits covered by CM are known zero in y;
Then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).
llvm-svn: 252057
When converting an alias to a non-alias when the aliasee is not
imported, ensure that the linkage type is set to external so that it is
a valid linkage type. Added a test case that exposed this issue.
llvm-svn: 252054
We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code:
if (c & flag1)
*a = x;
if (c & flag2)
*a = y;
...
There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to.
It is, however, legal to merge the stores together and do the store once:
tmp = undef;
if (c & flag1)
tmp = x;
if (c & flag2)
tmp = y;
if (c & flag1 || c & flag2)
*a = tmp;
The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too.
This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure.
The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertable, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate.
This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite.
llvm-svn: 252051
The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp
directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode.
llvm-svn: 252042
In my previous change to CVP (251606), I made CVP much more aggressive about trying to constant fold comparisons. This patch is a reversal in direction. Rather than being agressive about every compare, we restore the non-block local restriction for most, and then try hard for compares feeding returns.
The motivation for this is two fold:
* The more I thought about it, the less comfortable I got with the possible compile time impact of the other approach. There have been no reported issues, but after talking to a couple of folks, I've come to the conclusion the time probably isn't justified.
* It turns out we need to know the context to leverage the full power of LVI. In particular, asking about something at the end of it's block (the use of a compare in a return) will frequently get more precise results than something in the middle of a block. This is an implementation detail, but it's also hard to get around since mid-block queries have to reason about possible throwing instructions and don't get to use most of LVI's block focused infrastructure. This will become particular important when combined with http://reviews.llvm.org/D14263.
Differential Revision: http://reviews.llvm.org/D14271
llvm-svn: 252032
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
allow to lower call safepoints with relocates and results in a different
basic blocks. See test case for example.
Differential Revision: http://reviews.llvm.org/D14158
llvm-svn: 252028
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop. E.g.:
for (i)
A[i + 1] = A[i] + B[i]
=>
T = A[0]
for (i)
T = T + B[i]
A[i + 1] = T
The pass relies on loop dependence analysis via LoopAccessAnalisys to
find opportunities of loop-carried dependences with a distance of one
between a store and a load. Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.
This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning. As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here. Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.
In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-indepedent redundant
loads and store that *require* versioning due to may-aliasing
intervening stores/loads. I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.
The main motiviation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop. Being
able to promote the memory dependence into a register depedence (even
though the HW does perform store-to-load fowarding as well) results in a
major gain (~20%). This gain also transfers over to x86: it's
around 8-10%.
Right now the pass is off by default and can be enabled
with -enable-loop-load-elim. On the LNT testsuite, there are two
performance changes (negative number -> improvement):
1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
critical paths is reduced
2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
outside of LNT
The pass is scheduled after the loop vectorizer (which is after loop
distribution). The rational is to try to reuse LAA state, rather than
recomputing it. The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would inhibit the CPU's st->ld forwarding to kick in.
LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences). LAA is known to omit loop-independent
dependences in certain situations. The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.
Reviewers: dberlin, hfinkel
Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13259
llvm-svn: 252017
Summary: Will be used by the LoopLoadElimination pass.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13258
llvm-svn: 252016
A profile of an LTO link of Chrome revealed that we were spending some
~30-50% of execution time in the function Constant::getRelocationInfo(),
which is called from TargetLoweringObjectFile::getKindForGlobal() and in turn
from TargetMachine::getNameWithPrefix().
It turns out that we only need the result of getKindForGlobal() when
targeting Mach-O, so this change moves the relevant part of the logic to
TargetLoweringObjectFileMachO.
NFCI.
Differential Revision: http://reviews.llvm.org/D14168
llvm-svn: 252014
The printed name and the parsed assembler names weren't the same.
I'm not sure which name SC prints these as, but I think it's this one.
llvm-svn: 252010
If the requested SGPR was not actually aligned, it was
accepted and rounded down instead of rejected.
Also fix an assert if the range is an invalid size.
llvm-svn: 252009
Summary:
Add support for wasm's select operator, and lower LLVM's select DAG node
to it.
Reviewers: sunfish
Subscribers: dschuff, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D14295
llvm-svn: 252002
Introduce DIPrinter which takes care of rendering DILineInfo and
friends. This allows LLVMSymbolizer class to return a structured data
instead of plain std::strings.
llvm-svn: 251989
Summary:
We now collect all types of dependences including lexically forward
deps not just "interesting" ones.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13256
llvm-svn: 251985
Make printDILineInfo and friends responsible for just rendering the
contents of the structures, demangling should actually be performed
earlier, when we have the information about the originating
SymbolizableModule at hand.
llvm-svn: 251981
XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) )
This patch adds tablegen pattern matching for this instruction.
Differential Revision: http://reviews.llvm.org/D8841
llvm-svn: 251975
Summary:
When the dependence distance in zero then we have a loop-independent
dependence from the earlier to the later access.
No current client of LAA uses forward dependences so other than
potentially hitting the MaxDependences threshold earlier, this change
shouldn't affect anything right now.
This and the previous patch were tested together for compile-time
regression. None found in LNT/SPEC.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13255
llvm-svn: 251973
Summary:
Before this change, we didn't use to collect forward dependences since
none of the current clients (LV, LDist) required them.
The motivation to also collect forward dependences is a new pass
LoopLoadElimination (LLE) which discovers store-to-load forwarding
opportunities across the loop's backedge. The pass uses both lexically
forward or backward loop-carried dependences to detect these
opportunities.
The new pass also analyzes loop-independent (forward) dependences since
they can conflict with the loop-carried dependences in terms of how the
data flows through memory.
The newly added test only covers loop-carried forward dependences
because loop-independent ones are currently categorized as NoDep. The
next patch will fix this.
The two patches were tested together for compile-time regression. None
found in LNT/SPEC.
Note that with this change LAA provides all dependences rather than just
"interesting" ones. A subsequent NFC patch will remove the now trivial
isInterestingDependence and rename the APIs.
Reviewers: hfinkel
Subscribers: jmolloy, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13254
llvm-svn: 251972
This reverts commit r251926. I believe this is causing an LTO
bootstrapping bot failure
(http://lab.llvm.org:8080/green/job/llvm-stage2-cmake-RgLTO_build/3669/).
Haven't been able to repro it yet, but after looking at the metadata I
am pretty sure I know what is going on.
llvm-svn: 251965
Summary:
Since now Scalar Evolution can create non-add rec expressions for PHI
nodes, it can also create SCEVConstant expressions. This will confuse
replaceCongruentPHIs, which previously relied on the fact that SCEV
could not produce constants in this case.
We will now replace the node with a constant in these cases - or avoid
processing the Phi in case of a type mismatch.
Reviewers: sanjoy
Subscribers: llvm-commits, majnemer
Differential Revision: http://reviews.llvm.org/D14230
llvm-svn: 251938
Bypassing LLVM for this has a number of benefits:
1) Laziness support becomes asm-syntax agnostic (previously lazy jitting didn't
work on Windows as the resolver block was in Darwin asm).
2) For cross-process JITs, it allows resolver blocks and trampolines to be
emitted directly in the target process, reducing cross process traffic.
3) It should be marginally faster.
llvm-svn: 251933
Summary:
Currently, named metadata is linked before the LazilyLinkGlobalValues
list is walked and materialized/linked. As a result, references
from DISubprogram and DIGlobalVariable metadata to yet unmaterialized
functions and variables cause them to be added to the lazy linking
list and their definitions are materialized and linked.
This makes the llvm-link -only-needed option not have the intended
effect when debug information is present, as the otherwise unneeded
functions/variables are still linked in.
Additionally, for ThinLTO I have implemented a mechanism to only link
in debug metadata needed by imported functions. Moving named metadata
linking after lazy GV linking will facilitate applying this mechanism
to the LTO and "llvm-link -only-needed" cases as well.
Reviewers: dexonsmith, tra, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14195
llvm-svn: 251926
No test, since it would depend on what the compiler can optimize/reuse.
My next commit made this bug visible on Linux Release compiles with some
versions of gcc.
llvm-svn: 251909
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.
Darwin does not support this well, so don't use pushes whenever it
would be required.
Differential Revision: http://reviews.llvm.org/D13767
llvm-svn: 251904
Commit 251839 triggers miscompiles on some bots:
http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-fast/builds/13723
(The commit is listed in 13722, but due to an existing failure introduced in
13721 and reverted in 13723 the failure is only visible in 13723)
To verify r251839 is indeed the only change that triggered the buildbot failures
and to ensure the buildbots remain green while investigating I temporarily
revert this commit. At the current state it is unclear if this commit introduced
some miscompile or if it only exposed code to Polly that is subsequently
miscompiled by Polly.
llvm-svn: 251901
ScheduleDAGInstrs doesn't behave differently before or after register
allocation. It was only used in a method of MachineSchedulerBase which
behaved differently in MachineScheduler/PostMachineScheduler. Change
this to let MachineScheduler/PostMachineScheduler just pass in a
parameter to that function.
The order of the LiveIntervals* and bool RemoveKillFlags paramters have
been switched to make out-of-tree code fail instead of unintentionally
passing a value intended for the IsPostRA flag to the (previously
following and default initialized) RemoveKillFlags.
Differential Revision: http://reviews.llvm.org/D14245
llvm-svn: 251883
This was causing a variety of test failures when v2i64
is added as a legal type.
SIFixSGPRCopies should correctly handle the case of vector inputs
to a scalar reg_sequence, so this isn't necessary anymore. This
was hiding some deficiencies in how reg_sequence is handled later,
but this shouldn't be a problem anymore since the register class
copy of a reg_sequence is now done before the reg_sequence.
llvm-svn: 251860
I've found myself pointlessly debugging problems from running
graphics tests with an HSA triple a few times, so stop this from
happening again.
llvm-svn: 251858
This is a redo of r251849 except the tests have been split into arch-specific folders
to hopefully make the bots happy.
This is a follow-up from the discussion in D12965. The block-at-a-time limitation of
SelectionDAG also came up in D13297.
Without the InstCombine change from D12965, I don't expect this patch to make any
difference in the real world because InstCombine does not shrink cases like this in
visitSwitchInst(). But we need to have this CGP safety harness in place before
proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.
I've opted for IR regression tests in the patch because that seems like a clearer way to
test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86
will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473
Before:
BB#0:
mr 4, 3
extsh. 3, 4
ble 0, .LBB0_5
BB#1:
cmpwi 3, 99
bgt 0, .LBB0_9
BB#2:
rlwinm 4, 4, 0, 16, 31 <--- 32-bit mask/extend
li 3, 0
cmplwi 4, 1
beqlr 0
BB#3:
cmplwi 4, 10
bne 0, .LBB0_12
BB#4:
li 3, 1
blr
.LBB0_5:
rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend
cmplwi 3, 65436
beq 0, .LBB0_13
BB#6:
cmplwi 3, 65526
beq 0, .LBB0_15
BB#7:
cmplwi 3, 65535
bne 0, .LBB0_12
BB#8:
li 3, 4
blr
.LBB0_9:
rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend
cmplwi 3, 100
beq 0, .LBB0_14
...
After:
BB#0:
rlwinm 4, 3, 0, 16, 31 <--- mask/extend to 32-bit and then use that for comparisons
cmpwi 4, 999
ble 0, .LBB0_5
BB#1:
lis 3, 0
ori 3, 3, 65525
cmpw 4, 3
bgt 0, .LBB0_9
BB#2:
cmplwi 4, 1000
beq 0, .LBB0_14
BB#3:
cmplwi 4, 65436
bne 0, .LBB0_13
BB#4:
li 3, 6
blr
.LBB0_5:
li 3, 0
cmplwi 4, 1
beqlr 0
BB#6:
cmplwi 4, 10
beq 0, .LBB0_12
BB#7:
cmplwi 4, 100
bne 0, .LBB0_13
BB#8:
li 3, 2
blr
.LBB0_9:
cmplwi 4, 65526
beq 0, .LBB0_15
BB#10:
cmplwi 4, 65535
bne 0, .LBB0_13
...
Differential Revision: http://reviews.llvm.org/D13532
llvm-svn: 251857
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.
This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1).
Differential revision: http://reviews.llvm.org/D8943
llvm-svn: 251850
This is a follow-up from the discussion in D12965. The block-at-a-time limitation of
SelectionDAG also came up in D13297.
Without the InstCombine change from D12965, I don't expect this patch to make any
difference in the real world because InstCombine does not shrink cases like this in
visitSwitchInst(). But we need to have this CGP safety harness in place before
proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.
I've opted for IR regression tests in the patch because that seems like a clearer way to
test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86
will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473
Before:
BB#0:
mr 4, 3
extsh. 3, 4
ble 0, .LBB0_5
BB#1:
cmpwi 3, 99
bgt 0, .LBB0_9
BB#2:
rlwinm 4, 4, 0, 16, 31 <--- 32-bit mask/extend
li 3, 0
cmplwi 4, 1
beqlr 0
BB#3:
cmplwi 4, 10
bne 0, .LBB0_12
BB#4:
li 3, 1
blr
.LBB0_5:
rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend
cmplwi 3, 65436
beq 0, .LBB0_13
BB#6:
cmplwi 3, 65526
beq 0, .LBB0_15
BB#7:
cmplwi 3, 65535
bne 0, .LBB0_12
BB#8:
li 3, 4
blr
.LBB0_9:
rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend
cmplwi 3, 100
beq 0, .LBB0_14
...
After:
BB#0:
rlwinm 4, 3, 0, 16, 31 <--- mask/extend to 32-bit and then use that for comparisons
cmpwi 4, 999
ble 0, .LBB0_5
BB#1:
lis 3, 0
ori 3, 3, 65525
cmpw 4, 3
bgt 0, .LBB0_9
BB#2:
cmplwi 4, 1000
beq 0, .LBB0_14
BB#3:
cmplwi 4, 65436
bne 0, .LBB0_13
BB#4:
li 3, 6
blr
.LBB0_5:
li 3, 0
cmplwi 4, 1
beqlr 0
BB#6:
cmplwi 4, 10
beq 0, .LBB0_12
BB#7:
cmplwi 4, 100
bne 0, .LBB0_13
BB#8:
li 3, 2
blr
.LBB0_9:
cmplwi 4, 65526
beq 0, .LBB0_15
BB#10:
cmplwi 4, 65535
bne 0, .LBB0_13
...
Differential Revision: http://reviews.llvm.org/D13532
llvm-svn: 251849
This reverts commit r251837, due to a number of bot failures of the form:
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-fast/llvm.obj/tools/llvm-link/Release+Asserts/llvm-link.o:llvm-link.cpp:function
loadIndex(llvm::LLVMContext&, llvm::Module const*): error: undefined
reference to
'llvm::object::FunctionIndexObjectFile::create(llvm::MemoryBufferRef,
llvm::LLVMContext&, llvm::Module const*, bool)'
/home/grosser/buildslave/perf-x86_64-penryn-O3-polly-fast/llvm.obj/tools/llvm-link/Release+Asserts/llvm-link.o:llvm-link.cpp:function
loadIndex(llvm::LLVMContext&, llvm::Module const*): error: undefined
reference to 'llvm::object::FunctionIndexObjectFile::takeIndex()'
I'm not sure why these are happening - I added Object to the requred
libraries in tools/llvm-link/LLVMBuild.txt and the LLVM_LINK_COMPONENTS
in tools/llvm-link/CMakeLists.txt. Confirmed for my build that these
symbols come out of libLLVMObject.a. What am I missing?
llvm-svn: 251841
Summary:
This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13974
llvm-svn: 251839
Summary:
Support for necessary linkage changes and symbol renaming during
ThinLTO function importing.
Also includes llvm-link support for manually importing functions
and associated llvm-link based tests.
Note that this does not include support for intelligently importing
metadata, which is currently imported duplicate times. That support will
be in the follow-on patch, and currently is ignored by the tests.
Reviewers: dexonsmith, joker.eph, davidxl
Subscribers: tobiasvk, tejohnson, llvm-commits
Differential Revision: http://reviews.llvm.org/D13515
llvm-svn: 251837
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality.
Consider a simple example shown below:
A
|
B⇆C
|
D
When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D.
This patch filters those cold blocks off from the loop chain by comparing the ratio:
LoopBBFreq / LoopFreq
to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa.
Differential revision: http://reviews.llvm.org/D11662
llvm-svn: 251833
1) PR25154. This is basically a repeat of PR18102, which was fixed in
r200201, and broken again by r234430. The latter changed which of the
store nodes was merged into from the first to the last. Thus, we now
also need to prefer merging a later store at a given address into the
target node, instead of an earlier one.
2) While investigating that, I also realized I'd introduced a bug in
r236850. There, I removed a check for alignment -- not realizing that
nothing except the alignment check was ensuring that none of the stores
were overlapping! This is a really bogus way to ensure there's no
aliased stores.
A better solution to both of these issues is likely to always use the
code added in the 'if (UseAA)' branches which rearrange the chain based
on a more principled analysis. I'll look into whether that can be used
always, but in the interest of getting things back to working, I think a
minimal change makes sense.
llvm-svn: 251816
Summary:
SCEV Predicates represent conditions that typically cannot be derived from
static analysis, but can be used to reduce SCEV expressions to forms which are
usable for different optimizers.
ScalarEvolution now has the rewriteUsingPredicate method which can simplify a
SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using
SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions
need to be made a new SCEV Predicate would be created and added to the set.
Each time after calling getSCEV, the user will call the rewriteUsingPredicate
method.
We add two types of predicates
SCEVPredicateSet - implements a set of predicates
SCEVEqualPredicate - tests for equality between two SCEV expressions
We use the SCEVEqualPredicate to re-implement stride versioning. Every time we
version a stride, we will add a SCEVEqualPredicate to the context.
Instead of adding specific stride checks, LoopVectorize now adds a more
generic SCEV check.
We only need to add support for this in the LoopVectorizer since this is the
only pass that will do stride versioning.
Reviewers: mzolotukhin, anemet, hfinkel, sanjoy
Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13595
llvm-svn: 251800
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. It turns out that the new code path taken due to
legalizing a scalar_to_vector of i64 -> v2i64 exposes a missing check in a
micro optimization to change a load followed by a scalar_to_vector into a
load and splat instruction on PPC.
llvm-svn: 251798
Summary:
The new function sys::path::user_cache_directory tries to discover
a directory suitable for cache storage for current system user.
On Windows and Darwin it returns a path to system-specific user cache directory.
On Linux it follows XDG Base Directory Specification, what is:
- use non-empty $XDG_CACHE_HOME env var,
- use $HOME/.cache.
Reviewers: chapuni, aaron.ballman, rafael
Subscribers: rafael, aaron.ballman, llvm-commits
Differential Revision: http://reviews.llvm.org/D13801
llvm-svn: 251784
1. Added a set of public interfaces in InstrProfRecord
class to access (read/write) value profile data.
2. Changed IndexedProfile reader and writer code to
use the newly defined interfaces and hide implementation
details.
3. Added a couple of unittests for value profiling:
- Test new interfaces to get and set value profile data
- Test value profile data merging with various scenarios.
No functional change is expected. The new interfaces will also
make it possible to change on-disk format of value prof data
to be more compact (to be submitted).
llvm-svn: 251771
Have `getConstantEvolutionLoopExitValue` work correctly with multiple
entry loops.
As far as I can tell, `getConstantEvolutionLoopExitValue` never did the
right thing for multiple entry loops; and before r249712 it would
silently return an incorrect answer. r249712 changed SCEV to fail an
assert on a multiple entry loop, and this change fixes the underlying
issue.
llvm-svn: 251770
Optimized <8 x i32> to <8 x i16>
<4 x i64> to < 4 x i32>
<16 x i16> to <16 x i8>
All these oprtrations use now AVX512F set (KNL). Before this change it was implemented with AVX2 set.
Differential Revision: http://reviews.llvm.org/D14108
llvm-svn: 251764
This adds support for COFF I386. This is sufficient for code execution in a
32-bit JIT, though, imported symbols need to custom lowered for the redirection.
llvm-svn: 251761
Prevent `createNodeFromSelectLikePHI` from creating SCEV expressions
that break LCSSA.
A better fix for the same issue is to teach SCEVExpander to not break
LCSSA by inserting PHI nodes at appropriate places. That's planned for
the future.
Fixes PR25360.
llvm-svn: 251756
The initial coverage checking code for sample records failed to count
records inside inlined profiles. This change fixes the oversight.
llvm-svn: 251752
This is a bit ugly, but has a few advantages:
* Archive is now easy to copy since there is no Archive -> Child -> Archive
loop.
* It makes it clear that we already checked for errors when finding the Child
data.
llvm-svn: 251750
from its pass harness by providing a lambda to query for AA results.
This allows the legacy pass to easily provide a lambda that uses the
special helpers to construct function AA results from a legacy CGSCC
pass. With the new pass manager (the next patch) the lambda just
directly wraps the intuitive query API.
llvm-svn: 251715
Summary:
When forming expressions for phi nodes having an incoming value from
outside the loop A and a value coming from the previous iteration B
we were forming an AddRec if:
- B was an AddRec
- the value A was equal to the value for B at iteration -1 (or equal
to the value of B shifted by one iteration, at iteration 0)
In this case, we were computing the expression to be the expression of
B, shifted by one iteration.
This changes generalizes the logic above by removing the restriction that
B needs to be an AddRec. For this we introduce two expression rewriters
that allow us to
- shift an expression by one iteration
- get the value of an expression at iteration 0
This allows us to get SCEV expressions for PHI nodes when these expressions
are not AddRecExprs.
Reviewers: sanjoy
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D14175
llvm-svn: 251700
Update the discriminator assignment algorithm
* If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
* If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.
original code:
; #1 int foo(int i) {
; #2 if (i == 3 || i == 5) return 100; else return 99;
; #3 }
; i == 3: discriminator 0
; i == 5: discriminator 2
; return 100: discriminator 1
; return 99: discriminator 3
llvm-svn: 251689
Update the discriminator assignment algorithm
* If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
* If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.
original code:
; #1 int foo(int i) {
; #2 if (i == 3 || i == 5) return 100; else return 99;
; #3 }
; i == 3: discriminator 0
; i == 5: discriminator 2
; return 100: discriminator 1
; return 99: discriminator 3
llvm-svn: 251685
* If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
* If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.
original code:
; #1 int foo(int i) {
; #2 if (i == 3 || i == 5) return 100; else return 99;
; #3 }
; i == 3: discriminator 0
; i == 5: discriminator 2
; return 100: discriminator 1
; return 99: discriminator 3
llvm-svn: 251680
Introduce LLVMSymbolizer::symbolizeInlinedCode() instead of switching
on PrintInlining option passed to the constructor. This will be needed
once we retrun structured data (instead of std::string) from
LLVMSymbolizer and move printing logic out.
llvm-svn: 251675
Summary:
This is mostly NFC. It is a first step in cleaning up LLVMSymbolize
library. It removes "ModuleInfo" class which bundles together ObjectFile
and its debug info context in favor of:
* abstract SymbolizableModule in public headers;
* SymbolizableObjectFile subclass in implementation.
Additionally, SymbolizableObjectFile is now created via factory, so we
can properly detect object parsing error at this stage instead of keeping
the broken half-parsed object. As a next step, we would be able to
propagate the error all the way back to the library user.
Further improvements might include:
* factoring out the logic of finding appropriate file with debug info
for a given object file, and caching all parsed object files into a
separate class [A].
* factoring out DILineInfo rendering [B].
This would make what is now a heavyweight "LLVMSymbolizer" a relatively
straightforward class, that calls into [A] to turn filepath into a
SymbolizableModule, delegates actual symbolization to concrete SymbolizableModule
implementation, and lets [C] render the result.
Reviewers: dblaikie, echristo, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14099
llvm-svn: 251662
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.
Differential Revision: http://reviews.llvm.org/D14050
llvm-svn: 251659
transformations in FunctionAttrs rather than building a new one each
time.
This isn't trivial because there are different heuristics from different
passes for exactly what set they want. The primary difference is whether
an *overridable* function completely disables the synthesis of
attributes. I've modeled this by directly testing for overridable, and
using the common set that excludes external and opt-none functions.
This does cause some changes by disabling more optimizations in the face
of opt-none. Specifically, we were still optimizing *calls* to opt-none
functions based on their attributes, just not the bodies. It seems
better to be conservative on both fronts given the intended semanticas
here (best effort to not assume or disturb anything). I've not tried to
test this change as it seems complex, brittle, and not important to the
implicit contract of opt-none. Instead, it seems more like a choice that
should be dictated by the simplified implementation and the change to be
acceptable differences within the space of opt-none.
A big benefit here is that these transformations no longer rely on the
legacy pass manager's SCC types, they just work on generic sets of
function pointers. This will make it easy to re-use their logic in the
new pass manager.
I've also made the transforms static functions instead of members where
trivial while I was touching the signatures.
llvm-svn: 251640
This was discovered to be necessary while running memchr-01.ll with
-verify-machinstrs, because it is not allowed to have a phys reg live
accross block boundaries while on SSA form, if the register is
allocatable (expect in entry block and landing pads).
In this test case, stringRRE pseudos are expanded after isel by adding
a loop block which produces a live out CC register. To make the test
pass, it was also necessary to not say that StringRRELoop pseudo uses
R0L, this is only true for the StringRRE opcode.
-verify-machineinstrs added to memchr-01.ll test.
New test case int-cmp-51.ll to test that MachineCSE can eliminate
an identical compare (which it couldn't do before).
Reviewed by Ulrich Weigand
llvm-svn: 251634
Summary:
This commit resolves wrong opcodes for ll and sc instructions for r6 architecutres, which were generated in method MipsTargetLowering::emitAtomicBinary.
Author: Jelena.Losic
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13593
llvm-svn: 251629
Summary:
ARMv6KZ cores were set up incorrectly in ARM.td; also, the SMI mnemonic
(the old name for SMC, as defined in ARMv6KZ) wasn't supported.
Reviewers: jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D14154
llvm-svn: 251627
This patch unify the 39-bit and 42-bit mapping for aarch64 to use only
one instrumentation algorithm. This removes compiler flag
SANITIZER_AARCH64_VMA requirement for MSAN on aarch64.
The mapping to use now is for 39 and 42-bits:
0x00000000000ULL-0x01000000000ULL MappingDesc::INVALID
0x01000000000ULL-0x02000000000ULL MappingDesc::SHADOW
0x02000000000ULL-0x03000000000ULL MappingDesc::ORIGIN
0x03000000000ULL-0x04000000000ULL MappingDesc::SHADOW
0x04000000000ULL-0x05000000000ULL MappingDesc::ORIGIN
0x05000000000ULL-0x06000000000ULL MappingDesc::APP
0x06000000000ULL-0x07000000000ULL MappingDesc::INVALID
0x07000000000ULL-0x08000000000ULL MappingDesc::APP
And only for 42-bits:
0x08000000000ULL-0x09000000000ULL MappingDesc::INVALID
0x09000000000ULL-0x0A000000000ULL MappingDesc::SHADOW
0x0A000000000ULL-0x0B000000000ULL MappingDesc::ORIGIN
0x0B000000000ULL-0x0F000000000ULL MappingDesc::INVALID
0x0F000000000ULL-0x10000000000ULL MappingDesc::APP
0x10000000000ULL-0x11000000000ULL MappingDesc::INVALID
0x11000000000ULL-0x12000000000ULL MappingDesc::APP
0x12000000000ULL-0x17000000000ULL MappingDesc::INVALID
0x17000000000ULL-0x18000000000ULL MappingDesc::SHADOW
0x18000000000ULL-0x19000000000ULL MappingDesc::ORIGIN
0x19000000000ULL-0x20000000000ULL MappingDesc::INVALID
0x20000000000ULL-0x21000000000ULL MappingDesc::APP
0x21000000000ULL-0x26000000000ULL MappingDesc::INVALID
0x26000000000ULL-0x27000000000ULL MappingDesc::SHADOW
0x27000000000ULL-0x28000000000ULL MappingDesc::ORIGIN
0x28000000000ULL-0x29000000000ULL MappingDesc::SHADOW
0x29000000000ULL-0x2A000000000ULL MappingDesc::ORIGIN
0x2A000000000ULL-0x2B000000000ULL MappingDesc::APP
0x2B000000000ULL-0x2C000000000ULL MappingDesc::INVALID
0x2C000000000ULL-0x2D000000000ULL MappingDesc::SHADOW
0x2D000000000ULL-0x2E000000000ULL MappingDesc::ORIGIN
0x2E000000000ULL-0x2F000000000ULL MappingDesc::APP
0x2F000000000ULL-0x39000000000ULL MappingDesc::INVALID
0x39000000000ULL-0x3A000000000ULL MappingDesc::SHADOW
0x3A000000000ULL-0x3B000000000ULL MappingDesc::ORIGIN
0x3B000000000ULL-0x3C000000000ULL MappingDesc::APP
0x3C000000000ULL-0x3D000000000ULL MappingDesc::INVALID
0x3D000000000ULL-0x3E000000000ULL MappingDesc::SHADOW
0x3E000000000ULL-0x3F000000000ULL MappingDesc::ORIGIN
0x3F000000000ULL-0x40000000000ULL MappingDesc::APP
And although complex it provides a better memory utilization that
previous one.
llvm-svn: 251624
Summary:
The microMIPS register class GPRMM16 does not contain the $zero register.
However, MipsSEDAGToDAGISel::replaceUsesWithZeroReg() would replace uses
of the $dst register:
[d]addiu, $dst, $zero, 0
with the $zero register, without checking for membership in the register
class of the target machine operand.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13984
llvm-svn: 251622
Since the verifier will give false reports if it incorrectly thinks MI is
loading or storing using an FI, it is necessary to scan memoperands and
find out how the FI is used in the instruction. This should be relatively
rare.
Needed to make CodeGen/SystemZ/spill-01.ll pass, which now runs with this flag.
Reviewed by Quentin Colombet.
llvm-svn: 251620
Summary:
Conversion opcode name format should be f64.convert_u/i64 not f64_convert_u
Author: s3ththompson
Reviewers: jfb
Subscribers: sunfish, jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D14160
llvm-svn: 251613
Clang driver now injects -u<hook_var> flag in the linker
command line, in which case user function is not needed
any more.
Differential Revision: http://reviews.llvm.org/D14033
llvm-svn: 251612
This was a layering violation in ScheduleDAGInstrs (and
MachineSchedulerBase) they both shouldn't know directly whether they are
used by the PostMachineScheduler or the MachineScheduler.
llvm-svn: 251608
Somewhat shockingly for an analysis pass which is computing constant ranges, LVI did not understand the ranges provided by range metadata.
As part of this change, I included a change to CVP primarily because doing so made it much easier to write small self contained test cases. CVP was previously only handling the non-local operand case, but given that LVI can sometimes figure out information about instructions standalone, I don't see any reason to restrict this. There could possibly be a compile time impact from this, but I suspect it should be minimal. If anyone has an example which substaintially regresses, please let me know. I could restrict the block local handling to ICmps feeding Terminator instructions if needed.
Note that this patch continues a somewhat bad practice in LVI. In many cases, we know facts about values, and separate context sensitive facts about values. LVI makes no effort to distinguish and will frequently cache the same value fact repeatedly for different contexts. I would like to change this, but that's a large enough change that I want it to go in separately with clear documentation of what's changing. Other examples of this include the non-null handling, and arguments.
As a meta comment: the entire motivation of this change was being able to write smaller (aka reasonable sized) test cases for a future patch teaching LVI about select instructions.
Differential Revision: http://reviews.llvm.org/D13543
llvm-svn: 251606
Follow on to http://reviews.llvm.org/D13074, implementing something pointed out by Sanjoy. His truth table from his comment on that bug summarizes things well:
LHS | RHS | LHS >=s RHS | LHS implies RHS
0 | 0 | 1 (0 >= 0) | 1
0 | 1 | 1 (0 >= -1) | 1
1 | 0 | 0 (-1 >= 0) | 0
1 | 1 | 1 (-1 >= -1) | 1
The key point is that an "i1 1" is the value "-1", not "1".
Differential Revision: http://reviews.llvm.org/D13756
llvm-svn: 251597
The most common use case is when eliminating redundant range checks in an example like the following:
c = a[i+1] + a[i];
Note that all the smarts of the transform (the implication engine) is already in ValueTracking and is tested directly through InstructionSimplify.
Differential Revision: http://reviews.llvm.org/D13040
llvm-svn: 251596
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.
llvm-svn: 251592
We cannot form ctr-based loops around function calls, including calls to
__tls_get_addr used for PIC TLS variables. References to such TLS variables,
however, might be buried within constant expressions, and so we need to search
the entire constant expression to be sure that no references to such TLS
variables exist.
Fixes PR25256, reported by Eric Schweitz. This is a slightly-modified version
of the patch suggested by Eric in the bug report, and a test case I created.
llvm-svn: 251582
As a follow-up to r251566, do the same for the other optionally-supported
register classes (mostly for vector registers). Don't return an unavailable
register class (which would cause an assert later), but fail cleanly when
provided an unsupported inline asm constraint.
llvm-svn: 251575
At the LLVM level this ABI is essentially a minimal modification of AAPCS to
support 16-byte alignment for vector types and the stack.
llvm-svn: 251570
These MachO file directives are used by linkers and other tools to provide
compatibility information, much like the existing .ios_version_min and
.macosx_version_min.
llvm-svn: 251569
This adds the flag -mllvm -sample-profile-check-coverage=N to the
SampleProfile pass. N is the percent of input sample records that the
user expects to apply. If the pass does not use N% (or more) of the
sample records in the input, it emits a warning.
This is useful to detect some forms of stale profiles. If the code has
drifted enough from the original profile, there will be records that do
not match the IR anymore.
This will not detect cases where a sample profile record for line L is
referring to some other instructions that also used to be at line L.
llvm-svn: 251568
It looks like this broke the stage 2 builder:
http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/6989/
Original commit message:
AliasSetTracker does not need to convert the access mode to ModRefAccess if the
new visited UnknownInst has only 'REF' modrefinfo to existing pointers in the
sets.
Patch by Andrew Zhogin!
llvm-svn: 251562
This teaches SCEV to compute //max// backedge taken counts for loops
like
for (int i = k; i != 0; i >>>= 1)
whatever();
SCEV yet cannot represent the exact backedge count for these loops, and
this patch does not change that. This is really geared towards teaching
SCEV that loops like the above are *not* infinite.
llvm-svn: 251558
Summary:
If P branches to Q conditional on C and Q branches to R conditional on
C' and C => C' then the branch conditional on C' can be folded to an
unconditional branch.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13972
llvm-svn: 251557
Add a couple of helper methods to make the primary
raw profile reader interface's implementation more
readable. It also hides more format details. This
patch has no functional change.
llvm-svn: 251546
The pass was keeping around a lot of per-function data (visited blocks,
edges, dominance, etc) that is just taking up memory for no reason. In
fact, from function to function it could potentially confuse the
propagator since some maps are indexed by line offsets which can be
common between functions.
llvm-svn: 251531
In getArgModRefInfo we consider all arguments as having MRI_ModRef.
However for arguments marked with readonly attribute we can return
more precise answer - MRI_Ref.
Differential Revision: http://reviews.llvm.org/D13992
llvm-svn: 251525
I think these were affected by a change way back when to stop printing newlines in Value::dump() by default. This change simply allows the debug output to be readable.
NFC.
llvm-svn: 251517
Summary:
This patch handles assembly and disassembly, but not codegen, as of yet.
Additionally, it fixes a bug whereby SP and PC as shifted-reg operands
were treated as predictable in ARMv7 Thumb; and it enables the tests
for invalid and unpredictable instructions to run on both ARMv7 and ARMv8.
Reviewers: jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D14141
llvm-svn: 251516
When checking if an indirect global (a global with pointer type) is only assigned by allocation functions, first check if the global is itself initialized. If it is, it's not only assigned by allocation functions.
This fixes PR25309. Thanks to David Majnemer for reducing the test case!
llvm-svn: 251508
Summary:
This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13974
llvm-svn: 251492
Change InstrProfReaderIndex from typedef into a wrapper
class with helper methods. This makes the index profile
reader code more readable. It also hides the implementation
detail of the index format and make it more flexible to allow
support different (or more than one) format in the future.
llvm-svn: 251491
This also lets us remove the versions of the functions that took a statically sized array as we can rely on ArrayRef implicit conversion now.
llvm-svn: 251490
cntlz is the old POWER mnemonic. cntlzw is the PowerPC mnemonic.
This change fixes an issue when -no-integrated-as: The opcode cntlz is
unrecognized by gas
Alias the POWER mnemonic cntlz[.] to the PowerPC mnemonic cntlzw[.]
This is done for because the POWER cntlz mnemonic has be used by LLVM for
a very long time. We need to make sure that assembly programs
that are using the cntlz[.] do not break with this change.
Change PowerPC tests to reflect the insn change from cntlz to cntlzw.
Add assembly test to verify cntlz[.] is encoded correctly.
Patch by Tom Rix!
llvm-svn: 251489
Summary: This will allow a later patch to `JumpThreading` use this functionality.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13971
llvm-svn: 251488
Summary:
Don't call `computeKnownBitsFromRangeMetadata` for extended loads --
this can cause a mismatch between the width of the !range metadata and
the width of the APInt's accumulating `KnownZero` (and `KnownOne` in the
future). This isn't a problem now, but will be after a future change.
Note: this can be made more aggressive in the future.
Reviewers: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14107
llvm-svn: 251486
r248010 changed the -debug output to use short ids, but did not
similarly modify the graph printer. Change to be consistent, for ease of
cross-reference.
llvm-svn: 251465
CatchReturnInst has side-effects: it runs a destructor. This destructor
could conceivably run forever/call exit/etc. and should not be removed.
llvm-svn: 251461