This patch changes the analysis diagnostics produced when loops with
floating-point recurrences or memory operations are identified. The new messages
say "cannot prove it is safe to reorder * operations; allow reordering by
specifying #pragma clang loop vectorize(enable)". Depending on the type of
diagnostic the message will include additional options such as ffast-math or
__restrict__.
This patch also allows the vectorize(enable) pragma to override the low pointer
memory check threshold. When the hint is given a higher threshold is used.
See the clang patch for the options produced for each diagnostic.
llvm-svn: 246187
Constant propagation for single precision math functions (such as
tanf) is already working, but was not enabled. This patch enables
these for many single-precision functions, and adds respective test
cases.
Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf
exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf
llvm-svn: 246186
Constant propagation for single precision math functions (such as
tanf) is already working, but was not enabled. This patch enables
these for many single-precision functions, and adds respective test
cases.
Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf
exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf
llvm-svn: 246158
Unlike scalar operations, we can perform vector operations on element types that
are smaller than the native integer types. We type-promote scalar operations if
they are smaller than a native type (e.g., i8 arithmetic is promoted to i32
arithmetic on Arm targets). This patch detects and removes type-promotions
within the reduction detection framework, enabling the vectorization of small
size reductions.
In the legality phase, we look through the ANDs and extensions that InstCombine
creates during promotion, keeping track of the smaller type. In the
profitability phase, we use the smaller type and ignore the ANDs and extensions
in the cost model. Finally, in the code generation phase, we truncate the result
of the reduction to allow InstCombine to rewrite the entire expression in the
smaller type.
This fixes PR21369.
http://reviews.llvm.org/D12202
Patch by Matt Simpson <mssimpso@codeaurora.org>!
llvm-svn: 246149
Globals in address spaces other than one may have 0 as a valid address,
so we should not assume that they can be null.
Reviewed by Philip Reames.
llvm-svn: 246137
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence.
We do need to make sure that stores before the fence can't be eliminated even if there's another store to the same location after the fence. In theory, we could reorder the second store above the fence and *then* eliminate the former, but we can't do this if the stores are on opposite sides of the fence.
Note: While more aggressive then what's there, this patch is still implementing a really conservative ordering. In particular, I'm not trying to exploit undefined behavior via races, or the fact that the LangRef says only 'atomic' accesses are ordered w.r.t. fences.
Differential Revision: http://reviews.llvm.org/D11434
llvm-svn: 246134
When computing base pointers, we introduce new instructions to propagate the base of existing instructions which might not be bases. However, the algorithm doesn't make any effort to recognize when the new instruction to be inserted is the same as an existing one already in the IR. Since this is happening immediately before rewriting, we don't really have a chance to fix it after the pass runs without teaching loop passes about statepoints.
I'm really not thrilled with this patch. I've rewritten it 4 different ways now, but this is the best I've come up with. The case where the new instruction is just the original base defining value could be merged into the existing algorithm with some complexity. The problem is that we might have something like an extractelement from a phi of two vectors. It may be trivially obvious that the base of the 0th element is an existing instruction, but I can't see how to make the algorithm itself figure that out. Thus, I resort to the call to SimplifyInstruction instead.
Note that we can only adjust the instructions we've inserted ourselves. The live sets are still being tracked in side structures at this point in the code. We can't easily muck with instructions which might be in them. Long term, I'm really thinking we need to materialize the live pointer sets explicitly in the IR somehow rather than using side structures to track them.
Differential Revision: http://reviews.llvm.org/D12004
llvm-svn: 246133
This patch ensures that every analysis diagnostic produced by the vectorizer
will be printed if the loop has a vectorization hint on it. The condition has
also been improved to prevent printing when a disabling hint is specified.
llvm-svn: 246132
This is a one-line-change patch that moves the update to UnhandledWeights to the correct position: it should be updated for all clusters instead of just range clusters.
Differential Revision: http://reviews.llvm.org/D12391
llvm-svn: 246129
As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine.
Note: Duplicate case values are disallowed by the LangRef and verifier.
Differential Revision: http://reviews.llvm.org/D11995
llvm-svn: 246125
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.
Reviewers: jholewinski, meheff, jingyue
Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D12377
llvm-svn: 246107
Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assign this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly.
Differential Revision: http://reviews.llvm.org/D12166#fae6eca7
llvm-svn: 246104
Things of note:
- Other linkage types aren't handled yet. We'll figure it out with dynamic linking.
- Special LLVM globals are either ignored, or error out for now.
- TLS isn't supported yet (WebAssembly will have threads later).
- There currently isn't a syntax for alignment, I left it in a comment so it's easy to hook up.
- Undef is convereted to whatever the type's appropriate null value is.
- assert versus report_fatal_error: follow what other AsmPrinters do, and assert only on what should have been caught elsewhere.
llvm-svn: 246092
This was causing problems when some functions use a GuardReg and some
don't as can happen when mixing SelectionDAG and FastISel generated
functions.
llvm-svn: 246075
If you're going to realign %sp to get object alignment properly (which
the code does), and stack offsets and alignments are calculated going
down from %fp (which they are), then the total stack size had better
be a multiple of the alignment. LLVM did indeed ensure that.
And then, after aligning, the sparc frame code added 96 (for sparcv8)
to the frame size, making any requested alignment of 64-bytes or
higher *guaranteed* to be misaligned. The test case added with r245668
even tests this exact scenario, and asserted the incorrect behavior,
which I somehow failed to notice. D'oh.
This change fixes the frame lowering code to align the stack size
*after* adding the spill area, instead.
Differential Revision: http://reviews.llvm.org/D12349
llvm-svn: 246042
This is a fix for disassembling unusual instruction sequences in 64-bit
mode w.r.t the CALL rel16 instruction. It might be desirable to move the
check somewhere else, but it essentially mimics the special case
handling with JCXZ in 16-bit mode.
The current behavior accepts the opcode size prefix and causes the
call's immediate to stop disassembling after 2 bytes. When debugging
sequences of instructions with this pattern, the disassembler output
becomes extremely unreliable and essentially useless (if you jump midway
into what lldb thinks is a unified instruction, you'll lose %rip). So we
ignore the prefix and consume all 4 bytes when disassembling a 64-bit
mode binary.
Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S.
is defined as:
Indicates an instruction syntax that requires an address override
prefix in 64-bit mode and is not supported. Using an address
override prefix in 64-bit mode may result in model-specific
execution behavior. (Vol. 2A 3-7)
Since 0x66 is an operand override prefix we should be OK (although we
may want to warn about 0x67 prefixes to 0xe8). On the CPUs I tested
with, they all ignore the 0x66 prefix in 64-bit mode.
Patch by Matthew Barney!
Differential Revision: http://reviews.llvm.org/D9573
llvm-svn: 246038
This was only added to preserve the old ScalarRepl's use of SSAUpdater
which was originally to avoid use of dominance frontiers. Now, we only
need a domtree, and we'll need a domtree right after this pass as well
and so it makes perfect sense to always and only use the dom-tree
powered mem2reg. This was flag-flipper earlier and has stuck reasonably
so I wanted to gut the now-dead code out of SROA before we waste more
time with it. Among other things, this will make passmanager porting
easier.
llvm-svn: 246028
The binaries containing the linked DWARF generated by dsymutil are not
standard relocatable object files like emitted did previsously. They should be
dSYM companion files, which means they have a different file type in the
header, but also a couple other peculiarities:
- they contain the segments and sections from the original binary in their
load commands, but not the actual contents. This means they get an address
and a size, but their offset is always 0 (but these are not virtual sections)
- they also conatin all the defined symbols from the original binary
This makes MC a really bad fit to emit these kind of binaries. The approach
that was used in this patch is to leverage MC's section layout for the
debug sections, but to use a replacement for MachObjectWriter that lives
in MachOUtils.cpp. Some of the low-level helpers from MachObjectWriter
were reused too.
llvm-svn: 246012
llvm-dsymutil needs to emit dSYM companion bundles. These are binary files
that replicate some of the orignal binary file properties (sections and
symbols). To get acces to these properties, pass the binary path in the
debug map.
llvm-svn: 246011
Summary: When comparing basic blocks, there is an additional check that two Value*'s should have the same ID, which interferes with merging equivalent constants of different kinds (such as a ConstantInt and a ConstantPointerNull in the included testcase). The cmpValues function already ensures that the two values in each function are the same, so removing this check should not cause incorrect merging.
Also, the type comparison is redundant, based on reviewing the code and testing on the test suite and several large LTO bitcodes.
Author: jrkoenig
Reviewers: nlewycky, jfb, dschuff
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D12302
llvm-svn: 246001
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.
Reviewers: nadav, asl, eugenis
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1622
llvm-svn: 245990
There was an issue in the test setup because the test requires an arch that
wasn't filtered by the lit.local.cfg, but given the set of bots that failed,
I'm not confident this is the (only) issue. So this commit also adds more
output to the test to help me track down the failure if it happens again.
Original commit message:
[dsymutil] Rewrite thumb triple names in user visible messages.
We autodetect triples from the input file(s) while reading the mach-o debug map.
As we need to create a Target from those triples, we always chose the thumb
variant (because the arm variant might not be 'instantiable' eg armv7m). The
user visible architecture names should still be 'arm' and not 'thumb' variants
though.
llvm-svn: 245988
Extend signed relational comparison instrumentation with a special
case for comparisons with -1. This fixes an MSan false positive when
such comparison is used as a sign bit test.
https://llvm.org/bugs/show_bug.cgi?id=24561
llvm-svn: 245980
When lowering switch statement, if bit tests are used then LLVM will always generates a jump to the default statement in the last bit test. However, this is not necessary when all cases in bit tests cover a contiguous range. This is because when generating the bit tests header MBB, there is a range check that guarantees cases in bit tests won't go outside of [low, high], where low and high are minimum and maximum case values in the bit tests. This patch checks if this is the case and then doesn't emit jump to default statement and hence saves a bit test and a branch.
Differential Revision: http://reviews.llvm.org/D12249
llvm-svn: 245976
This reverts commit r245960.
Multiple bots are failing on the new test. It seemd like llvm-dsymutil exits with an error. Investigating.
llvm-svn: 245964
We autodetect triples from the input file(s) while reading the mach-o debug map.
As we need to create a Target from those triples, we always chose the thumb
variant (because the arm variant might not be 'instantiable' eg armv7m). The
user visible architecture names should still be 'arm' and not 'thumb' variants
though.
llvm-svn: 245960
The loop minimum iterations check below ensures the loop has enough trip count so the generated
vector loop will likely be executed, and it covers the overflow check.
Differential Revision: http://reviews.llvm.org/D12107.
llvm-svn: 245952
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.
A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx
This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449
Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.
Differential Revision: http://reviews.llvm.org/D12288
llvm-svn: 245950
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of ftol2 for fptoui, which was
wrong to begin with, as ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.
Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316
llvm-svn: 245924
It doesn't solve the problem, when for example we load something, and
then assume that it is the same as some constant value, because
globalopt will fail on unknown load instruction. The proposed solution
would be to skip some instructions that we can't evaluate and they are
safe to skip (f.e. load, assume and many others) and see if they are
required to perform optimization (f.e. we don't care about ephemeral
instructions that may appear using @llvm.assume())
http://reviews.llvm.org/D12266
llvm-svn: 245919
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.
llvm-svn: 245907
Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D12298
llvm-svn: 245886
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass. The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation. I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.
llvm-svn: 245862
The FP16_TO_FP node only uses the bottom 16 bits of its input, so the
following pattern can be optimised by removing the AND:
(FP16_TO_FP (AND op, 0xffff)) -> (FP16_TO_FP op)
This is a common pattern for ARM targets when functions have __fp16
arguments, as they are passed as floats (so that they get passed in the
correct registers), but then bitcast and truncated to ignore the top 16
bits.
llvm-svn: 245832
Not only do we not need to do anything to read correct values from the
object files, but the current logic actually wrongly applies twice the
section base address when there is no LoadedObjectInfo passed to the
DWARFContext creation (as the added test shows).
Simply do not apply any relocations on the mach-o debug info if there is
no load offset to apply.
llvm-svn: 245807
This patch adds all the refactored tests in new files, the old
tests will be removed by a followup commit.
Thanks to D. Blaikie for all the feedback.
llvm-svn: 245803
Summary:
WinEHPrepare is going to require that cleanuppad and catchpad produce values
of token type which are consumed by any cleanupret or catchret exiting the
pad. This change updates the signatures of those operators to require/enforce
that the type produced by the pads is token type and that the rets have an
appropriate argument.
The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and
similarly for `CleanupReturnInst`/`CleanupPadInst`). To accommodate that
restriction, this change adds a notion of an operator constraint to both
LLParser and BitcodeReader, allowing appropriate sentinels to be constructed
for forward references and appropriate error messages to be emitted for
illegal inputs.
Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad
predecessor must have no other predecessors; this ensures that WinEHPrepare
will see the expected linear relationship between sibling catches on the
same try.
Lastly, remove some superfluous/vestigial casts from instruction operand
setters operating on BasicBlocks.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12108
llvm-svn: 245797
Some debug info was drastically out of date, from the days where we used
to emit a list of length one (with a single null entry) rather than an
empty list (or, more recently, no list at all) for list fields that have
no elements.
llvm-svn: 245796
There was already a good error path for this. Added a test for it & made
a minor code change to ensure the error path was actually reached,
rather than crashing before we got that far.
llvm-svn: 245795
Summary:
__shared__ variable may now emit undef value as initializer, do not
throw error on that.
Test Plan: test/CodeGen/NVPTX/global-addrspace.ll
Patch by Xuetian Weng
Reviewers: jholewinski, tra, jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D12242
llvm-svn: 245785
Summary:
Merge functions previously relied on unsigned comparisons of pointer values to
order functions. This caused observable non-determinism in the compiler for
large bitcode programs. Basically, opt -mergefuncs program.bc | md5sum produces
different hashes when run repeatedly on the same machine. Differing output was
observed on three large bitcodes, but it was less frequent on the smallest file.
It is possible that this only manifests on the large inputs, hence remaining
undetected until now.
This patch fixes this by removing (almost, see below) all places where
comparisons between pointers are used to order functions. Most of these changes
are local, but the comparison of global values requires assigning an identifier
to each local in the order it is visited. This is very similar to the way the
comparison function identifies Value*'s defined within a function. Because the
order of visiting the functions and their subparts is deterministic, the
identifiers assigned to the globals will be as well, and the order of functions
will be deterministic.
With these changes, there is no more observed non-determinism. There is also
only minor slowdowns (negligible to 4%) compared to the baseline, which is
likely a result of the fact that global comparisons involve hash lookups and not
just pointer comparisons.
The one caveat so far is that programs containing BlockAddress constants can
still be non-deterministic. It is not clear what the right solution is here. In
particular, even if the global numbers are used to order by function, we still
need a way to order the BasicBlock*'s. Unfortunately, we cannot just bail out
and fail to order the functions or consider them equal, because we require a
total order over functions. Note that programs with BlockAddress constants are
relatively rare, so the impact of leaving this in is minor as long as this pass
is opt-in.
Author: jrkoenig
Reviewers: nlewycky, jfb, dschuff
Subscribers: jevinskie, llvm-commits, chapuni
Differential revision: http://reviews.llvm.org/D12168
llvm-svn: 245762
SCEV expansion can invalidate previously expanded values. For example
in SCEVExpander::ReuseOrCreateCast, if we already have the requested
cast value but it's not at the desired location, a new cast is inserted
and the old cast will be invalidated.
Therefore, when expanding the bounds for the pointers, a later entry can
invalidate the IR value for an earlier one. The fix is to store a value
handle rather than the value itself.
The newly added test has a more detailed description of how the bug
triggers.
This bug can have a negative but potentially highly variable performance
impact in Loop Distribution. Because one of the bound values was
invalidated and is an undef expression now, InstCombine is free to
transform the array overlap check:
Start0 <= End1 && Start1 <= End0
into:
Start0 <= End1
So depending on the runtime location of the arrays, we would detect a
conflict and fall back on the original loop of the versioned loop.
Also tested compile time with SPEC2006 LTO bc files.
llvm-svn: 245760
We can wait on either VM, EXP or LGKM.
The waits are independent.
Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.
Here's an example of subtle perf reduction this patch solves:
This is without the patch:
buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.
Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.
Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.
Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.
Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.
Patch by: Axel Davy
Differential Revision: http://reviews.llvm.org/D11883
llvm-svn: 245755
The original checkin was buggy, this change has a fix.
Original commit message:
[InstCombine] Transform A & (L - 1) u< L --> L != 0
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245753
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).
Fixes the test case from PR24542 (although that may have been over-reduced).
llvm-svn: 245741
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
llvm-svn: 245733
This will confirm that the patch in D12154 is actually NFC.
It will also confirm that the proposed changes for the AMD chips
are behaving as expected.
llvm-svn: 245704
This is intended to improve code generation for GEPs, as the index value is
shifted by the element size and in GEPs of multi-dimensional arrays the index
of higher dimensions is multiplied by the lower dimension size.
Differential Revision: http://reviews.llvm.org/D12197
llvm-svn: 245689
Note: I do not implement a base pointer, so it's still impossible to
have dynamic realignment AND dynamic alloca in the same function.
This also moves the code for determining the frame index reference
into getFrameIndexReference, where it belongs, instead of inline in
eliminateFrameIndex.
[Begin long-winded screed]
Now, stack realignment for Sparc is actually a silly thing to support,
because the Sparc ABI has no need for it -- unlike the situation on
x86, the stack is ALWAYS aligned to the required alignment for the CPU
instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9.
However, LLVM unfortunately implements user-specified overalignment
using stack realignment support, so for now, I'm going to go along
with that tradition. GCC instead treats objects which have alignment
specification greater than the maximum CPU-required alignment for the
target as a separate block of stack memory, with their own virtual
base pointer (which gets aligned). Doing it that way avoids needing to
implement per-target support for stack realignment, except for the
targets which *actually* have an ABI-specified stack alignment which
is too small for the CPU's requirements.
Further unfortunately in LLVM, the default canRealignStack for all
targets effectively returns true, despite that implementing that is
something a target needs to do specifically. So, the previous behavior
on Sparc was to silently ignore the user's specified stack
alignment. Ugh.
Yet MORE unfortunate, if a target actually does return false from
canRealignStack, that also causes the user-specified alignment to be
*silently ignored*, rather than emitting an error.
(I started looking into fixing that last, but it broke a bunch of
tests, because LLVM actually *depends* on having it silently ignored:
some architectures (e.g. non-linux i386) have smaller stack alignment
than spilled-register alignment. But, the fact that a register needs
spilling is not known until within the register allocator. And by that
point, the decision to not reserve the frame pointer has been frozen
in place. And without a frame pointer, stack realignment is not
possible. So, canRealignStack() returns false, and
needsStackRealignment() then returns false, assuming everyone can just
go on their merry way assuming the alignment requirements were
probably just suggestions after-all. Sigh...)
Differential Revision: http://reviews.llvm.org/D12208
llvm-svn: 245668
The module splitter splits a module into linkable partitions. It will
be used to implement parallel LTO code generation.
This initial version of the splitter does not attempt to deal with the
somewhat subtle symbol visibility issues around module splitting. These
will be dealt with in a future change.
Differential Revision: http://reviews.llvm.org/D12132
llvm-svn: 245662
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.
This fixes http://llvm.org/PR24459
llvm-svn: 245641
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.
llvm-svn: 245640
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245635
Fixes PR23464: one way to use the broadcast intrinsics is:
_mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));
We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.
Differential Revision: http://reviews.llvm.org/D10557
llvm-svn: 245613
Summary:
Add an LSR test that exercises isTruncateFree. Without this change, LSR creates
another indvar representing the truncated value.
Reviewers: jholewinski, eliben
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D12058
llvm-svn: 245611
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and cleanup the definitions.
Differential Revision: http://reviews.llvm.org/D10555
llvm-svn: 245606
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.
Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.
llvm-svn: 245589
Caught by the famous "DebugLoc describes the currect SubProgram" assertion.
When GVN is removing a nonlocal load it updates the debug location of the
SSA value it replaced the load with with the one of the load. In the
testcase this actually overwrites a valid debug location with an empty one.
In reality GVN has to make an arbitrary choice between two equally valid
debug locations. This patch changes to behavior to only update the
location if the value doesn't already have a debug location.
llvm-svn: 245588
Summary: We know that -x & 1 is equivalent to x & 1, avoid using negation for testing if a negative integer is even or odd.
Reviewers: majnemer
Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D12156
llvm-svn: 245569
COMISD should receive QWORD because it is defined as
(V)COMISD xmm1, xmm2/m64
COMISS should receive DWORD because it is defined as
(V)COMISS xmm1, xmm2/m32
Differential Revision: http://reviews.llvm.org/D11712
llvm-svn: 245551
Usually DSE is not supposed to remove lifetime intrinsics, but it's
actually ok to remove them for dead objects in terminating blocks,
because they convey no extra information there. Until we hit a lifetime
start that cannot be removed, that is. Because from that point on the
lifetime intrinsics become interesting again, e.g. for stack coloring.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11710
llvm-svn: 245542
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).
Fixes PR24225.
llvm-svn: 245535
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.
llvm-svn: 245530
This commit modifies the serialization syntax so that the global IR values in
machine memory operands use the global value '@<name>' syntax instead of the
current '%ir.<name>' syntax.
The unnamed global IR values are handled by this commit as well, as the
existing global value parsing method can parse the unnamed globals already.
llvm-svn: 245527
The global IR values in machine memory operands should use the global value
'@<name>' syntax instead of the current '%ir.<name>' syntax.
However, the global value call entry pseudo source values use the global value
syntax already. Therefore, the syntax for the call entry pseudo source values
has to be changed so that the global values and call entry global value PSVs
can be parsed without ambiguities.
llvm-svn: 245526
We still need to add constant folding of vector comparisons to fold the tests for targets that don't support the respective min/max nodes
I needed to update 2011-12-06-AVXVectorExtractCombine to load a vector instead of using a constant vector to prevent it folding
Differential Revision: http://reviews.llvm.org/D12118
llvm-svn: 245503
We are already falling back to SelectionDAG when encountering an shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.
This fixes rdar://problem/22345295.
llvm-svn: 245499
We don't do a great job with >= 0 comparisons against zero when the
result is used as an i8.
Given something like:
void f(long long LL, bool *B) {
*B = LL >= 0;
}
We used to generate:
shrq $63, %rdi
xorb $1, %dil
movb %dil, (%rsi)
Now we generate:
testq %rdi, %rdi
setns (%rsi)
Differential Revision: http://reviews.llvm.org/D12136
llvm-svn: 245498
Previously WebAssembly's datalayout string had -v128:8:128. This had been an
attempt to declare a certain level of support for unaligned SIMD accesses.
However, clang makes its own determinations for SIMD alignment that are
independent of the datalayout string, so this wasn't actually meaningful.
llvm-svn: 245494
Check to see if this is a CONCAT_VECTORS of a bunch of EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector inputs come from at most two distinct vectors the same size as the result, attempt to turn this into a legal shuffle.
Differential Revision: http://reviews.llvm.org/D12125
llvm-svn: 245490
This commit serializes the machine instruction's register operand ties.
The ties are printed out only when the instructon has register ties that are
different from the ties that are specified in the instruction's description.
llvm-svn: 245482
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.
llvm-svn: 245481
The defined registers are already serialized - they are represented by placing
them before the '=' in a machine instruction. However, certain instructions like
INLINEASM can have defined register operands after the '=', so this commit
introduces the 'def' register flag for such operands.
llvm-svn: 245480
Reintroduce r245442. Remove an overly conservative assertion introduced
in r245442. We could replace the assertion to use `shareSameRegisterFile`
instead, but in that point in `insertPHI` we already lost the original
Def subreg to check against. So drop the assertion completely.
Original commit message:
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 245479
Since r244955, we try to use the short-form ErrorInfo when both
tries failed, and the long-form match failed on a suffix operand.
However, this means we sometimes mix ErrorInfo and MatchResult
(one manifestation of this being PR24498). Instead, restore both.
llvm-svn: 245469
This patch updates the X86 lowering so that the Exception Pointer and Selector
are 64-bit wide only if Subtarget.isTarget64BitLP64.
Patch by João Porto
Reviewers: dschuff, rnk
Differential Revision: http://reviews.llvm.org/D12111
llvm-svn: 245454
Reapply r243486.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 245442
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generatate the code. This adds the vmin/vmax
patterns and a test to check that we are now generating vmin/vmax
instructions.
Reviewers: rengolin, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D12105
llvm-svn: 245439
There are some cases where the mul sequence is smaller, but for the most part,
using a div is preferable. This does not apply to vectors, since x86 doesn't
have vector idiv, and a vector mul/shifts sequence ought to be smaller than a
scalarized division.
Differential Revision: http://reviews.llvm.org/D12082
llvm-svn: 245431
Fix how DependenceAnalysis calls delinearization, mirroring what is done in
Delinearization.cpp (mostly by making sure to call getSCEVAtScope before
delinearizing, and by removing the unnecessary 'Pairs == 1' check).
Patch by Vaivaswatha Nagaraj!
llvm-svn: 245408
Here we make ScalarEvolution::isKnownPredicate, indirectly, a little smarter.
Given some relational comparison operator OP, and two AddRec SCEVs, {I,+,S} OP
{J,+,T}, we can reduce this to the comparison I OP J when S == T, both AddRecs
are for the same loop, and both are known not to wrap.
As it turns out, because of the way that backedge-guard expressions can be
leveraged when computing known predicates, this allows indvars to simplify the
if-statement comparison in this loop:
void foo (int *a, int *b, int n) {
for (int i = 0; i < n; ++i) {
if (i > n)
a[i] = b[i] + 1;
}
}
which, somewhat surprisingly, we were not previously optimizing away.
llvm-svn: 245400
This commit adds support for bit mask target flag serialization to the MIR
printer and the MIR parser. It also adds support for the machine operand's
target flag serialization to the AArch64 target.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 245383
To properly handle this, define the *a instructions as separate
instruction classes by refactoring the LoadA and StoreA multiclasses.
Move the instruction tests into the sparcv9 file to test the difference.
llvm-svn: 245360
The current code normalizes select(C0, x, select(C1, x, y)) towards
select(C0|C1, x, y) if the targets prefers that form. This patch adds an
additional rule that if the select(C1, x, y) part already exists in the
function then we want to normalize into the other direction because the
effects of reusing the existing value are bigger than transforming into
the target preferred form.
This addresses regressions following r238793, see also:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150727/290272.html
Differential Revision: http://reviews.llvm.org/D11616
llvm-svn: 245350
State numbers are calculated by performing a walk from the innermost
funclet to the outermost funclet. Rudimentary support for the new EH
constructs has been added to the assembly printer, just enough to test
the new machinery.
Differential Revision: http://reviews.llvm.org/D12098
llvm-svn: 245331
Summary: This is the correct way to handle JAL instructions when PIC is enabled.
Patch by Toma Tabacu
Reviewers: seanbruno, tomatabacu
Subscribers: brooks, seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D6231
llvm-svn: 245305
This is (almost) everything under MC/MachO/ARM. There are still some
cases missing, because llvm-readobj doesn't (yet) support some features,
that macho-dump provides. I plan to reduce the gap between them shortly.
llvm-svn: 245302
After hitting @llvm.assume(X) we can:
- propagate equality that X == true
- if X is icmp/fcmp (with eq operation), and one of operand
is constant we can change all variables with constants in the same BasicBlock
http://reviews.llvm.org/D11918
llvm-svn: 245265
It is possible to be in a situation where more than one funclet token is
a valid SSA value. If we see a terminator which exits a funclet which
doesn't use the funclet's token, replace it with unreachable.
Differential Revision: http://reviews.llvm.org/D12074
llvm-svn: 245238
Summary:
Increase the estimated costs for insert/extract element operations on
AArch64. This is motivated by results from benchmarking interleaved
accesses.
Add missing costs for zext/sext/trunc instructions and some integer to
floating point conversions. These costs were previously calculated
by scalarizing these operation and were affected by the cost increase of
the insert/extract element operations.
Reviewers: rengolin
Subscribers: mcrosier, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11939
llvm-svn: 245226
Summary:
This change limits the minimum cost of an insert/extract
element operation to 2 in cases where this would result
in mixing of NEON and VFP code.
Reviewers: rengolin
Subscribers: mssimpso, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12030
llvm-svn: 245225
Summary:
When demoting an SSA value that has a use on a phi and one of the phi's
predecessors terminates with catchret, the edge needs to be split and the
load inserted in the new block, else we'll still have a cross-funclet SSA
value.
Add a test for this, and for the similar case where a def to be spilled is
on and invoke and a critical edge, which was already implemented but
missing a test.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12065
llvm-svn: 245218
Summary: It is the same as LA, except that it can also load 64-bit addresses and it only works on 64-bit MIPS architectures.
Reviewers: tomatabacu, seanbruno, vkalintiris
Subscribers: brooks, seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D9524
llvm-svn: 245208
These only get generated if the target supports them. If one of the variants is not legal and the other is, and it is safe to do so, the other variant will be emitted.
For example on AArch32 (V8), we have scalar fminnm but not fmin.
Fix up a couple of tests while we're here - one now produces better code, and the other was just plain wrong to start with.
llvm-svn: 245196
PR24469 resulted because DeleteDeadInstruction in handleNonLocalStoreDeletion was
deleting the next basic block iterator. Fixed the same by resetting the basic block iterator
post call to DeleteDeadInstruction.
llvm-svn: 245195
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).
This IR should then be optimized in the backend to whatever is best for
any given target. Eg, x86 can use minss/maxss instructions.
This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314
Differential Revision: http://reviews.llvm.org/D11866
llvm-svn: 245187
Bitwise arithmetic can obscure a simple sign-test. If replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.
llvm-svn: 245171
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero. This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.
This fixes PR24085.
Differential Revision: http://reviews.llvm.org/D11289
llvm-svn: 245169
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after a ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.
(zext (truncate x)) -> (zext (and(x, m))
Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.
This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.
Differential Revision: http://reviews.llvm.org/D11764
llvm-svn: 245160
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction. CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.
llvm-svn: 245149
This patch makes the Merge Functions pass faster by calculating and comparing
a hash value which captures the essential structure of a function before
performing a full function comparison.
The hash is calculated by hashing the function signature, then walking the basic
blocks of the function in the same order as the main comparison function. The
opcode of each instruction is hashed in sequence, which means that different
functions according to the existing total order cannot have the same hash, as
the comparison requires the opcodes of the two functions to be the same order.
The hash function is a static member of the FunctionComparator class because it
is tightly coupled to the exact comparison function used. For example, functions
which are equivalent modulo a single variant callsite might be merged by a more
aggressive MergeFunctions, and the hash function would need to be insensitive to
these differences in order to exploit this.
The hashing function uses a utility class which accumulates the values into an
internal state using a standard bit-mixing function. Note that this is a different interface
than a regular hashing routine, because the values to be hashed are scattered
amongst the properties of a llvm::Function, not linear in memory. This scheme is
fast because only one word of state needs to be kept, and the mixing function is
a few instructions.
The main runOnModule function first computes the hash of each function, and only
further processes functions which do not have a unique function hash. The hash
is also used to order the sorted function set. If the hashes differ, their
values are used to order the functions, otherwise the full comparison is done.
Both of these are helpful in speeding up MergeFunctions. Together they result in
speedups of 9% for mysqld (a mostly C application with little redundancy), 46%
for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) In all
three cases, the new speed of MergeFunctions is about half that of the module
verifier, making it relatively inexpensive even for large LTO builds with
hundreds of thousands of functions. The same functions are merged, so this
change is free performance.
Author: jrkoenig
Reviewers: nlewycky, dschuff, jfb
Subscribers: llvm-commits, aemerson
Differential revision: http://reviews.llvm.org/D11923
llvm-svn: 245140
This seems to only work some of the time. In some situations,
this seems to use a nonsensical type and isn't actually aware of the
memory being accessed. e.g. if branch condition is an icmp of a pointer,
it checks the addressing mode of i1.
llvm-svn: 245137
Summary:
http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions.
This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit:
for (int i = 0; i < numIterations; ++i)
sum += ptr[i - offset];
for (int i = 0; i < numIterations; ++i)
sum += ptr[i * stride];
for (int i = 0; i < numIterations; ++i)
sum += ptr[3 * (i << 7)];
Reviewers: atrick, sanjoy
Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben
Differential Revision: http://reviews.llvm.org/D11860
llvm-svn: 245118
Although targeting CoreCLR is similar to targeting MSVC, there are
certain important differences that the backend must be aware of
(e.g. differences in stack probes, EH, and library calls).
Differential Revision: http://reviews.llvm.org/D11012
llvm-svn: 245115
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.
llvm-svn: 245107
This patch makes the Darwin ARM backend take advantage of TargetParser. It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.
Differential Revision: http://reviews.llvm.org/D11996
llvm-svn: 245081
This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly
return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load
merging optimization is changed to use the TLI hook. This exposes a shortcoming in the
current logic and results in the regression test update. Changing other direct users of
the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.
Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting
SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads
while PerformLOADCombine() splits them back into 16-byte loads.
Differential Revision: http://reviews.llvm.org/D10662
llvm-svn: 245075
Summary: Similar to the change we applied to ASan. The same test case works.
Reviewers: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11961
llvm-svn: 245067
This reverts commit r245047.
It was failing on the darwin bots. The problem was that when running
./bin/llc -march=msp430
llc gets to
if (TheTriple.getTriple().empty())
TheTriple.setTriple(sys::getDefaultTargetTriple());
Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.
That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.
llvm-svn: 245062
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.
It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.
llvm-svn: 245047
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.
I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.
llvm-svn: 245035
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.
There are several applications for such a type but my immediate
motivation stems from WinEH. Our personality routine enforces a
single-entry - single-exit regime for cleanups. After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together. We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.
Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup. This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.
What is the burden to the optimizer? Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway. There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.
Differential Revision: http://reviews.llvm.org/D11861
llvm-svn: 245029
Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.
One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.
Reviewers: broune
Subscribers: jholewinski, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12016
llvm-svn: 245003
This commit modifies the way the machine basic blocks are serialized - now the
machine basic blocks are serialized using a custom syntax instead of relying on
YAML primitives. Instead of using YAML mappings to represent the individual
machine basic blocks in a machine function's body, the new syntax uses a single
YAML block scalar which contains all of the machine basic blocks and
instructions for that function.
This is an example of a function's body that uses the old syntax:
body:
- id: 0
name: entry
instructions:
- '%eax = MOV32r0 implicit-def %eflags'
- 'RETQ %eax'
...
The same body is now written like this:
body: |
bb.0.entry:
%eax = MOV32r0 implicit-def %eflags
RETQ %eax
...
This syntax change is motivated by the fact that the bundled machine
instructions didn't map that well to the old syntax which was using a single
YAML sequence to store all of the machine instructions in a block. The bundled
machine instructions internally use flags like BundledPred and BundledSucc to
determine the bundles, and serializing them as MI flags using the old syntax
would have had a negative impact on the readability and the ease of editing
for MIR files. The new syntax allows me to serialize the bundled machine
instructions using a block construct without relying on the internal flags,
for example:
BUNDLE implicit-def dead %itstate, implicit-def %s1 ... {
t2IT 1, 24, implicit-def %itstate
%s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate
}
This commit also converts the MIR testcases to the new syntax. I developed
a script that can convert from the old syntax to the new one. I will post the
script on the llvm-commits mailing list in the thread for this commit.
llvm-svn: 244982
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.
Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.
llvm-svn: 244955
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.
Code generated before the patch was applied:
0: 0f bc c7 bsf %edi,%eax
3: b9 20 00 00 00 mov $0x20,%ecx
8: 0f 45 c8 cmovne %eax,%ecx
b: 83 c1 02 add $0x2,%ecx
e: b8 01 00 00 00 mov $0x1,%eax
13: 85 ff test %edi,%edi
15: 0f 45 c1 cmovne %ecx,%eax
18: c3 retq
Code generated after the patch was applied:
0: 0f bc cf bsf %edi,%ecx
3: 83 c1 02 add $0x2,%ecx
6: 85 ff test %edi,%edi
8: b8 01 00 00 00 mov $0x1,%eax
d: 0f 45 c1 cmovne %ecx,%eax
10: c3 retq
It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.
Differential Revision: http://reviews.llvm.org/D11989
llvm-svn: 244947
We used to be over-conservative about preserving inbounds. Actually, the second
GEP (which applies the constant offset) can inherit the inbounds attribute of
the original GEP, because the resultant pointer is equivalent to that of the
original GEP. For example,
x = GEP inbounds a, i+5
=>
y = GEP a, i // inbounds removed
x = GEP inbounds y, 5 // inbounds preserved
llvm-svn: 244937
This patch corresponds to review:
http://reviews.llvm.org/D11471
It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.
llvm-svn: 244921
They rely on global fast-math options, but soon ISel will rely only on fast-math flags on the instructions themselves. Rip the fast checks out into their own file so we can mark their instructions as fast.
llvm-svn: 244914
These tests relied on -enable-no-nans-fp-math, whereas soon they'll take their no-nans hint
from the FCMP instruction itself, so split the no-nans stuff out into its own test.
Also do a slight rejig of instruction order. The old FMIN/MAX backend matching had to deal with looking through casts, which it never did particularly well. Now, instcombine will recognize such patterns and canonicalize the cast outside the select. So modify the test inputs to assume that instcombine has already run.
llvm-svn: 244913
DeadStoreElimination does eliminate a store if it stores a value which was loaded from the same memory location.
So far this worked only if the store is in the same block as the load.
Now we can also handle stores which are in a different block than the load.
Example:
define i32 @test(i1, i32*) {
entry:
%l2 = load i32, i32* %1, align 4
br i1 %0, label %bb1, label %bb2
bb1:
br label %bb3
bb2:
; This store is redundant
store i32 %l2, i32* %1, align 4
br label %bb3
bb3:
ret i32 0
}
Differential Revision: http://reviews.llvm.org/D11854
llvm-svn: 244901
Previously, for O32 ABI we did not calculate correct addend for R_MIPS_HI16
and R_MIPS_PCHI16 relocations. This patch fixes that.
Patch by Vladimir Radosavljevic.
Differential Revision: http://reviews.llvm.org/D11186
llvm-svn: 244897
Summary:
Update the demotion logic in WinEHPrepare to avoid creating new cleanups by
walking predecessors as necessary to insert stores for EH-pad PHIs.
Also avoid creating stores for EH-pad PHIs that have no uses.
The store/load placement is still pretty naive. Likely future improvements
(at least for optimized compiles) include:
- Share loads for related uses as possible
- Coalesce non-interfering use/def-related PHIs
- Store at definition point rather than each PHI pred for non-interfering
lifetimes.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11955
llvm-svn: 244894
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
{ <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.
This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.
Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.
However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.
This patch fixes the crash by implementing CanLowerReturn. As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.
llvm-svn: 244889
Consider this code:
BB:
%i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
%add = add nsw i32 %i, %b
...
In this common case the add can be moved to the %if.else basic block, because
adding zero is an identity operation. If we go though %if.then branch it's
always a win, because add is not executed; if not, the number of instructions
stays the same.
This pattern applies also to other instructions like sub, shl, shr, ashr | 0,
mul, sdiv, div | 1.
Patch by Jakub Kuderski!
llvm-svn: 244887
Other than PC-relative loads/store the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.
Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.
Differential Revision: http://reviews.llvm.org/D11800
llvm-svn: 244882
Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this.
I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code, it means that SimplifyX86immshift now (re)decodes the type of shift.
Differential Revision: http://reviews.llvm.org/D11938
llvm-svn: 244872
Now that we can properly promote mismatched FCOPYSIGNs (r244858), we
can mark the FP_ROUND on the result as truncating, to expose folding.
FCOPYSIGN doesn't change anything but the sign bit, so
(fp_round (fcopysign (fpext a), b))
is equivalent to (modulo the sign bit):
(fp_round (fpext a))
which is a no-op.
llvm-svn: 244862
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.
Follow-up to r243924, r243926, and r244858.
llvm-svn: 244860
We don't care about its type, and there's even a combine that'll fold
away the FP_EXTEND if we let it run. However, until it does, we'll have
something broken like:
(f32 (fp_extend (f64 v)))
Scalar f16 follow-up to r243924.
llvm-svn: 244858
To be clear: this is an *optimization* not a correctness change.
CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuze many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll.
One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust.
I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're extending the live range of it's inputs potentially.
Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly.
Differential Revision: http://reviews.llvm.org/D11819
llvm-svn: 244821
When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :)
This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed. Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement. That change will follow in the near future.
llvm-svn: 244808
Summary:
D11924 implemented part of the floating-point comparisons, this patch implements the rest:
* Tell ISelLowering that all booleans are either 0 or 1.
* Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
* Add tests for ord/uno.
* Add tests for ueq/one/ult/ule/ugt/uge.
* Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11970
llvm-svn: 244779
r242520 was reverted in r244313 as the expected behaviour of the alias
attribute in C is that the alias has the same size as the aliasee. However
we can re-introduce adding the size on the alias when the aliasee does not,
from a source code or object perspective, exist as a discrete entity. This
happens when the aliasee is not a symbol, or when that symbol is private.
Differential Revision: http://reviews.llvm.org/D11943
llvm-svn: 244752
On Mach-O emitting aliases for the variables that make up a MergedGlobals
variable can cause problems when linking with dead stripping enabled so don't
do that, except for external variables where we must emit an alias.
llvm-svn: 244748
This abstracts away the test for "when can we fold across a MachineInstruction"
into the the MI interface, and changes call-frame optimization use the same test
the peephole optimizer users.
Differential Revision: http://reviews.llvm.org/D11945
llvm-svn: 244729
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).
InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).
I also moved all the relevant combine tests into InstCombine/blend_x86.ll
Differential Revision: http://reviews.llvm.org/D11934
llvm-svn: 244723
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.
llvm-svn: 244684
Mangled "linkage" names can be huge, and if the debugger (or other
tools) have no use for them, the size savings can be very impressive
(on the order of 40%).
Add one test for controlling behavior, and modify a number of tests to
either stop using linkage names, or make llc emit them (so these tests
will still run when the default triple is for PS4).
Differential Revision: http://reviews.llvm.org/D11374
llvm-svn: 244678
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant. Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.
llvm-svn: 244676
Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken, I'll fix them in a follow-up.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11924
llvm-svn: 244665
Summary:
For example:
s6 = s0*s5;
s2 = s6*s6 + s6;
...
s4 = s6*s3;
We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)).
This only happens when Aggressive is true, otherwise hasOneUse() check
already prevents from folding the multiplication with more uses.
Test Plan: test/CodeGen/NVPTX/fma-assoc.ll
Patch by Xuetian Weng
Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm
Subscribers: arsenm, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D11855
llvm-svn: 244649
Summary: LowerSwitch crashed with the attached test case after deleting the default block. This happened because the current implementation of deleting dead blocks is wrong. After the default block being deleted, it contains no instruction or terminator, and it should no be traversed anymore. However, since the iterator is advanced before processSwitchInst() function is executed, the block advanced to could be deleted inside processSwitchInst(). The deleted block would then be visited next and crash dyn_cast<SwitchInst>(Cur->getTerminator()) because Cur->getTerminator() returns a nullptr. This patch fixes this problem by recording dead default blocks into a list, and delete them after all processSwitchInst() has been done. It still possible to visit dead default blocks and waste time process them. But it is a compile time issue, and I plan to have another patch to add support to skip dead blocks.
Reviewers: kariddi, resistor, hans, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11852
llvm-svn: 244642
Other objects can never reference the MergedGlobals symbol so external linkage
is never needed. Using private instead of internal linkage means the object is
more similar to what it looks like when global merging is not enabled, with
the only difference being that the merged variables are addressed indirectly
relative to the start of the section they are in.
Also add aliases for merged variables with internal linkage, as this also makes
the object be more like what it is when they are not merged.
Differential Revision: http://reviews.llvm.org/D11942
llvm-svn: 244615
I incorrectly wrote CHECK-NEXT with followin with ':', the check was
ignored by FileCheck.
The non-inbound GEP is folded here because the DataLayout is no longer
optional, the fold was originally guarded with a comment that said:
We need TD information to know the pointer size unless this is inbounds.
Now we always have "TD information" and perform the fold.
Thanks Jonathan Roelofs for noticing.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 244613
First step in preventing immediates that occur more than once within a single
basic block from being pulled into their users, in order to prevent unnecessary
large instruction encoding .Currently enabled only when optimizing for size.
Patch by: zia.ansari@intel.com
Differential Revision: http://reviews.llvm.org/D11363
llvm-svn: 244601
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmannum and match that instead. Minimal functional change:
- Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistant.
- f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!
llvm-svn: 244595
REPE, REPZ, REPNZ, REPNE should have mnemonics for Intel syntax as well.
Currently using these instructions causes compilation errors for Intel syntax.
Differential Revision: http://reviews.llvm.org/D11794
llvm-svn: 244584
The "imul reg, imm" alias is not defined for intel syntax.
In intel syntax there is no w/l/q suffix for the imul instruction.
Differential Revision: http://reviews.llvm.org/D11887
llvm-svn: 244582
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.
matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.
C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.
ARM and AArch64 at least have different instructions for these different cases.
llvm-svn: 244580
Summary:
This patch remaps the assembly idiom 'move' to 'or' instead of 'daddu' or
'addu'. The use of addu/daddu instead of or as move was highlighted as a
performance issue during the analysis of a recent 64bit design. Originally
move was encoded as 'or' by binutils but was changed for the r10k cpu family
due to their pipeline which had 2 arithmetic units and a single logical unit,
and so could issue multiple (d)addu based moves at the same time but only 1
logical move.
This patch preserves the disassembly behaviour so that disassembling a old style
(d)addu move still appears as move, but assembling move always gives an or
Patch by Simon Dardis.
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11796
llvm-svn: 244579
When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp"
following a call by one or two pops, respectively. We don't try to do it in
general, but only when the stack adjustment immediately follows a call - which
is the most common case.
That allows taking a short-cut when trying to find a free register to pop into,
instead of a full-blown liveness check. If the adjustment immediately follows a
call, then every register the call clobbers but doesn't define should be dead at
that point, and can be used.
Differential Revision: http://reviews.llvm.org/D11749
llvm-svn: 244578
The condition for clearing the folding candidate list was clamped together
with the "uninteresting instruction" condition. This is too conservative,
e.g. we don't need to clear the list when encountering an IMPLICIT_DEF.
Differential Revision: http://reviews.llvm.org/D11591
llvm-svn: 244577
Summary: I somehow forgot to add these when I added the basic floating-point opcodes. Also remove ceil/floor/trunc/nearestint for now, and add them only when properly tested.
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11927
llvm-svn: 244562
This patch and a relatec clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwasyPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>’. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints.
llvm-svn: 244555
Summary: convertToHexString doesn't represent them correctly at this point in time. This is a follow-up to sunfish's suggestion in D11914.
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11925
llvm-svn: 244551
This commit serializes the UsedPhysRegMask register mask from the machine
register information class. The mask is serialized as an inverted
'calleeSavedRegisters' mask to keep the output minimal.
This commit also allows the MIR parser to infer this mask from the register
mask operands if the machine function doesn't specify it.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 244548
This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.
llvm-svn: 244523
Summary:
For now output using C99's hexadecimal floating-point representation.
This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11914
llvm-svn: 244520
The PATCHPOINT instructions have a single optional defined register operand,
but the machine verifier can't verify the optional defined register operands.
This commit makes sure that the machine verifier won't report an error when a
PATCHPOINT instruction doesn't have its optional defined register operand.
This change will allow us to enable the machine verifier for the code
generation tests for the patchpoint intrinsics.
Reviewers: Juergen Ributzka
llvm-svn: 244513
Summary:
This makes it so that reports symbolized after the fact with
llvm-symbolizer are more similar to the ones we generate at runtime with
in-process dbghelp.
Reviewers: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11785
llvm-svn: 244512
frame setup instruction.
This commit ensures that the stack map lowering code in FastISel adds an
appropriate number of immediate operands to the frame setup instruction.
The previous code added just one immediate operand, which was fine for a target
like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit
operands. This caused the machine verifier to report an error when the old code
added just one.
Reviewers: Juergen Ributzka
Differential Revision: http://reviews.llvm.org/D11853
llvm-svn: 244508
NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (priviledged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS, this commit now generated LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.
As with the previous patch this code generation is pretty bad because it occurs very later, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.
I did [[ https://github.com/jfbastien/benchmark-x86-flags | a bit of benchmarking ]], the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
| Time per call (ms) | Runtime (ms) | Benchmark |
| 0.000012514 | 6257 | sete.i386 |
| 0.000012810 | 6405 | sete.i386-fast |
| 0.000010456 | 5228 | sete.x86-64 |
| 0.000010496 | 5248 | sete.x86-64-fast |
| 0.000012906 | 6453 | lahf-sahf.i386 |
| 0.000013236 | 6618 | lahf-sahf.i386-fast |
| 0.000010580 | 5290 | lahf-sahf.x86-64 |
| 0.000010304 | 5152 | lahf-sahf.x86-64-fast |
| 0.000028056 | 14028 | pushf-popf.i386 |
| 0.000027160 | 13580 | pushf-popf.i386-fast |
| 0.000023810 | 11905 | pushf-popf.x86-64 |
| 0.000026468 | 13234 | pushf-popf.x86-64-fast |
Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seems to be worth teaching LLVM about individual flags, at least not for this purpose.
Reviewers: rnk, jvoung, t.p.northover
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D6629
llvm-svn: 244503
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
Differential Revision: http://reviews.llvm.org/D11886
llvm-svn: 244495
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
llvm-svn: 244489
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.
llvm-svn: 244485
The LDD/STD instructions can load/store a 64bit quantity from/to
memory to/from a consecutive even/odd pair of (32-bit) registers. They
are part of SparcV8, and also present in SparcV9. (Although deprecated
there, as you can store 64bits in one register).
As recommended on llvmdev in the thread "How to enable use of 64bit
load/store for 32bit architecture" from Apr 2015, I've modeled the
64-bit load/store operations as working on a v2i32 type, rather than
making i64 a legal type, but with few legal operations. The latter
does not (currently) work, as there is much code in llvm which assumes
that if i64 is legal, operations like "add" will actually work on it.
The same assumption does not hold for v2i32 -- for vector types, it is
workable to support only load/store, and expand everything else.
This patch:
- Adds a new register class, IntPair, for even/odd pairs of registers.
- Modifies the list of reserved registers, the stack spilling code,
and register copying code to support the IntPair register class.
- Adds support in AsmParser. (note that in asm text, you write the
name of the first register of the pair only. So the parser has to
morph the single register into the equivalent paired register).
- Adds the new instructions themselves (LDD/STD/LDDA/STDA).
- Hooks up the instructions and registers as a vector type v2i32. Adds
custom legalizer to transform i64 load/stores into v2i32 load/stores
and bitcasts, so that the new instructions can actually be
generated, and marks all operations other than load/store on v2i32
as needing to be expanded.
- Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG.
This hack undoes the transformation of i64 operands into two
arbitrarily-allocated separate i32 registers in
SelectionDAGBuilder. and instead passes them in a single
IntPair. (Arbitrarily allocated registers are not useful, asm code
expects to be receiving a pair, which can be passed to ldd/std.)
Also adds a bunch of test cases covering all the bugs I've added along
the way.
Differential Revision: http://reviews.llvm.org/D8713
llvm-svn: 244484
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.
llvm-svn: 244481
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.
The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".
llvm-svn: 244466
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D11559
llvm-svn: 244448
PR24139 contains an analysis of poor register allocation. One of the findings
was that when calculating the spill weight, a rematerializable interval once
split is no longer rematerializable. This is because the isRematerializable
check in CalcSpillWeights.cpp does not follow the copies introduced by live
range splitting (after splitting, the live interval register definition is a
copy which is not rematerializable).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D11686
llvm-svn: 244439
We can only PHI translate instructions. In our attempt to PHI translate
a bitcast, we attempt to translate its operand; however, the operand
might be an argument or a global instead of an instruction. Benignly
bail out when this happens.
This fixes PR24397.
Differential Revision: http://reviews.llvm.org/D11879
llvm-svn: 244418
The pass adds new kernel arguments for image attributes, and
resolves calls to dummy attribute and resource id getter functions.
Patch by: Zoltan Gilian
llvm-svn: 244372
Summary:
With InstAlias, we don't need to print the _e32 portion of the mnemonic
when we print the $dst operand. This change makes it possible to
include vcc in the asm string when we switch VOPC over to having
implicit vcc defs.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11813
llvm-svn: 244362
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.
Reviewers: sanjoy, weimingz, chenli
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11841
llvm-svn: 244348
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.
e.g.
%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.
In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).
Differential Revision: http://reviews.llvm.org/D11760
llvm-svn: 244341
Summary: We were using the SI encoding for VI.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11812
llvm-svn: 244332
In tree they are only used by llvm-readobj, but it is also used by
https://github.com/mono/CppSharp.
While at it, add some missing error checking.
llvm-svn: 244320
llvm-dsymutil has to be able to process debug info produced by other compilers
which use different line table settings. The testcase wasn't generated by
another compiler, but by a modified clang.
llvm-svn: 244319
Summary:
Port the ReconstructShuffle function from AArch64 to ARM
to handle mismatched incoming types in the BUILD_VECTOR
node.
This fixes an outstanding FIXME in the ReconstructShuffle
code.
Reviewers: t.p.northover, rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11720
llvm-svn: 244314
Summary: WebAssembly's tablegen instructions have the names WebAssembly expects, but by LLVM convention they're uppercase and suffixed with their type after an underscore. Leave the C++ code that way, but print outt he names WebAssembly expects (lowercase, no type). We could teach tablegen to do this later, maybe by using `!cast<string>(node)` in the .td files.
Reviewers: sunfish
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D11776
llvm-svn: 244305
The block address machine operands can reference IR blocks in other functions.
This commit fixes a bug where the references to unnamed IR blocks in other
functions weren't serialized correctly.
llvm-svn: 244299
When we are not emitting the condition for the branch, because the condition is
in another BB or SDAG did the selection for us, then we have to mask the flag in
the register with AND.
This is required when the condition comes from a truncate, because SDAG only
truncates down to a legal size of i32.
This fixes rdar://problem/22161062.
llvm-svn: 244291
This reverts commit r243198 and 243304.
Turns out this wasn't the correct fix for this problem. It works only within
FastISel, but fails when the truncate is selected by SDAG.
llvm-svn: 244287
lld might end up using a small part of this, but it will be in a much
refactored form. For now this unblocks avoiding the full section scan in the
ELFFile constructor.
This also has a (very small) error handling improvement.
llvm-svn: 244282
A dSYM bundle is a file hierarchy that looks slike this:
<bundle name>.dSYM/
Contents/
Info.plist
Resources/
DWARF/
<DWARF file(s)>
This is the default output mode of dsymutil.
llvm-svn: 244270
dsymutil should by default generate dSYM bundles which are filesystem
hierarchies containing the debug info and an additional Info.plist.
Currently llvm-dsymutil emits raw binaries containing the debug info.
This is what we call the 'flat mode'. Add a -f/-flat option that is
supposed to enable that flat mode, but don't wire it for now, only
pass it to the tests that will need it to stay functional once we
do bundle generation by default.
This basically makes this commit NFC and removes the noise from the
actual commit that adds support for bundle generation.
llvm-svn: 244269
Summary: This allows us to consolidate several of the TableGen patterns.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11602
llvm-svn: 244253
points.
There is an infinite loop that can occur in Shrink Wrapping while searching
for the Save/Restore points.
Part of this search checks whether the save/restore points are located in
different loop nests and if so, uses the (post) dominator trees to find the
immediate (post) dominator blocks. However, if the current block does not have
any immediate (post) dominators then this search will result in an infinite
loop. This can occur in code containing an infinite loop.
The modification checks whether the immediate (post) dominator is different from
the current save/restore block. If it is not, then the search terminates and the
current location is not considered as a valid save/restore point for shrink wrapping.
Phabricator: http://reviews.llvm.org/D11607
llvm-svn: 244247
iisUnmovableInstruction() had a list of instructions hardcoded which are
considered unmovable. The list lacked (at least) an entry for the va_arg
and cmpxchg instructions.
Fix this by introducing a new Instruction::mayBeMemoryDependent()
instead of maintaining another instruction list.
Patch by Matthias Braun <matze@braunis.de>.
Differential Revision: http://reviews.llvm.org/D11577
rdar://problem/22118647
llvm-svn: 244244
Summary: Divide the primitive size in bits by eight so the initial load's alignment is in bytes as expected. Tested with the included unit test.
Reviewers: rengolin, jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11804
llvm-svn: 244229
This change improves EmitLoweredSelect() so that multiple contiguous CMOV pseudo
instructions with the same (or exactly opposite) conditions get lowered using a single
new basic-block. This eliminates unnecessary extra basic-blocks (and CFG merge points)
when contiguous CMOVs are being lowered.
Patch by: kevin.b.smith@intel.com
Differential Revision: http://reviews.llvm.org/D11428
llvm-svn: 244202
The COFFSymbolRef::isFunctionDefinition() function tests for several conditions
that are not related to whether a symbol is a function, but rather whether
the symbol meets the requirements for a function definition auxiliary record,
which excludes certain symbols such as internal functions and undefined
references. The test we need to determine the symbol type is much simpler:
we only need to compare the complex type against IMAGE_SYM_DTYPE_FUNCTION.
llvm-svn: 244195
This commit implements the initial serialization of the machine operand target
flags. It extends the 'TargetInstrInfo' class to add two new methods that help
to provide text based serialization for the target flags.
This commit can serialize only the X86 target flags, and the target flags for
the other targets will be serialized in the follow-up commits.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 244185
This reverts commit r244163. The workaround shouldn't be necessary
after r244172, and moreover the commit was slightly buggy as it
dis a simple mkdir without removing the directory first, which could
cause 'File exists' errors.
llvm-svn: 244182
More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables.
Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads.
llvm-svn: 244166
This option allows to select a subset of the architectures when
performing a universal binary link. The filter is done completely
in the mach-o specific part of the code.
llvm-svn: 244160
Summary:
Emit both DWARF and CodeView if "CodeView" and "Dwarf Version" module
flags are set.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11756
llvm-svn: 244158
This commit serializes the offset for the following operands: target index,
global address, external symbol, constant pool index, and block address.
llvm-svn: 244157
Summary: PR24191 finds that the expected memory-register operations aren't generated when relaxed { load ; modify ; store } is used. This is similar to PR17281 which was addressed in D4796, but only for memory-immediate operations (and for memory orderings up to acquire and release). This patch also handles some floating-point operations.
Reviewers: reames, kcc, dvyukov, nadav, morisset, chandlerc, t.p.northover, pete
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11382
llvm-svn: 244128
The DWARF linker isn't touched by this, the implementation links
individual files and merges them together into a fat binary by
calling out to the 'lipo' utility.
The main change is that the MachODebugMapParser can now return
multiple debug maps for a single binary.
The test just verifies that lipo would be invoked correctly, but
doesn't actually generate a binary. This mimics the way clang
tests its external iplatform tools integration.
llvm-svn: 244087
In PR24288 it was pointed out that the easy case of a non-escaping
global and something that *obviously* required an escape sometimes is
hidden behind PHIs (or selects in theory). Because we have this binary
test, we can easily just check that all possible input values satisfy
the requirement. This is done with a (very small) recursion through PHIs
and selects. With this, the specific example from the PR is correctly
folded by GVN.
Differential Revision: http://reviews.llvm.org/D11707
llvm-svn: 244078
On Darwin, it is required to stamp the object file with VERSION_MIN load
command. This commit will provide a VERSRION_MIN load command to the
MachO file that doesn't specify the version itself by inferring from
Target Triple.
llvm-svn: 244059
return StringSwitch<int>(Flags)
.Case("g", 0x1)
.Case("nzcvq", 0x2)
.Case("nzcvqg", 0x3)
.Default(-1);
...
// The _g and _nzcvqg versions are only valid if the DSP extension is
// available.
if (!Subtarget->hasThumb2DSP() && (Mask & 0x2))
return -1;
ARMARM confirms that the comment is right, and the code was wrong.
llvm-svn: 244029
In r242277, I updated the MachineCombiner to work with itineraries, but I
missed a call that is scheduling-model-only (the opcode-only form of
computeInstrLatency). Using the form that takes an MI* allows this to work with
itineraries (and should be NFC for subtargets with scheduling models).
llvm-svn: 244020
In the commentary for D11660, I wasn't sure if it was alright to create new
integer machine instructions without also creating the implicit EFLAGS operand.
From what I can see, the implicit operand is always created by the MachineInstrBuilder
based on the instruction type, so we don't have to do that explicitly. However, in
reviewing the debug output, I noticed that the operand was not marked as 'dead'.
The machine combiner should do that to preserve future optimization opportunities
that may be checking for that dead EFLAGS operand themselves.
Differential Revision: http://reviews.llvm.org/D11696
llvm-svn: 243990
It introduced two regressions on 64-bit big-endian targets running under N32
(MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4, and
MultiSource/Applications/kimwitu++/kc) The issue is that on 64-bit targets
comparisons such as BEQ compare the whole GPR64 but incorrectly tell the
instruction selector that they operate on GPR32's. This leads to the
elimination of i32->i64 extensions that are actually required by
comparisons to work correctly.
There's currently a patch under review that fixes this problem.
llvm-svn: 243984
r243883 started moving 'distinct' nodes instead of duplicated them in
lib/Linker. This had the side-effect of sometimes not cloning uniqued
nodes that reference them. I missed a corner case:
!named = !{!0}
!0 = !{!1}
!1 = distinct !{!0}
!0 is the entry point for "remapping", and a temporary clone (say,
!0-temp) is created and mapped in case we need to model a uniquing
cycle.
Recursive descent into !1. !1 is distinct, so we leave it alone,
but update its operand to !0-temp.
Pop back out to !0. Its only operand, !1, hasn't changed, so we don't
need to use !0-temp. !0-temp goes out of scope, and we're finished
remapping, but we're left with:
!named = !{!0}
!0 = !{!1}
!1 = distinct !{null} ; uh oh...
Previously, if !0 and !0-temp ended up with identical operands, then
!0-temp couldn't have been referenced at all. Now that distinct nodes
don't get duplicated, that assumption is invalid. We need to
!0-temp->replaceAllUsesWith(!0) before freeing !0-temp.
I found this while running an internal `-flto -g` bootstrap. Strangely,
there was no case of this in the open source bootstrap I'd done before
commit...
llvm-svn: 243961
This adds the software division routines for the Windows RTABI. These are not
expected to be used often though as most modern Windows ARM capable targets
support hardware division. In the case that the target CPU doesnt support
hardware division, this will be the fallback.
llvm-svn: 243952
There's a bunch of code in LowerFCOPYSIGN that does smart lowering, and
is actually already vector-aware; let's use it instead of scalarizing!
The only interesting change is that for v2f32, we previously always used
use v4i32 as the integer vector type.
Use v2i32 instead, and mark FCOPYSIGN as Custom.
llvm-svn: 243926
We used to legalize it like it's any other binary operations. It's not,
because it accepts mismatched operand types. Because of that, we used
to hit various asserts and miscompiles.
Specialize vector legalizations to, in the worst case, unroll, or, when
possible, to just legalize the operand that needs legalization.
Scalarization isn't covered, because I can't think of a target where
some but not all of the 1-element vector types are to be scalarized.
llvm-svn: 243924
through PHI nodes across iterations.
This patch teaches the new advanced loop unrolling heuristics to propagate
constants into the loop from the preheader and around the backedge after
simulating each iteration. This lets us brute force solve simple recurrances
that aren't modeled effectively by SCEV. It also makes it more clear why we
need to process the loop in-order rather than bottom-up which might otherwise
make much more sense (for example, for DCE).
This came out of an attempt I'm making to develop a principled way to account
for dead code in the unroll estimation. When I implemented
a forward-propagating version of that it produced incorrect results due to
failing to propagate *cost* between loop iterations through the PHI nodes, and
it occured to me we really should at least propagate simplifications across
those edges, and it is quite easy thanks to the loop being in canonical and
LCSSA form.
Differential Revision: http://reviews.llvm.org/D11706
llvm-svn: 243900
This fixes a bug found while working on the bitcode reader. In
particular, the method BitstreamReader::AtEndOfStream doesn't always
behave correctly when processing a data streamer. The method
fillCurWord doesn't properly set CurWord/BitsInCurWord if the data
streamer was already at eof, but GetBytes had not yet set the
ObjectSize field of the streaming memory object.
This patch fixes this problem, and provides a test to show that
this problem has been fixed.
Patch by Karl Schimpf.
Differential Revision: http://reviews.llvm.org/D11391
llvm-svn: 243890
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode. This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.
Almost all the testcases were updated with this script:
git grep -e '= !DICompileUnit' -l -- test |
grep -v test/Bitcode |
xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'
I imagine something similar should work for out-of-tree testcases.
llvm-svn: 243885
This is necessary for WatchOS support, where the compact unwind format assumes
this kind of layout. For now we only want this on Swift-like CPUs though, where
it's been the Xcode behaviour for ages. Also, since it can expand the prologue
we don't want it at -Oz.
llvm-svn: 243884
* generate function with string attribute using API,
* dump it in LL format,
* try to parse.
Add parser support for string attributes to fix the issue.
Reviewed By: reames, hfinkel
Differential Revision: http://reviews.llvm.org/D11058
llvm-svn: 243877
Enabling merging of extern globals appears to be generally either beneficial or
harmless. On some benchmarks suites (on Cortex-M4F, Cortex-A9, and Cortex-A57)
it gives improvements in the 1-5% range, but in the rest the overall effect is
zero.
Differential Revision: http://reviews.llvm.org/D10966
llvm-svn: 243874
The test/DebugInfo/dwarfdump-macho-universal.test test added in r243862 uses
an input from another test's directory (test/tools/dsymutil/Inputs/fat-test.o)
which breaks our test setup.
Copying the required test input to the test's Input directory to fix the issue.
llvm-svn: 243872
In http://reviews.llvm.org/rL215382, IT forming was made more conservative under
the belief that a flag-setting instruction was unpredictable inside an IT block on ARMv6M.
But actually, ARMv6M doesn't even support IT blocks so that's impossible. In the ARMARM for
v7M, v7AR and v8AR it states that the semantics of such an instruction changes inside an
IT block - it doesn't set the flags. So actually it is fine to use one inside an IT block
as long as the flags register is dead afterwards.
This gives significant performance improvements in a variety of MPEG based workloads.
Differential revision: http://reviews.llvm.org/D11680
llvm-svn: 243869
Summary: This currently sets the shift amount RHS to the same type as the LHS, and assumes that the LHS is a simple type. This isn't currently the case e.g. with weird integers sizes, but will eventually be true and will assert if not. That's what you get for having an experimental backend: break it and you get to keep both pieces. Most backends either set the RHS to MVT::i32 or MVT::i64, but WebAssembly is a virtual ISA and tries to have regular-looking binary operations where both operands are the same type (even if a 64-bit RHS shifter is slightly silly, hey it's free!).
Subscribers: llvm-commits, sunfish, jfb
Differential Revision: http://reviews.llvm.org/D11715
llvm-svn: 243860
The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks.
This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading.
There are a few examples of poor shuffle lowering that are exposed by this patch that will be cleaned up in future patches (e.g. merging shuffles that are separated by bitcasts, x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX, etc.)
Differential Revision: http://reviews.llvm.org/D11518
llvm-svn: 243831
Summary: Also test 64-bit integers, except shifts for now which are broken because isel dislikes the 32-bit truncate that precedes them.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11699
llvm-svn: 243822
This commit fixes a bug in the class 'SIInstrInfo' where the implicit register
machine operands were added to a machine instruction in an incorrect order -
the implicit uses were added before the implicit defs.
I found this bug while working on moving the implicit register operand
verification code from the MIR parser to the machine verifier.
This commit also makes the method 'addImplicitDefUseOperands' in the machine
instruction class public so that it can be reused in the 'SIInstrInfo' class.
Reviewers: Matt Arsenault
Differential Revision: http://reviews.llvm.org/D11689
llvm-svn: 243799
Summary:
For example, in
struct S {
int *x;
int *y;
};
__global__ void foo(S s) {
int *b = s.y;
// use b
}
"b" is guaranteed to point to global. NVPTX should emit ld.global/st.global for
accessing "b".
Reviewers: jholewinski
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D11505
llvm-svn: 243790
Summary:
Use -1 as numoperands for the return SDTypeProfile, denoting that return is variadic. Note that the patterns in InstrControl.td still need to match the inputs, so this ins't an "anything goes" variadic on ret!
The next step will be to handle other local types (not just int32).
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11692
llvm-svn: 243783
Successive versions of LLVM should retain the ability to parse bitcode
generated by old releases of the compiler. This adds a bitcode format
compatibility test, which is intended to provide good (albeit not
entirely exhaustive) coverage of the current LangRef.
This also includes compatibility tests for LLVM 3.6. After every 3.X.0
release, the compatibility.ll file from the 3.X branch should be copied
to compatibility-3.X.ll on trunk, and the 3.X.0 release used to generate
a corresponding bitcode file.
Patch by Vedant Kumar!
llvm-svn: 243779
When encountering a scattered relocation, the code would assert trying to
access an unexisting section. I couldn't find a way to expose the result
of the processing of a scattered reloc, and I'm really unsure what the
right thing to do is. This patch just skips them during the processing in
DwarfContext and adds a mach-o file to the tests that exposed the asserting
behavior.
(This is a new failure that is being exposed by Rafael's recent work on
the libObject interfaces. I think the wrong behavior has always happened,
but now it's asserting)
llvm-svn: 243778
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.
Most of the testcase updates were generated by the following sed script:
find test/ -name "*.ll" -o -name "*.mir" |
xargs grep -l 'DILocalVariable' |
xargs sed -i '' \
-e 's/tag: DW_TAG_arg_variable, //' \
-e 's/tag: DW_TAG_auto_variable, //'
There were only a handful of tests in `test/Assembly` that I needed to
update by hand.
(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`. I've added a FIXME to that effect.)
llvm-svn: 243774
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Differential Revision: http://reviews.llvm.org/D11097
llvm-svn: 243766
Summary:
This prints assembly for int32 integer operations defined in WebAssemblyInstrInteger.td only, with major caveats:
- The operation names are currently incorrect.
- Other integer and floating-point types will be added later.
- The printer isn't factored out to handle recursive AST code yet, since it can't even handle control flow anyways.
- The assembly format isn't full s-expressions yet either, this will be added later.
- This currently disables PrologEpilogCodeInserter as well as MachineCopyPropagation becasue they don't like virtual registers, which WebAssembly likes quite a bit. This will be fixed by factoring out NVPTX's change (currently a fork of PrologEpilogCodeInserter).
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11671
llvm-svn: 243763
Add i16, i32, i64 imul machine instructions to the list of reassociation
candidates.
A new bit of logic is needed to handle integer instructions: they have an
implicit EFLAGS operand, so we have to make sure it's dead in order to do
any reassociation with integer ops.
Differential Revision: http://reviews.llvm.org/D11660
llvm-svn: 243756
This makes llvm-nm consistent with binutils nm on executables and DLLs.
For a vanilla hello world executable, the address of main should include
the default image base of 0x400000.
llvm-svn: 243755
Summary:
Favor the extended reg patterns over the shifted reg patterns that match
only the operand shift and not the full sign/zero extend and shift.
Reviewers: jmolloy, t.p.northover
Subscribers: mcrosier, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11569
llvm-svn: 243753
This is to fix an incorrect error when trying to initialize
DwarfNumbers with a !cast<int> of a bits initializer.
getValuesAsListOfInts("DwarfNumbers") would not see an IntInit
and instead the cast, so would give up.
It seems likely that this could be generalized to attempt
the convertInitializerTo for any type. I'm not really sure
why the existing code seems to special case the string cast cases
when convertInitializerTo seems like it should generally handle this
sort of thing.
llvm-svn: 243722
For a modulo (reminder) operation,
clang -target armv7-none-linux-gnueabi generates "__modsi3"
clang -target armv7-none-eabi generates "__aeabi_idivmod"
clang -target armv7-linux-androideabi generates "__modsi3"
Android bionic libc doesn't provide a __modsi3, instead it provides a
"__aeabi_idivmod". This patch fixes the LLVM ARMISelLowering to generate
the correct call when ever there is a modulo operation.
Differential Revision: http://reviews.llvm.org/D11661
llvm-svn: 243717
Fixing MinSize attribute handling was discussed in D11363.
This is a prerequisite patch to doing that.
The handling of OptSize when lowering mem* functions was broken
on Darwin because it wants to ignore -Os for these cases, but the
existing logic also made it ignore -Oz (MinSize).
The Linux change demonstrates a widespread problem. The backend
doesn't usually recognize the MinSize attribute by itself; it
assumes that if the MinSize attribute exists, then the OptSize
attribute must also exist.
Fixing this more generally will be a follow-on patch or two.
Differential Revision: http://reviews.llvm.org/D11568
llvm-svn: 243693
The patch changes the SLPVectorizer::vectorizeStores to choose the immediate
succeeding or preceding candidate for a store instruction when it has multiple
consecutive candidates. In this way it has better chance to find more slp
vectorization opportunities.
Differential Revision: http://reviews.llvm.org/D10445
llvm-svn: 243666
Update the debug info in the check-lines because the change in r243638
introduced a constant initialization before the prologue's end as part
of a register spill.
llvm-svn: 243640
Summary:
This hidden option would disable code generation through FastISel by
default. It was removed from the available options and from the
Fast-ISel tests that required it in order to run the tests.
Reviewers: dsanders
Subscribers: qcolombet, llvm-commits
Differential Revision: http://reviews.llvm.org/D11610
llvm-svn: 243638
Summary:
Previously, we would sign-extend non-boolean negative constants and
zero-extend otherwise. This was problematic for PHI instructions with
negative values that had a type with bitwidth less than that of the
register used for materialization.
More specifically, ComputePHILiveOutRegInfo() assumes the constants
present in a PHI node are zero extended in their container and
afterwards deduces the known bits.
For example, previously we would materialize an i16 -4 with the
following instruction:
addiu $r, $zero, -4
The register would end-up with the 32-bit 2's complement representation
of -4. However, ComputePHILiveOutRegInfo() would generate a constant
with the upper 16-bits set to zero. The SelectionDAG builder would use
that information to generate an AssertZero node that would remove any
subsequent trunc & zero_extend nodes.
In theory, we should modify ComputePHILiveOutRegInfo() to consult
target-specific hooks about the way they prefer to materialize the
given constants. However, git-blame reports that this specific code
has not been touched since 2011 and it seems to be working well for every
target so far.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11592
llvm-svn: 243636
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.
llvm-svn: 243585
The dsymutil-classic -v option dumps the tool version rather than
putting it in verbose mode. Rename -v to -verbose and update the
tests that use it (in the process removing it from a few tests that
didn't require it anymore since the -dump-debug-map option was
introduced).
A followup commit will reintroduce the -v option that dumps the
version.
llvm-svn: 243582
This patch improves the 32-bit target i64 constant matching to detect the shuffle vector splats that are introduced by i64 vector shift vectorization (D8416).
Differential Revision: http://reviews.llvm.org/D11327
llvm-svn: 243577
It's potentially more efficient on Cyclone, and from the optimization guides &
schedulers looks like it has no effect on Cortex-A53 or A57. In general you'd
expect a MOV to be about the most efficient instruction with its semantics,
even though the official "UXTW" alias is really a UBFX.
llvm-svn: 243576
This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed.
Differential Revision: http://reviews.llvm.org/D11439
llvm-svn: 243569
Summary:
returns_twice (most importantly, setjmp) functions are
optimization-hostile: if local variable is promoted to register, and is
changed between setjmp() and longjmp() calls, this update will be
undone. This is the reason why "man setjmp" advises to mark all these
locals as "volatile".
This can not be enough for ASan, though: when it replaces static alloca
with dynamic one, optionally called if UAR mode is enabled, it adds a
whole lot of SSA values, and computations of local variable addresses,
that can involve virtual registers, and cause unexpected behavior, when
these registers are restored from buffer saved in setjmp.
To fix this, just disable dynamic alloca and UAR tricks whenever we see
a returns_twice call in the function.
Reviewers: rnk
Subscribers: llvm-commits, kcc
Differential Revision: http://reviews.llvm.org/D11495
llvm-svn: 243561
Given certain shuffle-vector masks, LLVM emits splat instructions
which splat the wrong bytes from the source register. The issue is
that the function PPC::isSplatShuffleMask() in PPCISelLowering.cpp
does not ensure that the splat pattern found is requesting bytes that
are aligned on an EltSize boundary. This patch detects this situation
as not a valid splat mask, resulting in a permute being generated
instead of a splat.
Patch and test case by Tyler Kenney, cleaned up a bit by me.
This is a simple bug fix that would be good to incorporate into 3.7.
llvm-svn: 243519
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -aarch64-strict-align to decide whether strict alignment should be
forced.
rdar://problem/21529937
llvm-svn: 243516
Summary:
As added initially, statepoints required their call targets to be a
constant pointer null if ``numPatchBytes`` was non-zero. This turns out
to be a problem ergonomically, since there is no way to mark patchable
statepoints as calling a (readable) symbolic value.
This change remove the restriction of requiring ``null`` call targets
for patchable statepoints, and changes PlaceSafepoints to maintain the
symbolic call target through its transformation.
Reviewers: reames, swaroop.sridhar
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11550
llvm-svn: 243502
PR24141: https://llvm.org/bugs/show_bug.cgi?id=24141
contains a test case where we have duplicate entries in a node's uses() list.
After r241826, we use CombineTo() to delete dead nodes when combining the uses into
reciprocal multiplies, but this fails if we encounter the just-deleted node again in
the list.
The solution in this patch is to not add duplicate entries to the list of users that
we will subsequently iterate over. For the test case, this avoids triggering the
combine divisors logic entirely because there really is only one user of the divisor.
Differential Revision: http://reviews.llvm.org/D11345
llvm-svn: 243500
This commit defines subtarget feature strict-align and uses it instead of
cl::opt -arm-strict-align to decide whether strict alignment should be
forced. Also, remove the logic that was checking the OS and architecture
as clang is now responsible for setting strict-align based on the command
line options specified and the target architecute and OS.
rdar://problem/21529937
http://reviews.llvm.org/D11470
llvm-svn: 243493
Reapply 243271 with more fixes; although we are not handling multiple
sources with coalescable copies, we were not properly skipping this
case.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 243486
Summary:
Currently, we support only the MIPS O32 ABI calling convention for call
lowering. With this change we avoid using the O32 calling convetion for
lowering calls marked as using the fast calling convention.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11515
llvm-svn: 243485
Summary:
Generate correct code for the select instruction by zero-extending
it's boolean/condition operand to GPR-width. This is necessary because
the conditional-move instructions operate on the whole register.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11506
llvm-svn: 243469
If the pointer is the store's value operand, this would produce
a broken module. Make sure the use is actually for the pointer operand.
llvm-svn: 243462
Summary:
Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.
There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial.
This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this:
for (int i = 0; i < limit; ++i) {
sum += ptr[i + offset];
}
Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.
There are more details in this discussion on llvmdev.
June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234
July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392
Patch by Bjarke Roune
Reviewers: eliben, atrick, sanjoy
Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits
Differential Revision: http://reviews.llvm.org/D11212
llvm-svn: 243460
GR64 <-> VR64 copies).
This commit adds a MIR test case for the commit r242191, which was committed
without one. This test case verifies that the ExpandPostRA pass expands the
GR64 <-> VR64 copies into the appropriate MMX_MOV instructions.
llvm-svn: 243457
The 'common' section TLS is not implemented.
Current C/C++ TLS variables are not placed in common section.
DWARF debug info to get the address of TLS variables is not generated yet.
clang and driver changes in http://reviews.llvm.org/D10524
Added -femulated-tls flag to select the emulated TLS model,
which will be used for old targets like Android that do not
support ELF TLS models.
Added TargetLowering::LowerToTLSEmulatedModel as a target-independent
function to convert a SDNode of TLS variable address to a function call
to __emutls_get_address.
Added into lib/Target/*/*ISelLowering.cpp to call LowerToTLSEmulatedModel
for TLSModel::Emulated. Although all targets supporting ELF TLS models are
enhanced, emulated TLS model has been tested only for Android ELF targets.
Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for
emulated TLS variables.
Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variabls.
TODO: Add proper DIE for emulated TLS variables.
Added new unit tests with emulated TLS.
Differential Revision: http://reviews.llvm.org/D10522
llvm-svn: 243438
Summary:
Add patterns for doing floating point round with various rounding modes
followed by conversion to int as a single FCVT* instruction.
Reviewers: t.p.northover, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11424
llvm-svn: 243422
This path add the aarch64 lowering of __builtin_thread_pointer. It uses
the already implemented AArch64ISD::THREAD_POINTER used in TLS generation.
llvm-svn: 243412
no-alias with non-addr-taken globals: they cannot alias a captured
pointer.
If the non-global underlying object would have been a capture were it to
alias the global, we can firmly conclude no-alias. It isn't reasonable
for a transformation to introduce a capture in a way observable by an
alias analysis. Consider, even if it were to temporarily capture one
globals address into another global and then restore the other global
afterward, there would be no way for the load in the alias query to
observe that capture event correctly. If it observes it then the
temporary capturing would have changed the meaning of the program,
making it an invalid transformation. Even instrumentation passes or
a pass which is synthesizing stores to global variables to expose race
conditions in programs could not trigger this unless it queried the
alias analysis infrastructure mid-transform, in which case it seems
reasonable to return results from before the transform started.
See the comments in the change for a more detailed outlining of the
theory here.
This should address the primary performance regression found when the
non-conservatively-correct path of the alias query was disabled.
Differential Revision: http://reviews.llvm.org/D11410
llvm-svn: 243405
VPAND is a lot faster than VPSHUFB and VPBLENDVB - this patch ensures we attempt to lower to a basic bitmask before lowering to the slower byte shuffle/blend instructions.
Split off from D11518.
Differential Revision: http://reviews.llvm.org/D11541
llvm-svn: 243395
This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy
when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826https://llvm.org/bugs/show_bug.cgi?id=22371#c25
The quick fix is to simply remove the scalar FP logical instructions from the load folding table
in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs,
fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP
logical instructions (because that's all x86 gives us anyway). That lets us do the load folding
legally.
Differential Revision: http://reviews.llvm.org/D11477
llvm-svn: 243361
This is effectively an NFC but we can no longer print the index of the
pointer group so instead I print its address. This still lets us
cross-check the section that list the checks against the section that
list the groups (see how I modified the test).
E.g. before we printed this:
Run-time memory checks:
Check 0:
Comparing group 0:
%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
Against group 1:
%arrayidxA = getelementptr i16, i16* %a, i64 %ind
%arrayidxA1 = getelementptr i16, i16* %a, i64 %add
...
Grouped accesses:
Group 0:
(Low: %c High: (78 + %c))
Member: {%c,+,4}<%for.body>
Member: {(2 + %c),+,4}<%for.body>
Now we print this (changes are underlined):
Run-time memory checks:
Check 0:
Comparing group (0x7f9c6040c320):
~~~~~~~~~~~~~~
%arrayidxC1 = getelementptr inbounds i16, i16* %c, i64 %store_ind_inc
%arrayidxC = getelementptr inbounds i16, i16* %c, i64 %store_ind
Against group (0x7f9c6040c358):
~~~~~~~~~~~~~~
%arrayidxA1 = getelementptr i16, i16* %a, i64 %add
%arrayidxA = getelementptr i16, i16* %a, i64 %ind
...
Grouped accesses:
Group 0x7f9c6040c320:
~~~~~~~~~~~~~~
(Low: %c High: (78 + %c))
Member: {(2 + %c),+,4}<%for.body>
Member: {%c,+,4}<%for.body>
llvm-svn: 243354
Summary:
If a scale or a base register can be rewritten as "Zext({A,+,1})" then
LSR will now consider a formula of that form in its normal cost
computation.
Depends on D9180
Reviewers: qcolombet, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9181
llvm-svn: 243348
Summary: WebAssemblySubtarget.cpp expects a default 'generic' CPU to exist, and this seems to be prevalent with other targets. It makes sense to have something between MVP and bleeding-edge, even though for now it's the same as MVP. This removes a warning that's currently generated.
Subscribers: jfb, llvm-commits, sunfish
Differential Revision: http://reviews.llvm.org/D11546
llvm-svn: 243345
This commit serializes the references from the machine basic blocks to the
unnamed basic blocks.
This commit adds a new attribute to the machine basic block's YAML mapping
called 'ir-block'. This attribute contains the actual reference to the
basic block.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243340
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"
This change re-implements D9784 with the two differences:
1. It does not use SCEVExpander and does not generate new
instructions. Instead, it does a quick local search for existing
`llvm::Value`s that it needs when modifying the `icmp`
instruction.
2. It is more general -- it deals with both increasing and decreasing
induction variables.
I've added all of the tests included with D9784, and two more.
As an example on what this change does (copied from D9784):
Given C code:
```
for (int i = M; i < N; i++) // i is known not to overflow
if (i < 0) break;
a[i] = 0;
}
```
This transformation produces:
```
for (int i = M; i < N; i++)
if (M < 0) break;
a[i] = 0;
}
```
Which can be unswitched into:
```
if (!(M < 0))
for (int i = M; i < N; i++)
a[i] = 0;
}
```
I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine. Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison. I'm open to putting it in its own helper routine if you
think that is better.
Reviewers: reames, nicholas, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11278
llvm-svn: 243331
be reserved.
The decision to reserve x18 is going to be made solely by the front-end,
so it isn't necessary to check if the OS is Darwin in the backend.
llvm-svn: 243308
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.
Differential Revision: http://reviews.llvm.org/D11503
llvm-svn: 243303
The pointer size of the addrspacecasted pointer might not have matched,
so this would have hit an assert in accumulateConstantOffset.
I think this was here to allow constant folding of a load of an
addrspacecasted constant. Accumulating the offset through the
addrspacecast doesn't make much sense, so something else is necessary
to allow folding the load through this cast.
llvm-svn: 243300
Author: Dave Airlie <airlied@redhat.com>
In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.
this should be applicable for llvm 3.7 as well.
llvm-svn: 243294
This commit zeroes out the virtual register references in the machine
function's liveins in the class 'MachineRegisterInfo' when the virtual
register definitions are cleared.
Reviewers: Matthias Braun
llvm-svn: 243290
Reapply r242295 with fixes in the implementation.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
psllq %mm1, %mm0
movd %mm0, %r9
jmp C
B:
por %mm1, %mm0
movd %mm0, %r9
jmp C
C:
movd %r9, %mm0
pshufw $238, %mm0, %mm0
Becomes:
A:
psllq %mm1, %mm0
jmp C
B:
por %mm1, %mm0
jmp C
C:
pshufw $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 243271
Summary:
Fix the cost of interleaved accesses for ARM/AArch64.
We were calling getTypeAllocSize and using it to check
the number of bits, when we should have called
getTypeAllocSizeInBits instead.
This would pottentially cause the vectorizer to
generate loads/stores and shuffles which cannot
be matched with an interleaved access instruction.
No performance changes are expected for now since
matching/generating interleaved accesses is still
disabled by default.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D11524
llvm-svn: 243270
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.
llvm-svn: 243253
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.
I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.
Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.
This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.
Reviewers: chandlerc, hfinkel
Subscribers: aschwaighofer, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D9819
llvm-svn: 243250
Summary:
This patch improves trivial loop unswitch.
The current trivial loop unswitch only checks if loop header's terminator contains a trivial unswitch condition. But if the loop header only has one reachable successor (due to intentionally or unintentionally missed code simplification), we should consider the successor as part of the loop header. Therefore, instead of stopping at loop header's terminator, we should keep traversing its successors within loop until reach a *real* conditional branch or switch (whose condition can not be constant folded). This change will enable a single -loop-unswitch pass to unswitch multiple trivial conditions (unswitch one trivial condition could open opportunity to unswitch another one in the same loop), while the old implementation can unswitch only one per pass.
Reviewers: reames, broune
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11481
llvm-svn: 243203
When truncating to non-legal types (such as i16, i8 and i1) always use an AND
instruction to mask out the upper bits. This was only done when the source type
was an i64, but not when the source type was an i32.
This commit fixes this and adds the missing i32 truncate tests.
This fixes rdar://problem/21990703.
llvm-svn: 243198
extension property we're requesting - zero or sign extended.
This fixes cases where we want to return a zero extended 32-bit -1
and not be sign extended for the entire register. Also updated the
already out of date comment with the current behavior.
llvm-svn: 243192
whether register x18 should be reserved.
This change is needed because we cannot use a backend option to set
cl::opt "aarch64-reserve-x18" when doing LTO.
Out-of-tree projects currently using cl::opt option "-aarch64-reserve-x18"
to reserve x18 should make changes to add subtarget feature "reserve-x18"
to the IR.
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D11463
llvm-svn: 243186
Add a verifier check that `DILocalVariable`s of tag
`DW_TAG_arg_variable` always have a non-zero 'arg:' field, and those of
tag `DW_TAG_auto_variable` always have a zero 'arg:' field. These are
the only configurations that are properly understood by the backend.
(Also, fix the bad examples in LangRef and test/Assembler, and fix the
bug in Kaleidoscope Ch8.)
A large number of testcases seem to have bitrotted their way forward
from some ancient version of the debug info hierarchy that didn't have
`arg:` parameters. If you have out-of-tree testcases that start failing
in the verifier and you don't care enough to get the `arg:` right, you
may have some luck just calling:
sed -e 's/, arg: 0/, arg: 1/'
or some such, but I hand-updated the ones in tree.
llvm-svn: 243183
This commit serializes the callee saved information from the class
'MachineFrameInfo'. This commit extends the YAML mappings for the fixed and
the ordinary stack objects and adds an optional 'callee-saved-register'
attribute. This attribute is used to serialize the callee save information.
llvm-svn: 243173
This patch extend LoopReroll pass to hand the loops which
is similar to the following:
while (len > 1) {
sum4 += buf[len];
sum4 += buf[len-1];
len -= 2;
}
llvm-svn: 243171
This commit serializes the virtual register allocations hints of type 0.
These hints specify the preferred physical registers for allocations.
llvm-svn: 243156
The names for instructions inserted were previous dependent on iteration order. By deriving the names from the original instructions, we can avoid instability in tests without resorting to ordered traversals. It also makes the IR mildly easier to read at large scale.
llvm-svn: 243140
This commit adds the liveins and successors properties to machine basic blocks
in some of the MIR tests to ensure that the tests will pass when the MIR parser
will run the machine verifier after initializing a machine function.
llvm-svn: 243124
This commit moves and transforms the generic test
'CodeGen/MIR/successor-basic-blocks.mir' into an X86 specific test
'CodeGen/MIR/X86/successor-basic-blocks.mir'. This change is required in order
to enable the machine verifier for the MIR parser, as the machine verifier
verifies that the machine basic blocks contain instructions that actually
determine the machine basic block successors.
llvm-svn: 243123
Some shufflevectors are currently being incorrectly lowered in the AArch32
backend as the existing checks for detecting the NEON operations from the
shufflevector instruction expects the shuffle mask and the vector operands to be
of the same length.
This is not always the case as the mask may be twice as long as the operand;
here only the lower half of the shufflemask gets checked, so provided the lower
half of the shufflemask looks like a vector transpose (or even is just all -1
for undef) then the intrinsics may get incorrectly lowered into a vector
transpose (VTRN) instruction.
This patch fixes this by accommodating for both cases and adds regression tests.
Differential Revision: http://reviews.llvm.org/D11407
llvm-svn: 243103
is an immediate, in this check the value is negated and stored in and int64_t.
The value can be -2^63 yet the result cannot be stored in an int64_t and this
gives some undefined behaviour causing failures. The negation is only necessary
when the values is within a certain range and so it should not need to negate
-2^63, this patch introduces this and also a regression test.
Differential Revision: http://reviews.llvm.org/D11408
llvm-svn: 243100
This patch allows llvm-dsymutil to read universal (aka fat) macho object
files and archives. The patch touches nearly everything in the BinaryHolder,
but it is fairly mechinical: the methods that returned MemoryBufferRefs or
ObjectFiles now return a vector of those, and the high-level access function
takes a triple argument to select the architecture.
There is no support yet for handling fat executables and thus no support for
writing fat object files.
llvm-svn: 243096
MachOObjectFile offers a method for detecting the correct triple, use
it instead of the previous approximation. This doesn't matter right
now, but it will become important for mach-o universal (fat) binaries.
llvm-svn: 243095
Summary:
Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate.
This patch is intended to be applied after D10205 (though it could be applied independently).
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10206
llvm-svn: 243084
The test in PR24199 ( https://llvm.org/bugs/show_bug.cgi?id=24199 ) crashes because machine
trace metrics was not ignoring dbg_value instructions when calculating data dependencies.
The machine-combiner pass asks machine trace metrics to calculate an instruction trace,
does some reassociations, and calls MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval()
along with MachineTraceMetrics::invalidate(). The dbg_value instructions have their operands
invalidated, but the instructions are not expected to be deleted.
On a subsequent loop iteration of the machine-combiner pass, machine trace metrics would be
called again and die while accessing the invalid debug instructions.
Differential Revision: http://reviews.llvm.org/D11423
llvm-svn: 243057
Summary:
Scalarizer has two data structures that hold information about changes
to the function, Gathered and Scattered. These are cleared in finish()
at the end of runOnFunction() if finish() detects any changes to the
function.
However, finish() was checking for changes by only checking if
Gathered was non-empty. The function visitStore() only modifies
Scattered without touching Gathered. As a result, Scattered could have
ended up having stale data if Scalarizer only scalarized store
instructions. Since the data in Scattered is used during the execution
of the pass, this introduced dangling pointer errors.
The fix is to check whether both Scattered and Gathered are empty
before deciding what to do in finish(). This also fixes a problem
where the Function can be modified although the pass returns false.
Reviewers: rnk
Subscribers: rnk, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D10459
llvm-svn: 243040
Adds pushes to the folding tables.
This also required a fix to the TD definition, since the memory forms of
the push instructions did not have the right mayLoad/mayStore flags.
Differential Revision: http://reviews.llvm.org/D11340
llvm-svn: 243010
We currently version `__asan_init` and when the ABI version doesn't match, the linker gives a `undefined reference to '__asan_init_v5'` message. From this, it might not be obvious that it's actually a version mismatch error. This patch makes the error message much clearer by changing the name of the undefined symbol to be `__asan_version_mismatch_check_xxx` (followed by the version string). We obviously don't want the initializer to be named like that, so it's a separate symbol that is used only for the purpose of version checking.
Reviewed at http://reviews.llvm.org/D11004
llvm-svn: 243003
The DAG Node "SCALAR_TO_VECTOR" may be created if the type of the scalar element is legal.
Added a check for the scalar type before creating this node.
Added a test that fails with assertion on the current version.
Differential Revision: http://reviews.llvm.org/D11413
llvm-svn: 242994
This commit broke the build. Numerous build bots broken, and it was
blocking my progress so reverting.
It should be trivial to reproduce -- enable the BPF backend and it
should fail when running llvm-tblgen.
llvm-svn: 242992
The debug map contains the timestamp of the object files in references.
We do not check these in the general case, but it's really useful if
you have archives where different versions of an object file have been
appended. This allows llvm-dsymutil to find the right one.
llvm-svn: 242965
The MSVC ABI requires that we generate an alias for the vtable which
means looking through a GlobalAlias which cannot be overridden improves
our ability to devirtualize.
Found while investigating PR20801.
Patch by Andrew Zhogin!
Differential Revision: http://reviews.llvm.org/D11306
llvm-svn: 242955