Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
llvm-svn: 286423
No testcase included because I can't figure out how to reduce it.
(It's easy to write a testcase where rotation clones an assume,
but that doesn't actually seem to trigger the crash in opt on
its own; maybe an issue with the laziness?)
Differential Revision: https://reviews.llvm.org/D26434
llvm-svn: 286410
Summary:
Unrolled Loop Size calculations moved to a function.
Constant representing number of optimized instructions
when "back edge" becomes "fall through" replaced with
variable.
Some comments added.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D21719
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 286389
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
llvm-svn: 286385
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.
Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.
Differential Revision: https://reviews.llvm.org/D25812
llvm-svn: 286384
For pairs of 32-bit registers: isub_lo, isub_hi.
For pairs of vector registers: vsub_lo, vsub_hi.
Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function
HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.
llvm-svn: 286377
The name/comment of the third argument to the ScheduleDAGMI constructor
is RemoveKillFlags and not IsPostRA. Only the comments are changed.
Review: A Trick
llvm-svn: 286350
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Differential Revision: https://reviews.llvm.org/D26185
llvm-svn: 286347
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.
Differential Revision: https://reviews.llvm.org/D26331
llvm-svn: 286345
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26330
llvm-svn: 286344
Summary:
This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.
Reviewers: delena, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26322
llvm-svn: 286339
The BitcodeReader no longer produces BitcodeDiagnosticInfo diagnostics.
The only remaining reference was in the gold plugin; the code there has been
dead since we stopped producing InvalidBitcodeSignature error codes in r225562.
While at it remove the InvalidBitcodeSignature error code.
llvm-svn: 286326
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.
Reviewers: davidxl, hfinkel
Subscribers: mehdi_amini, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26155
llvm-svn: 286325
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.
If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
Ie, running the optimizer pessimizes the codegen for this case without this patch:
define <4 x i32> @umax_vec(<4 x i32> %x) {
%cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
%sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
ret <4 x i32> %sel
}
$ ./opt umax.ll -S | ./llc -o - -mattr=avx
vpmaxud LCPI0_0(%rip), %xmm0, %xmm0
$ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtd %xmm0, %xmm1, %xmm1
vmovaps LCPI0_0(%rip), %xmm2 ## xmm2 = [2147483647,2147483647,2147483647,2147483647]
vblendvps %xmm1, %xmm0, %xmm2, %xmm0
Differential Revision: https://reviews.llvm.org/D26096
llvm-svn: 286318
As the test change shows, we can increase the critical path by adding
a 'not' instruction, so make sure that we're actually removing an
instruction if we do this transform.
This transform could also cause us to miss folds of min/max pairs.
llvm-svn: 286315
Previously support had been added for using CodeViewRecordIO
to read (deserialize) CodeView type records. This patch adds
support for writing those same records. With this patch,
reading and writing of CodeView type records finally uses a single
codepath.
Differential Revision: https://reviews.llvm.org/D26253
llvm-svn: 286304
if it is more specific than the one in its DW_AT_specification.
If a static member is an array, the translation unit containing the
member definition may have a more specific type (including its length)
than TUs only seeing the class declaration. This patch adds a
DW_AT_type to the member's DW_TAG_variable in addition to the
DW_AT_specification in these cases. The member type in the
DW_AT_specification still shows the more generic type (without the
length) to avoid defeating type uniquing.
The DWARF standard discourages “duplicating” a DW_AT_type in a member
variable definition but doesn’t explicitly forbid it. Having the more
specific type (with the array length) available is what allows the
debugger to print the contents of a static array member variable.
https://reviews.llvm.org/D26368
rdar://problem/28706946
llvm-svn: 286302
Summary:
There are two variables here that break. This change constrains both of them to
debug builds (via DEBUG() or #ifndef NDEBUG).
Reviewers: bkramer, t.p.northover
Subscribers: mehdi_amini, vkalintiris
Differential Revision: https://reviews.llvm.org/D26421
llvm-svn: 286300
Summary:
This patch uses the same approach added for inline asm in r285513 to
similarly prevent promotion/renaming of locals used or defined in module
level asm.
All static global values defined in normal IR and used in module level asm
should be included on either the llvm.used or llvm.compiler.used global.
The former were already being flagged as NoRename in the summary, and
I've simply added llvm.compiler.used values to this handling.
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Fixes PR30610.
Reviewers: mehdi_amini
Subscribers: johanengelen, krasin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26146
llvm-svn: 286297
This addresses PR30746, <https://llvm.org/bugs/show_bug.cgi?id=30746>. The ASan pass iterates over entry-block instructions and checks each alloca whether it's in NonInstrumentedStaticAllocaVec, which is apparently slow. This patch gathers the instructions to move during visitAllocaInst.
Differential Revision: https://reviews.llvm.org/D26380
llvm-svn: 286296
Summary:
We've had support for auto upgrading old style scalar TBAA access
metadata tags into the "new" struct path aware TBAA metadata for 3 years
now. The only way to actually generate old style TBAA was explicitly
through the IRBuilder API. I think this is a good time for dropping
support for old style scalar TBAA.
I'm not removing support for textual or bitcode upgrade -- if you have
IR with the old style scalar TBAA tags that go through the AsmParser orf
the bitcode parser before LLVM sees them, they will keep working as
usual.
Note:
%val = load i32, i32* %ptr, !tbaa !N
!N = < scalar tbaa node >
is equivalent to
%val = load i32, i32* %ptr, !tbaa !M
!N = < scalar tbaa node >
!M = !{!N, !N, 0}
Reviewers: manmanren, chandlerc, sunfish
Subscribers: mcrosier, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D26229
llvm-svn: 286291
After instruction selection we perform some checks on each VReg just before
discarding the type information. These checks were assertions before, but that
breaks the fallback path so this patch moves the logic into the main flow and
reports a better error on failure.
llvm-svn: 286289
This completes assembler / disassembler support for all BFP
instructions provided by the floating-point extensions facility.
The instructions added here are not currently used for codegen.
llvm-svn: 286285
Add several instructions that operate on the program mask
or the addressing mode. These are not really needed for
code generation under Linux, but are provided for completeness
for the assembler/disassembler.
llvm-svn: 286284
Add the 16 access registers as LLVM registers. This allows removing
a lot of special cases in the assembler and disassembler where we
were handling access registers; this can all just use the generic
register code now.
Also add a bunch of instructions to operate on access registers,
for assembler/disassembler use only. No change in code generation
intended.
llvm-svn: 286283
Since IMPLIFIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.
llvm-svn: 286274
Erasing reverse_iterators is problematic; iterate manually.
While there, keep track of the range of inserted instructions.
It can miss instructions inserted elsewhere, but those are harder
to track.
Differential Revision: http://reviews.llvm.org/D22924
llvm-svn: 286272
For example, it invalidates the domtree, causing assertions
in later passes which need dominator infos. Make it preserve
GlobalsAA, as suggested by Eli.
Differential Revision: https://reviews.llvm.org/D26381
llvm-svn: 286271
Define a couple of additional semantic classes and use them
throughout the .td files to make them more consistent and
more easily readable.
No functional change.
llvm-svn: 286268
This changes the InstRR (and related) patterns to no longer
automatically add an "r" at the end of the mnemonic. This
makes the .td files more obviously understandable, and also
allows using the patterns for those few instructions that
do not follow the *r scheme.
Also add some more sub-formats of the RRF format class, to
match operand names and sequence from the PoP better.
No functional change.
llvm-svn: 286267
Now that we've added instruction format subclasses like
InstRIb, it makes sense to rename the old InstRI to InstRIa.
Similar for InstRX, InstRXY, InstRS, InstRSY, and InstSS.
No functional change.
llvm-svn: 286266
Rework patterns for branches, call & return instructions,
compare-and-branch, compare-and-trap, and conditional move
instructions.
In particular, simplify creation of patterns for the extended
opcodes of instructions that take a CC mask.
Also, use semantical instruction classes for all the instructions
instead of open-coding them in SystemZInstrInfo.td.
Adds a couple of the basic branch instructions (that are unused
for codegen) for the assembler/disassembler.
llvm-svn: 286263
About when we should move a vreg from CurrentNewVRegs to NewVRegs,
if the vreg in CurrentNewVRegs was added into RecoloringCandidate and was
evicted, it shouldn't be added to NewVRegs because its physical register
will be restored at the end of tryLastChanceRecoloring after the recoloring
failed. If the vreg in CurrentNewVRegs was not in RecoloringCandidate, i.e.
it was evicted in selectOrSplitImpl inside tryRecoloringCandidates, its
physical register will not be restored even if the recoloring failed. In
that case, we need to add the vreg to NewVRegs.
Same as r281783, the problem was seen on out-of-tree target and we didn't
have a test case that reproduce the problem with in-tree targets.
llvm-svn: 286259
Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before.
Reviewers: asl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D23718
llvm-svn: 286252
From experiments, discriminator is rarely greater than 127. Here we enforce it to be no greater than 127 so that it will always fit in 1 byte.
llvm-svn: 286245
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.
The comment said we shouldn't handle constant splat vectors with undef elements. But the the actual code was returning false if the build vector contained no undef elements....
This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).
The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.
Differential Revision: https://reviews.llvm.org/D26031
llvm-svn: 286238
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Patch by James Molloy and Pablo Barrio.
Reviewers: rengolin, haicheng, sebpop
Subscribers: jojo, jmolloy, llvm-commits
Differential Revision: https://reviews.llvm.org/D26391
llvm-svn: 286236
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.
Differential Revision: https://reviews.llvm.org/D25910
llvm-svn: 286233
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.
Differential Revision: https://reviews.llvm.org/D26394
llvm-svn: 286231
This additional information can be used to improve the locations when generating remarks for loops.
Patch by Florian Hahn.
Differential Revision: https://reviews.llvm.org/D25763
llvm-svn: 286227
Unique ownership is just one possible ownership pattern for the memory buffer
underlying the bitcode reader. In practice, as this patch shows, ownership can
often reside at a higher level. With the upcoming change to allow multiple
modules in a single bitcode file, it will no longer be appropriate for
modules to generally have unique ownership of their memory buffer.
The C API exposes the ownership relation via the LLVMGetBitcodeModuleInContext
and LLVMGetBitcodeModuleInContext2 functions, so we still need some way for
the module to own the memory buffer. This patch does so by adding an owned
memory buffer field to Module, and using it in a few other places where it
is convenient.
Differential Revision: https://reviews.llvm.org/D26384
llvm-svn: 286214
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106630.html
Move block info block state to a new class, BitstreamBlockInfo.
Clients may set the block info for a particular cursor with the
BitstreamCursor::setBlockInfo() method.
At this point BitstreamReader is not much more than a container for an
ArrayRef<uint8_t>, so remove it and replace all uses with direct uses
of memory buffers.
Differential Revision: https://reviews.llvm.org/D26259
llvm-svn: 286207
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.
Patch mostly by Ahmed again.
llvm-svn: 286183
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.
With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a forward propagation of a v_cmp 64 bit result
to an user is implemented. Additional side effect of this is that we may
consume less VGPRs in a cost of more SGPRs in case if holding of multiple
conditions is needed, and that is a clear win in most cases.
llvm-svn: 286171
With this we get a new field in the YAML record if the value being
streamed out has a debug location. For examples, please see the changes
to the tests.
This is then used in opt-viewer to display a link for the callee
function in the inlining remarks.
Differential Revision: https://reviews.llvm.org/D26366
llvm-svn: 286169
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling. We just need to transfer memory
operands during lowering.
Reviewers: mcrosier, t.p.northover, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26313
llvm-svn: 286168
Because we shift the stack pointer by an unknown amount, we need an
additional pointer. In the case where we have variable-size objects
as well, we can't reuse the frame pointer, thus three pointers.
Patch by Jacob Gravelle
Differential Revision: https://reviews.llvm.org/D26263
llvm-svn: 286160
Summary:
In some specific scenarios with well understood operand bundle types
(like `"deopt"`) it may be possible to go ahead and convert recursion to
iteration, but TailRecursionElimination does not have that logic today
so avoid doing the right thing for now.
I need some input on whether `"funclet"` operand bundles should also
block tail recursion elimination. If not, I'll allow TRE across calls
with `"funclet"` operand bundles and add a test case.
Reviewers: rnk, majnemer, nlewycky, ahatanak
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26270
llvm-svn: 286147
Although rare, atomic accesses to floating-point types seem to be valid, i.e. `%a = load atomic float ...`. The TSan instrumentation pass however tries to emit inttoptr, which is incorrect, we should use a bitcast here. Anyway, IRBuilder already has a convenient helper function for this.
Differential Revision: https://reviews.llvm.org/D26266
llvm-svn: 286135
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.
Fixes verifier failures in a future commit.
Also fix verifier error when inserting copy for vccz
corruption bug.
llvm-svn: 286133
Argument evaluation order is one of the edge cases where Clang differs
from GCC, yielding different IR depending on which compiler LLVM was
built with. Make the order deterministic and tune the test to actually
verify the order instead of trying to hide it.
llvm-svn: 286126
This was reverted at r285866 because there was a crash handling a scalar
select of vectors. I added a check for that pattern and a test case based
on the example provided in the post-commit thread for r285732.
llvm-svn: 286113
* Use a generic vector unit to model the issue unit more accurately.
* Update some vector instructions that actually use the vector unit for more
than one cycle.
Review: Ulrich Weigand
llvm-svn: 286112
IssueWidth updated to reflect the capacity of the issue unit correctly.
Correct number of FX and LS units modelled (2, was 1).
Review: Ulrich Weigand
llvm-svn: 286109
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.
If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.
In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.
llvm-svn: 286107
Cmake has not recognized that Hexagon.td has a new dependency in
HexagonPatterns.td. All changes to that file were not visible to
the build bots.
llvm-svn: 286084
This handles the last case of the builtin function calls that we would
generate code which differed from Microsoft's ABI. Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.
Addresses PR30825!
llvm-svn: 286082
The clr/set/toggle-bit instructions (with the bit index given as an
immediate operand) had both, custom selection code that generated them,
and selection patterns at the same time. The selection patterns were
not used, because the custom selection code was executed first.
This patch removes the custom code in favor of the selection patterns.
The custom code handled 64-bit registers as well with an immediate bit
index, and so new patterns were added to implement that.
It was also the same case for the instruction "Rd += asr(Rs, Rt)",
except that the custom code did not offer any additional functionality,
and was simply removed.
llvm-svn: 286080
Summary:
This kill various depreacated API related to attribute :
- The deprecated C API attribute based on LLVMAttribute enum.
- The Raw attribute set format (planned to be removed in 4.0).
Reviewers: bkramer, echristo, mehdi_amini, void
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D23039
llvm-svn: 286062
Summary:
Fixes PR30869.
In D25977 I meant to change all functions that care about lifetime. I
changed constructors, factory functions, but I missed member/free
functions that return new instances. This patch changes them.
Reviewers: hfinkel, kbarton, echristo, joerg
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D26269
llvm-svn: 286060
Summary:
SmallSetVector uses DenseSet, but that means we need to reserve some
values for the empty and tombstone keys.
It seems to me we should have a general way to let us store full-range
ints inside of DenseSets, and furthermore that we probably shouldn't
silently let you add ints into DenseSets without explicitly promising
that they're in range. But that's a battle for another day; for now,
just fix this code, since we currently do something Very Bad when
compiling ffmpeg.
Fixes PR30914.
Reviewers: jeremyhu
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D26323
llvm-svn: 286038
We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs.
This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls.
llvm-svn: 286037
It currently fires an assert if you even try. Looking back, I don't think it ever worked because it only changed the name of the function object, but not the intrinsic ID stored in it. Given that, I think it can be removed since no one has noticed or complained in the past 4 years.
llvm-svn: 286031
Summary: This patch returns the same label if the CP entry with the same value has been created.
Reviewers: eli.friedman, rengolin, jmolloy
Subscribers: majnemer, jmolloy, llvm-commits
Differential Revision: https://reviews.llvm.org/D25804
llvm-svn: 286006
instruction.
This avoids dereferencing null in the debug logging if the instruction
was not in fact a return instruction. This potential bug was found by
PVS-Studio.
This actually fixes the last of the "dereferenced a pointer before
checking it for null" reports in the recent PVS-Studio run. However,
there are quite a few reports of this nature that I did not do anything
to fix because they are pretty glaring false positives. They usually
took the form of quite clear correlated checks or a check made in
a separate function. I've even added asserts anywhere this correlation
wasn't pretty obvious and fundamental to the code.
llvm-svn: 285988
of which that is hidden inside a separate function call) and helpfully
before building expensive transaction infrastructure. This will avoid
crashing when running CGP in a generic mode if we ever managed to hit
this case.
Note that I spent some time looking at alternatives. CGP is actually
used without a TM or TLI in order to do some target-independent testing.
Further, all of the neighboring optimization techniques actually have
some paths that are effective even in the absence of TLI so this seemed
the correct scope at which to check and bypass logic. It still isn't
clear that long-term support for missing TM/TLI is the right
cost/benefit tradeoff for CGP -- we seem to get relatively little for it
and the code is just littered with checks (and assumptions which
I suspect are still missing some checks).
This at least fixes the potential bug in this code spotted by
PVS-Studio, so we've got that going for us. ;]
llvm-svn: 285987
behind the test that the MachineModuleInfo analysis was
actually available and can be used.
While the MachO bits may well be reasonable to assume in the darwin
assembly printer, the analysis isn't constructively guaranteed anywhere
I could find so it seems safest to avoid crashing here.
This issue was found with PVS-Studio. Pretty sure the Clang Static
Anaylzer flags similar issues but we've probably never pointed it at
this code effectively.
llvm-svn: 285972
Summary: ARMv6m supports dmb etc fench instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls.
Reviewers: rengolin, efriedma, t.p.northover, jmolloy
Subscribers: efriedma, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D26120
llvm-svn: 285969
in llvm-objdump for Mach-O files add the printing of the
ARM_THREAD_STATE64 in the same format as
otool-classic(1) on darwin.
To do this the 64-bit ARM general tread state
needed to be defined in include/llvm/Support/MachO.h .
rdar://28985800
llvm-svn: 285967
This file is unused as of r285939, but we need to keep it around
for bots that don't do full rebuilds. We should be able to delete this
again in a few days.
llvm-svn: 285948
Quite sad we still aren't really using aggressive dead code warnings
from Clang that we could potentially use to catch this and so many other
things.
llvm-svn: 285936
This condition is trivially always true prior to the change. The comment
at the call site makes it clear that we expect *all* of these to be '=',
'S', or 'I' so fix the code.
We have a bug I will update to track the fact that Clang doesn't warn on
this: http://llvm.org/PR13101
llvm-svn: 285930
hange explores the fact that LDS reads may be reordered even if access
the same location.
Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.
Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.
Differential Revision: https://reviews.llvm.org/D25944
llvm-svn: 285919
Summary:
Have MergeConsecutiveStores explicitly return information about the stores
that were merged, so that we can safely determine whether the starting
node has been freed.
Reviewers: chandlerc, bogner, niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25601
llvm-svn: 285916
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.
For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).
1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.
1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.
llvm-svn: 285893
That is consistent with other methods around
and helps to handle error on a caller side.
Differential revision: https://reviews.llvm.org/D26247
llvm-svn: 285886
This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR.
Reviewers: delena, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26134
llvm-svn: 285878
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.
llvm-svn: 285876
Summary:
The recent change I made to consult the summary when deciding whether to
rename (to handle inline asm) in r285513 broke the distributed build
case. In a distributed backend we will only have a portion of the
combined index, specifically for imported modules we only have the
summaries for any imported definitions. When renaming on import we were
asserting because no summary entry was found for a local reference being
linked in (def wasn't imported).
We only need to consult the summary for a renaming decision for the
exporting module. For imports, we would have prevented importing any
references to NoRename values already.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26250
llvm-svn: 285871
This reverts commit r285732.
This change introduced a new assertion failure in the following
testcase at -O2:
typedef short __v8hi __attribute__((__vector_size__(16)));
__v8hi foo(__v8hi &V1, __v8hi &V2, unsigned mask) {
__v8hi Result = V1;
if (mask & 0x80)
Result[0] = V2[0];
return Result;
}
llvm-svn: 285866
the offsets and sizes of an element of the Mach-O file overlaps with
another element in the Mach-O file.
Some other tests for malformed Mach-O files now run into these
checks so their tests were also adjusted.
llvm-svn: 285860
Otherwise we set it always to zero, which is not correct,
and we assert inside alignTo (Assertion failed:
Align != 0u && "Align can't be 0.").
Differential Revision: https://reviews.llvm.org/D26173
llvm-svn: 285841
Using a pattern similar to that of YamlIO, this allows
us to have a single codepath for translating codeview
records to and from serialized byte streams. The
current patch only hooks this up to the reading of
CodeView type records. A subsequent patch will hook
it up for writing of CodeView type records, and then a
third patch will hook up the reading and writing of
CodeView symbols.
Differential Revision: https://reviews.llvm.org/D26040
llvm-svn: 285836
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.
Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25656
llvm-svn: 285835
Summary:
Remove this pass from addMachineSSAOptimization() and register it unconditionally in through addPreRegAlloc(). This pass is required for generating correct PIC calls.
Reviewers: sdardis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26036
llvm-svn: 285814
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.
llvm-svn: 285813
While bootstrapping Clang with recent `gcc 6.2.0` I found a bug related to misleading indentation.
I believe, a pair of `{}` was forgotten, especially given the above similar piece of code:
```
if (!RDef || !HII->isPredicable(*RDef)) {
Done = coalesceRegisters(RD, RegisterRef(S1));
if (Done) {
UpdRegs.insert(RD.Reg);
UpdRegs.insert(S1.getReg());
}
}
```
Reviewers: kparzysz
Differential Revision: https://reviews.llvm.org/D26204
llvm-svn: 285794
Summary:
It was detected that the reassociate pass could enter an inifite
loop when analysing dead code. Simply skipping to analyse basic
blocks that are dead avoids such problems (and as a side effect
we avoid spending time on optimising dead code).
The solution is using the same Reverse Post Order ordering of the
basic blocks when doing the optimisations, as when building the
precalculated rank map. A nice side-effect of this solution is
that we now know that we only try to do optimisations for blocks
with ranked instructions.
Fixes https://llvm.org/bugs/show_bug.cgi?id=30818
Reviewers: llvm-commits, davide, eli.friedman, mehdi_amini
Subscribers: dberlin
Differential Revision: https://reviews.llvm.org/D26154
llvm-svn: 285793
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106595.html
This change also fixes an API oddity where BitstreamCursor::Read() would
return zero for the first read past the end of the bitstream, but would
report_fatal_error for subsequent reads. Now we always report_fatal_error
for all reads past the end. Updated clients to check for the end of the
bitstream before reading from it.
I also needed to add padding to the invalid bitcode tests in
test/Bitcode/. This is because the streaming interface was not checking that
the file size is a multiple of 4.
Differential Revision: https://reviews.llvm.org/D26219
llvm-svn: 285773
This is enough to compile and link but doesn't yet do anything particularly
useful. Once an ASM parser and printer are added in the next two patches, the
whole thing can be usefully tested.
Differential Revision: https://reviews.llvm.org/D23562
llvm-svn: 285770
For now, only add instruction definitions for basic ALU operations. Our
initial target is a working MC layer rather than codegen, so appropriate
SelectionDAG patterns will come later.
Differential Revision: https://reviews.llvm.org/D23561
llvm-svn: 285769
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.
Also start verifying the register classes of inlineasm operands.
llvm-svn: 285762
Summary:
Currently PreferredLoopExit is set only in buildLoopChains, which is
never called if there are no MachineLoops.
MSan is currently broken by this:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/145/steps/check-llvm%20msan/logs/stdio
This is a naive fix to get things green again. iteratee: you may have a better fix.
This change will also mean PreferredLoopExit will not carry over if
buildCFGChains() is called a second time in runOnMachineFunction, this
appears to be the right thing.
Reviewers: bkramer, iteratee, echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26069
llvm-svn: 285757
InstCombine should always canonicalize patterns like the one shown in the comment
when visiting 'select' insts in adjustMinMax().
Scalars were already handled there, and vector splats are handled after:
https://reviews.llvm.org/rL285732
llvm-svn: 285744
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.
llvm-svn: 285743
For example, rename s6Ext to s6_0Ext. The names for shifted integers
include the underscore and this will make the naming consistent. It
also exposed a few duplicates that were removed.
llvm-svn: 285728
It's likely if a conditional branch needs to be expanded, the following
unconditional branch will also need expansion. By expanding the
unconditional branch first, the conditional branch can be simply
inverted to jump over the inserted indirect branch block. If the
conditional branch is expanded first, it results in an additional
branch.
This avoids test regressions in future commits.
llvm-svn: 285722
This will prevent following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll
Differential Revision: https://reviews.llvm.org/D25802
llvm-svn: 285716
This contains just enough for lib/Target/RISCV to compile. Notably a basic
RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc
-march=riscv32 myinput.ll and will find it fails due to the lack of
MCAsmInfo.
See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for
further discussion
Differential Revision: https://reviews.llvm.org/D23560
llvm-svn: 285712
Add the necessary definitions for RISC-V ELF files, including relocs. Also
make necessary trivial change to ELFYaml, llvm-objdump, and llvm-readobj in
order to work with RISC-V ELFs.
Differential Revision: https://reviews.llvm.org/D23557
llvm-svn: 285708
As it stands, the OperandMatchResultTy is only included in the generated
header if there is custom operand parsing. However, almost all backends
make use of MatchOperand_Success and friends from OperandMatchResultTy for
e.g. parseRegister. This is a pain when starting an AsmParser for a new
backend that doesn't yet have custom operand parsing. Move the enum to
MCTargetAsmParser.h.
This patch is a prerequisite for D23563
Differential Revision: https://reviews.llvm.org/D23496
llvm-svn: 285705
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.
The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot. I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.
Reviewers: bogner, ab, t.p.northover, arsenm
Subscribers: wdng, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D25999
llvm-svn: 285704
This patch introduces the combine:
(C1 shift (A add C2)) -> ((C1 shift C2) shift A)
iff A and C2 are both positive
If both A and C2 are know to be positive then we can safely split into 2 shifts, permitting the folding of the Inner shift.
Fix for the spec benchmark case mentioned by @nadav on PR15141 (assuming we can prove that the inputs as positive).
Differential Revision: https://reviews.llvm.org/D26000
llvm-svn: 285696
[Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment]
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
llvm-svn: 285690
If a response file included by construct @file itself includes a response file
and that file is specified by relative file name, current behavior is to resolve
the name relative to the current working directory. The change adds additional
flag to ExpandResponseFiles that may be used to resolve nested response file
names relative to including file. With the new mode a set of related response
files may be kept together and reference each other with short position
independent names.
Differential Revision: https://reviews.llvm.org/D24917
llvm-svn: 285675
This is intended to make the semantic intent clearer.
The wrapper objects are now generic to avoid `const_cast` s. Since
`const` ness is part of the API of `MDNode::getMostGenericTBAA` (and
therefore I can't make things `const` all the way through without some
code churn outside TypeBasedAliasAnalysis.cpp), this seemed like the
cleanest solution.
llvm-svn: 285665
No block info block should need to define local abbreviations, so we can
always use a code width of 2.
Also change all block info block writers to use EnterBlockInfoBlock.
Differential Revision: https://reviews.llvm.org/D26168
llvm-svn: 285660
This bug was exposed by using nsw/nuw for more aggressive folds in:
https://reviews.llvm.org/rL284844
The changes mimic the IR demanded bits logic in InstCombiner::SimplifyDemandedUseBits(),
but we can't just flip flag bits in the DAG; we have to create a new node that has the
bits cleared.
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30841
llvm-svn: 285656
Generate the slowest possible codepath for noopt CodeGen. Even trying to be
clever with the negated jump can cause out-of-range jumps. Use a wide branch
instead. Although the code is modelled simplistically, the later optimizations
would recombine the branching into `cbz` if possible. This re-enables the
previous optimization as well as hopefully gives us working code in all cases.
Addresses PR30356!
llvm-svn: 285649
Summary:
There is no point to importing at -O0, since we won't inline. We should
also disable other cross-module optimizations.
(Plan to backport this fix to the 3.9 branch to fix PR30774)
Reviewers: pcc
Subscribers: johanengelen, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25918
llvm-svn: 285648
Summary:
This has been replaced by the NVPTXInferAddressSpaces pass. We've had
the new one as the default with the old one accessible via a flag for
some months now, and we've had no problems.
Reviewers: tra
Subscribers: llvm-commits, jholewinski, jingyue, mgorny
Differential Revision: https://reviews.llvm.org/D26165
llvm-svn: 285642
the offsets and sizes of an element of the file overlaps with
another element in the Mach-O file.
This shows the approach to this testing for three elements
and contains for tests for their overlap. Checking for all the
remain elements will be added next.
llvm-svn: 285632
DW_TAG_atomic_type was already included in Dwarf.defs and emitted correctly,
however Verifier didn't recognize it as valid.
Thus we introduce the following changes:
* Make DW_TAG_atomic_type valid tag for IR and DWARF (enabled only with -gdwarf-5)
* Add it to related docs
* Add DebugInfo tests
Differential Revision: https://reviews.llvm.org/D26144
llvm-svn: 285624
On Darwin, simple C null-terminated constant strings normally end up in the __TEXT,__cstring section of the resulting Mach-O binary. When instrumented with ASan, these strings are transformed in a way that they cannot be in __cstring (the linker unifies the content of this section and strips extra NUL bytes, which would break instrumentation), and are put into a generic __const section. This breaks some of the tools that we have: Some tools need to scan all C null-terminated strings in Mach-O binaries, and scanning all the contents of __const has a large performance penalty. This patch instead introduces a special section, __asan_cstring which will now hold the instrumented null-terminated strings.
Differential Revision: https://reviews.llvm.org/D25026
llvm-svn: 285619
This change enables LLD to construct a Section Map stream in a PDB file.
I do not understand all these fields in the Section Map yet, but it seems
like a copy of a COFF section header in another format.
With this patch, DbiStreamBuilder can emit a Section Map which
llvm-pdbdump can dump.
Differential Revision: https://reviews.llvm.org/D26112
llvm-svn: 285606
Modifying DWARFFormValue to remember the DWARFUnit that it was encoded with can simplify the usage of instances of this class. Previously users would have to try and pass in the same DWARFUnit that was used to decode the form value and there was a possibility that a different DWARFUnit might be supplied to the functions that extract values (strings, CU relative references, addresses) and cause problems. This fixes this potential issue by storing the DWARFUnit inside the DWARFFormValue so that this mistake can't be made. Instances of DWARFFormValue are not stored permanently and are used as temporary values, so the increase in size of an instance of DWARFFormValue isn't a big deal. This makes decoding form values more bullet proof and is a change that will be used by future modifications.
https://reviews.llvm.org/D26052
llvm-svn: 285594
Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions.
Commit on behalf of mharoush
Extending inline assembly support, compatible with GCC as folowing:
"k" constraint hints the compiler to select any of AVX512 k0-k7 registers.
"Yk" constraint is a subset of "k" excluding k0 which is not allowd to be used as a mask.
Reviewer: 1. rnk
Differential Revision: https://reviews.llvm.org/D25062
llvm-svn: 285591
Fixes Bug 30808.
Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced.
Old gfx6/7-specific def renamed to smrd_offset_8 for clarity.
Lit tests updated.
Differential Revision: https://reviews.llvm.org/D26085
llvm-svn: 285590
This patch implements two changes:
- Move processor feature definition into a new file SystemZFeatures.td,
and provide explicit lists of supported and unsupported features for
each level of the z/Architecture. This allows specifying unsupported
features in the scheduler definition files for each processor.
- Add optional aliases for the -mcpu processor names according to the
level of the z/Architecture, for compatibility with other compilers
on the platform. The supported aliases are:
-mcpu=arch8 equals -mcpu=z10
-mcpu=arch9 equals -mcpu=z196
-mcpu=arch10 equals -mcpu=zEC12
-mcpu=arch11 equals -mcpu=z13
llvm-svn: 285577
The LEFR/LFER pseudos are aliases for vector instructions and should
therefore be guared by FeatureVector. If they aren't, the TableGen
scheduler definition checking might complain that there is no data
for those pseudos for pre-z13 machines.
No functional change intended.
llvm-svn: 285576
Currently, when using an instruction that is not supported on the
currently selected architecture, the LLVM assembler is likely to
diagnose an "invalid operand" instead of a "missing feature".
This is because many operands require a custom parser in order to
be processed correctly, and if an instruction is not available
according to the current feature set, the generated parser code
will also not detect the associated custom operand parsers.
Fixed by temporarily enabling all features while parsing operands.
The missing features will then be correctly detected when actually
parsing the instruction itself.
llvm-svn: 285575
LLVM currently treats the first operand of MVCK as if it were a
regular base+index+displacement address. However, it is in fact
a base+displacement combined with a length register field.
While the two might look syntactically similar, there are two
semantic differences:
- %r0 is a valid length register, even though it cannot be used
as an index register.
- In an expression with just a single register like 0(%rX), the
register is treated as base with normal addresses, while it is
treated as the length register (with an empty base) for MVCK.
Fixed by adding a new operand parser class BDRAddr and reworking
the assembler parser to distinguish between address + length
register operands and regular addresses.
llvm-svn: 285574
There is a bug describing poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in series of patches dealing with cost model.
Differential Revision: https://reviews.llvm.org/D25722
llvm-svn: 285564
possible pointer-wrap-around concerns, in some cases.
Before this patch, collectConstStridedAccesses (part of interleaved-accesses
analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when
examining all candidate pointers. This is too conservative. Instead, this
patch makes collectConstStridedAccesses use an optimistic approach, calling
getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the
candidate interleave groups have been formed, revisits the pointer-wrapping
analysis but only where it matters: namely, in groups that have gaps, and where
the gaps are not at the very end of the group (in which case the loop is
peeled). This second time getPtrStride is called with [Assume=false,
ShouldCheckWrap=true], but this could further be improved to using Assume=true,
once we also add the logic to track that we are not going to meet the scev
runtime checks threshold.
Differential Revision: https://reviews.llvm.org/D25276
llvm-svn: 285517
This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type.
llvm-svn: 285515
Summary:
Instead of using the workaround of suppressing the entire index for
modules that call inline asm that may reference locals, use the
NoRename flag on the summary for any locals in the llvm.used set, and
add a reference edge from any functions containing inline asm.
This avoids issues from having no summaries despite the module defining
global values, which was preventing more aggressive index-based
optimization. It will be followed by a subsequent patch to make a
similar fix for local references in module level asm (to fix PR30610).
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26121
llvm-svn: 285513
Summary:
When we have an aliasee that is linkonce, while we can't convert
the non-prevailing copies to available_externally, we still need to
convert the prevailing copy to weak. If a reference to the aliasee
is exported, not converting a copy to weak will result in undefined
references when the linkonce is removed in its original module.
Add a new test and update existing tests.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26076
llvm-svn: 285512
Summary:
Replace the check of whether a GV has a section with the flag check
in the summary. This is in preparation for using the NoPromote flag
to convey other situations when we can't promote (e.g. locals used in
inline asm).
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26063
llvm-svn: 285507
Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202.
There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
smax pattern and commuted variants.
The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We
can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR.
Differential Revision: https://reviews.llvm.org/D26091
llvm-svn: 285499
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.
I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.
DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.
This looked like this had caused compile time regressions on some buildbots (and was reverted in rL285381), but appears to have just been a harmless bystander!
Differential Revision: https://reviews.llvm.org/D25691
llvm-svn: 285494
- Fix doxygen file comment
- reduce indentation in loop
- Factor out some common subexpressions
- Move independent helper function out of class
- Fix Changed flag (this is not strictly NFC but a bugfix, but the flag
seems ignored anyway)
llvm-svn: 285488
This resubmits r284436 and r284437, which were reverted in
r284462 as they were breaking the AArch64 buildbot.
The breakage on AArch64 turned out to be a miscompile which is
still not fixed, but is actively tracked at llvm.org/pr30748.
This resubmission re-writes the code in a way so as to make the
miscompile not happen.
llvm-svn: 285483
Instead of asserting that the shift count is != 0 we just bail out
as it's not profitable trying to optimize a node which will be
removed anyway.
Differential Revision: https://reviews.llvm.org/D26098
llvm-svn: 285480
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.
Reviewers: tony-tye, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D25998
llvm-svn: 285479
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.
This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.
llvm-svn: 285463
While trying to add the glc bit to SMEM instructions on VI
with the new refactoring I ran into some kind of shadowing
problem for the glc operand when using the pseudoinstruction
as a multiclass parameter.
Everywhere that currently uses it defines the operand to have the same
name as its type, i.e. glc:$glc which works. For some reason now it
conflicts, and its up evaluating to the wrong thing. For the
real encoding classes,
let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated
and still visible in the Inst initializer in the expanded td file.
In other cases I got a a different error about an illegal operand
where this was using { 0 } initializer from the bits<1> glc initializer
instead of evaluating it as false in the if.
For consistency all of the operand types should probably
be captialized to avoid conflicting with the variable names
unless somebody has a better idea of how to fix this.
llvm-svn: 285462
Summary:
In isel, transform
Num % Den
into
Num - (Num / Den) * Den
if the result of Num / Den is already available.
Reviewers: tra
Subscribers: hfinkel, llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D26090
llvm-svn: 285461
Summary:
This "pass" eagerly creates div and rem instructions even when only one
is needed -- it relies on a later pass (machine DCE?) to clean them up.
This is problematic not just from a cleanliness perspective (this pass
is running during CodeGenPrepare, so should leave the IR in a better
state), but it also creates a problem for instruction selection. If we
always have a div+rem, isel will always select a divrem instruction (if
possible), even when a single div or rem would do.
Specifically, in NVPTX, we want to compute rem from the output of div,
if available. But if a div is not available, we want to leave the rem
alone. This transformation is overeager if div is always available.
Because this code runs as part of CodeGenPrepare, it's nontrivial to
write a test for this change. But this will effectively be tested by
a later patch which adds the aforementioned change to NVPTX isel.
Reviewers: tra
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26088
llvm-svn: 285460
Summary:
In BypassSlowDivision's short-dividend path, we would create e.g.
udiv exact i32 %a, %b
"exact" here means that we are asserting that %a is a multiple of %b.
But we have no reason to believe this must be true -- this is just a
bug, as far as I can tell.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D26097
llvm-svn: 285459
When LivePhysRegs adds live-in registers, it recognizes ~0 as a special
lane mask indicating the entire register. If the lane mask is not ~0,
it will only add the subregisters that overlap the specified lane mask.
The problem is that if a live-in register does not have subregisters,
and the lane mask is not ~0, it will not be added to the live set.
(The given lane mask may simply be the lane mask of its register class.)
If a register does not have subregisters, add it to the live set if
the lane mask is non-zero.
Differential Revision: https://reviews.llvm.org/D26094
llvm-svn: 285440
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.
llvm-svn: 285435
We already read the flags out of the summary when writing the summary
records for functions and aliases, do the same for variables.
This is an NFC change for now since the flags computed on the fly from
the GlobalValue currently will always match those in the summary
already, but once I send a follow-on patch to set the NoRename flag for
locals in the llvm.used set this becomes a necessary change.
llvm-svn: 285433
TargetPassConfig::addMachinePasses() does some housekeeping first:
Handling the -print-machineinstrs flag and doing an initial printing
"After Instruction Selection". There is no reason for RegUsageInfoProp
to run before those two steps.
llvm-svn: 285422
Do not use LiveIntervals to recalculate kills, because that cannot be
done accurately without implicit uses on predicated instructions.
llvm-svn: 285409
Summary:
We were trying to add APInt values with different bit sizes after
visiting an addrspacecast instruction which changed the bit width
of the pointer.
Reviewers: majnemer, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24774
llvm-svn: 285407
Now LPPassManager will run LCSSA verification only for the top-level loop
which was processed on the current iteration.
Differential Revision: https://reviews.llvm.org/D25873
llvm-svn: 285394
Fixes PR 30784. Discussed with Justin, who pointed out that
in the new PassManager infrastructure we can have more fine-grained
control on which analyses we want to preserve, but this is the
best we can do with the current infrastructure.
llvm-svn: 285380
Summary:
Previously we were creating the alias summary on the fly while writing
the summary to bitcode. This moves the creation of these summaries to
the module summary index builder where we build the rest of the summary
index.
This is going to be necessary for setting the NoRename flag for values
possibly used in inline asm or module level asm.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26049
llvm-svn: 285379
Summary:
This is in preparation for a change to utilize this flag for symbols
referenced/defined in either inline or module level assembly.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26048
llvm-svn: 285376
In the past the compiler always emitted .debug_line version 2, though some opcodes from DWARF 3 (e.g. DW_LNS_set_prologue_end, DW_LNS_set_epilogue_begin or DW_LNS_set_isa) and from DWARF 4 could be emitted by the compiler.
This patch changes version information of .debug_line to exactly match the DWARF version. For .debug_line version 4, a new field maximum_operations_per_instruction is emitted.
Differential Revision: https://reviews.llvm.org/D16697
llvm-svn: 285355
Summary:
This patch adds DoubleAPFloat mode to APFloat.
Now, an APFloat with semantics PPCDoubleDouble will have DoubleAPFloat layout
(APFloat.U.Double), which contains two underlying APFloats as
PPCDoubleDoubleImpl and IEEEdouble semantics. Currently the IEEEdouble APFloat
is not used, and the first APFloat behaves exactly the same before this change.
This patch consists of three kinds of logics:
1) Construction and destruction of APFloat. Now the ctors, dtor, assign
opertors and factory functions construct different underlying layout
based on the semantics passed in.
2) s/IEEE/getIEEE()/ for normal, lifetime-unrelated computation functions.
These functions only access Floats[0] in DoubleAPFloat, which is the
same as today's semantic.
3) A "Double dispatch" function, APFloat::convert. Converting between two
different layouts requires appropriate logic.
Neither of these change the external behavior.
Reviewers: hfinkel, kbarton, echristo, iteratee
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25977
llvm-svn: 285351
There is a use after free bug in the existing code. Loop layout selects
a preferred exit block, and then lays out the loop. If this block is
removed during layout, it needs to be invalidated to prevent a use after
free.
llvm-svn: 285348
obsolete load commands.
Again the philosophy of the error checking in libObject for
Mach-O files, the idea behind the checking is that we never
will return a Mach-O file out of libObject that contains unknown
things the library code can’t operate on. So known obsolete
load commands will cause a hard error.
Also to make things clear I have added comments to the
values and structures in Support/Mach-O.h and
Support/MachO.def as to what is obsolete.
As noted in a TODO in the code, there may need to be a
non-default mode to allow some unknown values for well
structured Mach-O files with things like unknown load
load commands. So things like using an old lldb on a newer
Mach-O file could still provide some limited functionality.
llvm-svn: 285342
The Windows ARM target expects the compiler to emit a division-by-zero check.
The check would use the form of:
cmp r?, #0
cbz .Ltrap
b .Lbody
.Lbody:
...
.Ltrap:
udf #249 @ __brkdiv0
This works great most of the time. However, if the body of the function is
greater than 127 bytes, the branch target limitation of cbz becomes an issue.
This occurs in the unoptimized code generation cases sometimes (like in
compiler-rt).
Since this is a matter of correctness, possibly pay a small penalty instead. We
now form this slightly differently:
cbnz .Lbody
udf #249 @ __brkdiv0
.Lbody:
...
The positive case is through the branch instead of being the next instruction.
However, because of the basic block layout, the negated branch is going to be
a short distance always (2 bytes away, after the inserted __brkdiv0).
The new t__brkdiv0 instruction is required to explicitly mark the instruction as
a terminator as the generic UDF instruction is not a terminator.
Addresses PR30532!
llvm-svn: 285312
Summary: LICM may hoist instructions to preheader speculatively. Before code generation, we need to sink down the hoisted instructions inside to loop if it's beneficial. This pass is a reverse of LICM: looking at instructions in preheader and sinks the instruction to basic blocks inside the loop body if basic block frequency is smaller than the preheader frequency.
Reviewers: hfinkel, davidxl, chandlerc
Subscribers: anna, modocache, mgorny, beanz, reames, dberlin, chandlerc, mcrosier, junbuml, sanjoy, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D22778
llvm-svn: 285308
r282428 added the MipsOptimizePICCall as an opt-in pass that can be
skipped when using the -opt-bisect-limit option. However, this pass is
needed because it generates code that conforms to the o32 ABI
specification by using the $t9 register for PIC calls with JALR
instructions.
This bug was exposed by the fact that skipFunction() also checks for
the "optnone" attribute. This caused functions with that attribute to
break the requirements of the o32 ABI.
llvm-svn: 285305
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).
Updated cost table accordingly.
Differential Revision: https://reviews.llvm.org/D26011
llvm-svn: 285304
Summary:
Found when running Valgrind.
This removes two unnecessary assignments when using
AttrBuilder::removeAttribute.
AttrBuilder::removeAttribute returns a reference to the object.
As the LHSes were the same as the callees, the assignments
resulted in memcpy calls where dst = src.
Commited on behalf-of: dstenb (David Stenberg)
Reviewers: mkuper, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25460
llvm-svn: 285298
Currently computeKnownBits returns the common known zero/one bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original computeKnownBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
The approach was found to be easier than trying to add a per-element known bits solution, for a similar usefulness given the combines where computeKnownBits is typically used.
I've only added support for a few opcodes so far (the ones that have proven straightforward to test), all others will default to demanding all elements but can be updated in due course.
DemandedElts support could similarly be added to computeKnownBitsForTargetNode in a future commit.
Differential Revision: https://reviews.llvm.org/D25691
llvm-svn: 285296
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
Differential Revision: https://reviews.llvm.org/D25671
llvm-svn: 285286
UMAAL is a DSP instruction and it is not available on thumbv7m
(Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong
CHECK prefix in longMAC.ll test.
Patch by Vadzim Dambrouski.
Differential Revision: https://reviews.llvm.org/D25890
llvm-svn: 285278
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.
Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].
The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25829
llvm-svn: 285273
This patch corresponds to review:
https://reviews.llvm.org/D25896
It just eliminates the redundant ZExt after a count trailing zeros instruction.
llvm-svn: 285267
Since Exynos-M2 improved the FP square root unit a bit over the one in
Exynos-M1, it does not benefit from using the Newton series for such
operations.
llvm-svn: 285246
Change type of some missed DebugInfo-related alignment variables,
that are still uint64_t, to uint32_t.
Original change introduced in r284482.
llvm-svn: 285242
It would be a very nice invariant to rely on, but unfortunately it doesn't
necessarily hold (and the causes of mis-sorted reglists appear to be quite
varied) so to be robust the frame lowering code can't assume that the first
register in the list is also the first one that actually gets pushed.
Should fix an issue where we were turning something like:
push {r8, r4, r7, lr}
sub sp, #24
into nonsense like:
push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr}
llvm-svn: 285232
This patch ensures that if a floating point vector operand is legalized by
expanding, it is legalized through the stack rather than by calling
DAGTypeLegalizer::IntegerToVector which will cause a failure since the operand
is a non-integer type.
This fixes PR 30715.
llvm-svn: 285231
Summary:
Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`.
This is a folloup of rL258422 and
https://github.com/rust-lang/rust/pull/30917 where llvm failed to
optimize away the bounds checking in a binary search.
Patch by Arthur Silva!
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25941
llvm-svn: 285228
This reverts commit r285191.
LICM appears to rely on the Alias Set Tracker hitting lifetime markers to prevent
code from being moved outside of the original scope.
llvm-svn: 285227
Update the README.txt with newer information, add a link to the Emscripten
page explaining the current easiest way to use the LLVM wasm backend, and
mention that other ways of using the LLVM wasm backend are in development.
llvm-svn: 285215
This reapplies revision 285093. Original commit message:
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.
Differential Revision: https://reviews.llvm.org/D25742
llvm-svn: 285212
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.
Refactor processor definition to use ISA version features.
Fixed ISA version for stoney.
Based on Laurent Morichetti's patch.
Differential Revision: https://reviews.llvm.org/D25919
llvm-svn: 285210
Summary: This patch introduces updateDiscriminator to DILocation so that it can be directly called by AddDiscriminator. It also makes it easier to update the discriminator later.
Reviewers: dnovillo, dblaikie, aprantl, echristo
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25959
llvm-svn: 285207
This finds all of the references to a frame index in a function, and
sorts by the offset. If multiple instructions use the same offset,
nothing was breaking the tie for sorting.
This avoids the test failures the reverted r282999 introduced.
llvm-svn: 285201
Summary:
AMDGPU will need this one i16 is added as a legal type. This is tested by:
test/CodeGen/AMDGPU/sdiv.ll
test/CodeGen/AMDGPU/sdivrem24.ll
test/CodeGen/AMDGPU/udiv.ll
test/CodeGen/AMDGPU/udivrem24.ll
Reviewers: bogner, efriedma
Subscribers: efriedma, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D25699
llvm-svn: 285199
Summary:
In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches.
Fixes pr30561
Reviewers: igorb, aymanmus, delena
Differential Revision: https://reviews.llvm.org/D25310
llvm-svn: 285196
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Differential Revision: https://reviews.llvm.org/D24425
llvm-svn: 285189
When the loop exit condition is canonicalized as a != compaison, reuse the
debug location of the original (non canonical) comparison.
Before this patch, the debug location of the new icmp was obtained from the
loop latch terminator. This patch fixes the issue by correctly setting the
IRBuilder's "current debug location" to the location of the original compare.
Differential Revision: https://reviews.llvm.org/D25953
llvm-svn: 285185
* Assume that clang passes non-zero alignment value to DIBuilder
only in case when it was forced by C++11 'alignas', C11 '_Alignas'
or compiler attribute '__attribute__((aligned (N)))'.
* Emit DW_AT_alignment if alignment is specified for type/object.
Differential Revision: https://reviews.llvm.org/D24425
llvm-svn: 285181
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
Reviewers: igorb, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25933
llvm-svn: 285173
On SparcV8, it was previously the case that a variable-sized alloca
might overlap by 4-bytes the last fixed stack variable, effectively
because 92 (the number of bytes reserved for the register spill area) !=
96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC).
It's not as simple as changing 96 to 92, because variables that should
be 8-byte aligned would then be misaligned.
For now, simply increase the allocation size by 8 bytes for each dynamic
allocation -- wastes space, but at least doesn't overlap. As the large
comment says, doing this more efficiently will require larger changes in
llvm.
Also adds some test cases showing that we continue to not support
dynamic stack allocation and over-alignment in the same function.
llvm-svn: 285131
Summary:
Fixes PR28281.
MSVC lists indirect virtual base classes in the field list of a class,
using LF_IVBCLASS records. This change makes LLVM emit such records
when processing DW_TAG_inheritance tags with the DIFlagVirtual and
(newly introduced) DIFlagIndirect tags.
Reviewers: rnk, ruiu, zturner
Differential Revision: https://reviews.llvm.org/D25578
llvm-svn: 285130
Summary:
Select instruction annotation in IR PGO uses the edge count to infer the
branch count. It's currently placed in setInstrumentedCounts() where
no all the BB counts have been computed. This leads to wrong branch weights.
Move the annotation after all BB counts are populated.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25961
llvm-svn: 285128
The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with following code
xB = load (type B);
xA = load (type A);
+yA = (A)xB; B -> A
+zAn = PHI[yA, xA]; PHI
+zBn = (B)zAn; // A -> B
store zAn;
store zBn;
optimizeBitCastFromPhi generates
+zBn = (B)zAn; // A -> B
and expects it will be combined with the following store instruction to another
store zAn
Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Differential Revision: https://reviews.llvm.org/D23896
llvm-svn: 285116
This reverts r285093, as it caused unexpected buildbot failures on
clang-ppc64le-linux, clang-ppc64be-linux, clang-ppc64be-linux-multistage
and clang-ppc64be-linux-lnt. Failing test ubsan/TestCases/TypeCheck/vptr.cpp.
llvm-svn: 285110
Summary:
The intention is to make APFloat an interface class, so that later I can add a second implementation class DoubleAPFloat to correctly implement PPCDoubleDouble semantic. The interface of IEEEFloat is not public, and can be simplified (currently it's exactly the same as the old APFloat), but that belongs to a separate patch.
DoubleAPFloat should look like:
class DoubleAPFloat {
const fltSemantics *Semantics;
std::unique_ptr<APFloat> APFloats; // Two heap-allocated APFloats.
};
There is no functional change, nor public interface change.
Reviewers: hfinkel, chandlerc, iteratee, echristo, kbarton
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25536
llvm-svn: 285105
Add an option to allow easier experimentation by target maintainers with the
minimum number of entries to create jump tables. Also clarify the name of
the other existing option governing the creation of jump tables.
Differential revision: https://reviews.llvm.org/D25883
llvm-svn: 285104
When there's a tie between partitionings of jump tables, consider also cases
that result in no jump tables, but in one or a few cases. The motivation is
that many contemporary processors typically perform case switches fairly
quickly.
Differential revision: https://reviews.llvm.org/D25212
llvm-svn: 285099
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.
Differential Revision: https://reviews.llvm.org/D25632
llvm-svn: 285097
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
Thanks to Adrian Prantl for stewarding this patch!
llvm-svn: 285094
The branch folding pass tail merges blocks into a common-tail. However, the
tail retains the debug information from one of the original inputs to the
merge (chosen randomly). This is a problem for sampled-based PGO, as hits
on the common-tail will be attributed to whichever block was chosen,
irrespective of which path was actually taken to the common-tail.
This patch fixes the issue by nulling the debug location for the common-tail.
Differential Revision: https://reviews.llvm.org/D25742
llvm-svn: 285093
When indvars widened an induction variable, the debug location for the loop
increment computation was incorrectly set equal to the debug loc of the loop
latch terminator.
This patch fixes the issue by propagating the correct location from the
original loop increment instruction to the new widened increment.
Differential Revision: https://reviews.llvm.org/D25872
llvm-svn: 285083
Now that MemorySSA keeps track of whether MemoryUses are optimized, use
getClobberingMemoryAccess() to check MemoryUse memory dependencies since
it should no longer be so expensive.
This is a follow-up change to https://reviews.llvm.org/D25881
llvm-svn: 285080
It is not safe to use LOAD ON CONDITION to implement access to a memory
location marked "volatile", since the architecture leaves it unspecified
whether or not an access happens if the condition is false.
The current code already appears to care about that:
def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>;
Unfortunately, that "nonvolatile_load" operator is simply ignored
by the CondUnaryRSY class, and there was no test to catch it.
llvm-svn: 285077
We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches.
This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent.
Differential Revision: https://reviews.llvm.org/D25874
llvm-svn: 285072
Summary:
Do *not* perform combines such as:
vector_shuffle<4,1,2,3>(build_vector(Ud, C0, C1 C2), scalar_to_vector(X))
->
build_vector(X, C0, C1, C2)
Keeping the shuffle allows lowering the constant build_vector to a materialized
constant vector (such as a vector-load from the constant-pool or some other idiom).
Reviewers: delena, igorb, spatel, mkuper, andreadb, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25524
llvm-svn: 285063
In an IR symbol table I would expect the comdats to be represented as:
- A table of strings, one for each comdat name.
- Each symbol has an optional index into that table.
The natural api for accessing that would be
InputFile:
ArrayRef<StringRef> getComdatTable() const;
Symbol:
int getComdatIndex() const;
This patch implements an API as close to that as possible. The
implementation on top of the current IRObjectFile is a bit hackish,
but should map just fine over a symbol table and is very convenient to
use.
llvm-svn: 285061
Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added.
Reviewers: delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25594
llvm-svn: 285053
This is a function to go backwards in a block to find the first
instruction in a bundle, so iterator is a more natural choice for
parameter/return rather than a reference to a MachineInstruction.
llvm-svn: 285051
This changes locals from being declared by the emitLocal hook in
WebAssemblyTargetStreamer, rather than with an instruction. After exploring
the infastructure in LLVM more, this seems to make more sense since
declaring locals doesn't use an encoded opcode.
This also adds more 0xd opcodes, type encodings, and miscellaneous
binary encoding bits.
llvm-svn: 285040
Passing a MachineFunction as argument is more natural and avoids an
unnecessary round-trip through the logic determining the correct
Subtarget because MachineFunction already has a reference anyway.
llvm-svn: 285039
There are two fixes here: one, AnalyzeUsesOfPointer can't return
false until it has checked all the uses of the pointer. Two, if a
global uses another global, we have to assume the address of the
first global escapes.
Fixes https://llvm.org/bugs/show_bug.cgi?id=30707 .
Differential Revision: https://reviews.llvm.org/D25798
llvm-svn: 285034
(Const)?MIOperands is equivalent to the C++ style
MachineInstr::mop_iterator. Use the latter for consistency except for a
few callers of MIOperands::analyzePhysReg().
llvm-svn: 285029
This patch adds a pass, controlled by an option and off by default for
now, for making implicit get_local/set_local explicit. This simplifies
emitting wasm with MC.
Differential Revision: https://reviews.llvm.org/D25836
llvm-svn: 285009
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.
Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.
Differential Revision: https://reviews.llvm.org/D25917
llvm-svn: 285006
This fixes a bug in the handling of lexical scopes, when more than one
scope is defined on the same line or functions are inlined into call
sites that are on the same line as the function definition. This
situation can easily happen in macro expansions.
The problem is solved by introducing a SmallDenseMap<DIScope *,
DILexicalBlockFile *, 1> that keeps track of all the different lexical
scopes that share a line/file location.
Fixes PR30681.
llvm-svn: 284998
Add support for estimating the square root or its reciprocal and division or
reciprocal using the combiner generic Newton series.
Differential revision: https://reviews.llvm.org/D25291
llvm-svn: 284986
Summary:
When using MemorySSA, re-optimize MemoryPhis when removing a store since
this may create MemoryPhis with all identical arguments.
Also, when using MemorySSA to check if two MemoryUses are reading from
the same version of the heap, use the defining access instead of calling
getClobberingAccess, since the latter can currently result in many more
AA calls. Once the MemorySSA use optimization tracking changes are
done, we can remove this limitation, which should result in more loads
being CSE'd.
Reviewers: dberlin
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25881
llvm-svn: 284984
https://reviews.llvm.org/D24924
This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review.
llvm-svn: 284983
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.
This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.
Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.
Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.
This fixes several GL45-CTS.shaders.indexing.* tests.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25633
llvm-svn: 284980
Fix AsmParser lines to correctly handle end-of-line pre-processor
comments parsing when '#' is not the assembly line comment prefix.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25567
llvm-svn: 284978
If we don't have futimens(), we fall back to futimes(), which only supports
microsecond timestamps. In that case, we need to explicitly cast away the extra
precision in setLastModificationAndAccessTime().
llvm-svn: 284977
Summary:
Most of the changes are very straight-forward. The only choice I had to make was
to use second-precision time points in the Archive classes. I did this because
the archive files use that precision in the on-disk representation anyway.
Reviewers: rafael, zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25773
llvm-svn: 284974
Summary:
Add relocations for AArch64 ILP32. Includes:
- Addition of definitions for R_AARCH32_*
- Definition of new -target-abi: ilp32
- Definition of data layout string
- Tests for added relocations. Not comprehensive, but matches
existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64".
- Tests for llvm-readobj
Reviewers: zatrazz, peter.smith, echristo, t.p.northover
Subscribers: aemerson, rengolin, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25159
llvm-svn: 284973
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Patch by James Molloy and Pablo Barrio.
Reviewers: reames, bkramer, mcrosier, gberry, haicheng, jmolloy, sebpop
Subscribers: jojo, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D25477
llvm-svn: 284971
Summary:
This is a follow-up to D25416. It removes all usages of TimeValue from
llvm/Support library (except for the actual TimeValue declaration), and replaces
them with appropriate usages of std::chrono. To facilitate this, I have added
small utility functions for converting time points and durations into appropriate
OS-specific types (FILETIME, struct timespec, ...).
Reviewers: zturner, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25730
llvm-svn: 284966
Add synci to the microMIPS instruction definitions, mark the MIPS sync & synci
as not being part of microMIPS. This does not cover the sync instruction alias,
as that will be handled with a different patch. Add sync to the valid tests for
microMIPS.
Reviewers: vkalintiris
Differential Revision: https://reviews.llvm.org/D25795
llvm-svn: 284962
Summary: With MSVC 2013 and GCC < 4.8 gone, we can use the "constexpr" keyword.
Reviewers: bkramer, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25901
llvm-svn: 284947
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.
llvm-svn: 284939
In BasicAA GEP operand values get adjusted ("wrap-around") based on the
pointersize. Otherwise, in non-64b modes, AA could report false negatives.
However, a wrap-around is valid only for a fully evaluated expression.
It had been introduced to fix an alias problem in
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160118/326163.html.
This commit restricts the wrap-around to constant gep operands only where the
value is known at compile-time.
llvm-svn: 284908
This will prevent following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
Differential Revision: https://reviews.llvm.org/D25805
llvm-svn: 284891
If a 64-bit value is tested against a bit which is known to be in the range
[0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly
shorter encoding.
Differential Revision: https://reviews.llvm.org/D25862
llvm-svn: 284864
Summary: This adds support for dumping the globals stream from PDB files using llvm-pdbdump, similar to the support we have for the publics stream.
Reviewers: ruiu, zturner
Subscribers: beanz, mgorny, modocache
Differential Revision: https://reviews.llvm.org/D25801
llvm-svn: 284861
Summary:
Utility pass to remove gc.relocates created by rewrite statepoints for GC.
With respect to safepoint verification, the IR generated would be incorrect, and cannot run
as such.
This would be a single transformation on the final optimized IR.
The benefit of the pass is for easy analysis when the IRs are 'polluted' by too
many gc.relocates.
Added tests.
test run: All RS4GC tests with -verify option. Local downstream tests on large
IR files. This also works when the pointer being gc.relocated is another
gc.relocate.
Reviewers: sanjoy, reames
Subscribers: beanz, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D25096
llvm-svn: 284855
0 - X --> 0, if the sub is NUW
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
0 - X --> X, if X is 0 or the minimum signed value
This is the DAG equivalent of:
https://reviews.llvm.org/rL284649
plus the fold for the NUW case which already existed in InstSimplify.
Note that we miss a vector fold because of a deficiency in the DAG version of
computeKnownBits().
llvm-svn: 284844
After register allocation it is possible to have a spill of a register
that is only partially defined. That in itself it fine, but creates a
problem for double vector registers. Stores of such registers are pseudo
instructions that are expanded into pairs of individual vector stores,
and in case of a partially defined source, one of the stores may use
an entirely undefined register. To avoid this, track the defined parts
and only generate actual stores for those.
llvm-svn: 284841
Summary:
Need to reorder the operands to have the callee as the last argument.
Adds a pseudo-instruction, and a pass to lower it into a real
call_indirect.
This is the first of two options for how to fix the problem.
Reviewers: dschuff, sunfish
Subscribers: jfb, beanz, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D25708
llvm-svn: 284840
Because we're just 'or-ing' these 2 variables later in the code, I
don't think there's a logical bug here, but of course the string with
"no size" is the one that should have the size suffix stripped off.
llvm-svn: 284826
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.
Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x
Note that although the test cases include a 'sin' function call, I'm side-stepping the
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through a ISD::FSIN node.
Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.
Differential Revision: https://reviews.llvm.org/D25297
llvm-svn: 284824
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that either the number of iterations will be either exactly
that upper bound or zero, then we can fully unroll up to that upper bound
keeping only the first loop test to check for the zero iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
Summary:
The spill size was incorrectly set to 196 bits,
which isn't a multiple of 8. This problem was detected when
experimenting with asserts that the spill size should be a
multiple of the byte size.
New corrected value for the spill size is set to 192 bits.
Note that tablegen (RegisterInfoEmitter) will divide the
size set in the RegisterClass definition by 8. So this
change should not have any impact on the tablegen output
(trunc(192/8) == trunc(196/8) == 24 bytes).
Reviewers: t.p.northover
Subscribers: llvm-commits, aemerson, rengolin
Differential Revision: https://reviews.llvm.org/D25818
llvm-svn: 284814
There's no agreement about this patch. I personally find the
PRE machinery of the current GVN hard enough to reason about
that I'm not sure I'll try to land this again, instead of working
on the rewrite).
llvm-svn: 284796
rL284780 fixed the PREL31 relocation and added a test for it. Being
the first such test for ARM relocations, it exposed incorrect endianness
assumptions (causing buildbot failures on big-endian hosts). Fix that by
using the same helpers used for the x86 case.
llvm-svn: 284789
This is to avoid inlining too many multiplication operands into a SCEV, which could
take exponential time in the worst case.
Reviewers: Sanjoy Das, Mehdi Amini, Michael Zolotukhin
Differential Revision: https://reviews.llvm.org/D25794
llvm-svn: 284784
Summary:
This allows us to mark when uses have been optimized.
This lets us avoid rewalking (IE when people call getClobberingAccess on everything), and also
enables us to later relax the requirement of use optimization during updates with less cost.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25172
llvm-svn: 284771
load commands that use the MachO::twolevel_hints_command type
which includes only the LC_TWOLEVEL_HINTS load command.
This is not used in llvm libObject code or in llvm tool code. But
does appear in one of the binary test files. While this load command is
obsolete it is easier to add code for it in libObject than edit or change
the binary test case.
llvm-svn: 284769
This was all using ArrayRef<>s before which presents a problem
when you want to serialize to or deserialize from an actual
PDB stream. An ArrayRef<> is really just a special case of
what can be handled with StreamInterface though (e.g. by using
a ByteStream), so changing this to use StreamInterface allows
us to plug in a PDB stream and get all the record serialization
and deserialization for free on a MappedBlockStream.
Subsequent patches will try to remove TypeTableBuilder and
TypeRecordBuilder in favor of class that operate on
Streams as well, which should allow us to completely merge
the reading and writing codepaths for both types and symbols.
Differential Revision: https://reviews.llvm.org/D25831
llvm-svn: 284762
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284757
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation. If not, the undefined
high-order bits of the promoted operand may (a) be garbage inc ase of
zext) or (b) contribute the wrong sign-bit (in case of sext)
Updated the promote-vec3.ll test after this change. The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.
Reviewers: RKSimon
Subscribers: llvm-commits, srhines
Differential Revision: https://reviews.llvm.org/D25790
llvm-svn: 284752
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.
This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.
original commit msg:
[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284746
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.
We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.
llvm-svn: 284744
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
This is a resubmission of r284590. The mingw build should be fixed now. The
problem was we were matching time_t with _localtime_64s, which was incorrect on
_USE_32BIT_TIME_T systems. Instead I use localtime_s, which should always
evaluate to the correct function.
llvm-svn: 284720
Post-RA sched strategy and scheduling instruction annotations for z196, zEC12
and z13.
This scheduler optimizes decoder grouping and balances processor resources
(including side steering the FPd unit instructions).
The SystemZHazardRecognizer keeps track of the scheduling state, which can
be dumped with -debug-only=misched.
Reviers: Ulrich Weigand, Andrew Trick.
https://reviews.llvm.org/D17260
llvm-svn: 284704
- Add alignment attribute to DIVariable family
- Modify bitcode format to match new DIVariable representation
- Update tests to match these changes (also add bitcode upgrade test)
- Expect that frontend passes non-zero align value only when it is not default
(was forcibly aligned by alignas()/_Alignas()/__atribute__(aligned())
Differential Revision: https://reviews.llvm.org/D25073
llvm-svn: 284678
load commands that use the MachO::thread_command type
but are not used in llvm libObject code but used in llvm tool code.
This includes the LC_UNIXTHREAD and LC_THREAD
load commands.
A quick note about the philosophy of the error checking in
libObject for Mach-O files, the idea behind the checking is
that we never will return a Mach-O file out of libObject that
contains unknown things in the load commands.
To do this the 32-bit ARM and PPC general tread states
needed to be defined as two test case binaries contained
them. If other thread states for other CPUs need to be
added we will do that as needed.
Going forward the LC_MAIN load command is used to
set the entry point in Mach-O executables these days
instead of an LC_UNIXTHREAD as was done in the past.
So today only in core files are LC_THREAD load commands
and thread states usually found.
Other thread states have not yet been defined in
include/Support/MachO.h at this time. But that can be
added as needed with their corresponding checking also
added.
llvm-svn: 284668
Profile runtime can generate an empty raw profile (when there is no function in
the shared library). This empty profile is treated as a text format profile. A
test format profile without the flag of "#IR" is thought to be a clang
generated profile. So in llvm profile merging, we will get a bogus warning of
"Merge IR generated profile with Clang generated profile."
The fix here is to skip the empty profile (when the buffer size is 0) for
profile merge.
Reviewers: vsk, davidxl
Differential Revision: http://reviews.llvm.org/D25687
llvm-svn: 284659
0 - X --> X, if X is 0 or the minimum signed value
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
I noticed this pattern might be created in the backend after the change from D25485,
so we'll want to add a similar fold for the DAG.
The use of computeKnownBits in InstSimplify may be something to investigate if the
compile time of InstSimplify is noticeable. We could replace computeKnownBits with
specific pattern matchers or limit the recursion.
Differential Revision: https://reviews.llvm.org/D25785
llvm-svn: 284649
This code crashed on funclet-style EH instructions such as catchpad,
catchswitch, and cleanuppad. Just treat all EH pad instructions
equivalently and avoid merging the globals they reference through any
use.
llvm-svn: 284633
Some instructions from the original loop, when vectorized, can become trivially
dead. This happens because of the way we structure the new loop. For example,
we create new induction variables and induction variable "steps" in the new
loop. Thus, when we go to vectorize the original induction variable update, it
may no longer be needed due to the instructions we've already created. This
patch prevents us from creating these redundant instructions. This reduces code
size before simplification and allows greater flexibility in code generation
since we have fewer unnecessary instruction uses.
Differential Revision: https://reviews.llvm.org/D25631
llvm-svn: 284631
This change is motivated by the case when IndVarSimplify doesn't widen a comparison of IV increment because it can't prove IV increment being non-negative. We end up with a redundant trunc of the widened increment on this example.
for.body:
%i = phi i32 [ %start, %for.body.lr.ph ], [ %i.inc, %for.inc ]
%within_limits = icmp ult i32 %i, 64
br i1 %within_limits, label %continue, label %for.end
continue:
%i.i64 = zext i32 %i to i64
%arrayidx = getelementptr inbounds i32, i32* %base, i64 %i.i64
%val = load i32, i32* %arrayidx, align 4
br label %for.inc
for.inc:
%i.inc = add nsw nuw i32 %i, 1
%cmp = icmp slt i32 %i.inc, %limit
br i1 %cmp, label %for.body, label %for.end
There is a range check inside of the loop which guarantees the IV to be non-negative. NSW on the increment guarantees that the increment is also non-negative. Teach IndVarSimplify to use the range check to prove non-negativity of loop increments.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D25738
llvm-svn: 284629
Summary:
Changes default backend parallelism from thread::hardware_concurrency to
the new llvm::heavyweight_hardware_concurrency, which for X86 Linux
defaults to the number of physical cores (and will fall back to
thread::hardware_concurrency otherwise). This avoid oversubscribing
the physical cores using hyperthreading.
Reviewers: mehdi_amini, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25775
llvm-svn: 284618
This reverts commit r284590 as it fails on the mingw buildbot. I think I know the
fix, but I cannot test it right now. Will reapply when I verify it works ok.
This reverts r284590.
llvm-svn: 284615
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.
This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239https://reviews.llvm.org/rL284395
There may be others, so please let me know if you see any regressions.
Differential Revision: https://reviews.llvm.org/D25485
llvm-svn: 284611
This required reengineering of some of the part of liveness calculation,
including fixing some issues caused by the limitations of the previous
approach. The current code is not necessarily the fastest, but it should
be functionally correct (at least more so than before). The compile-time
performance will be addressed in the future.
llvm-svn: 284609
Summary:
std::chrono mostly covers the functionality of llvm::sys::TimeValue and
lldb_private::TimeValue. This header adds a bit of utility functions and
typedefs, which make the usage of the library and porting code from TimeValues
easier.
Rationale:
- TimePoint typedef - precision of system_clock is implementation defined -
using a well-defined precision helps maintain consistency between platforms,
makes it interact better with existing TimeValue classes, and avoids cases
there a time point is implicitly convertible to a specific precision on some
platforms but not on others.
- system_clock::to_time_t only accepts time_points with the default system
precision (even though time_t has only second precision on all platforms we
support). To avoid the need for explicit casts, I have added a toTimeT()
wrapper function. toTimePoint(time_t) was not strictly necessary, but I have
added it for symmetry.
Reviewers: zturner, mehdi_amini
Subscribers: beanz, mgorny, llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D25416
llvm-svn: 284590
Most z13 vector instructions have a base form where the data type of
the operation (whether to consider the vector to be 16 bytes, 8
halfwords, 4 words, or 2 doublewords) is encoded into a mask field,
and then a set of extended mnemonics where the mask field is not
present but the data type is encoded into the mnemonic name.
Currently, LLVM only supports the type-specific forms (since those
are really the ones needed for code generation), but not the base
type-generic forms.
To complete the assembler support and make it fully compatible with
the GNU assembler, this commit adds assembler aliases for all the
base forms of the various vector instructions.
It also adds two more alias forms that are documented in the PoP:
VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc.
VNOT -- special variant of VNO
llvm-svn: 284586
The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are
documented in the Principles of Operation to have an optional last
operand to encode arbitrary values in a mask field.
This commit adds support for those optional operands, and cleans up
the patterns to generate vector string instruction as bit. No change
to code generation intended.
llvm-svn: 284585
Declare the LLVM_CMAKE_PATH to the source directory location of CMake
files, in order to make it possible to easily use them in subprojects.
Such a variable is already declared in most of LLVM projects
(and inconsistently mixed with direct source tree references), including
Clang, LLDB, compiler-rt, libcxx... Declaring it inside main LLVM tree
makes it possible to avoid having to declare fallback values or use
conditionals in those projects.
It should be noted that in some of the subprojects LLVM_CMAKE_PATH is
used to reference generated LLVMConfig.cmake file. However, these
references are conditional to stand-alone builds and explicitly
including this file is unnecessary in combined builds.
Differential Revision: https://reviews.llvm.org/D25724
llvm-svn: 284581
The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions.
It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size.
TBB example:
Before: lsls r0, r0, #2 After: add r0, pc
adr r1, .LJTI0_0 ldrb r0, [r0, #6]
ldr r0, [r0, r1] lsls r0, r0, #1
mov pc, r0 add pc, r0
=> No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4.
The only case that can increase dynamic instruction count is the TBH case:
Before: lsls r0, r4, #2 After: lsls r4, r4, #1
adr r1, .LJTI0_0 add r4, pc
ldr r0, [r0, r1] ldrh r4, [r4, #6]
mov pc, r0 lsls r4, r4, #1
add pc, r4
=> 1 more instruction in prologue. Jump table shrunk by a factor of 2.
So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!)
llvm-svn: 284580
This will get the same ConstantSDNode scalar or vector splat value as the current separate dyn_cast<ConstantSDNode> / isVector() approach.
llvm-svn: 284578
This renames the function for checking FP function attribute values and also
adds more build attribute tests (which are in separate files because build
attributes are set per file).
Differential Revision: https://reviews.llvm.org/D25625
llvm-svn: 284571
Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.
New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.
There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert.
The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.
Reviewers: RKSimon, delena, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25651
llvm-svn: 284567
Example of output:
COVERAGE:
COVERED: in DSO2(int) /pathto/DSO2.cpp:6
COVERED: in DSO2(int) /pathto/DSO2.cpp:8
COVERED: in DSO1(int) /pathto/DSO1.cpp:6
COVERED: in DSO1(int) /pathto/DSO1.cpp:8
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:16
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:19
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:25
COVERED: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:26
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO1.so
UNCOVERED_LINE: in DSO1(int) /pathto/DSO1.cpp:9
UNCOVERED_FUNC: in Uncovered1()
MODULE_WITH_COVERAGE: /pathto/libLLVMFuzzer-DSO2.so
UNCOVERED_LINE: in DSO2(int) /pathto/DSO2.cpp:9
UNCOVERED_FUNC: in Uncovered2()
MODULE_WITH_COVERAGE: /pathto/LLVMFuzzer-DSOTest
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:21
UNCOVERED_LINE: in LLVMFuzzerTestOneInput /pathto/DSOTestMain.cpp:27
UNCOVERED_FILE: /pathto/DSOTestExtra.cpp
Several things are not perfect here:
* we are using objdump+awk instead of sancov because sancov does not support DSOs yet.
* this breaks in the presence of ASAN_OPTIONS=strip_path_prefix=...
(need to implement another API to get the module name by PC)
llvm-svn: 284554
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284545
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
sqrt(val);
is transformed to
if (val < 0)
sqrt(val);
Even if the result of library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note in many functions, the error condition solely depends on the incoming
parameter. In this optimization, we can generate the condition can lead to
the errno to shrink-wrap the call. Since the chances of hitting the error
condition is low, the runtime call is effectively eliminated.
These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.
Reviewers: hfinkel, mehdi_amini, davidxl
Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz
Differential Revision: https://reviews.llvm.org/D24414
llvm-svn: 284542
Summary:
The original heuristic to break critical edge during machine sink is relatively conservertive: when there is only one instruction sinkable to the critical edge, it is likely that the machine sink pass will not break the critical edge. This leads to many speculative instructions executed at runtime. However, with profile info, we could model the splitting benefits: if the critical edge has 50% taken rate, it would always be beneficial to split the critical edge to avoid the speculated runtime instructions. This patch uses profile to guide critical edge splitting in machine sink pass.
The performance impact on speccpu2006 on Intel sandybridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284541
This is just a quick utility handy for getting rough summaries of types
in a given object or dwo file. I've been using it to investigate the
amount of type info redundancy across a project build, for example.
llvm-svn: 284537
The custom lowering is pretty straightforward: basically, just AND
together the two halves of a <4 x i32> compare.
Differential Revision: https://reviews.llvm.org/D25713
llvm-svn: 284536
Summary:
The original implementation is in r261607, which was reverted in r269726 to accomendate the ProfileSummaryInfo analysis pass. The new implementation:
1. add a new metadata for function section prefix
2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function
3. output the section prefix set by CGP
Reviewers: davidxl, eraman
Subscribers: vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D24989
llvm-svn: 284533
Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0`
to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not
have to be materialized beforehand for FCMP, as it has a form that has 0.0
as an implicit operand.
Differential Revision: https://reviews.llvm.org/D24808
llvm-svn: 284531
load command that use the MachO:: linkedit_data_command
type but is not used in llvm libObject code but used in llvm tool code.
This is for the LC_CODE_SIGNATURE load command.
llvm-svn: 284529
AArch64 actually supports many 8-bit operations under the definition used by
GlobalISel: the designated information-carrying bits of a GPR32 get the right
value if you just use the normal 32-bit instruction.
llvm-svn: 284526
load commands that use the MachO::routines_command and
and MachO::routines_command_64 types but are not used in llvm
libObject code but used in llvm tool code.
This includes the LC_ROUTINES and LC_ROUTINES_64
load commands.
llvm-svn: 284504
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284495
This patch teaches ias for mips to handle expressions such as
(8*4)+(8*31)($sp). Such expression typically occur from the expansion
of multiple macro definitions.
This partially resolves PR/30383.
Thanks to Sean Bruno for reporting the issue!
Reviewers: zoran.jovanovic, vkalintiris
Differential Revision: https://reviews.llvm.org/D24667
llvm-svn: 284485
The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands.
MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit
immediate.
This patch correct the definition of sync so that it is accepted with an
operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned
immediate for MIPS32 and later revisions.
Additionally a clear error is given when the MIPS32 version of sync is
used when targeting pre MIPS32.
This partially resolves PR/30714.
Thanks to Daniel Sanders for reporting this issue!
Reveiwers: vkalintiris
Differential Revision: https://reviews.llvm.org/D25672
llvm-svn: 284483
In futher patches we shall have alignment field added to DIVariable family
and switching from uint64_t to uint32_t will save 4 bytes per variable.
Differential Revision: https://reviews.llvm.org/D25620
llvm-svn: 284482
ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads
or stores using the specified source or destination register and the next
register.
This patch does not add support for the cases where the offset is greater than
a 16 bit signed immediate as that would lead to a wrong/misleading error
message as the assembler would report "instruction requires a CPU feature
not currently enabled" for ld & sd for MIPS64 when their offset is not a signed
16 bit number.
This fixes PR/29159.
Thanks to Sean Bruno for reporting this issue!
Reviewers: vkalintiris, seanbruno, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D24556
llvm-svn: 284481
Committing on behalf of Coby Tayree: After check-all and LGTM
Desc:
AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}')
Currently, the following forms are allowed:
{k<num>}
{k<num>}{z}
This patch allows the following forms:
{z}{k<num>}
and ignores the next form:
{z}
Justification would be quite simple - GCC
Differential Revision: http://reviews.llvm.org/D25013
llvm-svn: 284479
Summary:
Instead of instantiating the MipsFastISel class and checking if the
target is supported in the overriden methods, we should perform that
check before creating the class. This allows us to enable FastISel *only*
for targets that truly support it, ie. MIPS32 to MIPS32R5.
Reviewers: sdardis
Subscribers: ehostunreach, llvm-commits
Differential Revision: https://reviews.llvm.org/D24824
llvm-svn: 284475
In loops that look something like
i = n;
do {
...
} while(i++ < n+k);
where k is a constant, the maximum backedge count is k (in fact the backedge
count will be either 0 or k, depending on whether n+k wraps). More generally
for LHS < RHS if RHS-(LHS of first comparison) is a constant then the loop will
iterate either 0 or that constant number of times.
This allows for more loop unrolling with the recent upper bound loop unrolling
changes, and I'm working on a patch that will let loop unrolling additionally
make use of the loop being executed either 0 or k times (we need to retain the
loop comparison only on the first unrolled iteration).
Differential Revision: https://reviews.llvm.org/D25607
llvm-svn: 284465
This reverts commits 284436 and 284437 because they still break AArch64 bots:
Value of: format_number(-10, IntegerStyle::Integer, 1)
Actual: "-0"
Expected: "-10"
llvm-svn: 284462
This patch assigns cost of the scaling used in addressing for Cortex-R52.
On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.
Differential Revision: http://reviews.llvm.org/D25670
Reviewer: jmolloy
llvm-svn: 284460
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
Differential Revision: https://reviews.llvm.org/D23808
llvm-svn: 284459
This patch adds simplified support for tail calls on ARM with XRay instrumentation.
Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program
#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
std::printf("In fC()\n");
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
std::printf("In fB()\n");
fC();
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
std::printf("In fA()\n");
fB();
}
// Avoid infinite recursion in case the logging function is instrumented (so calls logging
// function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}
int main(int argc, char* argv[]) {
__xray_set_handler(simplyPrint);
printf("Patching...\n");
__xray_patch();
fA();
printf("Unpatching...\n");
__xray_unpatch();
fA();
return 0;
}
gives the following output:
Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()
So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().
Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).
Differential Revision: https://reviews.llvm.org/D25030
llvm-svn: 284456
When Error was threaded through these APIs back in r265606 the
"return" was missed here, which triggers a warning if/when I add
LLVM_NODISCARD to the Error type.
llvm-svn: 284454
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25666
llvm-svn: 284451
Summary:
There are differences in codegen between Linux and Windows due to:
1. Using std::sort which uses quicksort which is a non-stable sort.
2. Iterating over Set data structure where the iteration order is
non deterministic.
Reviewers: arsenm, grosbach, junbuml, zinob, MatzeB
Subscribers: MatzeB, wdng
Differential Revision: https://reviews.llvm.org/D25695
llvm-svn: 284441
This resubmits commits 284425 and r284428, which were reverted
in r284429 due to some infinite recursion caused by an incorrect
selection of function overloads. Reproduced the failure on Linux
using GCC 4.8.4, and confirmed that with the new patch the tests
path on GCC as well as MSVC. So hopefully this fixes everything.
llvm-svn: 284436
Summary:
Reclaiming the name 'CachedHashString' will let us add a type with that
name that owns its value.
Reviewers: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25644
llvm-svn: 284434
load commands that use the MachO::sub_framework_command,
MachO::sub_umbrella_command, MachO::sub_library_command
and MachO::sub_client_command types but are not used in llvm
libObject code but used in llvm tool code.
This includes the LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA,
LC_SUB_LIBRARY and LC_SUB_CLIENT load commands.
llvm-svn: 284431
raw_ostream has not afforded a lot of flexibility in terms of
how to format numbers when outputting. Wrap this all up into
a set of low level helper functions that can be used to output
numbers with arbitrary precision, alignment, format, etc and
then update raw_ostream to use these functions.
This will be useful for upcoming improvements to llvm's string
formatting libraries, but are still useful independently.
Differential Revision: https://reviews.llvm.org/D25497
llvm-svn: 284425
As noted in:
https://reviews.llvm.org/D25685
This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch.
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.
llvm-svn: 284421
If -coverage is passed, but -g is not, clang populates the PassManager
pipeline with StripSymbols(debugOnly = true).
The stripSymbol pass therefore scans the list of named metadata,
drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD)
around. The verifier runs, and finds out that there's a CU not listed
in !llvm.dbg.cu (as it was previously dropped) -> crash.
When we strip debug info, so, check if there's coverage data,
and strip it as well, in order to avoid pending metadata left around.
Differential Revision: https://reviews.llvm.org/D25689
llvm-svn: 284418
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code are the same with or without debug info.
Reviewers: davidxl, mzolotukhin, jmolloy
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D25286
llvm-svn: 284415
Summary:
This adds the necessary logic to support relocations to thumb functions in the COFF dynamic linker.
The jumps to function addresses are mostly blx, which requires the ISA selection bit when jumping to a thumb function.
Note: I'm determining if the relocation requires the ISA bit when creating the relocation entries and not when resolving the relocation. I have to do that because I need the ObjectFile and the actual Symbol, which are available only when creating the entries. It would require a gross refactor if I do it otherwise, but I'm okay with doing it if you think it's better.
Reviewers: peter.smith, compnerd
Subscribers: rengolin, sas
Differential Revision: https://reviews.llvm.org/D25151
llvm-svn: 284410
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25198
llvm-svn: 284397
This came up as part of:
https://reviews.llvm.org/D25485
Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.
llvm-svn: 284395
Based on post-commit review for D25585/r284180, rename
hardware_physical_concurrency to heavyweight_hardware_concurrency,
to better reflect what type of tasks it should be used for and
to enable other systems to map this to something other than the
number of physical cores.
llvm-svn: 284390
/../foo is still a proper path after removing the dotdot. This should
now finally match https://9p.io/sys/doc/lexnames.html [Cleaning names].
llvm-svn: 284384
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.
In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.
llvm-svn: 284381
Not all ConstantExprs can be represented by a global variable, for example most
pointer arithmetic other than addition of a constant, so we can't convert these
values from switch statements to lookup tables.
Differential Revision: https://reviews.llvm.org/D25550
llvm-svn: 284379
Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. a identify factor. We lose cases where size is char type where there will no multiply factor.
Reviewers: sanjoy, grosser
Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser
Differential Revision: https://reviews.llvm.org/D16492
llvm-svn: 284378
CodeGenPrepare knows how to move a zext of a load into the same basic block
where the load lives. The goal is to help ISel match a zero-extending load
instead of two separated instructions.
CGP attempts to move a zext computation even if it lives in a basic block that
does not post-dominate the load's basic block. That means, the hoisted zext may
be speculated. Preserving the zext location would hurt the debugging experience
and the quality of sample pgo.
With this patch, when moving a zext near to its associated load, CGP no longer
propagates the zext's debug location. Instead, CGP conservatively reuses the
same debug location for the load and the zext.
An alternative approach would be to assign an artificial line-0 location to the
zext. However we don't want to over-use the 'line-0' for this particular case
because it would have a size cost in the line-table section for no additional
benefit.
Differential Revision: https://reviews.llvm.org/D25611
llvm-svn: 284377
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
BasicBlock::size is O(insts), making this loop O(blocks*insts), which
can be really slow on generated code. getPrevNode already checks if
we're at the beginning of the block and returns nullptr if so, just use
that instead. No functionality change intended.
llvm-svn: 284303
- Removed unused class members.
- Made class internal data private.
- Made class scoped data function scoped where it's possible.
- Replace naked new/delete with unique_ptr.
- Made resources guaranteed to be freed.
Differential Revision: https://reviews.llvm.org/D25464
llvm-svn: 284290
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
llvm-svn: 284287
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
llvm-svn: 284268
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
Reviewers: arsenm, nhaehnle
Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24672
llvm-svn: 284267
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
Reviewers: bogner, arsenm
Subscribers: nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24964
llvm-svn: 284266
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
Use PackedRegisterRef to store the register information in the graph nodes.
This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.
llvm-svn: 284255
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Differential Revision: https://reviews.llvm.org/D25535
llvm-svn: 284251
This is an improvement when compiling with llvm. llvm doesn't inline
the call to insert, so the align is always executed and shows up in
the profile.
With gcc the call to insert is inlined and the align computation moved
and done only if needed.
With this patch we explicitly only compute it if it is needed.
In the two tests with debug info, the speedup was
scylla
master 3.008959365
patch 2.932080942 1.02621974786x faster
firefox
master 6.709823604
patch 6.592387227 1.01781393795x faster
In all others the difference was in the noise.
llvm-svn: 284249
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25289
llvm-svn: 284224
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.
Reviewers: dsanders, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D21473
llvm-svn: 284218
Committing in the name of Ziv Izhar: After check-all and LGTM .
The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
jmp short label
label:
}
A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958
Differential Revision: https://reviews.llvm.org/D24957
llvm-svn: 284211
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.
llvm-svn: 284204
Summary:
This will be used by ThinLTO to set the amount of backend
parallelism, which performs better when restricted to the number
of physical cores (on X86 at least, where getHostNumPhysicalCores is
currently defined). If not available this falls back to
thread::hardware_concurrency.
Note I didn't add to the thread class since that is a typedef to
std::thread where available.
Reviewers: mehdi_amini
Subscribers: beanz, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D25585
llvm-svn: 284180
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.
llvm-svn: 284173
Summary:
This operation is promoted the same way was ISD::BSWAP. This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.
Reviewers: bogner, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D25202
llvm-svn: 284163
the X86 subdirectory. Original commit message:
Requires a valid TargetMachine to be passed to the SafeStack pass.
Patch by Michael LeMay
Differential revision: http://reviews.llvm.org/D24896
llvm-svn: 284161
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.
Differential Revision: https://reviews.llvm.org/D24849
llvm-svn: 284160
Relax the constraint for empty live-ranges while doing last chance
recoloring. Indeed, those live-ranges do not need an actual color to be
fond for the recoloring to work.
Empty live-range may happen as a result of splitting/spilling.
Unfortunately no test case for in-tree targets.
llvm-svn: 284152
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 284151
Summary:
For now I have only added support for x86_64 Linux, but other systems
can be added incrementally.
This is to be used for setting the default parallelism for ThinLTO
backends (instead of thread::hardware_concurrency which includes
hyperthreading and is too aggressive). I'll send this as a follow-on
patch, and it will fall back to hardware_concurrency when the new
getHostNumPhysicalCores returns -1 (when not supported for a given
host system).
I also added an interface to MemoryBuffer to force reading a file
as a stream - this is required for /proc/cpuinfo which is a special
file that looks like a normal file but appears to have 0 size.
The existing readers of this file in Host.cpp are reading the first
1024 or so bytes from it, because the necessary info is near the top.
But for the new functionality we need to be able to read the entire
file. I can go back and change the other readers to use the new
getFileAsStream as a follow-on patch since it seems much more robust.
Added a unittest.
Reviewers: mehdi_amini
Subscribers: beanz, mgorny, llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D25564
llvm-svn: 284138
In the MS ABI, the frontend is supposed to MD5 such pathologically long
names. LLVM should still defend itself from long names, though.
Fixes part of PR29098.
llvm-svn: 284136
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.
Based on a patch by David Kreitzer
This bug is a consequence of
r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines
We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.
Differential Revision: https://reviews.llvm.org/D25566
llvm-svn: 284130
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.
For instance:
LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]
Above, (1) takes less cycles than (2).
By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.
Differential Revision: http://reviews.llvm.org/D24857
Reviewers: jmolloy, rengolin
llvm-svn: 284127
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.
Differential Revision: https://reviews.llvm.org/D25333
llvm-svn: 284123
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.
llvm-svn: 284119
These instructions were only defined for microMIPSR6 previously. Add
definitions for MIPSR6, correct definitions for microMIPSR6, flag these
instructions as having unmodelled side effects (they disable/enable
virtual processors) and add missing disassember tests for microMIPSR6.
Reviewers: vkalintiris
Differential Review: https://reviews.llvm.org/D24291
llvm-svn: 284115
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.
Differential Revision: http://reviews.llvm.org/D25022
llvm-svn: 284108
We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom.
We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume its a different width.
llvm-svn: 284102
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.
Differential Revision: https://reviews.llvm.org/D25542
llvm-svn: 284098
This allows RegBankSelect in greedy mode to get rid some of the cross
register bank copies when loads are involved in the chain of
computation.
llvm-svn: 284097
- Use storage class C_STAT for 'PrivateLinkage' The storage class for
PrivateLinkage should equal to the Internal Linkage.
- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix
"L" may conflict to the normal symbol name starting with 'L'.
Based on a patch by Han Sangjin! Manually updated test cases.
llvm-svn: 284096
This CL didn't actually address the test case in PR30499, and clang
still crashes.
Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"
Reverts r283965 and r283967.
llvm-svn: 284093
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25293
llvm-svn: 284061
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
We need to use the overload of Mangler::getNameWithPrefix that takes a
GlobalValue in order to mangle in the stdcall stack byte count for Windows
targets.
Differential Revision: https://reviews.llvm.org/D25529
llvm-svn: 284040
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.
llvm-svn: 284031
Module inline asm was always being linked/concatenated
when running the IRLinker. This is correct for full LTO but not when
we are importing for ThinLTO, as it can result in multiply defined
symbols when the module asm defines a global symbol.
In order to test with llvm-lto2, I had to work around PR30396,
where a symbol that is defined in module assembly but defined in the
LLVM IR appears twice. Added workaround to llvm-lto2 with a FIXME.
Fixes PR30610.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25359
llvm-svn: 284030
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness. I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.
It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.
Reviewers: jmolloy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25502
llvm-svn: 284028
Since this change is known to cause performance degradations in some cases it's commited under a temporary flag which is turned off by default.
Patch by Li Huang
Differential Revision: https://reviews.llvm.org/D18777
llvm-svn: 284022
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.
Differential Revision: https://reviews.llvm.org/D25374
llvm-svn: 284015
This combiner breaks debug experience and should not be run when optimizations are disabled.
For example:
int main() {
int j = 0;
j += 2;
if (j == 2)
return 0;
return 5;
}
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.
Differential Revision: https://reviews.llvm.org/D19268
llvm-svn: 284014
An arithmetic shift can be safely changed to a logical shift if the first
operand is known positive. This allows ComputeKnownBits (and similar analysis)
to determine the sign bit of the shifted value in some cases. In turn, this
allows InstCombine to canonicalize a signed comparison (a > 0) into an equality
check (a != 0).
PR30577
Differential Revision: https://reviews.llvm.org/D25119
llvm-svn: 284013
The current Cost Model implementation is very inaccurate and has to be
updated, improved, re-implemented to be able to take into account the
concrete CPU models and the concrete targets where this Cost Model is
being used. For example, the Latency Cost Model should be differ from
Code Size Cost Model, etc.
This patch is the first step to launch the developing and implementation
of a new Cost Model generation.
Differential Revision: https://reviews.llvm.org/D25186
llvm-svn: 284012
As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead.
Differential Revision: https://reviews.llvm.org/D25466
llvm-svn: 284000
subcommands
This commit fixes a bug where the help output doesn't display subcommands when
a tool has less than 3 subcommands.
This change doesn't include a corresponding unittest as there is no viable way
to provide a unittest for it.
Differential Revision: https://reviews.llvm.org/D25463
llvm-svn: 283998
The basic inlining operation makes the following changes to the call graph:
1) Add edges that were previously transitive edges. This is always trivial and
this patch gives the LCG helper methods to make this more convenient.
2) Remove the inlined edge. We had existing support for this, but it contained
bugs that needed to be fixed. Testing in the same pattern as the inliner
exposes these bugs very nicely.
3) Delete a function when it becomes dead because it is internal and all calls
have been inlined. The LCG had no support at all for this operation, so this
adds that support.
Two unittests have been added that exercise this specific mutation pattern to
the call graph. They were extremely effective in uncovering bugs. Sadly,
a large fraction of the code here is just to implement those unit tests, but
I think they're paying for themselves. =]
This was split out of a patch that actually uses the routines to
implement inlining in the new pass manager in order to isolate (with
unit tests) the logic that was entirely within the LCG.
Many thanks for the careful review from folks! There will be a few minor
follow-up patches based on the comments in the review as well.
Differential Revision: https://reviews.llvm.org/D24225
llvm-svn: 283982
This reverts commit r283946.
This breaks when build with GCC:
lib/Fuzzer/FuzzerTracePC.cpp:169:6: error: always_inline function might not be inlinable [-Werror=attributes]
lib/Fuzzer/FuzzerTracePC.cpp:169:6: error: inlining failed in call to always_inline 'void fuzzer::TracePC::HandleCmp(void*, T, T) [with T = long unsigned int]': target specific option mismatch
lib/Fuzzer/FuzzerTracePC.cpp:198:65: error: called from here
llvm-svn: 283979
Although Copies are not specific to preISel, we still have to assign them
a proper register class. However, given they are not constrained to
anything we do not have to handle the source register at the copy. It
will be properly mapped when reaching the related definition.
In the process, the handlong of G_ANYEXT is slightly modified as those
end up being selected as copy. The difference is that when register size
do not match on both sides, we need to insert SUBREG_TO_REG operation,
otherwise the post RA copy expansion will not be happy!
llvm-svn: 283972
This implements the cleanup that Danny asked to commit separately from the
previous fix to GVN-hoist in https://reviews.llvm.org/D25476#inline-219818
Tested with ninja check on x86_64-linux.
llvm-svn: 283967
This is a refreshed version of a patch that was reverted: it fixes
the problems reported in both PR30216 and PR30499, and
contains all the test-cases from both bugs.
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Tested on x86_64-linux with check and a test-suite run.
Differential Revision: https://reviews.llvm.org/D25476
llvm-svn: 283965
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
B = Splat A
C = Splat B
=>
C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
B = Splat A
C = Splat B
=>
B = Splat A
C = COPY B
Fixes PR30663.
Reviewers: echristo, iteratee, kbarton, nemanjai
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25493
llvm-svn: 283961
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.
llvm-svn: 283953