Move veriication check for legal conversions to f128 into LowerINT_TO_FP()
and fix some indentations to match other sections of the code for readability.
llvm-svn: 330138
The code uses the index of the last element in the sorted array to determine the maximum size needed for the vector. But if the last index is a FunctionIndex(~0), attrIdxToArrayIdx will return 0 and the vector will have size 1. If there are any indices before FunctionIndex, those values would return a value larger than 0 from attrIdxToArrayIdx. So in this case we need to look in front of the FunctionIndex to get the true size needed.
Differential Revision: https://reviews.llvm.org/D45632
llvm-svn: 330136
When emitting CodeView debug information, compiler-generated thunk routines
should be emitted using S_THUNK32 symbols instead of S_GPROC32_ID symbols so
Visual Studio can properly step into the user code. This initial support only
handles standard thunk ordinals.
Differential Revision: https://reviews.llvm.org/D43838
llvm-svn: 330132
Most of these are pretty trivial and obvious. Setting the toolchain
version to 14.11 is perhaps a little questionable, but we've been bitten
in the past where one of our version fields sidn't match MSVC's, and I
definitely don't want to go through that diagnosis again as it was
pretty time consuming and hard to track down.
I found all of these by using llvm-pdbutil export to dump the dbi and
pdb streams to a file, then using fc followed by llvm-pdbutil explain to
explain the mismatched bytes.
There are still some more, these are just the low hanging fruit.
Differential Revision: https://reviews.llvm.org/D45276
llvm-svn: 330130
Two cleanups:
1. As noted in D45453, we had tests that don't need FMF that were misplaced in the 'fast-math.ll' test file.
2. This removes the final uses of dyn_castFNegVal, so that can be deleted. We use 'match' now.
llvm-svn: 330126
Using Goldmont's cost tables for these two upcoming
atom archs.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45612
llvm-svn: 330109
Summary: As discussed in https://reviews.llvm.org/D45606, it makes more sense to name the class as SmallVectorMemoryBuffer
Reviewers: bkramer, dblaikie
Reviewed By: dblaikie
Subscribers: mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D45661
llvm-svn: 330107
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.
The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.
https://rise4fun.com/Alive/Pkmg
Reviewers: spatel, craig.topper, eli.friedman, jingyue
Reviewed By: spatel
Subscribers: llvm-commits
Was reviewed as part of https://reviews.llvm.org/D45631
llvm-svn: 330103
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.
The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.
https://rise4fun.com/Alive/Pkmg
Reviewers: spatel, craig.topper, eli.friedman, jingyue
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45631
llvm-svn: 330101
As suggested in https://reviews.llvm.org/D45631#1068338,
looking at haveNoCommonBitsSet() users, and *trying* to
show the change effect elsewhere.
llvm-svn: 330100
TargetSchedModel now always delegates to MCSchedModel the computation of
instruction latency and reciprocal throughput.
No functional change intended.
llvm-svn: 330099
This is a transform that I limited in instcombine in rL329821 because it was
creating more instructions in IR when the cast has multiple uses.
But if the cast is free, then we can do the transform regardless of other
uses because it improves the potential throughput of the calculation by
removing a dependency on the fneg.
Differential Revision: https://reviews.llvm.org/D45598
llvm-svn: 330098
Summary:
Since the class is used by both MCJIT and LTO, it makes more sense to move it to Support lib.
This is a follow up patch to r329929 and https://reviews.llvm.org/D45244
Reviewers: bkramer, dblaikie
Reviewed By: bkramer
Subscribers: mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D45606
llvm-svn: 330093
Create convenience functions for printing error, warning and note to
stdout. Previously we had similar functions being used in dsymutil, but
given that this pattern is so common it makes sense to make it available
globally.
llvm-svn: 330091
These simplifications were previously enabled only with isFast(), but that
is more restrictive than required. Since r317488, FMF has 'reassoc' to
control these cases at a finer level.
llvm-svn: 330089
Summary:
InsertPos is within the bacic block `Header`, so `findDebugLoc()` should
be called on not `MBB` but `Header` instead.
Reviewers: yurydelendik
Subscribers: jfb, dschuff, aprantl, sbc100, jgravelle-google, sunfish, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45648
llvm-svn: 330079
The destination size of the movzx/movsx instruction is controlled by the normal operand size mechanisms. Only the input type is fixed.
This means that a 0x66 prefix on the encoding for zext/sext 16->32 should really produce a 16->16 instruction. Functionally this is equivalent to a GR16->GR16 move since bits 16 and above will be preserved. So nothing is actually extended.
llvm-svn: 330078
As demonstrated by the regression tests added in this patch, the
following cases are valid cases:
1. A Function with no DISubprogram attached, but various debug info
related to its instructions, coming, for instance, from an inlined
function, also defined somewhere else in the same module;
2. ... or coming exclusively from the functions inlined and eliminated
from the module entirely.
The ValueMap shared between CloneFunctionInto calls within CloneModule
needs to contain identity mappings for all of the DISubprogram's to
prevent them from being duplicated by MapMetadata / RemapInstruction
calls, this is achieved via DebugInfoFinder collecting all the
DISubprogram's. However, CloneFunctionInto was missing calls into
DebugInfoFinder for functions w/o DISubprogram's attached, but still
referring DISubprogram's from within (case 1). This patch fixes that.
The fix above, however, exposes another issue: if a module contains a
DISubprogram referenced only indirectly from other debug info
metadata, but not attached to any Function defined within the module
(case 2), cloning such a module causes a DICompileUnit duplication: it
will be moved in indirecty via a DISubprogram by DebugInfoFinder first
(because of the first bug fix described above), without being
self-mapped within the shared ValueMap, and then will be copied during
named metadata cloning. So this patch makes sure DebugInfoFinder
visits DICompileUnit's referenced from DISubprogram's as it goes w/o
re-processing llvm.dbg.cu list over and over again for every function
cloned, and makes sure that CloneFunctionInto self-maps
DICompileUnit's referenced from the entire function, not just its own
DISubprogram attached that may also be missing.
The most convenient way of tesing CloneModule I found is to rely on
CloneModule call from `opt -run-twice`, instead of writing tedious
unit tests. That feature has a couple of properties that makes it hard
to use for this purpose though:
1. CloneModule doesn't copy source filename, making `opt -run-twice`
report it as a difference.
2. `opt -run-twice` does the second run on the original module, not
its clone, making the result of cloning completely invisible in opt's
actual output with and without `-run-twice` both, which directly
contradicts `opt -run-twice`s own error message.
This patch fixes this as well.
Reviewed By: aprantl
Reviewers: loladiro, GorNishanov, espindola, echristo, dexonsmith
Subscribers: vsk, debug-info, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45593
llvm-svn: 330069
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.
Differential Revision: https://reviews.llvm.org/D45271
llvm-svn: 330062
r327219 added wrappers to std::sort which randomly shuffle the container before
sorting. This will help in uncovering non-determinism caused due to undefined
sorting order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of
std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to
llvm::sort. Refer the comments section in D44363 for a list of all the
required patches.
llvm-svn: 330061
The Power 9 scheduler model should now include the TLS instructions.
We can now, once again, mark the model as complete.
From now on, if instructions are added to Power 9 but are not
added to the model the build should produce an error. Hopefully
that will alert the developer who is adding new instructions
that they should also be added to the scheulder model.
llvm-svn: 330060
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: kcc, pcc, danielcdh, jmolloy, sanjoy, dberlin, ruiu
Reviewed By: ruiu
Subscribers: ruiu, llvm-commits
Differential Revision: https://reviews.llvm.org/D45142
llvm-svn: 330059
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: grosbach, void, ruiu
Reviewed By: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45138
llvm-svn: 330058
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: bogner, vsk, eraman, ruiu
Reviewed By: ruiu
Subscribers: ruiu, llvm-commits
Differential Revision: https://reviews.llvm.org/D45139
llvm-svn: 330057
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer D44363 for a list of all the required patches.
Reviewers: pcc, mehdi_amini, ruiu
Reviewed By: ruiu
Subscribers: ruiu, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D45137
llvm-svn: 330053
We have a few functions that virtually all command wants to run on
process startup/shutdown. This patch adds InitLLVM class to do that
all at once, so that we don't need to copy-n-paste boilerplate code
to each llvm command's main() function.
Differential Revision: https://reviews.llvm.org/D45602
llvm-svn: 330046
Previously, the MIPS backend would alwyas break down constant multiplications
into a series of shifts, adds, and subs. This patch changes that so the cost of
doing so is estimated.
The cost is estimated against worst case constant materialization and retrieving
the results from the HI/LO registers.
For cases where the value type of the multiplication is not legal, the cost of
legalization is estimated and is accounted for before performing the
optimization of breaking down the constant
This resolves PR36884.
Thanks to npl for reporting the issue!
Reviewers: abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D45316
llvm-svn: 330037
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.
Differential Revision: https://reviews.llvm.org/D44675
llvm-svn: 330034
Similar to rL329834, don't rely on itinerary scheduler model to determine latencies for LEA thresholds, use the generic TargetSchedModel::computeInstrLatency call.
llvm-svn: 330030
Summary:
This was originally part of rL328132, and led to the discovery
of the issues addressed in rL328987. Re-landing.
Reviewers: xur, davidxl, bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45545
llvm-svn: 330029
Summary:
Added instructions for contiguous stores, ST1, with scalar+imm addressing
modes and corresponding tests. The patch also adds parsing of
'mul vl' as needed for the VL-scaled immediate.
This is patch [6/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45432
llvm-svn: 330014
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.
I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]
This extends/generalizes that fold, so it is handled too.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45539
llvm-svn: 330001
Summary:
Added Z_(b|h|s|d) vector list RegisterOperands along with support to
add/print the vector lists.
This is patch [5/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: fhahn
Subscribers: tschuett, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45431
llvm-svn: 330000
Hint to hardware to move the cache line containing the
address to a more distant level of the cache without
writing back to memory.
Reviewers: craig.topper, zvi
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D45256
llvm-svn: 329992
The commit in SVN r310001 that added support for this actually didn't
use the right struct field for the frame pointer - for ARM, there is
no register named Fp in the CONTEXT struct. On Windows, the R11
register is used as frame pointer.
Differential Revision: https://reviews.llvm.org/D45590
llvm-svn: 329991
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.
We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.
llvm-svn: 329990
Summary: This enables debug fission on implicit ThinLTO when linked with gold. It will put the .dwo files in a directory specified by user.
Reviewers: tejohnson, pcc, dblaikie
Reviewed By: pcc
Subscribers: JDevlieghere, mehdi_amini, inglorion
Differential Revision: https://reviews.llvm.org/D44792
llvm-svn: 329988
This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093.
llvm-svn: 329967
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. This reduces the size of Chromium
for Android's .text section by 108KB.
Differential Revision: https://reviews.llvm.org/D45199
llvm-svn: 329956
This lifts a restriction on DILocation::getMergedLocation(), allowing it
to create merged locations for instructions other than calls.
Instruction::applyMergedLocation() now defaults to creating merged
locations for all instructions.
The default behavior of getMergedLocation() is unchanged: callers which
invoke it directly are unaffected.
This change will enable a follow-up Mem2Reg fix which improves crash
reporting.
Differential Revision: https://reviews.llvm.org/D45396
llvm-svn: 329955
This parses a mangled name into an AST (typically an intermediate stage in
itaniumDemangle) and provides some functions to query certain properties or
print certain parts of the demangled name.
Differential revision: https://reviews.llvm.org/D44668
llvm-svn: 329951
Summary:
GCC compresses the pseudo instruction "mv rd, rs", which is an alias of
"addi rd, rs, 0", to "c.mv rd, rs".
In LLVM we rely on the canonical MC instruction (MCInst) to do our compression
checks and since there is no rule to compress "addi rd, rs, 0" --> "c.mv
rd, rs" we lose this compression opportunity to gcc.
In this patch we fix that by adding an addi to c.mv compression pattern, the
instruction "mv rd, rs" will be compressed to "c.mv rd, rs" just like
gcc does.
Patch by Zhaoshi Zheng (zzheng) and Sameer (sabuasal).
Reviewers: asb, apazos, zzheng, mgrang, shiva0217
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, niosHD, kito-cheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45583
llvm-svn: 329939
A previously missing intrinsic for an old instruction.
Reviewers: craig.topper, echristo
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D45312
llvm-svn: 329936
AFI->setRedZone(false) was put in the wrong place before, and so it only fired
on functions that didn't have stack frames. This moves that to the top of
emitPrologue to make sure that every function without a redzone has it set
correctly.
This also adds a function representing one of the early exit cases (GHC calling
convention) to the MachineOutliner noredzone test to ensure that we can outline
from functions like these, where we never use a redzone.
llvm-svn: 329922
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.
llvm-svn: 329920
There are cases when individual NodeSets can be equal with respect to
the ordering criteria. Since they are stored in an ordered container,
use stable_sort to preserve the relative order of equal NodeSets.
This should remove non-determinism discovered by shuffling done in
llvm::sort with expensive checks enabled.
llvm-svn: 329915
Summary:
Merged 'addVectorList64Operands' and 'addVectorList128Operands' into a
generic 'addVectorListOperands', which can be easily extended to work
for SVE vectors.
This is patch [4/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45430
llvm-svn: 329909
Don't assume SelectionDAG is non-null as the targets can use it with a
null pointer.
Differential Revision: https://reviews.llvm.org/D44611
llvm-svn: 329908
Created a helper function to query for non negative SCEVs. Uses the
SGE predicate to catch constants that could be interpreted as
negative.
Differential Revision: https://reviews.llvm.org/D45481
llvm-svn: 329907
Summary:
Added 'RegisterKind' to the VectorListOp structure, so that this operand
type can be reused for SVE vector lists in a later patch. It also
refactors the 'tryParseVectorList' function so it can be used directly
in the ParserMethod of an operand. The parsing can now parse multiple
kinds of vectors and recover if there is no match.
This is patch [3/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45429
llvm-svn: 329900
Summary:
According RISC-V ELF psABI specification, base RV32 and RV64 ISAs only
allow 32-bit instruction alignment, but instruction allow to be aligned
to 16-bit boundaries for C-extension.
So we just align to 4 bytes and 2 bytes for C-extension is enough.
Reviewers: asb, apazos
Differential Revision: https://reviews.llvm.org/D45560
Patch by Kito Cheng.
llvm-svn: 329899
This patch makes tryCandidate() virtual and some utility functions like
tryLess(), tryGreater(), ... externally available (used to be static).
This makes it possible for a target to derive a new MachineSchedStrategy from
GenericScheduler and reuse most parts.
It was necessary to wrap functions with the same names in
AMDGPU/SIMachineScheduler in a local namespace.
Review: Andy Trick, Florian Hahn
https://reviews.llvm.org/D43329
llvm-svn: 329884
Operand 0 should have the same type of the result. So if the result type needs to be promoted, operand 0 needs to be promoted unconditionally.
llvm-svn: 329883
Also add double-prevoius-failure.ll which captures a test case that at one
point triggered a compiler crash, while developing calling convention support
for f64 on RV32D with soft-float ABI.
llvm-svn: 329877
fadd.d is required in order to force floating point registers to be used in
test code, as parameters are passed in integer registers in the soft float
ABI.
Much of this patch is concerned with support for passing f64 on RV32D with a
soft-float ABI. Similar to Mips, introduce pseudoinstructions to build an f64
out of a pair of i32 and to split an f64 to a pair of i32. BUILD_PAIR and
EXTRACT_ELEMENT can't be used, as a BITCAST to i64 would be necessary, but i64
is not a legal type.
llvm-svn: 329871
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.
Without this, I get verifier errors on `@Test2`.
It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.
llvm-svn: 329868
The standard says that the order of evaluation of an expression
s[x] = foo()
is unspecified. In our case, we first create an empty entry in the map,
then call foo(), then store its return value to the created entry. The
problem is that foo uses the map as a cache, so if it finds that there
is an entry in the map, it stops computation. This change explicitly
sets the order, thus fixing this heisenbug.
llvm-svn: 329864
Without these functions it's hard to create a TargetMachine for
Orc JIT that creates efficient native code.
It's not sufficient to just expose LLVMGetHostCPUName(), because
for some CPUs there's fewer features actually available than
the CPU name indicates (e.g. AVX might be missing on some CPUs
identified as Skylake).
Differential Revision: https://reviews.llvm.org/D44861
llvm-svn: 329856
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).
Patch by Josh Stone.
llvm-svn: 329852
Similar to the wbinvd instruction, except this
one does not invalidate caches. Ring 0 only.
The encoding matches a wbinvd instruction with
an F3 prefix.
Reviewers: craig.topper, zvi, ashlykov
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D43816
llvm-svn: 329847
Most importantly, we should not replace slashes with backslashes
because that would invalidate the path.
Differential Revision: https://reviews.llvm.org/D45473
llvm-svn: 329838
Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550.
I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here,
There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers.
There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical.
NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329837
Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call.
Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329834
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329829
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Re-landed after noticing that the buildbot failure from 329808 seemed to
be unrelated.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329826
This is causing compilation timeouts on code with long sequences of
local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out
that code coverage instrumentation is a great way to create sequences
like this, which how our users ran into the issue in practice.
Intel has a tool that detects these kinds of non-linear compile time
issues, and Andy Kaylor reported it as PR37010.
The current sinking code scans the whole basic block once per local
value sink, which happens before emitting each call. In theory, local
values should only be introduced to be used by instructions between the
current flush point and the last flush point, so we should only need to
scan those instructions.
llvm-svn: 329822
Previously the MD5 option of the .file directive provided the checksum
as a quoted hex string; now it's a normal hex number with 0x prefix,
same as the .octa directive accepts.
Differential Revision: https://reviews.llvm.org/D45459
llvm-svn: 329820
Add the minimal support necessary to lower a function that returns the
sum of two i32 values.
Support argument/return lowering of i32 values through registers only.
Add tablegen for regbankselect and instructionselect.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D44304
llvm-svn: 329819
Two issues were fixed:
runtime has difficulty to allocate memory for an external symbol of a
kernel and set the address of the external symbol, therefore make the runtime
handle of an enqueued kernel an ordinary global variable. Runtime only needs
to store the address of the loaded kernel to the handle and has verified
that this approach works.
handle the situation where __enqueue_kernel* gets inlined therefore
the enqueued kernel may be used through a constant expr instead
of an instruction.
Differential Revision: https://reviews.llvm.org/D45187
llvm-svn: 329815
This reverts 329808. That change caused a report of a failure in
test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect
it is an expensive-check-only error.
Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0
llvm-svn: 329811
Summary:
Place parsing of a vector index into a separate function to reduce
duplication, since the code is duplicated in both the parsing of a
Neon vector register operand and a Neon vector list.
This is patch [2/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45428
llvm-svn: 329809
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329808
In r329691, we would choose FP even if the offset wouldn't fit, just
because the offset is smaller than the one from BP. This made many
accesses through FP need to scavenge a register, which resulted in
slower and bigger code for no good reason.
This patch now always picks the offset that fits first, even if FP is
preferred.
llvm-svn: 329797
This is a follow up of rL327695 to instruction select more variants of VSELGT
and VSELGE, for which it is necessary to custom lower SELECT.
More work is required in this area, which will be addressed soon:
- more variants need to be regression tested, but this depends on the next point.
- first LowerConstantFP need to be adjusted for fp16 values.
Differential Revision: https://reviews.llvm.org/D45205
llvm-svn: 329788
Summary:
Merged 'tryMatchVectorRegister' (specific to Neon) and
'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and
created a generic 'parseVectorKind()' function that returns the #Elements
and ElementWidth of a vector suffix. This reduces the duplication of
this functionality between two the vector implementations.
This is patch [1/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: fhahn
Subscribers: tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45427
llvm-svn: 329782
The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency.
llvm-svn: 329774
With -fno-plt, for example, calls to printf when getting converted to puts
still use the PLT. This patch checks for the metadata "RtLibUseGOT" and
annotates the declaration with the right attributes.
Differential Revision: https://reviews.llvm.org/D45180
llvm-svn: 329768
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
v2: - fix regressions in merge-stores.ll and multiple_tails.ll
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.
Reviewers: mcrosier
Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45502
llvm-svn: 329761
This is based on an example that was recently posted on llvm-dev:
void *propagate_null(void* b, int* g) {
if (!b) {
return 0;
}
(*g)++;
return b;
}
https://godbolt.org/g/xYk3qG
The original code or constant propagation in other passes has obscured the fact
that the phi can be removed completely.
Differential Revision: https://reviews.llvm.org/D45448
llvm-svn: 329755
Summary:
The verification rules for the intrinsics for atomic memcpy, atomic memmove,
and atomic memset are basically code clones. This change merges their verification
rules into a single block to remove duplication.
llvm-svn: 329753
Summary:
Darwin dynamic linker can handle weak symbols in ConstDataSection.
ReadonReadOnlyWithRel symbols should be emitted in ConstDataSection
instead of normal DataSection.
rdar://problem/39298457
Reviewers: dexonsmith, kledzik
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45472
llvm-svn: 329752
This commit fixes the bot failures that were coming up before with r329716.
The fix was to move the check for "isInSection()" inside of the if condition
and emit the error there instead of waiting to get past the unreachable statement.
This should work in debug and release builds now.
llvm-svn: 329746
There was missing nullptr check before a call to getSection() in
recordRelocation. This would result in a segfault in code like the attached
test.
This adds the missing check and a test which makes sure we get the expected
error output.
llvm-svn: 329716
Summary:
We would like the UMR debugging tool[0] to be able to provide
disassembly for currently live waves based on plain memory
dumps, and we want to leverage the LLVM disassembler for this.
This mostly works, except that UMR clearly can't provide real
symbol info, so it wants to set DisInfo == nullptr.
[0] https://cgit.freedesktop.org/amd/umr/
Reviewers: arsenm, rampitec, artem.tamazov, dp
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45477
Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6
llvm-svn: 329715
Summary:
This type is created on-demand and used as the base type for array
ranges. Since it is "special", its construction did not go through the
createTypeDIE function and so it was never inserted into the accelerator
table, although it clearly belongs there.
I add an explicit addAccelType call to insert it into the table.
During review, we also decided to rename the type to something more
unique to avoid confusion in case the user has own "sizetype" type. The
new name for the type size __ARRAY_SIZE_TYPE__.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45445
llvm-svn: 329705
Improve the alias analysis to account for cases where we
know that src/dst pairs cannot alias due to things like
TBAA. As we know they are noalias, we know no dependency
can occur. Also fixes issues around the size parameter
to AA being incorrect.
Differential Revision: https://reviews.llvm.org/D42381
llvm-svn: 329692
In the presence of variable-sized stack objects, we always picked the
base pointer when resolving frame indices if it was available.
This makes us hit an assert where we can't reach the emergency spill
slot if it's too far away from the base pointer. Since on AArch64 we
decide to place the emergency spill slot at the top of the frame, it
makes more sense to use FP to access it.
The changes here don't affect only emergency spill slots but all the
frame indices. The goal here is to try to choose between FP, BP and SP
so that we minimize the offset and avoid scavenging, or worse, asserting
when trying to access a slot allocated by the scavenger.
Previously discussed here: https://reviews.llvm.org/D40876.
Differential Revision: https://reviews.llvm.org/D45358
llvm-svn: 329691
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).
This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.
V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading
scratch descriptor from GIT.
Reviewers: kzhuravl, nhaehnle, timcorringham
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D44468
Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 329690
Much like any written register in load/store instructions, the status register
is not allowed to overlap with any others. So diagnose it like we already do
with the other cases.
llvm-svn: 329687
The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell
resource, and not a Broadwell processor resource. That entry was added to the
Broadwell model because variable blends were consuming it.
This was clearly a typo (the resource name should have been BWPort5), which
unfortunately was never caught before. It was not reported as an error because
HWPort5 is a resource defined by the Haswell model. It has been found when
testing some code with llvm-mca: the list of resources in the resource pressure
view was odd.
This patch fixes the issue; now variable blend instructions consume 2 cycles on
BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious)
entry in the BroadWellModelProcResources table.
llvm-svn: 329686
Summary:
Subtargets can define the libpfm counter names that can be used to
measure cycles and uops issued on ProcResUnits.
This allows making llvm-exegesis available on more targets.
Fixes PR36984.
Reviewers: gchatelet, RKSimon, andreadb, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45360
llvm-svn: 329675
This cleans up a number of operations that only claimed te use EFLAGS
due to using DF. But no instructions which we think of us setting EFLAGS
actually modify DF (other than things like popf) and so this needlessly
creates uses of EFLAGS that aren't really there.
In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.
I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.
Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.
Differential Revision: https://reviews.llvm.org/D45154
llvm-svn: 329673
Prefer to use the 32-bit AND with immediate instead.
Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold.
Fixes PR37063.
llvm-svn: 329667
While reading Codeview records which contain variable-length encoded integers,
such as LF_BCLASS, LF_ENUMERATE, LF_MEMBER, LF_VBCLASS or LF_IVBCLASS,
the record's size would be improperly calculated in cases where the value was
indeed of a variable length (>= LF_NUMERIC). This caused a bad alignement on
the next record, which would/might crash later on.
Differential Revision: https://reviews.llvm.org/D45104
llvm-svn: 329659
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.
However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.
There are a large number of operations that techinaclly observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.
This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI and without require stack adjustments.
Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.
Differential Revision: https://reviews.llvm.org/D45146
llvm-svn: 329657
Summary:
Another clean up, following D43208.
Interleaved memory access analysis/optimization has nothing to do with vectorization legality. It doesn't really belong there. On the other hand, cost model certainly has to know about it.
In principle, vectorization should proceed like Legality ==> Optimization ==> CostModel ==> CodeGen, and this change just does that,
by moving the interleaved access analysis/decision out of Legal, and run it just before CostModel object is created.
After this, I can move LoopVectorizationLegality and Hints/Requirements classes into it's own header file, making it shareable within Transform tree. I have the patch already but I don't want to mix with this change. Eventual goal is to move to Analysis tree, but I first need to move RecurrenceDescriptor/InductionDescriptor from Transform/Util/LoopUtil.* to Analysis.
Reviewers: rengolin, hfinkel, mkuper, dcaballe, sguggill, fhahn, aemerson
Reviewed By: rengolin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45072
llvm-svn: 329645
Summary:
SSAUpdater is a bottleneck in JumpThreading, and this patch improves the
situation by using SSAUpdaterBulk instead.
Compile time impact: no noticable changes on CTMark, a big improvement
on the test from PR16756.
Reviewers: dberlin, davide, MatzeB
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44282
llvm-svn: 329644
Summary:
SSAUpdater is a bottleneck in a number of passes, and one of the reasons
is that it performs a lot of unnecessary computations (DT/IDF) over and
over again. This patch adds a new SSAUpdaterBulk that uses existing DT
and avoids recomputing IDF when possible.
Reviewers: dberlin, davide, MatzeB
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44282
llvm-svn: 329643
The caching walker used to hold its own caches, which made its `reset()`
function meaningful. Since caching has been moved out of it, there's no
reason to continue to have these cache-related methods.
Similarly, the EXPENSIVE_CHECKS block that's getting removed used to
rerun the query with caching disabled. Since that's how we always do
queries now, it's redundant.
llvm-svn: 329638
Lower is slightly odd. It often doesn't change the type but the lowerings
do use the new type to decide what code to create. Treat it like a mutation
but provide convenience functions that re-use the existing type.
Re-uses the existing tests:
test/CodeGen/AArch64/GlobalISel/legalize-rem.mir
test/CodeGen/AArch64/GlobalISel//legalize-mul.mir
test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir
llvm-svn: 329623
Fix PR36484, as suggested:
<quote>
during moves, mark the direct users of the erased things that were phis as "not to be optimized"
<quote>
llvm-svn: 329621
1. Remove max_scratch_backing_memory_byte_size from kernel header
2. Make it a reserved field
3. Ignore it while parsing assembly for backwards compatibility
4. Bump up minor version of kernel header
Differential Revision: https://reviews.llvm.org/D45452
llvm-svn: 329620
LowerIntUnary as its name says has an assert for integer types. But for the bitcast case one side might be an FP type.
Rather than making sure the function really works for fp types and renaming it. Just do really basic splitting directly. The LowerIntUnary has the advantage that it can peek through BUILD_VECTOR because every other call is during Lowering. But these calls are during legalization and will be followed by a DAG combine round.
Revert some change to LowerVectorIntUnary that were originally made just to make these two calls work even in pure integer cases.
This was found purely by compiling the avx512f-builtins.c test from clang so I've copied over the offending function from that.
llvm-svn: 329616
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. It reduces the size of Chromium for
Android's .text section by 46KB, or 56KB with ThinLTO (which exposes
more opportunities to use a direct access rather than a GOT access).
Because the addend range is limited in COFF and Mach-O, this is
enabled for ELF only.
Differential Revision: https://reviews.llvm.org/D45199
llvm-svn: 329611
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: sunfish, RKSimon
Reviewed By: sunfish
Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, llvm-commits
Differential Revision: https://reviews.llvm.org/D44873
llvm-svn: 329607
This allows MachineMemOperand::getSize()'s result to be fed directly into
MachineMemOperand::MachineMemOperand() without a narrowing type conversion
warning.
llvm-svn: 329602
building.
https://reviews.llvm.org/D45067
This change attempts to do two things:
1) It separates out the state that is stored in the
MachineIRBuilder(InsertionPt, MF, MRI, InsertFunction etc) into a
separate object called MachineIRBuilderState.
2) Add the ability to constant fold operations while building instructions
(optionally). MachineIRBuilder is now refactored into a MachineIRBuilderBase
which contains lots of non foldable build methods and their implementation.
Instructions which can be constant folded/transformed are now in a class
called FoldableInstructionBuilder which uses CRTP to use the implementation
of the derived class for buildBinaryOps. Additionally buildInstr in the derived
class can be used to implement other kinds of transformations.
Also because of separation of state, given a MachineIRBuilder in an API,
if one wishes to use another MachineIRBuilder, a new one can be
constructed from the state locally. For eg,
void doFoo(MachineIRBuilder &B) {
MyCustomBuilder CustomB(B.getState());
// Use CustomB for building.
}
reviewed by : aemerson
llvm-svn: 329596
While it appears to be correct information based on Intel's optimization manual and Agner's data, it causes perf regressions on a couple of the benchmarks in our internal list.
llvm-svn: 329593
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329591
Summary:
This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU
target without any other targets that use GlobalISel.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D45353
llvm-svn: 329588
Without the fast math flags, the llvm.experimental.vector.reduce.fadd/fmul intrinsic expansions must be expanded in order.
This patch scalarizes the reduction, applying the accumulator at the start of the sequence: ((((Acc + Scl[0]) + Scl[1]) + Scl[2]) + ) ... + Scl[NumElts-1]
Differential Revision: https://reviews.llvm.org/D45366
llvm-svn: 329585
This patch fixes an issue exposed on the SystemZ build bots when committing
https://reviews.llvm.org/rL327856. The hoisting was temporarily disabled with
an option. This patch now re-enables hoisting and checks that we only hoist a
store instruction when all its operands are either constant caller preserved
registers or immediates.
Differential Revision: https://reviews.llvm.org/D45286
llvm-svn: 329577
Summary:
If an input DICompileUnit is completely empty (e.g., the result of
running "clang -g" on an empty file), we don't bother emitting an empty
DWARF CU. When we do that, we must make sure we don't also emit a DWARF v5
name index, as DWARF specifies that each index must reference at least
one compilation unit.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45435
llvm-svn: 329575
Summary:
We do not try to move the instructions and split the block till we
know the blocks can be split, i.e. BCE-cmp-insts can be separated from
non-BCE-cmp-insts.
Reviewers: davide, courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44443
llvm-svn: 329564
Summary:
We were emitting accelerator entries for functions with no name, which
is contrary to the DWARF v5 spec: "All other (i.e., *not*
DW_TAG_namespace) debugging information entries without a DW_AT_name
attribute are excluded." Besides that, a name table entry with an empty
string as a key is fairly useless.
We can sometimes end up with functions which have a DW_AT_linkage_name but no
DW_AT_name. One such example is the global-constructor-initialization functions,
which C++ compilers synthesize for each compilation unit with global
constructors.
A very strict reading of the DWARF v5 spec would suggest that we should not even
emit the accelerator entry for the linkage name in this case, but I don't think
we should go that far.
I found this when running the dwarf verifier over llvm codebase compiled
with DWARF v5 accelerator tables.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: vleschuk, clayborg, echristo, probinson, llvm-commits
Differential Revision: https://reviews.llvm.org/D45367
llvm-svn: 329552
Recommitting r329283, third time lucky...
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
Differential Revision: https://reviews.llvm.org/D41350
llvm-svn: 329551
These are were previously grouped in small groups of similarish intrinsics. But all the intrinsics have the same number of arguments and the same order. So we can move them all into a larger group for handling.
llvm-svn: 329549
In IRCE, we have a very old legacy check that works when we collect comparisons that we
treat as range checks. It ensures that the value against which the indvar is compared is
loop invariant and is also positive.
This latter condition remained there since the times when IRCE was only able to handle
signed latch comparison. As the optimization evolved, it now learned how to intersect
signed or unsigned ranges, and this logic has no reliance on the fact that the right border
of each range should be positive.
The old implementation of this non-negativity check was also naive enough and just looked
into ranges (while most of other IRCE logic tries to use power of SCEV implications), so this
check did not allow to deal with the most simple case that looks like follows:
int size; // not known non-negative
int length; //known non-negative;
i = 0;
if (size != 0) {
do {
range_check(i < size);
range_check(i < length);
++i;
} while (i < size)
}
In this case, even if from some dominating conditions IRCE could parse loop
structure, it could only remove the range check against `length` and simply
ignored the check against `size`.
In this patch we remove this obsolete check. It will allow IRCE to pick comparison
against `size` as a potential range check and then let Range Intersection logic
decide whether it is OK to eliminate it or not.
Differential Revision: https://reviews.llvm.org/D45362
Reviewed By: samparker
llvm-svn: 329547
Summary:
Currently MachineLoopInfo is used in only two places:
1) for computing IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, and it is never used.
2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true.
Despite that, we currently have a dependency on MachineLoopInfo, which makes
pass manager to compute it and MachineDominator Tree. This patch removes the
use (1) and makes the use (2) lazy, thus avoiding some redundant
recomputations.
Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi
Subscribers: rengolin, javed.absar, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44812
llvm-svn: 329542
The TargetSchedModel is always initialized using the TargetSubtargetInfo's
MCSchedModel and TargetInstrInfo, so we don't need to extract those and
pass 3 parameters to init().
Differential Revision: https://reviews.llvm.org/D44789
llvm-svn: 329540
Summary:
Cmov and setcc previously used WriteALU, but on Intel processors at least they are more restricted than basic ALU ops.
This patch adds new SchedWrites for them and removes the InstRWs. I had to leave some InstRWs for CMOVA/CMOVBE and SETA/SETBE because those have an extra uop relative to the other condition codes on Intel CPUs.
The test changes are due to fixing a missing ZnAGU dependency on the memory form of setcc.
Reviewers: RKSimon, andreadb, GGanesh
Reviewed By: RKSimon
Subscribers: GGanesh, llvm-commits
Differential Revision: https://reviews.llvm.org/D45380
llvm-svn: 329539
Summary:
This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data.
The patterns were missing the VEX forms which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx"
Reviewers: RKSimon, GGanesh
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44841
llvm-svn: 329538
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: chandlerc, jordan_rose, bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45140
llvm-svn: 329536
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: hfinkel, RKSimon
Reviewed By: RKSimon
Subscribers: nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D44870
llvm-svn: 329535
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: chandlerc, craig.topper, RKSimon
Reviewed By: chandlerc, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44874
llvm-svn: 329534
Explicitly link LLVMTestingSupport library against LLVMSupport. This
is necessary to fix linking errors when LLVMTestingSupport is built
as a shared library (with BUILD_SHARED_LIBS=ON) and -Wl,-z,defs is used.
Differential Revision: https://reviews.llvm.org/D45408
llvm-svn: 329522
In our real world application, we found the following optimization is missed in DAGCombiner
(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))
If the user of original zext is an add, it may enable further lea optimization on x86.
This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.
Differential Revision: https://reviews.llvm.org/D44402
llvm-svn: 329516
Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget
llvm-svn: 329510
Should fix UBSan bot by also checking there's no "uwtable" attribute
before skipping. Otherwise the unwind table will be useless since its
moves expect CSRs to actually be preserved.
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
Should fix PR9970.
Patch mostly by myeisha (pmb).
llvm-svn: 329494
Summary:
The LLVM SourceMgr class (which is used indirectly by Swift, though not Clang)
has a routine for looking up line numbers of SMLocs. This routine uses a
shared, special-purpose cache that handles exactly one access pattern
efficiently: looking up the line number of an SMLoc that points into the same
buffer as the last query made to the SourceMgr, at a location in the buffer at
or ahead of the last query.
When this works it's fine, but when it fails it's catastrophic for performancer:
one recent out-of-order access from a Swift utility routine ran for tens of
seconds, spending 99% of its time repeatedly scanning buffers for '\n'.
This change removes the shared cache from the SourceMgr and installs a new
cache in each SrcBuffer. The per-SrcBuffer caches are also "full", in the sense
that rather than caching a single last-query pointer, they cache _all_ the
line-ending offsets, in a binary-searchable array, such that once it's
populated (on first access), all subsequent access patterns run at the same
speed.
Performance measurements I've done show this is actually a little bit faster on
real codebases (though only a couple fractions of a percent). Memory usage is
up by a few tens to hundreds of bytes per SrcBuffer that has a line lookup done
on it; I've attempted to minimize this by using dynamic selection of integer
sized when storing offset arrays. But the main motive here is to
make-impossible the cases we don't always see, that show up by surprise when
there is an out-of-order access pattern.
Reviewers: jordan_rose
Reviewed By: jordan_rose
Subscribers: probinson, llvm-commits
Differential Revision: https://reviews.llvm.org/D45003
llvm-svn: 329470
Llvm-mc (and tools that use Path.inc on Windows) assume that strings are utf-8
encoded, however, this is not always the case. On Windows the default codepage
is not utf-8, so most of the time the strings are not utf-8 encoded.
The lld test 'format-binary-non-ascii' uses llvm-mc with a file with non-ascii
characters in the name which is how this bug was found. The test fails when run
using Python 3 because it uses properly encoded unicode strings (Python 2 actually
ends up using a byte string which is not utf-8 encoded, so the test passes, but
that's separate issue).
Patch by Stella Stamenova!
llvm-svn: 329468
Previously HalfTy was not handled which would either trigger an assertion,
or result in array initialized with garbage.
Differential Revision: https://reviews.llvm.org/D45391
llvm-svn: 329463
v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16
and that was not previously handled correctly by tryLDGLDU()
Differential Revision: https://reviews.llvm.org/D45339
llvm-svn: 329456
Summary:
This patch implements a tablegen-driven Instruction Compression
mechanism for generating RISCV compressed instructions
(C Extension) from the expanded instruction form.
This tablegen backend processes CompressPat declarations in a
td file and generates all the compile-time and runtime checks
required to validate the declarations, validate the input
operands and generate correct instructions.
The checks include validating register operands, immediate
operands, fixed register operands and fixed immediate operands.
Example:
class CompressPat<dag input, dag output> {
dag Input = input;
dag Output = output;
list<Predicate> Predicates = [];
}
let Predicates = [HasStdExtC] in {
def : CompressPat<(ADD GPRNoX0:$rs1, GPRNoX0:$rs1, GPRNoX0:$rs2),
(C_ADD GPRNoX0:$rs1, GPRNoX0:$rs2)>;
}
The result is an auto-generated header file
'RISCVGenCompressEmitter.inc' which exports two functions for
compressing/uncompressing MCInst instructions, plus
some helper functions:
bool compressInst(MCInst& OutInst, const MCInst &MI,
const MCSubtargetInfo &STI,
MCContext &Context);
bool uncompressInst(MCInst& OutInst, const MCInst &MI,
const MCRegisterInfo &MRI,
const MCSubtargetInfo &STI);
The clients that include this auto-generated header file and
invoke these functions can compress an instruction before emitting
it, in the target-specific ASM or ELF streamer, or can uncompress
an instruction before printing it, when the expanded instruction
format aliases is favored.
The following clients were added to implement compression\uncompression
for RISCV:
1) RISCVAsmParser::MatchAndEmitInstruction:
Inserted a call to compressInst() to compresses instructions
parsed by llvm-mc coming from an ASM input.
2) RISCVAsmPrinter::EmitInstruction:
Inserted a call to compressInst() to compress instructions that
were lowered from Machine Instructions (MachineInstr).
3) RVInstPrinter::printInst:
Inserted a call to uncompressInst() to print the expanded
version of the instruction instead of the compressed one (e.g,
add s0, s0, a5 instead of c.add s0, a5) when -riscv-no-aliases
is not passed.
This patch squashes D45119, D42780 and D41932. It was reviewed in smaller patches by
asb, efriedma, apazos and mgrang.
Reviewers: asb, efriedma, apazos, llvm-commits, sabuasal
Reviewed By: sabuasal
Subscribers: mgorny, eraman, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng
Differential Revision: https://reviews.llvm.org/D45385
llvm-svn: 329455
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: stoklund, kparzysz, dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45144
llvm-svn: 329451
Summary:
The 'strong' StackProtector heuristic takes into consideration call instructions.
Certain intrinsics, such as lifetime.start, can cause the
StackProtector to protect functions that do not need to be protected.
Specifically, a volatile variable, (not optimized away), but belonging to a stack
allocation will encourage a llvm.lifetime.start to be inserted during
compilation. Because that intrinsic is a 'call' the strong StackProtector
will see that the alloca'd variable is being passed to a call instruction, and
insert a stack protector. In this case the intrinsic isn't really lowered to a
call. This can cause unnecessary stack checking, at the cost of additional
(wasted) CPU cycles.
In the future we should rely on TargetTransformInfo::isLoweredToCall, but as of
now that routine considers all intrinsics as not being lowerable. That needs
to be corrected, and such a change is on my list of things to get moving on.
As a side note, the updated stack-protector-dbginfo.ll test always seems to
pass. I never see the dbg.declare/dbg.value reaching the
StackProtector::HasAddressTaken, but I don't see any code excluding dbg
intrinsic calls either, so I think it's the safest thing to do.
Reviewers: void, timshen
Reviewed By: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45331
llvm-svn: 329450
The compiler is generating packet with the following instructions,
which causes an undefined register assert in the verifier.
$r0 = IMPLICIT_DEF
$r1 = IMPLICIT_DEF
S2_storerd_io killed $r29, 0, killed %d0
The problem is that the packetizer is not saving the IMPLICIT_DEF
instructions, which are needed when checking if it is legal to
add the store instruction. The fix is to add the IMPLICIT_DEF
instructions to the CurrentPacketMIs structure.
Patch by Brendon Cahoon.
llvm-svn: 329439
Packetizer keeps two zero-latency bound instrctions in the same packet ignoring
the stalls on the later instruction. This should not be the case if there is no
data dependence.
Patch by Sumanth Gundapaneni.
llvm-svn: 329437
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: bogner, rnk, MatzeB, RKSimon
Reviewed By: rnk
Subscribers: JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45133
llvm-svn: 329435
Summary:
This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency.
Apparently we were inconsistent about whether the store has latency or not thus the test changes.
I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5.
Reviewers: RKSimon, andreadb
Reviewed By: andreadb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45351
llvm-svn: 329416
Summary:
Fixing an issue where initializations of globals where constructors use
casts were silently translated to 0-initialization.
Reviewers: davidxl, evgeny777
Reviewed By: evgeny777
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45198
llvm-svn: 329409
Summary:
This patch add checks to verify that the information in the name index
entries is consistent with the debug_info section. Specifically, we
check that entries point to valid DIEs, and their names, tags, and
compile units match the information in the debug_info sections.
These checks are only run if the previous checks did not find any errors
in the name index headers. Attempting to proceed with the checks anyway
would likely produce a lot of spurious errors and the verification code
would need to be very careful to avoid crashing.
I also add a couple of more checks to the abbreviation-validation code
to verify that some attributes are always present (an index without a
DW_IDX_die_offset attribute is fairly useless).
The entry verification works only on indexes without any type units - I
haven't attempted to extend it to type units, as we don't even have a
DWARF v5-compatible type unit generator at the moment.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45323
llvm-svn: 329392
As mentioned on D44647, this patch increases the default memory latency to +5cy , which more closely matches what most custom cases are doing for reg-mem instructions.
I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent.
As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests...
Differential Revision: https://reviews.llvm.org/D44654
llvm-svn: 329388
Inserting instrumentation between a musttail call and ret instruction
would create invalid IR. Instead, treat musttail calls as function
exits.
llvm-svn: 329385
MFI.LocalFrameSize was not serialized.
It is usually set from LocalStackSlotAllocation, so if that pass doesn't
run it is impossible do deduce it from the stack objects. Until now, this
information was lost.
llvm-svn: 329382
Summary:
The positions of the DwarfVersion and AddressSize arguments were
reversed, which caused parsing for dwarf opcodes which contained
address-size-dependent operands (such as DW_OP_addr). Amusingly enough,
none of the address-size asserts fired, as dwarf version was always 4,
which is a valid address size.
I ran into this when constructing weird inputs for the DWARF verifier. I
I add a test case as hand-written dwarf -- I am not sure how to trigger
this differently, as having a DW_OP_addr inside a location list is a
fairly non-standard thing to do.
Fixing this error exposed a bug in the debug_loc.dwo parser, which was
always being constructed with an address size of 0. I fix that as well
by following the pattern in the non-dwo parser of picking up the address
size from the first compile unit (which is technically not correct, but
probably good enough in practice).
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45324
llvm-svn: 329381
VSX D-form load/store instructions of POWER9 require the offset be a multiple of 16 and a helper`isOffsetMultipleOf` is used to check this.
So far, the helper handles FrameIndex + offset case, but not handling FrameIndex without offset case. Due to this, we are missing opportunities to exploit D-form instructions when accessing an object or array allocated on stack.
For example, x-form store (stxvx) is used for int a[4] = {0}; instead of d-form store (stxv). For larger arrays, D-form instruction is not used when accessing the first 16-byte. Using D-form instructions reduces register pressure as well as instructions.
Differential Revision: https://reviews.llvm.org/D45079
llvm-svn: 329377
Summary:
- Add a missing getter for module-level inline assembly
- Add a missing append function for module-level inline assembly
- Deprecate LLVMSetModuleInlineAsm and replace it with LLVMSetModuleInlineAsm2 which takes an explicit length parameter
- Deprecate LLVMConstInlineAsm and replace it with LLVMGetInlineAsm, a function that allows passing a dialect and is not mis-classified as a constant operation
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45346
llvm-svn: 329369
This restores what was lost with rL73243 but without
re-introducing the bug that was present in the old code.
Note that we already have these transforms if the ops are
marked 'fast' (and I assume that's happening somewhere in
the code added with rL170471), but we clearly don't need
all of 'fast' for these transforms.
llvm-svn: 329362
Summary:
Replace ArrayRefs by actual std::array objects so that there are
no dangling references.
Reviewers: rsmith, gkistanova
Subscribers: sdardis, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D45338
llvm-svn: 329359
r327219 added wrappers to std::sort which randomly shuffle the container before
sorting. This will help in uncovering non-determinism caused due to undefined
sorting order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of
std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to
llvm::sort. Refer D44363 for a list of all the required patches.
llvm-svn: 329353
This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64.
The Sandy Bridge version was missing a load port use.
llvm-svn: 329347
Currently it is 6. If the "feature" was not used, report dummy
hidden argument. Otherwise it does not match the kernarg size
reported in the kernel header.
Differential Revision: https://reviews.llvm.org/D45129
llvm-svn: 329341
We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion.
llvm-svn: 329339
Functions in different objects may use different TOCs, so calls between such
functions should use the global entry point of the callee which updates the
TOC pointer.
This should fix a bug that the Numba developers encountered (see
https://github.com/numba/numba/issues/2451).
Patch by Olexa Bilaniuk. Thanks Olexa!
No RuntimeDyld checker test case yet as I am not familiar enough with how
RuntimeDyldELF fixes up call-sites, but I do not want to hold up landing
this. I will continue to work on it and see if I can rope some powerpc
experts in.
llvm-svn: 329335
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: pcc, mehdi_amini, dexonsmith
Reviewed By: dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45132
llvm-svn: 329334
Summary:
This is a fix to PR37005.
Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug
whereby it will convert:
%src = getelementptr inbounds i8, i8* %base, <2 x i64> %val
%res = getelementptr inbounds i8, <2 x i8*> %src, i64 %val2
into:
%src = getelementptr inbounds i8, i8* %base, i64 %val2
%res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val
By swapping the index operands if the GEPs are in a loop, and %val is loop variant while %val2
is loop invariant.
This fix recreates new GEP instructions if the index operand swap would result in the type
of %src changing from vector to scalar, or vice versa.
Reviewers: sebpop, spatel
Reviewed By: sebpop
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45287
llvm-svn: 329331
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: t.p.northover, RKSimon, MatzeB, bkramer
Reviewed By: bkramer
Subscribers: javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D44855
llvm-svn: 329329
This patch adds a way for users to create their own custom sections to
be added to wasm files. At the LLVM IR layer, they are defined through
the "wasm.custom_sections" named metadata. The expected use case for
this is bindings generators such as wasm-bindgen.
Patch by Dan Gohman
Differential Revision: https://reviews.llvm.org/D45297
llvm-svn: 329315
This patch adds the ability to describe properties of the hardware retire
control unit.
Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).
A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.
A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize. A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".
Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.
Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).
This patch fixes PR36661.
Differential Revision: https://reviews.llvm.org/D45259
llvm-svn: 329304
Summary:
The existing Failed() matcher only allowed asserting that the operation
failed, but it was not possible to verify any details of the returned
error.
This patch adds two new matchers, which make this possible:
- Failed<InfoT>() verifies that the operation failed with a single error
of a given type.
- Failed<InfoT>(M) additionally check that the contained error info
object is matched by the nested matcher M.
To make these work, I've changed the implementation of the ErrorHolder
class. Now, instead of just storing the string representation of the
Error, it fetches the ErrorInfo objects and stores then as a list of
shared pointers. This way, ErrorHolder remains copyable, while still
retaining the full information contained in the Error object.
In case the Error object contains two or more errors, the new matchers
will fail to match, instead of trying to match all (or any) of the
individual ErrorInfo objects. This seemed to be the most sensible
behavior for when one wants to match exact error details, but I could be
convinced otherwise...
Reviewers: zturner, lhames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44925
llvm-svn: 329288
A noreturn nounwind function can be expected to never return in any way, and by
never returning it will also never have to restore any callee-saved registers
for its caller. This makes it possible to skip spills of those registers during
function entry, saving some stack space and time in the process. This is rather
useful for embedded targets with limited stack space.
Should fix PR9970.
Patch by myeisha (pmb).
llvm-svn: 329287
For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp
This patch causes problems for a number of models that had been incorrectly flagged as complete.
Differential Revision: https://reviews.llvm.org/D43235
llvm-svn: 329280
Summary:
Add a new plugin API. This closes the gap between pass registration and out-of-tree passes for the new PassManager.
Unlike with the existing API, interaction with a plugin is always
initiated from the tools perspective. I.e., when a plugin is loaded, it
resolves and calls a well-known symbol `llvmGetPassPluginInfo` to obtain
details about the plugin. The fundamental motivation is to get rid of as
many global constructors as possible. The API exposed by the plugin
info is kept intentionally minimal.
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: bollu, grosser, lksbhm, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D35258
llvm-svn: 329273
This patch introduces a way to set custom OptPassGate instances to LLVMContext.
A new instance field OptBisector and a new method setOptBisect() are added
to the LLVMContext classes. These changes allow to set a custom OptBisect class
that can make its own decisions on skipping optional passes.
Another important feature of this change is ability to set different instances
of OptPassGate to different LLVMContexts. So the different contexts can be used
independently in several compiling threads of one process.
One unit test is added.
Patch by Yevgeny Rouban.
Reviewers: andrew.w.kaylor, fedor.sergeev, vsk, dberlin, Eugene.Zelenko, reames, skatkov
Reviewed By: andrew.w.kaylor, fedor.sergeev
Differential Revision: https://reviews.llvm.org/D44464
llvm-svn: 329267
LoopInterchange relies on LoopInfo being up-to-date, so we should
preserve it after interchanging. This patch updates restructureLoops to
move the BBs of the interchanged loops to the right place.
Reviewers: davide, efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D45278
llvm-svn: 329264
It's failing on the bots and I'm not sure why.
This reverts:
[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents.
[X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256.
[X86] Remove some InstRWs for plain store instructions on Sandy Bridge.
[X86] Auto-generate complete checks. NFC
llvm-svn: 329256
Summary:
If the callsite is inside landing pad, do not perform callsite splitting.
Callsite splitting uses utility function llvm::DuplicateInstructionsInSplitBetween, which eventually calls llvm::SplitEdge. llvm::SplitEdge calls llvm::SplitCriticalEdge with an assumption that the function returns nullptr only when the target edge is not a critical edge (and further assumes that if the return value was not nullptr, the predecessor of the original target edge always has a single successor because critical edge splitting was successful). However, this assumtion is not true because SplitCriticalEdge returns nullptr if the destination block is a landing pad. This invalid assumption results assertion failure.
Fundamental solution might be fixing llvm::SplitEdge to not to rely on the invalid assumption. However, it'll involve a lot of work because current API assumes that llvm::SplitEdge never fails. Instead, this patch makes callsite splitting to not to attempt splitting if the callsite is in a landing pad.
Attached test case will crash with assertion failure without the fix.
Reviewers: fhahn, junbuml, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45130
llvm-svn: 329250
A bug was found where an offset of -1 would generate an encoding
of max int64 which is invalid in the binary format.
Differential Revision: https://reviews.llvm.org/D45280
llvm-svn: 329238
The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.
Differential Revision: https://reviews.llvm.org/D45239
llvm-svn: 329236
Summary: @llvm.icall.branch.funnel is musttail with variable number of
arguments. After inlining current backend can't separate call targets from call
arguments.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45116
llvm-svn: 329235
Sometimes instead of storing addresses as is, the kernel stores the address of
a page and an offset within that page, and then computes the actual address
when it needs to make an access. Because of this the pointer tag gets lost
(gets set to 0xff). The solution is to ignore all accesses tagged with 0xff.
This patch adds a -hwasan-match-all-tag flag to hwasan, which allows to ignore
accesses through pointers with a particular pointer tag value for validity.
Patch by Andrey Konovalov.
Differential Revision: https://reviews.llvm.org/D44827
llvm-svn: 329228
The MachineOutliner has a bunch of target hooks that will call llvm_unreachable
if the target doesn't implement them. Therefore, if you enable the outliner on
such a target, it'll just crash. It'd be much better if it'd just *not* run
the outliner at all in this case.
This commit adds a hook to TargetInstrInfo that returns false by default.
Targets that implement the hook make it return true. The outliner checks the
return value of this hook to decide whether or not to continue.
llvm-svn: 329220
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329218
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches.
Reviewers: t.p.northover, jmolloy, RKSimon, rengolin
Reviewed By: rengolin
Subscribers: dexonsmith, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D44853
llvm-svn: 329216
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329215
Using this, you can use llvm-pdbutil to export the contents of a
stream to a binary file, then run explain on the binary file so
that it treats the offset as an offset into the stream instead
of an offset into a file. This makes it easy to compare the
contents of the same stream from two different files.
llvm-svn: 329207
Some compilers do not like having an enum type and a variable with the
same name (AccelTableKind). I rename the variable to TheAccelTableKind.
Suggestions for a better name welcome.
llvm-svn: 329202
- MSVC was not OK with a static_assert referencing a non-static member
variable, even though it was just in a sizeof(expression). I move the
assert into the emit function, where it is probably more useful.
- Tests were failing in builds which did not have the X86 target
configured. Since this functionality is not target-specific, I have
removed the target specifiers from the .ll files.
llvm-svn: 329201
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.
Also remove some stale-looking comments from the Nios2 target cmakefile.
No intended behavior change.
llvm-svn: 329181
Summary:
This patch adds a DwarfAccelTableEmitter class, which generates an
accelerator table, as specified in DWARF v5 standard. At the moment it
only generates a DIE offset column and (if we are indexing more than one
compile unit) a CU column.
Indexing type units is not currently supported, as we don't even have
the ability to generate DWARF v5-compatible compile units.
The implementation is not data-source agnostic like the one generating
apple tables. This was not necessary as we currently only have one user
of this code, and without a second user it was not obvious to me how to
best abstract this. (The difference between these tables and the apple
ones is that they need a lot more metadata about the debug info they are
indexing).
The generation is triggered by the --accel-tables argument, which
supersedes the --dwarf-accel-tables arg -- the latter was a simple
on-off switch, but not we can choose between two kinds of accelerator
tables we can generate.
This is tested by parsing the generated tables with llvm-dwarfdump and
the DWARFVerifier, and I've also checked that GNU readelf is able to
make sense of the tables.
Differential Revision: https://reviews.llvm.org/D43286
llvm-svn: 329179
They were added in r285274, in what looks like a merge mishap.
AVRGenMCCodeEmitter.inc is the only non-dupe tablegen invocation added in that
revision.
Also sort the tablegen lines to make this easier to spot in the future.
llvm-svn: 329178
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.
This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.
Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.
v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI
Change-Id: I099f309e0a394082a5901ea196c3967afb867f04
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44939
llvm-svn: 329166
Fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform, so the branch is
non-uniform.
This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.
As discovered after committing an earlier version of this change, this
exposes a subtle interaction between this pass and DivergenceAnalysis:
since we remove and re-create branch instructions, we can no longer rely
on DivergenceAnalysis for branches in subregions that were already
processed by the pass.
Explicitly remove branch instructions from DivergenceAnalysis to
avoid dangling pointers as a matter of defensive programming, and
change how we detect non-uniform subregions.
Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
Differential Revision: https://reviews.llvm.org/D43743
llvm-svn: 329165
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.
There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".
Fixes a bug encountered in Nier: Automata.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D40547
llvm-svn: 329164
Recommitting rL321259. Previosuly this caused an issue with PPCBE but
I didn't receieve a reproducer and didn't have the time to follow up.
If the issue appears again, please provide a reproducer so I can fix
it.
Original commit message:
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
Differential Revision: https://reviews.llvm.org/D41350
llvm-svn: 329160
Summary:
Patch https://reviews.llvm.org/D44467 implements conversion of invalid
vmov instructions into valid ones. It turned out that some valid
instructions also get converted, for example
vmov.i64 d2, #0xff00ff00ff00ff00 ->
vmov.i16 d2, #0xff00
Such behavior is incorrect because according to the ARM ARM section
F2.7.7 Modified immediate constants in T32 and A32 Advanced SIMD
instructions, "On assembly, the data type must be matched in the table
if possible."
This patch fixes the isNEONmovReplicate check so that the above
instruction is not modified any more.
Reviewers: rengolin, olista01
Reviewed By: rengolin
Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D44678
llvm-svn: 329158
These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both.
llvm-svn: 329154
This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis.
If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values
`L1, L2, ..., LN`, then if we prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)` then we can also
prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient
to prove the predicate for values that come from the same predecessor block.
The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0`
given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and
non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot.
Differential Revision: https://reviews.llvm.org/D44001
Reviewed By: anna
llvm-svn: 329150
Summary:
Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.
This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.
Reviewers: efriedma, jmolloy, ABataev
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39760
llvm-svn: 329142
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.
Reviewers: pcc, vitalybuka
Reviewed By: pcc
Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44802
llvm-svn: 329139
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.
Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.
This also adds a new test for the new redzone behaviour.
llvm-svn: 329134
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D45219
llvm-svn: 329131
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().
llvm-svn: 329121
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.
This removes the requirement to pass -mno-red-zone when outlining for AArch64.
https://reviews.llvm.org/D45189
llvm-svn: 329120
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.
This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
llvm-svn: 329116
Summary:
If an alloca need to be stored in the coroutine frame and it has an alignment specified and the alignment does not match the natural alignment of the alloca type. Insert appropriate padding into the coroutine frame to make sure that it gets requested alignment.
For example for a packet type (which natural alignment is 1), but alloca alignment is 8, we may need to insert a padding field with required number of bytes to make sure it is properly aligned.
```
%PackedStruct = type <{ i64 }>
...
%data = alloca %PackedStruct, align 8
```
If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:
```
%f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
```
Reviewers: rnk, lewissbaker, modocache
Reviewed By: modocache
Subscribers: EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D45221
llvm-svn: 329112
It also updates test/Transforms/LoopInterchange/call-instructions.ll
to use accesses where we can prove dependence after D35430.
Reviewers: sebpop, karthikthecool, blitz.opensource
Reviewed By: sebpop
Differential Revision: https://reviews.llvm.org/D45206
llvm-svn: 329111
Summary:
Introduce the ShadowCallStack function attribute. It's added to
functions compiled with -fsanitize=shadow-call-stack in order to mark
functions to be instrumented by a ShadowCallStack pass to be submitted
in a separate change.
Reviewers: pcc, kcc, kubamracek
Reviewed By: pcc, kcc
Subscribers: cryptoad, mehdi_amini, javed.absar, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44800
llvm-svn: 329108
The missing definitions are from cvconst.h shipped with DIA SDK.
Correct the url to MSDN for MemoryTypeEnum and set the underlying
type of PDB_StackFrameType and PDB_MemoryType to uint16_t.
llvm-svn: 329104
Summary:
This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs
property, so now the MachineFunctionPass can enforce this check.
These passes are disabled in NVPTX & WebAssembly.
Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
Reviewed By: dschuff, thegameg
Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45183
llvm-svn: 329095
Summary:
Some targets do not support extended format of .loc directive and
support only simple format: .loc <FileID> <Line> <Column>. Patch adds
MCAsmInfo flag and option that allows emit .loc directive without
additional flags.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45184
llvm-svn: 329089
Summary:
Folding patterns like:
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%ext = extractelement <4 x i8> %insvec, i32 0
%cond = icmp eq i32 %ext, 0
Combined with existing rules, this allows us to fold patterns like:
%insvec = insertelement <4 x i8> undef, i8 %val, i32 0
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%cond = icmp eq i8 %val, 0
When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.
Reviewers: spatel, anna, mkazantsev
Reviewed By: spatel
Subscribers: efriedma, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D44997
llvm-svn: 329087
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 329085
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.
I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
errs() rather than dbgs().
llvm-svn: 329082
This should fix the problem reported by the lld buildbots:
- Builder lld-x86_64-darwin13, Build #19782
- Builder lld-perf-testsuite, Build #1419
llvm-svn: 329068
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost. By default, register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
The default assembly handling mode may introduce false positives in the
cases when MSan doesn't understand that the assembly call initializes
the memory pointed to by one of its arguments.
We introduce the conservative mode, which initializes the first
|sizeof(type)| bytes for every |type*| pointer passed into the
assembly statement.
llvm-svn: 329054
The patch changes the usage of dominate to properlyDominate
to satisfy the condition !(a < a) while using std::max.
It is actually NFC due to set data structure is used to keep
the Loops and no two identical loops can be in collection.
So in reality there is no difference between usage of
dominate and properlyDominate in this particular case.
However it might be changed so it is better to fix it.
llvm-svn: 329051
Current implementation of `computeExitLimit` has a big piece of code
the only purpose of which is to prove that after the execution of this
block the latch will be executed. What it currently checks is actually a
subset of situations where the exiting block dominates latch.
This patch replaces all these checks for simple particular cases with
domination check over loop's latch which is the only necessary condition
of taking the exiting block into consideration. This change allows to
calculate exact loop taken count for simple loops like
for (int i = 0; i < 100; i++) {
if (cond) {...} else {...}
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44677
Reviewed By: efriedma
llvm-svn: 329047
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.
I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
`errs()` rather than `dbgs()`.
llvm-svn: 329046
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
(1). In code, just swap is not enough, ConditionalCode CC
should also be swapped, otherwise incorrect code will
be generated.
(2). The ConditionalCode swap should be subject to
getHasJmpExt(). If getHasJmpExt() is False, certain
conditional codes will not be supported and swap
may generate incorrect code.
The original goal for this patch is to optimize jmp operations
which does not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329042
We use two approaches for determining the minimum bitwidth.
* Demanded bits
* Value tracking
If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.
But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in
%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i
are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in
%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0
it doesn't make sense to shrink %tmp15 and we can skip the value tracking.
Ideas are from Matthew Simpson!
Differential Revision: https://reviews.llvm.org/D44868
llvm-svn: 329035
Summary:
When attempting to split a coroutine with 'hidden' visibility (for
example, a C++ coroutine that is inlined when compiled with the option
'-fvisibility-inlines-hidden'), LLVM would hit an assertion in
include/llvm/IR/GlobalValue.h:240: "local linkage requires default
visibility". The issue is that the visibility is copied from the source
of the function split in the `CloneFunctionInto` function, but the linkage
is not. To fix, create the new function first with external linkage,
then copy the linkage from the original function *after* `CloneFunctionInto`
is called.
Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the
explicit call to `setDSOLocal` can be removed in CoroSplit.cpp.
Test Plan: check-llvm
Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk
Reviewed By: rnk
Subscribers: llvm-commits, eric_niebler
Differential Revision: https://reviews.llvm.org/D44185
llvm-svn: 329033
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.
Reviewers: efriedma, majnemer, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45186
llvm-svn: 329027
This command can dump the binary contents of a stream to a file.
This is useful when you want to do side-by-side comparisons of
a specific stream from two PDBs to examine the differences between
them. You can export both of them to a file, then open them up
side by side in a hex editor (for example), so as to eliminate any
differences that might arise from the contents being on different
blocks in the PDB.
In subsequent patches I plan to improve the "explain" subcommand
so that you can explain the contents of a binary file that isn't
necessarily a full PDB, but one of these dumped streams, by telling
the subcommand how to interpret the contents.
llvm-svn: 329002
These used to be set in the old autoconf build, but the cmake build has had a
"TODO: actually check for these" comment since it was checked in, and they
were set to 1 on mingw unconditionally. It seems safe to say that they always
exist under mingw, so just remove them and assume they're set exactly when on
mingw (with msvc, we use `pragma comment` instead of linking these via flags).
llvm-svn: 328992
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.
This patch clones the metadatas in the original function to the new function.
Differential Revision: https://reviews.llvm.org/D44127
llvm-svn: 328991
The autoconf manual: "This macro is obsolescent, as all current systems with
directory libraries have <dirent.h>. New programs need not use this macro."
llvm-svn: 328989
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.
Reviewers: EricWF, modocache, rnk, lewissbaker
Reviewed By: modocache
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45114
llvm-svn: 328986
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 328980
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Differential revision: https://reviews.llvm.org/D41330
Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
When running dsymutil as part of your build system, it can be desirable
for warnings to be part of the end product, rather than just being
emitted to the output stream. This patch upstreams that functionality.
Differential revision: https://reviews.llvm.org/D44639
llvm-svn: 328965
This patch adds a set of unstable C API bindings to the DIBuilder interface for
creating structure, function, and aggregate types.
This patch also removes the existing implementations of these functions from
the Go bindings and updates the Go API to fit the new C APIs.
llvm-svn: 328953
Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.
llvm-svn: 328952
This also moves to define it in the same way as ADCX which seems to use
constraints a bit better.
This is pulled out of the review for reducing the use of popf for
restoring EFLAGS, but is independent. There are still more problems with
our definitions for these instructions that Craig is going to look at
but this is at least less broken and he can start from this to improve
them more fully.
Thanks to Craig for the review here.
llvm-svn: 328945
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
llvm-svn: 328944
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44938
llvm-svn: 328939
Summary:
Avoids having to list all intrinsics manually.
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44937
llvm-svn: 328938
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: echristo, zturner, samsonov
Reviewed By: echristo
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45134
llvm-svn: 328935
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no limit).
Reviewers: wmi
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45127
llvm-svn: 328934
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didnt simplify the loop to get there.
Differential Revision: https://reviews.llvm.org/D44199
llvm-svn: 328933
This Promote flag was alwasys set to true except in the default case. But in the default case we don't need to set PVT and can just return false.
llvm-svn: 328926
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer D44363 for a list of all the required patches.
Reviewers: sanjoy, dexonsmith, hfinkel, RKSimon
Reviewed By: dexonsmith
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D44944
llvm-svn: 328925
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.
Differential Revision: https://reviews.llvm.org/D44909
llvm-svn: 328921
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.
Reviewers: eraman
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45062
llvm-svn: 328909
Make sure ThinLTO with caching doesn't use non-atomic writes to the cache file (to prevent data races and cache files corruption).
1. Place temp file to the same place where the caching directory is (instead of creating it the directory pointed to by TMP/TEMP variable). This will help to prevent using non-atomic rename and falling back to non-atomic "direct" write to the cache file.
2. if rename failed do not write to the cache file directly (direct write to the file is non-atomic and could cause data race conditions).
3. if cache file doesn't exist (e.g., because 'rename' failed or because some other reasons), bypass using the cache altogether.
Differential Revision: https://reviews.llvm.org/D45076
llvm-svn: 328904
Summary:
This exposes WebAssembly passes for use on the command line (as
arguments to -print-before and the like).
Reviewers: dschuff, sunfish
Subscribers: MatzeB, jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D45103
llvm-svn: 328901
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.
llvm-svn: 328897
This patch resolves link errors when the address of a static function is taken, and that function is uninstrumented by DFSan.
This change resolves bug 36314.
Patch by Sam Kerner!
Differential Revision: https://reviews.llvm.org/D44784
llvm-svn: 328890
Summary:
Make NVPTX require structured CFG. Added a temporary flag to
"roll back" the behavior for easy deployment.
Combined with D45008, this fixes several internal Nvidia GPU test
failures that we suspect to be ptxas miscompiles (PR27738).
Reviewers: jlebar
Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D45070
llvm-svn: 328885
Summary:
Tail duplication easily breaks the structure of CFG, e.g. duplicating on
a region entry. If the structure is intended to be preserved, then we
may want to configure tail duplication, or disable it for structured
CFG. From our benchmark results disabling it doesn't cause performance
regression.
Notice that this currently affects AMDGPU backend. In the next patch, I
also plan to turn on requiresStructuredCFG for NVPTX.
All unit tests still pass.
Reviewers: jlebar, arsenm
Subscribers: jholewinski, sanjoy, wdng, tpr, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45008
llvm-svn: 328884
Summary:
Previous revision caused a leak in the echo test that got caught by the ASAN bots because of missing free of the handlers array and was reverted in r328759. Resubmitting the patch with that correction.
Add support for cleanupret, catchret, catchpad, cleanuppad and catchswitch and their associated accessors.
Test is modified from SimplifyCFG because it contains many diverse usages of these instructions.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits, vlad.tsyrklevich
Differential Revision: https://reviews.llvm.org/D45100
llvm-svn: 328883
This will show more detail when using `llvm-pdbutil explain` on an
offset in the DBI or PDB streams. Specifically, it will dig into
individual header fields and substreams to give a more precise
description of what the byte represents.
llvm-svn: 328878
The code has bugs dealing with -0.0.
Since D44550 introduced FABS pattern folding in InstCombine,
this patch removes the now-redundant code that causes
https://bugs.llvm.org/show_bug.cgi?id=36600.
Patch by Mikhail Dvoretckii!
Differential Revision: https://reviews.llvm.org/D44683
llvm-svn: 328872