This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D44785
llvm-svn: 330322
Summary:
- Renamed tryParseRegister to tryParseScalarRegister, which
now returns an OperandMatchResultTy.
- Moved matching of certain aliases into matchRegisterNameAlias.
- Changed type of most 'Reg' variables to 'unsigned'.
This is patch [1/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro, samparker
Reviewed By: samparker
Subscribers: samparker, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45687
llvm-svn: 330311
This removes a bunch of unnecessary InstRW overrides. It also cleans up the missing information from the Sandy Bridge model. Other fixes to other models.
llvm-svn: 330308
Summary:
ASSERT_SORTED checks if a table is sorted, and uses a boolean to
prevent the check from being run again if it was earlier determined
that the table is in fact sorted. Unsynchronized reads and writes of
that boolean triggered ThreadSanitizer's data race detection. This
change rewrites the code to use std::atomic<bool> instead.
Fixes PR36922.
Reviewers: rnk
Reviewed By: rnk
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D45742
llvm-svn: 330301
The compiler only emits the locked version of these which use different instruction definitions. The versions fixed here are only used by the assembler/disassembler.
llvm-svn: 330287
Reverts rL330224, while issues with the C extension and missed common
subexpression elimination opportunities are addressed. Neither of these issues
are visible in current RISC-V backend unit tests, which clearly need
expanding.
llvm-svn: 330281
Legalize and emit code for converting unsigned HWord/Char to QP:
xscvsdqp
xscvudqp
Only covering patterns for unsigned forms cause we don't have part-word
sign-extending integer loads into VSX registers.
Differential Revision: https://reviews.llvm.org/D45494
llvm-svn: 330278
Legalize and emit code for converting (Un)Signed Word to quad-precision via:
xscvsdqp
xscvudqp
Differential Revision: https://reviews.llvm.org/D45389
llvm-svn: 330273
Summary:
Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.
Reviewers: echristo, jlebar, tra, jholewinski
Subscribers: mgorny, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41827
llvm-svn: 330271
a zero register.
Previously I tried this and saw LLVM unable to transform this to fold
with memory operands such as spill slot rematerialization. However, it
clearly works as shown in this patch. We turn these into `cmpb $0,
<mem>` when useful for folding a memory operand without issue. This form
has no disadvantage compared to `testb $-1, <mem>`. So overall, this is
likely no worse and may be slightly smaller in some cases due to the
`testb %reg, %reg` form.
Differential Revision: https://reviews.llvm.org/D45475
llvm-svn: 330269
Path.inc/widenPath tries to decode the path using both UTF-8 and the default Windows code page.
This is no longer necessary with the new InitLLVM method which ensures that the command line
arguemnts are already UTF-8 on Windows.
llvm-svn: 330266
across basic blocks in the limited cases where it is very straight
forward to do so.
This will also be useful for other places where we do some limited
EFLAGS propagation across CFG edges and need to handle copy rewrites
afterward. I think this is rapidly approaching the maximum we can and
should be doing here. Everything else begins to require either heroic
analysis to prove how to do PHI insertion manually, or somehow managing
arbitrary PHI-ing of EFLAGS with general PHI insertion. Neither of these
seem at all promising so if those cases come up, we'll almost certainly
need to rewrite the parts of LLVM that produce those patterns.
We do now require dominator trees in order to reliably diagnose patterns
that would require PHI nodes. This is a bit unfortunate but it seems
better than the completely mysterious crash we would get otherwise.
Differential Revision: https://reviews.llvm.org/D45673
llvm-svn: 330264
Summary:
A change to use divergence analysis in the AMDGPU backend was getting formal
arguments incorrect (not tagged as divergent) unless they were VGPR0, VGPR1 or
VGPR2
For graphics shaders it is possible to have more than these passed in as VGPR
Modified the checking code to check for any VGPR registers passed in as formal
arguments.
Also, some intrinsics that are sources of divergence may have been lowered
during instruction selection and are missed on subsequent calls to
isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well.
Finally, the FunctionLoweringInfo tracks virtual registers that are live across
basic block boundaries. This is used to check for divergence of CopyFromRegister
registers using the DivergenceAnalysis analysis. For multiple blocks the lazily
evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45372
Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3
llvm-svn: 330257
After investigation discussed in D45439, it would seem that the nsw
flag restriction is unnecessary in most cases. So the IsInductionVar
lambda has been removed, the functionality extracted, and now only
require nsw when using eq/ne predicates.
Differential Revision: https://reviews.llvm.org/D45617
llvm-svn: 330256
Summary:
Due to some android peculiarities, in some build configurations
(statically linked executables targeting older releases) we could detect
the presence of these functions (because they are present in libc.a,
where check_library_exists searches), but then fail to build because the
headers did not include the definition.
This attempts to remedy that by upgrading the check_library_exists to
check_symbol_exists, which will check that the function is declared too.
I am hoping that a more thorough check will make the messy #ifdef we
have accumulated in the code obsolete, so I optimistically try to remove
them.
Reviewers: zturner, kparzysz, danalbert
Subscribers: srhines, mgorny, krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45359
llvm-svn: 330251
If a predicate does not become known after peeling, peeling is unlikely
to be beneficial.
Reviewers: mcrosier, efriedma, mkazantsev, junbuml
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D44983
llvm-svn: 330250
Summary:
Previously we crashed for the combination of the two features because we
tried to reference the dwo CU from the main object file. The fix
consists of two items:
- reference the skeleton CU from the name index (the consumer is
expected to use the skeleton CU to find the real data).
- use the main object file string pool for the strings in the index
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45566
llvm-svn: 330249
Summary:
When sinking an instruction in InstCombine we now also sink
the DbgInfoIntrinsics that are using the sunken value.
Example)
When sinking the load in this input
bb.X:
%0 = load i64, i64* %start, align 4, !dbg !31
tail call void @llvm.dbg.value(metadata i64 %0, ...)
br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
br label %for.body
we now also move the dbg.value, like this
bb.X:
br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
%0 = load i64, i64* %start, align 4, !dbg !31
tail call void @llvm.dbg.value(metadata i64 %0, ...)
br label %for.body
In the past we haven't moved the dbg.value so we got
bb.X:
tail call void @llvm.dbg.value(metadata i64 %0, ...)
br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
%0 = load i64, i64* %start, align 4, !dbg !31
br label %for.body
So in the past we got a debug-use before the def of %0.
And that dbg.value was also on the path jumping to %for.end, for
which %0 never was defined.
CodeGenPrepare normally comes to rescue later (when not moving
the dbg.value), since it moves dbg.value instrinsics quite
brutally, without really analysing if it is correct to move
the intrinsic (see PR31878).
So at the moment this patch isn't expected to have much impact,
besides that it is moving the dbg.value already in opt, making
the IR look more sane directly.
This can be seen as a preparation to (hopefully) make it possible
to turn off CodeGenPrepare::placeDbgValues later as a solution
to PR31878.
I also adjusted test/DebugInfo/X86/sdagsplit-1.ll to make the
IR in the test case up-to-date with this behavior in InstCombine.
Reviewers: rnk, vsk, aprantl
Reviewed By: vsk, aprantl
Subscribers: mattd, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D45425
llvm-svn: 330243
Summary: Previously if a modifer was placed on a non-GPR register class we would hit an assert or crash.
Reviewers: echristo
Reviewed By: echristo
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D45751
llvm-svn: 330238
The bitcast may be interfering with other combines or vectorization
as shown in PR16739:
https://bugs.llvm.org/show_bug.cgi?id=16739
Most pointer-related optimizations are probably able to look through
this bitcast, but removing the bitcast shrinks the IR, so it's at
least a size savings.
Differential Revision: https://reviews.llvm.org/D44833
llvm-svn: 330237
Summary:
Statistic and ManagedStatic both use mutexes. There was a lock order
inversion where, during initialization, Statistic's mutex would be
held while taking ManagedStatic's, and in llvm_shutdown,
ManagedStatic's mutex would be held while taking Statistic's
mutex. This change causes Statistic's initialization code to avoid
holding its mutex while calling ManagedStatic's methods, avoiding the
inversion.
Reviewers: dsanders, rtereshin
Reviewed By: dsanders
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45398
llvm-svn: 330236
Track the debug locations of the incoming values to newly-created phis,
and apply merged debug locations to the phis.
A merged location will be on line 0, but will have the correct scope
set. This improves crash reporting when an inlined instruction with a
merged location triggers a machine exception. A debugger will be able to
narrow down the crash to the correct inlined scope, instead of simply
pointing to the outer scope of the caller.
Taken together with a change allows generating merged line-0 locations
for instructions which aren't calls, this results in a 0.5% increase in
the uncompressed size of the .debug_line section of a stage2+Release
build of clang (-O3 -g).
rdar://33858697
Differential Revision: https://reviews.llvm.org/D45397
llvm-svn: 330227
The implementation follows the MIPS backend and expands the
pseudo instruction directly during asm parsing. As the result, only
real MC instructions are emitted to the MCStreamer. Additionally,
PseudoLI instructions are emitted during codegen. The actual
expansion to real instructions is performed during MI to MC lowering
and is similar to the expansion performed by the GNU Assembler.
Differential Revision: https://reviews.llvm.org/D41949
Patch by Mario Werner.
llvm-svn: 330224
When we skip bitcasts while looking for GEP in LoadSoreVectorizer
we should also verify that the type is sized otherwise we assert
Differential Revision: https://reviews.llvm.org/D45709
llvm-svn: 330221
Summary:
Add an LLVM intrinsic for type discriminated event logging with XRay.
Similar to the existing intrinsic for custom events, but also accepts
a type tag argument to allow plugins to be aware of different types
and semantically interpret logged events they know about without
choking on those they don't.
Relies on a symbol defined in compiler-rt patch D43668. I may wait
to submit before I can see demo everything working together including
a still to come clang patch.
Reviewers: dberris, pelikan, eizan, rSerge, timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45633
llvm-svn: 330219
Summary:
It was not easy to provide a test case for D45648 (rL330079) because the bug
didn't manifest itself in the set of currently valid IRs. Added an assertion to
check this faster, thanks to @dblaikie's suggestion.
Reviewers: dblaikie
Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits, dblaikie
Differential Revision: https://reviews.llvm.org/D45711
llvm-svn: 330217
GetArgumentVector (or GetCommandLineArguments) is very Windows-specific.
I think it doesn't make much sense to provide that function from sys::Process.
I also made a change so that the function takes a BumpPtrAllocator
instead of a SpecificBumpPtrAllocator. The latter is the class to call
dtors, but since char * is trivially destructible, we should use the
former class.
Differential Revision: https://reviews.llvm.org/D45641
llvm-svn: 330216
The DBI stream contains a list of module descriptors. At the
beginning of each descriptor is a structure representing the first
section contribution in the output file for that module. LLD
currently doesn't fill out this structure at all, but link.exe
does. So as a precursor to emitting this data in LLD, we first
need a way to dump it so that it can be checked.
This patch adds support for the dumping, and verifies via a test
that LLD emits bogus information.
llvm-svn: 330208
Stack addressing needs addressing modes that provide an offset field
immediately following the frame index. An initializer from a non-stack
addressing could force the stack address to use a form that does not
provide an offset field.
llvm-svn: 330191
LLVMDump* functions are available in Release builds too.
Patch by Brenton Bostick.
Differential Revision: https://reviews.llvm.org/D44600
llvm-svn: 330189
Split VCMP/VMAX/VMIN instructions off to WriteFCmp and VCOMIS instructions off to WriteFCom instead of assuming they match WriteFAdd
Differential Revision: https://reviews.llvm.org/D45656
llvm-svn: 330179
One more, hopefully the last, bug is fixed: when forming UsesToRewrite
we should ignore phi operands coming from edges that we want to delete.
This reverts r329910.
llvm-svn: 330175
Apparently, DebugInfoFinder::processCompileUnit doesn't process all
of the possible kinds of DIImportedEntit'ies, e.g. DIGlobalVariable's.
Previously introduced `llvm_unreachable` is therefore incorrect.
Removing it here.
llvm-svn: 330167
Using Config->is64() will treat ARM64 as Amd64, which is incorrect.
Furthermore, there are more esoteric architectures that could
theoretically be encountered. Just set it directly to the machine
type, which we already know anyway.
llvm-svn: 330157
Summary:
Specifying assert message with an || operator makes the compiler interpret it
as a bool. Changed it to &&.
Reviewers: asb, apazos
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45660
llvm-svn: 330148
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost which is larger than it should be.
Differential Revision: https://reviews.llvm.org/D45469
llvm-svn: 330143
materializing function definitions.
MaterializationUnit instances are responsible for resolving and finalizing
symbol definitions when their materialize method is called. By contract, the
MaterializationUnit must materialize all definitions it is responsible for and
no others. If it can not materialize all definitions (because of some error)
then it must notify the associated VSO about each definition that could not be
materialized. The MaterializationResponsibility class tracks this
responsibility, asserting that all required symbols are resolved and finalized,
and that no extraneous symbols are resolved or finalized. In the event of an
error it provides a convenience method for notifying the VSO about each
definition that could not be materialized.
llvm-svn: 330142
notifyMaterializationFailed.
The notifyMaterializationFailed method can determine which error to raise by
looking at which queue the pending queries are in (resolution or finalization).
llvm-svn: 330141
Move veriication check for legal conversions to f128 into LowerINT_TO_FP()
and fix some indentations to match other sections of the code for readability.
llvm-svn: 330138
The code uses the index of the last element in the sorted array to determine the maximum size needed for the vector. But if the last index is a FunctionIndex(~0), attrIdxToArrayIdx will return 0 and the vector will have size 1. If there are any indices before FunctionIndex, those values would return a value larger than 0 from attrIdxToArrayIdx. So in this case we need to look in front of the FunctionIndex to get the true size needed.
Differential Revision: https://reviews.llvm.org/D45632
llvm-svn: 330136
When emitting CodeView debug information, compiler-generated thunk routines
should be emitted using S_THUNK32 symbols instead of S_GPROC32_ID symbols so
Visual Studio can properly step into the user code. This initial support only
handles standard thunk ordinals.
Differential Revision: https://reviews.llvm.org/D43838
llvm-svn: 330132
Most of these are pretty trivial and obvious. Setting the toolchain
version to 14.11 is perhaps a little questionable, but we've been bitten
in the past where one of our version fields sidn't match MSVC's, and I
definitely don't want to go through that diagnosis again as it was
pretty time consuming and hard to track down.
I found all of these by using llvm-pdbutil export to dump the dbi and
pdb streams to a file, then using fc followed by llvm-pdbutil explain to
explain the mismatched bytes.
There are still some more, these are just the low hanging fruit.
Differential Revision: https://reviews.llvm.org/D45276
llvm-svn: 330130
Two cleanups:
1. As noted in D45453, we had tests that don't need FMF that were misplaced in the 'fast-math.ll' test file.
2. This removes the final uses of dyn_castFNegVal, so that can be deleted. We use 'match' now.
llvm-svn: 330126
Using Goldmont's cost tables for these two upcoming
atom archs.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45612
llvm-svn: 330109
Summary: As discussed in https://reviews.llvm.org/D45606, it makes more sense to name the class as SmallVectorMemoryBuffer
Reviewers: bkramer, dblaikie
Reviewed By: dblaikie
Subscribers: mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D45661
llvm-svn: 330107
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.
The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.
https://rise4fun.com/Alive/Pkmg
Reviewers: spatel, craig.topper, eli.friedman, jingyue
Reviewed By: spatel
Subscribers: llvm-commits
Was reviewed as part of https://reviews.llvm.org/D45631
llvm-svn: 330103
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.
The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.
https://rise4fun.com/Alive/Pkmg
Reviewers: spatel, craig.topper, eli.friedman, jingyue
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45631
llvm-svn: 330101
As suggested in https://reviews.llvm.org/D45631#1068338,
looking at haveNoCommonBitsSet() users, and *trying* to
show the change effect elsewhere.
llvm-svn: 330100
TargetSchedModel now always delegates to MCSchedModel the computation of
instruction latency and reciprocal throughput.
No functional change intended.
llvm-svn: 330099
This is a transform that I limited in instcombine in rL329821 because it was
creating more instructions in IR when the cast has multiple uses.
But if the cast is free, then we can do the transform regardless of other
uses because it improves the potential throughput of the calculation by
removing a dependency on the fneg.
Differential Revision: https://reviews.llvm.org/D45598
llvm-svn: 330098
Summary:
Since the class is used by both MCJIT and LTO, it makes more sense to move it to Support lib.
This is a follow up patch to r329929 and https://reviews.llvm.org/D45244
Reviewers: bkramer, dblaikie
Reviewed By: bkramer
Subscribers: mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D45606
llvm-svn: 330093
Create convenience functions for printing error, warning and note to
stdout. Previously we had similar functions being used in dsymutil, but
given that this pattern is so common it makes sense to make it available
globally.
llvm-svn: 330091
These simplifications were previously enabled only with isFast(), but that
is more restrictive than required. Since r317488, FMF has 'reassoc' to
control these cases at a finer level.
llvm-svn: 330089
Summary:
InsertPos is within the bacic block `Header`, so `findDebugLoc()` should
be called on not `MBB` but `Header` instead.
Reviewers: yurydelendik
Subscribers: jfb, dschuff, aprantl, sbc100, jgravelle-google, sunfish, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45648
llvm-svn: 330079
The destination size of the movzx/movsx instruction is controlled by the normal operand size mechanisms. Only the input type is fixed.
This means that a 0x66 prefix on the encoding for zext/sext 16->32 should really produce a 16->16 instruction. Functionally this is equivalent to a GR16->GR16 move since bits 16 and above will be preserved. So nothing is actually extended.
llvm-svn: 330078
As demonstrated by the regression tests added in this patch, the
following cases are valid cases:
1. A Function with no DISubprogram attached, but various debug info
related to its instructions, coming, for instance, from an inlined
function, also defined somewhere else in the same module;
2. ... or coming exclusively from the functions inlined and eliminated
from the module entirely.
The ValueMap shared between CloneFunctionInto calls within CloneModule
needs to contain identity mappings for all of the DISubprogram's to
prevent them from being duplicated by MapMetadata / RemapInstruction
calls, this is achieved via DebugInfoFinder collecting all the
DISubprogram's. However, CloneFunctionInto was missing calls into
DebugInfoFinder for functions w/o DISubprogram's attached, but still
referring DISubprogram's from within (case 1). This patch fixes that.
The fix above, however, exposes another issue: if a module contains a
DISubprogram referenced only indirectly from other debug info
metadata, but not attached to any Function defined within the module
(case 2), cloning such a module causes a DICompileUnit duplication: it
will be moved in indirecty via a DISubprogram by DebugInfoFinder first
(because of the first bug fix described above), without being
self-mapped within the shared ValueMap, and then will be copied during
named metadata cloning. So this patch makes sure DebugInfoFinder
visits DICompileUnit's referenced from DISubprogram's as it goes w/o
re-processing llvm.dbg.cu list over and over again for every function
cloned, and makes sure that CloneFunctionInto self-maps
DICompileUnit's referenced from the entire function, not just its own
DISubprogram attached that may also be missing.
The most convenient way of tesing CloneModule I found is to rely on
CloneModule call from `opt -run-twice`, instead of writing tedious
unit tests. That feature has a couple of properties that makes it hard
to use for this purpose though:
1. CloneModule doesn't copy source filename, making `opt -run-twice`
report it as a difference.
2. `opt -run-twice` does the second run on the original module, not
its clone, making the result of cloning completely invisible in opt's
actual output with and without `-run-twice` both, which directly
contradicts `opt -run-twice`s own error message.
This patch fixes this as well.
Reviewed By: aprantl
Reviewers: loladiro, GorNishanov, espindola, echristo, dexonsmith
Subscribers: vsk, debug-info, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45593
llvm-svn: 330069
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.
Differential Revision: https://reviews.llvm.org/D45271
llvm-svn: 330062
r327219 added wrappers to std::sort which randomly shuffle the container before
sorting. This will help in uncovering non-determinism caused due to undefined
sorting order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of
std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to
llvm::sort. Refer the comments section in D44363 for a list of all the
required patches.
llvm-svn: 330061
The Power 9 scheduler model should now include the TLS instructions.
We can now, once again, mark the model as complete.
From now on, if instructions are added to Power 9 but are not
added to the model the build should produce an error. Hopefully
that will alert the developer who is adding new instructions
that they should also be added to the scheulder model.
llvm-svn: 330060
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: kcc, pcc, danielcdh, jmolloy, sanjoy, dberlin, ruiu
Reviewed By: ruiu
Subscribers: ruiu, llvm-commits
Differential Revision: https://reviews.llvm.org/D45142
llvm-svn: 330059
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: grosbach, void, ruiu
Reviewed By: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45138
llvm-svn: 330058
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: bogner, vsk, eraman, ruiu
Reviewed By: ruiu
Subscribers: ruiu, llvm-commits
Differential Revision: https://reviews.llvm.org/D45139
llvm-svn: 330057
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer D44363 for a list of all the required patches.
Reviewers: pcc, mehdi_amini, ruiu
Reviewed By: ruiu
Subscribers: ruiu, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D45137
llvm-svn: 330053
We have a few functions that virtually all command wants to run on
process startup/shutdown. This patch adds InitLLVM class to do that
all at once, so that we don't need to copy-n-paste boilerplate code
to each llvm command's main() function.
Differential Revision: https://reviews.llvm.org/D45602
llvm-svn: 330046
Previously, the MIPS backend would alwyas break down constant multiplications
into a series of shifts, adds, and subs. This patch changes that so the cost of
doing so is estimated.
The cost is estimated against worst case constant materialization and retrieving
the results from the HI/LO registers.
For cases where the value type of the multiplication is not legal, the cost of
legalization is estimated and is accounted for before performing the
optimization of breaking down the constant
This resolves PR36884.
Thanks to npl for reporting the issue!
Reviewers: abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D45316
llvm-svn: 330037
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.
Differential Revision: https://reviews.llvm.org/D44675
llvm-svn: 330034
Similar to rL329834, don't rely on itinerary scheduler model to determine latencies for LEA thresholds, use the generic TargetSchedModel::computeInstrLatency call.
llvm-svn: 330030
Summary:
This was originally part of rL328132, and led to the discovery
of the issues addressed in rL328987. Re-landing.
Reviewers: xur, davidxl, bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45545
llvm-svn: 330029
Summary:
Added instructions for contiguous stores, ST1, with scalar+imm addressing
modes and corresponding tests. The patch also adds parsing of
'mul vl' as needed for the VL-scaled immediate.
This is patch [6/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45432
llvm-svn: 330014
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.
I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]
This extends/generalizes that fold, so it is handled too.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45539
llvm-svn: 330001
Summary:
Added Z_(b|h|s|d) vector list RegisterOperands along with support to
add/print the vector lists.
This is patch [5/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: fhahn
Subscribers: tschuett, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45431
llvm-svn: 330000
Hint to hardware to move the cache line containing the
address to a more distant level of the cache without
writing back to memory.
Reviewers: craig.topper, zvi
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D45256
llvm-svn: 329992
The commit in SVN r310001 that added support for this actually didn't
use the right struct field for the frame pointer - for ARM, there is
no register named Fp in the CONTEXT struct. On Windows, the R11
register is used as frame pointer.
Differential Revision: https://reviews.llvm.org/D45590
llvm-svn: 329991
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.
We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.
llvm-svn: 329990
Summary: This enables debug fission on implicit ThinLTO when linked with gold. It will put the .dwo files in a directory specified by user.
Reviewers: tejohnson, pcc, dblaikie
Reviewed By: pcc
Subscribers: JDevlieghere, mehdi_amini, inglorion
Differential Revision: https://reviews.llvm.org/D44792
llvm-svn: 329988
This removes the last of the x86 schedule itineraries, I'm intending to cleanup the remaining uses of NoItinerary/OpndItins/etc. before resolving PR37093.
llvm-svn: 329967
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. This reduces the size of Chromium
for Android's .text section by 108KB.
Differential Revision: https://reviews.llvm.org/D45199
llvm-svn: 329956
This lifts a restriction on DILocation::getMergedLocation(), allowing it
to create merged locations for instructions other than calls.
Instruction::applyMergedLocation() now defaults to creating merged
locations for all instructions.
The default behavior of getMergedLocation() is unchanged: callers which
invoke it directly are unaffected.
This change will enable a follow-up Mem2Reg fix which improves crash
reporting.
Differential Revision: https://reviews.llvm.org/D45396
llvm-svn: 329955
This parses a mangled name into an AST (typically an intermediate stage in
itaniumDemangle) and provides some functions to query certain properties or
print certain parts of the demangled name.
Differential revision: https://reviews.llvm.org/D44668
llvm-svn: 329951
Summary:
GCC compresses the pseudo instruction "mv rd, rs", which is an alias of
"addi rd, rs, 0", to "c.mv rd, rs".
In LLVM we rely on the canonical MC instruction (MCInst) to do our compression
checks and since there is no rule to compress "addi rd, rs, 0" --> "c.mv
rd, rs" we lose this compression opportunity to gcc.
In this patch we fix that by adding an addi to c.mv compression pattern, the
instruction "mv rd, rs" will be compressed to "c.mv rd, rs" just like
gcc does.
Patch by Zhaoshi Zheng (zzheng) and Sameer (sabuasal).
Reviewers: asb, apazos, zzheng, mgrang, shiva0217
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, niosHD, kito-cheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45583
llvm-svn: 329939
A previously missing intrinsic for an old instruction.
Reviewers: craig.topper, echristo
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D45312
llvm-svn: 329936
AFI->setRedZone(false) was put in the wrong place before, and so it only fired
on functions that didn't have stack frames. This moves that to the top of
emitPrologue to make sure that every function without a redzone has it set
correctly.
This also adds a function representing one of the early exit cases (GHC calling
convention) to the MachineOutliner noredzone test to ensure that we can outline
from functions like these, where we never use a redzone.
llvm-svn: 329922
This change is exposing UB in source code - as was warned/predicted. :)
See D44909 for discussion. Reverting while we figure out how to fix things.
llvm-svn: 329920
There are cases when individual NodeSets can be equal with respect to
the ordering criteria. Since they are stored in an ordered container,
use stable_sort to preserve the relative order of equal NodeSets.
This should remove non-determinism discovered by shuffling done in
llvm::sort with expensive checks enabled.
llvm-svn: 329915
Summary:
Merged 'addVectorList64Operands' and 'addVectorList128Operands' into a
generic 'addVectorListOperands', which can be easily extended to work
for SVE vectors.
This is patch [4/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45430
llvm-svn: 329909
Don't assume SelectionDAG is non-null as the targets can use it with a
null pointer.
Differential Revision: https://reviews.llvm.org/D44611
llvm-svn: 329908
Created a helper function to query for non negative SCEVs. Uses the
SGE predicate to catch constants that could be interpreted as
negative.
Differential Revision: https://reviews.llvm.org/D45481
llvm-svn: 329907
Summary:
Added 'RegisterKind' to the VectorListOp structure, so that this operand
type can be reused for SVE vector lists in a later patch. It also
refactors the 'tryParseVectorList' function so it can be used directly
in the ParserMethod of an operand. The parsing can now parse multiple
kinds of vectors and recover if there is no match.
This is patch [3/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45429
llvm-svn: 329900
Summary:
According RISC-V ELF psABI specification, base RV32 and RV64 ISAs only
allow 32-bit instruction alignment, but instruction allow to be aligned
to 16-bit boundaries for C-extension.
So we just align to 4 bytes and 2 bytes for C-extension is enough.
Reviewers: asb, apazos
Differential Revision: https://reviews.llvm.org/D45560
Patch by Kito Cheng.
llvm-svn: 329899
This patch makes tryCandidate() virtual and some utility functions like
tryLess(), tryGreater(), ... externally available (used to be static).
This makes it possible for a target to derive a new MachineSchedStrategy from
GenericScheduler and reuse most parts.
It was necessary to wrap functions with the same names in
AMDGPU/SIMachineScheduler in a local namespace.
Review: Andy Trick, Florian Hahn
https://reviews.llvm.org/D43329
llvm-svn: 329884
Operand 0 should have the same type of the result. So if the result type needs to be promoted, operand 0 needs to be promoted unconditionally.
llvm-svn: 329883
Also add double-prevoius-failure.ll which captures a test case that at one
point triggered a compiler crash, while developing calling convention support
for f64 on RV32D with soft-float ABI.
llvm-svn: 329877
fadd.d is required in order to force floating point registers to be used in
test code, as parameters are passed in integer registers in the soft float
ABI.
Much of this patch is concerned with support for passing f64 on RV32D with a
soft-float ABI. Similar to Mips, introduce pseudoinstructions to build an f64
out of a pair of i32 and to split an f64 to a pair of i32. BUILD_PAIR and
EXTRACT_ELEMENT can't be used, as a BITCAST to i64 would be necessary, but i64
is not a legal type.
llvm-svn: 329871
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.
Without this, I get verifier errors on `@Test2`.
It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.
llvm-svn: 329868
The standard says that the order of evaluation of an expression
s[x] = foo()
is unspecified. In our case, we first create an empty entry in the map,
then call foo(), then store its return value to the created entry. The
problem is that foo uses the map as a cache, so if it finds that there
is an entry in the map, it stops computation. This change explicitly
sets the order, thus fixing this heisenbug.
llvm-svn: 329864
Without these functions it's hard to create a TargetMachine for
Orc JIT that creates efficient native code.
It's not sufficient to just expose LLVMGetHostCPUName(), because
for some CPUs there's fewer features actually available than
the CPU name indicates (e.g. AVX might be missing on some CPUs
identified as Skylake).
Differential Revision: https://reviews.llvm.org/D44861
llvm-svn: 329856
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).
Patch by Josh Stone.
llvm-svn: 329852
Similar to the wbinvd instruction, except this
one does not invalidate caches. Ring 0 only.
The encoding matches a wbinvd instruction with
an F3 prefix.
Reviewers: craig.topper, zvi, ashlykov
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D43816
llvm-svn: 329847
Most importantly, we should not replace slashes with backslashes
because that would invalidate the path.
Differential Revision: https://reviews.llvm.org/D45473
llvm-svn: 329838
Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550.
I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here,
There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers.
There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical.
NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329837
Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call.
Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329834
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329829
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Re-landed after noticing that the buildbot failure from 329808 seemed to
be unrelated.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329826
This is causing compilation timeouts on code with long sequences of
local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out
that code coverage instrumentation is a great way to create sequences
like this, which how our users ran into the issue in practice.
Intel has a tool that detects these kinds of non-linear compile time
issues, and Andy Kaylor reported it as PR37010.
The current sinking code scans the whole basic block once per local
value sink, which happens before emitting each call. In theory, local
values should only be introduced to be used by instructions between the
current flush point and the last flush point, so we should only need to
scan those instructions.
llvm-svn: 329822
Previously the MD5 option of the .file directive provided the checksum
as a quoted hex string; now it's a normal hex number with 0x prefix,
same as the .octa directive accepts.
Differential Revision: https://reviews.llvm.org/D45459
llvm-svn: 329820
Add the minimal support necessary to lower a function that returns the
sum of two i32 values.
Support argument/return lowering of i32 values through registers only.
Add tablegen for regbankselect and instructionselect.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D44304
llvm-svn: 329819
Two issues were fixed:
runtime has difficulty to allocate memory for an external symbol of a
kernel and set the address of the external symbol, therefore make the runtime
handle of an enqueued kernel an ordinary global variable. Runtime only needs
to store the address of the loaded kernel to the handle and has verified
that this approach works.
handle the situation where __enqueue_kernel* gets inlined therefore
the enqueued kernel may be used through a constant expr instead
of an instruction.
Differential Revision: https://reviews.llvm.org/D45187
llvm-svn: 329815
This reverts 329808. That change caused a report of a failure in
test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect
it is an expensive-check-only error.
Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0
llvm-svn: 329811
Summary:
Place parsing of a vector index into a separate function to reduce
duplication, since the code is duplicated in both the parsing of a
Neon vector register operand and a Neon vector list.
This is patch [2/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45428
llvm-svn: 329809
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329808
In r329691, we would choose FP even if the offset wouldn't fit, just
because the offset is smaller than the one from BP. This made many
accesses through FP need to scavenge a register, which resulted in
slower and bigger code for no good reason.
This patch now always picks the offset that fits first, even if FP is
preferred.
llvm-svn: 329797
This is a follow up of rL327695 to instruction select more variants of VSELGT
and VSELGE, for which it is necessary to custom lower SELECT.
More work is required in this area, which will be addressed soon:
- more variants need to be regression tested, but this depends on the next point.
- first LowerConstantFP need to be adjusted for fp16 values.
Differential Revision: https://reviews.llvm.org/D45205
llvm-svn: 329788
Summary:
Merged 'tryMatchVectorRegister' (specific to Neon) and
'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and
created a generic 'parseVectorKind()' function that returns the #Elements
and ElementWidth of a vector suffix. This reduces the duplication of
this functionality between two the vector implementations.
This is patch [1/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: fhahn
Subscribers: tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45427
llvm-svn: 329782
The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency.
llvm-svn: 329774
With -fno-plt, for example, calls to printf when getting converted to puts
still use the PLT. This patch checks for the metadata "RtLibUseGOT" and
annotates the declaration with the right attributes.
Differential Revision: https://reviews.llvm.org/D45180
llvm-svn: 329768
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
v2: - fix regressions in merge-stores.ll and multiple_tails.ll
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.
Reviewers: mcrosier
Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45502
llvm-svn: 329761
This is based on an example that was recently posted on llvm-dev:
void *propagate_null(void* b, int* g) {
if (!b) {
return 0;
}
(*g)++;
return b;
}
https://godbolt.org/g/xYk3qG
The original code or constant propagation in other passes has obscured the fact
that the phi can be removed completely.
Differential Revision: https://reviews.llvm.org/D45448
llvm-svn: 329755
Summary:
The verification rules for the intrinsics for atomic memcpy, atomic memmove,
and atomic memset are basically code clones. This change merges their verification
rules into a single block to remove duplication.
llvm-svn: 329753
Summary:
Darwin dynamic linker can handle weak symbols in ConstDataSection.
ReadonReadOnlyWithRel symbols should be emitted in ConstDataSection
instead of normal DataSection.
rdar://problem/39298457
Reviewers: dexonsmith, kledzik
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45472
llvm-svn: 329752
This commit fixes the bot failures that were coming up before with r329716.
The fix was to move the check for "isInSection()" inside of the if condition
and emit the error there instead of waiting to get past the unreachable statement.
This should work in debug and release builds now.
llvm-svn: 329746
There was missing nullptr check before a call to getSection() in
recordRelocation. This would result in a segfault in code like the attached
test.
This adds the missing check and a test which makes sure we get the expected
error output.
llvm-svn: 329716
Summary:
We would like the UMR debugging tool[0] to be able to provide
disassembly for currently live waves based on plain memory
dumps, and we want to leverage the LLVM disassembler for this.
This mostly works, except that UMR clearly can't provide real
symbol info, so it wants to set DisInfo == nullptr.
[0] https://cgit.freedesktop.org/amd/umr/
Reviewers: arsenm, rampitec, artem.tamazov, dp
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45477
Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6
llvm-svn: 329715
Summary:
This type is created on-demand and used as the base type for array
ranges. Since it is "special", its construction did not go through the
createTypeDIE function and so it was never inserted into the accelerator
table, although it clearly belongs there.
I add an explicit addAccelType call to insert it into the table.
During review, we also decided to rename the type to something more
unique to avoid confusion in case the user has own "sizetype" type. The
new name for the type size __ARRAY_SIZE_TYPE__.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45445
llvm-svn: 329705
Improve the alias analysis to account for cases where we
know that src/dst pairs cannot alias due to things like
TBAA. As we know they are noalias, we know no dependency
can occur. Also fixes issues around the size parameter
to AA being incorrect.
Differential Revision: https://reviews.llvm.org/D42381
llvm-svn: 329692
In the presence of variable-sized stack objects, we always picked the
base pointer when resolving frame indices if it was available.
This makes us hit an assert where we can't reach the emergency spill
slot if it's too far away from the base pointer. Since on AArch64 we
decide to place the emergency spill slot at the top of the frame, it
makes more sense to use FP to access it.
The changes here don't affect only emergency spill slots but all the
frame indices. The goal here is to try to choose between FP, BP and SP
so that we minimize the offset and avoid scavenging, or worse, asserting
when trying to access a slot allocated by the scavenger.
Previously discussed here: https://reviews.llvm.org/D40876.
Differential Revision: https://reviews.llvm.org/D45358
llvm-svn: 329691
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).
This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.
V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading
scratch descriptor from GIT.
Reviewers: kzhuravl, nhaehnle, timcorringham
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D44468
Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 329690
Much like any written register in load/store instructions, the status register
is not allowed to overlap with any others. So diagnose it like we already do
with the other cases.
llvm-svn: 329687