The new test fails on the Hexagon bot. Reverting while I investigate.
This reverts https://reviews.llvm.org/rL322005
This reverts commit b7e0026b4385180c378edc658ec91a39566f2942.
llvm-svn: 322008
Adds option /guard:cf to clang-cl and -cfguard to cc1 to emit function IDs
of functions that have their address taken into a section named .gfids$y for
compatibility with Microsoft's Control Flow Guard feature.
Differential Revision: https://reviews.llvm.org/D40531
llvm-svn: 322005
Allow SimplifyDemandedBits to use TargetLoweringOpt::computeKnownBits to look through bitcasts. This can help simplify some cases where bitcasts of constants generated during or after legalization can't be folded away, and thus don't get picked up by SimplifyDemandedBits. This fixes PR34620, where a redundant pand created during legalization while lowering an lshr <16 x i8> wasn't being simplified due to the presence of a bitcasted build_vector as an operand.
Committed on the behalf of @sameconrad (Sam Conrad)
Differential Revision: https://reviews.llvm.org/D41643
llvm-svn: 321969
Summary:
There are a few oddities that occur because v1i1, v8i1, and v16i1 are legal without v2i1 and v4i1 also being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.
It would be better if we could just make the v2i1/v4i1 types legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.
This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, and then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then we extract that and don't sign extend it at all.
There's definitely room for improvement with some follow up patches.
Reviewers: RKSimon, zvi, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41560
llvm-svn: 321967
Summary:
I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.
There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.
With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: nemanjai, llvm-commits, kbarton
Differential Revision: https://reviews.llvm.org/D41654
llvm-svn: 321959
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
We're trading branches and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.
The 24-byte test shows an interesting construct: we load the trailing scalar
elements into vector registers and generate the same pcmpeq+movmsk code that
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
Differential Revision: https://reviews.llvm.org/D41714
llvm-svn: 321934
This had been reverted because the new test failed on non-X86 bots. I moved
the new test to the appropriate subdirectory to correct this.
Differential Revision: https://reviews.llvm.org/D41264
Original submission: r321122 (which was reverted by r321125)
This reverts commit 3c1639b5703c387a0d8cba2862803b4e68dff436.
llvm-svn: 321911
Summary:
This commit updates the BufferByteStreamer, used by DebugLocStream
to buffer bytes/comments to put in the debug_loc section, to
make sure that the Buffer and Comments vectors are synced.
Previously, when an SLEB128 or ULEB128 was emitted together with
a comment, the vectors could get out of sync if the LEB encoding
added several entries to the Buffer vector, while we only added
a single entry to the Comments vector.
The goal with this is to get the comments in the debug_loc
section in the .s file correctly aligned.
Example (using ARM as target):
Instead of
.byte 144 @ sub-register DW_OP_regx
.byte 128 @ 256
.byte 2 @ DW_OP_piece
.byte 147 @ 8
.byte 8 @ sub-register DW_OP_regx
.byte 144 @ 257
.byte 129 @ DW_OP_piece
.byte 2 @ 8
.byte 147 @
.byte 8 @
we now get
.byte 144 @ sub-register DW_OP_regx
.byte 128 @ 256
.byte 2 @
.byte 147 @ DW_OP_piece
.byte 8 @ 8
.byte 144 @ sub-register DW_OP_regx
.byte 129 @ 257
.byte 2 @
.byte 147 @ DW_OP_piece
.byte 8 @ 8
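For reference, a minimal sketch of the fix, assuming the upstream ByteStreamer interface (the real change may differ in detail): pad Comments with one empty entry per extra byte so the two vectors stay in lockstep no matter how many bytes the LEB encoding takes.
```
void BufferByteStreamer::EmitULEB128(uint64_t DWord, const Twine &Comment) {
  raw_svector_ostream OSE(Buffer);
  unsigned Length = encodeULEB128(DWord, OSE); // may append several bytes
  if (GenerateComments) {
    Comments.push_back(Comment.str());
    // One comment entry per buffered byte keeps Buffer and Comments synced.
    for (size_t I = 1; I < Length; ++I)
      Comments.push_back("");
  }
}
```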
Reviewers: JDevlieghere, rnk, aprantl
Reviewed By: aprantl
Subscribers: davide, Ka-Ka, uabelho, aemerson, javed.absar, kristof.beyls, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D41763
llvm-svn: 321907
This implements the DWARF 5 feature described at
http://www.dwarfstd.org/ShowIssue.php?issue=141215.1
This allows a consumer to understand whether a composite data type is
trivially copyable and thus should be passed by value instead of by
reference. The canonical example is being able to distinguish the
following two types:
// S is not trivially copyable because of the explicit destructor.
struct S {
  ~S() {}
};
// T is a POD type.
struct T {
  ~T() = default;
};
This patch adds two new (DI)flags to LLVM metadata: TypePassByValue
and TypePassByReference.
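For illustration, a sketch of how a frontend could set the new flags through DIBuilder; `isTriviallyCopyable()` is a hypothetical stand-in for whatever trait check the frontend actually performs.
```
DINode::DIFlags Flags = DINode::FlagZero;
if (isTriviallyCopyable(RecordTy)) // hypothetical trait check
  Flags |= DINode::FlagTypePassByValue;
else
  Flags |= DINode::FlagTypePassByReference;
// Flags is then forwarded to, e.g., DIBuilder::createStructType().
```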
<rdar://problem/36034922>
Differential Revision: https://reviews.llvm.org/D41743
llvm-svn: 321844
The existing version worked incorrectly when it is impossible to invert a branch condition.
Changed the "fixupConditionalBranch()" function: a new BB (a trampoline) is created to keep the original branch condition.
Differential Revision: https://reviews.llvm.org/D41634
llvm-svn: 321785
Add iterator ranges for machine instruction phis, similar to the IR-level
phi ranges added in r303964. I updated a few places to use this. Besides
general code simplification, this change will allow removing a non-upstream
change from Swift's copy of LLVM (in a better way than my previous attempt
in http://reviews.llvm.org/D19080).
https://reviews.llvm.org/D41672
llvm-svn: 321783
Handle this in DAGCombiner::visitEXTRACT_VECTOR_ELT the same as we already do in SelectionDAG::getNode and use APInt instead of getZExtValue.
This should also fix oss-fuzz #4910
llvm-svn: 321767
The preference only applies to 'memcmp() == 0' expansion, so try to make that clearer.
x86 will likely benefit by increasing the default value from '1' to '2' as seen in PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
...so that is the planned follow-up to this clean-up step.
llvm-svn: 321756
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend.
D20830 threaded an MCSubtargetInfo reference through
MCAsmBackend::relaxInstruction, but this isn't the only function that would
benefit from access. This patch removes the Triple and CPUString arguments
from createMCAsmBackend and replaces them with MCSubtargetInfo.
This patch just changes the interface without making any intentional
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)
This change initially exposed PR35686, which has since been resolved in r321026.
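For reference, the rough shape of the updated hook (a sketch from memory, not the verbatim declaration; the triple and CPU string are now derived from STI):
```
MCAsmBackend *createMCAsmBackend(const Target &T,
                                 const MCSubtargetInfo &STI,
                                 const MCRegisterInfo &MRI,
                                 const MCTargetOptions &Options);
```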
Differential Revision: https://reviews.llvm.org/D41349
llvm-svn: 321692
Previously the code for handling G_SMULO didn't properly check for the signed
multiply overflow, instead treating it the same as the unsigned G_UMULO.
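Conceptually (not the committed code), the two checks differ the way APInt's overflow helpers do:
```
bool Overflow;
APInt Prod = LHS.smul_ov(RHS, Overflow); // signed multiply overflow (G_SMULO)
// G_UMULO instead corresponds to LHS.umul_ov(RHS, Overflow).
```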
Fixes PR35800.
llvm-svn: 321690
A call may have an intrinsic name but not have a valid intrinsic ID,
for example with llvm.invariant.group.barrier. If so, treat it as a
normal call like FastISel does.
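A conceptual guard for this (names illustrative; `lowerAsNormalCall` is a hypothetical fallback):
```
if (F->getIntrinsicID() == Intrinsic::not_intrinsic)
  return lowerAsNormalCall(CI); // no valid intrinsic ID: ordinary call path
```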
llvm-svn: 321662
Tests updated to explicitly use fast-isel at -O0 instead of implicitly.
This change also allows an explicit -fast-isel option to override an
implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0.
Differential Revision: https://reviews.llvm.org/D41362
llvm-svn: 321655
Our internal testing has discovered bugs in PPC builds.
I have forwarded reproduction instructions to the original author (Nirav).
llvm-svn: 321649
Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromoteTo just adds 1 to the SimpleVT, which has the effect of increasing the element count while keeping the scalar size the same.
If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times. And fp_to_int will keep trying wider types in a loop until it finds one that works.
getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening.
FWIW, I think for any other vector operations that need to be promoted we will have to specify the type explicitly, because the default behavior of getTypeToPromoteTo isn't useful for vectors. The other types of promotion already require either a constant element count or a constant total vector width, and neither is preserved by incrementing the SimpleVT enum.
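Illustrative only (not the committed lines): a target can pin the promotion type in the map, e.g.
```
setOperationAction(ISD::SINT_TO_FP, MVT::v2i32, Promote);
AddPromotedToType(ISD::SINT_TO_FP, MVT::v2i32, MVT::v2i64);
```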
Differential Revision: https://reviews.llvm.org/D40664
llvm-svn: 321629
Fix code in LiveDebugVariables that was changing def MachineOperands to
uses, which will hit an assert for dead operands after the change to add
the renamable bit to MachineOperands. Avoid the assert by clearing the
dead bit before changing the operand to a use.
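A sketch of the ordering (the surrounding rewrite is omitted):
```
MO.setIsDead(false); // clear the dead flag first...
MO.setIsUse();       // ...so turning the def into a use doesn't assert
```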
Fixes issue reported in out of tree target by Jesper Antonsson at Ericsson.
llvm-svn: 321571
Summary:
I have been getting rather difficult-to-reproduce SIGBUS crashes when
compiling certain FreeBSD sources, and their stack traces pointed
squarely at `SelectionDAG::salvageDebugInfo()`:
```
Core was generated by `/usr/obj/share/dim/src/freebsd/clang600-import/amd64.amd64/tmp/usr/bin/cc -cc1 -'.
Program terminated with signal SIGBUS, Bus error.
#0 isInvalidated () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SDNodeDbgValue.h:115
115 bool isInvalidated() const { return Invalid; }
(gdb) bt
#0 isInvalidated () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SDNodeDbgValue.h:115
#1 salvageDebugInfo () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7116
#2 0x00000000033b2516 in operator() () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3595
#3 __invoke<(lambda at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3593:59) &, llvm::SDNode *, llvm::SDNode *> () at /usr/include/c++/v1/type_traits:4323
#4 __call<(lambda at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3593:59) &, llvm::SDNode *, llvm::SDNode *> () at /usr/include/c++/v1/__functional_base:349
#5 operator() () at /usr/include/c++/v1/functional:1562
#6 0x00000000033b0817 in operator() () at /usr/include/c++/v1/functional:1916
#7 NodeDeleted () at /share/dim/src/freebsd/clang600-import/contrib/llvm/include/llvm/CodeGen/SelectionDAG.h:293
#8 0x0000000003529dde in RemoveDeadNodes () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:610
#9 0x00000000035556df in MorphNodeTo () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:6794
#10 0x00000000033a9acc in MorphNode () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:2594
#11 0x00000000033ac80b in SelectCodeCommon () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:3601
#12 0x00000000023d464b in SelectCode () at /usr/obj/share/dim/src/freebsd/clang600-import/amd64.amd64/tmp/obj-tools/lib/clang/libllvm/X86GenDAGISel.inc:282902
#13 Select () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:3072
#14 0x00000000033a5afa in DoInstructionSelection () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:988
#15 0x00000000033a4e1a in CodeGenAndEmitDAG () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:868
#16 0x00000000033a2643 in SelectAllBasicBlocks () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1624
#17 0x000000000339f158 in runOnMachineFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:466
#18 0x00000000023d03c4 in runOnMachineFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:175
#19 0x00000000035cc8c2 in runOnFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/MachineFunctionPass.cpp:62
#20 0x00000000030dca9a in runOnFunction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1520
#21 0x00000000030dccf3 in runOnModule () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1541
#22 0x00000000030dd228 in runOnModule () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1597
#23 run () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/IR/LegacyPassManager.cpp:1700
#24 0x00000000014db578 in EmitAssembly () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:815
#25 EmitBackendOutput () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:1181
#26 0x00000000014d5b26 in HandleTranslationUnit () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:292
#27 0x0000000001c4c332 in ParseAST () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/Parse/ParseAST.cpp:159
#28 0x00000000015d546c in Execute () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:897
#29 0x0000000001cec311 in ExecuteAction () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/Frontend/CompilerInstance.cpp:991
#30 0x00000000014b4f81 in ExecuteCompilerInvocation () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:252
#31 0x00000000014aa73f in cc1_main () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/tools/driver/cc1_main.cpp:221
#32 0x00000000014b2928 in ExecuteCC1Tool () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/tools/driver/driver.cpp:309
#33 main () at /share/dim/src/freebsd/clang600-import/contrib/llvm/tools/clang/tools/driver/driver.cpp:388
(gdb) frame 1
#1 salvageDebugInfo () at /share/dim/src/freebsd/clang600-import/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7116
7116 if (DV->isInvalidated())
(gdb) disassemble
Dump of assembler code for function salvageDebugInfo():
[...]
0x0000000003557348 <+744>: nopl 0x0(%rax,%rax,1)
0x0000000003557350 <+752>: mov (%r12),%r13
=> 0x0000000003557354 <+756>: cmpb $0x0,0x31(%r13)
0x0000000003557359 <+761>: jne 0x35573b0 <salvageDebugInfo()+848>
(gdb) info registers
[...]
r13 0x5a5a5a5a5a5a5a5a 6510615555426900570
```
The `0x5a5a5a5a5a5a5a5a` value in `r13` indicates the memory was either
uninitialized, or already freed.
Unfortunately I do not have a simple self-contained test case for this.
However, it seems pretty clear that the call to `AddDbgValue()` in
`salvageDebugInfo()` causes the problems, since it modifies
`SelectionDAG::DbgInfo` while looping through one of its DenseMaps:
```
void SelectionDAG::salvageDebugInfo(SDNode &N) {
[...]
for (auto DV : GetDbgValues(&N)) {
if (DV->isInvalidated())
continue;
[...]
AddDbgValue(Clone, N0.getNode(), false);
[...]
}
}
```
At least, if I comment out the `AddDbgValue()` call, the crashes go
away. I propose to change this function slightly, similar to the
`SelectionDAG::transferDbgValues()` function just above it, to save the
cloned SDDbgValues in a separate SmallVector, and only call
AddDbgValue() on them after the for loop is done.
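A sketch of that shape, mirroring transferDbgValues (clone construction elided):
```
SmallVector<SDDbgValue *, 2> ClonedDVs;
for (SDDbgValue *DV : GetDbgValues(&N)) {
  if (DV->isInvalidated())
    continue;
  // ... build Clone from DV ...
  ClonedDVs.push_back(Clone);
}
for (SDDbgValue *Clone : ClonedDVs) // safe: iteration is finished
  AddDbgValue(Clone, N0.getNode(), false);
```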
Reviewers: aprantl, bogner, bkramer, davide
Reviewed By: davide
Subscribers: davide, krytarowski, JDevlieghere, emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D41589
llvm-svn: 321545
For example, float operations may fail to constant fold under certain circumstances (inf/nan/denormal creation etc.)
Reduced from oss-fuzz #4802 test case
llvm-svn: 321488
This moves the combine for turning ANDs into shuffle with zero out of SimplifyVBinOps and places it only in visitAND below the reassociate handling. This fixes the specific case I noticed where we failed to combine two ands with constants.
llvm-svn: 321417
getOperand returns an SDValue that contains the node and the result number. There is no guarantee that the result number is 0. By using the -> operator we are calling SDNode::getValueType rather than SDValue::getValueType. This requires supplying a result number, and we shouldn't assume it was 0.
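The difference in a nutshell:
```
SDValue Op = N->getOperand(0);
EVT SafeVT = Op.getValueType();    // SDValue: respects Op's result number
EVT RiskyVT = Op->getValueType(0); // SDNode: hardcodes result number 0
```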
I don't have a test case. Just noticed while cleaning up some other code and saw that it occurred in other places.
llvm-svn: 321397
BaseIndexOffset supersedes the findBaseOffset analysis except for Constant
Pool addresses. Migrate the analysis to BaseIndexOffset.
Relanding after correcting base address matching check.
llvm-svn: 321389
Re-land r321234. It had to be reverted because it broke the shared
library build. The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).
Original commit message:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321375
This seems to improve X86's ability to match this into an address computation. Otherwise the other operand gets assigned to the base register and the stack pointer + frame index ends up in the index register. But index registers can't encode ESP/RSP so we end up having to move it into another register to meet the constraint.
I could try to improve the address matcher in X86, but swapping the producer seemed easier. Several other places already have the operands in this order so this is at least consistent.
llvm-svn: 321370
Summary:
This replaces calls to getEntryCount().hasValue() with hasProfileData
that does the same thing. This refactoring is useful to do before adding
synthetic function entry counts but also a useful cleanup IMO even
otherwise. I have used hasProfileData instead of hasRealProfileData as
David had earlier suggested since I think profile implies "real" and I
use the phrase "synthetic entry count" and not "synthetic profile count"
but I am fine calling it hasRealProfileData if you prefer.
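The shape of the change, illustratively:
```
bool B1 = F.getEntryCount().hasValue(); // before
bool B2 = F.hasProfileData();           // after: same check, clearer intent
```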
Reviewers: davidxl, silvas
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41461
llvm-svn: 321331
The knownbits_mask_or_shuffle_uitofp change is interesting - shuffle combines manage to kick in, removing the AND constant mask load. For targets with fast-variable-shuffle this should reduce further to VPOR+VPSHUFB+VCVTDQ2PS.
llvm-svn: 321279
If the SRL node is only used by an AND, we may be able to set the
ExtVT to the width of the mask, making the AND redundant. To support
this, another check has been added in isLegalNarrowLoad which queries
whether the load is valid.
Differential Revision: https://reviews.llvm.org/D41350
llvm-svn: 321259
Summary:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321234
When intrinsics are allowed to have mem operands, there
are two ways this can happen. First is an intrinsic
that is marked as having a mem operand, but is not handled
by getTgtMemIntrinsic.
The second way can occur even for intrinsics which do not
have a mem operand. It seems the selector table does
some kind of sorting based on the opcode, and the
mem ref recording can happen in the same scope for
intrinsics that both do and do not have mem refs.
I haven't been able to figure out exactly why this happens
(although it happens even with the matcher optimizations disabled).
I'm not sure if it's worth trying to avoid hitting this for
these nodes since I think it's still reasonable to handle
this in case getTgtMemIntrinsic is not implemented.
llvm-svn: 321208
Summary:
The function section prefix for PGO based layout (e.g. hot/unlikely)
should look at the hotness of all blocks not just the entry BB.
A function with a cold entry but a very hot loop should be placed in the
hot section, for example, so that it is located close to other hot
functions it may call. For SamplePGO it was already looking at the
branch weights on calls, and I made that code conditional on whether
this is SamplePGO since it was essentially a noop for instrumentation
PGO anyway.
Reviewers: davidxl
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D41395
llvm-svn: 321197
These functions simply call their counterparts in the associated SDNode,
which do take an optional SelectionDAG. This change makes the legalization
debug trace a little easier to read, since target-specific nodes will
now have their names shown instead of "Unknown node #123".
llvm-svn: 321180
It appears the code uses nullptr to represent a void type in debug metadata,
which led to an assertion failure when building DeltaAlgorithm.cpp with a
self-hosted clang on Windows.
I'm not sure why/if the problem was Windows-specific.
Fixes bug https://bugs.llvm.org/show_bug.cgi?id=35543
Differential Revision: https://reviews.llvm.org/D41264
llvm-svn: 321122
Work towards the unification of MIR and debug output by refactoring the
interfaces.
Also add support for printing with a null TargetIntrinsicInfo and no
MachineFunction.
llvm-svn: 321111
Another followup to my refactoring in r321036: Turns out we can end up
with an x86 darwin target that is not macos (simulator triples can look
like i386-apple-ios) so we need the x86/32bit check in all cases.
llvm-svn: 321104
Summary:
Extend overlapping store elision to handle overwrites of stores by
larger stores.
Nontemporal tests have been modified to add memory dependencies to
prevent store elision.
Reviewers: craig.topper, rnk, t.p.northover
Subscribers: javed.absar, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40969
llvm-svn: 321089
Work towards the unification of MIR and debug output by refactoring the
interfaces.
Before this patch we printed "<call frame instruction>" in the debug
output.
llvm-svn: 321084
I missed some prefixes and the fact that on AArch64 we use "bzero"
instead of "__bzero" as on X86 when doing my refactoring in r321035.
Improve tests for bzero.
llvm-svn: 321046
I missed the fact that the later called InitLibcallCallingConvs()
overrides some things set in InitLibcalls() when I did the refactoring
in r321036.
Fix by merging InitLibcallCallingConvs() into InitLibcalls() and doing
the initialization earlier.
llvm-svn: 321045
Note:
- X86ISelLowering: setLibcallName(SINCOS) was superfluous as
InitLibcalls() already does it.
- ARMISelLowering: Setting libcallnames for sincos/sincosf seemed
superfluous as in the darwin case it wouldn't be used while for all
other cases InitLibcalls already does it.
llvm-svn: 321036
Adds missing support for DW_FORM_data16.
Update of r320852/r320886, fixing the unittest again, this time use a
raw char string for the test data.
Differential Revision: https://reviews.llvm.org/D41090
llvm-svn: 321011
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off, meaning that the 'and' isn't removed, but the
load(s) are still narrowed.
Differential Revision: https://reviews.llvm.org/D41177
llvm-svn: 320962
When we put the value in the select placeholder we must pass
the value through the simplification tracker, because the value might
already have been simplified and erased.
This is a fix for PR35658.
Reviewers: john.brawn, uabelho
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41251
llvm-svn: 320956
Adds missing support for DW_FORM_data16.
Update of r320852, fixing the unittest to use a hand-coded struct
instead of std::array to guarantee data layout.
Differential Revision: https://reviews.llvm.org/D41090
llvm-svn: 320886
Summary:
Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1.
We also can't type legalize v64i1 insert_vector_elt correctly on KNL because the type is not byte addressable, as the legalize-through-memory-accesses path requires.
For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that.
For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them.
Reviewers: RKSimon, delena, spatel, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40942
llvm-svn: 320849
Work towards the unification of MIR and debug output by printing
`%stack.0` instead of `<fi#0>`, and `%fixed-stack.0` instead of
`<fi#-4>` (supposing there are 4 fixed stack objects).
Only debug syntax is affected.
Differential Revision: https://reviews.llvm.org/D41027
llvm-svn: 320827
The following CFI directives are supported by MC but not by MIR:
* .cfi_rel_offset
* .cfi_adjust_cfa_offset
* .cfi_escape
* .cfi_remember_state
* .cfi_restore_state
* .cfi_undefined
* .cfi_register
* .cfi_window_save
Add support for printing, parsing and update tests.
Differential Revision: https://reviews.llvm.org/D41230
llvm-svn: 320819
This makes it work better with some build_vector and concat_vectors creations.
Adjust the NewSDValueDbgMsg in getConstant to avoid duplicating the print when it calls getSplatBuildVector since getSplatBuildVector didn't trigger a print before.
llvm-svn: 320783
Summary:
- lowers @llvm.global_dtors by adding @llvm.global_ctors
functions which register the destructors with `__cxa_atexit`.
- implements @llvm.global_ctors with wasm start functions and linker metadata
See [here](https://github.com/WebAssembly/tool-conventions/issues/25) for more background.
Subscribers: jfb, dschuff, mgorny, jgravelle-google, aheejin, sunfish
Differential Revision: https://reviews.llvm.org/D41211
llvm-svn: 320774
While investigating LLVM PR22316 (http://llvm.org/bugs/show_bug.cgi?id=22316)
I started wondering if it were not always preferable to emit the
initial DBG_VALUEs for stack arguments as FI locations instead of
describing the first register they get copied into. The advantage of
doing this is that the arguments will be available as soon as the
stack is setup. As illustrated by the testcase in the PR, the first
copy of the FI into a register may be sunk by MachineSink.cpp into a
later basic block. By describing the argument on the stack, we nicely
circumvent this problem.
<rdar://problem/19583723>
Differential Revision: https://reviews.llvm.org/D41135
llvm-svn: 320758
Most of the -Wsign-compare warnings are due to the fact that
enums are signed by default in the MS ABI, while the
tautological comparison warnings trigger on x86 builds where
sizeof(size_t) is 4 bytes, so N > numeric_limits<unsigned>::max()
is always false.
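Illustrative of the two warning classes (not code from the tree):
```
#include <climits>
#include <cstddef>
enum E { Val = 1 };        // signed underlying type in the MS ABI
unsigned U = 2;
bool SignCmp = Val < U;    // triggers -Wsign-compare
size_t N = 0;
bool Taut = N > UINT_MAX;  // always false when sizeof(size_t) == 4
```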
Differential Revision: https://reviews.llvm.org/D41256
llvm-svn: 320750
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.
On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.
llvm-svn: 320746
Work towards the unification of MIR and debug output by printing
`<mcsymbol sym>` instead of `<MCSym=sym>`.
Only debug syntax is affected.
llvm-svn: 320685
Work towards the unification of MIR and debug output by printing
`liveout(...)` instead of `<regliveout>`.
Only debug syntax is affected.
llvm-svn: 320683
Work towards the unification of MIR and debug output by printing
`@foo` instead of `<ga:@foo>`.
Also print target flags in the MIR format since most of them are used on
global address operands.
Only debug syntax is affected.
llvm-svn: 320682
Recommitting rL319773, which was reverted due to a recursive issue
causing timeouts. This happened because I failed to check whether
the discovered loads could be narrowed further. In the case of a tree
with one or more narrow loads, that could not be further narrowed, as
well as a node that would need masking, an AND could be introduced
which could then be visited and recombined again with the same load.
This could again create the masking load, with would be combined
again... We now check that the load can be narrowed so that this
process stops.
Original commit message:
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off, meaning that the 'and' isn't removed, but the
load(s) are still narrowed.
Differential Revision: https://reviews.llvm.org/D41177
llvm-svn: 320679
A v32i1 CONCAT_VECTORS of v16i1 uses promotion to v32i8 to legalize the v32i1. This results in a bunch of extract_vector_elts and a build_vector that ultimately gets scalarized.
This patch checks to see if v16i8 is legal and inserts a any_extend to that so that we can concat v16i8 to v32i8 and avoid creating the extracts.
llvm-svn: 320674
If so, go ahead and get the promoted input vector to extract from. Previously, we would create a bunch of any_extends of extract_vector_elts with an illegal input type that needs to be promoted. The legalization of those extract_vector_elts would then potentially introduce a truncate. So then we have a bunch of any_extends of truncates. By legalizing both parts together we avoid creating these extra nodes.
The test changes seem to be because we were previously combining the build_vector with the any_extend before the any_extend got combined with the truncate.
llvm-svn: 320669
Factor out duplicated code emitting mach-o version-min specifiers.
This should be NFC but happens to fix a bug where the code in
MCMachoStreamer didn't take the version skew between darwin and macos
versions into account.
llvm-svn: 320666
Two issues were found in the machine instruction scheduler when compiling ProRender
with -g for the amdgcn target:
GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it
should not, since DBG_VALUE is not mapped in LiveIntervals.
When DBG_VALUE is the last instruction of an MBB, ScheduleDAGInstrs::buildSchedGraph and
ScheduleDAGMILive::scheduleMI do not move RPTracker properly, which causes an assertion.
This patch fixes both.
Differential Revision: https://reviews.llvm.org/D41132
llvm-svn: 320650
Currently this is an LLVM extension to the COFF spec which is
experimental and intended to speed up linking. For now it is
behind a hidden cl::opt flag, but in the future we can move it
to a "real" cc1 flag and have the driver pass it through whenever
it is appropriate.
The patch to actually make use of this section in lld will come
in a followup.
Differential Revision: https://reviews.llvm.org/D40917
llvm-svn: 320649
Shrink wrapping should ignore DBG_VALUEs referring to frame indices,
since the presence of debug information must not affect code
generation.
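Conceptually (not the literal committed check):
```
if (MI.isDebugValue())
  continue; // DBG_VALUEs must not influence save/restore point placement
```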
Differential Revision: https://reviews.llvm.org/D41187
llvm-svn: 320606
Work towards the unification of MIR and debug output by printing `target-index(target-specific) + 8` instead of `<ti#0+8>` and `target-index(target-specific) - 8` instead of `<ti#0-8>`.
Only debug syntax is affected.
llvm-svn: 320565
Work towards the unification of MIR and debug output by printing
`%const.0 + 8` instead of `<cp#0+8>` and `%const.0 - 8` instead of
`<cp#0-8>`.
Only debug syntax is affected.
Differential Revision: https://reviews.llvm.org/D41116
llvm-svn: 320564
Headers/Implementation files should be named after the class they
declare/define.
Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of a `class LiveIntervals;` forward declaration.
llvm-svn: 320546
Summary:
Add isRenamable() predicate to MachineOperand. This predicate can be
used by machine passes after register allocation to determine whether it
is safe to rename a given register operand. Register operands that
aren't marked as renamable may be required to be assigned their current
register to satisfy constraints that are not captured by the machine
IR (e.g. ABI or ISA constraints).
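How a post-RA pass might consult the predicate (illustrative; `Candidates` is a hypothetical list):
```
for (MachineOperand &MO : MI.operands())
  if (MO.isReg() && MO.isRenamable())
    Candidates.push_back(&MO); // otherwise keep the assigned register
```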
Reviewers: qcolombet, MatzeB, hfinkel
Subscribers: nemanjai, mcrosier, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D39400
llvm-svn: 320503
When either instruction in a fused pair has no other dependency, besides on
the other instruction, make sure that other instructions do not get
scheduled between them. Additionally, avoid fusing an instruction more than
once along the same dependency chain.
Differential revision: https://reviews.llvm.org/D36704
llvm-svn: 320420
This is due to PR26161 needing to be resolved before we can fix
big endian bugs like PR35359. The work to split aggregates into smaller LLTs
instead of using one large scalar will take some time, so in the mean time
we'll fall back to SDAG.
Some ARM BE tests xfailed for now as a result.
Differential Revision: https://reviews.llvm.org/D40789
llvm-svn: 320388
At first, I tried to thread the x86 needle and use a target hook (isVectorShiftByScalarCheap())
to disable the transform only for non-splat pow-of-2 constants, but not AVX2, but only some
element types, but...it's difficult.
Here we just avoid the loop with the x86 vector transform that conflicts with the general DAG
combine and preserve all of the existing behavior AFAICT otherwise.
Some tests that will probably fail if someone does try to restrict this in a more targeted way
for x86-only may be found in:
test/CodeGen/X86/combine-mul.ll
test/CodeGen/X86/vector-mul.ll
test/CodeGen/X86/widen_arith-5.ll
This should prevent the infinite looping seen with:
https://bugs.llvm.org/show_bug.cgi?id=35579
Differential Revision: https://reviews.llvm.org/D41040
llvm-svn: 320374
This commit is the first part of https://reviews.llvm.org/D40348.
In order to allow target combines to be performed on newly combined
indexed loads, add them back to the worklist. The remainder of the
above patch will be committed in subsequent revisions and will use
this. Test cases will be included with those follow-up commits.
llvm-svn: 320365
This is a preparatory step for D34515.
This change:
- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY second result is actually a
borrow, we need to invert the value of the second operand and result before
and after using ARMISD::SUBE. We need to invert the carry result of
ARMISD::SUBE to preserve the semantics.
- given that the generic combiner may lower ISD::ADDCARRY and
ISD::SUBCARRY into ISD::UADDO and ISD::USUBO, we need to update their lowering
as well otherwise i64 operations now would require branches. This implies
updating the corresponding test for unsigned.
- add new combiner to remove the redundant conversions from/to carry flags
to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
- fixes PR34045
- fixes PR34564
- fixes PR35103
Differential Revision: https://reviews.llvm.org/D35192
llvm-svn: 320355
Introduces the AddrFI "addressing mode", which is necessary simply because
it's not possible to write a pattern that directly matches a frameindex.
Ensure callee-saved registers are accessed relative to the stackpointer. This
is necessary as callee-saved register spills are performed before the frame
pointer is set.
Move HexagonDAGToDAGISel::isOrEquivalentToAdd to SelectionDAGISel, so we can
make use of it in the RISC-V backend.
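The underlying idea, sketched (not the verbatim helper):
```
// (or X, Y) computes the same value as (add X, Y) when no bit can carry,
// e.g. an aligned frame address OR'd with an offset below the alignment.
bool OrIsAdd = CurDAG->haveNoCommonBitsSet(N->getOperand(0),
                                           N->getOperand(1));
```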
Differential Revision: https://reviews.llvm.org/D39848
llvm-svn: 320353
Summary:
This relaxes an assertion inside SelectionDAGBuilder which is overly
restrictive on targets which have no concept of alignment (such as AVR).
In these architectures, all types are aligned to 8-bits.
After this, LLVM will only assert that accesses are aligned on targets
which actually require alignment.
This patch follows from a discussion on llvm-dev a few months ago
http://llvm.1065342.n5.nabble.com/llvm-dev-Unaligned-atomic-load-store-td112815.html
Reviewers: bogner, nemanjai, joerg, efriedma
Reviewed By: efriedma
Subscribers: efriedma, cactus, llvm-commits
Differential Revision: https://reviews.llvm.org/D39946
llvm-svn: 320243
Summary:
This is LLVM instrumentation for the new HWASan tool. It is basically
a stripped down copy of ASan at this point, w/o stack or global
support. Instrumenation adds a global constructor + runtime callbacks
for every load and store.
HWASan comes with its own IR attribute.
A brief design document can be found in
clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier).
Reviewers: kcc, pcc, alekseyshl
Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D40932
llvm-svn: 320217
MachineSink attempts to place instructions near the basic blocks where
they are needed. Once an instruction has been sunk, its location
relative to other instructions no longer is consistent with the
original source code. In order to ensure correct stepping in the
debugger, the debug location for sunk instructions is either merged
with the insertion point or erased if the target successor block is
empty.
Originally submitted as r318679, revised to fix sanitizer failure and
improve testing.
Patch by Matthew Voss!
Differential Revision: https://reviews.llvm.org/D39933
llvm-svn: 320216
Work towards the unification of MIR and debug output by refactoring the
interfaces.
Add support for operand subreg index as an immediate to debug printing
and use ::print in the MIRPrinter.
Differential Review: https://reviews.llvm.org/D40965
llvm-svn: 320209
I noticed this pattern in D38316 / D38388. We failed to combine a shuffle that is either
repeating a scalar insertion at the same position in a vector or translated to a different
element index.
Like the earlier patch, this could be an instcombine too, but since we opted to make this
a DAG transform earlier, I've made this one a DAG patch too.
We do not need any legality checking because the new insert is identical to the existing
insert except that it may have a different constant insertion operand.
The constant insertion test in test/CodeGen/X86/vector-shuffle-combining.ll was the
motivation for D38756.
Differential Revision: https://reviews.llvm.org/D40209
llvm-svn: 320050
Summary:
Changed use_instructions() to use_nodbg_instructions() when
building an instruction set.
We don't want the presence of debug info to affect the code
we generate.
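The one-line shape of the change (conceptual; `Insts` stands in for the set being built):
```
for (MachineInstr &UseMI : MRI.use_nodbg_instructions(Reg))
  Insts.insert(&UseMI); // DBG_VALUE users no longer enter the set
```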
Reviewers: dblaikie, Eugene.Zelenko, chandlerc, aprantl
Reviewed By: aprantl
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D40882
llvm-svn: 320010
Currently, when creating a named section, the Wasm
frontend forces it to use `SectionKind::Data`, whereas
in fact C++ does generate code sections with custom
names.
Patch by Nicholas Wilson
Differential Revision: https://reviews.llvm.org/D40906
llvm-svn: 320002
Summary:
When calculating the RootLatency, we add up all the latencies of the
deleted instructions. But for NewRootLatency we only add the latency of
the new root instructions, ignoring the latencies of the other
instructions inserted. This leads the combiner to underestimate the cost
of patterns which add multiple instructions. This patch fixes that by
summing up the latencies of all new instructions. For NewRootNode, the
more complex getLatency function is used.
Note that we may be slightly more precise than just summing up
all latencies. For example, consider a pattern like
r1 = INS1 ..
r2 = INS2 ..
r3 = INS3 r1, r2
I think in some other places, the total latency of the pattern would be
estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider
that worth changing, I think it would be best to do in a follow-up
patch.
Reviewers: Gerolf, sebpop, spop, fhahn
Reviewed By: fhahn
Subscribers: evandro, llvm-commits
Differential Revision: https://reviews.llvm.org/D40307
llvm-svn: 319951
If the mask needs to be promoted that should occur by the legalizer detecting the mask operand needs to be promoted not as a side effect of another action.
llvm-svn: 319852
The mask will be promoted if necessary when operands are promoted. It's possible the mask type is legal but the setcc result type is different. We shouldn't promote to the setcc result type unless the mask needs to be promoted.
llvm-svn: 319850
The patch originally broke Chromium (crbug.com/791714) due to its failing to
specify that the new pseudo instructions clobber EFLAGS. This commit fixes
that.
> Summary: This strengthens the guard and matches MSVC.
>
> Reviewers: hans, etienneb
>
> Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D40622
llvm-svn: 319824
There's no such thing as a setcc with vector operands and a scalar result. And if we're trying to widen the result we would have to already be looking at a vector result type.
So this patch renames the VSETCC function as the SETCC function and deletes the original SETCC function.
llvm-svn: 319799
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off, meaning that the 'and' isn't removed, but the
load(s) are still narrowed.
Differential Revision: https://reviews.llvm.org/D39604
llvm-svn: 319773
Summary:
Found out, by code inspection, that there was a fault in
DAGCombiner::CombineConsecutiveLoads for big-endian targets.
A BUILD_PAIR always has the least significant bits of
the composite value in element 0. So when we are doing the checks
for consecutive loads on big-endian targets, we should check
whether the load to elt 1 is at the lower address and the load
to elt 0 is at the higher address.
Normally this bug only resulted in missed opportunities for
doing the load combine. I guess that in some rare situation it
could lead to faulty combines, but I've not seen that happen.
Note that this patch actually will trigger load combine for
some big endian regression tests.
One example is test/CodeGen/PowerPC/anon_aggr.ll where we now get
t76: i64,ch = load<LD8[FixedStack-9]>
instead of
t37: i32,ch = load<LD4[FixedStack-10]>
t35: i32,ch = load<LD4[FixedStack-9]>
t41: i64 = build_pair t37, t35
before legalization. Then the legalization will split the LD8
into two loads, so the end result is the same. That should
verify that the transformation is correct now.
Reviewers: niravd, hfinkel
Reviewed By: niravd
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D40444
llvm-svn: 319771
Pull the checks upon the load out from ReduceLoadWidth into their own
function.
Differential Revision: https://reviews.llvm.org/D40833
llvm-svn: 319766
MachineRegisterInfo used to allow just one regalloc hint per virtual
register. This patch extends this to a vector of regalloc hints, which is
filled in by common code with sorted copy hints. Such hints will make for
more ID copies that can be removed.
NB! This improvement is currently (and hopefully temporarily) *disabled* by
default, except for SystemZ. The only reason for this is the big impact this
has on tests, which has unfortunately proven unmanageable. It was a long
while since all the tests were updated and just waiting for review (which
didn't happen), but now targets have to enable this themselves
instead. Several targets could get a head-start by downloading the tests
updates from the Phabricator review. Thanks to those who helped, and sorry
you now have to do this step yourselves.
This should be an improvement generally for any target!
The target may still create its own hint, in which case this has highest
priority and is stored first in the vector. If it has target-type, it will
not be recomputed, as per the previous behaviour.
The temporary hook enableMultipleCopyHints() will be removed as soon as all
targets return true.
Review: Quentin Colombet, Ulrich Weigand.
https://reviews.llvm.org/D38128
llvm-svn: 319754
The CONCAT_VECTORS operand gets its type from getSetCCResultType, but if the mask type and the setcc result have different scalar sizes this creates an illegal CONCAT_VECTORS operation. The concat type should be 2x the mask type, and then an extend should be added if needed.
llvm-svn: 319744
Consistently use the same parameter names as the names of the affected
fields. This avoids some unintuitive abbreviations like `isSS`.
llvm-svn: 319722
While we cannot skip the whole TwoAddressInstructionPass even at -O0,
there are some parts of the pass that are currently skipped at -O0 but
not for optnone. Change this, as there is no reason for those two to
hit different code paths here.
llvm-svn: 319721
MatchRotate assumes the types of LHS and RHS are equal,
which is always the case when they come from an OR node, but here
we're getting them from two different TRUNC nodes, so we have to check
the types.
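Conceptually (details may differ from the committed check):
```
if (LHS.getValueType() != RHS.getValueType())
  return nullptr; // the two rotate halves must agree in type
```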
llvm-svn: 319695
If the truncation has been pushed past the or-node, look through it and
truncate afterwards.
Differential revision: https://reviews.llvm.org/D40792
llvm-svn: 319692
This patch splits atomics out of the generic G_LOAD/G_STORE and into their own
G_ATOMIC_LOAD/G_ATOMIC_STORE. This is a pragmatic decision rather than a
necessary one. Atomic load/store has little in implementation in common with
non-atomic load/store. They tend to be handled very differently throughout the
backend. It also has the nice side-effect of slightly improving the common-case
performance at ISel since there's no longer a need for an atomicity check in the
matcher table.
All targets have been updated to remove the atomic load/store check from the
G_LOAD/G_STORE path. AArch64 has also been updated to mark
G_ATOMIC_LOAD/G_ATOMIC_STORE legal.
There is one issue with this patch though which also affects the extending loads
and truncating stores. The rules only match when an appropriate G_ANYEXT is
present in the MIR. For example,
(G_ATOMIC_STORE (G_TRUNC:s16 (G_ANYEXT:s32 (G_ATOMIC_LOAD:s16 X))))
will match but:
(G_ATOMIC_STORE (G_ATOMIC_LOAD:s16 X))
will not. This shouldn't be a problem at the moment, but as we get better at
eliminating extends/truncates we'll likely start failing to match in some
cases. The current plan is to fix this in a patch that changes the
representation of extending-load/truncating-store to allow the MMO to describe
a different type to the operation.
llvm-svn: 319691
Summary:
Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so
that it can be called from other places.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40750
llvm-svn: 319689
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
An instruction returned by TII->convertToThreeAddress() may contain a %noreg
(undef) operand, which is not expected by tryInstructionTransform(). So if
this MI is sunk to a lower point in MBB, it must be skipped when later
encountered.
A new set SunkInstrs is used for this purpose.
Note: there is no test supplied here, as this was triggered on SystemZ while
working on a review of instruction flags. A test case for this bugfix will be
included in the upcoming SystemZ commit.
Review: Quentin Colombet
https://reviews.llvm.org/D40711
llvm-svn: 319646
Both LoadedVT and NarrowLoad are passed as references, and neither
of them is used by any of the callers.
Differential Revision: https://reviews.llvm.org/D40713
llvm-svn: 319645
If we have a non-splat constant shift amount, the minimum shift amount can be used to infer the number of zero upper bits of the result. There's probably a lot more that we can do here, but this
fixes a case where I wanted to infer the sign bit as zero when all the shift amounts are non-zero.
llvm-svn: 319639
SelectionDAGISel::LowerArguments assumes sret addr space is 0, which is
not true for amdgcn---amdgiz target.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D40255
llvm-svn: 319630
Two issues were found when doing codegen for splitting a vector with a non-zero alloca address space:
DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT/SplitVecOp_EXTRACT_VECTOR_ELT uses dummy pointer info for creating
the SDStore. Since one pointer operand contains a multiply and an add, InferPointerInfo is unable to
infer the correct pointer info, which ends up with dummy pointer info for the target to lower the
store and results in an isel failure. The fix is to introduce MachinePointerInfo::getUnknownStack to
represent a MachinePointerInfo which is known to be in the alloca address space but carries no other information.
TargetLowering::getVectorElementPointer uses the value type of a pointer in addr space 0 for the
multiplication of the index and then adds it to the pointer. However, the pointer may be in an addr
space which has a different size than addr space 0. The fix is to use the pointer's value type for the
index multiplication.
Differential Revision: https://reviews.llvm.org/D39758
llvm-svn: 319622
Currently, the outliner considers candidates that intersect with themselves in
the candidate pruning step. That is, candidates of the form "AA" in ranges like
"AAAAAA". In that range, it looks like there are 5 instances of "AA" that could
possibly be outlined, and that's considered in the benefit calculation.
However, only at most 3 instances of "AA" could ever be outlined in "AAAAAA".
Thus, it's possible to pass "AA" through to the candidate selection step even
though it's *never* the case that all 5 occurrences of "AA" could be outlined. This makes it so that
when we find candidates, we consider only non-overlapping occurrences of that
candidate.
llvm-svn: 319588
These are blocks that have not been executed during training. For large
projects this could make a significant difference. For the project I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
Re-commit after fixing the failing testcase in rL319576, rL319577 and
rL319578.
llvm-svn: 319581
These are blocks that have not been executed during training. For large
projects this could make a significant difference. For the project I was
looking at, I got an order of magnitude decrease in the size of the total YAML
files with this and r319235.
Differential Revision: https://reviews.llvm.org/D40678
llvm-svn: 319556
Summary:
1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to
accommodate a similar operand appearing in the DAG, e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will now look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity
then complex LEAs (having 3 operands) could be factored out e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop.
4/ Simplify LEA converts (lea (BASE,1,INDEX,0)) --> (add BASE, INDEX), which offers better throughput.
PR32755 will be taken care of by this patch.
Previous patch revisions : r313343 , r314886
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy, jbhateja
Reviewed By: lsaba, RKSimon, jbhateja
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 319543
Summary: LegalizerInfo assumes all G_MERGE_VALUES and G_UNMERGE_VALUES instructions are legal, so it is not possible to legalize vector operations on illegal vector types. This patch fixes the problem by removing the related check and adding default actions for G_MERGE_VALUES and G_UNMERGE_VALUES.
Reviewers: qcolombet, ab, dsanders, aditya_nandakumar, t.p.northover, kristof.beyls
Reviewed By: dsanders
Subscribers: rovka, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39823
llvm-svn: 319524
Type promotion makes no guarantee about the contents of the promoted bits. Since the gather/scatter instruction will use the bits to calculate addresses, we need to ensure they aren't garbage.
llvm-svn: 319520
These command line options are not intended for public use, and often
don't even make sense in the context of a particular tool anyway. About
90% of them are already hidden, but when people add new options they
forget to hide them, so if you were to make a brand new tool today, link
against one of LLVM's libraries, and run tool -help you would get a
bunch of junk that doesn't make sense for the tool you're writing.
This patch hides these options. The real solution is to not have
libraries defining command line options, but that's a much larger effort
and not something I'm prepared to take on.
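For reference, hiding an option is a one-flag change at its definition site; a minimal sketch (the option name here is made up):
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// cl::Hidden keeps the option out of plain -help output; it remains visible
// under -help-hidden for developers who need it.
static cl::opt<bool> SomeInternalKnob(
    "some-internal-knob",
    cl::desc("Tweak an internal heuristic (developer use only)"),
    cl::Hidden, cl::init(false));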
Differential Revision: https://reviews.llvm.org/D40674
llvm-svn: 319505
G_ATOMICRMW_* is generally legal on AArch64. The exception is G_ATOMICRMW_NAND.
G_ATOMIC_CMPXCHG_WITH_SUCCESS needs to be lowered to G_ATOMIC_CMPXCHG with an
external comparison.
Note that IRTranslator doesn't generate these instructions yet.
llvm-svn: 319466
As part of the unification of the debug format and the MIR format,
always use `printReg` to print all kinds of registers.
Updated the tests using '_' instead of '%noreg' until we decide which
one we want to be the default one.
Differential Revision: https://reviews.llvm.org/D40421
llvm-svn: 319445
Re applying after fixing issues in the diff, sorry for any painful conflicts/merges!
Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html
This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128).
The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary.
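For concreteness, a minimal sketch of reading the unsigned LEB128 stack-size values (standard ULEB128 decoding, not code from this patch):
#include <cstddef>
#include <cstdint>

// Decode one unsigned LEB128 value starting at P; the number of bytes
// consumed is stored in *Len.
static uint64_t decodeULEB128(const uint8_t *P, size_t *Len) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  size_t I = 0;
  do {
    Value |= uint64_t(P[I] & 0x7f) << Shift;
    Shift += 7;
  } while (P[I++] & 0x80);
  if (Len)
    *Len = I;
  return Value;
}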
There is a follow up change to add an option to clang.
Thanks.
Reviewers: hfinkel, MatzeB
Reviewed By: MatzeB
Subscribers: thegameg, asb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39788
llvm-svn: 319430
As part of the unification of the debug format and the MIR format, avoid
printing "vreg" for virtual registers (which is one of the current MIR
possibilities).
Basically:
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/%vreg([0-9]+)/%\1/g"
* grep -nr '%vreg' . and fix if needed
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E "s/ vreg([0-9]+)/ %\1/g"
* grep -nr 'vreg[0-9]\+' . and fix if needed
Differential Revision: https://reviews.llvm.org/D40420
llvm-svn: 319427
Summary:
Original RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-August/117028.html
I wasn't sure who to put as reviewers, so please add/remove people as appropriate.
This change adds a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. The section contains pairs of function symbol references (8 byte) and stack sizes (unsigned LEB128).
The contents of this section can be used to measure changes to stack sizes between different versions of the compiler or a source base. The advantage of having a section is that we can extract this information when examining binaries that we didn't build, and it allows users and tools easy access to that information just by referencing the binary.
There is a follow up change to add an option to clang.
Thanks.
Reviewers: hfinkel, MatzeB
Reviewed By: MatzeB
Subscribers: thegameg, asb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39788
llvm-svn: 319423
visitAND attempts to narrow the width of extending loads that are
then masked off. ReduceLoadWidth already exists for a similar purpose
and handles shifts, so I've moved the code to handle AND nodes there.
Differential Revision: https://reviews.llvm.org/D39595
llvm-svn: 319421
If we put in an assertsext/zext here, we're able to generate better truncate code using pack on pre-avx512 targets.
Something similar is already done during type legalization. This is the equivalent for op legalization.
Differential Revision: https://reviews.llvm.org/D40591
llvm-svn: 319368
If the common type is different we should bail out, since we will not be
able to create a select or Phi of these values.
Basically this is done in ExtAddrMode::compare, however it does not work
if we handle the null first and then two values of different types,
so add a check in initializeMap as well. The check in ExtAddrMode::compare
is kept as an earlier bail out.
Reviewers: reames, john.brawn
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40479
llvm-svn: 319292
The object can't straddle the address space
wrap around, so I think it's OK to assume any
offsets added to the base object pointer can't
overflow. Similar logic already appears to be
applied in SelectionDAGBuilder when lowering
aggregate returns.
llvm-svn: 319272
Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel.
The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value.
llvm-svn: 319259
Summary:
Recommitting this with the correct sorting predicate. The Low field of Clusters is a ConstantInt and
cannot be directly compared. So we needed to invoke slt (signed less than) to compare correctly.
This fixes failures in the following tests uncovered by D39245:
LLVM :: CodeGen/ARM/ifcvt3.ll
LLVM :: CodeGen/ARM/switch-minsize.ll
LLVM :: CodeGen/X86/switch.ll
LLVM :: CodeGen/X86/switch-bt.ll
LLVM :: CodeGen/X86/switch-density.ll
Reviewers: hans, fhahn
Reviewed By: hans
Subscribers: aemerson, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D40541
llvm-svn: 319210
Summary:
They're not always mutually exclusive. Read-modify-write atomics are both
at the same time. One example of this is the SWP instructions on AArch64.
Another example is GlobalISel's G_ATOMICRMW_* generic instructions which
will be added in a later patch.
Reviewers: arphaman, aemerson
Reviewed By: aemerson
Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D40157
llvm-svn: 319202
The motivation behind this patch is that future directions require us to
be able to compute the hash value of records independently of actually
using them for de-duplication.
The current structure of TypeSerializer / TypeTableBuilder being a
single entry point that takes an unserialized type record, and then
hashes and de-duplicates it is not flexible enough to allow this.
At the same time, the existing TypeSerializer is already extremely
complex for this very reason -- it tries to be too many things. In
addition to serializing, hashing, and de-duplicating, it also supports
splitting up field list records and adding continuations. All of this
functionality crammed into this one class makes it very complicated to
work with and hard to maintain.
To solve all of these problems, I've re-written everything from scratch
and split the functionality into separate pieces that can easily be
reused. The end result is that one class TypeSerializer is turned into 3
new classes SimpleTypeSerializer, ContinuationRecordBuilder, and
TypeTableBuilder, each of which in isolation is simple and
straightforward.
A quick summary of these new classes and their responsibilities are:
- SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of
bytes. Does not do any hashing. Every time you call it, it will
re-serialize and return bytes again. The same instance can be re-used
over and over to avoid re-allocations, and in exchange for this
optimization the bytes returned by the serializer only live until the
caller attempts to serialize a new record.
- ContinuationRecordBuilder : Turns a FieldList-like record into a series
of fragments. Does not do any hashing. Like SimpleTypeSerializer,
returns references to privately owned bytes, so the storage is
invalidated as soon as the caller tries to re-use the instance. Works
equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a
long-standing theoretical limitation of the previous implementation.
- TypeTableBuilder : Accepts sequences of bytes that the user has already
serialized, and inserts them by de-duplicating with a hash table. For
the sake of convenience and efficiency, this class internally stores a
SimpleTypeSerializer so that it can accept unserialized records. The
same is not true of ContinuationRecordBuilder. The user is required to
create their own instance of ContinuationRecordBuilder.
Differential Revision: https://reviews.llvm.org/D40518
llvm-svn: 319198
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
This is needed for cases when the memory access is not as big as the width of
the data type. For instance, storing i1 (1 bit) would be done in a byte (8
bits).
Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a
size of 0, which for instance makes alias analysis return NoAlias even when
it shouldn't.
There are no tests as this was done as a follow-up to the bugfix for the case
where this was discovered (r318824). This handles more similar cases.
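A minimal sketch of the difference (plain arithmetic, not code from this patch):
#include <cassert>
#include <cstdint>

int main() {
  uint64_t BitSize = 1;                   // an i1 store really touches a whole byte
  uint64_t Wrong = BitSize >> 3;          // 0 -- underestimates the access
  uint64_t StoreSize = (BitSize + 7) / 8; // 1 -- rounds up to whole bytes
  assert(Wrong == 0 && StoreSize == 1);
  return 0;
}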
Review: Björn Petterson
https://reviews.llvm.org/D40339
llvm-svn: 319173
LLVM Coding Standards:
Function names should be verb phrases (as they represent actions), and
command-like function should be imperative. The name should be camel
case, and start with a lower case letter (e.g. openFile() or isFoo()).
Differential Revision: https://reviews.llvm.org/D40416
llvm-svn: 319168
The priorities in the section name suffixes are zero padded,
allowing the linker to just do a lexical sort.
Add zero padding for .ctors sections in ELF as well.
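A small self-contained illustration of why the padding matters (not linker code):
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

int main() {
  // Without padding, lexical order disagrees with numeric priority order.
  std::vector<std::string> Unpadded = {".ctors.9", ".ctors.10"};
  std::sort(Unpadded.begin(), Unpadded.end());
  assert(Unpadded.front() == ".ctors.10"); // 10 sorts before 9: wrong

  // With zero padding, the linker's plain lexical sort is also numeric.
  std::vector<std::string> Padded = {".ctors.00009", ".ctors.00010"};
  std::sort(Padded.begin(), Padded.end());
  assert(Padded.front() == ".ctors.00009");
  return 0;
}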
Differential Revision: https://reviews.llvm.org/D40407
llvm-svn: 319150
Unoptimized IR can have linear sequences of stores to an array, where the
initial GEP for the first store is formed from the pointer to the array, and the
GEP for each store after the first is formed from the previous GEP with some
offset in an inductive fashion.
The (large) resulting DAG, when analyzed by DAGCombine, undergoes an excessive
number of combines as each store node is examined every time its offset node
is combined with any child of the offset. One of the transformations is
findBetterNeighborChains, which assists MergeConsecutiveStores. The former
relies on repeated chain walking to do its work; however, MergeConsecutiveStores
is disabled at O0, which makes the transformation redundant.
Any optimization level other than O0 would invoke InstCombine which would
resolve the chain of GEPs into flat base + offset GEP for each store which
does not exhibit the repeated examination of each store to the array.
Disabling this optimization fixes an excessive compile time issue (~30 minutes
for the test case provided) at O0.
Reviewers: niravd, craig.topper, t.p.northover
Differential Revision: https://reviews.llvm.org/D40193
llvm-svn: 319142
This fixes cases where we wouldn't perform various register operand
checks just because we didn't happen to have a definition in the
MCInstrDesc. This changes the code to only skip the tests that actually
depend on the MCInstrDesc definition.
This makes the machine verifier spot the problem from
https://llvm.org/PR33071 after the pass that actually caused it.
llvm-svn: 319141
Additional checks for phi operands:
- first operand should be a virtual register def. It should not be
tied, implicit, internalread, earlyclobber or a read.
- The other operands should be register/mbb operands next to each other
- The register operands should not be implicit, internalread,
earlyclobber, debug or tied.
- We can perform most of the PHI checks even for unreachable blocks.
llvm-svn: 319140
With AVX512 vXi1 types are legal so we shouldn't be extending them.
This change is similar to existing code in the zext(setcc) combine.
llvm-svn: 319120
The current way that trivial addressing modes are detected incorrectly thinks
that null pointers are non-trivial, leading to an infinite loop where we keep
duplicating the same select. Fix this by being aware of null when deciding if an
addressing mode is trivial.
Differential Revision: https://reviews.llvm.org/D40447
llvm-svn: 319019
Summary:
Currently ScalarizeVecRes_SETCC checks for the result type being a vector and jumps to ScalarizeVecRes_VSETCC. But if we're scalarizing a vector result, aren't we guaranteed to be looking at a vector type?
This patch deletes the current ScalarizeVecRes_SETCC and renames ScalarizeVecRes_VSETCC to ScalarizeVecRes_SETCC.
Reviewers: RKSimon, arsenm, eladcohen, zvi
Reviewed By: RKSimon
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D40452
llvm-svn: 318982
CodeGenPrepare sinks address computations from one basic block to another
and attempts to reuse address computations that have already been sunk. If
the same address computation appears twice with the first instance as an
operand of a load whose result is an operand to a simplifable select,
CodeGenPrepare simplifies the select and recursively erases the now dead
instructions. CodeGenPrepare then attempts to use the erased address
computation for the second load.
Fix this by erasing the cached address value if it has zero uses before
looking for the address value in the sunken address map.
This partially resolves PR35209.
Thanks to Alexander Richardson for reporting the issue!
This fixed version relands r318032 which was reverted in r318049 due to
sanitizer buildbot failures.
Reviewers: john.brawn
Differential Revision: https://reviews.llvm.org/D39841
llvm-svn: 318956
This patch extends the recent work in optimizeMemoryInst to make it able to
combine more ExtAddrMode fields than just the BaseReg.
This fixes some benchmark regressions introduced by r309397, where GVN PRE is
hoisting a getelementptr such that it can no longer be combined into the
addressing mode of the load or store that uses it.
Differential Revision: https://reviews.llvm.org/D38133
llvm-svn: 318949
TableGen already generates code for selecting a G_FDIV, so we only need
to add a test.
For the legalizer and reg bank select, we do the same thing as for the
other floating point binary operations: either mark as legal if we have
a FP unit or lower to a libcall, and map to the floating point
registers.
llvm-svn: 318915
TableGen already generates code for selecting a G_FMUL, so we only need
to add a test for that part.
For the legalizer and reg bank select, we do the same thing as the other
floating point binary operators: either mark as legal if we have a FP
unit or lower to a libcall, and map to the floating point registers.
llvm-svn: 318910
This patch reverts change to X86TargetLowering::getScalarShiftAmountTy in
rL318727 and move the logic to DAGTypeLegalizer::SplitInteger.
The reason is that getScalarShiftAmountTy returns a shift amount type that
is suitable for common use cases in CodeGen. DAGTypeLegalizer::SplitInteger
is a rare situation which requires a shift amount type larger than what
getScalarShiftAmountTy returns. In this case, it is more reasonable to do special
handling of the shift amount type in DAGTypeLegalizer::SplitInteger only. If
similar situations arise, the logic may be moved to a separate function.
Differential Revision: https://reviews.llvm.org/D40320
llvm-svn: 318890
Since i1 is a legal type, this:
NumBytes = Op1->getMemoryVT().getSizeInBits() >> 3;
is wrong and should instead be
NumBytes = Op0->getMemoryVT().getStoreSize();
There seem to be more places where this should be fixed outside DAGCombiner.
Review: Hal Finkel
https://bugs.llvm.org/show_bug.cgi?id=35366
llvm-svn: 318824
DAGTypeLegalizer::SplitInteger uses the default pointer size as the shift amount constant type,
which results in less performant ISA for the amdgcn---amdgiz target, since the default pointer
type is i64 whereas the desired shift amount type is i32.
This patch fixes that by using TLI.getScalarShiftAmountTy in DAGTypeLegalizer::SplitInteger.
The X86 change is necessary since splitting i512 requires a shift amount of 256, which
cannot be held in an i8.
Differential Revision: https://reviews.llvm.org/D40148
llvm-svn: 318727
Normally this would be cleaned up by promoting the condition operand next. But in the attached case we promoted the result from v2i48 to v2i64 and the condition from v2i1 to v2i48. Then we tried to "promote" the v2i48 condition back to v2i1 because that's what the SetCC result type for v2i64 is on X86 with VLX. But promote is either a NOP or SIGN_EXTEND and this would need a truncation.
With the change here we now get the SetCC type of v2i1 when we're handling the result promotion and the operand no longer needs to be promoted itself.
Fixes PR35272.
llvm-svn: 318706
MachineSink attempts to place instructions near the basic blocks where
they are needed. Once an instruction has been sunk, its location
relative to other instructions is no longer consistent with the
original source code. In order to ensure correct single-stepping and
profiling, the debug location for sunk instructions is either merged
with the insertion point or erased if the target successor block is
empty.
Patch by Matthew Voss!
Differential Revision: https://reviews.llvm.org/D39933
llvm-svn: 318679
The instructions addis, addi, and bl are used to calculate the address of TLS thread
local variables. These TLS access code sequences are generated repeatedly every
time the thread local variable is accessed. By communicating to Machine CSE that
X2 is guaranteed to have the same value within the same function call (a so-called
Caller Preserved Physical Register), the redundant TLS access code sequences are
cleaned up.
Differential Revision: https://reviews.llvm.org/D39173
llvm-svn: 318661
We must collect all AddrModes even if they are the same.
This is because the original values differ, and we need all original
values collected as they are used as anchors in common Phi finding.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40166
llvm-svn: 318638
Instead of asserting that the type sizes are exactly equal, we check
that the new size is big enough to contain the original type.
We have to relax this constraint because, right now, we sometimes
specify that things that are smaller than a storage type are legal
instead of widening everything to the size of a storage type.
E.g., we say that G_AND s16 is legal and we map that on GPR32.
This is something we may revisit in the future (either by changing
the legalization process or keeping track separately of the storage
size and the size of the type), but let us reflect the reality of
the situation for now.
llvm-svn: 318587
If a vreg's bank is specified in the registers block and one of its
defs or uses also specifies the bank, we end up checking that the
RegBank is equal to diagnose conflicting banks. The problem comes up
for generic vregs, where we weren't fully initializing the VRegInfo
when parsing the registers block, so we'd end up comparing a null
pointer to uninitialized memory.
This fixes a non-deterministic failure when round tripping through MIR
with generic vregs.
llvm-svn: 318543
Previously we were assuming all results were vectors and calling SetWidenedVector, but if it's a chain result we should just replace uses instead.
This fixes an error found by expensive checks after r318368.
llvm-svn: 318509
TransferDbgValues (capital 'T') is wired into ReplaceAllUsesWith, and
transferDbgValues (lowercase 't') is used elsewhere (e.g in Legalize).
Both functions should be doing the exact same thing. This patch
consolidates the logic into one place.
This was reverted in r318455 because some newly introduced asserts,
which I thought were NFC, were firing. I filed PR35338. For now I've
weakened the asserts.
Testing: check-llvm, check-clang, and a stage2 Rel+Deb build of clang
Differential Revision: https://reviews.llvm.org/D40104
llvm-svn: 318498
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
The sign extend might be from an i16 or i8 type and was inserted by InstCombine to match the pointer width. X86 gather legalization isn't currently detecting this to reinsert a sign extend to make things legal.
It's a bit weird for the SelectionDAGBuilder to do this kind of optimization in the first place. With this removed we can at least lean on InstCombine somewhat to ensure the index is i32 or i64.
I'll work on trying to recover some of the test cases by removing sign extends in the backend when it's safe to do so with an understanding of the current legalizer capabilities.
This should fix PR30690.
llvm-svn: 318466
TransferDbgValues (capital 'T') is wired into ReplaceAllUsesWith, and
transferDbgValues (lowercase 't') is used elsewhere (e.g in Legalize).
Both functions should be doing the exact same thing. This patch
consolidates the logic into one place.
Differential Revision: https://reviews.llvm.org/D40104
llvm-svn: 318448
SelectionDAGBuilder::visitAlloca assumes alloca address space is 0, which is
incorrect for triple amdgcn---amdgiz and causes isel failure.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D40095
llvm-svn: 318392
Change the calculation for the desired ValueType for non-sign
extending loads, as in those cases we don't care about the
higher bits. This creates a smaller ExtVT and allows for such
combinations as:
(srl (zextload i16, [addr]), 8) -> (zextload i8, [addr + 1])
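In source terms, assuming a little-endian target, the combine corresponds to compiling something like this into a single byte load:
#include <cstdint>

// The shift of a 16-bit zero-extending load by 8 can be selected as a
// single i8 zextload from P + 1.
uint32_t highByte(const uint16_t *P) {
  return *P >> 8;
}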
Differential Revision: https://reviews.llvm.org/D40034
llvm-svn: 318390
The LatencyPriorityQueue doesn't currently check whether the SU being removed really exists in the Queue.
This method fails quietly when SU is not found and removes the last element from the Queue, leading to unexpected behavior.
Unfortunately, this only occurs on our custom target, with the custom scheduler. In our case, when remove() is invoked, it removes the wrong SU at the end of the Queue, which is only discovered later when VerifyScheduledDAG() is invoked and finds that some nodes were not scheduled at all.
As this is only reproducible with a lot of proprietary code, I'm hopeful this assert is straightforward enough to not necessitate a test.
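A minimal sketch of the kind of guard added (generic container code, not the scheduler itself):
#include <algorithm>
#include <cassert>
#include <vector>

// Removing an element that is not present should fail loudly instead of
// silently dropping whatever happens to sit at the end of the queue.
template <typename T>
static void removeChecked(std::vector<T> &Queue, const T &V) {
  auto I = std::find(Queue.begin(), Queue.end(), V);
  assert(I != Queue.end() && "element not in queue!");
  if (I == Queue.end())
    return; // stay safe in release builds as well
  std::swap(*I, Queue.back());
  Queue.pop_back();
}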
Patch by Ondrej Glasnak!
Differential Revision: https://reviews.llvm.org/D40084
llvm-svn: 318387
Summary:
Use use_nodbg_empty() rather than use_empty() in
MachineRegisterInfo::EmitLiveInCopies() when determining if a livein
register has any uses or not. Otherwise a single dbg.value can make us
generate different code, meaning -g would affect code generation.
Found when compiling code for my out-of-tree target. Unfortunately I
haven't been able to reproduce the problem on X86 or any of the other
in-tree targets that I tried, so no test case.
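A tiny self-contained model of the idea (not LLVM's actual data structures):
#include <vector>

struct Use { bool IsDebug; };

// A live-in register needs a copy only if it has at least one non-debug use;
// counting debug uses would let -g change the generated code.
static bool useNodbgEmpty(const std::vector<Use> &Uses) {
  for (const Use &U : Uses)
    if (!U.IsDebug)
      return false;
  return true;
}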
Reviewers: MatzeB
Reviewed By: MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39044
llvm-svn: 318382
For example, this is currently reachable by X86 if you use a masked store intrinsic with a v1iX type.
Using a fatal error seems like a better user experience if someone were to encounter this on a release build. There are several other similar places that have been converted from unreachable to fatal error previously.
llvm-svn: 318379
processDbgDeclares assumes the pointer size is the same for different addr spaces.
It uses the pointer size for addr space 0 for all pointers, which causes an assertion
in stripAndAccumulateInBoundsConstantOffsets for amdgcn---amdgiz, since a
pointer in addr space 5 has a different size than one in addr space 0.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D40085
llvm-svn: 318370
Summary:
This patch adds a LLVM_ENABLE_GISEL_COV which, like LLVM_ENABLE_DAGISEL_COV,
causes TableGen to instrument the generated table to collect rule coverage
information. However, LLVM_ENABLE_GISEL_COV goes a bit further than
LLVM_ENABLE_DAGISEL_COV. The information is written to files
(${CMAKE_BINARY_DIR}/gisel-coverage-* by default). These files can then be
concatenated into ${LLVM_GISEL_COV_PREFIX}-all after which TableGen will
read this information and use it to emit warnings about untested rules.
This technique could also be used by SelectionDAG and can be further
extended to detect hot rules and give them priority over colder rules.
Usage:
* Enable LLVM_ENABLE_GISEL_COV in CMake
* Build the compiler and run some tests
* cat gisel-coverage-[0-9]* > gisel-coverage-all
* Delete lib/Target/*/*GenGlobalISel.inc*
* Build the compiler
Known issues:
* ${LLVM_GISEL_COV_PREFIX}-all must be generated as a manual
step due to a lack of a portable 'cat' command. It should be the
concatenation of all ${LLVM_GISEL_COV_PREFIX}-[0-9]* files.
* There's no mechanism to discard coverage information when the ruleset
changes
Depends on D39742
Reviewers: ab, qcolombet, t.p.northover, aditya_nandakumar, rovka
Reviewed By: rovka
Subscribers: vsk, arsenm, nhaehnle, mgorny, kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D39747
llvm-svn: 318356
Due to limited integer precision, we might end up with a numerator greater than the
denominator in the branch probability scaling. Add a check to prevent this from happening.
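A minimal sketch of such a guard (plain arithmetic, not the patch itself):
#include <algorithm>
#include <cstdint>

// After integer scaling, rounding can push the numerator past the
// denominator; clamping keeps the probability at most 1.
static uint32_t clampNumerator(uint64_t ScaledNum, uint32_t Denominator) {
  return static_cast<uint32_t>(std::min<uint64_t>(ScaledNum, Denominator));
}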
llvm-svn: 318353
In constructAbstractSubprogramScopeDIE there can be a potential mismatch
between `this` and the CU of ContextDIE when a scope is shared between
two DISubprograms belonging to a different CU. In that case, `this` is
the CU that was specified in the IR, but the CU of ContextDIE is that of
the first subprogram that was emitted. This patch fixes the mismatch by
looking up the CU of ContextDIE, and switching to use that.
This fixes PR35212 (https://bugs.llvm.org/show_bug.cgi?id=35212)
Patch by Philip Craig!
Differential revision: https://reviews.llvm.org/D39981
llvm-svn: 318289
Legalization Artifacts are all those insts that are there to make the
type system happy. Currently, the target needs to say all combinations
of extends and truncs are legal and there's no way of verifying that
post legalization, we only have *truly* legal instructions. This patch
roughly changes the legalization algorithm to process all illegal insts
in one go, and then separately process all truncs/extends that were added to
satisfy the type constraints, trying to combine trivial cases
until they converge. This has the added benefit that the target
LegalizerInfo only needs to say which truncs and extends are okay, and the
artifact combiner will combine away the other exts and truncs.
Updated legalization algorithm to roughly the following pseudo code.
WorkList Insts, Artifacts;
collect_all_insts_and_artifacts(Insts, Artifacts);
do {
for (Inst in Insts)
legalizeInstrStep(Inst, Insts, Artifacts);
for (Artifact in Artifacts)
tryCombineArtifact(Artifact, Insts, Artifacts);
} while(!Insts.empty());
Also, wrote a simple wrapper equivalent to SetVector, except for
erasing, it avoids moving all elements over by one and instead just
nulls them out.
llvm-svn: 318210
This patch peels off the top case in a switch statement into a branch if the
probability exceeds a threshold. This will help branch prediction and
avoid the extra compares when lowering into a chain of branches.
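Schematically, for a switch whose hottest case exceeds the threshold (illustrative source, not from the patch):
// Before peeling: every lookup goes through the full switch lowering.
int beforePeel(int X) {
  switch (X) {
  case 5: return 50; // suppose the profile says X == 5 ~90% of the time
  case 1: return 10;
  case 2: return 20;
  default: return 0;
  }
}

// After peeling: the hot case becomes a single well-predicted branch and
// the remaining cases keep the normal switch lowering.
int afterPeel(int X) {
  if (X == 5)
    return 50;
  switch (X) {
  case 1: return 10;
  case 2: return 20;
  default: return 0;
  }
}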
Differential Revision: http://reviews.llvm.org/D39262
llvm-svn: 318202
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.
This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)
LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.
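For reference, the instrumentation hooks have the usual GCC signatures:
// Called on function entry/exit when instrumentation is in effect.
extern "C" void __cyg_profile_func_enter(void *ThisFn, void *CallSite);
extern "C" void __cyg_profile_func_exit(void *ThisFn, void *CallSite);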
Differential Revision: https://reviews.llvm.org/D39287
llvm-svn: 318195
Summary:
Bypass of slow divs based on operand values is currently disabled for
-Os. Do the same when profile summary is available and the working set
size of the application is huge. This is similar to how loop peeling is
guarded by hasHugeWorkingSetSize. In the div bypass case, the generated
extra code (and the extra branch) tends to outweigh the benefits of the
bypass. This results in noticeable performance improvement on an
internal application.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39992
llvm-svn: 318179
TargetLowering::LowerCallTo assumes that sret value type corresponds to a
pointer in default address space, which is incorrect, since sret value type
should correspond to a pointer in alloca address space, which may not
be the default address space. This causes assertion for amdgcn target
in amdgiz environment.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D39996
llvm-svn: 318167
For now at least. We clearly need some kind of comdat or
linkonce_odr support for wasm but currently COMDAT is not
supported.
Disable COMDAT support in the same way we do for Mach-O. This
also causes clang not to generate COMDATs.
Differential Revision: https://reviews.llvm.org/D39873
llvm-svn: 318123
CodeGenPrepare sinks address computations from one basic block to another
and attempts to reuse address computations that have already been sunk. If
the same address computation appears twice with the first instance as an
operand of a load whose result is an operand to a simplifable select,
CodeGenPrepare simplifies the select and recursively erases the now dead
instructions. CodeGenPrepare then attempts to use the erased address
computation for the second load.
Fix this by erasing the cached address value if it has zero uses before
looking for the address value in the sunken address map.
This partially resolves PR35209.
Thanks to Alexander Richardson for reporting the issue!
Reviewers: john.brawn
Differential Revision: https://reviews.llvm.org/D39841
llvm-svn: 318032
Allow a pattern rewriter to be installed in CodeGenDAGPatterns and use it to
correct situations where SelectionDAG and GlobalISel disagree on
representation. For example, it would rewrite:
(sextload:i32 $ptr)<<unindexedload>><<sextload>><<sextloadi16>
to:
(sext:i32 (load:i16 $ptr)<<unindexedload>>)
I'd have preferred to replace the fragments and have the expansion happen
naturally as part of PatFrag expansion but the type inferencing system can't
cope with loads of types narrower than those mentioned in register classes.
This is because the SDTCisInt's on the sext constrain both the result and
operand to the 'legal' integer types (where legal is defined as 'a register
class can contain the type') which immediately rules the narrower types out.
Several targets (those with only one legal integer type) would then go on to
crash on the SDTCisOpSmallerThanOp<> when it removes all the possible types
for the result of the extend.
Also, improve isObviouslySafeToFold() slightly to automatically return true for
neighbouring instructions. There can't be any re-ordering problems if
re-ordering isn't happening. We'll need to improve it further to handle
sign/zero-extending loads when the extend and load aren't immediate neighbours
though.
llvm-svn: 317971
This is a fix for a bug in r317947. We were supposed to check that all the indices are constant 0, but instead we only made sure that indices that are constant are 0. Non-constant indices were being ignored.
llvm-svn: 317950
Currently we can only get a uniform base from a simple GEP with 2 operands. This causes us to miss address folding opportunities for simple global array accesses as the test case shows.
This patch adds support for larger GEPs if the other indices are 0 since those don't require any additional computations to be inserted.
We may also want to handle constant splats of zero here, but I'm leaving that for future work when I have a real world example.
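For example (illustrative source, not a test from the patch), indexing into a global multidimensional array yields a GEP with several indices where all but the final one are constant zero, so the global itself can still serve as the uniform base:
// g[0][I] becomes (gep @g, 0, 0, %I); the leading zero indices add no
// computation, so the base remains the plain pointer to g.
int g[16][16];
int loadElt(int I) { return g[0][I]; }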
Differential Revision: https://reviews.llvm.org/D39911
llvm-svn: 317947
Summary:
The associated debug value is updated when the virtual source register
of a copy is completely eliminated and replaced with a rematerialized
value in the def'd register of the copy. As the debug value is now
associated with another register it also needs to be moved; otherwise
the debug value isn't valid.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: MatzeB, llvm-commits, qcolombet
Differential Revision: https://reviews.llvm.org/D38024
llvm-svn: 317880
* The method getRegAllocationHints() is now of bool type instead of void. If
true is returned, regalloc (AllocationOrder) will *only* try to allocate the
hints, as opposed to merely trying them before non-hinted registers.
* TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with
an increase in number of LOCRs.
In this case, it is desirable to force the hints even though there is a slight
increase in spilling, because if a non-hinted register were allocated,
the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR
(Load On Condition) SystemZ instruction must have both operands in either the
low or high part of the 64 bit register.
Reviewers: Quentin Colombet and Ulrich Weigand
https://reviews.llvm.org/D36795
llvm-svn: 317879
Summary: This fixes failure in CodeGen/AArch64/global-merge-group-by-use.ll uncovered by D39245.
Reviewers: ab, asl
Reviewed By: ab
Subscribers: aemerson, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D39635
llvm-svn: 317817
In Rust, a trait can be implemented for any type, and if a trait
object pointer is used for the type, then a virtual table will be
emitted for that trait/type combination.
We would like debuggers to be able to inspect trait objects, which
requires finding the concrete type associated with a given vtable.
This patch changes LLVM so that any type can be passed to
replaceVTableHolder. This allows the Rust compiler to emit the needed
debug info -- associating a vtable with the concrete type for which it
was emitted.
This is a DWARF extension: DWARF only specifies the meaning of
DW_AT_containing_type in one specific situation. This style of DWARF
extension is routine, though, and LLVM already has one such case for
DW_AT_containing_type.
Patch by Tom Tromey!
Differential Revision: https://reviews.llvm.org/D39503
llvm-svn: 317730
This patch implements Chandler's idea [0] for supporting languages that
require support for infinite loops with side effects, such as Rust, providing
part of a solution to bug 965 [1].
Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual
effect, but which appears to optimization passes to have obscure side effects,
such that they don't optimize away loops containing it. It also teaches
several optimization passes to ignore this intrinsic, so that it doesn't
significantly impact optimization in most cases.
As discussed on llvm-dev [2], this patch is the first of two major parts.
The second part, to change LLVM's semantics to have defined behavior
on infinite loops by default, with a function attribute for opting into
potential-undefined-behavior, will be implemented and posted for review in
a separate patch.
[0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html
[1] https://bugs.llvm.org/show_bug.cgi?id=965
[2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html
Differential Revision: https://reviews.llvm.org/D38336
llvm-svn: 317729
This reverts r317579, originally committed as r317100.
There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.
When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
int a, c, d, e, f, g, h, i, j, k, l, m;
void n(int o, int *b) {
if (g)
f = 0;
for (; f < o; f++) {
m = a;
if (l > j * k > i)
j = i = k = d;
h = b[c] - e;
}
}
We get assembly that looks like this:
; BB#1: ; %if.then
Lloh3:
adrp x9, _f@GOTPAGE
Lloh4:
ldr x9, [x9, _f@GOTPAGEOFF]
mov w8, wzr
Lloh5:
str wzr, [x9]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.lt LBB0_3
b LBB0_7
LBB0_2: ; %entry.if.end_crit_edge
Lloh6:
adrp x8, _f@GOTPAGE
Lloh7:
ldr x8, [x8, _f@GOTPAGEOFF]
Lloh8:
ldr w8, [x8]
stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill
.cfi_def_cfa_offset 16
.cfi_offset w19, -8
.cfi_offset w20, -16
cmp w8, w0
b.ge LBB0_7
LBB0_3: ; %for.body.lr.ph
Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.
llvm-svn: 317726
Previously, hasSideEffects was ? for TargetOpcode::PHI and would be inferred
as 1. D37065 sets the previously inferred properties explicitly. This patch sets
hasSideEffects=0 for PHI, as it is for G_PHI. MachineInstr::isSafeToMove has
been updated so it still returns false for PHI.
Additionally, HexagonBitSimplify relied on a PHI node having the
hasUnmodeledSideEffects property. This patch fixes that assumption.
Differential Revision: https://reviews.llvm.org/D37097
llvm-svn: 317721
In 2010 a commit with no testcase and no further explanation
explicitly disabled the handling of inlined variables in
EmitFuncArgumentDbgValue(). I don't think there is a good reason for
this any more and re-enabling this adds debug locations for variables
associated with an LLVM function argument in functions that are
inlined into the first basic block. The only downside of doing this is
that we may insert a DBG_VALUE before the inlined scope, but (1) this
could be filtered out later, and (2) LiveDebugValues will not
propagate it into subsequent basic blocks if they don't dominate the
variable's lexical scope, so this seems like a small price to pay.
rdar://problem/26228128
llvm-svn: 317702
Some of the AMDGPU stack addressing modes require knowing the sign
bit is zero. We used to accomplish this by custom lowering
frame indexes, and then putting an AssertZext around a
TargetFrameIndex. This required specifically looking for
the AssertZext + frame index pattern which was moderately
disgusting. The same could probably be accomplished
with a target specific node, but would still
require special handling of frame indexes.
llvm-svn: 317671
This patch enables the folding of address computation into a
memory instruction in the case where the address is represented by a Phi node.
The inputs of the Phi node might differ in their base register.
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317665
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.
llvm-svn: 317647
Reland r317100 with a minor fix regarding the ComputeCommonTailLength function in
BranchFolding.cpp. Skipping the top block of CFI instructions needs to be executed at
several more return points in ComputeCommonTailLength().
Original r317100 message:
"Correct dwarf unwind information in function epilogue for X86"
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
llvm-svn: 317579
This changes the interface of how targets describe how to legalize, see
the below description.
1. Interface for targets to describe how to legalize.
In GlobalISel, the API in the LegalizerInfo class is the main interface
for targets to specify which types are legal for which operations, and
what to do to turn illegal type/operation combinations into legal ones.
For each operation the type sizes that can be legalized without having
to change the size of the type are specified with a call to setAction.
This isn't different to how GlobalISel worked before. For example, for a
target that supports 32 and 64 bit adds natively:
for (auto Ty : {s32, s64})
setAction({G_ADD, 0, Ty}, Legal);
or for a target that needs a library call for a 32 bit division:
setAction({G_SDIV, s32}, Libcall);
The main conceptual change to the LegalizerInfo API, is in specifying
how to legalize the type sizes for which a change of size is needed. For
example, in the above example, how to specify how all types from i1 to
i8388607 (apart from s32 and s64 which are legal) need to be legalized
and expressed in terms of operations on the available legal sizes
(again, i32 and i64 in this case). Before, the implementation only
allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0,
s128}, NarrowScalar)). A worse limitation was that if you'd wanted to
specify how to legalize all the sized types as allowed by the LLVM-IR
LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times
and probably would need a lot of memory to store all of these
specifications.
Instead, the legalization actions that need to change the size of the
type are specified now using a "SizeChangeStrategy". For example:
setLegalizeScalarToDifferentSizeStrategy(
G_ADD, 0, widenToLargerAndNarrowToLargest);
This example indicates that for type sizes for which there is a larger
size that can be legalized towards, do it by Widening the size.
For example, G_ADD on s17 will be legalized by first doing WidenScalar
to make it s32, after which it's legal.
The "NarrowToLargest" indicates what to do if there is no larger size
that can be legalized towards. E.g. G_ADD on s92 will be legalized by
doing NarrowScalar to s64.
Another example, taken from the ARM backend is:
for (unsigned Op : {G_SDIV, G_UDIV}) {
setLegalizeScalarToDifferentSizeStrategy(Op, 0,
widenToLargerTypesUnsupportedOtherwise);
if (ST.hasDivideInARMMode())
setAction({Op, s32}, Legal);
else
setAction({Op, s32}, Libcall);
}
For this example, G_SDIV on s8, on a target without a divide
instruction, would be legalized by first doing action (WidenScalar,
s32), followed by (Libcall, s32).
The same principle is also followed for when the number of vector lanes
on vector data types need to be changed, e.g.:
setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal);
setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal);
setLegalizeVectorElementToDifferentSizeStrategy(
G_ADD, 0, widenToLargerTypesUnsupportedOtherwise);
As currently implemented here, vector types are legalized by first
making the vector element size legal, followed by then making the number
of lanes legal. The strategy to follow in the first step is set by a
call to setLegalizeVectorElementToDifferentSizeStrategy, see example
above. The strategy followed in the second step
"moreToWiderTypesAndLessToWidest" (see code for its definition),
indicating that vectors are widened to more elements so they map to
natively supported vector widths, or when there isn't a legal wider
vector, split the vector to map it to the widest vector supported.
Therefore, for the above specification, some example legalizations are:
* getAction({G_ADD, LLT::vector(3, 3)})
returns {WidenScalar, LLT::vector(3, 8)}
* getAction({G_ADD, LLT::vector(3, 8)})
then returns {MoreElements, LLT::vector(8, 8)}
* getAction({G_ADD, LLT::vector(20, 8)})
returns {FewerElements, LLT::vector(16, 8)}
2. Key implementation aspects.
How to legalize a specific (operation, type index, size) tuple is
represented by mapping intervals of integers representing a range of
size types to an action to take, e.g.:
setScalarAction({G_ADD, LLT:scalar(1)},
{{1, WidenScalar}, // bit sizes [ 1, 31[
{32, Legal}, // bit sizes [32, 33[
{33, WidenScalar}, // bit sizes [33, 64[
{64, Legal}, // bit sizes [64, 65[
{65, NarrowScalar} // bit sizes [65, +inf[
});
Please note that most of the code to do the actual lowering of
non-power-of-2 sized types is currently missing, this is just trying to
make it possible for targets to specify what is legal, and how non-legal
types should be legalized. Probably quite a bit of further work is
needed in the actual legalizing and the other passes in GlobalISel to
support non-power-of-2 sized types.
I hope the documentation in LegalizerInfo.h and the examples provided in the
various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explains well
enough how this is meant to be used.
This drops the need for LLT::{half,double}...Size().
Differential Revision: https://reviews.llvm.org/D30529
llvm-svn: 317560
This patch disables the handling of selects in the optimization extending
the scope of optimizeMemoryInst.
The optimization itself is disabled by default.
The idea here is just to switch on the optimization step by step.
Specifically, the optimization will first be enabled only for Phi nodes,
then select instructions will be added.
In case someone complains about performance, it will be easier to
detect which part of the optimizations is responsible for that.
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317555
Summary:
Print %subreg.<subregidxname> instead of just the subregister
index when printing immediate operands corresponding to subreg
indices in INSERT_SUBREG, EXTRACT_SUBREG, SUBREG_TO_REG and
REG_SEQUENCE.
Reviewers: qcolombet, MatzeB
Reviewed By: MatzeB
Subscribers: nhaehnle, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D39696
llvm-svn: 317513
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html
and again more recently:
http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html
...this is a step in cleaning up our fast-math-flags implementation in IR to better match
the capabilities of both clang's user-visible flags and the backend's flags for SDNode.
As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the
'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic
reassociation - 'AllowReassoc'.
We're also adding a bit to allow approximations for library functions called 'ApproxFunc'
(this was initially proposed as 'libm' or similar).
...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did
look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits),
but that's apparently already used for other purposes. Also, I don't think we can just
add a field to FPMathOperator because Operator is not intended to be instantiated.
We'll defer movement of FMF to another day.
We keep the 'fast' keyword. I thought about removing that, but seeing IR like this:
%f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2
...made me think we want to keep the shortcut synonym.
Finally, this change is binary incompatible with existing IR as seen in the
compatibility tests. This statement:
"Newer releases can ignore features from older releases, but they cannot miscompile
them. For example, if nsw is ever replaced with something else, dropping it would be
a valid way to upgrade the IR."
( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility )
...provides the flexibility we want to make this change without requiring a new IR
version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will
fail to optimize some previously 'fast' code because it's no longer recognized as
'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'.
Note: an inter-dependent clang commit to use the new API name should closely follow
commit.
Differential Revision: https://reviews.llvm.org/D39304
llvm-svn: 317488
This is an implementation of PR26223.
Currently the optimizeMemoryInst optimization tries to fold address computation
if all possible ways to compute the address are of the form
baseGV + base + scale * Index + offset
where scale and offset are constants and baseGV, base and Index are exactly
the same instructions if defined.
The patch extends this optimization to allow different bases. In this case
it tries to find/build a Phi node merging all possible bases and use this Phi node
as a base for sunk address computation. Also it supports Select instruction on
the way.
The main motivation for this scope extension is GCRelocateInst.
If there is a relocation of derived pointer it will be represented as relocation of base + offset.
Also there will be a Phi node merging address computation for relocated derived pointer
and the derived pointer itself. If we have a Phi node merging the original base and the relocated base,
and we can fold the address computation of the derived pointer, then we can potentially reduce
the code size and eliminate the Phi node for the derived pointer. The latter can have a positive impact on the
register allocator.
Reviewers: efriedma, dberlin, mkazantsev, reames, john.brawn
Reviewed By: john.brawn
Subscribers: javed.absar, john.brawn, dneilson, llvm-commits
Differential Revision: https://reviews.llvm.org/D36073
llvm-svn: 317429
This header already includes a CodeGen header and is implemented in
lib/CodeGen, so move the header there to match.
This fixes a link error with modular codegeneration builds - where a
header and its implementation are circularly dependent and so need to be
in the same library, not split between two like this.
llvm-svn: 317379
This preserves the debug info for the cast operation in the original location.
rdar://problem/33460652
Reapplied r317340 with the test moved into an ARM-specific directory.
llvm-svn: 317375
DenseMaps require the definition of a type to be available when using a
pointer to that type as a key to know how many bits are available for
tombstone/etc.
llvm-svn: 317360
Make doSpillCalleeSavedRegs a member function, instead of passing most of the
members of PEI as arguments.
Differential Revision: https://reviews.llvm.org/D35642
llvm-svn: 317309
mir-canon (MIRCanonicalizerPass) is a pass designed to reorder instructions and
rename operands so that two similar programs will diff more cleanly after being
run through mir-canon than they would otherwise. This project is still a work
in progress and there are ideas still being discussed for improving diff
quality.
M include/llvm/InitializePasses.h
M lib/CodeGen/CMakeLists.txt
M lib/CodeGen/CodeGen.cpp
A lib/CodeGen/MIRCanonicalizerPass.cpp
llvm-svn: 317285
Summary:
Currently the block frequency analysis is an approximation for irreducible
loops.
The new irreducible loop metadata is used to annotate the irreducible loop
headers with their header weights based on the PGO profile (currently this is
approximated to be evenly weighted) and to help improve the accuracy of the
block frequency analysis for irreducible loops.
This patch is a basic support for this.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D39028
llvm-svn: 317278
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage
This reverts commit eea333c33fa73ad225ef28607795984829f65688.
llvm-svn: 317213
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.
See the discussion in PR33325 for more details.
Reviewers: spatel
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D39456
llvm-svn: 317211
When splitting a large load to smaller legally-typed loads, the last load should be padded to reach the size of the previous one so a CONCAT_VECTORS node could reunite them again.
The code currently pads the last load to reach the size of the first load (instead of the previous).
Differential Revision: https://reviews.llvm.org/D38495
Change-Id: Ib60b55ed26ce901fabf68108daf52683fbd5013f
llvm-svn: 317206
As of today we only use .cfi_offset to specify the offset of a CSR, but
we never use .cfi_restore when the CSR is restored.
If we want to perform a more advanced type of shrink-wrapping, we need
to use .cfi_restore in order to switch the CFI state between blocks.
This patch only aims at adding support for the directive.
Differential Revision: https://reviews.llvm.org/D36114
llvm-svn: 317199
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Changed CFI instructions so that they:
- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal
Added CFIInstrInserter pass:
- analyzes each basic block to determine cfa offset and register valid at
its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D35844
llvm-svn: 317100
Change the map key from DIFile* to the absolute path string. Computing
the absolute path isn't expensive because we already have a map that
caches the full path keyed on DIFile*.
llvm-svn: 317041
Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725
If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like.
llvm-svn: 316967
Summary:
For reference, see: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116589.html
This patch fleshes out the instruction class hierarchy with respect to atomic and
non-atomic memory intrinsics. With this change, the relevant part of the class
hierarchy becomes:
IntrinsicInst
  -> MemIntrinsicBase (methods-only class)
    -> MemIntrinsic (non-atomic intrinsics)
      -> MemSetInst
      -> MemTransferInst
        -> MemCpyInst
        -> MemMoveInst
    -> AtomicMemIntrinsic (atomic intrinsics)
      -> AtomicMemSetInst
      -> AtomicMemTransferInst
        -> AtomicMemCpyInst
        -> AtomicMemMoveInst
    -> AnyMemIntrinsic (both atomicities)
      -> AnyMemSetInst
      -> AnyMemTransferInst
        -> AnyMemCpyInst
        -> AnyMemMoveInst
This involves some class renaming:
ElementUnorderedAtomicMemCpyInst -> AtomicMemCpyInst
ElementUnorderedAtomicMemMoveInst -> AtomicMemMoveInst
ElementUnorderedAtomicMemSetInst -> AtomicMemSetInst
A script for doing this renaming in downstream trees is included below.
An example of where the Any* classes should be used in LLVM is when reasoning
about the effects of an instruction (ex: aliasing).
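For instance, a pass reasoning about memory effects can handle both atomicities in one check (a usage sketch; the surrounding context is assumed):

  // AnyMemTransferInst matches memcpy/memmove as well as their
  // element-wise atomic counterparts, so aliasing questions about the
  // source and destination can be asked once, regardless of atomicity.
  if (const auto *MTI = dyn_cast<AnyMemTransferInst>(&I)) {
    const Value *Src = MTI->getRawSource();
    const Value *Dst = MTI->getRawDest();
    // ... query alias analysis with Src/Dst here ...
  }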
---
Script for renaming AtomicMem* classes:
PREFIXES="[<,([:space:]]"
CLASSES="MemIntrinsic|MemTransferInst|MemSetInst|MemMoveInst|MemCpyInst"
SUFFIXES="[;)>,[:space:]]"
REGEX="(${PREFIXES})ElementUnorderedAtomic(${CLASSES})(${SUFFIXES})"
REGEX2="visitElementUnorderedAtomic(${CLASSES})"
# Find every file mentioning one of the old class names.
FILES=$( grep -E "(${REGEX}|${REGEX2})" -r . | tr ':' ' ' | awk '{print $1}' | sort | uniq )
SED_SCRIPT="s~${REGEX}~\1Atomic\2\3~g"
SED_SCRIPT2="s~${REGEX2}~visitAtomic\1~g"
for f in $FILES; do
  echo "Processing: $f"
  # BSD sed syntax for in-place editing; use `sed -i.bak` with GNU sed.
  sed -i ".bak" -E "${SED_SCRIPT};${SED_SCRIPT2}" "$f"
done
Reviewers: sanjoy, deadalnix, apilipenko, anna, skatkov, mkazantsev
Reviewed By: sanjoy
Subscribers: hfinkel, jholewinski, arsenm, sdardis, nhaehnle, JDevlieghere, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38419
llvm-svn: 316950
- Targets that want to support memcmp expansions now return the list of
supported load sizes.
- Expansion codegen does not assume that all power-of-two load sizes
smaller than the max load size are valid. For example, this is not the
case for x86 (32-bit) + SSE2.
Fixes PR34887.
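A sketch of the idea (the hook name and shape here are assumptions for illustration, not the committed interface):

  // x86 (32-bit) + SSE2 can load 16, 4, 2, or 1 bytes per step, but not 8,
  // so the expansion must be driven by an explicit list rather than by
  // assuming every power of two below the maximum is available.
  SmallVector<unsigned, 4> getSupportedMemCmpLoadSizes() {
    return {16, 4, 2, 1}; // descending, in the order the expansion tries them
  }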
llvm-svn: 316905
Introduce an isConstOrDemandedConstSplat helper function that can recognise a constant splat build vector for at least the demanded elts we care about.
llvm-svn: 316866
For cases where we know the floating point representations match the bitcasted integer equivalent, allow bitcasting to these types.
This is especially useful for the X86 floating point compare results which return all/zero bits but as a floating point type.
Differential Revision: https://reviews.llvm.org/D39289
llvm-svn: 316831
In DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with extload even if sextload is not supported by the target. Then:
- if sext is the only user of the extload, there is no big difference: no harm, no benefit.
- if the extload has more than one user, the combined sextload may block the extload from combining with other zexts, causing extra zext instructions to be generated, as demonstrated by the attached test case.
This patch adds the constraint that when sextload is not supported by the target, sext can only be combined with extload if it is the only user of the extload.
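In DAGCombiner terms, the constraint amounts to roughly the following (a sketch, not the verbatim diff; LN0/VT are the usual combiner locals):

  // Only fold sext(extload x) -> sextload x when sextload is legal for the
  // target, or when the sext is the extload's sole user, so that a shared
  // extload stays available for other (e.g. zext) combines.
  if (!TLI.isLoadExtLegal(ISD::SEXTLOAD, VT, LN0->getMemoryVT()) &&
      !N0.hasOneUse())
    return SDValue();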
Differential Revision: https://reviews.llvm.org/D39108
llvm-svn: 316802
Not having the subclass data on MemIntrinsicSDNodes meant it was possible
to try to fold 2 nodes with the same operands but differing MMO flags. This
would trip an assertion when trying to refine the alignment between the 2
MachineMemOperands.
Differential Revision: https://reviews.llvm.org/D38898
llvm-svn: 316737
Summary:
Currently we skip merging when extra moves may be added in the header of the switch instead of the case block, if the case block is used as an incoming
block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block, then extra moves are likely not added by ISel, so there is no need to skip merging in this case.
Reviewers: efriedma, junbuml, davidxl, hfinkel, qcolombet
Reviewed By: efriedma
Subscribers: dberlin, kuhar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37343
llvm-svn: 316711
Summary:
This seems to be the only place in llvm we directly call qsort. We can replace
this with a call to array_pod_sort. Also minor cleanup of the sorting function.
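For reference, array_pod_sort takes a qsort-style comparison function over pointers to elements, so call sites stay close to the original (a self-contained usage sketch, not from this patch):

  #include "llvm/ADT/STLExtras.h"
  #include "llvm/ADT/SmallVector.h"

  static int compareValues(const unsigned *L, const unsigned *R) {
    return (*L < *R) ? -1 : (*L > *R) ? 1 : 0;
  }

  void sortValues(llvm::SmallVectorImpl<unsigned> &V) {
    // Unlike raw qsort, array_pod_sort is type-safe at the call site, and it
    // avoids the per-type template instantiation (code bloat) of std::sort.
    llvm::array_pod_sort(V.begin(), V.end(), compareValues);
  }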
Reviewers: bkramer, Eugene.Zelenko, rafael
Reviewed By: bkramer
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D39214
llvm-svn: 316671
Summary: Make sure shifts are legal/specified by the LegalizerInfo before creating them
Reviewers: qcolombet, dsanders, rovka, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39264
llvm-svn: 316602
Compute the actual decomposition only after deciding whether to expand
or not. Otherwise, it's easy to make the compiler OOM with
`memcpy(dst, src, 0xffffffffffffffff);`, which typically happens if
someone mistakenly passes a negative value. Add a test.
This reverts commit f8fc02fbd4ab33383c010d33675acf9763d0bd44.
llvm-svn: 316567
Duplicated code found in three places has been put into a new static function:

  /// Given a Count of resource usage and a Latency value, return true if a
  /// SchedBoundary becomes resource limited.
  static bool checkResourceLimit(unsigned LFactor, unsigned Count,
                                 unsigned Latency) {
    return (int)(Count - (Latency * LFactor)) > (int)LFactor;
  }
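A worked example of the threshold (values invented for illustration):

  // With LFactor = 2 and Latency = 12: Count = 40 gives 40 - 24 = 16 > 2,
  // so the boundary is resource limited; Count = 25 gives 25 - 24 = 1 <= 2,
  // so it is not.
  bool Limited = checkResourceLimit(/*LFactor=*/2, /*Count=*/40, /*Latency=*/12);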
Review: Florian Hahn, Matthias Braun
https://reviews.llvm.org/D39235
llvm-svn: 316560
This code added in r297930 assumed that it could create
a select with a condition type that is just an integer
bitcast of the selected type. For AMDGPU any vselect is
going to be scalarized (although the vector types are legal),
and all select conditions must be i1 (the same as getSetCCResultType).
This logic doesn't really make sense to me, but there has never
really been a consistent policy on what the select
condition mask type is supposed to be. Try to extend
the logic to skip the transform for condition types
that aren't setccs. It doesn't seem quite right to me though,
but checking conditions that seem more sensible (like whether the
vselect is going to be expanded) doesn't work, since this
seems to depend on that as well.
llvm-svn: 316554
Similar to how llvm::salvageDebugInfo hooks into InstCombine, this adds
a hook that can be invoked before an SDNode that is associated with an
SDDbgValue is erased to capture the effect of the deleted node in a
DIExpression.
The motivating example is an SDDbgValue attached to an ADD operation
that gets folded into a LOAD+OFFSET operation.
rdar://problem/32121503
llvm-svn: 316525
This reverts commit r316417, which causes internal compiles to OOM.
Unfortunately, I don't have a self-contained test case, but I will follow up
with courbet.
llvm-svn: 316497
This updates the MIRPrinter to include the regclass when printing
virtual register defs, which is already valid syntax for the
parser. That is, given 64 bit %0 and %1 in a "gpr" regbank,
%1(s64) = COPY %0(s64)
would now be written as
%1:gpr(s64) = COPY %0(s64)
While this change alone introduces a bit of redundancy with the
registers block, it allows us to update the tests to be more concise
and understandable and brings us closer to being able to remove the
registers block completely.
Note: We generally only print the class in defs, but there is one
exception. If there are uses without any defs whatsoever, we'll print
the class on all uses. I'm not completely convinced this comes up in
meaningful machine IR, but for now the MIRParser and MachineVerifier
both accept that kind of stuff, so we don't want to have a situation
where we can print something we can't parse.
llvm-svn: 316479
Refactor ExpandMemcmp:
- Stop duplicating the logic for computing the sequence of loads to
generate (this was done in three different places); it is now done
only once in MemCmpExpansion::MemCmpExpansion().
- Add a FIXME to expose a bug with the computation of the number of loads
when not all sizes are loadable. For example, on X86-32 + SSE, possible
loads are {16,4,2,1} bytes. The current code considers that all loads
starting at MaxLoadSize are possible. This is not an issue right now as
vector loads are not enabled, so I'm not fixing the issue here to keep
the change as small as possible. I'm going to address this in a
subsequent revision, where I enable vector loads.
See https://bugs.llvm.org/show_bug.cgi?id=34887
Differential Revision: https://reviews.llvm.org/D38498
llvm-svn: 316417
Infrastructure for padding code with nop instructions in key places so that a performance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
This patch by itself is an NFC. Future patches will make use of this infrastructure to implement required policies for code padding.
Reviewers: aaboud, zvi, craig.topper, gadi.haber
Differential Revision: https://reviews.llvm.org/D34393
Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
This commit adds optimisation remarks for outlining which fire when a function
is successfully outlined.
To do this, OutlinedFunctions must now contain references to their Candidates.
Since the Candidates must still be sorted and worked on separately, this is
done by working on everything in terms of shared_ptrs to Candidates. This is
good; it means that we can easily move everything to outlining in terms of
the OutlinedFunctions rather than the individual Candidates. This is far more
intuitive than what's currently there!
(Remarks are output when a function is created for some group of Candidates.
In a later commit, all of the outlining logic should be rewritten so that we
loop over OutlinedFunctions rather than over Candidates.)
llvm-svn: 316396
This fixes a bug where we'd crash given code like the test-case from
https://bugs.llvm.org/show_bug.cgi?id=30792. Instead, we let the
offending clobber silently slide through.
This doesn't fully fix said bug, since the assembler will still complain
the moment it sees a crypto/fp/vector op, and we still don't diagnose
calls that require vector regs.
Differential Revision: https://reviews.llvm.org/D39030
llvm-svn: 316374
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.
This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.
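A sketch of the new case (assumed DAGCombiner context; BuildVectorSDNode::getSplatValue already looks through UNDEF lanes):

  // If both operands splat the same scalar, build one splat directly,
  // leaving a lane UNDEF only where the shuffle mask selects no element.
  SDValue SplatL = LHSBV->getSplatValue();
  SDValue SplatR = RHSBV->getSplatValue();
  if (SplatL && SplatL == SplatR) {
    SmallVector<SDValue, 8> Ops;
    for (int M : SVN->getMask())
      Ops.push_back(M < 0 ? DAG.getUNDEF(EltVT) : SplatL);
    return DAG.getBuildVector(VT, DL, Ops);
  }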
Differential Revision: https://reviews.llvm.org/D38696
llvm-svn: 316331
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810
This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4-byte Reload
Such sequences are created in 2 scenarios:
Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region splitting creates 2 intervals, the "by reg" and "by stack" intervals; a local interval is created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2, etc., until someone spills
Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2, etc., until someone spills
As compile time was a concern, I've added a flag to control whether we do cost calculations for local intervals we expect to be created (it's on by default for the X86 target, off for the rest).
Differential Revision: https://reviews.llvm.org/D35816
Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
We don't need to do any additional recursion; we just need to analyze the APInt stored in the node. This matches what the ValueTracking versions do for IR.
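The constant case is a direct read of the stored APInt (a sketch of the idea, assuming the usual Op/Known locals):

  // A ConstantSDNode's bits are fully known; no recursion is required.
  if (auto *C = dyn_cast<ConstantSDNode>(Op)) {
    Known.One = C->getAPIntValue();
    Known.Zero = ~Known.One;
  }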
llvm-svn: 316256
Summary:
We shouldn't recurse any further but it doesn't mean we shouldn't be able to give the known bits for a constant. The caller would probably like that we always return the right answer for a constant RHS. This matches what InstCombine does in this case.
I don't have a test case because this showed up while trying to revive D31724.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: arsenm, llvm-commits
Differential Revision: https://reviews.llvm.org/D38967
llvm-svn: 316255
Move the prune logic in pruneOverlaps to a new function, prune. This lets us
reuse the prune functionality, makes the code a bit more readable, and will
make it easier to emit remarks/debug statements for pruned functions.
llvm-svn: 316031
This commit moves the decrement logic for outlined functions into the class,
and makes OccurrenceCount private. It can now be accessed via
getOccurrenceCount().
This makes it more difficult to accidentally introduce bugs by incorrectly
decrementing the occurrence count on OutlinedFunctions.
llvm-svn: 316020
Cleanup to Candidate that moves all end index calculations into
Candidate.endIdx(). For the sake of consistency, StartIdx and Len are now
private members, and can be accessed with startIdx() and length() respectively.
llvm-svn: 316019
Summary:
It seems that a negative offset was accidentally allowed in D17967.
AFAICT a small negative offset should be valid (it always raises a segfault) on all archs that I'm aware of (especially x86, which is the only one with this optimization enabled), and such a case can be useful when loading hidden metadata from an object.
However, like the positive side, it should only be done within a certain limit.
For now, use the same limit as the positive side for the negative side.
A separate option can be added if the need arises.
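For illustration (this example is not from the patch), the kind of access this covers is a load of out-of-band metadata stored just below an object pointer:

  // Hypothetical layout: metadata lives immediately before the object, so
  // it is read through a small negative offset; a null object pointer
  // still faults, which keeps the load usable for implicit null checks.
  struct HiddenMeta { unsigned long TypeId; };
  inline unsigned long typeIdOf(const char *Obj) {
    return reinterpret_cast<const HiddenMeta *>(Obj)[-1].TypeId;
  }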
Reviewers: mcrosier, skatkov
Reviewed By: skatkov
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38925
llvm-svn: 315991
Minor addition and follow-up to r314773 and r311533: this adds more
debug messages to the type legalizer. For each node, it dumps
legalization info for the result and operand nodes, rather than just the
final legalized node.
Differential Revision: https://reviews.llvm.org/D38726
llvm-svn: 315904
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplest G_LOAD rules and select load
instructions using them. Further patches will add support for the predicates to
enable additional loads, as well as the stores.
The previous commit failed on MSVC due to a failure to convert an
initializer_list to a std::vector. Hopefully, MSVC will accept this version.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
llvm-svn: 315887
Summary:
iPTR is a pointer of subtarget-specific size to any address space. Therefore
type checks on this size derive the SizeInBits from a subtarget hook.
At this point, we can import the simplest G_LOAD rules and select load
instructions using them. Further patches will add support for the predicates to
enable additional loads, as well as the stores.
Depends on D37457
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37458
llvm-svn: 315885
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by the ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Hopefully this fixes the ambiguous constructor that a large number of bots reported.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
llvm-svn: 315869
Summary:
It's possible for a ComplexPattern to be used as an operator in a match
pattern. This is used by the load/store patterns in AArch64 to name the
suboperands returned by the ComplexPattern predicate so that they can be broken
apart and referenced independently in the result pattern.
This patch adds support for this in order to enable the import of load/store
patterns.
Depends on D37445
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: qcolombet
Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D37456
llvm-svn: 315863
TargetRegisterInfo::getMinimalPhysRegClass is actually pretty expensive
because it has to iterate over all the register classes.
Cache this information as we need it so that we limit how often it is computed.
Right now, we heavily rely on it, because this is how we get the mapping
for vregs defined by copies from physregs (i.e., the ones that are ABI
related).
Improve compile time by up to 10% for that pass.
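The caching presumably has roughly this shape (a sketch; names assumed):

  // Compute the minimal class of a physical register once, then reuse it.
  DenseMap<unsigned, const TargetRegisterClass *> PhysRegMinimalClass;

  const TargetRegisterClass *getMinimalClassCached(unsigned PhysReg,
                                                   const TargetRegisterInfo &TRI) {
    auto &Entry = PhysRegMinimalClass[PhysReg];
    if (!Entry)
      Entry = TRI.getMinimalPhysRegClass(PhysReg);
    return Entry;
  }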
NFC
llvm-svn: 315759