Summary:
`extsw` and `sldi` are supposed to be combined if they are in the same
BB in instruction selection phase. This patch handles the case where
extsw and sldi are not in the same BB.
Differential Revision: https://reviews.llvm.org/D63806
llvm-svn: 365430
The indirect call sequence on PPC requires that the TOC base register be saved
prior to the indirect call and restored after the call since the indirect call
may branch to a global entry point in another DSO which will update the TOC
base. Over the last couple of years, we have improved this to:
- be able to hoist TOC saves from loops (with changes to MachineLICM)
- avoid multiple saves when one dominates the other[s]
However, it is still possible to have multiple TOC saves dynamically in the
execution path if there is no dominance relationship between them.
This patch moves the TOC save to the prologue when one of the TOC saves is in a
block that post-dominates entry (i.e. it cannot be avoided) or if it is in a
block that is hotter than entry.
Differential revision: https://reviews.llvm.org/D63803
llvm-svn: 365232
Summary:
"ww" and "ws" are both constraint codes for VSX vector registers that
hold scalar double data. "ww" is preferred for float while "ws" is
preferred for double.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D64119
llvm-svn: 365106
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.
Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
llvm-svn: 365010
After implemented this hook, we will model the memory dependency in the scheduling dependency graph more precise,
and will have more opportunity to reorder the load/stores, as they didn't have the dependency at some condition
Differential Revision: https://reviews.llvm.org/D63804
llvm-svn: 364886
FeatureFusion bits was first introduced in
https://reviews.llvm.org/rL253724. for add/load integer fusion for P8.
The only use of `hasFusion` was https://reviews.llvm.org/rL255319.
However, this was removed later in https://reviews.llvm.org/rL280440.
So, there is NO any reference to fusion in code now.
Leaving it there is misleading and confusing, so remove it for now.
We can alwasy add back if we ever support fusion in the future.
llvm-svn: 364581
This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751
llvm-mc aborted when disassembling tabortdc.
This patch try to clean up TM related DAGs.
* Fixes the problem by remove explicit output of cr0, and put it as implicit def.
* Update int_ppc_tbegin pattern to accommodate the implicit def of cr0.
* Update the TCHECK operand and int_ppc_tcheck accordingly.
* Add some builtin test and disassembly tests.
* Remove unused CRRC0/crrc0
Differential Revision: https://reviews.llvm.org/D61935
llvm-svn: 364544
PPCMIPeephole::emitRLDICWhenLoweringJumpTables should return a bool
value to indicate optimization is conducted or not.
Differential Revision: https://reviews.llvm.org/D63801
llvm-svn: 364383
This was just an omission in the back end. We have had the instructions for both
single and double precision for a few HW generations, but never got around to
legalizing these.
Differential revision: https://reviews.llvm.org/D63634
llvm-svn: 364373
Summary:
In Secure PLT ABI, -fpic is similar to -fPIC. The differences are that:
* -fpic stores the address of _GLOBAL_OFFSET_TABLE_ in r30, while -fPIC stores .got2+0x8000.
* -fpic uses an addend of 0 for R_PPC_PLTREL24, while -fPIC uses 0x8000.
Reviewers: hfinkel, jhibbits, joerg, nemanjai, spetrovic
Reviewed By: jhibbits
Subscribers: adalava, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63563
llvm-svn: 364324
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
After r248261, the indentation switches, inside a namespace definition,
between indenting and not indenting one level in for that namespace; the
abomination occurs in the middle of a class definition. Fix that.
llvm-svn: 364133
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
llvm-svn: 363526
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.
The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D61228
llvm-svn: 363495
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
Implement necessary target hooks to enable MachinePipeliner for P9 only.
The pass is off by default, can be enabled with -ppc-enable-pipeliner for P9.
Differential Revision: https://reviews.llvm.org/D62164
llvm-svn: 363085
This reverts r362990 (git commit 374571301d)
This was causing linker warnings on Darwin:
ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.
llvm-svn: 363028
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
llvm-svn: 362990
This was found during HTM cleanup.
Adding a test for builtin_ttest would expose following issue.
*** Bad machine code: Illegal physical register for instruction ***
- function: test10
- basic block: %bb.0 entry (0xf0e57497b58)
- instruction: %5:crrc0 = TABORTWCI 0, $zero, 0
- operand 2: $zero
$zero is not a GPRC register.
LLVM ERROR: Found 1 machine code errors.
Differential Revision: https://reviews.llvm.org/D63079
llvm-svn: 362974
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
llvm-svn: 362774
Use the PPC vector min/max instructions for computing the corresponding
operation as these should be faster than the compare/select sequences
we currently emit.
Differential revision: https://reviews.llvm.org/D47332
llvm-svn: 362759
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
Summary:
This patch implements SDAG call lowering on AIX for functions
which only have parameters that could fit into GPRs.
Reviewers: hubert.reinterpretcast, syzaara
Differential Revision: https://reviews.llvm.org/D62823
llvm-svn: 362708
Generally speaking, we lower to an optimal rotate sequence for nodes visible in
the SDAG. However, there are instances where the two rotates are not visible at
ISEL time - most notably those in a very common sequence when lowering switch
statements to jump tables.
A common situation is a switch on a 32-bit integer. This value has to have the
upper 32 bits cleared and because jump table offsets are word offsets, the value
needs to be shifted left by 2 bits. We currently emit the clear and the left
shift as two separate instructions, but this is not needed as we can lower it to
a single RLDIC.
This patch just cleans that up.
Differential revision: https://reviews.llvm.org/D60402
llvm-svn: 362576
This is to address some of the problems in existing P9 resource modeling,
especially about the dispatching rules.
Instead of using a hypothetical DISPATCHER , we try to use the number of
actual dispatch slots, and define SchedWriteRes to model dispatch rules,
then update instruction classes according to dispatch rules.
All the dispatch rules and instruction classes update are made according
to POWER9 User Manual.
Differential Revision: https://reviews.llvm.org/D61873
llvm-svn: 362509