As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.
Differential Revision: https://reviews.llvm.org/D35007
llvm-svn: 307934
The issue is not if the value is pcrel. It is whether we have a
relocation or not.
If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.
llvm-svn: 307730
1. The available program storage region of the red zone to compilers is 288
bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
Differential Revision: https://reviews.llvm.org/D34337
llvm-svn: 307672
In the POWER9 instruction scheduler, SchedWriteRes for the simple integer instructions are misconfigured to use that of (costly) DFU instructions.
This results in surprisingly long instruction latency estimation and causes misbehavior in some optimizers such as if-conversion.
Differential Revision: https://reviews.llvm.org/D34869
llvm-svn: 307624
This patch reduces compilation time by avoiding redundant analysis while selecting instructions to create an immediate.
If the instruction count required to create the input number without rotate is 2, we do not need further analysis to find a shorter instruction sequence with rotate; rotate + load constant cannot be done by 1 instruction (i.e. getInt64CountDirectnever return 0).
This patch should not change functionality.
Differential Revision: https://reviews.llvm.org/D34986
llvm-svn: 307623
For this example:
float test (int *arr) {
return arr[2];
}
We currently generate the following code:
li r4, 8
lxsiwax f0, r3, r4
xscvsxdsp f1, f0
With this patch, we will now generate:
addi r3, r3, 8
lxsiwax f0, 0, r3
xscvsxdsp f1, f0
Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027
llvm-svn: 307553
On power 8 we sometimes insert swaps to deal with the difference between
Little-Endian and Big-Endian. The swap removal pass is supposed to clean up
these swaps. On power 9 we don't need this pass since we do not need to insert
the swaps in the first place.
Commiting on behalf of Stefan Pintilie.
Differential Revision: https://reviews.llvm.org/D34627
llvm-svn: 307185
This patch adds the exploitation for new power 9 instructions which extract
variable elements from vectors:
VEXTUBLX
VEXTUBRX
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
Differential Revision: https://reviews.llvm.org/D34032
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
llvm-svn: 307174
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign
extended vector extract elements where the indices used by the vector extract
are not correct. We can still use the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009
llvm-svn: 307169
This patch fixes a verification error with -verify-machineinstrs while expanding __tls_get_addr by not creating ADJCALLSTACKUP and ADJCALLSTACKDOWN if there is another ADJCALLSTACKUP in this basic block since nesting ADJCALLSTACKUP/ADJCALLSTACKDOWN is not allowed.
Here, ADJCALLSTACKUP and ADJCALLSTACKDOWN are created as a fence for instruction scheduling to avoid _tls_get_addr is scheduled before mflr in the prologue (https://bugs.llvm.org//show_bug.cgi?id=25839). So if another ADJCALLSTACKUP exists before _tls_get_addr, we do not need to create a new ADJCALLSTACKUP.
Differential Revision: https://reviews.llvm.org/D34347
llvm-svn: 306678
PowerPC backend does not pass the current optimization level to SelectionDAGISel and so SelectionDAGISel works with the default optimization level regardless of the current optimization level.
This patch makes the PowerPC backend set the optimization level correctly.
Differential Revision: https://reviews.llvm.org/D34615
llvm-svn: 306367
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
llvm-svn: 306177
Define target hook isReallyTriviallyReMaterializable() to explicitly specify
PowerPC instructions that are trivially rematerializable. This will allow
the MachineLICM pass to accurately identify PPC instructions that should always
be hoisted.
Differential Revision: https://reviews.llvm.org/D34255
llvm-svn: 305932
Summary:
This is my misunderstanding on isBarrier. It's not for memory barriers,
but for other control flow purposes. lwsync doesn't have it either.
This fixes a simple crash with -verify-machineinstrs like below:
define void @Foo() {
entry:
%tmp = load atomic i64, i64* undef acquire, align 8
unreachable
}
I deliberately don't want to check in the test, since there is little
chance to regress on such a mistake. Such a test adds noise to the code
base.
I plan to check in first, since it fixes a crash, and the fix is obvious.
Reviewers: kbarton, echristo
Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34314
llvm-svn: 305624
Add condition for MachineLICM to safely hoist instructions that utilize
non constant registers that are reserved.
On PPC, global variable access is done through the table of contents (TOC)
which is always in register X2. The ABI reserves this register in any
functions that have calls or access global variables.
A call through a function pointer involves saving, changing and restoring
this register around the call and thus MachineLICM does not consider it to
be invariant. We can however guarantee the register is preserved across the
call and thus is invariant.
Differential Revision: https://reviews.llvm.org/D33562
llvm-svn: 305490
This patch fixes a potential verification error (64-bit register operands for cmpw) with -verify-machineinstrs.
Differential Revision: https://reviews.llvm.org/D34208
llvm-svn: 305479
Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.
Differential Revision: https://reviews.llvm.org/D33690
llvm-svn: 305214
Note that if we need the result of both the divide and the modulo then we
compute the modulo based on the result of the divide and not using the new
hardware instruction.
Commit on behalf of STEFAN PINTILIE.
Differential Revision: https://reviews.llvm.org/D33940
llvm-svn: 305210
This step is just intended to reduce code duplication rather than change any functionality.
A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.
Differential Revision: https://reviews.llvm.org/D33649
llvm-svn: 305192
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.
Reviewers: chandlerc, rnk, reames
Reviewed By: reames
Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D33903
llvm-svn: 305189
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.
This patch fixed PR32442.
Differential Revision: https://reviews.llvm.org/D31407
llvm-svn: 305001
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.
Differential Revision: https://reviews.llvm.org/D33843
llvm-svn: 304864
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
This adds a callback to the LLVMTargetMachine that lets target indicate
that they do not pass the machine verifier checks in all cases yet.
This is intended to be a temporary measure while the targets are fixed
allowing us to enable the machine verifier by default with
EXPENSIVE_CHECKS enabled!
Differential Revision: https://reviews.llvm.org/D33696
llvm-svn: 304320
Fixes PPCTTIImpl::getCacheLineSize() returning the wrong cache line size for
newer ppc processors.
Commiting on behalf of Stefan Pintilie.
Differential Revision: https://reviews.llvm.org/D33656
llvm-svn: 304317
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.
Differential Revision: https://reviews.llvm.org/D28637
llvm-svn: 304313
There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI
Instruction, this patch recognizes them and does the selection to improve
the PPC performance.
Differential Revision: https://reviews.llvm.org/D33404
llvm-svn: 304298
This patch builds upon https://reviews.llvm.org/rL302810 to add
handling for bitwise logical operations in general purpose registers.
The idea is to keep the values in GPRs as long as possible - only
extracting them to a condition register bit when no further operations
are to be done.
Differential Revision: https://reviews.llvm.org/D31851
llvm-svn: 304282
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.
While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.
llvm-svn: 304247
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".
This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.
Reviewers: spatel, RKSimon, efriedma
Reviewed By: RKSimon
Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits
Differential Revision: https://reviews.llvm.org/D33530
llvm-svn: 304215
Summary
clang -c -mcpu=pwr9 test/CodeGen/PowerPC/build-vector-tests.ll causes an assertion failure during the binary encoding.
The failure occurs when a D-form load instruction takes two register operands instead of a register + an immediate.
This patch fixes the problem and also adds an assertion to catch this failure earlier before the binary encoding (i.e. during lit test).
The fix is from Nemanja Ivanovic @nemanjai.
Differential Revision: https://reviews.llvm.org/D33482
llvm-svn: 304133
I forgot to forward the chain, causing some missing instruction
dependencies. The test crashes the compiler without this patch.
Inspired by the test case, D33519 also tries to remove the extra sync.
Differential Revision: https://reviews.llvm.org/D33573
llvm-svn: 303931
PPC::GETtlsADDR is lowered to a branch and a nop, by the assembly
printer. Its size was incorrectly marked as 4, correct it to 8. The
incorrect size can cause incorrect branch relaxation in
PPCBranchSelector under the right conditions.
llvm-svn: 303904
There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI
instruction, this patch recognizes them and does the selection to improve the
PPC performance.
llvm-svn: 303822
PPC backend eliminates compare instructions by using record-form instructions in PPCInstrInfo::optimizeCompareInstr, which is called from peephole optimization pass.
This patch improves this optimization to eliminate more compare instructions in two types of common case.
- comparison against a constant 1 or -1
The record-form instructions set CR bit based on signed comparison against 0. So, the current implementation does not exploit the record-form instruction for comparison against a non-zero constant.
This patch enables record-form optimization for constant of 1 or -1 if possible; it changes the condition "greater than -1" into "greater than or equal to 0" and "less than 1" into "less than or equal to 0".
With this patch, compare can be eliminated in the following code sequence, as an example.
uint64_t a, b;
if ((a | b) & 0x8000000000000000ull) { ... }
else { ... }
- andi for 32-bit comparison on PPC64
Since record-form instructions execute 64-bit signed comparison and so we have limitation in eliminating 32-bit comparison, i.e. with cmplwi, using the record-form. The original implementation already has such checks but andi. is not recognized as an instruction which executes implicit zero extension and hence safe to convert into record-form if used for equality check.
%1 = and i32 %a, 10
%2 = icmp ne i32 %1, 0
br i1 %2, label %foo, label %bar
In this simple example, LLVM generates andi. + cmplwi + beq on PPC64.
This patch make it possible to eliminate the cmplwi for this case.
I added andi. for optimization targets if it is safe to do so.
Differential Revision: https://reviews.llvm.org/D30081
llvm-svn: 303500
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handling a null TargetMachine call
`getAnalysisIfAvailable<TargetPassConfig>`.
* Passes not handling a null TargetMachine
`addRequired<TargetPassConfig>` and call
`getAnalysis<TargetPassConfig>`.
* MachineFunctionPasses now use MF.getTarget().
* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.
This fixes a crash when running `llc -start-before prologepilog`.
PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.
Related to PR30324.
Differential Revision: https://reviews.llvm.org/D33222
llvm-svn: 303360
When legalizing vector operations on vNi128, they will be split to v1i128
because that is a legal type on ppc64, but then the compiler will crash in
selection dag because it fails to select for these operations. This patch fixes
shift operations. Logical shift right and left shift can be performed in the
vector unit, but algebraic shift right requires being split.
Differential Revision: https://reviews.llvm.org/D32774
llvm-svn: 303307
The variables MinGPR/MinG8R were not updated properly when resetting the
offsets, which in the included testcase lead to saving the CR register
in the same location as R30.
This fixes another issue reported in PR26519.
Differential Revision: https://reviews.llvm.org/D33017
llvm-svn: 303257
Summary:
This fixes pr32392.
The lowering pipeline is:
llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in
expandPostRAPseudo.
The reason why expandPostRAPseudo is chosen is because previous passes
are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne-
7, .+4 (some branch pass(s)).
Differential Revision: https://reviews.llvm.org/D32763
llvm-svn: 303205
Summary:
Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc.,
because those are not defined for b > sizeof(a) * 8, even after some of
the combiners run.
However, PPCISD::SHL defines that behavior (as the instructions themselves).
Move the combination to the backend.
The tests in shift_mask.ll still pass.
Reviewers: echristo, hfinkel, efriedma, iteratee
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D33076
llvm-svn: 302937
According to Power ISA V3.0 document, the first source operand of mtvsrdd is constant 0 if r0 is specified. So the corresponding register constraint should be g8rc_nox0.
This bug caused wrong output generated by 401.bzip2 when -mcpu=power9 and fdo are specified.
Differential Revision: https://reviews.llvm.org/D32880
llvm-svn: 302834
This patch is the first in a series of patches to provide code gen for
doing compares in GPRs when the compare result is required in a GPR.
It adds the infrastructure to select GPR sequences for i1->i32 and i1->i64
extensions. This first patch handles equality comparison on i32 operands with
the result sign or zero extended.
Differential Revision: https://reviews.llvm.org/D31847
llvm-svn: 302810
Now both emitLeadingFence and emitTrailingFence take the instruction
itself, instead of taking IsLoad/IsStore pairs.
Instruction::mayReadFromMemory and Instrucion::mayWriteToMemory are used
for determining those two booleans.
The instruction argument is also useful for later D32763, in
emitTrailingFence. For emitLeadingFence, it seems to have cleaner
interface with the proposed change.
Differential Revision: https://reviews.llvm.org/D32762
llvm-svn: 302539
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
affects all targets that use frame pseudo instructions and touched many
files although the changes are uniform.
- Access to frame properties are implemented using special instructions
rather than calls getOperand(N).getImm(). For X86 and ARM such
replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
instruction. These involve proper instruction initialization and
methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
frame parts initialized inside frame instruction pair and outside it.
The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.
Differential Revision: https://reviews.llvm.org/D32637
llvm-svn: 302262
This happened on the PPC32/SVR4 path and was discovered when building
FreeBSD on PPC32. It was a typo-class error in the frame lowering code.
This fixes PR26519.
llvm-svn: 302183
Summary:
This is the corresponding llvm change to D28037 to ensure no performance
regression.
Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28329
llvm-svn: 301990
Fixes PR30730.
This is a re-commit of a pulled commit. The commit was pulled because some
software projects contained uses of Altivec vectors that violated alignment
requirements. Known issues have now been fixed.
Committing on behalf of Lei Huang.
Differential Revision: https://reviews.llvm.org/D26861
llvm-svn: 301892
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
1. RegisterClass::getSize() is split into two functions:
- TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
- TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
- TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
This will allow making those values depend on subtarget features in the
future.
Differential Revision: https://reviews.llvm.org/D31783
llvm-svn: 301221
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows. This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.
llvm-svn: 301047
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.
We had a terrific bug due to this in Chromium.
It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.
Differential Revision: https://reviews.llvm.org/D32330
llvm-svn: 301040
This will become asan errors once the patch lands that poisons the
memory after free. The x86 change is a hack, but I don't see how to
solve this properly at the moment.
llvm-svn: 300867
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
Check the legality of ISD::[US]MULO to see whether
Intrinsic::[us]mul_with_overflow will legalize into a function call (and, thus,
will use the CTR register). Fixes PR32485.
Patch by Tim Neumann!
Differential Revision: https://reviews.llvm.org/D31790
llvm-svn: 299910
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401
It's likely that other targets will want to enable this hook for scalar transforms,
and there are probably other patterns that can use bitwise logic to reduce comparisons.
Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).
Differential Revision: https://reviews.llvm.org/D31483
llvm-svn: 299542
A number of backends (AArch64, MIPS, ARM) have been using
MCContext::reportError to report issues such as out-of-range fixup values in
their TgtAsmBackend. This is great, but because MCContext couldn't easily be
threaded through to the adjustFixupValue helper function from its usual
callsite (applyFixup), these backends ended up adding an MCContext* argument
and adding another call to applyFixup to processFixupValue. Adding an
MCContext parameter to applyFixup makes this unnecessary, and even better -
applyFixup can take a reference to MCContext rather than a potentially null
pointer.
Differential Revision: https://reviews.llvm.org/D30264
llvm-svn: 299529
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.
Differential Revision: https://reviews.llvm.org/D31249
llvm-svn: 299201
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.
This patch fixed PR32442.
Differential Revision: https://reviews.llvm.org/D31407
llvm-svn: 298955
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.
In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D31052
llvm-svn: 298010
mfvrd and mffprd are both alias to mfvrsd.
This patch enables correct parsing of the aliases, but we still emit a mfvrsd.
Committing on behalf of brunoalr (Bruno Rosa).
Differential Revision: https://reviews.llvm.org/D29177
llvm-svn: 297849
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/
Re-commit r296771 to continue testing on the patch.
Sorry for the trouble!
llvm-svn: 297256
This reverts commit r296771.
We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)
llvm-svn: 297124
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.
This is part of the ongoing process to obsolete D24480. The motivation is to
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
PowerPC is an obvious winner for this kind of transform because it has fast and complete
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).
x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86
changes just shows that that code is a mess of special-case holes. We may be able to remove
some of that logic now.
My guess is that other targets will want to enable this hook for most cases. The likely
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to
math/logic, but the general transform for any 2 constants needs one more instruction
(multiply or 'and').
ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.
Differential Revision: https://reviews.llvm.org/D30537
llvm-svn: 296977
This patch fixes pr32063.
Current code in PPCTargetLowering::PerformDAGCombine can transform
bswap
store
into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications,
1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().
2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.
Differential Revision: https://reviews.llvm.org/D30362
llvm-svn: 296811
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.
Committing on behalf of Hiroshi Inoue.
Differential Revision: https://reviews.llvm.org/D29881
llvm-svn: 296771
Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate.
The corresponding pattern already exists for 32-bit integers.
Committing on behalf of Hiroshi Inoue.
Differential Revision: https://reviews.llvm.org/D29387
llvm-svn: 296144
Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that
clear bits from the right hand size.
Committing on behalf of Hiroshi Inoue.
Differential Revision: https://reviews.llvm.org/D29388
llvm-svn: 296143
Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.
This patch fixes pr31492.
Differential Revision: https://reviews.llvm.org/D28630
This is resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug.
llvm-svn: 295506
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.
Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.
Reviewers: dberris, echristo, iteratee, kbarton, hfinkel
Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D29742
llvm-svn: 294781
until we can get better TargetMachine::isCompatibleDataLayout to compare - otherwise
we can't code generate existing bitcode without a string equality data layout.
This reverts commit r294702.
llvm-svn: 294709
For other platforms we should find out what they need and likely
make the same change, however, a smaller additional change is easier
for platforms we know have it specified in the ABI. As part of this
rewrite some of the handling in the backends for data layout and update
a bunch of testcases.
Based on a patch by Simonas Kazlauskas!
llvm-svn: 294702
Adds the vnot extended mnemonic for the vnor instruction.
Committing on behalf of brunoalr (Bruno Rosa).
Differential Revision: https://reviews.llvm.org/D29225
llvm-svn: 294330
The the following instructions:
- LD/LWZ (expanded from sjLj pseudo-instructions)
- LXVL/LXVLL vector loads
- STXVL/STXVLL vector stores
all require G8RC_NO0X class registers for RA.
Differential Revision: https://reviews.llvm.org/D29289
Committed for Lei Huang
llvm-svn: 293769
Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in
the PPC back end.
Committing on behalf of brunoalr (Bruno Rosa).
Differential Revision: https://reviews.llvm.org/D29133
llvm-svn: 293626
Summary:
Adds the following instructions:
* mfpmr
* mtpmr
* icblc
* icblq
* icbtls
Fix the scheduling for mtspr on e5500, which uses CFX0, instead of
SFX0/SFX1 as on e500mc.
Addresses PR 31538.
Differential Revision: https://reviews.llvm.org/D29002
llvm-svn: 293417
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MyClass::dump() {
// print stuff to dbgs()...
}
#endif
llvm-svn: 293359
1) Explicitly sets mayLoad/mayStore property in the tablegen files on load/store
instructions.
2) Updated the flags on a number of intrinsics indicating that they write
memory.
3) Added SDNPMemOperand flags for some target dependent SDNodes so that they
propagate their memory operand
Review: https://reviews.llvm.org/D28818
llvm-svn: 293200
And teach shouldAssumeDSOLocal that ppc has no copy relocations.
The resulting code handle a few more case than before. For example, it
knows that a weak symbol can be resolved to another .o file, but it
will still be in the main executable.
llvm-svn: 293180
This reverts commit r292680. It is causing significantly worse
performance and test timeouts in our internal builds. I have already
routed reproduction instructions your way.
llvm-svn: 293092
Change getReservedRegs() to not mark a register as reserved and then
revert that decision in some cases. Motivated by the discussion in
https://reviews.llvm.org/D29056
llvm-svn: 293073
When a register like R1 is reserved, X1 should be reserved as well. This
was already done "manually" when 64bit code was enabled, however using
the markSuperRegs() function on the base register is more convenient and
allows to use the checksAllSuperRegsMarked() function even in 32bit mode
to avoid accidental breakage in the future.
This is also necessary to allow https://reviews.llvm.org/D28881
Differential Revision: https://reviews.llvm.org/D29056
llvm-svn: 292870
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.
This patch fixes pr31492.
Differential Revision: https://reviews.llvm.org/D28630
llvm-svn: 292680
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.
llvm-svn: 292154
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.
llvm-svn: 292128
Rename from addOperand to just add, to match the other method that has been
added to MachineInstrBuilder for adding more than just 1 operand.
See https://reviews.llvm.org/D28057 for the whole discussion.
Differential Revision: https://reviews.llvm.org/D28556
llvm-svn: 291891
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq.
In case if the real operands bitwidth <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:
1. When determining tail-call eligibility. If we make a tail call (i.e.
directly branch to a function) then there is no place for the linker to add
a TOC restoration.
2. When determining when we need to add a nop instruction after a call.
Likewise, if there is a possibility that the linker might need to add a
TOC restoration after a call, then we need to put a nop after the call
(the bl instruction).
First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.
Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).
There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.
Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.
Differential Revision: https://reviews.llvm.org/D27231
llvm-svn: 291003
PWR9 processor model for instruction scheduling. A subsequent patch will migrate
PWR9 to Post RA MIScheduler.
https://reviews.llvm.org/D24525
llvm-svn: 290102
This patch appears to result in trampolines in vtables being miscompiled
when they in turn tail call a method.
I've posted some preliminary details about the failure on the thread for
this commit and talked to Hal. He was comfortable going ahead and
reverting until we sort out what is wrong.
llvm-svn: 289928
In some situations, the BUILD_VECTOR node that builds a v18i8 vector by
a splat of an i8 constant will end up with signed 8-bit values and other
situations, it'll end up with unsigned ones. Handle both situations.
Fixes PR31340.
llvm-svn: 289804
Most of the PowerPC64 code generation for the ELF ABI is already PIC.
There are four main exceptions:
(1) Constant pointer arrays etc. should in writeable sections.
(2) The TOC restoration NOP after a call is needed for all global
symbols. While GNU ld has a workaround for questionable GCC self-calls,
we trigger the checks for calls from COMDAT sections as they cross input
sections and are therefore not considered self-calls. The current
decision is questionable and suboptimal, but outside the scope of the
change.
(3) TLS access can not use the initial-exec model.
(4) Jump tables should use relative addresses. Note that the current
encoding doesn't work for the large code model, but it is more compact
than the default for any non-trivial jump table. Improving this is again
beyond the scope of this change.
At least (1) and (3) are assumptions made in target-independent code and
introducing additional hooks is a bit messy. Testing with clang shows
that a -fPIC binary is 600KB smaller than the corresponding -fno-pic
build. Separate testing from improved jump table encodings would explain
only about 100KB or so. The rest is expected to be a result of more
aggressive immediate forming for -fno-pic, where the -fPIC binary just
uses TOC entries.
This change brings the LLVM output in line with the GCC output, other
PPC64 compilers like XLC on AIX are known to produce PIC by default
as well. The relocation model can still be provided explicitly, i.e.
when using MCJIT.
One test case for case (1) is included, other test cases with relocation
mode sensitive behavior are wired to static for now. They will be
reviewed and adjusted separately.
Differential Revision: https://reviews.llvm.org/D26566
llvm-svn: 289743
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:
1. When determining tail-call eligibility. If we make a tail call (i.e.
directly branch to a function) then there is no place for the linker to add
a TOC restoration.
2. When determining when we need to add a nop instruction after a call.
Likewise, if there is a possibility that the linker might need to add a
TOC restoration after a call, then we need to put a nop after the call
(the bl instruction).
First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.
Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).
There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.
Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. This will yield subtle bugs if
interposition is attempted. As a result, regardless of whether we're in PIC
mode, we don't assume that we need to add the nop to support the possibility of
ELF interposition. However, the necessary check is in place (i.e. calling
GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for
which interposition is allowed at the IR level, we'll add the nop as necessary.
In the mean time, we'll generate more tail calls and fewer nops when compiling
position-independent code.
Differential Revision: https://reviews.llvm.org/D27231
llvm-svn: 289638
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so move 1 or 2 bytes to VSR through MTVSRWZ is much faster than store the extended value into stack and load it with LXSIWZX.
This patch fixes pr31144.
Differential Revision: https://reviews.llvm.org/D27287
llvm-svn: 289473
This is the final patch in the series of patches that improves
BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations
to remove redundant instructions. It also adds a large test case which
encompasses a large set of code patterns that build vectors - this test
case was the motivator for this series of patches.
Differential Revision: https://reviews.llvm.org/D26066
llvm-svn: 288800
VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.
Differential Revision: https://reviews.llvm.org/D26713
llvm-svn: 288560
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
This is per function data so it is better kept at the function instead
of the module.
This is a necessary step to have machine module passes work properly.
Differential Revision: https://reviews.llvm.org/D27185
llvm-svn: 288291
This patch corresponds to review:
https://reviews.llvm.org/D26023
This patch adds support for converting a vector of loads into a single load if
the loads are consecutive (in either direction).
llvm-svn: 288219
This patch corresponds to review:
https://reviews.llvm.org/D25980
This is the 2nd patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector
of fp-to-int conversions into an fp-to-int conversion of a build vector of fp
values. For example:
Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...)
Into (fp_to_[su]i (build_vector $A, $B, ...))).
Which is a natural match for much cleaner code.
llvm-svn: 288218
This commit caused some miscompiles that did not show up on any of the bots.
Reverting until we can investigate the cause of those failures.
llvm-svn: 288214
This patch corresponds to review:
https://reviews.llvm.org/D25912
This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.
llvm-svn: 288152
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.
Fixes PR31127.
llvm-svn: 287765
Summary:
* ARM is omitted from this patch because this check appears to expose bugs in this target.
* Mips is omitted from this patch because this check either detects bugs or deliberate
emission of instructions that don't satisfy their predicates. One deliberate
use is the SYNC instruction where the version with an operand is correctly
defined as requiring MIPS32 while the version without an operand is defined
as an alias of 'SYNC 0' and requires MIPS2.
* X86 is omitted from this patch because it doesn't use the tablegen-erated
MCCodeEmitter infrastructure.
Patches for ARM and Mips will follow.
Depends on D25617
Reviewers: tstellarAMD, jmolloy
Subscribers: wdng, jmolloy, aemerson, rengolin, arsenm, jyknight, nemanjai, nhaehnle, tstellarAMD, llvm-commits
Differential Revision: https://reviews.llvm.org/D25618
llvm-svn: 287439
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.
Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.
Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something
similar.
llvm-svn: 287329
For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.
Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.
Differential Revision: https://reviews.llvm.org/D26336
llvm-svn: 287059
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.
llvm-svn: 286967
add an intrinsic to expose the 'VSX Scalar Convert Half-Precision to
Single-Precision' instruction.
Differential review: https://reviews.llvm.org/D26536
llvm-svn: 286862
This patch corresponds to review:
https://reviews.llvm.org/D26480
Adds all the intrinsics used for various permute builtins that will
be added to altivec.h.
llvm-svn: 286638
This patch corresponds to review:
https://reviews.llvm.org/D26307
Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.
llvm-svn: 286596
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
Differential revision: https://reviews.llvm.org/D22975
llvm-svn: 286523
behind the test that the MachineModuleInfo analysis was
actually available and can be used.
While the MachO bits may well be reasonable to assume in the darwin
assembly printer, the analysis isn't constructively guaranteed anywhere
I could find so it seems safest to avoid crashing here.
This issue was found with PVS-Studio. Pretty sure the Clang Static
Anaylzer flags similar issues but we've probably never pointed it at
this code effectively.
llvm-svn: 285972
GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter.
GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore
not a true subclass of GPRC.
llvm-svn: 285813
This patch corresponds to review:
https://reviews.llvm.org/D25896
It just eliminates the redundant ZExt after a count trailing zeros instruction.
llvm-svn: 285267
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.
Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.
Differential Revision: https://reviews.llvm.org/D25917
llvm-svn: 285006
https://reviews.llvm.org/D24924
This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review.
llvm-svn: 284983
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.
This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.
original commit msg:
[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284746
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284495
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
B = Splat A
C = Splat B
=>
C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
B = Splat A
C = Splat B
=>
B = Splat A
C = COPY B
Fixes PR30663.
Reviewers: echristo, iteratee, kbarton, nemanjai
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25493
llvm-svn: 283961
Summary:
I had for the second time today a bug where llvm::format("%s", Str)
was called with Str being a StringRef. The Linux and MacOS bots were
fine, but windows having different calling convention, it printed
garbage.
Instead we can catch this at compile-time: it is never expected to
call a C vararg printf-like function with non scalar type I believe.
Reviewers: bogner, Bigcheese, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25266
llvm-svn: 283509
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.
Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.
The ingredients of this patch are:
Remove the reciprocal estimate command-line debug option.
Add TargetRecip to TargetLowering.
Remove TargetRecip from TargetOptions.
Clean up the TargetRecip implementation to work with this new scheme.
Set the default reciprocal settings in TargetLoweringBase (everything is off).
Update the PowerPC defaults, users, and tests.
Update the x86 defaults, users, and tests.
Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.
Differential Revision: https://reviews.llvm.org/D24816
llvm-svn: 283252
This patch corresponds to review:
The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.
llvm-svn: 283212
This patch corresponds to review:
https://reviews.llvm.org/D23155
This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:
Int to Fp conversions of 1 or 2-byte values loaded from memory
Building vectors of 1 or 2-byte integers with values loaded from memory
Storing individual 1 or 2-byte elements from integer vectors
This patch implements all of those uses.
llvm-svn: 283190
The PPC branch-selection pass, which performs branch relaxation, needs to
account for the padding that might be introduced to satisfy block alignment
requirements. We were assuming that the first block was at offset zero (i.e.
had the alignment of the function itself), but under the ELFv2 ABI, a global
entry function prologue is added to the first block, and it is a
two-instruction sequence (i.e. eight-bytes long). If the function has 16-byte
alignment, the fact that the first block is eight bytes offset from the start
of the function is relevant to calculating where padding will be added in
between later blocks.
Unfortunately, I don't have a small test case.
llvm-svn: 283086
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
Fixes PR26970.
llvm-svn: 283060
This patch corresponds to review:
https://reviews.llvm.org/D24396
This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.
llvm-svn: 282478
This patch corresponds to review:
https://reviews.llvm.org/D21135
This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld
In order to improve some build_vector and extractelement patterns.
llvm-svn: 282246
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.
llvm-svn: 282182
This patch corresponds to:
https://reviews.llvm.org/D21409
The LXVD2X, LXVW4X, STXVD2X and STXVW4X instructions permute the two doublewords
in the vector register when in little-endian mode. Custom code ensures that the
necessary swaps are inserted for these. This patch simply removes the possibilty
that a load/store node will match one of these instructions in the SDAG as that
would not insert the necessary swaps.
llvm-svn: 282144
This patch corresponds to review:
https://reviews.llvm.org/D19825
The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.
llvm-svn: 282143
When a phi node is finally lowered to a machine instruction it is
important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Renamed the existing SkipPHIsAndLabels to SkipPHIsLabelsAndDebug to
more fully describe that it also skips debug entries. Then used the
"new" function SkipPHIsAndLabels when the debug information should not
be skipped when placing the lowered "load" instructions so that it is
placed before the debug entries.
Differential Revision: https://reviews.llvm.org/D23760
llvm-svn: 281727
This patch corresponds to review:
https://reviews.llvm.org/D24021
In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).
llvm-svn: 281479
There is currently no codegen for Power9 that depends on the directive
so this is NFC for now but will be important in the future. This was
missed in r268950 so I'm adding it now.
llvm-svn: 281473
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).
On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.
I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).
Fixes PR30280.
llvm-svn: 280789
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it
there is no guarantee that this part of the stack will not be modified
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.
This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24093
llvm-svn: 280705
We used to compute the padding contributions to the block sizes during branch
relaxation only at the start of the transformation. As we perform branch
relaxation, we change the sizes of the blocks, and so the amount of inter-block
padding might change. Accordingly, we need to recompute the (alignment-based)
padding in between every iteration on our way toward the fixed point.
Unfortunately, I don't have a test case (and none was provided in the bug
report), and while this obviously seems needed, algorithmically, I don't have
any way of generating a small and/or non-fragile regression test.
llvm-svn: 280626
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.
The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).
Fixes PR27721.
llvm-svn: 280614
PowerPC assembly code in the wild, so it seems, has things like this:
bc+ 12, 28, .L9
This is a bit odd because the '+' here becomes part of the BO field, and the BO
field is otherwise the first operand. Nevertheless, the ISA specification does
clearly say that the +- hint syntax applies to all conditional-branch mnemonics
(that test either CTR or a condition register, although not the forms which
check both), both basic and extended, so this is supposed to be valid.
This introduces some asm-parser-only definitions which take only the upper
three bits from the specified BO value, and the lower two bits are implied by
the +- suffix (via some associated aliases).
Fixes PR23646.
llvm-svn: 280571
dcbf has an optional hint-like field, add support for the extended form and the
associated mnemonics (dcbfl and dcbflp).
Partially fixes PR24796.
llvm-svn: 280559
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.
llvm-svn: 280545
As Sanjay suggested when he added the hook, PPC should return true from
hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a
compare).
Fixes PR27203.
llvm-svn: 280457
Following a suggestion by Sanjay, we should lower:
%shl = shl i32 1, %y
%and = and i32 %x, %shl
%cmp = icmp eq i32 %and, %shl
ret i1 %cmp
into:
subfic r4, r4, 32
rlwnm r3, r3, r4, 31, 31
Add this pattern and some associated patterns for the 64-bit case and the
not-equal case. Fixes PR27356.
llvm-svn: 280454
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, what we were doing is legal only for offsets less
than 8 regardless of the alignment of the object being accessed.
Fixes PR28727.
llvm-svn: 280441
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting our peephole
application, merging addi instructions into dependent memory accesses, even
when the addi has multiple users, regardless of whether or not we're optimizing
for size.
We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.
llvm-svn: 280440
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
llvm-svn: 280350
When a function contains something, such as inline asm, which explicitly
clobbers the register used as the frame pointer, don't spill it twice. If we
need a frame pointer, it will be saved/restored in the prologue/epilogue code.
Explicitly spilling it again will reuse the same spill slot used by the
prologue/epilogue code, thus clobbering the saved value. The same applies
to the base-pointer or PIC-base register.
Partially fixes PR26856. Thanks to Ulrich for his analysis and the small
inline-asm reproducer.
llvm-svn: 280188
Implement Bill's suggested fix for 32-bit targets for PR22711 (for the
alignment of each entry). As pointed out in the bug report, we could just force
the section alignment, since we only add pointer-sized things currently, but
this fix is somewhat more future-proof.
llvm-svn: 280049
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; This is
helpful, under certain circumstances, for building very-large binaries, and
some other specialized use cases.
Fixes PR19098.
llvm-svn: 280040
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).
Fixes PR22923.
llvm-svn: 280022
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.
Differential Revision: http://reviews.llvm.org/D23850
llvm-svn: 279698
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.
This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328
Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin
https://reviews.llvm.org/D23441
llvm-svn: 278900
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.
Differential Revision: https://reviews.llvm.org/D23445
llvm-svn: 278799
Summary: It triggers exponential behavior when the DAG has many branches.
Reviewers: hfinkel, kbarton
Subscribers: iteratee, nemanjai, echristo
Differential Revision: https://reviews.llvm.org/D23428
llvm-svn: 278548
There were two locations where fast-isel would generate a LFD instruction
with a target register class VSFRC instead of F8RC when VSX was enabled.
This can ccause invalid registers to be used in certain cases, like:
lfd 36, ...
instead of using a VSX load instruction. The wrong register number gets
silently truncated, causing invalid code to be generated.
The first place is PPCFastISel::PPCEmitLoad, which had multiple problems:
1.) The IsVSSRC and IsVSFRC flags are not initialized correctly, since they
are computed from resultReg, which is still zero at this point in many cases.
Fixed by changing the helper routines to operate on a register class instead
of a register and passing in UseRC.
2.) Even with this fixed, Is64VSXLoad is still wrong due to a typo:
bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS;
bool Is64VSXLoad = IsVSSRC && Opc == PPC::LFD;
The second line needs to use isVSFRC (like PPCEmitStore does).
3.) Once both the above are fixed, we're now generating a VSX instruction --
but an incorrect one, since generation of an indexed instruction with null
index is wrong. Fixed by copying the code handling the same issue in
PPCEmitStore.
The second place is PPCFastISel::PPCMaterializeFP, where we would emit an
LFD to load a constant from the literal pool, and use the wrong result
register class. Fixed by hardcoding a F8RC class even on systems
supporting VSX.
Fixes: https://llvm.org/bugs/show_bug.cgi?id=28630
Differential Revision: https://reviews.llvm.org/D22632
llvm-svn: 277823
This patch fixes passing long double type arguments to function in
soft float mode. If there is less than 4 argument registers free
(long double type is mapped in 4 gpr registers in soft float mode)
long double type argument must be passed through stack.
Differential Revision: https://reviews.llvm.org/D20114.
llvm-svn: 277804
This patch fixes pr25548.
Current implementation of PPCBoolRetToInt doesn't handle CallInst correctly, so it failed to do the intended optimization when there is a CallInst with parameters. This patch fixed that.
llvm-svn: 277655
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.
Differential Revision: https://reviews.llvm.org/D22885
llvm-svn: 277126
Avoid implicit conversions from MachineInstrBundleIterator to
MachineInstr* in the PowerPC backend, mainly by preferring MachineInstr&
over MachineInstr* when a pointer isn't nullable and using range-based
for loops.
There was one piece of questionable code in PPCInstrInfo::AnalyzeBranch,
where a condition checked a pointer converted from an iterator for
nullptr. Since this case is impossible (moreover, the code above
guarantees that the iterator is valid), I removed the check when I
changed the pointer to a reference.
Despite that case, there should be no functionality change here.
llvm-svn: 276864
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.
The corresponding change to clang is in: http://reviews.llvm.org/D16538
Patch by: Joel Jones
Differential Revision: https://reviews.llvm.org/D16213
llvm-svn: 276654
This patch corresponds to review:
https://reviews.llvm.org/D21354
We use direct moves for extracting integer elements from vectors. We also use
direct moves when converting integers to FP. When these operations are chained,
we get a direct move out of a VSR followed by a direct move back into a VSR.
These are redundant - all we need to do is line up the element and convert.
llvm-svn: 275796
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
This patch corresponds to review:
http://reviews.llvm.org/D20239
It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.
llvm-svn: 275215
This patch corresponds to review:
http://reviews.llvm.org/D21358
Vector shifts that have the same semantics as a vector swap are cannonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.
llvm-svn: 275168
There is a problem in VSXSwapRemoval where it is incorrectly removing permute instructions.
In this case, the permute is feeding both a vector store and also a non-store instruction. In this case, the permute cannot be removed.
The fix is to simply look at all the uses of the vector register defined by the permute and ensure that all the uses are vector store instructions.
This problem was reported in PR 27735 (https://llvm.org/bugs/show_bug.cgi?id=27735).
Test case based on the original problem reported.
Phabricator Review: http://reviews.llvm.org/D21802
llvm-svn: 274645
This patch corresponds to review:
http://reviews.llvm.org/D20443
It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).
llvm-svn: 274535
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator. One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.
llvm-svn: 274304
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr. In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
llvm-svn: 274287
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr. This is a
general API improvement.
Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other. Instead I've done everything as a block and just
updated what was necessary.
This is mostly mechanical fixes: adding and removing `*` and `&`
operators. The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.
As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.
Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy. I couldn't run tests
for AVR since llc doesn't link with it turned on.
llvm-svn: 274189
I think this converts all the simple cases that really just care about
the generated code being position independent or not. The remaining
uses are a bit more complicated and are checking things like "is this
a library or executable" or "can this symbol be preempted".
llvm-svn: 274055
The setCallee function will set the number of fixed arguments based
on the size of the argument list. The FixedArgs parameter was often
explicitly set to 0, leading to a lack of consistent value for non-
vararg functions.
Differential Revision: http://reviews.llvm.org/D20376
llvm-svn: 273403
We convert `Default` to `NotPIC` so that target independent code
can reason about this correctly.
Differential Revision: http://reviews.llvm.org/D21394
llvm-svn: 273024
Recommiting after fixing non-atomic insert to front of SmallVector in
MCAsmLexer.h
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 273007
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 272953
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.
This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).
This is needed for Swift and (possibly) other non-clang frontends.
Fixes PR26190.
llvm-svn: 272005
subclasses. These are not passes proper. We don't support registering
them, they can't be constructed with default arguments, and the ID is
actually in a base class.
Only these two targets even had any boiler plate to try to do this, and
it had to be munged out of the INITIALIZE_PASS macros to work. What's
worse, the boiler plate has rotted and the "name" of the pass is
actually the description string now!!! =/ All of this is completely
unnecessary. No other target bothers, and nothing breaks if you don't
initialize them because CodeGen has an entirely separate initialization
path that is somewhat more durable than relying on the implicit
initialization the way the 'opt' tool does for registered passes.
llvm-svn: 271650
Fix PR27943 "Bad machine code: Using an undefined physical register".
SUBFC8 implicitly defines the CR0 register, but this was omitted in
the instruction definition.
Patch by Jameson Nash <jameson@juliacomputing.com>
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D20802
llvm-svn: 271425
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.
Part of llvm.org/pr26808.
llvm-svn: 270283
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
While promoting nodes in PPCTargetLowering::DAGCombineExtBoolTrunc, it is
possible for one of the nodes to be replaced by another. To make sure we do not
visit the deleted nodes, and to make sure we visit the replacement nodes, use a
list of HandleSDNodes to track the to-be-promoted nodes during the promotion
process.
The same fix has been applied to the analogous code in
PPCTargetLowering::DAGCombineTruncBoolExt.
Fixes PR26985.
llvm-svn: 269272
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
This patch corresponds to review:
http://reviews.llvm.org/D19683
Simply adds the bits for being able to specify -mcpu=pwr9 to the back end.
llvm-svn: 268950
This patch fixes register alignment for long double type in
soft float mode. Before this patch alignment was 8 and this
patch changes it to 4.
Differential Revision: http://reviews.llvm.org/D18034
llvm-svn: 268909
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
This patch corresponds to review:
http://reviews.llvm.org/D18592
It allows the PPC back end to generate the xxspltw instruction where we
previously only emitted vspltw.
llvm-svn: 268516
If, in between the splat and the load (which does an implicit splat), there is
a read of the splat register, then that register must have another earlier
definition. In that case, we can't replace the load's destination register with
the splat's destination register.
Unfortunately, I don't have a small or non-fragile test case.
llvm-svn: 268152