We lower to a sequence consisting of:
- MOVi 0 into a register
- VCMPS to do the actual comparison and set the VFP flags
- FMSTAT to move the flags out of the VFP unit
- MOVCCi to either use the "zero register" that we have previously set
with the MOVi, or move 1 into the result register, based on the values
of the flags
As was the case with soft-float, for some predicates (one, ueq) we
actually need two comparisons instead of just one. When that happens, we
generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of
using the result of the first MOVCCi as the "zero register" for the
second one. This is a bit overkill, since one comparison followed by
two non-flag-setting conditional moves should be enough. In any case,
the backend manages to CSE one of the comparisons away so it doesn't
matter much.
Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not
VCMPES. This makes the code a lot simpler, and it also seems correct
since the LLVM Lang Ref defines simple true/false returns if the
operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand
exception, so they won't be slipping through unnoticed.
Implementation-wise, this introduces a template so we can share the same
code that we use for handling integer comparisons, since the only
differences are in the details (exact opcodes to be used etc). Hopefully
this will be easy to extend to s64 G_FCMP.
llvm-svn: 307365
Contrary to the stepForward()/stepBackward() method accumulate() doesn't
have a direction as defs, uses and clobbers all have the same effect.
Also improve the documentation comment.
llvm-svn: 307351
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
llvm-svn: 307326
This fixes calls to external functions starting with a capital L,
fixing errors like this:
fatal error: error in backend: assembler label 'LocalFree' can not be undefined
Differential Revision: https://reviews.llvm.org/D35079
llvm-svn: 307317
Regardless of relaxation options such as -cl-fast-relaxed-math
we are producing rather long code for fdiv via amdgcn_fdiv_fast
intrinsic. This intrinsic is used to replace fdiv with 2.5ulp
metadata and does not handle denormals, thus believed to be fast.
An fdiv instruction can also have fast math flag either by itself
or together with fpmath metadata. Clang used with a relaxation flag
always produces both metadata and fast flag:
%div = fdiv fast float %v, %0, !fpmath !12!12 = !{float 2.500000e+00}
Current implementation ignores fast flag and favors metadata. An
instruction with just fast flag would be lowered to a fastest rcp +
mul, but that never happen on practice because of described mutual
clang and BE behavior.
This change allows an "fdiv fast" to be always lowered as rcp + mul.
Differential Revision: https://reviews.llvm.org/D34844
llvm-svn: 307308
Allows the MachineIRBuilder APIs to directly create registers (based on
LLT or TargetRegisterClass) as well as accept MachineInstrBuilders
and implicitly converts to register(with getOperand(0).getReg()).
Eg usage:
LLT s32 = LLT::scalar(32);
auto C32 = Builder.buildConstant(s32, 32);
auto Tmp = Builder.buildInstr(TargetOpcode::G_SUB, s32, C32,
OtherReg);
auto Tmp2 = Builder.buildInstr(Opcode, DstReg,
Builder.buildConstant(s32, 31)); ....
Only a few methods added for now.
Reviewed by Tim
llvm-svn: 307302
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.
llvm-svn: 307292
This covers both hard and soft float.
Hard float is easy, since it's just Legal.
Soft float is more involved, because there are several different ways to
handle it based on the predicate: one and ueq need not only one, but two
libcalls to get a result. Furthermore, we have large differences between
the values returned by the AEABI and GNU functions.
AEABI functions return a nice 1 or 0 representing true and respectively
false. GNU functions generally return a value that needs to be compared
against 0 (e.g. for ogt, the value returned by the libcall is > 0 for
true). We could introduce redundant comparisons for AEABI as well, but
they don't seem easy to remove afterwards, so we do different processing
based on whether or not the result really needs to be compared against
something (and just truncate if it doesn't).
llvm-svn: 307243
On power 8 we sometimes insert swaps to deal with the difference between
Little-Endian and Big-Endian. The swap removal pass is supposed to clean up
these swaps. On power 9 we don't need this pass since we do not need to insert
the swaps in the first place.
Commiting on behalf of Stefan Pintilie.
Differential Revision: https://reviews.llvm.org/D34627
llvm-svn: 307185
This patch adds the exploitation for new power 9 instructions which extract
variable elements from vectors:
VEXTUBLX
VEXTUBRX
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
Differential Revision: https://reviews.llvm.org/D34032
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
llvm-svn: 307174
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign
extended vector extract elements where the indices used by the vector extract
are not correct. We can still use the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.
Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009
llvm-svn: 307169
Several integer multiply/divide instructions require use of a
register pair as input and output. This patch moves setting
up the input register pair from C++ code to TableGen, simplifying
the whole process and making it more easily extensible.
No functional change.
llvm-svn: 307155
Fixes a couple of whitespace errors, re-sorts the vector floating-point
instructions to make them more easily extensible, and adds a missing
pseudo instruction.
No functional change.
llvm-svn: 307154
We used to have a helper that replaced an instruction with a libcall.
That turns out to be too aggressive, since sometimes we need to replace
the instruction with at least two libcalls. Therefore, change our
existing helper to only create the libcall and leave the instruction
removal as a separate step. Also rename the helper accordingly.
llvm-svn: 307149
This implements suggesting other mnemonics when an invalid one is specified,
for example:
$ echo "adXd r1,r2,#3" | llvm-mc -triple arm
<stdin>:1:1: error: invalid instruction, did you mean: add, qadd?
adXd r1,r2,#3
^
The implementation is target agnostic, but as a first step I have added it only
to the ARM backend; so the ARM backend is a good example if someone wants to
enable this too for another target.
Differential Revision: https://reviews.llvm.org/D33128
llvm-svn: 307148
We should rewrite this using the generic branch relaxation pass, but for
the moment having this pass is better than hitting an assertion error.
llvm-svn: 307109
Summary:
Replace the matcher if-statements for each rule with a state-machine. This
significantly reduces compile time, memory allocations, and cumulative memory
allocation when compiling AArch64InstructionSelector.cpp.o after r303259 is
recommitted.
The following patches will expand on this further to fully fix the regressions.
Reviewers: rovka, ab, t.p.northover, qcolombet, aditya_nandakumar
Reviewed By: ab
Subscribers: vitalybuka, aemerson, javed.absar, igorb, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33758
llvm-svn: 307079
Summary:
When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool.
I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead.
Reviewers: spatel, RKSimon, zvi
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34923
llvm-svn: 307062
Summary: I believe this should be supported on GLM since RDSEED is.
Reviewers: m_zuckerman, zvi, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34828
llvm-svn: 307060
Previously, if a basic block ended with a FRMIDX instruction, we would
end up doing something like this.
*std::next(MBB.end())
Which would hit an error:
"Assertion `!NodePtr->isKnownSentinel()' failed."
llvm-svn: 307057