Commit Graph

1445 Commits

Author SHA1 Message Date
Nemanja Ivanovic 6a74bfba20 [PowerPC] Keep vector int to fp conversions in vector domain
At present a v2i16 -> v2f64 convert is implemented by extracts to scalar,
scalar converts, and merge back into a vector. Use vector converts instead,
with the int data permuted into the proper position and extended if necessary.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D53346

llvm-svn: 345361
2018-10-26 03:19:13 +00:00
Stefan Pintilie 927e8bf316 [Power9] Add __float128 support in the backend for bitcast to a i128
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.

Differential Revision: https://reviews.llvm.org/D49507

llvm-svn: 345053
2018-10-23 17:11:36 +00:00
QingShan Zhang bc1586352e [PowerPC] Fix the assert of ISD::SIGN_EXTEND_INREG when type is v2i16 and v2i8
For ISD::SIGN_EXTEND_INREG operation of v2i16 and v2i8 types will cause assert because they are registered as custom operation. 
So that the type legalization phase will enter the custom hook, which do not handle ISD::SIGN_EXTEND_INREG operation and fall throw into unreachable assert.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52449

llvm-svn: 344109
2018-10-10 02:33:48 +00:00
Nemanja Ivanovic 87873d04c3 [PowerPC] Implement hasBitPreservingFPLogic for types that can be supported
This is the PPC-specific non-controversial part of
https://reviews.llvm.org/D44548 that simply enables this combine for PPC
since PPC has these instructions.
This commit will allow the target-independent portion to be truly target
independent.

llvm-svn: 344077
2018-10-09 20:35:15 +00:00
QingShan Zhang cae9425a3c [PowerPC] Fix the assert of combineBVOfConsecutiveLoads when element num is 1
Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive.
But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert.

Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52072

llvm-svn: 342611
2018-09-20 03:09:15 +00:00
Strahinja Petrovic 488fd4e625 [PowerPC] Fix label address calculation for ppc64
This patch fixes calculating address of label for non-pic ppc64.

Differential Revision: https://reviews.llvm.org/D50965

llvm-svn: 342368
2018-09-17 11:03:40 +00:00
Lion Yang c68f78d5d8 [PowerPC] Fix the calling convention for i1 arguments on PPC32
Summary:
Integer types smaller than i32 must be extended to i32 by default.
The feature "crbits" introduced at r202451 handles i1 as a special case,
but it did not extend properly.
The caller was, therefore, passing i1 stack arguments by writing 0/1 to
the first byte of the 4-byte stack object and callee was
reading the first byte for the value.

"crbits" is enabled if the optimization level is greater than 1,
which is very common in "release builds".
Such discrepancies with ABI specification also introduces
potential incompatibility with programs or libraries
built with other compilers e.g. GCC.

Fixes PR38661

Reviewers: hfinkel, cuviper

Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D51108

llvm-svn: 342288
2018-09-14 21:26:05 +00:00
QingShan Zhang abbb894ff5 [PowerPC] Combine ADD to ADDZE
On the ppc64le platform, if ir has the following form,

define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 {
entry:
  %cmp = icmp ne i64 %z, CONSTANT      (-32767 <= CONSTANT <= 32768)
  %conv1 = zext i1 %cmp to i64
  %add = add nsw i64 %conv1, %x
  ret i64 %add
}
we can optimize it to the form below.

                                when C == 0
                            --> addze X, (addic Z, -1))
                           /
add X, (zext(setne Z, C))--
                           \    when -32768 <= -C <= 32767 && C != 0
                            --> addze X, (addic (addi Z, -C), -1)

Patch By: HLJ2009 (Li Jia He)
Differential Revision: https://reviews.llvm.org/D51403
Reviewed By: Nemanjai 

llvm-svn: 341634
2018-09-07 07:56:05 +00:00
Nemanja Ivanovic 5d06f17b8a [PowerPC] Revert commit r339779
This commit has caused failures in some internal benchmarks. Temporarily
reverting this patch until the issue can be diagnosed and fixed.

llvm-svn: 340740
2018-08-27 13:20:42 +00:00
Nemanja Ivanovic f2588a28a8 [PowerPC] Recommit r340016 after fixing the reported issue
The internal benchmark failure reported by Google was due to a missing
check for the result type for the sign-extend and shift DAG. This commit
adds the check and re-commits the patch.

llvm-svn: 340734
2018-08-27 11:20:27 +00:00
Eric Christopher 3dc594c1e6 Temporarily Revert "[PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction" due to it causing a compiler crash on valid.
This reverts commit r340016, testcase forthcoming.

llvm-svn: 340315
2018-08-21 18:35:08 +00:00
Stefan Pintilie 39869ccf51 [PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loads
This patch addresses:

- Implementation within PPCISelLowering.cpp to check if we should use direct
load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector
function is used; which will allow us to catch as many cases of the
scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into
lxsd.

- Test cases to exhibit the behaviour of emitting lxsd/lfd.

Patch by amyk

Differential revision: https://reviews.llvm.org/D49698

llvm-svn: 340037
2018-08-17 15:15:26 +00:00
Nemanja Ivanovic 39751276b0 [PowerPC] Generate Power9 extswsli extend sign and shift immediate instruction
Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli
extend sign and shift immediate instruction.

Patch by RolandF.

Differential revision: https://reviews.llvm.org/D49879

llvm-svn: 340016
2018-08-17 12:35:44 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Nemanja Ivanovic 5b9a4f8ee5 [PowerPC] Enhance the selection(ISD::VSELECT) of vector type
To make ISD::VSELECT available(legal) so long as there are altivec instruction,
otherwise it's default behavior is expanding.
Use xxsel to match vselect if vsx is open, or use vsel.

In order to do not write many patterns in td file, promote (for vector it's
bitcast) all other type into v4i32 and only pattern match vselect of v4i32 into
vsel or xxsel.

Patch by wuzish
Differential revision: https://reviews.llvm.org/D49531

llvm-svn: 339779
2018-08-15 15:30:36 +00:00
Nemanja Ivanovic 8b4bd09e22 [PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types
When trying to combine a DAG that builds a vector out of sign-extensions of
vector extracts, the code assumes legal input types. Due to that, we have to
disable this combine prior to legalization.
In some cases, the DAG will look slightly different after legalization so
account for that in the matching code.

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087

Differential Revision: https://reviews.llvm.org/D49080

llvm-svn: 339769
2018-08-15 12:58:13 +00:00
Zaara Syeda b2595b988b [PowerPC] Improve codegen for vector loads using scalar_to_vector
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:

LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64

Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950

llvm-svn: 339260
2018-08-08 15:20:43 +00:00
Nemanja Ivanovic e1a525ed06 [PowerPC] Do not round values prior to converting to integer
Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements
feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector
loses precision. This patch removes the code that adds these nodes
to true f64 operands. It also adds patterns required to ensure
the code is still vectorized rather than converting individual
elements and inserting into a vector.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38342

Differential Revision: https://reviews.llvm.org/D50121

llvm-svn: 338658
2018-08-02 00:03:22 +00:00
Craig Topper 2f60ef2c78 [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc.
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.

llvm-svn: 338329
2018-07-30 23:22:00 +00:00
Craig Topper a568a27dfa [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2.
llvm-svn: 338303
2018-07-30 21:04:34 +00:00
Matt Arsenault 81920b0a25 DAG: Add calling convention argument to calling convention funcs
This seems like a pretty glaring omission, and AMDGPU
wants to treat kernels differently from other calling
conventions.

llvm-svn: 338194
2018-07-28 13:25:19 +00:00
Justin Hibbits 22e939a15b Fix build failures from r337347, found by clang
* Delete a no-longer-used override, and mark the other
getRegisterTypeForCallingConv() as override.
* SPE only supports i32, not i64, as the internal type, so simply remove
the type check, so that DestReg and Opc are provably always set.

GCC 6.4 did not warn about either of the above.

llvm-svn: 337350
2018-07-18 05:19:25 +00:00
Justin Hibbits d52990c71b Introduce codegen for the Signal Processing Engine
Summary:
The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1,
e500v2, and several e200 cores.  This adds support targeting the e500v2,
as this is more common than the e500v1, and is in SoCs still on the
market.

This patch is very intrusive because the SPE is binary incompatible with
the traditional FPU.  After discussing with others, the cleanest
solution was to make both SPE and FPU features on top of a base PowerPC
subset, so all FPU instructions are now wrapped with HasFPU predicates.

Supported by this are:
* Code generation following the SPE ABI at the LLVM IR level (calling
conventions)
* Single- and Double-precision math at the level supported by the APU.

Still to do:
* Vector operations
* SPE intrinsics

As this changes the Callee-saved register list order, one test, which
tests the precise generated code, was updated to account for the new
register order.

Reviewed by: nemanjai
Differential Revision: https://reviews.llvm.org/D44830

llvm-svn: 337347
2018-07-18 04:25:10 +00:00
Justin Hibbits ceb3cd96f7 Add PowerPC e500(v2) core scheduler and directives.
Differential Revision: https://reviews.llvm.org/D44828

llvm-svn: 337345
2018-07-18 04:24:49 +00:00
Stefan Pintilie 133acb22bb [Power9] Add __float128 builtins for Rounding Operations
Added __float128 support for a number of rounding operations:

trunc
rint
nearbyint
round
floor
ceil

Differential Revision: https://reviews.llvm.org/D48415

llvm-svn: 336601
2018-07-09 20:38:40 +00:00
Stefan Pintilie 3d76326d24 [Power9] Add __float128 support for compare operations
Added handling for the select f128.

Differential Revision: https://reviews.llvm.org/D48294

llvm-svn: 336548
2018-07-09 13:36:14 +00:00
Stefan Pintilie b351f09c9e [Power9] Add __float128 library call for frem
Power 9 does not have a hardware instruction for frem but we can call fmodf128.

Differential Revision: https://reviews.llvm.org/D48552

llvm-svn: 336406
2018-07-06 02:47:02 +00:00
Lei Huang 5612b90694 [Power9] Add lib calls for float128 operations with no equivalent PPC instructions
Map the following instructions to the proper float128 lib calls:
  pow[i], exp[2], log[2|10], sin, cos, fmin, fmax

Differential Revision: https://reviews.llvm.org/D48544

llvm-svn: 336361
2018-07-05 15:21:37 +00:00
Lei Huang a855e17f09 [Power9] Ensure float128 in non-homogenous aggregates are passed via VSX reg
Non-homogenous aggregates are passed in consecutive GPRs, in GPRs and in memory,
or in memory. This patch ensures that float128 members of non-homogenous
aggregates are passed via VSX registers.

This is done via custom lowering a bitcast of a build_pari(i64,i64) to float128
to a new PPCISD node, BUILD_FP128.

Differential Revision: https://reviews.llvm.org/D48308

llvm-svn: 336310
2018-07-05 06:21:37 +00:00
Lei Huang d17c39ccaa [Power9]Legalize and emit code for quad-precision convert from single-precision
Legalize and emit code for quad-precision floating point operation conversion of
single-precision value to quad-precision.

Differential Revision: https://reviews.llvm.org/D47569

llvm-svn: 336307
2018-07-05 04:18:37 +00:00
Lei Huang a26f3be454 [Power9] Implement float128 parameter passing and return values
This patch enable parameter passing and return by value for float128 types.
Passing aggregate/union which contain float128 members will be submitted in
subsequent patches.

Differential Revision: https://reviews.llvm.org/D47552

llvm-svn: 336306
2018-07-05 04:10:15 +00:00
Lei Huang 6270ab6ce4 [Power9]Legalize and emit code for round & convert quad-precision values
Legalize and emit code for round & convert float128 to double precision and
single precision.

Differential Revision: https://reviews.llvm.org/D46997

llvm-svn: 336299
2018-07-04 21:59:16 +00:00
Strahinja Petrovic bb2b00bb80 [PowerPC] Fix label address calculation for ppc32
This patch fixes calculating address of label on ppc32 (for -fPIC).

Differential Revision: https://reviews.llvm.org/D46582

llvm-svn: 335043
2018-06-19 13:07:40 +00:00
Hiroshi Inoue 0f7f59f073 [PowerPC] fix trivial typos in comment, NFC
llvm-svn: 334583
2018-06-13 08:54:13 +00:00
Amaury Sechet 8467411dad Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Target that uses these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

llvm-svn: 333748
2018-06-01 13:21:33 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Lei Huang c517e95bc6 [Power9]Legalize and emit code for truncate and convert QP to DW
Legalize and emit code for:

  * xscvqpsdz : VSX Scalar truncate & Convert Quad-Precision to Signed Dword
  * xscvqpudz : VSX Scalar truncate & Convert Quad-Precision to Unsigned Dword

Differential Revision: https://reviews.llvm.org/D45553

llvm-svn: 331787
2018-05-08 18:23:31 +00:00
Lei Huang c29229a644 [PowerPC] Unify handling for conversion of FP_TO_INT feeding a store
Existing DAG combine only handles conversions for FP_TO_SINT:
"{f32, f64} x { i32, i16 }"

This patch simplifies the code to handle:
"{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }"

Differential Revision: https://reviews.llvm.org/D46102

llvm-svn: 331778
2018-05-08 17:36:40 +00:00
Nemanja Ivanovic 61ffbf21cd Commit r331416 breaks the big-endian PPC bot. On the big endian build, we
actually encounter constants wider than 64-bits. Add the guard to prevent
tripping the assert.

llvm-svn: 331420
2018-05-03 01:04:13 +00:00
Nemanja Ivanovic 01e2e79abf [PowerPC] Implement isMaskAndCmp0FoldingBeneficial
Sinking the and closer to a compare against zero is beneficial on PPC as it
allows us to emit record-form instructions. In the future, we may expand this
to a larger set of operations that feed compares against zero since PPC has
lots of record-form instructions.

Differential revision: https://reviews.llvm.org/D46060

llvm-svn: 331416
2018-05-02 23:55:23 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Lei Huang 42ab1d3d03 [NFC] Move verificaiton check for f128 conversion into LowerINT_TO_FP()
Move veriication check for legal conversions to f128 into LowerINT_TO_FP()
and fix some indentations to match other sections of the code for readability.

llvm-svn: 330138
2018-04-16 17:30:24 +00:00
Lei Huang 10367eb422 [Power9]Legalize and emit code for converting (Un)Signed DWord to Quad-Precision
Legalize and emit code for:

  * xscvsdqp
  * xscvudqp

Differential Revision: https://reviews.llvm.org/D45230

llvm-svn: 329931
2018-04-12 18:00:14 +00:00
Lei Huang 09fda63af0 [Power9]Legalize and emit code for quad-precision fma instructions
Legalize and emit code for the following quad-precision fma:

  * xsmaddqp
  * xsnmaddqp
  * xsmsubqp
  * xsnmsubqp

Differential Revision: https://reviews.llvm.org/D44843

llvm-svn: 329206
2018-04-04 16:43:50 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
Lei Huang be0afb0870 [Power9]Legalize and emit code for quad-precision convert from double-precision
Legalize and emit code for quad-precision floating point operation xscvdpqp
and add option to guard the quad precision operation support.

Differential Revision: https://reviews.llvm.org/D44746

llvm-svn: 328558
2018-03-26 17:46:25 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Craig Topper c2dbd677bd [PowerPC][LegalizeFloatTypes] Move the PPC hacks for (i32 fp_to_sint/fp_to_uint (ppcf128 X)) out of LegalizeFloatTypes and into PPC specific code
I'm not entirely sure these hacks are still needed. If you remove the hacks completely, the name of the library call that gets generated doesn't match the grep the test previously had. So the test wasn't really checking anything.

If the hack is still needed it belongs in PPC specific code. I believe the FP_TO_SINT code here is the only place in the tree where a FP_ROUND_INREG node is created today. And I don't think its even being used correctly because the legalization returned a BUILD_PAIR with the same value twice. That doesn't seem right to me. By moving the code entirely to PPC we can avoid creating the FP_ROUND_INREG at all.

I replaced the grep in the existing test with full checks generated by hacking update_llc_test_check.py to support ppc32 just long enough to generate it.

Differential Revision: https://reviews.llvm.org/D44061

llvm-svn: 328017
2018-03-20 18:49:28 +00:00
Lei Huang 6d1596a98c [PowerPC][Power9]Legalize and emit code for quad-precision add/div/mul/sub
Legalize and emit code for quad-precision floating point operations:

  * xsaddqp
  * xssubqp
  * xsdivqp
  * xsmulqp

Differential Revision: https://reviews.llvm.org/D44506

llvm-svn: 327878
2018-03-19 18:52:20 +00:00
Guozhi Wei 9c916584ba [PPC] Avoid non-simple MVT in STBRX optimization
PR35402 triggered this case. It bswap and stores a 48bit value, current STBRX optimization transforms it into STBRX. Unfortunately 48bit is not a simple MVT, there is no PPC instruction to support it, and it can't be automatically expanded by llvm, so caused a crash.

This patch detects the non-simple MVT and returns early.

Differential Revision: https://reviews.llvm.org/D44500

llvm-svn: 327651
2018-03-15 17:49:12 +00:00
Lei Huang 1f8da3ae19 [PowerPC][NFC] formatting-only fix
llvm-svn: 327599
2018-03-15 03:06:44 +00:00
Craig Topper 80d3bb3b4b [TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC
The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have.

There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run.

A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does.

llvm-svn: 326832
2018-03-06 19:44:52 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Lei Huang dfd41552f4 [PowerPC] Reduce stack frame for fastcc functions by only allocating parameter save area when needed
Current implementation always allocates the parameter save area conservatively
for fastcc functions. There is no reason to allocate the parameter save area if
all the parameters can be passed via registers.

Differential Revision: https://reviews.llvm.org/D42602

llvm-svn: 325581
2018-02-20 15:09:45 +00:00
Zaara Syeda 1f59ae311b Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778
2018-01-30 16:17:22 +00:00
Tim Shen 7abe9887b0 [PPC] Avoid incorrect fp-i128-fp lowering.
Summary:
Fix an issue that's similar to what D41411 fixed:
  float(__int128(float_var)) shouldn't be optimized to xscvdpsxds +
  xscvsxdsp, as they mean (float)(int64_t)float_var.

Reviewers: jtony, hfinkel, echristo

Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D42400

llvm-svn: 323270
2018-01-23 22:06:57 +00:00
Zaara Syeda c9dc7b451b Revert [PowerPC] This reverts commit rL322721
Failing build bots. Revert the commit now.

llvm-svn: 322748
2018-01-17 20:00:15 +00:00
Zaara Syeda 8e951fd2f6 [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 322721
2018-01-17 18:22:55 +00:00
Nemanja Ivanovic ebb23078e9 [PowerPC] Zero-extend the compare operand for ATOMIC_CMP_SWAP
Part of the fix for https://bugs.llvm.org/show_bug.cgi?id=35812.
This patch ensures that the compare operand for the atomic compare and swap
is properly zero-extended to 32 bits if applicable.
A follow-up commit will fix the extension for the SETCC node generated when
expanding an ATOMIC_CMP_SWAP_WITH_SUCCESS. That will complete the bug fix.

Differential Revision: https://reviews.llvm.org/D41856

llvm-svn: 322372
2018-01-12 14:58:41 +00:00
Craig Topper d461aefe5f [PowerPC] Add an ISD::TRUNCATE to the legalization for ppc_is_decremented_ctr_nonzero
Summary:
I believe legalization is really expecting that ReplaceNodeResults will return something with the same type as the thing that's being legalized. Ultimately, it uses the output to replace the uses in the DAG so the type should match to make that work.

There are two relevant cases here. When crbits are enabled, then i1 is a legal type and getSetCCResultType should return i1. In this case, the truncate will be between i1 and i1 and should be removed (SelectionDAG::getNode does this). Otherwise, getSetCCResultType will be i32 and the legalizer will promote the truncate to be i32 -> i32 which will be similarly removed.

With this fixed we can remove some code from PromoteIntRes_SETCC that seemed to only exist to deal with the intrinsic being replaced with a larger type without changing the other operand. With the truncate being used for connectivity this doesn't happen anymore.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: nemanjai, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D41654

llvm-svn: 321959
2018-01-07 07:51:36 +00:00
Hiroshi Inoue ca3cdd7f27 [PowerPC] fix a bug in TCO eligibility check
If the callee and caller use different calling convensions, we cannot apply TCO if the callee requires arguments on stack; e.g. C calling convention and Fast CC use the same registers for parameter passing, but the stack offset is not necessarily same.

This patch also recommit r319218 "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions." by @sfertile since the problem reported in r320106 should be fixed.

Differential Revision: https://reviews.llvm.org/D40893

llvm-svn: 321579
2017-12-30 08:09:04 +00:00
Tony Jiang eba757e45c [PowerPC] Fix parest build failure in SPEC2017.
The build failure was caused by an assertion in pre-legalization DAGCombine:

Combining: t6: ppcf128 = uint_to_fp t5
... into: t20: f32 = PPCISD::FCFIDUS t19

which is clearly wrong since ppcf128 are definitely different type with f32 and
we cannot change the node value type when do DAGCombine. The fix is don't
handle ppc_fp128 or i1 conversions in PPCTargetLowering::combineFPToIntToFP and
leave it to downstream to legalize it and expand it to small legal types.

Differential Revision: https://reviews.llvm.org/D41411

llvm-svn: 321276
2017-12-21 15:42:50 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Nemanja Ivanovic 1794cdc481 Fix code causing fallthrough warnings in the PPC back end.
llvm-svn: 320806
2017-12-15 11:47:48 +00:00
Matt Arsenault 7d7adf4f2e TLI: Allow using PSV for intrinsic mem operands
llvm-svn: 320756
2017-12-14 22:34:10 +00:00
Matt Arsenault 1117133687 DAG: Expose all MMO flags in getTgtMemIntrinsic
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.

On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.

llvm-svn: 320746
2017-12-14 21:39:51 +00:00
Nemanja Ivanovic 50d37a1129 [PowerPC] Sign-extend negative constant stores
Second part of https://reviews.llvm.org/D40348.
Revision r318436 has extended all constants feeding a store to 64 bits
to allow for CSE on the SDAG. However, negative constants were zero extended
which made the constant being loaded appear to be a positive value larger than
16 bits. This resulted in long sequences to materialize such constants
rather than simply a "load immediate". This patch just sign-extends those
updated constants so that they remain 16-bit signed immediates if they started
out that way.

llvm-svn: 320368
2017-12-11 14:35:48 +00:00
Eric Christopher a469acac03 Temporarily revert "[PowerPC] Allow tail calls of fastcc functions from C CallingConv functions."
It is causing sanitizer failures on llvm tests in a bootstrapped compiler. No bot link since it's currently down, but following up to get the bot up.

This reverts commit r319218.

llvm-svn: 320106
2017-12-07 22:26:19 +00:00
Sean Fertile e200016ea9 [PowerPC] Allow tail calls of fastcc functions from C CallingConv functions.
Allow fastcc callees to be tail-called from ccc callers.

Differential Revision: https://reviews.llvm.org/D40355

llvm-svn: 319218
2017-11-28 20:25:58 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Guozhi Wei 433e8d3e04 [PPC] Change i32 constant in store instruction to i64
This patch changes all i32 constant in store instruction to i64 with truncation, to increase the chance that the referenced constant can be shared with other i64 constant.

Differential Revision: https://reviews.llvm.org/D39352

llvm-svn: 318436
2017-11-16 18:27:34 +00:00
Sean Fertile 0f0837e84e [PowerPC] Implement mayBeEmittedAsTailCall for PPC
Implements TargetLowering callback 'mayBeEmittedAsTailCall' that enables
CodeGenPrepare to duplicate returns when they might enable a tail-call.

Differential Revision: https://reviews.llvm.org/D39777

llvm-svn: 318321
2017-11-15 18:58:27 +00:00
Sean Fertile 7b056b3048 [PowerPC] Split out the tailcall calling convention checks. NFC.
Move the calling convention checks for tail-call eligibility for the 64-bit
SysV ABI into a separate function. This is so that it can be shared with
'mayBeEmittedAsTailCall' in a subsequent change.

llvm-svn: 318305
2017-11-15 16:53:41 +00:00
David Blaikie 3f833edc7c Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Graham Yiu 5cd044e8c8 Use new vector insert half-word and byte instructions when we see insertelement on '8 x i16' and '16 x i8' types. Also extended existing lit testcase to cover these cases.
Differential Revision: https://reviews.llvm.org/D34630

llvm-svn: 317613
2017-11-07 20:55:43 +00:00
Graham Yiu 52a52a6cab Fix buildbot breakages from r317503. Add parentheses to assignment when using result as a condition.
llvm-svn: 317508
2017-11-06 21:04:19 +00:00
Graham Yiu 030621bbcb Adds code to PPC ISEL lowering to recognize byte inserts from vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm. Extends tests from vector insert half-word.
Differential Revision: https://reviews.llvm.org/D34497

llvm-svn: 317503
2017-11-06 20:18:30 +00:00
Guozhi Wei e3b8d9a312 [PPC] Use xxbrd to speed up bswap64
Power doesn't have bswap instructions, so llvm generates following code sequence for bswap64.

  rotldi   5, 3, 16
  rotldi   4, 3, 8
  rotldi   9, 3, 24
  rotldi   10, 3, 32
  rotldi   11, 3, 48
  rotldi   12, 3, 56
  rldimi 4, 5, 8, 48
  rldimi 4, 9, 16, 40
  rldimi 4, 10, 24, 32
  rldimi 4, 11, 40, 16
  rldimi 4, 12, 48, 8
  rldimi 4, 3, 56, 0

But Power9 has vector bswap instructions, they can also be used to speed up scalar bswap intrinsic. With this patch, bswap64 can be translated to:

  mtvsrdd 34, 3, 3
  xxbrd 34, 34
  mfvsrld 3, 34

Differential Revision: https://reviews.llvm.org/D39510

llvm-svn: 317499
2017-11-06 19:09:38 +00:00
Graham Yiu 671526148c Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm.
Differential Revision: https://reviews.llvm.org/D34160

llvm-svn: 317111
2017-11-01 18:06:56 +00:00
Hiroshi Inoue e3a3e3c9e9 [PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended
This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass.
If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated.
One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. 
For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated.

void int_func(int);
void ii_test(int a) {
    if (a & 1) return int_func(a);
}

Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG.

Differential Revision: https://reviews.llvm.org/D31319

llvm-svn: 315888
2017-10-16 04:12:57 +00:00
Matt Arsenault f2db97d8fa DAG: Add opcode and source type to isFPExtFree
This is only currently used for mad/fma transforms.
This is the only case where it should be used for AMDGPU,
so add an opcode to be sure.

llvm-svn: 315740
2017-10-13 19:55:45 +00:00
Hal Finkel 112a6bac72 [PowerPC] Don't use xscvdpspn on the P7
xscvdpspn was not introduced until the P8, so don't use it on the P7. Fixes a
regression introduced in r288152.

llvm-svn: 312612
2017-09-06 03:08:26 +00:00
Tony Jiang 61ef1c540c [PPC][NFC] Renaming things with 'xxinsert' moniker to 'vecinsert' to make it more general.
Commit on behalf of Graham Yiu (gyiu@ca.ibm.com)

llvm-svn: 312547
2017-09-05 18:08:02 +00:00
Sean Fertile 00393cce3a [PPC] Refine checks for emiting TOC restore nop and tail-call eligibility.
For the medium and large code models we only need to check if a call crosses
dso-boundaries when considering tail-call elgibility.

Differential Revision: https://reviews.llvm.org/D34245

llvm-svn: 311353
2017-08-21 17:35:32 +00:00
Nemanja Ivanovic 979dcb6f09 [PowerPC] Don't crash on larger splats achieved through 1-byte splats
We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will
produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch
prevents crashing in that situation.

Differential Revision: https://reviews.llvm.org/D35650

llvm-svn: 310358
2017-08-08 13:52:45 +00:00
Rafael Espindola 79e238afee Delete Default and JITDefault code models
IMHO it is an antipattern to have a enum value that is Default.

At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.

This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.

llvm-svn: 309911
2017-08-03 02:16:21 +00:00
Stefan Pintilie 873889ca16 [Power9] Exploit vector absolute difference instructions on Power 9
Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW)
for byte, halfword and word. We should take advantage of these.

Differential Revision: https://reviews.llvm.org/D34684

llvm-svn: 309876
2017-08-02 20:07:21 +00:00
Peter Collingbourne 081ffe2ff2 Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This was a use-after-free waiting to happen.

llvm-svn: 309159
2017-07-26 19:15:29 +00:00
Jonas Paulsson 024e319489 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Nemanja Ivanovic 3c7e276d24 [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.

Differential Revision: https://reviews.llvm.org/D35007

llvm-svn: 307934
2017-07-13 18:17:10 +00:00
Tony Jiang acefbcf38e [PPC CodeGen] Expand the bitreverse.i64 intrinsic.
Differential Revision: https://reviews.llvm.org/D34908
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093

llvm-svn: 307563
2017-07-10 18:11:23 +00:00
Lei Huang 168d14b143 [PowerPC] Reduce register pressure by not materializing a constant just for use as an index register for X-Form loads/stores.
For this example:
float test (int *arr) {
    return arr[2];
}

We currently generate the following code:
  li r4, 8
  lxsiwax f0, r3, r4
  xscvsxdsp f1, f0

With this patch, we will now generate:
  addi r3, r3, 8
  lxsiwax f0, 0, r3
  xscvsxdsp f1, f0

Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027

llvm-svn: 307553
2017-07-10 16:44:45 +00:00
Hiroshi Inoue a86c920b1e fix typos in comments and error messages; NFC
llvm-svn: 307533
2017-07-10 12:44:25 +00:00
Lei Huang 317104183d [PowerPC] NFC : Common up definitions of isIntS16Immediate and update parameter to int16_t
llvm-svn: 307442
2017-07-07 21:12:35 +00:00
Tony Jiang c260e0eb56 [PPC CodeGen] Expand the bitreverse.i32 intrinsic.
Differential Revision: https://reviews.llvm.org/D33572
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093

llvm-svn: 307413
2017-07-07 16:41:55 +00:00
Simon Pilgrim 6c1695c6f7 [PowerPC] Fix -Wimplicit-fallthrough warnings. NFCI.
llvm-svn: 307382
2017-07-07 10:21:44 +00:00
Tony Jiang 9a91a18110 [Power9] Exploit vector integer extend instructions when indices aren't correct.
This patch adds on to the exploitation added by https://reviews.llvm.org/D33510.
This now catches build vector nodes where the inputs are coming from sign
extended vector extract elements where the indices used by the vector extract
are not correct. We can still use the new hardware instructions by adding a
shuffle to move the elements to the correct indices. I introduced a new PPCISD
node here because adding a vector_shuffle and changing the elements of the
vector_extracts was getting undone by another DAG combine.

Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com)
Differential Revision: https://reviews.llvm.org/D34009

llvm-svn: 307169
2017-07-05 16:00:38 +00:00
Eric Christopher a5b50cd645 Tidy up some calls to getRegister for readability.
llvm-svn: 305626
2017-06-17 02:25:49 +00:00
Lei Huang f689f69fea Test commit - NFC.
Modified a comment to confirm commit access functionality.

llvm-svn: 305402
2017-06-14 17:25:55 +00:00
Kit Barton 0b216305db Test commit - NFC.
Modified a comment to confirm commit access functionality.

llvm-svn: 305309
2017-06-13 17:35:29 +00:00
NAKAMURA Takumi 3807ab24c6 PPCISelLowering.cpp: Fix warnings in r305214. [-Wdocumentation]
llvm-svn: 305277
2017-06-13 07:34:32 +00:00
Tony Jiang 1a8eec141a [PowerPC] Match vec_revb builtins to P9 instructions.
Power9 has instructions that will reverse the bytes within an element for all
sizes (half-word, word, double-word and quad-word). These can be used for the
vec_revb builtins in altivec.h. However, we implement these to match vector
shuffle nodes as that will cover both the builtins and vector shuffles that
occur in the SDAG through other means.

Differential Revision: https://reviews.llvm.org/D33690

llvm-svn: 305214
2017-06-12 18:24:36 +00:00
Tony Jiang 30a49d1a3d [Power9] Added support for the modsw, moduw, modsd, modud hardware instructions.
Note that if we need the result of both the divide and the modulo then we
compute the modulo based on the result of the divide and not using the new
hardware instruction.

Commit on behalf of STEFAN PINTILIE.
Differential Revision: https://reviews.llvm.org/D33940

llvm-svn: 305210
2017-06-12 17:58:42 +00:00
Sanjay Patel d4765a38b4 [DAG] add helper to bind memop chains; NFCI
This step is just intended to reduce code duplication rather than change any functionality.

A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper.

Differential Revision: https://reviews.llvm.org/D33649

llvm-svn: 305192
2017-06-12 14:41:48 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Zaara Syeda 3a7578c658 [PPC] Inline expansion of memcmp
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.

Differential Revision: https://reviews.llvm.org/D28637

llvm-svn: 304313
2017-05-31 17:12:38 +00:00
Tony Jiang 60c247de18 [PowerPC] Fix a performance bug for PPC::XXPERMDI.
There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI
Instruction, this patch recognizes them and does the selection to improve
the PPC performance.

Differential Revision: https://reviews.llvm.org/D33404

llvm-svn: 304298
2017-05-31 13:09:57 +00:00
Craig Topper f6d4dc5b4a [SelectionDAG] Set ISD::FPOWI to Expand by default
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".

This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits

Differential Revision: https://reviews.llvm.org/D33530

llvm-svn: 304215
2017-05-30 15:27:55 +00:00
Tim Shen a76f20c364 [PPC] Add text for assert.
llvm-svn: 303940
2017-05-25 23:40:46 +00:00
Tim Shen a2b85da879 [PPC] Fix atomics lowering in DAG lowering.
I forgot to forward the chain, causing some missing instruction
dependencies. The test crashes the compiler without this patch.

Inspired by the test case, D33519 also tries to remove the extra sync.

Differential Revision: https://reviews.llvm.org/D33573

llvm-svn: 303931
2017-05-25 22:58:35 +00:00
Tony Jiang 0a429f040e [PowerPC] Fix a performance bug for PPC::XXSLDWI.
There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI
instruction, this patch recognizes them and does the selection to improve the
PPC performance.

llvm-svn: 303822
2017-05-24 23:48:29 +00:00
Kyle Butt f6c61ef64d CodeGen: Power: Add lowering for shifts of v1i128.
When legalizing vector operations on vNi128, they will be split to v1i128
because that is a legal type on ppc64, but then the compiler will crash in
selection dag because it fails to select for these operations. This patch fixes
shift operations. Logical shift right and left shift can be performed in the
vector unit, but algebraic shift right requires being split.

Differential Revision: https://reviews.llvm.org/D32774

llvm-svn: 303307
2017-05-17 21:54:41 +00:00
Tim Shen 3bef27cc6f [PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync.
Summary:
This fixes pr32392.

The lowering pipeline is:
llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in
expandPostRAPseudo.

The reason why expandPostRAPseudo is chosen is because previous passes
are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne-
7, .+4 (some branch pass(s)).

Differential Revision: https://reviews.llvm.org/D32763

llvm-svn: 303205
2017-05-16 20:18:06 +00:00
Tim Shen 10c64e6aea [PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the backend. NFC.
Summary:
Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc.,
because those are not defined for b > sizeof(a) * 8, even after some of
the combiners run.

However, PPCISD::SHL defines that behavior (as the instructions themselves).
Move the combination to the backend.

The tests in shift_mask.ll still pass.

Reviewers: echristo, hfinkel, efriedma, iteratee

Subscribers: nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D33076

llvm-svn: 302937
2017-05-12 19:25:37 +00:00
Tim Shen 04de70d3a7 [Atomic] Remove IsStore/IsLoad in the interface, and pass the instruction instead. NFC.
Now both emitLeadingFence and emitTrailingFence take the instruction
itself, instead of taking IsLoad/IsStore pairs.
Instruction::mayReadFromMemory and Instrucion::mayWriteToMemory are used
for determining those two booleans.

The instruction argument is also useful for later D32763, in
emitTrailingFence. For emitLeadingFence, it seems to have cleaner
interface with the proposed change.

Differential Revision: https://reviews.llvm.org/D32762

llvm-svn: 302539
2017-05-09 15:27:17 +00:00
Serge Pavlov d526b13e61 Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to  CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.

This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.

The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
  affects all targets that use frame pseudo instructions and touched many
  files although the changes are uniform.
- Access to frame properties are implemented using special instructions
  rather than calls getOperand(N).getImm(). For X86 and ARM such
  replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
  instruction. These involve proper instruction initialization and
  methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
  frame parts initialized inside frame instruction pair and outside it.

The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.

Differential Revision: https://reviews.llvm.org/D32394

llvm-svn: 302527
2017-05-09 13:35:13 +00:00
Craig Topper f0aeee01c3 [KnownBits] Add wrapper methods for setting and clear all bits in the underlying APInts in KnownBits.
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.

Differential Revision: https://reviews.llvm.org/D32637

llvm-svn: 302262
2017-05-05 17:36:09 +00:00
Nemanja Ivanovic b89c27f515 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
Fixes PR30730.
This is a re-commit of a pulled commit. The commit was pulled because some
software projects contained uses of Altivec vectors that violated alignment
requirements. Known issues have now been fixed.

Committing on behalf of Lei Huang.

Differential Revision: https://reviews.llvm.org/D26861

llvm-svn: 301892
2017-05-02 01:47:34 +00:00
Amara Emerson d28f0cd448 Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.
This removes BinaryWithFlagsSDNode, and flags are now all passed by value.

Differential Revision: https://reviews.llvm.org/D32527

llvm-svn: 301803
2017-05-01 15:17:51 +00:00
Craig Topper d0af7e8ab8 [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.

This is largely a mechanical transformation from KnownZero to Known.Zero.

Differential Revision: https://reviews.llvm.org/D32569

llvm-svn: 301620
2017-04-28 05:31:46 +00:00
Krzysztof Parzyszek c8e8e2a046 Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937

llvm-svn: 301234
2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek 98ab4c64c4 Revert r301231: Accidentally committed stale files
I forgot to commit local changes before commit.

llvm-svn: 301232
2017-04-24 19:48:51 +00:00
Krzysztof Parzyszek c0197066d7 Move value type list from TargetRegisterClass to TargetRegisterInfo
Differential Revision: https://reviews.llvm.org/D31937

llvm-svn: 301231
2017-04-24 19:43:45 +00:00
Simon Pilgrim 37b536e4b3 [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

llvm-svn: 299201
2017-03-31 11:24:16 +00:00
Eric Christopher c78be4d3be Kill some trailing whitespace to make some new changes a bit easier.
llvm-svn: 298637
2017-03-23 19:41:10 +00:00
Nirav Dave ac6081cb67 Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

llvm-svn: 298179
2017-03-18 00:44:07 +00:00
Reid Kleckner 45707d4d5a Remove getArgumentList() in favor of arg_begin(), args(), etc
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.

In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D31052

llvm-svn: 298010
2017-03-16 22:59:15 +00:00
Hiroshi Inoue 138a3faa3e Test commit.
llvm-svn: 297959
2017-03-16 16:30:06 +00:00
Tim Shen c7472d912b Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size""
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/

Re-commit r296771 to continue testing on the patch.

Sorry for the trouble!

llvm-svn: 297256
2017-03-08 02:41:35 +00:00
Tim Shen 70054bb827 Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"
This reverts commit r296771.

We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)

llvm-svn: 297124
2017-03-07 07:40:10 +00:00
Nemanja Ivanovic 12e67d868a [PowerPC] Fix failure with STBRX when store is narrower than the bswap
Fixes a crash caused by r296811 by truncating the input of the STBRX node
when the bswap is wider than i32.

Fixes https://bugs.llvm.org/show_bug.cgi?id=32140

Differential Revision: https://reviews.llvm.org/D30615

llvm-svn: 297001
2017-03-06 07:32:13 +00:00
Guozhi Wei ed28e742ee [PPC] Fix code generation for bswap(int32) followed by store16
This patch fixes pr32063.

Current code in PPCTargetLowering::PerformDAGCombine can transform

bswap
store

into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications,

1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().

2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.

Differential Revision: https://reviews.llvm.org/D30362

llvm-svn: 296811
2017-03-02 21:07:59 +00:00
Nemanja Ivanovic db8425eff0 [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29881

llvm-svn: 296771
2017-03-02 17:38:59 +00:00
Tony Jiang 8e8c444d3d [PowerPC] Expand ISEL instruction into if-then-else sequence.
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.

llvm-svn: 292154
2017-01-16 20:12:26 +00:00
Tony Jiang 8da139a9fd Revert "[PowerPC] Expand ISEL instruction into if-then-else sequence."
This reverts commit 1d0e0374438ca6e153844c683826ba9b82486bb1.

llvm-svn: 292131
2017-01-16 15:01:07 +00:00
Tony Jiang 7630b8c5ee [PowerPC] Expand ISEL instruction into if-then-else sequence.
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.

llvm-svn: 292128
2017-01-16 14:43:12 +00:00
Eugene Zelenko 8187c192c6 [PowerPC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291872
2017-01-13 00:58:58 +00:00
Hal Finkel b2f951d87a [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 291003
2017-01-04 21:05:13 +00:00
Chandler Carruth 05e80d31bd Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This patch appears to result in trampolines in vtables being miscompiled
when they in turn tail call a method.

I've posted some preliminary details about the failure on the thread for
this commit and talked to Hal. He was comfortable going ahead and
reverting until we sort out what is wrong.

llvm-svn: 289928
2016-12-16 07:31:20 +00:00
Hal Finkel 065b756528 [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. This will yield subtle bugs if
interposition is attempted. As a result, regardless of whether we're in PIC
mode, we don't assume that we need to add the nop to support the possibility of
ELF interposition. However, the necessary check is in place (i.e. calling
GV->isInterposable and TM.shouldAssumeDSOLocal) so when we have functions for
which interposition is allowed at the IR level, we'll add the nop as necessary.
In the mean time, we'll generate more tail calls and fewer nops when compiling
position-independent code.

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 289638
2016-12-14 07:24:50 +00:00
Guozhi Wei 1fd553c934 [PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSR
Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so move 1 or 2 bytes to VSR through MTVSRWZ is much faster than store the extended value into stack and load it with LXSIWZX.
This patch fixes pr31144.

Differential Revision: https://reviews.llvm.org/D27287

llvm-svn: 289473
2016-12-12 22:09:02 +00:00
Nemanja Ivanovic f9b191f135 [PowerPC] Improvements for BUILD_VECTOR Vol. 2
This patch corresponds to review:
https://reviews.llvm.org/D26023

This patch adds support for converting a vector of loads into a single load if
the loads are consecutive (in either direction).

llvm-svn: 288219
2016-11-29 23:57:54 +00:00
Nemanja Ivanovic 8c11e79b17 [PowerPC] Improvements for BUILD_VECTOR Vol. 2
This patch corresponds to review:
https://reviews.llvm.org/D25980

This is the 2nd patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector
of fp-to-int conversions into an fp-to-int conversion of a build vector of fp
values. For example:
Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...)
Into (fp_to_[su]i (build_vector $A, $B, ...))).
Which is a natural match for much cleaner code.

llvm-svn: 288218
2016-11-29 23:36:03 +00:00
Nemanja Ivanovic f57f150b1b Revert https://reviews.llvm.org/rL287679
This commit caused some miscompiles that did not show up on any of the bots.
Reverting until we can investigate the cause of those failures.

llvm-svn: 288214
2016-11-29 23:00:33 +00:00
Nemanja Ivanovic df1cb520df [PowerPC] Improvements for BUILD_VECTOR Vol. 1
This patch corresponds to review:
https://reviews.llvm.org/D25912

This is the first patch in a series of 4 that improve the lowering and combining
for BUILD_VECTOR nodes on PowerPC.

llvm-svn: 288152
2016-11-29 16:11:34 +00:00
Nemanja Ivanovic b8e30d6db6 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
This patch corresponds to review:
https://reviews.llvm.org/D26861

It also fixes PR30730.

Committing on behalf of Lei Huang.

llvm-svn: 287679
2016-11-22 19:02:07 +00:00
Ehsan Amiri 85818684c6 [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.

Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.

Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something 
similar.

llvm-svn: 287329
2016-11-18 10:41:44 +00:00
Joerg Sonnenberger 8c1a9ac52b Always use relative jump table encodings on PowerPC64.
For the default, small and medium code model, use the existing
difference from the jump table towards the label. For all other code
models, setup the picbase and use the difference between the picbase and
the block address.

Overall, this results in smaller data tables at the expensive of one or
two more arithmetic operation at the jump site. Given that we only create
jump tables with a lot more than two entries, it is a net win in size.
For larger code models the assumption remains that individual functions
are no larger than 2GB.

Differential Revision: https://reviews.llvm.org/D26336

llvm-svn: 287059
2016-11-16 00:37:30 +00:00
Tony Jiang 5f850cd1b1 [PowerPC] Implement BE VSX load/store builtins - llvm portion.
This patch implements all the overloads for vec_xl_be and vec_xst_be. On BE,
they behaves exactly the same with vec_xl and vec_xst, therefore they are
simply implemented by defining a matching macro. On LE, they are implemented
by defining new builtins and intrinsics. For int/float/long long/double, it
is just a load (lxvw4x/lxvd2x) or store(stxvw4x/stxvd2x). For char/char/short,
we also need some extra shuffling before or after call the builtins to get the
desired BE order. For int128, simply call vec_xl or vec_xst.

llvm-svn: 286967
2016-11-15 14:25:56 +00:00
Evandro Menezes 21f9ce1a0d [DAG Combiner] Fix the native computation of the Newton series for reciprocals
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself.  However, the original code did not properly consider this condition
if returned by a target.  This patch addresses the issues to allow a target
to compute the series on its own.

Differential revision: https://reviews.llvm.org/D22975

llvm-svn: 286523
2016-11-10 23:31:06 +00:00
Ehsan Amiri c90b02cf50 [PPC] Generate positive FP zero using xor insn instead of loading from constant area
https://reviews.llvm.org/D23614

Currently we load +0.0 from constant area. That can change to be generated using
XOR instruction.

llvm-svn: 284995
2016-10-24 17:31:09 +00:00
Sanjay Patel 0051efcf97 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Sanjay Patel 19601fa587 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel 08fff9ca81 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Sanjay Patel bfdbea6481 [Target] move reciprocal estimate settings from TargetOptions to TargetLowering
The motivation for the change is that we can't have pseudo-global settings for
codegen living in TargetOptions because that doesn't work with LTO.

Ideally, these reciprocal attributes will be moved to the instruction-level via
FMF, metadata, or something else. But making them function attributes is at least
an improvement over the current state.

The ingredients of this patch are:

    Remove the reciprocal estimate command-line debug option.
    Add TargetRecip to TargetLowering.
    Remove TargetRecip from TargetOptions.
    Clean up the TargetRecip implementation to work with this new scheme.
    Set the default reciprocal settings in TargetLoweringBase (everything is off).
    Update the PowerPC defaults, users, and tests.
    Update the x86 defaults, users, and tests.

Note that if this patch needs to be reverted, the related clang patch checked in
at r283251 should be reverted too.

Differential Revision: https://reviews.llvm.org/D24816

llvm-svn: 283252
2016-10-04 20:46:43 +00:00
Nemanja Ivanovic 11049f8f07 [Power9] Part-word VSX integer scalar loads/stores and sign extend instructions
This patch corresponds to review:
https://reviews.llvm.org/D23155

This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:

    Int to Fp conversions of 1 or 2-byte values loaded from memory
    Building vectors of 1 or 2-byte integers with values loaded from memory
    Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

llvm-svn: 283190
2016-10-04 06:59:23 +00:00
Hal Finkel a9321059b9 [PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.

Fixes PR26970.

llvm-svn: 283060
2016-10-02 02:10:20 +00:00
Nemanja Ivanovic 6f22b41398 [Power9] Builtins for ELF v.2 API conformance - back end portion
This patch corresponds to review:
https://reviews.llvm.org/D24396

This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.

llvm-svn: 282478
2016-09-27 08:42:12 +00:00
Nemanja Ivanovic d2c3c51a70 [Power9] Exploit move and splat instructions for build_vector improvement
This patch corresponds to review:
https://reviews.llvm.org/D21135

This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld

In order to improve some build_vector and extractelement patterns.

llvm-svn: 282246
2016-09-23 13:25:31 +00:00
Nemanja Ivanovic 8dacca943a [PowerPC] Sign extend sub-word values for atomic comparisons
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.

llvm-svn: 282182
2016-09-22 19:06:38 +00:00
Nemanja Ivanovic 6e7879c5e6 [Power9] Add exploitation of non-permuting memory ops
This patch corresponds to review:
https://reviews.llvm.org/D19825

The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.

llvm-svn: 282143
2016-09-22 09:52:19 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Sanjay Patel 5f6bb6cd24 getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits() ; NFCI
llvm-svn: 281490
2016-09-14 15:43:44 +00:00
Sanjay Patel bd6fca1419 getScalarType().getSizeInBits() -> getScalarSizeInBits() ; NFCI
llvm-svn: 281489
2016-09-14 15:21:00 +00:00
Nemanja Ivanovic d5deb4896c Fix code-gen crash on Power9 for insert_vector_elt with variable index (PR30189)
This patch corresponds to review:
https://reviews.llvm.org/D24021

In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).

llvm-svn: 281479
2016-09-14 14:19:09 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Hal Finkel 5081ac27c7 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

llvm-svn: 280350
2016-09-01 10:28:47 +00:00
Hal Finkel b074a608ce [PowerPC] Add support for -mlongcall
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; This is
helpful, under certain circumstances, for building very-large binaries, and
some other specialized use cases.

Fixes PR19098.

llvm-svn: 280040
2016-08-30 00:59:23 +00:00
Hal Finkel 3d70a9dbb7 [PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomics
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).

Fixes PR22923.

llvm-svn: 280022
2016-08-29 22:25:36 +00:00
Hal Finkel 5728200f33 [PowerPC] Implement lowering for atomicrmw min/max/umin/umax
Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.

llvm-svn: 279933
2016-08-28 16:17:58 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Chuang-Yu Cheng f7ba716bcb [ppc64] Don't apply sibling call optimization if callee has any byval arg
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.

This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328

Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin

https://reviews.llvm.org/D23441

llvm-svn: 278900
2016-08-17 03:17:44 +00:00
Pierre Gousseau 051db7d838 [x86] Refactor a PowerPC specific ctlz/srl transformation (NFC).
Following the discussion on D22038, this refactors a PowerPC specific setcc -> srl(ctlz) transformation so it can be used by other targets.

Differential Revision: https://reviews.llvm.org/D23445

llvm-svn: 278799
2016-08-16 13:53:53 +00:00
David Majnemer 0a16c22846 Use range algorithms instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278417
2016-08-11 21:15:00 +00:00
Strahinja Petrovic 30e0ce8e9f [PowerPC] fix passing long double arguments to function (soft-float)
This patch fixes passing long double type arguments to function in 
soft float mode. If there is less than 4 argument registers free 
(long double type is mapped in 4 gpr registers in soft float mode) 
long double type argument must be passed through stack.
Differential Revision: https://reviews.llvm.org/D20114.

llvm-svn: 277804
2016-08-05 08:47:26 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Sjoerd Meijer 89217f8835 TargetInstrInfo: rename GetInstSizeInBytes to getInstSizeInBytes. NFC
Differential Revision: https://reviews.llvm.org/D22925

llvm-svn: 276997
2016-07-28 16:32:22 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Nemanja Ivanovic b43bb6141e [Power9] Add codegen for VSX word insert/extract instructions
This patch corresponds to review:
http://reviews.llvm.org/D20239

It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.

llvm-svn: 275215
2016-07-12 21:00:10 +00:00
Nemanja Ivanovic eebbcb6d57 [PowerPC] Cannonicalize applicable vector shift immediates as swaps
This patch corresponds to review:
http://reviews.llvm.org/D21358

Vector shifts that have the same semantics as a vector swap are cannonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.

llvm-svn: 275168
2016-07-12 12:16:27 +00:00
Eric Christopher cd7194629b Use the class version of getPointerTy rather than getting back to
ourselves via a call through the DAG.

llvm-svn: 274721
2016-07-07 01:49:59 +00:00
Eric Christopher 317df66f15 Use the class definition for useSoftFloat.
llvm-svn: 274720
2016-07-07 01:49:57 +00:00
Eric Christopher 2454a3b4e7 Rename argument for consistency.
llvm-svn: 274717
2016-07-07 01:08:23 +00:00
Eric Christopher e0d09ba443 Remove the plumbing for isDarwinABI from EmitTailCallLoadFPAndRetAddr.
llvm-svn: 274716
2016-07-07 01:08:21 +00:00
Eric Christopher 606a268bed Use the MachineFunction that we've already queried for in the function.
llvm-svn: 274715
2016-07-07 01:08:19 +00:00
Eric Christopher 327e440c6c Remove the plumbing for isDarwinABI from the PrepareTailCall hierarchy.
llvm-svn: 274714
2016-07-07 01:08:17 +00:00
Eric Christopher ade4eed8a7 Remove the plumbing of 64-bitness from PrepareTailCall and functions
called by it.

llvm-svn: 274711
2016-07-07 00:39:32 +00:00
Eric Christopher c16ccbe731 Sink call to get the MachineFunction into EmitTailCallStoreFPAndRetAddr
and remove the argument.

llvm-svn: 274710
2016-07-07 00:39:30 +00:00
Eric Christopher b976a392e5 Remove unnecessary subtarget parameters in PPCTargetLowering.
llvm-svn: 274709
2016-07-07 00:39:27 +00:00
Sanjay Patel 9cc21ac412 fix typo; NFC
llvm-svn: 274636
2016-07-06 16:42:46 +00:00
Nemanja Ivanovic 44513e545f [PowerPC] - Legalize vector types by widening instead of integer promotion
This patch corresponds to review:
http://reviews.llvm.org/D20443

It changes the legalization strategy for illegal vector types from integer
promotion to widening. This only applies for vectors with elements of width
that is a multiple of a byte since we have hardware support for vectors with
1, 2, 3, 8 and 16 byte elements.
Integer promotion for vectors is quite expensive on PPC due to the sequence
of breaking apart the vector, extending the elements and reconstituting the
vector. Two of these operations are expensive.
This patch causes between minor and major improvements in performance on most
benchmarks. There are very few benchmarks whose performance regresses. These
regressions can be handled in a subsequent patch with a DAG combine (similar
to how this patch handles int -> fp conversions of illegal vector types).

llvm-svn: 274535
2016-07-05 09:22:29 +00:00
Duncan P. N. Exon Smith e4f5e4f4d1 CodeGen: Use MachineInstr& in TargetLowering, NFC
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr.  In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

llvm-svn: 274287
2016-06-30 22:52:52 +00:00
Rafael Espindola db6bd02185 Delete unused includes. NFC.
llvm-svn: 274225
2016-06-30 12:19:16 +00:00
Duncan P. N. Exon Smith 9cfc75c214 CodeGen: Use MachineInstr& in TargetInstrInfo, NFC
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr.  This is a
general API improvement.

Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other.  Instead I've done everything as a block and just
updated what was necessary.

This is mostly mechanical fixes: adding and removing `*` and `&`
operators.  The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency.  Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy.  I couldn't run tests
for AVR since llc doesn't link with it turned on.

llvm-svn: 274189
2016-06-30 00:01:54 +00:00
Rafael Espindola a99ccfce1a Drop support for creating $stubs.
They are created by ld64 since OS X 10.5.

llvm-svn: 274130
2016-06-29 14:59:50 +00:00
Nick Lewycky 9980075133 NFC. Fix popular typo in comment 'deferencing' --> 'dereferencing'.
Bonus changes, * placement in X86ISelLowering and 'exerce' -> 'exercise' in test.

llvm-svn: 273984
2016-06-28 01:45:05 +00:00
Rafael Espindola 3beef8d6db Move shouldAssumeDSOLocal to Target.
Should fix the shared library build.

llvm-svn: 273958
2016-06-27 23:15:57 +00:00
Rafael Espindola 21d22a01ea Use the isPositionIndependent predicate. NFC.
llvm-svn: 273875
2016-06-27 14:05:43 +00:00
Rafael Espindola e1d255f05c Simplify getLabelAccessInfo.
It now takes a IsPIC flag instead of computing and returning it.

llvm-svn: 273871
2016-06-27 12:56:02 +00:00