Commit Graph

11563 Commits

Author SHA1 Message Date
David Majnemer 14141f941a Revert most of r225597
We can't rely on a DataLayout enlightened constant folder.

llvm-svn: 225599
2015-01-11 07:29:51 +00:00
David Majnemer 292d0c796b X86: Properly decode shuffle masks when the constant pool type is weird
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation.  This occurs when Constants have
the same bit pattern but have different types.

Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.

This fixes PR22188.

llvm-svn: 225597
2015-01-11 05:08:57 +00:00
Saleem Abdulrasool 9cf2679d3b X86: teach X86TargetLowering about L,M,O constraints
Teach the ISelLowering for X86 about the L,M,O target specific constraints.
Although, for the moment, clang performs constraint validation and prevents
passing along inline asm which may have immediate constant constraints violated,
the backend should be able to cope with the invalid inline asm a bit better.

llvm-svn: 225596
2015-01-11 04:39:24 +00:00
Chandler Carruth c491f72e7a [x86] Remove some windows line endings that snuck into the tests here.
Folks on Windows, remember to set up your subversion to strip these when
submitting...

llvm-svn: 225593
2015-01-11 01:36:20 +00:00
Sanjoy Das 81401d4b19 Fix PR22179.
We were incorrectly inferring nsw for certain SCEVs. We can be more
aggressive here (see Richard Smith's comment on
http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just
focuses on correctness.

Differential Revision: http://reviews.llvm.org/D6914

llvm-svn: 225591
2015-01-10 23:41:24 +00:00
Simon Pilgrim 94a4cc027a [X86][SSE] Improved (v)insertps shuffle matching
In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable.

This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced.

We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function.

Differential Revision: http://reviews.llvm.org/D6879

llvm-svn: 225589
2015-01-10 19:45:33 +00:00
Hal Finkel 5d5d1539cc [PowerPC] Mark zext of a small scalar load as free
This initial implementation of PPCTargetLowering::isZExtFree marks as free
zexts of small scalar loads (that are not sign-extending). This callback is
used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to
determine whether a zext or an anyext is used to lower illegally-typed PHIs.
Because later truncates of zero-extended values are nops, this allows for the
elimination of later unnecessary truncations.

Fixes the initial complaint associated with PR22120.

llvm-svn: 225584
2015-01-10 08:21:59 +00:00
Justin Hibbits 654346e6f9 Fully fix Bug #22115.
Summary:
In the previous commit, the register was saved, but space was not allocated.
This resulted in the parameter save area potentially clobbering r30, leading to
nasty results.

Test Plan: Tests updated

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6906

llvm-svn: 225573
2015-01-10 01:57:21 +00:00
Simon Pilgrim ec1f2c2cab [X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zeros
pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and ORing the result when the used inputs from a vector are all zeroable.

Differential Revision: http://reviews.llvm.org/D6878

llvm-svn: 225551
2015-01-09 22:03:19 +00:00
Daniel Sanders 1440bb2a26 [mips] Add support for accessing $gp as a named register.
Summary:
Mips Linux uses $gp to hold a pointer to thread info structure and accesses it
with a named register. This makes this work for LLVM.

The N32 ABI doesn't quite work yet since the frontend generates incorrect IR
for this case. It neglects to truncate the 64-bit GPR to a 32-bit value before
converting to a pointer. Given correct IR (as in the testcase in this patch),
it works correctly.

Reviewers: sstankovic, vmedic, atanasyan

Reviewed By: atanasyan

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6893

llvm-svn: 225529
2015-01-09 17:21:30 +00:00
Matthias Braun 7e87384592 RegisterCoalescer: Fix removeCopyByCommutingDef with subreg liveness
The code that eliminated additional coalescable copies in
removeCopyByCommutingDef() used MergeValueNumberInto() which internally
may merge A into B or B into A. In this case A and B had different Def
points, so we have to reset ValNo.Def to the intended one after merging.

llvm-svn: 225503
2015-01-09 03:01:31 +00:00
Hal Finkel 6c39269a4c [PowerPC] Fold [sz]ext with fp_to_int lowering where possible
On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32
to i64 into the load necessary for an i64 -> fp conversion.

llvm-svn: 225493
2015-01-09 01:34:30 +00:00
Hal Finkel 3c0952b072 [PowerPC] Mark all instructions as non-cheap for MachineLICM
MachineLICM uses a callback named hasLowDefLatency to determine if an
instruction def operand has a 'low' latency. If all relevant operands have a
'low' latency, the instruction is considered too cheap to hoist out of loops
even in low-register-pressure situations. On PowerPC cores, both the embedded
cores and the others, there is no reason to believe that this is a good choice:
all instructions have a cost inside a loop, and hoisting them when not limited
by register pressure is a reasonable default.

llvm-svn: 225471
2015-01-08 22:11:49 +00:00
Akira Hatanaka 442b40c2eb [ARM] Fix a bug in constant island pass that was triggering an assertion.
The assert was being triggered when the distance between a constant pool entry
and its user exceeded the maximally allowed distance after thumb2 branch
shortening. A padding was inserted after a thumb2 branch instruction was shrunk,
which caused the user to be out of range. This is wrong as the padding should
have been inserted by the layout algorithm so that the distance between two
instructions doesn't grow later during thumb2 instruction optimization.

This commit fixes the code in ARMConstantIslands::createNewWater to call
computeBlockSize and set BasicBlock::Unalign when a branch instruction is
inserted to create new water after a basic block. A non-zero Unalign causes
the worst-case padding to be inserted when adjustBBOffsetsAfter is called to
recompute the basic block offsets.

rdar://problem/19130476

llvm-svn: 225467
2015-01-08 20:44:50 +00:00
Justin Hibbits 98a532dd8e Add saving and restoring of r30 to the prologue and epilogue, respectively
Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register).  This does that.

Test Plan: Tests updated.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6876

llvm-svn: 225450
2015-01-08 15:47:19 +00:00
Kristof Beyls 933de7aa06 Fix large stack alignment codegen for ARM and Thumb2 targets
This partially fixes PR13007 (ARM CodeGen fails with large stack
alignment): for ARM and Thumb2 targets, but not for Thumb1, as it
seems stack alignment for Thumb1 targets hasn't been supported at
all.

Producing an aligned stack pointer is done by zero-ing out the lower
bits of the stack pointer. The BIC instruction was used for this.
However, the immediate field of the BIC instruction only allows to
encode an immediate that can zero out up to a maximum of the 8 lower
bits. When a larger alignment is requested, a BIC instruction cannot
be used; llvm was silently producing incorrect code in this case.

This commit fixes code generation for large stack aligments by
using the BFC instruction instead, when the BFC instruction is
available.  When not, it uses 2 instructions: a right shift,
followed by a left shift to zero out the lower bits.

The lowering of ARM::Int_eh_sjlj_dispatchsetup still has code
that unconditionally uses BIC to realign the stack pointer, so it
very likely has the same problem. However, I wasn't able to
produce a test case for that. This commit adds an assert so that
the compiler will fail the assert instead of silently generating
wrong code if this is ever reached.

llvm-svn: 225446
2015-01-08 15:09:14 +00:00
Tom Stellard 654d669e56 R600/SI: Remove SIISelLowering::legalizeOperands()
Its functionality has been replaced by calling
SIInstrInfo::legalizeOperands() from
SIISelLowering::AdjstInstrPostInstrSelection() and running the
SIFoldOperands and SIShrinkInstructions passes.

llvm-svn: 225445
2015-01-08 15:08:17 +00:00
Elena Demikhovsky 285fbd551a Masked Load/Store - fixed a bug in type legalization.
llvm-svn: 225441
2015-01-08 12:29:19 +00:00
Michael Kuperstein 381dc08bc1 Fix a think-o in the test for r225438.
llvm-svn: 225440
2015-01-08 12:05:02 +00:00
Michael Kuperstein 46f7d525c3 [X86] Don't try to generate direct calls to TLS globals
The call lowering assumes that if the callee is a global, we want to emit a direct call.
This is correct for regular globals, but not for TLS ones.

Differential Revision: http://reviews.llvm.org/D6862

llvm-svn: 225438
2015-01-08 11:50:58 +00:00
Craig Topper 0c4d51b779 Fix test case I missed in r225432.
llvm-svn: 225434
2015-01-08 07:57:27 +00:00
Quentin Colombet a799e2e014 [RegAllocGreedy] Introduce a late pass to repair broken hints.
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.


** Context **

Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.

This usually shows up in the assembly with pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B

Whereas we could simply have done:
def B
...
def A
use A
...
use B


** Proposed Solution **

A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evict another variable. Indeed, if no eviction
happens this is pointless to look for recoloring opportunities as it means the
situation was the same as the initial allocation problem where we had to break
the hint.

Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple unlike the last change recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it, otherwise it gives up on that hint
even if a more complex coloring would have worked.

The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) with the target color. If a) is greater or
equal than b), then it is profitable and the recoloring happen.


** Example **

Consider the following example:
BB1:
  a =
  b =
BB2:
  ...
   = b
   = a
Let us assume b gets split:
BB1:
  a =
  b =
BB2:
  c = b
  ...
  d = c
  = d
  = a
Because of how the allocation work, b, c, and d may be assigned different
colors. Now, if a gets evicted to make room for c, assuming b and d were
assigned to something different than a.
We end up with:
BB1:
  a =
  st a, SpillSlot
  b =
BB2:
  c = b
  ...
  d = c
  = d
  e = ld SpillSlot
  = e
This is likely that we can assign the same register for b, c, and d,
getting rid of 2 copies.


** Performances **

Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are a few regressions too that
comes from the (in)accuracy of the block frequency estimate.

<rdar://problem/18312047>

llvm-svn: 225422
2015-01-08 01:16:39 +00:00
Matthias Braun d55e6ddacf RegisterCoalescer: Fix valuesIdentical() in some subrange merge cases.
I got confused and assumed SrcIdx/DstIdx of the CoalescerPair is a
subregister index in SrcReg/DstReg, but they are actually subregister
indices of the coalesced register that get you back to SrcReg/DstReg
when applied.

Fixed the bug, improved comments and simplified code accordingly.

Testcase by Tom Stellard!

llvm-svn: 225415
2015-01-07 23:58:38 +00:00
Philip Reames 76ebd15437 [GC] improve testing around gc.relocate and fix a test
Patch by: Ramkumar Ramachandra <artagnon@gmail.com>

"This patch started out as an exploration of gc.relocate, and an attempt
to write a simple test in call-lowering. I then noticed that the
arguments of gc.relocate were not checked fully, so I went in and fixed
a few things. Finally, the most important outcome of this patch is that
my new error handling code caught a bug in a callsite in
stackmap-format."

Differential Revision: http://reviews.llvm.org/D6824

llvm-svn: 225412
2015-01-07 22:48:01 +00:00
Tom Stellard 0599297cb4 R600/SI: Commute instructions to enable more folding opportunities
llvm-svn: 225410
2015-01-07 22:44:19 +00:00
Tom Stellard 26cc18df43 R600/SI: Only fold immediates that have one use
Folding the same immediate into multiple instruction will increase
program size, which can hurt performance.

llvm-svn: 225405
2015-01-07 22:18:27 +00:00
Olivier Sallenave 0451532996 More FMA folding opportunities.
llvm-svn: 225380
2015-01-07 20:54:17 +00:00
Tom Stellard 4842c05216 R600/SI: Add a V_MOV_B64 pseudo instruction
This is used to simplify the SIFoldOperands pass and make it easier to
fold immediates.

llvm-svn: 225373
2015-01-07 20:27:25 +00:00
Tom Stellard ef3b864a07 R600/SI: Teach SIFoldOperands to split 64-bit constants when folding
This allows folding of sequences like:

s[0:1] = s_mov_b64 4
v_add_i32 v0, s0, v0
v_addc_u32 v1, s1, v1

into

v_add_i32 v0, 4, v0
v_add_i32 v1, 0, v1

llvm-svn: 225369
2015-01-07 19:56:17 +00:00
Philip Reames 4ac17a3026 Introduce an example statepoint GC strategy
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)

Most of the validation logic added here as proof of concept will soon move in to the Verifier.  That move is dependent on http://reviews.llvm.org/D6811

There was discussion in the review thread about addrspace(1) being reserved for something.  I'm going to follow up on a seperate llvmdev thread.  If needed, I'll update all the code at once.

Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.

Differential Revision: http://reviews.llvm.org/D6808

llvm-svn: 225365
2015-01-07 19:07:50 +00:00
David Majnemer 4d77fdf311 X86: Allow the stack probe size to be configurable per function
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed.  However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.

It is desirable to have this be configurable so that a function might
opt-out of stack probes.  Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.

Patch by Andrew H!

N.B.  The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.

llvm-svn: 225360
2015-01-07 18:14:07 +00:00
Ahmed Bougacha aa2d290997 [X86] Teach FCOPYSIGN lowering to recognize constant magnitudes.
For code like:
    float foo(float x) { return copysign(1.0, x); }
We used to generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    movss  <1.000000e+00>, %xmm1
    andps  <nan>, %xmm1
    orps   %xmm0, %xmm1
Basically doing an abs(1.0f) in the two middle instructions.

We now generate:
    andps  <-0.000000e+00,0,0,0>, %xmm0
    orps   <1.000000e+00,0,0,0>, %xmm0

Builds on cleanups r223415, r223542.
rdar://19049548
Differential Revision: http://reviews.llvm.org/D6555

llvm-svn: 225357
2015-01-07 17:33:03 +00:00
Charlie Turner 06f22f4678 [ARM] Add missing Tag_DIV_use tests.
llvm-svn: 225348
2015-01-07 11:37:40 +00:00
Karthik Bhat 9ba55334dc Revert r225165 and r225169
Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case
where the original subtraction could have caused an overflow.
Reverting the same.

llvm-svn: 225341
2015-01-07 06:34:34 +00:00
Matt Arsenault d0101a2dfd R600/SI: Add combine for isinfinite pattern
llvm-svn: 225310
2015-01-06 23:00:46 +00:00
Matt Arsenault 6f6233dc58 R600/SI: Pattern match isinf to v_cmp_class instructions
llvm-svn: 225307
2015-01-06 23:00:41 +00:00
Matt Arsenault f2290336b7 R600/SI: Add basic DAG combines for fp_class
llvm-svn: 225306
2015-01-06 23:00:39 +00:00
Matt Arsenault 4831ce5491 R600/SI: Add class intrinsic
llvm-svn: 225305
2015-01-06 23:00:37 +00:00
Rafael Espindola 83a362cde8 Change the .ll syntax for comdats and add a syntactic sugar.
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.

Just dropping the $name causes problems for

@foo = globabl i32 0, comdat
$bar = comdat ...

and

declare void @foo() comdat
$bar = comdat ...

So the syntax is changed to

@g1 = globabl i32 0, comdat($c1)
@g2 = globabl i32 0, comdat

and

declare void @foo() comdat($c1)
declare void @foo() comdat

llvm-svn: 225302
2015-01-06 22:55:16 +00:00
Hal Finkel ed844c4ad1 [PowerPC] Reuse a load operand in int->fp conversions
int->fp conversions on PPC must be done through memory loads and stores. On a
modern core, this process begins by storing the int value to memory, then
loading it using a (sometimes special) FP load instruction. Unfortunately, we
would do this even when the value to be converted was itself a load, and we can
just use that same memory location instead of copying it to another first.
There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs,
because the fp_to_int operand has not been lowered when the int_to_fp is being
lowered. We handle this specially by invoking fp_to_int's lowering logic
(partially) and getting the necessary memory location (some trivial refactoring
was done to make this possible).

This is all somewhat ugly, and it would be nice if some later CodeGen stage
could just clean this stuff up, but because doing so would involve modifying
target-specific nodes (or instructions), it is not immediately clear how that
would work.

Also, remove a related entry from the README.txt for which we now generate
reasonable code.

llvm-svn: 225301
2015-01-06 22:31:02 +00:00
Tom Stellard 9d6797ae58 R600/SI: Insert s_waitcnt before s_barrier instructions.
This ensures that all memory operations are complete when all threads
reach the barrier.

llvm-svn: 225290
2015-01-06 19:52:07 +00:00
Tom Stellard 49f8bfdcb7 R600/SI: Add a stub GCNTargetMachine
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.

It is recommened that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.

llvm-svn: 225277
2015-01-06 18:00:21 +00:00
Andrea Di Biagio f807a6f297 [CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz.
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.

The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow grap modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.

With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.

Added new test cases to CodeGen/X86/cttz-ctlz.ll

Differential Revision: http://reviews.llvm.org/D6853

llvm-svn: 225274
2015-01-06 17:41:18 +00:00
Hal Finkel ef35ed0892 [PowerPC] Add a regression test for r225251
In r225251, I removed an old entry from the README.txt file. While there are
several contributing factors (including pieces in Clang's ABI code), upon
further reflection, the backend part deserves a regression test.

llvm-svn: 225268
2015-01-06 16:46:37 +00:00
Colin LeMahieu 1445553474 [Hexagon] Adding dealloc_return encoding and absolute address stores.
llvm-svn: 225267
2015-01-06 16:15:15 +00:00
David Majnemer 29c52f7449 X86: Don't make illegal GOTTPOFF relocations
"ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF
relocation target a movq or addq instruction.

Prohibit the truncation of such loads to movl or addl.

This fixes PR22083.

Differential Revision: http://reviews.llvm.org/D6839

llvm-svn: 225250
2015-01-06 07:12:52 +00:00
Hal Finkel 5efb918844 [PowerPC] Improve int_to_fp(fp_to_int(x)) combining
The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x))
without a load/store pair is updated here with support for unsigned integers,
and to support single-precision values without a third rounding step, on newer
cores with the appropriate instructions.

llvm-svn: 225248
2015-01-06 06:01:57 +00:00
Hal Finkel f93f56d619 [PowerPC] Fix test to pass on Darwin hosts
llvm-svn: 225220
2015-01-05 23:17:43 +00:00
Hal Finkel 9187711f08 [PowerPC] Convert a README.txt entry into a better test
We now produce the desired code as noted in the README.txt file (no spurious
or). Remove the README entry and improve the regression test.

llvm-svn: 225214
2015-01-05 21:53:52 +00:00
Hal Finkel c7d35bb5b1 [PowerPC] Add a test for truncating a shifted load
We now produce the desired code as noted in the README.txt file. Remove the
README entry and add a regression test.

llvm-svn: 225209
2015-01-05 21:33:14 +00:00