Commit Graph

6109 Commits

Author SHA1 Message Date
Benjamin Kramer bb1dd73d3e DAGCombiner: Partially revert r192795, getNOT was fixed not to create illegal constants.
llvm-svn: 194959
2013-11-17 10:40:03 +00:00
Matt Arsenault 64283bd99c Use more getZExtOrTruncs
llvm-svn: 194945
2013-11-17 02:31:26 +00:00
Matt Arsenault 873bb3ea86 Use getZExtOrTrunc instead of repeating the same logic.
llvm-svn: 194944
2013-11-17 02:24:21 +00:00
Matt Arsenault 36f5eb5949 Use right address space pointer size
llvm-svn: 194940
2013-11-17 00:06:39 +00:00
Matt Arsenault dfb3e7092e Fix assert on unaligned access to global with different address space size.
llvm-svn: 194934
2013-11-16 20:50:54 +00:00
Matt Arsenault 19231e630e Fix codegen for null different sized pointer.
llvm-svn: 194932
2013-11-16 20:24:41 +00:00
Bob Wilson 9f3e6b25ee Avoid illegal integer promotion in fastisel
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow.  Results are different when:

    sext(a) + sext(b) != sext(a + b)

Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.

<rdar://problem/15292280>

Patch by Duncan Exon Smith!

llvm-svn: 194840
2013-11-15 19:09:27 +00:00
Daniel Sanders 50b8041066 Fix illegal DAG produced by SelectionDAG::getConstant() for v2i64 type
Summary:
When getConstant() is called for an expanded vector type, it is split into
multiple scalar constants which are then combined using appropriate build_vector
and bitcast operations.

In addition to the usual big/little endian differences, the case where the
element-order of the vector does not have the same endianness as the elements
themselves is also accounted for.  For example, for v4i32 on big-endian MIPS,
the byte-order of the vector is <3210,7654,BA98,FEDC>. For little-endian, it is
<0123,4567,89AB,CDEF>.
Handling this case turns out to be a nop since getConstant() returns a splatted
vector (so reversing the element order doesn't change the value)

This fixes a number of cases in MIPS MSA where calling getConstant() during
operation legalization introduces illegal types (e.g. to legalize v2i64 UNDEF
into a v2i64 BUILD_VECTOR of illegal i64 zeros). It should also handle bigger
differences between illegal and legal types such as legalizing v2i64 into v8i16.

lowerMSASplatImm() in the MIPS backend no longer needs to avoid calling
getConstant() so this function has been updated in the same patch.

For the sake of transparency, the steps I've taken since the review are:
* Added 'virtual' to isVectorEltOrderLittleEndian() as requested. This revealed
  that the MIPS tests were falsely passing because a polymorphic function was
  not actually polymorphic in the reviewed patch.
* Fixed the tests that were now failing. This involved deleting the code to
  handle the MIPS MSA element-order (which was previously doing an byte-order
  swap instead of an element-order swap). This left
  isVectorEltOrderLittleEndian() unused and it was deleted.
* Fixed build failures caused by rebasing beyond r194467-r194472. These build
  failures involved the bset, bneg, and bclr instructions added in these commits
  using lowerMSASplatImm() in a way that was no longer valid after this patch.
  Some of these were fixed by calling SelectionDAG::getConstant() instead,
  others were fixed by a new function getBuildVectorSplat() that provided the
  removed functionality of lowerMSASplatImm() in a more sensible way.

Reviewers: bkramer

Reviewed By: bkramer

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1973

llvm-svn: 194811
2013-11-15 12:56:49 +00:00
Matt Arsenault c5559bb14b Add target hook to prevent folding some bitcasted loads.
This is to avoid this transformation in some cases:
fold (conv (load x)) -> (load (conv*)x)

On architectures that don't natively support some vector
loads efficiently casting the load to a smaller vector of
larger types and loading is more efficient.

Patch by Micah Villmow.

llvm-svn: 194783
2013-11-15 04:42:23 +00:00
Matt Arsenault b03bd4d96b Add addrspacecast instruction.
Patch by Michele Scandale!

llvm-svn: 194760
2013-11-15 01:34:59 +00:00
Andrew Trick 561f2218e0 Minor extension to llvm.experimental.patchpoint: don't require a call.
If a null call target is provided, don't emit a dummy call. This
allows the runtime to reserve as little nop space as it needs without
the requirement of emitting a call.

llvm-svn: 194676
2013-11-14 06:54:10 +00:00
Juergen Ributzka 34c652d34d SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
This patch reapplies r193676 with an additional fix for the Hexagon backend. The
SystemZ backend has already been fixed by r194148.

The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. Now the type
legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 194542
2013-11-13 01:57:54 +00:00
Daniel Sanders a1840d2f88 Vector forms of SHL, SRA, and SRL can be constant folded using SimplifyVBinOp too
Reviewers: dsanders

Reviewed By: dsanders

CC: llvm-commits, nadav

Differential Revision: http://llvm-reviews.chandlerc.com/D1958

llvm-svn: 194393
2013-11-11 17:23:41 +00:00
Juergen Ributzka 87ed906b2e [Stackmap] Materialize the jump address within the patchpoint noop slide.
This patch moves the jump address materialization inside the noop slide. This
enables patching of the materialization itself or its complete removal. This
patch also adds the ability to define scratch registers that can be used safely
by the code called from the patchpoint intrinsic. At least one scratch register
is required, because that one is used for the materialization of the jump
address. This patch depends on D2009.

Differential Revision: http://llvm-reviews.chandlerc.com/D2074

Reviewed by Andy

llvm-svn: 194306
2013-11-09 01:51:33 +00:00
Juergen Ributzka 9969d3e6e8 [Stackmap] Add AnyReg calling convention support for patchpoint intrinsic.
The idea of the AnyReg Calling Convention is to provide the call arguments in
registers, but not to force them to be placed in a paticular order into a
specified set of registers. Instead it is up tp the register allocator to assign
any register as it sees fit. The same applies to the return value (if
applicable).

Differential Revision: http://llvm-reviews.chandlerc.com/D2009

Reviewed by Andy

llvm-svn: 194293
2013-11-08 23:28:16 +00:00
Andrew Trick 6664df12fb Slightly change the way stackmap and patchpoint intrinsics are lowered.
MorphNodeTo is not safe to call during DAG building. It eagerly
deletes dependent DAG nodes which invalidates the NodeMap. We could
expose a safe interface for morphing nodes, but I don't think it's
worth it. Just create a new MachineNode and replaceAllUsesWith.

My understaning of the SD design has been that we want to support
early target opcode selection. That isn't very well supported, but
generally works. It seems reasonable to rely on this feature even if
it isn't widely used.

llvm-svn: 194102
2013-11-05 22:44:04 +00:00
Juergen Ributzka 359c532d36 [Stackmap] Remove erroneous assert.
llvm-svn: 193871
2013-11-01 17:53:27 +00:00
Aaron Ballman 2b7a733b16 Commenting out this assert because it is causing the build bots to fail. This effectively reverts r193861, but needs to be fixed as part of r193769.
llvm-svn: 193862
2013-11-01 15:12:23 +00:00
Aaron Ballman 96321aa523 Fixing an order of evaluation error in an assert.
llvm-svn: 193861
2013-11-01 14:53:14 +00:00
Andrew Trick 153ebe6d2a Add support for stack map generation in the X86 backend.
Originally implemented by Lang Hames.

llvm-svn: 193811
2013-10-31 22:11:56 +00:00
Andrew Trick 74f4c749cf Lower stackmap intrinsics directly to their target opcode in the DAG builder.
llvm-svn: 193769
2013-10-31 17:18:24 +00:00
Andrew Trick d4d1d9c06e whitespace
llvm-svn: 193765
2013-10-31 17:18:07 +00:00
Jim Grosbach 7236678687 Legalize: Improve legalization of long vector extends.
When an extend more than doubles the size of the elements (e.g., a zext
from v16i8 to v16i32), the normal legalization method of splitting the
vectors will run into problems as by the time the destination vector is
legal, the source vector is illegal. The end result is the operation
often becoming scalarized, with the typical horrible performance. For
example, on x86_64, the simple input of:
define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
  %tmp = zext <16 x i8> %a to <16 x i32>
  store <16 x i32> %tmp, <16 x i32>*%p
  ret void
}

Generates:
  .section  __TEXT,__text,regular,pure_instructions
  .section  __TEXT,__const
  .align  5
LCPI0_0:
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .long 255                     ## 0xff
  .section  __TEXT,__text,regular,pure_instructions
  .globl  _bar
  .align  4, 0x90
_bar:
  vpunpckhbw  %xmm0, %xmm0, %xmm1
  vpunpckhwd  %xmm0, %xmm1, %xmm2
  vpmovzxwd %xmm1, %xmm1
  vinsertf128 $1, %xmm2, %ymm1, %ymm1
  vmovaps LCPI0_0(%rip), %ymm2
  vandps  %ymm2, %ymm1, %ymm1
  vpmovzxbw %xmm0, %xmm3
  vpunpckhwd  %xmm0, %xmm3, %xmm3
  vpmovzxbd %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vandps  %ymm2, %ymm0, %ymm0
  vmovaps %ymm0, (%rdi)
  vmovaps %ymm1, 32(%rdi)
  vzeroupper
  ret

So instead we can check if there are legal types that enable us to split
more cleverly when the input vector is already legal such that we don't
turn it into an illegal type. If the extend is such that it's more than
doubling the size of the input we check if
  - the number of vector elements is even,
  - the source type is legal,
  - the type of a split source is illegal,
  - the type of an extended (by doubling element size) source is legal, and
  - the type of that extended source when split is legal.
If the conditions are met, instead of just splitting both the
destination and the source types, we create an extend that only goes up
one "step" (doubling the element width), and the continue legalizing the
rest of the operation normally. The result is that this operates as a
new, more effecient, termination condition for the loop of "split the
operation until the destination type is legal."

With this change, the above example now compiles to:
_bar:
  vpxor %xmm1, %xmm1, %xmm1
  vpunpcklbw  %xmm1, %xmm0, %xmm2
  vpunpckhwd  %xmm1, %xmm2, %xmm3
  vpunpcklwd  %xmm1, %xmm2, %xmm2
  vinsertf128 $1, %xmm3, %ymm2, %ymm2
  vpunpckhbw  %xmm1, %xmm0, %xmm0
  vpunpckhwd  %xmm1, %xmm0, %xmm3
  vpunpcklwd  %xmm1, %xmm0, %xmm0
  vinsertf128 $1, %xmm3, %ymm0, %ymm0
  vmovaps %ymm0, 32(%rdi)
  vmovaps %ymm2, (%rdi)
  vzeroupper
  ret

This generalizes a custom lowering that was added a while back to the
ARM backend. That lowering is no longer necessary, and is removed. The
testcases for it, however, provide excellent ARM tests for this change
and so remain.

rdar://14735100

llvm-svn: 193727
2013-10-31 00:20:48 +00:00
Matt Arsenault 2ba54c3d90 Fix CodeGen for unaligned loads with address spaces
llvm-svn: 193721
2013-10-30 23:30:05 +00:00
Juergen Ributzka 3bd686d493 Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too."
Now Hexagon and SystemZ are not happy with it :-(

llvm-svn: 193677
2013-10-30 06:36:19 +00:00
Juergen Ributzka 6ad05d6b95 SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.

This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask type for the given target. This mask has
usually the same size as the VSELECT return type (except for Intel KNL). Now the
type legalizer will split both VSELECT and SETCC.

This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.

Reviewed by Nadav

llvm-svn: 193676
2013-10-30 05:48:18 +00:00
Alp Toker 6a03374526 Fix "existant" typos
llvm-svn: 193579
2013-10-29 02:35:28 +00:00
Richard Sandiford 981fdeb477 [DAGCombiner] Respect volatility when checking for aliases
Making useAA() default to true for SystemZ showed that the combiner alias
analysis wasn't handling volatile accesses.  This hit many of the SystemZ
tests, but I arbitrarily picked one for the purpose of this patch.

llvm-svn: 193518
2013-10-28 12:00:00 +00:00
Richard Sandiford 39c1ce4dc1 Keep TBAA info when rewriting SelectionDAG loads and stores
Most SelectionDAG code drops the TBAA info when creating a new form of a
load and store (e.g. during legalization, or when converting a plain
load to an extending one).  This patch tries to catch all cases where
the TBAA information can legitimately be carried over.

The patch adds alternative forms of getLoad() and getExtLoad() that take
a MachineMemOperand instead of individual fields.  (The corresponding
getTruncStore() already exists.)  The idea is to use the MachineMemOperand
forms when all fields are carried over (size, pointer info, isVolatile,
isNonTemporal, alignment and TBAA info).  If some adjustment is being
made, e.g. to narrow the load, then we still pass the individual fields
but also pass the TBAA info.

llvm-svn: 193517
2013-10-28 11:17:59 +00:00
Tim Northover a564d329c2 LegalizeDAG: allow libcalls for max/min atomic operations
ARM processors without ldrex/strex need to be able to make libcalls for all
atomic operations, including the newer min/max versions.

The alternative would probably be expanding these operations in terms of
cmpxchg (as x86 does always), but in the configurations where this matters
code-size tends to be paramount so the libcall is more desirable.

llvm-svn: 193398
2013-10-25 09:30:20 +00:00
Nadav Rotem d369d4bdf9 Optimize concat_vectors(X, undef) -> scalar_to_vector(X).
This optimization is not SSE specific so I am moving it to DAGco.
The new scalar_to_vector dag node exposed a missing pattern in the AArch64 target that I needed to add.

llvm-svn: 193393
2013-10-25 06:41:18 +00:00
Tom Stellard 8d7d4deafe SelectionDAG: Pass along the original argument/element type in ISD::InputArg
For some targets, it is useful to be able to look at the original
type of an argument without having to dig through the original IR.

This also fixes a bug in SelectionDAGBuilder where InputArg.PartOffset
was not taking into account the offset of structure elements.

Patch by: Justin Holewinski

Tom Stellard:
  - Changed the type of ArgVT to EVT, so it can store non-simple types
    like v3i32.

llvm-svn: 193214
2013-10-23 00:44:24 +00:00
Wan Xiaofei 2f8dc08b8c Using FoldingSet in SelectionDAG::getVTList.
VTList has a long life cycle through the module and getVTList is frequently called. In current getVTList, sequential search over a std::vector is used, this is inefficient in big module.
This patch use FoldingSet to implement hashing mechanism when searching.

Reviewer: Nadav Rotem
Test    : Pass unit tests & LNT test suite

llvm-svn: 193150
2013-10-22 08:02:02 +00:00
Matt Arsenault b768912db8 Fix CodeGen for different size address space GEPs
llvm-svn: 193111
2013-10-21 20:03:54 +00:00
Matt Arsenault bbd24901cf Reuse variable
llvm-svn: 193107
2013-10-21 19:24:15 +00:00
Bill Schmidt 3684fdd59f [PATCH] Fix PR17168 (DAG scheduler inserts DBG_VALUE before PHI with fast-isel)
PR17168 describes a test case that fails when compiling for debug with
fast-isel.  Investigation showed that the test was failing because a DBG_VALUE
machine instruction was placed prior to a PHI.

For this problem to occur requires the following:
 * Compile for debug
 * Compile with fast-isel
 * In a block B, fast-isel must partially succeed before punting to DAG-isel
 * B must start with a PHI
 * The first unhandled node in the DAG must not generate a machine instruction
 * A debug value with an order less than that of that first node exists

When all of these circumstances apply, the existing test that an instruction
was not inserted won't fire.  Currently it tests whether the block is empty,
or whether the last instruction generated is a phi.  When fast-isel has
partially succeeded, the last instruction generated will not be a phi.
Instead, we need to check whether the current insert position is immediately
following a phi.  This patch adds that check, and adds the test case from the
PR as a regression test.

llvm-svn: 192976
2013-10-18 14:20:11 +00:00
David Majnemer 451b7dd1ef CodeGen: Emit a libcall if the target doesn't support 16-byte wide atomics
There are targets that support i128 sized scalars but cannot emit
instructions that modify them directly.  The proper thing to do is to
emit a libcall.

This fixes PR17481.

llvm-svn: 192957
2013-10-18 08:03:43 +00:00
Richard Sandiford 95f7ba988b Replace sra with srl if a single sign bit is required
E.g. (and (sra (i32 x) 31) 2) -> (and (srl (i32 x) 30) 2).

llvm-svn: 192884
2013-10-17 11:16:57 +00:00
Andrea Di Biagio 561badf717 Fix edge condition in DAGCombiner to improve codegen of shift sequences.
When canonicalizing dags according to the rule
(shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))

remember to add the new shl dag to the DAGCombiner worklist of nodes.
If we don't explicitly add it to the worklist of nodes to visit, we
may not trigger later on the rule that folds the shift left + logical
shift right into a AND instruction with bitmask.

llvm-svn: 192883
2013-10-17 11:02:58 +00:00
Jack Carter d4e9615d1c [projects/test-suite] White space and long line fixes.
No functionality changes.

llvm-svn: 192863
2013-10-17 01:34:33 +00:00
Benjamin Kramer 00eb07b791 DAGCombiner: Don't fold xor into not if getNOT would introduce an illegal constant.
This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly
because i64 is illegal. It would be nice if getNOT would handle this
transparently, but I don't see a way to generate a legal constant there right
now. Fixes PR17487.

llvm-svn: 192795
2013-10-16 14:16:19 +00:00
Richard Sandiford 374a0e50c4 Handle (shl (anyext (shr ...))) in SimpilfyDemandedBits
This is really an extension of the current (shl (shr ...)) -> shl optimization.
The main difference is that certain upper bits must also not be demanded.

The motivating examples are the first two in the testcase, which occur
in llvmpipe output.

llvm-svn: 192783
2013-10-16 10:26:19 +00:00
David Blaikie 6004dbc9fa Fix indenting.
That wasn't confusing /at all/...

llvm-svn: 192617
2013-10-14 20:15:04 +00:00
Elena Demikhovsky 82a46ebe0a Fixed a bug in dynamic allocation memory on stack.
The alignment of allocated space was wrong, see Bugzila 17345.

Done by Zvi Rackover <zvi.rackover@intel.com>.

llvm-svn: 192573
2013-10-14 07:26:51 +00:00
Will Dietz ae726a93e3 TargetLowering: Don't index into empty string.
(This is triggered by current lit tests)

llvm-svn: 192549
2013-10-13 03:08:49 +00:00
Quentin Colombet de0e06234c [DAGCombiner] Reapply load slicing (192471) with a test that explicitly set sse4.2 support.
This should fix the buildbots.

Original commit message:
[DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
a = load i64* addr
b = trunc i64 a to i32
c = lshr i64 a, 32
d = trunc i64 c to i32

into:
b = load i32* addr1
d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192476
2013-10-11 18:29:42 +00:00
Quentin Colombet 5aee63d9e3 [DAGCombiner] Revert load slicing (r192471), until I figure out why it fails on ubuntu.
llvm-svn: 192474
2013-10-11 18:17:17 +00:00
Quentin Colombet 41dc258f71 [DAGCombiner] Slice a big load in two loads when the element are next to each
other in memory and the target has paired load and performs post-isel loads
combining.

E.g., this optimization will transform something like this:
 a = load i64* addr
 b = trunc i64 a to i32
 c = lshr i64 a, 32
 d = trunc i64 c to i32

into:
 b = load i32* addr1
 d = load i32* addr2
Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and
performs post-isel loads combining.

One should overload TargetLowering::hasPairedLoad to provide this information.
The default is false.

<rdar://problem/14477220>

llvm-svn: 192471
2013-10-11 18:01:14 +00:00
Matt Arsenault a98c3b1816 Use getPointerSizeInBits() rather than 8 * getPointerSize()
llvm-svn: 192386
2013-10-10 19:09:05 +00:00
Craig Topper a7afa71494 Fix some assert messages to say the correct opcode name. Looks like one assert got copy and pasted to many places.
llvm-svn: 192078
2013-10-06 22:38:19 +00:00