Commit Graph

40391 Commits

Author SHA1 Message Date
Simon Pilgrim 001368abc8 [X86] Moved getTargetConstantFromNode function so a future patch is more understandable. NFCI.
llvm-svn: 288147
2016-11-29 15:32:58 +00:00
Simon Pilgrim 35c47c494d [X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX.
We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output.

llvm-svn: 288140
2016-11-29 14:18:51 +00:00
Simon Pilgrim 923020a652 Avoid repeated calls to MVT::getScalarSizeInBits(). NFCI.
llvm-svn: 288138
2016-11-29 13:43:08 +00:00
Tom Stellard 0bc688116c AMDGPU/SI: Avoid moving PHIs to VALU when phi values are defined in scalar branches
Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23417

llvm-svn: 288095
2016-11-29 00:46:46 +00:00
Matthias Braun 115efcd3d1 MachineScheduler: Export function to construct "default" scheduler.
This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.

This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also saves the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().

Differential Revision: https://reviews.llvm.org/D26986

llvm-svn: 288057
2016-11-28 20:11:54 +00:00
Stanislav Mekhanoshin 0ee250eee8 [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.

With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp
is implemented. Additional side effect of this is that we may consume less VGPRs
at a cost of more SGPRs in case if holding of multiple conditions is needed, and
that is a clear win in most cases.

Differential Revision: https://reviews.llvm.org/D26114

llvm-svn: 288053
2016-11-28 18:58:49 +00:00
Simon Pilgrim 2228f70a85 [X86][SSE] Add initial support for combining (V)PMOVZX with shuffles.
llvm-svn: 288049
2016-11-28 17:58:19 +00:00
Sanjay Patel 100bc01a72 [x86] fix formatting; NFC
llvm-svn: 288045
2016-11-28 17:39:21 +00:00
Simon Pilgrim 3f10e66981 [X86][SSE] Added support for combining bit-shifts with shuffles.
Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining.

Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.

llvm-svn: 288040
2016-11-28 16:25:01 +00:00
Daniel Cederman 59168e28e0 Test commit
llvm-svn: 288036
2016-11-28 15:33:03 +00:00
Ulrich Weigand a29bf16ed5 [SystemZ] Fix build bot fallout from r288030
Remove unused variable that came in due to a copy-and-paste bug
and caused build bot failures.

llvm-svn: 288033
2016-11-28 14:24:14 +00:00
Ulrich Weigand 84404f30b3 [SystemZ] Support execution hint instructions
This adds assembler support for the instructions provided by the
execution-hint facility (NIAI and BP(R)P).  This required adding
support for the new relocation types for 12-bit and 24-bit PC-
relative offsets used by the BP(R)P instructions.

llvm-svn: 288031
2016-11-28 14:01:51 +00:00
Ulrich Weigand 2d9e3d9d3b [SystemZ] Support load-and-trap instructions
This adds support for the instructions provided with the
load-and-trap facility.

llvm-svn: 288030
2016-11-28 13:59:22 +00:00
Ulrich Weigand 758399131a [SystemZ] Add remaining branch instructions
This patch adds assembler support for the remaining branch instructions:
the non-relative branch on count variants, and all variants of branch
on index.

The only one of those that can be readily exploited for code generation
is BRCTH (branch on count using a high 32-bit register as count).  Do
use it, however, it is necessary to also introduce a hew CHIMux pseudo
to allow comparisons of a 32-bit value agains a short immediate to go
into a high register as well (implemented via CHI/CIH).

This causes a bit of codegen changes overall, but those have proven to
be neutral (or even beneficial) in performance measurements.

llvm-svn: 288029
2016-11-28 13:40:08 +00:00
Ulrich Weigand 524f276c74 [SystemZ] Improve use of conditional instructions
This patch moves formation of LOC-type instructions from (late)
IfConversion to the early if-conversion pass, and in some cases
additionally creates them directly from select instructions
during DAG instruction selection.

To make early if-conversion work, the patch implements the
canInsertSelect / insertSelect callbacks.  It also implements
the commuteInstructionImpl and FoldImmediate callbacks to
enable generation of the full range of LOC instructions.

Finally, the patch adds support for all instructions of the
load-store-on-condition-2 facility, which allows using LOC
instructions also for high registers.

Due to the use of the GRX32 register class to enable high registers,
we now also have to handle the cases where there are still no single
hardware instructions (conditional move from a low register to a high
register or vice versa).  These are converted back to a branch sequence
after register allocation.  Since the expandRAPseudos callback is not
allowed to create new basic blocks, this requires a simple new pass,
modelled after the ARM/AArch64 ExpandPseudos pass.

Overall, this patch causes significantly more LOC-type instructions
to be used, and results in a measurable performance improvement.

llvm-svn: 288028
2016-11-28 13:34:08 +00:00
Craig Topper 17786f77f0 [X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics. They aren't commutable as operand 0 should pass its upper bits through to the output.
llvm-svn: 288011
2016-11-27 21:37:04 +00:00
Craig Topper 13b27a2748 [X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic patterns.
llvm-svn: 288010
2016-11-27 21:37:02 +00:00
Craig Topper ff9d45875a [X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions.
llvm-svn: 288009
2016-11-27 21:37:00 +00:00
Craig Topper 3674f44e40 [X86] Add SHL by 1 to the load folding tables.
I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships.

llvm-svn: 288007
2016-11-27 21:36:54 +00:00
Simon Pilgrim 91d6f5fbc1 [X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts
llvm-svn: 288006
2016-11-27 21:08:19 +00:00
Craig Topper 4fab487265 [AVX-512] Add integer and fp unpck instructions to load folding tables.
llvm-svn: 288004
2016-11-27 19:51:41 +00:00
Simon Pilgrim cdb2ce661d [X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI.
Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit).

llvm-svn: 288003
2016-11-27 19:28:39 +00:00
Craig Topper 7ad961cc70 [X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size.
If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist.

I probably missed some instructions, but this should be a large portion of them.

llvm-svn: 288001
2016-11-27 18:51:13 +00:00
Craig Topper c3b3926f8b [AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287995
2016-11-27 08:55:31 +00:00
Craig Topper fb64a25ba1 [X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction.
Most of these are the SSE4.1 PMOVZX/PMOVSX instructions which all read less than 128-bits. The only other was PMOVUPD which by definition is an unaligned load.

llvm-svn: 287991
2016-11-27 01:52:51 +00:00
Craig Topper 837ff25da1 [X86] Remove hasOneUse check that is redundant with the one in IsProfitableToFold.
llvm-svn: 287987
2016-11-26 18:43:26 +00:00
Craig Topper e266e126ff [X86] Fix the zero extending load detection in X86DAGToDAGISel::selectScalarSSELoad to pass the load node to IsProfitableToFold and IsLegalToFold.
Previously we were passing the SCALAR_TO_VECTOR node.

llvm-svn: 287986
2016-11-26 18:43:24 +00:00
Craig Topper d3ab1a3905 [X86] Simplify control flow. NFCI
llvm-svn: 287985
2016-11-26 18:43:21 +00:00
Craig Topper 991d1ca3ba [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times.
Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied.

Reviewers: RKSimon, zvi, delena, spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D26790

llvm-svn: 287983
2016-11-26 17:29:25 +00:00
Craig Topper 10d5eec1a1 [AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287975
2016-11-26 08:21:52 +00:00
Craig Topper 97169ea5f9 [AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables.
llvm-svn: 287974
2016-11-26 08:21:48 +00:00
Craig Topper 53b33de1e3 [AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables.
llvm-svn: 287972
2016-11-26 07:21:00 +00:00
Craig Topper 6677bb4e50 [AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change.
llvm-svn: 287971
2016-11-26 07:20:57 +00:00
Craig Topper 39265bb1ce [AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables.
llvm-svn: 287970
2016-11-26 07:20:53 +00:00
Tom Stellard 1473f07ceb AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsics
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26724

llvm-svn: 287962
2016-11-26 02:26:04 +00:00
Craig Topper 7f76c23781 [X86][XOP] Add a reversed reg/reg form for VPROT instructions.
The W bit distinquishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions.

llvm-svn: 287961
2016-11-26 02:14:00 +00:00
Craig Topper 516fd7abfe [X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding tables for consistency.
Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete.

llvm-svn: 287960
2016-11-26 02:13:58 +00:00
Craig Topper a363d42973 [AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. This is consistent with the older sections of the table. NFC
llvm-svn: 287956
2016-11-25 23:21:34 +00:00
Marek Olsak 79c05871a2 AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt

llvm-svn: 287942
2016-11-25 17:37:09 +00:00
Simon Pilgrim 8e8ae7219f Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI
llvm-svn: 287940
2016-11-25 17:19:53 +00:00
Craig Topper 88071b37ab [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
Craig Topper 1e48829747 [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables.
llvm-svn: 287937
2016-11-25 16:33:53 +00:00
Marek Olsak e3895bfb47 Revert "AMDGPU: Implement SGPR spilling with scalar stores"
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.

llvm-svn: 287936
2016-11-25 16:03:34 +00:00
Marek Olsak dad553a5cf Revert "AMDGPU: Fix MMO when splitting spill"
This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1.

llvm-svn: 287935
2016-11-25 16:03:27 +00:00
Marek Olsak 8cbbf65361 Revert "AMDGPU: Fix adding extra implicit def of register"
This reverts commit e834ce5976567575621901fb967b8018b9916d71.

llvm-svn: 287934
2016-11-25 16:03:22 +00:00
Marek Olsak 713e6fc531 Revert "AMDGPU: Fix not setting kill flag on temp reg when spilling"
This reverts commit 057bbbe4ae170247ba37f08f2e70ef185267d1bb.

llvm-svn: 287933
2016-11-25 16:03:19 +00:00
Marek Olsak a45dae458d Revert "AMDGPU: Make m0 unallocatable"
This reverts commit 124ad83dae04514f943902446520c859adee0e96.

llvm-svn: 287932
2016-11-25 16:03:15 +00:00
Marek Olsak ea848df84c Revert "AMDGPU: Remove m0 spilling code"
This reverts commit f18de36554eb22416f8ba58e094e0272523a4301.

llvm-svn: 287931
2016-11-25 16:03:06 +00:00
Marek Olsak 18a95bcb3c Revert "AMDGPU: Preserve m0 value when spilling"
This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce.

llvm-svn: 287930
2016-11-25 16:03:02 +00:00
Simon Dardis c08af6db5b [mips] Correct jal expansion for local symbols in .local directives.
This patch corrects the behaviour of code such as:

   .local foo
   jal foo
foo:
to use the correct jal expansion when writing ELF files.

Patch by: Daniel Sanders

Reviewers: zoran.jovanovic, seanbruno, vkalintiris

Differential Revision: https://reviews.llvm.org/D24722

llvm-svn: 287918
2016-11-25 11:06:43 +00:00
Craig Topper d4091494d3 [X86] Invert an 'if' and early out to fix a weird indentation. NFCI
llvm-svn: 287909
2016-11-25 02:29:24 +00:00
Craig Topper a46936185a [X86] Size a SmallVector to the worst case mask size for a 512-bit shuffle. NFCI
llvm-svn: 287908
2016-11-25 02:29:21 +00:00
Simon Pilgrim f1ee930db0 Fix unused variable warning
llvm-svn: 287889
2016-11-24 15:24:47 +00:00
Benjamin Kramer fc54e35d94 [X86] Don't round trip a unique_ptr through a raw pointer for assignment.
No functional change.

llvm-svn: 287888
2016-11-24 15:17:39 +00:00
Simon Pilgrim 9c71e07276 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim 7c26a6f9ef [X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
llvm-svn: 287878
2016-11-24 14:02:30 +00:00
Simon Pilgrim ab323ec411 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Nikolai Bozhenov 3a8d108b2b [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.

It fixes PR28755.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
Nikolai Bozhenov bb64aa14a3 [x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter
Move the definitions of three variables out of the switch.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25192

llvm-svn: 287874
2016-11-24 13:15:49 +00:00
Nikolai Bozhenov a2dabed3b6 [x86] Rewrite getAddressFromInstr helper function
- It does not modify the input instruction
- Second operand of any address is always an Index Register,
  make sure we actually check for that, instead of a check for
  an immediate value

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D24938

llvm-svn: 287873
2016-11-24 13:05:43 +00:00
Simon Pilgrim a3af79678e [X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.

This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.

Differential Revision: https://reviews.llvm.org/D27072

llvm-svn: 287870
2016-11-24 12:13:46 +00:00
Matt Arsenault 7b54dd039e AMDGPU: Preserve m0 value when spilling
llvm-svn: 287844
2016-11-24 00:26:50 +00:00
Matt Arsenault 94b32ffe8e TRI: Add hook to pass scavenger during frame elimination
The scavenger was not passed if requiresFrameIndexScavenging was
enabled. I need to be able to test for the availability of an
unallocatable register here, so I can't create a virtual register for
it.

It might be better to just always use the scavenger and stop
creating virtual registers.

llvm-svn: 287843
2016-11-24 00:26:47 +00:00
Matt Arsenault 5ee3325358 AMDGPU: Remove m0 spilling code
Since m0 isn't allocatable it should never be spilled anymore.

llvm-svn: 287842
2016-11-24 00:26:44 +00:00
Matt Arsenault 9e5c7b1031 AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.

This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.

llvm-svn: 287841
2016-11-24 00:26:40 +00:00
Simon Pilgrim 3ce6a545c7 [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result
We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq

llvm-svn: 287835
2016-11-23 22:35:06 +00:00
Matt Arsenault a24d84beb9 AMDGPU: Cleanup immediate folding code
Move code down to use, reorder to avoid hard to follow
immediate folding logic.

llvm-svn: 287818
2016-11-23 21:51:07 +00:00
Matt Arsenault 391c3ea9bc AMDGPU: Fix debug printing
The uint8_t was printed as a char which didn't really work.

llvm-svn: 287817
2016-11-23 21:51:05 +00:00
Matt Arsenault 997a9abf4c AMDGPU: Fix not setting kill flag on temp reg when spilling
llvm-svn: 287808
2016-11-23 21:00:12 +00:00
Matt Arsenault dd0cb2a3e5 AMDGPU: Fix adding extra implicit def of register
In the scalar case, there's no reason to add an additional
def of the same register.

llvm-svn: 287807
2016-11-23 21:00:10 +00:00
Matt Arsenault 2669a76f01 AMDGPU: Fix MMO when splitting spill
The size and offset were wrong. The size of the object was
being used for the size of the access, when here it is really
being split into 4-byte accesses. The underlying object size
is set in the MachinePointerInfo, which also didn't have the
offset set.

llvm-svn: 287806
2016-11-23 20:52:53 +00:00
Michael Kuperstein 47eb85a003 [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller:foldMemoryOperand() because targets
may not deal with them correctly.

This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.

Differential Revision: https://reviews.llvm.org/D26521

llvm-svn: 287792
2016-11-23 18:33:49 +00:00
Nemanja Ivanovic 10fc3cfc63 [PowerPC] Remove InstAlias definitions that cause incorrect assembly
In rL283190, I added some InstAlias definitions to generate extended mnemonics
for some uses of the XXPERMDI instruction. However, when the assembler matches
these extended mnemonics, it matches the new instruction in situations where it
should match the old one.
This patch removes these definitions and accomplishes that by defining these
mnemonics with additional instructions that are isCodeGenOnly.

Fixes PR31127.

llvm-svn: 287765
2016-11-23 15:51:52 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim 03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Craig Topper f57e17def0 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Zvi Rackover 14aba43ea9 [X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's
Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors.

Reviewers: RKSimon, craig.topper, delena, andreadb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26985

llvm-svn: 287743
2016-11-23 06:45:25 +00:00
Kuba Mracek 06995e866b [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Matthias Braun 7f423442d1 TargetSubtargetInfo: Move implementation to lib/CodeGen; NFC
TargetSubtargetInfo is filled with CodeGen specific interfaces nowadays
(getInstrInfo(), getFrameLowering(), getSelectionDAGInfo()) most of the
tuning flags like enablePostRAScheduler(), getAntiDepBreakMode(),
enableRALocalReassignment(), ... also do not seem to be universal enough
to make sense outside of CodeGen.

Differential Revision: https://reviews.llvm.org/D26948

llvm-svn: 287708
2016-11-22 22:09:03 +00:00
Simon Dardis 6efb8dd2e3 [mips] seb, seh instruction aliases
Add the single operand form.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D26961

llvm-svn: 287681
2016-11-22 19:17:23 +00:00
Nemanja Ivanovic b8e30d6db6 [PowerPC] Emit VMX loads/stores for aligned ops to avoid adding swaps on LE
This patch corresponds to review:
https://reviews.llvm.org/D26861

It also fixes PR30730.

Committing on behalf of Lei Huang.

llvm-svn: 287679
2016-11-22 19:02:07 +00:00
Simon Pilgrim 4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Vasileios Kalintiris 04dc211e6a [mips] Add support for unaligned load/store macros.
Add missing unaligned store macros (ush/usw) and fix the exisiting
implementation of the unaligned load macros in order to generate
identical expansions with the GNU assembler.

llvm-svn: 287646
2016-11-22 16:43:49 +00:00
Tim Northover b64fb453ea CodeGen: simplify TargetMachine::getSymbol interface. NFC.
No-one actually had a mangler handy when calling this function, and
getSymbol itself went most of the way towards getting its own mangler
(with a local TLOF variable) so forcing all callers to supply one was
just extra complication.

llvm-svn: 287645
2016-11-22 16:17:20 +00:00
Zvi Rackover 9a355219d1 [X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC.
llvm-svn: 287644
2016-11-22 15:33:28 +00:00
Zvi Rackover 0aa1c32d14 [X86] Remove dead code from LowerVectorBroadcast
Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish.

Reviewers: craig.topper, delena, RKSimon, andreadb

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D26678

llvm-svn: 287643
2016-11-22 15:17:52 +00:00
Chad Rosier ecc77273a0 [AArch64] Set the max interleave factor for Falkor.
llvm-svn: 287642
2016-11-22 14:25:02 +00:00
Chad Rosier 2abc29c593 [AArch64] Maximize 80-column. NFC.
llvm-svn: 287640
2016-11-22 14:12:09 +00:00
Coby Tayree 49b3733d57 [AVX512][inline-asm] Fix AVX512 inline assembly instruction resolution when the size qualifier of a memory operand is not specified explicitly.
This commit handles cases where the size qualifier of an indirect memory reference operand in Intel syntax is missing (e.g. "vaddps xmm1, xmm2, [a]").

GCC will deduce the size qualifier for AVX512 vector and broadcast memory operands based on the possible matches:
"vaddps xmm1, xmm2, [a]" matches only “XMMWORD PTR” qualifier.
"vaddps xmm1, xmm2, [a]{1to4}" matches only “DWORD PTR” qualifier.

This is different from the current behavior of LLVM, which deduces the size qualifier based on the size of the memory operand.
For "vaddps xmm1, xmm2, [a]"
"char a;" will imply "BYTE PTR" qualifier
"short a;" will imply "WORD PTR" qualifier.

This commit aligns LLVM to GCC’s behavior.

This is the LLVM part of the review.
The Clang part of the review: https://reviews.llvm.org/D26587

Differential Revision: https://reviews.llvm.org/D26586

llvm-svn: 287630
2016-11-22 09:30:29 +00:00
Craig Topper 3dcf45f08d [X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to the use the i64mem version.
I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX.

There are two different reg/reg versions of movq, one from a GPR and one from the lower 64-bits of an XMM register. This changes the loading folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table.

llvm-svn: 287622
2016-11-22 05:31:43 +00:00
Craig Topper cada9f2275 [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
2016-11-22 04:57:34 +00:00
Craig Topper da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Stanislav Mekhanoshin ae0f6620e4 [AMDGPU] Fix multiple vreg definitions in si-lower-control-flow
Differential Revision: https://reviews.llvm.org/D26939

llvm-svn: 287608
2016-11-22 01:42:34 +00:00
Geoff Berry e0bf52f394 [AArch64LoadStoreOptimizer] Don't treat write to XZR/WZR as a clobber.
Summary:
When searching for load/store instructions to pair/merge don't treat
writes to WZR/XZR as clobbers since they don't change the value read
from WZR/XZR (which is always 0).

Reviewers: mcrosier, junbuml, jmolloy, t.p.northover

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D26921

llvm-svn: 287592
2016-11-21 22:51:10 +00:00
Simon Dardis 43115a1ce4 [mips] seq macro support
This patch adds the seq macro.

This partially resolves PR/30381.

Thanks to Sean Bruno for reporting the issue!

Reviewers: zoran.jovanovic, vkalintiris, seanbruno

Differential Revision: https://reviews.llvm.org/D24607

llvm-svn: 287573
2016-11-21 20:30:41 +00:00
Coby Tayree 94ddbb4a04 small fixup which enables the issuing of the aforementioned instruction (w/o operands), on MS/Intel syntax.
Differential Revision: https://reviews.llvm.org/D26913

llvm-svn: 287548
2016-11-21 15:50:56 +00:00
Simon Pilgrim b7bbaa669b [X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input
At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input).

This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit.

llvm-svn: 287535
2016-11-21 12:05:49 +00:00
Michael Zuckerman 8462faeaba Fixing a small typo (A->U).
This seem to fixes PR30992.

-         HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX 
+         HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX 

llvm-svn: 287532
2016-11-21 11:52:11 +00:00
Craig Topper 9f2d632ee7 [AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX.
llvm-svn: 287523
2016-11-21 07:51:31 +00:00
Alexei Starovoitov 7ab125dbf3 [bpf] fix dwarf elf relocs and line numbers
- teach RelocVisitor to recognize bpf relocations
- fix AsmInfo->PointerSize to make sure dwarf is emitted correctly
- add a test for the above

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287521
2016-11-21 06:21:23 +00:00
Craig Topper 0dfc09372f [X86] Remove duplicate instructions for (v)movq and replace with patterns on other instructions. NFC
llvm-svn: 287519
2016-11-21 04:07:56 +00:00
Dean Michael Berris 31761f300d [XRay][AArch64] Implemented a test for the compile-time sleds emitted, and fixed a bug in the jump instruction
This patch adds a test for the assembly code emitted with XRay
instrumentation. It also fixes a bug where the operand of a jump
instruction must be not the number of bytes to jump over, but rather the
number of 4-byte instructions.

Author: rSerge

Reviewers: dberris, rengolin

Differential Revision: https://reviews.llvm.org/D26805

llvm-svn: 287516
2016-11-21 03:01:43 +00:00
Simon Dardis 1dcb911061 [mips] Restrict tail call optimization
The tail call optimization was being used without proper consideration of
ABI requirements for saving and restoring the GP. This patch restricts tail
call optimization to functions within the same translation unit.

Reviewers: vkalintiris

Differential Revision: https://reviews.llvm.org/D24763

llvm-svn: 287505
2016-11-20 21:23:08 +00:00
Coby Tayree 99a6639047 The 'vpmultishiftqb' instruction was implemented falsely, this patch amend it.
More specifically - (MS dialect) broadcasting variants were implemented falsely.

Differential Revision: https://reviews.llvm.org/D26257

llvm-svn: 287501
2016-11-20 17:19:55 +00:00
Coby Tayree 97e9cf62f4 Some instructions were missing, other implemented falsely. this patch aims at amending those issues. full list:
vcvtps2pd
vcvtudq2pd
vcvtps2qq
vcvttps2qq
vcvtps2uqq
vcvttps2uqq

variants are:

[Dst]XMM(zero-masked/merge-masked/unmasked)
[Src]Mem64

Differential Revision: https://reviews.llvm.org/D26799

llvm-svn: 287500
2016-11-20 17:09:56 +00:00
Simon Pilgrim 5fadce4a3f [X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible
llvm-svn: 287497
2016-11-20 16:11:36 +00:00
Simon Pilgrim 5401bae523 [X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines
llvm-svn: 287496
2016-11-20 15:24:38 +00:00
Simon Pilgrim 3f40412e0f [X86][AVX512] Add support for VBMI VPERMV target shuffle combines
llvm-svn: 287495
2016-11-20 15:05:45 +00:00
Simon Pilgrim c17e1b74b8 [X86][AVX512VL] Removed duplicate operation action
Basic AVX512F already declared uint_to_fp v4i32 as legal

llvm-svn: 287493
2016-11-20 14:19:29 +00:00
Simon Pilgrim 3f10e9953d Strip trailing whitespace
llvm-svn: 287492
2016-11-20 14:05:23 +00:00
Simon Pilgrim 096b6d4f81 [X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets
Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64)

llvm-svn: 287491
2016-11-20 14:03:23 +00:00
Simon Pilgrim fbd2221de5 Fix comment typos. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287486
2016-11-20 13:10:51 +00:00
Oren Ben Simhon c0f073b67f [X86] RegCall - Handling long double arguments
The change is part of RegCall calling convention support for LLVM.
Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack).
This review present the change and the corresponding tests.

Differential Revision: https://reviews.llvm.org/D26151

llvm-svn: 287485
2016-11-20 11:06:07 +00:00
Coby Tayree 179ff0e541 [X86][InlineAsm]Test commit.
Fixing a wrong comment on X86AsmParser.cpp::ParseZ: "true" --> "false"

Differential Revision: https://reviews.llvm.org/D26797

llvm-svn: 287484
2016-11-20 09:31:11 +00:00
Alexei Starovoitov e6ddac0def [bpf] add BPF disassembler
add BPF disassembler, so tools like llvm-objdump can be used:
$ llvm-objdump -d -no-show-raw-insn ./sockex1_kern.o

./sockex1_kern.o:	file format ELF64-BPF

Disassembly of section socket1:
bpf_prog1:
       0:	r6 = r1
       8:	r0 = *(u8 *)skb[23]
      10:	*(u32 *)(r10 - 4) = r0
      18:	r1 = *(u32 *)(r6 + 4)
      20:	if r1 != 4 goto 8
      28:	r2 = r10
      30:	r2 += -4

ld_imm64 (the only 16-byte insn) and special ld_abs/ld_ind instructions
had to be treated in a special way. The decoders for the rest of the insns
are automatically generated.

Add tests to cover new functionality.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287477
2016-11-20 02:25:00 +00:00
Benjamin Kramer ffd3715d16 Give some helper classes/functions internal linkage. NFC.
llvm-svn: 287462
2016-11-19 20:44:26 +00:00
Simon Pilgrim a14e0cb852 [X86][SSE] Improve PSHUFB lowering from either input
Canonicalization may leave the zeroable vector in the first input.

llvm-svn: 287461
2016-11-19 20:41:48 +00:00
Simon Pilgrim 623a7c57b5 [X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets
llvm-svn: 287459
2016-11-19 20:12:34 +00:00
Craig Topper 893ea9fb2c [X86] Simplify some code a little by removing a dulicate variable and combinining two if statements. NFCI
llvm-svn: 287443
2016-11-19 17:33:17 +00:00
Daniel Sanders c95590bc45 Try again to fix unused variable warning on lld-x86_64-darwin13 after r287439.
The previous attempt didn't work. I assume LLVM_ATTRIBUTE_UNUSED isn't
available on that machine.

llvm-svn: 287442
2016-11-19 14:47:41 +00:00
Daniel Sanders c6d1986a84 Try to fix unused variable warning on lld-x86_64-darwin13 after r287439.
Whether the variable is used or not depends on NDEBUG.

llvm-svn: 287440
2016-11-19 13:50:32 +00:00
Daniel Sanders 72db2a390a Check that emitted instructions meet their predicates on all targets except ARM, Mips, and X86.
Summary:
* ARM is omitted from this patch because this check appears to expose bugs in this target.
* Mips is omitted from this patch because this check either detects bugs or deliberate
  emission of instructions that don't satisfy their predicates. One deliberate
  use is the SYNC instruction where the version with an operand is correctly
  defined as requiring MIPS32 while the version without an operand is defined
  as an alias of 'SYNC 0' and requires MIPS2.
* X86 is omitted from this patch because it doesn't use the tablegen-erated
  MCCodeEmitter infrastructure.

Patches for ARM and Mips will follow.

Depends on D25617

Reviewers: tstellarAMD, jmolloy

Subscribers: wdng, jmolloy, aemerson, rengolin, arsenm, jyknight, nemanjai, nhaehnle, tstellarAMD, llvm-commits

Differential Revision: https://reviews.llvm.org/D25618

llvm-svn: 287439
2016-11-19 13:05:44 +00:00
Dylan McKay 1a55f201ef [AVR] Remove a bunch of unused variables
llvm-svn: 287416
2016-11-19 01:33:42 +00:00
Dylan McKay 19270f3438 [AVR] Remove a variable which was unused in release mode
In release mode where assertions are not enabled, this caused an 'unused
variable' warning.

llvm-svn: 287414
2016-11-19 01:14:44 +00:00
Konstantin Zhuravlyov aefee42e0f [AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input
Differential Revision: https://reviews.llvm.org/D26862

llvm-svn: 287389
2016-11-18 22:31:08 +00:00
Matthias Braun 9f15a79e5d Timer: Track name and description.
The previously used "names" are rather descriptions (they use multiple
words and contain spaces), use short programming language identifier
like strings for the "names" which should be used when exporting to
machine parseable formats.

Also removed a unused TimerGroup from Hexxagon.

Differential Revision: https://reviews.llvm.org/D25583

llvm-svn: 287369
2016-11-18 19:43:18 +00:00
Matt Arsenault afe614cb38 AMDGPU: Fix unused variable warning
llvm-svn: 287362
2016-11-18 18:33:36 +00:00
Ehsan Amiri 395be572f0 [PPC] limit line width to 80 characters
NFC. Forgot to fix this in the original commit.

llvm-svn: 287350
2016-11-18 16:24:27 +00:00
Simon Dardis 0e2ee3b4b9 [mips][msa] Implement f16 support
The MIPS MSA ASE provides instructions to convert to and from half precision
floating point. This patch teaches the MIPS backend to treat f16 as a legal
type and how to promote such values to f32 for the usual set of operations.

As a result of this, the fexup[lr].w intrinsics no longer crash LLVM during
type legalization.

Reviewers: zoran.jovanvoic, vkalintiris

Differential Revision: https://reviews.llvm.org/D26398

llvm-svn: 287349
2016-11-18 16:17:44 +00:00
Tom Stellard 01e65d2cfc AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts
Summary:
The 32-bit instructions don't zero the high 16-bits like the 16-bit
instructions do.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26828

llvm-svn: 287342
2016-11-18 13:53:34 +00:00
Simon Pilgrim 7938bd666e Cleanup function with clang-format. NFCI.
llvm-svn: 287340
2016-11-18 12:16:18 +00:00
Nicolai Haehnle ce2b589df5 AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

llvm-svn: 287339
2016-11-18 11:55:52 +00:00
Simon Pilgrim dcd8433597 Fix spelling mistakes in MIPS target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287338
2016-11-18 11:53:36 +00:00
Ehsan Amiri ff0942e6ea [Power9] Add patterns for vnegd, vnegw
Exploit new instructions by adding patterns to .td file.
https://reviews.llvm.org/D26551

llvm-svn: 287334
2016-11-18 11:05:55 +00:00
Simon Pilgrim e995a8088d Fix spelling mistakes in AMDGPU target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287333
2016-11-18 11:04:02 +00:00
Simon Pilgrim fd8bf984f4 Fix typo in comment. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287331
2016-11-18 10:52:12 +00:00
Ehsan Amiri 85818684c6 [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extended
When we see a SETCC whose only users are zero extend operations, we can replace
it with a subtraction. This results in doing all calculations in GPRs and
avoids CR use.

Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are
ways that this can be extended. For example for signed condition codes. In that
case we will be introducing additional sign extend instructions, so more careful
profitability analysis may be required.

Another direction to extend this is for equal, not equal conditions. Also when
users of SETCC are any_ext or sign_ext, we might be able to do something 
similar.

llvm-svn: 287329
2016-11-18 10:41:44 +00:00
Craig Topper 02b5a1b50f [AVX-512] Replace masked 16-bit element variable shift intrinsics with new unmasked versions and selects.
The same thing was done to 32-bit and 64-bit element sizes previously.

This will allow us to support these shuffls in InstCombineCalls along with the other variable shift intrinsics.

llvm-svn: 287312
2016-11-18 05:04:44 +00:00
Matt Arsenault eff1ad8d8e AMDGPU: Move redundant setting of inst properties
llvm-svn: 287311
2016-11-18 04:42:59 +00:00
Matt Arsenault 742deb2495 AMDGPU: Fix crash on illegal type for inlineasm
There are still crashes on non-MVT types in other
places.

llvm-svn: 287310
2016-11-18 04:42:57 +00:00
Alexei Starovoitov 8f9f8210c1 convert bpf assembler to look like kernel verifier output
since bpf instruction set was introduced people learned to
read and understand kernel verifier output whereas llvm asm
output stayed obscure and unknown. Convert llvm to emit
assembler text similar to kernel to avoid this discrepancy

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 287300
2016-11-18 02:32:35 +00:00
Craig Topper 07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim 6ba672e542 Fix spelling mistakes in Hexagon target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287248
2016-11-17 19:21:20 +00:00
Simon Pilgrim 9d15fb3c10 Fix spelling mistakes in X86 target comments. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287247
2016-11-17 19:03:05 +00:00
Konstantin Zhuravlyov 0a1a7b6b23 Revert "AMDGPU: Enable ConstrainCopy DAG mutation"
This reverts commit r287146.

This breaks few conformance tests.

llvm-svn: 287233
2016-11-17 16:41:49 +00:00
Simon Pilgrim 67ef3b984a Wdocumentation fix
llvm-svn: 287224
2016-11-17 12:21:45 +00:00
Simon Pilgrim 8eca5520dc [X86][SSE] Improve lowering of vXi64 multiply with known zero 32-bit halves
vXi64 multiplication is lowered into 3 calls of vpmuludq with the upper/lower 32-bit halves.

If any of these halves are zero then we can remove individual calls. Although there was isBuildVectorAllZeros code to do this I don't think it ever worked (maybe just for constant folded cases that don't seem to be tested for any longer).

This requires additional X86ISD support for computeKnownBitsForTargetNode, so far I've just added support for X86ISD::VZEXT (VPMOVZX* - helping the AVX2+ cases).

Partial fix for PR30845

Differential Revision: https://reviews.llvm.org/D26590

llvm-svn: 287223
2016-11-17 12:14:49 +00:00
Pablo Barrio c41e856f53 [ARM] Relax restriction on variadic functions for tailcall optimization
Summary:
Variadic functions can be treated in the same way as normal functions
with respect to the number and types of parameters.

Reviewers: grosbach, olista01, t.p.northover, rengolin

Subscribers: javed.absar, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26748

llvm-svn: 287219
2016-11-17 10:56:58 +00:00
Oren Ben Simhon 489d6eff4f [X86] RegCall - Handling v64i1 in 32/64 bit target
Register Calling Convention defines a new behavior for v64i1 types.
This type should be saved in GPR.
However for 32 bit machine we need to split the value into 2 GPRs (because each is 32 bit).

Differential Revision: https://reviews.llvm.org/D26181

llvm-svn: 287217
2016-11-17 09:59:40 +00:00