Commit Graph

34469 Commits

Author SHA1 Message Date
Matt Arsenault b5541fb098 AMDGPU: Remove unused multiclass argument
llvm-svn: 247161
2015-09-09 17:03:18 +00:00
Dan Gohman f71abef701 [WebAssembly] Implement calls with void return types.
llvm-svn: 247158
2015-09-09 16:13:47 +00:00
Tom Stellard 9a197676b1 AMDGPU/SI: Fold operands through REG_SEQUENCE instructions
Summary:
This helps mostly when we use add instructions for address calculations
that contain immediates.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12256

llvm-svn: 247157
2015-09-09 15:43:26 +00:00
Silviu Baranga a3e27edb5d [CostModel][AArch64] Remove amortization factor for some of the vector select instructions
Summary:
We are not scalarizing the wide selects in codegen for i16 and i32 and
therefore we can remove the amortization factor. We still have issues
with i64 vectors in codegen though.

Reviewers: mcrosier

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12724

llvm-svn: 247156
2015-09-09 15:35:02 +00:00
Dan Gohman 1ce7ba5fe0 [WebAssembly] Tidy up some unneeded newline characters.
llvm-svn: 247152
2015-09-09 15:13:36 +00:00
Igor Breger ac29a82921 AVX512: Implemented encoding and intrinsics for
vextracti64x4 ,vextracti64x2, vextracti32x8, vextracti32x4, vextractf64x4, vextractf64x2, vextractf32x8, vextractf32x4
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11802

llvm-svn: 247149
2015-09-09 14:35:09 +00:00
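A hedged usage example (not taken from the patch itself): the C/C++ intrinsic
corresponding to one of the newly encoded instructions, vextracti64x4, which
extracts a 256-bit half of a 512-bit integer vector.

#include <immintrin.h>

// Returns the upper 256 bits of a 512-bit vector; the immediate selects
// which half (0 = low, 1 = high). Requires AVX-512F to be enabled.
__m256i upper_half(__m512i v) {
  return _mm512_extracti64x4_epi64(v, 1);
}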
Zoran Jovanovic 6b28f09d67 [mips][microMIPS] Implement ADDU16, AND16, ANDI16, NOT16, OR16, SLL16 and SRL16 instructions
Differential Revision: http://reviews.llvm.org/D11178

llvm-svn: 247146
2015-09-09 13:55:45 +00:00
Zoran Jovanovic d9790793d6 [mips][microMIPS] Implement CACHEE and PREFE instructions
Differential Revision: http://reviews.llvm.org/D11628

llvm-svn: 247125
2015-09-09 09:10:46 +00:00
Matt Arsenault d768737454 AMDGPU: Fix not encoding src2 of VOP3b instructions
Broken by r247074. Should include an assembler test,
but the assembler is currently broken for VOP3b apparently.

llvm-svn: 247123
2015-09-09 08:39:49 +00:00
Dan Gohman e590b33bf8 [WebAssembly] Fix lowering of calls with more than one argument.
llvm-svn: 247118
2015-09-09 01:52:45 +00:00
Matt Arsenault acd68b58ae SelectionDAG: Support Expand of f16 extloads
Currently this hits an assert that extload should
always be supported, which assumes integer extloads.

This moves a hack out of SI's argument lowering and
is covered by existing tests.

llvm-svn: 247113
2015-09-09 01:12:27 +00:00
Dan Gohman 4f52e00ecb [WebAssembly] Implement WebAssemblyInstrInfo::copyPhysReg
llvm-svn: 247110
2015-09-09 00:52:47 +00:00
Reid Kleckner df1295173f [WinEH] Emit prologues and epilogues for funclets
Summary:
32-bit funclets have short prologues that allocate enough stack for the
largest call in the whole function. The runtime saves CSRs for the
funclet. It doesn't restore CSRs after we finally transfer control back
to the parent function via a CATCHRET, but that's a separate issue.
32-bit funclets also have to adjust the incoming EBP value, which is
what llvm.x86.seh.recoverframe does in the old model.

64-bit funclets need to spill CSRs as normal. For simplicity, this just
spills the same set of CSRs as the parent function, rather than trying
to compute different CSR sets for the parent function and each funclet.
64-bit funclets also allocate enough stack space for the largest
outgoing call frame, like 32-bit.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12546

llvm-svn: 247092
2015-09-08 22:44:41 +00:00
Eric Christopher 71f6e2f568 Fix the PPC CTR Loop pass to look for calls to the intrinsics that
read CTR and count them as reading the CTR.

llvm-svn: 247083
2015-09-08 22:14:58 +00:00
Matt Arsenault 86d336e91b AMDGPU/SI: Fix input vcc operand for VOP2b instructions
Adds vcc to output string input for e32. Allows option
of using e64 encoding with assembler.

Also fixes these instructions not implicitly reading exec.

llvm-svn: 247074
2015-09-08 21:15:00 +00:00
Artem Belevich 0127d80986 [NVPTX] Added run NVVMReflect pass to NVPTX back-end.
The pass is needed to remove __nvvm_reflect calls when we link in
libdevice bitcode that comes with CUDA.

Differential Revision: http://reviews.llvm.org/D11663

llvm-svn: 247072
2015-09-08 21:04:55 +00:00
Derek Schuff eef533f422 x32. Fixes a bug in how struct va_list is initialized in x32
Summary: This patch modifies X86TargetLowering::LowerVASTART so that
struct va_list is initialized with 32 bit pointers in x32. It also
includes tests that call @llvm.va_start() for x32.

Patch by João Porto

Subscribers: llvm-commits, hjl.tools
Differential Revision: http://reviews.llvm.org/D12346

llvm-svn: 247069
2015-09-08 20:51:31 +00:00
Dan Gohman e32c57443f [WebAssembly] Support running without a register allocator in the default CodeGen passes
This allows backends which don't use a traditional register allocator,
but do need PHI lowering and other passes, to use the default
TargetPassConfig::addFastRegAlloc and
TargetPassConfig::addOptimizedRegAlloc implementations.

Differential Revision: http://reviews.llvm.org/D12691

llvm-svn: 247065
2015-09-08 20:36:33 +00:00
Matt Arsenault 8ac35cd031 AMDGPU: Mark s_barrier as a high latency instruction
These were marked as WriteSALU, which is low latency.
I'm guessing at the value to use, but it should probably
be considered the highest latency instruction.

I'm not sure this has any actual effect since hasSideEffects
probably is preventing any moving of these.

llvm-svn: 247060
2015-09-08 19:54:32 +00:00
Matt Arsenault 8fb810a1d2 AMDGPU: Fix s_barrier flags
This should be convergent. This is not a
barrier in the isBarrier sense, nor
hasCtrlDep.

llvm-svn: 247059
2015-09-08 19:54:25 +00:00
Derek Schuff ee4e947e23 x32. Fixes a bug in i8mem_NOREX declaration.
The old implementation assumed LP64 which is broken for x32.  Specifically, the
MOVE8rm_NOREX and MOVE8mr_NOREX, when selected, would cause a 'Cannot emit
physreg copy instruction' error message to be reported.

This patch also enables the h-register*ll tests for x32.

Differential Revision: http://reviews.llvm.org/D12336

Patch by João Porto

llvm-svn: 247058
2015-09-08 19:47:15 +00:00
Matt Arsenault 966a94f861 AMDGPU: Handle sub of constant for DS offset folding
sub C, x -> add (sub 0, x), C for DS offsets.

This is mostly to fix regressions that show up when
SeparateConstOffsetFromGEP is enabled.

llvm-svn: 247054
2015-09-08 19:34:22 +00:00
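A hedged illustration of the rewrite described in the entry above: in wrapping
32-bit arithmetic, C - x equals (0 - x) + C, and the latter form exposes the
constant on an add where it can be folded into the DS instruction's offset.

#include <cassert>
#include <cstdint>

int main() {
  uint32_t C = 64, x = 0xDEADBEEF;
  assert(C - x == (0u - x) + C); // sub C, x  ==  add (sub 0, x), C
}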
David Blaikie 12dd3c4ebb Fix CPP Backend for GEP API changes for opaque pointer types
Based on a patch by Jerome Witmann.

llvm-svn: 247047
2015-09-08 18:42:29 +00:00
Andrew Kaylor e2ea93c6c0 Fix for bz24500: Avoid non-deterministic code generation triggered by the x86 call frame optimization
Patch by Dave Kreitzer

Differential Revision: http://reviews.llvm.org/D12620

llvm-svn: 247042
2015-09-08 18:18:46 +00:00
JF Bastien 749ed88aa5 WebAssembly: NFC rename shr/sar
Renamed from: https://github.com/WebAssembly/design/pull/332

llvm-svn: 247028
2015-09-08 17:21:21 +00:00
Jun Bum Lim 4d3c5986f2 Remove white space (test commit)
llvm-svn: 247021
2015-09-08 16:11:22 +00:00
Zoran Jovanovic 2da1437d62 [mips][microMIPS] Implement LLE, LUI, LW and LWE instructions
Differential Revision: http://reviews.llvm.org/D1179

llvm-svn: 247017
2015-09-08 15:02:50 +00:00
Igor Breger a54a1a84dd AVX512: kunpck encoding implementation
Added tests for encoding.

Differential Revision: http://reviews.llvm.org/D12061

llvm-svn: 247010
2015-09-08 13:10:00 +00:00
Dan Gohman 25d2a0dda4 [WebAssembly] Enable SSA lowering and other pre-regalloc passes
llvm-svn: 247008
2015-09-08 12:39:25 +00:00
Elena Demikhovsky ddf715ef77 Removed an old comment, NFC
llvm-svn: 247006
2015-09-08 12:22:22 +00:00
Zoran Jovanovic 9eaa30d2bf [mips][microMIPS] Implement SB, SBE, SCE, SH and SHE instructions
Differential Revision: http://reviews.llvm.org/D11801

llvm-svn: 246999
2015-09-08 10:18:38 +00:00
Daniel Sanders 808dfb8ba7 [mips] Reserve address spaces 1-255 for software use.
Summary: And define them to have noop casts with address spaces 0-255.

Reviewers: pekka.jaaskelainen

Subscribers: pekka.jaaskelainen, llvm-commits

Differential Revision: http://reviews.llvm.org/D12678

llvm-svn: 246990
2015-09-08 09:07:03 +00:00
Zoran Jovanovic 68be5f21a9 [mips][microMIPS] Add microMIPS32r6 and microMIPS64r6 tests for existing 16-bit LBU16, LHU16, LW16, LWGP and LWSP instructions
Differential Revision: http://reviews.llvm.org/D10956

llvm-svn: 246987
2015-09-08 08:25:34 +00:00
Elena Demikhovsky dec0f0885f compilation issue, NFC
llvm-svn: 246983
2015-09-08 07:34:06 +00:00
Elena Demikhovsky d240d778b3 fixed compilation issue, NFC.
llvm-svn: 246982
2015-09-08 07:10:08 +00:00
Elena Demikhovsky e88038f235 AVX-512: Lowering for 512-bit vector shuffles.
Vector types: <8 x 64>, <16 x 32>, <32 x 16> float and integer.

Differential Revision: http://reviews.llvm.org/D10683

llvm-svn: 246981
2015-09-08 06:38:21 +00:00
Zoran Jovanovic 7b85682541 [mips][microMIPS] Implement ABS.fmt, CEIL.L.fmt, CEIL.W.fmt, FLOOR.L.fmt, FLOOR.W.fmt, TRUNC.L.fmt, TRUNC.W.fmt, RSQRT.fmt and SQRT.fmt instructions
Differential Revision: http://reviews.llvm.org/D11674

llvm-svn: 246968
2015-09-07 13:01:04 +00:00
Zoran Jovanovic ada7091812 [mips][microMIPS] Implement BC16, BEQZC16 and BNEZC16 instructions
Differential Revision: http://reviews.llvm.org/D11181

llvm-svn: 246963
2015-09-07 11:56:37 +00:00
John Brawn d8b405abf7 [ARM] Get rid of SelectT2ShifterOperandReg, NFC
SelectT2ShifterOperandReg has identical behaviour to SelectImmShifterOperand,
so get rid of it and use SelectImmShifterOperand instead.

Differential Revision: http://reviews.llvm.org/D12195

llvm-svn: 246962
2015-09-07 11:45:18 +00:00
Zoran Jovanovic 14f308e44f [mips][microMIPS] Implement CVT.D.fmt, CVT.L.fmt, CVT.S.fmt, CVT.W.fmt, MAX.fmt, MIN.fmt, MAXA.fmt, MINA.fmt and CMP.condn.fmt instructions
Differential Revision: http://reviews.llvm.org/D12141

llvm-svn: 246960
2015-09-07 10:31:31 +00:00
NAKAMURA Takumi 0d72539d5a Prune utf8 chars in comments.
llvm-svn: 246953
2015-09-07 00:26:54 +00:00
Hal Finkel ccf9259c00 [PowerPC] Don't commute trivial rlwimi instructions
To commute a trivial rlwimi instruction (meaning one with a full mask and zero
shift), we'd need the ability to form an all-zero mask (instead of an all-one
mask) using rlwimi. We can't represent this, however, and we'll miscompile code
if we try.

The code quality problem that this highlights (that SDAG simplification can
lead to us generating an ISD::OR node with a constant zero LHS) will be fixed
as a follow-up.

Fixes PR24719.

llvm-svn: 246937
2015-09-06 04:17:30 +00:00
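A hedged C++ model of the rlwimi semantics behind the entry above (not LLVM
code): result = (rotl32(rs, sh) & mask) | (ra & ~mask). With a full mask and
zero shift the instruction simply copies rs, and reproducing that with the
operands swapped would require the all-zero mask, which the MB/ME range
encoding cannot express.

#include <cassert>
#include <cstdint>

static uint32_t rotl32(uint32_t v, unsigned sh) {
  return sh ? (v << sh) | (v >> (32 - sh)) : v;
}

static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, uint32_t mask) {
  return (rotl32(rs, sh) & mask) | (ra & ~mask);
}

int main() {
  uint32_t ra = 0x12345678, rs = 0xDEADBEEF;
  // Trivial rlwimi: full mask, zero shift -- the result is simply rs.
  assert(rlwimi(ra, rs, 0, 0xFFFFFFFFu) == rs);
  // Swapping the register operands only gives the same result with mask == 0,
  // an all-zero mask, which rlwimi's MB/ME encoding cannot represent.
  assert(rlwimi(rs, ra, 0, 0x00000000u) == rs);
}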
Zoran Jovanovic 89ca2b982e [mips][microMIPS] Implement ADD.fmt, SUB.fmt, MOV.fmt, MUL.fmt, DIV.fmt, MADDF.fmt, MSUBF.fmt and NEG.fmt instructions
Differential Revision: http://reviews.llvm.org/D11978

llvm-svn: 246919
2015-09-05 09:25:30 +00:00
Hal Finkel b1518d6c24 [PowerPC] Fix and(or(x, c1), c2) -> rlwimi generation
PPCISelDAGToDAG has a transformation that generates a rlwimi instruction from
an input pattern that looks like this:

  and(or(x, c1), c2)

but the associated logic does not work if there are bits that are 1 in c1 but 0
in c2 (these are normally canonicalized away, but that can't happen if the 'or'
has other users). Make sure we abort the transformation if such bits are
discovered.

Fixes PR24704.

llvm-svn: 246900
2015-09-05 00:02:59 +00:00
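A hedged sketch of the guard described above (hypothetical helper name, not
the actual PPCISelDAGToDAG code): the transform is only safe when every bit
set in c1 is also set in c2; bits that are 1 in c1 but 0 in c2 are normally
cleared by canonicalization (c1 &= c2), but that cannot happen when the 'or'
has other users, so those bits must be detected and the transform aborted.

#include <cstdint>

// Returns false when and(or(x, c1), c2) must not be turned into rlwimi.
bool safeToFormRLWIMI(uint32_t c1, uint32_t c2) {
  return (c1 & ~c2) == 0; // no bits that are 1 in c1 but 0 in c2
}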
Hal Finkel 4a7be23976 [PowerPC] Enable interleaved-access vectorization
This adds a basic cost model for interleaved-access vectorization (and a better
default for shuffles), and enables interleaved-access vectorization by default.
The relevant difference from the default cost model for interleaved-access
vectorization, is that on PPC, the shuffles that end up being used are *much*
cheaper than modeling the process with insert/extract pairs (which are
quite expensive, especially on older cores).

llvm-svn: 246824
2015-09-04 00:10:41 +00:00
Hal Finkel 75afa2b6b6 [PowerPC] Always use aggressive interleaving on the A2
On the A2, with an eye toward QPX unaligned-load merging, we should always use
aggressive interleaving. It is generally superior to only using concatenation
unrolling.

llvm-svn: 246819
2015-09-03 23:23:00 +00:00
Hal Finkel e6702ca0e2 [PowerPC] Try harder to find a base+offset when looking for consecutive accesses
When forming permutation-based unaligned vector loads, we need to know whether
it is valid to read ahead of the requested address by a full vector length.
Doing so is more efficient (and allows for more CSE with later loads), but
could trigger a page fault if invalid. To determine validity, we look for other
loads in the same block that access the relevant address range.

The relevant point here is that we need to do this as part of the process of
forming permutation-based vector loads, and this happens quite early in the
SDAG pipeline - specifically before many of the address calculations are fully
canonicalized. As a result, we need to try harder to recognize base+offset
address computations, because they still might appear as chain of adds
(base+offset+offset, for example). To account for this, we'll look through
chains of adds, accumulating the constant offsets.

llvm-svn: 246813
2015-09-03 22:37:44 +00:00
Hal Finkel f11bc761d8 [PowerPC] Include the permutation cost for unaligned vector loads
Pre-P8, when we generate code for unaligned vector loads (for Altivec and QPX
types), even when accounting for the combining that takes place for multiple
consecutive such loads, there is at least one load instructions and one
permutation for each load. Make sure the cost reported reflects the cost of the
permutes as well.

llvm-svn: 246807
2015-09-03 21:23:18 +00:00
Hal Finkel 99d95328d6 [PowerPC] Compute the MMO offset for an unaligned load with signed arithmetic
If you compute the MMO offset using unsigned arithmetic, you end up with a
large positive offset instead of a small negative one. In theory, this could
cause bad instruction-scheduling decisions later.

I noticed this by inspection from the debug output, and using that for the
regression test is the best I can do right now.

llvm-svn: 246805
2015-09-03 21:12:15 +00:00
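A hedged illustration of the arithmetic issue described above: computed in an
unsigned type, a small negative offset between two addresses becomes a huge
positive value, while signed arithmetic gives the intended result.

#include <cstdint>
#include <cstdio>

int main() {
  uint64_t base = 0x1000, addr = 0x0FF0;          // 16 bytes before the base
  uint64_t bad  = addr - base;                    // 0xfffffffffffffff0
  int64_t  good = (int64_t)addr - (int64_t)base;  // -16
  printf("unsigned: 0x%llx  signed: %lld\n",
         (unsigned long long)bad, (long long)good);
}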
Chad Rosier 6c36eff1d6 [AArch64] Improve ISel using across lane addition reduction.
In vectorized add reduction code, the final "reduce" step is sub-optimal.
This change will combine:

ext  v1.16b, v0.16b, v0.16b, #8
add  v0.4s, v1.4s, v0.4s
dup  v1.4s, v0.s[1]
add  v0.4s, v1.4s, v0.4s

into

addv s0, v0.4s

PR21371
http://reviews.llvm.org/D12325
Patch by Jun Bum Lim <junbuml@codeaurora.org>!

llvm-svn: 246790
2015-09-03 18:13:57 +00:00
Reid Kleckner 1f13d4789f Sink COFF.h MC include into .cpp files
This prevents MC clients from getting COFF.h, which conflicts with
winnt.h macros. Also a minor IWYU cleanup. Now the only public headers
including COFF.h are in Object, and they actually need it.

llvm-svn: 246784
2015-09-03 16:41:50 +00:00
Chad Rosier 08ef462d15 Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r246769.

This appears to have broken Multisource/Benchmarks/tramp3d-v4.

llvm-svn: 246782
2015-09-03 16:41:28 +00:00
Sanjay Patel c9ae9d72f8 [x86] enable machine combiner reassociations for scalar 'xor' insts
llvm-svn: 246781
2015-09-03 16:36:16 +00:00
Sanjay Patel ce74db9d8d check for fastness before merging in DAGCombiner::MergeConsecutiveStores()
Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time
we have a merged access candidate. Without this patch, we were generating unaligned 
16-byte (SSE) memops for x86 targets where those accesses are slow.

This change was mentioned in:
http://reviews.llvm.org/D10662 and
http://reviews.llvm.org/D10905

and will help solve PR21711.

Differential Revision: http://reviews.llvm.org/D12573

llvm-svn: 246771
2015-09-03 15:03:19 +00:00
Chad Rosier 491a1bd998 [AArch64] Improve load/store optimizer to handle LDUR + LDR.
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

llvm-svn: 246769
2015-09-03 14:41:37 +00:00
Chad Rosier 5f668e170a [AArch64] Reuse MayLoad. NFC.
llvm-svn: 246767
2015-09-03 14:19:43 +00:00
Daniel Sanders 3ebcaf6685 [mips] Added support for the div, divu, ddiv and ddivu macros which use traps and breaks in the integrated assembler.
Summary:

Patch by Scott Egerton

Reviewers: vkalintiris, dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11675

llvm-svn: 246763
2015-09-03 12:31:22 +00:00
Igor Breger 0dcd8bcf24 AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, vpconflictq, vpconflictd
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11931

llvm-svn: 246750
2015-09-03 09:05:31 +00:00
Ahmed Bougacha b03ea02479 [X86] Require 32-byte alignment for 32-byte VMOVNTs.
We used to accept (and even test, and generate) 16-byte alignment
for 32-byte nontemporal stores, but they require 32-byte alignment,
per SDM. Found by inspection.

Instead of hardcoding 16 in the patfrag, check for natural alignment.
Also fix the autoupgrade and the various tests.

Also, use explicit -mattr instead of -mcpu: I stared at the output
several minutes wondering why I get 2x movntps for the unaligned
case (which is the ideal output, but needs some work: see FIXME),
until I remembered corei7-avx implies +slow-unaligned-mem-32.

llvm-svn: 246733
2015-09-02 23:25:39 +00:00
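A hedged example of the requirement the entry above enforces: a 32-byte
nontemporal store needs a 32-byte-aligned destination, so buffers fed to it
should be overaligned accordingly (compile with AVX enabled).

#include <immintrin.h>

alignas(32) float buffer[8];

void stream_fill(float *dst /* must be 32-byte aligned */, float v) {
  _mm256_stream_ps(dst, _mm256_set1_ps(v)); // emitted as vmovntps
}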
Ahmed Bougacha bc72ad7b27 [X86] Cleanup nontemporal fragments. NFCI.
We can chain other fragments to avoid repeating conditions.
This also fixes a potential bug (that realistically can't happen),
where we would match indexed nontemporal stores for i32/i64.

llvm-svn: 246719
2015-09-02 22:27:38 +00:00
Hal Finkel 79dbf5b562 [PowerPC] Cleanup cost model for unaligned vector loads/stores
I'm adding a regression test to better cover code generation for unaligned
vector loads and stores, but there's no functional change to the code
generation here. There is an improvement to the cost model for unaligned vector
loads and stores, mostly for QPX (for which we were not previously accounting
for the permutation-based loads), and the cost model implementation is cleaner.

llvm-svn: 246712
2015-09-02 21:03:28 +00:00
Ahmed Bougacha 63fae0e58b [AArch64] More consistently separate asm opc and operands with '\t'.
Somehow missed these in r246686.

llvm-svn: 246687
2015-09-02 18:52:54 +00:00
Ahmed Bougacha cca07716f5 [AArch64] Consistently separate asm opc and operands with '\t'.
Some of the instructions use ' ', which drives OCD-me nuts.
Let's put an end to this.

NFC-ish: hopefully nobody cares about whitespace.
llvm-svn: 246686
2015-09-02 18:38:36 +00:00
Hal Finkel 77c8b7ffd3 [PowerPC] Don't always consider P8Altivec-only masks in LowerVECTOR_SHUFFLE
LowerVECTOR_SHUFFLE needs to decide whether to pass a vector shuffle off to the
TableGen-generated matching code, and it does this by testing the same
predicates used by the TableGen files. Unfortunately, when we added new
P8Altivec-only predicates, we started universally testing them in
LowerVECTOR_SHUFFLE, and if then matched when targeting a system prior to a P8,
we'd end up with a selection failure.

llvm-svn: 246675
2015-09-02 16:52:37 +00:00
Sanjay Patel fbcd189f8a [x86] fix allowsMisalignedMemoryAccesses() for 8-byte and smaller accesses
This is a continuation of the fix from:
http://reviews.llvm.org/D10662

and discussion in:
http://reviews.llvm.org/D12154

Here, we distinguish slow unaligned SSE (128-bit) accesses from slow unaligned
scalar (64-bit and under) accesses. Other lowering (eg, getOptimalMemOpType) 
assumes that unaligned scalar accesses are always ok, so this changes 
allowsMisalignedMemoryAccesses() to match that behavior.

Differential Revision: http://reviews.llvm.org/D12543

llvm-svn: 246658
2015-09-02 15:42:49 +00:00
Asaf Badouh d2c3599c5f [X86][AVX512VLBW] add support in byte shift and SAD
add byte shift left/right
add SAD - compute sum of absolute differences

Differential Revision: http://reviews.llvm.org/D12479

llvm-svn: 246654
2015-09-02 14:21:54 +00:00
Igor Breger 1e58e8adf6 AVX512: Implemented encoding and intrinsics for VGETMANTPD/S , VGETMANTSD/S instructions
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11593

llvm-svn: 246642
2015-09-02 11:18:55 +00:00
Igor Breger a6297c701e AVX512: Implemented encoding and intrinsics for vshufps/d.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D11709

llvm-svn: 246640
2015-09-02 10:50:58 +00:00
Elena Demikhovsky 9f83c7346f AVX-512: store <4 x i1> and <2 x i1> values in memory
Enabled DAG pattern lowering for SKX with DQI predicate.

Differential Revision: http://reviews.llvm.org/D12550

llvm-svn: 246625
2015-09-02 09:20:58 +00:00
Vedant Kumar b5c2fd7257 [CodeGen] Fix FREM on 32-bit MSVC on x86
Patch by Dylan McKay!

Differential Revision: http://reviews.llvm.org/D12099

llvm-svn: 246615
2015-09-02 01:31:58 +00:00
Ahmed Bougacha 699a9dd7c3 [ARM] Don't abort on variable-idx extractelt in ReconstructShuffle.
The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only
takes constant indices, but it does accept variables.
Bail out for those: we can't use them, as the shuffles we want to
reconstruct do require constant masks.

llvm-svn: 246594
2015-09-01 21:56:00 +00:00
Sanjay Patel 30145677a8 rename "slow-unaligned-mem-under-32" to "slow-unaligned-mem-16" (NFCI)
This is a follow-on suggested by:
http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 )
http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 )

This makes the attribute name match most of the existing lowering logic
and regression test expectations.

But the current use of this attribute is inconsistent; see the FIXME
comment for "allowsMisalignedMemoryAccesses()". That change will
result in functional changes and should be coming soon.

llvm-svn: 246585
2015-09-01 20:51:51 +00:00
Ahmed Bougacha b0ff6437cb [AArch64] Lower READCYCLECOUNTER using MRS PMCCTNR_EL0.
This matches the ARM behavior. In both cases, the register is part
of the optional Performance Monitors extension, so, add the feature,
and enable it for the A-class processors we support.

Differential Revision: http://reviews.llvm.org/D12425

llvm-svn: 246555
2015-09-01 16:23:45 +00:00
Igor Breger f6f1bb6ddc AVX512: Implemented intrinsics for valign.
Differential Revision: http://reviews.llvm.org/D12526

llvm-svn: 246551
2015-09-01 15:27:18 +00:00
Silviu Baranga 755ec0e027 [AArch64] Turn on by default interleaved access vectorization
Summary:
This change turns on by default interleaved access vectorization
for AArch64.

We also clean up some tests which were specifically enabling this
behaviour.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12149

llvm-svn: 246542
2015-09-01 11:26:46 +00:00
Silviu Baranga e748c9ef55 [ARM] Turn on by default interleaved access vectorization
Summary:
This change turns on by default interleaved access vectorization on ARM,
as it has shown to be beneficial on ARM.

Reviewers: rengolin

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12146

llvm-svn: 246541
2015-09-01 11:19:15 +00:00
Matt Arsenault 51d2d0f668 AMDGPU: Fix adding redundant implicit operands
These are already added during the MachineInstr construction,
so this was adding the implicit registers twice.

llvm-svn: 246525
2015-09-01 02:02:21 +00:00
JF Bastien 73ff6afa87 WebAssembly: generate load/store
Summary: This handles all load/store operations that WebAssembly defines, and handles those necessary for C++ such as i1. I left a FIXME for outstanding features which aren't required for now.

Reviewers: sunfish

Subscribers: jfb, llvm-commits, dschuff
llvm-svn: 246500
2015-08-31 22:24:11 +00:00
Sanjay Patel d9a5c225d1 [x86] enable machine combiner reassociations for scalar 'or' insts
llvm-svn: 246481
2015-08-31 20:27:03 +00:00
Reid Kleckner e00faf8ce1 [EH] Handle non-Function personalities like unknown personalities
Also delete and simplify a lot of MachineModuleInfo code that used to be
needed to handle personalities on landingpads.  Now that the personality
is on the LLVM Function, we no longer need to track it this way on MMI.
Certainly it should not live on LandingPadInfo.

llvm-svn: 246478
2015-08-31 20:02:16 +00:00
Quentin Colombet a80b9c824e [AArch64][CollectLOH] Remove an invalid assertion and add a test case exposing it.
rdar://problem/22491525

llvm-svn: 246472
2015-08-31 19:02:00 +00:00
Matthias Braun 0acbd08f3c AArch64: Fix loads to lower NEON vector lanes using GPR registers
The ISelLowering code turned the insertion of the element for the lowest
lane of a BUILD_VECTOR into an INSERT_SUBREG; this prohibited the patterns
for SCALAR_TO_VECTOR(Load) from matching later. Restrict this
to cases without a load argument.

Reported in rdar://22223823

Differential Revision: http://reviews.llvm.org/D12467

llvm-svn: 246462
2015-08-31 18:25:15 +00:00
Matthias Braun 818c78d0cc X86: Fix FastISel SSESelect register class
X86FastISel has been using the wrong register class for VBLENDVPS which
produces a VR128 and needs an extra copy to the target register. The
problem was already hit by the existing test cases when using
> llvm-lit -Dllc="llc -verify-machineinstrs"

llvm-svn: 246461
2015-08-31 18:25:11 +00:00
Igor Breger 5ea0a68115 AVX512: ktest implementation
Added tests for encoding.

Differential Revision: http://reviews.llvm.org/D11979

llvm-svn: 246439
2015-08-31 13:30:19 +00:00
Igor Breger f3ded811b2 AVX512: Implemented encoding and intrinsics for vdbpsadbw
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12491

llvm-svn: 246436
2015-08-31 13:09:30 +00:00
Igor Breger 59ac339357 AVX512: kadd implementation
Added tests for encoding.

Differential Revision: http://reviews.llvm.org/D11973

llvm-svn: 246432
2015-08-31 11:50:23 +00:00
Igor Breger 2ae0fe3ac3 AVX512: Implemented encoding and intrinsics for vpalignr
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12270

llvm-svn: 246428
2015-08-31 11:14:02 +00:00
Hal Finkel a2cdbce661 [PowerPC] Fixup SELECT_CC (and SETCC) patterns with i1 comparison operands
There were really two problems here. The first was that we had the truth tables
for signed i1 comparisons backward. I imagine these are not very common, but if
you have:
  setcc i1 x, y, LT
this has the '0 1' and the '1 0' results flipped compared to:
  setcc i1 x, y, ULT
because, in the signed case, '1 0' is really '-1 0', and the answer is not the
same as in the unsigned case.

The second problem was that we did not have patterns (at all) for the unsigned
comparisons select_cc nodes for i1 comparison operands. This was the specific
cause of PR24552. These had to be added (and a missing Altivec promotion added
as well) to make sure these function for all types. I've added a bunch more
test cases for these patterns, and there are a few FIXMEs in the test case
regarding code-quality.

Fixes PR24552.

llvm-svn: 246400
2015-08-30 22:12:50 +00:00
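A hedged worked example of the truth-table point above: an i1 bit pattern of
'1' is -1 when interpreted as signed, so the signed and unsigned comparisons
disagree.

#include <cassert>

int main() {
  unsigned x = 1, y = 0;          // the i1 bit patterns
  int sx = -(int)x, sy = -(int)y; // sign-extended i1: 1 -> -1, 0 -> 0
  assert(sx < sy);                // setcc i1 1, 0, LT  is true (-1 < 0)
  assert(!(x < y));               // setcc i1 1, 0, ULT is false (1 < 0)
}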
Hal Finkel 982e8d48f8 [MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags
Make the arrays 'static const' instead of just 'static'. Post-commit review
comment from Roman Divacky on IRC. NFC.

llvm-svn: 246376
2015-08-30 08:07:29 +00:00
Hal Finkel 2d55698ed7 [PowerPC/MIR Serialization] Target flags serialization support
Add support for MIR serialization of PowerPC-specific operand target flags
(based on the generic infrastructure added in r244185 and r245383).

I won't even pretend that this is good test coverage, but this includes the
regression test associated with r246372. Adding an MIR test for that fix is far
superior to adding an IR-level test because particular instruction-scheduling
decisions are necessary in order to expose the bug, and using an MIR test we
can start the pipeline post-scheduling.

llvm-svn: 246373
2015-08-30 07:50:35 +00:00
Hal Finkel d2fd9becf4 [PowerPC] Don't assume ADDISdtprelHA's source is r3
Even though ADDISdtprelHA generally has r3 as its source register, it is
possible for the instruction scheduler to move things around such that some
other register is the source. We need to print the actual source register, not
always r3. Fixes PR24394.

The test case will come in a follow-up commit because it depends on MIR
target-flags parsing.

llvm-svn: 246372
2015-08-30 07:44:05 +00:00
Chandler Carruth bb47b9a367 [Triple] Stop abusing a class to have only static methods and just use
the namespace that we are already using for the enums that are produced
by the parsing.

llvm-svn: 246367
2015-08-30 02:09:48 +00:00
James Molloy 45ee9898ec [ARM] Hoist fabs/fneg above a conversion to float.
This is especially visible in softfp mode, for example in the implementation of libm fabs/fneg functions. If we have:

%1 = vmovdrr r0, r1
%2 = fabs %1

then move the fabs before the vmovdrr:

%1 = and r1, #0x7FFFFFFF
%2 = vmovdrr r0, r1

This is never a lose, and could be a serious win because the vmovdrr may be followed by a vmovrrd, which would enable us to remove the conversion into FPRs completely.

We already do this for f32, but not for f64. Tests are added for both.

llvm-svn: 246360
2015-08-29 10:49:11 +00:00
Matt Arsenault e4d0c142e8 AMDGPU: Add sdst operand to VOP2b instructions
The VOP3 encoding of these allows any SGPR pair for the i1
output, but this was forced before to always use vcc.
This doesn't yet try to use this, but does add the operand
to the definitions so the main change is adding vcc to the
output of the VOP2 encoding.

llvm-svn: 246358
2015-08-29 07:16:50 +00:00
Matt Arsenault 9a32cd3d3b AMDGPU: Set mem operands for spill instructions
llvm-svn: 246357
2015-08-29 06:48:57 +00:00
Matt Arsenault 5c004a7c61 AMDGPU: Fix dropping mem operands when moving to VALU
Without a memory operand, mayLoad or mayStore instructions
are treated as hasUnorderedMemRef, which results in much worse
scheduling.

We really should have a verifier check that any
non-side effecting mayLoad or mayStore has a memory operand.
There are a few instructions (interp and images) for which I'm
not sure what / where to add these.

llvm-svn: 246356
2015-08-29 06:48:46 +00:00
Tom Stellard eea72ccbf2 AMDGPU/SI: Fix some invalid assumptions when folding 64-bit immediates
Summary:
We were assuming that if the use operand had a sub-register that
the immediate was 64-bits, but this was breaking the case of
folding a 64-bit immediate into another 64-bit instruction.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12255

llvm-svn: 246354
2015-08-29 01:58:21 +00:00
Tom Stellard b8ce14c4c3 AMDGPU/SI: Factor operand folding code into its own function
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12254

llvm-svn: 246353
2015-08-28 23:45:19 +00:00
Vedant Kumar 44fccb7b50 [X86] NFC: Clean up and clang-format a few lines
llvm-svn: 246340
2015-08-28 21:59:00 +00:00
Petar Jovanovic 28e2b717fc [mips] Remove incorrect DebugLoc entries from prologue
This has been causing the prologue_end to be incorrectly positioned.

Patch by Vladimir Radosavljevic.

Differential Revision: http://reviews.llvm.org/D11293

llvm-svn: 246309
2015-08-28 17:53:26 +00:00
Sanjay Patel 7c912898a5 [x86] enable machine combiner reassociations for scalar 'and' insts
llvm-svn: 246300
2015-08-28 14:09:48 +00:00
Ahmed Bougacha f9c19da03a [CodeGen] Support (and default to) expanding READCYCLECOUNTER to 0.
For targets that didn't support this, this will let us respect the
langref instead of failing to select.

Note that we don't need to change the 32-bit x86/PPC lowerings (to
account for the result type/# difference) because they're both
custom and bypass type legalization.

llvm-svn: 246258
2015-08-28 01:49:59 +00:00
Quentin Colombet fa4ecb4b9a [AArch64][CollectLOH] Fix a regression that prevented us to detect chains of
more than 2 instructions.

I introduced this regression a while back and did not noticed it because I
somehow forgot to push the initial test cases for the pass!

Fix that as well!

llvm-svn: 246239
2015-08-27 23:47:10 +00:00
Reid Kleckner 0e2882345d [WinEH] Add some support for code generating catchpad
We can now run 32-bit programs with empty catch bodies.  The next step
is to change PEI so that we get funclet prologues and epilogues.

llvm-svn: 246235
2015-08-27 23:27:47 +00:00
Hal Finkel 7ffe55ae9d [PowerPC] Remove unnecessary braces in PPCVSXFMAMutate
Address Eric's post-commit review of r245741. NFC.

llvm-svn: 246121
2015-08-26 23:41:53 +00:00
Bjarke Hammersholt Roune 6c64738e87 [NVPTX] Let NVPTX backend detect integer min and max patterns.
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.


Reviewers: jholewinski, meheff, jingyue

Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D12377

llvm-svn: 246107
2015-08-26 23:22:02 +00:00
Cong Hou b5ef475e5c [ARM] Use BranchProbability::scale() to scale an integer with a probability in ARMBaseInstrInfo.cpp,
Previously in isProfitableToIfCvt() in ARMBaseInstrInfo.cpp, the multiplication between an integer and a branch probability is done manually in an unsafe way that may lead to overflow. This patch corrects those cases by using BranchProbability's member function scale() to avoid overflow (which stores the intermediate result in int64).

Differential Revision: http://reviews.llvm.org/D12295

llvm-svn: 246106
2015-08-26 23:17:52 +00:00
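A hedged illustration of the overflow the entry above avoids: multiplying a
32-bit count by a probability numerator in 32-bit arithmetic wraps around,
whereas keeping the intermediate product in 64 bits (as the commit notes
scale() does) yields the expected result.

#include <cstdint>
#include <cstdio>

int main() {
  uint32_t cycles = 200000, num = 80000, den = 100000; // probability 0.8
  uint32_t wrapped = cycles * num / den;               // 32-bit product wraps
  uint64_t scaled  = (uint64_t)cycles * num / den;     // 160000, as intended
  printf("wrapped=%u scaled=%llu\n", wrapped, (unsigned long long)scaled);
}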
JF Bastien b1b61ebb21 WebAssembly: NFC comment update
llvm-svn: 246101
2015-08-26 23:03:07 +00:00
JF Bastien 45479f627a WebAssembly: handle private/internal globals.
Things of note:
 - Other linkage types aren't handled yet. We'll figure it out with dynamic linking.
 - Special LLVM globals are either ignored, or error out for now.
 - TLS isn't supported yet (WebAssembly will have threads later).
 - There currently isn't a syntax for alignment; I left it in a comment so it's easy to hook up.
 - Undef is converted to whatever the type's appropriate null value is.
 - assert versus report_fatal_error: follow what other AsmPrinters do, and assert only on what should have been caught elsewhere.

llvm-svn: 246092
2015-08-26 22:09:54 +00:00
Reid Kleckner c2b9254426 [ms-inline-asm] Relax assertion around funky identifiers slightly
A corresponding clang change will make it so that clang can consume part
of an assembler token. The assembler treats '.' as an identifier
character while clang does not, so it's view of the token stream is a
little different.

llvm-svn: 246089
2015-08-26 21:57:25 +00:00
Mehdi Amini 0ab4b5b52e Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246082
2015-08-26 21:16:29 +00:00
Matt Arsenault 8a067121f8 AMDGPU: Delete dead code
There is no context where s_mov_b64 is emitted
and could potentially be moved to the VALU.
It is currently only emitted for materializing
immediates, which can't be dependent on vector sources.

The immediate splitting is already done when selecting
constants. I'm not sure what contexts if any the register
splitting would have been used before.

Also clean up using s_mov_b64 in place of v_mov_b64_pseudo,
although this isn't required and just skips the extra step
of eliminating the copy from the SReg_64.

llvm-svn: 246080
2015-08-26 20:48:08 +00:00
Matt Arsenault 5e7f95e567 AMDGPU: Don't reprocess instructions when splitting i64 bcnt
llvm-svn: 246079
2015-08-26 20:48:04 +00:00
Matt Arsenault 445833cc91 AMDGPU: Fix not moving users of s_bfe_i64 to VALU
This wouldn't propagate to users of the original BFE
and would hit a verifier error.

llvm-svn: 246078
2015-08-26 20:47:58 +00:00
Matt Arsenault f003c38e1e AMDGPU: Don't create intermediate SALU instructions
When splitting 64-bit operations, create the correct
VALU instructions immediately.

This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.

llvm-svn: 246077
2015-08-26 20:47:50 +00:00
Andrew Kaylor af083d4cf9 Expose hasLiveCondCodeDef as a member function of the X86InstrInfo class. NFC
This takes the existing static function hasLiveCondCodeDef and makes it a member function of the X86InstrInfo class. This is a useful utility function that an upcoming change would like to use. NFC.

Patch by: Kevin B. Smith
Differential Revision: http://reviews.llvm.org/D12371

llvm-svn: 246073
2015-08-26 20:36:52 +00:00
Mehdi Amini 31ebf03c09 Revert "Fix LLVM C API for DataLayout"
This reverts commit r246052.
Third attempt, still unpleasant for some bots.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246057
2015-08-26 19:24:59 +00:00
Matt Arsenault 602a16d3db AMDGPU/SI: Report SIFixSGPRLiveRanges changed function
llvm-svn: 246056
2015-08-26 19:12:03 +00:00
Mehdi Amini 9d692b6805 Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246052
2015-08-26 18:56:01 +00:00
Matt Arsenault bd66061db7 AMDGPU: Make sure to reserve super registers
I think this could potentially have broken if
one of the super registers that contain v254/v255
were allocated.

llvm-svn: 246051
2015-08-26 18:54:50 +00:00
Mehdi Amini 8b3dda3f71 Revert "Fix LLVM C API for DataLayout"
This reverts commit r246044.
Build broken, still. It builds for me...

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246049
2015-08-26 18:37:59 +00:00
Matt Arsenault 19c5488015 AMDGPU: Produce error on dynamic_stackalloc
llvm-svn: 246048
2015-08-26 18:37:13 +00:00
Mehdi Amini b5d8b27fc8 Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 246044
2015-08-26 18:22:34 +00:00
James Y Knight 3602286937 [SPARC] Fix stupid oversight in stack realignment support.
If you're going to realign %sp to get object alignment properly (which
the code does), and stack offsets and alignments are calculated going
down from %fp (which they are), then the total stack size had better
be a multiple of the alignment. LLVM did indeed ensure that.

And then, after aligning, the sparc frame code added 96 (for sparcv8)
to the frame size, making any requested alignment of 64-bytes or
higher *guaranteed* to be misaligned. The test case added with r245668
even tests this exact scenario, and asserted the incorrect behavior,
which I somehow failed to notice. D'oh.

This change fixes the frame lowering code to align the stack size
*after* adding the spill area, instead.

Differential Revision: http://reviews.llvm.org/D12349

llvm-svn: 246042
2015-08-26 17:57:51 +00:00
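A hedged worked example of the ordering fix above, using sparcv8's 96-byte
reserved area and a requested 64-byte alignment: rounding the object area up
first and then adding 96 leaves the total misaligned, while adding the
reserved area before rounding keeps it aligned.

#include <cassert>
#include <cstdint>

static uint64_t alignTo(uint64_t n, uint64_t a) { return (n + a - 1) / a * a; }

int main() {
  uint64_t objects = 40, reserved = 96, align = 64;
  uint64_t before = alignTo(objects, align) + reserved; // 64 + 96 = 160, misaligned
  uint64_t after  = alignTo(objects + reserved, align); // alignTo(136) = 192, aligned
  assert(before % align != 0 && after % align == 0);
}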
Vedant Kumar bf891b12b4 [llvm-mc] Ignore opcode size prefix in 64-bit CALL disassembly
This is a fix for disassembling unusual instruction sequences in 64-bit
mode w.r.t the CALL rel16 instruction. It might be desirable to move the
check somewhere else, but it essentially mimics the special case
handling with JCXZ in 16-bit mode.

The current behavior accepts the opcode size prefix and causes the
call's immediate to stop disassembling after 2 bytes. When debugging
sequences of instructions with this pattern, the disassembler output
becomes extremely unreliable and essentially useless (if you jump midway
into what lldb thinks is a unified instruction, you'll lose %rip). So we
ignore the prefix and consume all 4 bytes when disassembling a 64-bit
mode binary.

Note: in Vol. 2A 3-99 the Intel spec states that CALL rel16 is N.S. N.S.
is defined as:

    Indicates an instruction syntax that requires an address override
    prefix in 64-bit mode and is not supported. Using an address
    override prefix in 64-bit mode may result in model-specific
    execution behavior. (Vol. 2A 3-7)

Since 0x66 is an operand override prefix we should be OK (although we
may want to warn about 0x67 prefixes to 0xe8). On the CPUs I tested
with, they all ignore the 0x66 prefix in 64-bit mode.

Patch by Matthew Barney!

Differential Revision: http://reviews.llvm.org/D9573

llvm-svn: 246038
2015-08-26 16:20:29 +00:00
Chad Rosier 9f4709b261 [AArch64] Remove a use-after-free when collecting stats.
The call to mergePairedInsns() deletes MI, so the later use by isUnscaledLdSt()
is referencing freed memory.

llvm-svn: 246033
2015-08-26 13:39:48 +00:00
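A hedged, generic sketch of the fix pattern described above (simplified types,
not the actual AArch64 pass): capture what the statistic needs before the call
that frees the instruction, instead of dereferencing it afterwards.

#include <cstdio>

struct Instr { bool IsUnscaled; };

static unsigned NumUnscaledPairCreated = 0;

static void mergePairedInsns(Instr *MI) { delete MI; } // stand-in: frees MI

static void mergeAndCount(Instr *MI) {
  bool WasUnscaled = MI->IsUnscaled; // read before MI is freed
  mergePairedInsns(MI);
  if (WasUnscaled)                   // safe: no access to the freed object
    ++NumUnscaledPairCreated;
}

int main() {
  mergeAndCount(new Instr{true});
  printf("unscaled pairs: %u\n", NumUnscaledPairCreated);
}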
Silviu Baranga db1ddb32ce [AArch64] Unify the integer min/max vector selection patterns with the intrinsic ones
Summary:
This change lowers the aarch64 integer vector min/max intrinsic nodes to
generic min/max nodes and replaces the intrinsic selection patterns with
the generic ones.

There should already be testing in place for this, so no further tests
were added.

Reviewers: jmolloy

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12276

llvm-svn: 246030
2015-08-26 11:11:14 +00:00
Matthias Braun ccfc9c8d6d FastISel: Use finishCondBranch() for ARM,Mips,PowerPC FastISel
Note that after this change branch probabilities are preserved now.

llvm-svn: 245998
2015-08-26 01:55:47 +00:00
Matthias Braun 17af607796 FastISel: Factor out common code; NFC intended
This should be no functional change but for the record: For three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch are added to the successor/predecessor
lists.

llvm-svn: 245997
2015-08-26 01:38:00 +00:00
JF Bastien 1a4aa1589b WebAssembly: add small FIXME for AsmPrinter.
Suggested by @sunfish as a follow-up to r245982.

llvm-svn: 245996
2015-08-26 00:50:49 +00:00
Charles Davis 119525914c Make variable argument intrinsics behave correctly in a Win64 CC function.
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.

Reviewers: nadav, asl, eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1622

llvm-svn: 245990
2015-08-25 23:27:41 +00:00
JF Bastien 54be3b1f03 WebAssembly: assert that there aren't any constant pools
WebAssembly will either use globals or immediates, since it's a virtual ISA.

llvm-svn: 245989
2015-08-25 23:19:49 +00:00
JF Bastien b6091dfe0f WebAssembly: emit `(func (param t) (result t))` s-expressions
Summary: Match spec format: https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wasm

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D12307

llvm-svn: 245986
2015-08-25 22:58:05 +00:00
JF Bastien 289287060b WebAssembly: comment out .globl when printing textual assembly
Do the same for .weak (not implemented for now, but may as well do it). Update comment string to two semicolons.

llvm-svn: 245982
2015-08-25 22:23:15 +00:00
Sanjay Patel deb8f826a5 make fast unaligned memory accesses implicit with SSE4.2 or SSE4a
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.

This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.

A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx

This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449

Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.

Differential Revision: http://reviews.llvm.org/D12288

llvm-svn: 245950
2015-08-25 16:29:21 +00:00
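A hedged example of the scenario described above (the file contents are
hypothetical): no CPU is named on the command line, but -mavx implies a target
with fast unaligned 32-byte accesses, so a memset like this may now be lowered
to unaligned AVX stores.

#include <cstring>

void zero64(char *p) {
  std::memset(p, 0, 64); // may become two 32-byte vmovups stores
}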
Michael Kuperstein 6e3fee07f7 [X86] Remove references to _ftol2
As of r245924, _ftol2 is no longer used for fptoui on MS platforms.
Remove the dead code associated with it.

llvm-svn: 245925
2015-08-25 07:58:33 +00:00
Michael Kuperstein 8515893be8 [X86] Fix fptoui conversions
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of ftol2 for fptoui, which was
wrong to begin with, as ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.

Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316

llvm-svn: 245924
2015-08-25 07:42:09 +00:00
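A hedged sketch of the point above about _ftol2: a signed f64-to-i64
conversion cannot represent results at or above 2^63, so an inline fptoui
sequence typically splits the range as below (illustrative C++, not the exact
code the backend emits).

#include <cstdint>

uint64_t fptoui64(double x) {
  const double two63 = 9223372036854775808.0; // 2^63
  if (x < two63)
    return (uint64_t)(int64_t)x;              // fits in the signed range
  // Rebase by 2^63, convert as signed, then restore the top bit.
  return (uint64_t)(int64_t)(x - two63) + (1ULL << 63);
}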
Steve King 5cdbd20cc3 Pass function attributes instead of boolean in isIntDivCheap().
llvm-svn: 245921
2015-08-25 02:31:21 +00:00
Mehdi Amini f83b865448 Revert "Fix LLVM C API for DataLayout"
This reverts commit 433bfd94e4b7e3cc3f8b08f8513ce47817941b0c.
Broke some bot, have to see why it passed locally.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245917
2015-08-25 01:21:09 +00:00
Mehdi Amini 84b2e325d3 Fix LLVM C API for DataLayout
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.

It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.

Reviewers: ributzka

Differential Revision: http://reviews.llvm.org/D12263

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245916
2015-08-25 01:07:25 +00:00
Hal Finkel 0f2ddcb83f [PowerPC] PPCVSXFMAMutate should ignore trivial-copy addends
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.

llvm-svn: 245907
2015-08-24 23:48:28 +00:00
Matthias Braun b2b7ef1de8 MachineBasicBlock: Add liveins() method returning an iterator_range
llvm-svn: 245895
2015-08-24 22:59:52 +00:00
Dan Gohman 2683a5534e [WebAssembly] DYNAMIC_STACKALLOC returns a pointer.
llvm-svn: 245893
2015-08-24 22:31:52 +00:00
JF Bastien af111db8af WebAssembly: Implement call
Summary: Support function calls.

Reviewers: sunfish, sunfishcode

Subscribers: sunfishcode, jfb, llvm-commits

Differential revision: http://reviews.llvm.org/D12219

llvm-svn: 245887
2015-08-24 22:16:48 +00:00
JF Bastien 19c2e6634d Revert two bad commits.
Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D12298

llvm-svn: 245886
2015-08-24 22:07:33 +00:00
JF Bastien 744ad106c3 Missing print.
llvm-svn: 245883
2015-08-24 22:00:04 +00:00
JF Bastien d8a9d66d50 call
llvm-svn: 245882
2015-08-24 21:59:51 +00:00
Dan Gohman 12e1997e4b [WebAssembly] Make the assembly printer indent instructions.
llvm-svn: 245875
2015-08-24 21:19:48 +00:00
Dan Gohman 69c4c76396 [WebAssembly] CodeGen support for __builtin_wasm_page_size()
llvm-svn: 245872
2015-08-24 21:03:24 +00:00
Bill Schmidt 32fd189de2 [PPC64LE] Fix PR24546 - Swap optimization and debug values
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass.  The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation.  I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.

llvm-svn: 245862
2015-08-24 19:27:27 +00:00
Dan Gohman 7b63484b99 [WebAssembly] Skeleton FastISel support
llvm-svn: 245860
2015-08-24 18:44:37 +00:00
Dan Gohman 896e53fae8 [WebAssembly] Implement floating point rounding operators.
llvm-svn: 245859
2015-08-24 18:23:13 +00:00
Dan Gohman 01612f627d [WebAssembly] Tell TargetTransformInfo about popcnt and sqrt.
llvm-svn: 245853
2015-08-24 16:51:46 +00:00
Dan Gohman e419a7c307 [WebAssembly] Use the checked form of MachineFunction::getSubtarget. NFC.
llvm-svn: 245852
2015-08-24 16:46:31 +00:00
Dan Gohman 08fc966d3c [WebAssembly] Implement the is_zero_undef forms of cttz and ctlz
llvm-svn: 245851
2015-08-24 16:39:37 +00:00
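A hedged note on what the is_zero_undef forms correspond to in source terms:
__builtin_ctz and __builtin_clz leave the result undefined for a zero input,
so a front end can lower them to these node forms and callers guard the zero
case themselves.

unsigned trailing_zeros(unsigned x) {
  return x ? __builtin_ctz(x) : 32; // the builtin is undefined for x == 0
}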
Michael Zuckerman 9beca2e7e2 [X86] Add support for mmword memory operand size for Intel-syntax x86 assembly
Differential Revision: http://reviews.llvm.org/D12151

llvm-svn: 245835
2015-08-24 10:26:54 +00:00
Scott Douglass bdef60462d [ARM] Use AEABI helpers for i64 div and rem
Differential Revision: http://reviews.llvm.org/D12232

llvm-svn: 245830
2015-08-24 09:17:18 +00:00
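A hedged illustration of what the entry above affects: on AEABI targets,
64-bit division and remainder like the code below are lowered to runtime
helper calls (e.g. __aeabi_ldivmod, which returns both quotient and
remainder) rather than to inline instruction sequences.

long long quot_rem(long long a, long long b, long long *rem) {
  *rem = a % b;  // the same AEABI helper can provide the remainder
  return a / b;
}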
Scott Douglass d2974a6afa [ARM] Refactor LowerDivRem before adding LowerREM (nfc)
Differential Revision: http://reviews.llvm.org/D12230

llvm-svn: 245829
2015-08-24 09:17:11 +00:00
Michael Zuckerman 2fe19db94f first commit to llvm
llvm-svn: 245825
2015-08-24 07:48:50 +00:00
Mehdi Amini a758398833 Add missing break in AArch64DAGToDAGISel::Select() switch case
Reported by coverity.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245800
2015-08-23 00:42:57 +00:00
Jingyue Wu fcec09866a [NVPTX] Allow undef value as global initializer
Summary:
A __shared__ variable may now have an undef value as its initializer; do not
throw an error on that.

Test Plan: test/CodeGen/NVPTX/global-addrspace.ll

Patch by Xuetian Weng

Reviewers: jholewinski, tra, jingyue

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D12242

llvm-svn: 245785
2015-08-22 05:40:26 +00:00
Matt Arsenault 0a3ac1be43 AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM
Although the basic s_load_* instructions happen to use the same
opcode, some of the special case SMRD instructions have
different opcodes.

llvm-svn: 245775
2015-08-22 00:54:31 +00:00
Matt Arsenault e8df879948 AMDGPU: Improve accuracy of instruction rates for some FP instructions
llvm-svn: 245774
2015-08-22 00:50:41 +00:00
Matt Arsenault 33010103b7 AMDGPU: Use DFS to avoid second loop over function
llvm-svn: 245772
2015-08-22 00:43:38 +00:00
Matt Arsenault c8d8e4ed76 AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges
llvm-svn: 245769
2015-08-22 00:19:34 +00:00
Matt Arsenault aba29d6ab1 AMDGPU: Improve debug printing in SIFixSGPRLiveRanges
llvm-svn: 245768
2015-08-22 00:19:25 +00:00
Matt Arsenault 6adf07a92e AMDGPU: Move CI instructions into CIInstructions.td
There are still a couple of CI patterns left in SIInstructions.

llvm-svn: 245767
2015-08-22 00:16:34 +00:00
Matt Arsenault f56872dc30 AMDGPU: Minor cleanups to help with f16 support
The main change is inverting the condition for the
operand classes so that VT.Size == 16 uses VGPR_32
instead of 64.

llvm-svn: 245764
2015-08-21 23:49:51 +00:00
Tom Stellard bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.

Here's an example of a subtle perf reduction this patch solves:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.

Internally, 3 counters (for VM, EXP and LGKM) are updated after every
instruction. For example, buffer_load_format_xyzw will increase the VM
counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.

Instead this patch stores only the counters that matter for the register,
and puts zero for the other ones, since we don't need any wait for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
Vedant Kumar 366dd9fd2b [ARM] Fix MachO CPU Subtype selection
Differential Revision: http://reviews.llvm.org/D12040

llvm-svn: 245744
2015-08-21 21:52:48 +00:00
Hal Finkel ff9639d6b7 [PowerPC] PPCVSXFMAMutate should not segfault on undef input registers
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).

Fixes the test case from PR24542 (although that may have been over-reduced).

llvm-svn: 245741
2015-08-21 21:34:24 +00:00
Sanjay Patel f0bc07f7a5 [x86] enable machine combiner reassociations for 256-bit vector min/max
llvm-svn: 245735
2015-08-21 21:04:21 +00:00
Sanjay Patel dddad10241 remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.

llvm-svn: 245733
2015-08-21 20:39:17 +00:00
Sanjay Patel 9e916dc48d [x86] invert logic for attribute 'FeatureFastUAMem'
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.

Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine whether any
misaligned memory access smaller than 32 bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.

Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.

Differential Revision: http://reviews.llvm.org/D12154

llvm-svn: 245729
2015-08-21 20:17:26 +00:00
Sanjay Patel cf942fa905 [x86] enable machine combiner reassociations for 128-bit vector min/max
llvm-svn: 245715
2015-08-21 18:06:49 +00:00
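For illustration (a hedged sketch, not from the commit): the kind of dependency chain the machine combiner can now rebalance for 128-bit vector minimums, assuming SSE intrinsics.

  #include <immintrin.h>

  // ((a min b) min c) min d is a serial chain of three minps; reassociating
  // to (a min b) min (c min d) lets two of them execute in parallel.
  __m128 min4(__m128 a, __m128 b, __m128 c, __m128 d) {
    return _mm_min_ps(_mm_min_ps(_mm_min_ps(a, b), c), d);
  }

Reassociating FP min/max like this is only valid under relaxed FP semantics, since minps treats NaN operands asymmetrically.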
Eric Christopher e5e302f7e0 Fix typo - symetric -> symmetric.
llvm-svn: 245705
2015-08-21 16:23:39 +00:00
James Y Knight 667395f334 [Sparc] Support user-specified stack object overalignment.
Note: I do not implement a base pointer, so it's still impossible to
have dynamic realignment AND dynamic alloca in the same function.

This also moves the code for determining the frame index reference
into getFrameIndexReference, where it belongs, instead of inline in
eliminateFrameIndex.

[Begin long-winded screed]

Now, stack realignment for Sparc is actually a silly thing to support,
because the Sparc ABI has no need for it -- unlike the situation on
x86, the stack is ALWAYS aligned to the required alignment for the CPU
instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9.

However, LLVM unfortunately implements user-specified overalignment
using stack realignment support, so for now, I'm going to go along
with that tradition. GCC instead treats objects which have alignment
specification greater than the maximum CPU-required alignment for the
target as a separate block of stack memory, with their own virtual
base pointer (which gets aligned). Doing it that way avoids needing to
implement per-target support for stack realignment, except for the
targets which *actually* have an ABI-specified stack alignment which
is too small for the CPU's requirements.

Further unfortunately in LLVM, the default canRealignStack for all
targets effectively returns true, despite that implementing that is
something a target needs to do specifically. So, the previous behavior
on Sparc was to silently ignore the user's specified stack
alignment. Ugh.

Yet MORE unfortunate, if a target actually does return false from
canRealignStack, that also causes the user-specified alignment to be
*silently ignored*, rather than emitting an error.

(I started looking into fixing that last, but it broke a bunch of
tests, because LLVM actually *depends* on having it silently ignored:
some architectures (e.g. non-linux i386) have smaller stack alignment
than spilled-register alignment. But, the fact that a register needs
spilling is not known until within the register allocator. And by that
point, the decision to not reserve the frame pointer has been frozen
in place. And without a frame pointer, stack realignment is not
possible. So, canRealignStack() returns false, and
needsStackRealignment() then returns false, assuming everyone can just
go on their merry way assuming the alignment requirements were
probably just suggestions after-all. Sigh...)

Differential Revision: http://reviews.llvm.org/D12208

llvm-svn: 245668
2015-08-21 04:17:56 +00:00
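For reference, a minimal example (an assumption, not the commit's test) of the user-specified stack-object overalignment being discussed; 64 bytes exceeds the 8/16-byte ABI stack alignment noted above.

  extern void use(char *);

  void f(void) {
    // Requesting 64-byte alignment for a stack object forces dynamic stack
    // realignment on targets whose ABI stack alignment is smaller.
    __attribute__((aligned(64))) char buf[128];
    use(buf);
  }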
NAKAMURA Takumi cf61aae163 SparcAsmParser.cpp: Appease msc x86.
llvm-svn: 245661
2015-08-21 01:12:19 +00:00
Matthias Braun 46e5639806 AArch64: Fix cmp;ccmp ordering
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.

This fixes http://llvm.org/PR24459

llvm-svn: 245641
2015-08-20 23:33:34 +00:00
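A hedged sketch (not from the commit) of the kind of or-of-compares condition that gets emitted as a cmp;ccmp sequence on AArch64.

  // The || of two integer compares can be selected as cmp + ccmp feeding a
  // single conditional branch, instead of two separate compare-and-branches.
  int either(int a, int b) {
    return a == 17 || b == 42;
  }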
Matthias Braun 266204b7dc AArch64: Do not create CCMP on multiple users.
Creating CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.

llvm-svn: 245640
2015-08-20 23:33:31 +00:00
Dan Gohman 32907a6b21 [WebAssembly] Mark more operators as Expand.
llvm-svn: 245636
2015-08-20 22:57:13 +00:00
Ahmed Bougacha 0cdc7719f0 [X86] Look for scalar through one bitcast when lowering to VBROADCAST.
Fixes PR23464: one way to use the broadcast intrinsics is:

  _mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));

We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.

Differential Revision: http://reviews.llvm.org/D10557

llvm-svn: 245613
2015-08-20 21:02:39 +00:00
Jingyue Wu ca3ef11a9b [NVPTX] truncating 64-bit to 32-bit is free
Summary:
Add an LSR test that exercises isTruncateFree. Without this change, LSR creates
another indvar representing the truncated value.

Reviewers: jholewinski, eliben

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D12058

llvm-svn: 245611
2015-08-20 20:59:02 +00:00
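Illustrative of the situation described above (a hypothetical loop, not the commit's test): the induction variable is 64-bit but its use is truncated, and with isTruncateFree returning true LSR no longer needs a second, narrowed induction variable.

  void scale(float *a, long long n) {
    for (long long i = 0; i < n; ++i) {
      int j = (int)i;   // truncation that is free on NVPTX
      a[j] *= 2.0f;
    }
  }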
Ahmed Bougacha 1a498705e4 [X86] Replace avx2 broadcast intrinsics with native IR.
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and cleanup the definitions.

Differential Revision: http://reviews.llvm.org/D10555

llvm-svn: 245606
2015-08-20 20:36:19 +00:00
James Molloy bf17009a97 [ARM] Don't try and custom lower a vNi64 SETCC.
It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes.

Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009.

llvm-svn: 245577
2015-08-20 16:33:44 +00:00
Douglas Katzman 58195a2d74 [Sparc]: correct the 'set' synthetic instruction
Differential Revision: http://reviews.llvm.org/D12194

llvm-svn: 245575
2015-08-20 16:16:16 +00:00
Marina Yatsina bce1ab67a5 [X86] Fix FBLD and FBSTP
FBLD and FBSTP should receive TBYTE because they are defined as
FBLD m80
FBSTP m80

Differential Revision: http://reviews.llvm.org/D11748

llvm-svn: 245553
2015-08-20 11:51:24 +00:00
Marina Yatsina 7a4e1ba737 [X86] Fix bug in COMISD and COMISS definition in td files
COMISD should receive QWORD because it is defined as
 (V)COMISD xmm1, xmm2/m64

COMISS should receive DWORD because it is defined as
 (V)COMISS xmm1, xmm2/m32

Differential Revision: http://reviews.llvm.org/D11712

llvm-svn: 245551
2015-08-20 11:21:36 +00:00
David Majnemer cfc1df553e [X86] Fix the (shl (and (setcc_c), c1), c2) -> (and setcc_c, (c1 << c2)) fold
We didn't check for the necessary preconditions before folding a
mask/shift into a single mask.

This fixes PR24516.

llvm-svn: 245544
2015-08-20 09:00:56 +00:00
Hal Finkel 9fdce9adee [PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).

Fixes PR24225.

llvm-svn: 245535
2015-08-20 03:02:02 +00:00
Hal Finkel be78c25acb [PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.

llvm-svn: 245530
2015-08-20 01:18:20 +00:00
Sanjay Patel 9e5927fdc3 [x86] enable machine combiner reassociations for scalar double-precision min/max
llvm-svn: 245506
2015-08-19 21:27:27 +00:00
Sanjay Patel 4e3ee1e548 [x86] enable machine combiner reassociations for scalar single-precision maximums
llvm-svn: 245504
2015-08-19 21:18:46 +00:00
Juergen Ributzka b12248e9cd [AArch64][FastISel] Don't fold shifts with UB.
We are already falling back to SelectionDAG when encountering a shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.

This fixes rdar://problem/22345295.

llvm-svn: 245499
2015-08-19 20:52:55 +00:00
David Majnemer f25fe64716 [X86] Emit more efficient >= comparisons against 0
We don't do a great job with >= 0 comparisons against zero when the
result is used as an i8.

Given something like:
  void f(long long LL, bool *B) {
    *B = LL >= 0;
  }

We used to generate:
  shrq    $63, %rdi
  xorb    $1, %dil
  movb    %dil, (%rsi)

Now we generate:
  testq   %rdi, %rdi
  setns   (%rsi)

Differential Revision: http://reviews.llvm.org/D12136

llvm-svn: 245498
2015-08-19 20:51:40 +00:00
Dan Gohman dde8dce6a9 [WebAssembly] Use the default alignment for SIMD types.
Previously WebAssembly's datalayout string had -v128:8:128. This had been an
attempt to declare a certain level of support for unaligned SIMD accesses.
However, clang makes its own determinations for SIMD alignment that are
independent of the datalayout string, so this wasn't actually meaningful.

llvm-svn: 245494
2015-08-19 20:30:20 +00:00
Douglas Katzman 2362b69dd9 [Sparc]: asm-only support for the ldstub instruction.
llvm-svn: 245485
2015-08-19 19:30:57 +00:00
Nemanja Ivanovic 5f1cea4141 Temporary fix for the self-host failures introduced by rL244921.
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.

llvm-svn: 245481
2015-08-19 19:04:47 +00:00
Bruno Cardoso Lopes 27fd06922b [PeepholeOptimizer] Look through PHIs to find additional register sources
Reintroduce r245442. Remove an overly conservative assertion introduced
in r245442. We could replace the assertion to use `shareSameRegisterFile`
instead, but at that point in `insertPHI` we have already lost the original
Def subreg to check against. So drop the assertion completely.

Original commit message:

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add a findNextSourceAndRewritePHI method to look up the multiple sources
returned by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coalescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

llvm-svn: 245479
2015-08-19 18:53:36 +00:00
Douglas Katzman e5485c651e [SPARC] Enable writing to floating-point-state register.
llvm-svn: 245475
2015-08-19 18:34:48 +00:00
Ahmed Bougacha 9e00ec6195 [AArch64] Improve short-form diags on long-form Match_InvalidOperand.
Since r244955, we try to use the short-form ErrorInfo when both
tries failed, and the long-form match failed on a suffix operand.
However, this means we sometimes mix ErrorInfo and MatchResult
(one manifestation of this being PR24498). Instead, restore both.

llvm-svn: 245469
2015-08-19 17:40:19 +00:00
Renato Golin eb552e83e0 Revert "[AArch64] Simplify/refactor code to ease code review. NFC."
This reverts commit r245443, as it broke AArch64 test-suite tramp3d
with an assert "Reg && "Null register has no regunits".

llvm-svn: 245455
2015-08-19 16:29:53 +00:00
Derek Schuff 55817ee604 x32. Fixes a bug in x32 exception handling.
This patch updates the X86 lowering so that the Exception Pointer and Selector
are 64-bit wide only if Subtarget.isTarget64BitLP64.

Patch by João Porto

Reviewers: dschuff, rnk
Differential Revision: http://reviews.llvm.org/D12111

llvm-svn: 245454
2015-08-19 16:28:21 +00:00
JF Bastien 5ab87edbb4 x32. Fixes jmp %reg in x32
x32 has 32-bit pointers; x86-64 can't jmp %r32. This patch addresses this issue by explicitly zero-extending brind's target to 64 bits.

Author: jpp

Reviewers: jfb, dschuff, pavel.v.chupin

Subscribers: llvm-commits

Differential revision: http://reviews.llvm.org/D12112

llvm-svn: 245452
2015-08-19 16:17:08 +00:00
James Y Knight 3b0fd753c4 [Sparc] Rename LoadASR and StoreASR from r245360 to *ASI, as was intended.
llvm-svn: 245450
2015-08-19 15:59:49 +00:00
Bruno Cardoso Lopes 61009142b8 Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources"
Revert r245442 while investigating a fix. An assertion hit in
http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/11380

llvm-svn: 245446
2015-08-19 15:10:32 +00:00
James Y Knight d966fb6fef [SPARC] Fix BooleanContents, so that select of a trunc doesn't
eliminate the trunc.

Differential Revision: http://reviews.llvm.org/D10442

llvm-svn: 245444
2015-08-19 14:47:04 +00:00
Chad Rosier 494abf1ad8 [AArch64] Simplify/refactor code to ease code review. NFC.
llvm-svn: 245443
2015-08-19 14:34:54 +00:00
Bruno Cardoso Lopes 0a1c126684 [PeepholeOptimizer] Look through PHIs to find additional register sources
Reapply r243486.

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add a findNextSourceAndRewritePHI method to look up the multiple sources
returned by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coalescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

llvm-svn: 245442
2015-08-19 14:34:41 +00:00
Silviu Baranga ad1b19fcb7 [ARM] Add instruction selection patterns for vmin/vmax
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generatate the code. This adds the vmin/vmax
patterns and a test to check that we are now generating vmin/vmax
instructions.

Reviewers: rengolin, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12105

llvm-svn: 245439
2015-08-19 14:11:27 +00:00
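For illustration (an assumed example, not the commit's test): element-wise minimum code of the kind the mid-end turns into vector smin nodes, which now select directly to NEON vmin.

  // If the loop is vectorized, the lanes form an ISD::SMIN over <4 x i32>,
  // which matches the new vmin.s32 pattern.
  void min4(const int *a, const int *b, int *out) {
    for (int i = 0; i < 4; ++i)
      out[i] = a[i] < b[i] ? a[i] : b[i];
  }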
Joerg Sonnenberger 7d180c59bb Map %fprs to %asr6 in the Sparc assembler parser.
llvm-svn: 245437
2015-08-19 13:55:14 +00:00
Tobias Grosser 85508e804b Revert "[X86] Widen the 'AND' mask if doing so shrinks the encoding size"
This reverts commit 245169 which miscompiles MultiSource/Applications/siod
from LNT.

llvm-svn: 245432
2015-08-19 11:35:10 +00:00
Michael Kuperstein 9fe42604aa [X86] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize
There are some cases where the mul sequence is smaller, but for the most part,
using a div is preferable. This does not apply to vectors, since x86 doesn't
have vector idiv, and a vector mul/shifts sequence ought to be smaller than a
scalarized division.

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245431
2015-08-19 11:21:43 +00:00
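A small example of the size tradeoff (an assumption, using clang's minsize attribute; not from the commit):

  // Under minsize this stays a single div instruction; otherwise the backend
  // emits the reciprocal-multiply expansion, which is faster but larger.
  __attribute__((minsize)) unsigned ninth(unsigned x) {
    return x / 9;
  }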
Michael Kuperstein dcdab4cd3a [TLI] Refactor "is integer division cheap" queries.
This removes the isPow2SDivCheap() query, as it is not currently used in
any meaningful way. isIntDivCheap() no longer relies on a state variable
(as all in-tree targets set it to false), but the interface allows querying
based on the type optimization level.

NFC.

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245430
2015-08-19 11:17:59 +00:00
Alex Lorenz f3630113cd MIR Serialization: Serialize the operand's bit mask target flags.
This commit adds support for bit mask target flag serialization to the MIR
printer and the MIR parser. It also adds support for the machine operand's
target flag serialization to the AArch64 target.

Reviewers: Duncan P. N. Exon Smith
llvm-svn: 245383
2015-08-18 22:52:15 +00:00
Sanjay Patel 5c55fbc5ea use TLI.allowsMemoryAccess() to check if memory accesses are fast; NFCI
This consolidates use of isUnalignedMem32Slow() in one place.
There is a slight change in logic although I'm not sure that it would ever
come up in the real world: we were assuming that an alignment of the type 
size is always fast; now, we actually check the data layout to confirm that.

llvm-svn: 245382
2015-08-18 22:48:12 +00:00
Joerg Sonnenberger b0ce8747c3 Load/store instructions for floating points with address space require SparcV9.
To properly handle this, define the *a instructions as separate
instruction classes by refactoring the LoadA and StoreA multiclasses.
Move the instruction tests into the sparcv9 file to test the difference.

llvm-svn: 245360
2015-08-18 21:31:46 +00:00
David Majnemer 0ad363eebc [WinEH] Calculate state numbers for the new EH representation
State numbers are calculated by performing a walk from the innermost
funclet to the outermost funclet.   Rudimentary support for the new EH
constructs has been added to the assembly printer, just enough to test
the new machinery.

Differential Revision: http://reviews.llvm.org/D12098

llvm-svn: 245331
2015-08-18 19:07:12 +00:00
Matthias Braun d55bcf2646 MachineRegisterInfo: Introduce isPhysRegUsed()
This method checks whether a physical register or any of its aliases are
used in the function.

Using this function in SIRegisterInfo::findUnusedReg() should also fix
this reported failure:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html
http://reviews.llvm.org/rL242173#inline-533

The report doesn't come with a testcase and I don't know enough about
AMDGPU to create one myself.

llvm-svn: 245329
2015-08-18 18:54:27 +00:00
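A hedged C++ sketch of how a target might use the new query; the helper below is hypothetical, only MachineRegisterInfo::isPhysRegUsed comes from the commit.

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"

  // Return the first candidate physical register that the function (and any
  // alias of it) never uses, or 0 if every candidate is taken.
  static unsigned pickUnusedReg(const llvm::MachineRegisterInfo &MRI,
                                llvm::ArrayRef<unsigned> Candidates) {
    for (unsigned Reg : Candidates)
      if (!MRI.isPhysRegUsed(Reg))
        return Reg;
    return 0;
  }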
Sanjay Patel 1cd6d88e4d use minSize wrapper; NFCI
These were missed when other uses were switched over:
http://llvm.org/viewvc/llvm-project?view=revision&revision=243994

llvm-svn: 245311
2015-08-18 16:44:23 +00:00
Chad Rosier 3dd0e942b6 [AArch64] Simplify the logic for computing in bounds offset. NFC.
llvm-svn: 245307
2015-08-18 16:20:03 +00:00
Daniel Sanders 63f4a5dcad [mips] Expand JAL instructions when PIC is enabled.
Summary: This is the correct way to handle JAL instructions when PIC is enabled.

Patch by Toma Tabacu

Reviewers: seanbruno, tomatabacu

Subscribers: brooks, seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D6231

llvm-svn: 245305
2015-08-18 16:18:09 +00:00
Zoran Jovanovic 2fe8466f6e [mips][microMIPS] Implement DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D10953

llvm-svn: 245297
2015-08-18 14:40:43 +00:00
Zoran Jovanovic a6593ff613 [mips][microMIPS] Implement SW and SWE instructions
Differential Revision: http://reviews.llvm.org/D10869

llvm-svn: 245293
2015-08-18 12:53:08 +00:00
Daniel Sanders a699444094 [mips] Make the MipsAsmParser capable of knowing whether PIC mode is enabled or not.
Summary:
This information is needed to decide whether we do the PIC-only JAL expansions or not. It's also needed for an upcoming patch which implements the .cprestore assembler directive (which can only be used effectively in PIC mode).

By making this information available to the MipsAsmParser, we will know when to insert the instructions mandated by the .cprestore assembler directive and we will be able to give some useful warnings when we encounter a potential misuse of this directive.

Patch by Toma Tabacu

Reviewers: dsanders, seanbruno

Subscribers: brooks, seanbruno, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D5626

llvm-svn: 245291
2015-08-18 12:33:54 +00:00
Daniel Sanders f1ae367a99 [mips] Correct -Woverflow warning in r245208 without changing signedness of the constant.
This was supposed to have been committed as part of r245208

llvm-svn: 245285
2015-08-18 09:55:57 +00:00
Guozhi Wei f66d384443 Align SP adjustment in function getSPAdjust
This commit adds a new function TargetFrameLowering::alignSPAdjust
and calls it from TargetInstrInfo::getSPAdjust. It fixes PR24142.

llvm-svn: 245253
2015-08-17 22:36:27 +00:00
Douglas Katzman 685a7d1a70 [SPARC]: recognize '.' as the start of an assembler expression.
llvm-svn: 245232
2015-08-17 19:55:01 +00:00
James Molloy 974838f294 [ARM] Fix crash when targetting CPU without NEON
We emulate a scalar vmin/vmax with NEON instructions as they don't exist in the VFP ISA. So only mark these as legal when NEON is available.

Found here: https://code.google.com/p/chromium/issues/detail?id=521671

llvm-svn: 245231
2015-08-17 19:37:12 +00:00
Silviu Baranga b322aa6f53 [CostModel][AArch64] Increase cost of vector insert element and add missing cast costs
Summary:
Increase the estimated costs for insert/extract element operations on
AArch64. This is motivated by results from benchmarking interleaved
accesses.

Add missing costs for zext/sext/trunc instructions and some integer to
floating point conversions. These costs were previously calculated
by scalarizing these operations and were affected by the cost increase of
the insert/extract element operations.

Reviewers: rengolin

Subscribers: mcrosier, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11939

llvm-svn: 245226
2015-08-17 16:05:09 +00:00
Silviu Baranga d5ac26937c [CostModel][ARM] Increase cost of insert/extract operations
Summary:
This change limits the minimum cost of an insert/extract
element operation to 2 in cases where this would result
in mixing of NEON and VFP code.

Reviewers: rengolin

Subscribers: mssimpso, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D12030

llvm-svn: 245225
2015-08-17 15:57:05 +00:00
Aaron Ballman aa3d810b5f Correcting a -Woverflow warning where 0xFFFF was overflowing an implicit constant conversion.
llvm-svn: 245220
2015-08-17 14:25:57 +00:00
Daniel Sanders a39ef1c68f [mips] [IAS] Add support for the DLA pseudo-instruction and fix problems with DLI
Summary: It is the same as LA, except that it can also load 64-bit addresses and it only works on 64-bit MIPS architectures.

Reviewers: tomatabacu, seanbruno, vkalintiris

Subscribers: brooks, seanbruno, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D9524

llvm-svn: 245208
2015-08-17 10:11:55 +00:00
James Molloy 88edc8243d Remove hand-rolled matching for fmin and fmax.
SDAGBuilder now does this all for us.

llvm-svn: 245198
2015-08-17 07:13:20 +00:00
James Molloy c617be559a Rip out hand-rolled matching code for VMIN, VMAX, VMINNM and VMAXNM
This is no longer needed - SDAGBuilder will do this for us.

llvm-svn: 245197
2015-08-17 07:13:15 +00:00
Chandler Carruth 2f1fd1658f [PM] Port ScalarEvolution to the new pass manager.
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.

I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.

But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can do so uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't really called during normal runs of the pipeline as
far as I can see.

To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.

To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.

With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.

Differential Revision: http://reviews.llvm.org/D12063

llvm-svn: 245193
2015-08-17 02:08:17 +00:00
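A hedged sketch of what this looks like for a consumer, using the modern new-pass-manager spelling (the exact 2015 interface may differ): each function's ScalarEvolution comes from the analysis manager, so invalidation really does destroy and recreate it.

  #include "llvm/Analysis/ScalarEvolution.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Passes/PassBuilder.h"
  using namespace llvm;

  void walkSCEV(Module &M) {
    PassBuilder PB;
    FunctionAnalysisManager FAM;
    PB.registerFunctionAnalyses(FAM);   // registers SCEV and its dependencies
    for (Function &F : M) {
      if (F.isDeclaration())
        continue;
      // A per-function ScalarEvolution result, cached only until invalidated.
      ScalarEvolution &SE = FAM.getResult<ScalarEvolutionAnalysis>(F);
      (void)SE;
    }
  }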
Yaron Keren 178c465223 Add missing include guard.
llvm-svn: 245173
2015-08-16 07:55:08 +00:00
David Majnemer 1a59e49f3c [X86] Widen the 'AND' mask if doing so shrinks the encoding size
We can set additional bits in a mask given that we know the other
operand of an AND already has some bits set to zero.  This can be more
efficient if doing so allows us to use an instruction which implicitly
sign extends the immediate.

This fixes PR24085.

Differential Revision: http://reviews.llvm.org/D11289

llvm-svn: 245169
2015-08-16 04:52:11 +00:00
Sanjay Patel 40d4eb40f6 [x86] enable machine combiner reassociations for scalar single-precision minimums
llvm-svn: 245166
2015-08-15 17:01:54 +00:00
Yaron Keren 8b2a031cff Silence VS2015 warning.
Patch by James Touton!

http://reviews.llvm.org/D11890

llvm-svn: 245161
2015-08-15 14:54:43 +00:00
Simon Pilgrim 0750c84623 [DAGCombiner] Attempt to mask vectors before zero extension instead of after.
For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after an ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors.

(zext (truncate x)) -> (zext (and (x, m)))

Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code.

This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles.

Differential Revision: http://reviews.llvm.org/D11764

llvm-svn: 245160
2015-08-15 13:27:30 +00:00
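For intuition, the scalar identity behind the combine (an illustrative snippet, not the commit's test): zero-extending a truncated value is the same as masking the wider value.

  // (zext (trunc x)) and (x & 0xff) compute the same 32-bit result, so the
  // combiner may mask first and zero-extend afterwards.
  unsigned roundTrip(unsigned x) {
    unsigned char t = (unsigned char)x;  // truncate
    return (unsigned)t;                  // zero-extend, i.e. x & 0xff
  }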
Matt Arsenault 588732bd6e AMDGPU/SI: Only look at live out SGPR defs
When trying to fix SGPR live ranges, skip defs that are
killed in the same block as the def. I don't think
we need to worry about these cases as long as the
live ranges of the SGPRs in dominating blocks are
correct.

This reduces the number of elements the second
loop over the function needs to look at, and makes
it generally easier to understand. The second loop
also only considers if the live range is live
in to a block, which logically means it
must have been live out from another.

llvm-svn: 245150
2015-08-15 02:58:49 +00:00
James Y Knight 5567bafe93 Remove redundant TargetFrameLowering::getFrameIndexOffset virtual
function.

This was the same as getFrameIndexReference, but without the FrameReg
output.

Differential Revision: http://reviews.llvm.org/D12042

llvm-svn: 245148
2015-08-15 02:32:35 +00:00
JF Bastien d4698e1bac [WebAssembly] Add Relooper
This is just an initial check-in of an implementation of the Relooper algorithm, in preparation for the WebAssembly codegen to utilize it. It doesn't do anything yet by itself.

The Relooper algorithm takes an arbitrary control flow graph and generates structured control flow from that, utilizing a helper variable when necessary to handle irreducibility. The WebAssembly backend will be able to use this in order to generate an AST for its binary format.

Author: azakai

Reviewers: jfb, sunfish

Subscribers: jevinskie, arsenm, jroelofs, llvm-commits

Differential revision: http://reviews.llvm.org/D11691

llvm-svn: 245142
2015-08-15 01:23:28 +00:00
Matt Arsenault 297ae311ce AMDGPU/SI: Fix printing useless info with amdhsa
The comments at the bottom would all report 0 if
amdhsa was used.

llvm-svn: 245135
2015-08-15 00:12:39 +00:00
Matt Arsenault 0259a7aa41 AMDGPU/SI: Update LiveVariables
This is simple but won't work if/when this pass
is moved to be post-SSA.

llvm-svn: 245134
2015-08-15 00:12:37 +00:00
Matt Arsenault 670ba46efe AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRanges
Does not mark SlotIndexes as reserved, although I think
that might be OK.

LiveVariables still need to be handled.

llvm-svn: 245133
2015-08-15 00:12:35 +00:00
Matt Arsenault b75233235c AMDGPU: Remove unnecessary assert
These shouldn't ever be null. The number of successors
was already asserted to be 2.

llvm-svn: 245132
2015-08-15 00:12:32 +00:00
Matt Arsenault 4275c29a02 AMDGPU/SI: Make comments more precise.
True branch instructions do behave as expected with liveness.

Avoid the phrasing "branch decision is based on a value in an SGPR"
because this could be misleading. A VALU compare instruction's
result is still based on an SGPR, even though that condition
may be divergent.

llvm-svn: 245131
2015-08-15 00:12:30 +00:00
Pat Gavlin b399095c3f Add a target environment for CoreCLR.
Although targeting CoreCLR is similar to targeting MSVC, there are
certain important differences that the backend must be aware of
(e.g. differences in stack probes, EH, and library calls).

Differential Revision: http://reviews.llvm.org/D11012

llvm-svn: 245115
2015-08-14 22:41:43 +00:00
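A hedged illustration (the exact triple string shown is an assumption): CoreCLR is distinguished from MSVC through the environment component of the target triple.

  #include "llvm/ADT/Triple.h"

  // e.g. "x86_64-pc-windows-coreclr" vs. "x86_64-pc-windows-msvc"
  bool isCoreCLR(const llvm::Triple &T) {
    return T.getEnvironment() == llvm::Triple::CoreCLR;
  }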
Ahmed Bougacha cd35787217 [AArch64] Fix FMLS scalar-indexed-from-2s-after-neg patterns.
We canonicalize V64 vectors to V128 through insert_subvector: the other
FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't,
so we'd fail to match fmls and generate fneg+fmla instead.
The vector equivalents are already tested and functional.

llvm-svn: 245107
2015-08-14 22:06:05 +00:00
Tom Stellard bef1094ee7 AMDGPU/SI: Add missing spill class
The compiler was failing to spill for some shaders.

Patch By: Axel Davy

llvm-svn: 245087
2015-08-14 19:46:05 +00:00
Renato Golin 980b6cc42b Revert "[ARM] Fix MachO CPU Subtype selection"
This reverts commit r245081, as it breaks many builds.

llvm-svn: 245086
2015-08-14 19:35:47 +00:00
Vedant Kumar 2f079be789 [ARM] Fix MachO CPU Subtype selection
This patch makes the Darwin ARM backend take advantage of TargetParser.  It
also teaches TargetParser about ARMV7K for the first time. This makes target
triple parsing more consistent across llvm.

Differential Revision: http://reviews.llvm.org/D11996

llvm-svn: 245081
2015-08-14 18:36:47 +00:00
Sanjay Patel ed502905f7 [x86] fix allowsMisalignedMemoryAccess() implementation
This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly
return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load
merging optimization is changed to use the TLI hook. This exposes a shortcoming in the
current logic and results in the regression test update. Changing other direct users of
the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch.

Without the fix in allowsMisalignedMemoryAccesses(), we will loop infinitely when targeting
SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads
while PerformLOADCombine() splits them back into 16-byte loads.

Differential Revision: http://reviews.llvm.org/D10662

llvm-svn: 245075
2015-08-14 17:53:40 +00:00
Rafael Espindola dbaf0498a9 Revert "Centralize the information about which object format we are using."
This reverts commit r245047.

It was failing on the darwin bots. The problem was that when running

./bin/llc -march=msp430

llc gets to

  if (TheTriple.getTriple().empty())
    TheTriple.setTriple(sys::getDefaultTargetTriple());

Which means that we go with an arch of msp430 but a triple of
x86_64-apple-darwin14.4.0 which fails badly.

That code has to be updated to select a triple based on the value of
march, but that is not a trivial fix.

llvm-svn: 245062
2015-08-14 15:48:41 +00:00
Sanjay Patel 2e75341b7f don't repeaat function names in comments; NFC
llvm-svn: 245058
2015-08-14 15:11:42 +00:00
Rafael Espindola 90eb70c8a7 Centralize the information about which object format we are using.
Other than some places that were handling unknown as ELF, this should
have no change. The test updates are because we were detecting
arm-coff or x86_64-win64-coff as ELF targets before.

It is not clear if the enum should live on the Triple. At least now it lives
in a single location and should be easier to move somewhere else.

llvm-svn: 245047
2015-08-14 13:31:17 +00:00
James Molloy 63be198712 [AArch64] FMINNAN/FMAXNAN on f16 is not legal.
Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal.

I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet.

llvm-svn: 245035
2015-08-14 09:08:50 +00:00
David Majnemer b611e3f50e [IR] Add token types
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.

There are several applications for such a type but my immediate
motivation stems from WinEH.  Our personality routine enforces a
single-entry - single-exit regime for cleanups.  After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together.  We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.

Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup.  This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.

What is the burden to the optimizer?  Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway.  There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.

Differential Revision: http://reviews.llvm.org/D11861

llvm-svn: 245029
2015-08-14 05:09:07 +00:00
Saleem Abdulrasool 3e190cb098 PowerPC: remove dead initialization (NFC)
Identified by the clang static analyzer.  No functional change intended.

llvm-svn: 245022
2015-08-14 03:48:35 +00:00
Simon Pilgrim 7218251861 [AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation
D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect.

Differential Revision: http://reviews.llvm.org/D12007

llvm-svn: 244960
2015-08-13 21:40:02 +00:00
Ahmed Bougacha 80e4ac802a [AArch64] Provide "too few operands" diags on short-form NEON also.
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.

Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.

llvm-svn: 244955
2015-08-13 21:09:13 +00:00
Simon Pilgrim 4a8d6b3b9e [X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the X86 implementation
Follow up to D10947 - D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect.

This patch removes the X86 implementation and improves the AVX1/AVX2 support to correctly lower 256-bit integer vectors.

Differential Revision: http://reviews.llvm.org/D12006

llvm-svn: 244949
2015-08-13 20:45:55 +00:00
Yaron Keren 556b21aa10 Remove and forbid raw_svector_ostream::flush() calls.
After r244870 flush() will only compare two null pointers and return,
doing nothing but wasting run time. The call is not required any more
as the stream and its SmallString are always in sync.

Thanks to David Blaikie for reviewing.

llvm-svn: 244928
2015-08-13 18:12:56 +00:00
Nemanja Ivanovic 1c39ca6501 Scalar to vector conversions using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D11471

It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.

llvm-svn: 244921
2015-08-13 17:40:44 +00:00
James Molloy 31117875c2 [ARM] FMINNAN/FMAXNAN of f64 are not legal.
This was my error. We've got f32 marked as legal because it is simulated using a v2f32 instruction, but there's no equivalent for f64.

This will get test coverage imminently when D12015 lands.

llvm-svn: 244916
2015-08-13 17:28:26 +00:00
James Molloy c71f78f49f [ARM] Allow vmin/vmax of scalars to be emitted without UseNEONForFP.
This overrides the default to more closely resemble the hand-crafted matching logic in ISelLowering. It makes sense, as there is no VFP equivalent of vmin or vmax, to use them when they're available even if in general VFP ops should be preferred.

This should be NFC.

llvm-svn: 244915
2015-08-13 17:28:20 +00:00
Ulrich Weigand a887f06214 [SystemZ] Support large LLVM IR struct return values
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
  { <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.

This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.

Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.

However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.

This patch fixes the crash by implementing CanLowerReturn.  As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.

llvm-svn: 244889
2015-08-13 13:37:06 +00:00
John Brawn 68acdcb435 [ARM] Reorganise and simplify thumb-1 load/store selection
Other than PC-relative loads/store the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.

Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.

Differential Revision: http://reviews.llvm.org/D11800

llvm-svn: 244882
2015-08-13 10:48:22 +00:00
Ahmed Bougacha 2a97b1bcf8 [AArch64] Also custom-lowering mismatched vector/f16 FCOPYSIGN.
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.

Follow-up to r243924, r243926, and r244858.

llvm-svn: 244860
2015-08-13 01:13:56 +00:00
JF Bastien 71d29acecd WebAssembly: floating-point comparisons
Summary:
D11924 implemented part of the floating-point comparisons; this patch implements the rest:
 * Tell ISelLowering that all booleans are either 0 or 1.
 * Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
 * Add tests for ord/uno.
 * Add tests for ueq/one/ult/ule/ugt/uge.
 * Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11970

llvm-svn: 244779
2015-08-12 17:53:29 +00:00
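For intuition (not the commit's code), the scalar meaning of one of the expanded predicates: an unordered-or-less-than compare is the negation of the ordered greater-or-equal compare, so it is also true when either input is NaN.

  // ult(a, b): true when a < b, or when either operand is NaN.
  bool ult(float a, float b) {
    return !(a >= b);
  }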
Sanjay Patel 2366168bad 80-cols; NFC
llvm-svn: 244755
2015-08-12 15:12:25 +00:00
Sanjay Patel dc87d1440c fix typo; NFC
llvm-svn: 244753
2015-08-12 15:09:09 +00:00
Zoran Jovanovic 366783e14c [mips][microMIPS] Create microMIPS64r6 subtarget and implement DALIGN, DAUI, DAHI, DATI, DEXT, DEXTM and DEXTU instructions
Differential Revision: http://reviews.llvm.org/D10923

llvm-svn: 244744
2015-08-12 12:45:16 +00:00
Michael Kuperstein fe0d9bb6eb [X86] Disable mul -> shl + lea combine when compiling for minsize
Differential Revision: http://reviews.llvm.org/D11904

llvm-svn: 244740
2015-08-12 11:27:26 +00:00
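A small illustrative case (an assumption, using clang's minsize attribute; not the commit's test): with minsize a single imul with an immediate is preferred over the shl + lea expansion.

  // Normally x * 20 is lowered to lea (x + 4*x) followed by a shift by 2;
  // under minsize a single 'imul $20' is smaller.
  __attribute__((minsize)) unsigned times20(unsigned x) {
    return x * 20u;
  }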
Michael Kuperstein bc7f99a3ab [X86] Allow x86 call frame optimization to fold more loads into pushes
This abstracts away the test for "when can we fold across a MachineInstruction"
into the MI interface, and changes call-frame optimization to use the same test
the peephole optimizer uses.

Differential Revision: http://reviews.llvm.org/D11945

llvm-svn: 244729
2015-08-12 10:14:58 +00:00
Matt Arsenault c574686529 AMDGPU: Fix assert on dbg_value instructions
llvm-svn: 244728
2015-08-12 09:04:44 +00:00
Simon Pilgrim 8c049d5c03 [InstCombine] Move SSE/AVX vector blend folding to instcombiner
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).

InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).

I also moved all the relevant combine tests into InstCombine/blend_x86.ll

Differential Revision: http://reviews.llvm.org/D11934

llvm-svn: 244723
2015-08-12 08:08:56 +00:00
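Two of the foldable cases described above, written as an illustrative snippet (assuming SSE4.1 intrinsics; not from the commit):

  #include <immintrin.h>

  // An all-zero mask selects only from the first operand, so this is just 'a'.
  __m128 blendZeroMask(__m128 a, __m128 b) {
    return _mm_blendv_ps(a, b, _mm_setzero_ps());
  }

  // Identical inputs make the mask irrelevant, so this is also just 'a'.
  __m128 blendSameInputs(__m128 a, __m128 m) {
    return _mm_blendv_ps(a, a, m);
  }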
Saleem Abdulrasool 9e5f2a96f1 X86: hoist a condition into a variable (NFC)
The same value is used multiple times through the function.  Hoist the condition
into a variable.  This should fix a silly static analysis warning where the
conditions flip around.  No functional change intended.

llvm-svn: 244713
2015-08-12 02:01:36 +00:00
Sanjay Patel 260b6d36f4 [x86] enable machine combiner reassociations for 256-bit vector FP mul/add
llvm-svn: 244705
2015-08-12 00:29:10 +00:00
Alex Lorenz 5659a2f961 PseudoSourceValue: Transform the mips subclass to target independent subclasses
This commit transforms the mips-specific 'MipsCallEntry' subclass of the
'PseudoSourceValue' class into two, target-independent subclasses named
'GlobalValuePseudoSourceValue' and 'ExternalSymbolPseudoSourceValue'.

This change makes it easier to serialize the pseudo source values by removing
target-specific pseudo source values.

Reviewers: Akira Hatanaka
llvm-svn: 244698
2015-08-11 23:23:17 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
Alex Lorenz c49e4fe9cc PseudoSourceValue: Introduce a 'PSVKind' enumerator.
This commit introduces a new enumerator named 'PSVKind' in the
'PseudoSourceValue' class. This enumerator is now used to distinguish between
the various kinds of pseudo source values.

This change is done in preparation for the changes to the pseudo source value
object management and to the PseudoSourceValue's class hierarchy - the next two
PseudoSourceValue commits will get rid of the global variable that manages the
pseudo source values and the mips specific MipsCallEntry subclass.

Reviewers: Akira Hatanaka
llvm-svn: 244687
2015-08-11 22:32:00 +00:00
Mark Heffernan 438ffe5eac Use 32-bit divides instead of 64-bit divides where possible.
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.

llvm-svn: 244684
2015-08-11 22:16:34 +00:00
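A hedged sketch of the idea at the source level (the actual transform is performed by the compiler with a runtime check, not written by hand): when both operands of a 64-bit divide fit in 32 bits, a 32-bit divide gives the same quotient.

  unsigned long long div64(unsigned long long a, unsigned long long b) {
    // If the high words of both operands are zero, the cheap 32-bit unsigned
    // divide produces the same result as the full 64-bit divide.
    if (((a | b) >> 32) == 0)
      return (unsigned)a / (unsigned)b;
    return a / b;
  }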
JF Bastien da06bce8b5 WebAssembly: implement comparison.
Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken, I'll fix them in a follow-up.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11924

llvm-svn: 244665
2015-08-11 21:02:46 +00:00
Sanjay Patel 2c6a01570d [x86] enable machine combiner reassociations for 128-bit vector single/double multiplies
llvm-svn: 244657
2015-08-11 20:19:23 +00:00
JF Bastien 480c840896 WebAssembly: implement WebAssemblyTargetLowering::getTargetNodeName
Summary: Implementation is the same as in AArch64.

Subscribers: aemerson, jfb, llvm-commits, sunfish

Differential Revision: http://reviews.llvm.org/D11956

llvm-svn: 244655
2015-08-11 20:13:18 +00:00
Rafael Espindola 3adc7ce9f1 Use llvm::make_unique to fix the MSVC build.
llvm-svn: 244641
2015-08-11 18:11:17 +00:00
Michael Kuperstein 243c073a2e [X86] Allow merging of immediates within a basic block for code size savings
First step in preventing immediates that occur more than once within a single
basic block from being pulled into their users, in order to prevent unnecessarily
large instruction encodings. Currently enabled only when optimizing for size.

Patch by: zia.ansari@intel.com
Differential Revision: http://reviews.llvm.org/D11363

llvm-svn: 244601
2015-08-11 14:10:58 +00:00
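An illustrative case (an assumption, not the commit's test): the same large immediate used twice in one basic block. Materializing it in a register once avoids encoding the 4-byte immediate in both stores.

  // When optimizing for size, 0x12345678 is loaded into a register once and
  // reused, instead of being folded into both store instructions.
  void storeTwice(int *a, int *b) {
    *a = 0x12345678;
    *b = 0x12345678;
  }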
James Molloy b7b2a1e9b4 [AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic.
Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmaxnum and match that instead. Minimal functional change:

  - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistent.
  - f16 test updated because now that we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall!

llvm-svn: 244595
2015-08-11 12:06:37 +00:00
James Molloy edf38f0cb0 [AArch64] Replace the custom AArch64ISD::FMIN/MAX nodes with ISD::FMINNAN/MAXNAN
NFCI. This just removes custom ISDNodes that are no longer needed.

llvm-svn: 244594
2015-08-11 12:06:33 +00:00
James Molloy d616c642bb [ARM] Match fminnan/fmaxnan for vector vmin/vmax instead of an intrinsic
Lower Intrinsic::arm_neon_vmins/vmaxs to fminnan/fmaxnan and match that instead. This is important because SDAG will soon be able to select FMINNAN itself, so we need a unified lowering path for intrinsics and SDAG.

NFCI.

llvm-svn: 244593
2015-08-11 12:06:28 +00:00
James Molloy ee868b2a3e [ARM] Match fminnum/fmaxnum for vector vminnm/vmaxnm instead of an intrinsic
Lower the intrinsic to a FMINNUM/FMAXNUM node and select that instead. This is important because soon SDAG will be able to select FMINNUM/FMAXNUM itself, so we need an integrated lowering path between SDAG and intrinsics.

NFCI.

llvm-svn: 244592
2015-08-11 12:06:25 +00:00
James Molloy ea3a687a33 [ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM
NFCI. This replaces another custom ISDNode with a generic equivalent.

llvm-svn: 244591
2015-08-11 12:06:22 +00:00
James Molloy db8ee4b5a9 [ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN.
NFCI. This removes a custom ISDNode.

llvm-svn: 244590
2015-08-11 12:06:15 +00:00
Marina Yatsina 8c997af103 [X86] Add SAL mnemonics for Intel syntax
SAL and SHL instructions perform the same operation

Differential Revision: http://reviews.llvm.org/D11882

llvm-svn: 244588
2015-08-11 12:05:06 +00:00
Marina Yatsina d353c45eaf [X86] Fix REPE, REPZ, REPNZ for intel syntax
REPE, REPZ, REPNZ, REPNE should have mnemonics for Intel syntax as well.
Currently using these instructions causes compilation errors for Intel syntax.

Differential Revision: http://reviews.llvm.org/D11794

llvm-svn: 244584
2015-08-11 11:28:10 +00:00
Marina Yatsina f6bc15d763 [X86] Fix imul alias for intel syntax
The "imul reg, imm" alias is not defined for intel syntax. 
In intel syntax there is no w/l/q suffix for the imul instruction.

Differential Revision: http://reviews.llvm.org/D11887

llvm-svn: 244582
2015-08-11 10:43:04 +00:00
Vasileios Kalintiris 1c78ca6a09 [mips] Remap move as or.
Summary:
This patch remaps the assembly idiom 'move' to 'or' instead of 'daddu' or
'addu'. The use of addu/daddu instead of or as move was highlighted as a
performance issue during the analysis of a recent 64bit design. Originally
move was encoded as 'or' by binutils but was changed for the r10k cpu family
due to their pipeline which had 2 arithmetic units and a single logical unit,
and so could issue multiple (d)addu based moves at the same time but only 1
logical move.

This patch preserves the disassembly behaviour so that disassembling an old style
(d)addu move still appears as move, but assembling move always gives an or

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11796

llvm-svn: 244579
2015-08-11 08:56:25 +00:00