Commit Graph

29945 Commits

Author SHA1 Message Date
Craig Topper fec61ef391 Remove a temporary variable and just construct a unique_ptr directly using make_unique.
llvm-svn: 217655
2014-09-12 05:17:20 +00:00
Matt Arsenault 362f345bab R600/SI: Fix off by 1 error in used register count
The register numbers start at 0, so if only 1 register
was used, this was reported as 0.

llvm-svn: 217636
2014-09-11 22:51:37 +00:00
Bill Schmidt be95fd5357 [PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm
Inline asm may specify 'U' and 'X' constraints to print a 'u' for an
update-form memory reference, or an 'x' for an indexed-form memory
reference.  However, these are really only useful in GCC internal code
generation.  In inline asm the operand of the memory constraint is
typically just a register containing the address, so 'U' and 'X' make
no sense.

This patch quietly accepts 'U' and 'X' in inline asm patterns, but
otherwise does nothing.  If we ever unexpectedly see a non-register,
we'll assert and sort it out afterwards.

I've added a new test for these constraints; the test case should be
used for other asm-constraints changes down the road.

llvm-svn: 217622
2014-09-11 20:10:03 +00:00
Brad Smith 2ce0d91bde Provide an implementation of getNoopForMachoTarget for SPARC.
llvm-svn: 217611
2014-09-11 17:40:51 +00:00
Adam Nemet 053c4e825c [AVX512] Fix miscompile for unpack
r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack
between the low and the high 256 bits of src1 into the low part of the
destination and another unpack of the low and high 256 bits of src2 into the
high part of the destination.

I don't think that's how unpack works.  AVX512 unpack simply has more 128-bit
lanes but other than it works the same way as AVX.  So in each 128-bit lane,
we're always interleaving certain parts of both operands rather different
parts of one of the operands.

E.g. for this:
__v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
__v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 };
__v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16,
	    			       	     24, 17, 25, 20, 28, 21, 29);

we generated punpcklps (notice how the elements of a and b are not interleaved
in the shuffle).  In turn, c was set to this:

  0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29

Obviously this should have just returned the mask vector of the shuffle
vector.

I mostly reverted this change and made sure the original AVX code worked
for 512-bit vectors as well.

Also updated the tests because they matched the logic from the code.

llvm-svn: 217602
2014-09-11 16:51:10 +00:00
Benjamin Kramer 9e5b4a5827 Move constant-sized bitvector to the stack.
llvm-svn: 217600
2014-09-11 15:58:39 +00:00
Aaron Watry 1885e53a75 R600: Add cmpxchg instruction for evergreen
Refactored the R600_LDS_1A2D class a bit to get it to actually work.

It seemed to be previously unused and broken.

We also have to disable the conversion to the noret variant for now in
R600ISelLowering because the getLDSNoRetOp method only handles 1A1D LDS ops.

Someone can feel free to modify the AMDGPU::getLDSNoRetOp method to
work for more than 1A1D variants of LDS operations. It's being left as a
future TODO for now.

Signed-off-by: Aaron Watry <awatry at gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217596
2014-09-11 15:02:54 +00:00
Aaron Watry 21591670c9 R600: Add LDS_WRXCHG[_RET] instructions for Evergreen.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217594
2014-09-11 15:02:49 +00:00
Aaron Watry 564a22e995 R600: Add LDS_MIN_[U]INT[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217593
2014-09-11 15:02:47 +00:00
Aaron Watry e51794f2fa R600: Add LDS_XOR[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217592
2014-09-11 15:02:46 +00:00
Aaron Watry cffa0114c7 R600: Add LDS_OR[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217591
2014-09-11 15:02:44 +00:00
Aaron Watry a7f122da60 R600: Add LDS_AND[_RET] instructions for Evergreen
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217590
2014-09-11 15:02:43 +00:00
Aaron Watry 62a0af4a0d R600: Add LDS_MAX_[U]INT[_RET] instructions for Evergreen
This was only present for SI before.

Cayman may still be missing, but I am unable to test that currently.

v2: Don't create atomicrmw max tests in separate file

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217589
2014-09-11 15:02:41 +00:00
Matt Arsenault 61a528adc7 R600/SI: Fix losing chain when fixing reg class of loads.
The lost chain resulting in earlier side effecting nodes
being deleted.

llvm-svn: 217561
2014-09-10 23:26:19 +00:00
Matt Arsenault 2e9911205f R600/SI: Report offset in correct units for st64 DS instructions
Need to convert the 64 element offset into bytes, not just the element
size like the normal case instructions.

Noticed by inspection. This can't be hit now because
st64 instructions aren't emitted during instruction selection,
and the post-RA scheduler isn't enabled.

llvm-svn: 217560
2014-09-10 23:26:16 +00:00
Matt Arsenault 16e313343d R600: Custom lower frem
llvm-svn: 217553
2014-09-10 21:44:27 +00:00
Rafael Espindola c435adcde0 Add doInitialization/doFinalization to DataLayoutPass.
With this a DataLayoutPass can be reused for multiple modules.

Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.

Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.

llvm-svn: 217548
2014-09-10 21:27:43 +00:00
Gerolf Hoflehner 7b0abb89c2 [AArch64] Revert r216141 for cyclone
The increase of the interleave factor to 4 has side-effects
like performance losses eg. due to reminder loops being executed
more frequently and may increase code size. It requires more
analysis and careful heuristic tuning. Expect double digit gains
in small benchmarks like lowercase.c and losses in puzzle.c.

llvm-svn: 217540
2014-09-10 20:31:57 +00:00
Sanjay Patel b653de1ada Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066

llvm-svn: 217528
2014-09-10 17:58:16 +00:00
Arnaud A. de Grandmaison 0dbcfba659 [AArch64] Address Chad's post commit review comments for r217504 (PBQP experimental support)
llvm-svn: 217518
2014-09-10 17:03:25 +00:00
Arnaud A. de Grandmaison cfb28f77a4 [AArch64] Pacify lld buildbot complaining about an unused static function in release build.
llvm-svn: 217505
2014-09-10 14:24:02 +00:00
Arnaud A. de Grandmaison c75dbbbdd6 [AArch64] Add experimental PBQP support
This adds target specific support for using the PBQP register allocator on the
AArch64, for the A57 cpu.

By default, the PBQP allocator is not used, unless explicitely required
on the command line with "-aarch64-pbqp".

llvm-svn: 217504
2014-09-10 14:06:10 +00:00
Asiri Rathnayake 369c030633 [AArch 64] Use a constant pool load for weak symbol references when
using static relocation model and small code model.

Summary: currently we generate GOT based relocations for weak symbol
references regardless of the underlying relocation model. This should
be change so that in static relocation model we use a constant pool
load instead.

Patch from: Keith Walker

Reviewers: Renato Golin, Tim Northover
llvm-svn: 217503
2014-09-10 13:54:38 +00:00
Sid Manning e7b92f0e81 Add missing HWEncoding to base register class.
This change gives tblgen the information needed to fill in the
HexagonRegEncodingTable.

llvm-svn: 217500
2014-09-10 13:09:25 +00:00
Tim Northover ba1d704229 ARM: don't size-reduce STMs using the LR register.
The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which
translates to STMDB, so we shouldn't convert STMIAs.

Patch by Sergey Dmitrouk.

llvm-svn: 217498
2014-09-10 12:53:28 +00:00
Daniel Sanders 24b6572645 [mips] Remove inverted predicates from MipsSubtarget that were only used by MipsCallingConv.td
Summary: No functional change

Reviewers: echristo, vmedic

Reviewed By: echristo, vmedic

Subscribers: echristo, llvm-commits

Differential Revision: http://reviews.llvm.org/D5266

llvm-svn: 217494
2014-09-10 12:02:27 +00:00
Daniel Sanders 75ee6b4302 [mips] Return an ArrayRef from MipsCC::intArgRegs() and remove MipsCC::numIntArgRegs()
Summary: No functional change.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5265

llvm-svn: 217485
2014-09-10 10:37:03 +00:00
Yuri Gorshenin 3939dec1f7 [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code.
Summary: [asan-assembly-instrumentation] Added CFI directives to the generated instrumentation code.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5189

llvm-svn: 217482
2014-09-10 09:45:49 +00:00
Job Noorman eb19aea4f9 Drop the W postfix on the 16-bit registers.
This ensures the inline assembly register constraints are properly recognised in
TargetLowering::getRegForInlineAsmConstraint.

llvm-svn: 217479
2014-09-10 06:58:14 +00:00
Kai Nacke d287094566 [MIPS] Add aliases for sync instruction used by Octeon CPU
This commit adds aliases for the sync instruction (synciobdma,
syncs, syncw, syncws) which are used by the Octeon CPU.

Reviewed by D. Sanders

llvm-svn: 217477
2014-09-10 06:10:24 +00:00
Craig Topper 7ff1592960 Use cast to MVT instead of EVT on a couple calls to getSizeInBits.
llvm-svn: 217473
2014-09-10 04:51:36 +00:00
Sanjay Patel 1191adf4df Add a scheduling model for AMD 16H Jaguar (btver2).
This is a first pass at a scheduling model for Jaguar.
It's structured largely on the existing SandyBridge and SLM sched models.

Using this model, in addition to turning on the PostRA scheduler, results in 
some perf wins on internal and 3rd party benchmarks. There's not much difference 
in LLVM's test-suite benchmarking subset of tests.

Differential Revision: http://reviews.llvm.org/D5229

llvm-svn: 217457
2014-09-09 20:07:07 +00:00
Toma Tabacu 2664779b27 [mips] Add assembler support for .set mips0 directive.
Summary:
This directive is used to reset the assembler options to their initial values.
Assembly programmers use it in conjunction with the ".set mipsX" directives.

This patch depends on the .set push/pop directive (http://reviews.llvm.org/D4821).

Contains work done by Matheus Almeida.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D4957

llvm-svn: 217438
2014-09-09 12:52:14 +00:00
Daniel Sanders 2b746bc4ae [mips] Move MipsTargetLowering::MipsCC::regSize() to MipsSubtarget::getGPRSizeInBytes()
Summary:
The GPR size is more a property of the subtarget than that of the ABI so move
this information to the MipsSubtarget.

No functional change.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5009

llvm-svn: 217436
2014-09-09 12:11:16 +00:00
Pavel Chupin e6617fc6d4 [x32] Emit callq for CALLpcrel32
Summary:
In AT&T annotation for both x86_64 and x32 calls should be printed as
callq in assembly. It's only a matter of correct mnemonic, object output
is ok.

Test Plan: trivial test added

Reviewers: nadav, dschuff, craig.topper

Subscribers: llvm-commits, zinovy.nis

Differential Revision: http://reviews.llvm.org/D5213

llvm-svn: 217435
2014-09-09 11:54:12 +00:00
Daniel Sanders 4abcfe2cda [mips] Don't cache IsO32 and IsFP64 in MipsTargetLowering::MipsCC
Summary:
Use a MipsSubtarget reference instead.

No functional change.

Reviewers: vmedic

Reviewed By: vmedic

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5008

llvm-svn: 217434
2014-09-09 10:46:48 +00:00
Toma Tabacu 9db22db963 [mips] Add assembler support for .set push/pop directive.
Summary:
These directives are used to save the current assembler options (in the case of ".set push") and restore the previously saved options (in the case of ".set pop").

Contains work done by Matheus Almeida.

Reviewers: dsanders

Reviewed By: dsanders

Differential Revision: http://reviews.llvm.org/D4821

llvm-svn: 217432
2014-09-09 10:15:38 +00:00
Renato Golin 63e27980da ARM: Negative offset support problem
This patch is to permit a negative offset usage for a non frame access.

Patch by Igor Oblakov.

llvm-svn: 217431
2014-09-09 09:57:59 +00:00
Bob Wilson b3482af341 Set trunc store action to Expand for all X86 targets.
When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack.

Patch by Luqman Aden!

llvm-svn: 217410
2014-09-09 01:13:36 +00:00
Chad Rosier bdbca15ccd [AArch64] Enabled AA support for Cortex-A57.
llvm-svn: 217381
2014-09-08 15:34:16 +00:00
Matt Arsenault 69bfb90419 R600/SI: Fix assertion from copying a TargetGlobalAddress
Assert in scheduler from an inserted copy_to_regclass from
a constant.

This only seems to break sometimes when a constant initializer
address is forced into VGPRs in a non-entry block. No test
since the only case I've managed to hit only happens with a future
patch, and that case will also not be a problem once scalar instructions
are used in non-entry blocks.

llvm-svn: 217380
2014-09-08 15:07:33 +00:00
Matt Arsenault 7ac9c4a074 R600/SI: Replace LDS atomics with no return versions
llvm-svn: 217379
2014-09-08 15:07:31 +00:00
Matt Arsenault 9903ccf7ee R600/SI: Add InstrMapping for noret atomics.
Only handles LDS atomics for now, and will be used
to replace atomics with no uses with the no return
versions.

llvm-svn: 217378
2014-09-08 15:07:27 +00:00
Chad Rosier 3528c1e4c6 [AArch64] Improve AA to remove unneeded edges in the AA MI scheduling graph.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103

llvm-svn: 217371
2014-09-08 14:43:48 +00:00
Chad Rosier c9f947744d [AArch64] Enabled AA support for Cortex-A53.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator Review: http://reviews.llvm.org/D5103

llvm-svn: 217370
2014-09-08 14:31:49 +00:00
Sid Manning ac3e325d67 Spelling correction
Another trivial spelling change.

llvm-svn: 217364
2014-09-08 13:05:23 +00:00
Chandler Carruth 0a8151e69a [x86] Revert my over-eager commit in r217332.
I hadn't actually run all the tests yet and these combines have somewhat
surprisingly far reaching effects.

llvm-svn: 217333
2014-09-07 12:37:11 +00:00
Chandler Carruth 8405e8fff9 [x86] Tweak the rules surrounding 0,0 and 1,1 v2f64 shuffles and add
support for MOVDDUP which is really important for matrix multiply style
operations that do lots of non-vector-aligned load and splats.

The original motivation was to add support for MOVDDUP as the lack of it
regresses matmul_f64_4x4 by 5% or so. However, all of the rules here
were somewhat suspicious.

First, we should always be using the floating point domain shuffles,
regardless of how many copies we have to make as a movapd is *crazy*
faster than the domain switching cost on some chips. (Mostly because
movapd is crazy cheap.) Because SHUFPD can't do the copy-for-free trick
of the PSHUF instructions, there is no need to avoid canonicalizing on
UNPCK variants, so do that canonicalizing. This also ensures we have the
chance to form MOVDDUP. =]

Second, we assume SSE2 support when doing any vector lowering, and given
that we should just use UNPCKLPD and UNPCKHPD as they can operate on
registers or memory. If vectors get spilled or come from memory at all
this is going to allow the load to be folded into the operation. If we
want to optimize for encoding size (the only difference, and only
a 2 byte difference) it should be done *much* later, likely after RA.

llvm-svn: 217332
2014-09-07 12:02:14 +00:00
Matt Arsenault 76803bd384 R600/SI: Fix register class for some 64-bit atomics
llvm-svn: 217323
2014-09-07 00:46:20 +00:00
Chandler Carruth 373b2b1728 [x86] Fix a pretty horrible bug and inconsistency in the x86 asm
parsing (and latent bug in the instruction definitions).

This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.

The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:

  insertps $192, %xmm0, %xmm1
  insertps $-64, %xmm0, %xmm1

These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.

The reason why the first instruction failed to be handled is because
prior to r136287 the operands ere marked 'i32i8imm' which forces them to
be sign-extenable. Clearly, that won't work for 192 in a single byte.
However, making thim zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.

Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.

The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.

In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.

I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.

llvm-svn: 217310
2014-09-06 10:00:01 +00:00