Commit Graph

1419 Commits

Author SHA1 Message Date
Matt Arsenault 63f953795e AMDGPU: Fold fneg into fma or fmad
Patch mostly by Fiona Glaser

llvm-svn: 291733
2017-01-12 00:32:16 +00:00
Matt Arsenault 4103a81d6d AMDGPU: Fold fneg into fmul
Patch mostly by Fiona Glaser

llvm-svn: 291732
2017-01-12 00:23:20 +00:00
Matt Arsenault 2529fba989 AMDGPU: Fold fneg into fadd
Patch mostly by Fiona Glaser

llvm-svn: 291731
2017-01-12 00:09:34 +00:00
Matt Arsenault 2a04ff97ad AMDGPU: Pull fneg/fabs out of a select
Allows better source modifier usage.

llvm-svn: 291729
2017-01-11 23:57:38 +00:00
Matt Arsenault 24a1273ae1 AMDGPU: Fix shrinking of addc/subb.
To shrink to VOP2 the input carry must also be VCC.

llvm-svn: 291720
2017-01-11 22:58:12 +00:00
Matt Arsenault 682eb4396a AMDGPU: Fix sext_inreg for i1 in i16
This produces worse code when i16 is legal, mostly
due to combines getting confused by conversions inserted
for uniform 16-bit operations.

llvm-svn: 291717
2017-01-11 22:35:22 +00:00
Matt Arsenault 28bd4cbeaf AMDGPU: Fix breaking VOP3 v_add_i32s
This was shrinking the instruction even though the carry output
register was a virtual register, not known VCC.

llvm-svn: 291716
2017-01-11 22:35:17 +00:00
Matt Arsenault 69e3001b84 AMDGPU: Fix folding immediates into mac src2
Whether it is legal or not needs to check for the instruction
it will be replaced with.

llvm-svn: 291711
2017-01-11 22:00:02 +00:00
Sam Kolton 9772eb3907 [AMDGPU] Assembler: SDWA/DPP should not accept scalar registers and immediate operands
Reviewers: artem.tamazov, nhaustov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28157

llvm-svn: 291668
2017-01-11 11:46:30 +00:00
Mohammed Agabaria 2c96c43388 [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. 
In case if the real operands bitwidth <= 16.

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Jan Vesely 0d6cb1caaf AMDGPU/EG,CM: Add fp16 conversion instructions
Differential Revision: https://reviews.llvm.org/D28164

llvm-svn: 291622
2017-01-11 00:12:39 +00:00
Matt Arsenault 51818c14b3 AMDGPU: Constant fold when immediate is materialized
In future commits these patterns will appear after moveToVALU changes.

llvm-svn: 291615
2017-01-10 23:32:04 +00:00
Matt Arsenault 8871683d60 AMDGPU: Add tests for HasMultipleConditionRegisters
This was enabled without many specific tests or the comment.

llvm-svn: 291586
2017-01-10 19:08:15 +00:00
Matt Arsenault 6dca542b4a AMDGPU: Add Assert[SZ]Ext during argument load creation
For i16 zeroext arguments when i16 was a legal type, the
known bits information from the truncate was lost. Insert
a zeroext so the known bits optimizations work with the 32-bit
loads.

Fixes code quality regressions vs. SI in min.ll test.

llvm-svn: 291461
2017-01-09 18:52:39 +00:00
Matt Arsenault 5f45e7890a Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")
llvm-svn: 291460
2017-01-09 18:44:11 +00:00
Jan Vesely 06200bd7bc AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes
This will make transition to SCRATCH_MEMORY easier

Differential Revision: https://reviews.llvm.org/D24746

llvm-svn: 291279
2017-01-06 21:00:46 +00:00
Konstantin Zhuravlyov 31dbb0391d [AMDGPU] Remove extra semicolon. NFC
llvm-svn: 291246
2017-01-06 17:23:21 +00:00
Konstantin Zhuravlyov 67a6d5401a [AMDGPU] Do not emit .AMDGPU.config section for amdhsa
Differential Revision: https://reviews.llvm.org/D27732

llvm-svn: 291245
2017-01-06 17:02:10 +00:00
Evgeniy Stepanov e8e11eb726 Revert "Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")"
Summary: This reverts commit r291144. It breaks build bots.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/3270, http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2058

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:1638:12: error: could not convert ‘(const unsigned int*)(& Variants)’ from ‘const unsigned int*’ to ‘llvm::ArrayRef<unsigned int>’
     return Variants;

Reviewers: eugenis, tstellarAMD

Patch by Alex Shlyapnikov.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D28372

llvm-svn: 291168
2017-01-05 19:51:13 +00:00
Matt Arsenault ec63f62c58 Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")
Arrays are supposed to be static const

llvm-svn: 291144
2017-01-05 17:36:11 +00:00
Richard Smith d4d575b955 Revert r291025 ("AMDGPU: Remove unneccessary intermediate vector")
This caused buildbot failures due to returning ArrayRefs referencing local
(temporary) objects.

llvm-svn: 291067
2017-01-05 03:13:10 +00:00
Matt Arsenault 6796d7ea8b AMDGPU: Remove unneccessary intermediate vector
llvm-svn: 291025
2017-01-04 22:54:10 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Artem Tamazov 25478d821b [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive
Among other stuff, this allows to use predefined .option.machine_version_major
/minor/stepping symbols in the directive.

Relevant test expanded at once (also file renamed for clarity).

Differential Revision: https://reviews.llvm.org/D28140

llvm-svn: 290710
2016-12-29 15:41:52 +00:00
Artem Tamazov a01cce8887 [AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count)
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.

Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
	value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.

Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
dummy scope that lies from the beginning of source file til the
first .amdgpu_hsa_kernel.

Test added.

Differential Revision: https://reviews.llvm.org/D27859

llvm-svn: 290608
2016-12-27 16:00:11 +00:00
Sam Kolton e66365e07d [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28051

llvm-svn: 290599
2016-12-27 10:06:42 +00:00
Jan Vesely 206a510e54 AMDGPU: split ret/noret patterns for global atomics
Differential Revision: https://reviews.llvm.org/D27989

llvm-svn: 290435
2016-12-23 15:34:51 +00:00
Chandler Carruth ee08676102 Enable '-Wstring-conversion' and fix some bad asserts that it helped
find.

Notable is the assert in NewGVN which had no effect because of the bug.

llvm-svn: 290400
2016-12-23 01:38:06 +00:00
Matt Arsenault 0b26e47345 AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.

This seems to usually be better but causes some code size regressions
in some tests.

llvm-svn: 290372
2016-12-22 21:40:08 +00:00
Matt Arsenault 941632839f AMDGPU: Use i16 for i16 shift amount
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Matt Arsenault 3c97e2030a AMDGPU: Fix missing 16-bit cmpx instructions
llvm-svn: 290349
2016-12-22 16:27:14 +00:00
Matt Arsenault 18f56be3d2 AMDGPU: Use i16 comparison instructions
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Matt Arsenault fef7beb6a6 AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assert
Caused by dereferencing end iterator when trying to const cast the iterator.

Patch by Martin Sherburn

llvm-svn: 290347
2016-12-22 16:06:32 +00:00
Sam Kolton a568e3dde7 [AMDGPU] Add pseudo SDWA instructions
Summary: This is needed for later SDWA support in CodeGen.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27412

llvm-svn: 290338
2016-12-22 12:57:41 +00:00
Sam Kolton a6792a39c4 [AMDGPU] Disassembler: fix for disaasembling v_mac_f32/16_dpp/sdwa
Summary: Real instruction should copy constraints from real instruction. This allows auto-generated disassembler to correctly process tied operands.

Reviewers: nhaustov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27847

llvm-svn: 290336
2016-12-22 11:30:48 +00:00
Matt Arsenault 3de76b9dc8 AMDGPU: Fix missing commute table entries for cmpx
No tests because these aren't currently used anywhere.

llvm-svn: 290316
2016-12-22 04:39:41 +00:00
Matt Arsenault e7d8ed32f9 AMDGPU: Swap order of operands in fadd/fsub combine
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.

llvm-svn: 290313
2016-12-22 04:03:40 +00:00
Matt Arsenault 46e6b7adef AMDGPU: Check fast math flags in fadd/fsub combines
llvm-svn: 290312
2016-12-22 04:03:35 +00:00
Matt Arsenault 770ec8680a AMDGPU: Form more FMAs if fusion is allowed
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.

llvm-svn: 290311
2016-12-22 03:55:35 +00:00
Matt Arsenault d8b73d5304 AMDGPU: Move combines into separate functions
llvm-svn: 290309
2016-12-22 03:44:42 +00:00
Matt Arsenault ef82ad94ea AMDGPU: Enable some f32 fadd/fsub combines for f16
llvm-svn: 290308
2016-12-22 03:40:39 +00:00
Matt Arsenault 9e22bc2cd3 AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
llvm-svn: 290307
2016-12-22 03:21:48 +00:00
Matt Arsenault cdff21b14e AMDGPU: Allow rcp and rsq usage with f16
llvm-svn: 290302
2016-12-22 03:05:44 +00:00
Matt Arsenault 4052a576c0 AMDGPU: Custom lower f16 fdiv
llvm-svn: 290301
2016-12-22 03:05:41 +00:00
Matt Arsenault ce84130f85 AMDGPU: Implement f16 fcanonicalize
llvm-svn: 290300
2016-12-22 03:05:37 +00:00
Matt Arsenault 4e55c1ec11 AMDGPU: Update isFPImmLegal for f16
I don't think this matters because ConstantFP is legal.

llvm-svn: 290299
2016-12-22 03:05:30 +00:00
Tom Stellard d8ea85aced AMDGPU/SI: Fix file header
llvm-svn: 290265
2016-12-21 19:06:24 +00:00
Davide Italiano c96272c47c [AMDGPU] Garbage collect dead code. NFCI.
llvm-svn: 290249
2016-12-21 10:19:00 +00:00
Matt Arsenault 9e91014282 AMDGPU: Allow 16-bit types in inline asm constraints
llvm-svn: 290193
2016-12-20 19:06:12 +00:00
Matt Arsenault 4c1e9ec008 AMDGPU: Don't add same instruction multiple times to worklist
When the instruction is processed the first time, it may be
deleted resulting in crashes. While the new test adds the same
user to the worklist twice, this particular case doesn't crash
but I'm not sure why.

llvm-svn: 290191
2016-12-20 18:55:06 +00:00