Matt Arsenault
63f953795e
AMDGPU: Fold fneg into fma or fmad
...
Patch mostly by Fiona Glaser
llvm-svn: 291733
2017-01-12 00:32:16 +00:00
Matt Arsenault
4103a81d6d
AMDGPU: Fold fneg into fmul
...
Patch mostly by Fiona Glaser
llvm-svn: 291732
2017-01-12 00:23:20 +00:00
Matt Arsenault
2529fba989
AMDGPU: Fold fneg into fadd
...
Patch mostly by Fiona Glaser
llvm-svn: 291731
2017-01-12 00:09:34 +00:00
Matt Arsenault
2a04ff97ad
AMDGPU: Pull fneg/fabs out of a select
...
Allows better source modifier usage.
llvm-svn: 291729
2017-01-11 23:57:38 +00:00
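The fneg folds listed above (r291729 through r291733) all move a negation from a result onto operands, where it can become a source modifier. The sketch below only checks the underlying floating-point identities on ordinary Python floats; it is not the code from these patches, and the helper names are made up for illustration. It also shows one caveat that is a general IEEE-754 fact rather than something stated in the commit messages: distributing a negation over an addition changes the sign of a zero result.

```python
import math


def same_float(x, y):
    """True if x and y have the same value and the same sign bit."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)


# Hypothetical helpers: each pair is "negate the result" vs. "negate operands".
def fneg_of_fmul(a, b):
    return -(a * b)


def fmul_folded(a, b):
    return (-a) * b


def fneg_of_fadd(a, b):
    return -(a + b)


def fadd_folded(a, b):
    return (-a) + (-b)


def fneg_of_fmad(a, b, c):
    # Unfused multiply-add, used here as a stand-in for fmad.
    return -(a * b + c)


def fmad_folded(a, b, c):
    return (-a) * b + (-c)
```

For multiplication the two forms agree bit for bit on non-NaN inputs. For addition (and the add step of a multiply-add) they agree whenever the sum is nonzero, but an exactly cancelling sum such as 1.0 + (-1.0) gives -0.0 in one form and +0.0 in the other.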
Matt Arsenault
24a1273ae1
AMDGPU: Fix shrinking of addc/subb.
...
To shrink to VOP2 the input carry must also be VCC.
llvm-svn: 291720
2017-01-11 22:58:12 +00:00
Matt Arsenault
682eb4396a
AMDGPU: Fix sext_inreg for i1 in i16
...
This produces worse code when i16 is legal, mostly
due to combines getting confused by conversions inserted
for uniform 16-bit operations.
llvm-svn: 291717
2017-01-11 22:35:22 +00:00
Matt Arsenault
28bd4cbeaf
AMDGPU: Fix breaking VOP3 v_add_i32s
...
This was shrinking the instruction even though the carry output
register was a virtual register, not known VCC.
llvm-svn: 291716
2017-01-11 22:35:17 +00:00
Matt Arsenault
69e3001b84
AMDGPU: Fix folding immediates into mac src2
...
Legality must be checked against the instruction that the mac
will be replaced with.
llvm-svn: 291711
2017-01-11 22:00:02 +00:00
Sam Kolton
9772eb3907
[AMDGPU] Assembler: SDWA/DPP should not accept scalar registers and immediate operands
...
Reviewers: artem.tamazov, nhaustov, vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D28157
llvm-svn: 291668
2017-01-11 11:46:30 +00:00
Mohammed Agabaria
2c96c43388
[X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
...
Updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
Added a special optimization case that replaces pmulld with a pmullw\pmulhw\pshuf sequence
when the real operand bitwidth is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Jan Vesely
0d6cb1caaf
AMDGPU/EG,CM: Add fp16 conversion instructions
...
Differential Revision: https://reviews.llvm.org/D28164
llvm-svn: 291622
2017-01-11 00:12:39 +00:00
Matt Arsenault
51818c14b3
AMDGPU: Constant fold when immediate is materialized
...
In future commits these patterns will appear after moveToVALU changes.
llvm-svn: 291615
2017-01-10 23:32:04 +00:00
Matt Arsenault
8871683d60
AMDGPU: Add tests for HasMultipleConditionRegisters
...
This was enabled without many specific tests or the comment.
llvm-svn: 291586
2017-01-10 19:08:15 +00:00
Matt Arsenault
6dca542b4a
AMDGPU: Add Assert[SZ]Ext during argument load creation
...
For i16 zeroext arguments when i16 was a legal type, the
known bits information from the truncate was lost. Insert
a zeroext so the known bits optimizations work with the 32-bit
loads.
Fixes code quality regressions vs. SI in min.ll test.
llvm-svn: 291461
2017-01-09 18:52:39 +00:00
Matt Arsenault
5f45e7890a
Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")
...
llvm-svn: 291460
2017-01-09 18:44:11 +00:00
Jan Vesely
06200bd7bc
AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes
...
This will make the transition to SCRATCH_MEMORY easier.
Differential Revision: https://reviews.llvm.org/D24746
llvm-svn: 291279
2017-01-06 21:00:46 +00:00
Konstantin Zhuravlyov
31dbb0391d
[AMDGPU] Remove extra semicolon. NFC
...
llvm-svn: 291246
2017-01-06 17:23:21 +00:00
Konstantin Zhuravlyov
67a6d5401a
[AMDGPU] Do not emit .AMDGPU.config section for amdhsa
...
Differential Revision: https://reviews.llvm.org/D27732
llvm-svn: 291245
2017-01-06 17:02:10 +00:00
Evgeniy Stepanov
e8e11eb726
Revert "Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")"
...
Summary: This reverts commit r291144. It breaks build bots.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/3270 , http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2058
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:1638:12: error: could not convert ‘(const unsigned int*)(& Variants)’ from ‘const unsigned int*’ to ‘llvm::ArrayRef<unsigned int>’
return Variants;
Reviewers: eugenis, tstellarAMD
Patch by Alex Shlyapnikov.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D28372
llvm-svn: 291168
2017-01-05 19:51:13 +00:00
Matt Arsenault
ec63f62c58
Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")
...
Arrays are supposed to be static const
llvm-svn: 291144
2017-01-05 17:36:11 +00:00
Richard Smith
d4d575b955
Revert r291025 ("AMDGPU: Remove unneccessary intermediate vector")
...
This caused buildbot failures due to returning ArrayRefs referencing local
(temporary) objects.
llvm-svn: 291067
2017-01-05 03:13:10 +00:00
Matt Arsenault
6796d7ea8b
AMDGPU: Remove unneccessary intermediate vector
...
llvm-svn: 291025
2017-01-04 22:54:10 +00:00
Jan Vesely
d48445d513
AMDGPU/SI: Implement sendmsghalt intrinsic
...
v2: expose using amdgcn prefix
Differential Revision: https://reviews.llvm.org/D23511
llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Artem Tamazov
25478d821b
[AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive
...
Among other things, this allows using the predefined
.option.machine_version_major/minor/stepping symbols in the directive.
The relevant test is expanded as well (and the file renamed for clarity).
Differential Revision: https://reviews.llvm.org/D28140
llvm-svn: 290710
2016-12-29 15:41:52 +00:00
Artem Tamazov
a01cce8887
[AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count)
...
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.
Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.
Kernel scopes begin at a .amdgpu_hsa_kernel directive and end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
a dummy scope that runs from the beginning of the source file until the
first .amdgpu_hsa_kernel.
Test added.
Differential Revision: https://reviews.llvm.org/D27859
llvm-svn: 290608
2016-12-27 16:00:11 +00:00
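The counting rule described in r290608 can be sketched as follows. This is a minimal model of the rule as worded in the commit message, not the assembler's implementation: the event tuples, the function name, and the "<dummy>" scope label are all hypothetical and chosen only for illustration.

```python
def count_registers(events):
    """Track per-scope register counts.

    events is an iterable of tuples (a hypothetical encoding):
      ("kernel", name)      a .amdgpu_hsa_kernel directive starting a scope
      ("use", kind, index)  a use of register `index`, kind "v" or "s"
    """
    scope = "<dummy>"  # scope before the first kernel directive
    counts = {scope: {"v": 0, "s": 0}}
    for event in events:
        if event[0] == "kernel":
            scope = event[1]
            # Symbols start at 0 at the beginning of each kernel scope.
            counts[scope] = {"v": 0, "s": 0}
        else:
            _, kind, index = event
            # value = max(value, register index + 1)
            counts[scope][kind] = max(counts[scope][kind], index + 1)
    return counts
```

Because the update takes a running maximum of index + 1, the final value in each scope is one more than the highest register index used there, regardless of the order in which registers appear.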
Sam Kolton
e66365e07d
[AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions
...
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D28051
llvm-svn: 290599
2016-12-27 10:06:42 +00:00
Jan Vesely
206a510e54
AMDGPU: split ret/noret patterns for global atomics
...
Differential Revision: https://reviews.llvm.org/D27989
llvm-svn: 290435
2016-12-23 15:34:51 +00:00
Chandler Carruth
ee08676102
Enable '-Wstring-conversion' and fix some bad asserts that it helped find.
...
Notable is the assert in NewGVN which had no effect because of the bug.
llvm-svn: 290400
2016-12-23 01:38:06 +00:00
Matt Arsenault
0b26e47345
AMDGPU: Invert cmp + select with constant
...
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.
This seems to usually be better but causes some code size regressions
in some tests.
llvm-svn: 290372
2016-12-22 21:40:08 +00:00
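The canonicalization in r290372 relies on the fact that swapping the two arms of a select while inverting its condition leaves the result unchanged. The sketch below models that for integer comparisons only; the tuple encoding, predicate names, and function names are assumptions made for illustration and do not correspond to LLVM's data structures. Floating-point comparisons would additionally need unordered predicates to handle NaN, which this sketch leaves out.

```python
import operator

# Integer predicates and their logical inverses (names are illustrative).
INVERSE_PRED = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}
COMPARE = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "ge": operator.ge,
    "gt": operator.gt, "le": operator.le,
}


def canonicalize_select(pred, true_val, false_val):
    """Move a lone constant to the false side of the select.

    Constants are modeled as ints and variables as strings.
    """
    if isinstance(true_val, int) and not isinstance(false_val, int):
        return INVERSE_PRED[pred], false_val, true_val
    return pred, true_val, false_val


def evaluate(pred, lhs, rhs, true_val, false_val, env):
    """Evaluate select(lhs pred rhs, true_val, false_val) under env."""
    pick = true_val if COMPARE[pred](lhs, rhs) else false_val
    return env[pick] if isinstance(pick, str) else pick
```

For example, `canonicalize_select("lt", 5, "x")` yields `("ge", "x", 5)`, and both forms evaluate to the same value for every pair of compared inputs.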
Matt Arsenault
941632839f
AMDGPU: Use i16 for i16 shift amount
...
llvm-svn: 290351
2016-12-22 16:36:25 +00:00
Matt Arsenault
3c97e2030a
AMDGPU: Fix missing 16-bit cmpx instructions
...
llvm-svn: 290349
2016-12-22 16:27:14 +00:00
Matt Arsenault
18f56be3d2
AMDGPU: Use i16 comparison instructions
...
llvm-svn: 290348
2016-12-22 16:27:11 +00:00
Matt Arsenault
fef7beb6a6
AMDGPU: Fixed '!NodePtr->isKnownSentinel()' assert
...
Caused by dereferencing end iterator when trying to const cast the iterator.
Patch by Martin Sherburn
llvm-svn: 290347
2016-12-22 16:06:32 +00:00
Sam Kolton
a568e3dde7
[AMDGPU] Add pseudo SDWA instructions
...
Summary: This is needed for later SDWA support in CodeGen.
Reviewers: vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27412
llvm-svn: 290338
2016-12-22 12:57:41 +00:00
Sam Kolton
a6792a39c4
[AMDGPU] Disassembler: fix for disassembling v_mac_f32/16_dpp/sdwa
...
Summary: The real instruction should copy constraints from the pseudo instruction. This allows the auto-generated disassembler to correctly process tied operands.
Reviewers: nhaustov, vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27847
llvm-svn: 290336
2016-12-22 11:30:48 +00:00
Matt Arsenault
3de76b9dc8
AMDGPU: Fix missing commute table entries for cmpx
...
No tests because these aren't currently used anywhere.
llvm-svn: 290316
2016-12-22 04:39:41 +00:00
Matt Arsenault
e7d8ed32f9
AMDGPU: Swap order of operands in fadd/fsub combine
...
FMA is canonicalized to constant in the middle operand. Do
the same so fmad matches and avoid an extra combine step.
llvm-svn: 290313
2016-12-22 04:03:40 +00:00
Matt Arsenault
46e6b7adef
AMDGPU: Check fast math flags in fadd/fsub combines
...
llvm-svn: 290312
2016-12-22 04:03:35 +00:00
Matt Arsenault
770ec8680a
AMDGPU: Form more FMAs if fusion is allowed
...
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.
llvm-svn: 290311
2016-12-22 03:55:35 +00:00
Matt Arsenault
d8b73d5304
AMDGPU: Move combines into separate functions
...
llvm-svn: 290309
2016-12-22 03:44:42 +00:00
Matt Arsenault
ef82ad94ea
AMDGPU: Enable some f32 fadd/fsub combines for f16
...
llvm-svn: 290308
2016-12-22 03:40:39 +00:00
Matt Arsenault
9e22bc2cd3
AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16
...
llvm-svn: 290307
2016-12-22 03:21:48 +00:00
Matt Arsenault
cdff21b14e
AMDGPU: Allow rcp and rsq usage with f16
...
llvm-svn: 290302
2016-12-22 03:05:44 +00:00
Matt Arsenault
4052a576c0
AMDGPU: Custom lower f16 fdiv
...
llvm-svn: 290301
2016-12-22 03:05:41 +00:00
Matt Arsenault
ce84130f85
AMDGPU: Implement f16 fcanonicalize
...
llvm-svn: 290300
2016-12-22 03:05:37 +00:00
Matt Arsenault
4e55c1ec11
AMDGPU: Update isFPImmLegal for f16
...
I don't think this matters because ConstantFP is legal.
llvm-svn: 290299
2016-12-22 03:05:30 +00:00
Tom Stellard
d8ea85aced
AMDGPU/SI: Fix file header
...
llvm-svn: 290265
2016-12-21 19:06:24 +00:00
Davide Italiano
c96272c47c
[AMDGPU] Garbage collect dead code. NFCI.
...
llvm-svn: 290249
2016-12-21 10:19:00 +00:00
Matt Arsenault
9e91014282
AMDGPU: Allow 16-bit types in inline asm constraints
...
llvm-svn: 290193
2016-12-20 19:06:12 +00:00
Matt Arsenault
4c1e9ec008
AMDGPU: Don't add same instruction multiple times to worklist
...
When the instruction is processed the first time, it may be
deleted, resulting in crashes when the duplicate entry is processed
later. While the new test adds the same user to the worklist twice,
this particular case doesn't crash but I'm not sure why.
llvm-svn: 290191
2016-12-20 18:55:06 +00:00
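One common way to avoid the duplicate-entry problem described in r290191 is a worklist that remembers what is already queued. The sketch below shows that general technique with a list plus a set; it is an illustration under that assumption, not the data structure the patch uses, and the class name is hypothetical.

```python
class UniqueWorklist:
    """LIFO worklist that ignores items that are already queued."""

    def __init__(self):
        self._items = []       # processing order
        self._pending = set()  # membership check for queued items

    def push(self, item):
        """Queue item unless it is already pending; return True if added."""
        if item in self._pending:
            return False
        self._pending.add(item)
        self._items.append(item)
        return True

    def pop(self):
        """Remove and return the most recently queued item."""
        item = self._items.pop()
        self._pending.discard(item)
        return item

    def __len__(self):
        return len(self._items)
```

With this structure, a user reached through two different operands is queued once, so processing it (and possibly deleting it) cannot leave a stale second entry behind.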