Commit Graph

65071 Commits

Author SHA1 Message Date
David Green c42ca16cfa [ARM] Fixup pipeline test. NFC
llvm-svn: 372133
2019-09-17 15:25:24 +00:00
David Green 22a2209433 [ARM] Reserve an emergency spill slot for fp16 addressing modes that need it
Similar to D67327, but this time for the FP16 VLDR and VSTR instructions that
use the AddrMode5FP16 addressing mode. We need to reserve an emergency spill
slot for instructions that will be out of range to use sp directly.
AddrMode5FP16 is 8 bits with a scale of 2.

Differential Revision: https://reviews.llvm.org/D67483

llvm-svn: 372132
2019-09-17 15:23:09 +00:00
Sam Parker 1d9ba08543 [ARM] Fix for buildbots
Remove setPreservesCFG from ARMConstantIslandPass and add a couple
of -verify-machine-dom-info instances into the existing codegen
tests.

llvm-svn: 372126
2019-09-17 14:21:36 +00:00
Krasimir Georgiev bdff164e0e Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
Summary:
This reverts commit r372101.

Causes ASAN build bot failures:

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176
From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio:

```
[ RUN      ] AddressSanitizer.StrNCatOOBTest
/home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure
Death test: strncat(to - 1, from, 0)
    Result: failed to die.
```

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67658

llvm-svn: 372125
2019-09-17 14:15:23 +00:00
George Rimar a3569aced0 [llvm-readobj/llvm-objdump] - Improve how tool locate the dynamic table and report warnings about that.
Before this patch we gave a priority to a dynamic table found
from the section header.

It was discussed (here: https://reviews.llvm.org/D67078?id=218356#inline-602082)
that probably preferring the table from PT_DYNAMIC is better,
because it is what runtime loader sees.

This patch makes the table from PT_DYNAMIC be chosen at first place if it is available.
But also it adds logic to fall back to SHT_DYNAMIC if the table from the dynamic segment is
broken or fall back to use no table if both are broken.

It adds a few more diagnostic warnings for the logic above.

Differential revision: https://reviews.llvm.org/D67547

llvm-svn: 372122
2019-09-17 13:58:46 +00:00
Sam Parker f1d069e54d [ARM] Fix for buildbots
Add --verifymachineinstrs and update the remaining low overhead loop
tests.

llvm-svn: 372121
2019-09-17 13:46:26 +00:00
David Green 1ff9553057 [ARM] Fix for MVE load/store stack accesses
MVE loads and stores have a 7 bit immediate range, scaled by the length of the type. This needs to be taught to the stack estimation code to ensure that an emergency spill slot is reserved in case we run out of registers when materialising stack indices.

Also the narrowing loads/stores can be created with frame indices even though they do not accept SP as a register. We need in those cases to make sure we have an emergency register to use as the frame base, as SP can never be used.

Differential Revision: https://reviews.llvm.org/D67327

llvm-svn: 372114
2019-09-17 12:58:51 +00:00
Sam Parker 36c922278e [ARM][LowOverheadLoops] Add LR def safety check
Converting the *LoopStart pseudo instructions into DLS/WLS results in
LR being defined. These instructions were inserted on the assumption
that LR would already contain the loop counter because a mov is
introduced during ISel as the the consumers in the loop can only use
LR. That assumption proved wrong!

So perform a safety check, finding an appropriate place to insert the
DLS/WLS instructions or revert if this isn't possible.

Differential Revision: https://reviews.llvm.org/D67539

llvm-svn: 372111
2019-09-17 12:19:32 +00:00
George Rimar 589293800a [llvm-readobj] - Test PPC64 relocations properly.
We had a precompiled binary committed and not all of the relocations
supported were tested. This patch fixes this.

Differential revision: https://reviews.llvm.org/D67617

llvm-svn: 372110
2019-09-17 12:05:39 +00:00
George Rimar 82d83733dd [obj2yaml] - Support PPC64 relocation types.
We do not support them and fail with llvm_unreachable currently.
This is not the only target we do not support and also seems we are missing
the tests for those we have already. But I needed this one for another patch,
so posted it separatelly.

Relocation names are taken from llvm\include\llvm\BinaryFormat\ELFRelocs\PowerPC64.def

Differential revision: https://reviews.llvm.org/D67615

llvm-svn: 372109
2019-09-17 12:00:55 +00:00
George Rimar cfc0ba3852 [yaml2obj/obj2yaml] - Allow setting an arbitrary values for e_machine.
Currently we only allow using a known named constants
for `Machine` field in YAML documents.

This patch allows using any numbers (valid or "unknown")
and adds test cases for current and new functionality.

With this it is possible to write a test cases for really unknown
EM_* targets.

Differential revision: https://reviews.llvm.org/D67652

llvm-svn: 372108
2019-09-17 11:51:26 +00:00
Luis Marques 3d0fbafd0b [RISCV] Switch to the Machine Scheduler
Most of the test changes are trivial instruction reorderings and differing
register allocations, without any obvious performance impact.

Differential Revision: https://reviews.llvm.org/D66973

llvm-svn: 372106
2019-09-17 11:15:35 +00:00
Johannes Doerfert 3ab9e8b818 [Attributor][Fix] Initialize the cache prior to using it
Summary:
There were segfaults as we modified and iterated the instruction maps in
the cache at the same time. This was happening because we created new
instructions while we populated the cache. This fix changes the order
in which we perform these actions. First, the caches for the whole
module are created, then we start to create abstract attributes.

I don't have a unit test but the LLVM test suite exposes this problem.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67232

llvm-svn: 372105
2019-09-17 10:52:41 +00:00
Luis Marques 2d550d19b3 Revert Patch from Phabricator
This reverts r372092 (git commit e38695a025)

llvm-svn: 372104
2019-09-17 10:52:09 +00:00
David Bolvansky ded48e93e6 [SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)
llvm-svn: 372101
2019-09-17 10:25:38 +00:00
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
David Bolvansky 3d33e97be6 [NFC} Updated test
llvm-svn: 372093
2019-09-17 09:45:52 +00:00
Luis Marques e38695a025 Patch from Phabricator
llvm-svn: 372092
2019-09-17 09:43:08 +00:00
David Bolvansky e80fcf0340 [SimplifyLibCalls] Mark known arguments with nonnull
Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D53342

llvm-svn: 372091
2019-09-17 09:32:52 +00:00
George Rimar 48de660bbf [llvm-readobj] - Fix BB after r372087.
Seems I forgot to update the number of bytes checked.

llvm-svn: 372089
2019-09-17 09:26:49 +00:00
Fangrui Song 1ecba6f8ef [llvm-ar] Parse 'h' and '-h': display help and exit
Support `llvm-ar h` and `llvm-ar -h` because they may be what users try
at first. Note, operation 'h' is undocumented in GNU ar.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D67560

llvm-svn: 372088
2019-09-17 09:25:52 +00:00
George Rimar de1bef0b1b [llvm-readobj] - Fix a TODO in elf-reloc-zero-name-or-value.test.
The "TODO" mentioned was:

"Add test for symbol with no name but with a value once yaml2obj allows
referencing symbols with no name from relocations."

We can do it now.

Differential revision: https://reviews.llvm.org/D67609

llvm-svn: 372087
2019-09-17 09:12:10 +00:00
Alexander Timofeev 6524a7a2b9 [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed
Defferential Revision: https://reviews.llvm.org/D67101

Reviewers: rampitec, vpykhtin
llvm-svn: 372086
2019-09-17 09:08:58 +00:00
Sam Parker 95b28a4c72 [ARM] LE support in ConstantIslands
The low-overhead branch extension provides a loop-end 'LE' instruction
that performs no decrement nor compare, it just jumps backwards. This
patch modifies the constant islands pass to try to insert LE
instructions in place of a Thumb2 conditional branch, instead of
shrinking it. This only happens if a cmp can be converted to a cbn/z
and used to exit the loop.

Differential Revision: https://reviews.llvm.org/D67404

llvm-svn: 372085
2019-09-17 09:08:05 +00:00
Florian Hahn 1bd58870e5 [LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching LoopSize.
We use `< UP.Threshold` later on, so we should use LoopSize + 1, to
allow unrolling if the result won't exceed to loop size.

Fixes PR43305.

Reviewers: efriedma, dmgreen, paquette

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D67594

llvm-svn: 372084
2019-09-17 09:02:48 +00:00
George Rimar a5dfa70806 [llvm-objcopy] - Remove python invocations from 2 test cases.
It is possible to use yaml2obj to create sections with overlapping sh_offset now.
This patch does that.

Differential revision: https://reviews.llvm.org/D67610

llvm-svn: 372081
2019-09-17 08:38:53 +00:00
Hideto Ueno 30d86f1858 [Attributor] Use Alias Analysis in noalias callsite argument deduction
Summary: This patch adds a check of alias analysis in `noalias` callsite argument deduction.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67604

llvm-svn: 372075
2019-09-17 06:53:27 +00:00
Craig Topper 95aea74494 [X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets.
Previously we tried to split them into narrower v64i1 or v16i1
pieces that each got promoted to vXi8 and then passed in a zmm
or xmm register. But this crashes when you need to pass more
pieces than available registers reserved for argument passing.

The scalarizing done here generates much longer and slower code,
but is consistent with the behavior of avx2 and earlier targets
for these types.

Fixes PR43323.

llvm-svn: 372069
2019-09-17 04:41:14 +00:00
Craig Topper 769dd59a27 [X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a broadcast load to avoid a copy.
The BLENDM instructions allow an 2 sources and an independent
destination while masked VBROADCAST has the destination tied
to the source.

llvm-svn: 372068
2019-09-17 04:41:10 +00:00
Craig Topper 2cc57bedd5 [X86] Add support for commuting EVEX VCMP instructons with any immediate value.
Previously we limited to the EQ/NE/TRUE/FALSE/ORD/UNORD immediates.

llvm-svn: 372067
2019-09-17 04:41:05 +00:00
Craig Topper d51576a3f0 [X86] Add test case for missed opportunity to commute a VCMP instruction after unfolding one load in order to fold another load.
llvm-svn: 372066
2019-09-17 04:41:01 +00:00
Craig Topper 359918dadf [X86] Enable commuting of EVEX VCMP for all immediate values during isel.
llvm-svn: 372065
2019-09-17 04:40:58 +00:00
David Blaikie f27367cd32 llvm-reduce: Clean out previous test temp/output dir, since it was a dir and now it's used as just a single file
llvm-svn: 372054
2019-09-16 23:56:26 +00:00
Amara Emerson 9d64721ca5 [GlobalISel] Partially revert r371901.
r371901 was overeager and widenScalarDst() and the like in the legalizer
attempt to increment the insert point given in order to add new instructions
after the currently legalizing inst. In cases where the insertion point is not
exactly the current instruction, then callers need to de-compensate for the
behaviour by decrementing the insertion iterator before calling them. It's not
a nice state of affairs, for now just undo the problematic parts of the change.

llvm-svn: 372050
2019-09-16 23:46:03 +00:00
David Blaikie cb4aee7318 llvm-reduce: Make tests shell-independent by passing the interpreter on the command line rather than using #! in the test file
llvm-svn: 372049
2019-09-16 23:41:19 +00:00
Bardia Mahjour 474c713fc7 [NFC] Test commit access
llvm-svn: 372033
2019-09-16 20:44:15 +00:00
Lei Huang bfb197d7a3 [PowerPC] Cust lower fpext v2f32 to v2f64 from extract_subvector v4f32
This is a follow up patch from https://reviews.llvm.org/D57857 to handle
extract_subvector v4f32.  For cases where we fpext of v2f32 to v2f64 from
extract_subvector we currently generate on P9 the following:

  lxv 0, 0(3)
  xxsldwi 1, 0, 0, 1
  xscvspdpn 2, 0
  xxsldwi 3, 0, 0, 3
  xxswapd 0, 0
  xscvspdpn 1, 1
  xscvspdpn 3, 3
  xscvspdpn 0, 0
  xxmrghd 0, 0, 3
  xxmrghd 1, 2, 1
  stxv 0, 0(4)
  stxv 1, 0(5)

This patch custom lower it to the following sequence:

  lxv 0, 0(3)       # load the v4f32 <w0, w1, w2, w3>
  xxmrghw 2, 0, 0   # Produce the following vector <w0, w0, w1, w1>
  xxmrglw 3, 0, 0   # Produce the following vector <w2, w2, w3, w3>
  xvcvspdp 2, 2     # FP-extend to <d0, d1>
  xvcvspdp 3, 3     # FP-extend to <d2, d3>
  stxv 2, 0(5)      # Store <d0, d1> (%vecinit11)
  stxv 3, 0(4)      # Store <d2, d3> (%vecinit4)

Differential Revision: https://reviews.llvm.org/D61961

llvm-svn: 372029
2019-09-16 20:04:15 +00:00
Reid Kleckner 32837a0c93 [PGO] Use linkonce_odr linkage for __profd_ variables in comdat groups
This fixes relocations against __profd_ symbols in discarded sections,
which is PR41380.

In general, instrumentation happens very early, and optimization and
inlining happens afterwards. The counters for a function are calculated
early, and after inlining, counters for an inlined function may be
widely referenced by other functions.

For C++ inline functions of all kinds (linkonce_odr &
available_externally mainly), instr profiling wants to deduplicate these
__profc_ and __profd_ globals. Otherwise the binary would be quite
large.

I made __profd_ and __profc_ comdat in r355044, but I chose to make
__profd_ internal. At the time, I was only dealing with coverage, and in
that case, none of the instrumentation needs to reference __profd_.
However, if you use PGO, then instrumentation passes add calls to
__llvm_profile_instrument_range which reference __profd_ globals. The
solution is to make these globals externally visible by using
linkonce_odr linkage for data as was done for counters.

This is safe because PGO adds a CFG hash to the names of the data and
counter globals, so if different TUs have different globals, they will
get different data and counter arrays.

Reviewers: xur, hans

Differential Revision: https://reviews.llvm.org/D67579

llvm-svn: 372020
2019-09-16 18:49:09 +00:00
Roman Lebedev 69911b8d01 [ARM][Codegen] Autogenerate arm-cgp-casts.ll test.
Apparently it got broken by r372009 while i thought it was r372012.

llvm-svn: 372019
2019-09-16 18:28:22 +00:00
Simon Pilgrim 3df0daddfd [X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operands
Determine if all of the uses of LHS/RHS operands can be replaced with a zero vector.

llvm-svn: 372013
2019-09-16 17:30:33 +00:00
David Green 8d21460dc5 [ARM] A predicate cast of a predicate cast is a predicate cast
The adds some very basic folding of PREDICATE_CASTS, removing cases when they
are chained together. These would already be removed eventually, as these are
lowered to copies. This just allows it to happen earlier, which can help other
simplifications.

Differential Revision: https://reviews.llvm.org/D67591

llvm-svn: 372012
2019-09-16 17:29:07 +00:00
Roman Lebedev 10151f6618 [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost
Summary:
Previously, if the threshold was 2, we were willing to speculatively
execute 2 cheap instructions in both basic blocks (thus we were willing
to speculatively execute cost = 4), but weren't willing to speculate
when one BB had 3 instructions and other one had no instructions,
even thought that would have total cost of 3.

This looks inconsistent to me.
I don't think `cmov`-like instructions will start executing
until both of it's inputs are available: https://godbolt.org/z/zgHePf
So i don't see why the existing behavior is the correct one.

Also, let's add it's own `cl::opt` for this threshold,
with default=4, so it is not stricter than the previous threshold:
will allow to fold when there are 2 BB's each with cost=2.
And since the logic has changed, it will also allow to fold when
one BB has cost=3 and other cost=1, or there is only one BB with cost=4.

This is an alternative solution to D65148:
This fix is mainly motivated by `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so the fact that that pattern ends up being stuck with a branch
instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.

This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240)
| metric                                 |     old |     new | delta |      % |
| x86-mi-counting.NumMachineFunctions    |   37720 |   37721 |     1 |  0.00% |
| x86-mi-counting.NumMachineBasicBlocks  |  773545 |  771181 | -2364 | -0.31% |
| x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% |
| x86-mi-counting.NumUncondBR            |  135770 |  135543 |  -227 | -0.17% |
| x86-mi-counting.NumCondBR              |  423753 |  422187 | -1566 | -0.37% |
| x86-mi-counting.NumCMOV                |   24815 |   25731 |   916 |  3.69% |
| x86-mi-counting.NumVecBlend            |      17 |      17 |     0 |  0.00% |

We significantly decrease basic block count, notably decrease instruction count,
significantly decrease branch count and very significantly increase `cmov` count.

Performance-wise, unsurprisingly, this has great effect on
target RawSpeed benchmark. I'm seeing 5 **major** improvements:
```
Benchmark                                                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean                                  -0.3064         -0.3064      226.9913      157.4452      226.9800      157.4384
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median                                -0.3057         -0.3057      226.8407      157.4926      226.8282      157.4828
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev                                -0.4985         -0.4954        0.3051        0.1530        0.3040        0.1534
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean                                   -0.1747         -0.1747       80.4787       66.4227       80.4771       66.4146
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median                                 -0.1742         -0.1743       80.4686       66.4542       80.4690       66.4436
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev                                 +0.6089         +0.5797        0.0670        0.1078        0.0673        0.1062
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean                                  -0.1598         -0.1598      171.6996      144.2575      171.6915      144.2538
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median                                -0.1598         -0.1597      171.7109      144.2755      171.7018      144.2766
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev                                +0.4024         +0.3850        0.0847        0.1187        0.0848        0.1175
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean                                   -0.0550         -0.0551      280.3046      264.8800      280.3017      264.8559
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median                                 -0.0554         -0.0554      280.2628      264.7360      280.2574      264.7297
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev                                 +0.7005         +0.7041        0.2779        0.4725        0.2775        0.4729
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean                                   -0.0354         -0.0355      316.7396      305.5208      316.7342      305.4890
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median                                 -0.0354         -0.0356      316.6969      305.4798      316.6917      305.4324
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev                                 +0.0493         +0.0330        0.3562        0.3737        0.3563        0.3681
```

That being said, it's always best-effort, so there will likely
be cases where this worsens things.

Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc

Reviewed By: jmolloy

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67318

llvm-svn: 372009
2019-09-16 16:18:24 +00:00
Sanjay Patel 3961a143e1 [InstCombine] remove unneeded one-use checks for icmp fold
Related folds were added in:
rL125734
...the code comment about register pressure is discussed in
more detail in:
https://bugs.llvm.org/show_bug.cgi?id=2698

But 10 years later, perf testing bzip2 with this change now
shows a slight (0.2% average) improvement on Haswell although
that's probably within test noise.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 and rL371981 are related patches in this series.

llvm-svn: 372007
2019-09-16 16:15:25 +00:00
Sanjay Patel 4d9d0f9cf5 [InstCombine] move tests for icmp+add; NFC
llvm-svn: 372004
2019-09-16 15:33:40 +00:00
Oliver Cruickshank ee6fbebbaf [ARM] Add patterns for BSWAP intrinsic on MVE
BSWAP can use the VREV instruction on MVE to produce better results than
expanding.

llvm-svn: 372002
2019-09-16 15:20:10 +00:00
Oliver Cruickshank e9510a6cad [ARM] Add patterns for bitreverse intrinsic on MVE
BITREVERSE can use the VBRSR which will reverse and right shift.
Shifting right by 0 will just reverse the bits.

llvm-svn: 372001
2019-09-16 15:20:03 +00:00
Oliver Cruickshank 5f799ef162 [ARM] Lower CTTZ on MVE
Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and
count the leading zeros, equivalent to a count trailing zeros (CTTZ).

llvm-svn: 372000
2019-09-16 15:19:56 +00:00
Oliver Cruickshank cd1a0b9271 [ARM] Add patterns for CTLZ on MVE
CTLZ intrinsic can use the VCLS instruction on MVE, which produces
better results than expanding.

llvm-svn: 371999
2019-09-16 15:19:49 +00:00
Sjoerd Meijer c2bafadd7a [LV] Add ARM MVE tail-folding tests
Now that the vectorizer can do tail-folding (rL367592), and the ARM backend
understands MVE masked loads/stores (rL371932), it's time to add the MVE
tail-folding equivalent of the X86 tests that I added.

llvm-svn: 371996
2019-09-16 14:56:26 +00:00
Matt Arsenault 07b8597656 AMDGPU/GlobalISel: Fix some broken run lines
llvm-svn: 371992
2019-09-16 14:14:40 +00:00