Commit Graph

126735 Commits

Author SHA1 Message Date
David Green 22a2209433 [ARM] Reserve an emergency spill slot for fp16 addressing modes that need it
Similar to D67327, but this time for the FP16 VLDR and VSTR instructions that
use the AddrMode5FP16 addressing mode. We need to reserve an emergency spill
slot for instructions that will be out of range to use sp directly.
AddrMode5FP16 is 8 bits with a scale of 2.

Differential Revision: https://reviews.llvm.org/D67483

llvm-svn: 372132
2019-09-17 15:23:09 +00:00
Benjamin Kramer 167b302075 [RISCV] Unbreak the build
llvm-svn: 372127
2019-09-17 14:27:31 +00:00
Sam Parker 1d9ba08543 [ARM] Fix for buildbots
Remove setPreservesCFG from ARMConstantIslandPass and add a couple
of -verify-machine-dom-info instances into the existing codegen
tests.

llvm-svn: 372126
2019-09-17 14:21:36 +00:00
Krasimir Georgiev bdff164e0e Revert "[SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)"
Summary:
This reverts commit r372101.

Causes ASAN build bot failures:

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176
From http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/14176/steps/64-bit%20check-asan/logs/stdio:

```
[ RUN      ] AddressSanitizer.StrNCatOOBTest
/home/buildbots/ppc64be-sanitizer/sanitizer-ppc64be/build/llvm-project/compiler-rt/lib/asan/tests/asan_str_test.cpp:462: Failure
Death test: strncat(to - 1, from, 0)
    Result: failed to die.
```

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67658

llvm-svn: 372125
2019-09-17 14:15:23 +00:00
Luis Marques 6cf896b284 [RISCV][NFC] Use NoRegister instead of 0 literal
Summary: Trivial cleanup.

Reviewers: asb, lenary

Reviewed By: lenary

Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67526

llvm-svn: 372120
2019-09-17 13:34:17 +00:00
Simon Pilgrim f12a3da5a7 [X86] X86DAGToDAGISel::tryFoldLoad - assert root/parent pointers are non-null. NFCI.
Silences a static analyzer warning.

llvm-svn: 372118
2019-09-17 13:27:02 +00:00
Simon Pilgrim c52a7093df InterleavedAccessInfo - Don't dereference a dyn_cast result. NFCI.
llvm-svn: 372117
2019-09-17 13:25:56 +00:00
Simon Pilgrim a2719f38c1 [LoopVectorize] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 372116
2019-09-17 13:24:54 +00:00
David Green 1ff9553057 [ARM] Fix for MVE load/store stack accesses
MVE loads and stores have a 7 bit immediate range, scaled by the length of the type. This needs to be taught to the stack estimation code to ensure that an emergency spill slot is reserved in case we run out of registers when materialising stack indices.

Also the narrowing loads/stores can be created with frame indices even though they do not accept SP as a register. We need in those cases to make sure we have an emergency register to use as the frame base, as SP can never be used.

Differential Revision: https://reviews.llvm.org/D67327

llvm-svn: 372114
2019-09-17 12:58:51 +00:00
Benjamin Kramer df4b9a3f4f Hide implementation details in namespaces.
llvm-svn: 372113
2019-09-17 12:56:29 +00:00
Sam Parker 36c922278e [ARM][LowOverheadLoops] Add LR def safety check
Converting the *LoopStart pseudo instructions into DLS/WLS results in
LR being defined. These instructions were inserted on the assumption
that LR would already contain the loop counter because a mov is
introduced during ISel as the the consumers in the loop can only use
LR. That assumption proved wrong!

So perform a safety check, finding an appropriate place to insert the
DLS/WLS instructions or revert if this isn't possible.

Differential Revision: https://reviews.llvm.org/D67539

llvm-svn: 372111
2019-09-17 12:19:32 +00:00
George Rimar 82d83733dd [obj2yaml] - Support PPC64 relocation types.
We do not support them and fail with llvm_unreachable currently.
This is not the only target we do not support and also seems we are missing
the tests for those we have already. But I needed this one for another patch,
so posted it separatelly.

Relocation names are taken from llvm\include\llvm\BinaryFormat\ELFRelocs\PowerPC64.def

Differential revision: https://reviews.llvm.org/D67615

llvm-svn: 372109
2019-09-17 12:00:55 +00:00
George Rimar cfc0ba3852 [yaml2obj/obj2yaml] - Allow setting an arbitrary values for e_machine.
Currently we only allow using a known named constants
for `Machine` field in YAML documents.

This patch allows using any numbers (valid or "unknown")
and adds test cases for current and new functionality.

With this it is possible to write a test cases for really unknown
EM_* targets.

Differential revision: https://reviews.llvm.org/D67652

llvm-svn: 372108
2019-09-17 11:51:26 +00:00
Luis Marques 3d0fbafd0b [RISCV] Switch to the Machine Scheduler
Most of the test changes are trivial instruction reorderings and differing
register allocations, without any obvious performance impact.

Differential Revision: https://reviews.llvm.org/D66973

llvm-svn: 372106
2019-09-17 11:15:35 +00:00
Johannes Doerfert 3ab9e8b818 [Attributor][Fix] Initialize the cache prior to using it
Summary:
There were segfaults as we modified and iterated the instruction maps in
the cache at the same time. This was happening because we created new
instructions while we populated the cache. This fix changes the order
in which we perform these actions. First, the caches for the whole
module are created, then we start to create abstract attributes.

I don't have a unit test but the LLVM test suite exposes this problem.

Reviewers: uenoku, sstefan1

Subscribers: hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67232

llvm-svn: 372105
2019-09-17 10:52:41 +00:00
Luis Marques 2d550d19b3 Revert Patch from Phabricator
This reverts r372092 (git commit e38695a025)

llvm-svn: 372104
2019-09-17 10:52:09 +00:00
Simon Pilgrim 0b10da7cc7 [X86] Use APInt::getLowBitsSet helper. NFCI.
Also avoids a static analyzer warning about out of range shifts.

llvm-svn: 372103
2019-09-17 10:51:30 +00:00
David Bolvansky ded48e93e6 [SLC] Preserve attrs for strncpy(x, "", y) -> memset(align 1 x, '\0', y)
llvm-svn: 372101
2019-09-17 10:25:38 +00:00
Graham Hunter 1a9195d817 [SVE][MVT] Fixed-length vector MVT ranges
* Reordered MVT simple types to group scalable vector types
    together.
  * New range functions in MachineValueType.h to only iterate over
    the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from
    iterating over scalable types.

Reviewers: sdesmalen, greened

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D66339

llvm-svn: 372099
2019-09-17 10:19:23 +00:00
David Bolvansky be2487a2ba [InstCombine] Annotate strdup with deref_or_null
llvm-svn: 372098
2019-09-17 10:12:48 +00:00
David Bolvansky 3a3dddd9d7 [NFCI] Fixed buildbots
llvm-svn: 372097
2019-09-17 10:03:45 +00:00
Fangrui Song 8351763709 [SimplifyLibCalls] Fix -Wunused-result after D53342/r372091
llvm-svn: 372096
2019-09-17 09:56:55 +00:00
Luis Marques e38695a025 Patch from Phabricator
llvm-svn: 372092
2019-09-17 09:43:08 +00:00
David Bolvansky e80fcf0340 [SimplifyLibCalls] Mark known arguments with nonnull
Reviewers: efriedma, jdoerfert

Reviewed By: jdoerfert

Subscribers: ychen, rsmith, joerg, aaron.ballman, lebedev.ri, uenoku, jdoerfert, hfinkel, javed.absar, spatel, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D53342

llvm-svn: 372091
2019-09-17 09:32:52 +00:00
Alexander Timofeev 6524a7a2b9 [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed
Defferential Revision: https://reviews.llvm.org/D67101

Reviewers: rampitec, vpykhtin
llvm-svn: 372086
2019-09-17 09:08:58 +00:00
Sam Parker 95b28a4c72 [ARM] LE support in ConstantIslands
The low-overhead branch extension provides a loop-end 'LE' instruction
that performs no decrement nor compare, it just jumps backwards. This
patch modifies the constant islands pass to try to insert LE
instructions in place of a Thumb2 conditional branch, instead of
shrinking it. This only happens if a cmp can be converted to a cbn/z
and used to exit the loop.

Differential Revision: https://reviews.llvm.org/D67404

llvm-svn: 372085
2019-09-17 09:08:05 +00:00
Florian Hahn 1bd58870e5 [LoopUnroll] Use LoopSize+1 as threshold, to allow unrolling loops matching LoopSize.
We use `< UP.Threshold` later on, so we should use LoopSize + 1, to
allow unrolling if the result won't exceed to loop size.

Fixes PR43305.

Reviewers: efriedma, dmgreen, paquette

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D67594

llvm-svn: 372084
2019-09-17 09:02:48 +00:00
Sam Parker 26a475afe5 [ARM][MVE] Add invalidForTailPredication to TSFlags
Set this bit for the MVE reduction instructions to prevent a loop from
becoming tail predicated in their presence.

Differential Revision: https://reviews.llvm.org/D67444

llvm-svn: 372076
2019-09-17 07:43:04 +00:00
Hideto Ueno 30d86f1858 [Attributor] Use Alias Analysis in noalias callsite argument deduction
Summary: This patch adds a check of alias analysis in `noalias` callsite argument deduction.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67604

llvm-svn: 372075
2019-09-17 06:53:27 +00:00
Hideto Ueno 3bb5cbc20b [Attributor] Create helper struct for handling analysis getters
Summary: This patch introduces a helper struct `AnalysisGetter` to put together analysis getters. In this patch, a getter for `AAResult` is also added for  `noalias`.

Reviewers: jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67603

llvm-svn: 372072
2019-09-17 05:45:18 +00:00
Craig Topper 95aea74494 [X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets.
Previously we tried to split them into narrower v64i1 or v16i1
pieces that each got promoted to vXi8 and then passed in a zmm
or xmm register. But this crashes when you need to pass more
pieces than available registers reserved for argument passing.

The scalarizing done here generates much longer and slower code,
but is consistent with the behavior of avx2 and earlier targets
for these types.

Fixes PR43323.

llvm-svn: 372069
2019-09-17 04:41:14 +00:00
Craig Topper 769dd59a27 [X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a broadcast load to avoid a copy.
The BLENDM instructions allow an 2 sources and an independent
destination while masked VBROADCAST has the destination tied
to the source.

llvm-svn: 372068
2019-09-17 04:41:10 +00:00
Craig Topper 2cc57bedd5 [X86] Add support for commuting EVEX VCMP instructons with any immediate value.
Previously we limited to the EQ/NE/TRUE/FALSE/ORD/UNORD immediates.

llvm-svn: 372067
2019-09-17 04:41:05 +00:00
Craig Topper 359918dadf [X86] Enable commuting of EVEX VCMP for all immediate values during isel.
llvm-svn: 372065
2019-09-17 04:40:58 +00:00
Amara Emerson 9d64721ca5 [GlobalISel] Partially revert r371901.
r371901 was overeager and widenScalarDst() and the like in the legalizer
attempt to increment the insert point given in order to add new instructions
after the currently legalizing inst. In cases where the insertion point is not
exactly the current instruction, then callers need to de-compensate for the
behaviour by decrementing the insertion iterator before calling them. It's not
a nice state of affairs, for now just undo the problematic parts of the change.

llvm-svn: 372050
2019-09-16 23:46:03 +00:00
Nemanja Ivanovic e63c676825 [PowerPC] Cust lower fpext v2f32 to v2f64 from extract_subvector v4f32
Add the missing piece of r372029.
Somehow when the patch for review D61961 was committed, only the test case
went in and the code didn't. This of course caused all kinds of build bot
breaks.
This patch just adds the code for that patch.

Author: Lei Huang
Differential revision: https://reviews.llvm.org/D61961

llvm-svn: 372043
2019-09-16 22:54:52 +00:00
Francis Visoiu Mistrih 77383d83eb [Remarks] Allow remarks::Format::YAML to take a string table
It should be allowed to take a string table in case all the strings in
the remarks point there, but it shouldn't use it during serialization.

llvm-svn: 372042
2019-09-16 22:45:17 +00:00
Vedant Kumar 413647d730 [Coverage] Speed up file-based queries for coverage info, NFC
Speed up queries for coverage info in a file by reducing the amount of
time spent determining whether a function record corresponds to a file.

This gives a 36% speedup when generating a coverage report for `llc`.
The reduction is entirely in user time.

rdar://54758110

Differential Revision: https://reviews.llvm.org/D67575

llvm-svn: 372025
2019-09-16 19:08:44 +00:00
Vedant Kumar 95de24978e [Coverage] Assert that filenames in a TU are unique, NFC
llvm-svn: 372024
2019-09-16 19:08:41 +00:00
Steven Wu 34d80461ff [LTO][Legacy] Add new C inferface to query libcall functions
Summary:
This is needed to implemented the same approach as lld (implemented in r338434)
for how to handling symbols that can be generated by LTO code generator
but not present in the symbol table for linker that uses legacy C APIs.

libLTO is in charge of providing the list of symbols. Linker is in
charge of implementing the eager loading from static libraries using
the list of symbols.

rdar://problem/52853974

Reviewers: tejohnson, bd1976llvm, deadalnix, espindola

Reviewed By: tejohnson

Subscribers: emaste, arichardson, hiraditya, MaskRay, dang, kledzik, mehdi_amini, inglorion, jkorous, dexonsmith, ributzka, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67568

llvm-svn: 372021
2019-09-16 18:49:54 +00:00
Reid Kleckner 32837a0c93 [PGO] Use linkonce_odr linkage for __profd_ variables in comdat groups
This fixes relocations against __profd_ symbols in discarded sections,
which is PR41380.

In general, instrumentation happens very early, and optimization and
inlining happens afterwards. The counters for a function are calculated
early, and after inlining, counters for an inlined function may be
widely referenced by other functions.

For C++ inline functions of all kinds (linkonce_odr &
available_externally mainly), instr profiling wants to deduplicate these
__profc_ and __profd_ globals. Otherwise the binary would be quite
large.

I made __profd_ and __profc_ comdat in r355044, but I chose to make
__profd_ internal. At the time, I was only dealing with coverage, and in
that case, none of the instrumentation needs to reference __profd_.
However, if you use PGO, then instrumentation passes add calls to
__llvm_profile_instrument_range which reference __profd_ globals. The
solution is to make these globals externally visible by using
linkonce_odr linkage for data as was done for counters.

This is safe because PGO adds a CFG hash to the names of the data and
counter globals, so if different TUs have different globals, they will
get different data and counter arrays.

Reviewers: xur, hans

Differential Revision: https://reviews.llvm.org/D67579

llvm-svn: 372020
2019-09-16 18:49:09 +00:00
Simon Pilgrim 3df0daddfd [X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operands
Determine if all of the uses of LHS/RHS operands can be replaced with a zero vector.

llvm-svn: 372013
2019-09-16 17:30:33 +00:00
David Green 8d21460dc5 [ARM] A predicate cast of a predicate cast is a predicate cast
The adds some very basic folding of PREDICATE_CASTS, removing cases when they
are chained together. These would already be removed eventually, as these are
lowered to copies. This just allows it to happen earlier, which can help other
simplifications.

Differential Revision: https://reviews.llvm.org/D67591

llvm-svn: 372012
2019-09-16 17:29:07 +00:00
Roman Lebedev 10151f6618 [SimplifyCFG] FoldTwoEntryPHINode(): consider *total* speculation cost, not per-BB cost
Summary:
Previously, if the threshold was 2, we were willing to speculatively
execute 2 cheap instructions in both basic blocks (thus we were willing
to speculatively execute cost = 4), but weren't willing to speculate
when one BB had 3 instructions and other one had no instructions,
even thought that would have total cost of 3.

This looks inconsistent to me.
I don't think `cmov`-like instructions will start executing
until both of it's inputs are available: https://godbolt.org/z/zgHePf
So i don't see why the existing behavior is the correct one.

Also, let's add it's own `cl::opt` for this threshold,
with default=4, so it is not stricter than the previous threshold:
will allow to fold when there are 2 BB's each with cost=2.
And since the logic has changed, it will also allow to fold when
one BB has cost=3 and other cost=1, or there is only one BB with cost=4.

This is an alternative solution to D65148:
This fix is mainly motivated by `signbit-like-value-extension.ll` test.
That pattern comes up in JPEG decoding, see e.g.
`Figure F.12 – Extending the sign bit of a decoded value in V`
of `ITU T.81` (JPEG specification).
That branch is not predictable, and it is within the innermost loop,
so the fact that that pattern ends up being stuck with a branch
instead of `select` (i.e. `CMOV` for x86) is unlikely to be beneficial.

This has great results on the final assembly (vanilla test-suite + RawSpeed): (metric pass - D67240)
| metric                                 |     old |     new | delta |      % |
| x86-mi-counting.NumMachineFunctions    |   37720 |   37721 |     1 |  0.00% |
| x86-mi-counting.NumMachineBasicBlocks  |  773545 |  771181 | -2364 | -0.31% |
| x86-mi-counting.NumMachineInstructions | 7488843 | 7486442 | -2401 | -0.03% |
| x86-mi-counting.NumUncondBR            |  135770 |  135543 |  -227 | -0.17% |
| x86-mi-counting.NumCondBR              |  423753 |  422187 | -1566 | -0.37% |
| x86-mi-counting.NumCMOV                |   24815 |   25731 |   916 |  3.69% |
| x86-mi-counting.NumVecBlend            |      17 |      17 |     0 |  0.00% |

We significantly decrease basic block count, notably decrease instruction count,
significantly decrease branch count and very significantly increase `cmov` count.

Performance-wise, unsurprisingly, this has great effect on
target RawSpeed benchmark. I'm seeing 5 **major** improvements:
```
Benchmark                                                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_mean                                  -0.3064         -0.3064      226.9913      157.4452      226.9800      157.4384
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_median                                -0.3057         -0.3057      226.8407      157.4926      226.8282      157.4828
Samsung/NX3000/_3184416.SRW/threads:8/process_time/real_time_stddev                                -0.4985         -0.4954        0.3051        0.1530        0.3040        0.1534
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_mean                                   -0.1747         -0.1747       80.4787       66.4227       80.4771       66.4146
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_median                                 -0.1742         -0.1743       80.4686       66.4542       80.4690       66.4436
Kodak/DCS760C/86L57188.DCR/threads:8/process_time/real_time_stddev                                 +0.6089         +0.5797        0.0670        0.1078        0.0673        0.1062
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_pvalue                                 0.0000          0.0000      U Test, Repetitions: 49 vs 49
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_mean                                  -0.1598         -0.1598      171.6996      144.2575      171.6915      144.2538
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_median                                -0.1598         -0.1597      171.7109      144.2755      171.7018      144.2766
Sony/DSLR-A230/DSC08026.ARW/threads:8/process_time/real_time_stddev                                +0.4024         +0.3850        0.0847        0.1187        0.0848        0.1175
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_mean                                   -0.0550         -0.0551      280.3046      264.8800      280.3017      264.8559
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_median                                 -0.0554         -0.0554      280.2628      264.7360      280.2574      264.7297
Canon/EOS 77D/IMG_4049.CR2/threads:8/process_time/real_time_stddev                                 +0.7005         +0.7041        0.2779        0.4725        0.2775        0.4729
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_pvalue                                  0.0000          0.0000      U Test, Repetitions: 49 vs 49
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_mean                                   -0.0354         -0.0355      316.7396      305.5208      316.7342      305.4890
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_median                                 -0.0354         -0.0356      316.6969      305.4798      316.6917      305.4324
Canon/EOS 5DS/2K4A9929.CR2/threads:8/process_time/real_time_stddev                                 +0.0493         +0.0330        0.3562        0.3737        0.3563        0.3681
```

That being said, it's always best-effort, so there will likely
be cases where this worsens things.

Reviewers: efriedma, craig.topper, dmgreen, jmolloy, fhahn, Carrot, hfinkel, chandlerc

Reviewed By: jmolloy

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67318

llvm-svn: 372009
2019-09-16 16:18:24 +00:00
Sanjay Patel 3961a143e1 [InstCombine] remove unneeded one-use checks for icmp fold
Related folds were added in:
rL125734
...the code comment about register pressure is discussed in
more detail in:
https://bugs.llvm.org/show_bug.cgi?id=2698

But 10 years later, perf testing bzip2 with this change now
shows a slight (0.2% average) improvement on Haswell although
that's probably within test noise.

Given that this is IR canonicalization, we shouldn't be worried
about register pressure though; the backend should be able to
adjust for that as needed.

This is part of solving PR43310 the theoretically right way:
https://bugs.llvm.org/show_bug.cgi?id=43310
...ie, if we don't cripple basic transforms, then we won't
need to add special-case code to detect larger patterns.

rL371940 and rL371981 are related patches in this series.

llvm-svn: 372007
2019-09-16 16:15:25 +00:00
Oliver Cruickshank ee6fbebbaf [ARM] Add patterns for BSWAP intrinsic on MVE
BSWAP can use the VREV instruction on MVE to produce better results than
expanding.

llvm-svn: 372002
2019-09-16 15:20:10 +00:00
Oliver Cruickshank e9510a6cad [ARM] Add patterns for bitreverse intrinsic on MVE
BITREVERSE can use the VBRSR which will reverse and right shift.
Shifting right by 0 will just reverse the bits.

llvm-svn: 372001
2019-09-16 15:20:03 +00:00
Oliver Cruickshank 5f799ef162 [ARM] Lower CTTZ on MVE
Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and
count the leading zeros, equivalent to a count trailing zeros (CTTZ).

llvm-svn: 372000
2019-09-16 15:19:56 +00:00
Oliver Cruickshank cd1a0b9271 [ARM] Add patterns for CTLZ on MVE
CTLZ intrinsic can use the VCLS instruction on MVE, which produces
better results than expanding.

llvm-svn: 371999
2019-09-16 15:19:49 +00:00
Simon Pilgrim a48b6e98ab [ExecutionEngine] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 371998
2019-09-16 15:19:11 +00:00