Commit Graph

139349 Commits

Author SHA1 Message Date
Eric Astor b901b6ab17 Revert "[ms] [llvm-ml] Add support for .radix directive, and accept all radix specifiers"
This reverts commit 5dd1b6d612.
2020-09-23 13:59:34 -04:00
Craig Topper 7a3c643c35 [SLP] Make HorizontalReduction::getOperationData take an Instruction* instead of a Value*. NFCI
All of the callers already have an Instruction *; many of them
obtained it from a dyn_cast.

Also update the OperationData constructor to use an Instruction&
to remove a dyn_cast and make it clear that the pointer is non-null.

Differential Revision: https://reviews.llvm.org/D88132
2020-09-23 10:51:03 -07:00
Craig Topper f21f835ee8 [X86] Improve demanded bits for X86ISD::BEXTR.
If the control is constant we can figure out exactly which bits
of the input are demanded.
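
To make the computation concrete, here is a standalone sketch assuming plain 64-bit arithmetic (the helper name is hypothetical; the in-tree code works on APInt):

```
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: BEXTR extracts Len bits of the source starting at
// bit Start, with Start = Control[7:0] and Len = Control[15:8], so exactly
// those source bits are demanded. Assumes BitWidth <= 64.
static uint64_t demandedSrcBitsForBEXTR(uint64_t Control, unsigned BitWidth) {
  unsigned Start = Control & 0xff;
  unsigned Len = (Control >> 8) & 0xff;
  if (Start >= BitWidth || Len == 0)
    return 0; // The result is all zeros; no source bit matters.
  Len = std::min(Len, BitWidth - Start); // Clip the field to the value width.
  uint64_t Mask = Len >= 64 ? ~0ULL : ((1ULL << Len) - 1);
  return Mask << Start;
}
```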

Differential Revision: https://reviews.llvm.org/D88072
2020-09-23 10:51:02 -07:00
Eric Astor aca7105db9 Fix include location (accidentally committed a local variation) 2020-09-23 13:50:25 -04:00
Sanjay Patel 6189a8d9f5 [TTI] add wrapper for matching vector reduction to reduce code duplication; NFC
I'm not sure what this means, but the order in which we try
the matches makes a difference on at least 1 regression test...
2020-09-23 13:48:57 -04:00
Eric Astor 5dd1b6d612 [ms] [llvm-ml] Add support for .radix directive, and accept all radix specifiers
Add support for .radix directive, and radix specifiers [yY] (binary), [oOqQ] (octal), and [tT] (decimal).

Also, when lexing MASM integers, require a radix specifier; MASM requires that all literals without a radix specifier be treated as written in the default radix. (e.g., 0100 = 100)
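
A rough sketch of the lexing rule described above (a hypothetical standalone helper, not the actual llvm-ml lexer; the [hH] hex suffix is standard MASM but not part of this commit's description, so treat it as an assumption):

```
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: interpret a MASM integer literal using the radix
// suffixes described above, falling back to the current default radix when
// no suffix is present -- so with the default radix of 10, "0100" is 100.
static bool parseMasmInteger(const std::string &Lit, unsigned DefaultRadix,
                             uint64_t &Val) {
  if (Lit.empty())
    return false;
  unsigned Radix = DefaultRadix;
  size_t Len = Lit.size();
  switch (std::tolower(static_cast<unsigned char>(Lit.back()))) {
  case 'y':           Radix = 2;  --Len; break; // binary
  case 'o': case 'q': Radix = 8;  --Len; break; // octal
  case 't':           Radix = 10; --Len; break; // decimal
  case 'h':           Radix = 16; --Len; break; // hex (assumed, standard MASM)
  default: break;                               // no suffix: default radix
  }
  if (Len == 0 || Radix < 2 || Radix > 16)
    return false;
  Val = 0;
  for (size_t I = 0; I != Len; ++I) {
    char C = static_cast<char>(std::tolower(static_cast<unsigned char>(Lit[I])));
    unsigned Digit;
    if (C >= '0' && C <= '9')
      Digit = C - '0';
    else if (C >= 'a' && C <= 'f')
      Digit = C - 'a' + 10;
    else
      return false;
    if (Digit >= Radix) // digit must be valid in the chosen radix
      return false;
    Val = Val * Radix + Digit;
  }
  return true;
}
```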

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D87400
2020-09-23 13:45:58 -04:00
Paul C. Anagnostopoulos b3931188fd Enhance TableGen so that backends can produce better error messages.
Modify SearchableTableEmitter.cpp to take advantage.
Clean up formatting and capitalization issues.
2020-09-23 13:35:32 -04:00
Krzysztof Parzyszek 76e8c1899e Break long line accidentally left in the previous commit 2020-09-23 12:24:45 -05:00
Fangrui Song 01b9deba76 Revert D87970 "[ThinLTO] Avoid temporaries when loading global decl attachment metadata"
This reverts commit ab1b4810b5.

It caused an issue in llvm::lto::thinBackend for a -fsanitize=cfi build.

```
AbbrevNo is 0 => "Invalid abbrev number"
#0  llvm::BitstreamCursor::getAbbrev (this=0x9db4c8, AbbrevID=4) at llvm/include/llvm/Bitstream/BitstreamReader.h:528
#1  0x00007f5f777a6eb4 in llvm::BitstreamCursor::readRecord (this=0x9db4c8, AbbrevID=4, Vals=llvm::SmallVector of Size 0, Capacity 64, Blob=0x7ffcd0e26558) at /usr/local/google/home/maskray/llvm/llvm/lib/Bitstream/Reader/BitstreamReader.cpp:228
#2  0x00007f5f796bf633 in llvm::MetadataLoader::MetadataLoaderImpl::lazyLoadOneMetadata (this=0x9db3a0, ID=188, Placeholders=...) at /usr/local/google/home/maskray/llvm/llvm/lib/Bitcode/Reader/MetadataLoader.cpp:1091
#3  0x00007f5f796c2527 in llvm::MetadataLoader::MetadataLoaderImpl::getMetadataFwdRefOrLoad (this=0x9db3a0, ID=188) at llvm/lib/Bitcode/Reader/MetadataLoader.cpp:668
#4  0x00007f5f796bfff3 in llvm::MetadataLoader::getMetadataFwdRefOrLoad (this=0xd31580, Idx=188) at llvm/lib/Bitcode/Reader/MetadataLoader.cpp:2290
#5  0x00007f5f79638265 in (anonymous namespace)::BitcodeReader::parseFunctionBody (this=0xd312e0, F=0x9de758) at llvm/lib/Bitcode/Reader/BitcodeReader.cpp:3938
#6  0x00007f5f79635d32 in (anonymous namespace)::BitcodeReader::materialize (this=0xd312e0, GV=0x9de758) at llvm/lib/Bitcode/Reader/BitcodeReader.cpp:5408
#7  0x00007f5f7f8dbe3e in llvm::Module::materialize (this=0x9b92c0, GV=0x9de758) at llvm/lib/IR/Module.cpp:442
#8  0x00007f5f7f7f8fbe in llvm::GlobalValue::materialize (this=0x9de758) at llvm/lib/IR/Globals.cpp:50
#9  0x00007f5f83b9b5f5 in llvm::FunctionImporter::importFunctions (this=0x7ffcd0e2a730, DestModule=..., ImportList=...) at llvm/lib/Transforms/IPO/FunctionImport.cpp:1182
```
2020-09-23 10:24:08 -07:00
Krzysztof Parzyszek e976fb1e54 [EarlyCSE] Fix crash with expensive checks after D87691
D87691 reordered some checks, which turned out to be unsafe. More
specifically, when examining a store instruction, the check against
getOrCreateResult should be done before attempting to call
isSameMemGeneration. Otherwise a crash in the MSSA walker can occur.

This patch restores the order of these calls to what it was originally.
2020-09-23 12:21:34 -05:00
Vinicius Tinti 577adda54f [Support/Path] Add path::is_absolute_gnu
Implements IS_ABSOLUTE_PATH from GNU tools.

The C++17 is_absolute behavior differs from the behavior defined by GNU
tools.

According to cppreference.com, C++17 states: "An absolute path is a path
that unambiguously identifies the location of a file without reference
to an additional starting location."

In other words, the rules are:
 1. POSIX style paths with nonempty root directory are absolute.
 2. Windows style paths with nonempty root name and root directory are
    absolute.
 3. No other paths are absolute.

GNU rules are:
 1. Paths starting with a path separator are absolute.
 2. Windows style paths are also absolute if they start with a character
    followed by ':'.
 3. No other paths are absolute.

In the Windows style, the path "C:\Users\Default" has "C:" as its root name
and "\" as its root directory.

Hence "C:" on Windows is absolute under the GNU rules but not absolute under
C++17, because it has no root directory. Likewise "/" and "\" on Windows are
absolute under GNU but not under C++17, due to an empty root name.
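
A minimal sketch of the GNU rules (assuming a plain std::string input; the real llvm::sys::path::is_absolute_gnu operates on a Twine and a path Style parameter instead):

```
#include <string>

// Sketch of the GNU IS_ABSOLUTE_PATH rules described above.
static bool isAbsoluteGnu(const std::string &Path, bool WindowsStyle) {
  if (Path.empty())
    return false;
  // Rule 1: paths starting with a path separator are absolute.
  if (Path[0] == '/' || (WindowsStyle && Path[0] == '\\'))
    return true;
  // Rule 2: Windows style paths are also absolute if they start with a
  // character followed by ':' -- so "C:" is absolute even though it has
  // no root directory.
  if (WindowsStyle && Path.size() >= 2 && Path[1] == ':')
    return true;
  return false; // Rule 3: no other paths are absolute.
}
```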

Related to PR46368.

Differential Revision: https://reviews.llvm.org/D87667
2020-09-23 18:01:32 +01:00
Guozhi Wei fd75ad8662 [MBFIWrapper] Add a new function getBlockProfileCount
MBFIWrapper keeps track of the block frequencies of newly created and
modified blocks, and those modified frequencies should also be reflected in
block profile counts. The class didn't provide a getBlockProfileCount
interface, so users could only query profile counts through the underlying
MBFI; since the underlying MBFI doesn't know about the modifications made in
MBFIWrapper, it either returned stale profile counts for modified blocks or
simply crashed on new blocks.

This patch adds a getBlockProfileCount function to MBFIWrapper to handle
new or modified blocks.
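
A condensed sketch of the new interface (illustrative: the MergedBBFreq map name and the exact logic are assumptions based on the description above, not the verbatim patch):

```
// Frequencies tracked by the wrapper override the underlying MBFI, so
// profile counts for new or modified blocks are derived from the wrapper's
// own data; untouched blocks still defer to the underlying MBFI.
Optional<uint64_t>
MBFIWrapper::getBlockProfileCount(const MachineBasicBlock &MBB) const {
  auto I = MergedBBFreq.find(&MBB);
  if (I != MergedBBFreq.end())
    return MBFI.getProfileCountFromFreq(I->second.getFrequency());
  return MBFI.getBlockProfileCount(&MBB);
}
```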

Differential Revision: https://reviews.llvm.org/D87802
2020-09-23 09:31:45 -07:00
Andrew Wei c2deacd929 [AArch64] Fix ldst optimization of non-immediate store offset
When matching a store instruction for ldst opt, we should make sure the store is in 'reg+imm' form like the load;
otherwise isLdOffsetInRangeOfSt will hit an assertion, since it uses getImm() directly.
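
The fix reduces to a guard of roughly this shape (hypothetical condensed form; getLdStOffsetOp is assumed to be the optimizer's accessor for the offset operand):

```
// Only treat the store as a pairing candidate when its offset operand
// really is an immediate, since isLdOffsetInRangeOfSt calls getImm() on it
// unconditionally.
if (!getLdStOffsetOp(StoreMI).isImm())
  return false;
```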

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87905
2020-09-23 23:00:13 +08:00
Simon Pilgrim 91589cf679 Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
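
For reference, the shape the llvm-namespace-comment check expects (illustrative snippet; the namespace names here are just examples):

```
namespace llvm {
namespace object {
// ... declarations ...
} // end namespace object
} // end namespace llvm
```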
2020-09-23 16:19:25 +01:00
Simon Pilgrim 474dc33d07 Add missing namespace closure comment. NFCI.
Fixes clang-tidy llvm-namespace-comment warning.
2020-09-23 16:19:25 +01:00
Sebastian Neubauer a343b9b032 Revert "[AMDGPU] Insert waitcnt after returning from call"
This reverts commit ca907bfb57.

According to michel.daenzer,
> This completely broke the Mesa radeonsi driver on Navi 14. Xorg +
> xterm come up with major corruption & psychedelic colours.
2020-09-23 17:16:39 +02:00
Cameron McInally db40a74344 [SVE] Lower fixed length ISD::VECREDUCE_ADD to Scalable
Differential Revision: https://reviews.llvm.org/D87796
2020-09-23 09:08:07 -05:00
Florian Hahn 31923f6b36 [VPlan] Disconnect VPValue and VPUser.
This refactors VPUser to not inherit from VPValue, to facilitate
introducing operations that produce multiple VPValues (e.g.
VPInterleaveRecipe).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84679
2020-09-23 14:44:31 +01:00
Jonas Paulsson 370a8c8025 [SystemZ] Make sure not to call getZExtValue on a >64 bit constant.
Better use isZero() and isIntN() in SystemZTargetTransformInfo rather than
calling getZExtValue() since the immediate operand may be wider than 64 bits,
which is not allowed with getZExtValue().
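
The safe pattern looks roughly like this (hypothetical helper; the point is that getZExtValue() is only reached once isIntN() has confirmed the value fits):

```
#include "llvm/IR/Constants.h"
#include "llvm/Support/MathExtras.h"
using namespace llvm;

// getZExtValue() asserts on values wider than 64 bits, so gate it behind
// isZero()/isIntN(), both of which are valid at any bit width.
static bool isCheapImmediate(const ConstantInt *CI) {
  if (CI->isZero())
    return true;
  // Only call getZExtValue() once the value is known to fit in 64 bits.
  return CI->getValue().isIntN(64) && isUInt<32>(CI->getZExtValue());
}
```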

Fixes https://bugs.llvm.org/show_bug.cgi?id=47600

Review: Simon Pilgrim
2020-09-23 15:36:32 +02:00
Matt Arsenault c463fd136e GlobalISel: Fix truncating shift amount in trunc (shl) combine
The shift amount type does not necessarily match the result type. This
was inserting a trunc from s32 to s32, which asserted. Just preserve
the original shift amount type, which can be legalized later.
2020-09-23 09:07:50 -04:00
Matt Arsenault af0207f2ba AMDGPU: Check global FP atomics match default FP mode
We would always select global FP atomics from atomicrmw fadd, although
they have a hardcoded FP mode.
2020-09-23 09:07:50 -04:00
Kerry McLaughlin d0149ba9b4 [SVE][CodeGen] Lower legal integer -> floating point conversions
This patch adds new ISD nodes, SCVTZ_MERGE_PASSTHRU &
UCVTZ_MERGE_PASSTHRU, which are used to lower both legal
scalable vector [S|U]INT_TO_FP operations and the following intrinsics:
 - llvm.aarch64.sve.scvtf
 - llvm.aarch64.sve.ucvtf

Reviewed By: sdesmalen, efriedma

Differential Revision: https://reviews.llvm.org/D87913
2020-09-23 11:53:53 +01:00
Sebastian Neubauer ca907bfb57 [AMDGPU] Insert waitcnt after returning from call
When memory operations are outstanding on function calls, either the
caller or the callee can insert a waitcnt to ensure that all reads are
finished.
Calls need some time to be executed, so if the callee inserts the
waitcnt, filling the instruction buffer and waiting for memory will be
interleaved, hiding some latency. This comes at the cost of having a
waitcnt inside functions that may not be needed as no memory operations
are outstanding.

For function calls, this is already implemented. The same principle
applies to returns: If the caller inserts a waitcnt after the call, the
callee does not have to wait and the return and memory operation can be
run in parallel.

This commit implements waiting in the caller after returning from a
function call.

Differential Revision: https://reviews.llvm.org/D87674
2020-09-23 12:17:59 +02:00
David Sherwood e077367a28 [SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.

Differential revision: https://reviews.llvm.org/D87889
2020-09-23 09:20:08 +01:00
David Sherwood 59c4d5aad0 [SVE] Fix InstCombinerImpl::PromoteCastOfAllocation for scalable vectors
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:

1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This is because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.

For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.
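
In condensed form, the new bail-out amounts to something like this (a sketch with hypothetical names, not the verbatim patch):

```
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Promote the alloca only when the allocated type and the cast-to type
// agree on scalability, so "how many times does the cast type fit" stays
// a plain integer ratio.
static bool shouldBailOnScalability(Type *AllocElTy, Type *CastElTy) {
  bool AllocIsScalable = isa<ScalableVectorType>(AllocElTy);
  bool CastIsScalable = isa<ScalableVectorType>(CastElTy);
  return AllocIsScalable != CastIsScalable; // mixed fixed/scalable: bail out
}
```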

I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:

  Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll

Differential revision: https://reviews.llvm.org/D87378
2020-09-23 08:43:05 +01:00
Piotr Sobczak 8d7fd73c3a [AMDGPU] Fix merging m0 inits
Fix incorrect merges of m0 inits in loops.

It was assumed that if a clobbering instruction appears in
the same block as an init, and the clobbering instruction
does not dominate the init, then it does not interfere with
the init.

This does not work in the presence of loops, where in this
scenario, the clobbering instruction does interfere with
the init in another iteration.

To fix this, do not check for block equality and defer the
decision to the predecessor check.

Differential Revision: https://reviews.llvm.org/D87882
2020-09-23 09:13:43 +02:00
Albion Fung d7eb917a7c [PowerPC] Implementation of 128-bit Binary Vector Mod and Sign Extend builtins
This patch implements 128-bit Binary Vector Mod and Sign Extend builtins for PowerPC10.

Differential: https://reviews.llvm.org/D87394#inline-815858
2020-09-23 01:18:14 -05:00
Martin Storsjö b90132399a [CVP] Remove a redundant trailing semicolon, fixing GCC warnings. NFC. 2020-09-23 09:03:01 +03:00
Martin Storsjö 2c4c659666 [InstCombine] Add parentheses in assert to silence GCC warning. NFC. 2020-09-23 09:03:01 +03:00
Martin Storsjö f69e090d7d [MC] [Win64EH] Try to generate packed unwind info where possible
In practice, this only gives modest savings (for a 6.5 MB DLL with
230 KB xdata, the xdata sections shrinks by around 2.5 KB); to
gain more, the frame lowering would need to be tweaked to more often
generate frame layouts that match the canonical layouts that can
be written in packed form.

Differential Revision: https://reviews.llvm.org/D87371
2020-09-23 09:03:01 +03:00
Teresa Johnson ab1b4810b5 [ThinLTO] Avoid temporaries when loading global decl attachment metadata
When performing ThinLTO importing, the metadata loader attempts to lazy
load, by building an index. However, module level global decl attachment
metadata was being parsed early while building the index, since the
associated (module level) global values aren't materialized on demand.
This results in the creation of forward reference temporary metadatas,
which are expensive.

Normally, these module level global values don't have much attached
metadata. However, in the case of -fwhole-program-vtables (e.g. for
whole program devirtualization), the vtables may have many attached type
metadatas. This was resulting in very slow performance when performing
ThinLTO importing with the default lazy loading.

This patch restructures the handling of these global decl attachment
records, delaying their parsing until after the lazy loading index has
been built. Then the parser can use the interface that loads from the
index, which resolves forward references immediately instead of creating
expensive temporaries.

For one ThinLTO backend that imports from modules containing huge
numbers of vtables and associated types, I measured the following
compile times for the metadata materialization during function
importing, rounded to the nearest second:

No -fwhole-program-vtables:
  Lazy loading on (head):  1s
  Lazy loading off (head): 3s
  Lazy loading on (patch): 1s

With -fwhole-program-vtables:
  Lazy loading on (head):  440s
  Lazy loading off (head): 4s
  Lazy loading on (patch): 2s

Differential Revision: https://reviews.llvm.org/D87970
2020-09-22 20:32:07 -07:00
Bing1 Yu ec24e50553 [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)
add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87884
2020-09-23 10:29:10 +08:00
Andrew Litteken 88bc59c300 Revert "[IRSim] Adding IRSimilarityCandidate that contains a region of IRInstructionData."
This reverts commit 4944bb190f.
2020-09-22 21:02:34 -05:00
Fangrui Song bee68b2956 [EHStreamer] Ensure CallSiteEntry::{BeginLabel,EndLabel} are non-null. NFC
... to simplify the code a bit.

Reviewed By: rahmanl

Differential Revision: https://reviews.llvm.org/D87999
2020-09-22 17:34:43 -07:00
Andrew Litteken 4944bb190f [IRSim] Adding IRSimilarityCandidate that contains a region of IRInstructionData.
The IRSimilarityCandidate is a container to hold a region of
IRInstructions and offer interfaces for the starting instruction, ending
instruction, parent function, and length.  It also assigns a global value
number for each unique instance of a value in the region.

It also contains an interface to compare two IRSimilarityCandidates as to
whether they have the same sequence of similar instructions.

Tests for whether the instructions are similar are found in
unittests/Analysis/IRSimilarityIdentifierTest.cpp.

Differential Revision: https://reviews.llvm.org/D86970
2020-09-22 18:42:31 -05:00
Hubert Tong 32c9991dab [InstCombine] Fix errno bug in pow expansion to sqrt
A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -infinity: the `sqrt` will set `EDOM` where the `pow`
call need not.
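
The behavior difference is easy to observe directly; a small demo (errno reporting depends on the platform's math_errhandling, so treat the errno results as typical glibc behavior, not guaranteed):

```
#include <cerrno>
#include <cmath>
#include <cstdio>

// pow(-inf, 0.5) is well defined: +infinity, no domain error. sqrt(-inf)
// is a domain error, which an errno-setting sqrt reports as EDOM.
int main() {
  errno = 0;
  double P = std::pow(-INFINITY, 0.5);
  std::printf("pow : %f errno=%d\n", P, errno); // +inf, errno stays 0
  errno = 0;
  double S = std::sqrt(-INFINITY);
  std::printf("sqrt: %f errno=%d\n", S, errno); // nan, errno = EDOM
  return 0;
}
```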

This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.

The existing tests are updated (depending on emphasis in the checks for
library calls, avoidance of overlap, and overall coverage):
  - to add `ninf`, retaining the intended library call,
  - to use the intrinsic, retaining the use of `select`, or
  - to expect the replacement to not occur.

The following is tested:
  - The pow intrinsic folds to a `select` instruction to
    handle -infinity.
  - The pow library call folds, with `ninf`, to `sqrt` without the
    `select` instruction associated with handling -infinity.
  - The pow library call does not fold to `sqrt` without `ninf`.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87877
2020-09-22 18:58:05 -04:00
Alexey Bataev d6ac649ccd [SLP]Fix coding style, NFC. 2020-09-22 17:44:29 -04:00
Philip Reames e1a3271ebb [AArch64] Teach analyzeBranch to remove branch equivalent to fallthrough
The motivation here is that MachineBlockPlacement relies on analyzeBranch to remove branches to fallthrough blocks when the branch is not fully analyzable. With the introduction of the FAULTING_OP pseudo for implicit null checking (see D87861), this case becomes important. Note that it's hard to otherwise exercise this path, as BranchFolding handles any fully analyzable branch sequence without using this interface.

p.s. For anyone who saw my comment in the original review, what I thought was an issue in BranchFolding originally turned out to simply be a bug in my patch. (Now fixed.)

Differential Revision: https://reviews.llvm.org/D88035
2020-09-22 14:38:27 -07:00
Fangrui Song 49f2744931 Change LoopInfo::empty to isInnermost after D82895 2020-09-22 14:07:40 -07:00
Stefanos Baziotis a7873e5abc Small fixes for "[LoopInfo] empty() -> isInnermost(), add isOutermost()" 2020-09-22 23:59:34 +03:00
Reid Kleckner 90242caca2 Revert "[CodeGen] emit CG profile for COFF object file"
This reverts commit 91aed9bf97; it is
causing link errors.
2020-09-22 13:47:39 -07:00
Stefanos Baziotis 89c1e35f3c [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Congzhe Cao 4edb3d3646 [AArch64] Avoid pairing loads with same result reg
When pairing ldr instructions into an ldp instruction, we cannot pair two ldr
destination registers where one is a sub- or super-register of the other.
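
Condensed, the new rejection looks roughly like this (hypothetical form; getLdStRegOp is assumed to be the optimizer's accessor for the destination operand):

```
// Reject the candidate pair when one destination register is a sub- or
// super-register of the other (or the same register), e.g. pairing
// "ldr w1, [x0]" with "ldr x1, [x0, #8]".
Register RegA = getLdStRegOp(FirstMI).getReg();
Register RegB = getLdStRegOp(MI).getReg();
if (TRI->isSuperOrSubRegisterEq(RegA, RegB))
  return false;
```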

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86906
2020-09-22 16:25:08 -04:00
Mircea Trofin cf112382dd [ThinLTO] Option to bypass function importing.
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.

Differential Revision: https://reviews.llvm.org/D87949
2020-09-22 13:12:11 -07:00
Paul C. Anagnostopoulos 21f5f509c8 Two patches to fix the broken build.
One to fix a C++ compiler warning.
One to allow Sphinx to find a new document.
2020-09-22 16:00:31 -04:00
Roman Lebedev b289dc5306 [CVP] Narrow SDiv/SRem to the smallest power-of-2 that's sufficient to contain its operands
This is practically identical to what we already do for UDiv/URem:
  https://rise4fun.com/Alive/04K

Name: narrow udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow exact udiv
Pre: C0 u<= 255 && C1 u<= 255
%r = udiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = udiv exact i8 %t0, %t1
%r = zext i8 %t2 to i16

Name: narrow urem
Pre: C0 u<= 255 && C1 u<= 255
%r = urem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = urem i8 %t0, %t1
%r = zext i8 %t2 to i16

... only here we need to look for 'min signed bits', not 'active bits',
and there's UB to be aware of:
  https://rise4fun.com/Alive/KG86
  https://rise4fun.com/Alive/LwR

Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = sdiv exact i9 %t0, %t1
%r = sext i9 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i9
%t1 = trunc i16 C1 to i9
%t2 = srem i9 %t0, %t1
%r = sext i9 %t2 to i16


Name: narrow sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow exact sdiv
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = sdiv exact i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = sdiv exact i8 %t0, %t1
%r = sext i8 %t2 to i16

Name: narrow srem
Pre: C0 <= 127 && C1 <= 127 && C0 >= -128 && C1 >= -128 && !(C0 == -128 && C1 == -1)
%r = srem i16 C0, C1
  =>
%t0 = trunc i16 C0 to i8
%t1 = trunc i16 C1 to i8
%t2 = srem i8 %t0, %t1
%r = sext i8 %t2 to i16


The ConstantRangeTest.losslessSignedTruncationSignext test sanity-checks
the logic: we can losslessly truncate a ConstantRange to
`getMinSignedBits()` and sign-extend it back, and the result will be
identical to the original CR.

On vanilla llvm test-suite + RawSpeed, this fires 1262 times,
while the same fold for UDiv/URem only fires 384 times. Sic!

Additionally, this causes +606.18% (+1079) extra cases of
aggressive-instcombine.NumDAGsReduced, and +473.14% (+1145)
of aggressive-instcombine.NumInstrsReduced folds.
2020-09-22 21:37:30 +03:00
Roman Lebedev 4977eadee5 [NFC][CVP] Give a better name to the STATISTIC() counting udiv i16 -> udiv i8 xforms 2020-09-22 21:37:30 +03:00
Roman Lebedev 7465da2077 [ConstantRange] Introduce getMinSignedBits() method
Similar to ConstantRange::getActiveBits() and to the similarly-named
methods in APInt, this returns the bit width needed to represent
the given signed constant range.
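
For example (illustrative usage):

```
#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
using namespace llvm;

// The half-open range [-128, 128) over i16 holds the signed values
// -128..127, so 8 bits suffice to represent all of them.
ConstantRange CR(APInt(16, -128, /*isSigned=*/true),
                 APInt(16, 128, /*isSigned=*/true));
unsigned Bits = CR.getMinSignedBits(); // == 8
```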
2020-09-22 21:37:30 +03:00
Roman Lebedev ba5afe5588 [NFC][CVP] processUDivOrURem(): refactor to use ConstantRange::getActiveBits()
As an exhaustive test shows, this logic is fully identical to the old
implementation, with the exception of the case where both of the operands
had empty ranges:

```
TEST_F(ConstantRangeTest, CVP_UDiv) {
  unsigned Bits = 4;
  EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
    if (CR0.isEmptySet())
      return;
    EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
      if (CR1.isEmptySet())
        return;

      unsigned MaxActiveBits = 0;
      for (const ConstantRange &CR : {CR0, CR1})
        MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());

      ConstantRange OperandRange(Bits, /*isFullSet=*/false);
      for (const ConstantRange &CR : {CR0, CR1})
        OperandRange = OperandRange.unionWith(CR);
      unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();

      EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
    });
  });
}
```
2020-09-22 21:37:29 +03:00
Roman Lebedev 2ed9c4c70b [ConstantRange] Introduce getActiveBits() method
Much like APInt::getActiveBits(), this computes how many bits are needed
to represent every value in this constant range,
treating the values as unsigned.
2020-09-22 21:37:29 +03:00