Commit Graph

146169 Commits

Author SHA1 Message Date
Thomas Preud'homme fd941036bf Fix PR46880: Fail CHECK-NOT with undefined variable
Currently a CHECK-NOT directive succeeds whenever the corresponding
match fails. However match can fail due to an error rather than a lack
of match, for instance if a variable is undefined. This commit makes match
error a failure for CHECK-NOT.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D86222
2021-04-20 14:42:46 +01:00
Sebastian Neubauer 4897effb14 [AMDGPU] Add TransVALU to gfx10
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.

This doesn't seem to have any effect, but it should be more correct.

Differential Revision: https://reviews.llvm.org/D100123
2021-04-20 15:34:43 +02:00
Jay Foad 2aea830ec4 [AMDGPU] Use if instead of foreach in a few places. NFC. 2021-04-20 14:20:30 +01:00
Andrea Di Biagio 2226d21896 [MCA][LSUnit] Fix a potential use after free in the logic that updates memory groups.
Make sure that the `CriticalMemoryInstruction` of a memory group is invalidated
if it references an already executed instruction.  This avoids a potential
use-after-free if the critical memory info becomes stale, and the value is
read after the instruction has executed.
2021-04-20 13:30:45 +01:00
Nemanja Ivanovic 03e7fefff8 [PowerPC] Canonicalize shuffles on big endian targets as well
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.

This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.

Differential revision: https://reviews.llvm.org/D100478
2021-04-20 07:29:47 -05:00
Jay Foad edea476142 [AMDGPU] Use simpler alternatives to !foldl. NFC. 2021-04-20 12:59:04 +01:00
Simon Pilgrim e156f2515c [DAG] SelectionDAG.cpp - breakup if-else chains where each block returns. NFCI.
Match style guide that requests that if+return blocks are separate.
2021-04-20 12:37:00 +01:00
hsmahesha 840c4e4e90 [AMDGPU] Re-arrange ds_read/ds_write ISel pattern for better readability.
Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D100773
2021-04-20 16:17:15 +05:30
Dávid Bolvanský 319c9f6e58 [MemoryBuiltins] Added support for memalign
memalign is older aligned_alloc.
2021-04-20 12:39:54 +02:00
Ben Shi 30e2c7be99 [RISCV] Refactor an optimization of addition with immediate
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100769
2021-04-20 18:04:25 +08:00
Joe Ellis effacc1599 [AArch64] Constant fold sve_convert_from_svbool(zero) to zero
Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100463
2021-04-20 10:02:49 +00:00
Joe Ellis c91cd4f3bb [AArch64][SVE][InstCombine] Replace last{a,b} intrinsics with extracts...
when the predicate used by last{a,b} specifies a known vector length.

For example:
  aarch64_sve_lasta(VL1, D) -> extractelement(D, #1)
  aarch64_sve_lastb(VL1, D) -> extractelement(D, #0)

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D100476
2021-04-20 10:01:33 +00:00
Serguei Katkov 70193bdfc0 Re-land [GreedyRA ORE] Add Cost of spill locations into remark
Re-land the patch with a fix of clang test.

Cost of spill location is computed basing on relative branch frequency
where corresponding spill/reload/copy are located.

While the number itself is highly depends on incoming IR,
the total cost can be used when do some changes in RA.

Revert "Revert "[GreedyRA ORE] Add Cost of spill locations into remark""
This reverts commit 680f3d6de7.
2021-04-20 16:21:07 +07:00
Fraser Cormack b4a358a7ba [RISCV] Fix missing emergency slots for scalable stack offsets
This patch adds an additional emergency spill slot to RVV code. This is
required as RVV stack offsets may require an additional register to compute.

This patch includes an optimization by @HsiangKai <kai.wang@sifive.com>
to reduce the number of registers required for the computation of stack
offsets from 3 to 2. Otherwise we'd need two additional emergency spill
slots.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D100574
2021-04-20 09:59:41 +01:00
Sander de Smalen 86729538bd [LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
some cost, is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100121
2021-04-20 09:54:45 +01:00
Qiu Chaofan 2432d80d3b [PowerPC] Use mtvsrdd to put callee-saved GPR into VSR
This patch exploits mtvsrdd instruction (available in ISA3.0+) to save
two callee-saved GPR registers into a single VSR, making it more
efficient.

Reviewed By: jsji, nemanjai

Differential Revision: https://reviews.llvm.org/D62565
2021-04-20 16:43:24 +08:00
Jun Ma 1ef5699d1a [DAGCombiner] Support fold zero scalar vector.
This patch changes ISD::isBuildVectorAllZeros to
ISD::isConstantSplatVectorAllZeros which handles zero sclar vector.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100813
2021-04-20 16:28:43 +08:00
Jay Foad b22721f01a [AMDGPU] GCNDPPCombine: don't shrink V_ADD_CO_U32 if carry out is used
Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.

Differential Revision: https://reviews.llvm.org/D100760
2021-04-20 09:17:52 +01:00
Luo, Yuanke bcdaccfe34 [X86][AMX] Verify illegal types or instructions for x86_amx.
This patch is related to https://reviews.llvm.org/D100032 which define
some illegal types or operations for x86_amx. There are no arguments,
arrays, pointers, vectors or constants of x86_amx.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D100472
2021-04-20 16:14:22 +08:00
Arthur Eubanks 5e71b9fa93 Explicitly pass type to cast load constant folding result
Previously we would use the type of the pointee to determine what to
cast the result of constant folding a load. To aid with opaque pointer
types, we should explicitly pass the type of the load rather than
looking at pointee types.

ConstantFoldLoadThroughBitcast() converts the const prop'd value to the
proper load type (e.g. [1 x i32] -> i32). Instead of calling this in
every intermediate step like bitcasts, we only call this when we
actually see the global initializer value.

In some existing uses of this API, we don't know the exact type we're
loading from immediately (e.g. first we visit a bitcast, then we visit
the load using the bitcast). In those cases we have to manually call
ConstantFoldLoadThroughBitcast() when simplifying the load to make sure
that we cast to the proper type.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100718
2021-04-20 00:53:21 -07:00
Qiu Chaofan b820339752 [PowerPC] Support f128 under VSX
This patch is the last one in backend to support fp128 type in
pre-POWER9 subtargets with VSX, removing temporary option and updating
remaining tests.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92374
2021-04-20 15:49:52 +08:00
Fraser Cormack 457da7f298 [SelectionDAG] Relax constraints on STEP_VECTOR step operand
This patch relaxes the requirement that the STEP_VECTOR step constant
must be of a type at least as large as the vector element type. This
does not permit its use on targets which have legal vector element types
larger than the largest legal scalar type, such as i64 vectors on RV32.

As such, the requirement has been loosened so that the step operand must
be any scalar type so long as the constant immediate is non-negative and
the value fits inside the vector element type.

This limits combining optimizations in certain circumstances but in
practice it's unlikely to be a hindrance.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D100660
2021-04-20 08:41:42 +01:00
Zi Xuan Wu 4bb60c285c [CSKY 6/n] Add support branch and symbol series instruction
This patch adds basic CSKY branch instructions and symbol address series instructions.
Those two kinds of instruction have relationship between each other, and it involves much work about Fixups.

For now, basic instructions are enabled except for disassembler support.
We would support to generate basic codegen asm firstly and delay disassembler work later.

Differential Revision: https://reviews.llvm.org/D95029
2021-04-20 15:36:49 +08:00
Zi Xuan Wu 4216389c26 [CSKY 5/n] Add support for all CSKY basic integer instructions except for branch series
This patch adds basic CSKY integer instructions except for branch series such as bsr, br.
It mainly includes basic ALU, load & store, compare and data move instructions.

Branch series instructions need handle complex symbol operand as following patch later.

Differential Revision: https://reviews.llvm.org/D94007
2021-04-20 15:36:49 +08:00
Zi Xuan Wu 8ba622bae1 [CSKY 4/n] Add basic CSKYAsmParser and CSKYInstPrinter
This basic parser will handle basic instructions with register or immediate operands.
With the addition of CSKYInstPrinter, we can now make use of lit tests.

Differential Revision: https://reviews.llvm.org/D93798
2021-04-20 15:36:49 +08:00
Max Kazantsev 9430efa18b [NFC] Restructure code to make it possible to insert other GCs 2021-04-20 14:24:38 +07:00
Zakk Chen d5fa71e9ec [RISCV] Handle PseudoVRELOAD and PseudoVSPILL in getInstSizeInBytes.
It's necessary to calculate correct instruction size because
PseudoVRELOAD and PseudoSPILL will be expanded into multiple
instructions.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100702
2021-04-19 22:30:03 -07:00
Serguei Katkov 680f3d6de7 Revert "[GreedyRA ORE] Add Cost of spill locations into remark"
This reverts commit 328377307a.

This commit causes buildbot failures due to some clang tests are not updated.
Temporary revert to fix clang tests.
2021-04-20 11:08:24 +07:00
Serguei Katkov 328377307a [GreedyRA ORE] Add Cost of spill locations into remark
Cost of spill location is computed basing on relative branch frequency
where corresponding spill/reload/copy are located.

While the number itself is highly depends on incoming IR,
the total cost can be used when do some changes in RA.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100020
2021-04-20 10:47:22 +07:00
Jun Ma 5c6ac3b4a2 [AArch64][SVE] Combine add and index_vector
This patch tries to combine pattern add(index_vector(zero, step), dup(X)) into index_vector(X, step)

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D100107
2021-04-20 11:38:37 +08:00
Hongtao Yu b98807df05 [CSSPGO] Exclude pseudo probes from slot index
Pseudo probe are currently given a slot index like other regular instructions. This affects register pressure and lifetime weight computation because of enlarged lifetime length with pseudo probe instructions. As a consequence, program could get different code generated w/ and w/o pseudo probes. I'm closing the gap by excluding pseudo probes from stack index and downstream register allocation related passes.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100334
2021-04-19 17:55:35 -07:00
Hongtao Yu 1812319292 [CSSPGO] Flip SkipPseudoOp to true for MIR APIs.
Flipping the default value of SkipPseudoOp to true for those MIR APIs to favor maximum performance. Note that certain spots like branch folding and MIR if-conversion is are disabled for better counts quality. For these two optimizations, this is a no-diff change.

The counts quality with SPEC2017 before/after this change is unchanged.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D100332
2021-04-19 17:55:34 -07:00
Dávid Bolvanský 324d641b75 [InstCombine] Enhance deduction of alignment for aligned_alloc
This patch improves https://reviews.llvm.org/D76971 (Deduce attributes for aligned_alloc in InstCombine) and implements "TODO" item mentioned in the review of that patch.

> The function aligned_alloc() is the same as memalign(), except for the added restriction that size should be a multiple of alignment.

Currently, we simply bail out if we see a non-constant size - change that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100785
2021-04-20 02:04:18 +02:00
Nick Lewycky cf899a31ae Add a cache of checked AttributeLists.
Differential Revision: https://reviews.llvm.org/D100738
2021-04-19 16:01:06 -07:00
Min-Yih Hsu 7ac461f6f7 [M68k] Put M68kDesc as the direct library dependency for disassembler
M68kDisassembler should put M68kDesc as its direct library dependency
since it uses logics releated to code beads Otherwise the build will
fail when building LLVM libraries as shared objects (building LLVM
libraries statically won't have this problem though)
2021-04-19 15:56:24 -07:00
Roman Tereshin 76b0ea7f2d Reset NextFnNum in MachineModuleInfo::initialize
In an env that reuses compiler instances for multiple compilations, this
omission results in non-deterministic assembly output (names of the
auto-generated labels) if the order or full set of Modules compiled
varies.

Differential Revision: https://reviews.llvm.org/D100797
2021-04-19 15:51:30 -07:00
Ricky Taylor 2221185776 [M68k] Implement Disassembler
This is an implementation of a disassembler for M68k.

Differential Revision: https://reviews.llvm.org/D98540
2021-04-19 22:24:12 +01:00
Ricky Taylor 6de262827c [M68k] Change printing of absolute memory references
This also includes PC-relative addresses since they are still
referenced as absolute addresses in assembly and converted to
relative addresses by the assembler.

This changes, for example:
- `bra #-2` -> `bra $100`
- `jsr #16` -> `jsr $10`

Differential Revision: https://reviews.llvm.org/D100697
2021-04-19 22:24:12 +01:00
Alexey Bataev 8030481065 Revert "[SLP]Add detection of shuffled/perfect matching of tree entries."
This reverts commit d6fde91379 to fix
compiler crashes.
2021-04-19 14:10:04 -07:00
Zequan Wu e28435caf6 [ThinLTO] Copy UnnamedAddr when spliting module.
The unnamedaddr property of a function is lost when using
`-fwhole-program-vtables` and thinlto which causes size increase under linker's
safe icf mode.

The size increase of chrome on Linux when switching from all icf to safe icf
drops from 5 MB to 3 MB after this change, and from 6 MB to 4 MB on Windows.

There is a repro:
```
# a.h
struct A {
  virtual int f();
  virtual int g();
};

# a.cpp
#include "a.h"
int A::f() { return 10; }
int A::g() { return 10; }

# main.cpp
#include "a.h"

int g(A* a) {
  return a->f();
}

int main(int argv, char** args) {
  A a;
  return g(&a);
}

$ clang++ -O2 -ffunction-sections -flto=thin -fwhole-program-vtables -fsplit-lto-unit -c main.cpp -o main.o  && clang++ -Wl,--icf=safe -fuse-ld=lld  -flto=thin main.o -o a.out && llvm-readobj -t a.out | grep -A 1 -e _ZN1A1fEv -e _ZN1A1gEv
    Name: _ZN1A1fEv (480)
    Value: 0x201830
--
    Name: _ZN1A1gEv (490)
    Value: 0x201840
```

Differential Revision: https://reviews.llvm.org/D100498
2021-04-19 14:04:58 -07:00
Alexey Bataev d6fde91379 [SLP]Add detection of shuffled/perfect matching of tree entries.
SLP supports perfect diamond matching for the vectorized tree entries
but do not support it for gathered entries and does not support
non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds
support for this matching to improve cost of the vectorized tree.

Differential Revision: https://reviews.llvm.org/D100495
2021-04-19 13:29:30 -07:00
David Penry ca8eef7e3d [CodeGen] Use ProcResGroup information in SchedBoundary
When the ProcResGroup has BufferSize=0,

1. if there is a subunit in the list of write resources for the
   scheduling class, do not attempt to schedule the ProcResGroup.
2. if there is not a subunit in the list of write resources for the
   scheduling class, choose a subunit to use instead of the ProcResGroup.
3. having both the ProcResGroup and any of its subunits in the resources
   implied by a InstRW is not supported.

Used to model parallel uses from a pool of resources.

Differential Revision: https://reviews.llvm.org/D98976
2021-04-19 21:27:45 +01:00
David Penry 78a871abf7 [ARM] Use ProcResGroup in Cortex-M7 scheduling model
Used to model structural hazards on FP issue, where some
instructions take up 2 issue slots and others one as well
as similar structural hazards on load issue, where some
instructions take up two load lanes and others one.

Differential Revision: https://reviews.llvm.org/D98977
2021-04-19 21:23:05 +01:00
Philip Reames 3c54762226 [funcattrs] Consistently check call site attributes
This is mostly stylistic cleanup after D100226, but not entirely. When skimming the code, I found one case where we weren't accounting for attributes on the callsite at all. I'm also suspicious we had some latent bugs related to operand bundles (which are supposed to be able to *override* attributes on declarations), but I don't have concrete test cases for those, just suspicions.

Aside: The only case left in the file which directly checks attributes on the declaration is the norecurse logic. I left that because I didn't understand it; it looks obviously wrong, so I suspect I'm misinterpreting the intended semantics of the attribute.

Differential Revision: https://reviews.llvm.org/D100689
2021-04-19 13:20:50 -07:00
Philip Reames 01801d5274 [rs4gc] Fix a latent bug around attribute stripping for intrinsics
This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032).

The story on this is a bit involved.  Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic.  Why?  Because the lowering of the intrinsic relies on the presence of the readonly attribute.  We don't have a matcher to select the case where there's a glue node needed.

Now, on the surface, this still seems like a codegen bug.  However, here it gets fun.  I was unable to reproduce this with a standalone test at all, and was pretty much struck until skatkov provided the critical detail.  This reproduces only when RS4GC and codegen are run in the same process and context.  Why?  Because it turns out we can't roundtrip the stripped attribute through serialized IR!

We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td.  This makes it impossible to exercise SelectionDAG from a standalone test case.

At this point, I decided to treat this an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future.

As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted.  I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.
2021-04-19 13:14:07 -07:00
Nikita Popov 9423f78240 [InstCombine] Fold multiuse shr eq zero
The single-use case is handled implicity by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.

https://alive2.llvm.org/ce/z/MSixcm
https://alive2.llvm.org/ce/z/GwpG0M
2021-04-19 22:13:11 +02:00
Thomas Lively e657c84fa1 [WebAssembly] Use v128.const instead of splats for constants
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.

Differential Revision: https://reviews.llvm.org/D100716
2021-04-19 12:43:59 -07:00
Jinsong Ji d88d8c5b86 [PowerPC] Disable relative lookup table converter pass for AIX
XCOFF hasn't implemented lowerRelativeReference.
So we need to disable new pass introduced by https://reviews.llvm.org/D94355 for
AIX for now.

Reviewed By: gulfem

Differential Revision: https://reviews.llvm.org/D100584
2021-04-19 19:28:11 +00:00
madhur13490 6a4d9cb7e0 [AMDGPU] Remove error check for indirect calls and add missing queue-ptr
This patch removes -fixed-abi check for indirect calls
and also adds queue-ptr which is required for indirect calls to work.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100633
2021-04-20 00:35:17 +05:30
Pavel Iliin 2ec16103c6 [AArch64] Peephole rule to remove redundant cmp after cset.
Comparisons to zero or one after cset instructions can be safely
removed in examples like:

cset w9, eq          cset w9, eq
cmp  w9, #1   --->   <removed>
b.ne    .L1          b.ne    .L1

cset w9, eq          cset w9, eq
cmp  w9, #0   --->   <removed>
b.ne    .L1          b.eq    .L1

Peephole optimization to detect suitable cases and get rid of that
comparisons added.

Differential Revision: https://reviews.llvm.org/D98564
2021-04-19 19:58:38 +01:00
Nikita Popov d440f9a326 [LICM] Make capture check more precise
During store promotion, we check whether the pointer was captured
to exclude potential reads from other threads. However, we're only
interested in captures before or inside the loop. Check this using
PointerMayBeCapturedBefore against the loop header.

Differential Revision: https://reviews.llvm.org/D100706
2021-04-19 20:34:23 +02:00
Craig Topper 87afefcd22 [RISCV] Fix mistake in comment. NFC 2021-04-19 11:15:32 -07:00
Craig Topper 7ed01a420a [RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte.
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.

The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.

Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100618
2021-04-19 11:05:18 -07:00
Jessica Paquette 65f257a215 [AArch64][GlobalISel] Implement custom legalization for s32 and s64 G_CTPOP
This is a partial port of AArch64TargetLowering::LowerCTPOP.

This custom lowering tries to uses NEON instructions to give a more efficient
CTPOP lowering when possible.

In the non-NEON/noimplicitfloat case, this should use the generic lowering
(see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after
implementing the widening code for s16/s8 though.

Differential Revision: https://reviews.llvm.org/D100399
2021-04-19 10:56:02 -07:00
Nick Desaulniers c440b97d89 [TargetLowering] move "o" and "X" constraint handling to base class
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D100416
2021-04-19 10:53:31 -07:00
Jessica Paquette 91bbb914e0 [AArch64][GlobalISel] Regbankselect + select @llvm.aarch64.neon.uaddlv
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.

This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.

For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.

Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.

Differential Revision: https://reviews.llvm.org/D100398
2021-04-19 10:47:49 -07:00
Sanjay Patel 9d43f6d7ce [LowerConstantIntrinsics] avoid crashing on alloca with unexpected operand type
The test here is reduced from the fuzzer-generated crasher in:
https://llvm.org/PR50023
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33395

I don't know if this is the best or complete solution, but the
zext of the i42 type appears to match the behavior if we run a
weird type example like this through the IR optimizer with -O1.

Differential Revision: https://reviews.llvm.org/D100766
2021-04-19 13:06:29 -04:00
Roman Lebedev d746fefb6f
[SCEVExpander] ReuseOrCreateCast(): use IRBuilder to actually create the cast
In particular, this allows to create constant expressions
instead of IR Instruction's if the argumen is a constant.
2021-04-19 18:38:39 +03:00
Roman Lebedev ecc9d7e913
[SCEVExpander] Expand explicit PtrToInt casts just like we would implicit ones
I.e., use GetOptimalInsertionPointForCastOf() helper to get the insertion
point, and try to reuse casts first.
2021-04-19 18:38:39 +03:00
Roman Lebedev 442c408e0e
[SCEVExpander] GetOptimalInsertionPointForCastOf(): gracefully handle Constant's
I guess this case hasn't come up thus far, and i'm not sure if it can
really happen for the existing usages, thus no test in *this* commit.

But, the following commit adds test coverage,
there we'd expirience a crash without this fix.
2021-04-19 18:38:39 +03:00
Roman Lebedev b8a3705896
[NFCI][SCEVExpander] Extract GetOptimalInsertionPointForCastOf() helper 2021-04-19 18:38:38 +03:00
Roman Lebedev 73f60e3988
[SCEVExpander] generateOverflowCheck(): explicitly PtrToInt the Start
Currently, InsertNoopCastOfTo() would implicitly insert that cast,
but now that we have SCEVPtrToIntExpr, i'm hoping we could stop
InsertNoopCastOfTo() from doing that. But first all users must be fixed.
2021-04-19 18:38:38 +03:00
Roman Lebedev 41c22acc22
[NFC][SCEV] Assert that we don't try to create SCEVPtrToIntExpr of a non-integral pointer
ptr<->int casts are only valid for integral pointes,
defensively assert that we don't try to break that here.
2021-04-19 18:38:38 +03:00
Simon Pilgrim ddcdeae358 [Analysis] ImportedFunctionsInliningStatistics.h - add <memory> and remove unused <string> include. NFCI.
Move <string> include to ImportedFunctionsInliningStatistics.cpp and add missing <memory> include as we have explicit uses of std::unique_ptr in the header.
2021-04-19 16:20:56 +01:00
Jay Foad a02aa91313 [AMDGPU] GCNDPPCombine: simplify API of isShrinkable. NFC. 2021-04-19 14:20:46 +01:00
Paul C. Anagnostopoulos a5aaec8f4e [TableGen] Add support for the 'assert' statement in multiclasses
This is step 3 of adding the 'assert' statement.

Differential Revision: https://reviews.llvm.org/D99751
2021-04-19 09:01:42 -04:00
Jay Foad ef443390a9 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Reapply after fixing dependent change D100188.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-19 12:08:02 +01:00
Jay Foad 323ef0eb45 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Reapply with a fix for using InstToErase after it had been erased.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-19 12:05:41 +01:00
Cullen Rhodes f0bc2782f2 [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
Dmitry Preobrazhensky bcc29e0fcf [AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3
Disabled constants as carry in/out operands. See bug 48711.

Differential Revision: https://reviews.llvm.org/D100642
2021-04-19 13:42:31 +03:00
Roman Lebedev df9597cf5a
[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap
This is similar to the subvector extractions,
except that the 0'th subvector isn't free to insert,
because we generally don't know whether or not
the upper elements need to be preserved:
https://godbolt.org/z/rsxP5W4sW

This is needed to avoid regressions in D100684

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100698
2021-04-19 13:24:58 +03:00
Fraser Cormack c9a93c3e01 [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
OCHyams bbccdf6f81 [DebugInfo] Replace debug uses in replaceUsesOutsideBlock
Value::replaceUsesOutsideBlock doesn't replace debug uses which leads to an
unnecessary reduction in variable location coverage. Fix this, add a unittest for
it, and add a regression test demonstrating the change through instcombine's
replacedSelectWithOperand.

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D99169
2021-04-19 11:06:53 +01:00
OCHyams 0ebf9a8e34 [DebugInfo] Move the findDbg* functions into DebugInfo.cpp
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.

D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.

Reviewed By: dblaikie, rnk

Differential Revision: https://reviews.llvm.org/D100632
2021-04-19 10:30:25 +01:00
David Sherwood 83f5fa519e [CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.

Differential Revision: https://reviews.llvm.org/D100639
2021-04-19 08:34:17 +01:00
Serguei Katkov 9f33943ee0 [GreedyRA ORE] Add stats for copy of virtual registers.
Greedy RA adds copies of virtual registers when splitting live interval.
This stat might be useful.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100017
2021-04-19 12:43:44 +07:00
Serguei Katkov 61d22f2e4e [Greedy RA] Add a check to MachineVerifier
If Virtual Register is alive in landing pad its def must be
before the call causing the exception or it should be statepoint instruction itself and
in this case def actually means the relocation of gc pointer and is alive in
landing pad.

The test shows the triggering this check for an option under development
use-registers-for-gc-values-in-landing-pad which is off by default until
it is functionally correct.

Reviewers: reames, void, jyknight, nickdesaulniers, efriedma, arsenm, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100525
2021-04-19 12:31:18 +07:00
Evgeniy Brevnov 35e95c6817 [CVP] processCallSite returns wrong status
Recently processMinMaxIntrinsic has been added and we started to observe a number of analysis get invalidated after CVP. The problem is CVP conservatively returns 'true'  even if there were no modifications to IR. I found one more place besides processMinMaxIntrinsic  which has the same problem. I think processMinMaxIntrinsic and similar should better have boolean return status to prevent similar issue reappear in future.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100538
2021-04-19 12:13:22 +07:00
Xun Li 5faba87938 Revert "[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass"
This reverts commit fa6b54c44a.
The commited patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
2021-04-18 17:22:28 -07:00
Xun Li fa6b54c44a [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.

Reviewed By: rjmccall, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 15:41:09 -07:00
Xun Li c0211e8d7d Revert "[Coroutines] Move CoroEarly pass to before AlwaysInliner"
This reverts commit 2b50f5a434.
Forgot to update the description of the commit to sync with phabricator. Going to redo the commit.
2021-04-18 15:38:19 -07:00
Xun Li 2b50f5a434 [Coroutines] Move CoroEarly pass to before AlwaysInliner
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 14:54:04 -07:00
Roman Lebedev d480f968ad
Revert "[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)`"
As being discussed in https://reviews.llvm.org/D100721,
this modelling is lossy, we can't reconstruct `ash`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code..

It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.

This reverts commit ec54867df5.
2021-04-18 16:26:45 +03:00
Juneyoung Lee 1c10201d96 Update InstCombine to use undef matcher instead
This is a patch to use m_Undef() matcher instead of isa<UndefValue>().

As suggested in D100122, this update is separately committed.
2021-04-18 11:05:36 +09:00
Yaxun (Sam) Liu 3597f02fd5 [AMDGPU] Add GlobalDCE before internalization pass
The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783
2021-04-17 11:25:25 -04:00
Florian Hahn af523514c4
[SimplifyCFG] Skip dbg intrinsics when checking for branch-only BBs.
Debug intrinsics are free to hoist and should be skipped when looking
for terminator-only blocks. As a consequence, we have to delegate to the
main hoisting loop to hoist any dbg intrinsics instead of jumping to the
terminator case directly.

This fixes PR49982.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100640
2021-04-17 15:17:50 +01:00
Nikita Popov e68b12c99e [Inline] Don't add noalias metadata to inaccessiblememonly calls
It will not do anything useful for them, as we already know that
they don't modref with any accessible memory.

In particular, this prevents noalias metadata from being placed
on noalias.scope.decl intrinsics. This reduces the amount of
metadata needed, and makes it more likely that unnecessary decls
can be eliminated.
2021-04-17 14:56:13 +02:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Nemanja Ivanovic ff769dd111 [PowerPC] Minor improvement for insert_vector_elt codegen
For v2f64, all VSX subtargets can insert an element with a single
XXPERMDI.
2021-04-16 18:52:37 -05:00
Philip Reames 11707435cc [inferattrs] Don't infer lib func attributes for nobuiltin functions
If we have a nobuiltin function, we can't assume we know anything about the implementation.

I noticed this when tracing through a log from an in the wild miscompile (https://github.com/emscripten-core/emscripten/issues/9443) triggered after 8666463.  We were incorrectly assuming that a custom allocator could not free.  (It's not clear yet this is the only problem in said issue.)

I also noticed something similiar mentioned in the commit message of ab243e when scrolling back through history.  Through, from what I can tell, that commit fixed symptom not root cause.

The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner.  I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.
2021-04-16 15:36:15 -07:00
Philip Reames f549176ad9 [funcattrs] Add the maximal set of implied attributes to definitions
Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time.

Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now.

The old behavior can end up quite confusing for two reasons:
* Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.)
* We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent.

I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement.

Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own.

Differential Revision: https://reviews.llvm.org/D100226
2021-04-16 14:22:19 -07:00
serge-sans-paille 550ed575cb Simplify BitVector code
Instead of managing memory by hand, delegate it to std::vector. This makes the
code much simpler, and also avoids repeatedly computing the storage size.

According to valgrind --tool=callgrind, this also slightly decreases the
instruction count, but by a small margin.

This is a recommit of 82f0e3d3ea with one usage
fixed in llvm/lib/CodeGen/RegisterScavenging.cpp.

Not the slight API change: BitVector::clear() now has the same behavior as any
other container: it does not free memory, but indeed sets the size of the
BitVector to 0. It is thus incorrect to access its content right afterwards, a
scenario which wasn't enforced in previous implementation.

Differential Revision: https://reviews.llvm.org/D100387
2021-04-16 22:48:33 +02:00
Joe Nash a0ed70abde [AMDGPU] Remove redundant field from DPP8 def
These lines set the value to what it already was,
so they are redundant. NFC

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100664

Change-Id: Ibf6f27d50a7fa1f76c127f01b799821378bfd3b3
2021-04-16 16:23:52 -04:00
Joe Nash 919236e608 [AMDGPU] NFC, Comment in disassembler for dpp8
Gives reasoning for convertDPP8.
Also corrects typo in Operand type comment.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100665

Change-Id: I33ff269db8072d83e5e0ecdbfb731d6000fc26c4
2021-04-16 16:21:47 -04:00
Tony 13875aab4e [AMDGPU] Enforce that gfx802/803/805 do not support XNACK
Reviewed By: kzhuravl

Differential Revision: https://reviews.llvm.org/D100679
2021-04-16 19:34:30 +00:00
Thomas Lively 5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Philip Reames ff55d01a8e [nofree] Restrict semantics to memory visible to caller
This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee.

This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute".

The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work.

Differential Revision: https://reviews.llvm.org/D100141
2021-04-16 11:38:55 -07:00
Christudasan Devadasan 97618522dc [AMDGPU] Remove dead dcode (NFC). 2021-04-16 23:03:31 +05:30
Simon Pilgrim 37a4621fb6 [DAG] SelectionDAG::isSplatValue - early out if binop is not splat. NFCI.
Just return false if we fail to match splats - the remainder of the code is for (fixed)vector operations - shuffles/insertions etc.
2021-04-16 18:26:33 +01:00
Joe Nash 7cc4a02fa2 [AMDGPU] Refactor VOP3P Profile and AsmParser, NFC
Refactors VOP3P tablegen and the AsmParser for VOP3P
for better extensibility. NFC intended

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100602

Change-Id: I038e3a772ac348bb18979cdf3e3ae2e9476dd411
2021-04-16 13:06:50 -04:00